Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Redis Data Structures and Objects

Tech May 19 1

Redis provides three specialized data structures optimized for specific use cases:

Structure Use Case
Bitmaps Space-efficient storage for binary state tracking (daily/monthly user check-ins)
HyperLogLog Approximate cardinality estimation for massive datasets (UV counting)
GEO Geospatial data storage and proximity queries

Bitmaps

A bitmap is essentially a byte array where each bit represents a binary state (0 or 1). Introduced in Redis 2.2.0, these operations extend the string type rather than creating a new data type.

Core Operations


GETBIT key position          // Retrieve bit value at specified position
SETBIT key position value   // Set bit value (0 or 1) at position
BITOP operation dest source [source...]
// Operations: AND, OR, NOT, XOR - combines bitmaps and stores result
BITCOUNT key [start end]    // Count bits set to 1

Movie Streaming Application Example

Consider a video platform tracking which movies are watched daily:

  • Each day maintains a bitmap initialized to zeros
  • When a movie is played, its ID determines the bit position to set
Metric Solution
Check if specific movie was played today Calculate bit offset from movie ID, read the corresponding bit
Count movies played today Count set bits in today's bitmap
Count movies in weekly/monthly/yearly range OR all relevant bitmaps, then count resulting bits
Find unplayed movies in a year OR all bitmaps, bits still at 0 indicate unwatched movies

127.0.0.1:6379> SETBIT movies_20240301 100 1
(integer) 0
127.0.0.1:6379> SETBIT movies_20240301 205 1
(integer) 0
127.0.0.1:6379> SETBIT movies_20240302 100 1
(integer) 0
127.0.0.1:6379> BITCOUNT movies_20240301
(integer) 2
127.0.0.1:6379> BITOP OR period_result movies_20240301 movies_20240302
(integer) 1
127.0.0.1:6379> BITCOUNT period_result
(integer) 1

HyperLogLog

HyperLogLog implements an algorithm for estimating set cardinality with minimal memory footprint. It trades exactness for efficiency, making it suitable for massive-scale unique visitor counting.

Comparing approaches for unique user counting:

Approach Memory Usage Trade-offs
SET type Each user ID stored as string Accurate but memory-intensive
Bitmap One bit per possible user Better space efficiency
HyperLogLog Fixed 12KB per counter 0.81% error rate, constant memory

Commands


PFADD key element [element ...]   // Add elements to counter
PFCOUNT key [key ...]             // Get estimated cardinality
PFMERGE dest source [source ...]  // Merge multiple counters

Characteristics

  • Does not store actual data, only cardinality estimate
  • Standard error approximately 0.81%
  • Memory allocation grows incrementally as element are added
  • Merged structures always occupy exactly 12KB

GEO (Geospatial)

Redis GEO enables storing coordinate pairs and performing location-based queries.


GEOADD locations longitude latitude member [longitude latitude member ...]
GEOPOS locations member [member ...]
GEODIST locations pointA pointB [unit]
GEORADIUS locations longitude latitude radius unit [WITHCOORD] [WITHDIST] [COUNT count]
GEORADIUSBYMEMBER locations member radius unit [WITHCOORD] [WITHDIST]
GEOHASH locations member [member ...]

127.0.0.1:6379> GEOADD cities 116.4 39.9 Beijing
(integer) 1
127.0.0.1:6379> GEOADD cities 121.4 31.2 Shanghai
(integer) 1
127.0.0.1:6379> GEOPOS cities Beijing
1) "116.39999866485596"
   "39.900000092026976"
127.0.0.1:6379> GEODIST cities Beijing Shanghai km
"1067.8390"

Core Data Structures

Simple Dynamic Strings (SDS)

Structure Definition

Redis 4.0+ defines five SDS variants optimized for different string lengths:


typedef char *simple_string;

struct __attribute__ ((__packed__)) sds_header_8 {
    uint8_t length;           /* bytes currently used */
    uint8_t allocated;        /* total capacity excluding header and null */
    unsigned char flags;      /* type identifier (3 bits) */
    char buffer[];             /* flexible array member for actual data */
};

struct __attribute__ ((__packed__)) sds_header_16 {
    uint16_t length;
    uint16_t allocated;
    unsigned char flags;
    char buffer[];
};

struct __attribute__ ((__packed__)) sds_header_32 {
    uint32_t length;
    uint32_t allocated;
    unsigned char flags;
    char buffer[];
};

struct __attribute__ ((__packed__)) sds_header_64 {
    uint64_t length;
    uint64_t allocated;
    unsigned char flags;
    char buffer[];
};
Field Purpose
length Current string size (excluding terminator)
allocated Total capacity reserved
flags Type indicator using 3 bits
buffer Actual character storage

Advantages Over C Strings

Aspect C String SDS Benefit
Length retrieval O(n) via strlen O(1) via len field Eliminates performance bottleneck
Buffer overflow Possible with strcat/strcpy Bounds checking before write Memory safety guaranteed
Reallocation Required on every modification Pre-allocation and lazy freeing Fewer memory operations
Binary data Uses '\0' sentinel (text only) Uses length field Stores any binary content

Applications

  • Direct string value storage
  • AOF module buffers
  • Client input/output buffers

Linked Lists

Implementation


typedef struct list_node {
    struct list_node *previous;
    struct list_node *next;
    void *payload;
} list_node;

typedef struct list_iterator {
    list_node *current;
    int traverse_direction;
} list_iterator;

typedef struct linked_list {
    list_node *head;
    list_node *tail;
    void *(*duplicate_fn)(void *ptr);
    void (*deallocate_fn)(void *ptr);
    int (*compare_fn)(void *ptr, void *key);
    unsigned long count;
} linked_list;

Properties

  • Doubly-linked, circular structure
  • O(1) head/tail/length access
  • Generic storage via void pointers

Usage

Lists power LIST commands, publish-subscribe patterns, slow query logging, and monitoring features.

Hash Tables (Dictionaries)

Internal Structure


typedef struct hash_entry {
    void *hash_key;
    union {
        void *value_ptr;
        uint64_t unsigned_val;
        int64_t signed_val;
        double floating_val;
    } stored_value;
    struct hash_entry *collision_next;
} hash_entry;

typedef struct hash_table {
    hash_entry **buckets;
    unsigned long table_size;
    unsigned long size_mask;
    unsigned long active_entries;
} hash_table;

typedef struct dictionary {
    struct dict_type *operations;
    void *private_context;
    hash_table tables[2];           /* primary and rehash targets */
    long rehash_position;           /* -1 when idle, index when active */
    unsigned long active_iterators;
} dictionary;

Hash Function Evolution

Redis versions show different hashing strategies:

  • 2.8-3.2: MurmurHash2 (good distribution even with patterned keys)
  • 4.0+: SipHash (improved security against collision attacks)

uint64_t compute_hash(const void *key, int length) {
    return siphash(key, length, hash_seed);
}

Rehashing Mechanism

Redis maintains two hash tables per dictionary. Rehashing proceeds in stages:

  1. Allocate space in the second table based on element count
  2. Gradually migrate entries from table[0] to table[1]
  3. Deallocate old table, swap references, prepare fresh table[1]

Expansion and Contraction

Load Factor Formula Redis Behavior
entries / bucket_count Can exceed 1.0
  • Expansion triggers: Load factor > 1 (normal) or > 5 (during BGSAVE/BGREWRITEAOF)
  • Contraction triggers: Load factor < 0.1

Incremental Rehashing

To prevent service interruption during large-scale rehashing:

  1. Set rehashidx to 0, indicating rehash in progress
  2. Each CRUD operation migrates one bucket's worth of entries
  3. Rehash completes when rehashidx returns to -1

Lookup queries check both tables during transition. New writes target the new table.

Skip Lists

Concept

Skip lists extend sorted linked lists with layered indices, enabling O(log n) search through a space-time tradeoff.

A skip list contains multiple levels where higher levels act as express lanes:

Level Contains
Highest Fewest elements, largest gaps
Lowest All elements, consecutive spacing

Time Complexities

  • Search: O(log n) average, O(n) worst case
  • Insert: O(log n) - find position, create random levels
  • Delete: O(log n) - unlink from all levels

int generate_random_level() {
    int lvl = 1;
    while ((rand() & 0xFFFF) < (SKIPLIST_P * 0xFFFF))
        lvl++;
    return (lvl < MAX_LEVEL) ? lvl : MAX_LEVEL;
}

The constant ZSKIPLIST_P (0.25) means level n+1 is 1/4 as likely as level n.

Redis Implementation


typedef struct skip_node {
    simple_string *element;
    double ranking_score;
    struct skip_node *backward;
    struct skip_level {
        struct skip_node *forward;
        unsigned int span;     /* distance between nodes */
    } levels[];
} skip_node;

typedef struct skip_list {
    skip_node *header;
    skip_node *tail;
    unsigned long element_count;
    int current_max_level;
} skip_list;

Comparison with Alternatives

Structure Range Queries Single Lookup Memory Implementation
Skip List Efficient O(log n) ~1.33 pointers/node Straightforward
Balanced Tree Requires traversal O(log n) 2 pointers/node Complex rotations
Hash Table Not supported O(1) avg Varies Moderate

Integer Sets

Structure


typedef struct integer_collection {
    uint32_t encoding;    /* 16, 32, or 64-bit */
    uint32_t size;        /* element count */
    int8_t contents[];    /* flexible array storage */
} integer_collection;

Integer sets store unique integers in sorted order using contiguous memory.

Upgrade Mechanism

When adding values requiring more bits than current encoding:

  1. Reallocate storage for new, larger encoding
  2. Convert existing elements to new format
  3. Insert new element while maintaining sorted order

Upgrades provide flexibility and memory efficiency—they only expand when necessary and are irreversible.

Compressed Lists

Layout

A compressed list is a sequential memory structure containing:

  • Total byte length
  • Offset to last element
  • Element count
  • Individual entries
  • End terminator (0xFF)

Entry Structure

Component Size Purpose
previous_entry_length 1-5 bytes Enables backward traversal
encoding 1-5 bytes Type and content length
content Variable Actual value storage

Cascade Update Concern

Modifying entries can trigger cascade reallocations when previous_entry_length fields need expansion. This worst-case O(n²) scenario rarely occurs with random data but remains a theoretical concern.


Redis Object System

Object Overview

Redis wraps底层数据结构 with an object layer providing type information and reference counting. Keys are always strings; values can be strings, lists, sets, hashes, or sorted sets.

Object Structure


typedef struct redis_object {
    unsigned type:4;           /* STRING, LIST, SET, ZSET, HASH */
    unsigned encoding:4;       /* Actual storage format */
    void *internal_data;       /* Pointer to underlying structure */
    int reference_count;       /* For memory management */
} redis_object;

Encoding Options by Type

Type Encodings Available Conditions
String INT, EMBSTR, RAW Integer values use INT encoding
List ZIPLIST, LINKEDLIST Small lists use ZIPLIST
Hash ZIPLIST, HASHTABLE Few fields use ZIPLIST
Set INTSET, HASHTABLE All integers, small cardinality
ZSet ZIPLIST, SKIPLIST+HASHTABLE Small datasets prefer ZIPLIST

String Object Encodings

  • INT: Direct 64-bit integer storage
  • EMBSTR: Embedded SDS for short strings (< 44 bytes)
  • RAW: Separate heap allocation for longer strings

ZSet Hybrid Structure

The SKIPLIST+HASHTABLE encoding combines both structures:

  • Skip list maintains ordered iteration
  • Hash table provides O(1) member lookup

Memory Management

Redis uses reference counting for garbage collection. Shared objects (integer strings) reduce memory when multiple keys reference identical values.

Integer strings up to 2^63-1 can be shared across the entire database, exemplifying the flyweight pattern.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.