Home > Tech > Content

Redis Data Structures and Objects

Tech May 19 1

Redis provides three specialized data structures optimized for specific use cases:

Structure	Use Case
Bitmaps	Space-efficient storage for binary state tracking (daily/monthly user check-ins)
HyperLogLog	Approximate cardinality estimation for massive datasets (UV counting)
GEO	Geospatial data storage and proximity queries

Bitmaps

A bitmap is essentially a byte array where each bit represents a binary state (0 or 1). Introduced in Redis 2.2.0, these operations extend the string type rather than creating a new data type.

Core Operations


GETBIT key position          // Retrieve bit value at specified position
SETBIT key position value   // Set bit value (0 or 1) at position
BITOP operation dest source [source...]
// Operations: AND, OR, NOT, XOR - combines bitmaps and stores result
BITCOUNT key [start end]    // Count bits set to 1

Movie Streaming Application Example

Consider a video platform tracking which movies are watched daily:

Each day maintains a bitmap initialized to zeros
When a movie is played, its ID determines the bit position to set

Metric	Solution
Check if specific movie was played today	Calculate bit offset from movie ID, read the corresponding bit
Count movies played today	Count set bits in today's bitmap
Count movies in weekly/monthly/yearly range	OR all relevant bitmaps, then count resulting bits
Find unplayed movies in a year	OR all bitmaps, bits still at 0 indicate unwatched movies


127.0.0.1:6379> SETBIT movies_20240301 100 1
(integer) 0
127.0.0.1:6379> SETBIT movies_20240301 205 1
(integer) 0
127.0.0.1:6379> SETBIT movies_20240302 100 1
(integer) 0
127.0.0.1:6379> BITCOUNT movies_20240301
(integer) 2
127.0.0.1:6379> BITOP OR period_result movies_20240301 movies_20240302
(integer) 1
127.0.0.1:6379> BITCOUNT period_result
(integer) 1

HyperLogLog

HyperLogLog implements an algorithm for estimating set cardinality with minimal memory footprint. It trades exactness for efficiency, making it suitable for massive-scale unique visitor counting.

Comparing approaches for unique user counting:

Approach	Memory Usage	Trade-offs
SET type	Each user ID stored as string	Accurate but memory-intensive
Bitmap	One bit per possible user	Better space efficiency
HyperLogLog	Fixed 12KB per counter	0.81% error rate, constant memory

Commands


PFADD key element [element ...]   // Add elements to counter
PFCOUNT key [key ...]             // Get estimated cardinality
PFMERGE dest source [source ...]  // Merge multiple counters

Characteristics

Does not store actual data, only cardinality estimate
Standard error approximately 0.81%
Memory allocation grows incrementally as element are added
Merged structures always occupy exactly 12KB

GEO (Geospatial)

Redis GEO enables storing coordinate pairs and performing location-based queries.


GEOADD locations longitude latitude member [longitude latitude member ...]
GEOPOS locations member [member ...]
GEODIST locations pointA pointB [unit]
GEORADIUS locations longitude latitude radius unit [WITHCOORD] [WITHDIST] [COUNT count]
GEORADIUSBYMEMBER locations member radius unit [WITHCOORD] [WITHDIST]
GEOHASH locations member [member ...]


127.0.0.1:6379> GEOADD cities 116.4 39.9 Beijing
(integer) 1
127.0.0.1:6379> GEOADD cities 121.4 31.2 Shanghai
(integer) 1
127.0.0.1:6379> GEOPOS cities Beijing
1) "116.39999866485596"
   "39.900000092026976"
127.0.0.1:6379> GEODIST cities Beijing Shanghai km
"1067.8390"

Core Data Structures

Simple Dynamic Strings (SDS)

Structure Definition

Redis 4.0+ defines five SDS variants optimized for different string lengths:


typedef char *simple_string;

struct __attribute__ ((__packed__)) sds_header_8 {
    uint8_t length;           /* bytes currently used */
    uint8_t allocated;        /* total capacity excluding header and null */
    unsigned char flags;      /* type identifier (3 bits) */
    char buffer[];             /* flexible array member for actual data */
};

struct __attribute__ ((__packed__)) sds_header_16 {
    uint16_t length;
    uint16_t allocated;
    unsigned char flags;
    char buffer[];
};

struct __attribute__ ((__packed__)) sds_header_32 {
    uint32_t length;
    uint32_t allocated;
    unsigned char flags;
    char buffer[];
};

struct __attribute__ ((__packed__)) sds_header_64 {
    uint64_t length;
    uint64_t allocated;
    unsigned char flags;
    char buffer[];
};

Field	Purpose
length	Current string size (excluding terminator)
allocated	Total capacity reserved
flags	Type indicator using 3 bits
buffer	Actual character storage

Advantages Over C Strings

Aspect	C String	SDS	Benefit
Length retrieval	O(n) via strlen	O(1) via len field	Eliminates performance bottleneck
Buffer overflow	Possible with strcat/strcpy	Bounds checking before write	Memory safety guaranteed
Reallocation	Required on every modification	Pre-allocation and lazy freeing	Fewer memory operations
Binary data	Uses '\0' sentinel (text only)	Uses length field	Stores any binary content

Applications

Direct string value storage
AOF module buffers
Client input/output buffers

Linked Lists

Implementation


typedef struct list_node {
    struct list_node *previous;
    struct list_node *next;
    void *payload;
} list_node;

typedef struct list_iterator {
    list_node *current;
    int traverse_direction;
} list_iterator;

typedef struct linked_list {
    list_node *head;
    list_node *tail;
    void *(*duplicate_fn)(void *ptr);
    void (*deallocate_fn)(void *ptr);
    int (*compare_fn)(void *ptr, void *key);
    unsigned long count;
} linked_list;

Properties

Doubly-linked, circular structure
O(1) head/tail/length access
Generic storage via void pointers

Usage

Lists power LIST commands, publish-subscribe patterns, slow query logging, and monitoring features.

Hash Tables (Dictionaries)

Internal Structure


typedef struct hash_entry {
    void *hash_key;
    union {
        void *value_ptr;
        uint64_t unsigned_val;
        int64_t signed_val;
        double floating_val;
    } stored_value;
    struct hash_entry *collision_next;
} hash_entry;

typedef struct hash_table {
    hash_entry **buckets;
    unsigned long table_size;
    unsigned long size_mask;
    unsigned long active_entries;
} hash_table;

typedef struct dictionary {
    struct dict_type *operations;
    void *private_context;
    hash_table tables[2];           /* primary and rehash targets */
    long rehash_position;           /* -1 when idle, index when active */
    unsigned long active_iterators;
} dictionary;

Hash Function Evolution

Redis versions show different hashing strategies:

2.8-3.2: MurmurHash2 (good distribution even with patterned keys)
4.0+: SipHash (improved security against collision attacks)


uint64_t compute_hash(const void *key, int length) {
    return siphash(key, length, hash_seed);
}

Rehashing Mechanism

Redis maintains two hash tables per dictionary. Rehashing proceeds in stages:

Allocate space in the second table based on element count
Gradually migrate entries from table[0] to table[1]
Deallocate old table, swap references, prepare fresh table[1]

Expansion and Contraction

Load Factor Formula	Redis Behavior
entries / bucket_count	Can exceed 1.0

Expansion triggers: Load factor > 1 (normal) or > 5 (during BGSAVE/BGREWRITEAOF)
Contraction triggers: Load factor < 0.1

Incremental Rehashing

To prevent service interruption during large-scale rehashing:

Set rehashidx to 0, indicating rehash in progress
Each CRUD operation migrates one bucket's worth of entries
Rehash completes when rehashidx returns to -1

Lookup queries check both tables during transition. New writes target the new table.

Skip Lists

Concept

Skip lists extend sorted linked lists with layered indices, enabling O(log n) search through a space-time tradeoff.

A skip list contains multiple levels where higher levels act as express lanes:

Level	Contains
Highest	Fewest elements, largest gaps
Lowest	All elements, consecutive spacing

Time Complexities

Search: O(log n) average, O(n) worst case
Insert: O(log n) - find position, create random levels
Delete: O(log n) - unlink from all levels


int generate_random_level() {
    int lvl = 1;
    while ((rand() & 0xFFFF) < (SKIPLIST_P * 0xFFFF))
        lvl++;
    return (lvl < MAX_LEVEL) ? lvl : MAX_LEVEL;
}

The constant ZSKIPLIST_P (0.25) means level n+1 is 1/4 as likely as level n.

Redis Implementation


typedef struct skip_node {
    simple_string *element;
    double ranking_score;
    struct skip_node *backward;
    struct skip_level {
        struct skip_node *forward;
        unsigned int span;     /* distance between nodes */
    } levels[];
} skip_node;

typedef struct skip_list {
    skip_node *header;
    skip_node *tail;
    unsigned long element_count;
    int current_max_level;
} skip_list;

Comparison with Alternatives

Structure	Range Queries	Single Lookup	Memory	Implementation
Skip List	Efficient	O(log n)	~1.33 pointers/node	Straightforward
Balanced Tree	Requires traversal	O(log n)	2 pointers/node	Complex rotations
Hash Table	Not supported	O(1) avg	Varies	Moderate

Integer Sets

Structure


typedef struct integer_collection {
    uint32_t encoding;    /* 16, 32, or 64-bit */
    uint32_t size;        /* element count */
    int8_t contents[];    /* flexible array storage */
} integer_collection;

Integer sets store unique integers in sorted order using contiguous memory.

Upgrade Mechanism

When adding values requiring more bits than current encoding:

Reallocate storage for new, larger encoding
Convert existing elements to new format
Insert new element while maintaining sorted order

Upgrades provide flexibility and memory efficiency—they only expand when necessary and are irreversible.

Compressed Lists

Layout

A compressed list is a sequential memory structure containing:

Total byte length
Offset to last element
Element count
Individual entries
End terminator (0xFF)

Entry Structure

Component	Size	Purpose
previous_entry_length	1-5 bytes	Enables backward traversal
encoding	1-5 bytes	Type and content length
content	Variable	Actual value storage

Cascade Update Concern

Modifying entries can trigger cascade reallocations when previous_entry_length fields need expansion. This worst-case O(n²) scenario rarely occurs with random data but remains a theoretical concern.

Redis Object System

Object Overview

Redis wraps底层数据结构 with an object layer providing type information and reference counting. Keys are always strings; values can be strings, lists, sets, hashes, or sorted sets.

Object Structure


typedef struct redis_object {
    unsigned type:4;           /* STRING, LIST, SET, ZSET, HASH */
    unsigned encoding:4;       /* Actual storage format */
    void *internal_data;       /* Pointer to underlying structure */
    int reference_count;       /* For memory management */
} redis_object;

Encoding Options by Type

Type	Encodings Available	Conditions
String	INT, EMBSTR, RAW	Integer values use INT encoding
List	ZIPLIST, LINKEDLIST	Small lists use ZIPLIST
Hash	ZIPLIST, HASHTABLE	Few fields use ZIPLIST
Set	INTSET, HASHTABLE	All integers, small cardinality
ZSet	ZIPLIST, SKIPLIST+HASHTABLE	Small datasets prefer ZIPLIST

String Object Encodings

INT: Direct 64-bit integer storage
EMBSTR: Embedded SDS for short strings (< 44 bytes)
RAW: Separate heap allocation for longer strings

ZSet Hybrid Structure

The SKIPLIST+HASHTABLE encoding combines both structures:

Skip list maintains ordered iteration
Hash table provides O(1) member lookup

Memory Management

Redis uses reference counting for garbage collection. Shared objects (integer strings) reduce memory when multiple keys reference identical values.

Integer strings up to 2^63-1 can be shared across the entire database, exemplifying the flyweight pattern.

Tags: Redis data-structures Bitmaps HyperLogLog

Back to List

Prev: Understanding Spring Transaction Propagation Mechanism

Next: Modern Admin Dashboard Boilerplate: React 18, Vite, Ant Design, and Granular Role-Based Access Control

Fading Coder