Redis Hash Table Architecture and Progressive Rehashing
The underlying key-value mapping uses an array of linked lists (chaining). The primary container manages two separate hash tables (buckets[2]) to facilitate online resizing: buckets[0] holds the active data, while buckets[1] is allocated exclusively for progressive migration. Additional fields track per-table element counts (item_counts[2]), the current migration index (migration_cursor, set to -1 when idle), and the table size exponents (size_exps[2], enforcing power-of-two capacities so that modulo operations reduce to a bitwise AND).
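Based on the fields described above, the container itself can be sketched as follows. Only the field names mentioned in the text are taken from the source; the exact types, field order, and the inline helpers are assumptions:

```c
#include <stdint.h>
#include <stddef.h>

struct BucketNode;                    /* Defined below */
typedef struct HashMapOps HashMapOps; /* Type-specific callbacks */

typedef struct HashMap {
    struct BucketNode **buckets[2]; /* [0]: active, [1]: migration target */
    unsigned long item_counts[2];   /* Element count per table */
    long migration_cursor;          /* Next index to migrate; -1 when idle */
    signed char size_exps[2];       /* Capacity of table i is 1 << size_exps[i] */
    HashMapOps *ops;                /* Key/value callbacks */
    int16_t pause_migration;        /* >0 suspends progressive migration */
    void *meta[];                   /* Optional caller metadata */
} HashMap;

/* Power-of-two sizing turns `hash % capacity` into a bitwise AND. */
static inline unsigned long getCapacity(const HashMap *m, int t) {
    return 1UL << m->size_exps[t];
}
static inline unsigned long sizeMask(const HashMap *m, int t) {
    return getCapacity(m, t) - 1;
}
```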
```c
struct BucketNode {
    void *index_key;
    union {
        void *payload;
        uint64_t uint_val;
        int64_t int_val;
        double float_val;
    } data;
    struct BucketNode *link;  // Next node in chain
    void *extra_data[];
};
```
Each BucketNode encapsulates the indexing key, a unioned value payload, and a pointer to the next node within the same array slot, forming a singly linked list to resolve hash collisions.
Initialization
```c
HashMap* createHashMap(HashMapOps *ops) {
    size_t meta_sz = ops->getMetaSize ? ops->getMetaSize() : 0;
    HashMap *map = allocMem(sizeof(*map) + meta_sz);
    if (meta_sz) clearMemory(map->meta, meta_sz);
    setupHashMap(map, ops);
    return map;
}

int setupHashMap(HashMap *map, HashMapOps *ops) {
    resetTable(map, 0);
    resetTable(map, 1);
    map->ops = ops;
    map->migration_cursor = -1;
    map->pause_migration = 0;
    return STATUS_OK;
}
```
Instantiation allocates memory for the main structure along with any required metadata, followed by initializing the operational callbacks and setting the migration state to idle.
Key-Value Insertion
```c
int insertRecord(HashMap *map, void *key, void *value) {
    BucketNode *node = addRawKey(map, key, NULL);
    if (!node) return STATUS_ERR;
    if (!map->ops->key_only_mode) assignValue(map, node, value);
    return STATUS_OK;
}
```
Adding an element involves creating the raw node first. If the insertion succeeds and the map is configured to store values (not key-only), the value is assigned.
```c
BucketNode* addRawKey(HashMap *map, void *key, BucketNode **existing) {
    void *slot = findInsertionSlot(map, key, existing);
    if (!slot) return NULL;
    if (map->ops->dupKey) key = map->ops->dupKey(map, key);
    return placeNodeAtSlot(map, key, slot);
}
```
The raw insertion logic locates the appropriate array slot. If the key already exists, insertion fails and the matching node is reported through the optional existing out-parameter. Otherwise, the key is duplicated if a dupKey callback is provided, and the node is placed.
```c
void* findInsertionSlot(HashMap *map, const void *key, BucketNode **existing) {
    uint64_t hash = computeHash(map, key);
    if (existing) *existing = NULL;
    if (isMigrating(map)) performMigrationStep(map);
    if (checkCapacityGrowth(map) == STATUS_ERR) return NULL;

    for (int tbl = 0; tbl <= 1; tbl++) {
        unsigned long idx = hash & sizeMask(map, tbl);
        BucketNode *curr = map->buckets[tbl][idx];
        while (curr) {
            if (compareKeys(map, key, curr->index_key)) {
                if (existing) *existing = curr;
                return NULL;
            }
            curr = curr->link;
        }
        if (!isMigrating(map)) break;  // Table 1 is empty unless migrating
    }
    // During migration, route new keys to the secondary table.
    int target_tbl = isMigrating(map) ? 1 : 0;
    return &map->buckets[target_tbl][hash & sizeMask(map, target_tbl)];
}
```
When locating a slot, both the primary and secondary tables are queried during active migration. If the key is absent, the target bucket is returned. Crucially, during migration, new entries are strictly routed to the secondary table (buckets[1]). Read, update, and delete operations query buckets[0] first, then buckets[1], ensuring the primary table's size only decreases.
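This probe order can be illustrated with a minimal, self-contained model; MiniMap, lookupRecord, and the identity hash below are hypothetical simplifications of the structures above, not part of the original code:

```c
#include <stddef.h>

typedef struct Node { unsigned long key; struct Node *link; } Node;

typedef struct MiniMap {
    Node **buckets[2];      /* [0]: primary, [1]: migration target */
    unsigned long masks[2]; /* capacity - 1, power-of-two capacities */
    long migration_cursor;  /* -1 = not migrating */
} MiniMap;

/* Probe table 0 first; consult table 1 only while migration is active. */
static Node *lookupRecord(MiniMap *m, unsigned long key) {
    int migrating = (m->migration_cursor != -1);
    for (int tbl = 0; tbl <= 1; tbl++) {
        Node *n = m->buckets[tbl][key & m->masks[tbl]];
        for (; n; n = n->link)
            if (n->key == key) return n;
        if (!migrating) break;  /* table 1 is empty when idle */
    }
    return NULL;
}
```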
```c
BucketNode* placeNodeAtSlot(HashMap *map, void *key, void *position) {
    BucketNode **bucket = position;
    int target_idx = isMigrating(map) ? 1 : 0;
    BucketNode *node = allocMem(sizeof(*node));
    node->index_key = key;
    node->link = *bucket;  // Head insertion (O(1))
    *bucket = node;
    map->item_counts[target_idx]++;
    return node;
}
```
The node is prepended to the linked list at the calculated index. Head insertion guarantees constant time complexity. The value is not set at this stage; it is assigned subsequently by assignValue if applicable.
Capacity Expansion
```c
static int checkCapacityGrowth(HashMap *map) {
    if (isMigrating(map)) return STATUS_OK;
    if (getCapacity(map, 0) == 0) return expandTable(map, BASE_CAPACITY);

    if ((resize_enabled && map->item_counts[0] >= getCapacity(map, 0)) ||
        (resize_not_forbidden &&
         map->item_counts[0] / getCapacity(map, 0) > FORCED_GROWTH_LIMIT)) {
        if (!isExpansionPermitted(map)) return STATUS_OK;
        return expandTable(map, map->item_counts[0] + 1);
    }
    return STATUS_OK;
}
```
Growth is triggered when the load factor reaches 1 under normal conditions. During background persistence (copy-on-write), memory page duplication makes resizing expensive; to mitigate this, the threshold rises to FORCED_GROWTH_LIMIT (typically 5), avoiding massive memory duplication. The target capacity is always the smallest power of two no smaller than the requested size.
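One plausible implementation of the sizing helper used by expandTable is a simple exponent search; the function name comes from the code below, while the BASE_CAPACITY floor of 4 is an assumption:

```c
/* Exponent of the smallest power of two >= min_size,
   with an assumed floor of 1 << 2 == 4 (BASE_CAPACITY). */
static unsigned char nextPowerOfTwoExp(unsigned long min_size) {
    unsigned char e = 2;
    while ((1UL << e) < min_size) e++;
    return e;
}
```

For example, requesting a capacity of 5 yields exponent 3 (capacity 8), while an exact power of two such as 16 maps to its own exponent.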
```c
int expandTable(HashMap *map, unsigned long min_size) {
    if (isMigrating(map) || map->item_counts[0] > min_size) return STATUS_ERR;

    unsigned char new_exp = nextPowerOfTwoExp(min_size);
    unsigned long new_cap = 1UL << new_exp;
    BucketNode **new_table = allocZeroedMem(new_cap * sizeof(BucketNode*));

    if (map->buckets[0] == NULL) {  // First initialization
        map->size_exps[0] = new_exp;
        map->buckets[0] = new_table;
        return STATUS_OK;
    }
    map->size_exps[1] = new_exp;    // Begin progressive migration
    map->buckets[1] = new_table;
    map->migration_cursor = 0;
    return STATUS_OK;
}
```
Expansion allocates the secondary array. If the primary table is uninitialized, it takes the new array directly. Otherwise, the secondary table is assigned, and the migration cursor is set to 0 to begin progressive rehashing.
Progressive Rehashing
```c
int executeMigrationStep(HashMap *map, int steps) {
    int empty_visits = steps * 10;  // Cap on consecutive empty buckets
    if (resize_forbidden || !isMigrating(map)) return 0;

    while (steps-- && map->item_counts[0] != 0) {
        BucketNode *curr, *subsequent;
        // Skip empty buckets, but bound the scan to avoid stalling.
        while (map->buckets[0][map->migration_cursor] == NULL) {
            map->migration_cursor++;
            if (--empty_visits == 0) return 1;
        }
        curr = map->buckets[0][map->migration_cursor];
        while (curr) {
            subsequent = curr->link;
            uint64_t new_idx;
            if (isShrinking(map)) {
                // New capacity divides the old: mask the index directly.
                new_idx = map->migration_cursor & sizeMask(map, 1);
            } else {
                new_idx = computeHash(map, curr->index_key) & sizeMask(map, 1);
            }
            curr->link = map->buckets[1][new_idx];  // Head insertion
            map->buckets[1][new_idx] = curr;
            map->item_counts[0]--;
            map->item_counts[1]++;
            curr = subsequent;
        }
        map->buckets[0][map->migration_cursor] = NULL;
        map->migration_cursor++;
    }

    if (map->item_counts[0] == 0) {  // Primary drained: promote table 1
        freeMem(map->buckets[0]);
        map->buckets[0] = map->buckets[1];
        map->item_counts[0] = map->item_counts[1];
        map->size_exps[0] = map->size_exps[1];
        resetTable(map, 1);
        map->migration_cursor = -1;
        return 0;  // Completed
    }
    return 1;  // Ongoing
}
```
Migration occurs incrementally to avoid blocking the main thread. Each invocation processes a specified number of bucket chains. To prevent stalling on sparse regions with many empty buckets, a limit on empty visits is enforced. Nodes are relocated to the secondary table using head insertion. When shrinking, the new index is derived by masking the original bucket index, whereas expanding requires recomputing the hash. Once the primary table is completely drained, the secondary table is promoted and the migration state is cleared.
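The shrink-time shortcut relies on the new capacity dividing the old one: the new mask's bits are a subset of the old mask's bits, so masking an old bucket index gives the same slot as re-masking the full hash. A small check makes the invariant concrete (shrinkIndex is a hypothetical helper for illustration only):

```c
/* (h & old_mask) & new_mask == h & new_mask whenever
   new_mask's set bits are a subset of old_mask's, which holds
   for power-of-two capacities where the new size divides the old. */
static unsigned long shrinkIndex(unsigned long old_index,
                                 unsigned long new_mask) {
    return old_index & new_mask;
}
```

This is why shrinking can skip the per-node hash computation: every node in old bucket i lands in bucket i & new_mask of the smaller table.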