Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

HashMap Internals: Arrays, Linked Lists, and Red-Black Trees

Tech May 13 1

Architectural Overview

HashMap implements the Map interface atop a hash table. It accepts a single null key, allows multiple null values, guarantees no iteration ordering, and offers no built-in synchronization. Prior to Java 8, the structure consisted exclusively of an array of singly-linked lists. Modern releases augment this with red-black trees so that severe collision clusters degrade to logarithmic rather than linear retrieval time.

Underlying Data Layout

The backing store is a resizable array of nodes commonly called buckets. Each slot may contain a lone node, a linked sequence of nodes, or a balanced tree. The default array length is sixteen, and every valid capacity remains a power of two.

Indexing does not use the raw hashCode() directly. Instead, the implementation first disperses the bits of the hash value:

static int spread(int h) {
    return (h == 0) ? 0 : h ^ (h >>> 16);
}

The bucket index is then derived through bitwise masking:

int index = (capacity - 1) & spread(key.hashCode());

This replacement for modulo division is valid only because the capacity is constrained to powers of two, yielding faster execution than the % operator while keeping entries uniformly distributed.

Insertion Mechanics

When put(key, value) is invoked, the runtime performs the following steps:

  1. Compute the dispersed hash of the supplied key.
  2. Mask the hash to obtain a legal bucket offset.
  3. If the target bucket is empty, instantiate a Node and install it directly.
  4. If the first node in the bucket matches the supplied key through both hash code equality and equals(), overwrite the mapped value and return.
  5. Otherwise, when the bucket already hosts a tree structure, delegate insertion to the tree-specific routine.
  6. If the bucket contains a linked list, traverse to the tail and append the new entry. During traversal, if a matching key is found, update its value and exit.
  7. After appending, if the list length reaches eight and the current array length is at least sixty-four, replace the list with a red-black tree. Should the array be smaller than sixtty-four, prefer doubling the array size rather than treeifying.
  8. Increment the entry count, and if it exceeds the threshold defined by capacity × loadFactor (default 0.75), execute a resize.

Tail insertion supersedes the head-insertion strategy found in older releases, removing the circular-reference hazards that previously surfaced during concurrent expansion.

Lookup Path

Retrieval mirrors insertion in reverse:

  1. Disperse the key’s hash and locate the corresponding bucket.
  2. Return null immediate if the bucket is vacant.
  3. If the first node matches, return its stored value.
  4. Otherwise, search within the tree if the bucket is treified, or walk the linked list sequentially.

A successful match requires both identical hash codes and equals() agreement; sharing only the hash code is insufficient.

Expansion Strategy

Resize doubles the array capacity and repositions existing entries. Because capacity always remains a power of two, entries do not need full rehashing. Instead, the algorithm inspects one additional bit of the dispersed hash that corresponds to the old capacity:

  • If the bit is zero, the entry stays at its original index.
  • If the bit is one, the entry moves to oldIndex + oldCapacity.

During migration, lists split into two separate chains: a low chain that remains at the old offset and a high chain shifted upward by the previous capacity. Trees undergo an analogous split, and if either fragment shrinks to six or fewer nodes, it reverts to a plain linked list to reduce memory and maintenance overhead.

Version Comparison

Aspect JDK 7 JDK 8+
Collision storage Singly-linked list only Linked list plus red-black tree
Insertion order Head insertion Tail insertion
Index recomputation during resize Full rehash modulo new size Single-bit test for offset adjustment
Concurrent expansion risk Circular list possible Eliminated by tail insertion and split logic
Worst-case lookup complexity O(n) O(log n) after treeification

Practical Illustration

import java.util.Map;
import java.util.HashMap;

public class BucketExplorer {
    public static void main(String[] args) {
        Map<Integer, String> dictionary = new HashMap<>(16);

        dictionary.put(100, "Alice");
        dictionary.put(200, "Bob");
        dictionary.put(300, "Charlie");
        dictionary.put(null, "Root");

        System.out.println(dictionary.get(100));
        System.out.println(dictionary.get(null));

        for (int idx = 0; idx < 20; idx++) {
            dictionary.put(idx, "Val" + idx);
        }
        System.out.println("Backing array length: " + fetchTableLength(dictionary));
    }

    private static int fetchTableLength(Map<?, ?> dictionary) {
        try {
            java.lang.reflect.Field bucketField = HashMap.class.getDeclaredField("table");
            bucketField.setAccessible(true);
            Object[] buckets = (Object[]) bucketField.get(dictionary);
            return buckets == null ? 0 : buckets.length;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

Design Trade-offs

The default load factor of 0.75 balances memory consumption against collision frequancy. Raising it conserves space but lengthens chains; lowering it speeds lookups at the cost of more frequent doubling. Treeification thresholds—eight for promotion and six for demotion—were selected according to Poisson distribution models of hash collisions, which prevent unnecessary oscillation between list and tree representations. Constraining capacity to powers of two enables bitwise masking to replace modulo arithmetic and allows resize operations to exploit single-bit tests for rapid relocation.

Tags: Java

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.