Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

HashMap Implementation and Internal Mechanics in Java

Tech 1

Key Characteristics

  1. No guaranteed order of elements during iteration.
  2. Keys and values can be null, but only one null key is allowed.
  3. Keys are unique, enforced by the underlying data structure.
  4. Pre-JDK 1.8: implemented with an array of linked lists. From JDK 1.8 onward, long chains (length > 8) in buckets are converted to red-black trees when the table size exceeds 64, improving worst-case lookup performance.
  5. Tree conversion thresholds: a bucket switches from a linked list to a red-black tree only if (a) the table length ≥ 64 and (b) the chain length > 8.

Underlying Data Structure

In JDK 1.8+, the internal storage is a Node<K,V>[] table, initialized lazily on the first put() call rather than in the constructor. Each Node implements Map.Entry<K, V> and may form part of a linked list or a red-black tree with in a bucket.

Insertion Workflow Example

HashMap<String, Integer> map = new HashMap<>();
map.put("Alice", 10);
map.put("Bob", 20);
map.put("Charlie", 30);

When inserting a key-value pair:

  1. The key’s hashCode() is computed.
  2. This hash is mixed using: h ^ (h >>> 16) to improve distribution.
  3. The final index is derived via: (table.length - 1) & hash.

If the target bucket is empty, the entry is placed directly. If occupied, the key is compared using equals(). If equal, the value is updated; otherwise, a new node is appended to the chain.

Hash Computation Details

The standardd hash transformation:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This spreads higher-bit variance into the lower bits, which is critical because the index calculation only uses the lower log₂(table.length) bits. Without this step, patterns in hash codes could cause clustering.

Alternative hashing strategies like modulo arithmetic (hash % length) are avoided because bitwise AND with (length - 1) is faster—but only when length is a power of two.

Why Table Size Is a Power of Two

Using a table size that is a power of two allows efficient index computation via hash & (length - 1), which is equivalent to hash % length but much faster. If the size weren’t a power of two, this equivalence would break, leading to poor distribution and more collisions.

When a custom initial capacity is provided (e.g., new HashMap<>(20)), the implementation rounds it up to the next power of two (e.g., 32) using bit manipulation:

// Simplified logic
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;

Handling Hash Collisions and Equality

  • Equal objects must have equal hash codes (per Object contract).
  • If two keys have the same hash code but are not equals(), they coexist in the same bucket as separate nodes.
  • For String and other well-designed classes:
    • Equal values → same hash code and equals() == true.
    • Different values → equals() == false; hash codes may still collide (due to 32-bit reduction from arbitrary object state).

Thus, after hash-based bucket selection, equals() is always used to confirm key identity before updating or inserting.

Miscellaneous Implementation Notes

  • Lazy table initialization: The internal array is created on the first insertion, not during construction.
  • Load factor and resizing: When the number of entries exceeds capacity * loadFactor (default 0.75), the table doubles in size.
  • In bulk operations like putMapEntries, the capacity estimate includes a +1 adjustment to avoid premature resizing: float ft = ((float)size / loadFactor) + 1.0F;.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.