HashMap Implementation and Internal Mechanics in Java
Key Characteristics
- No guaranteed order of elements during iteration.
- Keys and values can be null, but only one null key is allowed.
- Keys are unique, enforced by the underlying data structure.
- Pre-JDK 1.8: implemented with an array of linked lists. From JDK 1.8 onward, long bucket chains are converted to red-black trees, improving worst-case lookup from O(n) to O(log n).
- Tree conversion thresholds: a bucket switches from a linked list to a red-black tree only if (a) the table length is ≥ 64 and (b) the chain length exceeds 8; if the chain grows long while the table is still small, the table is resized instead.
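The null-key rule above can be observed directly: a second put with a null key overwrites the first entry rather than adding another, while null values carry no such restriction.

```java
import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put(null, 1);   // the null key's hash is defined as 0
        map.put(null, 2);   // same null key: value overwritten, size unchanged
        map.put("a", null); // null values are allowed for any number of keys
        map.put("b", null);
        System.out.println(map.size());    // 3
        System.out.println(map.get(null)); // 2
    }
}
```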
Underlying Data Structure
In JDK 1.8+, the internal storage is a Node<K,V>[] table, initialized lazily on the first put() call rather than in the constructor. Each Node implements Map.Entry<K, V> and may form part of a linked list or a red-black tree within a bucket.
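A simplified sketch of the list-node layout (the field names mirror the JDK 8 source, but this is an illustration, not the full implementation):

```java
import java.util.Map;

// Simplified sketch of HashMap's internal list node (JDK 8 style)
class Node<K, V> implements Map.Entry<K, V> {
    final int hash;   // cached spread hash of the key
    final K key;
    V value;
    Node<K, V> next;  // next node in the same bucket's chain

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    @Override public K getKey() { return key; }
    @Override public V getValue() { return value; }
    @Override public V setValue(V newValue) { V old = value; value = newValue; return old; }
}
```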
Insertion Workflow Example
HashMap<String, Integer> map = new HashMap<>();
map.put("Alice", 10);
map.put("Bob", 20);
map.put("Charlie", 30);
When inserting a key-value pair:
- The key’s hashCode() is computed.
- The hash is mixed using h ^ (h >>> 16) to improve distribution.
- The final index is derived via (table.length - 1) & hash.
If the target bucket is empty, the entry is placed directly. If occupied, the key is compared using equals(). If equal, the value is updated; otherwise, a new node is appended to the chain.
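The bucket-selection steps above can be traced by hand. This sketch reimplements the spreading function and computes the index for each key against a table of 16 (the default initial capacity); the actual bucket numbers depend on the JDK's String.hashCode and are not hard-coded here:

```java
public class IndexDemo {
    // Same spreading function HashMap uses internally
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int tableLength = 16; // default initial capacity
        for (String key : new String[] {"Alice", "Bob", "Charlie"}) {
            int index = (tableLength - 1) & hash(key); // low bits pick the bucket
            System.out.println(key + " -> bucket " + index);
        }
    }
}
```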
Hash Computation Details
The standard hash transformation:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
This spreads higher-bit variance into the lower bits, which is critical because the index calculation only uses the lower log₂(table.length) bits. Without this step, patterns in hash codes could cause clustering.
Alternative hashing strategies like modulo arithmetic (hash % length) are avoided because bitwise AND with (length - 1) is faster—but only when length is a power of two.
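A quick way to see why the XOR step matters: against a small table, hash codes that differ only in their upper bits would land in the same bucket without it. The two hash values below are arbitrary illustrations chosen to differ only above bit 16:

```java
public class SpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int len = 16;     // small table: only the low 4 bits pick the bucket
        int h1 = 0x10000; // differs from h2 only in the upper bits
        int h2 = 0x20000;
        // Without spreading, both collapse to bucket 0
        System.out.println((h1 & (len - 1)) + " " + (h2 & (len - 1)));             // 0 0
        // With spreading, the high bits influence the index
        System.out.println((spread(h1) & (len - 1)) + " " + (spread(h2) & (len - 1))); // 1 2
    }
}
```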
Why Table Size Is a Power of Two
Using a table size that is a power of two allows efficient index computation via hash & (length - 1), which is equivalent to hash % length but much faster. If the size weren’t a power of two, this equivalence would break, leading to poor distribution and more collisions.
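The equivalence holds for any non-negative hash and any power-of-two length, and can be checked exhaustively for small cases; a non-power-of-two length breaks it immediately:

```java
public class MaskVsModDemo {
    public static void main(String[] args) {
        int length = 16; // power of two
        for (int h = 0; h < 1000; h++) {
            if ((h & (length - 1)) != (h % length)) {
                throw new AssertionError("mismatch at " + h);
            }
        }
        // Counterexample with a non-power-of-two length:
        int bad = 12;
        System.out.println((13 & (bad - 1)) + " vs " + (13 % bad)); // 9 vs 1
    }
}
```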
When a custom initial capacity is provided (e.g., new HashMap<>(20)), the implementation rounds it up to the next power of two (e.g., 32) using bit manipulation:
// Simplified logic
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
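Running the rounding logic above for a requested capacity of 20 does yield 32, matching the example. Here it is wrapped in a helper named tableSizeFor, mirroring the JDK 8 method of the same name:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Simplified JDK 8-style rounding up to the next power of two
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(20)); // 32
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(1));  // 1
    }
}
```

The initial cap - 1 is what makes an exact power of two map to itself instead of the next one up.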
Handling Hash Collisions and Equality
- Equal objects must have equal hash codes (per the Object contract).
- If two keys have the same hash code but are not equals(), they coexist in the same bucket as separate nodes.
- For String and other well-designed classes:
  - Equal values → same hash code and equals() == true.
  - Different values → equals() == false; hash codes may still collide (due to the 32-bit reduction of arbitrary object state).
Thus, after hash-based bucket selection, equals() is always used to confirm key identity before updating or inserting.
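The coexistence of colliding-but-unequal keys can be demonstrated with a small hypothetical key class whose hashCode is deliberately constant, forcing every instance into one bucket:

```java
import java.util.HashMap;

public class CollisionDemo {
    // Hypothetical key class: every instance collides; equality is by name
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // all keys share one bucket
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(this.name);
        }
    }

    public static void main(String[] args) {
        HashMap<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1); // same bucket...
        map.put(new Key("b"), 2); // ...but equals() is false, so both nodes survive
        map.put(new Key("a"), 3); // equals() is true: value updated, no new node
        System.out.println(map.size());            // 2
        System.out.println(map.get(new Key("a"))); // 3
    }
}
```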
Miscellaneous Implementation Notes
- Lazy table initialization: The internal array is created on the first insertion, not during construction.
- Load factor and resizing: When the number of entries exceeds capacity * loadFactor (default load factor 0.75), the table doubles in size.
- In bulk operations like putMapEntries, the capacity estimate includes a +1 adjustment to avoid premature resizing: float ft = ((float) size / loadFactor) + 1.0F;
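The resize arithmetic above can be checked by hand with the default constants (capacity 16, load factor 0.75):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12: the 13th insertion triggers a resize to 32

        // putMapEntries-style capacity estimate for a bulk copy of 12 entries:
        int size = 12;
        float ft = ((float) size / loadFactor) + 1.0F;
        System.out.println((int) ft); // 17, which the power-of-two rounding takes to 32
    }
}
```

Without the +1, a bulk copy of exactly 12 entries would size the table to 16 and immediately sit at the resize threshold.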