Java

HashMap: Buckets and Collisions

HashMap is the workhorse of Java collections—the go-to data structure for key-value storage that powers everything from caching layers to configuration systems. Yet many developers treat it as a black box, unaware of the elegant engineering that enables its O(1) average-case performance.

This article takes you inside HashMap's implementation: how it transforms hash codes into bucket indices, handles collisions through chaining, and manages capacity through dynamic resizing. Understanding these internals isn't just academic—it's the difference between a HashMap that performs flawlessly and one that degrades to O(n) in production.

Whether you're optimizing a hot code path, debugging mysterious performance issues, or preparing for senior-level interviews, mastering HashMap internals is essential knowledge for any serious Java developer.

🎯 At a Glance

What it is: HashMap is Java's primary key-value data structure, using hash-based bucket storage for O(1) average-case lookups.
Key insights:
  • Keys are converted to bucket indices via hashCode() and bitwise operations
  • Collisions are handled by chaining (linked list, tree after threshold)
  • Load factor (0.75) triggers resize at 75% capacity
  • Capacity must be power of 2 for efficient index calculation
  • Never use mutable objects as keys
Complexity:
OperationAverageWorst Case
get(key)O(1)O(n)*
put(key, value)O(1)O(n)*
remove(key)O(1)O(n)*
containsKeyO(1)O(n)*

*Worst case with poor hash distribution; O(log n) after treeification (Java 8+)

One-liner: HashMap trades memory for speed by using hash codes to directly locate entries in an array of buckets.

Interactive Visualizer

See HashMap operations in action - watch how keys are hashed to bucket indices and how collisions are handled through chaining:

Loading visualizer...

📋 Prerequisites

Before reading this article, you should understand:

  • Part 3: equals() & hashCode() Contract - Critical for understanding how HashMap finds entries
  • Basic Java generics - HashMap<K,V> uses generic types
  • Big O notation - To understand performance characteristics

📘 Introduction

🔥 The Black Friday Incident

"Black Friday, 2019. Our e-commerce search service ground to a halt. Response times jumped from 5ms to 45 seconds. The culprit? A HashMap with 2 million product entries—all landing in the same bucket."
Our product catalog used a custom ProductKey class as HashMap keys. The developer had overridden equals() but forgot hashCode(). Every single key returned the default identity hash from Object.hashCode(), which happened to collide spectacularly due to our JVM configuration.

Two million entries. One bucket. A linked list 2 million nodes long.

That day, I learned that HashMap's O(1) promise is a contract—and like any contract, it requires both parties to uphold their end. HashMap provides the infrastructure; you provide proper hash codes.

🌍 Why HashMap Internals Matter

HashMap is everywhere in Java applications:

  • Caching layers - In-memory caches, session stores
  • Configuration - Properties, feature flags
  • Data processing - Grouping, deduplication, lookups
  • Framework internals - Spring beans, Hibernate caches

Understanding its internals helps you:

  • Avoid performance pitfalls - Proper capacity planning, key design
  • Debug production issues - Recognize symptoms of hash collision problems
  • Make informed decisions - When to use HashMap vs TreeMap vs LinkedHashMap
  • Ace interviews - HashMap internals is a top-10 Java interview topic

🎯 What You'll Learn

  1. Bucket array structure - How entries are stored and located
  2. Hash function internals - How hashCode() becomes a bucket index
  3. Collision handling - Chaining strategy and its implications
  4. Load factor & resizing - When and how HashMap grows
  5. Capacity planning - Choosing the right initial capacity
  6. Key design principles - Creating hash-friendly keys

🧠 Mental Model

Think of HashMap as a Library with Numbered Shelves

Imagine a library where books are stored on numbered shelves (buckets):

TEXT(17 lines)
Code
Loading syntax highlighter...

The Core Algorithm

TEXT(15 lines)
Code
Loading syntax highlighter...

Visual: HashMap Memory Layout

TEXT(27 lines)
Code
Loading syntax highlighter...
Key insight: Good hash distribution = entries spread evenly = short chains = fast lookups.

🔬 Deep Dive

1. The Bucket Array

At the heart of HashMap lies a simple array:

JAVA(16 lines)
Code
Loading syntax highlighter...
Why cache the hash?
  • Hash is computed once during put()
  • Reused during get(), remove(), and resize operations
  • Comparing hashes is faster than comparing keys (especially for Strings)

2. The Hash Function

HashMap doesn't use hashCode() directly. It applies additional mixing:
JAVA(5 lines)
Code
Loading syntax highlighter...
Why XOR with upper bits?
TEXT(9 lines)
Code
Loading syntax highlighter...
Problem solved: When capacity is small (e.g., 16), only lower bits determine the bucket index. If many keys differ only in upper bits, they'd all collide. XOR spreads that entropy.

3. Index Calculation

JAVA(2 lines)
Code
Loading syntax highlighter...
Why bitwise AND instead of modulo?
TEXT(8 lines)
Code
Loading syntax highlighter...
Why capacity must be power of 2:
TEXT(5 lines)
Code
Loading syntax highlighter...

4. Collision Resolution: Chaining

When two keys map to the same bucket, HashMap chains them:

JAVA(50 lines)
Code
Loading syntax highlighter...
Performance implication:
  • Empty bucket: O(1) - direct array access
  • Chain of length n: O(n) - must traverse the chain
  • This is why hash distribution matters!

5. Load Factor and Threshold

JAVA(7 lines)
Code
Loading syntax highlighter...
Why 0.75?
Load FactorTrade-off
0.5More memory, fewer collisions, faster
0.75Balanced (default)
1.0Less memory, more collisions, slower

The value 0.75 is based on statistical analysis - it provides a good balance between space efficiency and collision probability.

6. Resize Operation

When size exceeds threshold, HashMap doubles its capacity:

JAVA(23 lines)
Code
Loading syntax highlighter...
Resize is expensive:
  • Allocates new array (2x size)
  • Rehashes every single entry
  • Causes GC pressure
  • Can cause latency spikes in production
Visual: How entries are redistributed
TEXT(12 lines)
Code
Loading syntax highlighter...

7. Initial Capacity Planning

JAVA(11 lines)
Code
Loading syntax highlighter...
Resize cascade with default capacity:
TEXT(6 lines)
Code
Loading syntax highlighter...

Each resize:

  • Allocates new array
  • Copies all entries
  • Causes GC pressure
  • Increases latency

⚠️ Common Mistakes

Mistake 1: Mutable Keys

Problem:
JAVA(13 lines)
Code
Loading syntax highlighter...
Solution:
JAVA(9 lines)
Code
Loading syntax highlighter...
Why: HashMap stores entries based on hashCode(). If key mutates, its hash changes, but the entry stays in the old bucket. It becomes invisible to lookups.

Mistake 2: Missing hashCode() Override

Problem:
JAVA(24 lines)
Code
Loading syntax highlighter...
Solution:
JAVA(17 lines)
Code
Loading syntax highlighter...
Rule: If two objects are equals(), they MUST have the same hashCode().

Mistake 3: Ignoring Initial Capacity

Problem:
JAVA(9 lines)
Code
Loading syntax highlighter...
Solution:
JAVA(8 lines)
Code
Loading syntax highlighter...
Impact:
  • Avoids resize overhead
  • Reduces GC pressure
  • Prevents latency spikes
  • Memory is allocated upfront (predictable)

Mistake 4: Modifying Map During Iteration

Problem:
JAVA(10 lines)
Code
Loading syntax highlighter...
Solution 1: Use Iterator.remove()
JAVA(7 lines)
Code
Loading syntax highlighter...
Solution 2: Use removeIf() (Java 8+)
JAVA
Code
Loading syntax highlighter...
Solution 3: Collect and remove
JAVA(6 lines)
Code
Loading syntax highlighter...

Mistake 5: Using HashMap in Multi-threaded Code

Problem:
JAVA(14 lines)
Code
Loading syntax highlighter...
Solution:
JAVA(18 lines)
Code
Loading syntax highlighter...

🐛 Debug This

Find the bug in this code:

JAVA(36 lines)
Code
Loading syntax highlighter...
🔍 Hint: What happens to the hash code when lastLogin changes?
✅ Solution:
The Bug: lastLogin is mutable and included in equals() and hashCode().
When a user logs in and updateLastLogin() is called:
  1. The User object's hashCode() changes
  2. The entry is now in the wrong bucket
  3. cache.get(user) looks in the new bucket (based on new hash)
  4. Entry is never found - it's still in the old bucket
The Fix:
JAVA(20 lines)
Code
Loading syntax highlighter...
Rule: Only include immutable, identity-defining fields in equals() and hashCode().

💻 Practical Exercises

Exercise 1: Basic HashMap Operations

⭐ Difficulty: Easy | ⏱️ Time: 10 minutes

Task: Implement a word frequency counter.
JAVA(5 lines)
Code
Loading syntax highlighter...
Test:
JAVA(4 lines)
Code
Loading syntax highlighter...
✅ Solution:
JAVA(11 lines)
Code
Loading syntax highlighter...

Exercise 2: Proper Initial Capacity

⭐⭐ Difficulty: Easy-Medium | ⏱️ Time: 15 minutes

Task: Create a HashMap that will hold exactly 10,000 entries without ever resizing.
JAVA(4 lines)
Code
Loading syntax highlighter...
Requirements:
  • Calculate the minimum initial capacity needed
  • Verify no resize occurs when adding 10,000 entries
✅ Solution:
JAVA(31 lines)
Code
Loading syntax highlighter...

Exercise 3: Custom Key with Proper Contract

⭐⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes

Task: Implement a CompositeKey class that can be safely used as a HashMap key.
JAVA(8 lines)
Code
Loading syntax highlighter...
Requirements:
  • Two keys are equal if namespace, name, AND version match
  • Must be immutable
  • Must work correctly in HashMap
✅ Solution:
JAVA(50 lines)
Code
Loading syntax highlighter...

Exercise 4: Diagnose Hash Distribution

⭐⭐⭐⭐ Difficulty: Medium-Hard | ⏱️ Time: 25 minutes

Task: Write a utility to analyze hash distribution in a HashMap.
JAVA(18 lines)
Code
Loading syntax highlighter...
✅ Solution:
JAVA(88 lines)
Code
Loading syntax highlighter...

Exercise 5: LRU Cache Foundation

⭐⭐⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes

Task: Implement a simple LRU (Least Recently Used) cache using HashMap + custom doubly-linked list.
JAVA(17 lines)
Code
Loading syntax highlighter...
Requirements:
  • Both get() and put() should be O(1)
  • Most recently used items should not be evicted
  • Least recently used items should be evicted first
✅ Solution:
JAVA(87 lines)
Code
Loading syntax highlighter...

📝 Summary & Key Takeaways

What We Covered

  1. Bucket Array Structure - HashMap uses a Node<K,V>[] array where entries are stored based on their hash code
  2. Hash Function - Keys are hashed using hashCode() ^ (hashCode >>> 16) to spread entropy across all bits
  3. Index Calculation - Bucket index is computed as hash & (capacity - 1), requiring power-of-2 capacity
  4. Collision Handling - Collisions are resolved by chaining entries in linked lists (trees after threshold)
  5. Load Factor (0.75) - HashMap resizes when size exceeds capacity × loadFactor
  6. Capacity Planning - Pre-sizing with expectedSize / 0.75 + 1 avoids expensive resize operations

🎯 Production Checklist

  • Use immutable objects as keys (String, Integer, records)
  • Always override both equals() and hashCode() together
  • Pre-size HashMap when expected size is known: new HashMap<>(expectedSize / 0.75 + 1)
  • Use ConcurrentHashMap for multi-threaded access
  • Use computeIfAbsent() for lazy initialization patterns
  • Use merge() for counting/aggregation patterns
  • Never modify keys after putting them in the map
  • Use removeIf() or Iterator.remove() for safe removal during iteration

📊 Key Numbers to Remember

MetricValueWhy It Matters
Default capacity16Starting bucket array size
Default load factor0.75Resize threshold
Treeify threshold8Chain length to convert to tree
Min treeify capacity64Minimum capacity for treeification

🎤 Senior-Level Interview Questions

Question 1: How does HashMap calculate the bucket index from a key?

Answer:

HashMap uses a two-step process:

  1. Hash the key: hash = key.hashCode() ^ (key.hashCode() >>> 16)
    • XORs upper 16 bits with lower 16 bits to spread entropy
  2. Calculate index: index = hash & (capacity - 1)
    • Bitwise AND with (capacity - 1) is equivalent to modulo
    • Works because capacity is always a power of 2
Why this matters: The XOR step ensures that keys differing only in upper bits don't all collide. The bitwise AND is faster than modulo operation.

Question 2: What happens when two different keys have the same hash code?

Answer:
This is called a hash collision. HashMap handles it through chaining:
  1. Both entries go to the same bucket
  2. They form a linked list within that bucket
  3. On lookup, HashMap walks the chain, comparing each key with equals()
  4. In Java 8+, if chain exceeds 8 nodes and capacity ≥ 64, it converts to a red-black tree (O(log n) instead of O(n))
Performance impact: Collisions degrade performance from O(1) to O(n) for linked lists, or O(log n) for trees.

Question 3: Why must HashMap capacity be a power of 2?

Answer:

For efficient index calculation using bitwise AND instead of modulo:

JAVA
Code
Loading syntax highlighter...
When capacity is a power of 2 (e.g., 16), capacity - 1 is all 1s in binary (e.g., 15 = 1111). This makes hash & (capacity - 1) equivalent to hash % capacity, but much faster.
Additional benefit: During resize (doubling capacity), each entry either stays in the same bucket or moves to old_index + old_capacity. This can be determined by checking a single bit.

Question 4: What is the default load factor and why was 0.75 chosen?

Answer:
The default load factor is 0.75.
Trade-offs:
  • Lower (e.g., 0.5): Fewer collisions, more memory usage, faster operations
  • Higher (e.g., 1.0): More collisions, less memory usage, slower operations
  • 0.75: Balanced trade-off between time and space
Statistical basis: With random hash distribution, 0.75 keeps the average chain length around 0.5-1.0 nodes, which provides near-O(1) performance while using memory efficiently.

Question 5: What happens if you use a mutable object as a HashMap key and then modify it?

Answer:
The entry becomes unreachable ("lost"):
  1. Entry is stored in bucket based on original hashCode()
  2. When object is mutated, hashCode() changes
  3. get() calculates new hash, looks in different bucket
  4. Entry is never found, even though it still exists
This causes:
  • Memory leaks (entry never garbage collected)
  • Incorrect containsKey() results
  • Silently failing lookups
Prevention:
  • Use immutable keys (String, Integer, records)
  • Only include immutable fields in equals() and hashCode()

Question 6: How would you diagnose a HashMap performance problem in production?

Answer:
Symptoms to look for:
  • Slow get()/put() operations
  • High CPU usage in HashMap methods
  • Thread contention (if incorrectly shared)
Diagnostic steps:
  1. Check key distribution:
    JAVA(2 lines)
    Code
    Loading syntax highlighter...
  2. Verify equals/hashCode contract:
    • Are both overridden?
    • Do equal objects have same hashCode?
    • Are keys mutable?
  3. Check initial capacity:
    • Is the map resizing frequently?
    • What's the current capacity vs size?
  4. Monitor for concurrency issues:
    • Is HashMap shared between threads without synchronization?
    • Look for infinite loops, corrupted state
Quick fix checklist:
  • Set appropriate initial capacity
  • Use ConcurrentHashMap for shared access
  • Ensure proper equals/hashCode implementation
  • Use immutable keys

📋 Quick Reference

When to Use HashMap

Use when:
  • Need fast O(1) key-value lookups
  • Keys have good hash distribution
  • Iteration order doesn't matter
  • Single-threaded access (or using ConcurrentHashMap)
Don't use when:
  • Need sorted keys → use TreeMap
  • Need insertion order → use LinkedHashMap
  • Need thread-safety → use ConcurrentHashMap
  • Keys are enums → use EnumMap (faster)

Complexity Table

OperationAverageWorst Case
put(key, value)O(1)O(n) / O(log n)*
get(key)O(1)O(n) / O(log n)*
remove(key)O(1)O(n) / O(log n)*
containsKey(key)O(1)O(n) / O(log n)*
containsValue(value)O(n)O(n)
size()O(1)O(1)
isEmpty()O(1)O(1)
iterationO(n + capacity)O(n + capacity)

*O(log n) after treeification in Java 8+

Key Configuration

ParameterDefaultPurpose
Initial capacity16Starting bucket array size
Load factor0.75Resize threshold = capacity × loadFactor
Treeify threshold8Chain length to convert to tree
Untreeify threshold6Tree size to convert back to list
Min treeify capacity64Minimum capacity before treeification

Capacity Formula

JAVA(5 lines)
Code
Loading syntax highlighter...

Null Handling

KeyValue
✅ One null key allowed✅ Multiple null values allowed

Common Patterns

JAVA(14 lines)
Code
Loading syntax highlighter...

🏁 Conclusion

HashMap's elegance lies in its simplicity: convert a key to a number, use that number to find a bucket, and chain collisions together. This straightforward approach delivers O(1) average-case performance for the most common operations in programming—storing and retrieving data by key.

But this simplicity comes with a contract. HashMap provides the infrastructure; you provide:

  • Proper hash codes that distribute evenly across buckets
  • Immutable keys that don't change after insertion
  • Consistent equals/hashCode that satisfy the Java object contract
  • Appropriate sizing to avoid expensive resize operations

When you honor this contract, HashMap becomes an incredibly powerful tool. When you violate it—as we saw in the Black Friday incident—HashMap can become your biggest performance bottleneck.

In the next article, we'll explore how Java 8 strengthened HashMap against worst-case scenarios by introducing tree bins, and how this protects your application from hash collision attacks.


🔄 Review Schedule

  • After 1 day: Re-read "At a Glance" and "Mental Model" sections
  • After 3 days: Redo Exercises 1-2 without looking at solutions
  • After 7 days: Answer all Interview Questions out loud
  • After 14 days: Do the "Debug This" exercise again
  • After 30 days: Review Quick Reference, check if you remember key numbers (0.75, 16, 8)

  • Part 3: equals() & hashCode() - The contract that makes HashMap work
  • Part 9: HashSet - Uses HashMap internally
  • Part 13: HashMap Tree Bins - Deep dive into Java 8+ treeification
  • Part 15: LinkedHashMap - HashMap with predictable iteration order

Next Up

  • Part 13: HashMap - Tree Bins & Java 8+ Optimizations - Learn how HashMap defends against hash collision attacks

External Resources