HashMap: Buckets and Collisions

HashMap is the workhorse of Java collections—the go-to data structure for key-value storage that powers everything from caching layers to configuration systems. Yet many developers treat it as a black box, unaware of the elegant engineering that enables its O(1) average-case performance.

This article takes you inside HashMap's implementation: how it transforms hash codes into bucket indices, handles collisions through chaining, and manages capacity through dynamic resizing. Understanding these internals isn't just academic—it's the difference between a HashMap that performs flawlessly and one that degrades to O(n) in production.

Whether you're optimizing a hot code path, debugging mysterious performance issues, or preparing for senior-level interviews, mastering HashMap internals is essential knowledge for any serious Java developer.

🎯 At a Glance

What it is: HashMap is Java's primary key-value data structure, using hash-based bucket storage for O(1) average-case lookups.

Key insights:

Keys are converted to bucket indices via hashCode() and bitwise operations
Collisions are handled by chaining (linked list, tree after threshold)
Load factor (0.75) triggers resize at 75% capacity
Capacity must be power of 2 for efficient index calculation
Never use mutable objects as keys

Complexity:

Operation	Average	Worst Case
get(key)	O(1)	O(n)*
put(key, value)	O(1)	O(n)*
remove(key)	O(1)	O(n)*
containsKey	O(1)	O(n)*

*Worst case with poor hash distribution; O(log n) after treeification (Java 8+)

One-liner: HashMap trades memory for speed by using hash codes to directly locate entries in an array of buckets.

Interactive Visualizer

See HashMap operations in action - watch how keys are hashed to bucket indices and how collisions are handled through chaining:

HashMap Operations
Avg: O(1)Worst: O(n)
1 / 20
Initialize HashMap with 8 buckets. Each bucket is a linked list for collision handling.
Real-World Use Cases
📚 Caches
🔑 Session stores
📊 Counting/Grouping
🔗 Database indexes
#️⃣ Hash Function
index = hashCode(key) % capacity
Same key → same index (deterministic) • Different keys may collide → chaining
⚠️ Good hash function minimizes collisions • Load factor > 0.75 triggers resize
Ready to hash...
Bucket Array (capacity: 8)
0
∅
1
∅
2
∅
3
∅
4
∅
5
∅
6
∅
7
∅
Pseudocode
1function put(key, value):
2  hash = hashCode(key)
3  index = hash % capacity
4  bucket = buckets[index]
5  for entry in bucket:
6    if entry.key == key:
7      entry.value = value
8      return
9  bucket.add(Entry(key, value))
10 
11function get(key):
12  hash = hashCode(key)
13  index = hash % capacity
14  bucket = buckets[index]
15  for entry in bucket:
16    if entry.key == key:
17      return entry.value
18  return null
Variables
—
Keyboard Shortcuts
P Play / Pause
[ Step back
] Step forward
R Reset
Speed
Current bucket
Current bucket
Insert/Update
Insert/Update
Found
Found
Not found
Not found
P · [ ] · R
Loading visualizer...

📋 Prerequisites

Before reading this article, you should understand:

Part 3: equals() & hashCode() Contract - Critical for understanding how HashMap finds entries
Basic Java generics - HashMap<K,V> uses generic types
Big O notation - To understand performance characteristics

📘 Introduction

🔥 The Black Friday Incident

"Black Friday, 2019. Our e-commerce search service ground to a halt. Response times jumped from 5ms to 45 seconds. The culprit? A HashMap with 2 million product entries—all landing in the same bucket."

Our product catalog used a custom ProductKey class as HashMap keys. The developer had overridden equals() but forgot hashCode(). Every single key returned the default identity hash from Object.hashCode(), which happened to collide spectacularly due to our JVM configuration.

Two million entries. One bucket. A linked list 2 million nodes long.

That day, I learned that HashMap's O(1) promise is a contract—and like any contract, it requires both parties to uphold their end. HashMap provides the infrastructure; you provide proper hash codes.

🌍 Why HashMap Internals Matter

HashMap is everywhere in Java applications:

Caching layers - In-memory caches, session stores
Configuration - Properties, feature flags
Data processing - Grouping, deduplication, lookups
Framework internals - Spring beans, Hibernate caches

Understanding its internals helps you:

Avoid performance pitfalls - Proper capacity planning, key design
Debug production issues - Recognize symptoms of hash collision problems
Make informed decisions - When to use HashMap vs TreeMap vs LinkedHashMap
Ace interviews - HashMap internals is a top-10 Java interview topic

🎯 What You'll Learn

Bucket array structure - How entries are stored and located
Hash function internals - How hashCode() becomes a bucket index
Collision handling - Chaining strategy and its implications
Load factor & resizing - When and how HashMap grows
Capacity planning - Choosing the right initial capacity
Key design principles - Creating hash-friendly keys

🧠 Mental Model

Think of HashMap as a Library with Numbered Shelves

Imagine a library where books are stored on numbered shelves (buckets):

TEXT(17 lines)
Code
Loading syntax highlighter...

The Core Algorithm

TEXT(15 lines)
Code
Loading syntax highlighter...

Visual: HashMap Memory Layout

TEXT(27 lines)
Code
Loading syntax highlighter...

Key insight: Good hash distribution = entries spread evenly = short chains = fast lookups.

🔬 Deep Dive

1. The Bucket Array

At the heart of HashMap lies a simple array:

JAVA(16 lines)
Code
Loading syntax highlighter...

Why cache the hash?

Hash is computed once during put()
Reused during get(), remove(), and resize operations
Comparing hashes is faster than comparing keys (especially for Strings)

2. The Hash Function

HashMap doesn't use hashCode() directly. It applies additional mixing:

JAVA(5 lines)
Code
Loading syntax highlighter...

Why XOR with upper bits?

TEXT(9 lines)
Code
Loading syntax highlighter...

Problem solved: When capacity is small (e.g., 16), only lower bits determine the bucket index. If many keys differ only in upper bits, they'd all collide. XOR spreads that entropy.

3. Index Calculation

JAVA(2 lines)
Code
Loading syntax highlighter...

Why bitwise AND instead of modulo?

TEXT(8 lines)
Code
Loading syntax highlighter...

Why capacity must be power of 2:

TEXT(5 lines)
Code
Loading syntax highlighter...

4. Collision Resolution: Chaining

When two keys map to the same bucket, HashMap chains them:

JAVA(50 lines)
Code
Loading syntax highlighter...

Performance implication:

Empty bucket: O(1) - direct array access
Chain of length n: O(n) - must traverse the chain
This is why hash distribution matters!

5. Load Factor and Threshold

JAVA(7 lines)
Code
Loading syntax highlighter...

Why 0.75?

Load Factor	Trade-off
0.5	More memory, fewer collisions, faster
0.75	Balanced (default)
1.0	Less memory, more collisions, slower

The value 0.75 is based on statistical analysis - it provides a good balance between space efficiency and collision probability.

6. Resize Operation

When size exceeds threshold, HashMap doubles its capacity:

JAVA(23 lines)
Code
Loading syntax highlighter...

Resize is expensive:

Allocates new array (2x size)
Rehashes every single entry
Causes GC pressure
Can cause latency spikes in production

Visual: How entries are redistributed

TEXT(12 lines)
Code
Loading syntax highlighter...

7. Initial Capacity Planning

JAVA(11 lines)
Code
Loading syntax highlighter...

Resize cascade with default capacity:

TEXT(6 lines)
Code
Loading syntax highlighter...

Each resize:

Allocates new array
Copies all entries
Causes GC pressure
Increases latency

⚠️ Common Mistakes

Mistake 1: Mutable Keys

❌ Problem:

JAVA(13 lines)
Code
Loading syntax highlighter...

✅ Solution:

JAVA(9 lines)
Code
Loading syntax highlighter...

Why: HashMap stores entries based on hashCode(). If key mutates, its hash changes, but the entry stays in the old bucket. It becomes invisible to lookups.

Mistake 2: Missing hashCode() Override

❌ Problem:

JAVA(24 lines)
Code
Loading syntax highlighter...

✅ Solution:

JAVA(17 lines)
Code
Loading syntax highlighter...

Rule: If two objects are equals(), they MUST have the same hashCode().

Mistake 3: Ignoring Initial Capacity

❌ Problem:

JAVA(9 lines)
Code
Loading syntax highlighter...

✅ Solution:

JAVA(8 lines)
Code
Loading syntax highlighter...

Impact:

Avoids resize overhead
Reduces GC pressure
Prevents latency spikes
Memory is allocated upfront (predictable)

Mistake 4: Modifying Map During Iteration

❌ Problem:

JAVA(10 lines)
Code
Loading syntax highlighter...

✅ Solution 1: Use Iterator.remove()

JAVA(7 lines)
Code
Loading syntax highlighter...

✅ Solution 2: Use removeIf() (Java 8+)

JAVA
Code
Loading syntax highlighter...

✅ Solution 3: Collect and remove

JAVA(6 lines)
Code
Loading syntax highlighter...

Mistake 5: Using HashMap in Multi-threaded Code

❌ Problem:

JAVA(14 lines)
Code
Loading syntax highlighter...

✅ Solution:

JAVA(18 lines)
Code
Loading syntax highlighter...

🐛 Debug This

Find the bug in this code:

JAVA(36 lines)
Code
Loading syntax highlighter...

🔍 Hint: What happens to the hash code when lastLogin changes?

✅ Solution:

The Bug: lastLogin is mutable and included in equals() and hashCode().

When a user logs in and updateLastLogin() is called:

The User object's hashCode() changes
The entry is now in the wrong bucket
cache.get(user) looks in the new bucket (based on new hash)
Entry is never found - it's still in the old bucket

The Fix:

JAVA(20 lines)
Code
Loading syntax highlighter...

Rule: Only include immutable, identity-defining fields in equals() and hashCode().

💻 Practical Exercises

Exercise 1: Basic HashMap Operations

⭐ Difficulty: Easy | ⏱️ Time: 10 minutes

Task: Implement a word frequency counter.

JAVA(5 lines)
Code
Loading syntax highlighter...

Test:

JAVA(4 lines)
Code
Loading syntax highlighter...

✅ Solution:

JAVA(11 lines)
Code
Loading syntax highlighter...

Exercise 2: Proper Initial Capacity

⭐⭐ Difficulty: Easy-Medium | ⏱️ Time: 15 minutes

Task: Create a HashMap that will hold exactly 10,000 entries without ever resizing.

JAVA(4 lines)
Code
Loading syntax highlighter...

Requirements:

Calculate the minimum initial capacity needed
Verify no resize occurs when adding 10,000 entries

✅ Solution:

JAVA(31 lines)
Code
Loading syntax highlighter...

Exercise 3: Custom Key with Proper Contract

⭐⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes

Task: Implement a CompositeKey class that can be safely used as a HashMap key.

JAVA(8 lines)
Code
Loading syntax highlighter...

Requirements:

Two keys are equal if namespace, name, AND version match
Must be immutable
Must work correctly in HashMap

✅ Solution:

JAVA(50 lines)
Code
Loading syntax highlighter...

Exercise 4: Diagnose Hash Distribution

⭐⭐⭐⭐ Difficulty: Medium-Hard | ⏱️ Time: 25 minutes

Task: Write a utility to analyze hash distribution in a HashMap.

JAVA(18 lines)
Code
Loading syntax highlighter...

✅ Solution:

JAVA(88 lines)
Code
Loading syntax highlighter...

Exercise 5: LRU Cache Foundation

⭐⭐⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes

Task: Implement a simple LRU (Least Recently Used) cache using HashMap + custom doubly-linked list.

JAVA(17 lines)
Code
Loading syntax highlighter...

Requirements:

Both get() and put() should be O(1)
Most recently used items should not be evicted
Least recently used items should be evicted first

✅ Solution:

JAVA(87 lines)
Code
Loading syntax highlighter...

📝 Summary & Key Takeaways

What We Covered

Bucket Array Structure - HashMap uses a Node<K,V>[] array where entries are stored based on their hash code
Hash Function - Keys are hashed using hashCode() ^ (hashCode >>> 16) to spread entropy across all bits
Index Calculation - Bucket index is computed as hash & (capacity - 1), requiring power-of-2 capacity
Collision Handling - Collisions are resolved by chaining entries in linked lists (trees after threshold)
Load Factor (0.75) - HashMap resizes when size exceeds capacity × loadFactor
Capacity Planning - Pre-sizing with expectedSize / 0.75 + 1 avoids expensive resize operations

🎯 Production Checklist

Use immutable objects as keys (String, Integer, records)
Always override both equals() and hashCode() together
Pre-size HashMap when expected size is known: new HashMap<>(expectedSize / 0.75 + 1)
Use ConcurrentHashMap for multi-threaded access
Use computeIfAbsent() for lazy initialization patterns
Use merge() for counting/aggregation patterns
Never modify keys after putting them in the map
Use removeIf() or Iterator.remove() for safe removal during iteration

📊 Key Numbers to Remember

Metric	Value	Why It Matters
Default capacity	16	Starting bucket array size
Default load factor	0.75	Resize threshold
Treeify threshold	8	Chain length to convert to tree
Min treeify capacity	64	Minimum capacity for treeification

🎤 Senior-Level Interview Questions

Question 1: How does HashMap calculate the bucket index from a key?

Answer:

HashMap uses a two-step process:

Hash the key: hash = key.hashCode() ^ (key.hashCode() >>> 16)
- XORs upper 16 bits with lower 16 bits to spread entropy
Calculate index: index = hash & (capacity - 1)
- Bitwise AND with (capacity - 1) is equivalent to modulo
- Works because capacity is always a power of 2

Why this matters: The XOR step ensures that keys differing only in upper bits don't all collide. The bitwise AND is faster than modulo operation.

Question 2: What happens when two different keys have the same hash code?

Answer:

This is called a hash collision. HashMap handles it through chaining:

Both entries go to the same bucket
They form a linked list within that bucket
On lookup, HashMap walks the chain, comparing each key with equals()
In Java 8+, if chain exceeds 8 nodes and capacity ≥ 64, it converts to a red-black tree (O(log n) instead of O(n))

Performance impact: Collisions degrade performance from O(1) to O(n) for linked lists, or O(log n) for trees.

Question 3: Why must HashMap capacity be a power of 2?

Answer:

For efficient index calculation using bitwise AND instead of modulo:

JAVA
Code
Loading syntax highlighter...

When capacity is a power of 2 (e.g., 16), capacity - 1 is all 1s in binary (e.g., 15 = 1111). This makes hash & (capacity - 1) equivalent to hash % capacity, but much faster.

Additional benefit: During resize (doubling capacity), each entry either stays in the same bucket or moves to old_index + old_capacity. This can be determined by checking a single bit.

Question 4: What is the default load factor and why was 0.75 chosen?

Answer:

The default load factor is 0.75.

Trade-offs:

Lower (e.g., 0.5): Fewer collisions, more memory usage, faster operations
Higher (e.g., 1.0): More collisions, less memory usage, slower operations
0.75: Balanced trade-off between time and space

Statistical basis: With random hash distribution, 0.75 keeps the average chain length around 0.5-1.0 nodes, which provides near-O(1) performance while using memory efficiently.

Question 5: What happens if you use a mutable object as a HashMap key and then modify it?

Answer:

The entry becomes unreachable ("lost"):

Entry is stored in bucket based on original hashCode()
When object is mutated, hashCode() changes
get() calculates new hash, looks in different bucket
Entry is never found, even though it still exists

This causes:

Memory leaks (entry never garbage collected)
Incorrect containsKey() results
Silently failing lookups

Prevention:

Use immutable keys (String, Integer, records)
Only include immutable fields in equals() and hashCode()

Question 6: How would you diagnose a HashMap performance problem in production?

Answer:

Symptoms to look for:

Slow get()/put() operations
High CPU usage in HashMap methods
Thread contention (if incorrectly shared)

Diagnostic steps:

Check key distribution:

JAVA(2 lines)
Code
Loading syntax highlighter...

Verify equals/hashCode contract:
- Are both overridden?
- Do equal objects have same hashCode?
- Are keys mutable?
Check initial capacity:
- Is the map resizing frequently?
- What's the current capacity vs size?
Monitor for concurrency issues:
- Is HashMap shared between threads without synchronization?
- Look for infinite loops, corrupted state

Quick fix checklist:

Set appropriate initial capacity
Use ConcurrentHashMap for shared access
Ensure proper equals/hashCode implementation
Use immutable keys

📋 Quick Reference

When to Use HashMap

✅ Use when:

Need fast O(1) key-value lookups
Keys have good hash distribution
Iteration order doesn't matter
Single-threaded access (or using ConcurrentHashMap)

❌ Don't use when:

Need sorted keys → use TreeMap
Need insertion order → use LinkedHashMap
Need thread-safety → use ConcurrentHashMap
Keys are enums → use EnumMap (faster)

Complexity Table

Operation	Average	Worst Case
put(key, value)	O(1)	O(n) / O(log n)*
get(key)	O(1)	O(n) / O(log n)*
remove(key)	O(1)	O(n) / O(log n)*
containsKey(key)	O(1)	O(n) / O(log n)*
containsValue(value)	O(n)	O(n)
size()	O(1)	O(1)
isEmpty()	O(1)	O(1)
iteration	O(n + capacity)	O(n + capacity)

*O(log n) after treeification in Java 8+

Key Configuration

Parameter	Default	Purpose
Initial capacity	16	Starting bucket array size
Load factor	0.75	Resize threshold = capacity × loadFactor
Treeify threshold	8	Chain length to convert to tree
Untreeify threshold	6	Tree size to convert back to list
Min treeify capacity	64	Minimum capacity before treeification

Capacity Formula

JAVA(5 lines)
Code
Loading syntax highlighter...

Null Handling

Key	Value
✅ One null key allowed	✅ Multiple null values allowed

Common Patterns

JAVA(14 lines)
Code
Loading syntax highlighter...

🏁 Conclusion

HashMap's elegance lies in its simplicity: convert a key to a number, use that number to find a bucket, and chain collisions together. This straightforward approach delivers O(1) average-case performance for the most common operations in programming—storing and retrieving data by key.

But this simplicity comes with a contract. HashMap provides the infrastructure; you provide:

Proper hash codes that distribute evenly across buckets
Immutable keys that don't change after insertion
Consistent equals/hashCode that satisfy the Java object contract
Appropriate sizing to avoid expensive resize operations

When you honor this contract, HashMap becomes an incredibly powerful tool. When you violate it—as we saw in the Black Friday incident—HashMap can become your biggest performance bottleneck.

In the next article, we'll explore how Java 8 strengthened HashMap against worst-case scenarios by introducing tree bins, and how this protects your application from hash collision attacks.

🔄 Review Schedule

After 1 day: Re-read "At a Glance" and "Mental Model" sections
After 3 days: Redo Exercises 1-2 without looking at solutions
After 7 days: Answer all Interview Questions out loud
After 14 days: Do the "Debug This" exercise again
After 30 days: Review Quick Reference, check if you remember key numbers (0.75, 16, 8)

Part 3: equals() & hashCode() - The contract that makes HashMap work
Part 9: HashSet - Uses HashMap internally
Part 13: HashMap Tree Bins - Deep dive into Java 8+ treeification
Part 15: LinkedHashMap - HashMap with predictable iteration order

Next Up

Part 13: HashMap - Tree Bins & Java 8+ Optimizations - Learn how HashMap defends against hash collision attacks

HashMap: Buckets and Collisions

🎯 At a Glance

Interactive Visualizer

HashMap Operations

📋 Prerequisites

📘 Introduction

🔥 The Black Friday Incident

🌍 Why HashMap Internals Matter

🎯 What You'll Learn

🧠 Mental Model

Think of HashMap as a Library with Numbered Shelves

The Core Algorithm

Visual: HashMap Memory Layout

🔬 Deep Dive

1. The Bucket Array

2. The Hash Function

3. Index Calculation

4. Collision Resolution: Chaining

5. Load Factor and Threshold

6. Resize Operation

7. Initial Capacity Planning

⚠️ Common Mistakes

Mistake 1: Mutable Keys

Mistake 2: Missing hashCode() Override

Mistake 3: Ignoring Initial Capacity

Mistake 4: Modifying Map During Iteration

Mistake 5: Using HashMap in Multi-threaded Code

🐛 Debug This

💻 Practical Exercises

Exercise 1: Basic HashMap Operations

Exercise 2: Proper Initial Capacity

Exercise 3: Custom Key with Proper Contract

Exercise 4: Diagnose Hash Distribution

Exercise 5: LRU Cache Foundation

📝 Summary & Key Takeaways

What We Covered

🎯 Production Checklist

📊 Key Numbers to Remember

🎤 Senior-Level Interview Questions

Question 1: How does HashMap calculate the bucket index from a key?

Question 2: What happens when two different keys have the same hash code?

Question 3: Why must HashMap capacity be a power of 2?

Question 4: What is the default load factor and why was 0.75 chosen?

Question 5: What happens if you use a mutable object as a HashMap key and then modify it?

Question 6: How would you diagnose a HashMap performance problem in production?

📋 Quick Reference

When to Use HashMap

Complexity Table

Key Configuration

Capacity Formula

Null Handling

Common Patterns

🏁 Conclusion

🔄 Review Schedule

🔗 Related & Next

Related Articles in This Series

Next Up

External Resources

Tags:

HashMap Operations