HashMap: Buckets and Collisions
HashMap is the workhorse of Java collections—the go-to data structure for key-value storage that powers everything from caching layers to configuration systems. Yet many developers treat it as a black box, unaware of the elegant engineering that enables its O(1) average-case performance.
This article takes you inside HashMap's implementation: how it transforms hash codes into bucket indices, handles collisions through chaining, and manages capacity through dynamic resizing. Understanding these internals isn't just academic—it's the difference between a HashMap that performs flawlessly and one that degrades to O(n) in production.
Whether you're optimizing a hot code path, debugging mysterious performance issues, or preparing for senior-level interviews, mastering HashMap internals is essential knowledge for any serious Java developer.
🎯 At a Glance
- Keys are converted to bucket indices via
hashCode()and bitwise operations - Collisions are handled by chaining (linked list, tree after threshold)
- Load factor (0.75) triggers resize at 75% capacity
- Capacity must be power of 2 for efficient index calculation
- Never use mutable objects as keys
| Operation | Average | Worst Case |
|---|---|---|
| get(key) | O(1) | O(n)* |
| put(key, value) | O(1) | O(n)* |
| remove(key) | O(1) | O(n)* |
| containsKey | O(1) | O(n)* |
*Worst case with poor hash distribution; O(log n) after treeification (Java 8+)
Interactive Visualizer
See HashMap operations in action - watch how keys are hashed to bucket indices and how collisions are handled through chaining:
Loading visualizer...
📋 Prerequisites
Before reading this article, you should understand:
- Part 3: equals() & hashCode() Contract - Critical for understanding how HashMap finds entries
- Basic Java generics - HashMap<K,V> uses generic types
- Big O notation - To understand performance characteristics
📘 Introduction
🔥 The Black Friday Incident
ProductKey class as HashMap keys. The developer had overridden equals() but forgot hashCode(). Every single key returned the default identity hash from Object.hashCode(), which happened to collide spectacularly due to our JVM configuration.Two million entries. One bucket. A linked list 2 million nodes long.
That day, I learned that HashMap's O(1) promise is a contract—and like any contract, it requires both parties to uphold their end. HashMap provides the infrastructure; you provide proper hash codes.
🌍 Why HashMap Internals Matter
HashMap is everywhere in Java applications:
- Caching layers - In-memory caches, session stores
- Configuration - Properties, feature flags
- Data processing - Grouping, deduplication, lookups
- Framework internals - Spring beans, Hibernate caches
Understanding its internals helps you:
- Avoid performance pitfalls - Proper capacity planning, key design
- Debug production issues - Recognize symptoms of hash collision problems
- Make informed decisions - When to use HashMap vs TreeMap vs LinkedHashMap
- Ace interviews - HashMap internals is a top-10 Java interview topic
🎯 What You'll Learn
- Bucket array structure - How entries are stored and located
- Hash function internals - How hashCode() becomes a bucket index
- Collision handling - Chaining strategy and its implications
- Load factor & resizing - When and how HashMap grows
- Capacity planning - Choosing the right initial capacity
- Key design principles - Creating hash-friendly keys
🧠 Mental Model
Think of HashMap as a Library with Numbered Shelves
Imagine a library where books are stored on numbered shelves (buckets):
TEXT(17 lines)CodeLoading syntax highlighter...
The Core Algorithm
TEXT(15 lines)CodeLoading syntax highlighter...
Visual: HashMap Memory Layout
TEXT(27 lines)CodeLoading syntax highlighter...
🔬 Deep Dive
1. The Bucket Array
At the heart of HashMap lies a simple array:
JAVA(16 lines)CodeLoading syntax highlighter...
- Hash is computed once during
put() - Reused during
get(),remove(), and resize operations - Comparing hashes is faster than comparing keys (especially for Strings)
2. The Hash Function
hashCode() directly. It applies additional mixing:JAVA(5 lines)CodeLoading syntax highlighter...
TEXT(9 lines)CodeLoading syntax highlighter...
3. Index Calculation
JAVA(2 lines)CodeLoading syntax highlighter...
TEXT(8 lines)CodeLoading syntax highlighter...
TEXT(5 lines)CodeLoading syntax highlighter...
4. Collision Resolution: Chaining
When two keys map to the same bucket, HashMap chains them:
JAVA(50 lines)CodeLoading syntax highlighter...
- Empty bucket: O(1) - direct array access
- Chain of length n: O(n) - must traverse the chain
- This is why hash distribution matters!
5. Load Factor and Threshold
JAVA(7 lines)CodeLoading syntax highlighter...
| Load Factor | Trade-off |
|---|---|
| 0.5 | More memory, fewer collisions, faster |
| 0.75 | Balanced (default) |
| 1.0 | Less memory, more collisions, slower |
The value 0.75 is based on statistical analysis - it provides a good balance between space efficiency and collision probability.
6. Resize Operation
When size exceeds threshold, HashMap doubles its capacity:
JAVA(23 lines)CodeLoading syntax highlighter...
- Allocates new array (2x size)
- Rehashes every single entry
- Causes GC pressure
- Can cause latency spikes in production
TEXT(12 lines)CodeLoading syntax highlighter...
7. Initial Capacity Planning
JAVA(11 lines)CodeLoading syntax highlighter...
TEXT(6 lines)CodeLoading syntax highlighter...
Each resize:
- Allocates new array
- Copies all entries
- Causes GC pressure
- Increases latency
⚠️ Common Mistakes
Mistake 1: Mutable Keys
JAVA(13 lines)CodeLoading syntax highlighter...
JAVA(9 lines)CodeLoading syntax highlighter...
hashCode(). If key mutates, its hash changes, but the entry stays in the old bucket. It becomes invisible to lookups.Mistake 2: Missing hashCode() Override
JAVA(24 lines)CodeLoading syntax highlighter...
JAVA(17 lines)CodeLoading syntax highlighter...
equals(), they MUST have the same hashCode().Mistake 3: Ignoring Initial Capacity
JAVA(9 lines)CodeLoading syntax highlighter...
JAVA(8 lines)CodeLoading syntax highlighter...
- Avoids resize overhead
- Reduces GC pressure
- Prevents latency spikes
- Memory is allocated upfront (predictable)
Mistake 4: Modifying Map During Iteration
JAVA(10 lines)CodeLoading syntax highlighter...
JAVA(7 lines)CodeLoading syntax highlighter...
JAVACodeLoading syntax highlighter...
JAVA(6 lines)CodeLoading syntax highlighter...
Mistake 5: Using HashMap in Multi-threaded Code
JAVA(14 lines)CodeLoading syntax highlighter...
JAVA(18 lines)CodeLoading syntax highlighter...
🐛 Debug This
Find the bug in this code:
JAVA(36 lines)CodeLoading syntax highlighter...
lastLogin changes?lastLogin is mutable and included in equals() and hashCode().updateLastLogin() is called:- The
Userobject'shashCode()changes - The entry is now in the wrong bucket
cache.get(user)looks in the new bucket (based on new hash)- Entry is never found - it's still in the old bucket
JAVA(20 lines)CodeLoading syntax highlighter...
equals() and hashCode().💻 Practical Exercises
Exercise 1: Basic HashMap Operations
⭐ Difficulty: Easy | ⏱️ Time: 10 minutes
JAVA(5 lines)CodeLoading syntax highlighter...
JAVA(4 lines)CodeLoading syntax highlighter...
JAVA(11 lines)CodeLoading syntax highlighter...
Exercise 2: Proper Initial Capacity
⭐⭐ Difficulty: Easy-Medium | ⏱️ Time: 15 minutes
JAVA(4 lines)CodeLoading syntax highlighter...
- Calculate the minimum initial capacity needed
- Verify no resize occurs when adding 10,000 entries
JAVA(31 lines)CodeLoading syntax highlighter...
Exercise 3: Custom Key with Proper Contract
⭐⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes
CompositeKey class that can be safely used as a HashMap key.JAVA(8 lines)CodeLoading syntax highlighter...
- Two keys are equal if namespace, name, AND version match
- Must be immutable
- Must work correctly in HashMap
JAVA(50 lines)CodeLoading syntax highlighter...
Exercise 4: Diagnose Hash Distribution
⭐⭐⭐⭐ Difficulty: Medium-Hard | ⏱️ Time: 25 minutes
JAVA(18 lines)CodeLoading syntax highlighter...
JAVA(88 lines)CodeLoading syntax highlighter...
Exercise 5: LRU Cache Foundation
⭐⭐⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 30 minutes
JAVA(17 lines)CodeLoading syntax highlighter...
- Both get() and put() should be O(1)
- Most recently used items should not be evicted
- Least recently used items should be evicted first
JAVA(87 lines)CodeLoading syntax highlighter...
📝 Summary & Key Takeaways
What We Covered
- Bucket Array Structure - HashMap uses a
Node<K,V>[]array where entries are stored based on their hash code - Hash Function - Keys are hashed using
hashCode() ^ (hashCode >>> 16)to spread entropy across all bits - Index Calculation - Bucket index is computed as
hash & (capacity - 1), requiring power-of-2 capacity - Collision Handling - Collisions are resolved by chaining entries in linked lists (trees after threshold)
- Load Factor (0.75) - HashMap resizes when size exceeds
capacity × loadFactor - Capacity Planning - Pre-sizing with
expectedSize / 0.75 + 1avoids expensive resize operations
🎯 Production Checklist
- Use immutable objects as keys (String, Integer, records)
- Always override both
equals()andhashCode()together - Pre-size HashMap when expected size is known:
new HashMap<>(expectedSize / 0.75 + 1) - Use
ConcurrentHashMapfor multi-threaded access - Use
computeIfAbsent()for lazy initialization patterns - Use
merge()for counting/aggregation patterns - Never modify keys after putting them in the map
- Use
removeIf()orIterator.remove()for safe removal during iteration
📊 Key Numbers to Remember
| Metric | Value | Why It Matters |
|---|---|---|
| Default capacity | 16 | Starting bucket array size |
| Default load factor | 0.75 | Resize threshold |
| Treeify threshold | 8 | Chain length to convert to tree |
| Min treeify capacity | 64 | Minimum capacity for treeification |
🎤 Senior-Level Interview Questions
Question 1: How does HashMap calculate the bucket index from a key?
HashMap uses a two-step process:
-
Hash the key:
hash = key.hashCode() ^ (key.hashCode() >>> 16)- XORs upper 16 bits with lower 16 bits to spread entropy
-
Calculate index:
index = hash & (capacity - 1)- Bitwise AND with (capacity - 1) is equivalent to modulo
- Works because capacity is always a power of 2
Question 2: What happens when two different keys have the same hash code?
- Both entries go to the same bucket
- They form a linked list within that bucket
- On lookup, HashMap walks the chain, comparing each key with
equals() - In Java 8+, if chain exceeds 8 nodes and capacity ≥ 64, it converts to a red-black tree (O(log n) instead of O(n))
Question 3: Why must HashMap capacity be a power of 2?
For efficient index calculation using bitwise AND instead of modulo:
JAVACodeLoading syntax highlighter...
capacity - 1 is all 1s in binary (e.g., 15 = 1111). This makes hash & (capacity - 1) equivalent to hash % capacity, but much faster.old_index + old_capacity. This can be determined by checking a single bit.Question 4: What is the default load factor and why was 0.75 chosen?
- Lower (e.g., 0.5): Fewer collisions, more memory usage, faster operations
- Higher (e.g., 1.0): More collisions, less memory usage, slower operations
- 0.75: Balanced trade-off between time and space
Question 5: What happens if you use a mutable object as a HashMap key and then modify it?
- Entry is stored in bucket based on original
hashCode() - When object is mutated,
hashCode()changes get()calculates new hash, looks in different bucket- Entry is never found, even though it still exists
- Memory leaks (entry never garbage collected)
- Incorrect
containsKey()results - Silently failing lookups
- Use immutable keys (String, Integer, records)
- Only include immutable fields in
equals()andhashCode()
Question 6: How would you diagnose a HashMap performance problem in production?
- Slow
get()/put()operations - High CPU usage in HashMap methods
- Thread contention (if incorrectly shared)
-
Check key distribution:JAVA(2 lines)CodeLoading syntax highlighter...
-
Verify equals/hashCode contract:
- Are both overridden?
- Do equal objects have same hashCode?
- Are keys mutable?
-
Check initial capacity:
- Is the map resizing frequently?
- What's the current capacity vs size?
-
Monitor for concurrency issues:
- Is HashMap shared between threads without synchronization?
- Look for infinite loops, corrupted state
- Set appropriate initial capacity
- Use ConcurrentHashMap for shared access
- Ensure proper equals/hashCode implementation
- Use immutable keys
📋 Quick Reference
When to Use HashMap
- Need fast O(1) key-value lookups
- Keys have good hash distribution
- Iteration order doesn't matter
- Single-threaded access (or using ConcurrentHashMap)
- Need sorted keys → use
TreeMap - Need insertion order → use
LinkedHashMap - Need thread-safety → use
ConcurrentHashMap - Keys are enums → use
EnumMap(faster)
Complexity Table
| Operation | Average | Worst Case |
|---|---|---|
| put(key, value) | O(1) | O(n) / O(log n)* |
| get(key) | O(1) | O(n) / O(log n)* |
| remove(key) | O(1) | O(n) / O(log n)* |
| containsKey(key) | O(1) | O(n) / O(log n)* |
| containsValue(value) | O(n) | O(n) |
| size() | O(1) | O(1) |
| isEmpty() | O(1) | O(1) |
| iteration | O(n + capacity) | O(n + capacity) |
*O(log n) after treeification in Java 8+
Key Configuration
| Parameter | Default | Purpose |
|---|---|---|
| Initial capacity | 16 | Starting bucket array size |
| Load factor | 0.75 | Resize threshold = capacity × loadFactor |
| Treeify threshold | 8 | Chain length to convert to tree |
| Untreeify threshold | 6 | Tree size to convert back to list |
| Min treeify capacity | 64 | Minimum capacity before treeification |
Capacity Formula
JAVA(5 lines)CodeLoading syntax highlighter...
Null Handling
| Key | Value |
|---|---|
| ✅ One null key allowed | ✅ Multiple null values allowed |
Common Patterns
JAVA(14 lines)CodeLoading syntax highlighter...
🏁 Conclusion
HashMap's elegance lies in its simplicity: convert a key to a number, use that number to find a bucket, and chain collisions together. This straightforward approach delivers O(1) average-case performance for the most common operations in programming—storing and retrieving data by key.
But this simplicity comes with a contract. HashMap provides the infrastructure; you provide:
- Proper hash codes that distribute evenly across buckets
- Immutable keys that don't change after insertion
- Consistent equals/hashCode that satisfy the Java object contract
- Appropriate sizing to avoid expensive resize operations
When you honor this contract, HashMap becomes an incredibly powerful tool. When you violate it—as we saw in the Black Friday incident—HashMap can become your biggest performance bottleneck.
In the next article, we'll explore how Java 8 strengthened HashMap against worst-case scenarios by introducing tree bins, and how this protects your application from hash collision attacks.
🔄 Review Schedule
- After 1 day: Re-read "At a Glance" and "Mental Model" sections
- After 3 days: Redo Exercises 1-2 without looking at solutions
- After 7 days: Answer all Interview Questions out loud
- After 14 days: Do the "Debug This" exercise again
- After 30 days: Review Quick Reference, check if you remember key numbers (0.75, 16, 8)
🔗 Related & Next
Related Articles in This Series
- Part 3: equals() & hashCode() - The contract that makes HashMap work
- Part 9: HashSet - Uses HashMap internally
- Part 13: HashMap Tree Bins - Deep dive into Java 8+ treeification
- Part 15: LinkedHashMap - HashMap with predictable iteration order
Next Up
- Part 13: HashMap - Tree Bins & Java 8+ Optimizations - Learn how HashMap defends against hash collision attacks