Java

HashMap: Tree Bins and Performance

Java 8 fundamentally changed how HashMap handles collisions. Before Java 8, a bucket with many collisions degraded to O(n) lookup time. After Java 8, heavily collided buckets automatically convert to balanced trees, maintaining O(log n) performance. This change wasn't just an optimization - it was a critical security fix against HashDoS attacks.

In this article, we'll explore how tree bins work, when they activate, and why this matters for your production systems.

πŸ“‹ At a Glance

AspectDetails
FeatureTree bins (Red-Black Trees) in HashMap buckets
IntroducedJava 8
Treeify Threshold8 nodes in bucket
Untreeify Threshold6 nodes (after resize)
Min Capacity for Tree64 buckets
Key BenefitO(log n) worst case instead of O(n)

🎯 What You'll Learn

After reading this article, you will be able to:

  • Explain tree bins: What they are and why they were introduced
  • Understand the thresholds: Why 8, 6, and 64 are the magic numbers
  • Describe HashDoS attacks: How they worked and how Java 8 mitigates them
  • Debug tree bin behavior: Identify when buckets treeify
  • Optimize for tree bins: Use Comparable keys effectively
  • Make informed decisions: HashMap vs TreeMap trade-offs

πŸ”₯ Production Story: The HashDoS Incident

In 2011, a devastating attack vector was publicly disclosed: HashDoS (Hash Denial of Service). By carefully crafting HTTP request parameters with colliding hash codes, attackers could degrade Java web applications from O(1) to O(nΒ²) performance, causing complete service denial.

The attack scenario:
JAVA(11 lines)
Code
Loading syntax highlighter...
Real-world impact:
  • Apache Tomcat, JBoss, Jetty all vulnerable
  • Simple HTTP request could consume 100% CPU for minutes
  • A few malicious requests could take down entire server
How Java 8 fixed it:
JAVA(10 lines)
Code
Loading syntax highlighter...
The numbers:
  • Before Java 8: 10,000 colliding keys β†’ ~50 million comparisons
  • After Java 8: 10,000 colliding keys β†’ ~130,000 comparisons
  • 385x improvement in worst case

This is why understanding tree bins matters - it's not just performance, it's security.


🧠 Mental Model: The Bucket Evolution

Think of each HashMap bucket as a storage system that adapts to load:

TEXT(30 lines)
Code
Loading syntax highlighter...
Key insight: The bucket intelligently adapts - simple list for typical use (fast), tree for pathological cases (safe).

πŸ”¬ Deep Dive

1. The Magic Constants

JAVA(4 lines)
Code
Loading syntax highlighter...
Why 8 for treeify?

Based on Poisson distribution analysis. With a good hash function and load factor 0.75:

TEXT(15 lines)
Code
Loading syntax highlighter...
Why 6 for untreeify (hysteresis)?

If we used 8 for both, a bucket oscillating around 8 entries would constantly convert back and forth. The gap (8 to treeify, 6 to untreeify) prevents this thrashing.

TEXT(3 lines)
Code
Loading syntax highlighter...
Why 64 for min capacity?

With small maps, it's cheaper to resize than to treeify:

JAVA(5 lines)
Code
Loading syntax highlighter...

2. TreeNode Structure

When a bucket treeifies, nodes are converted from Node to TreeNode:
JAVA(18 lines)
Code
Loading syntax highlighter...
Memory overhead:
  • Node: ~32 bytes (header + 4 fields)
  • TreeNode: ~56 bytes (header + 9 fields)
  • ~75% more memory per entry when treeified

This is why treeification is only for pathological cases - the memory cost is significant.

3. The Treeify Process

JAVA(31 lines)
Code
Loading syntax highlighter...

4. Ordering in Trees: Comparable and tieBreakOrder

Red-Black trees need a total ordering. What if keys don't implement Comparable?
JAVA(12 lines)
Code
Loading syntax highlighter...
Performance implication:
JAVA(17 lines)
Code
Loading syntax highlighter...
Best practice: If you expect high collision rates, make your key class implement Comparable.

5. The Untreeify Process

When does a tree convert back to a list?

JAVA(16 lines)
Code
Loading syntax highlighter...
When untreeify happens:
  1. HashMap resizes (doubles capacity)
  2. Entries in treeified bucket get split between two new buckets
  3. If either new bucket has ≀6 entries, it becomes a list again

6. Visualizing Tree Bin Behavior

TEXT(25 lines)
Code
Loading syntax highlighter...

7. Security Context: HashDoS Deep Dive

How the attack worked (pre-Java 8):
JAVA(22 lines)
Code
Loading syntax highlighter...
Why Java 8 tree bins fix it:
JAVA(9 lines)
Code
Loading syntax highlighter...
Additional mitigations in modern Java:
JAVA(7 lines)
Code
Loading syntax highlighter...

⚠️ Common Mistakes

Mistake 1: Ignoring Comparable for High-Collision Keys

❌ Problem:
JAVA(15 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(16 lines)
Code
Loading syntax highlighter...

Mistake 2: Assuming All Operations Benefit from Trees

❌ Problem:
JAVA(11 lines)
Code
Loading syntax highlighter...
βœ… Understanding:
JAVA(9 lines)
Code
Loading syntax highlighter...

Mistake 3: Trying to Force or Prevent Treeification

❌ Problem:
JAVA(5 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(7 lines)
Code
Loading syntax highlighter...

Mistake 4: Not Knowing When Trees Exist

❌ Problem:
JAVA(6 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(38 lines)
Code
Loading syntax highlighter...

Mistake 5: Confusing TreeMap and HashMap Tree Bins

❌ Problem:
JAVA(2 lines)
Code
Loading syntax highlighter...
βœ… Understanding:
AspectHashMap (with tree bins)TreeMap
Primary structureHash tableRed-Black Tree
Tree binsOnly for collision bucketsEntire structure
OrderingNo ordering (hash-based)Sorted by keys
Normal lookupO(1) averageO(log n) always
Collision lookupO(log n) worst caseO(log n) always
NavigableMapNoYes
MemoryLess (mostly Nodes)More (all TreeNodes)
JAVA(5 lines)
Code
Loading syntax highlighter...

πŸ› Debug This: The Unexplained Slowdown

A production service experiences periodic slowdowns. Profiling shows excessive time in HashMap.get() operations. The map contains user sessions:
JAVA(25 lines)
Code
Loading syntax highlighter...
Why is get() slow? What would you fix?

βœ… Solution:
The bug: hashCode() only uses channel, which has only 3 possible values ("web", "mobile", "api"). All sessions go into just 3 buckets!

With thousands of sessions:

  • Each bucket has hundreds or thousands of entries
  • Buckets are treeified (good), but still O(log n) per lookup
  • And since SessionId doesn't implement Comparable, trees use inefficient tieBreakOrder
The fix:
JAVA(24 lines)
Code
Loading syntax highlighter...
After the fix:
  • Sessions distributed across many buckets (good hash distribution)
  • Most buckets have 0-3 entries (no trees needed)
  • If trees form, Comparable ensures efficient ordering

πŸ’» Exercises

Exercise 1: Force Treeification

⭐⭐ Difficulty: Medium | ⏱️ Time: 15 minutes

Task: Create a scenario that forces HashMap to treeify a bucket, then verify it happened.
JAVA(9 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(52 lines)
Code
Loading syntax highlighter...

Exercise 2: HashDoS Simulation (Educational)

⭐⭐⭐ Difficulty: Medium | ⏱️ Time: 20 minutes

Task: Compare HashMap performance with good vs. bad hash distribution to understand the attack vector.
JAVA(8 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(60 lines)
Code
Loading syntax highlighter...

Exercise 3: Analyze HashMap Distribution

⭐⭐⭐ Difficulty: Medium-Hard | ⏱️ Time: 20 minutes

Task: Create a utility to analyze HashMap bucket distribution and detect potential issues.
JAVA(18 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(107 lines)
Code
Loading syntax highlighter...

Exercise 4: Comparable vs Non-Comparable Performance

⭐⭐⭐ Difficulty: Medium | ⏱️ Time: 15 minutes

Task: Benchmark tree bin performance with and without Comparable implementation.
JAVA(6 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(78 lines)
Code
Loading syntax highlighter...

Exercise 5: Custom HashMap with Tree Bin Logging

⭐⭐⭐⭐ Difficulty: Hard | ⏱️ Time: 25 minutes

Task: Create a HashMap wrapper that logs when treeification occurs.
JAVA(8 lines)
Code
Loading syntax highlighter...
βœ… Solution:
JAVA(101 lines)
Code
Loading syntax highlighter...

πŸ“ Summary & Key Takeaways

The Tree Bin Mechanism

TEXT(21 lines)
Code
Loading syntax highlighter...

Key Constants

ConstantValuePurpose
TREEIFY_THRESHOLD8Convert list β†’ tree when bucket has 8+ entries
UNTREEIFY_THRESHOLD6Convert tree β†’ list when bucket has ≀6 entries (after resize)
MIN_TREEIFY_CAPACITY64Don't treeify if map capacity < 64 (resize instead)

Performance Characteristics

OperationNormal CaseWith Tree Bins
get()O(1) averageO(log n) worst case
put()O(1) averageO(log n) worst case
remove()O(1) averageO(log n) worst case
iterationO(n)O(n)

Best Practices

  1. Implement Comparable on key classes that might collide heavily
  2. Use good hashCode() implementations (Objects.hash or similar)
  3. Don't try to prevent trees - they exist for your protection
  4. Consider TreeMap if you need sorted keys (always O(log n))
  5. Monitor in production if you suspect hash collision issues

🎀 Senior-Level Interview Questions

Question #1: When does a HashMap bucket convert to a tree?

What interviewers want to hear: Understanding of the specific conditions.
Strong answer: A bucket converts to a tree (treeifies) when THREE conditions are met:
  1. The bucket contains 8 or more entries (TREEIFY_THRESHOLD = 8)
  2. The total map capacity is at least 64 (MIN_TREEIFY_CAPACITY = 64)
  3. A new entry is being added that would increase the bucket size to 8+

If capacity is less than 64, HashMap resizes instead of treeifying - spreading entries across more buckets is more efficient for small maps.


Question #2: Why is the treeify threshold 8 specifically?

What interviewers want to hear: Understanding of the probabilistic reasoning.
Strong answer: The threshold is based on Poisson distribution analysis. With a good hash function and load factor of 0.75, the probability of 8 or more entries landing in the same bucket is approximately 0.00000006 (six in a hundred million). This is so rare under normal conditions that if it happens, it almost certainly indicates either a poor hash function or a deliberate collision attack. The value 8 provides a good balance - high enough to avoid unnecessary overhead from tree creation in normal use, low enough to protect against attacks before performance degrades significantly.

Question #3: What is HashDoS and how do tree bins mitigate it?

What interviewers want to hear: Understanding of the security implications.
Strong answer: HashDoS (Hash Denial of Service) is an attack where an adversary sends data with keys that all hash to the same bucket. Pre-Java 8, this degraded HashMap operations from O(1) to O(n), making total insertion time O(nΒ²). With 10,000 colliding keys, this meant ~50 million comparisons from a single request - enough to consume 100% CPU for minutes.

Tree bins mitigate this by converting heavily-loaded buckets to Red-Black trees. This changes worst-case lookup from O(n) to O(log n). With 10,000 collisions, that's ~130,000 operations instead of ~50 million - a 385x improvement. The attack becomes impractical because the computational cost to the server is bounded.


Question #4: How does HashMap order keys in tree bins when keys don't implement Comparable?

What interviewers want to hear: Knowledge of the tieBreakOrder mechanism.
Strong answer: When keys don't implement Comparable, HashMap uses a fallback ordering called tieBreakOrder:
  1. First, it compares the fully-qualified class names of the keys
  2. If class names are equal, it uses System.identityHashCode() to establish ordering

This ensures a consistent total ordering for the Red-Black tree, even with non-Comparable keys. However, it's less efficient than using proper Comparable implementations because: (a) string comparison of class names has overhead, and (b) identityHashCode-based ordering doesn't preserve any logical key ordering, potentially making tree operations less cache-friendly.


Question #5: When does a tree bin convert back to a linked list?

What interviewers want to hear: Understanding of the untreeify process.
Strong answer: Tree bins convert back to linked lists (untreeify) during resize operations when the new bucket has 6 or fewer entries (UNTREEIFY_THRESHOLD = 6). This happens because resize doubles the capacity and splits each bucket's entries between two new buckets based on a hash bit. If the split results in small buckets, maintaining tree structure has more overhead than benefit. The gap between 8 (treeify) and 6 (untreeify) prevents thrashing - a bucket oscillating around 7 entries won't constantly convert back and forth.

Question #6: Do tree bins have performance overhead?

What interviewers want to hear: Balanced understanding of trade-offs.
Strong answer: Yes, tree bins have several overheads:
  1. Memory: TreeNode uses ~56 bytes vs Node's ~32 bytes (75% more per entry)
  2. Insertion: Building and maintaining Red-Black tree requires rotations and recoloring
  3. Constant factors: Tree operations have higher constant factors than list traversal for small n

However, these overheads are acceptable because:

  • Trees only form in pathological cases (collision attacks or bad hash functions)
  • The O(log n) guarantee protects against worst-case degradation
  • In normal operation, most buckets have 0-2 entries (no trees)

The design philosophy is: "Pay small overhead in rare edge cases to guarantee bounded worst-case performance."


🏁 Conclusion

Tree bins represent one of the most important defensive programming decisions in the JDK. By automatically converting heavily-loaded buckets to Red-Black trees, HashMap provides:

  • Security: Protection against HashDoS attacks
  • Predictability: Bounded O(log n) worst-case performance
  • Transparency: No code changes required - it "just works"
Key takeaways:
  1. Tree bins activate at 8+ entries per bucket (if capacity β‰₯ 64)
  2. They convert back to lists at ≀6 entries during resize
  3. Implementing Comparable on key classes optimizes tree performance
  4. Good hash distribution prevents trees from forming at all
  5. Don't try to game the system - let HashMap manage itself
Your next step: Continue to Part 14 (TreeMap & NavigableMap) to understand when you should use a tree-based map from the start, or Part 18 (ConcurrentHashMap) to learn about thread-safe maps.

πŸ“… Review Schedule for This Article

DayTaskTime
Day 1Review the thresholds (8, 6, 64) and why5 min
Day 3Redo Exercise 1 (Force Treeification)15 min
Day 7Answer interview questions without looking10 min
Day 14Redo Exercise 2 (HashDoS Simulation)15 min
Day 30Review mental model and security implications5 min

Next in series: [Part 14: TreeMap & NavigableMap - Red-Black Trees & Range Queries]