HashMap: Tree Bins and Performance
Java 8 fundamentally changed how HashMap handles collisions. Before Java 8, a bucket with many collisions degraded to O(n) lookup time. After Java 8, heavily collided buckets automatically convert to balanced trees, maintaining O(log n) performance. This change wasn't just an optimization - it was a critical security fix against HashDoS attacks.
In this article, we'll explore how tree bins work, when they activate, and why this matters for your production systems.
π At a Glance
| Aspect | Details |
|---|---|
| Feature | Tree bins (Red-Black Trees) in HashMap buckets |
| Introduced | Java 8 |
| Treeify Threshold | 8 nodes in bucket |
| Untreeify Threshold | 6 nodes (after resize) |
| Min Capacity for Tree | 64 buckets |
| Key Benefit | O(log n) worst case instead of O(n) |
π― What You'll Learn
After reading this article, you will be able to:
- Explain tree bins: What they are and why they were introduced
- Understand the thresholds: Why 8, 6, and 64 are the magic numbers
- Describe HashDoS attacks: How they worked and how Java 8 mitigates them
- Debug tree bin behavior: Identify when buckets treeify
- Optimize for tree bins: Use Comparable keys effectively
- Make informed decisions: HashMap vs TreeMap trade-offs
π₯ Production Story: The HashDoS Incident
In 2011, a devastating attack vector was publicly disclosed: HashDoS (Hash Denial of Service). By carefully crafting HTTP request parameters with colliding hash codes, attackers could degrade Java web applications from O(1) to O(nΒ²) performance, causing complete service denial.
JAVA(11 lines)CodeLoading syntax highlighter...
- Apache Tomcat, JBoss, Jetty all vulnerable
- Simple HTTP request could consume 100% CPU for minutes
- A few malicious requests could take down entire server
JAVA(10 lines)CodeLoading syntax highlighter...
- Before Java 8: 10,000 colliding keys β ~50 million comparisons
- After Java 8: 10,000 colliding keys β ~130,000 comparisons
- 385x improvement in worst case
This is why understanding tree bins matters - it's not just performance, it's security.
π§ Mental Model: The Bucket Evolution
Think of each HashMap bucket as a storage system that adapts to load:
TEXT(30 lines)CodeLoading syntax highlighter...
π¬ Deep Dive
1. The Magic Constants
JAVA(4 lines)CodeLoading syntax highlighter...
Based on Poisson distribution analysis. With a good hash function and load factor 0.75:
TEXT(15 lines)CodeLoading syntax highlighter...
If we used 8 for both, a bucket oscillating around 8 entries would constantly convert back and forth. The gap (8 to treeify, 6 to untreeify) prevents this thrashing.
TEXT(3 lines)CodeLoading syntax highlighter...
With small maps, it's cheaper to resize than to treeify:
JAVA(5 lines)CodeLoading syntax highlighter...
2. TreeNode Structure
Node to TreeNode:JAVA(18 lines)CodeLoading syntax highlighter...
- Node: ~32 bytes (header + 4 fields)
- TreeNode: ~56 bytes (header + 9 fields)
- ~75% more memory per entry when treeified
This is why treeification is only for pathological cases - the memory cost is significant.
3. The Treeify Process
JAVA(31 lines)CodeLoading syntax highlighter...
4. Ordering in Trees: Comparable and tieBreakOrder
Comparable?JAVA(12 lines)CodeLoading syntax highlighter...
JAVA(17 lines)CodeLoading syntax highlighter...
Comparable.5. The Untreeify Process
When does a tree convert back to a list?
JAVA(16 lines)CodeLoading syntax highlighter...
- HashMap resizes (doubles capacity)
- Entries in treeified bucket get split between two new buckets
- If either new bucket has β€6 entries, it becomes a list again
6. Visualizing Tree Bin Behavior
TEXT(25 lines)CodeLoading syntax highlighter...
7. Security Context: HashDoS Deep Dive
JAVA(22 lines)CodeLoading syntax highlighter...
JAVA(9 lines)CodeLoading syntax highlighter...
JAVA(7 lines)CodeLoading syntax highlighter...
β οΈ Common Mistakes
Mistake 1: Ignoring Comparable for High-Collision Keys
JAVA(15 lines)CodeLoading syntax highlighter...
JAVA(16 lines)CodeLoading syntax highlighter...
Mistake 2: Assuming All Operations Benefit from Trees
JAVA(11 lines)CodeLoading syntax highlighter...
JAVA(9 lines)CodeLoading syntax highlighter...
Mistake 3: Trying to Force or Prevent Treeification
JAVA(5 lines)CodeLoading syntax highlighter...
JAVA(7 lines)CodeLoading syntax highlighter...
Mistake 4: Not Knowing When Trees Exist
JAVA(6 lines)CodeLoading syntax highlighter...
JAVA(38 lines)CodeLoading syntax highlighter...
Mistake 5: Confusing TreeMap and HashMap Tree Bins
JAVA(2 lines)CodeLoading syntax highlighter...
| Aspect | HashMap (with tree bins) | TreeMap |
|---|---|---|
| Primary structure | Hash table | Red-Black Tree |
| Tree bins | Only for collision buckets | Entire structure |
| Ordering | No ordering (hash-based) | Sorted by keys |
| Normal lookup | O(1) average | O(log n) always |
| Collision lookup | O(log n) worst case | O(log n) always |
| NavigableMap | No | Yes |
| Memory | Less (mostly Nodes) | More (all TreeNodes) |
JAVA(5 lines)CodeLoading syntax highlighter...
π Debug This: The Unexplained Slowdown
HashMap.get() operations. The map contains user sessions:JAVA(25 lines)CodeLoading syntax highlighter...
get() slow? What would you fix?hashCode() only uses channel, which has only 3 possible values ("web", "mobile", "api"). All sessions go into just 3 buckets!With thousands of sessions:
- Each bucket has hundreds or thousands of entries
- Buckets are treeified (good), but still O(log n) per lookup
- And since
SessionIddoesn't implementComparable, trees use inefficienttieBreakOrder
JAVA(24 lines)CodeLoading syntax highlighter...
- Sessions distributed across many buckets (good hash distribution)
- Most buckets have 0-3 entries (no trees needed)
- If trees form,
Comparableensures efficient ordering
π» Exercises
Exercise 1: Force Treeification
ββ Difficulty: Medium | β±οΈ Time: 15 minutes
JAVA(9 lines)CodeLoading syntax highlighter...
JAVA(52 lines)CodeLoading syntax highlighter...
Exercise 2: HashDoS Simulation (Educational)
βββ Difficulty: Medium | β±οΈ Time: 20 minutes
JAVA(8 lines)CodeLoading syntax highlighter...
JAVA(60 lines)CodeLoading syntax highlighter...
Exercise 3: Analyze HashMap Distribution
βββ Difficulty: Medium-Hard | β±οΈ Time: 20 minutes
JAVA(18 lines)CodeLoading syntax highlighter...
JAVA(107 lines)CodeLoading syntax highlighter...
Exercise 4: Comparable vs Non-Comparable Performance
βββ Difficulty: Medium | β±οΈ Time: 15 minutes
Comparable implementation.JAVA(6 lines)CodeLoading syntax highlighter...
JAVA(78 lines)CodeLoading syntax highlighter...
Exercise 5: Custom HashMap with Tree Bin Logging
ββββ Difficulty: Hard | β±οΈ Time: 25 minutes
JAVA(8 lines)CodeLoading syntax highlighter...
JAVA(101 lines)CodeLoading syntax highlighter...
π Summary & Key Takeaways
The Tree Bin Mechanism
TEXT(21 lines)CodeLoading syntax highlighter...
Key Constants
| Constant | Value | Purpose |
|---|---|---|
| TREEIFY_THRESHOLD | 8 | Convert list β tree when bucket has 8+ entries |
| UNTREEIFY_THRESHOLD | 6 | Convert tree β list when bucket has β€6 entries (after resize) |
| MIN_TREEIFY_CAPACITY | 64 | Don't treeify if map capacity < 64 (resize instead) |
Performance Characteristics
| Operation | Normal Case | With Tree Bins |
|---|---|---|
| get() | O(1) average | O(log n) worst case |
| put() | O(1) average | O(log n) worst case |
| remove() | O(1) average | O(log n) worst case |
| iteration | O(n) | O(n) |
Best Practices
- Implement Comparable on key classes that might collide heavily
- Use good hashCode() implementations (Objects.hash or similar)
- Don't try to prevent trees - they exist for your protection
- Consider TreeMap if you need sorted keys (always O(log n))
- Monitor in production if you suspect hash collision issues
π€ Senior-Level Interview Questions
Question #1: When does a HashMap bucket convert to a tree?
- The bucket contains 8 or more entries (TREEIFY_THRESHOLD = 8)
- The total map capacity is at least 64 (MIN_TREEIFY_CAPACITY = 64)
- A new entry is being added that would increase the bucket size to 8+
If capacity is less than 64, HashMap resizes instead of treeifying - spreading entries across more buckets is more efficient for small maps.
Question #2: Why is the treeify threshold 8 specifically?
Question #3: What is HashDoS and how do tree bins mitigate it?
Tree bins mitigate this by converting heavily-loaded buckets to Red-Black trees. This changes worst-case lookup from O(n) to O(log n). With 10,000 collisions, that's ~130,000 operations instead of ~50 million - a 385x improvement. The attack becomes impractical because the computational cost to the server is bounded.
Question #4: How does HashMap order keys in tree bins when keys don't implement Comparable?
- First, it compares the fully-qualified class names of the keys
- If class names are equal, it uses
System.identityHashCode()to establish ordering
This ensures a consistent total ordering for the Red-Black tree, even with non-Comparable keys. However, it's less efficient than using proper Comparable implementations because: (a) string comparison of class names has overhead, and (b) identityHashCode-based ordering doesn't preserve any logical key ordering, potentially making tree operations less cache-friendly.
Question #5: When does a tree bin convert back to a linked list?
Question #6: Do tree bins have performance overhead?
- Memory: TreeNode uses ~56 bytes vs Node's ~32 bytes (75% more per entry)
- Insertion: Building and maintaining Red-Black tree requires rotations and recoloring
- Constant factors: Tree operations have higher constant factors than list traversal for small n
However, these overheads are acceptable because:
- Trees only form in pathological cases (collision attacks or bad hash functions)
- The O(log n) guarantee protects against worst-case degradation
- In normal operation, most buckets have 0-2 entries (no trees)
The design philosophy is: "Pay small overhead in rare edge cases to guarantee bounded worst-case performance."
π Conclusion
Tree bins represent one of the most important defensive programming decisions in the JDK. By automatically converting heavily-loaded buckets to Red-Black trees, HashMap provides:
- Security: Protection against HashDoS attacks
- Predictability: Bounded O(log n) worst-case performance
- Transparency: No code changes required - it "just works"
- Tree bins activate at 8+ entries per bucket (if capacity β₯ 64)
- They convert back to lists at β€6 entries during resize
- Implementing
Comparableon key classes optimizes tree performance - Good hash distribution prevents trees from forming at all
- Don't try to game the system - let HashMap manage itself
π Review Schedule for This Article
| Day | Task | Time |
|---|---|---|
| Day 1 | Review the thresholds (8, 6, 64) and why | 5 min |
| Day 3 | Redo Exercise 1 (Force Treeification) | 15 min |
| Day 7 | Answer interview questions without looking | 10 min |
| Day 14 | Redo Exercise 2 (HashDoS Simulation) | 15 min |
| Day 30 | Review mental model and security implications | 5 min |