Performance Pitfalls
π At a Glance
| Aspect | Details |
|---|---|
| Focus | Real-world performance, memory footprint, common production issues |
| Key Insight | Big-O complexity hides constant factors that dominate in practice |
| Profiling Tools | JMH for benchmarks, JFR/JMC for production monitoring |
| Memory Reality | ArrayList of 1M Integers: 4MB data + ~20MB overhead |
| Top Pitfalls | Wrong collection type, forgetting to size, memory leaks |
"One-liner: Understanding collection performance means knowing both complexity AND real-world behavior under production load.
π― What You'll Learn
- Why Big-O complexity doesn't tell the whole story
- Memory footprint of different collection types
- How to benchmark collections properly with JMH
- Production monitoring with JFR and JMC
- Common performance pitfalls and how to avoid them
- Memory leak patterns specific to collections
Prerequisites
Before reading this article, ensure you understand:
- Part 6-7: ArrayList and LinkedList internals
- Part 12-13: HashMap internals and tree bins
- Part 18-19: ConcurrentHashMap
- Part 27: Testing and debugging techniques
Production Story: The N+1 That Wasn't in the Database
The alert came at 2 AM: payment processing latency had spiked from 50ms to 800ms. The on-call engineer checked the usual suspectsβdatabase queries, external APIs, networkβall looked fine.
HashMap.get().JAVA(17 lines)CodeLoading syntax highlighter...
rules.keySet()creates an iterator for 50,000 elementsstartsWith()comparison on each key creates substring checks- For an order with 10 items Γ 5 tags = 50 iterations Γ 50,000 = 2.5 million operations
JAVA(25 lines)CodeLoading syntax highlighter...
Mental Model: The Performance Iceberg
What you see in Big-O: βββββββ βO(1) β β HashMap.get() βββββββ ββββββββββββββββββββββββββββββββββββββ Water Line What actually happens: ββββββββββββββββββββββββββββββββββββββββ β hashCode() computation β β Array access (L1/L2/L3 cache miss?) β β equals() comparison β β Tree traversal (if bin > 8) β β Memory allocation for iterator β β GC pressure from temporary objects β β CPU branch prediction misses β β Memory locality / cache coherence β ββββββββββββββββββββββββββββββββββββββββ
Constant Factors That Matter
HashMap.get() "O(1)": βββ hashCode(): 1-100ns (depends on key type) βββ Array access: 1ns (L1) to 100ns (RAM) βββ equals(): 1ns (int) to 1ΞΌs (deep object) βββ Potential tree walk: +O(log n) ArrayList.get(i) "O(1)": βββ Bounds check: 1ns βββ Array access: 1-100ns (cache dependent) βββ Boxing/unboxing: 20-50ns (if primitives) TreeMap.get() "O(log n)": βββ Each compare: 10-100ns βββ Tree depth: logβ(n) comparisons βββ Better cache locality than HashMap! βββ At n=1000: ~10 comparisons β 1ΞΌs
When O(log n) Beats O(1)
Performance crossover point (typical): Operations β per second β β TreeMap 10M ββββββββββββββββββββββββββββββββ β / HashMap 1M ββββββ/βββββββββββββββββββββββββ β / / 100K ββββ/βββββββββββββββββ/βββββββββ β / ββββββ¬βββββ¬βββββ¬βββββ¬ββββββββ 10 100 1K 10K elements For small collections (< 1000 elements): - TreeMap often faster due to cache locality - HashMap overhead (array, load factor) dominates - Profile don't assume!
Deep Dive: Memory Footprint Reality
Object Headers and Alignment
Every Java object has overhead: βββββββββββββββββββββββββββββββββββββββββββ β 64-bit JVM Object Layout β βββββββββββββββββββββββββββββββββββββββββββ€ β Mark Word: 8 bytes β β Class Pointer: 4 bytes (compressed) β β Array Length: 4 bytes (arrays only) β β Padding: to 8-byte boundary β βββββββββββββββββββββββββββββββββββββββββββ Integer object: 16 bytes for 4 bytes of data! - Mark word: 8 bytes - Class pointer: 4 bytes - int value: 4 bytes - Total: 16 bytes (4x overhead!)
Collection Memory Calculator
JAVA(47 lines)CodeLoading syntax highlighter...
Memory Comparison Table
Storing 1,000,000 integers: Collection Type | Memory Used | Overhead -------------------------|-------------|---------- int[] (primitive) | 4 MB | 0% ArrayList<Integer> | 24 MB | 500% LinkedList<Integer> | 48 MB | 1100% HashSet<Integer> | 56 MB | 1300% TreeSet<Integer> | 64 MB | 1500% Why such high overhead? - Integer boxing: 16 bytes per element (vs 4 bytes) - Node/Entry objects: 32-40 bytes each - Array overhead and slack space
Primitive Collection Libraries
JAVA(17 lines)CodeLoading syntax highlighter...
Deep Dive: JMH Benchmarking
Setting Up JMH
XML(14 lines)CodeLoading syntax highlighter...
Collection Benchmark Example
JAVA(80 lines)CodeLoading syntax highlighter...
Sample Benchmark Results
Benchmark (size) Mode Cnt Score Error Units CollectionBenchmark.arrayListContains 100 avgt 20 48.234 Β± 1.231 ns/op CollectionBenchmark.arrayListContains 10000 avgt 20 4823.456 Β± 45.678 ns/op CollectionBenchmark.arrayListContains 1000000 avgt 20 523456.789 Β± 1234.5 ns/op CollectionBenchmark.hashSetContains 100 avgt 20 8.234 Β± 0.456 ns/op CollectionBenchmark.hashSetContains 10000 avgt 20 9.123 Β± 0.567 ns/op CollectionBenchmark.hashSetContains 1000000 avgt 20 12.456 Β± 0.789 ns/op CollectionBenchmark.treeSetContains 100 avgt 20 15.234 Β± 0.678 ns/op CollectionBenchmark.treeSetContains 10000 avgt 20 45.678 Β± 1.234 ns/op CollectionBenchmark.treeSetContains 1000000 avgt 20 98.765 Β± 2.345 ns/op Key insight: HashSet.contains() scales O(1), ArrayList scales O(n) At 1M elements: HashSet is 42,000x faster for lookups!
Common Benchmarking Mistakes
JAVA(41 lines)CodeLoading syntax highlighter...
Deep Dive: Production Monitoring with JFR
Enabling JFR
BASH(12 lines)CodeLoading syntax highlighter...
Programmatic JFR Events
JAVA(39 lines)CodeLoading syntax highlighter...
JFR Collection Analysis Queries
JAVA(26 lines)CodeLoading syntax highlighter...
Common Performance Pitfalls
Pitfall 1: Not Pre-sizing Collections
JAVA(17 lines)CodeLoading syntax highlighter...
Pitfall 2: Wrong Map for Access Pattern
JAVA(14 lines)CodeLoading syntax highlighter...
Pitfall 3: Contains on Wrong Collection
JAVA(17 lines)CodeLoading syntax highlighter...
Pitfall 4: Streams on Small Collections
JAVA(14 lines)CodeLoading syntax highlighter...
Pitfall 5: Creating Collections in Hot Paths
JAVA(18 lines)CodeLoading syntax highlighter...
Pitfall 6: Boxing in Primitive Operations
JAVA(14 lines)CodeLoading syntax highlighter...
Pitfall 7: ConcurrentHashMap Compound Operations
JAVA(19 lines)CodeLoading syntax highlighter...
Memory Leak Patterns
Pattern 1: Static Collection Growing Forever
JAVA(25 lines)CodeLoading syntax highlighter...
Pattern 2: Listener Collections
JAVA(28 lines)CodeLoading syntax highlighter...
Pattern 3: Thread-Local Collections
JAVA(34 lines)CodeLoading syntax highlighter...
Pattern 4: Key Objects That Change
JAVA(27 lines)CodeLoading syntax highlighter...
Detecting Memory Leaks
JAVA(24 lines)CodeLoading syntax highlighter...
GC Implications
Collection Type and GC Pressure
GC Impact by Collection Type: ArrayList: βββ Single large array (few objects) βββ Resizing creates garbage (old arrays) βββ Good for G1/ZGC (humongous object handling) βββ Pre-size to minimize GC LinkedList: βββ Many small Node objects βββ High allocation rate during adds βββ Scattered in memory (poor locality) βββ Avoid in throughput-critical paths HashMap: βββ Entry nodes per element βββ Resize copies entire table βββ Tree bins at threshold (more objects) βββ Use initial capacity wisely TreeMap: βββ TreeNode per element βββ Rebalancing doesn't create garbage βββ More predictable GC behavior βββ Good for real-time systems
Reducing GC Pressure
JAVA(32 lines)CodeLoading syntax highlighter...
Debug This! Collection Performance Challenges
Challenge 1: The Slow Aggregation
JAVA(15 lines)CodeLoading syntax highlighter...
Solution
containsKey + get).JAVA(16 lines)CodeLoading syntax highlighter...
Challenge 2: Memory Mystery
JAVA(21 lines)CodeLoading syntax highlighter...
Solution
String.split() returns substrings that reference the original large string (in some JVM versions) or still allocate full strings. The bigger issue is that we're keeping references to strings derived from the large file content.JAVA(16 lines)CodeLoading syntax highlighter...
Challenge 3: The Concurrent Bottleneck
JAVA(18 lines)CodeLoading syntax highlighter...
Solution
JAVA(17 lines)CodeLoading syntax highlighter...
- Lock-free increment
- Striped counters reduce contention
- 10-100x better throughput under high contention
π» Exercises
Exercise 1: Benchmark ArrayList vs LinkedList Insertion
Write a JMH benchmark comparing insertion performance:
JAVA(12 lines)CodeLoading syntax highlighter...
Solution
JAVA(65 lines)CodeLoading syntax highlighter...
Exercise 2: Find the Memory Leak
JAVA(34 lines)CodeLoading syntax highlighter...
Solution
JAVA(46 lines)CodeLoading syntax highlighter...
Exercise 3: Optimize This Code
JAVA(43 lines)CodeLoading syntax highlighter...
Solution
JAVA(55 lines)CodeLoading syntax highlighter...
Senior-Level Interview Questions
Question 1: When would you choose TreeMap over HashMap despite O(log n) vs O(1)?
Several scenarios favor TreeMap:
- Need sorted iteration: HashMap iteration order is undefined
- Range queries:
subMap(),headMap(),tailMap()are O(log n) - NavigableMap operations:
floorKey(),ceilingKey(),higherKey() - Small collections: Constant factors make TreeMap competitive under ~1000 elements
- Memory-constrained: TreeMap has more predictable memory usage
- Real-time systems: No sudden O(n) resize operations
- Consistent iteration performance: Better cache locality than HashMap
JAVA(7 lines)CodeLoading syntax highlighter...
Question 2: How would you diagnose a collection-related memory leak in production?
Systematic approach:
- Identify symptoms: OOM errors, growing heap, GC pauses increasing
- Enable monitoring: JFR recording, heap metrics via JMX
- Capture heap dump:
jmap -dump:live,format=b,file=heap.hprof <pid> - Analyze with MAT/VisualVM:
- Dominator tree shows retention paths
- Look for collections with unexpected size
- Check for static fields holding collections
JAVA(12 lines)CodeLoading syntax highlighter...
- Add preventive monitoring:
JAVA(3 lines)CodeLoading syntax highlighter...
Question 3: Explain the performance implications of ConcurrentHashMap's design choices.
ConcurrentHashMap makes several design trade-offs:
Pre-Java 8: Segments (16 default) βββ Each segment is independent lock βββ 16x improvement over synchronized HashMap βββ But still lock contention within segment Java 8+: Per-node synchronization + CAS βββ Lock only the bin being modified βββ CAS for updates when possible βββ Scales much better with core count βββ Size tracked with LongAdder (striped counters)
get(): Lock-free with volatile readsput(): CAS for empty bin, synchronized for collisionsize(): Approximate (sum of counters) unless forcedcomputeIfAbsent(): Atomic but may block bin
JAVA(12 lines)CodeLoading syntax highlighter...
Question 4: How do you benchmark collection performance properly?
Proper benchmarking requires understanding JVM behavior:
- Use JMH - handles warmup, JIT compilation, dead code elimination
- Avoid common mistakes:
JAVA(11 lines)CodeLoading syntax highlighter...
- Control state properly:
JAVA(6 lines)CodeLoading syntax highlighter...
- Measure what matters:
JAVA(3 lines)CodeLoading syntax highlighter...
- Account for GC:
JAVACodeLoading syntax highlighter...
- Verify with
-prof gcto see allocation rates
Question 5: What's the N+1 problem with collections and how do you avoid it?
N+1 with collections occurs when:
JAVA(9 lines)CodeLoading syntax highlighter...
- Batch loading:
JAVA(4 lines)CodeLoading syntax highlighter...
- Pre-indexed structures:
JAVA(3 lines)CodeLoading syntax highlighter...
- JOIN FETCH in JPA:
JAVA(2 lines)CodeLoading syntax highlighter...
- Stream with proper batching:
JAVA(3 lines)CodeLoading syntax highlighter...
Question 6: How do you choose initial capacity for HashMap?
HashMap capacity planning:
JAVA(10 lines)CodeLoading syntax highlighter...
Without pre-sizing (adding 1M elements): βββ Start: capacity 16 βββ Resize at 12: copy 12 elements, new capacity 32 βββ Resize at 24: copy 24 elements, new capacity 64 βββ ... 17 more resizes ... βββ Each resize: O(n) copy + O(n) rehash βββ Total extra work: ~2M operations, ~2M garbage objects With pre-sizing: βββ Start: capacity 1,398,102 (next power of 2) βββ No resizes needed βββ Total extra work: 0
- Pre-size when you know approximate size
- For streams: use
Collectors.toMap()(no control) or collect to sized map - Consider load factor trade-off: lower = more memory, fewer collisions
Summary & Key Takeaways
Performance Reality Checklist
β Big-O is guidance, not gospel - Constant factors matter at small N - Cache locality can dominate - Profile, don't assume β Memory awareness - Object headers: 12-16 bytes each - Boxing: 16 bytes per Integer vs 4 bytes per int - Collection overhead: 32-64 bytes per entry β Benchmarking discipline - Use JMH for accurate results - Watch for dead code elimination - Include GC in measurements β Production monitoring - JFR for non-invasive profiling - Metrics for collection sizes - Alerts for unbounded growth β Common pitfalls avoided - Pre-size when possible - Right collection for access pattern - Clean up listeners and caches - Immutable keys for maps
Performance Decision Quick Reference
For lookup-heavy code: βββ HashSet/HashMap: O(1) average βββ Pre-size to avoid resizing βββ Good hashCode() is critical For iteration-heavy code: βββ ArrayList: best cache locality βββ LinkedHashMap: predictable order βββ Avoid LinkedList For concurrent access: βββ ConcurrentHashMap: fine-grained locking βββ CopyOnWriteArrayList: read-heavy, rare writes βββ Avoid Collections.synchronized* For memory efficiency: βββ Primitive collections (Eclipse, Trove, FastUtil) βββ EnumSet/EnumMap for enum keys βββ ArrayList over LinkedList
Conclusion
Performance optimization is about understanding trade-offs, not memorizing complexity tables. The N+1 problem in our production story wasn't a database issueβit was a data structure issue. By restructuring our pricing rules into a prefix-indexed map, we achieved a 67x performance improvement.
Key principles to remember:
- Profile first: Don't optimize based on assumptions
- Constant factors matter: O(1) can be slower than O(log n) in practice
- Memory matters: GC pressure from collection overhead is real
- Right tool for the job: HashMap isn't always the answer
- Monitor in production: Unbounded collections become memory leaks
In Part 29, we'll bring everything together with a comprehensive decision guide and quick reference cheatsheet for choosing the right collection.
π Review Schedule
- Day 1: Review performance comparison table and memory footprint calculations
- Day 3: Complete one JMH benchmark exercise
- Day 7: Practice memory leak detection patterns
- Day 14: Review interview questions and explain concepts aloud
- Day 30: Revisit production story and implement monitoring in your codebase
Series Navigation
- Previous: Part 27: Testing & Debugging Collections
- Next: Part 29: Decision Guide & Quick Reference
- Series Start: Part 0: How to Use This Series