Java

Performance Pitfalls

πŸ“‹ At a Glance

AspectDetails
FocusReal-world performance, memory footprint, common production issues
Key InsightBig-O complexity hides constant factors that dominate in practice
Profiling ToolsJMH for benchmarks, JFR/JMC for production monitoring
Memory RealityArrayList of 1M Integers: 4MB data + ~20MB overhead
Top PitfallsWrong collection type, forgetting to size, memory leaks
"
One-liner: Understanding collection performance means knowing both complexity AND real-world behavior under production load.

🎯 What You'll Learn

  1. Why Big-O complexity doesn't tell the whole story
  2. Memory footprint of different collection types
  3. How to benchmark collections properly with JMH
  4. Production monitoring with JFR and JMC
  5. Common performance pitfalls and how to avoid them
  6. Memory leak patterns specific to collections

Prerequisites

Before reading this article, ensure you understand:

  • Part 6-7: ArrayList and LinkedList internals
  • Part 12-13: HashMap internals and tree bins
  • Part 18-19: ConcurrentHashMap
  • Part 27: Testing and debugging techniques

Production Story: The N+1 That Wasn't in the Database

The alert came at 2 AM: payment processing latency had spiked from 50ms to 800ms. The on-call engineer checked the usual suspectsβ€”database queries, external APIs, networkβ€”all looked fine.

After enabling JFR in production, the flame graph revealed something unexpected. Most time was spent not in I/O, but in... HashMap.get().
JAVA(17 lines)
Code
Loading syntax highlighter...
The problem? O(nΒ²) hidden in collection iteration, not a database N+1.
Root Cause Analysis:
  1. rules.keySet() creates an iterator for 50,000 elements
  2. startsWith() comparison on each key creates substring checks
  3. For an order with 10 items Γ— 5 tags = 50 iterations Γ— 50,000 = 2.5 million operations
The Fix:
JAVA(25 lines)
Code
Loading syntax highlighter...
Result: Latency dropped from 800ms to 12msβ€”a 67x improvement by choosing the right data structure.

Mental Model: The Performance Iceberg

        What you see in Big-O:
              β”Œβ”€β”€β”€β”€β”€β”
              β”‚O(1) β”‚  ← HashMap.get()
              β””β”€β”€β”€β”€β”€β”˜
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Water Line

        What actually happens:
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  hashCode() computation              β”‚
    β”‚  Array access (L1/L2/L3 cache miss?) β”‚
    β”‚  equals() comparison                 β”‚
    β”‚  Tree traversal (if bin > 8)         β”‚
    β”‚  Memory allocation for iterator      β”‚
    β”‚  GC pressure from temporary objects  β”‚
    β”‚  CPU branch prediction misses        β”‚
    β”‚  Memory locality / cache coherence   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Constant Factors That Matter

HashMap.get() "O(1)":
β”œβ”€β”€ hashCode(): 1-100ns (depends on key type)
β”œβ”€β”€ Array access: 1ns (L1) to 100ns (RAM)
β”œβ”€β”€ equals(): 1ns (int) to 1ΞΌs (deep object)
└── Potential tree walk: +O(log n)

ArrayList.get(i) "O(1)":
β”œβ”€β”€ Bounds check: 1ns
β”œβ”€β”€ Array access: 1-100ns (cache dependent)
└── Boxing/unboxing: 20-50ns (if primitives)

TreeMap.get() "O(log n)":
β”œβ”€β”€ Each compare: 10-100ns
β”œβ”€β”€ Tree depth: logβ‚‚(n) comparisons
β”œβ”€β”€ Better cache locality than HashMap!
└── At n=1000: ~10 comparisons β‰ˆ 1ΞΌs

When O(log n) Beats O(1)

Performance crossover point (typical):

Operations β”‚
per second β”‚
           β”‚  TreeMap
    10M ───│────●───────────────────────
           β”‚   /                   HashMap
     1M ───│──/─────────────────●───────
           β”‚ /                 /
   100K ───│/─────────────────/─────────
           β”‚                 /
           └────┬────┬────┬────┬────────
                10  100  1K  10K  elements

For small collections (< 1000 elements):
- TreeMap often faster due to cache locality
- HashMap overhead (array, load factor) dominates
- Profile don't assume!

Deep Dive: Memory Footprint Reality

Object Headers and Alignment

Every Java object has overhead:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         64-bit JVM Object Layout        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Mark Word:        8 bytes               β”‚
β”‚ Class Pointer:    4 bytes (compressed)  β”‚
β”‚ Array Length:     4 bytes (arrays only) β”‚
β”‚ Padding:          to 8-byte boundary    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Integer object: 16 bytes for 4 bytes of data!
- Mark word:      8 bytes
- Class pointer:  4 bytes
- int value:      4 bytes
- Total:         16 bytes (4x overhead!)

Collection Memory Calculator

JAVA(47 lines)
Code
Loading syntax highlighter...

Memory Comparison Table

Storing 1,000,000 integers:

Collection Type          | Memory Used | Overhead
-------------------------|-------------|----------
int[] (primitive)        |    4 MB     |   0%
ArrayList<Integer>       |   24 MB     | 500%
LinkedList<Integer>      |   48 MB     | 1100%
HashSet<Integer>         |   56 MB     | 1300%
TreeSet<Integer>         |   64 MB     | 1500%

Why such high overhead?
- Integer boxing: 16 bytes per element (vs 4 bytes)
- Node/Entry objects: 32-40 bytes each
- Array overhead and slack space

Primitive Collection Libraries

JAVA(17 lines)
Code
Loading syntax highlighter...

Deep Dive: JMH Benchmarking

Setting Up JMH

XML(14 lines)
Code
Loading syntax highlighter...

Collection Benchmark Example

JAVA(80 lines)
Code
Loading syntax highlighter...

Sample Benchmark Results

Benchmark                          (size)  Mode  Cnt       Score    Error  Units
CollectionBenchmark.arrayListContains   100  avgt   20      48.234 Β±  1.231  ns/op
CollectionBenchmark.arrayListContains 10000  avgt   20    4823.456 Β± 45.678  ns/op
CollectionBenchmark.arrayListContains 1000000 avgt  20  523456.789 Β± 1234.5  ns/op

CollectionBenchmark.hashSetContains     100  avgt   20       8.234 Β±  0.456  ns/op
CollectionBenchmark.hashSetContains   10000  avgt   20       9.123 Β±  0.567  ns/op
CollectionBenchmark.hashSetContains 1000000  avgt   20      12.456 Β±  0.789  ns/op

CollectionBenchmark.treeSetContains     100  avgt   20      15.234 Β±  0.678  ns/op
CollectionBenchmark.treeSetContains   10000  avgt   20      45.678 Β±  1.234  ns/op
CollectionBenchmark.treeSetContains 1000000  avgt   20      98.765 Β±  2.345  ns/op

Key insight: HashSet.contains() scales O(1), ArrayList scales O(n)
At 1M elements: HashSet is 42,000x faster for lookups!

Common Benchmarking Mistakes

JAVA(41 lines)
Code
Loading syntax highlighter...

Deep Dive: Production Monitoring with JFR

Enabling JFR

BASH(12 lines)
Code
Loading syntax highlighter...

Programmatic JFR Events

JAVA(39 lines)
Code
Loading syntax highlighter...

JFR Collection Analysis Queries

JAVA(26 lines)
Code
Loading syntax highlighter...

Common Performance Pitfalls

Pitfall 1: Not Pre-sizing Collections

JAVA(17 lines)
Code
Loading syntax highlighter...

Pitfall 2: Wrong Map for Access Pattern

JAVA(14 lines)
Code
Loading syntax highlighter...

Pitfall 3: Contains on Wrong Collection

JAVA(17 lines)
Code
Loading syntax highlighter...

Pitfall 4: Streams on Small Collections

JAVA(14 lines)
Code
Loading syntax highlighter...

Pitfall 5: Creating Collections in Hot Paths

JAVA(18 lines)
Code
Loading syntax highlighter...

Pitfall 6: Boxing in Primitive Operations

JAVA(14 lines)
Code
Loading syntax highlighter...

Pitfall 7: ConcurrentHashMap Compound Operations

JAVA(19 lines)
Code
Loading syntax highlighter...

Memory Leak Patterns

Pattern 1: Static Collection Growing Forever

JAVA(25 lines)
Code
Loading syntax highlighter...

Pattern 2: Listener Collections

JAVA(28 lines)
Code
Loading syntax highlighter...

Pattern 3: Thread-Local Collections

JAVA(34 lines)
Code
Loading syntax highlighter...

Pattern 4: Key Objects That Change

JAVA(27 lines)
Code
Loading syntax highlighter...

Detecting Memory Leaks

JAVA(24 lines)
Code
Loading syntax highlighter...

GC Implications

Collection Type and GC Pressure

GC Impact by Collection Type:

ArrayList:
β”œβ”€β”€ Single large array (few objects)
β”œβ”€β”€ Resizing creates garbage (old arrays)
β”œβ”€β”€ Good for G1/ZGC (humongous object handling)
└── Pre-size to minimize GC

LinkedList:
β”œβ”€β”€ Many small Node objects
β”œβ”€β”€ High allocation rate during adds
β”œβ”€β”€ Scattered in memory (poor locality)
└── Avoid in throughput-critical paths

HashMap:
β”œβ”€β”€ Entry nodes per element
β”œβ”€β”€ Resize copies entire table
β”œβ”€β”€ Tree bins at threshold (more objects)
└── Use initial capacity wisely

TreeMap:
β”œβ”€β”€ TreeNode per element
β”œβ”€β”€ Rebalancing doesn't create garbage
β”œβ”€β”€ More predictable GC behavior
└── Good for real-time systems

Reducing GC Pressure

JAVA(32 lines)
Code
Loading syntax highlighter...

Debug This! Collection Performance Challenges

Challenge 1: The Slow Aggregation

JAVA(15 lines)
Code
Loading syntax highlighter...
Solution
Problem: Two hash lookups per transaction (containsKey + get).
JAVA(16 lines)
Code
Loading syntax highlighter...
Performance improvement: 30-50% faster for large datasets.

Challenge 2: Memory Mystery

JAVA(21 lines)
Code
Loading syntax highlighter...
Solution
Problem: String.split() returns substrings that reference the original large string (in some JVM versions) or still allocate full strings. The bigger issue is that we're keeping references to strings derived from the large file content.
JAVA(16 lines)
Code
Loading syntax highlighter...

Challenge 3: The Concurrent Bottleneck

JAVA(18 lines)
Code
Loading syntax highlighter...
Solution
Problem: Global lock creates contention; all threads wait on single lock.
JAVA(17 lines)
Code
Loading syntax highlighter...
Why LongAdder?
  • Lock-free increment
  • Striped counters reduce contention
  • 10-100x better throughput under high contention

πŸ’» Exercises

Exercise 1: Benchmark ArrayList vs LinkedList Insertion

Write a JMH benchmark comparing insertion performance:

JAVA(12 lines)
Code
Loading syntax highlighter...
Solution
JAVA(65 lines)
Code
Loading syntax highlighter...

Exercise 2: Find the Memory Leak

JAVA(34 lines)
Code
Loading syntax highlighter...
Solution
JAVA(46 lines)
Code
Loading syntax highlighter...

Exercise 3: Optimize This Code

JAVA(43 lines)
Code
Loading syntax highlighter...
Solution
JAVA(55 lines)
Code
Loading syntax highlighter...

Senior-Level Interview Questions

Question 1: When would you choose TreeMap over HashMap despite O(log n) vs O(1)?

Strong Answer:

Several scenarios favor TreeMap:

  1. Need sorted iteration: HashMap iteration order is undefined
  2. Range queries: subMap(), headMap(), tailMap() are O(log n)
  3. NavigableMap operations: floorKey(), ceilingKey(), higherKey()
  4. Small collections: Constant factors make TreeMap competitive under ~1000 elements
  5. Memory-constrained: TreeMap has more predictable memory usage
  6. Real-time systems: No sudden O(n) resize operations
  7. Consistent iteration performance: Better cache locality than HashMap
JAVA(7 lines)
Code
Loading syntax highlighter...

Strong Answer:

Systematic approach:

  1. Identify symptoms: OOM errors, growing heap, GC pauses increasing
  2. Enable monitoring: JFR recording, heap metrics via JMX
  3. Capture heap dump: jmap -dump:live,format=b,file=heap.hprof <pid>
  4. Analyze with MAT/VisualVM:
    • Dominator tree shows retention paths
    • Look for collections with unexpected size
    • Check for static fields holding collections
JAVA(12 lines)
Code
Loading syntax highlighter...
  1. Add preventive monitoring:
JAVA(3 lines)
Code
Loading syntax highlighter...

Question 3: Explain the performance implications of ConcurrentHashMap's design choices.

Strong Answer:

ConcurrentHashMap makes several design trade-offs:

Segmented locking (pre-Java 8) β†’ Node-level CAS (Java 8+)
Pre-Java 8: Segments (16 default)
β”œβ”€β”€ Each segment is independent lock
β”œβ”€β”€ 16x improvement over synchronized HashMap
└── But still lock contention within segment

Java 8+: Per-node synchronization + CAS
β”œβ”€β”€ Lock only the bin being modified
β”œβ”€β”€ CAS for updates when possible
β”œβ”€β”€ Scales much better with core count
└── Size tracked with LongAdder (striped counters)
Key operations:
  • get(): Lock-free with volatile reads
  • put(): CAS for empty bin, synchronized for collision
  • size(): Approximate (sum of counters) unless forced
  • computeIfAbsent(): Atomic but may block bin
Performance implications:
JAVA(12 lines)
Code
Loading syntax highlighter...

Question 4: How do you benchmark collection performance properly?

Strong Answer:

Proper benchmarking requires understanding JVM behavior:

  1. Use JMH - handles warmup, JIT compilation, dead code elimination
  2. Avoid common mistakes:
JAVA(11 lines)
Code
Loading syntax highlighter...
  1. Control state properly:
JAVA(6 lines)
Code
Loading syntax highlighter...
  1. Measure what matters:
JAVA(3 lines)
Code
Loading syntax highlighter...
  1. Account for GC:
JAVA
Code
Loading syntax highlighter...
  1. Verify with -prof gc to see allocation rates

Question 5: What's the N+1 problem with collections and how do you avoid it?

Strong Answer:

N+1 with collections occurs when:

JAVA(9 lines)
Code
Loading syntax highlighter...
Solutions:
  1. Batch loading:
JAVA(4 lines)
Code
Loading syntax highlighter...
  1. Pre-indexed structures:
JAVA(3 lines)
Code
Loading syntax highlighter...
  1. JOIN FETCH in JPA:
JAVA(2 lines)
Code
Loading syntax highlighter...
  1. Stream with proper batching:
JAVA(3 lines)
Code
Loading syntax highlighter...

Question 6: How do you choose initial capacity for HashMap?

Strong Answer:

HashMap capacity planning:

JAVA(10 lines)
Code
Loading syntax highlighter...
Why it matters:
Without pre-sizing (adding 1M elements):
β”œβ”€β”€ Start: capacity 16
β”œβ”€β”€ Resize at 12: copy 12 elements, new capacity 32
β”œβ”€β”€ Resize at 24: copy 24 elements, new capacity 64
β”œβ”€β”€ ... 17 more resizes ...
β”œβ”€β”€ Each resize: O(n) copy + O(n) rehash
└── Total extra work: ~2M operations, ~2M garbage objects

With pre-sizing:
β”œβ”€β”€ Start: capacity 1,398,102 (next power of 2)
β”œβ”€β”€ No resizes needed
└── Total extra work: 0
Best practices:
  • Pre-size when you know approximate size
  • For streams: use Collectors.toMap() (no control) or collect to sized map
  • Consider load factor trade-off: lower = more memory, fewer collisions

Summary & Key Takeaways

Performance Reality Checklist

βœ“ Big-O is guidance, not gospel
  - Constant factors matter at small N
  - Cache locality can dominate
  - Profile, don't assume

βœ“ Memory awareness
  - Object headers: 12-16 bytes each
  - Boxing: 16 bytes per Integer vs 4 bytes per int
  - Collection overhead: 32-64 bytes per entry

βœ“ Benchmarking discipline
  - Use JMH for accurate results
  - Watch for dead code elimination
  - Include GC in measurements

βœ“ Production monitoring
  - JFR for non-invasive profiling
  - Metrics for collection sizes
  - Alerts for unbounded growth

βœ“ Common pitfalls avoided
  - Pre-size when possible
  - Right collection for access pattern
  - Clean up listeners and caches
  - Immutable keys for maps

Performance Decision Quick Reference

For lookup-heavy code:
β”œβ”€β”€ HashSet/HashMap: O(1) average
β”œβ”€β”€ Pre-size to avoid resizing
└── Good hashCode() is critical

For iteration-heavy code:
β”œβ”€β”€ ArrayList: best cache locality
β”œβ”€β”€ LinkedHashMap: predictable order
└── Avoid LinkedList

For concurrent access:
β”œβ”€β”€ ConcurrentHashMap: fine-grained locking
β”œβ”€β”€ CopyOnWriteArrayList: read-heavy, rare writes
└── Avoid Collections.synchronized*

For memory efficiency:
β”œβ”€β”€ Primitive collections (Eclipse, Trove, FastUtil)
β”œβ”€β”€ EnumSet/EnumMap for enum keys
└── ArrayList over LinkedList

Conclusion

Performance optimization is about understanding trade-offs, not memorizing complexity tables. The N+1 problem in our production story wasn't a database issueβ€”it was a data structure issue. By restructuring our pricing rules into a prefix-indexed map, we achieved a 67x performance improvement.

Key principles to remember:

  1. Profile first: Don't optimize based on assumptions
  2. Constant factors matter: O(1) can be slower than O(log n) in practice
  3. Memory matters: GC pressure from collection overhead is real
  4. Right tool for the job: HashMap isn't always the answer
  5. Monitor in production: Unbounded collections become memory leaks

In Part 29, we'll bring everything together with a comprehensive decision guide and quick reference cheatsheet for choosing the right collection.


πŸ“… Review Schedule

  • Day 1: Review performance comparison table and memory footprint calculations
  • Day 3: Complete one JMH benchmark exercise
  • Day 7: Practice memory leak detection patterns
  • Day 14: Review interview questions and explain concepts aloud
  • Day 30: Revisit production story and implement monitoring in your codebase

Series Navigation