Performance Pitfalls

📋 At a Glance

Aspect	Details
Focus	Real-world performance, memory footprint, common production issues
Key Insight	Big-O complexity hides constant factors that dominate in practice
Profiling Tools	JMH for benchmarks, JFR/JMC for production monitoring
Memory Reality	ArrayList of 1M Integers: ~20MB (16MB of Integer objects + 4MB of references) vs 4MB as int[]
Top Pitfalls	Wrong collection type, forgetting to size, memory leaks

One-liner: Understanding collection performance means knowing both complexity AND real-world behavior under production load.

🎯 What You'll Learn

Why Big-O complexity doesn't tell the whole story
Memory footprint of different collection types
How to benchmark collections properly with JMH
Production monitoring with JFR and JMC
Common performance pitfalls and how to avoid them
Memory leak patterns specific to collections

Prerequisites

Before reading this article, ensure you understand:

Part 6-7: ArrayList and LinkedList internals
Part 12-13: HashMap internals and tree bins
Part 18-19: ConcurrentHashMap
Part 27: Testing and debugging techniques

Production Story: The N+1 That Wasn't in the Database

The alert came at 2 AM: payment processing latency had spiked from 50ms to 800ms. The on-call engineer checked the usual suspects—database queries, external APIs, network—all looked fine.

After enabling JFR in production, the flame graph revealed something unexpected. Most time was spent not in I/O, but in... HashMap.get().


The problem? A nested scan (every item and tag × every rule key) hidden in collection iteration, not a database N+1.

Root Cause Analysis:

Every lookup iterates all 50,000 keys of rules.keySet()
startsWith() then runs a character-by-character prefix comparison on each key
For an order with 10 items × 5 tags = 50 lookups × 50,000 keys = 2.5 million comparisons

The Fix:

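The fix described here restructures the rules into a prefix-indexed map. As one way to get the same effect, the sketch below swaps in a sorted TreeMap (not necessarily the author's code): keys that share a prefix are contiguous in sorted order, so a range scan replaces the full keySet() walk. The class name, the key format such as "electronics:sale:1", and the rule values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PrefixLookup {

    // Before: every call walks every key, so each lookup costs O(rules).
    static List<String> scanAll(Map<String, String> rules, String prefix) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : rules.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                matches.add(entry.getValue());
            }
        }
        return matches;
    }

    // After: in a TreeMap, all keys sharing a prefix form one contiguous run.
    // tailMap() seeks to the first candidate in O(log n); we stop at the first
    // key that no longer matches, so a lookup costs O(log n + matches).
    static List<String> rangeScan(NavigableMap<String, String> rules, String prefix) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : rules.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // past the prefix run; no later key can match
            }
            matches.add(entry.getValue());
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new HashMap<>();
        rules.put("electronics:sale:1", "5% off");
        rules.put("electronics:sale:2", "free shipping");
        rules.put("electronics:new:1", "no discount");
        rules.put("garden:sale:1", "10% off");

        // Build the sorted index once (at startup or rule reload), not per request.
        NavigableMap<String, String> index = new TreeMap<>(rules);
        System.out.println(rangeScan(index, "electronics:sale")); // [5% off, free shipping]
    }
}
```

A HashMap of prefix to List of rules is the other common design. It is faster per lookup, but it only answers queries for prefixes chosen in advance, while the sorted index answers any prefix.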

Result: Latency dropped from 800ms to 12ms—a 67x improvement by choosing the right data structure.

Mental Model: The Performance Iceberg

        What you see in Big-O:
              ┌─────┐
              │O(1) │  ← HashMap.get()
              └─────┘
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Water Line

        What actually happens:
    ┌──────────────────────────────────────┐
    │  hashCode() computation              │
    │  Array access (L1/L2/L3 cache miss?) │
    │  equals() comparison                 │
    │  Tree traversal (if bin > 8)         │
    │  Autoboxing of primitive keys        │
    │  GC pressure from temporary objects  │
    │  CPU branch prediction misses        │
    │  Memory locality / cache coherence   │
    └──────────────────────────────────────┘

Constant Factors That Matter

HashMap.get() "O(1)":
├── hashCode(): 1-100ns (depends on key type)
├── Array access: 1ns (L1) to 100ns (RAM)
├── equals(): 1ns (int) to 1μs (deep object)
└── Potential tree walk: +O(log n)

ArrayList.get(i) "O(1)":
├── Bounds check: 1ns
├── Array access: 1-100ns (cache dependent)
└── Boxing/unboxing: 20-50ns (if primitives)

TreeMap.get() "O(log n)":
├── Each compare: 10-100ns
├── Tree depth: log₂(n) comparisons
├── No hashCode() call; compareTo() can exit at the first differing char
└── At n=1000: ~10 comparisons ≈ 0.1-1μs

When O(log n) Beats O(1)

Performance crossover point (illustrative, not measured):

Operations │
per second │
           │  TreeMap
    10M ───│────●───────────────────────
           │   /                   HashMap
     1M ───│──/─────────────────●───────
           │ /                 /
   100K ───│/─────────────────/─────────
           │                 /
           └────┬────┬────┬────┬────────
                10  100  1K  10K  elements

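For small collections (< 1000 elements):
- TreeMap can be competitive when hashCode() is expensive (for example, long String keys built per request, whose hash is not yet cached), because compareTo() stops at the first differing character
- HashMap's fixed costs (hash spreading, table allocation, resizes) weigh more at small N
- Neither wins everywhere: the sample JMH results later in this article still show HashSet ahead of TreeSet at 100 elements
- Profile, don't assume!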

Deep Dive: Memory Footprint Reality

Object Headers and Alignment

Every Java object has overhead:

┌─────────────────────────────────────────┐
│         64-bit JVM Object Layout        │
├─────────────────────────────────────────┤
│ Mark Word:        8 bytes               │
│ Class Pointer:    4 bytes (compressed)  │
│ Array Length:     4 bytes (arrays only) │
│ Padding:          to 8-byte boundary    │
└─────────────────────────────────────────┘

Integer object: 16 bytes for 4 bytes of data!
- Mark word:      8 bytes
- Class pointer:  4 bytes
- int value:      4 bytes
- Total:         16 bytes (4x the raw int!)

Collection Memory Calculator

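The arithmetic behind estimates like the ones in the table below fits in a few lines. This is a back-of-the-envelope model, not a measurement. It assumes a 64-bit HotSpot JVM with compressed oops (12-byte object headers, 4-byte references, 8-byte alignment) and the field layouts of the JDK's LinkedList.Node, HashMap.Node, and TreeMap.Entry. It ignores ArrayList growth slack and the collection object itself. A tool such as OpenJDK's JOL measures real layouts.

```java
// Back-of-the-envelope heap estimates; assumes 64-bit HotSpot with compressed oops.
final class MemoryEstimate {
    static final long HEADER = 12;       // mark word (8) + compressed class pointer (4)
    static final long ARRAY_HEADER = 16; // object header (12) + array length (4)
    static final long REF = 4;           // compressed object reference

    // Objects are padded up to a multiple of 8 bytes.
    static long align(long bytes) {
        return (bytes + 7) & ~7L;
    }

    static long integer() { return align(HEADER + 4); } // 16 bytes
    static long intArray(long n) { return align(ARRAY_HEADER + 4 * n); }
    static long refArray(long n) { return align(ARRAY_HEADER + REF * n); }

    // Object[] of references + one Integer per element (growth slack ignored).
    static long arrayList(long n) { return refArray(n) + n * integer(); }

    // LinkedList.Node holds item, next, prev: 12 + 3 * 4 = 24 bytes.
    static long linkedList(long n) { return n * (align(HEADER + 3 * REF) + integer()); }

    // HashMap.Node holds int hash + key, value, next: 28 -> 32 bytes.
    // The bucket table doubles from 16 until 0.75 * capacity >= n.
    static long hashSet(long n) {
        long capacity = 16;
        while (capacity * 3 / 4 < n) {
            capacity <<= 1;
        }
        return refArray(capacity) + n * (align(HEADER + 4 + 3 * REF) + integer());
    }

    // TreeMap.Entry holds key, value, left, right, parent + boolean color: 33 -> 40 bytes.
    static long treeSet(long n) { return n * (align(HEADER + 5 * REF + 1) + integer()); }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.printf("int[]       %5.1f MB%n", intArray(n) / 1e6);
        System.out.printf("ArrayList   %5.1f MB%n", arrayList(n) / 1e6);
        System.out.printf("LinkedList  %5.1f MB%n", linkedList(n) / 1e6);
        System.out.printf("HashSet     %5.1f MB%n", hashSet(n) / 1e6);
        System.out.printf("TreeSet     %5.1f MB%n", treeSet(n) / 1e6);
    }
}
```

Under these assumptions it prints about 4, 20, 40, 56.4 and 56 MB. Without compressed oops (heaps above roughly 32GB) every header and reference grows, and all of these figures go up.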

Memory Comparison Table

Storing 1,000,000 integers (estimates for a 64-bit HotSpot JVM with compressed oops):

Collection Type          | Memory Used | Overhead
-------------------------|-------------|----------
int[] (primitive)        |    4 MB     |    0%
ArrayList<Integer>       |   20 MB     |  400%
LinkedList<Integer>      |   40 MB     |  900%
HashSet<Integer>         |   56 MB     | 1300%
TreeSet<Integer>         |   56 MB     | 1300%

Why such high overhead?
- Integer boxing: 16 bytes per element (vs 4 bytes)
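- Node/Entry objects: 24-40 bytes each (LinkedList.Node 24, HashMap.Node 32, TreeMap.Entry 40)
- Array overhead and slack space (HashMap bucket table, ArrayList growth headroom)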

Primitive Collection Libraries

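Primitive collection libraries avoid boxing by keeping values in primitive arrays. The toy class below is hypothetical and far smaller than any real library's API. It shows only that core idea, an int-specialized growable list.

```java
import java.util.Arrays;

// Toy int-specialized list: values live directly in an int[], so there is
// no Integer object (16 bytes) and no reference (4 bytes) per element.
final class IntList {
    private int[] data;
    private int size;

    IntList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // geometric growth, like ArrayList
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += data[i]; // no unboxing; sequential, cache-friendly reads
        }
        return total;
    }
}
```

Real libraries add primitive maps and sets, iterators without boxing, and bulk operations on top of this same storage idea.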

Deep Dive: JMH Benchmarking

Setting Up JMH


Collection Benchmark Example


Sample Benchmark Results

Benchmark                          (size)  Mode  Cnt       Score    Error  Units
CollectionBenchmark.arrayListContains   100  avgt   20      48.234 ±  1.231  ns/op
CollectionBenchmark.arrayListContains 10000  avgt   20    4823.456 ± 45.678  ns/op
CollectionBenchmark.arrayListContains 1000000 avgt  20  523456.789 ± 1234.5  ns/op

CollectionBenchmark.hashSetContains     100  avgt   20       8.234 ±  0.456  ns/op
CollectionBenchmark.hashSetContains   10000  avgt   20       9.123 ±  0.567  ns/op
CollectionBenchmark.hashSetContains 1000000  avgt   20      12.456 ±  0.789  ns/op

CollectionBenchmark.treeSetContains     100  avgt   20      15.234 ±  0.678  ns/op
CollectionBenchmark.treeSetContains   10000  avgt   20      45.678 ±  1.234  ns/op
CollectionBenchmark.treeSetContains 1000000  avgt   20      98.765 ±  2.345  ns/op

Key insight: HashSet.contains() scales O(1), ArrayList scales O(n)
At 1M elements: HashSet is 42,000x faster for lookups!

Common Benchmarking Mistakes


Deep Dive: Production Monitoring with JFR

Enabling JFR


Programmatic JFR Events

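Custom events can be defined with the JDK's jdk.jfr API (JDK 11+; the sketch assumes Java 17). It is a minimal sketch: the event name demo.CollectionSize, its fields, and the CollectionMetrics helper are hypothetical. The event only reaches a recording while JFR is running, for example when started with -XX:StartFlightRecording or from JMC.

```java
import java.util.Collection;
import java.util.List;

import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical event type; the name and category are our own choices.
@Name("demo.CollectionSize")
@Label("Collection Size")
@Category({"Demo", "Collections"})
@Description("Size of a monitored collection at the time it was sampled")
class CollectionSizeEvent extends Event {
    @Label("Collection Name")
    String collectionName;

    @Label("Size")
    int size;
}

final class CollectionMetrics {

    // Emits an instant event. commit() does nothing unless a running
    // recording has this event type enabled, so the call is cheap in production.
    static CollectionSizeEvent recordSize(String name, Collection<?> collection) {
        CollectionSizeEvent event = new CollectionSizeEvent();
        event.collectionName = name;
        event.size = collection.size();
        event.commit();
        return event;
    }

    public static void main(String[] args) {
        recordSize("pendingOrders", List.of("a", "b", "c"));
    }
}
```

Sampling sizes periodically (for example from a scheduled task) and charting them in JMC makes unbounded growth visible long before an OutOfMemoryError.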

JFR Collection Analysis Queries


Common Performance Pitfalls

Pitfall 1: Not Pre-sizing Collections

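A minimal pre-sizing sketch (method names are hypothetical). A HashMap's capacity must account for the default 0.75 load factor, so the constructor argument is expected / 0.75 rather than the expected count itself. On Java 19+, HashMap.newHashMap(expected) performs this calculation for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Presizing {

    // HashMap resizes once size exceeds capacity * 0.75 (the default load factor),
    // so the constructor needs expected / 0.75, not the expected count itself.
    static int hashMapCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    static Map<String, Integer> indexById(List<String> ids) {
        // Sized up front: no intermediate bucket tables are allocated while filling.
        Map<String, Integer> index = new HashMap<>(hashMapCapacityFor(ids.size()));
        int position = 0;
        for (String id : ids) {
            index.put(id, position++);
        }
        return index;
    }

    static List<Integer> squares(int n) {
        // Sized up front: the backing array is allocated once instead of growing ~1.5x repeatedly.
        List<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }
}
```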

Pitfall 2: Wrong Map for Access Pattern


Pitfall 3: Contains on Wrong Collection

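A sketch of the usual fix, assuming the lookup side is a list of IDs (class and method names are hypothetical). The list is copied into a HashSet once, outside the loop.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class BlockedFilter {

    // Slow: List.contains is a linear scan, so this is O(orders x blocked).
    static List<String> filterSlow(List<String> orderIds, List<String> blockedIds) {
        return orderIds.stream()
                .filter(id -> !blockedIds.contains(id))
                .collect(Collectors.toList());
    }

    // Fast: build the HashSet once (O(blocked)); each contains() is then O(1)
    // on average, for O(orders + blocked) overall.
    static List<String> filterFast(List<String> orderIds, List<String> blockedIds) {
        Set<String> blocked = new HashSet<>(blockedIds);
        return orderIds.stream()
                .filter(id -> !blocked.contains(id))
                .collect(Collectors.toList());
    }
}
```

For a handful of blocked IDs the list scan is fine. The set pays off once either side grows into the hundreds.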

Pitfall 4: Streams on Small Collections


Pitfall 5: Creating Collections in Hot Paths


Pitfall 6: Boxing in Primitive Operations

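Boxing costs show up most in accumulators. The small sketch below (method names hypothetical) compares a boxed Long accumulator with primitive alternatives. All three return the same sum, but only the boxed version allocates inside the loop once the total leaves the small Long cache.

```java
import java.util.List;

final class BoxingSums {

    // Boxed accumulator: each += unboxes, adds, and re-boxes via Long.valueOf,
    // allocating a new Long once the total leaves the cached range (-128..127).
    static Long sumBoxed(List<Integer> values) {
        Long total = 0L;
        for (Integer value : values) {
            total += value;
        }
        return total;
    }

    // Primitive input and accumulator: no allocation inside the loop.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Stream variant: mapToLong unboxes each element once and sums primitives.
    static long sumStream(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }
}
```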

Pitfall 7: ConcurrentHashMap Compound Operations

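Each ConcurrentHashMap call is atomic, but a check-then-act sequence of two calls is not. The sketch below uses a hypothetical EventBuckets class to contrast the racy pattern with computeIfAbsent.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class EventBuckets {
    private final Map<String, Queue<String>> buckets = new ConcurrentHashMap<>();

    // Racy check-then-act: two threads can both see the key as absent, both put
    // a fresh queue, and events added to the overwritten queue are lost.
    void addRacy(String key, String event) {
        if (!buckets.containsKey(key)) {
            buckets.put(key, new ConcurrentLinkedQueue<>());
        }
        buckets.get(key).add(event);
    }

    // Atomic: computeIfAbsent installs at most one queue per key.
    void add(String key, String event) {
        buckets.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(event);
    }

    int count(String key) {
        Queue<String> queue = buckets.get(key);
        return queue == null ? 0 : queue.size();
    }

    // Adds events from 8 threads using the atomic add().
    static EventBuckets fillConcurrently(int events, int keys) {
        EventBuckets result = new EventBuckets();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < events; i++) {
            int n = i;
            pool.execute(() -> result.add("key-" + (n % keys), "event-" + n));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

For counters, merge(key, 1, Integer::sum) plays the same role as computeIfAbsent does here for lazily created values.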

Memory Leak Patterns

Pattern 1: Static Collection Growing Forever

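One common fix is to bound the cache. The minimal sketch below uses LinkedHashMap's access-order mode and its removeEldestEntry hook to get LRU eviction. The class name and size cap are hypothetical. The map is not thread-safe, so a shared static instance needs Collections.synchronizedMap or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded replacement for an ever-growing static HashMap cache.
// accessOrder = true orders entries from least- to most-recently accessed;
// removeEldestEntry evicts the least-recently used one once the cap is exceeded.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Usage might look like `static final Map<String, Price> CACHE = Collections.synchronizedMap(new BoundedCache<>(10_000));`, where Price stands for whatever value type is cached.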

Pattern 2: Listener Collections

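A sketch of one leak-resistant registration style, assuming a hypothetical EventBus. register() returns a handle whose close() unregisters the listener, so subscribers can use try-with-resources or keep the handle for shutdown.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Handle returned on registration; close() is narrowed to throw no checked exception.
interface Registration extends AutoCloseable {
    @Override
    void close();
}

// Hypothetical event bus. Without a way to unregister, it would hold strong
// references to every listener (and everything they capture) forever.
final class EventBus<E> {
    private final List<Consumer<E>> listeners = new CopyOnWriteArrayList<>();

    Registration register(Consumer<E> listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    void publish(E event) {
        for (Consumer<E> listener : listeners) { // iterates a snapshot; safe during changes
            listener.accept(event);
        }
    }

    int listenerCount() {
        return listeners.size();
    }
}
```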

Pattern 3: Thread-Local Collections

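Pool threads outlive individual requests. A ThreadLocal collection that is never cleared therefore keeps growing or pins stale data. The sketch below uses a hypothetical RequestContext that always calls remove() in a finally block.

```java
import java.util.ArrayList;
import java.util.List;

final class RequestContext {
    // Per-thread scratch list. Pool threads live as long as the application,
    // so anything left here is retained indefinitely.
    private static final ThreadLocal<List<String>> AUDIT_TRAIL =
            ThreadLocal.withInitial(ArrayList::new);

    static void audit(String message) {
        AUDIT_TRAIL.get().add(message);
    }

    // Runs one unit of work and clears the thread-local afterwards, even on exceptions.
    static List<String> handleRequest(Runnable work) {
        try {
            work.run();
            return List.copyOf(AUDIT_TRAIL.get());
        } finally {
            AUDIT_TRAIL.remove(); // drop this thread's entry so the list can be collected
        }
    }
}
```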

Pattern 4: Key Objects That Change

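A minimal reproduction, assuming a hypothetical MutableKey whose hashCode() depends on a settable field (Java 17 for the record and pattern-matching instanceof). After the mutation the entry is stranded in its old bucket. Lookups fail, but the set still holds and retains the key.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable key: equals/hashCode depend on a field that can change.
final class MutableKey {
    private String id;

    MutableKey(String id) {
        this.id = id;
    }

    void setId(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey other && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

// Immutable alternative: a record's fields are final, so its hash cannot drift.
record ImmutableKey(String id) {}

final class MutableKeyDemo {
    public static void main(String[] args) {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey("a");
        set.add(key);
        key.setId("b"); // hashCode changes while the entry sits in the bucket for "a"
        System.out.println(set.contains(key) + " " + set.size()); // prints "false 1"
    }
}
```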

Detecting Memory Leaks


GC Implications

Collection Type and GC Pressure

GC Impact by Collection Type:

ArrayList:
├── Single large array (few objects)
├── Resizing creates garbage (old arrays)
├── Very large backing arrays become humongous objects in G1
└── Pre-size to minimize GC

LinkedList:
├── Many small Node objects
├── High allocation rate during adds
├── Scattered in memory (poor locality)
└── Avoid in throughput-critical paths

HashMap:
├── Entry nodes per element
├── Resize copies entire table
├── Tree bins at threshold (more objects)
└── Use initial capacity wisely

TreeMap:
├── TreeNode per element
├── Rebalancing doesn't create garbage
├── More predictable GC behavior
└── Good for real-time systems

Reducing GC Pressure


Debug This! Collection Performance Challenges

Challenge 1: The Slow Aggregation


Solution

Problem: Two hash lookups per transaction (containsKey + get).

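A sketch of the single-lookup version, assuming a hypothetical Transaction record with a customer ID and an amount in cents (Java 17). Map.merge() inserts or combines in one call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Aggregation {

    // Hypothetical input type for illustration.
    record Transaction(String customerId, long amountCents) {}

    // Before: containsKey + get + put probes the table up to three times per transaction.
    static Map<String, Long> totalsWithLookups(List<Transaction> transactions) {
        Map<String, Long> totals = new HashMap<>();
        for (Transaction tx : transactions) {
            if (totals.containsKey(tx.customerId())) {
                totals.put(tx.customerId(), totals.get(tx.customerId()) + tx.amountCents());
            } else {
                totals.put(tx.customerId(), tx.amountCents());
            }
        }
        return totals;
    }

    // After: HashMap.merge locates the entry once, then inserts or combines in place.
    static Map<String, Long> totalsWithMerge(List<Transaction> transactions) {
        Map<String, Long> totals = new HashMap<>();
        for (Transaction tx : transactions) {
            totals.merge(tx.customerId(), tx.amountCents(), Long::sum);
        }
        return totals;
    }
}
```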

Performance improvement: 30-50% faster for large datasets.

Challenge 2: Memory Mystery


Solution

Problem: Before Java 7u6, String.substring() (and therefore split()) shared the original string's backing char[], so every small token could pin the entire file content in memory. On later JVMs each token is an independent copy, so the remaining issue is that we keep references to strings derived from the large file content.


Challenge 3: The Concurrent Bottleneck


Solution

Problem: Global lock creates contention; all threads wait on single lock.

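A sketch of the lock-free replacement, assuming a hypothetical per-endpoint hit counter. ConcurrentHashMap.computeIfAbsent creates each counter once, and LongAdder absorbs concurrent increments without a global lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class HitCounter {
    // No global lock: the map handles concurrent key creation, and each
    // LongAdder spreads increments across striped cells.
    private final Map<String, LongAdder> hits = new ConcurrentHashMap<>();

    void record(String endpoint) {
        hits.computeIfAbsent(endpoint, k -> new LongAdder()).increment();
    }

    long count(String endpoint) {
        LongAdder adder = hits.get(endpoint);
        // sum() is not an atomic snapshot while writers are active; exact once they finish.
        return adder == null ? 0 : adder.sum();
    }
}
```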

Why LongAdder?

Lock-free increment
Striped counters reduce contention
Can deliver 10-100x better throughput than a single lock under high contention

💻 Exercises

Exercise 1: Benchmark ArrayList vs LinkedList Insertion

Write a JMH benchmark comparing insertion performance:


Solution


Exercise 2: Find the Memory Leak


Solution


Exercise 3: Optimize This Code


Solution


Senior-Level Interview Questions

Question 1: When would you choose TreeMap over HashMap despite O(log n) vs O(1)?

Strong Answer:

Several scenarios favor TreeMap:

Need sorted iteration: HashMap iteration order is undefined
Range queries: subMap(), headMap(), and tailMap() find the range start in O(log n); iterating k entries then costs O(k)
NavigableMap operations: floorKey(), ceilingKey(), higherKey()
Small collections: Constant factors can make TreeMap competitive under ~1000 elements
Memory-constrained: TreeMap has more predictable memory usage
Real-time systems: No sudden O(n) resize operations
Predictable iteration cost: proportional to the number of entries, while HashMap iteration also walks empty buckets (cost grows with table capacity)

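A sketch of the NavigableMap advantage, using hypothetical quantity-based price tiers. floorEntry() answers "which tier applies?" directly, which a HashMap cannot do without scanning every key.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PriceTiers {
    // Hypothetical tiers: key = minimum order quantity, value = unit price in cents.
    private final NavigableMap<Integer, Integer> tiers = new TreeMap<>();

    PriceTiers() {
        tiers.put(1, 1000);
        tiers.put(10, 900);
        tiers.put(100, 750);
    }

    // floorEntry finds the greatest threshold <= quantity in O(log n).
    int unitPriceFor(int quantity) {
        Map.Entry<Integer, Integer> tier = tiers.floorEntry(quantity);
        if (tier == null) {
            throw new IllegalArgumentException("quantity must be >= " + tiers.firstKey());
        }
        return tier.getValue();
    }

    // Live view of thresholds in [from, to); locating the start costs O(log n).
    NavigableMap<Integer, Integer> tiersBetween(int from, int to) {
        return tiers.subMap(from, true, to, false);
    }
}
```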

Question 2: How would you diagnose a collection-related memory leak in production?

Strong Answer:

Systematic approach:

Identify symptoms: OOM errors, growing heap, GC pauses increasing
Enable monitoring: JFR recording, heap metrics via JMX
Capture heap dump: jmap -dump:live,format=b,file=heap.hprof <pid>
Analyze with MAT/VisualVM:
- Dominator tree shows retention paths
- Look for collections with unexpected size
- Check for static fields holding collections


Add preventive monitoring:


Question 3: Explain the performance implications of ConcurrentHashMap's design choices.

Strong Answer:

ConcurrentHashMap makes several design trade-offs:

Segmented locking (pre-Java 8) → Node-level CAS (Java 8+)

Pre-Java 8: Segments (16 default)
├── Each segment is an independent lock
├── Up to 16 concurrent writers by default (one per segment)
└── But still lock contention within segment

Java 8+: Per-bin synchronization + CAS
├── Lock only the bin being modified (synchronized on its first node)
├── CAS for updates when possible
├── Scales much better with core count
└── Size tracked with LongAdder-style striped counters (CounterCell)

Key operations:

get(): Lock-free with volatile reads
put(): CAS for empty bin, synchronized for collision
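size(): Sums the striped counters; under concurrent updates the result is only an estimate (mappingCount() returns a long)
computeIfAbsent(): Atomic, but holds the bin lock while the mapping function runs, so keep that function short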

Performance implications:


Question 4: How do you benchmark collection performance properly?

Strong Answer:

Proper benchmarking requires understanding JVM behavior:

Use JMH: it handles warmup, forking, and JIT effects, and its Blackhole guards against dead-code elimination
Avoid common mistakes:


Control state properly:


Measure what matters:


Account for GC:


Verify with -prof gc to see allocation rates

Question 5: What's the N+1 problem with collections and how do you avoid it?

Strong Answer:

N+1 with collections occurs when:


Solutions:

Batch loading:


Pre-indexed structures:

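A sketch of the pre-indexing approach with hypothetical Customer and Order records (Java 17). One groupingBy pass builds the index. Every later lookup is then a hash lookup instead of another scan.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class OrderIndex {

    // Hypothetical domain types for illustration.
    record Customer(String id, String name) {}
    record Order(String id, String customerId) {}

    // N+1 shape: each customer triggers another full scan of all orders.
    static int orderCountSlow(List<Order> orders, Customer customer) {
        int count = 0;
        for (Order order : orders) {
            if (order.customerId().equals(customer.id())) {
                count++;
            }
        }
        return count;
    }

    // Pre-indexed: one pass groups orders by customer ID.
    static Map<String, List<Order>> indexByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::customerId));
    }

    // Each lookup is now a hash lookup instead of a scan.
    static int orderCount(Map<String, List<Order>> index, Customer customer) {
        return index.getOrDefault(customer.id(), List.of()).size();
    }
}
```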

JOIN FETCH in JPA:


Stream with proper batching:


Question 6: How do you choose initial capacity for HashMap?

Strong Answer:

HashMap capacity planning:


Why it matters:

Without pre-sizing (adding 1M elements):
├── Start: capacity 16 (threshold 12)
├── Size exceeds 12: redistribute nodes, new capacity 32
├── Size exceeds 24: redistribute nodes, new capacity 64
├── ... 15 more resizes ...
├── Each resize: O(n) pass over all nodes (Java 8+ reuses cached hashes)
└── Total extra work: ~1.6M node moves, 17 discarded tables (~8MB garbage)

With pre-sizing:
├── new HashMap<>(1_333_334): capacity rounds up to 2,097,152 (next power of 2)
├── No resizes needed
└── Total extra work: 0
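The resize arithmetic can be checked with a short simulation (Java 17, class and method names hypothetical). It replays HashMap's documented default policy (initial capacity 16, load factor 0.75, capacity doubling once size exceeds the threshold) rather than instrumenting a real HashMap.

```java
// Replays HashMap's default growth policy; it does not inspect a real HashMap.
final class ResizeSimulator {

    record Result(int resizes, long nodesMoved, int finalCapacity) {}

    static Result simulate(int entries) {
        int capacity = 16;
        int threshold = 12;
        int resizes = 0;
        long moved = 0;
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {
                moved += size; // every node, including the one just added, is redistributed
                capacity <<= 1;
                threshold = (int) (capacity * 0.75f);
                resizes++;
            }
        }
        return new Result(resizes, moved, capacity);
    }

    // Table capacity when pre-sized for `entries`: ceil(entries / 0.75) rounded up
    // to a power of two, capped at HashMap's maximum of 2^30.
    static int presizedCapacity(int entries) {
        int wanted = (int) Math.ceil(entries / 0.75);
        int capacity = 1;
        while (capacity < wanted && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1_000_000));
        System.out.println(presizedCapacity(1_000_000));
    }
}
```

For 1,000,000 insertions it reports 17 resizes, 1,572,869 node moves, and a final capacity of 2,097,152, the same capacity a pre-sized map starts with.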

Best practices:

Pre-size when you know approximate size
For streams: Collectors.toMap(keyMapper, valueMapper) gives no sizing control, but the four-argument overload accepts a map factory such as () -> new HashMap<>(capacity)
Consider load factor trade-off: lower = more memory, fewer collisions

Summary & Key Takeaways

Performance Reality Checklist

✓ Big-O is guidance, not gospel
  - Constant factors matter at small N
  - Cache locality can dominate
  - Profile, don't assume

✓ Memory awareness
  - Object headers: 12-16 bytes each
  - Boxing: 16 bytes per Integer vs 4 bytes per int
  - Collection overhead: 32-64 bytes per entry

✓ Benchmarking discipline
  - Use JMH for accurate results
  - Watch for dead code elimination
  - Include GC in measurements

✓ Production monitoring
  - JFR for non-invasive profiling
  - Metrics for collection sizes
  - Alerts for unbounded growth

✓ Common pitfalls avoided
  - Pre-size when possible
  - Right collection for access pattern
  - Clean up listeners and caches
  - Immutable keys for maps

Performance Decision Quick Reference

For lookup-heavy code:
├── HashSet/HashMap: O(1) average
├── Pre-size to avoid resizing
└── Good hashCode() is critical

For iteration-heavy code:
├── ArrayList: best cache locality
├── LinkedHashMap: predictable order
└── Avoid LinkedList

For concurrent access:
├── ConcurrentHashMap: fine-grained locking
├── CopyOnWriteArrayList: read-heavy, rare writes
└── Avoid Collections.synchronized*

For memory efficiency:
├── Primitive collections (Eclipse, Trove, FastUtil)
├── EnumSet/EnumMap for enum keys
└── ArrayList over LinkedList

Conclusion

Performance optimization is about understanding trade-offs, not memorizing complexity tables. The N+1 problem in our production story wasn't a database issue—it was a data structure issue. By restructuring our pricing rules into a prefix-indexed map, we achieved a 67x performance improvement.

Key principles to remember:

Profile first: Don't optimize based on assumptions
Constant factors matter: O(1) can be slower than O(log n) in practice
Memory matters: GC pressure from collection overhead is real
Right tool for the job: HashMap isn't always the answer
Monitor in production: Unbounded collections become memory leaks

In Part 29, we'll bring everything together with a comprehensive decision guide and quick reference cheatsheet for choosing the right collection.

📅 Review Schedule

Day 1: Review performance comparison table and memory footprint calculations
Day 3: Complete one JMH benchmark exercise
Day 7: Practice memory leak detection patterns
Day 14: Review interview questions and explain concepts aloud
Day 30: Revisit production story and implement monitoring in your codebase

Previous: Part 27: Testing & Debugging Collections
Next: Part 29: Decision Guide & Quick Reference
Series Start: Part 0: How to Use This Series
