Streams and Collectors
Master the Stream API for powerful collection transformations. Learn Collector internals, avoid toMap() pitfalls, write custom collectors, and understand when parallel streams help versus hurt performance.
📋 At a Glance
| Aspect | Details |
|---|---|
| Topic | Stream API, Collectors, groupingBy, toMap, parallel streams |
| Complexity | Intermediate to Advanced |
| Prerequisites | Part 1 (Collection Architecture), Part 2 (Generics) |
| Time to Master | 4-5 hours |
| Interview Frequency | Very High (functional programming, data transformation) |
🎯 What You'll Learn
After completing this article, you will be able to:
- Transform collections efficiently with Stream operations
- Use Collectors for complex aggregations
- Avoid common toMap() and groupingBy() pitfalls
- Write custom Collectors for specialized needs
- Decide when parallel streams improve performance
Production Story: The toMap() Crash
The Incident
Our data import service crashed during the nightly batch job. The culprit was a simple-looking stream operation:
[Java code listing, 10 lines]
The Problem
[Text listing, 17 lines]
The Stream Solution
[Java code listing, 39 lines]
The Difference
[Text listing, 10 lines]
Mental Model: The Assembly Line
[Text listing, 62 lines]
Deep Dive: Collection to Stream and Back
Creating Streams from Collections
[Java code listing, 21 lines]
Collecting Back to Collections
[Java code listing, 18 lines]
Deep Dive: Collectors.toMap()
Basic toMap Usage
[Java code listing, 13 lines]
Handling Duplicates (The Critical Part!)
[Java code listing, 33 lines]
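As a minimal sketch of the duplicate-key behaviour discussed here: the two-argument `toMap()` throws `IllegalStateException` when two elements map to the same key, while the three-argument overload lets a merge function decide which value survives. The `User` record and its fields are hypothetical, chosen only for illustration, and records assume Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDuplicates {
    // Hypothetical type, used only for this illustration.
    record User(String email, String name) {}

    // Two-argument toMap: a repeated key throws IllegalStateException.
    static Map<String, String> strict(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(User::email, User::name));
    }

    // Three-argument toMap: the merge function resolves repeated keys.
    static Map<String, String> keepFirst(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(User::email, User::name,
                        (first, second) -> first));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("a@example.com", "Ann"),
                new User("b@example.com", "Bob"),
                new User("a@example.com", "Anna"));

        System.out.println(keepFirst(users).get("a@example.com")); // Ann
        try {
            strict(users);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```

Swapping the merge function to `(first, second) -> second` keeps the last value instead; which policy is right depends on the data being imported.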
Specifying Map Implementation
[Java code listing, 25 lines]
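One way to illustrate choosing the map implementation is the four-argument `toMap()` overload, which takes a map supplier as its last argument. The sketch below is an assumption about typical usage, not a reconstruction of the original listing; the class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ToMapWithSupplier {
    // LinkedHashMap keeps keys in the order they were first encountered.
    static Map<String, Integer> lengthsInEncounterOrder(List<String> words) {
        return words.stream().collect(Collectors.toMap(
                word -> word, String::length,
                (first, second) -> first,
                LinkedHashMap::new));
    }

    // TreeMap keeps keys sorted by their natural ordering.
    static Map<String, Integer> lengthsSortedByKey(List<String> words) {
        return words.stream().collect(Collectors.toMap(
                word -> word, String::length,
                (first, second) -> first,
                TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "fig", "apple");
        System.out.println(lengthsInEncounterOrder(words)); // {pear=4, fig=3, apple=5}
        System.out.println(lengthsSortedByKey(words));      // {apple=5, fig=3, pear=4}
    }
}
```

Note that the supplier overload always requires a merge function, so the duplicate-key decision has to be made explicitly.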
Deep Dive: groupingBy and partitioningBy
Basic Grouping
[Java code listing, 9 lines]
Downstream Collectors
[Java code listing, 42 lines]
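To make the idea of downstream collectors concrete, here is a small sketch that groups by department and then applies `counting()`, `summingInt()`, and `mapping()` to each group. The `Employee` record and its fields are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DownstreamDemo {
    // Hypothetical type, used only for this illustration.
    record Employee(String name, String dept, int salary) {}

    // Number of employees per department.
    static Map<String, Long> headcountByDept(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.groupingBy(Employee::dept, Collectors.counting()));
    }

    // Total salary per department.
    static Map<String, Integer> payrollByDept(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.groupingBy(Employee::dept,
                        Collectors.summingInt(Employee::salary)));
    }

    // Only the names in each department, instead of whole Employee objects.
    static Map<String, List<String>> namesByDept(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.groupingBy(Employee::dept,
                        Collectors.mapping(Employee::name, Collectors.toList())));
    }
}
```

The downstream collector sees only the elements of one group, so any collector that works on a whole stream can be reused per group.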
Nested Grouping
[Java code listing, 17 lines]
partitioningBy (Binary Split)
[Java code listing, 14 lines]
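A short sketch of the binary split: `partitioningBy()` takes a predicate and always produces a map with both `true` and `false` keys, even when one side is empty. The method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionDemo {
    // Split numbers into even (true) and odd (false).
    static Map<Boolean, List<Integer>> evenAndOdd(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // partitioningBy also accepts a downstream collector.
    static Map<Boolean, Long> countEvenAndOdd(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(evenAndOdd(List.of(1, 2, 3, 4))); // {false=[1, 3], true=[2, 4]}
        System.out.println(evenAndOdd(List.of(1, 3)));       // {false=[1, 3], true=[]}
    }
}
```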
Deep Dive: Advanced Collectors
joining()
[Java code listing, 18 lines]
collectingAndThen()
[Java code listing, 28 lines]
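A minimal sketch of `collectingAndThen()`, which runs a finishing function on the result of another collector. Wrapping the collected list as unmodifiable is one common use; the class and method names below are assumptions for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class CollectingAndThenDemo {
    // Collect to a list, then wrap it so callers cannot modify it.
    static List<String> upperCaseReadOnly(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(),
                        Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        List<String> result = upperCaseReadOnly(List.of("a", "b"));
        System.out.println(result); // [A, B]
        try {
            result.add("C");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only result");
        }
    }
}
```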
reducing()
[Java code listing, 19 lines]
teeing() (Java 12+)
[Java code listing, 22 lines]
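As a sketch of what `teeing()` offers: it feeds every element to two collectors and merges their results in a single pass. The `Stats` record is hypothetical, and the example assumes Java 16 or later for records (`teeing()` itself needs Java 12).

```java
import java.util.List;
import java.util.stream.Collectors;

class TeeingDemo {
    // Hypothetical result holder for this illustration.
    record Stats(long count, int sum) {
        double mean() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    // One pass over the data, two aggregations, one combined result.
    static Stats stats(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.teeing(
                Collectors.counting(),
                Collectors.summingInt(Integer::intValue),
                (count, sum) -> new Stats(count, sum)));
    }

    public static void main(String[] args) {
        Stats s = stats(List.of(1, 2, 3));
        System.out.println(s.count() + " " + s.sum() + " " + s.mean()); // 3 6 2.0
    }
}
```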
Deep Dive: Writing Custom Collectors
Collector Interface
[Java code listing, 12 lines]
Custom Collector: ImmutableList
[Java code listing, 15 lines]
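One possible way to build an immutable-list collector is `Collector.of()`, which takes the supplier, accumulator, combiner, and finisher directly. This is a sketch under the assumption that `List.copyOf()` (Java 10+) is an acceptable finisher; note that it rejects null elements. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class ImmutableListCollector {
    static <T> Collector<T, List<T>, List<T>> toImmutableList() {
        return Collector.of(
                ArrayList::new,                 // supplier: fresh mutable container
                List::add,                      // accumulator: add one element
                (left, right) -> {              // combiner: merge partial results
                    left.addAll(right);
                    return left;
                },
                List::copyOf);                  // finisher: immutable snapshot
    }

    public static void main(String[] args) {
        List<String> letters = Stream.of("a", "b", "c").collect(toImmutableList());
        System.out.println(letters); // [a, b, c]
    }
}
```

Because a finisher is supplied, the collector does not carry the `IDENTITY_FINISH` characteristic, so the stream framework always applies `List.copyOf()` before returning the result.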
Custom Collector: Running Statistics
[Java code listing, 42 lines]
Custom Collector: Top N
[Java code listing, 29 lines]
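A Top N collector can be sketched with a bounded min-heap: keep at most `n` elements and evict the smallest whenever the heap grows past that size. This is one plausible design, not necessarily the one in the original listing; the names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collector;
import java.util.stream.Stream;

class TopN {
    static <T> Collector<T, PriorityQueue<T>, List<T>> topN(int n, Comparator<? super T> order) {
        return Collector.of(
                () -> new PriorityQueue<T>(order),   // smallest retained element sits on top
                (heap, item) -> {
                    heap.add(item);
                    if (heap.size() > n) {
                        heap.poll();                 // evict the current smallest
                    }
                },
                (left, right) -> {                   // merge partial heaps, still bounded by n
                    for (T item : right) {
                        left.add(item);
                        if (left.size() > n) {
                            left.poll();
                        }
                    }
                    return left;
                },
                heap -> {                            // largest first in the final list
                    List<T> result = new ArrayList<>(heap);
                    result.sort(order.reversed());
                    return result;
                });
    }

    public static void main(String[] args) {
        List<Integer> top = Stream.of(5, 1, 9, 3, 7)
                .collect(topN(3, Comparator.<Integer>naturalOrder()));
        System.out.println(top); // [9, 7, 5]
    }
}
```

Memory stays bounded by `n` per accumulator, which is the main advantage over sorting the whole stream and calling `limit(n)`.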
Deep Dive: Parallel Streams
When to Use Parallel Streams
[Java code listing, 19 lines]
Parallel Stream Pitfalls
[Java code listing, 34 lines]
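A frequently cited parallel-stream pitfall is mutating shared state from inside the pipeline. The sketch below contrasts an unsafe version (left uncalled on purpose) with a collector-based version; it is an illustration under that assumption, not the original listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelPitfalls {
    // UNSAFE: ArrayList is not thread-safe, so concurrent add() calls race.
    // Elements may be lost, or an exception may be thrown. Do not use.
    static List<Integer> squaresUnsafe(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.rangeClosed(1, n).parallel().forEach(i -> out.add(i * i));
        return out;
    }

    // SAFE: each worker fills its own container and the collector merges them.
    // Encounter order is preserved because the source range is ordered.
    static List<Integer> squaresSafe(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squaresSafe(5)); // [1, 4, 9, 16, 25]
    }
}
```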
Measuring Parallel Performance
[Java code listing, 21 lines]
⚠️ Common Mistakes
Mistake 1: toMap() Without Merge Function
[Java code listing, 15 lines]
Mistake 2: Modifying Source During Stream
[Java code listing, 14 lines]
Mistake 3: Assuming Parallel is Faster
[Java code listing, 15 lines]
Mistake 4: Using peek() for Side Effects
[Java code listing, 14 lines]
Mistake 5: Ignoring Optional in Collectors
[Java code listing, 14 lines]
🐛 Debug This
Challenge 1: The Empty Map
[Java code listing, 12 lines]
{cat=2, dog=1, bird=1} - it works correctly! The merge function Integer::sum passed to toMap() adds up the values for duplicate keys. However, there is a simpler approach:
[Java code listing, 5 lines]
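The two approaches mentioned in this answer can be sketched side by side. The input list is an assumption inferred from the stated result; the original challenge code is not available here.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCount {
    // toMap with a merge function: each word starts at 1, duplicates are summed.
    static Map<String, Integer> withToMap(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));
    }

    // groupingBy with counting(): same idea, expressed as a grouping.
    static Map<String, Long> withGroupingBy(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical input chosen to match the counts quoted above.
        List<String> words = List.of("cat", "dog", "cat", "bird");
        System.out.println(withToMap(words));
        System.out.println(withGroupingBy(words));
    }
}
```

Both return a `HashMap`, so the printed key order is not guaranteed; note also that `counting()` yields `Long` values rather than `Integer`.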
Challenge 2: The Lost Elements
[Java code listing, 8 lines]
[6, 7, 8] or [8, 9, 10] or [6, 9, 10], etc. When the stream has no defined encounter order, limit() on a parallel stream doesn't guarantee which elements are kept - just that 3 are kept.
[Java code listing, 6 lines]
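The role of encounter order can be sketched as follows. This assumes the challenge filtered the numbers 1 to 10 for values above 5, which is a guess based on the answers listed; the method names are made up.

```java
import java.util.List;
import java.util.stream.Collectors;

class ParallelLimit {
    // Ordered source (a List) collected with toList(): limit() keeps the
    // first three matches in encounter order, even in parallel.
    static List<Integer> firstThree(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n > 5)
                .limit(3)
                .collect(Collectors.toList());
    }

    // After unordered(), limit() may keep any three matching elements.
    static List<Integer> anyThree(List<Integer> numbers) {
        return numbers.parallelStream()
                .unordered()
                .filter(n -> n > 5)
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(firstThree(numbers)); // [6, 7, 8]
        System.out.println(anyThree(numbers));   // some three values greater than 5
    }
}
```

Keeping the order has a cost: an ordered `limit()` in parallel forces coordination between threads, so dropping the ordering constraint is sometimes a deliberate performance choice.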
Challenge 3: The Mysterious Null
[Java code listing, 7 lines]
NullPointerException! groupingBy doesn't allow null keys.
[Java code listing, 10 lines]
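One way to handle this, sketched under the assumption that a placeholder key is acceptable, is to map null to a sentinel inside the classifier. The `Person` record and the `"UNKNOWN"` label are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NullKeys {
    // Hypothetical type; city may be null.
    record Person(String name, String city) {}

    // Throws NullPointerException if any person has a null city.
    static Map<String, List<Person>> byCityStrict(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::city));
    }

    // Replaces a null key with a sentinel so grouping succeeds.
    static Map<String, List<Person>> byCity(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(
                p -> p.city() == null ? "UNKNOWN" : p.city()));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Ann", "Oslo"),
                new Person("Bob", null));
        System.out.println(byCity(people).keySet());
    }
}
```

Filtering out the null-keyed elements before collecting is the other common option when those elements should not appear in the result at all.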
💻 Exercises
Exercise 1: Multi-level Aggregation
Create a report showing average salary by department and seniority level:
[Java code listing, 4 lines]
[Java code listing, 26 lines]
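A possible solution shape for this exercise, offered as a sketch rather than the reference answer: nest one `groupingBy()` inside another and finish with `averagingDouble()`. The `Employee` record and its field names are assumptions, since the exercise's own type is not shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SalaryReport {
    // Hypothetical type for this illustration.
    record Employee(String dept, String level, double salary) {}

    // department -> seniority level -> average salary
    static Map<String, Map<String, Double>> averageSalary(List<Employee> staff) {
        return staff.stream().collect(
                Collectors.groupingBy(Employee::dept,
                        Collectors.groupingBy(Employee::level,
                                Collectors.averagingDouble(Employee::salary))));
    }

    public static void main(String[] args) {
        List<Employee> staff = List.of(
                new Employee("Eng", "Senior", 100.0),
                new Employee("Eng", "Senior", 200.0),
                new Employee("Eng", "Junior", 50.0),
                new Employee("Ops", "Senior", 80.0));
        System.out.println(averageSalary(staff).get("Eng").get("Senior")); // 150.0
    }
}
```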
Exercise 2: Custom Collector - Distinct Count Per Group
Write a collector that counts distinct values per group:
[Java code listing, 11 lines]
[Java code listing, 40 lines]
Exercise 3: Pagination Collector
Create a collector that paginates results:
[Java code listing, 5 lines]
[Java code listing, 40 lines]
🎤 Senior-Level Interview Questions
Question 1: toMap vs groupingBy
| Aspect | toMap() | groupingBy() |
|---|---|---|
| Result | Map<K, V> | Map<K, List<V>> by default |
| Duplicates | Must handle explicitly | Naturally groups duplicates |
| Use case | Unique key per element | Multiple elements per key |
[Java code listing, 7 lines]
Question 2: Collector Components
[Java code listing, 15 lines]
Question 3: Parallel Stream Overhead
Parallel streams have overhead:
- Splitting: Source must be split into chunks
- Thread management: ForkJoinPool coordination
- Combining: Results from threads must be merged
- Memory: Each thread needs its own accumulator
[Java code listing, 15 lines]
Question 4: Stream Reuse
No, streams can only be consumed once:
[Java code listing, 18 lines]
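The single-use rule can be demonstrated with a small sketch: a second terminal operation on the same stream throws `IllegalStateException`, and a `Supplier` is one common way to obtain a fresh stream each time. The class and method names are illustrative.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuse {
    // Returns true if reusing a consumed stream fails as expected.
    static boolean secondTerminalOpFails() {
        Stream<String> names = Stream.of("a", "b", "c");
        List<String> first = names.collect(Collectors.toList()); // consumes the stream
        try {
            names.collect(Collectors.toList());                  // second use
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A Supplier creates a new stream for every terminal operation.
    static long countTwice(Supplier<Stream<String>> source) {
        return source.get().count() + source.get().count();
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOpFails());               // true
        System.out.println(countTwice(() -> Stream.of("a", "b"))); // 4
    }
}
```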
Question 5: flatMap vs map
[Java code listing, 27 lines]
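To illustrate the difference in a few lines: `map()` produces exactly one output per input, so mapping a line to its words yields a list of lists, whereas `flatMap()` flattens the per-element streams into one. The sketch below assumes words are separated by single spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapVsMap {
    // map: one result per line, so the outcome is nested.
    static List<List<String>> wordsPerLine(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(line.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap: each line becomes a stream of words, merged into one stream.
    static List<String> allWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b", "c");
        System.out.println(wordsPerLine(lines)); // [[a, b], [c]]
        System.out.println(allWords(lines));     // [a, b, c]
    }
}
```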
📝 Summary & Key Takeaways
Essential Collectors
| Collector | Purpose | Example |
|---|---|---|
| toList() | Collect to List | stream.collect(toList()) |
| toSet() | Collect to Set | stream.collect(toSet()) |
| toMap() | Collect to Map | stream.collect(toMap(k, v, merge)) |
| groupingBy() | Group by key | stream.collect(groupingBy(classifier)) |
| partitioningBy() | Binary split | stream.collect(partitioningBy(predicate)) |
| joining() | Concatenate strings | stream.collect(joining(", ")) |
| counting() | Count elements | groupingBy(x, counting()) |
| summingInt() | Sum values | groupingBy(x, summingInt(fn)) |
| mapping() | Transform in group | groupingBy(x, mapping(fn, toList())) |
Key Rules
- Always pass a merge function to toMap() - duplicate keys are common, and without one they throw IllegalStateException
- groupingBy doesn't allow null keys - filter or transform nulls
- Parallel streams need stateless operations - avoid shared mutable state
- Measure before parallelizing - overhead can exceed benefit
- Streams are single-use - create new stream for each operation
🏁 Conclusion
Streams and Collectors provide powerful tools for collection transformation, but their complexity can lead to subtle bugs. The key insights are:
- toMap() is dangerous without a merge function - always specify one
- groupingBy with downstream collectors enables complex aggregations
- Parallel streams have overhead - measure before assuming they're faster
- Custom collectors solve specialized needs elegantly
- Stream pipeline order matters - filter early, transform late
In the next article, we'll explore Views, Wrappers, and Defensive Patterns - techniques for protecting your collections from unintended modification.
📅 Review Schedule
To solidify your understanding, review this material:
- Tomorrow: Practice toMap() with merge functions
- In 3 days: Write a groupingBy with nested downstream collectors
- In 1 week: Implement a custom Collector
- In 2 weeks: Benchmark sequential vs parallel for your use case