Partitions & Replication
π At a Glance
| Aspect | Details |
|---|---|
| Difficulty | π‘ Intermediate |
| Prerequisites | Part 1 (Architecture & Storage) |
| Key Concepts | Partitions, replication factor, rack awareness, reassignment |
| Time Investment | 30 minutes read + 45 minutes practice |
| Payoff | Design topics that scale and survive failures |
π― What You'll Learn
After this article, you'll be able to:
- Choose the right partition count for your workload
- Understand replication and how data is distributed across brokers
- Configure rack awareness for datacenter fault tolerance
- Perform partition reassignment safely in production
- Diagnose partition imbalance and hot partition issues
π₯ Production Story: The Hot Partition Nightmare
orders (10 partitions, 3x replication). During a Black Friday sale, the system collapsed.Consumer lag: Partition 0: 50 messages Partition 1: 45 messages Partition 2: 52 messages Partition 3: 2,500,000 messages β WTF! Partition 4: 48 messages ...
One partition had 50,000x the lag of others!
JAVA(2 lines)CodeLoading syntax highlighter...
Looks reasonableβkey by customer ID. But wait...
SQL(11 lines)CodeLoading syntax highlighter...
customer_id = "GUEST" for all anonymous users. Since partitioning is hash(key) % partitions, all 2.5 million guest orders went to the same partition!JAVA(6 lines)CodeLoading syntax highlighter...
π§ Mental Model: Partitions and Replication
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β TOPIC: orders (6 partitions, RF=3) β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β Broker 1 Broker 2 Broker 3 β β βββββββββββ βββββββββββ βββββββββββ β β β P0 (L) β β P0 (F) β β P0 (F) β β β β P1 (F) β β P1 (L) β β P1 (F) β β β β P2 (F) β β P2 (F) β β P2 (L) β β β β P3 (L) β β P3 (F) β β P3 (F) β β β β P4 (F) β β P4 (L) β β P4 (F) β β β β P5 (F) β β P5 (F) β β P5 (L) β β β βββββββββββ βββββββββββ βββββββββββ β β β β L = Leader (handles reads/writes) β β F = Follower (replicates from leader) β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β PRODUCER PERSPECTIVE: β β β β Producer β β β β β β Message with key="user123" β β β β β βΌ β β Partitioner: hash("user123") % 6 = 2 β β β β β βΌ β β Send to Broker 3 (leader of P2) β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β CONSUMER PERSPECTIVE: β β β β Consumer Group: order-processors (3 consumers) β β β β Consumer 1 Consumer 2 Consumer 3 β β βββββββββββ βββββββββββ βββββββββββ β β β P0, P1 β β P2, P3 β β P4, P5 β β β βββββββββββ βββββββββββ βββββββββββ β β β β Each partition assigned to exactly one consumer in group β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π¬ Deep Dive
1. Understanding Partitions
A partition is the unit of:
- Parallelism: More partitions = more consumers can process in parallel
- Ordering: Messages within a partition are strictly ordered
- Storage: Each partition is a separate log on disk
JAVA(14 lines)CodeLoading syntax highlighter...
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β ORDERING GUARANTEES β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β WITHIN A PARTITION: Total ordering guaranteed β β β β Partition 0: [M1] β [M2] β [M3] β [M4] β β Always consumed in this order β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ACROSS PARTITIONS: No ordering guarantee β β β β P0: [M1] β [M3] β β P1: [M2] β [M4] β β β β Consumer might see: M2, M1, M4, M3 (any interleaving) β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β IMPLICATION: If you need order, use the same key! β β β β // All orders for user123 go to same partition β β kafkaTemplate.send("orders", "user123", orderEvent); β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2. How Many Partitions?
Target Partitions = max( Throughput / Max Throughput per Partition, Consumer Count in Largest Consumer Group )
| Factor | Guidance |
|---|---|
| Throughput | ~10 MB/s or 10K msg/s per partition (varies by message size) |
| Consumers | At least as many partitions as max consumers |
| Over-partitioning | More partitions = more overhead, but easier to scale later |
| Rule of thumb | Start with # brokers Γ 10 for high-throughput topics |
Requirements: - Expected throughput: 100K messages/sec - Message size: 1KB - Max consumers in any group: 20 Calculations: - Data throughput: 100K Γ 1KB = 100 MB/s - Partitions for throughput: 100 MB/s Γ· 10 MB/s = 10 - Partitions for consumers: 20 Result: max(10, 20) = 20 partitions minimum Recommendation: 24-30 partitions (room to grow)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β PARTITION COUNT TRADE-OFFS β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β TOO FEW PARTITIONS TOO MANY PARTITIONS β β βββββββββββββββββ ββββββββββββββββββ β β β’ Limited parallelism β’ More file handles β β β’ Can't add more consumers β’ More memory per broker β β β’ Hot partitions more likely β’ Longer leader election β β β’ Hard to scale later β’ Slower rebalancing β β β’ More ZK/KRaft metadata β β β β SWEET SPOT β β ββββββββββ β β β’ 2-3x expected consumer count β β β’ Allows for growth β β β’ Manageable overhead β β β’ Typically: dozens to low hundreds per topic β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
BASH(10 lines)CodeLoading syntax highlighter...
3. Replication Deep Dive
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β REPLICATION MECHANICS β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β Topic: payments, Partition 0, RF=3 β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Broker 1 (Leader) β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β Offset: 0 1 2 3 4 5 6 7 ββ β β β β [M0] [M1] [M2] [M3] [M4] [M5] [M6] [M7] ββ β β β β β ββ β β β β High Watermark (HW) ββ β β β β (committed, safe) ββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β β Fetch requests β β βΌ β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Broker 2 (Follower, ISR) β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β Offset: 0 1 2 3 4 5 6 7 ββ β β β β [M0] [M1] [M2] [M3] [M4] [M5] [M6] [M7] ββ β β β β β ββ β β β β Caught up! ββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Broker 3 (Follower, ISR) β β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β Offset: 0 1 2 3 4 5 6 ββ β β β β [M0] [M1] [M2] [M3] [M4] [M5] [M6] ββ β β β β β ββ β β β β 1 message behind ββ β β β β (still in ISR) ββ β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β β ISR (In-Sync Replicas) = {Broker1, Broker2, Broker3} β β All replicas within replica.lag.time.max.ms are "in sync" β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
| Environment | RF | Reason |
|---|---|---|
| Development | 1 | No redundancy needed |
| Production (standard) | 3 | Survives 2 broker failures |
| Critical data | 3-5 | Extra safety margin |
PROPERTIES(8 lines)CodeLoading syntax highlighter...
4. Rack Awareness
Without rack awareness, all replicas might be on the same rack:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β WITHOUT RACK AWARENESS β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β Rack A Rack B β β βββββββββββββββββββββββββββ βββββββββββββββββββββββββββ β β β Broker 1 β β Broker 3 β β β β βββ P0 (Leader) β β (no replicas) β β β β βββ P0 (Follower) β β β β β βββββββββββββββββββββββββββ€ βββββββββββββββββββββββββββ€ β β β Broker 2 β β Broker 4 β β β β βββ P0 (Follower) β β (no replicas) β β β βββββββββββββββββββββββββββ βββββββββββββββββββββββββββ β β β β β οΈ If Rack A loses power, ALL replicas of P0 are lost! β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
With rack awareness:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β WITH RACK AWARENESS β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ β β β Rack A Rack B β β βββββββββββββββββββββββββββ βββββββββββββββββββββββββββ β β β Broker 1 β β Broker 3 β β β β βββ P0 (Leader) β β βββ P0 (Follower) β β β βββββββββββββββββββββββββββ€ βββββββββββββββββββββββββββ€ β β β Broker 2 β β Broker 4 β β β β βββ P0 (Follower) β β (balanced elsewhere) β β β βββββββββββββββββββββββββββ βββββββββββββββββββββββββββ β β β β β Replicas spread across racks β β β Rack A failure: P0 still available from Rack B β β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
PROPERTIES(5 lines)CodeLoading syntax highlighter...
BASH(6 lines)CodeLoading syntax highlighter...
5. Partition Reassignment
When you add brokers or rebalance load, you need to move partitions:
BASH(28 lines)CodeLoading syntax highlighter...
BASH(13 lines)CodeLoading syntax highlighter...
6. Spring Kafka: Partition Configuration
JAVA(20 lines)CodeLoading syntax highlighter...
JAVA(42 lines)CodeLoading syntax highlighter...
JAVA(20 lines)CodeLoading syntax highlighter...
7. Diagnosing Partition Issues
BASH(13 lines)CodeLoading syntax highlighter...
BASH(8 lines)CodeLoading syntax highlighter...
JAVA(25 lines)CodeLoading syntax highlighter...
β οΈ Common Mistakes
Mistake 1: Using High-Cardinality Keys Incorrectly
JAVA(12 lines)CodeLoading syntax highlighter...
Mistake 2: Wrong Partition Count
JAVA(14 lines)CodeLoading syntax highlighter...
Mistake 3: Replication Factor = 1 in Production
PROPERTIES(8 lines)CodeLoading syntax highlighter...
Mistake 4: Increasing Partitions Without Understanding Impact
BASH(8 lines)CodeLoading syntax highlighter...
Mistake 5: Not Throttling Reassignment
BASH(9 lines)CodeLoading syntax highlighter...
π Debug This
You've deployed 6 consumers for a topic with 12 partitions. Expected: 2 partitions each. Actual:
Consumer 1: P0, P1, P2, P3, P4, P5 Consumer 2: P6, P7, P8, P9, P10, P11 Consumer 3: (none) Consumer 4: (none) Consumer 5: (none) Consumer 6: (none)
Only 2 consumers are getting partitions. What's wrong?
Click to reveal analysis
-
Different group IDs: Consumers might be in different groupsBASH(3 lines)CodeLoading syntax highlighter...
-
Static membership mismatch: If using static membership, instance IDs might conflictJAVA(2 lines)CodeLoading syntax highlighter...
-
Partition assignment strategy: RangeAssignor can cause uneven distributionJAVA(7 lines)CodeLoading syntax highlighter...
-
Consumer not fully started: Some consumers might still be starting upBASH(3 lines)CodeLoading syntax highlighter...
group.id values. Verify all consumers use the exact same group ID:JAVA(2 lines)CodeLoading syntax highlighter...
π» Exercises
Exercise 1: Partition Distribution Analysis
Create a topic with 10 partitions. Send 10,000 messages with various key patterns and analyze distribution:
JAVA(6 lines)CodeLoading syntax highlighter...
Exercise 2: Hot Partition Simulation
Create a scenario with a hot partition and measure impact:
JAVA(4 lines)CodeLoading syntax highlighter...
Exercise 3: Rack Awareness Setup
Configure a 4-broker local cluster with rack awareness:
BASH(3 lines)CodeLoading syntax highlighter...
Exercise 4: Partition Reassignment
Practice safe reassignment:
BASH(5 lines)CodeLoading syntax highlighter...
Exercise 5: Custom Partitioner
Implement a partitioner that:
- Routes "priority" orders to partition 0
- Routes orders by region (EMEA β P1-3, APAC β P4-6, AMER β P7-9)
- Falls back to hash for unknown regions
π€ Interview Questions
Q1: How do you decide the number of partitions for a new topic?
-
Consumer parallelism: Partitions β₯ max consumers in any consumer group
If max consumers = 20, need at least 20 partitions -
Throughput requirements: ~10 MB/s per partition (varies)
100 MB/s needed β at least 10 partitions -
Ordering requirements: More partitions = less ordering
If you need all events ordered, use 1 partition (limits throughput) -
Growth: Increasing partitions later breaks key-based ordering
Better to over-provision initially: target Γ 2
partitions = max( target_throughput / throughput_per_partition, max_consumers ) Γ growth_factor
max(5, 10) Γ 2 = 20 partitions
Q2: What happens when you add partitions to an existing topic?
-
New messages with existing keys may go to different partitions:
Before: hash("user123") % 10 = 3 After: hash("user123") % 15 = 8Messages for the same key are now split across partitions.
-
Ordering guarantee is broken for existing keys: New messages for "user123" go to partition 8, but old messages are still in partition 3.
-
Consumers need rebalance: Consumer group will rebalance to assign new partitions.
-
Existing data stays in place: Old messages don't move.
- Topic doesn't use key-based partitioning
- Application doesn't require ordering
- You can tolerate a period of unordered processing
Q3: Explain the relationship between partitions, replication factor, and ISR.
RF=3 means each partition has 3 replicas (1 leader + 2 followers)
If RF=3 and one follower is slow: ISR={leader, follower1} (size 2)
min.insync.replicas=2, acks=all: - Producer waits for leader + 1 follower to acknowledge - If ISR drops to 1, producers get NotEnoughReplicas error - Data is durable on 2+ machines before ack
RF=3, min.insync.replicas=2 Normal: ISR={1,2,3} β Writes succeed (3 β₯ 2) 1 broker down: ISR={1,2} β Writes succeed (2 β₯ 2) 2 brokers down: ISR={1} β Writes FAIL (1 < 2)
Q4: How does rack awareness improve fault tolerance?
Rack A: [P0-leader, P0-follower, P0-follower] Rack B: [empty] Rack A power failure = ALL replicas lost = DATA LOSS
Rack A: [P0-leader, P0-follower] Rack B: [P0-follower] Rack A power failure = P0-follower in Rack B becomes leader = NO DATA LOSS
PROPERTIES(2 lines)CodeLoading syntax highlighter...
- RF should span at least 2 racks
- For RF=3, use 3 racks if possible
- Align with cloud availability zones
Q5: A consumer group has 6 consumers but some partitions are unassigned. What could be wrong?
-
Partition count < consumer count:
Topic has 4 partitions, 6 consumers Result: 2 consumers idle (partitions can't be shared) -
Different group IDs:JAVA(2 lines)CodeLoading syntax highlighter...
-
Static membership collision:JAVA(2 lines)CodeLoading syntax highlighter...
-
Assignment strategy issue: RangeAssignor with multiple topics can leave consumers idle.
-
Consumer failed health check: Consumer might be considered dead.
BASH(5 lines)CodeLoading syntax highlighter...
group.id (not actually the same group).π Summary & Key Takeaways
Partition Principles
- Unit of parallelism, ordering, and storage
- Same key β same partition β ordering guaranteed
- More partitions = more consumers possible
Sizing Guidelines
partitions = max(throughput_needs, consumer_count) Γ growth_factor
Replication
- RF=3 for production
- ISR tracks healthy replicas
min.insync.replicasdefines minimum for writes
Key Gotchas
- Adding partitions breaks key-based ordering
- Hot keys create hot partitions
- RF=1 means any broker failure loses data
π Quick Reference
BASH(20 lines)CodeLoading syntax highlighter...
π Review Schedule
- Day 1: Understand partition-key relationship
- Day 3: Practice calculating partition counts
- Day 7: Experiment with reassignment
- Day 14: Review rack awareness
- Day 30: Diagnose partition issues without notes
π Series Navigation
- Previous: Part 1 - Architecture & Storage Engine
- Next: Part 3 - Leaders, ISR & Fault Tolerance
- Index: Kafka Compendium Series