
Partitions & Replication

📋 At a Glance

Aspect            Details
Difficulty        🟡 Intermediate
Prerequisites     Part 1 (Architecture & Storage)
Key Concepts      Partitions, replication factor, rack awareness, reassignment
Time Investment   30 minutes read + 45 minutes practice
Payoff            Design topics that scale and survive failures

🎯 What You'll Learn

After this article, you'll be able to:

  1. Choose the right partition count for your workload
  2. Understand replication and how data is distributed across brokers
  3. Configure rack awareness for datacenter fault tolerance
  4. Perform partition reassignment safely in production
  5. Diagnose partition imbalance and hot partition issues

🔥 Production Story: The Hot Partition Nightmare

The Setup: An order processing system with topic orders (10 partitions, 3x replication). During a Black Friday sale, the system collapsed.
The Symptoms:
Consumer lag:
  Partition 0: 50 messages
  Partition 1: 45 messages
  Partition 2: 52 messages
  Partition 3: 2,500,000 messages  ← WTF!
  Partition 4: 48 messages
  ...

One partition had 50,000x the lag of others!

The Investigation:
[Java listing, 2 lines; not captured in this copy]

Looks reasonable: the key is the customer ID. But wait...

[SQL listing, 11 lines; not captured in this copy]
The Root Cause: Guest checkout used customer_id = "GUEST" for all anonymous users. Since partitioning is hash(key) % partitions, all 2.5 million guest orders went to the same partition!
The Fix:
[Java listing, 6 lines; not captured in this copy]
Lesson Learned: Partition key design is critical. A single hot key can bottleneck your entire system.
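
One common remedy for a shared key like "GUEST" is to key anonymous orders by something unique instead. The sketch below is a self-contained illustration, not the code from the incident: `chooseKey`, `distribute`, and the choice of the order ID as the fallback key are assumptions, and `String.hashCode()` stands in for the murmur2 hash that Kafka actually uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: shows how a shared "GUEST" key concentrates load on one partition. */
final class GuestKeyDemo {

    /** Stand-in for hash(key) % partitions (Kafka itself hashes the key bytes with murmur2). */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /** Hypothetical key choice: guests are keyed by order ID instead of the shared "GUEST" value. */
    static String chooseKey(String customerId, String orderId) {
        return "GUEST".equals(customerId) ? "guest-" + orderId : customerId;
    }

    /** Counts how many of n guest orders land on each partition. */
    static Map<Integer, Integer> distribute(int n, int numPartitions, boolean saltGuests) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String key = saltGuests ? chooseKey("GUEST", "order-" + i) : "GUEST";
            counts.merge(partitionFor(key, numPartitions), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("shared key:    " + distribute(10_000, 10, false));
        System.out.println("per-order key: " + distribute(10_000, 10, true));
    }
}
```

The trade-off is that guest orders no longer share a partition, so there is no ordering across them. That is usually acceptable because anonymous orders have no common identity to order by.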

🧠 Mental Model: Partitions and Replication

┌─────────────────────────────────────────────────────────────────┐
│                TOPIC: orders (6 partitions, RF=3)               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Broker 1          Broker 2          Broker 3                  │
│   ┌─────────┐       ┌─────────┐       ┌─────────┐               │
│   │ P0 (L)  │       │ P0 (F)  │       │ P0 (F)  │               │
│   │ P1 (F)  │       │ P1 (L)  │       │ P1 (F)  │               │
│   │ P2 (F)  │       │ P2 (F)  │       │ P2 (L)  │               │
│   │ P3 (L)  │       │ P3 (F)  │       │ P3 (F)  │               │
│   │ P4 (F)  │       │ P4 (L)  │       │ P4 (F)  │               │
│   │ P5 (F)  │       │ P5 (F)  │       │ P5 (L)  │               │
│   └─────────┘       └─────────┘       └─────────┘               │
│                                                                 │
│   L = Leader (handles reads/writes)                             │
│   F = Follower (replicates from leader)                         │
│                                                                 │
│   ───────────────────────────────────────────────────────────── │
│                                                                 │
│   PRODUCER PERSPECTIVE:                                         │
│                                                                 │
│   Producer                                                      │
│      │                                                          │
│      │ Message with key="user123"                               │
│      │                                                          │
│      ▼                                                          │
│   Partitioner: hash("user123") % 6 = 2                          │
│      │                                                          │
│      ▼                                                          │
│   Send to Broker 3 (leader of P2)                               │
│                                                                 │
│   ───────────────────────────────────────────────────────────── │
│                                                                 │
│   CONSUMER PERSPECTIVE:                                         │
│                                                                 │
│   Consumer Group: order-processors (3 consumers)                │
│                                                                 │
│   Consumer 1         Consumer 2         Consumer 3              │
│   ┌─────────┐       ┌─────────┐       ┌─────────┐               │
│   │ P0, P1  │       │ P2, P3  │       │ P4, P5  │               │
│   └─────────┘       └─────────┘       └─────────┘               │
│                                                                 │
│   Each partition assigned to exactly one consumer in group      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

🔬 Deep Dive

1. Understanding Partitions

A partition is the unit of:

  • Parallelism: More partitions = more consumers can process in parallel
  • Ordering: Messages within a partition are strictly ordered
  • Storage: Each partition is a separate log on disk
Partition assignment algorithm (default):
[Java listing, 14 lines; not captured in this copy]
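
As a rough model of that selection logic: an explicitly set partition wins, a keyed record is hashed, and a record without a key is spread across partitions. The sketch below is a simplification under stated assumptions, not Kafka's source: `choose` is a hypothetical helper, `Arrays.hashCode` stands in for murmur2, and plain round-robin stands in for the "sticky" batching that recent clients apply to keyless records.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of how a producer picks a partition; not Kafka's actual code. */
final class PartitionChoiceSketch {

    private static final AtomicInteger counter = new AtomicInteger();

    /**
     * @param explicitPartition partition set on the record, or null
     * @param key               record key, or null
     */
    static int choose(Integer explicitPartition, String key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                     // 1. the caller decided
        }
        if (key != null) {
            byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
            int hash = Arrays.hashCode(bytes);            // stand-in for murmur2
            return (hash & 0x7fffffff) % numPartitions;   // 2. same key -> same partition
        }
        // 3. no key: spread records (round-robin here; real clients batch "stickily")
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + choose(4, "user123", 6));
        System.out.println("keyed, stable: " + (choose(null, "user123", 6) == choose(null, "user123", 6)));
        System.out.println("keyless: " + choose(null, null, 6) + ", " + choose(null, null, 6));
    }
}
```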
Ordering guarantees:
┌─────────────────────────────────────────────────────────────────┐
│                    ORDERING GUARANTEES                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   WITHIN A PARTITION: Total ordering guaranteed                 │
│                                                                 │
│   Partition 0: [M1] → [M2] → [M3] → [M4]                        │
│                Always consumed in this order                    │
│                                                                 │
│   ───────────────────────────────────────────────────────────── │
│                                                                 │
│   ACROSS PARTITIONS: No ordering guarantee                      │
│                                                                 │
│   P0: [M1] → [M3]                                               │
│   P1: [M2] → [M4]                                               │
│                                                                 │
│   Consumer might see: M2, M1, M4, M3 (any interleaving)         │
│                                                                 │
│   ───────────────────────────────────────────────────────────── │
│                                                                 │
│   IMPLICATION: If you need order, use the same key!             │
│                                                                 │
│   // All orders for user123 go to same partition                │
│   kafkaTemplate.send("orders", "user123", orderEvent);          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

2. How Many Partitions?

The formula (starting point):
Target Partitions = max(
    Throughput / Max Throughput per Partition,
    Consumer Count in Largest Consumer Group
)
Practical guidelines:
Factor              Guidance
Throughput          ~10 MB/s or 10K msg/s per partition (varies by message size)
Consumers           At least as many partitions as max consumers
Over-partitioning   More partitions = more overhead, but easier to scale later
Rule of thumb       Start with # brokers × 10 for high-throughput topics
Example calculation:
Requirements:
- Expected throughput: 100K messages/sec
- Message size: 1KB
- Max consumers in any group: 20

Calculations:
- Data throughput: 100K × 1KB = 100 MB/s
- Partitions for throughput: 100 MB/s ÷ 10 MB/s = 10
- Partitions for consumers: 20

Result: max(10, 20) = 20 partitions minimum
Recommendation: 24-30 partitions (room to grow)
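
The calculation above can be written as a small helper. This is only a sketch of the starting-point formula; `minimumPartitions` and `withHeadroom` are hypothetical names, and the 10 MB/s per-partition figure is the article's rule of thumb rather than a measured value.

```java
/** Back-of-the-envelope partition sizing using the starting-point formula. */
final class PartitionSizing {

    /** max(ceil(throughput / perPartition), consumers). Both throughputs use one unit, e.g. MB/s. */
    static int minimumPartitions(double targetThroughput, double perPartitionThroughput, int maxConsumers) {
        int forThroughput = (int) Math.ceil(targetThroughput / perPartitionThroughput);
        return Math.max(forThroughput, maxConsumers);
    }

    /** Adds headroom for growth, rounding up to a whole partition. */
    static int withHeadroom(int minimum, double growthFactor) {
        return (int) Math.ceil(minimum * growthFactor);
    }

    public static void main(String[] args) {
        double throughputMBps = 100_000 * 1.0 / 1000;   // 100K msg/s x 1 KB = 100 MB/s
        int min = minimumPartitions(throughputMBps, 10.0, 20);
        System.out.println("minimum = " + min);
        System.out.println("with 1.5x headroom = " + withHeadroom(min, 1.5));
    }
}
```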
Why not just use 1000 partitions?
┌─────────────────────────────────────────────────────────────────┐
│              PARTITION COUNT TRADE-OFFS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   TOO FEW PARTITIONS                TOO MANY PARTITIONS         │
│   ─────────────────                 ──────────────────          │
│   • Limited parallelism             • More file handles         │
│   • Can't add more consumers        • More memory per broker    │
│   • Hot partitions more likely      • Longer leader election    │
│   • Hard to scale later             • Slower rebalancing        │
│                                     • More ZK/KRaft metadata    │
│                                                                 │
│   SWEET SPOT                                                    │
│   ──────────                                                    │
│   • 2-3x expected consumer count                                │
│   • Allows for growth                                           │
│   • Manageable overhead                                         │
│   • Typically: dozens to low hundreds per topic                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Can you change partition count later?
[Bash listing, 10 lines; not captured in this copy]

3. Replication Deep Dive

Replication factor determines how many copies of each partition exist:
┌─────────────────────────────────────────────────────────────────┐
│                    REPLICATION MECHANICS                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Topic: payments, Partition 0, RF=3                            │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Broker 1 (Leader)                                      │   │
│   │  ┌─────────────────────────────────────────────────────┐│   │
│   │  │ Offset: 0    1    2    3    4    5    6    7        ││   │
│   │  │         [M0] [M1] [M2] [M3] [M4] [M5] [M6] [M7]     ││   │
│   │  │                                              ↑      ││   │
│   │  │                               High Watermark (HW)   ││   │
│   │  │                               (M0-M6 committed)     ││   │
│   │  └─────────────────────────────────────────────────────┘│   │
│   └─────────────────────────────────────────────────────────┘   │
│          │                                                      │
│          │ Fetch requests                                       │
│          ▼                                                      │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Broker 2 (Follower, ISR)                               │   │
│   │  ┌─────────────────────────────────────────────────────┐│   │
│   │  │ Offset: 0    1    2    3    4    5    6    7        ││   │
│   │  │         [M0] [M1] [M2] [M3] [M4] [M5] [M6] [M7]     ││   │
│   │  │                                              ↑      ││   │
│   │  │                                     Caught up!      ││   │
│   │  └─────────────────────────────────────────────────────┘│   │
│   └─────────────────────────────────────────────────────────┘   │
│          │                                                      │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Broker 3 (Follower, ISR)                               │   │
│   │  ┌─────────────────────────────────────────────────────┐│   │
│   │  │ Offset: 0    1    2    3    4    5    6             ││   │
│   │  │         [M0] [M1] [M2] [M3] [M4] [M5] [M6]          ││   │
│   │  │                                        ↑            ││   │
│   │  │                             1 message behind        ││   │
│   │  │                             (still in ISR)          ││   │
│   │  └─────────────────────────────────────────────────────┘│   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   ISR (In-Sync Replicas) = {Broker1, Broker2, Broker3}          │
│   All replicas within replica.lag.time.max.ms are "in sync"     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
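
The high watermark in the diagram follows from one rule: it is the smallest log end offset (the next offset to be written) among the in-sync replicas, and consumers can only read below it. The sketch below is a toy model of that rule with hypothetical names (`highWatermark`, a map from broker ID to log end offset), not broker code.

```java
import java.util.Map;

/** Toy model: the high watermark is the smallest log end offset among in-sync replicas. */
final class HighWatermarkSketch {

    /** @param logEndOffsets next offset to be written, keyed by in-sync broker id */
    static long highWatermark(Map<Integer, Long> logEndOffsets) {
        return logEndOffsets.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("ISR must not be empty"));
    }

    public static void main(String[] args) {
        // Figures from the diagram: brokers 1 and 2 hold M0..M7, broker 3 holds M0..M6.
        Map<Integer, Long> isr = Map.of(1, 8L, 2, 8L, 3, 7L);
        long hw = highWatermark(isr);
        System.out.println("HW = " + hw + ", consumers can read offsets 0.." + (hw - 1));
    }
}
```

Once broker 3 fetches M7, its log end offset becomes 8 and the high watermark advances with it.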
Replication factor recommendations:
Environment             RF    Reason
Development             1     No redundancy needed
Production (standard)   3     Data survives 2 broker failures (acks=all writes with min.insync.replicas=2 survive 1)
Critical data           3-5   Extra safety margin
Key configuration:
[Properties listing, 8 lines; not captured in this copy]

4. Rack Awareness

Without rack awareness, all replicas might be on the same rack:

┌─────────────────────────────────────────────────────────────────┐
│                    WITHOUT RACK AWARENESS                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Rack A                          Rack B                        │
│   ┌─────────────────────────┐    ┌─────────────────────────┐    │
│   │  Broker 1               │    │  Broker 4               │    │
│   │  └── P0 (Leader)        │    │  (no P0 replica)        │    │
│   ├─────────────────────────┤    └─────────────────────────┘    │
│   │  Broker 2               │                                   │
│   │  └── P0 (Follower)      │                                   │
│   ├─────────────────────────┤                                   │
│   │  Broker 3               │                                   │
│   │  └── P0 (Follower)      │                                   │
│   └─────────────────────────┘                                   │
│                                                                 │
│   ⚠️ If Rack A loses power, ALL replicas of P0 are lost!        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

With rack awareness:

┌─────────────────────────────────────────────────────────────────┐
│                    WITH RACK AWARENESS                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Rack A                          Rack B                        │
│   ┌─────────────────────────┐    ┌─────────────────────────┐    │
│   │  Broker 1               │    │  Broker 4               │    │
│   │  └── P0 (Leader)        │    │  └── P0 (Follower)      │    │
│   ├─────────────────────────┤    └─────────────────────────┘    │
│   │  Broker 2               │                                   │
│   │  └── P0 (Follower)      │                                   │
│   ├─────────────────────────┤                                   │
│   │  Broker 3               │                                   │
│   │  (balanced elsewhere)   │                                   │
│   └─────────────────────────┘                                   │
│                                                                 │
│   ✓ Replicas spread across racks                                │
│   ✓ Rack A failure: P0 still available from Rack B              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Configuration:
[Properties listing, 5 lines; not captured in this copy]
Creating rack-aware topics:
[Bash listing, 6 lines; not captured in this copy]
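
To build intuition for what rack-aware placement does, the sketch below interleaves brokers rack by rack and then assigns each partition's replicas to consecutive brokers in that order. It is a deliberately simplified illustration with a hypothetical two-rack, four-broker layout; Kafka's real assignment logic is more involved, and this version only spreads replicas well when racks hold equal numbers of brokers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified rack-aware placement; not Kafka's actual algorithm. */
final class RackAwarePlacementSketch {

    /** Interleaves brokers rack by rack, e.g. a:[1,2], b:[3,4] -> [1,3,2,4]. */
    static List<Integer> interleave(Map<String, List<Integer>> brokersByRack) {
        List<Integer> order = new ArrayList<>();
        Map<String, List<Integer>> sorted = new TreeMap<>(brokersByRack);
        boolean added = true;
        for (int i = 0; added; i++) {
            added = false;
            for (List<Integer> brokers : sorted.values()) {
                if (i < brokers.size()) {
                    order.add(brokers.get(i));
                    added = true;
                }
            }
        }
        return order;
    }

    /** Replicas for one partition: consecutive brokers in the interleaved order (RF <= broker count). */
    static List<Integer> replicasFor(int partition, int replicationFactor, List<Integer> order) {
        List<Integer> replicas = new ArrayList<>();
        for (int r = 0; r < replicationFactor; r++) {
            replicas.add(order.get((partition + r) % order.size()));
        }
        return replicas;
    }

    /** Number of distinct racks covered by a replica list. */
    static int racksCovered(List<Integer> replicas, Map<String, List<Integer>> brokersByRack) {
        Set<String> racks = new HashSet<>();
        brokersByRack.forEach((rack, brokers) -> {
            for (int b : replicas) {
                if (brokers.contains(b)) {
                    racks.add(rack);
                }
            }
        });
        return racks.size();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> racks = Map.of("rack-a", List.of(1, 2), "rack-b", List.of(3, 4));
        List<Integer> order = interleave(racks);
        for (int p = 0; p < 4; p++) {
            List<Integer> replicas = replicasFor(p, 3, order);
            System.out.println("P" + p + " -> " + replicas + " racks=" + racksCovered(replicas, racks));
        }
    }
}
```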

5. Partition Reassignment

When you add brokers or rebalance load, you need to move partitions:

[Bash listing, 28 lines; not captured in this copy]
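
The reassignment tool works from a JSON plan that lists, per partition, the brokers that should hold its replicas. The sketch below builds such a plan by hand; `toJson` is a hypothetical helper, and the topic name and broker IDs are made up for illustration. The first broker in each `replicas` list is the preferred leader.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Builds the JSON plan format that kafka-reassign-partitions.sh reads. */
final class ReassignmentPlanSketch {

    static String toJson(String topic, Map<Integer, List<Integer>> replicasByPartition) {
        StringJoiner entries = new StringJoiner(",");
        // TreeMap gives a stable, partition-ordered plan.
        new TreeMap<>(replicasByPartition).forEach((partition, replicas) -> {
            StringJoiner ids = new StringJoiner(",", "[", "]");
            replicas.forEach(id -> ids.add(String.valueOf(id)));
            entries.add("{\"topic\":\"" + topic + "\",\"partition\":" + partition
                    + ",\"replicas\":" + ids + "}");
        });
        return "{\"version\":1,\"partitions\":[" + entries + "]}";
    }

    public static void main(String[] args) {
        // Hypothetical move: place one replica of each partition on a newly added broker 4.
        System.out.println(toJson("orders", Map.of(0, List.of(1, 2, 4), 1, List.of(2, 3, 4))));
    }
}
```

The output would typically be saved to a file and passed to `kafka-reassign-partitions.sh` with `--reassignment-json-file` and `--execute`, ideally together with `--throttle`.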
Throttling reassignment (crucial for production):
[Bash listing, 13 lines; not captured in this copy]

6. Spring Kafka: Partition Configuration

Creating topics programmatically:
[Java listing, 20 lines; not captured in this copy]
Custom partitioner:
[Java listing, 42 lines; not captured in this copy]
Sending to specific partition:
[Java listing, 20 lines; not captured in this copy]

7. Diagnosing Partition Issues

Check partition distribution:
[Bash listing, 13 lines; not captured in this copy]
Check consumer lag per partition:
[Bash listing, 8 lines; not captured in this copy]
Diagnose hot partition:
[Java listing, 25 lines; not captured in this copy]
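
One simple way to spot a hot partition from per-partition lag figures is to compare each partition with the median. The sketch below is an assumption-laden illustration, not a standard tool: `hotPartitions` and the 10x threshold are hypothetical, and the input array is expected to hold the lag of partition `i` at index `i`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Flags partitions whose lag is far above the median lag of the topic. */
final class HotPartitionDetector {

    /** Returns partition indexes with lag > factor x median, where lags[i] is the lag of partition i. */
    static List<Integer> hotPartitions(long[] lags, double factor) {
        List<Integer> hot = new ArrayList<>();
        if (lags.length == 0) {
            return hot;
        }
        long[] sorted = lags.clone();
        Arrays.sort(sorted);
        double median = sorted.length % 2 == 1
                ? sorted[sorted.length / 2]
                : (sorted[sorted.length / 2 - 1] + sorted[sorted.length / 2]) / 2.0;
        for (int p = 0; p < lags.length; p++) {
            // Math.max avoids flagging everything when the median lag is zero.
            if (lags[p] > factor * Math.max(median, 1.0)) {
                hot.add(p);
            }
        }
        return hot;
    }

    public static void main(String[] args) {
        long[] lags = {50, 45, 52, 2_500_000, 48};   // figures from the production story
        System.out.println("hot partitions: " + hotPartitions(lags, 10.0));
    }
}
```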

⚠️ Common Mistakes

Mistake 1: Using High-Cardinality Keys Incorrectly

[Java listing, 12 lines; not captured in this copy]

Mistake 2: Wrong Partition Count

[Java listing, 14 lines; not captured in this copy]

Mistake 3: Replication Factor = 1 in Production

[Properties listing, 8 lines; not captured in this copy]

Mistake 4: Increasing Partitions Without Understanding Impact

[Bash listing, 8 lines; not captured in this copy]

Mistake 5: Not Throttling Reassignment

[Bash listing, 9 lines; not captured in this copy]

🐛 Debug This

You've deployed 6 consumers for a topic with 12 partitions. Expected: 2 partitions each. Actual:

Consumer 1: P0, P1, P2, P3, P4, P5
Consumer 2: P6, P7, P8, P9, P10, P11
Consumer 3: (none)
Consumer 4: (none)
Consumer 5: (none)
Consumer 6: (none)

Only 2 consumers are getting partitions. What's wrong?

Analysis:
Possible causes:
  1. Different group IDs: Consumers might be in different groups
    [Bash listing, 3 lines; not captured in this copy]
  2. Static membership mismatch: If using static membership, instance IDs might conflict
    [Java listing, 2 lines; not captured in this copy]
  3. Partition assignment strategy: RangeAssignor can cause uneven distribution
    [Java listing, 7 lines; not captured in this copy]
  4. Consumer not fully started: Some consumers might still be starting up
    [Bash listing, 3 lines; not captured in this copy]
Most likely: Different group.id values. Verify all consumers use the exact same group ID:
[Java listing, 2 lines; not captured in this copy]
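
It helps to compare the observed assignment with what an even split would look like. The sketch below is a simplified, single-topic model of range-style assignment (contiguous blocks, with any remainder going to the first consumers); `assign` is a hypothetical helper, not the client's assignor.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified per-topic range assignment, to show what an even split looks like. */
final class RangeAssignmentSketch {

    /** Returns, for each consumer index, the partitions of one topic it would own. */
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers;   // the first `extra` consumers get one more
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("6 members: " + assign(12, 6));
        System.out.println("2 members: " + assign(12, 2));
    }
}
```

The observed split (P0-P5 and P6-P11) is exactly what a two-member group produces, which is consistent with only two of the six consumers actually belonging to the group.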

💻 Exercises

Exercise 1: Partition Distribution Analysis

Create a topic with 10 partitions. Send 10,000 messages with various key patterns and analyze distribution:

[Java listing, 6 lines; not captured in this copy]

Exercise 2: Hot Partition Simulation

Create a scenario with a hot partition and measure impact:

[Java listing, 4 lines; not captured in this copy]

Exercise 3: Rack Awareness Setup

Configure a 4-broker local cluster with rack awareness:

[Bash listing, 3 lines; not captured in this copy]

Exercise 4: Partition Reassignment

Practice safe reassignment:

[Bash listing, 5 lines; not captured in this copy]

Exercise 5: Custom Partitioner

Implement a partitioner that:

  • Routes "priority" orders to partition 0
  • Routes orders by region (EMEA → P1-3, APAC → P4-6, AMER → P7-9)
  • Falls back to hash for unknown regions

🎀 Interview Questions

Q1: How do you decide the number of partitions for a new topic?

Answer: Consider these factors:
  1. Consumer parallelism: Partitions ≥ max consumers in any consumer group
    If max consumers = 20, need at least 20 partitions
    
  2. Throughput requirements: ~10 MB/s per partition (varies)
    100 MB/s needed → at least 10 partitions
    
  3. Ordering requirements: Ordering is guaranteed only within a partition
    If every event must be globally ordered, use 1 partition (limits throughput)
    
  4. Growth: Increasing partitions later breaks key-based ordering
    Better to over-provision initially: target × 2
    
Formula:
partitions = max(
    target_throughput / throughput_per_partition,
    max_consumers
) × growth_factor
Example: 50 MB/s, 10 consumers, 2x growth factor
max(5, 10) × 2 = 20 partitions

Q2: What happens when you add partitions to an existing topic?

Answer: Adding partitions has these effects:
  1. New messages with existing keys may go to different partitions:
    Before: hash("user123") % 10 = 3
    After:  hash("user123") % 15 = 8
    

    Messages for the same key are now split across partitions.

  2. Ordering guarantee is broken for existing keys: New messages for "user123" go to partition 8, but old messages are still in partition 3.
  3. Consumers need rebalance: Consumer group will rebalance to assign new partitions.
  4. Existing data stays in place: Old messages don't move.
When it's safe to add partitions:
  • Topic doesn't use key-based partitioning
  • Application doesn't require ordering
  • You can tolerate a period of unordered processing
Alternative: Create new topic, migrate consumers, re-key if needed.
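
The effect on existing keys is easy to measure with a small simulation. The sketch below counts how many synthetic keys map to a different partition after a topic grows; the key names and helper names are hypothetical, and `String.hashCode()` again stands in for Kafka's murmur2 hash, so the exact count will differ from a real cluster.

```java
/** Counts how many keys change partition when a topic's partition count changes. */
final class RepartitionImpact {

    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;   // stand-in for murmur2
    }

    /** Number of keys "user0".."user{n-1}" whose partition differs between the two counts. */
    static int movedKeys(int n, int before, int after) {
        int moved = 0;
        for (int i = 0; i < n; i++) {
            String key = "user" + i;
            if (partitionFor(key, before) != partitionFor(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println(movedKeys(n, 10, 15) + " of " + n + " keys map to a different partition");
    }
}
```

Every key that moves has its history split across two partitions, which is why per-key ordering can no longer be assumed during the transition.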

Q3: Explain the relationship between partitions, replication factor, and ISR.

Answer:
Partitions: Independent ordered logs within a topic. Provide parallelism.
Replication Factor (RF): How many copies of each partition exist.
RF=3 means each partition has 3 replicas (1 leader + 2 followers)
ISR (In-Sync Replicas): Subset of replicas that are caught up with the leader.
If RF=3 and one follower is slow: ISR={leader, follower1} (size 2)
How they interact:
min.insync.replicas=2, acks=all:
- Leader acknowledges once every replica currently in the ISR has the write
- The ISR must contain at least 2 replicas (leader + 1 follower)
- If ISR drops to 1, producers get a NotEnoughReplicas error
- Data is durable on 2+ machines before ack
Example scenario:
RF=3, min.insync.replicas=2

Normal:         ISR={1,2,3} → Writes succeed (3 ≥ 2)
1 broker down:  ISR={1,2}   → Writes succeed (2 ≥ 2)
2 brokers down: ISR={1}     → Writes FAIL (1 < 2)
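
The scenario table reduces to a single comparison between the current ISR size and `min.insync.replicas`. The sketch below models that check for `acks=all` producers only; the class and method names are hypothetical.

```java
/** Models the check applied to acks=all writes against min.insync.replicas. */
final class IsrWriteCheck {

    /** With acks=all, a write is accepted only if the ISR holds at least minInsyncReplicas replicas. */
    static boolean writeAccepted(int isrSize, int minInsyncReplicas) {
        return isrSize >= minInsyncReplicas;
    }

    /** Broker failures tolerated before acks=all writes start being rejected. */
    static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        return Math.max(replicationFactor - minInsyncReplicas, 0);
    }

    public static void main(String[] args) {
        int minIsr = 2;
        for (int isr = 3; isr >= 1; isr--) {
            System.out.println("ISR size " + isr + " -> "
                    + (writeAccepted(isr, minIsr) ? "write succeeds" : "write fails"));
        }
        System.out.println("RF=3, min.insync.replicas=2 tolerates "
                + tolerableFailures(3, minIsr) + " broker failure(s)");
    }
}
```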

Q4: How does rack awareness improve fault tolerance?

Answer: Rack awareness ensures replicas are spread across failure domains:
Without rack awareness:
Rack A: [P0-leader, P0-follower, P0-follower]
Rack B: [empty]

Rack A power failure = ALL replicas lost = DATA LOSS
With rack awareness:
Rack A: [P0-leader, P0-follower]
Rack B: [P0-follower]

Rack A power failure = in-sync P0-follower in Rack B becomes leader = NO DATA LOSS
Configuration:
[Properties listing, 2 lines; not captured in this copy]
Best practices:
  • RF should span at least 2 racks
  • For RF=3, use 3 racks if possible
  • Align with cloud availability zones

Q5: A consumer group has 6 consumers but some of them are assigned no partitions. What could be wrong?

Answer: Several possibilities:
  1. Partition count < consumer count:
    Topic has 4 partitions, 6 consumers
    Result: 2 consumers idle (partitions can't be shared)
    
  2. Different group IDs:
    [Java listing, 2 lines; not captured in this copy]
  3. Static membership collision:
    [Java listing, 2 lines; not captured in this copy]
  4. Assignment strategy issue: RangeAssignor with multiple topics can leave consumers idle.
  5. Consumer failed health check: Consumer might be considered dead.
Diagnosis:
[Bash listing, 5 lines; not captured in this copy]
Most common cause: Misconfigured group.id (not actually the same group).

📝 Summary & Key Takeaways

Partition Principles

  • Unit of parallelism, ordering, and storage
  • Same key → same partition → ordering guaranteed
  • More partitions = more consumers possible

Sizing Guidelines

partitions = max(throughput_needs, consumer_count) × growth_factor

Replication

  • RF=3 for production
  • ISR tracks healthy replicas
  • min.insync.replicas defines minimum for writes

Key Gotchas

  • Adding partitions breaks key-based ordering
  • Hot keys create hot partitions
  • RF=1 means a single broker failure takes its partitions offline and can lose data

📋 Quick Reference

[Bash listing, 20 lines; not captured in this copy]

📅 Review Schedule

  • Day 1: Understand partition-key relationship
  • Day 3: Practice calculating partition counts
  • Day 7: Experiment with reassignment
  • Day 14: Review rack awareness
  • Day 30: Diagnose partition issues without notes
