Devops

Leaders, ISR & Fault Tolerance

πŸ“‹ At a Glance

AspectDetails
Difficulty🟠 Advanced
PrerequisitesPart 2 (Partitions & Replication)
Key ConceptsLeader election, ISR, HW, LEO, min.insync.replicas
Time Investment32 minutes read + 45 minutes practice
PayoffUnderstand exactly when data can be lost and how to prevent it

🎯 What You'll Learn

After this article, you'll be able to:

  1. Explain leader election and what triggers it
  2. Understand ISR dynamics and replica.lag.time.max.ms
  3. Configure min.insync.replicas correctly
  4. Prevent data loss scenarios with proper acks settings
  5. Diagnose under-replicated partitions and ISR shrinkage

πŸ”₯ Production Story: The Unclean Election

The Setup: A financial services company ran a 5-broker Kafka cluster for trade events. Configuration: RF=3, min.insync.replicas=2, acks=all. They were confident no data could be lost.
The Incident: During a network partition, two brokers became isolated. The remaining three brokers elected new leaders.
The Symptoms:
Alert: 15,000 trade events missing!
Time window: 14:32:15 - 14:33:45 UTC
The Investigation:
BASH(4 lines)
Code
Loading syntax highlighter...
The Root Cause: They had unclean.leader.election.enable=true (an old default). When the network partition occurred:
BEFORE PARTITION:
Broker 1 (Leader, ISR): offset 1,000,000
Broker 2 (ISR):         offset 1,000,000
Broker 3 (ISR):         offset 999,985  (slightly behind, still in ISR)

NETWORK PARTITION (Brokers 1,2 isolated):
Remaining: Broker 3,4,5
Broker 3 had offset 999,985

UNCLEAN ELECTION:
Broker 3 elected leader (only available replica)
New leader offset: 999,985
Messages 999,986 - 1,000,000: LOST FOREVER

PARTITION HEALS:
Brokers 1,2 rejoin
They truncate to match new leader
15,000 messages gone
The Fix:
PROPERTIES(2 lines)
Code
Loading syntax highlighter...
Lesson Learned: With unclean leader election, Kafka prioritizes availability over consistency. For financial data, this is unacceptable. Disable it.

🧠 Mental Model: Leader, Followers, and ISR

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    LEADER, FOLLOWERS & ISR                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   Partition 0, RF=3                                             β”‚
β”‚                                                                 β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  LEADER (Broker 1)                                      β”‚   β”‚
β”‚   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚   β”‚
β”‚   β”‚  β”‚ Offset: ... 95   96   97   98   99   100  101  102  β”‚β”‚   β”‚
β”‚   β”‚  β”‚              [M] [M] [M] [M] [M] [M] [M] [M]        β”‚β”‚   β”‚
β”‚   β”‚  β”‚                                      ↑              β”‚β”‚   β”‚
β”‚   β”‚  β”‚                            LEO (Log End Offset)     β”‚β”‚   β”‚
β”‚   β”‚  β”‚                                 = 102               β”‚β”‚   β”‚
β”‚   β”‚  β”‚                                                     β”‚β”‚   β”‚
β”‚   β”‚  β”‚                               ↑                     β”‚β”‚   β”‚
β”‚   β”‚  β”‚                    HW (High Watermark) = 100        β”‚β”‚   β”‚
β”‚   β”‚  β”‚                    "Committed" - safe to consume    β”‚β”‚   β”‚
β”‚   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚         β”‚                                                       β”‚
β”‚         β”‚ Followers fetch from leader                           β”‚
β”‚         β–Ό                                                       β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  FOLLOWER (Broker 2) - IN ISR                           β”‚   β”‚
β”‚   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚   β”‚
β”‚   β”‚  β”‚ Offset: ... 95   96   97   98   99   100            β”‚β”‚   β”‚
β”‚   β”‚  β”‚              [M] [M] [M] [M] [M] [M]                β”‚β”‚   β”‚
β”‚   β”‚  β”‚                                   ↑ LEO = 100       β”‚β”‚   β”‚
β”‚   β”‚  β”‚            βœ“ Within replica.lag.time.max.ms         β”‚β”‚   β”‚
β”‚   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                 β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  FOLLOWER (Broker 3) - REMOVED FROM ISR                 β”‚   β”‚
β”‚   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚   β”‚
β”‚   β”‚  β”‚ Offset: ... 95   96   97   98                       β”‚β”‚   β”‚
β”‚   β”‚  β”‚              [M] [M] [M] [M]                        β”‚β”‚   β”‚
β”‚   β”‚  β”‚                           ↑ LEO = 98                β”‚β”‚   β”‚
β”‚   β”‚  β”‚            βœ— Too far behind (lag > threshold)       β”‚β”‚   β”‚
β”‚   β”‚  β”‚            βœ— Removed from ISR                       β”‚β”‚   β”‚
β”‚   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                 β”‚
β”‚   ISR = {Broker 1, Broker 2}                                    β”‚
β”‚                                                                 β”‚
β”‚   KEY CONCEPTS:                                                 β”‚
β”‚   β€’ LEO: Last offset written (leader's view)                    β”‚
β”‚   β€’ HW: Last offset replicated to ALL ISR members               β”‚
β”‚   β€’ Consumers can only read up to HW                            β”‚
β”‚   β€’ Messages between HW and LEO: "uncommitted"                  β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”¬ Deep Dive

1. Leader Election

Every partition has exactly one leader. Leaders handle all reads and writes.

When does leader election happen?
  1. Broker failure: Leader goes down, new leader elected from ISR
  2. Controlled shutdown: Leader migrates before broker stops
  3. Rebalance: Admin triggers preferred leader election
  4. Unclean election: No ISR available, elect from non-ISR (if enabled)
Election process:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    LEADER ELECTION FLOW                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   1. LEADER FAILURE DETECTED                                    β”‚
β”‚      Controller notices leader is unresponsive                  β”‚
β”‚      (via ZK session timeout or KRaft heartbeat)                β”‚
β”‚                                                                 β”‚
β”‚   2. SELECT NEW LEADER                                          β”‚
β”‚      Controller picks first replica from ISR                    β”‚
β”‚      (Ordering: prefer existing ISR, then by replica ID)        β”‚
β”‚                                                                 β”‚
β”‚   3. UPDATE METADATA                                            β”‚
β”‚      Controller updates cluster metadata                        β”‚
β”‚      New leader is authoritative                                β”‚
β”‚                                                                 β”‚
β”‚   4. NOTIFY CLIENTS                                             β”‚
β”‚      Producers/Consumers get new metadata                       β”‚
β”‚      Requests route to new leader                               β”‚
β”‚                                                                 β”‚
β”‚   Timeline: Typically < 1 second                                β”‚
β”‚   During election: Partition unavailable for writes             β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Preferred leader election:
BASH(15 lines)
Code
Loading syntax highlighter...

2. ISR (In-Sync Replicas) Deep Dive

ISR is the set of replicas that are "caught up" with the leader.

What determines "in sync"?
PROPERTIES(6 lines)
Code
Loading syntax highlighter...
ISR dynamics:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ISR SHRINKING AND EXPANDING                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   TIME 0: Normal operation                                      β”‚
β”‚   ISR = {Leader, Follower1, Follower2}                          β”‚
β”‚                                                                 β”‚
β”‚   TIME 1: Follower2 becomes slow (network issue)                β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  Leader:     offset 1000                                β”‚   β”‚
β”‚   β”‚  Follower1:  offset 998   (within lag threshold) βœ“      β”‚   β”‚
β”‚   β”‚  Follower2:  offset 850   (hasn't fetched in 35s) βœ—     β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚   Controller removes Follower2 from ISR                         β”‚
β”‚   ISR = {Leader, Follower1}                                     β”‚
β”‚                                                                 β”‚
β”‚   TIME 2: Follower2 recovers, starts catching up                β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚  Leader:     offset 1200                                 β”‚  β”‚
β”‚   β”‚  Follower1:  offset 1198                                 β”‚  β”‚
β”‚   β”‚  Follower2:  offset 1195  (catching up, fetched recently)β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚   Follower2 added back to ISR                                   β”‚
β”‚   ISR = {Leader, Follower1, Follower2}                          β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Monitoring ISR:
BASH(11 lines)
Code
Loading syntax highlighter...

3. High Watermark (HW) and Log End Offset (LEO)

These are crucial for understanding data visibility:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    HW vs LEO EXPLAINED                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   LEADER:                                                       β”‚
β”‚   [0][1][2][3][4][5][6][7][8][9][10][11][12]                    β”‚
β”‚                             ↑          ↑                        β”‚
β”‚                            HW=9       LEO=12                    β”‚
β”‚                                                                 β”‚
β”‚   FOLLOWER 1 (ISR):                                             β”‚
β”‚   [0][1][2][3][4][5][6][7][8][9][10]                            β”‚
β”‚                             ↑     ↑                             β”‚
β”‚                            HW=9  LEO=10                         β”‚
β”‚                                                                 β”‚
β”‚   FOLLOWER 2 (ISR):                                             β”‚
β”‚   [0][1][2][3][4][5][6][7][8][9]                                β”‚
β”‚                             ↑ ↑                                 β”‚
β”‚                           HW=LEO=9                              β”‚
β”‚                                                                 β”‚
β”‚   HW = minimum LEO across all ISR replicas                      β”‚
β”‚   HW = 9 (because Follower2's LEO is 9)                         β”‚
β”‚                                                                 β”‚
β”‚   VISIBILITY:                                                   β”‚
β”‚   β€’ Consumers can read: [0] to [9] (up to HW)                   β”‚
β”‚   β€’ Messages [10-12]: written but not yet "committed"           β”‚
β”‚   β€’ If leader fails before [10-12] replicate: DATA LOST         β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Why this matters for acks:
JAVA(11 lines)
Code
Loading syntax highlighter...

4. min.insync.replicas

This is your safety net:

PROPERTIES(6 lines)
Code
Loading syntax highlighter...
How it works:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            min.insync.replicas BEHAVIOR                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   Config: RF=3, min.insync.replicas=2, acks=all                 β”‚
β”‚                                                                 β”‚
β”‚   SCENARIO 1: All brokers healthy                               β”‚
β”‚   ISR = {B1, B2, B3}  (size = 3)                                β”‚
β”‚   3 >= 2 βœ“  Writes succeed                                      β”‚
β”‚                                                                 β”‚
β”‚   SCENARIO 2: One follower slow/down                            β”‚
β”‚   ISR = {B1, B2}  (size = 2)                                    β”‚
β”‚   2 >= 2 βœ“  Writes succeed                                      β”‚
β”‚                                                                 β”‚
β”‚   SCENARIO 3: Two followers down                                β”‚
β”‚   ISR = {B1}  (size = 1)                                        β”‚
β”‚   1 < 2 βœ—  Writes FAIL with NotEnoughReplicasException          β”‚
β”‚                                                                 β”‚
β”‚   This protects you:                                            β”‚
β”‚   β€’ Can't write to single replica that might fail               β”‚
β”‚   β€’ Guarantees data on at least 2 machines                      β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Recommended settings:
RFmin.insync.replicasFault Tolerance
32Survives 1 broker failure
53Survives 2 broker failures
Setting in Spring Kafka:
JAVA(8 lines)
Code
Loading syntax highlighter...

5. Unclean Leader Election

The most dangerous configuration in Kafka:

PROPERTIES(6 lines)
Code
Loading syntax highlighter...
When unclean election happens:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                UNCLEAN LEADER ELECTION                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   INITIAL STATE:                                                β”‚
β”‚   Broker 1 (Leader):  offset 1000                               β”‚
β”‚   Broker 2 (ISR):     offset 1000                               β”‚
β”‚   Broker 3 (not ISR): offset 950  (lagging)                     β”‚
β”‚                                                                 β”‚
β”‚   DISASTER: Brokers 1 and 2 fail simultaneously                 β”‚
β”‚                                                                 β”‚
β”‚   IF unclean.leader.election.enable=false:                      β”‚
β”‚   β€’ Partition becomes unavailable                               β”‚
β”‚   β€’ Writes fail until B1 or B2 recover                          β”‚
β”‚   β€’ No data loss                                                β”‚
β”‚                                                                 β”‚
β”‚   IF unclean.leader.election.enable=true:                       β”‚
β”‚   β€’ Broker 3 elected as leader                                  β”‚
β”‚   β€’ New leader at offset 950                                    β”‚
β”‚   β€’ Messages 951-1000: GONE FOREVER                             β”‚
β”‚   β€’ When B1/B2 recover, they truncate to 950                    β”‚
β”‚   β€’ Partition available, but data lost                          β”‚
β”‚                                                                 β”‚
β”‚   RECOMMENDATION: Always set to false for important data        β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

6. The Write Path: End-to-End

Understanding how writes flow helps diagnose issues:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    WRITE PATH (acks=all)                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚   1. PRODUCER SENDS                                             β”‚
β”‚      Producer β†’ Leader (Broker 1)                               β”‚
β”‚      Message appended to leader's log                           β”‚
β”‚      Leader LEO: 100 β†’ 101                                      β”‚
β”‚                                                                 β”‚
β”‚   2. FOLLOWERS FETCH                                            β”‚
β”‚      Followers poll leader for new data                         β”‚
β”‚      Broker 2 fetches, LEO: 100 β†’ 101                           β”‚
β”‚      Broker 3 fetches, LEO: 100 β†’ 101                           β”‚
β”‚                                                                 β”‚
β”‚   3. ACKNOWLEDGE REPLICATION                                    β”‚
β”‚      Followers send fetch response with their LEO               β”‚
β”‚      Leader updates HW when all ISR caught up                   β”‚
β”‚      HW: 100 β†’ 101                                              β”‚
β”‚                                                                 β”‚
β”‚   4. LEADER ACKNOWLEDGES PRODUCER                               β”‚
β”‚      Leader sends ack to producer                               β”‚
β”‚      Producer marks message as sent                             β”‚
β”‚                                                                 β”‚
β”‚   5. FOLLOWERS UPDATE HW                                        β”‚
β”‚      Next fetch request, followers learn new HW                 β”‚
β”‚      Followers update local HW: 100 β†’ 101                       β”‚
β”‚                                                                 β”‚
β”‚   TIMING:                                                       β”‚
β”‚   Steps 1-4: ~5-10ms (same datacenter)                          β”‚
β”‚   Bottleneck: Slowest ISR replica                               β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

7. Failure Scenarios and Recovery

Scenario 1: Leader fails, ISR available
BEFORE:
  Broker 1 (Leader, ISR): offset 1000
  Broker 2 (ISR):         offset 1000
  Broker 3 (ISR):         offset 998

BROKER 1 FAILS:
  Controller detects failure
  New leader: Broker 2 (first in ISR)
  Broker 3 catches up to 1000

RESULT:
  No data loss
  ~1 second unavailability
  Broker 3 now at offset 1000
Scenario 2: Leader fails with uncommitted data
BEFORE:
  Broker 1 (Leader): LEO=1005, HW=1000
  Broker 2 (ISR):    LEO=1000
  Broker 3 (ISR):    LEO=1000

  Messages 1001-1005: In leader only, not yet replicated

BROKER 1 FAILS:
  New leader: Broker 2
  Leader LEO becomes 1000
  HW remains 1000

BROKER 1 RECOVERS:
  Broker 1 truncates to HW=1000
  Messages 1001-1005: LOST

LESSON:
  With acks=1, this is possible
  With acks=all, messages 1001-1005 wouldn't be acked
Spring Kafka producer configuration for durability:
JAVA(23 lines)
Code
Loading syntax highlighter...

⚠️ Common Mistakes

Mistake 1: Using acks=1 for Critical Data

JAVA(6 lines)
Code
Loading syntax highlighter...

Mistake 2: min.insync.replicas = RF

PROPERTIES(5 lines)
Code
Loading syntax highlighter...

Mistake 3: Ignoring Under-Replicated Partitions

BASH(6 lines)
Code
Loading syntax highlighter...

Mistake 4: Setting replica.lag.time.max.ms Too Low

PROPERTIES(8 lines)
Code
Loading syntax highlighter...

Mistake 5: Enabling Unclean Leader Election

PROPERTIES(5 lines)
Code
Loading syntax highlighter...

πŸ› Debug This

You see this error in producer logs:

org.apache.kafka.common.errors.NotEnoughReplicasException:
Messages are rejected since there are fewer in-sync replicas than required.

Your configuration:

  • RF = 3
  • min.insync.replicas = 2
  • acks = all

All three brokers are running. What's wrong?

Click to reveal analysis
The error means: ISR size < min.insync.replicas for the partition you're writing to.
Investigation steps:
BASH(8 lines)
Code
Loading syntax highlighter...
Possible causes:
  1. Followers can't keep up: High produce rate, followers falling behind
    BASH(2 lines)
    Code
    Loading syntax highlighter...
  2. Network issues: Followers can't reach leader
    BASH(2 lines)
    Code
    Loading syntax highlighter...
  3. Disk issues: Followers' disks are slow
    BASH(2 lines)
    Code
    Loading syntax highlighter...
  4. GC pauses: Long GC pauses cause followers to be removed
    BASH(2 lines)
    Code
    Loading syntax highlighter...
Quick fix:
BASH(4 lines)
Code
Loading syntax highlighter...
Real fix: Identify why followers are out of ISR and fix the root cause.

πŸ’» Exercises

Exercise 1: ISR Observation

BASH(7 lines)
Code
Loading syntax highlighter...

Exercise 2: min.insync.replicas Testing

JAVA(5 lines)
Code
Loading syntax highlighter...

Exercise 3: Unclean Election Simulation

BASH(6 lines)
Code
Loading syntax highlighter...

Exercise 4: High Watermark Monitoring

BASH(4 lines)
Code
Loading syntax highlighter...

Exercise 5: Preferred Leader Election

BASH(5 lines)
Code
Loading syntax highlighter...

🎀 Interview Questions

Q1: What is the ISR and why is it important?

Answer: ISR (In-Sync Replicas) is the set of replicas that are caught up with the leader within replica.lag.time.max.ms.
Importance:
  1. Leader election: Only ISR members can become leader (with unclean election disabled). This ensures no data loss.
  2. Write durability: With acks=all, writes are only acknowledged when replicated to all ISR members.
  3. High watermark: HW advances only when all ISR replicas have the data. Consumers can only read up to HW.
ISR dynamics:
Replica falls behind > replica.lag.time.max.ms β†’ Removed from ISR
Replica catches up and fetches within threshold β†’ Added back to ISR
Monitoring: Under-replicated partitions (ISR < RF) indicate potential durability risk.

Q2: Explain the relationship between acks, min.insync.replicas, and data durability.

Answer: These settings work together to determine durability guarantees:
acks=0: No durability guarantee
        Producer doesn't wait for any acknowledgment
        Message may be lost before reaching broker

acks=1: Leader durability only
        Leader acknowledges before replication
        If leader fails immediately after ack: DATA LOSS

acks=all + min.insync.replicas=1:
        At least leader must acknowledge
        Same as acks=1 in practice

acks=all + min.insync.replicas=2:
        At least 2 replicas must have the data
        Survives 1 broker failure without data loss

acks=all + min.insync.replicas=N:
        At least N replicas must have the data
        If ISR < N, writes fail (availability trade-off)
Recommendation for critical data:
PROPERTIES(3 lines)
Code
Loading syntax highlighter...

This survives 1 broker failure without data loss or unavailability.


Q3: What is unclean leader election and when might you enable it?

Answer: Unclean leader election allows electing a leader from replicas that are NOT in the ISRβ€”meaning replicas that may be behind.
Risk: Data loss. The new leader may be missing messages that were acknowledged.
When to enable (rare):
  • Availability is more important than consistency
  • Log data that can be regenerated
  • You have other durability mechanisms (e.g., source system)
When to disable (default, recommended):
  • Financial transactions
  • Critical business data
  • Any data that cannot be recovered
The trade-off:
unclean.leader.election.enable=false:
  ISR exhausted β†’ Partition unavailable
  No data loss, but writes fail

unclean.leader.election.enable=true:
  ISR exhausted β†’ Non-ISR replica becomes leader
  Writes succeed, but may lose data

Q4: How does Kafka's high watermark (HW) work?

Answer: High watermark is the offset up to which consumers can read. It ensures consumers only see "committed" data.
Mechanics:
  1. Producer writes message to leader at offset N
  2. Leader's LEO (Log End Offset) advances to N
  3. Followers fetch and replicate message
  4. When ALL ISR replicas have the message, HW advances to N
  5. Consumers can now read offset N
Why it matters:
LEO = 100, HW = 95

Messages 96-100: Written but not fully replicated
If leader fails: Messages 96-100 may be lost
Consumers can't read them yet (can only read up to HW=95)

This is intentional: Consumers never see data that might be lost
Visibility guarantee: Consumers never see messages that could disappear on failure.

Q5: A partition has ISR={Leader}. What risks does this pose and how would you address it?

Answer: ISR size of 1 means no redundancyβ€”critical risk.
Risks:
  1. Data loss: If leader fails, data since last ISR sync is lost
  2. With min.insync.replicas=2: Writes fail (safer)
  3. With min.insync.replicas=1: Writes succeed but unprotected
Investigation:
BASH(8 lines)
Code
Loading syntax highlighter...
Remediation:
  1. Immediate: Fix follower issues so they rejoin ISR
  2. If broker dead: Reassign partition to healthy brokers
  3. Monitoring: Alert on ISR < RF before it becomes critical
Prevention:
PROPERTIES(5 lines)
Code
Loading syntax highlighter...

πŸ“ Summary & Key Takeaways

Key Concepts

ConceptDefinition
LeaderHandles all reads/writes for a partition
ISRReplicas caught up with leader
LEOLog End Offset - last message in log
HWHigh Watermark - last committed offset

Durability Configuration

PROPERTIES(8 lines)
Code
Loading syntax highlighter...

Failure Behavior

ScenarioClean ElectionUnclean Election
Leader fails, ISR availableNew leader from ISR, no data lossSame
All ISR failsPartition unavailableData loss possible

πŸ“‹ Quick Reference

BASH(13 lines)
Code
Loading syntax highlighter...

πŸ“… Review Schedule

  • Day 1: Understand ISR and HW concepts
  • Day 3: Practice ISR monitoring
  • Day 7: Test failure scenarios
  • Day 14: Review acks/min.insync.replicas interaction
  • Day 30: Explain durability guarantees without notes

πŸ“š Series Navigation