Raft Consensus

📋 Quick Reference

Property	Value
Type	Distributed consensus algorithm
Purpose	Replicate state across cluster nodes
Fault Tolerance	Survives ⌊(n-1)/2⌋ failures (n nodes)
Leader Election	O(election timeout)
Log Replication	About one round trip to a majority per entry
Best For	Distributed databases, coordination services

One-liner: Understandable consensus algorithm where a leader replicates commands to followers, ensuring all nodes agree on the same sequence of operations.

🎮 Interactive Visualizer

Watch how Raft achieves consensus through leader election and log replication:

Initial state: 5 nodes start as followers in term 0.

Node	State	Term	Voted For	Log	Commit
N1	follower	0	-	0 entries	-
N2	follower	0	-	0 entries	-
N3	follower	0	-	0 entries	-
N4	follower	0	-	0 entries	-
N5	follower	0	-	0 entries	-

Used by: etcd (Kubernetes), CockroachDB, TiKV, Consul, HashiCorp Vault
vs Paxos: Same guarantees, but much easier to understand and implement

Purpose: Replicated state machine consensus for distributed systems
Safety: At most one leader per term, committed entries never lost
Liveness: System makes progress if a majority of nodes are alive

Election Safety: At most one leader per term
Log Matching: Same index+term = same entry

Pseudocode

# Raft Consensus Algorithm

# State: follower → candidate → leader

# On election timeout (follower):
  state = candidate
  term += 1
  votedFor = self
  send RequestVote to all

# On receiving votes (candidate):
  if votes > N/2:
    state = leader
    send heartbeats

# Leader operations:
  append entry to log
  replicate to followers
  if majority ack:
    commit entry

Try these operations:

Watch leader election (who gets elected?)
Send a command - see log replication
Kill the leader - observe re-election
See how majority quorum ensures consistency

🔧 How It Works

Node States

Every node is in one of three states:

┌──────────┐   timeout   ┌───────────┐   wins election   ┌────────┐
│ Follower │ ──────────> │ Candidate │ ────────────────> │ Leader │
└──────────┘             └───────────┘                   └────────┘
     ^                         │                              │
     │     higher term         │    discovers higher term     │
     └─────────────────────────┴──────────────────────────────┘

Follower: Passive, responds to Leader/Candidate requests
Candidate: Actively seeking votes to become Leader
Leader: Handles all client requests, replicates to Followers

Term Concept

Term = logical clock for the cluster
- Monotonically increasing
- Each term has at most one leader
- Helps detect stale leaders

Term 1: Node A is leader
Term 2: Node A failed, Node B elected
Term 3: Node B failed, Node C elected

If a node sees a higher term → it adopts that term and becomes a follower
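
The state diagram and the term rule above can be sketched as a small Java class. This is a minimal illustration, not a full implementation: the names `RaftNode`, `Role`, `onElectionTimeout`, `onMajorityVotes`, and `observeTerm` are assumptions made for this example, and networking, timers, and persistence are left out.

```java
/** Minimal sketch of Raft role transitions driven by terms (hypothetical names). */
class RaftNode {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    private Role role = Role.FOLLOWER;
    private long currentTerm = 0;

    /** No message from a leader within the election timeout: start a new election. */
    void onElectionTimeout() {
        if (role == Role.LEADER) {
            return; // leaders do not run an election timer
        }
        currentTerm++;
        role = Role.CANDIDATE;
    }

    /** The candidate collected votes from a majority in its current term. */
    void onMajorityVotes() {
        if (role == Role.CANDIDATE) {
            role = Role.LEADER;
        }
    }

    /**
     * Called with the term carried by any incoming message.
     * A higher term means this node is stale: adopt the term and step down.
     * Returns false if the message itself is stale and should be rejected.
     */
    boolean observeTerm(long messageTerm) {
        if (messageTerm > currentTerm) {
            currentTerm = messageTerm;
            role = Role.FOLLOWER;
        }
        return messageTerm >= currentTerm;
    }

    Role role() { return role; }
    long currentTerm() { return currentTerm; }
}
```

Routing every incoming message through one `observeTerm` check is a design choice for this sketch: it keeps the "higher term wins" rule in a single place for all three roles.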

Leader Election

1. Follower doesn't hear from leader (election timeout)
2. Becomes Candidate, increments term
3. Votes for self, requests votes from others
4. Wins if receives majority votes
5. Becomes Leader, starts sending heartbeats

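Vote Rules:
- Each node votes at most once per term
- Only vote if the candidate's log is at least as up-to-date as the voter's (compare the last entry's term first, then the log length)
- First come, first served among candidates that qualify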
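
The vote rules above can be sketched as the decision a node makes when it receives a RequestVote message. The class `VoteState` and its field and parameter names are assumptions for illustration. A real node would also persist `currentTerm` and `votedFor` before replying.

```java
/** Sketch of the RequestVote decision on the receiving node (hypothetical names). */
class VoteState {
    long currentTerm = 0;
    Integer votedFor = null;   // candidate id voted for in currentTerm, or null
    long lastLogIndex = 0;     // index of this node's last log entry (0 = empty log)
    long lastLogTerm = 0;      // term of this node's last log entry

    boolean handleRequestVote(long term, int candidateId,
                              long candLastLogIndex, long candLastLogTerm) {
        if (term < currentTerm) {
            return false;                  // stale candidate
        }
        if (term > currentTerm) {
            currentTerm = term;            // newer term: the old vote no longer counts
            votedFor = null;
        }
        // "At least as up-to-date": compare last terms first, then last indexes.
        boolean upToDate = candLastLogTerm > lastLogTerm
                || (candLastLogTerm == lastLogTerm && candLastLogIndex >= lastLogIndex);
        // One vote per term; repeating the same vote is allowed (retried request).
        boolean canVote = votedFor == null || votedFor == candidateId;
        if (upToDate && canVote) {
            votedFor = candidateId;
            return true;
        }
        return false;
    }
}
```

Note that a longer log does not win by itself: a candidate whose last entry has an older term is rejected even if it holds more entries.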

Log Replication

Client Request → Leader → Append to log → Replicate to Followers

Leader log: [1:x←3] [2:y←5] [3:x←7]
                      ↓ replicate
Follower 1: [1:x←3] [2:y←5] [3:x←7]
Follower 2: [1:x←3] [2:y←5] [3:x←7]

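When a majority has stored the entry → "committed" → apply to state machine (a leader only counts replicas for entries from its own term; older entries become committed along with them)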
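
One way a leader might work out which entries are committed is to track, per node, the highest log index known to be stored there and take the value held by a majority. The sketch below assumes hypothetical names (`CommitIndex`, `majorityMatch`, `advance`) and 1-based log indexes. It also applies the Raft rule that a leader only commits by counting replicas for entries from its current term.

```java
import java.util.Arrays;

/** Sketch: how a leader might derive the commit index (hypothetical names). */
class CommitIndex {
    /**
     * @param matchIndex highest log index known to be stored on each node,
     *                   including the leader itself (one element per node)
     * @return the highest index stored on a majority of nodes
     */
    static long majorityMatch(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted);
        // After an ascending sort, the value at position (n - 1) / 2
        // is reached by at least floor(n / 2) + 1 nodes.
        return sorted[(sorted.length - 1) / 2];
    }

    /**
     * @param logTerms    logTerms[i - 1] is the term of the leader's entry at index i
     * @param currentTerm the leader's current term
     * @return the new commit index (never lower than the old one)
     */
    static long advance(long commitIndex, long[] matchIndex,
                        long[] logTerms, long currentTerm) {
        long candidate = majorityMatch(matchIndex);
        boolean fromCurrentTerm =
                candidate > 0 && logTerms[(int) candidate - 1] == currentTerm;
        return (candidate > commitIndex && fromCurrentTerm) ? candidate : commitIndex;
    }
}
```

For a 5-node cluster with match indexes 3, 3, 2, 1, 1, the majority value is 2: three nodes hold at least two entries, but only two nodes hold the third.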

📊 Timing and Guarantees

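Parameter	Typical Value	Purpose
Heartbeat interval	50-150 ms	Leader liveness
Election timeout	150-300 ms (randomized per node)	Detect leader failure
Broadcast time	< 50 ms	Network round trip

The heartbeat interval must stay well below the election timeout chosen for the cluster; otherwise followers time out between heartbeats and start needless elections.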

Safety Guarantees

1. Election Safety: At most one leader per term
2. Leader Append-Only: Leader never overwrites/deletes log entries
3. Log Matching: Same index+term → same command, same prefix
4. Leader Completeness: Committed entries appear in future leaders' logs
5. State Machine Safety: All nodes apply same commands in same order
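
The Log Matching guarantee is usually enforced by a consistency check on the follower: each AppendEntries message names the index and term of the entry that precedes the new ones, and the follower refuses the append if its own log disagrees. The following is a sketch under assumptions: `FollowerLog`, `Entry`, and the method names are invented for this example, and commit index handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the follower-side AppendEntries consistency check (hypothetical names). */
class FollowerLog {
    static final class Entry {
        final long term;
        final String command;

        Entry(long term, String command) {
            this.term = term;
            this.command = command;
        }
    }

    private final List<Entry> entries = new ArrayList<>(); // entries.get(i) holds index i + 1

    /**
     * Appends newEntries after prevLogIndex if the local entry at prevLogIndex
     * has term prevLogTerm. Returns false on a mismatch so that the leader can
     * retry with an earlier prevLogIndex.
     */
    boolean appendEntries(long prevLogIndex, long prevLogTerm, List<Entry> newEntries) {
        if (prevLogIndex > entries.size()) {
            return false; // gap: this follower is missing earlier entries
        }
        if (prevLogIndex > 0 && entries.get((int) prevLogIndex - 1).term != prevLogTerm) {
            return false; // same index, different term: the logs diverge here
        }
        for (int i = 0; i < newEntries.size(); i++) {
            int pos = (int) prevLogIndex + i; // 0-based slot for this entry
            if (pos < entries.size()) {
                if (entries.get(pos).term == newEntries.get(i).term) {
                    continue; // already stored
                }
                // Conflict: drop this entry and everything after it.
                entries.subList(pos, entries.size()).clear();
            }
            entries.add(newEntries.get(i));
        }
        return true;
    }

    int size() { return entries.size(); }

    /** Returns the entry at a 1-based log index. */
    Entry get(long index) { return entries.get((int) index - 1); }
}
```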

Fault Tolerance

Cluster Size | Tolerated Failures | Quorum
     3       |         1          |   2
     5       |         2          |   3
     7       |         3          |   4

Formula: tolerates ⌊(n-1)/2⌋ failures with n nodes; quorum is ⌊n/2⌋ + 1
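
The table values follow from integer arithmetic, which a short sketch makes explicit. The class and method names here are assumptions for illustration.

```java
/** Quorum arithmetic for a Raft cluster of n voting nodes (hypothetical names). */
class Quorum {
    /** Smallest majority: floor(n / 2) + 1. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** Nodes that may fail while a majority survives: floor((n - 1) / 2). */
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }
}
```

With these definitions a 4-node cluster needs a quorum of 3 and tolerates 1 failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual choice.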

✅ When to Use Raft

Good Use Cases

Distributed databases - etcd, CockroachDB, TiKV
Configuration management - Consul, Zookeeper alternative
Leader election - distributed locking, coordination
Replicated state machines - any service needing consistency
Log replication - event sourcing, audit logs

Avoid When

High throughput writes - consensus overhead per write
Geo-distributed - latency makes consensus slow
Eventual consistency OK - simpler alternatives exist
Single node sufficient - no need for distribution complexity

🔄 Raft vs Alternatives

Feature	Raft	Paxos	ZAB (Zookeeper)
Understandability	High	Low	Medium
Leader-based	Yes	Optional	Yes
Membership change	Built-in	Complex	Built-in
Implementations	Many	Fewer	Zookeeper
Performance	Good	Good	Good

Why Raft Over Paxos?

Paxos: Proven but hard to understand and implement correctly
Raft: Designed for understandability with same guarantees

Raft innovations:
- Strong leader (simplifies replication)
- Randomized election timeouts (reduces split votes)
- Joint consensus for membership changes

🧩 Implementation Patterns

etcd (Go)


Java Raft Libraries


State Machine Pattern

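A replicated state machine applies committed log entries one by one, in log order, so that every node ends up in the same state. The sketch below is not the API of any particular Raft library: `KvStateMachine` and its made-up text command format ("SET key value", "DEL key") are assumptions chosen to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a replicated state machine: a key-value store fed by committed entries. */
class KvStateMachine {
    private final Map<String, String> data = new HashMap<>();
    private long lastApplied = 0;

    /**
     * Applies the committed command stored at the given log index.
     * Entries must arrive in log order; re-delivered entries are ignored.
     */
    void apply(long index, String command) {
        if (index <= lastApplied) {
            return; // already applied (for example, replay after a restart)
        }
        if (index != lastApplied + 1) {
            throw new IllegalStateException("gap in applied log at index " + index);
        }
        String[] parts = command.split(" ", 3);
        if (parts[0].equals("SET") && parts.length == 3) {
            data.put(parts[1], parts[2]);
        } else if (parts[0].equals("DEL") && parts.length == 2) {
            data.remove(parts[1]);
        } else {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        lastApplied = index;
    }

    String get(String key) { return data.get(key); }
    long lastApplied() { return lastApplied; }
}
```

Tracking `lastApplied` makes `apply` idempotent, which matters because a node may see the same committed entry again after a restart or a retry.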

⚠️ Common Pitfalls

1. Wrong Cluster Size


2. Election Timeout Too Short

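A guard against this pitfall might validate the configured timeout against the heartbeat interval and pick a fresh random timeout for each election cycle. In this sketch, `ElectionTimer` is a hypothetical class, and the "at least 3 heartbeat intervals" threshold is an assumed rule of thumb rather than a value prescribed by Raft.

```java
import java.util.Random;

/** Sketch: randomized election timeouts with a sanity check (hypothetical names). */
class ElectionTimer {
    private final long minMs;
    private final long maxMs;
    private final Random random;

    ElectionTimer(long heartbeatMs, long minMs, long maxMs, Random random) {
        // Assumed rule of thumb: tolerate a few missed heartbeats before an election.
        if (minMs < 3 * heartbeatMs) {
            throw new IllegalArgumentException(
                    "election timeout too close to the heartbeat interval");
        }
        if (maxMs <= minMs) {
            throw new IllegalArgumentException("need a non-empty randomization range");
        }
        this.minMs = minMs;
        this.maxMs = maxMs;
        this.random = random;
    }

    /** Picks a fresh timeout in [minMs, maxMs); call again after every timer reset. */
    long nextTimeoutMs() {
        return minMs + (long) (random.nextDouble() * (maxMs - minMs));
    }
}
```

Drawing a new value on every reset, instead of once at startup, keeps two nodes from timing out in lockstep after a split vote.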

3. Not Handling Network Partitions


4. Ignoring Log Compaction

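Without compaction the log grows without bound, which slows restarts and the catch-up of new nodes. A common remedy is to snapshot the state machine and drop the log prefix that the snapshot covers. The sketch below uses hypothetical names (`CompactingLog`, `compact`) and stores the snapshot as an opaque string. It assumes the caller only compacts entries that are already applied, and it leaves out the term of the last included entry, which a real implementation would need to keep.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: snapshot-based log compaction (hypothetical names). */
class CompactingLog {
    private final List<String> entries = new ArrayList<>(); // commands after the snapshot
    private long snapshotIndex = 0;  // last log index covered by the snapshot
    private String snapshot = "";    // serialized state machine at snapshotIndex

    /** Appends a command and returns its 1-based log index. */
    long append(String command) {
        entries.add(command);
        return lastIndex();
    }

    long lastIndex() { return snapshotIndex + entries.size(); }

    /** Replaces all entries up to and including upToIndex with a state snapshot. */
    void compact(long upToIndex, String serializedState) {
        if (upToIndex <= snapshotIndex || upToIndex > lastIndex()) {
            throw new IllegalArgumentException("index outside the retained log");
        }
        entries.subList(0, (int) (upToIndex - snapshotIndex)).clear();
        snapshotIndex = upToIndex;
        snapshot = serializedState;
    }

    /** Returns the command at a 1-based index, or null if it is not retained. */
    String get(long index) {
        if (index <= snapshotIndex || index > lastIndex()) {
            return null;
        }
        return entries.get((int) (index - snapshotIndex - 1));
    }

    int retainedEntries() { return entries.size(); }
    long snapshotIndex() { return snapshotIndex; }
    String snapshot() { return snapshot; }
}
```

Log indexes stay stable across compaction because positions are always computed relative to `snapshotIndex`.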

5. Stale Reads from Followers

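A follower, or a deposed leader that has not yet noticed a newer term, can return outdated data. One known remedy is a ReadIndex-style check: record the commit index when the read arrives, confirm leadership with a round of heartbeats to a majority, and answer only once the local state machine has applied up to that index. The sketch below reduces this to a pure decision function. `ReadGuard`, `Decision`, and the parameter names are assumptions, and it leaves out the requirement that the leader has already committed an entry in its own term.

```java
/** Sketch of a ReadIndex-style guard for linearizable reads (hypothetical names). */
class ReadGuard {
    enum Decision { SERVE, WAIT_FOR_APPLY, REDIRECT_TO_LEADER }

    /**
     * @param isLeader      whether this node currently believes it is the leader
     * @param heartbeatAcks nodes (including this one) that acknowledged a heartbeat
     *                      sent after the read arrived
     * @param clusterSize   number of voting nodes
     * @param readIndex     commit index recorded when the read arrived
     * @param lastApplied   highest log index applied to the local state machine
     */
    static Decision decide(boolean isLeader, int heartbeatAcks, int clusterSize,
                           long readIndex, long lastApplied) {
        if (!isLeader || heartbeatAcks < clusterSize / 2 + 1) {
            return Decision.REDIRECT_TO_LEADER; // leadership not confirmed by a majority
        }
        return lastApplied >= readIndex ? Decision.SERVE : Decision.WAIT_FOR_APPLY;
    }
}
```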

🎤 Interview Tips

Q: What is Raft and why was it created?

Raft is a consensus algorithm for managing replicated logs. It was designed to be understandable (unlike Paxos) while providing the same guarantees. It ensures a cluster of nodes agrees on the same sequence of operations.

Q: Explain leader election in Raft.

When a follower doesn't hear from the leader within its election timeout, it becomes a candidate, increments the term, votes for itself, and requests votes from the other nodes. A node grants its vote if the candidate's log is at least as up-to-date as its own and it hasn't already voted in this term. The first candidate to collect a majority wins.

Q: How does Raft ensure safety during network partitions?

A leader needs a majority to commit entries. In a partition, only the side that holds a majority can elect a leader and make progress. A leader stranded on the minority side can't commit new entries and steps down once it sees a higher term, typically after the partition heals.

Q: What is log matching property?

If two logs have an entry with the same index and term, then all preceding entries are identical. This is maintained by the leader checking the previous entry's term before appending new entries to a follower.

Q: How many node failures can a 5-node Raft cluster tolerate?

2 failures. Quorum is ⌊5/2⌋ + 1 = 3 nodes. With 3 nodes alive, the cluster can still elect a leader and commit entries.

Previous: Part 22: Consistent Hashing

Next: Part 24: ArrayList vs LinkedList

Visualizer: RaftVisualizer from @tomaszjarosz/react-visualizers
