Raft Consensus
Quick Reference
| Property | Value |
|---|---|
| Type | Distributed consensus algorithm |
| Purpose | Replicate state across cluster nodes |
| Fault Tolerance | Survives ⌊(n-1)/2⌋ failures (n nodes) |
| Leader Election | O(election timeout) |
| Log Replication | O(heartbeat interval) per entry |
| Best For | Distributed databases, coordination services |
Interactive Visualizer
[Interactive visualizer: shows leader election, log replication after a client command, re-election after the leader is killed, and how a majority quorum ensures consistency.]
How It Works
Node States
Every node is in one of three states:
- Follower: Passive, responds to Leader/Candidate requests
- Candidate: Actively seeking votes to become Leader
- Leader: Handles all client requests, replicates to Followers

State transitions:
- Follower → Candidate: election timeout
- Candidate → Leader: wins election
- Candidate → Follower: sees a higher term
- Leader → Follower: discovers a higher term
Term Concept
Term = logical clock for the cluster
- Monotonically increasing
- Each term has at most one leader
- Helps detect stale leaders

Example:
- Term 1: Node A is leader
- Term 2: Node A failed, Node B elected
- Term 3: Node B failed, Node C elected

If a node sees a higher term → it becomes a follower
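The state transitions and the term rule can be sketched together in a few lines of Java. This is a minimal illustration, not a full node: `Role`, `NodeSketch`, and the method names are hypothetical, and networking, timers, and persistence are left out.

```java
// Hypothetical sketch: names are illustrative, not taken from any Raft library.
enum Role { FOLLOWER, CANDIDATE, LEADER }

final class NodeSketch {
    Role role = Role.FOLLOWER;
    long currentTerm = 0;

    // Election timeout fired: a follower (or a candidate whose election stalled)
    // starts a new election in the next term.
    void onElectionTimeout() {
        if (role == Role.LEADER) return; // a leader does not time out on itself
        currentTerm++;
        role = Role.CANDIDATE;
    }

    // The candidate collected a majority of votes for currentTerm.
    void onWonElection() {
        if (role == Role.CANDIDATE) role = Role.LEADER;
    }

    // Any message carrying a higher term forces a step-down to follower.
    void onMessageTerm(long term) {
        if (term > currentTerm) {
            currentTerm = term;
            role = Role.FOLLOWER;
        }
    }
}
```

A stale leader that receives a message with a higher term adopts that term and becomes a follower, which is how terms help detect stale leaders.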
Leader Election
1. Follower doesn't hear from leader (election timeout)
2. Becomes Candidate, increments term
3. Votes for self, requests votes from others
4. Wins if receives majority votes
5. Becomes Leader, starts sending heartbeats

Vote Rules:
- Each node votes once per term
- Only vote if candidate's log is at least as up-to-date
- First come, first served
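The vote rules above can be sketched as a single handler. This is a minimal Java sketch under stated assumptions: `VoteSketch` and `handleRequestVote` are hypothetical names, candidate ids are plain integers, and "at least as up-to-date" is taken to mean comparing the last log term first and the last log index second.

```java
// Hypothetical voter-side sketch; not the API of any Raft library.
final class VoteSketch {
    long currentTerm;
    Integer votedFor = null; // candidate voted for in currentTerm, or null
    final long lastLogTerm;
    final long lastLogIndex;

    VoteSketch(long currentTerm, long lastLogTerm, long lastLogIndex) {
        this.currentTerm = currentTerm;
        this.lastLogTerm = lastLogTerm;
        this.lastLogIndex = lastLogIndex;
    }

    boolean handleRequestVote(long term, int candidateId,
                              long candLastLogTerm, long candLastLogIndex) {
        if (term < currentTerm) return false; // stale candidate
        if (term > currentTerm) {             // newer term: the old vote no longer counts
            currentTerm = term;
            votedFor = null;
        }
        // Candidate's log must be at least as up-to-date as ours.
        boolean upToDate = candLastLogTerm > lastLogTerm
                || (candLastLogTerm == lastLogTerm && candLastLogIndex >= lastLogIndex);
        // One vote per term, first come, first served.
        boolean canVote = votedFor == null || votedFor == candidateId;
        if (canVote && upToDate) {
            votedFor = candidateId;
            return true;
        }
        return false;
    }
}
```

In a real node, `currentTerm` and `votedFor` would be written to stable storage before replying, so that a restart cannot lead to a second vote in the same term.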
Log Replication
Client Request → Leader → Append to log → Replicate to Followers

- Leader log: [1:x←3] [2:y←5] [3:x←7]
- Follower 1: [1:x←3] [2:y←5] [3:x←7]
- Follower 2: [1:x←3] [2:y←5] [3:x←7]

When a majority has the entry → it is "committed" → apply to state machine
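One way a leader can work out which entries a majority holds is to sort the highest replicated index of every node and take the value that at least a quorum has reached. The Java sketch below assumes a hypothetical `matchIndex` array with one slot per node, leader included; `CommitSketch` is an illustrative name.

```java
import java.util.Arrays;

// Hypothetical leader-side helper; not the API of any Raft library.
final class CommitSketch {
    // matchIndex[i] = highest log index known to be stored on node i (leader included).
    static long majorityMatchIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted); // ascending
        int quorum = sorted.length / 2 + 1;
        // The element at position (n - quorum) is reached by at least `quorum` nodes.
        return sorted[sorted.length - quorum];
    }
}
```

For example, with indexes {3, 3, 2} in a 3-node cluster the result is 3, because two of three nodes hold entry 3. Full Raft adds one more condition that this sketch leaves out: a leader only commits this way when the entry at that index belongs to its current term.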
Timing and Guarantees
| Parameter | Typical Value | Purpose |
|---|---|---|
| Heartbeat interval | 50-150 ms | Leader liveness |
| Election timeout | 150-300 ms | Detect leader failure |
| Broadcast time | < 50 ms | Network round trip |
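Election timeouts are usually drawn at random from a range such as the one in the table, so that followers rarely time out at the same moment. A minimal Java sketch, assuming millisecond bounds passed in by the caller; `ElectionTimerSketch` is a hypothetical name.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; the bounds are supplied by the caller.
final class ElectionTimerSketch {
    // Returns a timeout in [minMs, maxMs], both ends inclusive.
    static long randomElectionTimeoutMs(long minMs, long maxMs) {
        // nextLong(origin, bound) excludes the bound, hence maxMs + 1.
        return ThreadLocalRandom.current().nextLong(minMs, maxMs + 1);
    }
}
```

A node would typically call `randomElectionTimeoutMs(150, 300)` each time it resets its election timer, so every wait gets a fresh random value.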
Safety Guarantees
1. Election Safety: At most one leader per term
2. Leader Append-Only: Leader never overwrites/deletes log entries
3. Log Matching: Same index+term → same command, same prefix
4. Leader Completeness: Committed entries appear in future leaders' logs
5. State Machine Safety: All nodes apply same commands in same order
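Log Matching is maintained by a consistency check on the follower: new entries are accepted only if the entry just before them matches the leader's. The Java sketch below is a simplified illustration; `LogSketch`, `Entry`, and `appendEntries` are hypothetical names, and terms, commit indexes, and RPC plumbing are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical follower-side log; names are illustrative, not from a Raft library.
final class LogSketch {
    static final class Entry {
        final long term;
        final String command;
        Entry(long term, String command) { this.term = term; this.command = command; }
    }

    // Raft log indexes start at 1, so Raft index i lives at list position i - 1.
    final List<Entry> entries = new ArrayList<>();

    boolean appendEntries(long prevLogIndex, long prevLogTerm, List<Entry> newEntries) {
        // Reject if there is a gap before the new entries.
        if (prevLogIndex > entries.size()) return false;
        // Reject if the entry preceding the new ones has a different term.
        if (prevLogIndex > 0 && entries.get((int) prevLogIndex - 1).term != prevLogTerm) return false;
        for (int i = 0; i < newEntries.size(); i++) {
            int pos = (int) prevLogIndex + i;
            Entry incoming = newEntries.get(i);
            // A conflicting entry (same index, different term) and everything after it is dropped.
            if (pos < entries.size() && entries.get(pos).term != incoming.term) {
                entries.subList(pos, entries.size()).clear();
            }
            if (pos >= entries.size()) entries.add(incoming);
        }
        return true;
    }
}
```

When the check fails, the leader retries with an earlier `prevLogIndex` until the logs agree, then the follower's log is brought forward from that point.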
Fault Tolerance
| Cluster Size | Tolerated Failures | Quorum |
|---|---|---|
| 3 | 1 | 2 |
| 5 | 2 | 3 |
| 7 | 3 | 4 |

Formula: a cluster of n nodes tolerates ⌊(n-1)/2⌋ failures; quorum is ⌊n/2⌋+1.
When to Use Raft
Good Use Cases
- Distributed databases - etcd, CockroachDB, TiKV
- Configuration management - Consul, Zookeeper alternative
- Leader election - distributed locking, coordination
- Replicated state machines - any service needing consistency
- Log replication - event sourcing, audit logs
Avoid When
- High throughput writes - consensus overhead per write
- Geo-distributed - latency makes consensus slow
- Eventual consistency OK - simpler alternatives exist
- Single node sufficient - no need for distribution complexity
Raft vs Alternatives
| Feature | Raft | Paxos | ZAB (Zookeeper) |
|---|---|---|---|
| Understandability | High | Low | Medium |
| Leader-based | Yes | Optional | Yes |
| Membership change | Built-in | Complex | Built-in |
| Implementations | Many | Fewer | Zookeeper |
| Performance | Good | Good | Good |
Why Raft Over Paxos?
- Paxos: Proven but hard to understand and implement correctly
- Raft: Designed for understandability with the same guarantees

Raft innovations:
- Strong leader (simplifies replication)
- Randomized election timeouts (reduces split votes)
- Joint consensus for membership changes
Implementation Patterns
etcd (Go)
Java Raft Libraries
State Machine Pattern
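A minimal Java sketch of the state machine side, assuming a key-value store with integer values like the x←3 entries in the log replication example. `KvStateMachineSketch` and its methods are hypothetical and not tied to any Raft library; the point is only that committed entries are applied in strict index order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replicated state machine; not the interface of any Raft library.
final class KvStateMachineSketch {
    private final Map<String, Integer> state = new HashMap<>();
    private long lastApplied = 0;

    // Apply one committed "set key to value" entry. Entries must arrive in log order.
    void apply(long index, String key, int value) {
        if (index != lastApplied + 1) {
            throw new IllegalStateException(
                "out-of-order apply: expected " + (lastApplied + 1) + " but got " + index);
        }
        state.put(key, value);
        lastApplied = index;
    }

    Integer get(String key) { return state.get(key); }

    long lastApplied() { return lastApplied; }
}
```

Applying entries 1:x←3, 2:y←5, 3:x←7 in order leaves x at 7 and y at 5 on every node, which is what State Machine Safety requires.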
Common Pitfalls
1. Wrong Cluster Size
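One common reading of this pitfall is that an even-sized cluster costs an extra node without tolerating an extra failure. The arithmetic can be checked with a short Java sketch; `QuorumSketch` and its method names are hypothetical.

```java
// Hypothetical helper for the quorum arithmetic; integer division rounds down.
final class QuorumSketch {
    // Smallest number of nodes that forms a majority of n.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Number of nodes that can fail while a majority is still available.
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }
}
```

Under these formulas a 4-node cluster needs a quorum of 3 and tolerates 1 failure, the same as a 3-node cluster, which is why odd sizes such as 3, 5, and 7 are the usual choices.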
2. Election Timeout Too Short
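If the election timeout is too close to the heartbeat interval or the network round trip, a single delayed heartbeat can trigger a needless election. The Java sketch below is a simple sanity check on a timing configuration. The class name and both margins are assumptions: the election timeout is required to be at least two heartbeat intervals and at least ten times the broadcast time.

```java
// Hypothetical configuration check; the margins 2x and 10x are assumed rules of thumb.
final class TimingCheckSketch {
    static boolean isSane(long broadcastMs, long heartbeatMs, long electionTimeoutMinMs) {
        // At least one heartbeat may be lost before followers start an election.
        boolean heartbeatFits = heartbeatMs * 2 <= electionTimeoutMinMs;
        // The election timeout is roughly an order of magnitude above the broadcast time.
        boolean broadcastFits = broadcastMs * 10 <= electionTimeoutMinMs;
        return heartbeatFits && broadcastFits;
    }
}
```

With these margins, a 10 ms broadcast time, 50 ms heartbeat, and 150 ms minimum election timeout pass, while a 150 ms heartbeat with the same 150 ms timeout does not.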
3. Not Handling Network Partitions
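One possible way to handle a partition on the leader side is for the leader to step down when it has not heard from a majority within an election timeout, so clients stop talking to a leader that cannot commit. The Java sketch below illustrates that check only; `LeaderQuorumCheckSketch`, `shouldStepDown`, and the `lastAckMs` array (one timestamp per follower) are hypothetical.

```java
// Hypothetical leader-side check; not part of any specific Raft library.
final class LeaderQuorumCheckSketch {
    // lastAckMs[i] = time of the last successful response from follower i.
    static boolean shouldStepDown(long nowMs, long[] lastAckMs, long electionTimeoutMs) {
        int responsive = 1; // the leader counts itself
        for (long ack : lastAckMs) {
            if (nowMs - ack <= electionTimeoutMs) responsive++;
        }
        int clusterSize = lastAckMs.length + 1;
        int quorum = clusterSize / 2 + 1;
        // Without a reachable majority the leader cannot commit anything new.
        return responsive < quorum;
    }
}
```

In a 5-node cluster where only one follower has answered recently, the leader sees 2 of 5 nodes, which is below the quorum of 3, so the check tells it to step down.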
4. Ignoring Log Compaction
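Without compaction the log grows without bound and restarts take longer as more entries must be replayed. A minimal Java sketch of snapshot-based compaction follows, assuming every entry passed in is already committed and that a snapshot is taken after a fixed number of entries. `CompactingLogSketch`, the threshold, and the string log format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: snapshot the state, then discard the log entries it covers.
final class CompactingLogSketch {
    private final int threshold;                     // entries kept before a snapshot is taken
    final Map<String, Integer> state = new HashMap<>();
    final List<String> log = new ArrayList<>();      // entries not yet covered by a snapshot
    Map<String, Integer> snapshot = new HashMap<>();
    long snapshotIndex = 0;                          // index of the last entry in the snapshot

    CompactingLogSketch(int threshold) { this.threshold = threshold; }

    // Assumes the entry is already committed.
    void applyCommitted(String key, int value) {
        log.add(key + "=" + value);
        state.put(key, value);
        if (log.size() >= threshold) takeSnapshot();
    }

    void takeSnapshot() {
        snapshot = new HashMap<>(state); // copy, so later writes do not change the snapshot
        snapshotIndex += log.size();
        log.clear();
    }

    long lastIndex() { return snapshotIndex + log.size(); }
}
```

A restarting node would load the snapshot and replay only the entries after `snapshotIndex`. A production implementation would also store the term of the last snapshotted entry, which this sketch omits.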
5. Stale Reads from Followers
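A follower can lag behind the leader, so a read served from its local state may miss recently committed writes. One possible guard, sketched below in Java, is to have the follower ask the leader for its current commit index and answer the read only after applying up to that index. `FollowerReadSketch` and its parameter names are hypothetical, and the step where the leader confirms it is still leader is not shown.

```java
// Hypothetical read guard; the leader round trip that produces readIndex is not shown.
final class FollowerReadSketch {
    // readIndex = leader's commit index at the time the read was requested.
    static boolean canServeRead(long followerLastApplied, long readIndex) {
        return followerLastApplied >= readIndex;
    }
}
```

If the check fails, the follower can wait until it has applied more entries or forward the read to the leader.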
Interview Tips
"Raft is a consensus algorithm for managing replicated logs. It was designed to be understandable (unlike Paxos) while providing the same guarantees. It ensures that a cluster of nodes agrees on the same sequence of operations."
"When a follower doesn't hear from the leader within the election timeout, it becomes a candidate, increments the term, and requests votes. A node grants its vote if the candidate's log is at least as up-to-date as its own and it hasn't already voted in this term. The first candidate to get a majority wins."
"A leader needs a majority to commit entries. In a partition, only the side with a majority can elect a leader and make progress. A leader on the minority side can't commit new entries and will step down when it sees a higher term."
"If two logs have an entry with the same index and term, then all preceding entries are identical. This is maintained by the leader sending the previous entry's index and term with each append, and the follower rejecting the append if they don't match its own log."
"A 5-node cluster tolerates 2 failures. Quorum is ⌊5/2⌋+1 = 3 nodes. With 3 nodes alive, the cluster can still elect a leader and commit entries."
RaftVisualizer from @tomaszjarosz/react-visualizers