Algorithms

Raft Consensus

๐Ÿ“‹ Quick Reference

PropertyValue
TypeDistributed consensus algorithm
PurposeReplicate state across cluster nodes
Fault ToleranceSurvives (n-1)/2 failures (n nodes)
Leader ElectionO(election timeout)
Log ReplicationO(heartbeat interval) per entry
Best ForDistributed databases, coordination services
One-liner: Understandable consensus algorithm where a leader replicates commands to followers, ensuring all nodes agree on the same sequence of operations.

๐ŸŽฎ Interactive Visualizer

Watch how Raft achieves consensus through leader election and log replication:

Loading visualizer...
Try these operations:
  1. Watch leader election (who gets elected?)
  2. Send a command - see log replication
  3. Kill the leader - observe re-election
  4. See how majority quorum ensures consistency

๐Ÿ”ง How It Works

Node States

Every node is in one of three states:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   timeout    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   wins election   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Follower โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€> โ”‚ Candidate โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€> โ”‚ Leader โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
     ^                        โ”‚                              โ”‚
     โ”‚     higher term        โ”‚     discovers higher term    โ”‚
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Follower: Passive, responds to Leader/Candidate requests
Candidate: Actively seeking votes to become Leader
Leader: Handles all client requests, replicates to Followers

Term Concept

Term = logical clock for the cluster
- Monotonically increasing
- Each term has at most one leader
- Helps detect stale leaders

Term 1: Node A is leader
Term 2: Node A failed, Node B elected
Term 3: Node B failed, Node C elected

If node sees higher term โ†’ becomes follower

Leader Election

1. Follower doesn't hear from leader (election timeout)
2. Becomes Candidate, increments term
3. Votes for self, requests votes from others
4. Wins if receives majority votes
5. Becomes Leader, starts sending heartbeats

Vote Rules:
- Each node votes once per term
- Only vote if candidate's log is at least as up-to-date
- First come, first served

Log Replication

Client Request โ†’ Leader โ†’ Append to log โ†’ Replicate to Followers

Leader log: [1:xโ†3] [2:yโ†5] [3:xโ†7]
                      โ†“ replicate
Follower 1: [1:xโ†3] [2:yโ†5] [3:xโ†7]
Follower 2: [1:xโ†3] [2:yโ†5] [3:xโ†7]

When majority has entry โ†’ "committed" โ†’ apply to state machine

๐Ÿ“Š Timing and Guarantees

ParameterTypical ValuePurpose
Heartbeat interval50-150 msLeader liveness
Election timeout150-300 msDetect leader failure
Broadcast time< 50 msNetwork round trip

Safety Guarantees

1. Election Safety: At most one leader per term
2. Leader Append-Only: Leader never overwrites/deletes log entries
3. Log Matching: Same index+term โ†’ same command, same prefix
4. Leader Completeness: Committed entries appear in future leaders' logs
5. State Machine Safety: All nodes apply same commands in same order

Fault Tolerance

Cluster Size | Tolerated Failures | Quorum
     3       |         1          |   2
     5       |         2          |   3
     7       |         3          |   4

Formula: tolerates (n-1)/2 failures with n nodes

โœ… When to Use Raft

Good Use Cases

  • Distributed databases - etcd, CockroachDB, TiKV
  • Configuration management - Consul, Zookeeper alternative
  • Leader election - distributed locking, coordination
  • Replicated state machines - any service needing consistency
  • Log replication - event sourcing, audit logs

Avoid When

  • High throughput writes - consensus overhead per write
  • Geo-distributed - latency makes consensus slow
  • Eventual consistency OK - simpler alternatives exist
  • Single node sufficient - no need for distribution complexity

๐Ÿ”„ Raft vs Alternatives

FeatureRaftPaxosZAB (Zookeeper)
UnderstandabilityHighLowMedium
Leader-basedYesOptionalYes
Membership changeBuilt-inComplexBuilt-in
ImplementationsManyFewerZookeeper
PerformanceGoodGoodGood

Why Raft Over Paxos?

Paxos: Proven but hard to understand and implement correctly
Raft: Designed for understandability with same guarantees

Raft innovations:
- Strong leader (simplifies replication)
- Randomized election timeouts (reduces split votes)
- Joint consensus for membership changes

๐Ÿงฉ Implementation Patterns

etcd (Go)

GO(13 lines)
Code
Loading syntax highlighter...

Java Raft Libraries

JAVA(18 lines)
Code
Loading syntax highlighter...

State Machine Pattern

JAVA(20 lines)
Code
Loading syntax highlighter...

โš ๏ธ Common Pitfalls

1. Wrong Cluster Size

JAVA(7 lines)
Code
Loading syntax highlighter...

2. Election Timeout Too Short

JAVA(6 lines)
Code
Loading syntax highlighter...

3. Not Handling Network Partitions

JAVA(9 lines)
Code
Loading syntax highlighter...

4. Ignoring Log Compaction

JAVA(7 lines)
Code
Loading syntax highlighter...

5. Stale Reads from Followers

JAVA(7 lines)
Code
Loading syntax highlighter...

๐ŸŽค Interview Tips

Q: What is Raft and why was it created?
"

Raft is a consensus algorithm for managing replicated logs. It was designed to be understandable (unlike Paxos) while providing the same guarantees. It ensures a cluster of nodes agrees on the same sequence of operations.

Q: Explain leader election in Raft.
"

When a follower doesn't hear from the leader (election timeout), it becomes a candidate, increments the term, and requests votes. A node votes if the candidate's log is at least as up-to-date and it hasn't voted in this term. First to get majority wins.

Q: How does Raft ensure safety during network partitions?
"

A leader needs majority to commit entries. In a partition, only the partition with majority can elect a leader and make progress. The minority partition's leader can't commit new entries and will step down when it sees a higher term.

Q: What is log matching property?
"

If two logs have an entry with the same index and term, then all preceding entries are identical. This is maintained by the leader checking the previous entry's term before appending new entries to a follower.

Q: How many node failures can a 5-node Raft cluster tolerate?
"

2 failures. Quorum is (5/2)+1 = 3 nodes. With 3 nodes alive, the cluster can still elect a leader and commit entries.


๐Ÿ“š Series Navigation


Visualizer: RaftVisualizer from @tomaszjarosz/react-visualizers