Algorithms

Consistent Hashing

📋 Quick Reference

PropertyValue
TypeDistributed hashing technique
Key LookupO(log n) with binary search on ring
Add ServerO(K/n) keys remapped (K=total keys, n=servers)
Remove ServerO(K/n) keys remapped
Best ForDistributed caches, load balancers, sharding
One-liner: Hash ring that minimizes key redistribution when servers are added or removed.

🎮 Interactive Visualizer

Watch how consistent hashing distributes keys and handles server changes:

Loading visualizer...
Try these operations:
  1. Add keys and see which server they map to
  2. Add a new server - observe only nearby keys move
  3. Remove a server - see keys redistribute to next server
  4. Compare with modulo hashing (all keys would move!)

🔧 How It Works

The Problem with Modulo Hashing

Traditional: server = hash(key) % num_servers

With 3 servers:
  key "user:1" → hash=7  → 7 % 3 = 1 → Server 1
  key "user:2" → hash=12 → 12 % 3 = 0 → Server 0

Add 4th server (now % 4):
  key "user:1" → 7 % 4 = 3 → Server 3  ← MOVED!
  key "user:2" → 12 % 4 = 0 → Server 0 ← stayed

Problem: ~75% of keys move when adding 1 server!

Consistent Hashing Solution

Hash Ring (0 to 2^32):
         0°
         |
    S1(30°)
        \
         --- S2(120°)
        /
   S3(240°)

Keys map to next server clockwise:
  hash("user:1") = 50°  → Server S2 (next after 50°)
  hash("user:2") = 200° → Server S3 (next after 200°)

Add server S4 at 180°:
  Only keys between 120° and 180° move to S4
  Other keys stay where they are!

Virtual Nodes

Problem: Uneven distribution with few servers
Solution: Each server gets multiple positions (virtual nodes)

Server A: positions at 30°, 150°, 270°
Server B: positions at 90°, 210°, 330°

Benefits:
- More even key distribution
- Smoother rebalancing when servers change
- Can weight servers (more vnodes = more keys)

📊 Key Redistribution Comparison

ScenarioModulo HashConsistent Hash
Add 1 server (3→4)~75% keys move~25% keys move
Add 1 server (10→11)~91% keys move~9% keys move
Remove 1 serverSimilar high %Only affected keys

Formula

Keys that move when adding server to n existing:
- Modulo hashing: ~(n/(n+1)) × 100% of keys
- Consistent hashing: ~(1/(n+1)) × 100% of keys

✅ When to Use Consistent Hashing

Good Use Cases

  • Distributed caches - Memcached, Redis Cluster
  • Load balancers - sticky sessions
  • Database sharding - distribute data across nodes
  • CDN routing - route requests to nearest server
  • P2P networks - DHT (Distributed Hash Tables)

Avoid When

  • Single server - no distribution needed
  • Perfect balance required - may need other algorithms
  • Data locality matters - geographic sharding may be better
  • Simple round-robin suffices - adds unnecessary complexity

🔄 Consistent Hashing Variants

VariantFeatureUsed By
Basic RingSimple clockwise lookupEarly Dynamo
Virtual NodesBetter distributionCassandra, Riak
Jump HashNo ring, O(1) memoryGoogle
Rendezvous HashHighest random weightMany systems
MaglevMinimal disruption, fastGoogle LB

🧩 Implementation Patterns

Java Implementation

JAVA(42 lines)
Code
Loading syntax highlighter...

Usage Example

JAVA(16 lines)
Code
Loading syntax highlighter...

Redis Cluster

Redis Cluster uses hash slots (not consistent hashing ring):
- 16384 hash slots
- Each key maps to a slot: CRC16(key) % 16384
- Each node owns a range of slots

Resharding moves slots between nodes
Similar benefit: only affected slots' keys move

Memcached Client

JAVA(8 lines)
Code
Loading syntax highlighter...

⚠️ Common Pitfalls

1. Too Few Virtual Nodes

JAVA(5 lines)
Code
Loading syntax highlighter...

2. Poor Hash Function

JAVA(7 lines)
Code
Loading syntax highlighter...

3. Not Handling Server Failure

JAVA(13 lines)
Code
Loading syntax highlighter...

4. Ignoring Hot Spots

JAVA(9 lines)
Code
Loading syntax highlighter...

🎤 Interview Tips

Q: What problem does consistent hashing solve?
"

When servers are added/removed in a distributed system, modulo hashing remaps most keys. Consistent hashing minimizes this: only K/n keys move (K=total keys, n=servers) instead of nearly all keys.

Q: How does the hash ring work?
"

Both servers and keys are hashed to positions on a ring (0 to 2^32). A key is assigned to the first server found clockwise from its position. Adding a server only affects keys between it and the previous server.

Q: What are virtual nodes and why use them?
"

Virtual nodes are multiple positions on the ring for each physical server. They improve load distribution (a server with more vnodes handles more keys) and smoother rebalancing. Typically 100-200 vnodes per server.

Q: Name systems that use consistent hashing.
"

Amazon DynamoDB, Apache Cassandra, Memcached (Ketama), Discord, Akamai CDN. Redis Cluster uses a similar concept with hash slots instead of a ring.

Q: How do you handle server failures?
"

Replicate keys to multiple servers (next N servers on the ring). When one fails, requests go to the next replica. This is how DynamoDB and Cassandra achieve high availability.


📚 Series Navigation

Previous: Part 21: B-Tree

Visualizer: ConsistentHashingVisualizer from @tomaszjarosz/react-visualizers