Explain Consistent Hashing
Explain the concept of Consistent Hashing, its purpose in distributed systems, and how it minimizes data movement during scaling (node addition/removal).
Why Interviewers Ask This
Stripe interviewers ask this to evaluate your ability to design scalable, fault-tolerant distributed systems. They specifically want to see if you understand how to minimize data migration during cluster scaling while maintaining high availability and low latency for payment processing workloads.
How to Answer This Question
1. Define the problem: Start by explaining why standard modulo hashing fails when nodes change, causing massive data reshuffling.
2. Introduce the solution: Describe Consistent Hashing as a ring-based topology where both keys and nodes map to points on a circle using a hash function.
3. Explain the mechanism: Detail how data is stored on the next node clockwise from its hash position.
4. Discuss scaling dynamics: Clearly articulate that adding or removing a node only affects keys between that node and its predecessor, leaving the rest of the cluster untouched.
5. Address edge cases: Mention virtual nodes (vnodes) to handle uneven distribution and ensure load balancing across the ring.
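The ring lookup described in steps 2–4 can be sketched in a few lines. This is a minimal, illustrative Python sketch (the `HashRing` name and the choice of MD5 are my own; production systems often use other hash functions):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map any string to a point on the ring; MD5 is one common, stable choice
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=()):
        self._positions = []   # sorted ring positions (the "circle")
        self._owners = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_hash(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_hash(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def get_node(self, key: str) -> str:
        # Clockwise rule: first node position at or after the key's hash,
        # wrapping back to the start of the ring if none is found
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        idx %= len(self._positions)
        return self._owners[self._positions[idx]]
```

A lookup such as `HashRing(["a", "b", "c"]).get_node("payment-123")` is deterministic, and removing a node sends only that node's keys to its clockwise successor.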
Key Points to Cover
- Standard modulo hashing remaps nearly all keys when the node count changes, whereas consistent hashing moves only about a 1/N fraction of keys (K/N of K keys) per node added or removed.
- The concept relies on mapping keys and nodes to a logical ring using a deterministic hash function.
- Data placement follows a clockwise rule, assigning a key to the first node found after its hash value.
- Virtual nodes are essential to mitigate data skew and ensure even load distribution across the cluster.
- The primary benefit is minimal data redistribution during node addition or removal, ensuring high availability.
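The movement claim in the first point can be checked with a rough experiment, comparing how many keys change owner under each scheme when an 11th node joins a 10-node cluster (key and node names here are arbitrary; this is a sketch, not a production implementation):

```python
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def ring_owner(key: str, nodes: list[str]) -> str:
    # Clockwise rule: first node position at or after the key, with wrap-around
    positions = sorted((h(n), n) for n in nodes)
    kh = h(key)
    for pos, node in positions:
        if pos >= kh:
            return node
    return positions[0][1]

keys = [f"key-{i}" for i in range(10_000)]
nodes = [f"node-{i}" for i in range(10)]

# Consistent hashing: add an 11th node and count remapped keys
before = {k: ring_owner(k, nodes) for k in keys}
after = {k: ring_owner(k, nodes + ["node-10"]) for k in keys}
moved_ring = sum(before[k] != after[k] for k in keys)

# Modulo hashing: going from 10 to 11 buckets remaps almost everything
moved_mod = sum(h(k) % 10 != h(k) % 11 for k in keys)

print(f"consistent hashing remapped {moved_ring} / {len(keys)} keys")
print(f"modulo hashing remapped {moved_mod} / {len(keys)} keys")
```

Under modulo hashing roughly N/(N+1) of the keys move; under consistent hashing only the keys in the new node's arc do, on the order of a 1/N fraction.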
Sample Answer
Consistent Hashing is a distributed hashing scheme designed to solve the scalability issues inherent in standard modulo hashing. In traditional approaches, if we have N nodes, a key is assigned to server i = hash(key) % N. If we add a new node, the divisor becomes N+1 and almost every key (roughly N/(N+1) of them) maps to a different server, forcing a near-total reshuffle with significant downtime and network overhead.
Consistent Hashing maps both keys and servers onto a circular space, typically using SHA-1 or MD5. Each server is assigned one or more positions on this ring. To store a key, we hash it and place it on the first server encountered moving clockwise around the ring. This structure ensures that when a new node joins the cluster, it only claims the keys that fall between itself and its immediate predecessor on the ring. The rest of the system remains completely unaffected. Conversely, when a node fails, only its specific segment of keys moves to the next available neighbor.
To prevent hotspots where some nodes hold significantly more data than others due to uneven spacing of node positions on the ring, we implement virtual nodes. By assigning multiple points to a single physical server on the ring, we achieve a much more uniform distribution of data. This approach is critical for systems like Stripe's payment infrastructure, where minimizing data movement during auto-scaling events ensures consistent low-latency transaction processing without service interruptions.
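The load-balancing effect of virtual nodes can be seen in a small experiment: place each of 5 physical nodes on the ring once, then 100 times, and compare how evenly 10,000 keys spread (the names and the 100-vnode count are illustrative choices, not a recommendation):

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes):
    # Each physical node appears on the ring `vnodes` times, e.g. "node-0#17"
    ring = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
    return [p for p, _ in ring], [n for _, n in ring]

def owner(key, positions, owners):
    # First position at or after the key's hash, wrapping around the ring
    idx = bisect.bisect_left(positions, h(key)) % len(positions)
    return owners[idx]

keys = [f"key-{i}" for i in range(10_000)]
nodes = [f"node-{i}" for i in range(5)]

for vnodes in (1, 100):
    positions, owners = build_ring(nodes, vnodes)
    load = Counter(owner(k, positions, owners) for k in keys)
    counts = [load[n] for n in nodes]   # Counter returns 0 for absent nodes
    print(f"{vnodes:>3} vnodes/node -> per-node key counts {counts}")
```

With a single position per node the arcs are uneven and so are the key counts; with 100 positions per node each server owns many small arcs and the counts converge toward 2,000 keys each.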
Common Mistakes to Avoid
- Focusing only on the definition without explaining the specific advantage over modulo hashing regarding data migration costs.
- Forgetting to mention virtual nodes, which leads to an incomplete understanding of how real-world systems handle load balancing.
- Confusing the direction of data assignment (clockwise vs. counter-clockwise), which can lead to logic errors in implementation scenarios.
- Neglecting to discuss failure handling, specifically how the system redistributes data when a node goes offline.