Book Summaries | System Design Interview (Part 1) - Design Consistent Hashing
January 30th, 2024
Chapter 5: Design Consistent Hashing
This summary will serve to further cement what I learned while reading the fifth chapter of System Design Interview - An Insider's Guide (Part 1), titled Design Consistent Hashing, and I hope it will provide some learnings to you as well.
Introduction
To achieve horizontal scaling, efficient and even distribution of requests/data across servers is crucial. Consistent hashing is a widely adopted technique addressing this challenge. This chapter provides an in-depth look into the consistent hashing approach and explores its advantages.
The Rehashing Problem
When using a common hash method (e.g., `serverIndex = hash(key) % N`) to balance load among cache servers, problems arise whenever servers are added or removed: the modulus `N` changes, so most keys map to a different server index, causing significant data redistribution (and, in a cache, widespread cache misses).
Consistent Hashing Overview
Consistent hashing is a special kind of hashing that minimizes key remapping when a hash table is resized. It maps both keys and servers onto a hash ring over the hash space, offering a more resilient solution to the rehashing problem: when the table is resized, only `k/n` keys need to be remapped on average, where `k` is the number of keys and `n` is the number of slots. For example, with 1,000,000 keys across 10 servers, removing one server remaps roughly 100,000 keys rather than nearly all of them.
Hash Space and Hash Ring
The hash space is the output range of the hash function; with SHA-1, for example, it runs from 0 to 2^160 - 1. Connecting both ends of this space yields a hash ring, which provides a visual representation of how servers and keys are distributed.
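As a quick illustration (the key `b"some-key"` is an arbitrary example value), mapping a value into SHA-1's hash space looks like this:

```python
import hashlib

# A point in SHA-1's hash space, i.e. an integer in [0, 2**160 - 1].
position = int(hashlib.sha1(b"some-key").hexdigest(), 16)
print(position)
```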
Hash Servers
Servers are placed on the hash ring by hashing an identifier such as the server's IP address or name. Each server's position on the ring determines the range of keys it is responsible for.
Hash Keys
Unlike the traditional approach, consistent hashing involves no modular operation: each key is hashed directly onto the ring, and its position alone determines which server stores it.
Server Lookup
Determining the server for a specific key involves moving clockwise from the key’s position on the ring until a server is found.
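Putting the last few sections together, here is a minimal Python sketch of the basic approach; the class and method names (`ConsistentHashRing`, `add_server`, `get_server`) are my own, not from the book. Servers and keys share one SHA-1 hash space, and the clockwise walk is implemented as a binary search over the sorted ring positions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """A minimal sketch of the basic approach (no virtual nodes yet)."""

    def __init__(self):
        self._ring = []       # sorted server positions on the ring
        self._servers = {}    # position -> server name

    def _hash(self, key: str) -> int:
        # Map any string into SHA-1's hash space [0, 2**160 - 1].
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        pos = self._hash(server)
        bisect.insort(self._ring, pos)
        self._servers[pos] = server

    def remove_server(self, server: str) -> None:
        pos = self._hash(server)
        self._ring.remove(pos)
        del self._servers[pos]

    def get_server(self, key: str) -> str:
        # Clockwise walk: the first server at or past the key's
        # position owns it; wrap around past the top of the ring.
        if not self._ring:
            raise KeyError("no servers on the ring")
        idx = bisect.bisect_left(self._ring, self._hash(key)) % len(self._ring)
        return self._servers[self._ring[idx]]
```

Keeping the positions in a sorted list makes each lookup O(log n) via `bisect`, rather than a linear scan around the ring.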
Handling Server Changes
Consistent hashing efficiently manages the addition or removal of servers with minimal key redistribution.
Adding a Server
When a new server is added, only a fraction of the keys need to be redistributed: the keys lying between the new server and the previous server on the ring (moving anticlockwise from the new server's position) move to the new server, and everything else stays put.
Removing a Server
Similarly, when a server is removed, only the keys it owned require redistribution; they simply move to the next server clockwise on the ring, minimizing the impact on the system.
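Continuing the sketch above, the minimal-redistribution property is easy to observe: after adding a fifth server, only the keys that fall into its range change owners.

```python
ring = ConsistentHashRing()
for name in ("server0", "server1", "server2", "server3"):
    ring.add_server(name)

keys = [f"key{i}" for i in range(20)]
before = {k: ring.get_server(k) for k in keys}

ring.add_server("server4")  # a new server joins the ring
after = {k: ring.get_server(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")  # only server4's new range
```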
Addressing Issues in the Basic Approach
Two issues are identified in the basic approach: partition sizes (the stretch of hash space between adjacent servers) can be very uneven, and keys can end up non-uniformly distributed across servers, especially as servers are added and removed. Virtual nodes, or replicas, are introduced to overcome these challenges.
Virtual Nodes
With virtual nodes, each physical server is represented by multiple positions on the ring. As the number of virtual nodes grows, partition sizes become increasingly balanced and keys are more evenly distributed across servers.
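Here is one way virtual nodes might be layered onto the earlier sketch; the `#vnode{i}` suffix scheme and the default of 100 replicas are illustrative assumptions, not values prescribed by the book.

```python
import bisect
import hashlib

class VirtualNodeRing:
    """Consistent hashing with virtual nodes: each physical server
    is hashed onto the ring `replicas` times."""

    def __init__(self, replicas: int = 100):
        self._replicas = replicas
        self._ring = []      # sorted virtual-node positions
        self._owners = {}    # virtual-node position -> physical server

    def _hash(self, key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        for i in range(self._replicas):
            pos = self._hash(f"{server}#vnode{i}")   # e.g. "server0#vnode7"
            bisect.insort(self._ring, pos)
            self._owners[pos] = server

    def remove_server(self, server: str) -> None:
        for i in range(self._replicas):
            pos = self._hash(f"{server}#vnode{i}")
            self._ring.remove(pos)
            del self._owners[pos]

    def get_server(self, key: str) -> str:
        # Same clockwise lookup as before, but over virtual-node positions.
        idx = bisect.bisect_left(self._ring, self._hash(key)) % len(self._ring)
        return self._owners[self._ring[idx]]
```

More virtual nodes mean more, smaller partitions per server, so load evens out; the trade-off is extra memory to store the additional ring positions.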
Conclusion
Consistent hashing proves beneficial in various real-world systems, providing minimized key redistribution, easy horizontal scaling, and mitigation of the hotspot key problem. Its applications range from databases such as Amazon's Dynamo to network load balancers such as Google's Maglev.