Book Summaries | System Design Interview (Part 1) - Design Consistent Hashing

January 30th, 2024

Chapter 5: Design Consistent Hashing

This summary serves to further cement what I learned while reading the fifth chapter of System Design Interview - An Insider's Guide (Part 1), titled Design Consistent Hashing, and I hope it provides some learnings for you as well.

Introduction

To achieve horizontal scaling, efficient and even distribution of requests/data across servers is crucial. Consistent hashing is a widely adopted technique addressing this challenge. This chapter provides an in-depth look into the consistent hashing approach and explores its advantages.

The Rehashing Problem

When using a common hash method (e.g., serverIndex = hash(key) % N) to balance the load among cache servers, issues arise during server additions or removals. Because N changes, the modulo operation maps most keys to different server indexes, causing significant data redistribution.
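The scale of the problem is easy to demonstrate. The sketch below (illustrative key and server counts of my own, not the book's) removes one of four servers and counts how many keys the modulo scheme remaps; on average, roughly three quarters of all keys move, not just the quarter that lived on the removed server.

```python
import hashlib

def server_index(key: str, n_servers: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_servers

keys = [f"key{i}" for i in range(100)]
before = {k: server_index(k, 4) for k in keys}  # 4 cache servers
after = {k: server_index(k, 3) for k in keys}   # one server removed

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys mapped to a different server")
```

Each remapped key now points at a server that does not hold its data, so a cache built this way would see a storm of misses after every topology change.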

Consistent Hashing Overview

Consistent hashing, a special kind of hashing, minimizes key remapping during resizing. It utilizes a hash ring and hash space, offering a more resilient solution to the rehashing problem. When a hash table is resized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys and n is the number of slots.

Hash Space and Hash Ring

The hash space, exemplified by SHA-1's output range (0 to 2^160 - 1), is mapped onto a hash ring by connecting its two ends. This ring provides a visual representation of how servers and keys are distributed.

Hash Servers

Servers are placed on the hash ring using a hash function. Each server's position on the ring determines the range of keys it is responsible for.

Hash Keys

Unlike traditional hashing, consistent hashing does not involve a modulo operation. Keys are hashed directly onto the ring, and each key's position determines which server stores it.

Server Lookup

Determining the server for a specific key involves moving clockwise from the key's position on the ring until the first server is found.
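A minimal sketch of the lookup, assuming MD5 as the ring hash and hypothetical server names; keeping the server positions sorted lets a binary search stand in for the clockwise walk.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Position on the ring; MD5 is an illustrative choice of hash function.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Keep (position, server) pairs sorted so lookup can binary-search.
        self._ring = sorted((ring_hash(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Clockwise walk: find the first server at or after the key's
        # position, wrapping back to the start of the ring if none exists.
        i = bisect.bisect_left(self._positions, ring_hash(key))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["s0", "s1", "s2", "s3"])
print(ring.lookup("user:42"))  # same key always maps to the same server
```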

Handling Server Changes

Consistent hashing efficiently manages the addition or removal of servers with minimal key redistribution.

Adding a Server

When adding a new server, only a fraction of keys needs redistribution: the keys that fall between the new server and its predecessor on the ring, which move from the next server clockwise onto the new one. All other keys keep their existing assignment.

Removing a Server

Similarly, when a server is removed, only a small fraction of keys requires redistribution, minimizing the impact on the system.
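A self-contained sketch of the removal case (MD5 ring hash, hypothetical server and key names, my own illustrative counts): removing one of four servers remaps only the keys that server owned, each onto its clockwise successor.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def lookup(ring, key):
    # Clockwise walk: first server at or after the key, wrapping around.
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, ring_hash(key))
    return ring[i % len(ring)][1]

servers = ["s0", "s1", "s2", "s3"]
ring_before = sorted((ring_hash(s), s) for s in servers)
ring_after = sorted((ring_hash(s), s) for s in servers if s != "s3")

keys = [f"key{i}" for i in range(1000)]
moved = [k for k in keys if lookup(ring_before, k) != lookup(ring_after, k)]

# Exactly the keys that lived on the removed server are remapped;
# everything else stays put.
print(f"{len(moved)} of {len(keys)} keys moved")
```

Contrast this with the modulo scheme, where the same removal would remap the majority of keys.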

Addressing Issues in the Basic Approach

Two issues are identified in the basic approach: partition sizes can be uneven, since server positions on the ring are effectively random, and keys can end up distributed non-uniformly across servers as a result. Virtual nodes (replicas) are introduced to overcome both challenges.

Virtual Nodes

Using virtual nodes, each server is represented by multiple nodes on the ring, leading to more balanced partition sizes and improved key distribution.
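A sketch of the idea, assuming 100 virtual nodes per server (an illustrative count, not a prescription) and hypothetical server names: each physical server is hashed onto the ring many times, so its total share is the sum of many small arcs and evens out.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

VNODES = 100  # virtual nodes per physical server (illustrative)
servers = ["s0", "s1", "s2", "s3"]

# Hash each server onto the ring many times; every virtual node
# maps back to its physical server.
ring = sorted(
    (ring_hash(f"{server}-{i}"), server)
    for server in servers
    for i in range(VNODES)
)
positions = [pos for pos, _ in ring]

def lookup(key: str) -> str:
    i = bisect.bisect_left(positions, ring_hash(key))
    return ring[i % len(ring)][1]

load = Counter(lookup(f"key{i}") for i in range(10_000))
print(load)  # per-server counts land much closer to an even split
```

Raising the virtual node count improves balance further at the cost of extra memory for ring positions.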

Conclusion

Consistent hashing proves beneficial in various real-world systems, providing advantages such as minimized key redistribution, easy horizontal scaling, and mitigation of the hotspot key problem. Its applications range from databases such as Amazon's Dynamo to network load balancers such as Google's Maglev.