Book Summaries | System Design Interview (Part 1) - Design a Unique ID Generator
February 25th, 2024
Chapter 7: Design a Unique ID Generator in Distributed Systems
This summary will serve to further cement my learnings taken when reading the seventh chapter of System Design Interview - An Insider's Guide (Part 1)
, titled Design a Unique ID Generator in Distributed Systems
, and I hope will provide some learnings to you as well.
Design Scope and Understanding the Problem:
- Unique ID generation in distributed systems is challenging due to limitations of
auto_increment
in traditional databases. - Requirements for this system:
- IDs must be unique.
- IDs are numerical values only.
- IDs fit into 64-bit.
- IDs are ordered by date.
- Ability to generate over 10,000 unique IDs per second.
High-Level Design Proposal and Options:
- The options considered below for ID generation include multi-master replication, UUID, ticket server, and Twitter snowflake approach.
- Each option has pros and cons, such as scalability challenges, numeric ID limitations, and potential single points of failure.
Multi-master Replication:
- Uses
auto_increment
feature across multiple database servers with an increment factor. - Faces scalability issues with multiple data centers, non-sequential IDs across servers, and challenges when adding or removing servers.
UUID:
- Provides a 128-bit number with a low probability of collision.
- Lacks numeric constraints, doesn’t order IDs by time, and may have non-numeric values.
Ticket Server:
- Utilizes a centralized auto_increment feature in a single database server.
- Simple implementation for small to medium-scale applications but introduces a single point of failure.
Twitter Snowflake Approach:
- Divides a 64-bit ID into sections, including sign bit, timestamp, datacenter ID, machine ID, and sequence number.
- Offers scalability, numeric IDs, and time-based ordering, addressing limitations of other approaches.
Design Deep Dive:
-
Datacenter and Machine IDs:
- These IDs are chosen during the startup phase of the system and are typically fixed once the system is operational, requiring careful consideration.
- Any changes to the datacenter or machine IDs need meticulous review, as accidental alterations could lead to ID conflicts and impact system functionality.
-
Timestamp and Sequence Numbers:
- Timestamps, crucial for sorting and ordering IDs by time, constitute 41 bits, allowing for a working period of approximately 69 years.
- Sequence numbers, spanning 12 bits, provide 4096 combinations, ensuring uniqueness within a given millisecond and offering scalability for various levels of system demand.
-
Timestamp Significance:
- The most critical 41 bits in the ID are dedicated to the timestamp, allowing IDs to be sortable by time.
- As timestamps grow with time, the system remains effective, and the binary representation of timestamps can be converted to and from UTC.
-
Timestamp Maximum Value:
- The maximum timestamp value, represented by 41 bits, is 2199023255551 milliseconds, equivalent to around 69 years.
- This extensive time frame ensures the ID generator’s longevity, and adopting a custom epoch time close to the present date delays the possibility of overflow.
-
Sequence Number Limits:
- With 12 bits allocated to sequence numbers, the system can support up to 4096 unique IDs per millisecond.
- This effectively handles scenarios where multiple IDs may be generated within the same millisecond on a given server, ensuring the uniqueness of each ID.