Book Summaries | System Design Interview (Part 1) - Design a Unique ID Generator

February 25th, 2024

Chapter 7: Design a Unique ID Generator in Distributed Systems

This summary will serve to further cement my learnings taken when reading the seventh chapter of System Design Interview - An Insider's Guide (Part 1), titled Design a Unique ID Generator in Distributed Systems, and I hope will provide some learnings to you as well.

Design Scope and Understanding the Problem:

  • Unique ID generation in distributed systems is challenging due to limitations of auto_increment in traditional databases.
  • Requirements for this system:
    • IDs must be unique.
    • IDs are numerical values only.
    • IDs fit into 64-bit.
    • IDs are ordered by date.
    • Ability to generate over 10,000 unique IDs per second.

High-Level Design Proposal and Options:

  • The options considered below for ID generation include multi-master replication, UUID, ticket server, and Twitter snowflake approach.
  • Each option has pros and cons, such as scalability challenges, numeric ID limitations, and potential single points of failure.

Multi-master Replication:

  • Uses auto_increment feature across multiple database servers with an increment factor.
  • Faces scalability issues with multiple data centers, non-sequential IDs across servers, and challenges when adding or removing servers.

UUID:

  • Provides a 128-bit number with a low probability of collision.
  • Lacks numeric constraints, doesn’t order IDs by time, and may have non-numeric values.

Ticket Server:

  • Utilizes a centralized auto_increment feature in a single database server.
  • Simple implementation for small to medium-scale applications but introduces a single point of failure.

Twitter Snowflake Approach:

  • Divides a 64-bit ID into sections, including sign bit, timestamp, datacenter ID, machine ID, and sequence number.
  • Offers scalability, numeric IDs, and time-based ordering, addressing limitations of other approaches.

Design Deep Dive:

  • Datacenter and Machine IDs:

    • These IDs are chosen during the startup phase of the system and are typically fixed once the system is operational, requiring careful consideration.
    • Any changes to the datacenter or machine IDs need meticulous review, as accidental alterations could lead to ID conflicts and impact system functionality.
  • Timestamp and Sequence Numbers:

    • Timestamps, crucial for sorting and ordering IDs by time, constitute 41 bits, allowing for a working period of approximately 69 years.
    • Sequence numbers, spanning 12 bits, provide 4096 combinations, ensuring uniqueness within a given millisecond and offering scalability for various levels of system demand.
  • Timestamp Significance:

    • The most critical 41 bits in the ID are dedicated to the timestamp, allowing IDs to be sortable by time.
    • As timestamps grow with time, the system remains effective, and the binary representation of timestamps can be converted to and from UTC.
  • Timestamp Maximum Value:

    • The maximum timestamp value, represented by 41 bits, is 2199023255551 milliseconds, equivalent to around 69 years.
    • This extensive time frame ensures the ID generator’s longevity, and adopting a custom epoch time close to the present date delays the possibility of overflow.
  • Sequence Number Limits:

    • With 12 bits allocated to sequence numbers, the system can support up to 4096 unique IDs per millisecond.
    • This effectively handles scenarios where multiple IDs may be generated within the same millisecond on a given server, ensuring the uniqueness of each ID.