Interview Preparation | Building a URL Shortener

January 1st, 2024

Introduction to Building a URL Shortener

This summary will serve to further cement my learnings taken when reviewing the Building a URL Shortener module in the Grokking the Modern System Design Interview course, and I hope will provide some learnings to you as well.

Overview of a URL Shortener

  • URL Shortening Purpose: Shortening URLs, similar to what a service like TinyURL does, enhances accessibility and sharing possibilities of links.
  • Advantages:
    • Convenience: Short URLs optimize link usage across devices, ensuring accessibility and non-breakability.
    • Professionalism: Visually professional, engaging, and facilitates higher sharing possibilities.
    • Error Reduction: Less error-prone while typing.
    • Space Efficiency: Requires smaller storage space at the user’s end.
  • Disadvantages:
    • Loss of Originality: Use of third-party service may result in loss of brand originality.
    • Dependency Risks: Dependence on a third-party service that may shut down, risking the loss of all shortened URLs.
    • Brand Image Impact: Business brand image is tied to the trustworthiness of the URL shortening service.
    • Competition Challenges: Competition for in-demand custom short URLs, potentially limiting options.

Requirements for URL Shortening Design

Functional Requirements:

  • Short URL Generation: Generate a unique, shorter alias for a given URL.
  • Redirection: Redirect users to the original URL based on a short link.
  • Custom Short Links: Allow users to create custom short links.
  • Deletion: Enable users to delete short links with proper rights.
  • Update: Allow users to update the long URL associated with a short link.
  • Expiry Time: Implement default expiration time for short links, customizable by users.

Non-functional Requirements

  • Availability: High availability is critical for URL redirection reliability.
  • Scalability: The system should scale horizontally to meet increasing demand.
  • Readability: Short URLs must be easily readable, distinguishable, and typeable.
  • Latency: Low-latency performance for a smooth user experience.
  • Unpredictability: Short URLs should be highly unpredictable for security reasons.

Resource Estimation

Assumptions

  • We assume that the shortening-to-redirection request ratio is 1:100.

  • There are 200 million new URL shortening requests per month.

  • A URL shortening entry requires 500 Bytes of database storage.

  • Each entry will have a maximum of five years of expiry time unless explicitly deleted.

  • There are 100 million Daily Active Users (DAU).

  • Storage Estimation:

    • 12 billion URL shortening requests in 5 years, requiring 6 TB of storage (200 million * 12 months * 5 years * 500 Bytes = 6 TB)
  • Query Rate Estimation:

    • 20 billion redirection requests per month, leading to 7.6K URL requests/s (200 million * 100 = 20B).
    • 200 million new shortening requests per month, leading to 76 URL shortening requests / s
  • Bandwidth Estimation:

    • Incoming data: 304 Kbps (76 URLs/s * 500 bytes * 8 bits); Outgoing data: 30.4 Mbps (7.6K URL requests / s * 500 bytes * 8 bits).
  • Memory Estimation:

    • Caching 20% of redirection requests requires 66 GB of memory (7.6K URLs/s * 500 bytes * 8 bits * 86400 seconds * 0.2).
  • Servers Estimation:

    • Approximately 12,500 servers are needed based on DAU and daily user handling limit.

Building Blocks

  • Database: Stores mapping of long and short URLs.
  • Sequencer: Provides unique IDs for short URL generation.
  • Load Balancers: Distribute requests among servers.
  • Caches: Store frequently accessed short URLs.
  • Rate Limiters: Prevent system exploitation.

Additional Components

  • Servers: Handle service requests and run application logic.
  • Base-58 Encoder: Transform numeric sequencer output into alphanumeric form.

System APIs

Shortening a URL

To shorten a URL, the system provides a REST API with the following parameters:

  • api_dev_key: User’s unique identifier.
  • original_url: The original long URL to be shortened.
  • custom_alias (optional): User-defined custom short URL.
  • expiry_date (optional): Expiration date for the shortened URL.

The API returns the shortened URL on successful insertion; otherwise, an appropriate error code is returned.

Redirecting a short URL

For redirection, the REST API includes:

  • api_dev_key: User’s unique identifier.
  • url_key: Shortened URL to fetch the associated long URL from the database.

Successful redirection leads the user to the original URL.

Deleting a short URL

Deleting a short URL involves the API with:

  • api_dev_key: User’s unique identifier.
  • url_key: Shortened URL to fetch the long URL from the database.

Successful deletion results in a system message, “URL Removed.”

Design Components

Database

  • Stores user details and URL mappings.

  • NoSQL, specifically MongoDB, is chosen for horizontal scalability.

    • MongoDB uses the leader-follower protocol, making it possible to use replicas for heavy reading.
    • MongoDB also ensures atomicity in concurrent write operations and avoids collisions by returning duplicate-key errors for record-duplication issues.

Short URL Generator

  • Utilizes a sequencer for unique IDs.
  • Incorporates a Base-58 encoder for alphanumeric short URLs.

Other Building Blocks

  • Load balancing: Global Server Load Balancing (GSLB) for improved availability.
  • Cache: Memcached for read-intensive design.
  • Rate limiter: Implements the fixed window counter algorithm for security.

Workflow

  1. Shortening:

    • User request forwarded to Short URL Generator.
    • A successful short link is sent to the user and recorded in the database.
  2. Redirection:

    • The application server checks the cache and database for redirection requests.
    • If found, the user is redirected to the associated long URL.
  3. Deletion:

    • User or system-initiated deletion by sending details to the database server.
  4. Custom Short Links:

    • The application server checks the eligibility and availability of the requested custom short URL.
    • The user receives the success/error message accordingly.

Why use encoding

The encoder enhances the readability of short URLs generated by the sequencer. Base-58 encoding is used over base-64 encoding as it helps avoid confusion with look-alike characters (like I and l).

Base 58 Mapping

| Value | Character | Value | Character | Value | Character | Value | Character |
| ----- | --------- | ----- | --------- | ----- | --------- | ----- | --------- |
| 0     | 1         | 15    | G         | 30    | X         | 45    | n         |
| 1     | 2         | 16    | H         | 31    | Y         | 46    | o         |
| 2     | 3         | 17    | J         | 32    | Z         | 47    | p         |
| 3     | 4         | 18    | K         | 33    | a         | 48    | q         |
| 4     | 5         | 19    | L         | 34    | b         | 49    | r         |
| 5     | 6         | 20    | M         | 35    | c         | 50    | s         |
| 6     | 7         | 21    | N         | 36    | d         | 51    | t         |
| 7     | 8         | 22    | P         | 37    | e         | 52    | u         |
| 8     | 9         | 23    | Q         | 38    | f         | 53    | v         |
| 9     | A         | 24    | R         | 39    | g         | 54    | w         |
| 10    | B         | 25    | S         | 40    | h         | 55    | x         |
| 11    | C         | 26    | T         | 41    | i         | 56    | y         |
| 12    | D         | 27    | U         | 42    | j         | 57    | z         |
| 13    | E         | 28    | V         | 43    | k         |       |           |
| 14    | F         | 29    | W         | 44    | m         |       |           |

Converting base-10 to base-58

Converting a base-10 number to a base-58 number is crucial to our architecture. Let’s run through an example of this process.

Example: Let’s say we have the base-10 number 2468135791013 and we want to convert it to base-58.

Conversion Process:

  1. Start by repeatedly dividing the base-10 number by 58 and noting the remainder at each step.
  2. Stop when the quotient is zero.
  3. Assign the base-58 characters corresponding to each remainder.

Calculation:

Base-10 = 2468135791013

2468135791013 % 58 = 17   => Remainder (base-58 character: q)
42554065362 % 58 = 6      => Remainder (base-58 character: 6)
733690782 % 58 = 4        => Remainder (base-58 character: 4)
12649841 % 58 = 41        => Remainder (base-58 character: 1)
218100 % 58 = 20          => Remainder (base-58 character: 20)
3760 % 58 = 48            => Remainder (base-58 character: 48)
64 % 58 = 6              => Remainder (base-58 character: 6)
1 % 58 = 1               => Remainder (base-58 character: 1)

Base-58 = [1] [6] [48] [20] [41] [4] [6] [17]

Converted Base-58 Number = 27qMi57J

Therefore, the base-10 number 2468135791013 is converted to the base-58 number 27qMi57J.

Converting base-58 to base-10

Equally as important as encoding a base-10 number to a base-58 number is decoding a base-58 number to a base-10 number. Let’s take a look at an example.

Example: Let’s say we have the base-58 number 27qMi57J, and we want to decode it to base-10.

Decoding Process:

  1. Assign numerical values to each base-58 character based on the table.
  2. Multiply each character’s value by the base (58) raised to the power of its position.
  3. Sum up all the individual multiplication results.

Calculation: Using the provided base-58 table:

Base-58 = 27qMi57J

27qMi57J
| | | | | | | |
v v v v v v v v
[1] [6] [48] [20] [41] [4] [6] [17]

Base-10 = (1 * 58^7) + (6 * 58^6) + (48 * 58^5) + (20 * 58^4) + (41 * 58^3) + (4 * 58^2) + (6 * 58^1) + (17 * 58^0)

Base-10 = 2207984167552 + 228412155264 + 31505124864 + 226329920 + 7999592 + 13456 + 348 + 17

Base-10 = 2468135791013

Therefore, the base-58 number 27qMi57J is decoded to the base-10 number 2468135791013.

Reviewing the requirements

Availability

  • High availability is crucial for short URL generation and redirection.
  • Building blocks (databases, caches, application servers) have built-in replication for fault tolerance.
  • Disaster recovery includes frequent backups using Amazon S3.
  • Global server load balancing (GSLB) handles traffic intelligently.
  • Rate limiters protect against DoS attacks.

Scalability

  • Easily scalable with horizontally sharded databases.
  • MongoDB facilitates horizontal scaling, suitable for the design.
  • NoSQL database flexibility supports the design’s requirements.

Readability

  • Base-58 encoder enhances readability by eliminating look-alike characters.
  • Non-alphanumeric characters are removed, making URLs compatible with various systems.

Latency

  • Low latency is ensured through quick encoding, especially for the redirection-heavy system.
  • MongoDB’s low latency and high throughput enhance performance.
  • Distributed cache minimizes service delays.

Unpredictability

  • Unique IDs are randomly selected, enhancing security by making short URLs unpredictable.