Book Summaries | System Design Interview (Part 1) - Design a URL Shortener

March 3rd, 2024

Chapter 8: Design a URL Shortener

This summary will serve to further cement my learnings taken when reading the eighth chapter of System Design Interview - An Insider's Guide (Part 1), titled Design a URL Shortener, and I hope will provide some learnings to you as well.

Design Scope and Clarification Questions

  • Problem: Design a URL shortening service like tinyurl.
  • Clarification Questions:
    • Example provided: Original URL to shortened URL.
    • Traffic Volume: 100 million URLs generated daily.
    • Shortened URL Characteristics: Combination of numbers (0-9) and characters (a-z, A-Z).
    • Shortened URL Management: No deletion or updates assumed.

Back of the Envelope Estimation

  • Estimation:
    • Write Operations: 100 million URLs per day.
    • Read Operations: 11,600 per second.
    • Storage Requirement: 365 TB over 10 years.
    • Average URL Length: Assumed to be 100 characters.

High-Level Design and API Endpoints

  • API Endpoints:
    • URL Shortening: POST request to create a short URL.
    • URL Redirecting: GET request to redirect to the original URL.
  • RESTful Style: Designing APIs based on REST principles.
  • Two Primary Endpoints:
    • POST api/v1/data/shorten
    • GET api/v1/shortUrl

URL Redirecting and 301 vs 302 Redirects

  • URL Redirecting:
    • Server-side redirection with 301 redirect.
    • Detailed communication illustrated.
  • 301 vs 302 Redirects:
    • 301: “Permanently” moved, reduces server load.
    • 302: “Temporarily” moved, useful for analytics.

URL Shortening Deep Dive and Data Model

Data Model:

  • Initial Design: High-level design proposed the use of a hash table for storing <shortURL, longURL> pairs.
  • Feasibility Issue: Real-world systems face limitations in memory resources, making the hash table approach impractical.
  • Proposed Solution: Shift to a Relational Database Model.
    • Database Table: Three columns - id, shortURL, longURL.
    • Simplified Design: Id serves as the primary key, linking the short and long URLs.

Hash Function:

  • Critical Role: Hash function is essential for converting long URLs to short hash values.
  • HashValue Characteristics: Comprised of characters [0-9, a-z, A-Z].
  • Two Approaches Explored:
    1. “Hash + Collision Resolution”
      • Utilizes well-known hash functions (CRC32, MD5, SHA-1).
      • Comparison: The hash values obtained may be longer than the desired 7 characters.
      • Resolution: Collision resolution is handled by appending a predefined string recursively.
      • Optimization: Implementation complexity addressed using bloom filters.
    2. “Base 62 Conversion”
      • Leverages base conversion for representation using 62 possible characters.
      • Example: Conversion of a decimal number (11157) to base 62 representation.
      • Significance: Addresses the need for a short and unique hash value.

URL Shortening Flow:

  • Logical Flow:
    1. Input: longURL
      • User provides the longURL for shortening.
    2. Check the Database for Existing shortURL
      • Verify if the longURL has been previously shortened.
    3. If Exists: Return shortURL
      • If the longURL is found, retrieve the corresponding shortURL and return it to the client.
    4. If Not Exists: Generate Unique ID
      • If the longURL is new, generate a unique ID using a distributed unique ID generator.
    5. Convert ID to shortURL Using Base 62 Conversion
      • Apply base 62 conversion to create a shortURL from the unique ID.
    6. Database Record Creation
      • Save the ID, shortURL, and longURL in the relational database for future reference.

Distributed Unique ID Generator:

  • Critical Component: Responsible for generating globally unique IDs for shortURL creation.
  • Challenges: Addressing uniqueness in a highly distributed environment.
  • Solutions Discussed: Previously covered in “Chapter 7: Design A Unique ID Generator in Distributed Systems.”
    • Recap: Techniques such as snowflake, UUID, or centralized ID allocation.

Considerations and Optimization:

  • System Complexity: Balancing simplicity and functionality in the URL shortening flow.
  • Base 62 Conversion Choice: Rationale behind choosing base 62 conversion for hash value representation.
  • Performance and Scalability: Ensuring the system efficiently handles a large volume of URL shortening requests over time.
  • Continuous Improvement: Regularly evaluating and optimizing the hash function and URL shortening process for better performance.

URL Redirecting Deep Dive

  • URL Redirecting Flow:
    • User clicks short URL.
    • Load balancer forwards request to web servers.
    • Cache lookup for shortURL; if present, return longURL.
    • If not in cache, fetch from the database; if not found, handle as an invalid shortURL.

Wrap Up and Additional Talking Points

  • Rate Limiter: Addressing potential security issues with overwhelming URL shortening requests.
  • Web Server Scaling: Stateless web tier facilitates easy scalability.
  • Database Scaling: Techniques like replication and sharding for database scalability.
  • Analytics Integration: Utilizing analytics to gather insights on link clicks and user behavior.