Book Summaries | System Design Interview (Part 1) - Design a URL Shortener
March 3rd, 2024
Chapter 8: Design a URL Shortener
This summary will serve to further cement my learnings taken when reading the eighth chapter of System Design Interview - An Insider's Guide (Part 1)
, titled Design a URL Shortener
, and I hope will provide some learnings to you as well.
Design Scope and Clarification Questions
- Problem: Design a URL shortening service like tinyurl.
- Clarification Questions:
- Example provided: Original URL to shortened URL.
- Traffic Volume: 100 million URLs generated daily.
- Shortened URL Characteristics: Combination of numbers (0-9) and characters (a-z, A-Z).
- Shortened URL Management: No deletion or updates assumed.
Back of the Envelope Estimation
- Estimation:
- Write Operations: 100 million URLs per day.
- Read Operations: 11,600 per second.
- Storage Requirement: 365 TB over 10 years.
- Average URL Length: Assumed to be 100 characters.
High-Level Design and API Endpoints
- API Endpoints:
- URL Shortening: POST request to create a short URL.
- URL Redirecting: GET request to redirect to the original URL.
- RESTful Style: Designing APIs based on REST principles.
- Two Primary Endpoints:
- POST api/v1/data/shorten
- GET api/v1/shortUrl
URL Redirecting and 301 vs 302 Redirects
- URL Redirecting:
- Server-side redirection with 301 redirect.
- Detailed communication illustrated.
- 301 vs 302 Redirects:
- 301: “Permanently” moved, reduces server load.
- 302: “Temporarily” moved, useful for analytics.
URL Shortening Deep Dive and Data Model
Data Model:
- Initial Design: High-level design proposed the use of a hash table for storing <shortURL, longURL> pairs.
- Feasibility Issue: Real-world systems face limitations in memory resources, making the hash table approach impractical.
- Proposed Solution: Shift to a Relational Database Model.
- Database Table: Three columns - id, shortURL, longURL.
- Simplified Design: Id serves as the primary key, linking the short and long URLs.
Hash Function:
- Critical Role: Hash function is essential for converting long URLs to short hash values.
- HashValue Characteristics: Comprised of characters [0-9, a-z, A-Z].
- Two Approaches Explored:
- “Hash + Collision Resolution”
- Utilizes well-known hash functions (CRC32, MD5, SHA-1).
- Comparison: The hash values obtained may be longer than the desired 7 characters.
- Resolution: Collision resolution is handled by appending a predefined string recursively.
- Optimization: Implementation complexity addressed using bloom filters.
- “Base 62 Conversion”
- Leverages base conversion for representation using 62 possible characters.
- Example: Conversion of a decimal number (11157) to base 62 representation.
- Significance: Addresses the need for a short and unique hash value.
- “Hash + Collision Resolution”
URL Shortening Flow:
- Logical Flow:
- Input: longURL
- User provides the longURL for shortening.
- Check the Database for Existing shortURL
- Verify if the longURL has been previously shortened.
- If Exists: Return shortURL
- If the longURL is found, retrieve the corresponding shortURL and return it to the client.
- If Not Exists: Generate Unique ID
- If the longURL is new, generate a unique ID using a distributed unique ID generator.
- Convert ID to shortURL Using Base 62 Conversion
- Apply base 62 conversion to create a shortURL from the unique ID.
- Database Record Creation
- Save the ID, shortURL, and longURL in the relational database for future reference.
- Input: longURL
Distributed Unique ID Generator:
- Critical Component: Responsible for generating globally unique IDs for shortURL creation.
- Challenges: Addressing uniqueness in a highly distributed environment.
- Solutions Discussed: Previously covered in “Chapter 7: Design A Unique ID Generator in Distributed Systems.”
- Recap: Techniques such as snowflake, UUID, or centralized ID allocation.
Considerations and Optimization:
- System Complexity: Balancing simplicity and functionality in the URL shortening flow.
- Base 62 Conversion Choice: Rationale behind choosing base 62 conversion for hash value representation.
- Performance and Scalability: Ensuring the system efficiently handles a large volume of URL shortening requests over time.
- Continuous Improvement: Regularly evaluating and optimizing the hash function and URL shortening process for better performance.
URL Redirecting Deep Dive
- URL Redirecting Flow:
- User clicks short URL.
- Load balancer forwards request to web servers.
- Cache lookup for shortURL; if present, return longURL.
- If not in cache, fetch from the database; if not found, handle as an invalid shortURL.
Wrap Up and Additional Talking Points
- Rate Limiter: Addressing potential security issues with overwhelming URL shortening requests.
- Web Server Scaling: Stateless web tier facilitates easy scalability.
- Database Scaling: Techniques like replication and sharding for database scalability.
- Analytics Integration: Utilizing analytics to gather insights on link clicks and user behavior.