Interview Preparation | Building a URL Shortener
January 1st, 2024
Introduction to Building a URL Shortener
This summary will serve to further cement my learnings taken when reviewing the Building a URL Shortener
module in the Grokking the Modern System Design Interview course, and I hope will provide some learnings to you as well.
Overview of a URL Shortener
- URL Shortening Purpose: Shortening URLs, similar to what a service like TinyURL does, enhances accessibility and sharing possibilities of links.
- Advantages:
- Convenience: Short URLs optimize link usage across devices, ensuring accessibility and non-breakability.
- Professionalism: Visually professional, engaging, and facilitates higher sharing possibilities.
- Error Reduction: Less error-prone while typing.
- Space Efficiency: Requires smaller storage space at the user’s end.
- Disadvantages:
- Loss of Originality: Use of third-party service may result in loss of brand originality.
- Dependency Risks: Dependence on a third-party service that may shut down, risking the loss of all shortened URLs.
- Brand Image Impact: Business brand image is tied to the trustworthiness of the URL shortening service.
- Competition Challenges: Competition for in-demand custom short URLs, potentially limiting options.
Requirements for URL Shortening Design
Functional Requirements:
- Short URL Generation: Generate a unique, shorter alias for a given URL.
- Redirection: Redirect users to the original URL based on a short link.
- Custom Short Links: Allow users to create custom short links.
- Deletion: Enable users to delete short links with proper rights.
- Update: Allow users to update the long URL associated with a short link.
- Expiry Time: Implement default expiration time for short links, customizable by users.
Non-functional Requirements
- Availability: High availability is critical for URL redirection reliability.
- Scalability: The system should scale horizontally to meet increasing demand.
- Readability: Short URLs must be easily readable, distinguishable, and typeable.
- Latency: Low-latency performance for a smooth user experience.
- Unpredictability: Short URLs should be highly unpredictable for security reasons.
Resource Estimation
Assumptions
-
We assume that the shortening-to-redirection request ratio is 1:100.
-
There are 200 million new URL shortening requests per month.
-
A URL shortening entry requires 500 Bytes of database storage.
-
Each entry will have a maximum of five years of expiry time unless explicitly deleted.
-
There are 100 million Daily Active Users (DAU).
-
Storage Estimation:
- 12 billion URL shortening requests in 5 years, requiring 6 TB of storage (200 million * 12 months * 5 years * 500 Bytes = 6 TB)
-
Query Rate Estimation:
- 20 billion redirection requests per month, leading to 7.6K URL requests/s (200 million * 100 = 20B).
- 200 million new shortening requests per month, leading to 76 URL shortening requests / s
-
Bandwidth Estimation:
- Incoming data: 304 Kbps (76 URLs/s * 500 bytes * 8 bits); Outgoing data: 30.4 Mbps (7.6K URL requests / s * 500 bytes * 8 bits).
-
Memory Estimation:
- Caching 20% of redirection requests requires 66 GB of memory (7.6K URLs/s * 500 bytes * 8 bits * 86400 seconds * 0.2).
-
Servers Estimation:
- Approximately 12,500 servers are needed based on DAU and daily user handling limit.
Building Blocks
- Database: Stores mapping of long and short URLs.
- Sequencer: Provides unique IDs for short URL generation.
- Load Balancers: Distribute requests among servers.
- Caches: Store frequently accessed short URLs.
- Rate Limiters: Prevent system exploitation.
Additional Components
- Servers: Handle service requests and run application logic.
- Base-58 Encoder: Transform numeric sequencer output into alphanumeric form.
System APIs
Shortening a URL
To shorten a URL, the system provides a REST API with the following parameters:
api_dev_key
: User’s unique identifier.original_url
: The original long URL to be shortened.custom_alias
(optional): User-defined custom short URL.expiry_date
(optional): Expiration date for the shortened URL.
The API returns the shortened URL on successful insertion; otherwise, an appropriate error code is returned.
Redirecting a short URL
For redirection, the REST API includes:
api_dev_key
: User’s unique identifier.url_key
: Shortened URL to fetch the associated long URL from the database.
Successful redirection leads the user to the original URL.
Deleting a short URL
Deleting a short URL involves the API with:
api_dev_key
: User’s unique identifier.url_key
: Shortened URL to fetch the long URL from the database.
Successful deletion results in a system message, “URL Removed.”
Design Components
Database
-
Stores user details and URL mappings.
-
NoSQL, specifically MongoDB, is chosen for horizontal scalability.
- MongoDB uses the leader-follower protocol, making it possible to use replicas for heavy reading.
- MongoDB also ensures atomicity in concurrent write operations and avoids collisions by returning duplicate-key errors for record-duplication issues.
Short URL Generator
- Utilizes a sequencer for unique IDs.
- Incorporates a Base-58 encoder for alphanumeric short URLs.
Other Building Blocks
- Load balancing: Global Server Load Balancing (GSLB) for improved availability.
- Cache: Memcached for read-intensive design.
- Rate limiter: Implements the fixed window counter algorithm for security.
Workflow
-
Shortening:
- User request forwarded to Short URL Generator.
- A successful short link is sent to the user and recorded in the database.
-
Redirection:
- The application server checks the cache and database for redirection requests.
- If found, the user is redirected to the associated long URL.
-
Deletion:
- User or system-initiated deletion by sending details to the database server.
-
Custom Short Links:
- The application server checks the eligibility and availability of the requested custom short URL.
- The user receives the success/error message accordingly.
Why use encoding
The encoder enhances the readability of short URLs generated by the sequencer. Base-58 encoding is used over base-64 encoding as it helps avoid confusion with look-alike characters (like I
and l
).
Base 58 Mapping
| Value | Character | Value | Character | Value | Character | Value | Character |
| ----- | --------- | ----- | --------- | ----- | --------- | ----- | --------- |
| 0 | 1 | 15 | G | 30 | X | 45 | n |
| 1 | 2 | 16 | H | 31 | Y | 46 | o |
| 2 | 3 | 17 | J | 32 | Z | 47 | p |
| 3 | 4 | 18 | K | 33 | a | 48 | q |
| 4 | 5 | 19 | L | 34 | b | 49 | r |
| 5 | 6 | 20 | M | 35 | c | 50 | s |
| 6 | 7 | 21 | N | 36 | d | 51 | t |
| 7 | 8 | 22 | P | 37 | e | 52 | u |
| 8 | 9 | 23 | Q | 38 | f | 53 | v |
| 9 | A | 24 | R | 39 | g | 54 | w |
| 10 | B | 25 | S | 40 | h | 55 | x |
| 11 | C | 26 | T | 41 | i | 56 | y |
| 12 | D | 27 | U | 42 | j | 57 | z |
| 13 | E | 28 | V | 43 | k | | |
| 14 | F | 29 | W | 44 | m | | |
Converting base-10 to base-58
Converting a base-10 number to a base-58 number is crucial to our architecture. Let’s run through an example of this process.
Example:
Let’s say we have the base-10 number 2468135791013
and we want to convert it to base-58.
Conversion Process:
- Start by repeatedly dividing the base-10 number by 58 and noting the remainder at each step.
- Stop when the quotient is zero.
- Assign the base-58 characters corresponding to each remainder.
Calculation:
Base-10 = 2468135791013
2468135791013 % 58 = 17 => Remainder (base-58 character: q)
42554065362 % 58 = 6 => Remainder (base-58 character: 6)
733690782 % 58 = 4 => Remainder (base-58 character: 4)
12649841 % 58 = 41 => Remainder (base-58 character: 1)
218100 % 58 = 20 => Remainder (base-58 character: 20)
3760 % 58 = 48 => Remainder (base-58 character: 48)
64 % 58 = 6 => Remainder (base-58 character: 6)
1 % 58 = 1 => Remainder (base-58 character: 1)
Base-58 = [1] [6] [48] [20] [41] [4] [6] [17]
Converted Base-58 Number = 27qMi57J
Therefore, the base-10 number 2468135791013
is converted to the base-58 number 27qMi57J
.
Converting base-58 to base-10
Equally as important as encoding a base-10 number to a base-58 number is decoding a base-58 number to a base-10 number. Let’s take a look at an example.
Example:
Let’s say we have the base-58 number 27qMi57J
, and we want to decode it to base-10.
Decoding Process:
- Assign numerical values to each base-58 character based on the table.
- Multiply each character’s value by the base (58) raised to the power of its position.
- Sum up all the individual multiplication results.
Calculation: Using the provided base-58 table:
Base-58 = 27qMi57J
27qMi57J
| | | | | | | |
v v v v v v v v
[1] [6] [48] [20] [41] [4] [6] [17]
Base-10 = (1 * 58^7) + (6 * 58^6) + (48 * 58^5) + (20 * 58^4) + (41 * 58^3) + (4 * 58^2) + (6 * 58^1) + (17 * 58^0)
Base-10 = 2207984167552 + 228412155264 + 31505124864 + 226329920 + 7999592 + 13456 + 348 + 17
Base-10 = 2468135791013
Therefore, the base-58 number 27qMi57J
is decoded to the base-10 number 2468135791013
.
Reviewing the requirements
Availability
- High availability is crucial for short URL generation and redirection.
- Building blocks (databases, caches, application servers) have built-in replication for fault tolerance.
- Disaster recovery includes frequent backups using Amazon S3.
- Global server load balancing (GSLB) handles traffic intelligently.
- Rate limiters protect against DoS attacks.
Scalability
- Easily scalable with horizontally sharded databases.
- MongoDB facilitates horizontal scaling, suitable for the design.
- NoSQL database flexibility supports the design’s requirements.
Readability
- Base-58 encoder enhances readability by eliminating look-alike characters.
- Non-alphanumeric characters are removed, making URLs compatible with various systems.
Latency
- Low latency is ensured through quick encoding, especially for the redirection-heavy system.
- MongoDB’s low latency and high throughput enhance performance.
- Distributed cache minimizes service delays.
Unpredictability
- Unique IDs are randomly selected, enhancing security by making short URLs unpredictable.