Book Summaries | System Design Interview (Part 1) - Design a Rate Limiter
January 23rd, 2024
Chapter 4: Design a Rate Limiter
This summary aims to further cement what I learned while reading the fourth chapter of System Design Interview - An Insider's Guide (Part 1), titled Design a Rate Limiter, and I hope it provides some learnings for you as well.
Benefits of Using an API Rate Limiter
- Prevents resource starvation caused by Denial of Service (DoS) attacks.
- Reduces costs by limiting excess requests and allocating resources efficiently.
- Prevents servers from being overloaded by filtering out excess requests.
Where to Implement Rate Limiter
- Server-side implementation is preferred over client-side, since client-side enforcement can be bypassed and is unreliable.
- Consider using an API gateway for rate limiting in a distributed environment.
- Evaluate your technology stack: implement rate limiting server-side if the stack supports it efficiently, or otherwise use a third-party API gateway.
Algorithms for Rate Limiting
Token Bucket Algorithm
- Description: The token bucket is a widely used rate-limiting algorithm. A bucket has a pre-defined capacity; tokens are added to it at a preset refill rate, and each request consumes one token. Requests that arrive when the bucket is empty are dropped.
- Pros:
  - Simple and well-understood.
  - Memory-efficient.
  - Allows bursts of traffic for short periods.
- Cons:
  - Tuning its two parameters (bucket size and refill rate) properly can be challenging.
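The token bucket described above can be sketched in a few lines. This is a minimal single-process illustration (class and parameter names are my own, not from the book):

```python
import time

class TokenBucket:
    """Minimal token bucket: tokens accrue at refill_rate per second,
    capped at capacity; each allowed request spends one token."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens accrued since the last check, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A full bucket of capacity 3 allows three back-to-back requests (the burst the pros mention) and then rejects until tokens refill.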
Leaking Bucket Algorithm
- Description: Similar to the token bucket, but requests are processed at a fixed rate. Incoming requests are added to a FIFO queue; if the queue is full, the request is dropped.
- Pros:
  - Memory-efficient with a limited queue size.
  - Suitable for use cases requiring a stable outflow rate.
- Cons:
  - A burst of traffic can fill the queue with old requests; if they are not processed in time, recent requests get rate limited.
  - Tuning the parameters (queue size, outflow rate) can be challenging.
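The leaking bucket splits into two halves: admission into a bounded queue, and a worker that drains it at the fixed outflow rate. A minimal sketch (names are illustrative, not from the book):

```python
from collections import deque

class LeakingBucket:
    """Bounded FIFO queue; a worker drains it at a fixed rate via leak()."""

    def __init__(self, queue_size: int):
        self.queue_size = queue_size
        self.queue = deque()

    def add_request(self, request) -> bool:
        if len(self.queue) >= self.queue_size:
            return False  # bucket full: drop the request
        self.queue.append(request)
        return True

    def leak(self):
        """Called by a worker at the fixed outflow rate; returns the next
        request to process, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Note that `leak()` would be driven by a scheduler or timer in practice; the fixed cadence of those calls is what produces the stable outflow rate.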
Fixed Window Counter Algorithm
- Description: Divides time into fixed-sized windows, assigning a counter to each window. Each request increments the current window's counter; once the threshold is reached, new requests are dropped until a new window starts.
- Pros:
  - Memory-efficient and easy to understand.
  - Resets the available quota at the end of each time window, which fits certain use cases.
- Cons:
  - Allows a burst of traffic at window edges, which can exceed the intended rate limit for the system.
  - Limited precision in controlling traffic.
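A fixed window counter keeps one counter per window; the window is derived by integer-dividing the timestamp by the window length. In this sketch the caller passes the current time in explicitly so behavior is deterministic (in production you would pass `time.time()`); names are my own:

```python
class FixedWindowCounter:
    """One counter per fixed time window; requests beyond the limit
    in a window are rejected until the next window starts."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = {}  # window index -> request count

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)  # which window 'now' falls in
        count = self.counters.get(window, 0)
        if count >= self.limit:
            return False
        self.counters[window] = count + 1
        return True
```

The edge-burst con is visible here: a client can spend the full quota at the end of one window and again at the start of the next, doubling the intended rate across that boundary.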
Sliding Window Log Algorithm
- Description: Overcomes the edge-case issue of the fixed window counter by keeping a log of request timestamps. On each request, timestamps older than the current window are evicted, and the request is allowed only if the log size is within the limit.
- Pros:
  - Highly accurate rate limiting.
  - Prevents excessive requests at window edges.
- Cons:
  - Consumes a lot of memory, because a request's timestamp may be stored in the log even when the request is rejected.
Sliding Window Counter Algorithm
- Description: A hybrid approach combining the fixed window counter and the sliding window log. The number of requests in the rolling window is estimated as the current window's count plus the previous window's count weighted by the overlap percentage.
- Pros:
  - Smooths out spikes in traffic, since the rate is based on the average over the previous window.
  - Memory-efficient compared to a pure sliding window log.
- Cons:
  - Only an approximation: it assumes requests in the previous window were evenly distributed.
  - Effectiveness depends on the use case and the overlap percentage.
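The estimate described above (current count + previous count × overlap fraction) can be sketched as follows. Only two counters are kept, which is where the memory saving over the log approach comes from; names are my own:

```python
class SlidingWindowCounter:
    """Weighted blend of the previous and current fixed-window counts,
    approximating a true rolling-window count with two counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.curr_window = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.curr_window:
            # Roll forward; a gap of more than one window zeroes both counts.
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_count = 0
            self.curr_window = window
        # Fraction of the previous window still inside the rolling window.
        overlap = 1.0 - (now % self.window_seconds) / self.window_seconds
        estimated = self.curr_count + self.prev_count * overlap
        if estimated >= self.limit:
            return False
        self.curr_count += 1
        return True
```

The even-distribution assumption is explicit in `prev_count * overlap`: the previous window's requests are treated as if spread uniformly across it.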
High-Level Architecture
- Rate-limiting rules are stored in configuration files on disk; workers periodically pull them into a cache.
- Rate limiter middleware checks the rules, fetches counters from Redis, and forwards requests.
- Rate limiter headers (X-Ratelimit) communicate remaining requests, limits, and retry-after information.
Design Deep Dive
Rate Limiting Rules
- Lyft’s rate-limiting component is open-sourced, with rules stored in configuration files on disk.
- Rules include domains, descriptors, and rate limits per unit (e.g., requests per day, per minute).
Exceeding Rate Limit
- Rate-limited requests return an HTTP 429 response code to clients.
- Depending on the use case, rate-limited requests may instead be enqueued and processed later, once the system is no longer overloaded.
Rate Limiter Headers
- Communicate rate-limiting information to clients through HTTP headers.
- X-Ratelimit-Remaining: Remaining number of allowed requests within the window.
- X-Ratelimit-Limit: Indicates the number of calls the client can make per time window.
- X-Ratelimit-Retry-After: Specifies the seconds to wait until the client can make a request without being throttled.
Rate Limiter in a Distributed Environment
- Scaling to support multiple servers and concurrent threads poses challenges.
- Address race conditions with Lua scripts or sorted sets in Redis.
- Synchronize data using centralized data stores like Redis in a distributed environment.
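The race condition arises when two threads (or servers) read the same counter, both see it under the limit, and both increment it, letting an extra request through. In production the fix is to make read-check-increment atomic on the shared store (e.g. a Redis Lua script); the single-process sketch below shows the same atomicity requirement using a lock, with a class name of my own:

```python
import threading

class AtomicCounterLimiter:
    """Single-process stand-in for an atomic Redis counter: the
    read-check-increment sequence runs under a lock, so two concurrent
    callers can never both pass the check on a stale count."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:          # without this lock, the check below races
            if self.count >= self.limit:
                return False
            self.count += 1
            return True
```

With the lock in place, exactly `limit` requests succeed no matter how many threads compete; the distributed equivalent moves this critical section into the data store itself.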
Performance Optimization
- Adopt a multi-data center setup to reduce latency.
- Synchronize data with an eventual consistency model for scalability and flexibility.
Monitoring
- Gather analytics data to ensure the effectiveness of rate limiting.
- Monitor whether the algorithm and rules are effective.
- Consider adjustments based on the observed behavior.