Book Summaries | System Design Interview (Part 1) - Design a Rate Limiter
January 23rd, 2024
Chapter 4: Design a Rate Limiter
This summary aims to further cement what I learned while reading the fourth chapter of System Design Interview - An Insider's Guide (Part 1), titled Design a Rate Limiter, and I hope it provides some learnings for you as well.
Benefits of Using an API Rate Limiter
- Prevents resource starvation caused by Denial of Service (DoS) attacks.
- Reduces costs by limiting excess requests and allocating resources efficiently.
- Prevents servers from being overloaded by filtering out excess requests.
Where to Implement Rate Limiter
- Server-side implementation is preferred over client-side, since client-side enforcement can be bypassed and is unreliable.
- Consider using an API gateway for rate limiting in a distributed environment.
- Evaluate your technology stack: implement rate limiting server-side if the stack supports it efficiently, or otherwise use a third-party API gateway.
Algorithms for Rate Limiting
Token Bucket Algorithm
- Description: The token bucket is a widely used rate-limiting algorithm. A bucket has a pre-defined capacity; tokens are added to it at a preset refill rate, and each request consumes one token. Requests that arrive when the bucket is empty are dropped.
- Pros:
  - Simple and well-understood.
  - Memory-efficient.
  - Allows bursts of traffic for short periods.
- Cons:
  - Tuning its two parameters (bucket size and refill rate) properly can be challenging.
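The token bucket described above can be sketched in a few lines. This is a minimal single-process illustration (class and parameter names are my own, not from the book):

```python
import time

class TokenBucket:
    """Minimal token bucket: tokens accrue at refill_rate per second,
    capped at capacity; each allowed request spends one token."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens accrued since the last check, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A full bucket of capacity 3 allows three back-to-back requests (the burst the pros mention) and then rejects until tokens refill.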
Leaking Bucket Algorithm
- Description: Similar to the token bucket, but requests are processed at a fixed rate. Incoming requests are added to a FIFO queue; if the queue is full, the request is dropped.
- Pros:
  - Memory-efficient with a limited queue size.
  - Suitable for use cases requiring a stable outflow rate.
- Cons:
  - A burst of traffic can fill the queue with old requests; if they are not processed in time, recent requests get rate limited.
  - Tuning the parameters (queue size, outflow rate) can be challenging.
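The leaking bucket splits into two halves: admission into a bounded queue, and a worker that drains it at the fixed outflow rate. A minimal sketch (names are illustrative, not from the book):

```python
from collections import deque

class LeakingBucket:
    """Bounded FIFO queue; a worker drains it at a fixed rate via leak()."""

    def __init__(self, queue_size: int):
        self.queue_size = queue_size
        self.queue = deque()

    def add_request(self, request) -> bool:
        if len(self.queue) >= self.queue_size:
            return False  # bucket full: drop the request
        self.queue.append(request)
        return True

    def leak(self):
        """Called by a worker at the fixed outflow rate; returns the next
        request to process, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Note that `leak()` would be driven by a scheduler or timer in practice; the fixed cadence of those calls is what produces the stable outflow rate.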
Fixed Window Counter Algorithm
- Description: Divides time into fixed-sized windows, assigning a counter to each window. Each request increments the current window's counter; once the threshold is reached, new requests are dropped until a new window starts.
- Pros:
  - Memory-efficient and easy to understand.
  - Resets the available quota at the end of each time window, which fits certain use cases.
- Cons:
  - Allows a burst of traffic at window edges, which can exceed the intended rate limit for the system.
  - Limited precision in controlling traffic.
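A fixed window counter keeps one counter per window; the window is derived by integer-dividing the timestamp by the window length. In this sketch the caller passes the current time in explicitly so behavior is deterministic (in production you would pass `time.time()`); names are my own:

```python
class FixedWindowCounter:
    """One counter per fixed time window; requests beyond the limit
    in a window are rejected until the next window starts."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = {}  # window index -> request count

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)  # which window 'now' falls in
        count = self.counters.get(window, 0)
        if count >= self.limit:
            return False
        self.counters[window] = count + 1
        return True
```

The edge-burst con is visible here: a client can spend the full quota at the end of one window and again at the start of the next, doubling the intended rate across that boundary.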
Sliding Window Log Algorithm
- Description: Overcomes the edge-case issue of the fixed window counter by keeping a log of request timestamps. On each request, timestamps older than the current window are evicted, and the request is allowed only if the log size is within the limit.
- Pros:
  - Highly accurate rate limiting.
  - Prevents excessive requests at window edges.
- Cons:
  - Consumes a lot of memory, because a request's timestamp may be stored in the log even when the request is rejected.
Sliding Window Counter Algorithm
- Description: A hybrid approach combining the fixed window counter and the sliding window log. The number of requests in the rolling window is estimated as the current window's count plus the previous window's count weighted by the overlap percentage.
- Pros:
  - Smooths out spikes in traffic, since the rate is based on the average over the previous window.
  - Memory-efficient compared to a pure sliding window log.
- Cons:
  - Only an approximation: it assumes requests in the previous window were evenly distributed.
  - Effectiveness depends on the use case and the overlap percentage.
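The estimate described above (current count + previous count × overlap fraction) can be sketched as follows. Only two counters are kept, which is where the memory saving over the log approach comes from; names are my own:

```python
class SlidingWindowCounter:
    """Weighted blend of the previous and current fixed-window counts,
    approximating a true rolling-window count with two counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.curr_window = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.curr_window:
            # Roll forward; a gap of more than one window zeroes both counts.
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_count = 0
            self.curr_window = window
        # Fraction of the previous window still inside the rolling window.
        overlap = 1.0 - (now % self.window_seconds) / self.window_seconds
        estimated = self.curr_count + self.prev_count * overlap
        if estimated >= self.limit:
            return False
        self.curr_count += 1
        return True
```

The even-distribution assumption is explicit in `prev_count * overlap`: the previous window's requests are treated as if spread uniformly across it.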
High-Level Architecture
- Rate-limiting rules are stored in configuration files on disk; workers periodically pull them into a cache.
- Rate limiter middleware checks the rules, fetches counters from Redis, and forwards requests.
- Rate limiter headers (X-Ratelimit) communicate remaining requests, limits, and retry-after information.
Design Deep Dive
Rate Limiting Rules
- Lyft’s rate-limiting component is open-sourced, with rules stored in configuration files on disk.
- Rules include domains, descriptors, and rate limits per unit (e.g., requests per day, per minute).
Exceeding Rate Limit
- Rate-limited requests return an HTTP 429 response code to clients.
- Depending on the use case, rate-limited requests may instead be enqueued and processed later, once the system is no longer overloaded.
Rate Limiter Headers
- Communicate rate-limiting information to clients through HTTP headers.
- X-Ratelimit-Remaining: Remaining number of allowed requests within the window.
- X-Ratelimit-Limit: Indicates the number of calls the client can make per time window.
- X-Ratelimit-Retry-After: Specifies the seconds to wait until the client can make a request without being throttled.
Rate Limiter in a Distributed Environment
- Scaling to support multiple servers and concurrent threads poses challenges.
- Address race conditions with Lua scripts or sorted sets in Redis.
- Synchronize data using centralized data stores like Redis in a distributed environment.
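The race condition arises when two threads (or servers) read the same counter, both see it under the limit, and both increment it, letting an extra request through. In production the fix is to make read-check-increment atomic on the shared store (e.g. a Redis Lua script); the single-process sketch below shows the same atomicity requirement using a lock, with a class name of my own:

```python
import threading

class AtomicCounterLimiter:
    """Single-process stand-in for an atomic Redis counter: the
    read-check-increment sequence runs under a lock, so two concurrent
    callers can never both pass the check on a stale count."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:          # without this lock, the check below races
            if self.count >= self.limit:
                return False
            self.count += 1
            return True
```

With the lock in place, exactly `limit` requests succeed no matter how many threads compete; the distributed equivalent moves this critical section into the data store itself.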
Performance Optimization
- Adopt a multi-data center setup to reduce latency.
- Synchronize data with an eventual consistency model for scalability and flexibility.
Monitoring
- Gather analytics data to ensure the effectiveness of rate limiting.
- Monitor whether the algorithm and rules are effective.
- Consider adjustments based on the observed behavior.