Book Summaries | System Design Interview (Part 1) - Design a Rate Limiter

January 23rd, 2024

Chapter 4: Design a Rate Limiter

This summary serves to further cement what I learned while reading the fourth chapter of System Design Interview - An Insider's Guide (Part 1), titled Design a Rate Limiter, and I hope it will provide some learnings to you as well.

Benefits of Using an API Rate Limiter

  • Prevents resource starvation caused by Denial of Service (DoS) attacks.
  • Reduces costs by limiting excess requests and allocating resources efficiently.
  • Prevents servers from being overloaded by filtering out excess requests.

Where to Implement Rate Limiter

  • Server-side implementation is preferred over client-side due to reliability.
  • Consider using an API gateway for rate limiting in a distributed environment.
  • Evaluate the technology stack, choose server-side if efficient, or use a third-party gateway.

Algorithms for Rate Limiting

Token Bucket Algorithm

  • Description: Token bucket is a widely used algorithm for rate limiting, where a token bucket has a pre-defined capacity. Tokens are added to the bucket at a preset rate, and each request consumes one token.

  • Pros:

    • Simple and well-understood.
    • Memory-efficient.
    • Allows bursts of traffic for short periods.
  • Cons:

    • Requires tuning of two parameters (bucket size, refill rate), which can be challenging to get right.
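The token bucket above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration; the class and parameter names are mine, not from the book:

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request consumes one."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # maximum tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last_refill) * self.refill_rate,
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Setting `capacity` larger than `refill_rate` is what permits short bursts: a full bucket can absorb `capacity` requests at once, while the long-run rate stays at `refill_rate` per second.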

Leaking Bucket Algorithm

  • Description: Similar to the token bucket but processes requests at a fixed rate. Requests are added to a queue, and if the queue is full, the request is dropped.

  • Pros:

    • Memory-efficient with a limited queue size.
    • Suitable for use cases requiring a stable outflow rate.
  • Cons:

    • A burst of traffic can fill the queue with old requests; if they are not processed in time, more recent requests get rate limited.
    • Tuning parameters can be challenging.
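A leaking bucket can likewise be sketched with a bounded FIFO queue; again a simplified, single-threaded sketch with names of my own choosing:

```python
import time
from collections import deque

class LeakingBucket:
    """Requests queue up; they leak out (are processed) at a fixed rate."""

    def __init__(self, queue_size: int, leak_rate: float):
        self.queue = deque()
        self.queue_size = queue_size    # bucket (queue) capacity
        self.leak_rate = leak_rate      # requests processed per second
        self.last_leak = time.monotonic()

    def allow(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.queue_size:
            self.queue.append(request)
            return True
        return False                    # bucket full: drop the request

    def _leak(self):
        now = time.monotonic()
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            # Drain up to `leaked` requests at the fixed outflow rate.
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
```

The fixed `leak_rate` is what gives the stable outflow mentioned above, regardless of how bursty the inflow is.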

Fixed Window Counter Algorithm

  • Description: Divides time into fixed-sized windows, assigning a counter to each window. Requests increment the counter, and once the threshold is reached, new requests are dropped until a new window starts.

  • Pros:

    • Memory-efficient and easy to understand.
    • Resets available quota at the end of each time window.
  • Cons:

    • Allows a burst of traffic at window edges, which could exceed the intended rate limit for the system.
    • Limited precision in controlling traffic.
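A fixed window counter needs only one counter and the index of the current window; a minimal sketch (the injectable `now` parameter is just to make the windowing visible):

```python
import time

class FixedWindowCounter:
    """One counter per fixed time window, reset when a new window starts."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = int(now // self.window)    # index of the current window
        if window != self.current_window:   # new window: reset the counter
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The edge-burst con is visible here: a client can spend the full quota just before a window boundary and again just after it, doubling the intended rate over that short span.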

Sliding Window Log Algorithm

  • Description: Overcomes the edge case issue of the fixed window counter by keeping track of request timestamps. Timestamps are checked against the current time window to determine whether a request is allowed.

  • Pros:

    • Highly accurate rate limiting.
    • Prevents excessive requests at window edges.
  • Cons:

    • Consumes more memory due to storing timestamps.
    • Stores timestamps even for rejected requests, adding to the memory cost.
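A sliding window log can be sketched with a deque of timestamps; as in the chapter, rejected requests' timestamps are kept too (names and the injectable `now` are my own):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow a request only if fewer than `limit` requests fall
    within the last `window_seconds` of the log."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that have fallen out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        self.log.append(now)   # stored even if this request is rejected
        return len(self.log) <= self.limit
```

Because the window rolls continuously with `now`, there is no boundary at which a burst can double the rate, which is exactly the fixed-window edge case this algorithm fixes.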

Sliding Window Counter Algorithm

  • Description: A hybrid approach combining fixed window counter and sliding window log. Calculates the number of requests in the rolling window based on the previous window’s count and an overlap percentage.

  • Pros:

    • Smooths out spikes, since the allowed rate is based on an average over the rolling window.
    • Is memory-efficient compared to a pure sliding window log.
  • Cons:

    • Only an approximation: it assumes requests in the previous window were evenly distributed.
    • Effectiveness depends on the use case and overlap percentage.
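The estimate used by the sliding window counter is: requests in rolling window ≈ count in current window + count in previous window × overlap fraction. A sketch under the same even-distribution assumption (names are mine; `now` is passed in explicitly for clarity):

```python
class SlidingWindowCounter:
    """Approximate rolling count from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window)
        if window != self.current_window:
            # Shift windows: current becomes previous; older counts drop out.
            self.previous_count = (
                self.current_count if window == self.current_window + 1 else 0
            )
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still inside the rolling window.
        overlap = 1 - (now % self.window) / self.window
        estimated = self.current_count + self.previous_count * overlap
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

Only two counters are kept per client, which is why this is so much more memory-efficient than storing every timestamp.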

High-Level Architecture

  • Rate-limiting rules are stored on disk, fetched by workers, and stored in cache.
  • Rate limiter middleware checks the rules, fetches counters from Redis, and forwards requests.
  • Rate limiter headers (X-Ratelimit) communicate remaining requests, limits, and retry-after information.

Design Deep Dive

Rate Limiting Rules

  • Lyft’s rate-limiting component is open-sourced, with rules stored in configuration files on disk.
  • Rules include domains, descriptors, and rate limits per unit (e.g., requests per day, per minute).
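As an illustration, a rule for Lyft's rate-limiting component is a YAML document along these lines (the field values here are placeholders, not taken verbatim from Lyft's docs):

```yaml
# Hypothetical rule: allow 5 marketing messages per day in the "messaging" domain.
domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5
```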

Exceeding Rate Limit

  • Rate-limited requests return an HTTP 429 response code to clients.
  • Alternatively, rate-limited requests can be enqueued and processed later, depending on how overloaded the system is.

Rate Limiter Headers

  • Communicate rate-limiting information to clients through HTTP headers.
  • X-Ratelimit-Remaining: Remaining number of allowed requests within the window.
  • X-Ratelimit-Limit: Indicates the number of calls the client can make per time window.
  • X-Ratelimit-Retry-After: Specifies the seconds to wait until the client can make a request without being throttled.
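Putting the 429 behavior and the headers together, a middleware's response might be assembled like this (the function name is illustrative; the header names follow the chapter, though exact casing varies between APIs):

```python
def throttle_response(limit: int, remaining: int, retry_after: int):
    """Sketch of the status code and headers a rate limiter middleware
    might attach to a response."""
    status = 200 if remaining > 0 else 429  # 429 = Too Many Requests
    headers = {
        "X-Ratelimit-Limit": str(limit),
        "X-Ratelimit-Remaining": str(max(remaining, 0)),
    }
    if status == 429:
        # Tell the client how long to back off before retrying.
        headers["X-Ratelimit-Retry-After"] = str(retry_after)
    return status, headers
```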

Rate Limiter in a Distributed Environment

  • Scaling to support multiple servers and concurrent threads poses challenges.
  • Address race conditions with Lua scripts or sorted sets in Redis.
  • Synchronize data using centralized data stores like Redis in a distributed environment.
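The race condition is a read-modify-write problem: two threads (or servers) can both read the same counter value and both decide they are under the limit. In a single process a lock is enough to show the fix; in Redis the equivalent atomicity comes from running the check-and-increment as one Lua script or a single INCR. A single-process sketch (names are mine):

```python
import threading

class ConcurrentFixedWindow:
    """Guard the read-increment-compare with a lock so that concurrent
    callers cannot both pass on the same counter value."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:                   # without the lock, two threads could
            if self.count < self.limit:   # read the same count and both pass
                self.count += 1
                return True
            return False
```

Across multiple rate limiter servers, the same idea generalizes to keeping the counters in a centralized store like Redis rather than in each server's memory.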

Performance Optimization

  • Adopt a multi-data center setup to reduce latency.
  • Synchronize data with an eventual consistency model for scalability and flexibility.

Monitoring

  • Gather analytics data to ensure the effectiveness of rate limiting.
  • Monitor whether the algorithm and rules are effective.
  • Consider adjustments based on the observed behavior.