Book Summaries | System Design Interview (Part 1) - Back-of-the-Envelope Estimation

January 21st, 2024

Chapter 2: Back-of-the-Envelope Estimation

I wrote this summary to reinforce what I learned from the second chapter of System Design Interview - An Insider's Guide (Part 1), titled Back-of-the-Envelope Estimation, and I hope it is useful to you as well.

Overview

  • System design interviews may involve estimating system capacity or performance.
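  • Back-of-the-envelope estimates combine thought experiments with common performance numbers to get a feel for which designs will meet the requirements.
  • Estimating effectively requires a good grasp of scalability basics: powers of two, latency numbers, and availability numbers.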

Power of Two

  • Data volumes in distributed systems can become enormous, but the calculations all come down to basics: data volume units expressed as powers of two.
  • A byte is 8 bits, and every larger unit is a power-of-two multiple of the byte.
  • Using the correct power of two for each unit keeps the calculations consistent.

Data Volume Unit | Size
Bit              | 1 bit
Byte             | 8 bits
Kilobyte (KB)    | 1,024 bytes (2^10)
Megabyte (MB)    | 1,048,576 bytes (2^20)
Gigabyte (GB)    | 1,073,741,824 bytes (2^30)
Terabyte (TB)    | 1,099,511,627,776 bytes (2^40)
Petabyte (PB)    | 1,125,899,906,842,624 bytes (2^50)
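
As a small illustration of the unit table, here is a minimal Python sketch that converts between bytes and the power-of-two units. The names `UNITS`, `to_bytes`, and `humanize` are my own and do not come from the book.

```python
# Power-of-two data volume units, expressed in bytes (values from the table).
UNITS = {
    "B": 1,
    "KB": 2**10,
    "MB": 2**20,
    "GB": 2**30,
    "TB": 2**40,
    "PB": 2**50,
}


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * UNITS[unit]


def humanize(num_bytes):
    """Format a byte count using the largest unit that keeps the value >= 1."""
    for unit in ("PB", "TB", "GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.2f} {unit}"
    return f"{num_bytes} B"


print(to_bytes(1, "GB"))   # 1073741824
print(humanize(1536))      # 1.50 KB
```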

Latency Numbers

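  • Dr. Dean’s 2010 figures for how long typical computer operations take show which operations are fast and which are slow. Hardware has become faster since then, but the relative differences still give a useful picture.
  • Takeaways from the table below: memory is fast but disk is slow, disk seeks should be avoided where possible, and simple compression algorithms are fast.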

Operation                           | Length
L1 cache reference                  | 1 ns
Branch mispredict                   | 1-10 ns
L2 cache reference                  | 10 ns
Mutex lock/unlock                   | 25 ns
Main memory reference               | 100 ns
Compress 1K bytes with Zippy        | 3,000 ns
Send 2K bytes over 1 Gbps network   | 20,000 ns
Read 1 MB sequentially from memory  | 250,000 ns
Round trip within the same datacenter | 500,000 ns
Disk seek                           | 10,000,000 ns
Read 1 MB sequentially from network | 10,000,000 ns
Read 1 MB sequentially from disk    | 30,000,000 ns
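
To make the "memory is fast, disk is slow" point concrete, here is a rough Python sketch that scales the per-MB read figures from the table to a larger read. The names `LATENCY_NS` and `read_time_ms` are hypothetical, and the sketch assumes read time grows linearly with size, which is a simplification.

```python
# Per-operation latencies in nanoseconds, taken from the table above.
LATENCY_NS = {
    "read_1mb_memory": 250_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
}


def read_time_ms(size_mb, source):
    """Estimate the time in milliseconds to read size_mb sequentially.

    Assumes (simplistically) that time scales linearly with size.
    """
    return size_mb * LATENCY_NS[f"read_1mb_{source}"] / 1_000_000


memory_ms = read_time_ms(1024, "memory")  # about 1 GB from memory
disk_ms = read_time_ms(1024, "disk")      # about 1 GB from disk

print(f"memory: {memory_ms:.0f} ms, disk: {disk_ms:.0f} ms")
print(f"disk is {disk_ms / memory_ms:.0f}x slower")
```

With these figures, a sequential 1 GB read takes roughly a quarter of a second from memory and about half a minute from disk.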

Availability Numbers

  • High availability is measured as a percentage, with 100% indicating zero downtime.
  • A Service Level Agreement (SLA) is an agreement between a service provider and its customer that formally defines the level of uptime the service will deliver.
  • Cloud providers like Amazon, Google, and Microsoft set SLAs at 99.9% or above.

Availability Percentage | Downtime per Year
99%                     | 3.65 days
99.9%                   | 8.76 hours
99.99%                  | 52.56 minutes
99.999%                 | 5.26 minutes
99.9999%                | 31.56 seconds
99.99999%               | 3.16 seconds
99.999999%              | 315 milliseconds
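
The downtime figures follow directly from the availability percentage. Here is a minimal sketch of the arithmetic, assuming a 365-day year (so the smallest values may differ slightly from the table in the last digit). The function name is my own.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_seconds_per_year(availability_percent):
    """Return the allowed downtime per year, in seconds."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for pct in (99, 99.9, 99.99, 99.999):
    seconds = downtime_seconds_per_year(pct)
    print(f"{pct}% -> {seconds / 3600:.2f} hours ({seconds / 60:.2f} minutes)")
```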

Example: Estimate Twitter QPS and Storage

Assumptions:

  • 300 million monthly active users.
  • 50% of users use Twitter daily.
  • Users post 2 tweets per day on average.
  • 10% of tweets contain media.
  • Data is stored for 5 years.

QPS Estimate:

  1. Daily Active Users (DAU) = 300 million * 50% = 150 million
  2. Tweets QPS = 150 million * 2 tweets / 24 hours / 3600 seconds ≈ 3,500 QPS
  3. Peak QPS = 2 * QPS ≈ 7,000 QPS
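
The QPS steps can be reproduced in a few lines of Python. The variable names are my own; the inputs are the assumptions listed earlier, and the peak factor of 2 is the multiplier used in step 3.

```python
monthly_active_users = 300_000_000
daily_active_ratio = 0.5        # 50% of users use Twitter daily
tweets_per_user_per_day = 2
peak_factor = 2                 # peak QPS assumed to be twice the average
seconds_per_day = 24 * 3600

dau = monthly_active_users * daily_active_ratio
tweet_qps = dau * tweets_per_user_per_day / seconds_per_day
peak_qps = peak_factor * tweet_qps

print(f"DAU: {dau:,.0f}")
print(f"Tweets QPS: {tweet_qps:,.0f}")  # rounds up to ~3,500 in the estimate
print(f"Peak QPS: {peak_qps:,.0f}")     # ~7,000 in the estimate
```

The exact average comes out a little under 3,500, which is why the estimate rounds to 3,500 and 7,000.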

Storage Estimate:

  • Average tweet size:
    • tweet_id: 64 bytes
    • text: 140 bytes
    • media: 1 MB
  1. Media Storage per day = 150 million * 2 * 10% * 1 MB = 30 TB
  2. 5-year Media Storage = 30 TB * 365 * 5 ≈ 55 PB
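
Here is the same storage arithmetic as a short Python sketch. The variable names are my own, and for the sake of round numbers it assumes decimal conversions (1 TB = 1,000,000 MB, 1 PB = 1,000 TB), which matches the rounded 30 TB and 55 PB figures above rather than the exact power-of-two units.

```python
dau = 150_000_000
tweets_per_user_per_day = 2
media_tweet_percent = 10        # 10% of tweets contain media
media_size_mb = 1
retention_years = 5

# Integer arithmetic avoids floating-point noise in the tweet count.
media_tweets_per_day = dau * tweets_per_user_per_day * media_tweet_percent // 100
media_mb_per_day = media_tweets_per_day * media_size_mb

# Approximate decimal conversions, as is usual for rough estimates.
media_tb_per_day = media_mb_per_day / 1_000_000
media_pb_total = media_tb_per_day * 365 * retention_years / 1_000

print(f"Media per day: {media_tb_per_day:.0f} TB")
print(f"Media over {retention_years} years: {media_pb_total:.2f} PB")
```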

Tips for Back-of-the-Envelope Estimation

  • The process matters more than the final numbers; interviewers may be testing your problem-solving skills.
  • Rounding and approximation are acceptable for simplifying calculations.
  • Writing down assumptions lets you refer back to them later.
  • Labeling units prevents ambiguity, for example whether "5" means 5 KB or 5 MB.
  • Commonly asked estimations include QPS, peak QPS, storage, cache, and number of servers.