Book Summaries | System Design Interview (Part 1) - Back-of-the-Envelope Estimation
January 21st, 2024
Chapter 2: Back-of-the-Envelope Estimation
This summary is meant to cement what I learned from the second chapter of System Design Interview - An Insider's Guide (Part 1), titled Back-of-the-Envelope Estimation, and I hope it is useful to you as well.
Overview
- System design interviews may involve estimating system capacity or performance.
- Back-of-the-envelope estimation involves thought experiments and common performance numbers.
- Scalability basics are crucial for effective estimation.
Power of Two
- Data volumes in distributed systems can become very large, but the calculations all come down to powers of two.
- A byte is 8 bits, and knowing the data volume units is essential for getting the arithmetic right.
- Expressing sizes as powers of 2 (for example, 1 KB = 2^10 bytes) keeps estimates consistent.
Data Volume Unit | Size |
---|---|
Bit | 1 bit |
Byte | 8 bits |
Kilobyte (KB) | 1,024 bytes (2^10) |
Megabyte (MB) | 1,048,576 bytes (2^20) |
Gigabyte (GB) | 1,073,741,824 bytes (2^30) |
Terabyte (TB) | 1,099,511,627,776 bytes (2^40) |
Petabyte (PB) | 1,125,899,906,842,624 bytes (2^50) |
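To make the unit table easier to apply, here is a minimal Python sketch that converts between these power-of-two units. The names `UNITS`, `to_bytes`, and `humanize` are my own helpers for illustration, not something from the book.

```python
# Binary (power-of-two) data volume units, expressed in bytes.
UNITS = {
    "KB": 2**10,
    "MB": 2**20,
    "GB": 2**30,
    "TB": 2**40,
    "PB": 2**50,
}


def to_bytes(value: float, unit: str) -> int:
    """Convert a value in the given unit (e.g. 'GB') to bytes."""
    return int(value * UNITS[unit])


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    for unit in ("PB", "TB", "GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.2f} {unit}"
    return f"{num_bytes} bytes"


print(to_bytes(1, "GB"))    # 1073741824
print(humanize(5 * 2**40))  # 5.00 TB
```

In an interview you would normally round these to powers of ten (1 KB is roughly a thousand bytes), but the exact values help when checking your work.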
Latency Numbers
- Dr. Dean's 2010 numbers on how long typical computer operations take show which operations are fast and which are slow.
- Takeaways from the table below: memory is fast, disk seeks should be avoided, and simple compression algorithms are fast.
Operation | Latency |
---|---|
L1 cache reference | 1 ns |
Branch mispredict | 1-10 ns |
L2 cache reference | 10 ns |
Mutex lock/unlock | 25 ns |
Main memory reference | 100 ns |
Compress 1K bytes with Zippy | 3,000 ns |
Send 2K bytes over 1 Gbps network | 20,000 ns |
Read 1 MB sequentially from memory | 250,000 ns |
Round trip within the same datacenter | 500,000 ns |
Disk seek | 10,000,000 ns |
Read 1 MB sequentially from network | 10,000,000 ns |
Read 1 MB sequentially from disk | 30,000,000 ns |
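As a rough illustration of how these numbers get used, the sketch below scales the per-megabyte read latencies from the table up to a larger read. The dictionary keys and function names are hypothetical, and the model ignores everything except the table values (no caching, parallelism, or modern hardware improvements).

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "memory": 250_000,      # read 1 MB sequentially from memory
    "network": 10_000_000,  # read 1 MB sequentially from network
    "disk": 30_000_000,     # read 1 MB sequentially from disk
}
DISK_SEEK_NS = 10_000_000   # a single disk seek


def sequential_read_seconds(megabytes: int, source: str) -> float:
    """Estimate the time to read `megabytes` MB sequentially from a source."""
    return megabytes * LATENCY_NS[source] / 1e9


def disk_seeks_seconds(num_seeks: int) -> float:
    """Estimate the time spent on `num_seeks` disk seeks."""
    return num_seeks * DISK_SEEK_NS / 1e9


for source in LATENCY_NS:
    seconds = sequential_read_seconds(1024, source)
    print(f"1 GB from {source}: {seconds:.3f} s")
print(f"1,000 disk seeks: {disk_seeks_seconds(1000):.1f} s")
```

With these figures, reading 1 GB sequentially takes about 0.26 seconds from memory versus about 31 seconds from disk, and 1,000 seeks alone cost about 10 seconds, which is why the table's takeaways favor memory and discourage seeks.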
Availability Numbers
- Availability is measured as a percentage of uptime, with 100% indicating zero downtime.
- Service Level Agreements (SLAs) define the level of uptime for service providers.
- Cloud providers like Amazon, Google, and Microsoft set SLAs at 99.9% or above.
Availability Percentage | Downtime per Year |
---|---|
99% | 3.65 days |
99.9% | 8.76 hours |
99.99% | 52.56 minutes |
99.999% | 5.26 minutes |
99.9999% | 31.56 seconds |
99.99999% | 3.16 seconds |
99.999999% | 315 milliseconds |
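The downtime figures follow directly from the availability percentage. Here is a small Python sketch of that arithmetic, assuming a 365-day year (my assumption; using 365.25 days shifts the last digit of some rows slightly). The function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


for pct in (99, 99.9, 99.99, 99.999):
    seconds = downtime_seconds_per_year(pct)
    print(f"{pct}% -> {seconds / 3600:.2f} hours ({seconds / 60:.2f} minutes)")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is a handy way to rebuild the table from its first row.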
Example: Estimate Twitter QPS and Storage
Assumptions:
- 300 million monthly active users.
- 50% of users use Twitter daily.
- Users post 2 tweets per day on average.
- 10% of tweets contain media.
- Data is stored for 5 years.
QPS Estimate:
- Daily Active Users (DAU) = 300 million * 50% = 150 million
- Tweets QPS = 150 million * 2 tweets / 24 hours / 3600 seconds ≈ 3,500 QPS
- Peak QPS = 2 * QPS ≈ 7,000 QPS
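The QPS arithmetic above can be checked with a few lines of Python. The constant names are mine, and the peak factor of 2 is simply the assumption used in the estimate, not a measured value.

```python
MONTHLY_ACTIVE_USERS = 300_000_000
DAILY_ACTIVE_PERCENT = 50      # 50% of users use Twitter daily
TWEETS_PER_USER_PER_DAY = 2
SECONDS_PER_DAY = 24 * 3600
PEAK_FACTOR = 2                # assumed ratio of peak to average traffic

# Daily active users, kept as an integer.
dau = MONTHLY_ACTIVE_USERS * DAILY_ACTIVE_PERCENT // 100

# Average and peak tweet write rate, in queries per second.
avg_qps = dau * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_qps = PEAK_FACTOR * avg_qps

print(f"DAU: {dau:,}")
print(f"Average QPS: {avg_qps:,.0f}")
print(f"Peak QPS: {peak_qps:,.0f}")
```

The exact average comes out near 3,472 QPS; rounding it to 3,500 is the kind of approximation the estimate is meant to encourage.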
Storage Estimate:
- Average tweet size:
  - tweet_id: 64 bytes
  - text: 140 bytes
  - media: 1 MB
- Media Storage per day = 150 million * 2 * 10% * 1 MB = 30 TB
- 5-year Media Storage = 30 TB * 365 * 5 = ~55 PB
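The storage figures can be reproduced the same way. This sketch assumes decimal unit steps (1 TB = 1,000,000 MB and 1 PB = 1,000 TB), which is the rounding the estimate above implicitly uses; the variable names are hypothetical. It also ignores the tweet_id and text fields, since media dominates the total.

```python
DAILY_ACTIVE_USERS = 150_000_000
TWEETS_PER_USER_PER_DAY = 2
MEDIA_PERCENT = 10        # 10% of tweets contain media
MEDIA_SIZE_MB = 1
RETENTION_YEARS = 5

MB_PER_TB = 1_000_000     # decimal rounding, assumed for easy mental math
TB_PER_PB = 1_000

# Number of tweets per day that carry media.
media_tweets_per_day = (
    DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY * MEDIA_PERCENT // 100
)

# Media storage per day and over the retention period.
media_tb_per_day = media_tweets_per_day * MEDIA_SIZE_MB / MB_PER_TB
media_pb_total = media_tb_per_day * 365 * RETENTION_YEARS / TB_PER_PB

print(f"Media per day: {media_tb_per_day:.0f} TB")
print(f"Media over {RETENTION_YEARS} years: {media_pb_total:.2f} PB")
```

The five-year total works out to 54.75 PB, which rounds to the roughly 55 PB quoted above.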
Tips for Back-of-the-Envelope Estimation
- In estimation, the process of solving the problem matters more than the exact result.
- Rounding and approximation are acceptable for simplifying calculations.
- Writing down assumptions lets you refer back to them later.
- Labeling units prevents ambiguity in the estimation process.
- Commonly asked estimations include QPS, peak QPS, storage, cache, and number of servers.