DEV Community

Vincent Tommi


Mastering Back-of-the-Envelope Calculations for System Design: Day 15 of Learning System Design

Back-of-the-envelope calculations are a cornerstone of system design interviews. They help you estimate resource requirements, like storage or hardware, by making reasonable assumptions and quick computations. These calculations demonstrate your ability to break down complex problems logically, even with limited data. In this article, we'll walk through an example: estimating the hardware requirements for a YouTube-like system, focusing on storage needs.

Why Back-of-the-Envelope Calculations Matter
In system design interviews, you're often tasked with estimating metrics like storage, bandwidth, or infrastructure costs for large-scale systems. These calculations test your ability to:

  1. Identify key parameters.

  2. Make reasonable assumptions.

  3. Derive estimates within an acceptable range (typically one order of magnitude).

  4. Adjust when assumptions are off.

For example, you might estimate the number of petrol pumps in Nairobi or the hardware needed for a video streaming platform like YouTube. Let’s dive into the YouTube storage estimation to see how this works.

Estimating Storage for a YouTube-Like System
Step 1: Define the User Base and Video Uploads

Assume YouTube has 1 billion active users. Not all users upload videos daily, so let’s estimate that 1 in 1,000 users uploads a video each day. This gives us:
1,000,000 (1 million) new videos per day.
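The Step 1 arithmetic fits in a tiny Python sketch. The constant and function names are hypothetical. The 1 billion users and the 1-in-1,000 upload rate are the article's assumptions, not measured figures.

```python
# Step 1 sketch: estimate daily uploads from the article's assumptions.
ACTIVE_USERS = 1_000_000_000   # assumed: 1 billion active users
USERS_PER_UPLOADER = 1_000     # assumed: 1 in 1,000 users uploads a video per day


def daily_uploads(active_users: int, users_per_uploader: int) -> int:
    """Return the estimated number of new videos uploaded per day."""
    # Integer division keeps the estimate a whole number of videos.
    return active_users // users_per_uploader


videos_per_day = daily_uploads(ACTIVE_USERS, USERS_PER_UPLOADER)
print(f"{videos_per_day:,} new videos per day")  # 1,000,000 new videos per day
```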

Step 2: Estimate Video Size
To calculate storage, we need the size of each video. Let’s assume:

  1. Average video length: 10 minutes.

  2. Initial size estimate: A 10-minute video is approximately 1 GB.

To explore further, consider a video as a sequence of frames:

  1. A 10-minute video = 600 seconds.

  2. At 24 frames per second, that’s 600 × 24 = 14,400 frames.

If each frame is 1 MB, the total size is 14,400 MB = 14.4 GB.
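The frame-based estimate can be checked with a short sketch. It assumes, as the text does, that every frame is stored as an independent 1 MB image with no compression. It also uses decimal units (1 GB = 1,000 MB). The function name is hypothetical.

```python
# Step 2 sketch: naive size of a video treated as uncompressed frames.
MB_PER_GB = 1_000  # decimal units, matching the article's arithmetic


def uncompressed_size_gb(minutes: int, fps: int, frame_mb: float) -> float:
    """Size in GB if every frame is stored as an independent image."""
    frame_count = minutes * 60 * fps
    return frame_count * frame_mb / MB_PER_GB


frames = 10 * 60 * 24                        # 14,400 frames
size_gb = uncompressed_size_gb(10, 24, 1.0)  # assumed 1 MB per frame
print(f"{frames:,} frames -> {size_gb} GB")  # 14,400 frames -> 14.4 GB
```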

This estimate seems high because it treats every frame as an uncompressed 1 MB image. A typical 10-minute video (e.g., 720p) is closer to 700 MB, since real-world compression (e.g., H.264) shrinks file sizes significantly. For simplicity, let's round up to 1 GB per video.

Step 3: Calculate Daily Storage Needs
With 1 million videos per day at 1 GB each, the daily storage requirement is:

1 million × 1 GB = 1 petabyte (PB).

This is the raw storage for original videos. Systems like YouTube store multiple copies for redundancy (fault tolerance and performance). Assume 3 copies, giving:
3 PB of raw storage.
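Here is a minimal sketch of Step 3. It uses decimal units (1 PB = 1,000,000 GB). The three-copy replication factor is the article's assumption.

```python
# Step 3 sketch: daily raw storage, before and after replication.
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10^6 GB

videos_per_day = 1_000_000  # from Step 1
gb_per_video = 1            # rounded-up size estimate from Step 2
replicas = 3                # assumed copies kept for redundancy

raw_pb = videos_per_day * gb_per_video / GB_PER_PB
replicated_raw_pb = raw_pb * replicas
print(f"raw: {raw_pb} PB, with {replicas} copies: {replicated_raw_pb} PB")
# raw: 1.0 PB, with 3 copies: 3.0 PB
```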

Step 4: Account for Video Formats and Encoding
YouTube stores videos in multiple resolutions (e.g., 720p, 480p, 360p, 240p, 144p) to support different devices and bandwidths. Assume we’re encoding in MP4 format. Lower resolutions reduce file size:

  1. 480p: ~50% of 720p size.

  2. 360p: ~25% of 720p size.

  3. 240p: ~12.5% of 720p size.

  4. 144p: ~6.25% of 720p size.

If the 720p versions of a day's uploads total 1 PB, the full set of processed formats adds up to:

1 PB + 0.5 PB + 0.25 PB + 0.125 PB + 0.0625 PB = 1.9375 PB ≈ 2 PB.

With 3 copies for redundancy, the total storage is:

2 PB × 3 = 6 PB (processed), and 6 PB + 3 PB (raw) = 9 PB per day.
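The Step 4 arithmetic chains several multiplications, so the sketch below recomputes them in Python. The relative sizes are the article's assumptions, and `FORMAT_RATIOS` is a hypothetical name. Like the text, it assumes the 720p rendition is the same size as the raw upload. It also shows that the unrounded sum (1.9375 PB) gives about 8.8 PB per day, which still rounds to roughly 9 PB.

```python
# Step 4 sketch: processed formats plus replication, in PB per day.
RAW_PB = 1.0   # Step 3: one copy of the day's uploads (assumed equal to the 720p size)
REPLICAS = 3   # assumed copies kept for redundancy

# Size of each rendition relative to 720p (the article's assumed ratios).
FORMAT_RATIOS = {"720p": 1.0, "480p": 0.5, "360p": 0.25, "240p": 0.125, "144p": 0.0625}

processed_pb = RAW_PB * sum(FORMAT_RATIOS.values())  # 1.9375, rounded to ~2 in the text
total_exact = processed_pb * REPLICAS + RAW_PB * REPLICAS
total_rounded = round(processed_pb) * REPLICAS + RAW_PB * REPLICAS

print(f"processed: {processed_pb} PB/day")
print(f"total: {total_exact} PB/day exact, {total_rounded} PB/day with rounding")
# processed: 1.9375 PB/day
# total: 8.8125 PB/day exact, 9.0 PB/day with rounding
```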

Step 5: Hardware and Cost Estimation
To store 9 PB, assume a hard drive holds 100 TB (0.1 PB). We need:

9 PB ÷ 0.1 PB = 90 hard drives per day.

At an approximate cost of $10,000 per hard drive, the daily cost is:

90 × $10,000 = $900,000.

For a 3-year plan (1,095 days):

$900,000 × 1,095 = $985.5 million ≈ $986 million.
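Here is a minimal sketch of the Step 5 cost model. The 100 TB drive capacity and the $10,000 price are the article's assumptions, not vendor figures. Working in whole terabytes keeps the arithmetic in integers. Rounding up with `math.ceil` reflects that you cannot buy a fraction of a drive.

```python
import math

# Step 5 sketch: drives and cost, using integer TB to avoid float surprises.
DAILY_TB = 9_000         # 9 PB per day from Step 4
DRIVE_TB = 100           # assumed 100 TB per drive
COST_PER_DRIVE = 10_000  # assumed USD per drive
DAYS = 3 * 365           # 1,095 days

drives_per_day = math.ceil(DAILY_TB / DRIVE_TB)  # round up: partial drives can't be bought
daily_cost = drives_per_day * COST_PER_DRIVE
three_year_cost = daily_cost * DAYS

print(f"{drives_per_day} drives/day, ${daily_cost:,}/day, ${three_year_cost:,} over 3 years")
# 90 drives/day, $900,000/day, $985,500,000 over 3 years
```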

Step 6: Validate with Real Numbers

Let’s cross-check:

Assume 30,000 minutes of video uploaded per minute globally.

In a day (1,440 minutes): 30,000 × 1,440 = 43.2 million minutes.

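If a 60-minute video encodes to 1 GB, then 43.2 million minutes ÷ 60 = 720,000 hours, or about 720,000 GB = 0.72 PB. (This check assumes tighter compression, 1 GB per hour, than Step 2's 1 GB per 10 minutes.)

This suggests our 1 PB raw estimate is reasonable, though slightly high. Applying the same format and redundancy multipliers to 0.72 PB gives roughly 6.5 PB per day. Those multipliers are 3 raw copies plus 3 copies of roughly 2× the raw volume, or about 9× in total. The result is within the same order of magnitude as our 9 PB estimate.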
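The cross-check can also be scripted. The 30,000 minutes-per-minute upload rate and the 1 GB-per-hour encoding are the article's assumptions. `STORAGE_MULTIPLIER = 9` is a simplification: it reuses the rounded ratios from Steps 3 and 4, meaning 3 raw copies plus 3 copies of about 2× the raw volume.

```python
# Step 6 sketch: cross-check the raw daily volume from upload minutes.
UPLOAD_MIN_PER_MIN = 30_000  # assumed minutes of video uploaded per minute
MINUTES_PER_DAY = 1_440
GB_PER_HOUR = 1              # assumed: a 60-minute video encodes to 1 GB
STORAGE_MULTIPLIER = 9       # Steps 3-4: 3 raw copies + 3 x ~2 processed

minutes_per_day = UPLOAD_MIN_PER_MIN * MINUTES_PER_DAY  # 43,200,000 minutes
raw_gb = minutes_per_day // 60 * GB_PER_HOUR            # 720,000 GB
total_gb = raw_gb * STORAGE_MULTIPLIER                  # formats + redundancy

print(f"raw: {raw_gb / 1e6:.2f} PB/day, total: {total_gb / 1e6:.2f} PB/day")
# raw: 0.72 PB/day, total: 6.48 PB/day
```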

Handling Errors in Estimations
If your estimate is off by up to one order of magnitude (i.e., 10x), that's acceptable. Being off by three or more orders (i.e., 1,000x) means you need to revisit your assumptions. Common pitfalls include:

  1. Overestimating frame sizes or video lengths.

  2. Ignoring compression techniques.

  3. Forgetting additional formats or redundancy.
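The order-of-magnitude rule of thumb can be expressed as a base-10 log ratio. The sketch below assumes that framing, and `orders_off` and `verdict` are hypothetical helper names. The article only defines the "within 10x" and "1,000x or worse" cases. The "borderline" label for everything in between is an assumption of this sketch.

```python
import math


def orders_off(estimate: float, reference: float) -> float:
    """How many orders of magnitude apart two positive values are."""
    if estimate <= 0 or reference <= 0:
        raise ValueError("both values must be positive")
    return abs(math.log10(estimate / reference))


def verdict(estimate: float, reference: float) -> str:
    """Classify an estimate using the rule of thumb above."""
    off = orders_off(estimate, reference)
    if off <= 1:
        return "acceptable"           # within ~10x
    if off >= 3:
        return "revisit assumptions"  # 1,000x or worse
    return "borderline"               # assumption: 10x to 1,000x is not classified in the text


print(verdict(9, 6.48))      # Step 4 total vs. the Step 6 cross-check (PB/day)
print(verdict(14.4, 0.7))    # naive frame-based size vs. the ~700 MB figure (GB)
print(verdict(9_000, 6.48))  # a hypothetical 1,000x-scale mistake
# acceptable
# borderline
# revisit assumptions
```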

Reflect on:

  1. Where assumptions went wrong: Was the video size or upload rate too high?

  2. Missed factors: Did we account for caching or CDNs?

Conclusion
Back-of-the-envelope calculations are a powerful tool for system design, blending logic, creativity, and quick math to tackle complex problems. By breaking down a system like YouTube into manageable parts—user base, video sizes, redundancy, and encoding—you can derive meaningful estimates that hold up under scrutiny. The key is to make reasonable assumptions, validate them with real-world checks, and adjust when necessary. Mastering this skill not only prepares you for interviews but also sharpens your ability to design scalable systems.
