Table of Contents
- Problem Statement
- Requirements
- High Level Components (HLD)
- Key Concepts Explained
- Complete Upload Flows
- API Design
- Storage Design
- System Characteristics – Deep Dive
- Latency
- Latency Breakdown (Upload Path)
- How to Calculate Latency
- Latency Bottlenecks
- How We Improve Latency
- Latency Summary
- Throughput
- Throughput Formula
- Upload Service Throughput
- Real Bottlenecks
- How to Improve Throughput
- Throughput Summary
- Storing Image Content (Efficiency & Cost)
- Key Metric
- Dedup Ratio Calculation
- Why Content-Addressed Storage Helps
- Cost Optimization Techniques
- Fault Tolerance (Performance-Aware)
- Failure Scenarios
- How We Measure Impact
- How We Improve Fault Tolerance Without Killing Performance
- Fault Tolerance Summary
- Final Summary (Interview Ready)
Problem Statement
Design a scalable image upload and storage system that supports millions of users, efficiently handles duplicate uploads, guarantees data integrity, and provides low-latency access while being secure, fault tolerant, and cost-effective.
The system should:
- Avoid storing the same image multiple times
- Support retries safely
- Scale globally
- Ensure strong security and reliability
Requirements
Functional Requirements
- Users can upload images
- Detect exact duplicates
- Allow multiple users to reference the same image
- Generate thumbnails asynchronously
- Fetch images securely
- Delete images safely
- Support retries without duplication (idempotency)
Non-Functional Requirements
- Low latency (< 50 ms for duplicate detection)
- High throughput (high RPS, millions of uploads)
- High availability (99.9%+)
- Strong consistency for metadata
- Secure storage and access
- Fault tolerance (multi-AZ, multi-region)
- Cost-efficient storage (deduplication)
High Level Components (HLD)
Core Components
- Client
- API Gateway
- Upload Service
- Cache (Redis)
- Metadata Database
- Object Storage
- Async Processing (Queue + Workers)
- CDN
- Security & IAM
Simple ASCII Architecture Diagram
Client
|
v
+--------------+
| API Gateway |
+--------------+
|
v
+------------------+
| Upload Service |
| (Stateless) |
+------------------+
| |
v v
+--------+ +----------------+
| Cache | | Metadata DB |
| Redis | | (Content/Asset)|
+--------+ +----------------+
|
v
+------------------+
| Object Storage |
| (SHA-256 key) |
+------------------+
|
v
+------------------+
| Async Workers |
| (thumbs, scan) |
+------------------+
Key Concepts Explained
1. Cryptographic Hash (SHA-256)
What is it?
A fixed-length fingerprint of data.
Image bytes ---> SHA-256 ---> e3b0c44298fc1c149...
Properties
- Same input → same output
- Computationally infeasible to reverse
- Extremely unlikely collisions
Why we use it
- Exact deduplication
- Content-addressed storage
- Data integrity verification
Diagram
[ Image Bytes ]
|
v
[ SHA-256 Hash ]
|
v
Object Key = /objects/sha256/<hash>
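As a minimal Python sketch, both properties above fall out of the standard library's `hashlib`: identical bytes always produce the same digest, and the digest doubles as the object key. The streaming variant matters later, since it lets us hash while uploading instead of buffering the whole file.

```python
import hashlib

def content_key(data: bytes) -> str:
    """Derive the content-addressed object key for an image's bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return f"/objects/sha256/{digest}"

def streaming_digest(chunks) -> str:
    """Hash chunk-by-chunk so the whole file is never held in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# Identical bytes always map to the same key -- the basis of deduplication.
key = content_key(b"hello")
# Chunked hashing yields the same digest as hashing all bytes at once.
same = streaming_digest([b"he", b"llo"]) == hashlib.sha256(b"hello").hexdigest()
```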
2. Idempotency
What is it?
Repeating the same request does not create duplicates.
Problem it solves
Network retries, client crashes, duplicate submissions.
Example
Client retries upload → Server returns same assetId
(no extra image stored)
How we implement
- Idempotency keys
- Conditional DB inserts
- Conditional object PUT
Diagram
Client
|
|-- upload (key=abc123)
|
Server
|
|-- already processed?
| yes → return same response
| no → process upload
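The flow above can be sketched in a few lines of Python. This is a hypothetical in-memory version purely for illustration; a real system would back the "already processed?" check with a conditional write against Redis or the metadata DB.

```python
# In-memory store of idempotency key -> response (stand-in for Redis/DB).
processed: dict[str, dict] = {}

def handle_upload(idempotency_key: str, data: bytes) -> dict:
    # Already processed? Return the stored response instead of re-running.
    if idempotency_key in processed:
        return processed[idempotency_key]
    response = {"assetId": f"asset-{len(processed) + 1}"}  # process upload
    processed[idempotency_key] = response
    return response

first = handle_upload("abc123", b"img-bytes")
retry = handle_upload("abc123", b"img-bytes")  # client retry after timeout
# retry returns the same assetId; nothing extra is stored
```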
Complete Upload Flows
Flow 1: Direct Upload (No Duplication)
Steps
- Client uploads image
- Upload service streams image
- SHA-256 computed
- Object stored at /objects/sha256/<hash>
- Metadata created
- assetId returned
Diagram
Client
|
v
Upload Service
|
|-- compute SHA-256
|
v
Object Storage (new object)
|
v
Metadata DB (new content + asset)
Flow 2: Direct Upload (Duplicate Already Exists)
Steps
- Client uploads image
- SHA-256 computed
- Metadata lookup finds existing contentId
- No object stored
- Only new asset reference created
Diagram
Upload Service
|
|-- SHA-256
|
|-- contentId exists?
|
v
Metadata DB → yes
|
v
Create new asset reference
Flow 3: Same Image Uploaded by Multiple Users
Key Idea
- One image blob
- Multiple asset references
Steps
- User A uploads image → stored
- User B uploads same image
- Same SHA-256 detected
- Reference count increments
Diagram
+----------------+
User A --->| Asset A |
| contentId X |
+----------------+
|
v
+------------------+
| Image Blob X |
| (stored once) |
+------------------+
^
+----------------+
User B --->| Asset B |
| contentId X |
+----------------+
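Flow 3 can be sketched as follows, assuming simple dicts in place of object storage and the metadata DB: two users upload identical bytes, but only one blob is stored and the reference count ends at 2.

```python
import hashlib

blobs: dict[str, bytes] = {}    # contentId -> image bytes (stored once)
ref_count: dict[str, int] = {}  # contentId -> number of asset references
assets: dict[str, str] = {}     # assetId -> contentId

def upload(user: str, data: bytes) -> str:
    content_id = hashlib.sha256(data).hexdigest()
    if content_id not in blobs:          # store the blob only on first upload
        blobs[content_id] = data
        ref_count[content_id] = 0
    ref_count[content_id] += 1           # every upload adds a reference
    asset_id = f"{user}-asset"
    assets[asset_id] = content_id
    return asset_id

a = upload("userA", b"same-image")
b = upload("userB", b"same-image")
# both assets point at one contentId; the bytes exist exactly once
```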
API Design
| API | Purpose | Explanation |
|---|---|---|
| POST /assets/precheck | Detect duplicates | Avoid upload if already present |
| POST /assets/upload | Upload image | Handles multipart uploads |
| GET /assets/{id} | Fetch image | Returns signed URL |
| DELETE /assets/{id} | Delete asset | Decrements reference count |
Multipart Upload Example
POST /assets/upload
Content-Type: multipart/form-data
--boundary
Content-Disposition: form-data; name="file"; filename="img.png"
(binary data)
--boundary--
Used for:
- Large files
- Resume uploads
- Network reliability
Storage Design
Storage Options
- Local disk ❌ (not scalable)
- Block storage ❌
- Object Storage ✅ (Best choice)
Object Storage
- Keyed by SHA-256
- Immutable blobs
- Built-in durability (eleven 9s: 99.999999999%)
- Lifecycle policies
- Cross-region replication
/objects/sha256/abcd1234...
Metadata Storage
Stored in Database
(Relational or Distributed NoSQL)
Content Table
- contentId (SHA-256)
- size
- mimeType
- checksum
- createdAt
- referenceCount
Asset Table
- assetId
- ownerId
- contentId
- filename
- ACLs
- status
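The two tables above, and the conditional insert that makes duplicate content writes a no-op, can be sketched with Python's built-in `sqlite3` (column names follow the lists above; the SQL dialect is illustrative, not a production schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE content (contentId TEXT PRIMARY KEY, size INTEGER,
                      mimeType TEXT, referenceCount INTEGER DEFAULT 0);
CREATE TABLE asset   (assetId TEXT PRIMARY KEY, ownerId TEXT,
                      contentId TEXT REFERENCES content(contentId));
""")

def register(asset_id, owner, content_id, size, mime):
    # Conditional insert: only the first upload of this hash creates a row.
    db.execute("INSERT OR IGNORE INTO content (contentId, size, mimeType) "
               "VALUES (?, ?, ?)", (content_id, size, mime))
    db.execute("INSERT INTO asset VALUES (?, ?, ?)",
               (asset_id, owner, content_id))
    db.execute("UPDATE content SET referenceCount = referenceCount + 1 "
               "WHERE contentId = ?", (content_id,))

register("a1", "userA", "abcd1234", 5_000_000, "image/png")
register("a2", "userB", "abcd1234", 5_000_000, "image/png")
# one shared content row, referenceCount == 2, two asset rows
rows = db.execute("SELECT referenceCount FROM content").fetchall()
```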
System Characteristics – Deep Dive
We’ll focus on four dimensions:
- Latency
- Throughput
- Storing Image Content (Efficiency & Cost)
- Fault Tolerance (Impact on performance)
1. Latency
What latency means here
Latency is the time taken to complete a user-visible operation, mainly:
- Upload request
- Duplicate detection
- Fetching image
We usually care about P95 / P99 latency, not average.
Latency Breakdown (Upload Path)
Total Latency =
Network RTT
+ API Gateway processing
+ Deduplication check
+ Object storage write
+ Metadata DB write
Typical numbers
| Stage | Approx latency |
|---|---|
| Network RTT | 10–40 ms |
| API Gateway | 2–5 ms |
| Cache lookup | ~1 ms |
| DB lookup | 3–10 ms |
| Object store PUT | 20–100+ ms |
How to Calculate Latency
Exact-duplicate short-circuit
Latency = RTT + Cache lookup + Metadata read
≈ 10 + 1 + 5 = ~16 ms (P50)
New upload
Latency = Upload time + Hash compute + Object store commit
Upload time depends on:
Upload time = File size / Network bandwidth
Example:
5 MB image / 10 Mbps ≈ 4 seconds
So upload latency is network-bound, not CPU-bound.
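The transfer-time arithmetic above is worth making explicit, since the unit conversion (bytes to bits) is where estimates usually go wrong:

```python
def upload_seconds(file_mb: float, bandwidth_mbps: float) -> float:
    """Transfer time: file size converted to megabits, divided by link rate."""
    return (file_mb * 8) / bandwidth_mbps

t = upload_seconds(5, 10)  # the 5 MB over 10 Mbps example above
# -> 4.0 seconds, orders of magnitude above the ~16 ms duplicate path
```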
Latency Bottlenecks
- Network distance (client → region)
- Object storage commit time
- Synchronous thumbnail generation ❌
How We Improve Latency
A. Early Short-Circuit (Biggest Win)
- Precheck with quickHash
- Cache-first lookup
- Skip upload completely for duplicates
📉 Reduces latency from seconds → milliseconds
B. Edge & Region Routing
- Geo-DNS routes client to nearest region
- Reduces RTT
RTT India → India region ≈ 10–20 ms
RTT India → US region ≈ 200+ ms
C. Async Everything Except the Critical Path
- Virus scan → async
- Thumbnail → async
- Near-duplicate ML → async
Only hash + store + metadata write stays synchronous.
D. Streaming Hash Computation
- No buffering whole file
- Compute SHA-256 while uploading
Network IO + Hash CPU overlap
This avoids extra latency.
E. CDN for Reads
- Signed URL → CDN edge
- Read latency ≈ 5–20 ms
Latency Summary
- Duplicate uploads: < 50 ms
- New uploads: network-dominated
- Reads: CDN-dominated
- Improvements come from short-circuiting + async
2. Throughput
What is Throughput?
Throughput = number of successful uploads per second
Measured as:
RPS (requests per second)
MB/s (data throughput)
Throughput Formula
Max Throughput =
min(
Upload Service capacity,
Network bandwidth,
Object storage write capacity,
Metadata DB write capacity
)
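The formula is literally a `min()` over component capacities. A sketch with illustrative numbers (real figures come from load testing, not these assumptions):

```python
def max_throughput(upload_svc: int, network: int,
                   object_store: int, metadata_db: int) -> int:
    """System throughput is capped by its slowest component (uploads/sec)."""
    return min(upload_svc, network, object_store, metadata_db)

# Hypothetical capacities in uploads/sec.
limit = max_throughput(upload_svc=20_000, network=15_000,
                       object_store=8_000, metadata_db=12_000)
# here object storage (8,000/sec) is the bottleneck to attack first
```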
Upload Service Throughput
Because upload services are stateless:
Total throughput =
Instances × throughput per instance
Example:
1 instance → 200 concurrent uploads
10 instances → 2000 concurrent uploads
Why Upload Service Is NOT the Bottleneck
- It streams data
- Minimal CPU (just hashing)
- Horizontally scalable
Real Bottlenecks
A. Object Storage
- PUT request rate
- Sustained bandwidth
Mitigation:
- Multipart uploads
- Parallel chunk uploads
- Direct-to-object-storage (bypass app servers)
B. Metadata DB
Writes per upload:
- Insert content (conditional)
- Insert asset
- Increment reference count
At scale:
Writes/sec = uploads/sec × metadata ops
Mitigation:
- Sharding by contentId hash
- Batch writes
- Conditional writes (avoid locking)
C. Cache Hotspots
Popular viral images → same hash → contention
Mitigation:
- Shard cache by hash prefix
- Short-lived distributed locks
- Bloom filters
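The hash-prefix sharding mitigation can be sketched in one function: route each key by its leading hex characters so hot hashes spread across cache nodes rather than hammering one instance (the shard count of 16 is an arbitrary assumption):

```python
def cache_shard(content_hash: str, num_shards: int = 16) -> int:
    """Map a SHA-256 hex digest to a cache shard by its first two hex chars."""
    return int(content_hash[:2], 16) % num_shards

shard = cache_shard("e3b0c44298fc1c14" + "0" * 48)
# keys with different prefixes land on different shards
```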
How to Improve Throughput
1. Direct-to-Object Uploads
Client uploads directly using signed URLs
Client → Object Storage
Server → Metadata only
📈 Massive throughput gain.
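A signed URL is just a path plus an expiry, authenticated with a server-side secret. Cloud object stores provide this natively (e.g. S3 presigned URLs); the sketch below uses only the standard library's `hmac` to show the principle, with a hypothetical secret and query format:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # hypothetical signing key

def sign_url(path: str, expires_at: int) -> str:
    """Mint a URL the client can use to PUT directly to object storage;
    the app server then only handles the metadata write."""
    msg = f"{path}:{expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={sig}"

def verify(path: str, expires_at: int, sig: str) -> bool:
    expected = hmac.new(SECRET, f"{path}:{expires_at}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires_at

url = sign_url("/objects/sha256/abcd1234", int(time.time()) + 300)
```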
2. Multipart Uploads
- Upload chunks in parallel
- Resume on failure
- Reduces retry cost
3. Horizontal Autoscaling
Scale on:
- CPU
- Network IO
- Queue lag
4. Backpressure
If downstream is slow:
- Slow down uploads
- Return 429 with retry-after
Prevents cascading failures.
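A minimal sketch of that admission check, assuming a fixed in-flight capacity: when downstream is saturated, shed load with 429 and a Retry-After hint instead of queueing work that will time out anyway.

```python
class Backpressure:
    """Reject new uploads once in-flight work reaches capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = 0

    def admit(self) -> tuple[int, dict]:
        if self.in_flight >= self.capacity:
            return 429, {"Retry-After": "2"}  # shed load early
        self.in_flight += 1
        return 202, {}

    def done(self) -> None:
        self.in_flight -= 1  # free a slot when an upload completes

bp = Backpressure(capacity=2)
codes = [bp.admit()[0] for _ in range(3)]
# -> [202, 202, 429]
```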
Throughput Summary
- Stateless services → linear scaling
- Object storage & DB are main limits
- Direct uploads + sharding unlock massive scale
3. Storing Image Content (Efficiency & Cost)
Key Metric
Storage cost = Unique content size × replication factor
Deduplication decouples the two:
Total uploaded bytes ≠ total stored bytes
Dedup Ratio Calculation
Dedup Ratio =
(Total uploaded bytes - Stored bytes)
/ Total uploaded bytes
Example:
Uploaded = 100 TB
Stored = 20 TB
Dedup ratio = 80%
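The ratio is a one-line calculation; checking it against the example above:

```python
def dedup_ratio(uploaded_tb: float, stored_tb: float) -> float:
    """Fraction of uploaded bytes that deduplication avoided storing."""
    return (uploaded_tb - stored_tb) / uploaded_tb

r = dedup_ratio(uploaded_tb=100, stored_tb=20)
# -> 0.8, i.e. the 80% from the example above
```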
Why Content-Addressed Storage Helps
- One blob per unique hash
- Immutable
- Safe concurrency
Cost Optimization Techniques
- Lifecycle rules (hot → cold → archive)
- Delete when referenceCount = 0
- Tiering thumbnails separately
4. Fault Tolerance (Performance-Aware)
Fault tolerance directly impacts latency and throughput under failure.
Failure Scenarios
| Failure | Impact |
|---|---|
| Upload service crash | Retry safe |
| Cache failure | Fallback to DB |
| DB replica down | Increased latency |
| Region outage | DNS failover |
How We Measure Impact
- Latency spike (P95, P99)
- Error rate increase
- Queue lag growth
How We Improve Fault Tolerance Without Killing Performance
A. Multi-AZ
- Synchronous writes within AZ
- Low latency failover
B. Graceful Degradation
- If cache down → DB only
- If precheck fails → allow upload
C. Idempotent Retries
- Retries do not amplify load
- Prevent duplicate writes
D. Async Recovery
- Orphan cleaner
- Metadata ↔ object reconciler
Fault Tolerance Summary
- Failures slow the system, not break it
- Idempotency + retries keep throughput stable
- Observability catches issues before users do
Final Summary (Interview Ready)
This system uses content-addressed storage with SHA-256 hashing to guarantee exact deduplication, idempotent uploads, and high storage efficiency. Stateless services, async processing, and object storage ensure high throughput, while caching and prechecks provide low latency. Strong metadata consistency, reference counting, and conditional writes make the system safe, scalable, and production-grade.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli