Table of Contents
- Problem Statement
- Requirements
- High Level Components (HLD)
- Key Concepts Explained
- Complete Upload Flows
- API Design
- Storage Design
- System Characteristics – Deep Dive
- Latency
- Latency Breakdown (Upload Path)
- How to Calculate Latency
- Latency Bottlenecks
- How We Improve Latency
- Latency Summary
- Throughput
- Throughput Formula
- Upload Service Throughput
- Real Bottlenecks
- How to Improve Throughput
- Throughput Summary
- Storing Image Content (Efficiency & Cost)
- Key Metric
- Dedup Ratio Calculation
- Why Content-Addressed Storage Helps
- Cost Optimization Techniques
- Fault Tolerance (Performance-Aware)
- Failure Scenarios
- How We Measure Impact
- How We Improve Fault Tolerance Without Killing Performance
- Fault Tolerance Summary
- Final Summary (Interview Ready)
Problem Statement
Design a scalable image upload and storage system that supports millions of users, efficiently handles duplicate uploads, guarantees data integrity, and provides low-latency access while being secure, fault tolerant, and cost-effective.
The system should:
- Avoid storing the same image multiple times
- Support retries safely
- Scale globally
- Ensure strong security and reliability
Requirements
Functional Requirements
- Users can upload images
- Detect exact duplicates
- Allow multiple users to reference the same image
- Generate thumbnails asynchronously
- Fetch images securely
- Delete images safely
- Support retries without duplication (idempotency)
Non-Functional Requirements
- Low latency (< 50 ms for duplicate detection)
- High throughput (high RPS, millions of uploads)
- High availability (99.9%+)
- Strong consistency for metadata
- Secure storage and access
- Fault tolerance (multi-AZ, multi-region)
- Cost-efficient storage (deduplication)
High Level Components (HLD)
Core Components
- Client
- API Gateway
- Upload Service
- Cache (Redis)
- Metadata Database
- Object Storage
- Async Processing (Queue + Workers)
- CDN
- Security & IAM
Simple ASCII Architecture Diagram
Client
|
v
+--------------+
| API Gateway |
+--------------+
|
v
+------------------+
| Upload Service |
| (Stateless) |
+------------------+
| |
v v
+--------+ +----------------+
| Cache | | Metadata DB |
| Redis | | (Content/Asset)|
+--------+ +----------------+
|
v
+------------------+
| Object Storage |
| (SHA-256 key) |
+------------------+
|
v
+------------------+
| Async Workers |
| (thumbs, scan) |
+------------------+
Key Concepts Explained
1. Cryptographic Hash (SHA-256)
What is it?
A fixed-length fingerprint of data.
Image bytes ---> SHA-256 ---> e3b0c44298fc1c149...
Properties
- Same input → same output
- Computationally infeasible to reverse
- Extremely unlikely collisions
Why we use it
- Exact deduplication
- Content-addressed storage
- Data integrity verification
Diagram
[ Image Bytes ]
|
v
[ SHA-256 Hash ]
|
v
Object Key = /objects/sha256/<hash>
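As a minimal Python sketch, both properties above fall out of the standard library's `hashlib`: identical bytes always produce the same digest, and the digest doubles as the object key. The streaming variant matters later, since it lets us hash while uploading instead of buffering the whole file.

```python
import hashlib

def content_key(data: bytes) -> str:
    """Derive the content-addressed object key for an image's bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return f"/objects/sha256/{digest}"

def streaming_digest(chunks) -> str:
    """Hash chunk-by-chunk so the whole file is never held in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# Identical bytes always map to the same key -- the basis of deduplication.
key = content_key(b"hello")
# Chunked hashing yields the same digest as hashing all bytes at once.
same = streaming_digest([b"he", b"llo"]) == hashlib.sha256(b"hello").hexdigest()
```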
2. Idempotency
What is it?
Repeating the same request does not create duplicates.
Problem it solves
Network retries, client crashes, duplicate submissions.
Example
Client retries upload → Server returns same assetId
(no extra image stored)
How we implement
- Idempotency keys
- Conditional DB inserts
- Conditional object PUT
Diagram
Client
|
|-- upload (key=abc123)
|
Server
|
|-- already processed?
| yes → return same response
| no → process upload
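The flow above can be sketched in a few lines of Python. This is a hypothetical in-memory version purely for illustration; a real system would back the "already processed?" check with a conditional write against Redis or the metadata DB.

```python
# In-memory store of idempotency key -> response (stand-in for Redis/DB).
processed: dict[str, dict] = {}

def handle_upload(idempotency_key: str, data: bytes) -> dict:
    # Already processed? Return the stored response instead of re-running.
    if idempotency_key in processed:
        return processed[idempotency_key]
    response = {"assetId": f"asset-{len(processed) + 1}"}  # process upload
    processed[idempotency_key] = response
    return response

first = handle_upload("abc123", b"img-bytes")
retry = handle_upload("abc123", b"img-bytes")  # client retry after timeout
# retry returns the same assetId; nothing extra is stored
```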
Complete Upload Flows
Flow 1: Direct Upload (No Duplication)
Steps
- Client uploads image
- Upload service streams image
- SHA-256 computed
- Object stored at /objects/sha256/<hash>
- Metadata created
- assetId returned
Diagram
Client
|
v
Upload Service
|
|-- compute SHA-256
|
v
Object Storage (new object)
|
v
Metadata DB (new content + asset)
Flow 2: Direct Upload (Duplicate Already Exists)
Steps
- Client uploads image
- SHA-256 computed
- Metadata lookup finds existing contentId
- No object stored
- Only new asset reference created
Diagram
Upload Service
|
|-- SHA-256
|
|-- contentId exists?
|
v
Metadata DB → yes
|
v
Create new asset reference
Flow 3: Same Image Uploaded by Multiple Users
Key Idea
- One image blob
- Multiple asset references
Steps
- User A uploads image → stored
- User B uploads same image
- Same SHA-256 detected
- Reference count increments
Diagram
+----------------+
User A --->| Asset A |
| contentId X |
+----------------+
|
v
+------------------+
| Image Blob X |
| (stored once) |
+------------------+
^
+----------------+
User B --->| Asset B |
| contentId X |
+----------------+
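Flow 3 can be sketched as follows, assuming simple dicts in place of object storage and the metadata DB: two users upload identical bytes, but only one blob is stored and the reference count ends at 2.

```python
import hashlib

blobs: dict[str, bytes] = {}    # contentId -> image bytes (stored once)
ref_count: dict[str, int] = {}  # contentId -> number of asset references
assets: dict[str, str] = {}     # assetId -> contentId

def upload(user: str, data: bytes) -> str:
    content_id = hashlib.sha256(data).hexdigest()
    if content_id not in blobs:          # store the blob only on first upload
        blobs[content_id] = data
        ref_count[content_id] = 0
    ref_count[content_id] += 1           # every upload adds a reference
    asset_id = f"{user}-asset"
    assets[asset_id] = content_id
    return asset_id

a = upload("userA", b"same-image")
b = upload("userB", b"same-image")
# both assets point at one contentId; the bytes exist exactly once
```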
API Design
| API | Purpose | Explanation |
|---|---|---|
| POST /assets/precheck | Detect duplicates | Avoid upload if already present |
| POST /assets/upload | Upload image | Handles multipart uploads |
| GET /assets/{id} | Fetch image | Returns signed URL |
| DELETE /assets/{id} | Delete asset | Decrements reference count |
Multipart Upload Example
POST /assets/upload
Content-Type: multipart/form-data
--boundary
Content-Disposition: form-data; name="file"; filename="img.png"
(binary data)
--boundary--
Used for:
- Large files
- Resume uploads
- Network reliability
Storage Design
Storage Options
- Local disk ❌ (not scalable)
- Block storage ❌
- Object Storage ✅ (Best choice)
Object Storage
- Keyed by SHA-256
- Immutable blobs
- Built-in durability (eleven 9s: 99.999999999%)
- Lifecycle policies
- Cross-region replication
/objects/sha256/abcd1234...
Metadata Storage
Stored in Database
(Relational or Distributed NoSQL)
Content Table
- contentId (SHA-256)
- size
- mimeType
- checksum
- createdAt
- referenceCount
Asset Table
- assetId
- ownerId
- contentId
- filename
- ACLs
- status
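The two tables above, and the conditional insert that makes duplicate content writes a no-op, can be sketched with Python's built-in `sqlite3` (column names follow the lists above; the SQL dialect is illustrative, not a production schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE content (contentId TEXT PRIMARY KEY, size INTEGER,
                      mimeType TEXT, referenceCount INTEGER DEFAULT 0);
CREATE TABLE asset   (assetId TEXT PRIMARY KEY, ownerId TEXT,
                      contentId TEXT REFERENCES content(contentId));
""")

def register(asset_id, owner, content_id, size, mime):
    # Conditional insert: only the first upload of this hash creates a row.
    db.execute("INSERT OR IGNORE INTO content (contentId, size, mimeType) "
               "VALUES (?, ?, ?)", (content_id, size, mime))
    db.execute("INSERT INTO asset VALUES (?, ?, ?)",
               (asset_id, owner, content_id))
    db.execute("UPDATE content SET referenceCount = referenceCount + 1 "
               "WHERE contentId = ?", (content_id,))

register("a1", "userA", "abcd1234", 5_000_000, "image/png")
register("a2", "userB", "abcd1234", 5_000_000, "image/png")
# one shared content row, referenceCount == 2, two asset rows
rows = db.execute("SELECT referenceCount FROM content").fetchall()
```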
System Characteristics – Deep Dive
We’ll focus on four dimensions:
- Latency
- Throughput
- Storing Image Content (Efficiency & Cost)
- Fault Tolerance (Impact on performance)
1. Latency
What latency means here
Latency is the time taken to complete a user-visible operation, mainly:
- Upload request
- Duplicate detection
- Fetching image
We usually care about P95 / P99 latency, not average.
Latency Breakdown (Upload Path)
Total Latency =
Network RTT
+ API Gateway processing
+ Deduplication check
+ Object storage write
+ Metadata DB write
Typical numbers
| Stage | Approx latency |
|---|---|
| Network RTT | 10–40 ms |
| API Gateway | 2–5 ms |
| Cache lookup | ~1 ms |
| DB lookup | 3–10 ms |
| Object store PUT | 20–100+ ms |
How to Calculate Latency
Exact-duplicate short-circuit
Latency = RTT + Cache lookup + Metadata read
≈ 10 + 1 + 5 = ~16 ms (P50)
New upload
Latency = Upload time + Hash compute + Object store commit
Upload time depends on:
Upload time = File size / Network bandwidth
Example:
5 MB image / 10 Mbps ≈ 4 seconds
So upload latency is network-bound, not CPU-bound.
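The transfer-time arithmetic above is worth making explicit, since the unit conversion (bytes to bits) is where estimates usually go wrong:

```python
def upload_seconds(file_mb: float, bandwidth_mbps: float) -> float:
    """Transfer time: file size converted to megabits, divided by link rate."""
    return (file_mb * 8) / bandwidth_mbps

t = upload_seconds(5, 10)  # the 5 MB over 10 Mbps example above
# -> 4.0 seconds, orders of magnitude above the ~16 ms duplicate path
```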
Latency Bottlenecks
- Network distance (client → region)
- Object storage commit time
- Synchronous thumbnail generation ❌
How We Improve Latency
A. Early Short-Circuit (Biggest Win)
- Precheck with quickHash
- Cache-first lookup
- Skip upload completely for duplicates
📉 Reduces latency from seconds → milliseconds
B. Edge & Region Routing
- Geo-DNS routes client to nearest region
- Reduces RTT
RTT India → India region ≈ 10–20 ms
RTT India → US region ≈ 200+ ms
C. Async Everything Except the Critical Path
- Virus scan → async
- Thumbnail → async
- Near-duplicate ML → async
Only hash + store + metadata write stays synchronous.
D. Streaming Hash Computation
- No buffering whole file
- Compute SHA-256 while uploading
Network IO + Hash CPU overlap
This avoids extra latency.
E. CDN for Reads
- Signed URL → CDN edge
- Read latency ≈ 5–20 ms
Latency Summary
- Duplicate uploads: < 50 ms
- New uploads: network-dominated
- Reads: CDN-dominated
- Improvements come from short-circuiting + async
2. Throughput
What is Throughput?
Throughput = number of successful uploads per second
Measured as:
RPS (requests per second)
MB/s (data throughput)
Throughput Formula
Max Throughput =
min(
Upload Service capacity,
Network bandwidth,
Object storage write capacity,
Metadata DB write capacity
)
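The formula is literally a `min()` over component capacities. A sketch with illustrative numbers (real figures come from load testing, not these assumptions):

```python
def max_throughput(upload_svc: int, network: int,
                   object_store: int, metadata_db: int) -> int:
    """System throughput is capped by its slowest component (uploads/sec)."""
    return min(upload_svc, network, object_store, metadata_db)

# Hypothetical capacities in uploads/sec.
limit = max_throughput(upload_svc=20_000, network=15_000,
                       object_store=8_000, metadata_db=12_000)
# here object storage (8,000/sec) is the bottleneck to attack first
```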
Upload Service Throughput
Because upload services are stateless:
Total throughput =
Instances × throughput per instance
Example:
1 instance → 200 concurrent uploads
10 instances → 2000 concurrent uploads
Why Upload Service Is NOT the Bottleneck
- It streams data
- Minimal CPU (just hashing)
- Horizontally scalable
Real Bottlenecks
A. Object Storage
- PUT request rate
- Sustained bandwidth
Mitigation:
- Multipart uploads
- Parallel chunk uploads
- Direct-to-object-storage (bypass app servers)
B. Metadata DB
Writes per upload:
- Insert content (conditional)
- Insert asset
- Increment reference count
At scale:
Writes/sec = uploads/sec × metadata ops
Mitigation:
- Sharding by contentId hash
- Batch writes
- Conditional writes (avoid locking)
C. Cache Hotspots
Popular viral images → same hash → contention
Mitigation:
- Shard cache by hash prefix
- Short-lived distributed locks
- Bloom filters
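The hash-prefix sharding mitigation can be sketched in one function: route each key by its leading hex characters so hot hashes spread across cache nodes rather than hammering one instance (the shard count of 16 is an arbitrary assumption):

```python
def cache_shard(content_hash: str, num_shards: int = 16) -> int:
    """Map a SHA-256 hex digest to a cache shard by its first two hex chars."""
    return int(content_hash[:2], 16) % num_shards

shard = cache_shard("e3b0c44298fc1c14" + "0" * 48)
# keys with different prefixes land on different shards
```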
How to Improve Throughput
1. Direct-to-Object Uploads
Client uploads directly using signed URLs
Client → Object Storage
Server → Metadata only
📈 Massive throughput gain.
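A signed URL is just a path plus an expiry, authenticated with a server-side secret. Cloud object stores provide this natively (e.g. S3 presigned URLs); the sketch below uses only the standard library's `hmac` to show the principle, with a hypothetical secret and query format:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # hypothetical signing key

def sign_url(path: str, expires_at: int) -> str:
    """Mint a URL the client can use to PUT directly to object storage;
    the app server then only handles the metadata write."""
    msg = f"{path}:{expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={sig}"

def verify(path: str, expires_at: int, sig: str) -> bool:
    expected = hmac.new(SECRET, f"{path}:{expires_at}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires_at

url = sign_url("/objects/sha256/abcd1234", int(time.time()) + 300)
```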
2. Multipart Uploads
- Upload chunks in parallel
- Resume on failure
- Reduces retry cost
3. Horizontal Autoscaling
Scale on:
- CPU
- Network IO
- Queue lag
4. Backpressure
If downstream is slow:
- Slow down uploads
- Return 429 with retry-after
Prevents cascading failures.
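A minimal sketch of that admission check, assuming a fixed in-flight capacity: when downstream is saturated, shed load with 429 and a Retry-After hint instead of queueing work that will time out anyway.

```python
class Backpressure:
    """Reject new uploads once in-flight work reaches capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = 0

    def admit(self) -> tuple[int, dict]:
        if self.in_flight >= self.capacity:
            return 429, {"Retry-After": "2"}  # shed load early
        self.in_flight += 1
        return 202, {}

    def done(self) -> None:
        self.in_flight -= 1  # free a slot when an upload completes

bp = Backpressure(capacity=2)
codes = [bp.admit()[0] for _ in range(3)]
# -> [202, 202, 429]
```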
Throughput Summary
- Stateless services → linear scaling
- Object storage & DB are main limits
- Direct uploads + sharding unlock massive scale
3. Storing Image Content (Efficiency & Cost)
Key Metric
Storage cost = Unique content size × replication factor
Deduplication decouples the two:
Total uploaded bytes ≠ total stored bytes
Dedup Ratio Calculation
Dedup Ratio =
(Total uploaded bytes - Stored bytes)
/ Total uploaded bytes
Example:
Uploaded = 100 TB
Stored = 20 TB
Dedup ratio = 80%
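The ratio is a one-line calculation; checking it against the example above:

```python
def dedup_ratio(uploaded_tb: float, stored_tb: float) -> float:
    """Fraction of uploaded bytes that deduplication avoided storing."""
    return (uploaded_tb - stored_tb) / uploaded_tb

r = dedup_ratio(uploaded_tb=100, stored_tb=20)
# -> 0.8, i.e. the 80% from the example above
```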
Why Content-Addressed Storage Helps
- One blob per unique hash
- Immutable
- Safe concurrency
Cost Optimization Techniques
- Lifecycle rules (hot → cold → archive)
- Delete when referenceCount = 0
- Tiering thumbnails separately
4. Fault Tolerance (Performance-Aware)
Fault tolerance directly impacts latency and throughput under failure.
Failure Scenarios
| Failure | Impact |
|---|---|
| Upload service crash | Retry safe |
| Cache failure | Fallback to DB |
| DB replica down | Increased latency |
| Region outage | DNS failover |
How We Measure Impact
- Latency spike (P95, P99)
- Error rate increase
- Queue lag growth
How We Improve Fault Tolerance Without Killing Performance
A. Multi-AZ
- Synchronous writes within AZ
- Low latency failover
B. Graceful Degradation
- If cache down → DB only
- If precheck fails → allow upload
C. Idempotent Retries
- Retries do not amplify load
- Prevent duplicate writes
D. Async Recovery
- Orphan cleaner
- Metadata ↔ object reconciler
Fault Tolerance Summary
- Failures slow the system, not break it
- Idempotency + retries keep throughput stable
- Observability catches issues before users do
Final Summary (Interview Ready)
This system uses content-addressed storage with SHA-256 hashing to guarantee exact deduplication, idempotent uploads, and high storage efficiency. Stateless services, async processing, and object storage ensure high throughput, while caching and prechecks provide low latency. Strong metadata consistency, reference counting, and conditional writes make the system safe, scalable, and production-grade.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli