Scalable Image Upload and Storage System - HLD

Table of Contents

  • Problem Statement
  • Requirements
  • High Level Components (HLD)
  • Key Concepts Explained
  • Complete Upload Flows
  • API Design
  • Storage Design
  • Metadata Storage
  • System Characteristics – Deep Dive
  • Final Summary

Problem Statement

Design a scalable image upload and storage system that supports millions of users, efficiently handles duplicate uploads, guarantees data integrity, and provides low-latency access while being secure, fault-tolerant, and cost-effective.

The system should:

  • Avoid storing the same image multiple times
  • Support retries safely
  • Scale globally
  • Ensure strong security and reliability

Back to Top

Requirements

Functional Requirements

  • Users can upload images
  • Detect exact duplicates
  • Allow multiple users to reference the same image
  • Generate thumbnails asynchronously
  • Fetch images securely
  • Delete images safely
  • Support retries without duplication (idempotency)

Non-Functional Requirements

  • Low latency (< 50 ms for duplicate detection)
  • High throughput (high RPS, millions of uploads)
  • High availability (99.9%+)
  • Strong consistency for metadata
  • Secure storage and access
  • Fault tolerance (multi-AZ, multi-region)
  • Cost-efficient storage (deduplication)

Back to Top

High Level Components (HLD)

Core Components

  1. Client
  2. API Gateway
  3. Upload Service
  4. Cache (Redis)
  5. Metadata Database
  6. Object Storage
  7. Async Processing (Queue + Workers)
  8. CDN
  9. Security & IAM

Simple ASCII Architecture Diagram

        Client
          |
          v
   +--------------+
   | API Gateway  |
   +--------------+
          |
          v
   +------------------+
   | Upload Service   |
   | (Stateless)      |
   +------------------+
     |        |
     v        v
+--------+  +----------------+
| Cache  |  | Metadata DB    |
| Redis  |  | (Content/Asset)|
+--------+  +----------------+
     |
     v
+------------------+
| Object Storage   |
| (SHA-256 key)    |
+------------------+
     |
     v
+------------------+
| Async Workers    |
| (thumbs, scan)   |
+------------------+

Back to Top

Key Concepts Explained


1. Cryptographic Hash (SHA-256)

What is it?

A fixed-length fingerprint of data.

Image bytes  --->  SHA-256  --->  e3b0c44298fc1c149...

Properties

  • Same input → same output
  • Computationally infeasible to reverse
  • Extremely unlikely collisions

Why we use it

  • Exact deduplication
  • Content-addressed storage
  • Data integrity verification

Diagram

[ Image Bytes ]
       |
       v
[ SHA-256 Hash ]
       |
       v
Object Key = /objects/sha256/<hash>
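
A minimal Python sketch of content addressing (the byte string is just a stand-in for real image data):

```python
import hashlib

# Identical bytes always produce the same digest, so the digest can
# serve directly as the storage key.
data = b"example image bytes"          # stand-in for real image data
digest = hashlib.sha256(data).hexdigest()
object_key = f"/objects/sha256/{digest}"

# Re-hashing the same bytes reproduces the key (deduplication), and
# re-hashing after download verifies integrity.
assert hashlib.sha256(data).hexdigest() == digest
print(object_key)
```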

Back to Top

2. Idempotency

What is it?

Repeating the same request does not create duplicates.

Problem it solves

Network retries, client crashes, duplicate submissions.

Example

Client retries upload → Server returns same assetId
(no extra image stored)

How we implement

  • Idempotency keys
  • Conditional DB inserts
  • Conditional object PUT

Diagram

Client
  |
  |-- upload (key=abc123)
  |
Server
  |
  |-- already processed?
       | yes → return same response
       | no  → process upload
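
A minimal sketch of an idempotency key using Redis (redis-py client; assumes a reachable Redis instance, and simplifies by storing the result up front, where a production flow would reserve the key first and attach the final response after processing):

```python
import redis  # redis-py client; assumes a reachable Redis instance

r = redis.Redis()

def record_upload(idempotency_key: str, asset_id: str) -> str:
    # SET NX succeeds only for the first request with this key;
    # EX expires the key after 24 hours.
    first = r.set(f"idem:{idempotency_key}", asset_id, nx=True, ex=86400)
    if first:
        return asset_id                               # fresh request
    return r.get(f"idem:{idempotency_key}").decode()  # retry: same assetId
```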

Back to Top

Complete Upload Flows


Flow 1: Direct Upload (No Duplication)

Steps

  1. Client uploads image
  2. Upload service streams image
  3. SHA-256 computed
  4. Object stored at /objects/sha256/<hash>
  5. Metadata created
  6. assetId returned

Diagram

Client
  |
  v
Upload Service
  |
  |-- compute SHA-256
  |
  v
Object Storage (new object)
  |
  v
Metadata DB (new content + asset)

Flow 2: Direct Upload (Duplicate Already Exists)

Steps

  1. Client uploads image
  2. SHA-256 computed
  3. Metadata lookup finds existing contentId
  4. No object stored
  5. Only new asset reference created

Diagram

Upload Service
  |
  |-- SHA-256
  |
  |-- contentId exists?
        |
        v
Metadata DB → yes
        |
        v
Create new asset reference
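
A toy in-memory sketch of both flows (the dicts stand in for object storage and the content table; all names are illustrative):

```python
import hashlib

object_store: dict[str, bytes] = {}   # stand-in for object storage
content_table: dict[str, dict] = {}   # stand-in for the metadata DB

def upload(image_bytes: bytes) -> str:
    content_id = hashlib.sha256(image_bytes).hexdigest()
    if content_id in content_table:                 # Flow 2: duplicate
        return content_id                           # no blob write
    object_store[f"/objects/sha256/{content_id}"] = image_bytes  # Flow 1
    content_table[content_id] = {"size": len(image_bytes)}
    return content_id

first = upload(b"cat picture bytes")
second = upload(b"cat picture bytes")   # duplicate upload
assert first == second and len(object_store) == 1  # stored exactly once
```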

Back to Top

Flow 3: Same Image Uploaded by Multiple Users

Key Idea

  • One image blob
  • Multiple asset references

Steps

  1. User A uploads image → stored
  2. User B uploads same image
  3. Same SHA-256 detected
  4. Reference count increments


Diagram

           +----------------+
User A --->| Asset A        |
           | contentId X    |
           +----------------+
                   |
                   v
            +------------------+
            | Image Blob X     |
            | (stored once)    |
            +------------------+
                   ^
                   |
           +----------------+
User B --->| Asset B        |
           | contentId X    |
           +----------------+
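
Extending the toy sketch above: each upload creates its own asset record pointing at the shared contentId and bumps a reference count (uuid just generates illustrative asset IDs):

```python
import hashlib
import uuid

content_refs: dict[str, int] = {}
asset_table: dict[str, dict] = {}

def add_asset(owner_id: str, image_bytes: bytes) -> str:
    content_id = hashlib.sha256(image_bytes).hexdigest()
    content_refs[content_id] = content_refs.get(content_id, 0) + 1
    asset_id = str(uuid.uuid4())
    asset_table[asset_id] = {"ownerId": owner_id, "contentId": content_id}
    return asset_id

a = add_asset("userA", b"same image")
b = add_asset("userB", b"same image")       # same bytes, different user
cid = hashlib.sha256(b"same image").hexdigest()
assert a != b and content_refs[cid] == 2    # two assets, one blob
```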

Back to Top


API Design

| API | Purpose | Explanation |
|-----|---------|-------------|
| POST /assets/precheck | Detect duplicates | Avoid upload if already present |
| POST /assets/upload | Upload image | Handles multipart uploads |
| GET /assets/{id} | Fetch image | Returns signed URL |
| DELETE /assets/{id} | Delete asset | Decrements reference count |

Multipart Upload Example

POST /assets/upload
Content-Type: multipart/form-data; boundary=boundary

--boundary
Content-Disposition: form-data; name="file"; filename="img.png"
(binary data)
--boundary--
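
The same request sent from Python with the requests library (the endpoint URL and filename here are illustrative):

```python
import requests

# requests builds the boundary and Content-Disposition headers shown
# above automatically.
with open("img.png", "rb") as f:
    resp = requests.post(
        "https://api.example.com/assets/upload",      # illustrative URL
        files={"file": ("img.png", f, "image/png")},
    )
print(resp.status_code, resp.json())
```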

Used for:

  • Large files
  • Resumable uploads
  • Network reliability

Back to Top

Storage Design

Storage Options

  • Local disk ❌ (not scalable)
  • Block storage ❌
  • Object Storage ✅ (Best choice)

Object Storage

  • Keyed by SHA-256
  • Immutable blobs
  • Built-in durability (11 9’s)
  • Lifecycle policies
  • Cross-region replication

Example object key:

/objects/sha256/abcd1234...

Back to Top

Metadata Storage

Stored in Database

(Relational or Distributed NoSQL)

Content Table

  • contentId (SHA-256)
  • size
  • mimeType
  • checksum
  • createdAt
  • referenceCount

Asset Table

  • assetId
  • ownerId
  • contentId
  • filename
  • ACLs
  • status
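
A minimal relational sketch of the two tables (SQLite purely for illustration; column names mirror the lists above and types are indicative):

```python
import sqlite3

schema = """
CREATE TABLE IF NOT EXISTS content (
    content_id      TEXT PRIMARY KEY,            -- SHA-256 hex digest
    size            INTEGER NOT NULL,
    mime_type       TEXT NOT NULL,
    checksum        TEXT NOT NULL,               -- transport-level check
    created_at      TEXT NOT NULL,
    reference_count INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS asset (
    asset_id   TEXT PRIMARY KEY,
    owner_id   TEXT NOT NULL,
    content_id TEXT NOT NULL REFERENCES content(content_id),
    filename   TEXT,
    acls       TEXT,                             -- serialized ACL document
    status     TEXT NOT NULL DEFAULT 'active'
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
```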

System Characteristics – Deep Dive

We’ll focus on four dimensions:

  1. Latency
  2. Throughput
  3. Storing Image Content (Efficiency & Cost)
  4. Fault Tolerance (Impact on performance)

Back to Top

1. Latency

What latency means here

Latency is the time taken to complete a user-visible operation, mainly:

  • Upload request
  • Duplicate detection
  • Fetching image

We usually care about P95 / P99 latency, not average.


Latency Breakdown (Upload Path)

Total Latency =
  Network RTT
+ API Gateway processing
+ Deduplication check
+ Object storage write
+ Metadata DB write

Back to Top

Typical numbers

| Stage | Approx. latency |
|-------|-----------------|
| Network RTT | 10–40 ms |
| API Gateway | 2–5 ms |
| Cache lookup | ~1 ms |
| DB lookup | 3–10 ms |
| Object store PUT | 20–100+ ms |

How to Calculate Latency

Exact-duplicate short-circuit

Latency = RTT + Cache lookup + Metadata read
≈ 10 + 1 + 5 = ~16 ms (P50)

New upload

Latency = Upload time + Hash compute + Object store commit

Back to Top

Upload time depends on:

Upload time = File size / Network bandwidth

Example:

5 MB image = 40 megabits → 40 Mb / 10 Mbps ≈ 4 seconds

So upload latency is network-bound, not CPU-bound.


Latency Bottlenecks

  • Network distance (client → region)
  • Object storage commit time
  • Synchronous thumbnail generation ❌

Back to Top

How We Improve Latency

A. Early Short-Circuit (Biggest Win)

  • Precheck with quickHash
  • Cache-first lookup
  • Skip upload completely for duplicates

📉 Reduces latency from seconds → milliseconds


B. Edge & Region Routing

  • Geo-DNS routes client to nearest region
  • Reduces RTT

RTT India → India region ≈ 10–20 ms
RTT India → US region   ≈ 200+ ms

C. Async Everything Except the Critical Path

  • Virus scan → async
  • Thumbnail → async
  • Near-duplicate ML → async

Only hash + store + metadata write stays synchronous.

Back to Top


D. Streaming Hash Computation

  • No buffering whole file
  • Compute SHA-256 while uploading

Network IO + hash CPU overlap

This overlap means hashing adds no extra latency on top of the transfer.
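
A sketch of incremental hashing with Python's hashlib; the chunk iterator stands in for a streamed request body, and the filename is illustrative:

```python
import hashlib

def sha256_of_stream(chunks) -> str:
    """chunks: any iterable of byte strings (e.g. a streamed body)."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)   # incremental: never buffers the whole file
    return h.hexdigest()

# Usage: hash a file in 1 MiB chunks while it is being read.
with open("img.png", "rb") as f:
    digest = sha256_of_stream(iter(lambda: f.read(1024 * 1024), b""))
print(f"/objects/sha256/{digest}")
```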


E. CDN for Reads

  • Signed URL → CDN edge
  • Read latency ≈ 5–20 ms

Latency Summary

  • Duplicate uploads: < 50 ms
  • New uploads: network-dominated
  • Reads: CDN-dominated
  • Improvements come from short-circuiting + async

Back to Top

2. Throughput

What is Throughput?

Throughput = number of successful uploads per second

Measured as:

RPS (requests per second)
MB/s (data throughput)

Throughput Formula

Max Throughput =
  min(
    Upload Service capacity,
    Network bandwidth,
    Object storage write capacity,
    Metadata DB write capacity
  )

Back to Top

Upload Service Throughput

Because upload services are stateless:

Total throughput =
  Instances × throughput per instance

Example:

1 instance → 200 concurrent uploads
10 instances → 2000 concurrent uploads

Why Upload Service Is NOT the Bottleneck

  • It streams data
  • Minimal CPU (just hashing)
  • Horizontally scalable

Back to Top


Real Bottlenecks

A. Object Storage

  • PUT request rate
  • Sustained bandwidth

Mitigation:

  • Multipart uploads
  • Parallel chunk uploads
  • Direct-to-object-storage (bypass app servers)

B. Metadata DB

Writes per upload:

  • Insert content (conditional)
  • Insert asset
  • Increment reference count

At scale:

Writes/sec = uploads/sec × metadata ops per upload

Mitigation:

  • Sharding by contentId hash
  • Batch writes
  • Conditional writes (avoid locking)

Back to Top

C. Cache Hotspots

Popular viral images → same hash → contention

Mitigation:

  • Shard cache by hash prefix
  • Short-lived distributed locks
  • Bloom filters

How to Improve Throughput

1. Direct-to-Object Uploads

Client uploads directly using signed URLs

Client → Object Storage
Server → Metadata only

📈 Massive throughput gain.
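A sketch with AWS S3 presigned URLs via boto3 (bucket and key names are illustrative; other object stores offer equivalents):

```python
import boto3

s3 = boto3.client("s3")

# The app server only mints the URL; the client PUTs the bytes directly
# to object storage, bypassing the upload service entirely.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "images-bucket", "Key": "objects/sha256/abcd1234"},
    ExpiresIn=300,  # URL valid for 5 minutes
)

# Client side (illustrative): requests.put(upload_url, data=image_bytes)
print(upload_url)
```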


Back to Top

2. Multipart Uploads

  • Upload chunks in parallel
  • Resume on failure
  • Reduces retry cost

3. Horizontal Autoscaling

Scale on:

  • CPU
  • Network IO
  • Queue lag

4. Backpressure

If downstream is slow:

  • Slow down uploads
  • Return 429 with a Retry-After header

Prevents cascading failures.
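
A minimal Flask sketch of shedding load this way (the saturation check is a placeholder for a real queue-lag or capacity signal):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def downstream_saturated() -> bool:
    return True   # placeholder: check queue lag / DB latency in practice

@app.route("/assets/upload", methods=["POST"])
def upload():
    if downstream_saturated():
        resp = jsonify(error="overloaded, retry later")
        resp.status_code = 429
        resp.headers["Retry-After"] = "5"   # seconds
        return resp
    return jsonify(status="accepted"), 202
```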


Back to Top

Throughput Summary

  • Stateless services → linear scaling
  • Object storage & DB are main limits
  • Direct uploads + sharding unlock massive scale

3. Storing Image Content (Efficiency & Cost)

Key Metric

Storage cost = Unique content size × replication factor

Deduplication means:

Total uploaded bytes ≠ total stored bytes

Back to Top

Dedup Ratio Calculation

Dedup Ratio =
  (Total uploaded bytes - Stored bytes)
  / Total uploaded bytes

Example:

Uploaded = 100 TB
Stored = 20 TB
Dedup ratio = 80%

Why Content-Addressed Storage Helps

  • One blob per unique hash
  • Immutable
  • Safe concurrency

Cost Optimization Techniques

  • Lifecycle rules (hot → cold → archive)
  • Delete the blob when referenceCount = 0 (sketched below)
  • Tiering thumbnails separately
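
A sketch of the referenceCount = 0 rule against the illustrative SQLite schema from the Metadata Storage section; the decrement and the zero check run in one transaction so concurrent deletes cannot double-free a blob:

```python
import sqlite3

def delete_asset(conn: sqlite3.Connection, asset_id: str) -> None:
    with conn:  # one transaction
        row = conn.execute(
            "SELECT content_id FROM asset WHERE asset_id = ?",
            (asset_id,)).fetchone()
        if row is None:
            return                       # idempotent: already deleted
        content_id = row[0]
        conn.execute("DELETE FROM asset WHERE asset_id = ?", (asset_id,))
        conn.execute(
            "UPDATE content SET reference_count = reference_count - 1 "
            "WHERE content_id = ?", (content_id,))
        gone = conn.execute(
            "DELETE FROM content WHERE content_id = ? "
            "AND reference_count <= 0", (content_id,)).rowcount
        if gone:
            pass  # enqueue /objects/sha256/<content_id> for blob removal
```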

Back to Top

4. Fault Tolerance (Performance-Aware)

Fault tolerance directly impacts latency and throughput under failure.


Failure Scenarios

| Failure | Impact |
|---------|--------|
| Upload service crash | Retry safe |
| Cache failure | Fallback to DB |
| DB replica down | Increased latency |
| Region outage | DNS failover |

How We Measure Impact

  • Latency spike (P95, P99)
  • Error rate increase
  • Queue lag growth

How We Improve Fault Tolerance Without Killing Performance

A. Multi-AZ

  • Synchronous writes within AZ
  • Low latency failover

B. Graceful Degradation

  • If cache down → DB only
  • If precheck fails → allow upload

C. Idempotent Retries

  • Retries do not amplify load
  • Prevent duplicate writes

D. Async Recovery

  • Orphan cleaner
  • Metadata ↔ object reconciler

Back to Top

Fault Tolerance Summary

  • Failures slow the system, not break it
  • Idempotency + retries keep throughput stable
  • Observability catches issues before users do

Final Summary (Interview Ready)

This system uses content-addressed storage with SHA-256 hashing to guarantee exact deduplication, idempotent uploads, and high storage efficiency. Stateless services, async processing, and object storage ensure high throughput, while caching and prechecks provide low latency. Strong metadata consistency, reference counting, and conditional writes make the system safe, scalable, and production-grade.

Back to Top

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli


Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli
