Janmejai Singh

Posted on May 31

How Instagram, WhatsApp, Uber & Netflix Would Be Built Today Using Expo Router

#reactnative #mobiledev #systemdesign #architecture

You tap Record. You nail the take. You save as Draft. You close Instagram, reopen it — and your Reel is right there. What just happened under the hood?

This post traces that full journey: from your device's camera buffer to your follower's screen halfway across the world. We'll cover local storage, drafts, cloud uploads, media processing, caching, and CDNs — with diagrams at each stage.

Why Social Media Apps Need Efficient Media Storage

Photos and videos are the heaviest assets any mobile app handles. A single 60-second Reel captured at 1080p can be 200–400MB as recorded, before the server re-encodes it. Multiply that by the upload volume of a platform Instagram's size and you begin to understand why media storage is not a solved problem but an ongoing engineering discipline.

Efficiency matters at three levels:

Device level — local storage is limited; apps can't balloon in size
Network level — large uploads over mobile connections are fragile and slow
Cloud level — storing, processing, and serving billions of files requires distributed systems

Instagram's architecture solves each of these with different tools, layered together.

How Photos and Videos Are Stored Before Upload

When you record a Reel, the app doesn't immediately try to upload it. Your device is the first storage layer.

The recording pipeline looks like this:

Camera Sensor
     ↓
In-Memory Frame Buffer  ← (active recording, fast but volatile)
     ↓
Local Temp Cache        ← (flushed from buffer, survives brief interruptions)
     ↓
User Decision Point
  ├── Discard → Delete temp files
  ├── Save Draft → Move to persistent storage
  └── Post → Enqueue for upload

Why local-first?

Three reasons:

Networks are unreliable — especially on cellular. Streaming raw video directly to a cloud server mid-recording is a recipe for data loss.
User intent is unknown — the app doesn't know you'll keep the clip until you decide.
Upload costs — uploading every test take wastes bandwidth, battery, and server resources.

Local storage on mobile comes in different tiers:

Tier	Speed	Persistence	Cleared by
RAM buffer	Fastest	Lost on kill	OS memory pressure
Cache directory	Fast	Semi-persistent	OS under storage pressure
Documents directory	Fast	Persistent	Only explicit delete

Active recordings live in the cache. Once you decide to keep them — as a draft or a post — they move to the documents directory.

What Happens When a User Saves a Draft

A draft isn't just a video file. It's a bundle of state that must survive app restarts:

The raw video clip(s)
Applied audio track (and timestamp offset)
Stickers, text overlays, and their screen positions
Trim start/end points
Filter and effect choices
Selected thumbnail frame

When you tap "Save Draft," Instagram performs a transactional write:

1. Write video file to documents directory
2. Write metadata record to local SQLite database
   (file path, audio, effects, trim points, etc.)
3. Confirm both writes succeeded
4. Show "Draft Saved" confirmation

Either both writes succeed or neither is surfaced to the user. This is critical — a draft pointing to a missing file, or a video file with no matching metadata record, is worse than no draft at all.
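The save steps above can be sketched in TypeScript. This is a minimal illustration, not Instagram's actual code: `FileStore`, `DraftDb`, `DraftMeta`, and `saveDraft` are hypothetical names, the in-memory `MemoryFiles` class stands in for the documents directory, and the sketch assumes every save writes to a fresh `videoPath`.

```typescript
// All names below are hypothetical; a real app would back FileStore with the
// documents directory and DraftDb with SQLite.
interface FileStore {
  write(path: string, bytes: Uint8Array): void; // may throw
  remove(path: string): void;
  exists(path: string): boolean;
}

interface DraftMeta {
  id: string;
  videoPath: string;
  trimStartMs: number;
  trimEndMs: number;
  updatedAt: number;
}

interface DraftDb {
  insert(meta: DraftMeta): void; // may throw
}

// Write the video, then the metadata row. If either write fails,
// remove the video file so no orphan is left behind.
function saveDraft(files: FileStore, db: DraftDb, meta: DraftMeta, video: Uint8Array): boolean {
  try {
    files.write(meta.videoPath, video);
    db.insert(meta);
    return true; // only now is it safe to show "Draft Saved"
  } catch {
    files.remove(meta.videoPath); // roll back (assumes a fresh path per save)
    return false;
  }
}

// In-memory stand-in so the sketch runs without a device.
class MemoryFiles implements FileStore {
  private data = new Map<string, Uint8Array>();
  write(path: string, bytes: Uint8Array): void { this.data.set(path, bytes); }
  remove(path: string): void { this.data.delete(path); }
  exists(path: string): boolean { return this.data.has(path); }
}
```

The order matters: the metadata row is written last, so a row in the database implies the file it points to already exists.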

How drafts survive app restarts:

On launch, Instagram queries the SQLite database for all draft records. For each record, it reads the stored file path and reconstructs the preview. No network call needed. The entire drafts tray is built from local data.

App Launch
    ↓
Query SQLite: SELECT * FROM drafts ORDER BY updated_at DESC
    ↓
For each row: read video file at stored path → generate thumbnail
    ↓
Render Drafts tray
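Here is a rough sketch of that launch step. `DraftRow` and `loadDraftTray` are made-up names, and the `fileExists` callback stands in for a filesystem check. Dropping rows whose file has gone missing is an extra safety assumption, not something the flow above spells out.

```typescript
interface DraftRow {
  id: string;
  videoPath: string;
  updatedAt: number; // epoch milliseconds
}

// Mirrors "ORDER BY updated_at DESC" and skips rows whose video file is gone,
// so the tray never shows a draft it cannot open.
function loadDraftTray(rows: DraftRow[], fileExists: (path: string) => boolean): DraftRow[] {
  return rows
    .filter((row) => fileExists(row.videoPath)) // filter() copies, so the input is not mutated
    .sort((a, b) => b.updatedAt - a.updatedAt);
}
```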

Local Storage vs. Cloud Storage

These two worlds have opposite strengths:

┌──────────────────────────┬──────────────────────────┐
│     LOCAL STORAGE        │     CLOUD STORAGE        │
├──────────────────────────┼──────────────────────────┤
│ ✅ Near-zero latency     │ ✅ Unlimited scale        │
│ ✅ Works offline         │ ✅ Durable (replicated)   │
│ ✅ No network dependency │ ✅ Accessible everywhere  │
│ ❌ Limited capacity      │ ❌ Network dependent      │
│ ❌ Lost if device lost   │ ❌ Slower (network I/O)   │
│ ❌ Single device only    │ ❌ Costs money at scale   │
└──────────────────────────┴──────────────────────────┘

Instagram's architecture uses both — local for speed and offline capability, cloud for durability and sharing. The transition point is the upload pipeline.

Uploading Large Media Files Efficiently

Uploading a 200MB video over cellular as one HTTP POST is a terrible idea. One dropped connection and you start over. Instagram avoids this with two techniques:

1. Chunked / Resumable Uploads

The file is split into small chunks (e.g., 5MB each). Each chunk is uploaded independently. If a chunk fails, only that chunk retries.

Full Video: 200MB
 ├── Chunk 01 (5MB) → ✅
 ├── Chunk 02 (5MB) → ✅
 ├── Chunk 03 (5MB) → ❌ (network drop)
 │       └── Retry Chunk 03 → ✅
 ├── Chunk 04 (5MB) → ✅
 └── ... 36 more chunks

Server-side, the chunks are tracked by upload session ID. Once all are received, the server reassembles them in order.
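A minimal sketch of the client side, under some assumptions: `planChunks`, `uploadAll`, and the `SendChunk` transport type are hypothetical names, the retry limit of 3 is arbitrary, and a real client would add backoff between attempts.

```typescript
interface ChunkPlan {
  index: number;
  offset: number; // byte offset into the file
  length: number; // bytes in this chunk
}

// Splits a file of totalBytes into fixed-size chunks; the last one may be shorter.
function planChunks(totalBytes: number, chunkBytes: number): ChunkPlan[] {
  if (chunkBytes <= 0) throw new Error("chunkBytes must be positive");
  const chunks: ChunkPlan[] = [];
  for (let offset = 0, index = 0; offset < totalBytes; offset += chunkBytes, index++) {
    chunks.push({ index, offset, length: Math.min(chunkBytes, totalBytes - offset) });
  }
  return chunks;
}

// Hypothetical transport: resolves on success, rejects on a network error.
type SendChunk = (sessionId: string, chunk: ChunkPlan) => Promise<void>;

// Uploads chunks in order and retries only the chunk that failed.
// Throws once a single chunk has used up maxAttempts.
async function uploadAll(
  sessionId: string,
  chunks: ChunkPlan[],
  send: SendChunk,
  maxAttempts: number = 3
): Promise<number[]> {
  const done: number[] = [];
  for (const chunk of chunks) {
    let attempt = 0;
    for (;;) {
      attempt++;
      try {
        await send(sessionId, chunk);
        done.push(chunk.index);
        break;
      } catch {
        if (attempt >= maxAttempts) {
          throw new Error(`chunk ${chunk.index} failed after ${attempt} attempts`);
        }
      }
    }
  }
  return done;
}
```

With 5MB chunks, `planChunks` splits a 200MB file into the 40 chunks shown in the diagram.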

2. Background Uploads

After you tap Post, you don't have to wait in the app. The upload continues in the background using OS-level transfer APIs:

iOS: URLSession with .background configuration
Android: WorkManager or JobScheduler

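These transfers are managed by the operating system, not the app. They survive the app being backgrounded, suspended, or terminated by the system, and resume automatically when connectivity returns. A user force-quit is the exception: on iOS, force-quitting from the app switcher cancels pending background transfers, and on Android a force stop from system settings halts scheduled work until the app is opened again.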

The Upload Queue

A local queue tracks upload status for each piece of media:

PENDING → IN_PROGRESS → COMPLETED
              ↓
           PAUSED (network lost)
              ↓
           RETRYING
              ↓
           FAILED (max retries hit)

This queue state is persisted to SQLite, so if the app is killed mid-upload, it knows exactly where to resume.
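One way to model that state diagram is a small transition table. In this sketch `UploadState`, `QueueItem`, `transition`, and `recoverAfterKill` are hypothetical names. The edge from RETRYING back to IN_PROGRESS is an assumption (the diagram only shows the failure path), as is treating mid-flight items as PAUSED after a relaunch.

```typescript
type UploadState = "PENDING" | "IN_PROGRESS" | "PAUSED" | "RETRYING" | "COMPLETED" | "FAILED";

// Which states each state may move to.
const ALLOWED: Record<UploadState, UploadState[]> = {
  PENDING: ["IN_PROGRESS"],
  IN_PROGRESS: ["COMPLETED", "PAUSED"],
  PAUSED: ["RETRYING"],
  RETRYING: ["IN_PROGRESS", "FAILED"], // back-edge to IN_PROGRESS is an assumption
  COMPLETED: [],
  FAILED: [],
};

interface QueueItem {
  mediaId: string;
  state: UploadState;
  attempts: number; // how many times this item has entered RETRYING
}

// Returns a new item in the next state, or throws on an illegal move.
function transition(item: QueueItem, next: UploadState): QueueItem {
  if (ALLOWED[item.state].indexOf(next) === -1) {
    throw new Error(`illegal transition ${item.state} -> ${next}`);
  }
  const attempts = next === "RETRYING" ? item.attempts + 1 : item.attempts;
  return { mediaId: item.mediaId, state: next, attempts };
}

// After a relaunch, anything that was mid-flight is treated as paused so the
// normal PAUSED -> RETRYING path picks it up again.
function recoverAfterKill(items: QueueItem[]): QueueItem[] {
  return items.map((item): QueueItem => {
    if (item.state === "IN_PROGRESS" || item.state === "RETRYING") {
      return { mediaId: item.mediaId, state: "PAUSED", attempts: item.attempts };
    }
    return item;
  });
}
```

Each `QueueItem` is plain data, so it can be written to a SQLite row after every transition and read back on launch.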

Media Processing and Compression

Once your raw video lands in Instagram's cloud, it enters a processing pipeline before being stored for delivery.

Raw Upload Received
        ↓
Validation
(corrupt file check, content policy scan, format verification)
        ↓
Transcoding
(re-encode to H.264/H.265, normalize codec/container)
        ↓
Multi-Resolution Encoding
(1080p, 720p, 480p, 360p — for adaptive bitrate streaming)
        ↓
Audio Processing
(normalize levels, validate sync)
        ↓
Thumbnail Extraction
(keyframe analysis + ML quality scoring)
        ↓
Distribute to Storage + CDN

Compression Concepts

Raw uploads are transcoded to significantly smaller files without noticeable quality loss:

Codec efficiency: H.265 (HEVC) compresses ~40% better than H.264 at equal visual quality
Variable Bitrate (VBR): Static scenes get fewer bits; fast-moving scenes get more — no bits wasted
Perceptual encoding: Human vision is less sensitive to chroma (color) detail than luma (brightness), so color channels are compressed more aggressively

A 200MB raw Reel might become a 12–20MB streamable video after processing.
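To put those sizes in bitrate terms, here is a quick back-of-the-envelope helper. It assumes a 60-second clip and 1 MB = 1,000,000 bytes; the function names are made up for illustration.

```typescript
// Average bitrate in megabits per second for a file of sizeMB megabytes
// (1 MB = 1,000,000 bytes here) that plays for durationSec seconds.
function averageBitrateMbps(sizeMB: number, durationSec: number): number {
  return (sizeMB * 8) / durationSec;
}

// The inverse: file size in megabytes for a given average bitrate and duration.
function fileSizeMB(bitrateMbps: number, durationSec: number): number {
  return (bitrateMbps * durationSec) / 8;
}
```

For a 60-second clip, 200MB works out to roughly 26.7 Mbps, while 12–20MB is about 1.6–2.7 Mbps. By the same arithmetic, an 8 Mbps rendition of that clip would be around 60MB, so the 12–20MB figure fits a mid-quality rendition better than the top one.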

Thumbnail Generation and Previews

Thumbnails are more than decoration — they're a performance optimization.

A thumbnail is a single JPEG, typically under 50KB, that can be fetched and displayed in milliseconds. Without thumbnails, every feed scroll would require pre-buffering video before anything appeared on screen.

How Thumbnails Are Created

During the processing pipeline:

Multiple keyframes are extracted from the video at regular intervals
Each keyframe is scored by an ML model for visual quality (avoiding blurry, over-exposed, or mid-blink frames)
The highest-scoring frame is selected (or the creator's manually chosen frame is used if available)
The selected frame is saved as a separate JPEG in object storage
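The selection step might look something like this. `CandidateFrame` and `pickThumbnail` are hypothetical names, and `qualityScore` stands in for whatever the ML model outputs (higher is assumed to be better).

```typescript
interface CandidateFrame {
  timestampMs: number;
  qualityScore: number; // from a hypothetical quality model; higher is better
}

// Prefers the creator's chosen frame when it is among the candidates,
// otherwise returns the highest-scoring frame (undefined for an empty list).
function pickThumbnail(frames: CandidateFrame[], creatorChoiceMs?: number): CandidateFrame | undefined {
  if (creatorChoiceMs !== undefined) {
    const chosen = frames.filter((f) => f.timestampMs === creatorChoiceMs)[0];
    if (chosen !== undefined) return chosen;
  }
  let best: CandidateFrame | undefined = undefined;
  for (const frame of frames) {
    if (best === undefined || frame.qualityScore > best.qualityScore) best = frame;
  }
  return best;
}
```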

Low-Quality Image Placeholders (LQIP)

Instagram also generates a tiny blur placeholder — sometimes just 20x20 pixels encoded as a base64 string in the API response. This renders instantly (before any network request for the actual thumbnail) and gives the feed a fluid scrolling feel even on slow connections.

Feed item appears
     ↓
Base64 blur placeholder renders (immediate, no network)
     ↓
Full thumbnail fetched from CDN (50–200ms)
     ↓
Swap: blur → thumbnail
     ↓
User taps play → ABR video stream begins

Caching Frequently Viewed Content

Every content request has a cost: network latency, server load, battery usage. Caching short-circuits that cost by keeping a local copy of recently used content.

Multi-Level Cache Architecture

Request for content
        ↓
Memory cache (L1) → HIT → Render (~0ms)
        ↓ MISS
Disk cache (L2)   → HIT → Render + promote to L1 (~5–20ms)
        ↓ MISS
CDN Edge          → HIT → Render + store in L2 (~30–100ms)
        ↓ MISS
Origin Storage    → Fetch → Render + propagate to CDN + store in L2
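The client-side half of that lookup can be sketched as below. `TieredCache` is a hypothetical class, both tiers are simulated with in-memory maps, and `fetchRemote` stands in for a CDN request (the edge-versus-origin decision happens on the CDN side, so the client only sees one network call).

```typescript
type HitSource = "memory" | "disk" | "network";

class TieredCache {
  private memory = new Map<string, string>(); // L1
  private disk = new Map<string, string>();   // L2, simulated in memory here
  private fetchRemote: (key: string) => Promise<string>;

  constructor(fetchRemote: (key: string) => Promise<string>) {
    this.fetchRemote = fetchRemote;
  }

  async get(key: string): Promise<{ value: string; source: HitSource }> {
    const fromMemory = this.memory.get(key);
    if (fromMemory !== undefined) return { value: fromMemory, source: "memory" };

    const fromDisk = this.disk.get(key);
    if (fromDisk !== undefined) {
      this.memory.set(key, fromDisk); // promote to L1
      return { value: fromDisk, source: "disk" };
    }

    // Neither local tier has it: go to the network and store the result in L2.
    const value = await this.fetchRemote(key);
    this.disk.set(key, value);
    return { value, source: "network" };
  }
}
```

Requesting the same key three times in a row walks up the tiers: the first call hits the network, the second hits disk and promotes, the third is served from memory.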

What Gets Cached

Content Type	Cache Location	TTL
Profile pictures	Disk (long TTL)	Days
Feed thumbnails	Disk (medium TTL)	Hours
Active video buffers	Memory	Session only
Stories (viewed)	Disk (short TTL)	Until expiry
Prefetched posts	Disk	Minutes

Prefetching: Getting Ahead of the User

While you're watching post #3, Instagram is already fetching thumbnails and initial video buffers for posts #6–10 in your feed. This makes scrolling feel instantaneous — by the time your thumb reaches those posts, the content is already on your device.

Prefetching is throttled by connection quality: aggressive on Wi-Fi, conservative on 4G, minimal on 3G.
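As a rough illustration of that throttling, the sketch below maps a connection type to a prefetch window. The window sizes (5, 2, 1, 0) and the names `LinkType` and `postsToPrefetch` are invented for the example; real values would come from measurement.

```typescript
type LinkType = "wifi" | "4g" | "3g" | "offline";

// How many posts ahead to prefetch per connection type (illustrative numbers).
const PREFETCH_AHEAD: Record<LinkType, number> = {
  wifi: 5,
  "4g": 2,
  "3g": 1,
  offline: 0,
};

// Returns the feed indexes to prefetch, starting after the post on screen
// and never running past the end of the feed.
function postsToPrefetch(currentIndex: number, feedLength: number, link: LinkType): number[] {
  const count = PREFETCH_AHEAD[link];
  const indexes: number[] = [];
  for (let i = currentIndex + 1; i <= currentIndex + count && i < feedLength; i++) {
    indexes.push(i);
  }
  return indexes;
}
```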

Cache Invalidation

Knowing when to throw away cached content is hard. Instagram's approach:

TTL (Time-to-Live): Cached items expire after a configured duration
Cache-Control headers: Server communicates max-age per content type
Versioned URLs: When content changes (e.g., profile photo update), the URL hash changes, forcing a fresh fetch
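These three mechanisms are small enough to sketch directly. The helper names are hypothetical, the `?v=` query parameter is just one possible versioning scheme, and `versionedUrl` assumes the base URL has no query string of its own.

```typescript
interface CachedItem {
  storedAtMs: number; // when the item was cached (epoch milliseconds)
  maxAgeSec: number;  // TTL, e.g. taken from Cache-Control max-age
}

// TTL check: an item is fresh while its age is below max-age.
function isFresh(item: CachedItem, nowMs: number): boolean {
  return nowMs - item.storedAtMs < item.maxAgeSec * 1000;
}

// Reads max-age (in seconds) from a Cache-Control header value,
// or returns undefined when the directive is absent.
function parseMaxAge(cacheControl: string): number | undefined {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  return match ? parseInt(match[1], 10) : undefined;
}

// Versioned URL: a new content hash yields a new URL, so stale cache
// entries keyed by the old URL are simply never requested again.
function versionedUrl(baseUrl: string, contentHash: string): string {
  return `${baseUrl}?v=${encodeURIComponent(contentHash)}`;
}
```

Versioned URLs pair well with long TTLs: the old entry never needs to be invalidated, it just stops being referenced and eventually ages out.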

Content Delivery Using CDNs

Your followers in Tokyo, Lagos, and Berlin all load your Reel at similar speeds because Instagram doesn't serve media directly from a central data center — it uses a Content Delivery Network.

How CDN Delivery Works

User in Tokyo requests Reel video
           ↓
DNS resolves to nearest CDN edge node (Tokyo)
           ↓
Edge node checks local cache
  ├── HIT → Serve immediately (low latency)
  └── MISS → Fetch from origin → Cache locally → Serve
           ↓
Response arrives in <50ms vs ~200ms from a US origin

Meta operates its own CDN infrastructure alongside partnerships with commercial CDN providers. Edge nodes are placed in internet exchange points worldwide to minimize the physical distance content must travel.

Adaptive Bitrate Streaming (ABR)

Instagram doesn't just serve one video file — it serves a manifest file (HLS or DASH format) that lists multiple quality variants. Your device's player selects the appropriate variant based on real-time network conditions:

Excellent connection → 1080p @ 8Mbps
Good connection      → 720p  @ 4Mbps
Fair connection      → 480p  @ 2Mbps
Poor connection      → 360p  @ 800Kbps

If your connection degrades mid-playback, the player switches to a lower quality tier at the next segment boundary, usually without rebuffering: you see slightly less detail, but playback continues.
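In practice an HLS or DASH player makes this choice for you, but the core idea fits in a few lines. The sketch below uses the ladder listed above; `Rendition`, `selectRendition`, and the 0.8 safety factor are assumptions for illustration.

```typescript
interface Rendition {
  label: string;
  kbps: number; // average bitrate of this variant
}

const LADDER: Rendition[] = [
  { label: "1080p", kbps: 8000 },
  { label: "720p", kbps: 4000 },
  { label: "480p", kbps: 2000 },
  { label: "360p", kbps: 800 },
];

// Picks the highest-bitrate rendition that fits within a safety fraction of
// the measured throughput, falling back to the lowest one if nothing fits.
function selectRendition(
  throughputKbps: number,
  ladder: Rendition[] = LADDER,
  safety: number = 0.8
): Rendition {
  if (ladder.length === 0) throw new Error("ladder must not be empty");
  const budget = throughputKbps * safety;
  const sorted = ladder.slice().sort((a, b) => b.kbps - a.kbps); // highest first
  for (const rendition of sorted) {
    if (rendition.kbps <= budget) return rendition;
  }
  return sorted[sorted.length - 1];
}
```

The safety factor leaves headroom so a small dip in throughput does not immediately stall playback.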

Managing Storage, Performance, and User Experience

All of these systems involve tradeoffs. More aggressive caching = faster loads but higher device storage use. More compression = smaller files but potential quality loss. More CDN edge nodes = lower latency but higher infrastructure cost.

Device Storage Management

Instagram's client periodically runs cache eviction to prevent disk bloat:

LRU eviction: Least-recently-used content is removed first
Size caps: Cache directories are bounded to a maximum size
OS pressure response: When device storage is low, the OS may clear cache directories without asking

Drafts are immune to OS-level eviction because they live in the documents directory — only explicit deletion removes them.
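LRU eviction with a size cap can be sketched with a `Map`, which remembers insertion order. `LruDiskCache` is a hypothetical class that only tracks keys and byte sizes; a real cache would also delete the evicted files from disk.

```typescript
class LruDiskCache {
  // key -> size in bytes; the first key in the Map is the least recently used
  private entries = new Map<string, number>();
  private totalBytes = 0;
  private maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Marks a key as recently used by moving it to the end of the Map.
  touch(key: string): boolean {
    const size = this.entries.get(key);
    if (size === undefined) return false;
    this.entries.delete(key);
    this.entries.set(key, size);
    return true;
  }

  // Adds or replaces an entry, then evicts until the cache fits its cap.
  // Returns the keys that were evicted.
  put(key: string, sizeBytes: number): string[] {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.totalBytes -= existing;
      this.entries.delete(key);
    }
    this.entries.set(key, sizeBytes);
    this.totalBytes += sizeBytes;
    return this.evict();
  }

  private evict(): string[] {
    const evicted: string[] = [];
    for (const key of Array.from(this.entries.keys())) { // oldest first
      if (this.totalBytes <= this.maxBytes) break;
      const size = this.entries.get(key);
      if (size === undefined) continue;
      this.entries.delete(key);
      this.totalBytes -= size;
      evicted.push(key);
    }
    return evicted;
  }

  keys(): string[] {
    return Array.from(this.entries.keys());
  }
}
```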

Adaptive Behavior Based on Context

Condition	System Response
On Wi-Fi	Aggressive prefetch, high quality, run pending uploads
On cellular	Conservative prefetch, adaptive quality, throttle background work
Low battery	Pause non-critical background tasks
Storage low	Evict cache aggressively, warn user if new drafts can't be saved
App in background	Continue uploads, pause prefetching

Full Architecture: End-to-End

Let's map the complete journey from recording to playback:

┌─────────────────────────────────────────────────────────────────┐
│  DEVICE                                                          │
│  Record → Frame Buffer → Temp Cache                             │
│  Edit/Draft → Documents Dir + SQLite                            │
│  Post → Upload Queue → Chunked Upload (background)             │
└────────────────────────────┬────────────────────────────────────┘
                              │ HTTPS chunks
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  INSTAGRAM CLOUD                                                 │
│  Ingest API → Reassemble → Validation → Processing Pipeline     │
│  Transcode + Compress + Multi-res → Thumbnails → Object Storage │
└────────────────────────────┬────────────────────────────────────┘
                              │ Push to edge
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  CDN EDGE NETWORK                                               │
│  Edge Node (Tokyo) │ Edge Node (London) │ Edge Node (São Paulo) │
└───────────┬─────────────────┬───────────────────┬──────────────┘
            │                 │                   │
            ▼                 ▼                   ▼
       Viewer (JP)       Viewer (UK)        Viewer (BR)
   ABR stream → cache   ABR stream → cache  ABR stream → cache

Key Takeaways for Mobile Developers

Building a React Native app with media features? Steal these patterns:

Local-first: Always write to local storage before assuming network success
Atomic draft saves: Video file + metadata must succeed together or both fail
Chunked uploads: Never send large files as single requests; build retry logic per chunk
Background transfers: Use OS-level APIs, not in-app timers
Queue your uploads: Persist queue state to SQLite; recover from kills gracefully
Multi-level caching: Memory + disk + prefetch = fast perceived performance
CDN for media, API for metadata: Your app servers shouldn't serve video files
Adaptive quality: Detect connection quality and degrade gracefully

Wrapping Up

What looks like a simple "save" or "post" action is actually a cascade of decisions made across every layer of the stack: memory buffers, SQLite databases, upload queues, processing pipelines, object storage, CDN edge networks, and device-side caches.

Understanding this architecture helps you build mobile apps that feel fast, handle failures gracefully, and respect your users' devices and data connections. Architecture and product thinking are inseparable — every engineering decision is ultimately a UX decision.

What would you like to explore next — building a resumable upload queue in React Native, implementing multi-level caching, or designing a local draft system? Let me know in the comments!
