Designing Instagram Stories: From Upload to Expiry

#lld #hld #systemdesign #softwareengineering

Instagram Stories and WhatsApp Status look simple: upload media, show it to followers, and delete it after 24 hours. Under the hood, however, they are classic examples of ephemeral, large-scale distributed systems.

In this article, we’ll walk through both the high-level and low-level design of a Stories/Status system, focusing on architecture, key components, and lifecycle management.

1. Core Requirements

1.1 Functional requirements:

Users can post photo or video stories
Stories expire automatically after 24 hours
Visibility is restricted (followers, close friends, contacts)
Viewers can see and react to stories

1.2 Non-functional requirements:

Low latency for feed loading
High availability
Horizontal scalability
Eventual consistency is acceptable

2. High-Level Architecture (HLD)

At a high level, the system is split into independent services, each responsible for a single concern:

API Gateway – authentication, authorization, routing, rate limiting
Story Service – story creation and lifecycle management
Content Service – media, text, and link handling
Feed Service – story feed generation
Visibility Service – privacy and audience enforcement
Expiration Service – 24-hour TTL handling
Kafka & Background Workers – asynchronous processing
Analytics & Notification Services – engagement insights and alerts

3. Story Creation Flow (Write Path)

When a user posts a story:

The client sends a create-story request through the API Gateway
The Content Service returns a pre-signed upload URL
The client uploads media directly to object storage (e.g., S3)
Once the upload completes, the Story Service stores the story metadata with a 24-hour expiration timestamp
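The metadata record from the last step can be sketched as a small immutable type whose expiry is computed once, at write time. This is a minimal illustration, not a prescribed schema: `Story`, `create_story`, and the `media_key` and `audience` fields are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed lifetime for every story, per the 24-hour requirement.
STORY_TTL = timedelta(hours=24)


@dataclass(frozen=True)
class Story:
    story_id: str
    author_id: str
    media_key: str      # object-storage key the client uploaded to (hypothetical field)
    audience: str       # e.g. "followers", "close_friends", "contacts"
    created_at: datetime
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # A story is visible strictly before its expiry instant.
        return now < self.expires_at


def create_story(author_id: str, media_key: str, audience: str,
                 now: Optional[datetime] = None) -> Story:
    """Build the metadata record the Story Service would persist (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return Story(
        story_id=str(uuid.uuid4()),
        author_id=author_id,
        media_key=media_key,
        audience=audience,
        created_at=now,
        expires_at=now + STORY_TTL,
    )
```

Storing `expires_at` explicitly means the feed, expiration, and cleanup components all compare against the same timestamp instead of each recomputing "created plus 24 hours".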

Key design decision:
Media never flows through backend services. The Story Service stores only lightweight metadata; the media itself lives in object storage and is delivered to viewers via CDN.
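The pre-signed URL is what lets media bypass the backend: the Content Service issues a short-lived, signed grant, and object storage can verify it without calling back. The sketch below uses a generic HMAC-SHA256 scheme from the Python standard library. It is an illustration of the idea only, not S3's actual Signature Version 4 format, and `SECRET`, `BASE_URL`, `presign_upload`, and `verify_upload` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: the key would be shared only by the Content Service and storage.
SECRET = b"demo-secret-do-not-use"
BASE_URL = "https://media.example.com"


def _sign(method: str, key: str, expires: int) -> str:
    # Sign the method, object key and expiry so none of them can be altered.
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_upload(key: str, ttl_seconds: int = 300,
                   now: Optional[float] = None) -> str:
    """Content Service side: grant a short-lived PUT for one object key."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _sign("PUT", key, expires)})
    return f"{BASE_URL}/{key}?{query}"


def verify_upload(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the upload only if the signature and expiry check out."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig, _sign("PUT", key, expires))
```

Because the signature covers the object key, a client cannot reuse one grant to write to a different path, and the short expiry limits how long a leaked URL stays useful.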

4. Story Consumption Flow (Read Path)

When a user opens the stories tray:

Feed Service fetches active stories from followed users
Visibility Service filters stories based on privacy rules
Expired stories are ignored
Media URLs are returned to the client
The client streams media directly from the CDN

This design is optimized for read-heavy traffic, which dominates story usage.
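The read path can be sketched as one function that walks the viewer's followees, keeps only active stories the viewer is allowed to see, and returns CDN URLs. The in-memory dictionaries stand in for the follow graph and an active-story store, `can_view` stands in for a call to the Visibility Service, and `CDN_BASE`, `StoryMeta`, and `build_tray` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host


@dataclass(frozen=True)
class StoryMeta:
    story_id: str
    author_id: str
    media_key: str
    expires_at: float  # Unix seconds


def build_tray(
    viewer_id: str,
    following: Dict[str, Set[str]],                 # viewer -> authors they follow
    active_by_author: Dict[str, List[StoryMeta]],   # author -> candidate stories
    can_view: Callable[[str, StoryMeta], bool],     # stand-in for the Visibility Service
    now: float,
) -> Dict[str, List[str]]:
    """Return author -> CDN URLs for stories this viewer may watch right now."""
    tray: Dict[str, List[str]] = {}
    # Sorted only to make the output deterministic; a real tray would be ranked.
    for author in sorted(following.get(viewer_id, set())):
        urls = [
            f"{CDN_BASE}/{s.media_key}"
            for s in active_by_author.get(author, [])
            # Skip expired stories even if cleanup has not removed them yet.
            if s.expires_at > now and can_view(viewer_id, s)
        ]
        if urls:  # authors with nothing visible are omitted from the tray
            tray[author] = urls
    return tray
```

Only URLs leave this function; the media bytes travel from the CDN straight to the client, which keeps the backend's share of each read small.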

5. Visibility and Privacy Rules

Stories support multiple visibility modes:

Followers only
Close friends
Contact-based visibility (WhatsApp Status)

Blocked users are always excluded, regardless of the selected mode.

A dedicated Visibility Service enforces these rules using:

Followers / contacts graph
Redis caching for fast permission checks

By isolating visibility logic, privacy rules remain consistent and easy to evolve.
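A permission check along these lines might look like the sketch below. A plain dictionary stands in for the Redis cache, `SocialGraph` and `VisibilityService` are hypothetical names, and the contact rule (both users must have each other saved) is an assumption about WhatsApp-style behavior rather than a documented spec.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class SocialGraph:
    followers: Dict[str, Set[str]] = field(default_factory=dict)      # author -> followers
    close_friends: Dict[str, Set[str]] = field(default_factory=dict)  # author -> close friends
    contacts: Dict[str, Set[str]] = field(default_factory=dict)       # user -> saved contacts
    blocked: Dict[str, Set[str]] = field(default_factory=dict)        # author -> blocked users


class VisibilityService:
    def __init__(self, graph: SocialGraph) -> None:
        self.graph = graph
        # (viewer, author, mode) -> decision; stands in for a Redis cache.
        self._cache: Dict[Tuple[str, str, str], bool] = {}

    def can_view(self, viewer: str, author: str, mode: str) -> bool:
        if viewer == author:
            return True  # authors always see their own stories
        key = (viewer, author, mode)
        if key in self._cache:
            return self._cache[key]
        g = self.graph
        if viewer in g.blocked.get(author, set()):
            allowed = False  # blocking overrides every mode
        elif mode == "followers":
            allowed = viewer in g.followers.get(author, set())
        elif mode == "close_friends":
            allowed = viewer in g.close_friends.get(author, set())
        elif mode == "contacts":
            # Assumed mutual-contact rule for WhatsApp-style Status.
            allowed = (viewer in g.contacts.get(author, set())
                       and author in g.contacts.get(viewer, set()))
        else:
            allowed = False  # unknown mode: deny by default
        self._cache[key] = allowed
        return allowed

    def invalidate(self, author: str) -> None:
        """Drop cached decisions for an author after their graph changes."""
        self._cache = {k: v for k, v in self._cache.items() if k[1] != author}
```

The cache makes repeated checks cheap, but it also means a block or an unfollow must trigger `invalidate`; otherwise a stale "allowed" decision could keep serving stories to someone who should no longer see them.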

6. Expiration and Lifecycle Management

Ephemeral content is treated as a first-class concern:

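Each story has a strict 24-hour TTL
The Expiration Service monitors story expiration timestamps
Expired stories trigger lifecycle events
Expired content is no longer served

Because the read path also checks each story's expiration timestamp, expired stories are never served even if the Expiration Service falls behind under heavy traffic.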
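One way to monitor timestamps is a min-heap ordered by expiry, so the service only ever inspects the next story due to expire. This in-memory `ExpirationScheduler` is a hypothetical sketch; the event dictionary shape (including a `media_key` field for later cleanup) is also an assumption.

```python
import heapq
from typing import Dict, List, Tuple


class ExpirationScheduler:
    """Emits story_expired events in expiry order (in-memory stand-in)."""

    def __init__(self) -> None:
        # Min-heap of (expires_at, story_id, media_key); earliest expiry on top.
        self._heap: List[Tuple[float, str, str]] = []

    def schedule(self, story_id: str, media_key: str, expires_at: float) -> None:
        heapq.heappush(self._heap, (expires_at, story_id, media_key))

    def pop_expired(self, now: float) -> List[Dict[str, object]]:
        """Remove and return events for every story with expires_at <= now."""
        events: List[Dict[str, object]] = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, story_id, media_key = heapq.heappop(self._heap)
            events.append({
                "type": "story_expired",
                "story_id": story_id,
                "media_key": media_key,
                "expires_at": expires_at,
            })
        return events
```

Peeking at the heap top is O(1) and each expired story costs O(log n) to pop, so frequent polling is cheap. A production service would need durable state (for example, a database index on the expiry column) so scheduled expirations survive restarts; the heap here only illustrates the ordering.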

7. Event-Driven Cleanup Using Kafka

Kafka is used to decouple lifecycle events from cleanup logic.

Typical events include:

story_created
story_expired
story_viewed
story_reacted

A Media Cleanup Worker consumes expiration events and:

Deletes media from object storage
Invalidates cached copies of the media on the CDN

Cleanup happens asynchronously, keeping user-facing APIs fast.
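Kafka consumers commonly run with at-least-once delivery, so the same `story_expired` event can arrive more than once and the worker should be idempotent. In the sketch below a `queue.Queue` stands in for the Kafka topic, `delete_object` and `invalidate_cdn` are hypothetical callbacks for the storage and CDN APIs, and the event is assumed to carry the story's `media_key`.

```python
import queue
from typing import Callable, Set


class MediaCleanupWorker:
    def __init__(self, delete_object: Callable[[str], None],
                 invalidate_cdn: Callable[[str], None]) -> None:
        self.delete_object = delete_object
        self.invalidate_cdn = invalidate_cdn
        # Story IDs already cleaned up; a real worker would persist this.
        self._processed: Set[str] = set()

    def handle(self, event: dict) -> None:
        if event.get("type") != "story_expired":
            return  # other lifecycle events belong to other consumers
        story_id = event["story_id"]
        if story_id in self._processed:
            return  # duplicate delivery: cleanup already done
        self.delete_object(event["media_key"])
        self.invalidate_cdn(event["media_key"])
        # Mark done only after both steps succeed, so a failure is retried.
        self._processed.add(story_id)

    def drain(self, events: "queue.Queue[dict]") -> None:
        """Process everything currently in the queue, then return."""
        while True:
            try:
                event = events.get_nowait()
            except queue.Empty:
                return
            self.handle(event)
```

Since this runs off the request path, a slow storage or CDN call delays only cleanup, never a user-facing response.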

8. Engagement Tracking (Views & Reactions)

User engagement is handled by separate services:

View Tracking Service – tracks story views
Reaction Service – likes, emojis, and replies

These services:

Handle extremely high write throughput
Are eventually consistent
Do not impact feed read performance

Engagement data is later aggregated for analytics.
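One common way to absorb a high volume of view writes is to deduplicate views and batch the counts in memory, flushing them periodically to the analytics store. That periodic flush is also where the eventual consistency comes from. `ViewBuffer`, `record_view`, and `flush` are hypothetical names for this minimal sketch.

```python
from collections import Counter
from typing import Dict, Set, Tuple


class ViewBuffer:
    """Dedupes views and batches counts before they reach analytics (in-memory sketch)."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()  # (story_id, viewer_id) already counted
        self._pending: Counter = Counter()        # story_id -> new unique views since last flush

    def record_view(self, story_id: str, viewer_id: str) -> None:
        key = (story_id, viewer_id)
        if key in self._seen:
            return  # re-watching a story does not add a second view
        self._seen.add(key)
        self._pending[story_id] += 1

    def flush(self) -> Dict[str, int]:
        """Return and reset the pending increments, e.g. for a periodic batch write."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch
```

Here `_seen` grows without bound; because stories expire after 24 hours, a real implementation could drop entries for expired stories or keep them in a store with a matching TTL.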

Final Thoughts

Stories systems are deceptively complex. By designing explicitly for expiration, visibility, and scale, we can build systems that are resilient, efficient, and easy to evolve.

This design reflects common patterns for handling ephemeral content at the scale of platforms like Instagram and WhatsApp.

This is my first system design write-up, and it is open to improvements and reviews. I’d love to hear feedback or alternative approaches.