Jyotheendra Doddala
Designing a Playback Resume System at Scale (It’s Not Just a Timestamp)

At a surface level, this sounds trivial: it's just storing a userId, a videoId, and a timestamp.

Seems easy enough

But not when:

  • Millions of users press play at the same time
  • People switch from TV to phone in seconds
  • Writes happen every few seconds
  • Resume must feel instant

1. Clarifying the Problem

We are designing a Playback Resume System that allows users to resume watching from where they left off across devices.

We are not designing:

  • The video streaming pipeline itself
  • Real-time co-watch (two users watching in sync)
  • Multi-region replication, global failover, or cross-region consistency trade-offs
  • Perfect real-time synchronisation across devices (1–2 second eventual consistency is acceptable)

This service would live within an existing microservices architecture, so I won’t deep-dive into service discovery, deployment, etc., and will focus purely on playback state.

2. Functional Requirements (User Centric)

  • User should be able to resume a video from the last watched position.
  • User should be able to switch devices and continue seamlessly.
  • User should have an independent watch history per profile.
  • The system should update the playback position periodically while watching.
  • Latest progress should win if multiple devices update.

3. Non-Functional Requirements

  • Resume reads <150ms
  • Writes <500ms
  • High availability
  • Scalable to millions of concurrent users
  • Eventual consistency across devices is acceptable (1–2 sec lag)

CAP Theorem Consideration

During a network partition, we choose Availability over Strong Consistency (Partition tolerance isn’t optional in a distributed system).

Why?

Because if one replica is slightly behind, the user resuming 1 second earlier is acceptable. But we cannot afford downtime.

So:

  • High availability > Strong consistency
  • Eventual consistency + last write wins is good enough

For playback, perfection isn’t needed. Responsiveness is.

4. Data Model

Instead of user_id, we use:

```
(account_id, profile_id, video_id)
```

Because in one household:

```
Account 123
├── Profile A → V1 → 1200s
└── Profile B → V1 → 300s
```

Each profile tracks progress independently.

We also store:

```
position
updated_at
device_id
```

updated_at enables conflict resolution.

Note: Using updated_at for last write wins assumes reasonably synchronised clocks. In production, this is typically handled using server-generated timestamps or monotonic counters. I’m keeping the conflict resolution logic simple here to focus on system behaviour rather than clock management.
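A minimal sketch of what such a record could look like, with a server-generated timestamp as described in the note above (field and class names are illustrative, not a real schema):

```python
import time
from dataclasses import dataclass, field

@dataclass
class PlaybackRecord:
    """One row per (account, profile, video) triple."""
    account_id: str
    profile_id: str
    video_id: str
    position: float  # seconds into the video
    device_id: str
    # Server-generated timestamp, used later for last-write-wins conflict resolution.
    updated_at: float = field(default_factory=time.time)

    @property
    def key(self) -> tuple:
        # Each profile tracks progress independently.
        return (self.account_id, self.profile_id, self.video_id)

# Two profiles in the same household keep separate positions for the same video.
a = PlaybackRecord("123", "A", "V1", position=1200, device_id="tv")
b = PlaybackRecord("123", "B", "V1", position=300, device_id="phone")
assert a.key != b.key
```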

5. Scale Estimation

Assume:

  • 10M daily users
  • 3M actively watching
  • Update every 10 seconds
  • 30 min session → ~180 updates

That’s:

~540M writes/day
~6K writes/sec

Okay... this escalated

This is not a small system. Logical reads are similar in magnitude, but database reads are significantly reduced via caching.
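The back-of-envelope numbers above can be checked directly:

```python
# Back-of-envelope check of the write volume estimated above.
active_watchers = 3_000_000
updates_per_session = 30 * 60 // 10       # 30-min session, one update every 10 s → 180

writes_per_day = active_watchers * updates_per_session
writes_per_sec = writes_per_day / 86_400  # seconds in a day

print(f"{writes_per_day:,} writes/day")      # 540,000,000 writes/day
print(f"~{writes_per_sec:,.0f} writes/sec")  # ~6,250 writes/sec
```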

Smarter Write Strategy

In reality, we don’t blindly write every 10 seconds.
We optimise by writing only when:

  • Position delta > 15–30 seconds
  • OR user pauses
  • OR the app goes to the background
  • OR periodic checkpoint (e.g., every 60 seconds)

This reduces:

  • Write amplification
  • Cache churn
  • Queue pressure
  • Database cost

That 540M/day number can realistically drop 3–5x with smarter checkpointing.
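The client-side gating above can be sketched as a single decision function (thresholds and event names are illustrative):

```python
def should_checkpoint(position, last_written, event, elapsed_since_write,
                      delta_threshold=20.0, periodic=60.0):
    """Decide whether the client should send a progress write.

    Writes fire only on: a large position delta, a pause, the app going
    to the background, or a periodic safety checkpoint.
    """
    if event in ("pause", "background"):
        return True                                   # user intent: always flush
    if abs(position - last_written) > delta_threshold:
        return True                                   # meaningful progress
    if elapsed_since_write >= periodic:
        return True                                   # periodic checkpoint
    return False

# A 5-second drift mid-playback is skipped; a pause always flushes.
assert not should_checkpoint(105, 100, "tick", elapsed_since_write=10)
assert should_checkpoint(105, 100, "pause", elapsed_since_write=10)
assert should_checkpoint(130, 100, "tick", elapsed_since_write=10)
```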

6. API Design

Update Playback

```
POST /playback/update
```

Body:

  • account_id
  • profile_id
  • video_id
  • position
  • device_id

Resume Playback

```
GET /playback/resume
```
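Illustrative payload shapes for the two endpoints, assuming the resume call identifies the record via query parameters (the exact field names and transport are not fixed by this design):

```python
# POST /playback/update — body
update_request = {
    "account_id": "123",
    "profile_id": "A",
    "video_id": "V1",
    "position": 1200,          # seconds
    "device_id": "tv-living-room",
}

# GET /playback/resume?account_id=123&profile_id=A&video_id=V1 — response
resume_response = {
    "video_id": "V1",
    "position": 1200,
    "updated_at": 1_700_000_000,  # server timestamp for conflict resolution
}
```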

7. Start Simple: DB-Only

We could store everything in DynamoDB/Cassandra.

Primary key:

```
(account_id#profile_id, video_id)
```

Pros:

  • Simple
  • Durable
  • Easy to scale horizontally

Cons:

  • Every resume hits DB
  • Higher latency at scale
  • Costly under heavy read traffic

Good for MVP. But not ideal for massive scale.

8. Hybrid Architecture

Because resume is latency-sensitive and read-heavy, we introduce caching.

High Level Design

*(Diagram: HLD for the playback resume system)*

How It Works

Write Flow

  • Client sends update.
  • Service performs a conditional write to the DB (if updated_at is newer).
  • Redis cache is updated.
  • Event is optionally published for analytics.

Conditional Writes (Idempotency)
To avoid stale overwrites, we use conditional writes:

Update only if incoming.updated_at > existing.updated_at

This ensures:

  • Last write wins
  • Safe retries
  • No duplicate corruption
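The conditional-write rule can be shown with an in-memory dict standing in for the table; a real implementation would use the database's native conditional update (e.g. a condition expression) rather than this check-then-set:

```python
def conditional_write(store, key, position, device_id, updated_at):
    """Apply an update only if it is newer than what's stored (last write wins)."""
    existing = store.get(key)
    if existing is not None and updated_at <= existing["updated_at"]:
        return False  # stale or duplicate: safe to drop, retries are idempotent
    store[key] = {"position": position, "device_id": device_id,
                  "updated_at": updated_at}
    return True

db = {}
key = ("123", "A", "V1")
assert conditional_write(db, key, 300, "phone", updated_at=100)       # applied
assert not conditional_write(db, key, 250, "tv", updated_at=90)       # stale TV write loses
assert db[key]["position"] == 300                                     # phone's update survives
```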

Redis Crash Safety
Rather than writing to Redis first, we persist to the DB first (durable), then update Redis.

In a worst-case scenario, if Redis crashes, the DB remains the source of truth. We prefer durability over extreme write latency savings.

Read Flow

  • Check Redis.
  • If hit → instant resume.
  • If miss → fetch from DB → repopulate cache.

Most reads should never touch the database.

Brief stale reads may occur due to replication lag, which is acceptable under our 1–2 second tolerance.
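The read flow is the classic cache-aside pattern; here `cache` and `db` are plain dicts standing in for Redis and the database (a real cache write would also set a TTL):

```python
def resume_position(cache, db, key):
    """Cache-aside read: check the cache first, fall back to the DB on a miss,
    then repopulate the cache so the next read is a hit."""
    hit = cache.get(key)
    if hit is not None:
        return hit              # fast path: no DB touch
    record = db.get(key)        # miss: DB remains the source of truth
    if record is not None:
        cache[key] = record     # repopulate the cache
    return record

cache, db = {}, {("123", "A", "V1"): {"position": 1200}}
first = resume_position(cache, db, ("123", "A", "V1"))   # miss → DB → cached
assert first["position"] == 1200
assert ("123", "A", "V1") in cache                       # subsequent reads skip the DB
```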

Failure Handling & Retries
In production, both Redis and the database may occasionally timeout or throttle under load.

To protect latency SLOs:

  • Reads fall back to DB if Redis times out.
  • Writes use bounded retries with exponential backoff.
  • Timeouts are enforced at the service layer to avoid request pile-ups.

If a write ultimately fails, we prefer dropping that checkpoint rather than blocking playback. The next update will reconcile the state thanks to our last-write-wins logic.
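A sketch of bounded retries with exponential backoff and jitter, dropping the checkpoint after the final attempt (delays and attempt counts are illustrative):

```python
import random
import time

def write_with_retries(write_fn, max_attempts=3, base_delay=0.05):
    """Retry a write a bounded number of times with exponential backoff + jitter.

    If every attempt fails, return None: we drop the checkpoint rather than
    block playback, and the next last-write-wins update reconciles the state.
    """
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                return None  # give up: playback must not block on a checkpoint
            # Exponential backoff (x1, x2, x4, ...) with randomized jitter.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# A write that times out twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return "ok"

assert write_with_retries(flaky_write) == "ok"
assert calls["n"] == 3
```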

9. Multi-Device Conflict Handling

If TV and phone both send updates:

  • Compare updated_at
  • Latest wins

We accept slight inconsistencies because availability matters more.
That’s our CAP trade-off in action.

10. Other Production Considerations

Storage Lifecycle (TTL)
Playback entries shouldn’t live forever. We can expire inactive entries after X days (e.g., 180 days) using TTL policies.

This prevents:

  • Unbounded storage growth
  • Cold data occupying hot partitions

Hot Partition Prevention

If we partitioned incorrectly (e.g., by video_id), a trending show at 8 pm could create hot shards. Using:

```
(account_id#profile_id, video_id)
```

ensures even distribution and avoids the hot partition problem.
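The effect of the composite key can be seen in a tiny sketch: with `video_id` as the partition key, every viewer of a trending video lands on the same shard, while the composite key spreads them out:

```python
def partition_key(account_id: str, profile_id: str) -> str:
    """Composite partition key: writes are spread by viewer, not by video."""
    return f"{account_id}#{profile_id}"

# Four households all watching the trending video V1:
viewers = [("u1", "A"), ("u2", "A"), ("u3", "B"), ("u4", "A")]

# Keyed by video_id, all four writes hit one partition ("V1").
video_keyed = {"V1" for _ in viewers}
assert len(video_keyed) == 1

# Keyed by account#profile, the same writes land on four partitions.
viewer_keyed = {partition_key(acct, prof) for acct, prof in viewers}
assert len(viewer_keyed) == 4
```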

Proper database capacity planning or auto scaling is required to handle peak write bursts and avoid write throttling under load.

11. UX Guardrails & Data Freshness

A resume should feel intuitive and not surprising.
To prevent confusing jumps in playback:

  • If a resume position differs by only a few seconds, the client may ignore minor regressions.
  • We may cap backward jumps beyond a safety threshold (e.g., don’t resume 5 minutes earlier unless requested).
  • Clients can display “Resume from 11:11?” to give users control when conflicts occur.

This keeps the system technically simple (last-write-wins) while protecting the user experience from edge-case inconsistencies.
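The guardrails above can be sketched as one client-side decision function (the drift and back-jump thresholds are illustrative, not prescribed by the design):

```python
def choose_resume_position(local_pos, server_pos,
                           minor_drift=5.0, max_back_jump=300.0):
    """Client-side guardrail around last-write-wins.

    Returns (position_to_use, should_prompt_user):
    - ignore tiny regressions (a few seconds of drift between devices),
    - flag large backward jumps so the UI can ask "Resume from 11:11?"
      instead of silently rewinding the user.
    """
    if local_pos is None:
        return server_pos, False           # nothing local: trust the server
    diff = local_pos - server_pos
    if 0 < diff <= minor_drift:
        return local_pos, False            # minor regression: keep local position
    if diff > max_back_jump:
        return server_pos, True            # big backward jump: ask the user first
    return server_pos, False               # otherwise the server state wins

assert choose_resume_position(103, 100) == (103, False)    # ignore 3 s regression
assert choose_resume_position(1000, 100) == (100, True)    # prompt before a 15-min rewind
assert choose_resume_position(None, 100) == (100, False)
```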

Final Thoughts

This problem looks like a key-value store.
It’s not.
It touches:

  • Distributed systems
  • Caching strategy
  • Conflict resolution
  • UX latency expectations
  • CAP trade-offs
  • Data modelling for real households
