<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jyotheendra Doddala</title>
    <description>The latest articles on DEV Community by Jyotheendra Doddala (@jyotheendra_doddala).</description>
    <link>https://dev.to/jyotheendra_doddala</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3773375%2F143a54b4-c332-4f9d-84d0-214c0c8b3cd1.png</url>
      <title>DEV Community: Jyotheendra Doddala</title>
      <link>https://dev.to/jyotheendra_doddala</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jyotheendra_doddala"/>
    <language>en</language>
    <item>
      <title>Designing a Playback Resume System at Scale (It’s Not Just a Timestamp)</title>
      <dc:creator>Jyotheendra Doddala</dc:creator>
      <pubDate>Sun, 15 Feb 2026 02:05:44 +0000</pubDate>
      <link>https://dev.to/jyotheendra_doddala/designing-a-playback-resume-system-at-scale-its-not-just-a-timestamp-2il4</link>
      <guid>https://dev.to/jyotheendra_doddala/designing-a-playback-resume-system-at-scale-its-not-just-a-timestamp-2il4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;At a surface level, this sounds trivial, as it’s just storing userId, videoId, and timestamp.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foddv1fejr61mbngdrkjb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foddv1fejr61mbngdrkjb.gif" alt="Seems easy enough" width="220" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But not when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Millions of users press play at the same time&lt;/li&gt;
&lt;li&gt;People switch from TV to phone in seconds&lt;/li&gt;
&lt;li&gt;Writes happen every few seconds&lt;/li&gt;
&lt;li&gt;Resume must feel instant&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  1. Clarifying the Problem
&lt;/h1&gt;

&lt;p&gt;We are designing a Playback Resume System that allows users to resume watching from where they left off across devices.&lt;/p&gt;

&lt;p&gt;We are not designing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The video streaming pipeline itself&lt;/li&gt;
&lt;li&gt;Real-time co-watch (two users watching in sync)&lt;/li&gt;
&lt;li&gt;Multi-region replication, global failover, or cross-region consistency trade-offs&lt;/li&gt;
&lt;li&gt;Perfect real-time synchronisation across devices (1–2 second eventual consistency is acceptable)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This service would live within an existing micro-services architecture, so I won’t deep-dive into service discovery, deployment, etc., and will focus purely on the playback state.&lt;/p&gt;

&lt;h1&gt;
  
  
  2. Functional Requirements (User Centric)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;User should be able to resume a video from the last watched position.&lt;/li&gt;
&lt;li&gt;User should be able to switch devices and continue seamlessly.&lt;/li&gt;
&lt;li&gt;User should have an independent watch history per profile.&lt;/li&gt;
&lt;li&gt;The system should update the playback position periodically while watching.&lt;/li&gt;
&lt;li&gt;Latest progress should win if multiple devices update.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  3. Non-Functional Requirements
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Resume reads &amp;lt;150ms&lt;/li&gt;
&lt;li&gt;Writes &amp;lt;500ms&lt;/li&gt;
&lt;li&gt;High availability&lt;/li&gt;
&lt;li&gt;Scalable to millions of concurrent users&lt;/li&gt;
&lt;li&gt;Eventual consistency across devices is acceptable (1–2 sec lag)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;a href="https://en.wikipedia.org/wiki/CAP_theorem" rel="noopener noreferrer"&gt;CAP Theorem Consideration&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;During network partitions, we prefer Availability + Partition tolerance over Strong Consistency + Partition tolerance.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because if one replica is slightly behind, the user resuming 1 second earlier is acceptable. But we cannot afford downtime.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High availability &amp;gt; Strong consistency&lt;/li&gt;
&lt;li&gt;Eventual consistency + last write wins is good enough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For playback, you don’t need perfection. Responsiveness is what matters.&lt;/p&gt;

&lt;h1&gt;
  
  
  4. Data Model
&lt;/h1&gt;

&lt;p&gt;Instead of user_id, we use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(account_id, profile_id, video_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because in one household:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Account 123
├── Profile A → V1 → 1200s
└── Profile B → V1 → 300s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each profile tracks progress independently.&lt;/p&gt;

&lt;p&gt;We also store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;position
updated_at
device_id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;updated_at enables conflict resolution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Using updated_at for last write wins assumes reasonably synchronised clocks. In production, this is typically handled using server-generated timestamps or monotonic counters. I’m keeping the conflict resolution logic simple here to focus on system behaviour rather than clock management.&lt;/p&gt;
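
&lt;p&gt;As a minimal sketch of that record (assuming positions in seconds and server-generated epoch-millisecond timestamps):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from dataclasses import dataclass

@dataclass
class PlaybackState:
    account_id: str   # the household account
    profile_id: str   # each profile resumes independently
    video_id: str
    position: int     # seconds into the video
    updated_at: int   # server-generated epoch millis, drives last-write-wins
    device_id: str    # which device wrote the checkpoint (debugging / UX)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
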

&lt;h1&gt;
  
  
  5. Scale Estimation
&lt;/h1&gt;

&lt;p&gt;Assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10M daily users&lt;/li&gt;
&lt;li&gt;3M actively watching&lt;/li&gt;
&lt;li&gt;Update every 10 seconds&lt;/li&gt;
&lt;li&gt;30 min session → ~180 updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;~540M writes/day&lt;br&gt;
~6K writes/sec&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbveobq0uliqxju1e0vh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbveobq0uliqxju1e0vh.gif" alt="Okay... this escalated" width="560" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not a small system. Logical reads are similar in magnitude, but database reads are significantly reduced via caching.&lt;/p&gt;
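
&lt;p&gt;The back-of-envelope maths behind those numbers (assuming one 30-minute session per active watcher per day):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;active_watchers   = 3_000_000
update_interval_s = 10
session_s         = 30 * 60

updates_per_session = session_s // update_interval_s          # 180
writes_per_day      = active_watchers * updates_per_session   # 540,000,000
avg_writes_per_sec  = writes_per_day / 86_400                 # ~6,250
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
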

&lt;h4&gt;
  
  
  Smarter Write Strategy
&lt;/h4&gt;

&lt;p&gt;In reality, we don’t blindly update every 10 seconds.&lt;br&gt;
We optimise by writing only when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Position delta &amp;gt; 15–30 seconds&lt;/li&gt;
&lt;li&gt;OR user pauses&lt;/li&gt;
&lt;li&gt;OR the app goes to the background&lt;/li&gt;
&lt;li&gt;OR periodic checkpoint (e.g., every 60 seconds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write amplification&lt;/li&gt;
&lt;li&gt;Cache churn&lt;/li&gt;
&lt;li&gt;Queue pressure&lt;/li&gt;
&lt;li&gt;Database cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That 540M/day number can realistically drop 3–5x with smarter checkpointing.&lt;/p&gt;
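
&lt;p&gt;A minimal client-side sketch of that decision, using the thresholds above as illustrative defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def should_checkpoint(last_saved_pos, current_pos, secs_since_last_save,
                      paused=False, backgrounded=False):
    # Only write when it is worth a round trip (thresholds are illustrative).
    if paused or backgrounded:
        return True                                  # natural save points
    if abs(current_pos - last_saved_pos) &amp;gt; 30:
        return True                                  # position moved far enough
    if secs_since_last_save &amp;gt;= 60:
        return True                                  # periodic safety checkpoint
    return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
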
&lt;h1&gt;
  
  
  6. API Design
&lt;/h1&gt;

&lt;p&gt;Update Playback&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /playback/update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Body:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;account_id&lt;/li&gt;
&lt;li&gt;profile_id&lt;/li&gt;
&lt;li&gt;video_id&lt;/li&gt;
&lt;li&gt;position&lt;/li&gt;
&lt;li&gt;device_id&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resume Playback&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /playback/resume
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
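
&lt;p&gt;An illustrative exchange (the body fields come from the list above; the resume query parameters and example values are assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /playback/update
{ "account_id": "123", "profile_id": "A", "video_id": "V1",
  "position": 1200, "device_id": "tv-living-room" }

GET /playback/resume?account_id=123&amp;amp;profile_id=A&amp;amp;video_id=V1
→ { "position": 1200, "updated_at": 1700000000000 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
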



&lt;h1&gt;
  
  
  7. Start Simple: DB-Only
&lt;/h1&gt;

&lt;p&gt;We could store everything in DynamoDB/Cassandra.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary key:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(account_id#profile_id, video_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
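
&lt;p&gt;For the household from earlier, the key would look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition key: "123#A"   → account_id "#" profile_id
sort key:      "V1"      → video_id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
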



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple&lt;/li&gt;
&lt;li&gt;Durable&lt;/li&gt;
&lt;li&gt;Easy to scale horizontally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every resume hits DB&lt;/li&gt;
&lt;li&gt;Higher latency at scale&lt;/li&gt;
&lt;li&gt;Costly under heavy read traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good for MVP. But not ideal for massive scale.&lt;/p&gt;

&lt;h1&gt;
  
  
  8. Hybrid Architecture
&lt;/h1&gt;

&lt;p&gt;Because resume is latency-sensitive and read-heavy, we introduce caching.&lt;/p&gt;

&lt;h3&gt;
  
  
  High Level Design
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll2tv34b55wnlk8repdx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll2tv34b55wnlk8repdx.png" alt="HLD for Playback resume system" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How It Works
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Write Flow&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client sends update.&lt;/li&gt;
&lt;li&gt;Service performs a conditional write to the DB (if updated_at is newer).&lt;/li&gt;
&lt;li&gt;Redis cache is updated.&lt;/li&gt;
&lt;li&gt;Event is optionally published for analytics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conditional Writes (Idempotency)&lt;/strong&gt;&lt;br&gt;
To avoid stale overwrites, we use conditional writes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Update only if incoming.updated_at &amp;gt; existing.updated_at&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Last write wins&lt;/li&gt;
&lt;li&gt;Safe retries&lt;/li&gt;
&lt;li&gt;No duplicate corruption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Redis Crash Safety&lt;/strong&gt;&lt;br&gt;
Instead of writing only to Redis first:&lt;br&gt;
We persist to DB first (durable), then update Redis.&lt;/p&gt;

&lt;p&gt;In a worst-case scenario, if Redis crashes, the DB remains the source of truth. We prefer durability over extreme write latency savings.&lt;/p&gt;
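
&lt;p&gt;A sketch of that write path, assuming DynamoDB and Redis (table, key, and attribute names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json, time
import boto3, redis
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("playback_state")   # assumed table name
cache = redis.Redis()

def save_progress(account_id, profile_id, video_id, position, device_id):
    now_ms = int(time.time() * 1000)            # server-generated timestamp
    pk = f"{account_id}#{profile_id}"
    try:
        # 1. Durable, conditional write: accept only if newer than what is stored.
        table.update_item(
            Key={"pk": pk, "video_id": video_id},
            UpdateExpression="SET #p = :p, updated_at = :t, device_id = :d",
            ConditionExpression="attribute_not_exists(updated_at) OR updated_at &amp;lt; :t",
            ExpressionAttributeNames={"#p": "position"},
            ExpressionAttributeValues={":p": position, ":t": now_ms, ":d": device_id},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return                              # a newer checkpoint already exists; drop this one
        raise
    # 2. Update the cache only after the DB write succeeds; Redis is never the sole copy.
    cache.set(f"playback:{pk}:{video_id}",
              json.dumps({"position": position, "updated_at": now_ms}),
              ex=24 * 3600)                     # cache TTL; the DB stays the source of truth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
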

&lt;p&gt;&lt;strong&gt;Read Flow&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check Redis.&lt;/li&gt;
&lt;li&gt;If hit → instant resume.&lt;/li&gt;
&lt;li&gt;If miss → fetch from DB → repopulate cache.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most reads should never touch the database.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Brief stale reads may occur due to replication lag, which is acceptable under our 1–2 second tolerance.&lt;/p&gt;
&lt;/blockquote&gt;
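
&lt;p&gt;Continuing the same sketch on the read path (cache-aside), reusing the table and cache handles from the write path above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_progress(account_id, profile_id, video_id):
    pk = f"{account_id}#{profile_id}"
    key = f"playback:{pk}:{video_id}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)       # cache hit: instant resume
    # Cache miss: read the durable copy, then repopulate the cache.
    item = table.get_item(Key={"pk": pk, "video_id": video_id}).get("Item")
    if item is None:
        return None                     # nothing to resume: start from the beginning
    state = {"position": int(item["position"]), "updated_at": int(item["updated_at"])}
    cache.set(key, json.dumps(state), ex=24 * 3600)
    return state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
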

&lt;p&gt;&lt;strong&gt;Failure Handling &amp;amp; Retries&lt;/strong&gt;&lt;br&gt;
In production, both Redis and the database may occasionally time out or throttle under load.&lt;/p&gt;

&lt;p&gt;To protect latency &lt;a href="https://en.wikipedia.org/wiki/Service-level_objective" rel="noopener noreferrer"&gt;SLOs&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads fall back to DB if Redis times out.&lt;/li&gt;
&lt;li&gt;Writes use bounded retries with exponential backoff.&lt;/li&gt;
&lt;li&gt;Timeouts are enforced at the service layer to avoid request pile-ups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a write ultimately fails, we prefer dropping that checkpoint rather than blocking playback. The next update will reconcile the state thanks to our last-write-wins logic.&lt;/p&gt;
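
&lt;p&gt;A small helper along those lines (attempt counts and delays are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random, time

def with_bounded_retries(op, attempts=3, base_delay_s=0.05):
    # Exponential backoff with jitter; give up rather than block playback.
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                return None             # drop this checkpoint; the next update reconciles
            time.sleep(base_delay_s * (2 ** attempt) + random.uniform(0, base_delay_s))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On the write path, the checkpoint call would be wrapped in a helper like this; on the read path, a Redis timeout simply falls through to the DB.&lt;/p&gt;
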

&lt;h1&gt;
  
  
  9. Multi-Device Conflict Handling
&lt;/h1&gt;

&lt;p&gt;If TV and phone both send updates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare updated_at&lt;/li&gt;
&lt;li&gt;Latest wins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We accept slight inconsistencies because availability matters more.&lt;br&gt;
That’s our CAP trade-off in action.&lt;/p&gt;
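
&lt;p&gt;The conflict rule itself is tiny; as a sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def resolve(existing, incoming):
    # Last write wins: keep whichever checkpoint carries the newer timestamp.
    return incoming if incoming["updated_at"] &amp;gt; existing["updated_at"] else existing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
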

&lt;h1&gt;
  
  
  10. Other Production Considerations
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Storage Lifecycle (TTL)&lt;/strong&gt;&lt;br&gt;
Playback entries shouldn’t live forever. We can expire inactive entries after X days (e.g., 180 days) using TTL policies.&lt;/p&gt;
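
&lt;p&gt;A sketch of the expiry attribute, assuming a store-level TTL feature (e.g., DynamoDB TTL) configured on an expires_at attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time

RETENTION_DAYS = 180   # assumed retention window

def expires_at():
    # Epoch-seconds value refreshed on every write; the store purges the row once it passes.
    return int(time.time()) + RETENTION_DAYS * 24 * 3600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
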

&lt;p&gt;This prevents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unbounded storage growth&lt;/li&gt;
&lt;li&gt;Cold data occupying hot partitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hot Partition Prevention&lt;/strong&gt;&lt;br&gt;
If we partitioned incorrectly (e.g., by video_id), a trending show at 8 pm could create hot shards. Using:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(account_id#profile_id, video_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensures even distribution and avoids the hot partition problem.&lt;/p&gt;

&lt;p&gt;Proper database capacity planning or auto scaling is required to handle peak write bursts and avoid write throttling under load.&lt;/p&gt;

&lt;h1&gt;
  
  
  11. UX Guardrails &amp;amp; Data Freshness
&lt;/h1&gt;

&lt;p&gt;Resuming should feel intuitive, not surprising.&lt;br&gt;
To prevent confusing jumps in playback:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a resume position differs by only a few seconds, the client may ignore minor regressions.&lt;/li&gt;
&lt;li&gt;We may cap backward jumps beyond a safety threshold (e.g., don’t resume 5 minutes earlier unless requested).&lt;/li&gt;
&lt;li&gt;Clients can display “Resume from 11:11?” to give users control when conflicts occur.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the system technically simple (last-write-wins) while protecting the user experience from edge-case inconsistencies.&lt;/p&gt;
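
&lt;p&gt;A client-side sketch of those guardrails (thresholds are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def choose_resume(local_pos, server_pos, minor_s=5, big_jump_s=300):
    # Returns (position_to_use, should_prompt_user).
    delta = server_pos - local_pos
    if abs(delta) &amp;lt;= minor_s:
        return local_pos, False      # ignore minor regressions; keep what the device has
    if delta &amp;lt; -big_jump_s:
        return server_pos, True      # large backward jump: ask "Resume from 11:11?" first
    return server_pos, False         # otherwise trust the freshest server state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
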

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;This problem looks like a key-value store.&lt;br&gt;
It’s not.&lt;br&gt;
It touches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed systems&lt;/li&gt;
&lt;li&gt;Caching strategy&lt;/li&gt;
&lt;li&gt;Conflict resolution&lt;/li&gt;
&lt;li&gt;UX latency expectations&lt;/li&gt;
&lt;li&gt;CAP trade-offs&lt;/li&gt;
&lt;li&gt;Data modelling for real households&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>backend</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
