DEV Community

Calvin Sturm


How a stale audio batch caused 7 seconds of A/V desync after backward seeks

I have been building FastPlay, a minimal native Windows video player written in Rust.

FastPlay repo: github.com/CalvinSturm/FastPlay

FastPlay uses FFmpeg for demuxing and decoding, D3D11/DXGI for video presentation, and WASAPI for audio output. The product goal is intentionally narrow: fast local playback on Windows, smooth seeking and scrubbing, recent files, resume playback, and a UI that gets out of the way.

During the v0.3.0 release cycle, one of the most important bugs I fixed was not a UI issue. It was a playback correctness issue.

After certain backward seeks, audio could jump several seconds ahead of video.

The worst case I reproduced was roughly 7 seconds of A/V desync.

The final fix was small. The lesson was bigger.

The symptom

FastPlay could open a file, play normally, pause, resume, seek forward, and scrub around without obvious problems. But after some backward seeks, playback came back badly out of sync.

The direction mattered.

Forward seeks were mostly fine. Backward seeks exposed the problem.

That was the first useful clue. If all seeks were broken, I would have suspected the general seek path, decoder flushing, or clock re-anchoring. But a bug that appears mainly after backward seeks usually means stale state is surviving somewhere.

A media seek is not just “move playback to another timestamp.” It invalidates a lot of pipeline state:

  • demuxed packets still in flight
  • queued video frames
  • queued audio samples
  • subtitle state
  • clock anchors
  • sink buffers
  • pending worker output
  • metrics tied to the old playback position

FastPlay already had generation checks in several places so stale video frames and seek results could not easily leak into the current playback timeline.

But one audio helper still had state that survived the seek.
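Generation checks like these are usually built by tagging each unit of buffered work with a seek generation counter and rejecting anything whose tag no longer matches. Below is a minimal sketch of that idea. It assumes a simple monotonically increasing `u64` generation, and `VideoFrame` and `FrameQueue` are hypothetical types, not FastPlay's own:

```rust
use std::collections::VecDeque;

/// Hypothetical seek generation: incremented on every seek.
type Generation = u64;

/// Hypothetical decoded video frame, tagged with the generation that produced it.
#[derive(Debug, Clone, PartialEq)]
struct VideoFrame {
    pts_secs: f64,
    generation: Generation,
}

/// Hypothetical frame queue that refuses frames from an older generation.
struct FrameQueue {
    current: Generation,
    frames: VecDeque<VideoFrame>,
}

impl FrameQueue {
    fn new() -> Self {
        Self { current: 0, frames: VecDeque::new() }
    }

    /// Called on seek: bump the generation and drop everything already queued.
    fn begin_seek(&mut self) -> Generation {
        self.current += 1;
        self.frames.clear();
        self.current
    }

    /// Rejects late frames from a worker that has not noticed the seek yet.
    fn push(&mut self, frame: VideoFrame) -> bool {
        if frame.generation != self.current {
            return false;
        }
        self.frames.push_back(frame);
        true
    }

    fn pop(&mut self) -> Option<VideoFrame> {
        self.frames.pop_front()
    }
}

fn main() {
    let mut queue = FrameQueue::new();
    queue.push(VideoFrame { pts_secs: 14.0, generation: 0 });

    let generation = queue.begin_seek();
    // A late frame from the pre-seek generation is rejected...
    assert!(!queue.push(VideoFrame { pts_secs: 14.04, generation: 0 }));
    // ...while a frame from the new generation is accepted.
    assert!(queue.push(VideoFrame { pts_secs: 7.0, generation }));
    assert_eq!(queue.pop().map(|f| f.pts_secs), Some(7.0));
}
```

The check only protects state that actually carries a tag. A helper that buffers data without one is outside this protection.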

The playback path

At a high level, FastPlay has separate video and audio paths.

Video goes through FFmpeg decode and D3D11 presentation.

Audio goes through FFmpeg decode, a small batching layer, and WASAPI shared-mode output.

The audio batching layer exists because decoded audio does not always arrive in the exact chunk shape the sink wants. It normalizes decoded samples into batches that can be submitted to the audio sink cleanly.

That batching layer had internal state:

  • pending decoded samples
  • the timestamp for the current batch
  • output format assumptions
  • batch sizing rules

That is normal for audio playback. The bug was that this state did not fully respect seek boundaries.
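A batcher of this kind can be sketched as follows. This is a simplified, hypothetical `AudioBatcher` (mono `f32` samples, a fixed batch length, timestamps as `f64` seconds), not FastPlay's implementation. It holds the same kinds of state, though: pending samples plus the timestamp of the batch in progress.

```rust
/// Hypothetical batch handed to the audio sink.
#[derive(Debug, Clone, PartialEq)]
struct AudioBatch {
    pts_secs: f64,
    samples: Vec<f32>,
}

/// Hypothetical batcher: regroups decoded chunks of arbitrary size into
/// fixed-size batches. Mono only, to keep the sketch short.
struct AudioBatcher {
    sample_rate: u32,
    batch_len: usize,
    pending: Vec<f32>,
    /// PTS of the first sample in `pending`; set when a new batch starts.
    pending_pts: Option<f64>,
}

impl AudioBatcher {
    fn new(sample_rate: u32, batch_len: usize) -> Self {
        assert!(batch_len > 0, "batch length must be non-zero");
        Self { sample_rate, batch_len, pending: Vec::new(), pending_pts: None }
    }

    /// Appends a decoded chunk and returns every batch that is now full.
    fn push(&mut self, chunk_pts_secs: f64, samples: &[f32]) -> Vec<AudioBatch> {
        // Only the first chunk of a batch sets its timestamp; later chunks are
        // assumed to continue the same timeline.
        if self.pending_pts.is_none() {
            self.pending_pts = Some(chunk_pts_secs);
        }
        self.pending.extend_from_slice(samples);

        let mut out = Vec::new();
        while self.pending.len() >= self.batch_len {
            let pts = self.pending_pts.expect("pending samples always have a PTS");
            let rest = self.pending.split_off(self.batch_len);
            let samples = std::mem::replace(&mut self.pending, rest);
            out.push(AudioBatch { pts_secs: pts, samples });
            // Leftover samples start right after the emitted batch.
            self.pending_pts = if self.pending.is_empty() {
                None
            } else {
                Some(pts + self.batch_len as f64 / f64::from(self.sample_rate))
            };
        }
        out
    }
}

fn main() {
    // 48 kHz mono, 1024-sample batches (illustrative values).
    let mut batcher = AudioBatcher::new(48_000, 1024);
    // A 1500-sample decoded chunk yields one full batch and leaves 476 pending.
    let batches = batcher.push(2.0, &vec![0.0; 1500]);
    assert_eq!(batches.len(), 1);
    assert_eq!(batches[0].pts_secs, 2.0);
    assert_eq!(batches[0].samples.len(), 1024);
    assert_eq!(batcher.pending.len(), 476);
}
```

The 476 leftover samples already carry an assigned timestamp. A partial batch like this is the state that turns out to matter during a seek.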

The bad assumption

The audio batcher assumed that if it had a partial batch, that batch still belonged to the current playback timeline.

Most of the time, that was true.

But a seek breaks that assumption.

The failure looked like this:

  1. Playback reaches around 14 seconds.
  2. AudioBatcher holds a partial audio batch stamped around 14 seconds.
  3. The user seeks backward to around 7 seconds.
  4. FFmpeg codec state is flushed.
  5. Decode resumes from the new seek target.
  6. New post-seek audio arrives around 7 seconds.
  7. The old partial audio batch is still alive.
  8. The batcher emits post-seek audio using the stale pre-seek timestamp.

That meant audio from around 7 seconds could be submitted as if it belonged around 14 seconds.

The audio master clock then anchored to the wrong timestamp. Video was trying to present around the seek target, while audio told the rest of the player that time was already several seconds ahead.

The visible result was simple:

Audio led video by several seconds after a backward seek.
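A small clock sketch shows why a single stale stamp produces a constant multi-second offset. It assumes a common audio-master design: the clock is anchored to the PTS of the first submitted batch and then advanced by the number of frames the device reports as played. `AudioClock` and its methods are hypothetical, and a real implementation would also account for output latency.

```rust
/// Hypothetical audio master clock: anchor PTS plus elapsed device playback.
struct AudioClock {
    anchor_pts_secs: f64,
    sample_rate: u32,
    frames_played: u64,
}

impl AudioClock {
    fn anchored_at(anchor_pts_secs: f64, sample_rate: u32) -> Self {
        Self { anchor_pts_secs, sample_rate, frames_played: 0 }
    }

    /// Advance by the number of frames the output device reports as played.
    fn advance(&mut self, frames: u64) {
        self.frames_played += frames;
    }

    fn now_secs(&self) -> f64 {
        self.anchor_pts_secs + self.frames_played as f64 / f64::from(self.sample_rate)
    }
}

fn main() {
    // Correct: anchored to the PTS of real post-seek audio near the 7 s target.
    let mut good = AudioClock::anchored_at(7.0, 48_000);
    // Buggy: anchored to a stale pre-seek PTS inherited from a partial batch.
    let mut bad = AudioClock::anchored_at(14.0, 48_000);

    // Half a second of audio plays out on both clocks.
    good.advance(24_000);
    bad.advance(24_000);

    // Video has advanced half a second past the seek target.
    let video_secs = 7.5;
    assert_eq!(good.now_secs() - video_secs, 0.0);
    // The buggy clock reports 14.5 s while video shows 7.5 s: a 7 s audio lead.
    assert_eq!(bad.now_secs() - video_secs, 7.0);
}
```

Every later reading is derived from the anchor, so in this model the error does not shrink as playback continues. It persists until the clock is re-anchored.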

Why forward seeks hid the bug

This bug was easier to miss because forward seeks accidentally masked it.

FastPlay already had seek discard logic. After a seek, decoded frames or samples before the seek target can be ignored until the decoder catches up to the intended timeline position.

That logic helps protect against stale work.

When seeking forward, stale pre-seek audio usually has a timestamp behind the new target. Because it is behind the target, discard logic can drop it.

But backward seeks are different.

When you seek backward, stale pre-seek timestamps are ahead of the new target. They can look like valid future audio instead of obviously outdated audio.

That is why forward seeks looked mostly fine, while backward seeks could produce a large offset.

The stale timestamp was not random. It was plausible. It was just from the wrong generation of playback.
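The asymmetry can be shown with a PTS-only discard rule. This is a deliberately simplified sketch. It assumes the check compares a timestamp against the seek target and nothing else. `discard_before_target` is a hypothetical name, and FastPlay's real discard logic may consider more state.

```rust
/// Hypothetical, simplified post-seek discard rule: drop decoded audio whose
/// timestamp is before the seek target. It looks only at PTS, not generation.
fn discard_before_target(pts_secs: f64, seek_target_secs: f64) -> bool {
    pts_secs < seek_target_secs
}

fn main() {
    // Forward seek to 14 s: a stale batch stamped ~7 s is behind the target and dropped.
    assert!(discard_before_target(7.0, 14.0));

    // Backward seek to 7 s: a stale batch stamped ~14 s is ahead of the target,
    // so the same rule keeps it even though it belongs to the old timeline.
    assert!(!discard_before_target(14.0, 7.0));

    // Genuine post-seek audio just after the target is kept too; by PTS alone,
    // the rule cannot tell it apart from the stale batch above.
    assert!(!discard_before_target(7.02, 7.0));
}
```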

The actual root cause

The root cause was:

AudioBatcher retained a partial pre-seek batch across DecodeSession::seek.

The seek path flushed the FFmpeg codec state, but it did not reset the app-level audio batcher.

That distinction matters.

It is easy to assume that flushing the decoder means the seek state is clean. But a media player usually has several layers of buffering above the decoder. Those buffers need to be reset too.

In this case:

  • FFmpeg decoder state was flushed.
  • Video queues were generation-safe.
  • Seek discard logic existed.
  • The audio batcher still kept partial timestamped state.

That one stale partial batch was enough to corrupt the post-seek audio timestamp.

The fix

The fix was to add an AudioBatcher::reset() method and call it during DecodeSession::seek after flushing the codec state.

The important behavior is:

When the decoder seeks, any partial audio batch from the old timeline is discarded.

After that, the first post-seek audio batch is stamped with the timestamp of post-seek audio, not a timestamp inherited from before the seek.

This preserves the expected clock behavior:

  • seek to 7 seconds
  • decode audio near 7 seconds
  • submit audio stamped near 7 seconds
  • anchor the audio clock near 7 seconds
  • present video near 7 seconds

The change was small, but it closed the generation leak.
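The ordering of the fix can be sketched like this. The `Codec` trait and `FakeCodec` are hypothetical stand-ins for the FFmpeg decoder. In real code, `flush` would typically wrap `avcodec_flush_buffers`. The `AudioBatcher` and `DecodeSession` shown here are reduced, hypothetical versions that keep only the state relevant to the bug.

```rust
/// Hypothetical stand-in for the FFmpeg decoder.
trait Codec {
    fn flush(&mut self);
}

struct FakeCodec {
    flushed: bool,
}

impl Codec for FakeCodec {
    fn flush(&mut self) {
        self.flushed = true;
    }
}

/// Minimal hypothetical batcher: only the state that leaked across seeks.
#[derive(Default)]
struct AudioBatcher {
    pending: Vec<f32>,
    pending_pts: Option<f64>,
}

impl AudioBatcher {
    fn push(&mut self, pts_secs: f64, samples: &[f32]) {
        // Only the first chunk of a batch sets its timestamp.
        if self.pending_pts.is_none() {
            self.pending_pts = Some(pts_secs);
        }
        self.pending.extend_from_slice(samples);
    }

    /// Drop any partial batch from the old timeline, including its timestamp.
    fn reset(&mut self) {
        self.pending.clear();
        self.pending_pts = None;
    }
}

struct DecodeSession<C: Codec> {
    codec: C,
    audio_batcher: AudioBatcher,
}

impl<C: Codec> DecodeSession<C> {
    fn seek(&mut self, _target_secs: f64) {
        // Repositioning the demuxer is omitted from this sketch.
        self.codec.flush();
        // The fix: buffering above the decoder is reset too, after the flush.
        self.audio_batcher.reset();
    }
}

fn main() {
    let mut session = DecodeSession {
        codec: FakeCodec { flushed: false },
        audio_batcher: AudioBatcher::default(),
    };
    session.audio_batcher.push(14.0, &[0.0; 300]); // partial pre-seek batch
    session.seek(7.0);
    assert!(session.codec.flushed);

    // The first post-seek chunk now starts a fresh batch with its own PTS.
    session.audio_batcher.push(7.0, &[0.0; 300]);
    assert_eq!(session.audio_batcher.pending_pts, Some(7.0));
}
```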

The regression test

This kind of bug needs a test because it is easy to reintroduce later.

The test shape was straightforward:

  1. Feed audio from before a seek into the batcher.
  2. Leave the batcher with partial pending state.
  3. Reset the batcher as a seek would.
  4. Feed post-seek audio.
  5. Assert the emitted batch uses the post-seek timestamp.

The key assertion is not just that audio comes out. The key assertion is that the audio is stamped from the correct playback generation.

I also kept a test shape that documents the failure mode: without reset, stale pre-seek PTS leaks into the next emitted batch.

That matters because the bug was not “audio disappeared” or “decode failed.” The bug was subtler. Audio existed, but its timestamp belonged to the wrong timeline.
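The two test shapes can be sketched as ordinary Rust unit tests. This assumes a hypothetical, reduced batcher that emits everything pending once at least `min_batch_len` samples have accumulated. FastPlay's real batcher and test names may differ. Run with `cargo test`.

```rust
#[derive(Debug, Clone, PartialEq)]
struct AudioBatch {
    pts_secs: f64,
    samples: Vec<f32>,
}

/// Hypothetical mono batcher, reduced to the behavior the tests exercise.
struct AudioBatcher {
    min_batch_len: usize,
    pending: Vec<f32>,
    pending_pts: Option<f64>,
}

impl AudioBatcher {
    fn new(min_batch_len: usize) -> Self {
        Self { min_batch_len, pending: Vec::new(), pending_pts: None }
    }

    /// Appends a chunk; once enough samples are pending, emits them all as one
    /// batch stamped with the PTS of the first chunk in that batch.
    fn push(&mut self, pts_secs: f64, samples: &[f32]) -> Option<AudioBatch> {
        let batch_pts = *self.pending_pts.get_or_insert(pts_secs);
        self.pending.extend_from_slice(samples);
        if self.pending.len() < self.min_batch_len {
            return None;
        }
        self.pending_pts = None;
        Some(AudioBatch { pts_secs: batch_pts, samples: std::mem::take(&mut self.pending) })
    }

    fn reset(&mut self) {
        self.pending.clear();
        self.pending_pts = None;
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    const PRE_SEEK_PTS: f64 = 14.0;
    const POST_SEEK_PTS: f64 = 7.0;

    /// Steps 1-2: leave a partial batch stamped with the pre-seek PTS.
    fn batcher_with_partial_pre_seek_batch() -> AudioBatcher {
        let mut batcher = AudioBatcher::new(1024);
        assert!(batcher.push(PRE_SEEK_PTS, &[0.0; 512]).is_none());
        batcher
    }

    #[test]
    fn reset_on_seek_stamps_first_batch_with_post_seek_pts() {
        let mut batcher = batcher_with_partial_pre_seek_batch();
        batcher.reset(); // step 3: what the seek path now does
        let batch = batcher.push(POST_SEEK_PTS, &[0.0; 1024]).expect("full batch"); // step 4
        assert_eq!(batch.pts_secs, POST_SEEK_PTS); // step 5
        assert_eq!(batch.samples.len(), 1024);
    }

    #[test]
    fn without_reset_stale_pre_seek_pts_leaks() {
        let mut batcher = batcher_with_partial_pre_seek_batch();
        // No reset: post-seek audio is appended to the stale partial batch.
        let batch = batcher.push(POST_SEEK_PTS, &[0.0; 1024]).expect("full batch");
        assert_eq!(batch.pts_secs, PRE_SEEK_PTS);
    }
}

fn main() {}
```

The second test passes on purpose. It pins down the failure mode, so anyone who removes the reset can see what the batcher would do without it.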

Validation

Before the fix, a backward seek could produce a huge A/V offset. In one flash/beep sync smoke test, where a visual flash and an audible beep are meant to coincide, audio led video by about 7 seconds.

After the fix, the same flash/beep test settled within normal sync tolerance. Repeated backward seeks no longer caused the audio clock to jump forward with stale pre-seek timestamps.

The validation covered:

  • normal open
  • play
  • pause/resume
  • forward seek
  • backward seek
  • repeated scrub
  • close/reopen
  • A/V sync after backward seek

The automated test suite also gained focused coverage for the batcher reset behavior.

The broader lesson

The lesson was not “remember to reset this one helper.”

The lesson was broader:

Any component that stores timestamped media state must either reset on seek or explicitly track playback generation.

A media player has many places where timestamped state can hide:

  • packet queues
  • decoded frame queues
  • audio batchers
  • resamplers
  • subtitle state
  • sink buffers
  • presentation state
  • clock anchors
  • worker-thread channels
  • metrics tied to playback position

If a component cannot answer “which seek generation does this buffered data belong to?”, it is probably a future bug.

That is especially true in a player with persistent decode sessions, background workers, bounded queues, and separate audio/video timing paths.
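One way to make the "reset on seek" half of that rule hard to forget is to give every stateful stage a common reset contract and let the seek path reset all of them at once. The sketch below assumes that design. `SeekReset`, `Pipeline`, and `SubtitleState` are hypothetical names, not FastPlay's API. Components that cannot be reset synchronously, such as worker threads with output still in flight, may be better served by generation tags.

```rust
/// Hypothetical contract for any component that buffers timestamped media state.
trait SeekReset {
    /// Discard all buffered state that belongs to the previous timeline.
    fn reset_for_seek(&mut self);
    /// True if the component holds no timestamped state.
    fn is_clear(&self) -> bool;
}

#[derive(Default)]
struct AudioBatcher {
    pending: Vec<f32>,
    pending_pts: Option<f64>,
}

impl SeekReset for AudioBatcher {
    fn reset_for_seek(&mut self) {
        self.pending.clear();
        self.pending_pts = None;
    }
    fn is_clear(&self) -> bool {
        self.pending.is_empty() && self.pending_pts.is_none()
    }
}

#[derive(Default)]
struct SubtitleState {
    active_cue_end_secs: Option<f64>,
}

impl SeekReset for SubtitleState {
    fn reset_for_seek(&mut self) {
        self.active_cue_end_secs = None;
    }
    fn is_clear(&self) -> bool {
        self.active_cue_end_secs.is_none()
    }
}

/// Hypothetical pipeline: a seek bumps the generation and resets every
/// registered component, instead of relying on each call site to remember.
#[derive(Default)]
struct Pipeline {
    generation: u64,
    components: Vec<Box<dyn SeekReset>>,
}

impl Pipeline {
    fn register(&mut self, component: Box<dyn SeekReset>) {
        self.components.push(component);
    }

    fn seek(&mut self) -> u64 {
        self.generation += 1;
        for component in self.components.iter_mut() {
            component.reset_for_seek();
        }
        self.generation
    }

    fn all_clear(&self) -> bool {
        self.components.iter().all(|c| c.is_clear())
    }
}

fn main() {
    let mut pipeline = Pipeline::default();
    pipeline.register(Box::new(AudioBatcher { pending: vec![0.0; 512], pending_pts: Some(14.0) }));
    pipeline.register(Box::new(SubtitleState { active_cue_end_secs: Some(15.5) }));

    assert!(!pipeline.all_clear());
    assert_eq!(pipeline.seek(), 1);
    assert!(pipeline.all_clear());
}
```

With this design, adding a new buffering stage means implementing the trait and registering it. The seek path itself does not need to change.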

Why the bug was product-important

FastPlay is intentionally narrow. It is not trying to replace every VLC or MPC-HC feature. The goal is fast local playback on Windows with good open, seek, scrub, and resume behavior.

That makes seek correctness central to the product.

A video player can have a clean UI, a fast first frame, and a nice recent-files overlay. But if scrubbing breaks sync, users will not trust it.

This bug was small in code size, but large in product feel. Fixing it made repeated seeking and backward scrubbing much more reliable.

The rule I took away is simple:

A seek is not just a decoder operation. It is a full pipeline generation boundary.

That is the kind of bug I am glad to find early.

Try it

FastPlay is open source and currently Windows-only.

Repo and MSI download: github.com/CalvinSturm/FastPlay

v0.3.0 includes recent files, resume playback, the backward-seek A/V sync fix described above, and a local benchmark harness for open/seek/pause/resume metrics.
