Killing false alarms and fixing "lying" MP3 headers

What we shipped on 2026-06-19

We spent a good chunk of today fighting ghosts in our monitoring and audio. The most annoying was the recurring Prefect queue backlog false alarm (PR #1713). Pages kept firing while healthy, long-running content_generation tasks held the single Prefect slot, simply because the detector saw a pile of scheduled runs and assumed we were stuck. We fixed this by adding a durable per-node heartbeat, pipeline_tasks.last_progress_at, stamped at every graph-node start. Now the probe can tell whether the slot holder is progressing or truly dead.
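A minimal sketch of the decision such a probe could make, assuming the heartbeat is read as a timezone-aware timestamp. Only pipeline_tasks.last_progress_at comes from the PR; the function names and the 15-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold; the value the real probe uses is not stated here.
STALL_AFTER = timedelta(minutes=15)


def slot_holder_is_stalled(
    last_progress_at: Optional[datetime],
    now: Optional[datetime] = None,
    stall_after: timedelta = STALL_AFTER,
) -> bool:
    """Return True when the task holding the slot has stopped progressing."""
    now = now or datetime.now(timezone.utc)
    if last_progress_at is None:
        # No heartbeat was ever stamped: assume the task never reached a node.
        return True
    return now - last_progress_at > stall_after


def should_page(
    scheduled_runs: int,
    last_progress_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """A backlog alone is not an alarm; page only if the slot holder is dead."""
    return scheduled_runs > 0 and slot_holder_is_stalled(last_progress_at, now)
```

With this shape, a long content_generation run that stamped a node two minutes ago never pages, no matter how many runs are queued behind it.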

We caught another silent failure in run_health_probes where we were dropping Telegram pages (PR #1715). The brain watchdog was calling notify_fn synchronously, but since the production notify_fn is async, it just created a coroutine that never got awaited. It slipped through testing because our tests used sync lambdas. We wrapped the call sites in our _maybe_await shim to ensure these critical alerts actually land.

On the media side, we found why our podcasts were cutting off mid-episode (PR #1706). When Speaches synthesizes long inputs, it byte-concatenates segments but only keeps the first segment's duration in the header. The audio data is all there, but players honor that short header and stop early. The fix was a simple remux through ffmpeg -c copy at the tts_service.synthesize_speech boundary to rewrite a correct whole-file header losslessly.
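A rough sketch of that remux step, assuming ffmpeg is on PATH and the synthesized audio has already been written to disk. The helper names are hypothetical, and the real code at the tts_service.synthesize_speech boundary may pipe bytes rather than use files.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_remux_command(src: PathLike, dst: PathLike) -> List[str]:
    """Build an ffmpeg call that copies the audio stream without re-encoding."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", str(src),  # concatenated MP3 with the too-short header
        "-c", "copy",    # stream copy: no decode/encode, so no quality loss
        str(dst),
    ]


def remux_mp3(src: PathLike, dst: PathLike) -> Path:
    """Rewrite the container so the header describes the whole file."""
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(build_remux_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Because the stream is copied rather than re-encoded, the step is cheap enough to run on every synthesis result instead of only on long inputs.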

We also did some structural hardening in the pipeline with an atom contract fingerprint handshake (PR #1709). Previously, graph_def rows referenced atoms by name only, meaning I/O drift could happen silently if a contract changed but the version wasn't bumped. We now stamp each node with a hash of the structural contract (requires/produces), so we can detect real drift and avoid poisoning retries across graph changes.
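One way such a fingerprint could be computed is sketched below. The hashing scheme (SHA-256 over canonical JSON) and the function names are assumptions for illustration; the post only says the hash covers the structural contract (requires/produces).

```python
import hashlib
import json
from typing import Iterable


def contract_fingerprint(requires: Iterable[str], produces: Iterable[str]) -> str:
    """Hash an atom's structural contract into a stable hex digest."""
    # Sort the keys so that declaration order does not change the fingerprint.
    canonical = json.dumps(
        {"requires": sorted(requires), "produces": sorted(produces)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(stamped: str, requires: Iterable[str], produces: Iterable[str]) -> bool:
    """Compare the fingerprint stamped on a node with the atom's current contract."""
    return stamped != contract_fingerprint(requires, produces)
```

For example, a node stamped when an atom produced only "draft" would report drift once the atom also starts producing "summary", even if nobody bumped the atom's version.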

To clean up our notification architecture, we replaced our hand-written Discord and Telegram handlers with one generic outbound.apprise_notify handler (PR #1711). Now, adding a new channel like Slack or SMS is just a webhook_endpoints row insert rather than a new module. While doing this, we realized we were leaking the operator's personal Telegram chat_id in the baseline seeds (0000_baseline.seeds.sql), so we moved that to follow the same secret-reference path as our bot tokens (PR #1714).
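To show why a new channel becomes a row insert, here is a hedged sketch of a generic handler built on the Apprise library. The row shape (apprise_url, enabled) and the injectable send parameter are hypothetical; the actual webhook_endpoints columns and the outbound.apprise_notify signature are not shown in this post.

```python
from typing import Any, Callable, Mapping


def _send_with_apprise(url: str, title: str, body: str) -> bool:
    """Deliver one message through Apprise (third-party: pip install apprise)."""
    import apprise  # imported lazily so the module loads without it

    notifier = apprise.Apprise()
    notifier.add(url)  # the URL scheme selects the channel
    return bool(notifier.notify(title=title, body=body))


def apprise_notify(
    endpoint: Mapping[str, Any],
    title: str,
    body: str,
    send: Callable[[str, str, str], bool] = _send_with_apprise,
) -> bool:
    """Send to whatever channel the endpoint row describes."""
    if not endpoint.get("enabled", True):
        return False
    # No per-channel branching: the row's URL carries the channel choice.
    return send(endpoint["apprise_url"], title, body)
```

In a real deployment the URL would be resolved through the secret-reference path rather than stored inline, since it typically embeds tokens and chat IDs.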

Finally, we spent some time pruning the garden. We dropped 7 never-dispatched pipeline templates and 2 orphaned atoms (PR #1710), along with a few dead verification scripts (PR #1708) that had been superseded by our test suite.

The system is quieter now, both in the logs and in the codebase. With the contract fingerprints in place, we can iterate on atom I/O without worrying about silent drift poisoning the pipeline.

Auto-compiled by Poindexter from today's commits and PRs. See the work: github.com/Glad-Labs/poindexter.

Sources

https://github.com/Glad-Labs/poindexter
