How I Split My Livestream Archive at Shiftbloom Studio

#ai #datascience #cloud #devops

At Shiftbloom Studio, I build tools and projects that take experimental approaches to real-world problems.

My livestream archive started out the way most small media systems do: one big always-on recorder that keeps costing money even when nothing is happening.

Live capture does need to stay ready at all times, because you can’t risk losing the first minutes of a stream. For everything else, that is complete overkill.

The Core Problem

Backfills, VOD downloads, clip imports, repairs and re-encodes are queue work. They can wait a few seconds, and they can run on burst capacity or even on a regular VPS or laptop. They don’t need the same always-hot infrastructure as the live recorder.

That’s why I split the system.

Instead of one large monolith, I deployed:

Observer cells — only for live streams (time-critical)
Harvest cells — for all queue processing (can be delayed)

The Three Roles

1. Mothership

A small control-plane cron job. It checks queue sizes, currently live channels and running observer tasks, then decides:

how many harvest cells should exist right now
which channels need an observer cell

It’s intentionally simple. The database remains the single source of truth.
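To illustrate, the decision step could be sketched as a pure function that the cron job calls on each run. This is only a sketch under assumptions: the names `Plan` and `plan`, the `jobs_per_cell` ratio and the `max_harvest_cells` cap are hypothetical, and the real inputs would come from the database.

+++python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    harvest_cells: int                    # how many harvest cells should exist right now
    channels_needing_observer: list[str]  # live channels without a running observer cell


def plan(queue_size: int, live_channels: set[str], running_observers: set[str],
         jobs_per_cell: int = 20, max_harvest_cells: int = 10) -> Plan:
    """Decide the desired state. No side effects, so it is easy to test.

    jobs_per_cell and max_harvest_cells are assumed tuning knobs.
    """
    # Scale harvest cells with queue depth; an empty queue means zero cells.
    wanted = math.ceil(queue_size / jobs_per_cell)
    harvest_cells = min(wanted, max_harvest_cells)
    # Every live channel that has no observer yet needs one.
    missing = sorted(live_channels - running_observers)
    return Plan(harvest_cells=harvest_cells, channels_needing_observer=missing)
+++

Keeping the decision separate from the code that actually starts or stops tasks means a bad run can simply be repeated on the next cron tick.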

2. Observer Cells

Each observer cell records exactly one live channel. It receives its assignment through environment variables:

+++env
OBSERVER_VOD_ID
OBSERVER_CHANNEL_ID
OBSERVER_CHANNEL_LOGIN
OBSERVER_CHANNEL_NAME
+++
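Reading that assignment at startup could look roughly like the following. The four variable names are the ones listed above; the `Assignment` class, the `load_assignment` helper and the fail-fast behaviour are assumptions for illustration.

+++python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "OBSERVER_VOD_ID",
    "OBSERVER_CHANNEL_ID",
    "OBSERVER_CHANNEL_LOGIN",
    "OBSERVER_CHANNEL_NAME",
)


@dataclass(frozen=True)
class Assignment:
    vod_id: str
    channel_id: str
    channel_login: str
    channel_name: str


def load_assignment(env: Mapping[str, str] = os.environ) -> Assignment:
    """Build the assignment from the environment, failing fast if anything is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        # Assumed policy: a cell without a full assignment should not start recording.
        raise RuntimeError(f"observer cell started without: {', '.join(missing)}")
    return Assignment(
        vod_id=env["OBSERVER_VOD_ID"],
        channel_id=env["OBSERVER_CHANNEL_ID"],
        channel_login=env["OBSERVER_CHANNEL_LOGIN"],
        channel_name=env["OBSERVER_CHANNEL_NAME"],
    )
+++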

It starts recording immediately, writes HLS segments to object storage, sends heartbeats, and waits a short standby window after the stream goes offline. This window is important because streams sometimes drop and reconnect quickly. Without it you end up with many small broken VOD fragments.
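The standby window can be pictured as a tiny state tracker. This is a sketch, not the actual recorder logic: `StandbyTracker`, the state names and the 120-second default are all assumed for illustration.

+++python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandbyTracker:
    standby_seconds: float = 120.0         # assumed value; tune to how often streams reconnect
    offline_since: Optional[float] = None  # when the stream was first seen offline

    def update(self, is_live: bool, now: float) -> str:
        """Return 'recording', 'standby' or 'finalize' for the current check."""
        if is_live:
            # A reconnect inside the window keeps writing to the same VOD.
            self.offline_since = None
            return "recording"
        if self.offline_since is None:
            self.offline_since = now
        if now - self.offline_since >= self.standby_seconds:
            # Offline for the whole window: close the VOD and let the cell exit.
            return "finalize"
        return "standby"
+++

A short drop followed by a reconnect resets the timer, which is what prevents one stream from being split into several small fragments.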

3. Harvest Cells

These handle all background work: downloading VODs, re-encoding, recovering broken files, etc. They can run anywhere Docker is available — AWS tasks, a small VPS, or even a spare laptop. They only need outbound access to Postgres and object storage.
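A harvest cell’s main loop might be sketched like this. Everything here is hypothetical: `run_harvest_cell`, the job shape, and the idea of exiting after a few empty polls are assumptions. In a real deployment `claim_next` would be a query against Postgres (for example one using `FOR UPDATE SKIP LOCKED`, a common way to let several workers share one queue table).

+++python
import time
from typing import Callable, Optional

Job = dict  # assumed shape: {"id": ..., "kind": "download" | "reencode" | ...}


def run_harvest_cell(
    claim_next: Callable[[], Optional[Job]],
    handlers: dict[str, Callable[[Job], None]],
    max_idle_polls: int = 3,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process jobs until the queue stays empty, then return the number handled."""
    done = 0
    idle = 0
    while idle < max_idle_polls:
        job = claim_next()
        if job is None:
            # Nothing to do: wait a bit, and give up after a few empty polls
            # so the cell does not keep running against an empty queue.
            idle += 1
            sleep(poll_seconds)
            continue
        idle = 0
        handlers[job["kind"]](job)  # dispatch by job type
        done += 1
    return done
+++

Because the loop only pulls work, the same container can run on an AWS task, a VPS or a laptop without any inbound networking.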

What Changed

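Previously I treated live recording and backlog processing as the same infrastructure problem. They are not. Live recording is assignment-based (one observer cell per live channel), while backlog processing is throughput-based (as many harvest cells as the queue needs).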

After the split I ran a large historical migration and ingested 15.5 TB of backfill data in just 36 hours — without dropping a single frame from live streams.
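For a sense of scale, those two numbers translate into a sustained rate. The quick calculation below assumes decimal terabytes (10^12 bytes) and an evenly spread load, neither of which is stated above; `sustained_rate` is just a helper name for this example.

+++python
TB = 10**12  # assumption: decimal terabytes


def sustained_rate(total_tb: float, hours: float) -> float:
    """Average bytes per second needed to move total_tb in the given hours."""
    return total_tb * TB / (hours * 3600)


rate = sustained_rate(15.5, 36)
print(f"{rate / 1e6:.0f} MB/s")       # about 120 MB/s
print(f"{rate * 8 / 1e9:.2f} Gbit/s")  # about 0.96 Gbit/s
+++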

Situation	Before	After
No live channels	Full recorder still running	No observer cells
Empty queue	Capacity still provisioned	No harvest cells
Large backlog	Everything slowed down	Scale harvest cells instantly
Old archives	Mixed with active data	Easy to move to cold storage

The architecture actually became smaller and easier to reason about.

The Main Lesson

The biggest improvement wasn’t switching to a particular tool or platform. It was drawing a clear line between work that has to happen right now and work that only has to happen eventually.

Most VOD archive systems have both types of work. Once you treat them as separate patterns instead of forcing everything into one monolith, the system becomes much more natural, like a small colony of specialized components that only spin up when they’re actually needed.

I built this at Shiftbloom Studio (which I’m currently running on my own) because stream recordings can go wrong in odd ways (drops, quick reconnects, broken fragments), and dealing with that takes flexible, low-overhead infrastructure.

Would love to hear if this approach helps anyone else facing similar issues.