Alex Zhdankov

Posted on Jun 9

Why we built a durable task runtime without a broker

#python #postgres #distributedsystems #devops

Background tasks look trivial - until they must survive process restarts, support delayed execution, and run without external infrastructure.

We needed scheduled maintenance inside a PostgreSQL management agent: VACUUM jobs, health checks, heartbeat tasks, retention cleanup, delayed operations.

Celery was the obvious answer. We didn't use it.

Not because Celery is bad. Because the operational model was wrong for our constraints.

The agent is installed directly on database hosts. Sometimes inside isolated customer environments. Without Kubernetes. Without managed infrastructure. Without external services we control.

At that point, background jobs stopped being a helper library problem. They became persistence, supervision, and lifecycle management inside a single Python process.

The constraint that changed everything

The agent is a single Python process running on the database host.

We already used Redis elsewhere in the platform for the browser‑based psql terminal. Using it as a task broker sounded convenient.

Operationally, it was the wrong fit.

Terminal traffic optimises for low‑latency streaming. Task scheduling optimises for durable persistence. Sharing one Redis instance between them would create contention.
Worse: Redis persistence is optional. Without RDB snapshots or AOF enabled, a Redis restart loses every queued task, and even with persistence on, writes made since the last snapshot or fsync can be lost. That trade‑off is acceptable for transient terminal streams. It is much harder to justify for scheduled maintenance.

RabbitMQ was even harder to justify - a dedicated distributed broker whose only purpose was “run tasks on the same machine that already runs the agent”.

The requirements became clear:

runs fully in‑process
survives restarts with task state intact
supports scheduled future + periodic execution
supports cancellation and recovery
zero external infrastructure

The incident that changed the design

The original implementation used an in‑memory queue. It worked perfectly in development.

Then one agent restarted during a maintenance window. The worker process disappeared. Queued follow‑up tasks disappeared with it. A VACUUM scheduled for 3 AM simply never ran. No error. No log. Just silence.

The conclusion was unavoidable: task state cannot live only in the memory of the process that executes it.

Scheduled operations needed to survive agent restarts, package upgrades, partial failures, unexpected termination. The scheduler became persistence‑first.

Why not cron? Why not systemd timers?

Cron solves static scheduling. We needed runtime orchestration: tasks created dynamically via API, deduplication, cancellation, rescheduling, execution status tracking, reattachment after restart. Cron has no concept of task ownership. Coordinating “run VACUUM every 6 hours unless one is already running” becomes awkward fast.

Systemd timers solve some lifecycle problems but split orchestration across two independent runtimes (the application and the OS). The scheduler needed direct access to database metadata, agent runtime state, worker registration, internal APIs. One ownership boundary: the agent owns the tasks.

The architecture

Three independent processes. Two queues. One Unix socket.

API caller
    │
    │ Unix socket
    ▼
Scheduler process
    │
    ├── task_queue ──► WorkerPool process
    │
    └── event_queue ◄─ WorkerPool process

The important framing is this:

The scheduler is really two independent systems:
a durable state machine and a subprocess supervisor.

Task state and task execution are intentionally separated.

Scheduler owns persistence, timing, deduplication, lifecycle state.
WorkerPool owns subprocess execution, signals, supervision, cancellation.

They never share memory - only queues and socket messages.

That separation turned out to matter operationally: if the WorkerPool crashes, the Scheduler continues running. If the Scheduler restarts, tasks survive on disk.

Task lifecycle

The state machine is explicit:
DEFAULT → SCHEDULED → QUEUED → DOING → DONE, with branches to FAILED, ABORTED, CANCELED.

Status is a bitmask. Recovery logic collapsed into a single SQLite predicate:

WHERE status & (DOING | QUEUED)

One integer column. No joins. No OR chains. That’s not a clever trick; it’s operational simplicity when you have to reconstruct state after a crash.
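As a rough illustration of the bitmask idea, here is a minimal sketch using `enum.IntFlag` and an in-memory SQLite table. The state names come from the lifecycle above; the bit values, the `tasks` table, and the `in_flight` helper are assumptions made for the example, not the agent's actual schema.

```python
import sqlite3
from enum import IntFlag


class Status(IntFlag):
    # Bit values are assumed; the article only names the states.
    DEFAULT = 1
    SCHEDULED = 2
    QUEUED = 4
    DOING = 8
    DONE = 16
    FAILED = 32
    ABORTED = 64
    CANCELED = 128


def in_flight(conn):
    """Return ids of tasks that are QUEUED or DOING, using one bitwise test."""
    mask = int(Status.DOING | Status.QUEUED)
    rows = conn.execute(
        "SELECT id FROM tasks WHERE status & ? ORDER BY id", (mask,)
    )
    return [row[0] for row in rows]


# Hypothetical table and rows, only to exercise the predicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [
        ("a", int(Status.SCHEDULED)),
        ("b", int(Status.QUEUED)),
        ("c", int(Status.DOING)),
        ("d", int(Status.DONE)),
    ],
)
```

Because each state occupies its own bit, any set of states can be matched by OR-ing their values into a single mask, which is what keeps the recovery query to one comparison.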

How the scheduler actually works

The Scheduler runs a select() loop over two file descriptors: the Unix socket (incoming requests) and the WorkerPool event queue (status updates). Every second, it also runs schedule(): find runnable tasks, reschedule periodic ones, purge expired history.
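The shape of that loop can be sketched roughly as follows. This is a simplified illustration, not the agent's code: it assumes both inputs are already-connected sockets (a real Unix socket server would also `accept()` connections and handle closed peers), and the names `scheduler_loop`, `handle_request`, `handle_event`, and `should_stop` are hypothetical.

```python
import select
import socket
import time


def scheduler_loop(api_sock, event_sock, schedule, handle_request, handle_event,
                   tick=1.0, should_stop=lambda: False):
    """Multiplex two readable endpoints and call schedule() once per tick."""
    next_tick = time.monotonic() + tick
    while not should_stop():
        # Sleep in select() until input arrives or the next tick is due.
        timeout = max(0.0, next_tick - time.monotonic())
        readable, _, _ = select.select([api_sock, event_sock], [], [], timeout)
        for sock in readable:
            data = sock.recv(4096)
            if sock is api_sock:
                handle_request(data)   # incoming API request
            else:
                handle_event(data)     # status update from the worker side
        if time.monotonic() >= next_tick:
            schedule()                 # find runnable tasks, reschedule, purge
            next_tick += tick
```

Computing the `select()` timeout from the next tick deadline keeps the periodic pass close to once per second even when requests arrive in bursts.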

The key insight: the Scheduler never executes work. It only orchestrates lifecycle transitions. Execution happens inside isolated subprocesses.

Process isolation gave us a clean guarantee: if a worker misbehaves (hangs, leaks memory, deadlocks), the scheduler can terminate it unconditionally. Threads would have shared the failure domain. That guarantee mattered more than raw efficiency.
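To make the "terminate unconditionally" guarantee concrete, here is a minimal sketch using the standard `subprocess` module. The function name `run_with_deadline`, the deadline and grace values, and the `HUNG_WORKER` stand-in are assumptions for illustration; the article does not describe how the WorkerPool enforces termination.

```python
import subprocess
import sys

# Stand-in for a worker that hangs; purely illustrative.
HUNG_WORKER = [sys.executable, "-c", "import time; time.sleep(60)"]


def run_with_deadline(argv, deadline_s, grace_s=1.0):
    """Run a worker subprocess; escalate from terminate() to kill() on overrun."""
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        proc.terminate()              # polite request (SIGTERM on POSIX)
        try:
            return proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()               # SIGKILL on POSIX; cannot be caught
            return proc.wait()        # reap the child to avoid a zombie
```

A thread stuck in the same way could not be stopped from the outside, which is the failure-domain difference the paragraph above describes.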

Persistence and recovery (the SQLite moment)

Task state lives entirely in SQLite. On startup, recovery runs:

Any task that was DOING or QUEUED during shutdown becomes ABORTED. Future scheduled tasks remain intact.

A VACUUM scheduled for 3 AM still runs after a midnight restart.
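A startup recovery pass along these lines might look like the sketch below. The bit values, the `tasks` schema, the task ids, and the timestamps are all hypothetical; only the rule itself (in-flight tasks become ABORTED, scheduled tasks are left alone) comes from the text.

```python
import sqlite3

# Assumed bit values for the states involved in recovery.
SCHEDULED, QUEUED, DOING, ABORTED = 2, 4, 8, 64


def recover(conn):
    """Mark every task that was in flight at shutdown as ABORTED."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE tasks SET status = ? WHERE status & ?",
            (ABORTED, DOING | QUEUED),
        )
    return cur.rowcount


# Hypothetical state left behind by a midnight restart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status INTEGER, run_at TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        ("health-check", DOING, "23:59:00"),
        ("retention", QUEUED, "23:59:30"),
        ("vacuum-3am", SCHEDULED, "03:00:00"),
    ],
)
```

Calling `recover(conn)` on this data touches only the two in-flight rows; the `vacuum-3am` row keeps its SCHEDULED status and is picked up by the next scheduling pass.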

SQLite here is not acting as a general-purpose database. It is a durable state machine log - exactly the same pattern we used for our metrics buffer. That realisation unified our mental model across the entire agent.

Deterministic deduplication

Duplicate submissions (retries, double‑clicks, race conditions) should not create duplicate execution.

Task IDs are generated deterministically from the operation signature: database:schema:table:hour:operation_type. The second submission resolves to the same identifier and is rejected immediately. One simple mechanism eliminated an entire class of coordination bugs.
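One way to sketch this: build the signature string in the order given above, derive an identifier from it, and let a primary-key constraint reject the duplicate. Hashing the signature with SHA-256, truncating to 16 hex characters, and the `task_id` and `submit` helpers are assumptions for the example; the article only states that the ID is derived deterministically from the signature.

```python
import hashlib
import sqlite3
from datetime import datetime


def task_id(database, schema, table, operation_type, when):
    """Derive a stable id from database:schema:table:hour:operation_type."""
    hour = when.strftime("%Y%m%d%H")  # truncate the timestamp to the hour
    signature = ":".join([database, schema, table, hour, operation_type])
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()[:16]


def submit(conn, tid):
    """Insert the task; return False if the same id was already submitted."""
    try:
        with conn:
            conn.execute("INSERT INTO tasks (id) VALUES (?)", (tid,))
        return True
    except sqlite3.IntegrityError:
        return False  # primary-key collision means a duplicate submission


# Hypothetical table; two submissions within the same hour collide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY)")
first = task_id("app", "public", "orders", "vacuum", datetime(2024, 6, 9, 3, 5))
retry = task_id("app", "public", "orders", "vacuum", datetime(2024, 6, 9, 3, 55))
```

Here `first` and `retry` are the same string, so the second `submit` call returns `False` without any locking or lookup logic beyond the constraint itself.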

What this system intentionally does not have

This is not Celery. No distributed routing, no broker federation, no result backend, no retry orchestration, no monitoring UI, no horizontal scaling.

For a single‑host agent, what we needed instead was:

durable local scheduling
subprocess isolation
restart recovery
periodic execution
deterministic deduplication
bounded orchestration complexity

The entire scheduler fits in roughly 400 lines of standard library Python.

The mental model

Unix socket = scheduler API boundary
Scheduler = durable task state machine
WorkerPool = subprocess supervisor
SQLite = local orchestration persistence

You could call it a local control plane - a miniature of what the larger platform does, but inside a single host boundary.

The interesting realisation was this:

Once tasks must survive crashes and restarts, background jobs stop being a threading problem. They become a persistence problem, a lifecycle problem, and a process supervision problem.

Celery solves distributed execution. We needed crash‑safe orchestration inside a single host boundary.

Those are very different systems.

DEV Community