I built a tiny runtime for resumable agent workers

Mariusz Czajkowski — Tue, 26 May 2026 12:48:58 +0000

A while ago I needed a resumable agent runtime.

I did not want something as large as Temporal, and I did not want another agent framework like LangChain. I wanted something small enough to understand, but solid enough to adapt across the different verticals I was building.

It started with a few bare-bones questions.

The moment an agent leaves a notebook, script, or chat session, the hard problems change:

What work exists?
Which worker owns it right now?
What was the last durable step?
Can another worker resume after a crash?
Which resources are locked?
What did the agent produce?
Can operators inspect what happened?

The result is Roost, a small runtime layer for that problem.

GitHub: https://github.com/mczaykowski/Roost

The basic idea

Roost treats an agent as a durable step machine.

An engine implements two methods:

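class Engine:
    engine_id: str

    # Build the first snapshot for a newly enqueued work item.
    async def init_snapshot(self, item: WorkItem) -> Snapshot: ...

    # Advance one step; must be safe to retry from the same snapshot.
    async def step(self, snapshot: Snapshot, item: WorkItem) -> Snapshot: ...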

The engine owns the domain-specific transition.
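To make the two-method interface concrete, here is a minimal sketch of a toy engine. `WorkItem` and `Snapshot` are hypothetical stand-ins defined inline (Roost's real types may carry different fields), the `done` flag is an assumed way for an engine to signal completion, and `CounterEngine` and `run_locally` are illustrative names. The driver runs the engine in-process with no queue, lease, or persistence, only to show the shape of the methods.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import Any


# Hypothetical stand-ins for Roost's WorkItem and Snapshot; real fields may differ.
@dataclass(frozen=True)
class WorkItem:
    work_id: str
    payload: dict[str, Any]


@dataclass(frozen=True)
class Snapshot:
    state: dict[str, Any] = field(default_factory=dict)
    done: bool = False  # assumed completion flag


class CounterEngine:
    """Toy engine: increments a counter until it reaches payload["target"]."""

    engine_id = "counter"

    async def init_snapshot(self, item: WorkItem) -> Snapshot:
        return Snapshot(state={"count": 0})

    async def step(self, snapshot: Snapshot, item: WorkItem) -> Snapshot:
        # The next snapshot depends only on the inputs, so a retried step
        # from the same snapshot produces the same result.
        count = snapshot.state["count"] + 1
        return replace(snapshot, state={"count": count}, done=count >= item.payload["target"])


async def run_locally(engine: CounterEngine, item: WorkItem) -> Snapshot:
    # In-process driver only: no queue, leases, or persistence.
    snapshot = await engine.init_snapshot(item)
    while not snapshot.done:
        snapshot = await engine.step(snapshot, item)
    return snapshot


final = asyncio.run(run_locally(CounterEngine(), WorkItem("w1", {"target": 3})))
print(final)  # Snapshot(state={'count': 3}, done=True)
```

Because `step()` builds its result only from its arguments, running it twice from the same snapshot yields the same next snapshot, which is the property at-least-once execution relies on.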

Roost owns the operational substrate:

Queue
  -> acquire lease
  -> load latest Snapshot
  -> Engine.step(snapshot, item)
  -> compare-and-swap save Snapshot
  -> re-enqueue or mark done
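The loop above can be approximated in a single process. This is a sketch under loud assumptions: `MemoryStore`, `run_once`, `StaleSnapshot`, and the 30-second lease TTL are hypothetical, snapshots are plain dicts with an assumed `done` key, and versions are integers bumped on every save. Roost's actual backend is Redis + SAQ, where these checks would have to be atomic on the server; error handling and retries are left out.

```python
import asyncio
import time
from collections import deque
from typing import Awaitable, Callable

Step = Callable[[dict], Awaitable[dict]]


class StaleSnapshot(Exception):
    """A compare-and-swap save lost to a newer snapshot version."""


class MemoryStore:
    """Hypothetical in-memory stand-in for a lease table plus snapshot store."""

    def __init__(self) -> None:
        self.snapshots: dict[str, tuple[int, dict]] = {}  # work_id -> (version, snapshot)
        self.leases: dict[str, tuple[str, float]] = {}  # work_id -> (worker, expiry)

    def acquire_lease(self, work_id: str, worker: str, ttl: float) -> bool:
        now = time.monotonic()
        holder = self.leases.get(work_id)
        if holder and holder[0] != worker and holder[1] > now:
            return False  # another worker holds an unexpired lease
        self.leases[work_id] = (worker, now + ttl)
        return True

    def release_lease(self, work_id: str, worker: str) -> None:
        if self.leases.get(work_id, ("", 0.0))[0] == worker:
            del self.leases[work_id]

    def cas_save(self, work_id: str, expected: int, snapshot: dict) -> None:
        current, _ = self.snapshots[work_id]
        if current != expected:
            raise StaleSnapshot(work_id)  # someone saved a newer version first
        self.snapshots[work_id] = (current + 1, snapshot)


async def run_once(store: MemoryStore, queue: deque, work_id: str, worker: str, step: Step) -> None:
    # Queue -> acquire lease -> load snapshot -> step -> CAS save -> re-enqueue or done
    if not store.acquire_lease(work_id, worker, ttl=30.0):
        return  # skip; the lease holder is still working on this item
    try:
        version, snapshot = store.snapshots[work_id]
        next_snapshot = await step(snapshot)
        store.cas_save(work_id, version, next_snapshot)
        if not next_snapshot.get("done"):
            queue.append(work_id)  # more steps to run
    finally:
        store.release_lease(work_id, worker)


async def count_to_three(snapshot: dict) -> dict:
    count = snapshot["count"] + 1
    return {"count": count, "done": count >= 3}


async def main() -> tuple:
    store = MemoryStore()
    store.snapshots["w1"] = (0, {"count": 0})  # what init_snapshot() would produce
    queue: deque = deque(["w1"])
    while queue:
        await run_once(store, queue, queue.popleft(), "worker-a", count_to_three)
    return store.snapshots["w1"]


print(asyncio.run(main()))  # (3, {'count': 3, 'done': True})
```

The compare-and-swap save is what turns a duplicate or stale step into a rejected write instead of silently overwriting newer progress.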

That gives you:

durable snapshots
per-work leases
at-least-once execution
retry-safe progress
delayed continuation
resource claims
event history
content-addressed artifacts
failed-work inspection
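Content addressing is what makes artifact writes cheap to retry: identical output always lands on the identical key. The sketch below keys each blob by its SHA-256 digest; the hash choice, the `sha256:` prefix, and the `ArtifactStore` / `put_json` names are assumptions for illustration, not Roost's storage format.

```python
import hashlib
import json
from typing import Any


class ArtifactStore:
    """Hypothetical in-memory artifact store keyed by the SHA-256 of each blob."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        # Identical bytes map to the identical key, so a retried step that
        # writes the same artifact again does not create a duplicate.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def put_json(store: ArtifactStore, obj: Any) -> str:
    # Canonical encoding (sorted keys, compact separators) so that dict key
    # order does not change the bytes, and therefore the hash.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return store.put(data)


store = ArtifactStore()
first = put_json(store, {"url": "https://example.com", "reachable": True})
retry = put_json(store, {"reachable": True, "url": "https://example.com"})
print(first == retry)  # True
```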

It is intentionally small. It is not trying to be a prompt framework, model router, workflow DSL, or hosted agent platform.

Roost does not help an agent think.

Roost helps an agent keep going.

Why I built it

A lot of agent tooling focuses on the thinking loop: prompts, tools, retrieval, planning, memory, model routing.

That is useful, but once agents run as workers for minutes, hours, or days, the bottleneck becomes more boring and more operational.

For example:

a worker dies halfway through a task
the same job is delivered twice
a long-running task needs to wait before its next step
two workers should not touch the same resource at the same time
an operator needs to know what happened
the output needs to be inspectable later
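For the "two workers, same resource" case, a resource claim can be sketched as a table from resource key to owning work item with an expiry, so a crashed worker's claim eventually lapses. `ResourceClaims`, its all-or-nothing `try_claim`, and the TTL handling are hypothetical; the `domain:example.com` key style is borrowed from the demo's `--resource` flag.

```python
import time
from typing import Dict, Iterable, Optional, Tuple


class ResourceClaims:
    """Hypothetical claim table: resource key -> (owning work id, expiry time)."""

    def __init__(self) -> None:
        self._claims: Dict[str, Tuple[str, float]] = {}

    def _holder(self, resource: str, now: float) -> Optional[str]:
        entry = self._claims.get(resource)
        if entry is None or entry[1] <= now:
            return None  # never claimed, or the claim expired (e.g. its worker crashed)
        return entry[0]

    def try_claim(self, work_id: str, resources: Iterable[str], ttl: float) -> bool:
        now = time.monotonic()
        wanted = list(resources)
        # All-or-nothing: if any resource belongs to another work item, take none,
        # so a worker never holds half of what it needs.
        if any(self._holder(r, now) not in (None, work_id) for r in wanted):
            return False
        for r in wanted:
            self._claims[r] = (work_id, now + ttl)  # claim or renew
        return True

    def release(self, work_id: str) -> None:
        for r in [r for r, (owner, _) in self._claims.items() if owner == work_id]:
            del self._claims[r]


claims = ResourceClaims()
print(claims.try_claim("w1", ["domain:example.com"], ttl=60))  # True
print(claims.try_claim("w2", ["domain:example.com"], ttl=60))  # False
claims.release("w1")
print(claims.try_claim("w2", ["domain:example.com"], ttl=60))  # True
```

In a shared backend the check and the writes must happen atomically; with Redis that typically means a Lua script or a `WATCH`/`MULTI`/`EXEC` transaction rather than separate round trips.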

You can solve this with a workflow engine, a custom queue, a database table, or a pile of scripts.

Roost is my attempt at a small, agent-shaped version of that layer.

A simple demo: crash-safe URL watchlist

The demo engine is a URL watchlist worker.

It fetches a URL over multiple steps, saves each observation into a snapshot, waits between checks, and writes a final JSON artifact.

You can kill the worker halfway through, restart it, and Roost resumes from the latest saved snapshot.

uv sync --extra redis --extra dev
docker run --rm -p 6379:6379 redis:7

In one terminal:

uv run roost worker --engines watchlist

In another:

WORK_ID=$(uv run roost enqueue \
  --engine watchlist \
  --resource domain:example.com \
  --payload '{"url":"https://example.com","claim":"Example Domain is reachable","checks_required":3,"delay_seconds":5}')

uv run roost status "$WORK_ID"

Then kill the worker with Ctrl-C, start it again, and inspect the same work item.

uv run roost worker --engines watchlist
uv run roost status "$WORK_ID"

There is also a local end-to-end script:

scripts/e2e_watchlist.sh

No LLM key is required. The demo is about runtime behavior, not model behavior.
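The watchlist engine's step logic (fetch, record an observation, wait, then write a JSON artifact) might look roughly like the following. It is a hypothetical sketch, not the repository's code: `watchlist_step`, the `observations` / `not_before` / `artifact` state keys, and the injected `fetch` callable returning an HTTP status are assumptions; the payload fields come from the `enqueue` command above. Injecting `fetch` keeps the example offline.

```python
import asyncio
import json
import time
from typing import Any, Awaitable, Callable, Dict

State = Dict[str, Any]
Fetch = Callable[[str], Awaitable[int]]  # hypothetical: URL -> HTTP status code


async def watchlist_step(state: State, payload: State, fetch: Fetch) -> State:
    """One hypothetical watchlist step: fetch once, record it, schedule the next check."""
    status = await fetch(payload["url"])
    # Build a new list instead of mutating the input snapshot.
    observations = state.get("observations", []) + [{"at": time.time(), "status": status}]
    done = len(observations) >= payload["checks_required"]
    next_state: State = {
        "observations": observations,
        "done": done,
        # Assumed delayed-continuation hint: do not run again before this time.
        "not_before": None if done else time.time() + payload["delay_seconds"],
    }
    if done:
        next_state["artifact"] = json.dumps(
            {
                "url": payload["url"],
                "claim": payload["claim"],
                "reachable": all(o["status"] == 200 for o in observations),
                "checks": len(observations),
            },
            sort_keys=True,
        )
    return next_state


async def fake_fetch(url: str) -> int:
    return 200  # offline stand-in for a real HTTP GET


async def main() -> State:
    payload = {
        "url": "https://example.com",
        "claim": "Example Domain is reachable",
        "checks_required": 3,
        "delay_seconds": 5,
    }
    state: State = {}
    while not state.get("done"):
        # A real runtime would wait until state["not_before"]; skipped here.
        state = await watchlist_step(state, payload, fake_fetch)
    return state


final = asyncio.run(main())
print(final["artifact"])
# {"checks": 3, "claim": "Example Domain is reachable", "reachable": true, "url": "https://example.com"}
```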

Local console

Roost includes a small local console:

uv run roost ui

It shows live work, saved state, events, failed work, and artifacts.

The detail view lets you inspect payloads, snapshots, and outputs.

Where this fits

Roost is not a replacement for LangChain, LlamaIndex, CrewAI, AutoGen, Temporal, Celery, or your own agent loop.

It sits at a different layer.

LangChain helps decide what an agent should do.
Temporal helps coordinate workflows.
Celery runs jobs.
Roost keeps long-running agent workers alive, inspectable, and resumable.

The current backend is Redis + SAQ. Execution is at-least-once, so an engine's step() must be safe to run more than once from the same snapshot.

That tradeoff is intentional. I would rather expose the semantics clearly than pretend exactly-once execution exists.
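A compare-and-swap save protects the snapshot, but it cannot undo side effects a step already performed. Here is a minimal sketch of the at-least-once case, assuming an external system that deduplicates by idempotency key (`outbox`, `send_report`, `Store`, and `deliver_twice` are hypothetical names): the same step is delivered twice from the same snapshot, the second save is rejected, and a key derived from the work id and snapshot version keeps the report from being sent twice.

```python
import asyncio
from typing import Any, Dict, List

# Hypothetical external system that deduplicates requests by idempotency key.
outbox: Dict[str, str] = {}


def send_report(idempotency_key: str, body: str) -> None:
    outbox.setdefault(idempotency_key, body)  # repeated keys are ignored


async def step(work_id: str, version: int, snapshot: Dict[str, Any]) -> Dict[str, Any]:
    # The key depends only on (work_id, version), so every retry of this
    # step from the same snapshot reuses it and the report is sent once.
    send_report(f"{work_id}:{version}:report", f"count={snapshot['count']}")
    return {"count": snapshot["count"] + 1}


class Store:
    """Hypothetical snapshot store with a compare-and-swap save."""

    def __init__(self) -> None:
        self.version, self.snapshot = 0, {"count": 0}

    def cas_save(self, expected: int, snapshot: Dict[str, Any]) -> bool:
        if expected != self.version:
            return False  # a newer snapshot was already saved
        self.version, self.snapshot = self.version + 1, snapshot
        return True


async def deliver_twice() -> List[bool]:
    store = Store()
    version, snapshot = store.version, store.snapshot
    results = []
    for _ in range(2):  # simulate a duplicate delivery of the same step
        nxt = await step("w1", version, snapshot)
        results.append(store.cas_save(version, nxt))
    return results


print(asyncio.run(deliver_twice()))  # [True, False]
print(len(outbox))  # 1
```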

What I’m looking for feedback on

I’m especially interested in feedback on the abstraction boundary.

Is this useful as a small runtime under agent loops?

Would you rather reach for Temporal, Celery, or a custom queue?

Does the init_snapshot() / step() model feel too small, or exactly small enough?