Temporal Hits 3,000 Customers: Durable Execution for AI Agent Workflows

#webdev #devops #cloud #astro

Temporal says it crossed 3,000 paying customers. The number on its own is a vanity metric — what's interesting is who's signing up. A growing share are teams building AI agents: long-running LLM pipelines that call models, hit tools, wait on humans, and have to survive a process restart in the middle of all of it.

If you've shipped an agent that runs longer than a single request, you know the failure mode. The model call times out on step 9 of 14. Your worker gets redeployed mid-run. A tool API returns a 429. The agent loop was holding all of its state in memory, and now that state is gone. Temporal's pitch is that this class of bug should not be your problem. We read through its docs and SDKs to see how well that holds up for agent workloads specifically.

What durable execution actually changes

Temporal is a workflow engine built around one idea: your workflow code runs as if the machine never fails. You write an ordinary function (call a model, branch on the result, sleep for an hour, call a tool) and Temporal makes that function's execution durable. If the process running it dies, another worker picks the workflow up and continues from the point where it left off.

It does this with event sourcing. Every step a workflow takes — every activity it schedules, every timer it sets, every signal it receives — is appended to an event history stored by the Temporal service. When a worker resumes a workflow, it replays that history to rebuild in-memory state, then continues. The workflow function never persists anything explicitly. You do not write checkpoint code.
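To make the replay idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the Temporal SDK: `ToyRuntime`, `step`, and `agent_workflow` are hypothetical names, and the "history" is just a list of step results. What it shows is the one behavior that matters: a resumed run reuses recorded results and only executes what has not happened yet.

```python
from typing import Any, Callable, List, Optional


class ToyRuntime:
    """Toy stand-in for a durable-execution runtime. Not the Temporal SDK."""

    def __init__(self, history: Optional[List[Any]] = None) -> None:
        # Results of steps that completed in an earlier process, in order.
        self.history: List[Any] = list(history or [])
        self._cursor = 0
        self.executed: List[str] = []  # steps actually run by this process

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self._cursor < len(self.history):
            # Replay: reuse the recorded result instead of running fn again.
            result = self.history[self._cursor]
        else:
            # First execution: run the side effect and append its result.
            result = fn()
            self.history.append(result)
            self.executed.append(name)
        self._cursor += 1
        return result


def agent_workflow(rt: ToyRuntime) -> str:
    plan = rt.step("plan", lambda: "search docs")
    found = rt.step("tool", lambda: f"3 results for '{plan}'")
    return rt.step("summarize", lambda: f"summary of {found}")


# The first worker completes two steps, then "crashes" before the third.
first = ToyRuntime()
first.step("plan", lambda: "search docs")
first.step("tool", lambda: "3 results for 'search docs'")

# A second worker replays the stored history and runs only what is left.
second = ToyRuntime(history=first.history)
result = agent_workflow(second)
```

The workflow function contains no checkpoint code. The second worker rebuilds `plan` and `found` from history and executes only the `summarize` step.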

The core of the model is a split: workflow code is the deterministic orchestration layer, and activities are the side effects. An activity is a plain function: an HTTP call to a model API, a database write, a tool invocation. Activities fail and get retried independently, with backoff policies you set per activity instead of hand-rolling. The workflow that called them never sees the individual retries; it sees the eventual result, or a final failure once the policy is exhausted.
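A per-activity retry policy boils down to a few numbers and a loop. The sketch below is plain Python, not Temporal code: `RetryPolicy` and `run_activity` are hypothetical, the field names only loosely echo the kind of options a retry policy exposes, and the flaky model call is simulated.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    initial_interval: float = 1.0      # seconds before the first retry
    backoff_coefficient: float = 2.0   # multiplier applied per retry
    maximum_interval: float = 30.0     # cap on any single wait
    maximum_attempts: int = 5          # total tries, including the first

    def delay_before_retry(self, failed_attempts: int) -> float:
        """Seconds to wait after `failed_attempts` consecutive failures."""
        raw = self.initial_interval * self.backoff_coefficient ** (failed_attempts - 1)
        return min(raw, self.maximum_interval)


def run_activity(fn: Callable[[], T], policy: RetryPolicy,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except Exception:
            if attempt >= policy.maximum_attempts:
                raise  # policy exhausted: the caller finally sees the failure
            sleep(policy.delay_before_retry(attempt))


calls = {"n": 0}


def flaky_model_call() -> str:
    # Simulated model API: rate-limited twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


waits: List[float] = []  # record the waits instead of actually sleeping
policy = RetryPolicy()
result = run_activity(flaky_model_call, policy, sleep=waits.append)
```

The caller only ever receives `result`. The two failed attempts and the waits between them stay inside `run_activity`, which is the property the workflow relies on.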

For an agent, the mapping is direct. The loop (decide, act, observe, repeat) becomes a workflow. Each model call and each tool call becomes an activity. A six-hour sleep becomes a durable timer: it ties up no worker while it waits and survives any number of deploys. Waiting on a human approval becomes a signal: the workflow blocks until your app sends one, even if that takes three days.
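The shape of that mapping can be sketched with nothing but `asyncio`. This is an illustration of the structure only, with no durability: `AgentRun`, `call_model`, and `call_tool` are hypothetical names, and the model is a stub that finishes after two observations. The loop is one coroutine, every side effect lives in its own function, and approval arrives from outside as a signal.

```python
import asyncio
from typing import List, Tuple


# "Activities": each side effect is isolated in its own function.
async def call_model(observations: List[str]) -> str:
    # Stub for an LLM call: finish once two observations exist.
    return "finish" if len(observations) >= 2 else f"lookup-{len(observations)}"


async def call_tool(action: str) -> str:
    return f"result of {action}"


class AgentRun:
    """Toy 'workflow': control flow only, no direct I/O."""

    def __init__(self) -> None:
        self._approved = asyncio.Event()
        self.observations: List[str] = []

    def approve(self) -> None:
        """Signal handler: called from outside when a human signs off."""
        self._approved.set()

    async def run(self) -> List[str]:
        while True:
            action = await call_model(self.observations)       # decide
            if action == "finish":
                break
            self.observations.append(await call_tool(action))  # act, observe
        await self._approved.wait()  # block until the approval signal
        return self.observations


async def main() -> Tuple[bool, List[str]]:
    agent = AgentRun()
    task = asyncio.create_task(agent.run())
    await asyncio.sleep(0.01)        # let the loop finish and park on the signal
    was_blocked = not task.done()    # still waiting for a human
    agent.approve()
    return was_blocked, await task


was_blocked, observations = asyncio.run(main())
```

In a durable engine the same structure holds, with the difference that the blocked wait survives a restart instead of living in one process's memory.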

Temporal is open source. You can run the server yourself with a database (PostgreSQL, MySQL, or Cassandra) behind it, or use Temporal Cloud, the managed service, which bills on usage. SDKs include Go, Java, TypeScript, Python, and .NET, among others, and the workflow-and-activity model is the same across all of them.

The DIY retry code you are replacing

Most agent projects start without any of this. The loop lives in one process, state lives in a variable, and reliability is whatever try/except and a retry decorator give you. That works in a notebook. It stops working the first time a run outlives the process that started it.

The two common upgrades both have sharp edges. The first is scattering retry logic — tenacity in Python, a backoff wrapper in TypeScript — around every external call. It handles transient failures and does nothing for a crash. If the process dies, the half-finished run dies with it, and you have no record of where it was. You also end up with retry policy duplicated across a dozen call sites, each one slightly different.

The second is a job queue: Celery, BullMQ, SQS with workers. Queues are good at fan-out and at surviving restarts, but they push a different cost onto you. A multi-step run becomes several queued jobs, and now you own the glue: persisting state between steps, making every step idempotent so a redelivered message does not double-charge a model call, and reconstructing which step the run was on after a failure. You are building a workflow engine, badly, one queue at a time.
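Here is a minimal sketch of one piece of that glue: deduplicating a redelivered message with an idempotency key. Everything in it is hypothetical (`StepStore`, `idempotency_key`, the message shape), and the in-memory dict stands in for a table that would need atomic writes in a real system.

```python
import hashlib
import json
from typing import Callable, Dict, List


def idempotency_key(run_id: str, step: int, payload: dict) -> str:
    # Same run, step, and payload always hash to the same key.
    body = json.dumps({"run": run_id, "step": step, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class StepStore:
    """Toy result store. A real one needs a database and atomic inserts."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def run_once(self, run_id: str, step: int, payload: dict,
                 fn: Callable[[dict], str]) -> str:
        key = idempotency_key(run_id, step, payload)
        if key in self._results:
            return self._results[key]   # redelivery: reuse the stored result
        result = fn(payload)
        self._results[key] = result
        return result


model_calls: List[str] = []


def charge_model(payload: dict) -> str:
    # Simulated billable model call.
    model_calls.append(payload["prompt"])
    return f"answer to {payload['prompt']}"


store = StepStore()
message = {"run_id": "run-42", "step": 9, "payload": {"prompt": "summarize"}}

# The queue delivers the same message twice.
first = store.run_once(message["run_id"], message["step"], message["payload"], charge_model)
second = store.run_once(message["run_id"], message["step"], message["payload"], charge_model)
```

This covers one step of one run. Multiply it by every step, add state hand-off and crash recovery, and the maintenance cost described above comes into view.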

Temporal collapses that work. State between steps is the workflow's own local variables, persisted for you. Re-execution is largely handled because a replayed workflow does not re-run activities that already completed; it reads their results from history. An activity that was interrupted before its result was recorded can still run again, so activities with external side effects should remain idempotent. Which step the run is on is the event history, visible in a UI you did not build. Retry policy lives in one place per activity.

The honest version: you do not adopt Temporal to write less code on day one. You adopt it so the reliability code you would otherwise write, and keep rewriting, is no longer yours to maintain. Building it out does mean writing typed SDK code — workflow definitions, activity stubs, worker registration — and that is where an AI-native editor earns its place.

Where Temporal makes you pay

None of this is free in effort. Three costs are worth knowing before you commit.

Determinism is the big one. Workflow code is replayed, so it cannot do anything non-deterministic directly: no reading the wall clock, no unseeded random numbers, no direct network calls, no reading a file. Those go through activities or the SDK's deterministic equivalents (the TypeScript SDK, for instance, runs workflows in a sandbox that swaps in deterministic versions of Date.now() and Math.random()). Break the rule and a replay diverges from history, which surfaces as an error at the worst possible time. The constraint is learnable, but it is a real shift in how you write the orchestration layer.
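A toy replay checker shows why the rule exists. This is a sketch in plain Python, not how Temporal is implemented: `Recorder`, `command`, `recorded`, and the fake clock are all hypothetical. The unsafe workflow branches on a value read from outside the history; the safe one records that value first and reads it back on replay.

```python
from typing import Any, Callable, List, Optional, Tuple


class NonDeterminismError(Exception):
    pass


class Recorder:
    """Toy history: records on first run, verifies on replay."""

    def __init__(self, history: Optional[List[Tuple[str, Any]]] = None) -> None:
        self.history: List[Tuple[str, Any]] = list(history or [])
        self._pos = 0

    def _next_recorded(self) -> Optional[Tuple[str, Any]]:
        entry = self.history[self._pos] if self._pos < len(self.history) else None
        self._pos += 1
        return entry

    def command(self, name: str) -> None:
        entry = self._next_recorded()
        if entry is None:
            self.history.append(("command", name))
        elif entry != ("command", name):
            raise NonDeterminismError(f"history has {entry}, replay produced {name!r}")

    def recorded(self, fn: Callable[[], Any]) -> Any:
        entry = self._next_recorded()
        if entry is None:
            value = fn()
            self.history.append(("value", value))
            return value
        if entry[0] != "value":
            raise NonDeterminismError(f"history has {entry}, replay expected a value")
        return entry[1]  # replay: reuse the value from the first run


clock = {"hour": 9}  # fake wall clock so the example is repeatable


def current_hour() -> int:
    return clock["hour"]


def unsafe_workflow(rec: Recorder) -> None:
    model = "fast" if current_hour() < 12 else "thorough"  # reads the clock directly
    rec.command(f"call_model:{model}")


def safe_workflow(rec: Recorder) -> None:
    hour = rec.recorded(current_hour)  # the value goes through the history
    model = "fast" if hour < 12 else "thorough"
    rec.command(f"call_model:{model}")


first_unsafe = Recorder()
unsafe_workflow(first_unsafe)
first_safe = Recorder()
safe_workflow(first_safe)

clock["hour"] = 15  # the worker restarts in the afternoon

safe_workflow(Recorder(first_safe.history))  # replays without complaint
try:
    unsafe_workflow(Recorder(first_unsafe.history))
    diverged = False
except NonDeterminismError:
    diverged = True
```

The two workflows do the same thing on a first run. They differ only on replay, which is why this class of bug tends to stay hidden until a worker restarts.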

Versioning is the second. Because old workflows replay old history, changing a running workflow's code can break in-flight executions. Temporal gives you patching APIs for this, but long-lived agent workflows — ones that sleep for days — mean you will hit it. You have to treat code changes the way you treat database migrations.
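The idea behind a patching API can be sketched as a marker in the history. This is a simplified toy, assuming the common semantics that a new run records the marker and takes the new path, while a replay takes the new path only if the marker is already there. `ToyHistory`, `patched`, and the patch id are hypothetical, not SDK calls.

```python
from typing import Iterable, List, Set


class ToyHistory:
    """Toy model of patch markers stored alongside a run's history."""

    def __init__(self, markers: Iterable[str] = (), replaying: bool = False) -> None:
        self.markers: Set[str] = set(markers)
        self.replaying = replaying

    def patched(self, patch_id: str) -> bool:
        if self.replaying:
            # Old runs have no marker, so they keep following the old path.
            return patch_id in self.markers
        # New runs record the marker and take the new path.
        self.markers.add(patch_id)
        return True


def approval_workflow(history: ToyHistory) -> List[str]:
    if history.patched("add-risk-check"):
        return ["risk_check", "request_approval"]  # new code path
    return ["request_approval"]                    # path old runs were started on


# A run started before the deploy, now replaying under the new code.
in_flight = approval_workflow(ToyHistory(replaying=True))

# A run started after the deploy, then replayed later.
fresh_history = ToyHistory()
fresh = approval_workflow(fresh_history)
replayed_fresh = approval_workflow(ToyHistory(fresh_history.markers, replaying=True))
```

Both branches have to stay in the code until every run that started on the old path has finished, which is what makes this feel like a database migration.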

Operations is the third. Self-hosting means running the service plus a database and keeping event history from growing without bound. Temporal Cloud removes that, but its usage-based billing scales with how many actions your workflows take, and a chatty agent loop generates a lot of actions. Model the cost before you move a high-volume workload onto it.
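A back-of-envelope model is enough to start. Every number in the sketch below is a placeholder, not a quoted price or a measured workload: the rate per million actions, the actions generated per step, and the fixed per-run overhead are assumptions to replace with figures from the current pricing page and your own traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    runs_per_month: int
    steps_per_run: int          # model and tool calls in one agent run
    actions_per_step: int       # assumption: billable actions per step
    fixed_actions_per_run: int  # assumption: per-run overhead such as start and signals

    def actions_per_month(self) -> int:
        per_run = self.steps_per_run * self.actions_per_step + self.fixed_actions_per_run
        return self.runs_per_month * per_run


def monthly_cost(workload: Workload, price_per_million_actions: float) -> float:
    return workload.actions_per_month() / 1_000_000 * price_per_million_actions


# Placeholder inputs: a 14-step agent run, 100,000 runs a month.
agent = Workload(runs_per_month=100_000, steps_per_run=14,
                 actions_per_step=2, fixed_actions_per_run=2)
actions = agent.actions_per_month()
cost = monthly_cost(agent, price_per_million_actions=10.0)  # placeholder rate
```

The useful output is the sensitivity: doubling the steps per run roughly doubles the bill, so a chatty loop is the first thing to measure.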

For a single short-lived agent call, Temporal is overkill — a plain retry wrapper is the right tool. The line to cross is when runs are long, span multiple services, wait on humans or timers, or cannot afford to lose state. That is the workload driving the 3,000-customer figure, and it is one that genuinely lacked a clean answer before.

The most common first-week mistake is calling an LLM SDK directly inside workflow code. Model calls are non-deterministic and belong in an activity, where the response is recorded in history once. If a workflow replays and the model returns different text than it did originally, the code can take a different branch than the recorded history, and Temporal reports a non-determinism error instead of letting the run continue. Keep workflow code to control flow only.

Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.