Temporal is the gold standard for durable execution. If you need long-running workflows that survive crashes, it's the first thing most teams evaluate.
But then you read the docs. And you discover what Temporal actually requires.
The Cluster Problem
Temporal needs a cluster. Either you run it yourself (Temporal Server + Cassandra/PostgreSQL + Elasticsearch) or you pay for Temporal Cloud.
Self-hosted means:
- Temporal Server (3+ nodes for HA)
- Cassandra or PostgreSQL for persistence
- Elasticsearch for advanced visibility (newer server versions can use a supported SQL database instead)
- Monitoring, upgrades, schema migrations
- A team that understands Temporal internals when something breaks at 2am
This is fine if you're Uber. If you're a team of 5 building an AI agent pipeline, it's a lot of infrastructure for "I want my workflow to survive a crash."
The Determinism Problem
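To rebuild a workflow's state after a worker restarts, Temporal replays your workflow code against the recorded event history. This means your workflow functions must be deterministic: given the same history, they must make the same decisions, with no direct side effects.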
```python
# These all break Temporal workflows:
import random
from datetime import datetime

import requests

random.randint(1, 100)                   # non-deterministic
datetime.now()                           # different on replay
requests.get("https://api.example.com")  # side effect
```
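To see why these calls break replay, here is a toy model of the mechanism. It is not Temporal's engine; `run`, `NondeterminismError`, and both workflows are hypothetical names. It only loosely mirrors how Temporal compares the commands a replayed workflow emits against its recorded history. A counter stands in for `datetime.now()`, so the replay sees a different "time" than the first run and takes a different branch.

```python
import itertools


class NondeterminismError(Exception):
    """Raised when replayed workflow code diverges from the recorded history."""


def run(workflow, history=None):
    # First run: record the commands the workflow emits.
    # Replay: re-execute the same code and require identical commands.
    commands = list(workflow())
    if history is not None and commands != history:
        raise NondeterminismError(f"expected {history}, got {commands}")
    return commands


def deterministic_workflow():
    yield "extract"
    yield "load"


# Stands in for datetime.now(): every call returns a new value.
_clock = itertools.count()


def clock_workflow():
    yield "extract"
    # Branching on "the current time" makes the commands depend on when the code runs.
    yield "load_even" if next(_clock) % 2 == 0 else "load_odd"


history = run(deterministic_workflow)
assert run(deterministic_workflow, history) == history  # replay succeeds

clock_history = run(clock_workflow)  # first run takes the "load_even" branch
try:
    run(clock_workflow, clock_history)  # replay sees a new clock value
    replay_failed = False
except NondeterminismError:
    replay_failed = True

print(replay_failed)  # True
```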
Every developer on the team needs to learn this. A new hire writes datetime.now() in a workflow, replay breaks in production, and nobody understands why until someone reads Temporal's determinism docs.
Activities solve this: you move non-deterministic code into activities. But that means restructuring your code around Temporal's execution model. Your agent code now has to know it's running inside Temporal.
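A toy model of the activity pattern shows both the fix and the cost. Again, this is not the Temporal SDK; `History`, `activity`, and `rewind` are hypothetical names. The idea it illustrates is that activity results are recorded in history on the first run, so a replay receives the recorded values instead of re-running the non-deterministic code.

```python
import random
from datetime import datetime
from typing import Any, Callable


class History:
    """Toy event history: activity results stored in call order."""

    def __init__(self) -> None:
        self.results: list[Any] = []
        self.cursor = 0

    def activity(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.results):
            # Replay: return the recorded result instead of calling fn again.
            result = self.results[self.cursor]
        else:
            # First execution: run the non-deterministic code and record its result.
            result = fn()
            self.results.append(result)
        self.cursor += 1
        return result

    def rewind(self) -> None:
        # Simulate a worker restart: replay from the start of the history.
        self.cursor = 0


def workflow(h: History) -> tuple[datetime, int]:
    # Every non-deterministic call is routed through h.activity, so the
    # workflow code itself has to be written around this execution model.
    started = h.activity(datetime.now)
    roll = h.activity(lambda: random.randint(1, 100))
    return started, roll


h = History()
first = workflow(h)
h.rewind()
assert workflow(h) == first  # replay returns the same timestamp and number
```

Replay now works, but only because `workflow` knows to wrap each call in `h.activity`, which is the coupling the paragraph above describes.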
What If You Just Didn't
```python
import os

from axme import AxmeClient, AxmeClientConfig

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

# Send the intent to the target agent; the platform holds its state.
intent_id = client.send_intent({
    "intent_type": "intent.pipeline.process.v1",
    "to_agent": "agent://myorg/production/data-pipeline",
    "payload": {
        "steps": ["extract", "validate", "transform", "load"],
        "source": "postgres-main",
        "destination": "warehouse",
    },
})

# Wait for the intent to finish and collect its result.
result = client.wait_for(intent_id)
```
No cluster. No determinism constraints. Write normal Python. Call datetime.now() all you want.
The state lives in the platform (managed PostgreSQL). Your agent is stateless. If it crashes, the platform redelivers the intent. If it needs human approval mid-workflow, the platform handles the wait.
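The redelivery model described above can be sketched with a toy in-memory platform. This is an illustration under assumptions, not AXME's implementation or API: `ToyPlatform`, `deliver`, and `stateless_agent` are hypothetical names, and a crash is simulated by raising an exception. The agent keeps nothing between deliveries; it reads progress from platform-owned state and resumes after the last completed step.

```python
from collections import deque


class ToyPlatform:
    """Toy stand-in for a platform that owns state and redelivers work.
    Hypothetical: not AXME's implementation."""

    def __init__(self) -> None:
        self.queue: deque[str] = deque()
        self.state: dict[str, list[str]] = {}  # intent_id -> completed steps

    def send(self, intent_id: str) -> None:
        self.state[intent_id] = []
        self.queue.append(intent_id)

    def deliver(self, handler) -> None:
        # At-least-once delivery: an intent is done only when the handler
        # returns; if it raises (a simulated crash), the intent is requeued.
        while self.queue:
            intent_id = self.queue.popleft()
            try:
                handler(intent_id, self.state[intent_id])
            except RuntimeError:
                self.queue.append(intent_id)


STEPS = ["extract", "validate", "transform", "load"]
crashes = {"transform"}  # crash the first time this step runs


def stateless_agent(intent_id: str, done: list[str]) -> None:
    # Progress lives in the platform's state, not in the agent, so a
    # redelivered intent skips steps that already completed.
    for step in STEPS:
        if step in done:
            continue
        if step in crashes:
            crashes.discard(step)
            raise RuntimeError(f"agent crashed during {step}")
        done.append(step)


platform = ToyPlatform()
platform.send("intent-1")
platform.deliver(stateless_agent)
print(platform.state["intent-1"])  # ['extract', 'validate', 'transform', 'load']
```

Because delivery is at-least-once in this model, the agent must tolerate seeing the same intent twice, which is why it checks recorded progress before running each step. Note that normal Python (including `datetime.now()`) is fine here, since nothing is replayed.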
Side-by-Side
| | Temporal | AXME |
|---|---|---|
| Infrastructure | Cluster (self-hosted or Cloud) | Managed API |
| Determinism constraints | Required for workflow code | None |
| Learning curve | Weeks (activities, signals, queries, replay) | Hours |
| Human approval | Build it yourself (signals + UI + notifications) | Built-in |
| Crash recovery | Replay-based (determinism required) | Redelivery-based (stateless agent) |
| Setup time | Days to weeks | pip install axme |
When Temporal Is Still the Right Choice
Temporal is better when you have:
- Complex compensation logic (sagas with rollbacks across 10 services)
- A dedicated platform team to operate the cluster
- Workflows with hundreds of steps and complex branching
- Existing investment in the Temporal ecosystem
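Compensation logic like the first item above is usually modeled as a saga: each step pairs an action with an undo, and a failure runs the undos for the completed steps in reverse order. Below is a minimal in-process sketch with hypothetical step names. What a workflow engine like Temporal adds is durability, so compensation still runs if the process crashes midway; this toy does not attempt that.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"did {name}")
            completed.append(step)
        except Exception:
            log.append(f"failed {name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"undid {done_name}")
            return False
    return True


def fail() -> None:
    raise RuntimeError("payment service unavailable")


log: list[str] = []
ok = run_saga(
    [
        ("reserve_inventory", lambda: None, lambda: None),
        ("book_shipping", lambda: None, lambda: None),
        ("charge_card", fail, lambda: None),
    ],
    log,
)
print(ok)        # False
print(log[-2:])  # ['undid book_shipping', 'undid reserve_inventory']
```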
If your use case is "durable execution for agent operations with human approval," you don't need a workflow engine. You need a coordination layer.
Try It
Working example: a durable multi-step pipeline with crash recovery, no cluster, and no determinism constraints:
github.com/AxmeAI/durable-execution-with-human-approval
Built with AXME: durable execution without the cluster. AXME is in alpha; feedback welcome.