
Jangwook Kim

Originally published at effloow.com

Temporal for AI Agents: Durable Execution Guide 2026

Why Your AI Agent Dies in Production (And What to Do About It)

You deploy an AI research agent. It works perfectly in demos. It searches the web, calls APIs, writes files, loops through 50 documents. Then on day three, the server restarts mid-task. The agent starts over from scratch—wasting API tokens, duplicating writes, and confusing downstream systems.

This is not a rare edge case. Long-running AI agents fail constantly in production because most frameworks treat each LLM call as a fire-and-forget operation with no memory of what already happened. The agent's state lives in memory, and when memory disappears, so does the progress.

Temporal solves this. Its durable execution model records every step of your workflow as an immutable event history. If the process dies at step 47 of 100, Temporal replays the event log and resumes at step 48—not step 1.

In February 2026, Temporal raised a $300M Series D at a $5B valuation, led by Andreessen Horowitz. The signal is clear: durable execution has moved from a niche infrastructure concern to a core requirement for production AI systems. OpenAI, Replit, and Lovable build their agents on Temporal. ADP uses it for human-in-the-loop agentic processes. Abridge uses it to serve ambient AI across 200+ healthcare systems.

This guide explains when you need Temporal, how it works, and how to build an AI agent workflow with the Python SDK.


What Temporal Actually Does

Temporal is a durable execution runtime. That phrase deserves unpacking.

When you write a Temporal workflow in Python, it looks like ordinary async code. But under the hood, every step is recorded in an append-only event history stored in the Temporal service. If your worker crashes, a new worker picks up the event log and replays the workflow from the beginning—skipping already-completed activities and resuming exactly where execution stalled.

Three properties flow from this design:

1. Crash recovery without code changes. Your workflow code does not need try/except for infrastructure failures. If the network blips during an LLM call, Temporal retries automatically according to a configurable retry policy. If the entire worker process dies, the next worker instance replays the history and continues.

2. State over arbitrary time horizons. Temporal workflows can run for minutes, days, or years. The workflow pauses and sleeps without consuming compute resources, then wakes up to continue. A document review agent that waits three days for a human approval is trivial to implement with workflow.wait_condition() and a signal handler (see the sketch after this list).

3. Observability and debuggability. Every state transition is visible in the Temporal Web UI. You can pause workflows, inspect their event history, and replay specific executions for debugging.
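
Here is a minimal sketch of that approval pattern. The approve signal and the DocumentReviewWorkflow name are illustrative, not part of Temporal's API; the three-day window matches the example above.

import asyncio
import datetime
from temporalio import workflow

@workflow.defn
class DocumentReviewWorkflow:
    def __init__(self) -> None:
        self._approved = False

    @workflow.signal
    async def approve(self) -> None:
        self._approved = True

    @workflow.run
    async def run(self, doc_id: str) -> str:
        try:
            # Durably waits up to three days; the workflow consumes no compute while idle
            await workflow.wait_condition(
                lambda: self._approved,
                timeout=datetime.timedelta(days=3),
            )
        except asyncio.TimeoutError:
            return f"{doc_id}: approval window expired"
        return f"{doc_id}: approved"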


The Core Model: Workflows and Activities

Temporal divides your application into two types of code, and the distinction is not optional.

Workflows: Deterministic Orchestrators

A workflow is the coordination layer. It defines the sequence of steps, handles signals and queries, and manages control flow. The critical constraint: workflow code must be deterministic. You cannot call datetime.now(), random.random(), read from the filesystem, or make HTTP requests inside a workflow. Every time Temporal replays the event history, the workflow must produce the same decisions.
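For the common cases, the SDK ships replay-safe replacements, so the constraint is less painful than it sounds:

# Inside workflow code, use the deterministic equivalents:
now = workflow.now()        # instead of datetime.now()
rng = workflow.random()     # instead of random.random()
run_id = workflow.uuid4()   # instead of uuid.uuid4()

The workflow below applies these rules to a minimal research agent: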

import dataclasses
import datetime
from temporalio import workflow

@dataclasses.dataclass
class ResearchInput:
    query: str
    max_steps: int = 10

@workflow.defn
class ResearchAgentWorkflow:
    def __init__(self) -> None:
        self._paused = False

    @workflow.signal
    async def pause(self) -> None:
        self._paused = True

    @workflow.query
    def is_paused(self) -> bool:
        return self._paused

    @workflow.run
    async def run(self, inp: ResearchInput) -> str:
        results = []
        for step in range(inp.max_steps):
            # Pause support: block here whenever a pause signal has arrived
            await workflow.wait_condition(lambda: not self._paused)
            result = await workflow.execute_activity(
                run_research_step,
                args=[inp.query, step],
                start_to_close_timeout=datetime.timedelta(minutes=5),
            )
            results.append(result)
            if "[DONE]" in result:
                break

        return "\n".join(results)

Activities: Non-Deterministic Workers

An activity is where side effects live. LLM API calls, database writes, file reads, HTTP requests—all of these belong in activities. Each activity attempt executes your code once; if an attempt fails or times out, Temporal retries it automatically according to the retry policy and records the result in the event history.

from temporalio import activity
from temporalio.common import RetryPolicy

@activity.defn
async def run_research_step(query: str, step: int) -> str:
    # Heartbeat keeps Temporal informed the activity is alive
    activity.heartbeat(f"Running step {step}")

    # Your LLM call goes here — crashes here will be retried
    response = await call_llm(f"Research step {step} for: {query}")
    return response

The retry policy lets you tune backoff behavior, cap attempts, and exclude certain errors from retries:

retry_policy = RetryPolicy(
    initial_interval=datetime.timedelta(seconds=1),
    backoff_coefficient=2.0,
    maximum_interval=datetime.timedelta(seconds=60),
    maximum_attempts=5,
    non_retryable_error_types=["InvalidInputError", "AuthenticationError"],
)

Pass it to execute_activity() and Temporal handles the retry loop. Your workflow code stays clean.
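
For example, attaching the policy to the research step from earlier looks like this (a sketch; it assumes the call sits inside the workflow's run method):

result = await workflow.execute_activity(
    run_research_step,
    args=[inp.query, step],
    start_to_close_timeout=datetime.timedelta(minutes=5),
    retry_policy=retry_policy,  # Temporal applies backoff and attempt limits for you
)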


Setting Up a Local Development Environment

Install the Python SDK and the Temporal CLI:

pip install temporalio==1.10.0
brew install temporal         # macOS
# Windows/Linux: download from github.com/temporalio/cli

Start the local development server:

temporal server start-dev
# Temporal Service:  localhost:7233
# Web UI:            http://localhost:8233

The dev server runs entirely in memory. For persistence across restarts, add --db-filename temporal.db.

Connect from your Python worker:

import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

async def main():
    client = await Client.connect("localhost:7233")

    async with Worker(
        client,
        task_queue="ai-agents",
        workflows=[ResearchAgentWorkflow],
        activities=[run_research_step],
    ):
        # Worker is running; start workflows via client
        handle = await client.start_workflow(
            ResearchAgentWorkflow.run,
            ResearchInput(query="transformer attention mechanisms"),
            id="research-001",
            task_queue="ai-agents",
        )
        result = await handle.result()
        print(result)

if __name__ == "__main__":
    asyncio.run(main())

Open http://localhost:8233 to see your workflow's event history in real time.


OpenAI Agents SDK Integration

On March 23, 2026, Temporal's OpenAI Agents SDK integration became Generally Available. This is the most significant addition to the Temporal AI ecosystem since Nexus.

The integration works through two mechanisms. First, the TemporalRunner class wraps the OpenAI Agents SDK runner so every agent invocation executes as a Temporal Activity. Second, the activity_as_tool helper converts Temporal activity functions into OpenAI-compatible tool schemas automatically.

from agents import Agent  # the OpenAI Agents SDK installs as `openai-agents` but imports as `agents`
from temporalio import activity, workflow
from temporalio.contrib.openai_agents import TemporalRunner, activity_as_tool

# Your Temporal activity becomes an agent tool
@activity.defn
async def search_documents(query: str) -> str:
    # Search your knowledge base
    return await vector_search(query)

@activity.defn
async def save_finding(finding: str) -> None:
    await db.insert(finding)

# Convert activities to OpenAI tools
agent = Agent(
    name="ResearchAgent",
    instructions="Research and save findings on the given topic.",
    tools=[
        activity_as_tool(search_documents),
        activity_as_tool(save_finding),
    ],
)

@workflow.defn
class DurableAgentWorkflow:
    @workflow.run
    async def run(self, topic: str) -> str:
        return await TemporalRunner.run(agent, topic)

The result: your OpenAI agent's tool calls are backed by Temporal's durable execution. If the agent is halfway through a 50-step research task and the server restarts, it resumes from the last completed tool call rather than restarting.


Temporal vs LangGraph: When to Use Each

| Dimension | Temporal | LangGraph |
| --- | --- | --- |
| Primary role | Durable execution runtime | Agent reasoning / tool flow |
| State persistence | Full event history, survives crashes | Checkpoints between nodes only |
| Execution horizon | Minutes to years | Seconds to minutes (typically) |
| Retry logic | Built-in, configurable per-activity | Manual implementation |
| Human-in-the-loop | Signals + wait conditions, native | Possible but complex |
| Observability | Web UI, event history, metrics | LangSmith integration required |
| Learning curve | Moderate (workflow/activity split) | Low (Python classes) |
| Best fit | Production agents, multi-day tasks | Rapid LLM prototyping |

The important nuance: these tools are not competitors. The most robust production systems use both. LangGraph handles agent reasoning and multi-step tool planning inside a single activity, while Temporal wraps the outer workflow and handles retries, state durability, and long-running coordination. LangGraph's checkpointers only save state between graph nodes; they do not persist state inside a node. When your agent crashes mid-node, LangGraph re-runs that node from the top. Temporal does the same at the activity level, but it first replays the event history, so every previously completed activity keeps its recorded result and only the failed activity is retried.
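
A rough sketch of the combined pattern, assuming a compiled LangGraph graph named agent_graph that exposes the standard ainvoke entry point (the graph construction and state schema are yours and are omitted here):

from temporalio import activity

@activity.defn
async def run_langgraph_agent(question: str) -> str:
    # The entire LangGraph reasoning loop runs inside one activity attempt;
    # if it fails partway, Temporal retries the whole attempt per the retry policy.
    # The message shape below depends on your graph's state schema.
    state = await agent_graph.ainvoke({"messages": [("user", question)]})
    return state["messages"][-1].content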

Use LangGraph first to prove agent behavior. Add Temporal when you need that behavior to survive production.


Common Mistakes When Building Temporal AI Agents

1. Putting LLM calls inside the workflow function.

Workflow code replays deterministically. An LLM call inside @workflow.run will be re-executed on every replay with unpredictable results. Always move LLM calls to @activity.defn functions.

2. Not setting heartbeat_timeout on long-running activities.

A long LLM batch job that goes silent looks like a hung process to Temporal. Set heartbeat_timeout and call activity.heartbeat() periodically so Temporal knows the activity is alive and can detect genuine hangs.

# In the workflow:
result = await workflow.execute_activity(
    batch_llm_job,
    args=[items],
    start_to_close_timeout=datetime.timedelta(hours=2),
    heartbeat_timeout=datetime.timedelta(seconds=30),
)

# In the activity:
@activity.defn
async def batch_llm_job(items: list[str]) -> list[str]:
    results = []
    for i, item in enumerate(items):
        results.append(await process_item(item))
        activity.heartbeat(f"Processed {i+1}/{len(items)}")
    return results

3. Passing large payloads through workflow signals.

Workflow event history is stored in Temporal's database. If you pass a 50MB document through a signal or as a workflow result, you'll hit payload size limits and degrade performance. Store large artifacts in S3 or a database; pass only identifiers through Temporal.
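
A sketch of the pattern, with fetch_document, call_llm, and upload_to_s3 standing in for whatever storage and LLM helpers you already have (all three are hypothetical):

from temporalio import activity

@activity.defn
async def summarize_document(doc_id: str) -> str:
    text = await fetch_document(doc_id)              # hypothetical storage helper
    summary = await call_llm(f"Summarize:\n{text}")  # hypothetical LLM wrapper
    key = await upload_to_s3(f"summaries/{doc_id}.txt", summary)
    return key  # only this small identifier enters the event history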

4. Creating a new workflow for every sub-task instead of using child workflows or activities.

Each workflow execution has overhead. For parallelism within a task, prefer workflow.execute_activity() with multiple concurrent calls, or use child workflows only when you need separate lifecycle management and cancellation scope.
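
For example, fanning out ten research steps concurrently inside one workflow (a sketch that assumes it runs inside a @workflow.run method with query in scope):

import asyncio

results = await asyncio.gather(*[
    workflow.execute_activity(
        run_research_step,
        args=[query, step],
        start_to_close_timeout=datetime.timedelta(minutes=5),
    )
    for step in range(10)
])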

5. Ignoring non-retryable errors.

Rate limit errors are retryable. Authentication errors are not. Passing non_retryable_error_types in your RetryPolicy prevents Temporal from endlessly retrying errors that require human intervention.


Temporal Cloud vs Self-Hosted

For most teams, Temporal Cloud is the right starting point. It handles server management, persistence, high availability, and monitoring. The free tier is generous for development, and production plans start at around $200/month with usage-based activity execution fees.

Self-hosted Temporal requires running the Temporal server (a Go binary), a persistence store such as Cassandra or PostgreSQL, and optionally Elasticsearch for advanced visibility queries. This makes sense for organizations with strict data residency requirements or at very high scale. The open-source server is MIT licensed.

Temporal Cloud's high-availability tier adds Multi-Region Replication with a 99.99% uptime SLA, sub-1-minute RPO, and 20-minute RTO. The same-region replication option is currently in public preview for additional redundancy within a single region.


Frequently Asked Questions

Q: Can I use Temporal with Claude or Gemini instead of OpenAI?

Yes. The core Temporal workflow and activity pattern works with any LLM. The activity_as_tool helper and TemporalRunner are specific to the OpenAI Agents SDK, but you can wrap any LLM call in a Temporal activity and get the same durability guarantees. Anthropic's documentation also covers Temporal patterns for Claude-based agents.
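
For instance, a Claude call wrapped in an activity looks like any other activity. This is a sketch that assumes the anthropic Python package and an ANTHROPIC_API_KEY in the environment; the model name is illustrative:

import anthropic
from temporalio import activity

@activity.defn
async def call_claude(prompt: str) -> str:
    client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text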

Q: Does Temporal work with LangChain tools?

Temporal activities can call any Python code, including LangChain tools. You can wrap a LangChain tool invocation in a @activity.defn function to get retry logic and durability. The integration is manual rather than native, unlike the OpenAI Agents SDK helper.

Q: How does Temporal handle the cost of replaying workflows?

Workflow replay does not re-execute activities—it only re-runs the workflow's coordination code using the recorded outputs from the event history. API calls, database writes, and LLM calls are not repeated during replay. This is the core guarantee that makes Temporal safe and cost-efficient.

Q: What's the difference between start_to_close_timeout and schedule_to_close_timeout?

start_to_close_timeout is the maximum time from when an activity starts executing to when it must complete. It applies per-attempt. schedule_to_close_timeout is the maximum total time from scheduling the activity to completing it, including all retry attempts. Set start_to_close_timeout for LLM calls where each attempt has a predictable upper bound. Use schedule_to_close_timeout when you want to cap total execution time regardless of retries.
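
In code, the two can be combined on a single call (a sketch; call_llm_activity and prompt are placeholders):

result = await workflow.execute_activity(
    call_llm_activity,                                         # hypothetical activity
    args=[prompt],
    start_to_close_timeout=datetime.timedelta(minutes=2),      # cap per attempt
    schedule_to_close_timeout=datetime.timedelta(minutes=10),  # cap across all retries
)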


Key Takeaways

Temporal gives AI agents a durability layer that agent frameworks alone cannot provide. The core design—deterministic workflows coordinating non-deterministic activities—maps cleanly onto how LLM agents work: LLM calls, tool uses, and API requests are all non-deterministic operations that belong in activities, while the sequencing and control flow logic belongs in the workflow.

The February 2026 $300M raise and the GA of the OpenAI Agents SDK integration signal that the industry has reached consensus on this pattern. If you are building agents that do meaningful work over non-trivial time horizons, Temporal is the infrastructure layer that makes them production-grade.

The Effloow Lab sandbox PoC confirmed that temporalio 1.10.0 installs cleanly, all core patterns compile as documented, and the activity_as_tool helper is present in the package. For hands-on setup, start with temporal server start-dev and the official Temporal AI cookbook.

Bottom Line

Temporal is not a replacement for LangGraph or the OpenAI Agents SDK—it is the reliability layer underneath them. If your AI agent runs longer than a few seconds or touches external systems, add Temporal to survive the production environment.
