DEV Community: Nebula

LangChain Deep Agents vs OpenAI Agents SDK (2026)

Nebula — Tue, 24 Mar 2026 20:01:07 +0000

If you're building AI agents in Python right now, two frameworks are competing for your attention: LangChain Deep Agents (launched March 15, 2026) and the OpenAI Agents SDK (early March 2026). Both promise production-ready multi-agent orchestration. Both have real traction -- Deep Agents hit 9.9k GitHub stars in 5 hours, while the Agents SDK formalized patterns thousands of teams were already hacking together with OpenAI's experimental Swarm library.

But they solve the problem from fundamentally different directions. Deep Agents is an agent harness -- batteries-included with planning, filesystem context management, and subagent spawning baked in. The Agents SDK is a lightweight toolkit -- minimal primitives (agents, handoffs, guardrails) that you compose with Python. Picking the wrong one means rewriting your orchestration layer in three months.

This comparison breaks down the architectures, shows code side-by-side, and gives you a decision framework so you can pick the right tool for your use case.

TL;DR

Deep Agents wins for long-horizon, stateful tasks (research sessions, coding agents, multi-step analysis) where you need built-in planning and filesystem-based context management.

OpenAI Agents SDK wins for multi-agent handoff workflows (triage + specialists) where you want the simplest possible setup with built-in tracing and guardrails.

Neither wins for teams that want agent capabilities without writing orchestration code -- that's where managed platforms like Nebula fit.

Skip to the comparison table or the decision framework.

Quick Comparison Table

Feature	LangChain Deep Agents	OpenAI Agents SDK
Architecture	Agent harness on LangGraph	Lightweight standalone SDK
Language	Python (+ TypeScript SDK)	Python + TypeScript
Planning	Built-in `write_todos` tool	Manual (you build it)
Memory	LangGraph Memory Store + filesystem	Sessions (persistent working context)
Multi-Agent	Subagent via `task` tool (context isolation)	Handoffs + Triage pattern
Context Management	Auto-summarization + file offload	Conversation context (ephemeral)
Tracing	LangSmith / LangGraph Studio	OpenAI Dashboard (built-in, zero config)
Guardrails	Via LangGraph middleware	Input/output guardrails built-in
Human-in-the-Loop	LangGraph interrupts	SDK pause/resume
Model Support	Any LLM (model-agnostic)	OpenAI-first (others via params)
MCP Support	Via LangChain MCP integration	Built-in MCP server tool calling
Learning Curve	Medium-High (LangGraph required)	Low-Medium
Best For	Long-running stateful tasks	Multi-agent handoff workflows
Pricing	Free (OSS) + LLM costs	Free (OSS) + LLM costs

What LangChain Deep Agents Brings to the Table

Deep Agents is what LangChain calls an "agent harness" -- a layer above the basic agent loop that packages planning, context management, and subagent delegation into sensible defaults. Harrison Chase built it by reverse-engineering the patterns behind Claude Code, Deep Research, and Manus.

Planning That Doesn't Require Prompt Hacking

The built-in write_todos tool forces the agent to decompose tasks into explicit steps. This isn't a side feature -- on trajectories of 50-100 tool calls, it's the difference between an agent that stays on track and one that drifts.

from deepagents import create_deep_agent

agent = create_deep_agent(
    model="openai:gpt-4o",
    tools=[web_search, analyze_data],
    system_prompt="You are a research assistant."
)

# The agent automatically gets planning, filesystem,
# shell execution, and subagent tools -- no extra config
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Research the top 5 AI agent frameworks, compare their architectures, and write a summary report."
    }]
})

With that single create_deep_agent() call, your agent can plan tasks, read/write files, spawn subagents, and manage its own context window. You didn't request these features -- they're built in.

Filesystem-Based Context Management

This is Deep Agents' most underappreciated feature. Instead of cramming everything into the LLM's context window, agents offload intermediate results to a virtual filesystem using write_file, read_file, edit_file, ls, glob, and grep.

Why this matters: a research agent processing 200 pages of documentation would overflow any context window. With filesystem tools, it writes findings to research.md, code to app.py, and reads them back as needed. The filesystem acts as a shared workspace where agents and subagents collaborate.

Deep Agents supports pluggable backends:

StateBackend (default): Stored in LangGraph state, transient per-thread
LangGraph Store: Cross-thread persistence
LocalFilesystem: Standard disk storage
CompositeBackend: Mix multiple backends
Remote sandboxes: Modal, Runloop, Daytona

Subagents for Context Isolation

The task tool spawns specialized subagents with isolated context windows. The main agent stays clean while subagents go deep on focused subtasks.

research_subagent = {
    "name": "research-agent",
    "description": "Deep research on specific topics",
    "system_prompt": "You are a thorough researcher.",
    "tools": [web_search],
    "model": "openai:gpt-4o",
}

agent = create_deep_agent(subagents=[research_subagent])

This prevents context pollution -- one of the biggest agent failure modes in production. When a subagent's 20+ tool calls don't flood the main agent's context, the main agent can coordinate effectively across multiple parallel workstreams.

Key strength: Best for long-running, stateful tasks -- research sessions, code generation, multi-step analysis. The filesystem approach is genuinely novel for context management.

Key weakness: Requires LangGraph knowledge. If you're not already in the LangChain ecosystem, the learning curve is real. The middleware abstraction (before_agent, wrap_model_call, before_tools, after_tools) adds a layer you need to understand when debugging.

What OpenAI Agents SDK Does Differently

The Agents SDK takes the opposite approach: minimal primitives, maximum composability. Three concepts handle almost everything -- Agents, Handoffs, and Guardrails. The SDK formally extends what OpenAI learned from the experimental Swarm library, but with production-grade tracing and validation.

Handoffs as a First-Class Primitive

The handoff pattern is the SDK's core innovation. Agents transfer control to each other explicitly, carrying conversation context through the transition. Think of it like a well-run support team: a triage agent classifies the request and routes it to the right specialist.

from agents import Agent, Runner

billing_agent = Agent(
    name="Billing",
    instructions="Handle billing inquiries. Access CRM and invoice tools.",
    tools=[lookup_invoice, process_refund]
)

support_agent = Agent(
    name="Support",
    instructions="Handle technical support. Access docs and ticket tools.",
    tools=[search_docs, create_ticket]
)

triage = Agent(
    name="Triage",
    instructions="Route customer queries to the right specialist.",
    handoffs=[billing_agent, support_agent]
)

result = Runner.run_sync(triage, "I was double-charged on my last invoice")
# Triage routes to billing_agent automatically

The handoff pattern is clean and scales naturally up to 8-10 agent types. Beyond that, it can get unwieldy -- but most production systems don't need more.

Guardrails Without a Separate Library

Input and output guardrails are built into the SDK as first-class primitives. Attach validation functions to any agent:

Input guardrails: Reject prompt injection, validate format, enforce policies
Output guardrails: Enforce schema, catch policy violations, validate response quality

Guardrails run in parallel with agent execution, so they don't add latency. If a check fails, the agent stops fast before wasting tokens.

Compare this to Deep Agents, where guardrails are implemented through LangGraph middleware -- more flexible, but more setup.

Zero-Config Tracing

Every agent run is automatically traced in the OpenAI Dashboard. You see which tools were called, with what arguments, the model's reasoning between steps, and how long each step took. No separate observability tool needed.

For Deep Agents, equivalent visibility requires LangSmith (LangChain's observability platform). LangSmith is powerful -- LangGraph Studio even lets you visually debug agent states in real-time -- but it's a separate service to set up and manage.

Key strength: Simplest path from zero to a working multi-agent system. If you're on OpenAI, setup takes minutes not hours. The handoff pattern is elegant and well-documented.

Key weakness: Lighter on long-horizon capabilities. No built-in planning, no filesystem context management. If your agent needs to work for 30+ minutes on a complex task, you're building those pieces yourself. Also, the SDK is OpenAI-first -- other model providers work via configuration but aren't the primary path.

When to Pick Which

Forget feature lists. Here's the decision that matters:

Pick Deep Agents if:

Your tasks are long-horizon (research, code generation, multi-step analysis that runs for 10+ minutes)
You need persistent memory across conversations and sessions
You want to use non-OpenAI models (Claude, Gemini, open-source via Ollama)
You're already in the LangChain/LangGraph ecosystem
You need filesystem-based context management for tasks that produce more output than fits in a prompt
You need subagent delegation with context isolation

Pick OpenAI Agents SDK if:

Your workflow is multi-agent handoffs (triage agent routes to specialists)
You want the simplest possible setup with minimal abstractions
You're primarily using OpenAI models (GPT-4o, GPT-5)
Built-in guardrails for input/output validation matter to you
You want tracing without a separate observability tool
Your agents handle shorter, focused tasks (customer support, lead qualification, document processing)

Consider a managed platform if:

You want agent capabilities without writing orchestration code
Your team needs agents that connect to existing tools (Slack, GitHub, Gmail, databases) out of the box
You want built-in planning, memory, safety, and multi-agent delegation without assembling it from primitives
You'd rather describe what the agent should do in natural language than write Python

Platforms like Nebula exist for this exact use case -- pre-built agent orchestration with tool integrations, so your team focuses on what the agent does rather than how it's wired together.

The Bigger Picture: Framework Fatigue Is Real

Let's zoom out. In March 2026 alone, we've seen launches from LangChain (Deep Agents), OpenAI (Agents SDK updates), Google (ADK ecosystem expansion), Anthropic (Agent SDK), and Pydantic AI (Deep Agents). That's five agent frameworks in one month from five different companies.

The pattern is familiar from the JavaScript framework wars of the 2010s: every vendor ships an opinionated framework, developers spend more time evaluating tools than building products, and the "best" framework changes every quarter.

The real question isn't which framework. It's whether you need a framework at all. For teams building AI infrastructure as their core product, frameworks like Deep Agents and the Agents SDK are essential building blocks. For teams that want agents to augment their existing product, a managed platform that abstracts the orchestration layer is often the faster path to production.

For a broader comparison of all the major frameworks, check out our Top 7 AI Agent Frameworks in 2026.

Verdict

LangChain Deep Agents is the better choice for complex, stateful, long-running tasks. The planning tool, filesystem context management, and subagent isolation solve real problems that the Agents SDK doesn't address out of the box. If your agent needs to work autonomously for extended periods -- think research assistants, coding agents, or multi-step analysis pipelines -- Deep Agents gives you the infrastructure.

OpenAI Agents SDK is the better choice for clean multi-agent handoff systems. If your use case maps to "coordinator routes to specialists" -- customer support, sales qualification, document processing -- the SDK's handoff pattern, built-in guardrails, and zero-config tracing get you to production faster with less code.

Both are open-source. Both install in one command. The best move is to prototype with both on a real task from your product and see which architecture matches your actual workflow. You can always swap later -- the underlying LLM calls are the same.

Pick the tool that matches where you are today. Ship something. Iterate.

Top 6 AI API Testing Tools for Developers (2026)

Nebula — Mon, 23 Mar 2026 22:02:46 +0000

TL;DR: For AI-native test generation from specs, try Kusho AI. For the most complete platform with the newest AI Agent Mode, go Postman. For open-source and Git-native workflows, Bruno or Hoppscotch are your best bets. Enterprise teams should evaluate Katalon. Collaboration-first smaller teams will like Testfully.

Manual API testing does not scale. You have dozens of endpoints, each with edge cases, auth flows, and payload variations. Writing test scripts by hand means spending more time maintaining tests than building features.

AI-powered API testing tools flip that equation. They ingest your OpenAPI specs, generate comprehensive test suites, and adapt when your API changes. The question is which one fits your workflow.

Here are six tools worth evaluating in 2026, compared across the features that matter most to developers.

Quick Comparison

Feature	Postman	Katalon	Kusho AI	Testfully	Bruno	Hoppscotch
AI Test Gen	Agent Mode	Built-in	Core feature	Basic	Community	Built-in
Open Source	No	No	No	No	Yes (MIT)	Yes (MIT)
Self-Hosted	No	No	No	No	Desktop app	Yes
Git-Native	Yes (v2026)	No	No	No	Yes	Partial
CI/CD	CLI + Newman	Native	CLI	CLI	CLI (bru)	CLI
Multi-Protocol	HTTP, gRPC, GraphQL, MCP, MQTT, WS	HTTP, SOAP	HTTP/REST	HTTP/REST	HTTP/REST	HTTP, GraphQL, WS
Collaboration	Cloud teams	Cloud teams	Workspaces	Workspaces	Git-based	Real-time
Free Tier	Yes	Yes	Yes	Yes	Yes (OSS)	Yes (OSS)
Best For	Full platform	Enterprise	AI-first testing	Team collab	Git workflows	Speed

Postman -- The Rebuilt Industry Standard

Postman is the most widely used API platform, and its March 2026 relaunch turned it AI-native. The headline feature is Agent Mode: an AI that works across your collections, tests, and mocks. It generates contract tests, load tests, integration tests, and end-to-end tests automatically. When a test fails, Agent Mode diagnoses the root cause and proposes a fix in the run results.

The platform is now Git-native from the ground up. Collections are stored as diffable YAML files, and you can work on the same branch you are coding on. It also added multi-protocol support -- HTTP, GraphQL, gRPC, MCP, MQTT, and WebSockets in the same collection.

Strength: Most complete platform. AI Agent Mode works across your entire API lifecycle, not just testing.
Weakness: Feature-heavy. If you just want quick API test generation, the full platform can feel like overkill.
Pricing: Free tier. Team plans from $14/user/month.
Best for: Teams that want one platform for API development, testing, documentation, and governance.

Katalon Studio -- Enterprise Unified Testing

Katalon covers API, web, mobile, and desktop testing in a single platform. Its AI generates test cases from your API specifications, and the visual test builder lets non-coders create API tests. Self-healing tests automatically update when field names or response structures change, keeping your CI/CD pipeline green without manual fixes.

The platform integrates tightly with Jira, Jenkins, GitLab CI, and most enterprise CI/CD systems. Katalon TestCloud provides cross-environment execution without managing infrastructure.

Strength: Unified testing across API + UI + mobile. Strong enterprise governance and reporting.
Weakness: Enterprise pricing is steep. The learning curve is steeper than lightweight alternatives.
Pricing: Free tier with limits. Platform plans from $175/month.
Best for: Enterprise teams that need unified API and UI test automation under one roof.

Kusho AI -- AI-Native from Day One

Kusho AI was built specifically for AI-powered API testing, not retrofitted. Point it at your OpenAPI spec and it generates comprehensive test suites -- including edge cases, boundary conditions, and security scenarios that humans typically miss.

The AI learns from each test execution to improve coverage over time. You can also describe tests in plain English and Kusho generates the corresponding test code. It integrates into CI/CD pipelines via CLI, so generated tests run automatically on every push.

Strength: Truly AI-native test generation. Catches edge cases and security issues that manual testing misses. Fast setup.
Weakness: Newer platform with a smaller community. Less mature ecosystem than established tools.
Pricing: Free tier available. Paid plans for teams.
Best for: Developers who want AI to generate comprehensive API tests from specs, without legacy platform overhead.

Testfully -- Collaboration-First API Testing

Testfully focuses on making API testing collaborative and approachable. The visual request builder, shared team workspaces, and environment management make it easy for teams to organize and run API tests together.

Testfully supports request chaining, assertions, and automated test runs. While its AI features are less mature than Postman or Kusho, the tool excels at making API testing a team sport rather than a solo developer activity.

Strength: Clean, modern UX with strong collaboration features. Low learning curve for team onboarding.
Weakness: AI capabilities are still catching up to competitors. Smaller plugin ecosystem.
Pricing: Free tier. Paid plans from $15/user/month.
Best for: Small-to-mid teams that want collaborative API testing with a modern interface.

Bruno -- Open-Source and Git-Native

Bruno stores API collections as plain files on your filesystem using the Bru markup language. No cloud account required. Collections live in your Git repo alongside your code, making API tests part of your version control workflow.

Bruno runs offline, is fully open source (MIT), and has a growing plugin ecosystem. The CLI (bru run) integrates with CI/CD pipelines. While AI features are community-driven rather than built-in, the Git-native workflow means you can pair Bruno with any external AI tool for test generation.

Strength: 100% open source. Git-native by design. No cloud dependency. Fast and lightweight.
Weakness: AI features are community-driven, not as polished as commercial tools. Smaller ecosystem.
Pricing: Free and open source. Golden Edition at $19 one-time for extra features.
Best for: Developers who want an open-source Postman alternative where API collections live in Git.

Hoppscotch -- Lightweight and Fast

Hoppscotch is a browser-based, open-source API development platform built for speed. The interface is minimal and responsive. It supports HTTP, GraphQL, WebSocket, and real-time testing, with self-hosting available for teams that want full control.

Hoppscotch includes AI-assisted test generation and real-time collaboration. The CLI enables CI/CD integration. For individual developers or small teams that want to test APIs without installing anything, it is hard to beat.

Strength: Lightning fast. Beautiful, minimal UI. Open source and self-hostable. Browser-based with no install.
Weakness: Fewer enterprise features. AI capabilities are less mature than Postman or Kusho.
Pricing: Free and open source. Enterprise plans available for self-hosted deployments.
Best for: Individual developers and small teams who want a fast, free, no-install API testing tool.

How to Choose

Your decision comes down to three factors:

Do you need AI-first test generation? Kusho AI and Postman Agent Mode lead here. Kusho is purpose-built for it. Postman wraps it into a broader platform.
Is open source a requirement? Bruno and Hoppscotch are MIT-licensed. Bruno is Git-native by design. Hoppscotch is browser-first and self-hostable.
Are you an enterprise team? Katalon gives you unified API + UI + mobile testing with governance. Postman gives you the broadest platform with the new API Catalog.

For most individual developers shipping side projects, Hoppscotch or Bruno will cover your needs for free. For teams building production APIs, Postman or Kusho AI will save the most time with AI-generated tests. For enterprises standardizing across multiple test types, Katalon is worth evaluating.

If you are building AI agents that chain multiple API calls together, testing individual endpoints is only part of the puzzle. Tools like Nebula help you orchestrate and monitor multi-API agent workflows that sit on top of your tested endpoints -- pairing well with any of the tools above for your core API testing.

This post is part of the Developer Tool Showdowns series, where we compare tools developers actually use -- with honest trade-offs, not rankings we were paid for.

How to Build an MCP Server with Python in 5 Min

Nebula — Mon, 23 Mar 2026 21:01:37 +0000

You want to give Claude (or any MCP client) access to your own custom tools. Every Python tutorial you find is 2,000+ words and 15 steps. Here's a working MCP server with two tools in under 30 lines.

The Code

Create a file called notes_server.py:

from fastmcp import FastMCP

mcp = FastMCP("Notes")

# In-memory storage
notes: dict[str, str] = {}


@mcp.tool
def add_note(name: str, content: str) -> str:
    """Save a note with a given name and content."""
    notes[name] = content
    return f"Saved note '{name}'."


@mcp.tool
def search_notes(query: str) -> list[dict]:
    """Search notes by keyword. Returns all notes containing the query string."""
    results = [
        {"name": name, "content": content}
        for name, content in notes.items()
        if query.lower() in name.lower() or query.lower() in content.lower()
    ]
    return results if results else [{"message": f"No notes found for '{query}'"}]


if __name__ == "__main__":
    mcp.run()

That's the entire server. Two tools, fully typed, ready to connect.

Install and Run

Install FastMCP:

pip install fastmcp

Test it locally with the built-in inspector:

fastmcp dev notes_server.py:mcp

This opens a browser-based inspector where you can call add_note and search_notes directly. Try adding a note, then searching for it.

Connect to Claude Desktop

To use your server inside Claude Desktop, edit the config file.

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

Add your server:

{
  "mcpServers": {
    "notes": {
      "command": "python",
      "args": ["/full/path/to/notes_server.py"]
    }
  }
}

Restart Claude Desktop. You'll see a hammer icon in the chat input indicating your tools are connected. Ask Claude to "save a note about today's meeting" and it will call add_note on your server.

How It Works

FastMCP handles all the MCP protocol details. Here's what each piece does:

FastMCP("Notes") creates a server instance. The string is the server name that MCP clients display.

@mcp.tool registers a function as an MCP tool. FastMCP reads the function's type hints and docstring to generate the tool's schema automatically. The docstring becomes the tool description that the LLM reads when deciding which tool to call.

mcp.run() starts the server using the stdio transport (the default). Claude Desktop launches your script as a subprocess and communicates over stdin/stdout.

The notes dictionary is intentionally simple. In production, you'd swap this for a database or file storage. The pattern stays the same -- FastMCP doesn't care what your function does internally, only that it has type hints and returns a value.

Add More Tools

Extending the server is just adding more decorated functions:

@mcp.tool
def delete_note(name: str) -> str:
    """Delete a note by name."""
    if name in notes:
        del notes[name]
        return f"Deleted note '{name}'."
    return f"Note '{name}' not found."

No configuration, no registration steps. Decorate, restart, done.

Quick Reference

What	Command
Install	`pip install fastmcp`
Test locally	`fastmcp dev server.py:mcp`
Run with stdio	`python server.py`
Run with HTTP	`mcp.run(transport="http", port=8000)`
Client connect	`Client("http://localhost:8000/mcp")`

What's Next

This server loses notes when you restart it. For persistence, swap the dictionary for SQLite or a JSON file. If you're building agents that need to orchestrate multiple MCP servers across services, Nebula handles that coordination layer so you can focus on the tools themselves.

The full FastMCP docs are at gofastmcp.com. The Tools guide covers advanced patterns like async tools, Pydantic validation, and custom error handling.

AI Agent Error Handling: 4 Resilience Patterns in Python

Nebula — Mon, 23 Mar 2026 20:01:16 +0000

Your AI agent works flawlessly in development. Then it hits production, OpenAI returns a 429, your fallback prompt throws a validation error, and the entire pipeline crashes at 2 AM with nobody watching.

This is not a testing problem. It is an AI agent error handling problem. LLM APIs fail in ways traditional software never does -- rate limits, non-deterministic outputs, content policy rejections, and context window overflows are not edge cases. They are daily operational realities at any meaningful scale.

This guide covers four battle-tested resilience patterns -- retry with backoff, model fallback chains, circuit breakers, and graceful degradation -- with pure Python implementations you can drop into any project. No framework lock-in, no heavy dependencies.

Why AI Agents Fail Differently Than Traditional Software

Traditional APIs fail predictably. A database is down, you get a connection error. An auth token expires, you get a 401. You can write deterministic tests for these.

LLM-powered agents introduce a fundamentally different failure model:

Rate limits (429) hit unpredictably based on tokens-per-minute quotas that fluctuate with provider load
Context window overflow happens silently as your agent accumulates tool results and conversation history
Content policy rejections vary between providers and trigger on inputs you never anticipated
Response format drift occurs when providers update models -- your perfectly structured JSON prompt returns subtly different output
Partial or malformed responses break downstream parsing without throwing obvious errors

The critical insight: these failures are not bugs to eliminate. They are operational realities to engineer around. Every production AI agent needs a resilience layer between its business logic and the LLM APIs it depends on.

Here are the four patterns that provide that layer.

Pattern 1: Smart Retry with Exponential Backoff

Retries are your first line of defense against transient failures. But naive retries on LLM APIs are dangerous -- they amplify failures, waste tokens, and can drain your budget during an outage.

The key principle: not all errors deserve a retry. Retrying a permanent failure (bad API key, malformed request) wastes time and money. Failing fast on a transient error (rate limit, timeout) loses a request that would have succeeded on the second try.

Start by classifying errors:

from enum import Enum

class ErrorType(Enum):
    TRANSIENT = "transient"    # Retry with backoff
    PERMANENT = "permanent"    # Fail immediately
    DEGRADED = "degraded"      # Switch to fallback

def classify_error(error: Exception) -> ErrorType:
    """Classify an LLM API error to determine recovery strategy."""
    error_str = str(error).lower()
    status = getattr(error, 'status_code', None)

    # Transient: retry with backoff
    if status in (429, 500, 502, 503) or 'timeout' in error_str:
        return ErrorType.TRANSIENT

    # Degraded: switch to fallback model
    if 'context_length' in error_str or 'content_filter' in error_str:
        return ErrorType.DEGRADED

    # Permanent: fail immediately
    return ErrorType.PERMANENT

Now build the retry logic. The implementation uses exponential backoff with jitter -- the jitter prevents the "thundering herd" problem where multiple agent instances all retry at exactly the same intervals after a shared rate limit:

import time
import random
import logging

logger = logging.getLogger(__name__)

def retry_with_backoff(
    func,
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: float = 1.0,
):
    """Retry a function with exponential backoff and jitter.

    Only retries on transient errors. Permanent errors fail immediately.
    Degraded errors are re-raised for the fallback layer to handle.
    """
    last_exception = None

    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as e:
            last_exception = e
            error_type = classify_error(e)

            if error_type == ErrorType.PERMANENT:
                logger.error(f"Permanent error, not retrying: {e}")
                raise

            if error_type == ErrorType.DEGRADED:
                logger.warning(f"Degraded error, passing to fallback: {e}")
                raise

            if attempt == max_retries:
                logger.error(f"All {max_retries} retries exhausted: {e}")
                raise

            # Exponential backoff: 1s, 2s, 4s... capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Add random jitter to prevent thundering herd
            delay += random.uniform(0, jitter)

            logger.warning(
                f"Transient error (attempt {attempt + 1}/{max_retries}): {e}. "
                f"Retrying in {delay:.1f}s"
            )
            time.sleep(delay)

    raise last_exception

Usage with any LLM provider:

import openai

client = openai.OpenAI()

def call_llm():
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain circuit breakers"}],
        timeout=30,
    )

# Retries transient errors up to 3 times with backoff
response = retry_with_backoff(call_llm, max_retries=3)

Two details that matter in production:

Always set a timeout on LLM calls. A request that hangs for 5 minutes during a retry cycle blocks your entire agent pipeline. 30 seconds is a reasonable default.
Track token spend across retries. Three retries of a 4K-token prompt cost 12K tokens. Add a budget cap if your agent runs autonomously.

Pattern 2: Model Fallback Chains

Retries handle transient failures within a single provider. But what happens when the provider itself is down, or when a content policy rejection is provider-specific, or when you need a model with a larger context window?

Fallback chains route requests to alternative models automatically when the primary fails:

from dataclasses import dataclass
from typing import Callable, Any

@dataclass
class ModelConfig:
    name: str
    call_fn: Callable
    cost_per_1k_tokens: float  # Track cost at each tier

class FallbackChain:
    """Routes LLM requests through a prioritized chain of models.

    Each model gets retry_with_backoff protection. If retries exhaust,
    the chain moves to the next model.
    """

    def __init__(self, models: list[ModelConfig], max_retries: int = 2):
        self.models = models
        self.max_retries = max_retries

    def call(self, messages: list[dict], **kwargs) -> dict:
        errors = []

        for i, model in enumerate(self.models):
            try:
                result = retry_with_backoff(
                    lambda m=model: m.call_fn(messages, **kwargs),
                    max_retries=self.max_retries,
                )
                if i > 0:
                    logger.info(
                        f"Fallback succeeded: {model.name} "
                        f"(after {i} failed model(s))"
                    )
                return {
                    "content": self._extract_content(result, model.name),
                    "model": model.name,
                    "fallback_used": i > 0,
                }
            except Exception as e:
                errors.append({"model": model.name, "error": str(e)})
                logger.warning(f"Model {model.name} failed: {e}")
                # Permanent errors (auth, bad request) should not fall through
                error_type = classify_error(e)
                if error_type == ErrorType.PERMANENT:
                    raise
                continue

        raise RuntimeError(f"All {len(self.models)} models failed: {errors}")

    def _extract_content(self, result, model_name: str) -> str:
        """Normalize response format across providers."""
        # OpenAI format
        if hasattr(result, 'choices'):
            return result.choices[0].message.content
        # Anthropic format
        if hasattr(result, 'content'):
            return result.content[0].text
        # Dict format
        if isinstance(result, dict):
            return result.get('content', str(result))
        return str(result)

Set up a practical fallback chain:

import openai
import anthropic

oai = openai.OpenAI()
anth = anthropic.Anthropic()

def call_gpt4o(messages, **kwargs):
    return oai.chat.completions.create(
        model="gpt-4o", messages=messages, timeout=30, **kwargs
    )

def call_claude_sonnet(messages, **kwargs):
    system = next((m["content"] for m in messages if m["role"] == "system"), "")
    user_msgs = [m for m in messages if m["role"] != "system"]
    return anth.messages.create(
        model="claude-sonnet-4-20250514", system=system,
        messages=user_msgs, max_tokens=4096, timeout=30,
    )

def call_gpt4o_mini(messages, **kwargs):
    return oai.chat.completions.create(
        model="gpt-4o-mini", messages=messages, timeout=30, **kwargs
    )

chain = FallbackChain([
    ModelConfig("gpt-4o", call_gpt4o, cost_per_1k_tokens=0.005),
    ModelConfig("claude-sonnet", call_claude_sonnet, cost_per_1k_tokens=0.003),
    ModelConfig("gpt-4o-mini", call_gpt4o_mini, cost_per_1k_tokens=0.00015),
])

# Automatically falls through: GPT-4o -> Claude -> GPT-4o-mini
result = chain.call([{"role": "user", "content": "Analyze this data..."}])
print(f"Answered by: {result['model']}, fallback: {result['fallback_used']}")

The fallback order matters. Organize by: quality first, then different provider, then cost-optimized. If GPT-4o is rate-limited, Claude Sonnet (different provider) will likely succeed. GPT-4o-mini is the last resort -- cheaper, faster, lower quality, but always available.

One design decision worth highlighting: the FallbackChain wraps each model call in retry_with_backoff. This means each model gets its own retry attempts before the chain moves on. Retries handle transient blips; fallbacks handle sustained outages.

Pattern 3: Circuit Breaker for Tool Calls

Retries and fallbacks handle individual request failures. Circuit breakers solve a different problem: what happens when a provider or tool is down for 10 minutes and every request in your system wastes 30 seconds retrying before failing?

Without a circuit breaker, a flaky external API turns every agent request into a slow failure. Your users wait, your token budget burns, and the struggling provider gets hammered with retry traffic that prevents recovery.

A circuit breaker monitors failure rates and "trips" when they exceed a threshold, immediately rejecting requests instead of attempting them:

import time
import threading

class CircuitBreaker:
    """Prevents cascading failures by fast-failing when a service is down.

    States:
        CLOSED  - Normal operation, requests pass through
        OPEN    - Service is down, requests fail immediately  
        HALF_OPEN - Testing if service recovered (one probe request)
    """

    def __init__(
        self,
        name: str,
        failure_threshold: int = 5,
        reset_timeout: float = 60.0,
        success_threshold: int = 2,
    ):
        self.name = name
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.success_threshold = success_threshold

        self._state = "CLOSED"
        self._failure_count = 0
        self._success_count = 0
        self._last_failure_time = 0.0
        self._lock = threading.Lock()

    @property
    def state(self) -> str:
        with self._lock:
            if self._state == "OPEN":
                # Check if reset timeout has elapsed
                if time.time() - self._last_failure_time >= self.reset_timeout:
                    self._state = "HALF_OPEN"
                    self._success_count = 0
            return self._state

    def call(self, func, *args, **kwargs):
        """Execute function through circuit breaker protection."""
        current_state = self.state

        if current_state == "OPEN":
            raise CircuitOpenError(
                f"Circuit '{self.name}' is OPEN. "
                f"Service unavailable, retrying in "
                f"{self.reset_timeout - (time.time() - self._last_failure_time):.0f}s"
            )

        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure()
            raise

    def _on_success(self):
        with self._lock:
            if self._state == "HALF_OPEN":
                self._success_count += 1
                if self._success_count >= self.success_threshold:
                    self._state = "CLOSED"
                    self._failure_count = 0
                    logger.info(f"Circuit '{self.name}' CLOSED (recovered)")
            else:
                self._failure_count = 0

    def _on_failure(self):
        with self._lock:
            self._failure_count += 1
            self._last_failure_time = time.time()

            if self._state == "HALF_OPEN":
                self._state = "OPEN"
                logger.warning(f"Circuit '{self.name}' re-OPENED (probe failed)")
            elif self._failure_count >= self.failure_threshold:
                self._state = "OPEN"
                logger.warning(
                    f"Circuit '{self.name}' OPENED "
                    f"after {self._failure_count} consecutive failures"
                )


class CircuitOpenError(Exception):
    """Raised when a circuit breaker is open."""
    pass

The state machine is simple but powerful:

CLOSED (normal)     -- failures hit threshold -->  OPEN (fast-fail)
                                                      |
                                                  timeout expires
                                                      |
                                                  HALF_OPEN (probe)
                                                   /        \
                                              success       failure
                                                /              \
                                           CLOSED             OPEN

Use a separate circuit breaker for each external dependency:

# One breaker per service -- never share across providers
openai_breaker = CircuitBreaker("openai", failure_threshold=5, reset_timeout=60)
search_breaker = CircuitBreaker("web-search", failure_threshold=3, reset_timeout=30)
db_breaker = CircuitBreaker("database", failure_threshold=3, reset_timeout=45)

def agent_search(query: str) -> list[dict]:
    """Agent tool: web search with circuit breaker protection."""
    try:
        return search_breaker.call(web_search_api, query)
    except CircuitOpenError:
        logger.warning("Search unavailable, using cached results")
        return get_cached_results(query)
    except Exception:
        return []  # Graceful degradation: empty results, not a crash

The critical detail: one breaker per external dependency. If OpenAI is down, you do not want the breaker to block Anthropic calls too. And the success_threshold=2 parameter prevents a single lucky request from restoring full traffic to an unstable service.

Pattern 4: Graceful Degradation

Sometimes everything fails. Your primary model is rate-limited, the fallback provider is down, and the circuit breaker is open. Traditional error handling crashes. Graceful degradation delivers something useful instead of nothing.

The principle: users tolerate reduced capability far more than they tolerate crashes or hung requests.

from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentResponse:
    content: str
    quality_tier: str    # "full", "reduced", "cached", "static"
    model_used: str
    warning: Optional[str] = None

class ResilientAgent:
    """Agent with tiered degradation: full -> reduced -> cached -> static."""

    def __init__(self, fallback_chain: FallbackChain, cache: dict = None):
        self.chain = fallback_chain
        self.cache = cache or {}

    def run(self, messages: list[dict]) -> AgentResponse:
        # Tier 1: Full capability via fallback chain
        try:
            result = self.chain.call(messages)
            # Cache successful responses for future degradation
            cache_key = messages[-1]["content"][:100]
            self.cache[cache_key] = result["content"]
            return AgentResponse(
                content=result["content"],
                quality_tier="full" if not result["fallback_used"] else "reduced",
                model_used=result["model"],
            )
        except RuntimeError:
            pass  # All models failed

        # Tier 2: Cached response from similar previous query
        cache_key = messages[-1]["content"][:100]
        if cache_key in self.cache:
            return AgentResponse(
                content=self.cache[cache_key],
                quality_tier="cached",
                model_used="cache",
                warning="This response is from cache and may be outdated.",
            )

        # Tier 3: Static fallback -- honest about limitations
        return AgentResponse(
            content=(
                "I'm experiencing temporary difficulties connecting to AI services. "
                "Please try again in a few minutes. If this persists, check "
                "https://status.openai.com for provider status."
            ),
            quality_tier="static",
            model_used="none",
            warning="All AI services are currently unavailable.",
        )

The quality_tier field is important for downstream logic. Your application can make decisions based on response quality:

agent = ResilientAgent(chain)
response = agent.run([{"role": "user", "content": "Summarize today's metrics"}])

if response.quality_tier == "static":
    # Don't send automated reports with static fallback content
    notify_ops_team("Agent degraded, manual review needed")
elif response.quality_tier == "cached":
    # Send the report but flag it
    send_report(response.content, caveat="Based on cached data")
else:
    send_report(response.content)

Putting It All Together: A Resilient Agent Pipeline

The real power comes from composing all four patterns into a layered defense. Here is the execution order from outermost to innermost:

  Your Agent Logic
       |
  Graceful Degradation (always returns something)
       |
  Fallback Chain (tries alternative models)
       |
  Circuit Breaker (fast-fails during outages)  
       |
  Retry with Backoff (handles transient errors)
       |
  LLM Provider API

Here is a complete, working pipeline that wires everything together:

def build_resilient_agent() -> ResilientAgent:
    """Build an agent with all four resilience patterns composed."""

    # Layer 1: Circuit breakers per provider
    oai_breaker = CircuitBreaker("openai", failure_threshold=5, reset_timeout=60)
    anth_breaker = CircuitBreaker("anthropic", failure_threshold=5, reset_timeout=60)

    # Layer 2: Provider calls wrapped with circuit breakers
    oai_client = openai.OpenAI()
    anth_client = anthropic.Anthropic()

    def gpt4o_with_breaker(messages, **kwargs):
        return oai_breaker.call(
            lambda: oai_client.chat.completions.create(
                model="gpt-4o", messages=messages, timeout=30, **kwargs
            )
        )

    def claude_with_breaker(messages, **kwargs):
        system = next((m["content"] for m in messages if m["role"] == "system"), "")
        user_msgs = [m for m in messages if m["role"] != "system"]
        return anth_breaker.call(
            lambda: anth_client.messages.create(
                model="claude-sonnet-4-20250514", system=system,
                messages=user_msgs, max_tokens=4096, timeout=30,
            )
        )

    def gpt4o_mini_with_breaker(messages, **kwargs):
        return oai_breaker.call(
            lambda: oai_client.chat.completions.create(
                model="gpt-4o-mini", messages=messages, timeout=30, **kwargs
            )
        )

    # Layer 3: Fallback chain with retry built in
    chain = FallbackChain(
        models=[
            ModelConfig("gpt-4o", gpt4o_with_breaker, 0.005),
            ModelConfig("claude-sonnet", claude_with_breaker, 0.003),
            ModelConfig("gpt-4o-mini", gpt4o_mini_with_breaker, 0.00015),
        ],
        max_retries=2,
    )

    # Layer 4: Graceful degradation wraps everything
    return ResilientAgent(chain)


# Usage
agent = build_resilient_agent()
response = agent.run([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the key trends in AI this week?"},
])

print(f"Quality: {response.quality_tier}")
print(f"Model: {response.model_used}")
print(f"Response: {response.content[:200]}")

Notice how the layers compose: retry happens inside the circuit breaker, which happens inside the fallback chain, which happens inside the degradation wrapper. If retries exhaust their attempts, the circuit breaker records a failure. After enough failures, the circuit opens and the fallback chain skips that provider entirely -- no retries, no waiting.

Quick Reference: When to Use Each Pattern

Pattern	Best For	Avoid When
Exponential Backoff	Rate limits, transient 5xx errors	Permanent failures (auth, bad request)
Model Fallback	Provider outages, cost optimization	Task needs a specific model's capabilities
Circuit Breaker	Flaky external APIs, sustained outages	Internal computations that don't call external services
Graceful Degradation	Multi-source tasks, user-facing agents	Binary success/fail operations (payments, writes)

Key Metrics to Track

Resilience patterns are only as good as your ability to observe them. Track these in production:

Retry rate per provider -- Spike above 20%? Something is degraded upstream. Set alerts.
Fallback activation rate -- If your primary model fails more than 10% of the time, reconsider your provider choice.
Circuit breaker state changes -- Every OPEN/CLOSE transition should trigger an alert. Frequent cycling means an unstable dependency.
Degradation tier distribution -- What percentage of responses are served from cache or static fallback? This is the real quality metric your users experience.
Cost per successful request -- Fallbacks to more expensive models inflate costs. Track this to catch budget overruns before they become a problem.

Wrapping Up

Production AI agents need resilience as a first-class architectural concern, not an afterthought bolted on after the first 2 AM outage. The four patterns in this guide -- retry with backoff, model fallbacks, circuit breakers, and graceful degradation -- form a defense-in-depth strategy that keeps your agents running when everything around them is breaking.

The code in this article is framework-agnostic Python you can drop into any project. Start with retry + classify (the highest-ROI pattern), add fallback chains when you depend on a single provider, and layer in circuit breakers when your agents call external tools at scale.

If you want to skip building this resilience plumbing yourself, platforms like Nebula handle retry logic, model fallbacks, and tool circuit breakers at the infrastructure level -- so you can focus on what your agent does instead of how it recovers.

The complete code from this article is ready to copy-paste. Build something resilient.

Top 6 AI Agent Memory Frameworks for Devs (2026)

Nebula — Sun, 22 Mar 2026 22:02:32 +0000

TL;DR: Pick Mem0 for the broadest standalone memory layer, Zep for temporal-aware production pipelines, Letta for long-running agents that need unlimited memory, Cognee for knowledge-graph-first RAG workflows, LangChain Memory if you're already on LangChain, or LlamaIndex Memory for document-heavy retrieval agents.

Your AI agent forgets everything between sessions. A user says "use the same format as last time" and the agent has no idea what that means. A support bot asks the same clarifying questions it asked yesterday. A procurement agent makes the same mistake a human corrected last week.

The fix is a memory layer -- something that extracts knowledge from interactions, stores it durably, and retrieves it when relevant. But "memory" means wildly different things depending on which framework you pick: a conversation buffer, a vector store, a knowledge graph, or a full extraction engine.

Here is how the six most popular frameworks compare for developers building agents in 2026.

Quick Comparison

Feature	Mem0	Zep	Letta	Cognee	LangChain	LlamaIndex
Architecture	Vector+Graph+KV	Temporal KG	Tiered (OS-style)	KG+Vector pipelines	Multiple types	Composable modules
License	Apache 2.0	Open + Managed	Apache 2.0	Open core	MIT	MIT
GitHub Stars	~48K	~24K	~21K	~12K	Part of ecosystem	Part of ecosystem
Standalone	Yes	Yes	Yes	Yes	No (LangChain)	No (LlamaIndex)
Managed Cloud	Yes	Yes	Yes	Yes	Via LangSmith	Via LlamaCloud
Memory Focus	Personalization	Temporal + entities	Both (tiered)	Institutional knowledge	Conversation context	Document + conversation
Best For	Assistants, support	Production pipelines	Long-running agents	Research workflows	LangChain teams	Doc-heavy agents

Mem0 -- The Most Popular Standalone Memory

Mem0 is the most widely adopted standalone memory layer for AI agents, with roughly 48,000 GitHub stars and a multi-store architecture that combines vector search, graph relationships, and key-value storage.

Key strength: Adaptive memory updates. When a user corrects a preference, Mem0 updates the existing memory rather than creating a duplicate. It supports user-level, session-level, and agent-level memory scopes -- so one agent can maintain separate context for different users without cross-contamination.

Key weakness: Strongest for personalization (remembering user preferences and conversation context) but less mature for institutional knowledge -- the kind of accumulated operational learning that makes agents better at their jobs over time.

Best for: Personalized assistants, customer support agents, and B2B copilots where remembering user context across sessions is the primary requirement.

Pricing: Free and open source (Apache 2.0). Managed cloud available with a free tier.

Zep / Graphiti -- Best Temporal Awareness

Zep models memory as a temporal knowledge graph, meaning it tracks not just what happened but when it happened and how entities relate over time. Its open-source component, Graphiti, handles the graph construction.

Key strength: Time-aware retrieval. Zep understands that "Alice was the budget owner until February, then Bob took over" -- a distinction that flat vector stores miss entirely. It groups interactions into episodes with automatic summarization, so retrieval uses both relevance and recency.

Key weakness: The temporal graph architecture requires more infrastructure than simpler vector-only solutions. If your agent only needs basic conversation history, Zep's complexity may not be justified.

Best for: Production LLM pipelines where entities change over time -- CRM agents, project management assistants, and any workflow where "who owns what, and since when" matters.

Pricing: Graphiti is open source. Zep Cloud offers a managed service with usage-based pricing.

Letta (MemGPT) -- OS-Inspired Memory Management

Letta, originally known as MemGPT, takes the most architecturally unique approach: it models agent memory like an operating system. Main context is RAM (fast, limited), external storage is disk (slow, unlimited), and the agent itself decides when to page information in and out.

Key strength: Agents control their own memory through function calls -- reading, writing, searching, and archiving information explicitly. This means an agent can maintain effectively unlimited memory despite fixed context window constraints. The memory is transparent and developer-controllable.

Key weakness: The OS-inspired architecture has a steeper learning curve than simpler drop-in solutions. Setting up the tiered memory system and configuring the agent's memory management behavior requires more upfront investment.

Best for: Long-running conversational agents that accumulate knowledge over weeks or months, where context windows would otherwise become a bottleneck.

Pricing: Free and open source (Apache 2.0). Managed cloud available.

Cognee -- Knowledge Graph From Unstructured Data

Cognee approaches memory as a pipeline: ingest raw data, extract structure, build a knowledge graph, and retrieve with precision. It blurs the line between RAG and agent memory in a productive way.

Key strength: Builds knowledge graphs automatically from unstructured data -- documents, conversations, and external sources. Retrieval combines graph traversal with vector search, so the system understands relationships between concepts, not just similarity between text chunks.

Key weakness: More pipeline-oriented than plug-and-play. Cognee is designed for teams that want to process and structure data before retrieval, which adds setup complexity compared to frameworks that work directly with conversation history.

Best for: RAG-heavy research workflows, institutional knowledge bases, and agents that need to reason over relationships between entities (authors, papers, concepts, projects).

Pricing: Open core with a free tier. Managed cloud available for enterprise.

LangChain Memory -- Best Ecosystem Integration

LangChain Memory provides multiple memory types within the LangChain ecosystem: conversation buffer, summary memory, entity memory, and vector-backed memory. You pick the strategy that fits your use case.

Key strength: Flexibility within the ecosystem. You can swap between conversation buffer (simple, keeps everything), summary memory (compresses old messages), entity memory (tracks named entities), and vector memory (semantic search over history) -- all with the same API. Works seamlessly with LangGraph for stateful agent workflows.

Key weakness: Tied to the LangChain ecosystem. If you are not already using LangChain or LangGraph, adopting their memory module means adopting their entire abstraction layer. Less standalone capability than Mem0 or Zep.

Best for: Teams already building on LangChain or LangGraph who want integrated memory without adding another vendor.

Pricing: Free and open source (MIT). LangSmith (observability) starts at $39/seat/month.

LlamaIndex Memory -- Best for Document-Heavy Agents

LlamaIndex Memory combines chat history with document context, making it particularly strong for agents that need to remember both what was discussed and what documents were referenced.

Key strength: Composable memory modules that work with LlamaIndex's query engines. Your agent can do semantic search over past conversations AND over the documents those conversations referenced -- unified retrieval across both data types.

Key weakness: Like LangChain Memory, it is ecosystem-dependent. The memory capabilities are tightly integrated with LlamaIndex's data structures and query engines, making standalone usage impractical.

Best for: Knowledge-intensive agents that work with large document collections -- research assistants, legal document reviewers, and technical documentation bots.

Pricing: Free and open source (MIT). LlamaCloud offers managed hosting.

How to Choose

The decision comes down to two questions: what kind of memory do you need and what are you already using?

Need standalone memory you can plug into any agent? Start with Mem0. It covers the widest range of use cases with the lowest integration friction.
Need to track how entities and relationships change over time? Zep is purpose-built for temporal awareness.
Building a long-running agent that manages its own context? Letta gives agents explicit control over their memory lifecycle.
Want to build a knowledge graph from raw documents? Cognee turns unstructured data into structured, retrievable knowledge.
Already on LangChain or LangGraph? Use LangChain Memory -- it integrates natively.
Building document-heavy retrieval agents? LlamaIndex Memory unifies conversation and document retrieval.

If you are building agents that orchestrate across multiple services and want memory handled for you rather than managing a separate framework, platforms like Nebula include persistent agent memory as part of the runtime -- your agents retain context across sessions without additional infrastructure.

The important thing is to stop building stateless agents. Pick a memory layer, give your agent a past, and watch it get better at its job over time.

How to Get Structured Output from Any LLM in 5 Min

Nebula — Sun, 22 Mar 2026 21:03:30 +0000

You asked an LLM to extract contact info from an email. It returned a wall of text instead of clean data. Now you're writing regex to parse a response that changes format every time.

There's a better way. PydanticAI's output_type parameter forces any LLM to return typed, validated data -- no parsing required.

The Code

import asyncio
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


agent = Agent(
    'openai:gpt-4o',
    output_type=ContactInfo,
    instructions='Extract contact information from the provided text.',
)

raw_text = """
Hey, just met Sarah Chen at the DevTools Summit.
She's the VP of Engineering at Acme Corp.
Her email is sarah.chen@acmecorp.io -- said she's
interested in our API. Follow up next week.
"""

result = agent.run_sync(raw_text)

print(result.output)
#> name='Sarah Chen' email='sarah.chen@acmecorp.io' company='Acme Corp' role='VP of Engineering'

print(result.output.name)    # Sarah Chen
print(result.output.email)   # sarah.chen@acmecorp.io
print(result.output.company) # Acme Corp

That's it. No regex. No JSON parsing. No retry loops for malformed output.

How It Works

Define your schema. ContactInfo is a standard Pydantic BaseModel. The Field(description=...) hints tell the LLM what each field should contain. Pydantic validates the response automatically -- if the LLM returns garbage, you get a clear validation error instead of silent corruption.

Set output_type on the Agent. This is the key line. When you pass output_type=ContactInfo, PydanticAI registers a tool with the LLM whose parameters match your model's JSON schema. The LLM is forced to call that tool, so it can't return plain text.

Access typed fields directly. result.output isn't a dict or a string -- it's a ContactInfo instance. Your IDE gives you autocomplete. Your type checker catches mistakes. Your downstream code gets clean data every time.

Handling Multiple Output Types

Sometimes the LLM can't extract the data you need. Instead of letting it hallucinate, give it an escape hatch:

class ExtractionFailed(BaseModel):
    """Use when contact info cannot be extracted."""
    reason: str

agent = Agent(
    'openai:gpt-4o',
    output_type=[ContactInfo, ExtractionFailed],
    instructions='Extract contact info. If the text has no contact details, explain why.',
)

result = agent.run_sync('The weather in Tokyo is sunny today.')
print(result.output)
#> reason='The text contains weather information but no contact details such as name, email, company, or role.'

Pass a list of types to output_type and PydanticAI registers each as a separate tool. The LLM picks the right one. You check isinstance(result.output, ContactInfo) in your code and handle each case.

Why This Matters

Structured output is the bridge between "cool LLM demo" and "production agent." Every multi-step agent workflow depends on it -- one agent extracts data, the next agent acts on it. If the first agent returns unstructured text, the whole pipeline breaks.

PydanticAI handles the hard parts: schema generation, tool registration, response validation, and automatic retries when the model returns invalid data. You just define a BaseModel and go.

If you're building agents that chain structured outputs across multiple steps, platforms like Nebula handle tool orchestration and output routing so you can focus on the agent logic.

Quick Reference

What you need	How to do it
Single structured type	`output_type=MyModel`
Multiple possible types	`output_type=[TypeA, TypeB]`
Structured + plain text fallback	`output_type=[MyModel, str]`
Custom tool names	`output_type=ToolOutput(MyModel, name='...')`

Install and try it now:

pip install pydantic-ai

For more AI agent patterns, check out the other articles in the AI Agent Quick Tips series -- including LLM fallbacks, guardrails, and agent memory.

Event-Driven AI Agents: Patterns That Scale

Nebula — Sun, 22 Mar 2026 20:02:07 +0000

Most AI agent tutorials teach you to build a chatbot that waits for user input. But production agents do not wait -- they react. A deploy finishes and your agent runs smoke tests. A customer signs up and your agent sends a personalized onboarding sequence. A monitoring threshold trips and your agent pages the on-call engineer before a human even notices.

The architecture that makes this possible is event-driven design. And getting it right is the difference between agents that demo well and agents that run your operations.

This guide covers four event-driven architecture patterns for AI agents, each with runnable Python code you can adapt today. No vendor lock-in, no Kafka required, no enterprise sales pitch -- just patterns that work.

Why Polling Fails for Production Agents

Before diving into patterns, let's be clear about why the default approach breaks down.

Polling is the naive solution: your agent checks a database, API, or inbox on a timer. "Any new emails? No? Check again in 30 seconds." It works in demos. It fails in production for three reasons:

Wasted compute. Your agent burns CPU and API quota checking for changes that have not happened. At scale, this adds up fast.
Latency floor. Your response time equals your polling interval. A 30-second poll means up to 30 seconds of delay on every event. For incident response, that is an eternity.
Quadratic connections. If N agents each poll M services, you have N x M connections. Add agents, and the system becomes unmanageable.

Event-driven architecture eliminates all three problems. Agents subscribe to event streams and react only when something actually happens. Connection complexity drops from O(N x M) to O(N + M). Latency drops to milliseconds. Compute is spent on real work, not checking.

Research from production deployments shows event-driven systems reduce AI agent response latency by 70-90% compared to polling approaches. That is not a theoretical improvement -- it is the difference between catching an outage in 200ms versus discovering it 30 seconds later.

Pattern 1: Event Queue with Worker Agents

The simplest event-driven pattern: events go into a queue, worker agents pull and process them. This is your starting point for any event-driven agent system.

When to Use It

Single-purpose agents that process one type of event
Workloads where ordering matters (FIFO processing)
Systems where you need guaranteed delivery (no dropped events)

Implementation

Here is a minimal but production-ready implementation using Redis Streams (you could swap in RabbitMQ, SQS, or any message broker):

import asyncio
import json
import redis.asyncio as redis
from datetime import datetime
from openai import AsyncOpenAI

client = AsyncOpenAI()
rdb = redis.Redis(host="localhost", port=6379, decode_responses=True)

STREAM = "agent:events"
GROUP = "agent-workers"
CONSUMER = "worker-1"

async def ensure_group():
    """Create consumer group if it does not exist."""
    try:
        await rdb.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.ResponseError as e:
        if "BUSYGROUP" not in str(e):
            raise

async def publish_event(event_type: str, payload: dict):
    """Publish an event to the stream."""
    event = {
        "type": event_type,
        "payload": json.dumps(payload),
        "timestamp": datetime.utcnow().isoformat(),
    }
    event_id = await rdb.xadd(STREAM, event)
    print(f"Published {event_type} -> {event_id}")
    return event_id

async def process_event(event_id: str, event: dict):
    """Route event to the appropriate AI handler."""
    event_type = event["type"]
    payload = json.loads(event["payload"])

    handlers = {
        "deploy.completed": handle_deploy,
        "alert.triggered": handle_alert,
        "email.received": handle_email,
    }

    handler = handlers.get(event_type)
    if handler:
        await handler(payload)
    else:
        print(f"No handler for event type: {event_type}")

    # Acknowledge the event so it is not redelivered
    await rdb.xack(STREAM, GROUP, event_id)

async def handle_deploy(payload: dict):
    """AI agent handles deployment verification."""
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a deployment verification agent."},
            {"role": "user", "content": f"Verify this deployment: {json.dumps(payload)}"}
        ],
    )
    print(f"Deploy check: {response.choices[0].message.content[:100]}")

async def handle_alert(payload: dict):
    """AI agent triages monitoring alerts."""
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an incident triage agent. Classify severity and suggest next steps."},
            {"role": "user", "content": f"Alert: {json.dumps(payload)}"}
        ],
    )
    print(f"Alert triage: {response.choices[0].message.content[:100]}")

async def worker_loop():
    """Main worker loop: pull events, process, repeat."""
    await ensure_group()
    print(f"Worker {CONSUMER} listening on {STREAM}...")

    while True:
        # Block for up to 5 seconds waiting for new events
        messages = await rdb.xreadgroup(
            GROUP, CONSUMER, {STREAM: ">"}, count=1, block=5000
        )
        for stream_name, events in messages:
            for event_id, event_data in events:
                try:
                    await process_event(event_id, event_data)
                except Exception as e:
                    print(f"Error processing {event_id}: {e}")
                    # Event stays unacknowledged -> will be reclaimed

if __name__ == "__main__":
    asyncio.run(worker_loop())

This gives you several production features out of the box:

Consumer groups: Multiple worker agents split the load. Add workers to scale horizontally.
Acknowledgments: Events are not removed until explicitly acknowledged. If a worker crashes, unacknowledged events get redelivered.
Backpressure: Workers pull events at their own pace. A spike in events queues up instead of overwhelming your agents.

Scaling It

To add more workers, just run additional instances with different CONSUMER names. Redis handles partition assignment automatically. Three workers processing deploy events? Each picks up roughly one-third of the load with zero configuration changes.

Pattern 2: Fan-Out for Parallel Agent Processing

Sometimes a single event needs to trigger multiple agents simultaneously. A new customer signs up and you need to: send a welcome email, provision their account, update the CRM, and notify the sales team. Sequentially, that takes 4x as long as it should.

Fan-out solves this by broadcasting one event to N subscribers, each processing in parallel.

Implementation

Redis Pub/Sub is the simplest fan-out mechanism. For durability, you would use multiple consumer groups on the same Redis Stream (each group gets every message independently).

import asyncio
import json
import redis.asyncio as redis

rdb = redis.Redis(host="localhost", port=6379, decode_responses=True)
STREAM = "events:customer"

async def setup_fan_out():
    """Create independent consumer groups for parallel processing."""
    groups = ["email-agent", "provisioning-agent", "crm-agent", "sales-notifier"]
    for group in groups:
        try:
            await rdb.xgroup_create(STREAM, group, id="0", mkstream=True)
        except redis.ResponseError:
            pass  # Group already exists

async def agent_worker(group_name: str, handler):
    """Generic agent worker that processes events for its group."""
    consumer = f"{group_name}-1"
    print(f"[{group_name}] Listening...")

    while True:
        messages = await rdb.xreadgroup(
            group_name, consumer, {STREAM: ">"}, count=1, block=5000
        )
        for _, events in messages:
            for event_id, data in events:
                payload = json.loads(data.get("payload", "{}"))
                try:
                    await handler(payload)
                    await rdb.xack(STREAM, group_name, event_id)
                except Exception as e:
                    print(f"[{group_name}] Failed: {e}")

async def email_handler(payload):
    print(f"[email] Sending welcome to {payload.get('email')}")
    await asyncio.sleep(0.5)  # Simulate API call

async def provision_handler(payload):
    print(f"[provision] Creating workspace for {payload.get('user_id')}")
    await asyncio.sleep(1.0)

async def crm_handler(payload):
    print(f"[crm] Adding {payload.get('email')} to CRM")
    await asyncio.sleep(0.3)

async def sales_handler(payload):
    print(f"[sales] Notifying team about {payload.get('plan')} signup")
    await asyncio.sleep(0.2)

async def main():
    await setup_fan_out()
    # All agents run in parallel, each gets every event independently
    await asyncio.gather(
        agent_worker("email-agent", email_handler),
        agent_worker("provisioning-agent", provision_handler),
        agent_worker("crm-agent", crm_handler),
        agent_worker("sales-notifier", sales_handler),
    )

if __name__ == "__main__":
    asyncio.run(main())

The key insight: each consumer group maintains its own read cursor. Publishing one customer.signup event means all four agents process it independently. If the email agent is slow, it does not block the provisioning agent. If the CRM agent crashes, it resumes from its last acknowledged event without affecting the others.

When Fan-Out Gets Tricky

Fan-out is powerful but introduces a coordination challenge: what happens when multiple agents need to complete before a final action? For example, you want to send a "setup complete" email only after provisioning AND CRM updates both finish.

The cleanest solution is the completion event pattern: each agent emits a completion event when done. A coordinator agent subscribes to all completion events and triggers the final action when all prerequisites are met.

# Each agent emits when done:
await publish_event("provision.completed", {"user_id": uid})
await publish_event("crm.updated", {"user_id": uid})

# Coordinator checks if all steps are complete:
async def coordinator(payload):
    user_id = payload["user_id"]
    status = await rdb.hgetall(f"onboarding:{user_id}")
    if status.get("provisioned") and status.get("crm_updated"):
        await publish_event("onboarding.complete", {"user_id": user_id})

Pattern 3: Event Sourcing for Auditable Agent Decisions

In regulated industries or high-stakes applications, you need to know exactly what your agent did and why. Event sourcing records every state change as an immutable event, creating a complete audit trail that you can replay to reconstruct any past state.

This matters for AI agents because LLM outputs are stochastic. The same input can produce different outputs. When a customer asks "why did your agent reject my application?", you need to show the exact inputs, the exact model output, and the exact decision logic -- not a best guess.

Implementation

import json
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Any

@dataclass
class AgentEvent:
    event_id: str
    event_type: str
    agent_id: str
    timestamp: str
    payload: dict
    parent_event_id: str | None = None  # For causal chains
    checksum: str = ""

    def __post_init__(self):
        if not self.checksum:
            content = json.dumps(
                {"type": self.event_type, "payload": self.payload,
                 "agent": self.agent_id, "ts": self.timestamp},
                sort_keys=True,
            )
            self.checksum = hashlib.sha256(content.encode()).hexdigest()[:16]

class EventStore:
    """Append-only event store for agent decision auditing."""

    def __init__(self):
        self._events: list[AgentEvent] = []

    def append(self, event: AgentEvent):
        self._events.append(event)

    def get_agent_history(self, agent_id: str) -> list[AgentEvent]:
        return [e for e in self._events if e.agent_id == agent_id]

    def get_causal_chain(self, event_id: str) -> list[AgentEvent]:
        """Trace the full decision chain for a given event."""
        chain = []
        current_id = event_id
        while current_id:
            event = next((e for e in self._events if e.event_id == current_id), None)
            if not event:
                break
            chain.append(event)
            current_id = event.parent_event_id
        return list(reversed(chain))

    def replay_to(self, timestamp: str) -> list[AgentEvent]:
        """Get all events up to a point in time for state reconstruction."""
        return [e for e in self._events if e.timestamp <= timestamp]

# Usage: recording an agent's loan decision
store = EventStore()

# Step 1: Application received
store.append(AgentEvent(
    event_id="evt_001",
    event_type="application.received",
    agent_id="loan-reviewer",
    timestamp=datetime.utcnow().isoformat(),
    payload={"applicant": "user_42", "amount": 50000},
))

# Step 2: Agent analyzed credit data
store.append(AgentEvent(
    event_id="evt_002",
    event_type="credit.analyzed",
    agent_id="loan-reviewer",
    timestamp=datetime.utcnow().isoformat(),
    payload={"score": 720, "risk_level": "medium", "model": "gpt-4o",
             "prompt_hash": "a3f2c1d8", "raw_output": "Applicant shows..."},
    parent_event_id="evt_001",
))

# Step 3: Decision made
store.append(AgentEvent(
    event_id="evt_003",
    event_type="application.approved",
    agent_id="loan-reviewer",
    timestamp=datetime.utcnow().isoformat(),
    payload={"decision": "approved", "conditions": ["income_verification"]},
    parent_event_id="evt_002",
))

# Audit: trace how the decision was made
chain = store.get_causal_chain("evt_003")
for event in chain:
    print(f"{event.event_type}: {event.payload}")

The parent_event_id field creates a causal chain. Every agent decision links back to the event that triggered it. When auditors ask "how did the agent decide to approve this loan?", you walk the chain: application received -> credit analyzed (with exact model, prompt, and output) -> decision made.

Checksums for Tamper Detection

Notice the checksum field. Each event gets a SHA-256 hash of its content. If anyone modifies an event after the fact, the checksum will not match. This is essential for compliance in finance, healthcare, and legal applications where you need to prove the audit trail has not been altered.

Pattern 4: Saga Orchestration for Multi-Step Workflows

Real-world agent workflows span multiple steps, multiple services, and sometimes multiple days. A saga coordinates these long-running workflows and -- critically -- handles failures with compensating actions that undo partial work.

Consider an e-commerce fulfillment agent: it needs to charge the card, reserve inventory, schedule shipping, and send confirmation. If shipping fails, you need to release the inventory and refund the card. Without saga orchestration, partial failures leave your system in an inconsistent state.

Implementation

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Any

class StepStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    COMPENSATED = "compensated"

@dataclass
class SagaStep:
    name: str
    execute: Callable
    compensate: Callable  # Undo action if later steps fail
    status: StepStatus = StepStatus.PENDING
    result: Any = None
    error: str = ""

@dataclass
class Saga:
    name: str
    steps: list[SagaStep] = field(default_factory=list)
    context: dict = field(default_factory=dict)

    async def run(self) -> bool:
        """Execute all steps. On failure, compensate completed steps."""
        completed: list[SagaStep] = []

        for step in self.steps:
            step.status = StepStatus.RUNNING
            try:
                step.result = await step.execute(self.context)
                step.status = StepStatus.COMPLETED
                completed.append(step)
                print(f"  [ok] {step.name}")
            except Exception as e:
                step.status = StepStatus.FAILED
                step.error = str(e)
                print(f"  [FAIL] {step.name}: {e}")
                # Compensate all completed steps in reverse order
                await self._compensate(completed)
                return False

        return True

    async def _compensate(self, completed: list[SagaStep]):
        """Undo completed steps in reverse order."""
        print("  Rolling back...")
        for step in reversed(completed):
            try:
                await step.compensate(self.context)
                step.status = StepStatus.COMPENSATED
                print(f"  [undo] {step.name}")
            except Exception as e:
                print(f"  [undo-FAIL] {step.name}: {e}")
                # Log for manual intervention

# Define the workflow steps
async def charge_card(ctx):
    # Call payment API
    ctx["charge_id"] = "ch_abc123"
    return {"charged": 99.99}

async def refund_card(ctx):
    print(f"    Refunding charge {ctx.get('charge_id')}")

async def reserve_inventory(ctx):
    ctx["reservation_id"] = "res_xyz"
    return {"reserved": True}

async def release_inventory(ctx):
    print(f"    Releasing reservation {ctx.get('reservation_id')}")

async def schedule_shipping(ctx):
    # Simulate a failure
    raise Exception("Carrier API timeout")

async def cancel_shipping(ctx):
    print("    Cancelling shipping request")

async def main():
    saga = Saga(
        name="order-fulfillment",
        steps=[
            SagaStep("charge_card", charge_card, refund_card),
            SagaStep("reserve_inventory", reserve_inventory, release_inventory),
            SagaStep("schedule_shipping", schedule_shipping, cancel_shipping),
        ],
        context={"order_id": "order_789"},
    )

    print(f"Running saga: {saga.name}")
    success = await saga.run()
    print(f"Result: {'Success' if success else 'Rolled back'}")

if __name__ == "__main__":
    asyncio.run(main())

Output when shipping fails:

Running saga: order-fulfillment
  [ok] charge_card
  [ok] reserve_inventory
  [FAIL] schedule_shipping: Carrier API timeout
  Rolling back...
  [undo] reserve_inventory
    Releasing reservation res_xyz
  [undo] charge_card
    Refunding charge ch_abc123
Result: Rolled back

The saga pattern is essential for AI agents that interact with external services. LLM calls can timeout, APIs can return errors, and rate limits can hit at any point. Without compensating actions, every failure leaves your system in an unknown state.

Production Hardening: Retry, Dead Letters, and Observability

The four patterns above give you the architecture. But production systems need three more pieces to be reliable.

Retry with Exponential Backoff

Transient failures are common -- network blips, rate limits, cold starts. Retrying with exponential backoff handles them gracefully:

import asyncio
import random

async def retry_with_backoff(fn, max_retries=3, base_delay=1.0):
    """Retry a function with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception as e:
            if attempt == max_retries:
                raise  # Final attempt failed, propagate
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Retry {attempt + 1}/{max_retries} in {delay:.1f}s: {e}")
            await asyncio.sleep(delay)

The jitter (random addition) prevents thundering herd problems when multiple agents retry simultaneously against the same service.

Dead Letter Queues

When retries are exhausted, events go to a dead letter queue (DLQ) instead of being dropped silently. This gives you a safety net for manual investigation:

DLQ_STREAM = "agent:dead-letters"

async def send_to_dlq(event_id: str, event: dict, error: str):
    """Move a failed event to the dead letter queue."""
    await rdb.xadd(DLQ_STREAM, {
        "original_event_id": event_id,
        "original_stream": STREAM,
        "event_data": json.dumps(event),
        "error": error,
        "failed_at": datetime.utcnow().isoformat(),
        "retry_count": "3",
    })
    # Acknowledge the original event so it stops being redelivered
    await rdb.xack(STREAM, GROUP, event_id)

Check your DLQ daily. Patterns in dead-lettered events reveal systemic issues: if the same event type keeps failing, you have a bug, not a transient error.

Structured Logging for Agent Observability

Event-driven systems are harder to debug than request-response systems because there is no single request thread to follow. Structured logging with correlation IDs solves this:

import structlog

log = structlog.get_logger()

async def process_event(event_id: str, event: dict):
    logger = log.bind(
        event_id=event_id,
        event_type=event["type"],
        correlation_id=event.get("correlation_id", event_id),
    )
    logger.info("event.received")

    try:
        result = await handle(event)
        logger.info("event.processed", result=result)
    except Exception as e:
        logger.error("event.failed", error=str(e))
        raise

The correlation_id follows an event through the entire fan-out chain. When four agents process the same customer signup, you can filter logs by correlation ID to see the complete picture.

Choosing the Right Pattern

Here is a quick reference for matching problems to patterns:

Scenario	Pattern	Why
Process events one at a time	Queue + Workers	Simple, ordered, scalable
One event triggers multiple agents	Fan-Out	Parallel, independent processing
Compliance or audit requirements	Event Sourcing	Immutable, replayable trail
Multi-step workflows with rollback	Saga	Compensating actions on failure
High-throughput with mixed needs	Combine patterns	Queue for ingestion, fan-out for distribution

In practice, production systems combine patterns. You might use a queue for ingestion, fan-out for distribution to specialized agents, event sourcing for the audit trail, and sagas for workflows that span external services.

Platforms like Nebula handle much of this infrastructure for you -- event-driven triggers, automatic retries, and multi-agent coordination are built into the agent runtime, so you can focus on the agent logic rather than the plumbing. But understanding the patterns helps you debug issues and make better architectural decisions regardless of what platform you use.

What to Build Next

If you are building event-driven agents today, start with Pattern 1 (queue + workers) for your most common event type. Get the basics working: publish events, consume them, acknowledge them, handle failures.

Then add fan-out when you need parallel processing, event sourcing when you need auditability, and sagas when your workflows span multiple services.

The patterns in this guide are framework-agnostic by design. Whether you are using LangGraph, CrewAI, PydanticAI, or raw API calls, the event-driven architecture layer sits beneath your agent framework. It is the foundation that makes everything else reliable.

The best agents are not the ones with the smartest prompts. They are the ones that never drop an event, never leave a workflow half-finished, and never lose track of what they did and why.

This is Part 5 of the Building Production AI Agents series. Previous: Event-Driven AI Agent Architecture Patterns.

How to Build a Text-to-SQL Agent with Python in 10 Minutes

Nebula — Sat, 21 Mar 2026 21:02:36 +0000

You want to ask your database questions in plain English. Most tutorials make this harder than it needs to be — spinning up PostgreSQL, installing heavy ORMs, writing 200 lines of boilerplate.

Here's a text-to-SQL agent in under 40 lines of Python. It uses PydanticAI for the agent logic and SQLite so you don't need any database server.

The Code

import sqlite3
import asyncio
from pydantic_ai import Agent, RunContext, ModelRetry
from pydantic import BaseModel
from dataclasses import dataclass

# 1. Set up a sample SQLite database
conn = sqlite3.connect(":memory:")
conn.execute(\"\"\"CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary INTEGER,
    hire_date TEXT
)\"\"\")
conn.executemany(
    "INSERT INTO employees (name, department, salary, hire_date) VALUES (?, ?, ?, ?)",
    [
        ("Alice", "Engineering", 120000, "2023-01-15"),
        ("Bob", "Marketing", 85000, "2023-06-01"),
        ("Carol", "Engineering", 135000, "2022-03-20"),
        ("Dave", "Sales", 90000, "2024-01-10"),
        ("Eve", "Engineering", 110000, "2024-07-01"),
    ],
)
conn.commit()


# 2. Define dependencies and output schema
@dataclass
class Deps:
    conn: sqlite3.Connection


class QueryResult(BaseModel):
    sql: str
    explanation: str
    rows: list[dict]


# 3. Create the agent
agent = Agent(
    "openai:gpt-4o-mini",
    deps_type=Deps,
    system_prompt=(
        "You are a SQL assistant. Given the 'employees' table with columns: "
        "id, name, department, salary, hire_date — generate a SELECT query "
        "for the user's question. Return ONLY the SQL query, nothing else."
    ),
)


# 4. Add a tool that runs the generated SQL
@agent.tool
async def run_query(ctx: RunContext[Deps], sql_query: str) -> str:
    \"\"\"Execute a SQL query against the employees database and return results.\"\"\"
    if not sql_query.strip().upper().startswith("SELECT"):
        raise ModelRetry("Only SELECT queries are allowed. Try again.")
    try:
        cursor = ctx.deps.conn.execute(sql_query)
        columns = [desc[0] for desc in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return f"Query: {sql_query}\nResults ({len(rows)} rows):\n{rows}"
    except sqlite3.Error as e:
        raise ModelRetry(f"SQL error: {e}. Fix the query and try again.")


# 5. Run it
async def main():
    questions = [
        "Who earns the most?",
        "How many engineers do we have?",
        "Show me everyone hired in 2024",
    ]
    deps = Deps(conn=conn)
    for q in questions:
        print(f"\n> {q}")
        result = await agent.run(q, deps=deps)
        print(result.output)


asyncio.run(main())

What's Happening

Lines 1-26: The database. We create an in-memory SQLite database with an employees table and five rows. No server install, no Docker — sqlite3 ships with Python.

Lines 29-35: Dependencies and schema. The Deps dataclass passes the database connection into the agent's tools. QueryResult defines what we expect back, though here we let the agent respond freely since the tool formats the output.

Lines 38-44: The agent. One line creates the agent with a model, dependency type, and system prompt. The system prompt tells the LLM exactly which table and columns exist. This is the key to accurate SQL generation — always give the agent your schema.

Lines 48-59: The tool. The @agent.tool decorator registers run_query as something the agent can call. When the user asks "Who earns the most?", the LLM generates SQL, then calls this tool to execute it. Two safety features are built in:

Only SELECT queries run (no drops or deletes)
SQL errors trigger ModelRetry, which tells the LLM to fix its query and try again

Lines 62-71: Run it. We loop through three questions. The agent generates SQL, runs it, and returns results.

Expected Output

> Who earns the most?
Carol in Engineering earns the most at $135,000.

> How many engineers do we have?
There are 3 engineers in the Engineering department.

> Show me everyone hired in 2024
Dave (Sales, hired 2024-01-10) and Eve (Engineering, hired 2024-07-01).

Make It Your Own

Swap the in-memory database for a real one by changing the connection string:

conn = sqlite3.connect("your_database.db")

For PostgreSQL, swap sqlite3 for asyncpg and update the tool's execute call. The agent logic stays identical.

The pattern here — agent + tool + retry — works for any data source. Swap the SQL tool for a REST API call, a CSV reader, or a vector search, and you have a different agent with the same clean architecture.

If you're building AI agents that need to coordinate multiple tools — database queries, API calls, and more — platforms like Nebula can help you orchestrate these workflows without managing infrastructure.

This is part of the AI Agent Quick Tips series. Previously: How to Build Your First MCP Server in 10 Minutes.

Agents vs Workflows: A Decision Framework for 2026

Nebula — Sat, 21 Mar 2026 20:04:44 +0000

You are building an internal tool. A user submits a form and six things need to happen: validate the input, enrich the data from two APIs, run a classification, route to the right team, and send a notification. Do you write a workflow or deploy an agent?

If you picked "agent" because it sounds more modern, you just added three weeks of debugging, 10x the cost per execution, and a system that breaks in ways you cannot reproduce.

If you picked "workflow" but the classification step requires judgment about ambiguous inputs, you just built a system that routes 30% of cases wrong and generates a backlog for humans to fix.

The answer depends on your problem, not the trend cycle. This article gives you a concrete decision framework — a tree you can walk through for any use case — so you stop guessing and start choosing the right architecture on the first try.

What Actually Separates Agents from Workflows

Both agents and workflows can use LLMs. Both can call APIs. Both can automate multi-step processes. The marketing makes them sound interchangeable, but they differ in one fundamental way: who decides the next step.

A workflow follows a path defined at design time. You draw the flowchart, write the branches, and the runtime executes exactly what you specified. Step A always leads to Step B. If the input matches condition X, take the left branch. The execution path is deterministic — you can predict it by reading the code.

# Workflow: the developer decides the path
def process_order(order):
    validated = validate_order(order)        # Step 1 — always
    enriched = enrich_customer(validated)     # Step 2 — always
    if enriched.total > 500:
        flag_for_review(enriched)            # Branch A
    else:
        fulfill_order(enriched)              # Branch B
    send_confirmation(enriched)              # Step 3 — always

An agent follows a path decided at runtime. You give it a goal, tools, and context. The LLM reasons about what to do next based on what just happened. The path emerges from the interaction — you cannot draw the flowchart in advance because it depends on intermediate results.

# Agent: the LLM decides the path
def handle_support_ticket(ticket):
    agent = Agent(
        goal="Resolve this support ticket or escalate with context",
        tools=[search_docs, check_account, query_logs, respond, escalate]
    )
    # The agent decides: maybe it searches docs first,
    # maybe it checks logs, maybe it escalates immediately.
    # The path depends on what it finds at each step.
    return agent.run(ticket)

There is also a middle ground that most teams overlook: a workflow with agent steps. The orchestration is deterministic — Step 1, then Step 2, then Step 3 — but one or more steps use an LLM to handle ambiguity within a bounded scope. This is what most production systems actually look like, and it is the pattern you should default to.

The Decision Tree

Walk through these five questions in order. The first "yes" answer tells you your architecture.

Question 1: Can you define every possible execution path before runtime?

If you can draw a complete flowchart — every branch, every condition, every edge case — before the system processes a single input, use a workflow. Most tasks fall here: CI/CD pipelines, order processing, data ETL, notification routing, scheduled reports.

→ Yes: Use a workflow. Stop here.

Question 2: Is the ambiguity limited to one or two steps?

If the overall process is predictable but one step requires judgment — classifying an input, summarizing a document, extracting entities from unstructured text — use a workflow with an LLM step. The workflow controls sequencing. The LLM handles the fuzzy part.

→ Yes: Use a workflow with an LLM step. Stop here.

Question 3: Does the next action depend on the result of the previous one in ways you cannot enumerate?

If investigating a bug requires checking logs, and what you find in the logs determines whether you check the database, the config, or the deployment history — and each of those results opens different investigation paths — you need an agent. The branching is too dynamic to predefine.

→ Yes: Use an agent for that subtask. Continue to Question 4.

Question 4: Can you contain the agent's scope?

An agent that triages support tickets using three tools is manageable. An agent that has access to your database, email, Slack, GitHub, and deployment pipeline is a liability. If you can limit the agent to a specific subtask with a bounded tool set, wrap it in a workflow.

→ Yes: Use a hybrid — workflow orchestration with a bounded agent step. Stop here.
→ No: Continue to Question 5.

Question 5: Is this a research or exploration task with no fixed deliverable format?

Open-ended research, competitive analysis, investigative debugging — tasks where the output shape depends on what the agent discovers — are the rare cases where a full agent loop makes sense. Even here, set a maximum iteration count and a timeout.

→ Yes: Use a full agent with guardrails. Set max iterations, cost caps, and human-in-the-loop for high-stakes actions.

For roughly 80% of production use cases, you will stop at Question 1 or 2. The remaining 20% will mostly land on Question 4 (hybrid). Full autonomous agents — Question 5 — represent maybe 2-3% of real production workloads.

The Hybrid Pattern in Practice

The most effective architecture in production is not pure workflow or pure agent. It is a workflow that delegates to agents only where reasoning is required.

Consider a customer support pipeline:

def support_pipeline(ticket):
    # Step 1: Agent — classify the ticket (needs judgment)
    classification = classify_agent.run(
        f"Classify this ticket: {ticket.subject}\n{ticket.body}",
        output_schema={"category": str, "priority": str, "sentiment": str}
    )

    # Step 2: Workflow — route based on classification (deterministic)
    if classification.priority == "critical":
        channel = "#incidents"
        notify_oncall(ticket)
    elif classification.category == "billing":
        channel = "#billing-support"
    else:
        channel = "#general-support"

    # Step 3: Agent — draft a response (needs judgment)
    draft = response_agent.run(
        f"Draft a response for this {classification.category} ticket. "
        f"Priority: {classification.priority}. Sentiment: {classification.sentiment}.\n"
        f"Ticket: {ticket.body}",
        tools=[search_knowledge_base, check_account_status]
    )

    # Step 4: Workflow — deliver (deterministic)
    post_to_slack(channel, format_ticket(ticket, classification, draft))
    update_crm(ticket.id, classification, draft)

    return {"classification": classification, "channel": channel, "draft": draft}

Steps 1 and 3 are agents — they handle ambiguity. Steps 2 and 4 are workflow — they are predictable and cheap. The workflow controls the overall sequencing so you can audit exactly what happened. The agents handle the parts that require judgment, within bounded scope.

This pattern gives you three things no pure architecture can:

Auditability at the system level (the workflow logs every step)
Flexibility where you need it (agents reason about ambiguous inputs)
Bounded blast radius when an agent does something unexpected (the workflow catches it at the next deterministic step)

Three Anti-Patterns That Cost Teams Months

The God Agent

You give one agent 15+ tools and a vague goal: "Handle customer requests." It works in demos because your test inputs are clean. In production, it picks the wrong tool 20% of the time, chains tool calls in ways you did not anticipate, and occasionally sends a customer a Slack message meant for your internal channel.

Fix: Split into specialized agents with 3-5 tools each, orchestrated by a workflow. A classification agent picks the category, then the workflow routes to the right specialist agent.

The Premature Agent

You deploy an agent for a task that has deterministic inputs, predictable outputs, and no judgment required. Parsing structured JSON, routing based on a field value, sending a templated notification. The agent works, but it costs 50x more, runs 10x slower, and introduces non-determinism where none was needed.

The test: If you can write the logic as a Python function with no LLM call and it handles 95%+ of cases correctly, it should be a workflow step.

The Workflow Pretending to Be an Agent

You build a massive decision tree with 47 branches to handle every edge case. Each branch has its own LLM prompt. You are maintaining a flowchart that looks like a city subway map and adding new branches every week. The system is brittle — every new edge case requires a code change.

The signal: If you keep adding branches to handle new cases and the workflow keeps growing, the problem space has variable execution paths. Replace the branching section with an agent that reasons about the cases, keeping the rest of the workflow deterministic.

Real-World Architecture Examples

Here is how the decision tree maps to four common use cases:

E-Commerce Order Processing → Pure Workflow

Order received → Validate payment → Check inventory → 
Calculate shipping → Charge card → Send to fulfillment → 
Email confirmation

Every step is predictable. The inputs are structured. The volume is high (thousands per hour). An agent here would add cost, latency, and non-determinism with zero benefit. Decision tree stops at Question 1.

Customer Support Inbox → Hybrid

Ticket received → [Agent: classify + assess priority] → 
Workflow: route to team → [Agent: draft response with KB search] → 
Workflow: send + log

The classification and response steps require judgment — a billing complaint about an unauthorized charge is different from a question about pricing, even though both mention "charges." The routing and delivery are deterministic. Decision tree stops at Question 4.

Code Review Automation → Agent-Heavy

PR opened → [Agent: read diff, check patterns, query docs, 
assess risk, write review comments]

The agent needs to reason about what it sees in the diff. A security issue requires different analysis than a performance concern. The investigation path depends on the code — you cannot predefine it. Decision tree reaches Question 5, but scope is bounded (one PR, read-only actions plus comments), so it stays manageable.

Daily Engineering Report → Workflow + Agent Step

Cron trigger → Workflow: fetch metrics from Datadog → 
Workflow: fetch open issues from GitHub → 
Workflow: fetch deploy log → [Agent: analyze + write summary] → 
Workflow: post to Slack

Three of five steps are deterministic API calls. Only the analysis requires judgment. Decision tree stops at Question 2. This is the most common pattern in production — and the one most teams over-engineer with a full agent.

Choosing Your Stack

The right tool depends on which side of the decision tree your use case lands.

For workflows: Temporal for complex orchestration with durable execution. Airflow for data pipelines. n8n or Zapier for no-code automation. AWS Step Functions for serverless workflows. All of these handle sequencing, retries, and error recovery out of the box.

For agents: LangGraph for stateful agent graphs with checkpointing. CrewAI for multi-agent teams with role-based coordination. The OpenAI Agents SDK for lightweight single-agent tasks. Each has a different abstraction level — choose based on how much control you need over the execution graph.

For hybrids: This is where platforms like Nebula fit — you define the pipeline as a workflow, and individual steps can be handled by agents with their own tools and reasoning. The workflow controls sequencing and error handling; the agents handle the ambiguous parts. This pattern works particularly well for teams that need observability across both the deterministic and non-deterministic parts of their system.

The key architectural requirement regardless of stack: observability. You need to see what the workflow executed (step-level logs) AND what the agent decided (reasoning traces). Without both, debugging production issues is guesswork.

Your Checklist Before You Choose

Question	If Yes	If No
Can you draw the complete flowchart?	Workflow	Continue
Is ambiguity limited to 1-2 steps?	Workflow + LLM step	Continue
Can you bound the agent's scope?	Hybrid pattern	Continue
Is the task open-ended exploration?	Full agent + guardrails	Rethink the task
Are you handling >1000 executions/hour?	Workflow (cost matters)	Either
Is auditability a hard requirement?	Workflow outer shell	Either
Does the task change shape with new inputs?	Agent for that subtask	Workflow

The default answer is a workflow. The burden of proof is on the agent — it needs to earn its complexity by solving a problem that deterministic logic cannot.

Start with a workflow. Add agent steps only where you need judgment. Measure the cost and accuracy of each agent step independently. And never go full autonomous agent on day one — you will regret it by day three.

Top 6 Secrets Management Tools for Devs in 2026

Nebula — Sat, 21 Mar 2026 20:03:57 +0000

TL;DR: Pick Infisical for open-source control, Doppler for the simplest team workflow, HashiCorp Vault for enterprise-grade dynamic secrets, AWS Secrets Manager if you're all-in on AWS, 1Password Developer for small teams, or Bitwarden Secrets Manager for budget-friendly open-source.

Hardcoded secrets in repos caused over 10 million leaked credentials on GitHub in 2025. If your team is still passing API keys through .env files or Slack DMs, you're one accidental git push away from a breach.

Modern secrets management tools solve this by centralizing credentials, injecting them at runtime, rotating them automatically, and auditing every access. But there are now dozens of options — from open-source self-hosted platforms to cloud-managed dashboards.

Here's how the top 6 stack up for developer teams in 2026.

Quick Comparison

Feature	Infisical	Doppler	Vault	AWS SM	1Password	Bitwarden
Open Source	MIT	No	BSL	No	No	GPL
Self-Hosted	Yes	No	Yes	No	No	Yes
Dynamic Secrets	DB rotation	No	Full	Custom Lambda	No	No
CLI Injection	`infisical run`	`doppler run`	`vault` CLI	AWS CLI	`op run`	`bws` CLI
K8s Operator	Yes	Yes	Yes (Agent)	External	Yes	Yes
Secret Rotation	Auto	Manual + webhooks	Dynamic / auto-expire	Lambda-based	Manual	Manual
Free Tier	Yes	Yes	Yes (OSS)	No ($0.40/secret/mo)	No ($7.99/user/mo)	Yes
Best For	Dev teams, startups	All team sizes	Enterprise infra	AWS-native shops	Small teams	Budget-conscious

1. Infisical — Open-Source With Full Control

Infisical is the most popular open-source secrets manager on GitHub (12,700+ stars). It gives you end-to-end encrypted secret storage with SDKs for Node.js, Python, Go, and Java.

Key strength: Self-host on your own infrastructure with MIT license — no vendor lock-in, full audit trail, and automatic secret rotation for databases.

Key weakness: Fewer native integrations than Doppler. The self-hosted setup requires some DevOps investment.

Best for: Developer teams and startups that want open-source transparency and the option to self-host for compliance.

Pricing: Free tier for up to 5 users. Paid plans start at $8/user/month.

2. Doppler — Simplest Team Onboarding

Doppler is a cloud-first secrets platform with the fastest developer onboarding: doppler setup, pick your project, and doppler run -- npm start injects everything.

Key strength: Universal dashboard with 30+ native integrations (Vercel, AWS, GitHub Actions, Netlify). Cross-project secret references eliminate duplication.

Key weakness: Cloud-only with no self-hosted option. No dynamic secret generation.

Best for: Teams of any size that want reliable secret syncing across every environment without managing infrastructure.

Pricing: Free for up to 5 users and 3 projects. Team plan starts at $4/user/month.

3. HashiCorp Vault — Enterprise Secrets Engine

HashiCorp Vault is the industry standard for complex infrastructure. It goes beyond key-value storage with dynamic secrets (auto-generated, auto-expired database credentials), transit encryption, and PKI certificate management.

Key strength: Dynamic secrets — Vault generates temporary database credentials on demand with automatic expiration. No standing credentials to leak.

Key weakness: High operational complexity. Requires dedicated infrastructure knowledge to deploy and maintain. The BSL license change in 2023 pushed some teams toward alternatives.

Best for: Large organizations with multi-cloud infrastructure that need dynamic secrets, encryption-as-a-service, and fine-grained ACL policies.

Pricing: Open-source is free. HCP Vault (managed) starts at ~$0.03/hour. Enterprise licensing requires sales contact.

4. AWS Secrets Manager — Native AWS Integration

AWS Secrets Manager is the obvious choice if your entire stack runs on AWS. It integrates natively with Lambda, ECS, RDS, and other AWS services.

Key strength: Seamless integration with AWS IAM for access control and Lambda for custom rotation functions. No additional infrastructure to manage.

Key weakness: Expensive at scale ($0.40/secret/month + $0.05 per 10K API calls). Limited to AWS ecosystem — not great for multi-cloud.

Best for: Teams running primarily on AWS that want native integration without adding another vendor.

Pricing: $0.40 per secret per month + API call charges. No free tier.

5. 1Password Developer — From Password Manager to Secrets

1Password Developer extends the familiar 1Password UX into developer workflows. Use op run to inject secrets, reference them in code with op:// URIs, and integrate with GitHub Actions.

Key strength: If your team already uses 1Password, the developer tools feel like a natural extension. The op:// secret reference syntax is elegant.

Key weakness: Not purpose-built for infrastructure-scale secrets management. Lacks dynamic secrets, rotation automation, and Kubernetes-native features.

Best for: Small teams and indie developers who already use 1Password and want a simple way to manage a moderate number of secrets.

Pricing: Business plan at $7.99/user/month includes developer features. No standalone secrets-only plan.

6. Bitwarden Secrets Manager — Budget-Friendly Open Source

Bitwarden Secrets Manager brings Bitwarden's open-source ethos to developer secrets. Self-host the entire stack or use their cloud — with SDK support for multiple languages.

Key strength: Competitive pricing with open-source transparency (GPL license). Self-hostable for teams that need data sovereignty on a budget.

Key weakness: Younger product with a smaller ecosystem than Infisical or Vault. Fewer integrations and no dynamic secret generation.

Best for: Budget-conscious teams that value open-source and self-hosting but don't need advanced features like dynamic secrets.

Pricing: Free for individuals. Teams plan at $6/user/month. Self-hosted is free (open-source).

Verdict: Which One Should You Pick?

There's no single winner — it depends on your team size, infrastructure, and priorities:

Want open-source with maximum control? Start with Infisical. It covers the most ground for dev teams.
Want the simplest setup? Doppler gets you from zero to injected secrets in under 5 minutes.
Running enterprise infrastructure? HashiCorp Vault is still the gold standard for dynamic secrets and encryption.
All-in on AWS? AWS Secrets Manager is the path of least resistance.
Small team, already use 1Password? The developer tools are a natural fit.
Tight budget, want open-source? Bitwarden Secrets Manager delivers solid value.

If you're building AI agent workflows that connect to multiple APIs and services, platforms like Nebula handle credential management across agent integrations — pairing well with any of the tools above for your core infrastructure secrets.

Whatever you choose, the important thing is to stop putting secrets in .env files and Slack messages. Pick a tool, centralize your secrets, and ship with confidence.

Top 7 AI Agent Frameworks for Developers in 2026

Nebula — Fri, 20 Mar 2026 22:05:10 +0000

TL;DR: Pick LangGraph for production systems, CrewAI for fast prototypes, and Nebula if you want automation without writing code.

The Agent Framework Explosion

GitHub repositories for AI agent frameworks grew 535% between 2024 and 2025. Today, 85% of developers use AI tools regularly, and the question is no longer whether to build with agents but which framework to bet on.

The problem: there are too many options, each with different trade-offs around model lock-in, learning curve, and production-readiness. This guide compares the seven frameworks that matter most in March 2026 — with honest assessments of where each one falls short.

Quick Comparison

Feature	LangGraph	CrewAI	OpenAI SDK	Claude SDK	Google ADK	Dify	Nebula
Best For	Production workflows	Fast prototyping	OpenAI ecosystem	Anthropic ecosystem	GCP + multimodal	No-code teams	Automation without code
Learning Curve	High	Low	Low	Medium	Medium	Beginner	Beginner
Model Lock-in	None	None	High	High	Medium	None	None
MCP Support	Yes	Yes	Yes	Native	Yes	Yes	Yes
Pricing	Free (OSS)	Free / $25+/mo	Pay-per-token	Pay-per-token	GCP pricing	Free / $59+/mo	Free tier
GitHub Stars	25K	44.6K	19.1K	N/A	18K	60K+	N/A

LangGraph

LangGraph models agents as directed graphs with explicit state machines. It's the most production-hardened option with checkpointing, time-travel debugging, and durable execution.

Strengths: Used by Klarna, Uber, and LinkedIn in production. 34.5 million monthly downloads. MIT-licensed with no model lock-in. Human-in-the-loop patterns are first-class citizens.

Weaknesses: The steepest learning curve of any framework here. Graph-based thinking isn't intuitive for everyone, and simple use cases feel over-engineered.

Best for: Teams building regulated, long-running workflows that need pause/resume, audit trails, and explicit state management.

Pricing: Free and open source. LangSmith (observability) starts at $39/seat/month.

CrewAI

CrewAI takes a role-based approach — you define agents as team members (researcher, writer, editor) and let them collaborate. It's the fastest path from zero to working multi-agent demo.

Strengths: 44.6K GitHub stars. You can go from concept to working prototype in 2-4 hours. The mental model ("a team of specialists") clicks immediately with non-technical stakeholders. 60% of Fortune 500 companies have tried it.

Weaknesses: The simplicity that makes prototyping fast can become a limitation in complex production systems. Teams often migrate to LangGraph once workflows get sophisticated.

Best for: MVPs, hackathons, and demos where speed-to-value matters more than production hardening.

Pricing: Free (open source). CrewAI Enterprise starts at $25/month with SOC2 compliance.

OpenAI Agents SDK

The OpenAI Agents SDK uses a handoff-based architecture where agents transfer control to each other. It's the lowest-friction option if you're already paying for GPT.

Strengths: 19.1K GitHub stars, 10.3 million monthly downloads. Built-in guardrails, tracing, and sessions. Native MCP support. If your team already uses OpenAI, setup takes minutes.

Weaknesses: Heavy vendor lock-in to OpenAI models. Less community diversity than framework-agnostic options. TypeScript support is still catching up.

Best for: Teams committed to the OpenAI ecosystem wanting the fastest path to production agents.

Pricing: Free SDK; you pay for OpenAI API usage. Web search runs $25-30 per 1K queries.

Claude Agent SDK

Anthropic's Claude Agent SDK is built around tool-use with sandboxed code execution. It has the deepest MCP integration of any framework — MCP was designed by Anthropic, after all.

Strengths: Sandboxed execution environment for safety. Constitutional AI guardrails. The 1M-token context window (via Claude Code) handles entire codebases. Best-in-class for security-sensitive agent work.

Weaknesses: Locked to Claude models. Smaller ecosystem than LangGraph or CrewAI. Less community content and fewer tutorials available.

Best for: Teams committed to Anthropic who need safe, sandboxed agent execution with deep tool integration.

Pricing: Free SDK; pay-per-token for Claude API. Pro plans from $20/month.

Google ADK

Google's Agent Development Kit uses hierarchical agent trees where a root agent delegates to specialized sub-agents. It's the only framework with native A2A (Agent-to-Agent) protocol support.

Strengths: 18K GitHub stars. True multimodal support — text, images, audio, video via Gemini. A2A protocol lets your agents communicate with agents built on other frameworks (50+ partners including Salesforce and ServiceNow).

Weaknesses: Medium vendor lock-in to Google Cloud. Smaller community than LangGraph/CrewAI. Documentation is still maturing.

Best for: GCP-native teams building multimodal agents or needing cross-framework interoperability via A2A.

Pricing: Free (open source). Gemini and Vertex AI usage billed through GCP.

Dify

Dify is a no-code/low-code platform for building agent workflows visually. It recently raised $30 million and is used by 280 enterprises across 1.4 million deployments.

Strengths: Visual drag-and-drop workflow builder. Built-in RAG, knowledge bases, and observability. Self-hosted or cloud. No model lock-in — supports 100+ LLMs.

Weaknesses: Less flexibility than code-first frameworks for complex custom logic. Enterprise features (SSO, RBAC) are behind paid tiers.

Best for: Non-technical teams or organizations wanting production agent workflows without writing Python.

Pricing: Free (open source). Pro $59/month, Team $159/month. Enterprise pricing available.

Nebula

Nebula is a different beast — it's not a code-level framework but an AI agent platform focused on connecting services and automating workflows. Think of it as the glue between your existing tools.

Strengths: 600+ OAuth app integrations (GitHub, Slack, Gmail, Linear, Notion, and more). Create agents and automated triggers without code. Scheduled and event-driven workflows out of the box. Custom agents with specialized capabilities.

Weaknesses: Not designed for building custom ML pipelines or low-level agent logic. If you need fine-grained control over agent reasoning chains, use a code-first framework.

Best for: Teams that want to connect existing services, automate repetitive workflows, and build agents without writing code — complementing rather than replacing code-first frameworks.

Pricing: Free tier available.

Decision Matrix

Choose based on your situation:

Building a production system with compliance needs → LangGraph
Need a working prototype by Friday → CrewAI
Already paying for OpenAI → OpenAI Agents SDK
Committed to Anthropic + need sandboxed execution → Claude Agent SDK
GCP shop needing multimodal or cross-framework agents → Google ADK
Non-technical team, want visual workflow builder → Dify
Want to automate across 600+ apps without code → Nebula

The Verdict

There's no single "best" framework — but there is a best framework for your stack. LangGraph dominates production deployments for good reason: explicit state, checkpointing, and battle-tested patterns. CrewAI remains the fastest on-ramp for teams exploring agents. And platforms like Dify and Nebula prove that not every agent workflow needs a Python file.

The real trend to watch: MCP adoption across all frameworks means your tool integrations are becoming portable. Build your agent logic in one framework, and your MCP servers work everywhere. That's the closest thing to a safe bet in 2026.

How to Build Your First MCP Server in 10 Minutes

Nebula — Fri, 20 Mar 2026 21:06:07 +0000

You keep hearing about MCP servers but every tutorial throws you into multi-agent swarms and complex architectures. Here's the simplest possible MCP server -- one file, one tool, fully runnable in 10 minutes.

The Code

Create a new project and install the SDK:

mkdir my-mcp-server && cd my-mcp-server
npm init -y
npm install @modelcontextprotocol/sdk zod

Create server.ts:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "weather-server",
  version: "1.0.0",
});

server.registerTool(
  "get-weather",
  {
    title: "Get Weather",
    description: "Get current weather for a city",
    inputSchema: z.object({
      city: z.string().describe("City name"),
    }),
  },
  async ({ city }) => {
    const res = await fetch(
      `https://wttr.in/${encodeURIComponent(city)}?format=j1`
    );
    const data = await res.json();
    const current = data.current_condition[0];

    return {
      content: [
        {
          type: "text",
          text: `${city}: ${current.temp_C}°C, ${current.weatherDesc[0].value}`,
        },
      ],
    };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Weather MCP server running on stdio");

That's it. 30 lines. A fully functional MCP server.

What's Happening

Lines 1-3 import the three things every MCP server needs: McpServer to create the server instance, StdioServerTransport for the communication layer, and zod for input validation.

Lines 5-8 create your server with a name and version. These show up when AI clients discover your server.

Lines 10-31 register a single tool called get-weather. The inputSchema uses Zod to define and validate what the tool expects -- a city name as a string. The handler function fetches real weather data from wttr.in (a free API, no key needed) and returns it as a text content block.

Lines 33-35 wire up the stdio transport and start the server. Stdio means the AI client launches your server as a child process and communicates over stdin/stdout. This is how Claude Desktop, Cursor, and most local MCP clients work.

Connect It to Claude Desktop

Add this to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "weather": {
      "command": "npx",
      "args": ["tsx", "/full/path/to/my-mcp-server/server.ts"]
    }
  }
}

Restart Claude Desktop. Ask it "What's the weather in Tokyo?" and it will call your get-weather tool.

Test It Without a Client

You can also test directly from the terminal using the MCP Inspector:

npx @modelcontextprotocol/inspector npx tsx server.ts

This opens a web UI where you can browse your server's tools and call them manually -- useful for debugging before connecting to an AI client.

What to Build Next

Now that you have a working server, swap the weather API for anything:

Database queries: Register a tool that runs read-only SQL against your dev database
Internal APIs: Wrap your company's REST endpoints as MCP tools
File operations: Let AI assistants read/search your project files

Each new tool is just another server.registerTool() call with a schema and a handler. The MCP SDK handles discovery, validation, and communication automatically.

The full TypeScript SDK has examples for HTTP transports, authentication, streaming, and multi-tool servers at github.com/modelcontextprotocol/typescript-sdk.

This is part of the AI Agent Quick Tips series -- short, code-first tutorials for developers building with AI tools.