DEV Community

Nico Acosta for BrainGrid

Originally published at braingrid.ai

Claude Agent SDK: Build Production AI Agents Without Starting from Scratch

You've been building features by hand while your competitors ship AI agents that work around the clock. The Claude Agent SDK—the same engine powering Claude Code—is now available as a library.

In the next 20 minutes, you'll understand exactly how to build an autonomous agent that reads your codebase, fixes bugs, and ships features while you focus on landing customers. No PhD required. No six-month learning curve. Just working code you can deploy this weekend.

## What Is the Claude Agent SDK and Why Should You Care?

Here's the situation: you've seen what Claude Code can do. It reads files, runs commands, fixes bugs, and ships features autonomously. What you might not know is that the entire engine powering Claude Code is now available as a library you can drop into your own product.

That's the Claude Agent SDK.

Think of it this way: if Claude Code is the finished car, the Agent SDK is the engine you can install in your own chassis. Same power, your design. The SDK gives you Claude's entire agent loop—the part that decides what to do, uses tools, and verifies its work—without you having to reinvent any of it.

One important note: you might see references to "Claude Code SDK" in older articles or search results. Anthropic renamed it to "Claude Agent SDK" in late 2025 to reflect its broader use cases beyond just coding tasks.

Don't confuse this with the Anthropic Client SDK. The Client SDK requires you to implement the tool loop yourself—you send a prompt, get a response, execute any tools manually, send results back, repeat. It's a lot of plumbing.

The Agent SDK handles all of that autonomously. You send a prompt, and the agent reads files, runs commands, makes edits, and verifies its own work without you writing the orchestration logic.

```mermaid
flowchart LR
    subgraph ClientSDK["Client SDK (Manual)"]
        A1[Send Prompt] --> B1[Get Response]
        B1 --> C1{Tool Use?}
        C1 -->|Yes| D1[YOU Execute Tool]
        D1 --> E1[Send Result Back]
        E1 --> B1
        C1 -->|No| F1[Done]
    end

    subgraph AgentSDK["Agent SDK (Autonomous)"]
        A2[Send Prompt] --> B2[Agent Loop]
        B2 --> C2[SDK Executes Tools]
        C2 --> D2[SDK Verifies]
        D2 --> B2
        B2 --> E2[Done]
    end
```
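To make that "plumbing" concrete, here is a minimal sketch of the loop you own when you use a bare client. It runs offline against a stubbed model and is not the Client SDK's API: the block shapes are simplified versions of the Messages API's `tool_use` and `tool_result` blocks, and `ModelFn`, `runManualLoop`, and the `maxTurns` guard are assumptions for illustration.

```typescript
// Simplified content blocks, loosely modeled on the Messages API (assumption, not the real types).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type ChatMessage =
  | { role: "user"; content: string | ToolResultBlock[] }
  | { role: "assistant"; content: (TextBlock | ToolUseBlock)[] };

// Hypothetical model call: real code would send `history` to the API here.
type ModelFn = (history: ChatMessage[]) => Promise<(TextBlock | ToolUseBlock)[]>;
type ToolImpl = (input: Record<string, unknown>) => Promise<string>;

// The orchestration you write yourself with a bare client:
// send, check for tool use, execute tools yourself, send results back, repeat.
async function runManualLoop(
  model: ModelFn,
  tools: Record<string, ToolImpl>,
  prompt: string,
  maxTurns = 10
): Promise<string> {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(history);
    history.push({ role: "assistant", content: reply });
    const toolUses = reply.filter((b): b is ToolUseBlock => b.type === "tool_use");
    if (toolUses.length === 0) {
      // No tool requested: the text blocks are the final answer.
      return reply
        .filter((b): b is TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
    }
    const results: ToolResultBlock[] = [];
    for (const use of toolUses) {
      const impl: ToolImpl | undefined = tools[use.name];
      const content = impl ? await impl(use.input) : `Unknown tool: ${use.name}`;
      results.push({ type: "tool_result", tool_use_id: use.id, content });
    }
    // Tool results go back to the model as a user turn, and the loop continues.
    history.push({ role: "user", content: results });
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}
```

Everything inside `runManualLoop` (history bookkeeping, tool dispatch, turn limits) is the work the Agent SDK's `query()` takes off your hands.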

What does this mean for your product? Your agent can handle customer support tickets, debug code, generate reports, or analyze documents while you sleep. Each of those is a potential paid feature. The SDK removes the build-from-scratch tax so you can focus on what makes your product unique.

## How Do I Install and Set Up the Claude Agent SDK?

A botched setup can waste hours. Let's get this right the first time so you're building in minutes, not debugging your environment.

The SDK comes in two flavors: TypeScript and Python. Pick whichever matches your stack. Both have identical capabilities—the agent loop, built-in tools, streaming, sessions, everything.

Requirements:

  • Python 3.10+ for the Python SDK
  • Node.js 18+ for the TypeScript SDK
  • Claude Code CLI is bundled automatically with both packages

Here's the copy-paste installation:

```bash
# TypeScript/Node.js
npm install @anthropic-ai/claude-agent-sdk

# Python
pip install claude-agent-sdk

# Set your API key (get it from console.anthropic.com)
export ANTHROPIC_API_KEY=your-api-key
```

That's it: one install command for your stack, plus your API key, and you're ready.

One gotcha that trips people up: a version mismatch between the Claude Code CLI and the SDK. If you're getting weird agent recognition errors, run `claude --version` and make sure it matches the SDK requirements in the docs. This is the most common support question on the GitHub issues, and the fix is always "update your CLI."

Never hardcode your API key in source files. Environment variables keep your credentials out of git history and make deployment cleaner. Your future self (and anyone who reviews your code) will thank you.

Faster setup means faster time-to-demo. When a potential customer asks "can you show me how this works?", you want to be deploying agents, not debugging npm installs.

## What's the Core Agent Loop and How Does It Work?

Understanding the agent loop is the difference between debugging agents quickly and spending days confused about why your agent isn't working.

The loop has three phases that repeat until the task is done:

  1. Gather context - The agent reads files, searches the codebase, or spawns subagents to collect information
  2. Take action - Execute tools, run bash commands, generate code, make edits
  3. Verify work - Check if the output is correct, run tests, validate assumptions

If verification fails, the loop repeats. The agent gathers more context, tries a different approach, and verifies again. This feedback mechanism is what makes agents actually useful—they self-correct instead of confidently shipping broken code.

```mermaid
flowchart TD
    A[Start Task] --> B[Gather Context]
    B --> B1[Search Files]
    B --> B2[Read Documentation]
    B --> B3[Spawn Subagents]
    B1 & B2 & B3 --> C[Take Action]
    C --> C1[Execute Tools]
    C --> C2[Run Scripts]
    C --> C3[Generate Code]
    C1 & C2 & C3 --> D[Verify Work]
    D --> D1{Passes Checks?}
    D1 -->|No| B
    D1 -->|Yes| E[Complete]
    style B fill:#10312D,color:#FCFCFB
    style C fill:#C2E476,color:#121212
    style D fill:#AACF57,color:#121212
```
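The SDK runs this loop for you, but a conceptual sketch helps when you are debugging why an agent keeps retrying. This is not the SDK's internal implementation: `LoopSteps`, `gather`, `act`, and `verify` are hypothetical callbacks, and `maxIterations` is an added safety bound so a task that never verifies cannot loop forever.

```typescript
// Hypothetical result of one verification pass.
type Verdict = { ok: true } | { ok: false; reason: string };

interface LoopSteps<Ctx, Out> {
  gather: (feedback: string | null) => Promise<Ctx>; // read files, search, etc.
  act: (ctx: Ctx) => Promise<Out>;                   // edit files, run tools
  verify: (out: Out) => Promise<Verdict>;            // run tests, check output
}

// Repeat gather -> act -> verify until verification passes or we give up.
async function agentLoop<Ctx, Out>(
  steps: LoopSteps<Ctx, Out>,
  maxIterations = 5
): Promise<{ output: Out; iterations: number }> {
  let feedback: string | null = null;
  for (let i = 1; i <= maxIterations; i++) {
    const ctx = await steps.gather(feedback);
    const output = await steps.act(ctx);
    const verdict = await steps.verify(output);
    if (verdict.ok) return { output, iterations: i };
    // The failure reason becomes extra context for the next attempt.
    feedback = verdict.reason;
  }
  throw new Error(`Verification still failing after ${maxIterations} attempts`);
}
```

Feeding the failure reason back into `gather` is what turns a blind retry into self-correction.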

Here's the loop in action:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// The SDK handles the entire loop for you
for await (const message of query({
  prompt: "Find and fix the bug in auth.py",
  options: {
    allowedTools: ["Read", "Edit", "Bash"],
    permissionMode: "acceptEdits"
  }
})) {
  console.log(message);
  // Claude reads the file, finds the bug, edits it, verifies the fix
}
```

The biggest pitfall here: agents try to one-shot complex tasks. They'll attempt to implement an entire feature in a single pass, run out of context mid-implementation, and leave you with half-working code. The fix is explicit task breakdown—give your agent smaller, focused tasks rather than "build me an authentication system."
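One way to enforce that breakdown is to hand the agent an ordered list of focused prompts and stop at the first failure. A minimal sketch, assuming a hypothetical `runTask` function (in practice a wrapper around `query()` that reports whether the step succeeded, possibly resuming one session across steps); the subtasks in `authSteps` are illustrative only.

```typescript
// Hypothetical runner: in practice this might wrap query() and inspect the final result.
type RunTask = (prompt: string) => Promise<{ success: boolean; summary: string }>;

// Run small, focused tasks in order instead of one giant prompt.
async function runInSteps(tasks: string[], runTask: RunTask): Promise<string[]> {
  const summaries: string[] = [];
  for (let i = 0; i < tasks.length; i++) {
    const { success, summary } = await runTask(tasks[i]);
    if (!success) {
      // Stop early so a failed step can't snowball into the later ones.
      throw new Error(`Step ${i + 1} failed ("${tasks[i]}"): ${summary}`);
    }
    summaries.push(summary);
  }
  return summaries;
}

// Illustrative breakdown of "build me an authentication system".
const authSteps = [
  "Add a users table migration with email and password_hash columns",
  "Implement password hashing helpers with unit tests",
  "Add a POST /login route that uses the helpers",
  "Write integration tests for the login route and make them pass",
];
```

Each step fits comfortably in one context window, and a failure tells you exactly which step to revisit.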

Context management is the production differentiator. Pushing the entire conversation history on each API call exhausts your token budget fast. The SDK includes automatic context compaction that summarizes older exchanges, but you should still design your prompts to request focused, specific actions.

A well-tuned agent loop handles far more customer requests without human intervention. That's the difference between a support burden and a profit center.

## What Built-In Tools Are Available and When Should I Use Each?

The SDK ships with a set of built-in tools, and these eight core ones handle most of what you'll need. Here's what each does and when to reach for it:

| Tool | What it does | When to use |
|------|--------------|-------------|
| **Read** | Read any file in the working directory | Viewing code, configs, documentation |
| **Write** | Create new files | Generating new components or configs |
| **Edit** | Make precise edits to existing files | Bug fixes, refactoring, updates |
| **Bash** | Run terminal commands and scripts | Git operations, npm, tests, builds |
| **Glob** | Find files by pattern | Discovering files: `**/*.ts`, `src/**/*.py` |
| **Grep** | Search file contents with regex | Finding function calls, variable usage |
| **WebSearch** | Search the internet | Looking up current documentation or APIs |
| **WebFetch** | Fetch and parse web pages | Reading docs, scraping structured data |



```mermaid
flowchart TD
    A[Task Type?] --> B{Read or Write?}
    B -->|Read| C{What are you reading?}
    C -->|Files| D[Use Read]
    C -->|Find Files| E[Use Glob]
    C -->|Search Content| F[Use Grep]
    C -->|Web Info| G[Use WebSearch/WebFetch]

    B -->|Write| H{What are you changing?}
    H -->|New File| I[Use Write]
    H -->|Edit Existing| J[Use Edit]
    H -->|Run Commands| K[Use Bash]
    style D fill:#AACF57,color:#121212
    style E fill:#AACF57,color:#121212
    style F fill:#AACF57,color:#121212
    style G fill:#10312D,color:#F3F1E8
    style I fill:#C2E476,color:#121212
    style J fill:#C2E476,color:#121212
    style K fill:#473392,color:#FCFCFB
```

Here's the critical security lesson: don't enable all tools by default. Start with read-only access (Read, Glob, Grep) and add write capabilities only after you've validated the agent's behavior. One Reddit thread described an agent that ran `rm -rf` on a test directory because Bash was enabled without restrictions. Start paranoid, loosen permissions carefully.
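A simple way to "start paranoid" is to define cumulative permission stages and promote an agent only after it behaves. A minimal sketch: `allowedTools` and `disallowedTools` are option names the SDK's `query()` accepts, while the stage names, the split between stages, and `toolOptionsFor` are illustrative assumptions.

```typescript
// The eight built-in tools from the table above.
const BUILT_IN_TOOLS = ["Read", "Glob", "Grep", "Edit", "Write", "Bash", "WebSearch", "WebFetch"] as const;
type ToolName = (typeof BUILT_IN_TOOLS)[number];
type Stage = "readOnly" | "edit" | "full";

// Each stage adds capabilities to the previous one (an illustrative split).
const STAGES: Record<Stage, ToolName[]> = {
  readOnly: ["Read", "Glob", "Grep"],
  edit: ["Read", "Glob", "Grep", "Edit", "Write"],
  full: ["Read", "Glob", "Grep", "Edit", "Write", "Bash"],
};

// Options for query(): allow the stage's tools and explicitly block every other built-in tool.
function toolOptionsFor(stage: Stage): { allowedTools: ToolName[]; disallowedTools: ToolName[] } {
  const allowed = STAGES[stage];
  return {
    allowedTools: [...allowed],
    disallowedTools: BUILT_IN_TOOLS.filter((t) => !allowed.includes(t)),
  };
}

// Usage: query({ prompt, options: { ...toolOptionsFor("readOnly") } })
```

Listing the blocked tools explicitly, rather than relying only on what is allowed, makes the agent's boundaries visible in code review.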

Each tool you enable is a capability you can market. "AI that fixes your bugs" requires the Edit tool. "AI that deploys your code" requires Bash. Think about which capabilities map to features your customers will pay for, then enable only those.

Building an agent is the easy part. Knowing which features to build first is where most founders waste months. BrainGrid turns your vague ideas into structured specs with AI-ready tasks—so you ship features that convert, not features that collect dust.

## How Do I Add Custom Tools and Integrate External APIs?

Built-in tools cover file operations and web access. But your agent needs to talk to your product—your CRM, database, Stripe, Slack, whatever powers your business. That's where custom tools and MCP come in.

Model Context Protocol (MCP) is an open standard, introduced by Anthropic, for connecting agents to external services. Instead of writing OAuth flows and API wrappers yourself, you plug in pre-built MCP servers for Slack, GitHub, Asana, Playwright, databases, and hundreds more. They handle authentication and API calls. You just configure them.

Here's how to add browser automation with the Playwright MCP server:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Open our pricing page and verify it loads correctly",
  options: {
    mcpServers: {
      playwright: {
        command: "npx",
        args: ["@playwright/mcp@latest"]
      }
    }
  }
})) {
  console.log(message);
}
```

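For custom integrations that don't have pre-built servers, you can define your own tools with the SDK's `tool()` helper, validate their inputs with Zod, and expose them to the agent through an in-process MCP server:

```typescript
import { tool, createSdkMcpServer } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

// Placeholder for your own CRM API client (not part of the SDK)
declare const crmClient: {
  createContact(
    input: { email: string; name: string; plan: "free" | "pro" | "enterprise" },
    options: { signal: AbortSignal }
  ): Promise<unknown>;
};

const createContactTool = tool(
  "create_crm_contact",
  "Create a new contact in the CRM",
  // Zod validates the agent's input before the handler runs
  {
    email: z.string().email(),
    name: z.string().min(1),
    plan: z.enum(["free", "pro", "enterprise"])
  },
  async (args) => {
    // Always add timeout protection: abort the CRM call after 5 seconds
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), 5000);
    try {
      const result = await crmClient.createContact(args, { signal: controller.signal });
      // Tool handlers return MCP-style content blocks
      return { content: [{ type: "text", text: JSON.stringify(result) }] };
    } finally {
      clearTimeout(timeoutId);
    }
  }
);

// Bundle the tool into an in-process MCP server the agent can use
const crmServer = createSdkMcpServer({
  name: "crm",
  version: "1.0.0",
  tools: [createContactTool]
});
```

Pass the server in `options.mcpServers` (for example `mcpServers: { crm: crmServer }`) and the agent sees the tool as `mcp__crm__create_crm_contact`. Check the custom tools documentation for your SDK version; at the time of writing it notes that in-process tools need the prompt passed as an async iterable (streaming input mode) rather than a plain string.
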
```mermaid
flowchart LR
    A[Claude Agent] --> B[MCP Protocol]
    B --> C[Playwright MCP]
    B --> D[GitHub MCP]
    B --> E[Slack MCP]
    B --> F[Custom MCP]
    C --> G[Browser Automation]
    D --> H[GitHub API]
    E --> I[Slack API]
    F --> J[Your Database]
    style A fill:#C2E476,color:#121212
    style B fill:#10312D,color:#F3F1E8
```

The critical pitfall: failing to sandbox tools. Running shell commands or database queries without timeouts creates runaway processes and security holes. Every custom tool should have a timeout wrapper, input validation, and explicit error handling.
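Those three safeguards can live in one reusable wrapper instead of being repeated in every handler. A minimal sketch: `guardTool` and `GuardedResult` are hypothetical names, the `validate` function stands in for something like a Zod schema's `parse`, and errors are returned as values so the agent gets a readable message instead of a crash.

```typescript
// Outcome returned instead of throwing (an assumed shape; adapt it to your tool format).
type GuardedResult<T> = { ok: true; value: T } | { ok: false; error: string };

const describeError = (err: unknown): string => (err instanceof Error ? err.message : String(err));

// Wrap a tool handler with input validation, a timeout, and explicit error handling.
function guardTool<I, T>(
  validate: (input: unknown) => I, // throws on bad input
  handler: (input: I, signal: AbortSignal) => Promise<T>,
  timeoutMs = 5000
): (raw: unknown) => Promise<GuardedResult<T>> {
  return async (raw) => {
    let input: I;
    try {
      input = validate(raw);
    } catch (err) {
      return { ok: false, error: `Invalid input: ${describeError(err)}` };
    }
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        controller.abort(); // lets handlers that honor the signal stop early
        reject(new Error(`Timed out after ${timeoutMs} ms`));
      }, timeoutMs);
    });
    try {
      // Whichever settles first wins: the handler's result or the timeout.
      const value = await Promise.race([handler(input, controller.signal), timeout]);
      return { ok: true, value };
    } catch (err) {
      return { ok: false, error: describeError(err) };
    } finally {
      clearTimeout(timer);
    }
  };
}
```

Note that a timeout only stops the wait; a handler that ignores the `AbortSignal` keeps running in the background, so pass the signal through to your HTTP or database client.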

Every integration you add is a potential upsell. "Connect to Stripe" becomes a Pro feature. "Sync with Slack" becomes an Enterprise add-on. Plan your integrations around what customers will pay for before you build them.

## How Do I Handle Long-Running Agents and Subagents?

Real production tasks take minutes or hours, not seconds. A security audit across a large codebase. Analyzing months of customer support tickets. Generating comprehensive documentation. Agents that crash mid-task lose customer data and trust.

The SDK handles this through two mechanisms: sessions and subagents.

Sessions maintain context across multiple exchanges. Your agent can work on a task, you can close your laptop, come back tomorrow, and resume exactly where it left off. Context is preserved—files read, analysis done, conversation history.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Undefined until the init message arrives
let sessionId: string | undefined;

// First query: capture the session ID from the init message
for await (const message of query({
  prompt: "Read the authentication module and understand how it works",
  options: { allowedTools: ["Read", "Glob", "Grep"] }
})) {
  if (message.type === "system" && message.subtype === "init") {
    sessionId = message.session_id;
  }
}

// Later (hours or days later, after persisting sessionId): resume with full context preserved
for await (const message of query({
  prompt: "Now find all places that call the auth module",
  options: { resume: sessionId }
})) {
  console.log(message);
  // Agent remembers everything from the first query
}
```

Subagents handle task complexity by isolating context windows. When your main agent hits a complex subtask, it spawns a specialized subagent to handle it. The subagent works in its own context, returns relevant excerpts, and the parent continues without context explosion.

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def analyze_codebase():
    # Enable Task tool to let Claude spawn subagents automatically
    async for message in query(
        prompt="Analyze this codebase for security vulnerabilities",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Task"]
        )
    ):
        print(message)
    # Claude may spawn subagents for:
    # - SQL injection analysis
    # - XSS vulnerability scanning
    # - Dependency auditing

asyncio.run(analyze_codebase())
```
```mermaid
flowchart TD
    A[Parent Agent] --> B[Analyze Task]
    B --> C{Complex Enough?}
    C -->|Yes| D[Spawn Subagents]
    D --> E[Security Subagent]
    D --> F[Performance Subagent]
    D --> G[Documentation Subagent]
    E --> H[SQL Injection Report]
    F --> I[Latency Analysis]
    G --> J[Missing Docs List]
    H & I & J --> K[Aggregate Results]
    K --> L[Return to Parent]
    C -->|No| M[Handle Directly]
    style A fill:#C2E476,color:#121212
    style E fill:#10312D,color:#F3F1E8
    style F fill:#10312D,color:#F3F1E8
    style G fill:#10312D,color:#F3F1E8
```

One common problem: CPU usage spikes to 100% when spawning too many subagents simultaneously. Claude tries to parallelize aggressively, which is great for speed but can overwhelm modest hardware. Limit concurrency in your .claude configuration or implement explicit concurrency controls in your orchestration layer.
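For the orchestration-layer option, a small concurrency limiter keeps you from launching every agent run at once. A minimal sketch, assuming each job is one agent run you start yourself (for example one `query()` call per subtask); `mapWithConcurrency` is a hypothetical helper, and it caps your own parallel runs, not subagents Claude spawns internally through the Task tool.

```typescript
// Run async jobs with at most `limit` in flight at once, preserving result order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer");
  const results = new Array<R>(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until the queue is empty.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Because JavaScript runs this on a single thread, `next++` needs no lock; the limit simply bounds how many workers are awaiting at any moment.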

The SDK also includes automatic context compaction. For very long-running operations, it summarizes older parts of the conversation to prevent token exhaustion. You don't need to implement this—it happens automatically—but understanding it helps you design better prompts.

Agents that handle hour-long analysis tasks without crashing can charge premium pricing. Reliability is a feature customers pay for.

## What Are the Critical Mistakes That Cost You Customers?

Every crashed agent is a churned customer. Every silent failure erodes trust. Here are the mistakes that kill production agents—and how to avoid them.

| Mistake | What you'll see | The fix |
|---------|-----------------|---------|
| No tool time limits | Runaway processes, hung requests | Wrap every tool handler in a 5-second timeout |
| Full history on every call | Token exhaustion mid-task | Use conversation summaries + selective retrieval |
| No streaming backpressure | UI freezes, stalled responses | Flush SSE/websocket frames explicitly |
| Hardcoded agent prompts | Can't update without redeploy | Store agent templates in a config service |
| No verification layer | Silent failures, wrong outputs | Add rules-based + visual feedback loops |
| Single model dependency | Outages cascade to users | Route fast tasks to Haiku, complex to Sonnet |
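For the last row, a router can pick a tier per task and fall back to the other tier when a call fails, so one model's outage doesn't reach users. A minimal sketch: the `Tier` names, the keyword heuristic in `classify`, and the `CallModel` function are assumptions; in practice you would map each tier to a model (Haiku for "fast", Sonnet for "complex") through the SDK's `model` option.

```typescript
type Tier = "fast" | "complex";
// Hypothetical model call: in practice, run query() with options.model set for the tier.
type CallModel = (tier: Tier, prompt: string) => Promise<string>;

// Crude illustrative heuristic: long or multi-step prompts go to the stronger tier.
function classify(prompt: string): Tier {
  const multiStep = /\b(refactor|migrate|design|debug|implement)\b/i.test(prompt);
  return multiStep || prompt.length > 500 ? "complex" : "fast";
}

// Try the preferred tier first, then fall back to the other one.
async function routeWithFallback(
  prompt: string,
  callModel: CallModel
): Promise<{ tier: Tier; output: string }> {
  const preferred = classify(prompt);
  const order: Tier[] = preferred === "fast" ? ["fast", "complex"] : ["complex", "fast"];
  let lastError: unknown;
  for (const tier of order) {
    try {
      return { tier, output: await callModel(tier, prompt) };
    } catch (err) {
      lastError = err; // remember the failure and try the next tier
    }
  }
  throw lastError;
}
```

Falling back from "fast" to "complex" costs more per request, but a slower, pricier answer usually beats an error page.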
```mermaid
flowchart LR
    A[Mistake] --> B[Agent Crash]
    B --> C[Customer Sees Error]
    C --> D[Support Ticket]
    D --> E[Trust Erodes]
    E --> F[Churn]

    G[Timeout Wrappers] -.->|Prevents| B
    H[Verification Layer] -.->|Prevents| C
    I[Monitoring] -.->|Catches| D
    style A fill:#EF4444,color:#FCFCFB
    style F fill:#EF4444,color:#FCFCFB
    style G fill:#AACF57,color:#121212
    style H fill:#AACF57,color:#121212
    style I fill:#AACF57,color:#121212
```

The most insidious mistake: marking features complete without end-to-end testing. Agents will confidently report "task complete" while the feature is actually broken. Without verification—whether that's automated tests, visual checks, or LLM-as-judge evaluation—you're shipping silent failures.
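A rules-based gate is the simplest form of that verification layer. A minimal sketch: the `CompletionReport` shape and the checks are assumptions, and the idea is that you fill `testExitCode` and `changedFiles` yourself (for example by running your test suite and `git diff --name-only` after the agent finishes) instead of taking the agent's word for it.

```typescript
interface CompletionReport {
  agentSaysDone: boolean;      // what the agent claimed
  testExitCode: number | null; // from running your test suite yourself; null if not run
  changedFiles: string[];      // e.g. collected from `git diff --name-only`
}

type Check = { name: string; passes: (r: CompletionReport) => boolean };

// Illustrative rules; replace them with checks that matter for your product.
const CHECKS: Check[] = [
  { name: "tests ran and passed", passes: (r) => r.testExitCode === 0 },
  { name: "at least one file changed", passes: (r) => r.changedFiles.length > 0 },
];

// Mark a task done only if every independent check passes.
function verifyCompletion(
  report: CompletionReport,
  checks: Check[] = CHECKS
): { done: boolean; failures: string[] } {
  const failures = checks.filter((c) => !c.passes(report)).map((c) => c.name);
  // The agent's own claim is necessary but never sufficient.
  if (!report.agentSaysDone) failures.unshift("agent did not report completion");
  return { done: failures.length === 0, failures };
}
```

Returning the list of failed checks, not just a boolean, gives you a ready-made message to feed back to the agent for another attempt.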

Permission sprawl is the fastest path to unsafe autonomy. Treat tool access like production IAM: start from deny-all, allow only what each agent needs, require explicit confirmations for sensitive actions, and block dangerous commands entirely.
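A deny-all policy fits in a small decision function. A minimal sketch, assuming the Bash tool's input carries the shell command in a `command` field; the tool lists, the blocked patterns, and `decide` itself are illustrative, and a regex blocklist is easy to evade, so treat it as one layer rather than a sandbox. A function like this could back the SDK's `canUseTool` permission callback, with `"ask"` routed to a human; check that callback's exact signature for your SDK version.

```typescript
type Decision = "allow" | "ask" | "deny";

// Illustrative policy: anything not listed here is denied.
const POLICY = {
  allow: new Set(["Read", "Glob", "Grep"]),
  ask: new Set(["Edit", "Write", "Bash"]), // sensitive: require human confirmation
};

// Commands blocked outright, even when Bash is otherwise permitted (not exhaustive).
const BLOCKED_COMMANDS: RegExp[] = [
  /\brm\s+.*-[a-z]*r/i,            // any recursive rm: rm -rf, rm -fr, rm -r
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+push\b.*(--force|-f)\b/,
  /\bdrop\s+(table|database)\b/i,
];

function decide(toolName: string, input: Record<string, unknown>): Decision {
  if (toolName === "Bash") {
    const command = typeof input.command === "string" ? input.command : "";
    if (BLOCKED_COMMANDS.some((re) => re.test(command))) return "deny";
  }
  if (POLICY.allow.has(toolName)) return "allow";
  if (POLICY.ask.has(toolName)) return "ask";
  return "deny"; // deny-all default, including unknown MCP tools
}
```

Checking the blocklist before the allow/ask lists means a dangerous command is refused even if someone later moves Bash into the `allow` set.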

One member of the Anthropic community described their experience:

"We deployed an agent that could run Bash commands for our internal tooling. Worked great in dev. In production, a weird edge case triggered git reset --hard on a customer's repo. Three hours of their work, gone. Now every destructive command requires human approval, no exceptions."

Each security hole is a lawsuit waiting to happen. Each crashed agent is a support ticket. Each silent failure is revenue walking out the door. Build verification into every agent from day one.

## How Do I Deploy My Agent to Production This Weekend?

An agent sitting on your laptop earns $0. You need it live, in front of customers, collecting feedback and proving value. Here's the fastest path from "working locally" to "deployed and demo-ready."

The proven stack for a weekend deploy:

```
┌─────────────────────────────────────────────────────────┐
│                     Your SaaS                           │
├─────────────────────────────────────────────────────────┤
│  Next.js Frontend                                       │
│  ├── /app/api/agent/route.ts  ← Claude Agent SDK        │
│  └── /app/dashboard           ← User interface          │
├─────────────────────────────────────────────────────────┤
│  Database (Postgres)                                    │
│  ├── Sessions table (agent state)                       │
│  └── Results table (outputs)                            │
├─────────────────────────────────────────────────────────┤
│  Vercel                                                 │
│  ├── Node.js Functions (API routes)                     │
│  └── Cron Jobs (scheduled agents)                       │
└─────────────────────────────────────────────────────────┘
```

Here's a minimal API route that streams agent responses to your frontend:

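```typescript
// app/api/agent/route.ts
import { query } from "@anthropic-ai/claude-agent-sdk";
import { NextRequest } from "next/server";

// The SDK launches the Claude Code CLI as a subprocess, so this route needs the Node.js runtime (not Edge)
export const runtime = "nodejs";

export async function POST(req: NextRequest) {
  const { prompt, sessionId } = await req.json();

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // Each SDK message becomes one Server-Sent Events frame
      const send = (data: unknown) =>
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(data)}\n\n`));
      try {
        for await (const message of query({
          prompt,
          options: {
            // Read-only tools; avoid bypassPermissions in a public endpoint
            allowedTools: ["Read", "Glob", "Grep"],
            disallowedTools: ["Bash", "Write", "Edit"],
            // Resuming needs the earlier session's transcript on this machine,
            // which ephemeral serverless instances may not keep
            resume: sessionId
          }
        })) {
          send(message);
        }
      } catch (err) {
        // Report failures to the client instead of leaving the stream hanging
        send({ type: "error", message: err instanceof Error ? err.message : String(err) });
      } finally {
        controller.close();
      }
    }
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache"
    }
  });
}
```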
```mermaid
flowchart TD
    A[User Browser] --> B[Next.js on Vercel]
    B --> C[API Route: /api/agent]
    C --> D[Claude Agent SDK]
    D --> E[Built-in Tools]
    D --> F[MCP Servers]
    C --> G[Postgres Database]
    G --> H[Sessions Table]
    G --> I[Results Table]
    style B fill:#121212,color:#FCFCFB
    style D fill:#C2E476,color:#121212
    style G fill:#10312D,color:#F3F1E8
```

The critical deployment gotcha: rate limiting. Without it, one eager user can exhaust your entire monthly API budget in an hour. Add request limits per user, per session, and per time window before you go live.
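A fixed-window counter is the simplest way to cap requests per user and per session. A minimal sketch: `FixedWindowLimiter` and the limits shown are illustrative assumptions, and because it keeps state in memory it only works for a single long-lived instance; on serverless platforms you would back the same logic with a shared store such as Postgres or Redis.

```typescript
// In-memory fixed-window limiter: at most `limit` requests per key per window.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for tests
  }

  // Returns true (and records the request) if the key is under its limit.
  tryAcquire(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // start a fresh window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

// Illustrative budgets: separate limits per user and per session.
const perUser = new FixedWindowLimiter(30, 60 * 60 * 1000); // 30 requests per user per hour
const perSession = new FixedWindowLimiter(10, 60 * 1000);   // 10 requests per session per minute

// In the route handler (userId is a hypothetical authenticated user ID):
// if (!perUser.tryAcquire(userId) || !perSession.tryAcquire(sessionId)) {
//   return new Response("Too many requests", { status: 429 });
// }
```

Fixed windows allow a burst at each window boundary; if that matters for your budget, a sliding window or token bucket smooths it out.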

Feature-flag new capabilities per tenant. When you add a new tool or integration, roll it out to beta users first, validate it doesn't break anything, then expand. This saves you from deploying a bug to 100% of customers simultaneously.
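Per-tenant flags can be as simple as a list of beta tenants plus a percentage rollout. A minimal sketch, assuming tool access is decided per request before calling `query()`; `ToolFlag`, `bucket`, and `enabledTools` are hypothetical names, and the hash is an FNV-1a-style function over UTF-16 code units, chosen only because it is short and stable across requests. A hosted feature-flag service would replace all of this.

```typescript
// Hypothetical flag: explicit beta tenants plus a percentage rollout for everyone else.
interface ToolFlag {
  tool: string;
  betaTenants: string[];
  rolloutPercent: number; // 0-100
}

// Stable bucket in [0, 100) so a tenant stays in (or out of) a rollout across requests.
function bucket(key: string): number {
  let h = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept as an unsigned 32-bit value
  }
  return h % 100;
}

// Base tools for everyone, plus flagged tools this tenant is enrolled in.
function enabledTools(tenantId: string, baseTools: string[], flags: ToolFlag[]): string[] {
  const extra = flags
    .filter((f) => f.betaTenants.includes(tenantId) || bucket(`${f.tool}:${tenantId}`) < f.rolloutPercent)
    .map((f) => f.tool);
  return [...baseTools, ...extra];
}
```

Salting the hash with the tool name means different capabilities roll out to different slices of tenants, so one unlucky tenant isn't the guinea pig for everything.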

Add monitoring from day one. Plug SDK hooks into your observability stack—Datadog, OpenTelemetry, whatever you're already using. Capture tool latency, token usage, error rates. You can't improve what you don't measure, and you definitely can't debug production issues without logs.
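The SDK's `PreToolUse` and `PostToolUse` hooks are natural places to take these measurements. Hook callback signatures can change between SDK versions, so this minimal sketch stays independent of them: `ToolTimer` and `Emit` are hypothetical, and the class only pairs start and finish events by tool-use ID to emit latency and error metrics to whatever sink you plug in.

```typescript
// Hypothetical metrics sink: swap in your Datadog or OpenTelemetry client.
type Emit = (metric: string, value: number, tags: Record<string, string>) => void;

// Pairs "tool started" and "tool finished" events by tool-use ID to measure latency.
class ToolTimer {
  private readonly started = new Map<string, { tool: string; at: number }>();
  private readonly emit: Emit;
  private readonly now: () => number;

  constructor(emit: Emit, now: () => number = Date.now) {
    this.emit = emit;
    this.now = now;
  }

  // Call from a pre-tool-use hook.
  start(toolUseId: string, tool: string): void {
    this.started.set(toolUseId, { tool, at: this.now() });
  }

  // Call from a post-tool-use hook; `failed` marks a tool call that errored.
  finish(toolUseId: string, failed = false): void {
    const entry = this.started.get(toolUseId);
    if (!entry) return; // unmatched event: nothing to time
    this.started.delete(toolUseId);
    this.emit("agent.tool.latency_ms", this.now() - entry.at, { tool: entry.tool });
    if (failed) this.emit("agent.tool.errors", 1, { tool: entry.tool });
  }
}
```

Tagging every metric with the tool name lets you spot, for example, that Bash calls are slow while Read calls are fine.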

A deployed agent is a demo-able product. Demo-able products close deals. Get it live, then iterate.


You now have everything you need to build production AI agents with the Claude Agent SDK. But building the right agent—one that customers actually pay for—requires more than code. It requires clarity on what to build and why.

BrainGrid transforms your product ideas into AI-ready specifications, breaking complex features into tasks that agents can execute. Stop guessing. Start shipping.


