DEV Community: CloudX

One Brain, Many Hands: Building a Parallel Task Orchestrator for AI Agents

Pablo Calofatti — Wed, 20 May 2026 15:39:00 +0000

The Problem No One Talks About

AI coding assistants are fast. Absurdly fast. They can scaffold a service, write tests, and refactor a module in the time it takes me to finish my coffee.

But they do it one thing at a time.

I was working on a microservices audit logging system. Four tasks, all independent, all well-scoped: implement the event schema, build the ingestion endpoint, add the query API, write the integration tests. I could see the finish line from the start. And yet, I sat there feeding tasks one by one into my Claude Code session like I was hand-loading a washing machine with individual socks.

Each task took 10-15 minutes of agent time. Four tasks, run sequentially: an hour.

But they didn't need to be sequential. They touched different files, different concerns, different parts of the codebase. If I had four developers, I'd assign all four tasks at once and go make lunch.

So why couldn't I do that with AI agents?

The Stripe Spark

Around early March 2026, Stripe published a blog post about their internal system called Minions: one-shot, end-to-end coding agents that handle tasks in parallel. The core insight hit me immediately:

You don't need smarter agents. You need an orchestrator that knows how to keep dumb-ish agents on rails.

Stripe's blueprint pattern was elegant: wrap the agentic steps (the creative, unpredictable part — writing code, fixing bugs) inside deterministic guardrails (git operations, linting, testing). The agent can hallucinate all it wants during implementation, but if the tests don't pass, it loops back. If the lint fails, it fixes. And the whole thing is bounded: two attempts max, then stop.

I wasn't trying to build AGI.

I was building a foreman for a construction crew.

The Architecture: Markdown All the Way Down

Here's where it gets a little unhinged. The entire orchestrator is pure markdown. No TypeScript runtime. No Python glue. No code to maintain, compile, or debug. Just three markdown files that Claude Code loads as instructions:

Orchestrator (commands/minion.md): the brain. Parses your task list, figures out dependency order, computes parallel waves, spawns workers.
Worker (agents/minion-worker.md): the hands. Each worker gets a task, a git worktree, and a blueprint to follow.
Blueprint (skills/minion-blueprint/SKILL.md): the rails. A step-by-step execution pattern: branch, implement, lint, test, commit, report.

I call this zero-code prompt engineering. The "code" is the prompt. Claude Code interprets the markdown instructions and executes them using its built-in tools (bash, file editing, git). No SDK, no API calls, no infrastructure.

You (human)
  └── Claude Code session (orchestrator)
        ├── Worker-1 (worktree: .worktrees/task-1)
        ├── Worker-2 (worktree: .worktrees/task-2)
        └── Worker-3 (worktree: .worktrees/task-3)

Each worker runs in an isolated git worktree. That's the key. A worktree is like a parallel checkout of your repo: same git history, different working directory. Worker-1 can edit src/api/events.ts while Worker-2 edits src/api/queries.ts and they never step on each other's toes.

How It Actually Works

You invoke the orchestrator with a task list:

/minion

Tasks:
1. Implement event schema with Zod validation in src/events/schema.ts
2. Build ingestion POST endpoint at /api/events (depends on: 1)
3. Add query GET endpoint at /api/events with filtering (depends on: 1)
4. Write integration tests for both endpoints (depends on: 2, 3)

The orchestrator reads these, builds a dependency graph, and computes waves:

Wave 1: Task 1 (no dependencies)
Wave 2: Tasks 2 and 3 (both depend on 1, but not each other, so they run in parallel)
Wave 3: Task 4 (depends on 2 and 3)
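
The orchestrator does this reasoning from markdown instructions rather than code, but the wave logic itself is a plain topological grouping. Here is a minimal Python sketch of that idea; the function name `compute_waves` and the dict shape are illustrative assumptions, not part of minion-toolkit:

```python
def compute_waves(deps: dict) -> list:
    """Group tasks into waves: a task is ready once all its dependencies are done.

    `deps` maps a task id to the list of task ids it depends on (assumed shape).
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done = set()
    waves = []
    while remaining:
        # Every task whose dependencies are all finished can run in this wave.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown task id")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


tasks = {1: [], 2: [1], 3: [1], 4: [2, 3]}
print(compute_waves(tasks))  # [[1], [2, 3], [4]]
```

Tasks inside one wave have no dependencies on each other, which is what makes them safe candidates for parallel workers.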

Before spawning anything, it runs conflict detection — scanning the file paths each task is likely to touch. If two tasks in the same wave would edit the same file, it flags the conflict and serializes them.
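
The serialization step can be sketched the same way. This assumes the likely file set per task is already known (in practice it has to be guessed from the task description), and `serialize_conflicts` is a hypothetical name used only for illustration:

```python
def serialize_conflicts(wave: list, files: dict) -> list:
    """Split one wave into sub-waves so no two tasks in a sub-wave share a file.

    `files` maps a task id to the set of paths it is expected to touch.
    Greedy first-fit: each task joins the first sub-wave it does not conflict with.
    """
    sub_waves = []
    for task in wave:
        for group in sub_waves:
            if all(files[task].isdisjoint(files[other]) for other in group):
                group.append(task)
                break
        else:
            # Conflicts with every existing group, so it runs in a later sub-wave.
            sub_waves.append([task])
    return sub_waves


files = {
    2: {"src/api/events.ts"},
    3: {"src/api/events.ts", "src/api/queries.ts"},
}
print(serialize_conflicts([2, 3], files))  # [[2], [3]]
```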

Then it spawns workers. Each one gets:

A fresh worktree branched from the current HEAD
The task description with full context
A domain agent overlay, auto-selected by keywords. "Endpoint" and "API" get backend-architect. "React component" gets frontend-architect. "Dockerfile" gets devops-engineer.
A role overlay for the current phase. Planning? You get researcher. Implementing? tdd-developer. Reviewing? code-reviewer.
Auto-detected project tools: it reads your package.json to find your lint command, test runner, and build script.
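
Keyword-based overlay selection is simple enough to show as code. This sketch uses only the three keyword examples mentioned above; the rule table, the matching strategy, and the name `pick_domain_agent` are assumptions, since the real selection happens inside the prompt:

```python
import re
from typing import Optional

# Keyword -> overlay pairs taken from the examples in the text.
DOMAIN_RULES = [
    (r"\b(endpoint|api)\b", "backend-architect"),
    (r"\breact component\b", "frontend-architect"),
    (r"\bdockerfile\b", "devops-engineer"),
]


def pick_domain_agent(task: str, default: Optional[str] = None) -> Optional[str]:
    """Return the first overlay whose keyword appears in the task description."""
    text = task.lower()
    for pattern, agent in DOMAIN_RULES:
        if re.search(pattern, text):
            return agent
    return default


print(pick_domain_agent("Build ingestion POST endpoint at /api/events"))
# backend-architect
```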

[Screenshot: a real 12-task HAL refactor running with wave-based parallel execution]

The blueprint each worker follows looks like this:

Create branch from base
Implement the task (agentic — this is where creativity happens)
Run lint. If fails, fix (1 attempt)
Run tests. If fails, fix (1 attempt)
If still failing after fixes, STOP and report partial progress
Commit with conventional commit message
Push branch, create PR

The two-iteration maximum is sacred. Without it, an agent can loop forever trying to fix a cascading lint error. With it, the worst case is a partially complete PR with a clear status report.
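
The bounded loop is the part worth seeing as code. In this sketch `check` and `fix` stand in for "run lint or tests" and "let the agent attempt a repair"; both callables and the name `run_gate` are hypothetical:

```python
from typing import Callable


def run_gate(check: Callable[[], bool], fix: Callable[[], None], max_fixes: int = 1) -> bool:
    """Run a check; on failure, allow at most `max_fixes` repair attempts."""
    if check():
        return True
    for _ in range(max_fixes):
        fix()
        if check():
            return True
    # Still failing: the caller stops and reports partial progress.
    return False


# Toy usage: a "lint" that passes only after one fix has been applied.
state = {"clean": False}
passed = run_gate(lambda: state["clean"], lambda: state.update(clean=True))
print(passed)  # True
```

The loop cannot run more than `max_fixes + 1` checks, so the worst case is a bounded failure, never an endless repair cycle.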

And here's something I learned: incomplete runs are still an excellent starting point. A PR that's 80% done with a clear "tests fail because X" note is infinitely better than an agent spinning its wheels for 20 minutes.

[Screenshot: the full workflow running with plan, implement, and review phases]

The First Real Test (And the First Real Crash)

I tested minion-toolkit on a real project — a microservices tracing system called traza-microservicios. Two tasks, two parallel workers.

Worker-2 nailed it. Clean branch, passing tests, PR ready.

Worker-1... corrupted git.

Turns out, running concurrent git worktree operations on macOS can trigger a SIGBUS signal — a low-level memory bus error. The git index file gets corrupted when two processes try to update refs simultaneously. This had nothing to do with prompt engineering or AI. It was a filesystem concurrency problem that human developers rarely hit because they don't create three worktrees in the same second.

The fix was unglamorous: add a small delay between worktree creation, use --no-optional-locks on git operations, and as a last resort, clone fresh to /tmp and work from there.
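
As a rough sketch, the staggered creation could look like this in Python. `git --no-optional-locks` and `git worktree add -b` are real git options; the branch naming, the one-second delay, and the function names are assumptions for illustration, not the toolkit's actual values:

```python
import subprocess
import time


def worktree_cmd(task_id: int, base: str = "HEAD") -> list:
    """Build the git command that creates one worker's worktree."""
    path = f".worktrees/task-{task_id}"
    branch = f"minion/task-{task_id}"  # hypothetical branch naming
    return ["git", "--no-optional-locks", "worktree", "add", "-b", branch, path, base]


def create_worktrees(task_ids, delay_s=1.0, run=subprocess.run, sleep=time.sleep):
    """Create worktrees one at a time, pausing between them to avoid racing on refs."""
    for i, task_id in enumerate(task_ids):
        if i > 0:
            sleep(delay_s)
        run(worktree_cmd(task_id), check=True)


print(" ".join(worktree_cmd(1)))
# git --no-optional-locks worktree add -b minion/task-1 .worktrees/task-1 HEAD
```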

This is the kind of lesson you only learn by shipping. No amount of design documents would have predicted "macOS git worktrees have a race condition under parallel agent spawning."

Lessons Learned the Hard Way

Merge conflicts are the #1 issue. Not AI hallucinations. Not wrong code. Not failing tests. The most common failure mode is two workers editing the same file. Conflict detection before spawning was the single most impactful feature I added.

Workers create extra files. An agent told to "add an endpoint" might also create a utility module, update a barrel export, or add a type file. These out-of-scope changes cause merge conflicts downstream. The solution: git cherry-pick specific commits instead of merging entire branches.

gh pr checks lies to you. GitHub's CLI reports check status using a bucket field, not conclusion. I spent an embarrassing amount of time wondering why "passing" PRs were being flagged as failed.
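
A small sketch of reading the bucket field, assuming the JSON comes from something like `gh pr checks <number> --json name,bucket` and that bucket takes the values pass, fail, pending, skipping, or cancel. The function name and the sample data are made up for illustration:

```python
import json


def summarize_checks(gh_json: str) -> str:
    """Collapse a list of checks into one status based on each check's bucket."""
    buckets = {check.get("bucket") for check in json.loads(gh_json)}
    if buckets & {"fail", "cancel"}:
        return "failed"
    if "pending" in buckets:
        return "pending"
    # Only pass/skipping left. Note: an empty list also lands here.
    return "passed"


sample = '[{"name": "lint", "bucket": "pass"}, {"name": "e2e", "bucket": "skipping"}]'
print(summarize_checks(sample))  # passed
```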

Dry-run mode saves sanity. Before spawning 4 parallel workers that each burn tokens, preview what would happen — which waves, which agents, which files. --dry-run was an afterthought that became a daily habit.

Cost tracking matters. Parallel workers multiply your API usage. Knowing that a 4-task run costs ~$2.50 vs. a single sequential run at ~$1.80 helps you decide when parallelism is worth it. (Spoiler: when time matters more than tokens, it almost always is.)

The Evolution

What started as a 200-line markdown file on March 4th grew into a proper open-source tool by March 8th:

Version	What Changed
1.0	Basic orchestrator, manual task parsing
2.0	Cross-phase memory, dry-run, worker health monitoring
2.1	Conflict prevention, smart context gathering, cost tracking
2.2	MCP optional delegation, CLI installer (`npx minion-toolkit install`)
2.3	Workflows, domain agents, role overlays, intent capture, stall detection

The CLI installer copies the markdown assets into your ~/.claude/ directory and configures the plugins. One command, and your Claude Code session becomes a parallel orchestrator:

npx minion-toolkit install

After that, /minion is available in any Claude Code session.

Who Is This For?

If you use Claude Code (or plan to) and regularly work on tasks that can be parallelized — feature development across multiple files, multi-service changes, test suites, documentation batches — this tool turns your single-threaded AI session into a coordinated team.

It's especially useful for:

Solo developers who want to simulate a small team
Tech leads who want to delegate a batch of well-scoped tasks
Anyone who's tired of feeding tasks one at a time into an AI session

It's not for tasks that are deeply intertwined — where every file depends on every other file. The orchestrator can handle dependencies between waves, but within a wave, tasks must be independent. If your tasks can't be parallelized, a single session is still the right tool.

Try It

The whole project is open source:

GitHub: pablocalofatti/minion-toolkit
npm: npx minion-toolkit install

Star it if the idea resonates. Open an issue if something breaks (something will break — that's the fun part). And if you build something cool with it, I want to hear about it.

Claude Code is already a remarkably capable developer. But even the best developer can only type on one keyboard. minion-toolkit gives them a crew.

One brain. Many hands. That's the whole idea.

Your agent follows instructions. Until it doesn't.

Daniel Diaz — Tue, 19 May 2026 19:08:52 +0000

Hooks over flowers

Instructions suggest. Skills guide. Hooks enforce.

That's the whole story, but it took me a while to understand why it matters.

You've set up your agent well. Custom instructions for coding standards. Skills for your team's specific patterns. Copilot knows your stack, follows your conventions, generates code that looks like yours.

Then one day it decides to run rm -rf dist/ before rebuilding. Or it edits a file and moves on without running the formatter. Or it completes a task without touching the test suite. All technically valid moves. All not what you wanted.

The problem isn't instructions or skills failing. It's that they guide the agent. They inform its decisions. But an agent that's informed can still decide differently. Instructions are context. They're not guarantees.

Hooks are guarantees.

What are Agent Hooks?

Agent Hooks are shell commands that VS Code runs at specific lifecycle points during an agent session. Not suggested. Not requested. Run.

They live in your repo at:

.github/
└── hooks/
    └── hooks.json

The filename can be anything. VS Code loads all *.json files inside .github/hooks/.

A hook that auto-formats every file the agent edits looks like this:

{
  "hooks": {
    "PostToolUse": [
      {
        "type": "command",
        "command": "npx prettier --write \"$TOOL_INPUT_FILE_PATH\""
      }
    ]
  }
}

Save the file. VS Code picks it up automatically. The next time the agent edits any file, Prettier runs. No prompt needed, no skill invocation, no agent cooperation required.

VS Code added agent-scoped hooks in version 1.111 (March 2026). Workspace hooks via .github/hooks/ were available earlier. Use /create-hook in chat to scaffold a hook configuration.

Meet the eight

This is where hooks get interesting. VS Code exposes eight points in an agent session where your code can run:

Event	When it fires	What you can do
`SessionStart`	User submits the first prompt	Inject context, validate project state
`UserPromptSubmit`	User submits any prompt	Audit requests, add system context
`PreToolUse`	Before the agent invokes any tool	Block dangerous operations, require approval
`PostToolUse`	After a tool completes	Run formatters, trigger follow-up actions
`PreCompact`	Before conversation context is compacted	Export state before it gets truncated
`SubagentStart`	A subagent is spawned	Initialize subagent resources
`SubagentStop`	A subagent completes	Aggregate results, clean up
`Stop`	Agent session ends	Run tests, generate reports, send notifications

Most people, when they first hear about hooks, think PostToolUse: run a formatter. That's valid and immediately useful.

But look at PreToolUse. That's where you can stop the agent before it does something.

And SessionStart. That's where you can inject context that the agent doesn't have yet: current branch, environment, version, whether production deploys are frozen.

PreCompact is the one nobody talks about. When a long session gets too big for the context window, VS Code compacts it. You can hook into that moment to export state, save important context to a file, or log what's about to be truncated.

Stop. Before anything happens

PreToolUse fires before every tool invocation. Your hook receives the tool name and its input via stdin as JSON:

{
  "hookEventName": "PreToolUse",
  "tool_name": "run_in_terminal",
  "tool_input": { "command": "rm -rf dist/" }
}

Your hook can return one of three permissionDecision values:

"allow": proceed without asking the user
"deny": block the operation entirely
"ask": stop and require explicit user confirmation

A simple shell script that blocks destructive commands:

#!/bin/bash
input=$(cat)
command=$(echo "$input" | jq -r '.tool_input.command // ""')

if echo "$command" | grep -qE '(rm -rf|DROP TABLE|git push --force)'; then
  echo '{"hookSpecificOutput": {"hookEventName": "PreToolUse", "permissionDecision": "deny", "permissionDecisionReason": "Destructive command blocked by policy"}}'
  exit 0
fi

echo '{"hookSpecificOutput": {"hookEventName": "PreToolUse", "permissionDecision": "allow"}}'

Wire it up:

{
  "hooks": {
    "PreToolUse": [
      {
        "type": "command",
        "command": "./scripts/validate-command.sh"
      }
    ]
  }
}

The agent never sees the operation. It doesn't try to negotiate. The hook returns deny, the tool call stops.

There's also a simpler way to block without writing any JSON at all. Every script, when it finishes, returns a number to whoever called it. That's an exit code. 0 means success. VS Code specifically watches for exit code 2 as a "stop this right now" signal. You just write your reason to stderr and exit:

echo "Production deploys are frozen" >&2
exit 2

VS Code passes that message to the model as context and blocks the tool call. No JSON, no hookSpecificOutput, no ceremony.
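
If you'd rather not depend on jq, the same check fits in a few lines of Python. This is a sketch under the assumptions already stated in this post (JSON on stdin with `tool_input.command`, exit code 2 to block); the function names are illustrative:

```python
import json
import re
import sys
from typing import Optional

# Same patterns as the shell version above.
BLOCKED = re.compile(r"rm -rf|DROP TABLE|git push --force")


def block_reason(payload: dict) -> Optional[str]:
    """Return a reason if the tool call should be blocked, else None."""
    command = str((payload.get("tool_input") or {}).get("command", ""))
    if BLOCKED.search(command):
        return "Destructive command blocked by policy"
    return None


def main(raw: str) -> int:
    """Return the exit code the hook should finish with."""
    reason = block_reason(json.loads(raw or "{}"))
    if reason:
        print(reason, file=sys.stderr)  # stderr is what gets surfaced to the model
        return 2
    return 0


# When saved as a hook script, end the file with:
#   sys.exit(main(sys.stdin.read()))
```

Keeping the decision in a pure function like `block_reason` also makes the policy easy to unit test without starting an agent session.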

One more thing worth knowing: continue: false in the JSON output stops the entire agent session, not just one tool call. Think of exit code 2 as hitting the brakes on a single operation. continue: false is turning off the engine. Use the second one carefully.

If you define more than one hook for the same event, VS Code runs all of them. When multiple PreToolUse hooks return conflicting permissionDecision values, the most restrictive wins: deny beats ask, and ask beats allow. You can stack a validation hook and a logging hook on the same event without worrying about one canceling the other.
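
The precedence rule is easy to picture as a ranking. This is only a model of the behavior described above, not VS Code's implementation, and `combine_decisions` is a hypothetical name:

```python
# Higher rank means more restrictive.
RANK = {"allow": 0, "ask": 1, "deny": 2}


def combine_decisions(decisions: list) -> str:
    """Return the most restrictive decision among several hook results."""
    return max(decisions, key=RANK.__getitem__)


print(combine_decisions(["allow", "ask", "allow"]))  # ask
print(combine_decisions(["ask", "deny", "allow"]))   # deny
```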

Full protocol reference: Hook input and output, VS Code docs

The morning briefing

Instructions are static. They were written at the time you created the file. But some context only exists at runtime: the current git branch, which environment is active, whether the CI pipeline is currently broken.

SessionStart fires before the agent does anything. Your hook can inject that information directly into the session:

#!/bin/bash
branch=$(git branch --show-current)
node_version=$(node -v)
env_name=${NODE_ENV:-development}

jq -n \
  --arg branch "$branch" \
  --arg node "$node_version" \
  --arg env "$env_name" \
  '{
    hookSpecificOutput: {
      hookEventName: "SessionStart",
      additionalContext: ("Branch: " + $branch + " | Node: " + $node + " | Env: " + $env)
    }
  }'

Every session the agent starts, it already knows what branch it's on and what environment it's working in. No need to tell it.

Not every hook is for everyone

Workspace hooks apply to every agent interaction. But sometimes you want a hook that only runs for a specific agent.

Since VS Code 1.111, you can define hooks directly in a custom agent's frontmatter. The hooks field here uses YAML syntax because it lives inside the .agent.md frontmatter, but the structure maps directly to what you'd write in a JSON config file.

---
name: "PM Explainer"
description: "Explains to your PM why it's taking longer. Also formats code."
hooks:
  PostToolUse:
    - type: command
      command: "./scripts/format-and-lint.sh"
---

You are a code editing agent. After making changes, files are formatted and linted automatically.

Enable chat.useCustomAgentHooks to use this. The hook only runs when this specific agent is active, either selected by the user or invoked as a subagent.

The part everyone skips (don't)

Hooks run shell commands with the same permissions as VS Code. That's the power and the responsibility.

Before enabling hooks from any source, read the scripts. Hooks receive JSON input from the agent. Validate and sanitize that input before using it. Never hardcode secrets in hook scripts; use environment variables.

One specific risk: if the agent has access to edit scripts that your hooks run, it can modify those scripts during a session and execute code it writes. Use chat.tools.edits.autoApprove to prevent the agent from editing hook scripts without manual approval.

When your hook does nothing

If a hook runs but nothing seems to happen, check the GitHub Copilot Chat Hooks output channel. Open the Output panel in VS Code, select that channel from the list, and you'll see which hooks were loaded, which fired, their exit codes, and any stdout or stderr output.

Hook scripts that exit with code 1 produce a non-blocking warning: the agent continues, but the issue shows up in the channel. Exit code 2 is the one that blocks and surfaces the message to the model. If your hook is supposed to stop something and it isn't, check which exit code your script is returning.

Already have Claude hooks? You're closer than you think

If you're coming from Claude Code, VS Code reads hook configurations from .claude/settings.json and .claude/settings.local.json by default. The format is the same, with two differences to watch for:

Tool input property names: Claude Code uses snake_case (e.g. tool_input.file_path), VS Code uses camelCase (e.g. tool_input.filePath)
Tool names: different. Claude uses Write/Edit, VS Code uses create_file/replace_string_in_file
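
If one hook script has to serve both tools, a thin normalization layer can absorb those two differences. This sketch uses only the property and tool names listed above; anything beyond those pairs is untested territory, and the helper names are made up:

```python
from typing import Optional

# Tool-name pairs mentioned above (Claude Code name -> VS Code name).
CLAUDE_TO_VSCODE_TOOL = {"Write": "create_file", "Edit": "replace_string_in_file"}


def edited_file_path(payload: dict) -> Optional[str]:
    """Read the file path whether it arrives as filePath or file_path."""
    tool_input = payload.get("tool_input") or {}
    return tool_input.get("filePath") or tool_input.get("file_path")


def is_file_edit(payload: dict) -> bool:
    """True when the tool is one of the known file-writing tools, in either naming."""
    name = payload.get("tool_name", "")
    return CLAUDE_TO_VSCODE_TOOL.get(name, name) in CLAUDE_TO_VSCODE_TOOL.values()


claude_event = {"tool_name": "Edit", "tool_input": {"file_path": "src/app.ts"}}
vscode_event = {"tool_name": "replace_string_in_file", "tool_input": {"filePath": "src/app.ts"}}
print(edited_file_path(claude_event) == edited_file_path(vscode_event))  # True
```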

Where Hooks Fit

Hooks are the third piece of a larger customization system:

Instructions (copilot-instructions.md / .github/instructions/): always-on context. Good for project-wide rules.
Skills (.github/skills/): domain-specific, activated on demand. Good for encoding your conventions.
Hooks ← you're here: deterministic automation at lifecycle points. Good for guarantees.
MCP Servers: external tools: database access, browser automation, external APIs.
Plugins: installable bundles that package all of the above together.

Instructions tell the agent how to work. Skills tell the agent what patterns to follow. Hooks make sure certain things happen regardless of what the agent decides.

They're not interchangeable. They're complementary.

Takeaways

The difference between guidance and guarantee matters.

Instructions and skills are probabilistic. They shift the agent's behavior in the right direction. Hooks are deterministic. Your code runs, period. Both have their place. Knowing which to reach for is the skill.

Start with one hook.

One PostToolUse Prettier hook that auto-formats on every file edit. Ship it, live with it for a week, see what else you want to automate. The scope expands naturally.

This is the second post in a series on VS Code agent customizations: Skills, Hooks, MCP Servers, CLI, and Plugins.

Previous post: I stopped fighting Copilot's conventions. I taught it mine instead.

Official docs: Agent Hooks (VS Code)

Stop getting generic output from Copilot. Teach it your patterns.

Daniel Diaz — Thu, 23 Apr 2026 13:11:44 +0000

The Problem

You use Copilot. You ask it to build something, and it does, sort of. It follows your prompt, generates working code, and you ship it.

Then you do it again the next day. And the day after.

A month later, your codebase has class names in PascalCase next to camelCase functions, three different error handling styles, two ways to structure the same kind of module, and hooks that work differently from each other for no clear reason.

All of it generated by Copilot. All of it reviewed and accepted by you.

The problem isn't that Copilot generates bad code. It generates generic code. It doesn't know your stack, your team's decisions, or the patterns you settled on six months ago. It works from its training data. And your codebase slowly starts to look like it was written by the internet.

What are Agent Skills?

Agent Skills are Markdown files that tell Copilot what your conventions are, for a specific domain, task, or workflow. Persistent context the agent reads before generating anything in that area.

They live in your repo at:

.github/
└── skills/
    └── your-skill-name/
        └── SKILL.md

That's the entire setup. No config files, no registration, no CLI step. The file being there is enough for VS Code to discover it.

When you mention the skill in a chat prompt like "Use the component-structure skill to create a new button", Copilot reads that SKILL.md first, then generates code that follows your conventions.

This is what separates skills from regular prompts: they're reusable, versioned, and shared through the repo. Your conventions stop living in someone's head or a doc nobody reads; they live next to the code they govern.

With a well-written description, Copilot can automatically load the relevant skill based on your request without explicit mention. Explicitly referencing skills is more common with smaller models that need extra guidance to identify the right context.

VS Code added native skill support and the /create-skill command in version 1.110. In version 1.113, a dedicated Chat Customizations editor was added, providing a centralized UI to manage skills, instructions, and agents from a single place.

A minimal SKILL.md

Here's the minimum a skill needs to actually work:

.github/skills/component-structure/SKILL.md

Frontmatter:

---
name: component-structure
description: "Guidelines for creating React components following team conventions: named exports, props interface, no inline JSX logic."
---

Body:

# Component Structure Skill

## When to use
Creating new React components or reviewing component patterns.

## Conventions
- One component per file
- Props interface defined above the component
- Named exports only (no default exports)
- No inline logic in JSX, extract to handlers or custom hooks
- Error boundaries at page level, not inside components

## Never do this
- Default exports
- Logic directly in JSX
- Props without an explicit interface

Pattern example:

interface ButtonProps {
  label: string;
  onClick: () => void;
  disabled?: boolean;
}

export function Button({ label, onClick, disabled = false }: ButtonProps) {
  const handleClick = () => {
    if (!disabled) onClick();
  };

  return (
    <button onClick={handleClick} disabled={disabled}>
      {label}
    </button>
  );
}

The name field must match the directory name exactly. The description is what Copilot reads to decide whether to load the skill, so be specific. Full frontmatter reference: SKILL.md file format.
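
Since a mismatch between name and directory is easy to introduce with a rename, a tiny check can catch it in CI. This is a sketch, not an official validator: it reads only the `name` line from the frontmatter and is not a full YAML parser, and both function names are made up:

```python
from typing import Optional


def frontmatter_name(skill_md: str) -> Optional[str]:
    """Extract the `name` value from a SKILL.md frontmatter block, if present."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip().strip("'\"")
    return None


def name_matches_dir(dir_name: str, skill_md: str) -> bool:
    return frontmatter_name(skill_md) == dir_name


doc = "---\nname: component-structure\ndescription: \"React conventions\"\n---\n# Skill\n"
print(name_matches_dir("component-structure", doc))  # True
```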

Four sections. One pattern. A few explicit don'ts.

When Copilot reads this before generating a component, it follows the same structure every time, not because it guessed right, but because you told it.

The skill doesn't need to be exhaustive. It needs to be specific. A skill that covers one thing well beats one that tries to cover everything and ends up covering nothing.

Creating a skill from a conversation

One of the more useful additions in VS Code 1.110 is /create-skill. After debugging a problem across several chat turns and landing on the right approach, you type:

/create-skill

The agent extracts the pattern from your conversation and scaffolds a SKILL.md for you. You review, adjust, commit.

This is how skills actually get created in practice. You don't design a skill from scratch. You solve a problem, recognize you'll solve it the same way every time, and capture that. /create-skill removes the friction of doing the capture manually.

The Ecosystem

Skills are one piece of a larger customization system. Here's a quick map:

Instructions (copilot-instructions.md / .github/instructions/): always-on context loaded in every session. Good for project-wide rules.
Skills ← you're here: domain-specific, activated on demand when mentioned in chat.
Hooks: scripts that run at specific points in the agent lifecycle: before a response, after a file write, on session start.
MCP Servers: external tools that give the agent capabilities your codebase doesn't have natively, like database access, browser automation, or external APIs.
Plugins: installable bundles that package all of the above together.

Each of these has its own post coming. Skills are the easiest starting point: entirely local to your repo, zero infrastructure, and the effect on the agent is immediate.

What I Learned

The model doesn't know your stack. You have to teach it.

Copilot is trained on public code. Your team's specific decisions aren't in that data. Skills are the mechanism for closing that gap.

One good skill beats ten prompts.

A well-written skill invoked consistently produces better results than re-explaining your conventions in each prompt. The repeatability is the point.

The skill itself is documentation.

Writing a SKILL.md forces you to articulate things that previously existed informally. Once it's committed, the team has a shared reference, not just for Copilot.

The real value shows up at review time.

When everyone uses the same skill, code review stops being about style. You argue about logic instead. That's the conversation worth having.

This is the first post in a series on VS Code agent customizations: Skills, MCP Servers, CLI, Hooks, and Plugins.

Official docs: Agent Skills VS Code

What Backend Engineers Get Wrong About AI Integration

Juan M. Altamirano — Mon, 13 Apr 2026 17:26:14 +0000

So you've been asked to "add AI" to your project. Maybe it's a chatbot, maybe it's a smart search, maybe it's just your PM watching too many YouTube videos about agents. Either way, here you are.

And look, as backend engineers, we're actually well-positioned for this. We know how to design APIs, handle async flows, think about failure modes. The problem is that LLMs don't behave like anything we've integrated before, and our instincts sometimes work against us.

Here, I will try to cover eight common mistakes that I've seen (and made).

1. Treating the LLM like a deterministic function

This is the big one.

We're used to thinking in terms of: same input → same output. Call a database, get a row. Call an API, get a JSON response. Write a unit test, pin the behavior forever.

LLMs don't work like that. Under the hood, they're predicting the next most probable token based on everything that came before, which means there's inherent randomness built into every response. Same prompt, different temperature, different day, and you might get a slightly different answer. Or a wildly different one.

Think of it less like calling a function and more like asking a very smart contractor to do some work for you. They'll usually get it right, but you need to review the output, define what "right" looks like, and have a plan for when they hand you something unexpected.

The practical consequence: don't build flows that assume the model always returns what you expect. Validate the output. Have fallbacks. Don't let a malformed response crash your service.
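
Here is a minimal sketch of what "validate and fall back" can look like, assuming the model was asked to return JSON with a `risk_level` field. The field names, the allowed values, and the `needs_review` flag are illustrative choices:

```python
import json

ALLOWED_LEVELS = {"HIGH", "MEDIUM", "LOW", "NONE"}
# The fallback deliberately does not guess a risk level; it flags the item for a human.
FALLBACK = {"risk_level": None, "explanation": "", "needs_review": True}


def parse_risk(raw: str) -> dict:
    """Parse a model response, returning a safe fallback instead of raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)
    level = data.get("risk_level")
    if not isinstance(level, str) or level not in ALLOWED_LEVELS:
        return dict(FALLBACK)
    return {
        "risk_level": level,
        "explanation": str(data.get("explanation", "")),
        "needs_review": False,
    }


print(parse_risk("Sure! Here is the analysis you asked for...")["needs_review"])  # True
```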

2. Not handling retries and timeouts

If you call an external REST API and it times out, you probably have retry logic. Maybe exponential backoff. Maybe a circuit breaker.

For some reason, when we start calling LLM APIs, all of that discipline disappears.

LLM calls are slow. We're talking on the order of seconds, depending on the model and the response length. And they fail. Rate limits, provider outages, network issues, it all happens.

Set a timeout. Retry with backoff. If you're building something user-facing, stream the response so it doesn't feel like the app is frozen. Treat the LLM provider like what it is: an external dependency that can and will go down.
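
A minimal sketch of retry with exponential backoff and jitter. It assumes `fn` is your own wrapper around the provider call and that it enforces its own timeout (for example through the HTTP client's timeout setting); the exception types and default delays are placeholders to adapt:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying transient failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller's fallback handle it
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")
```

Usage would look like `call_with_retry(lambda: client_call(prompt))`, where `client_call` is whatever function wraps your provider's SDK.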

3. Ignoring token costs in loops

Token costs have a way of sneaking up on you.

A common pattern that seems harmless: you have a list of items, you loop over them, you call the LLM for each one. 50 items? 50 calls. Each one dragging along the same massive system prompt like dead weight. The model isn't doing 50x the thinking. You're just paying 50x for the setup.

It's like printing the entire company handbook before every meeting just to discuss one item.

Some things to think about:

Can you batch items into a single prompt? Sometimes yes, sometimes the quality drops.
Is the system prompt really 2000 tokens? Can it be 400?
Are you sending context the model doesn't need for that specific call?
Could you cache responses for identical or near-identical inputs?

Token costs are easy to ignore in development (small dataset, free tier) and painful to discover in production.
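
To make the "50x for the setup" point concrete, here is some back-of-the-envelope arithmetic. The 100 tokens per item and the batch size of 10 are made-up numbers, and output tokens are ignored:

```python
def input_tokens_per_item_loop(n_items: int, system_tokens: int, item_tokens: int) -> int:
    """One call per item: the system prompt is resent every time."""
    return n_items * (system_tokens + item_tokens)


def input_tokens_batched(n_items: int, system_tokens: int, item_tokens: int, batch_size: int) -> int:
    """Several items per call: the system prompt is sent once per batch."""
    n_calls = -(-n_items // batch_size)  # ceiling division
    return n_calls * system_tokens + n_items * item_tokens


print(input_tokens_per_item_loop(50, 2000, 100))  # 105000
print(input_tokens_batched(50, 2000, 100, 10))    # 15000
```

Same 50 items, roughly a seventh of the input tokens. Whether quality holds up at that batch size is something you have to measure for your own task.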

4. Breaking prompt caching without realizing it

This one is especially relevant if you're building agents.

Most providers cache prompts automatically and give you significant discounts when a request matches a previously seen prefix. In an agent loop where you're continuously resending the system prompt, full conversation history and tool calls, that cache is what keeps your bill reasonable.

The catch: think of it like a prefix match. The moment something differs from the previous call, the cache stops there and you pay full price from that point on.

A classic example is injecting the current timestamp into your system prompt. Seems harmless, but since it changes on every call, nothing after it ever gets cached, and you're paying full price every time.

Treat everything before the dynamic part of your prompt as sacred. Same order, same content, same whitespace. Push dynamic context as far down the message history as possible, after the static parts that can be cached.

This leads to decisions that look counterintuitive: sometimes you deliberately send more tokens just to keep the cacheable prefix intact. A 2000-token cached call is cheaper than a 1500-token cache miss. Once you internalize that, it changes how you structure your payloads entirely.
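
A toy model makes both points visible. Here each list element stands in for a chunk of the prompt, and cached tokens are assumed to be billed at 10% of the normal price; that rate is an assumption, real discounts and cache granularity vary by provider:

```python
def shared_prefix_len(prev: list, curr: list) -> int:
    """Number of leading chunks that are identical between two requests."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def billed_units(total_tokens: int, cached_tokens: int, cached_rate: float = 0.1) -> float:
    """Cost in 'full-price token' units when part of the prompt is a cache hit."""
    return (total_tokens - cached_tokens) + cached_tokens * cached_rate


# Timestamp first: the prefix breaks immediately, nothing is reusable.
bad_prev = ["now=10:00", "SYSTEM RULES", "tool schema", "user: hi"]
bad_curr = ["now=10:01", "SYSTEM RULES", "tool schema", "user: hi"]
print(shared_prefix_len(bad_prev, bad_curr))  # 0

# Timestamp last: everything before it still matches.
good_prev = ["SYSTEM RULES", "tool schema", "user: hi", "now=10:00"]
good_curr = ["SYSTEM RULES", "tool schema", "user: hi", "now=10:01"]
print(shared_prefix_len(good_prev, good_curr))  # 3

# A longer request with a warm cache vs. a shorter one that misses entirely.
print(billed_units(2000, 1800) < billed_units(1500, 0))  # True
```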

5. Treating prompt engineering as someone else's problem

"The AI team handles the prompts."

If you're building the integration, you need to understand how prompts work. Not at a research level, but enough to know why your endpoint is returning garbage.

The prompt is part of your application logic. It deserves the same attention as your SQL queries or your request validation. A bad prompt will produce bad output no matter how clean your surrounding code is.

At minimum, know the difference between system and user messages, understand what temperature does to your outputs, and learn when to use structured outputs instead of trying to parse free-form text. Which brings me to...

6. Trusting free-form text when you don’t have to

I've seen code like this in the wild:

response = llm.complete("Analyze this and return: risk level, explanation, recommendation")
lines = response.split("\n")
risk = lines[0].split(":")[1].strip()

Please don't do this.

Modern LLM APIs support structured outputs — you define a schema, the model returns valid JSON that matches it. Every time. No parsing, no IndexError because the model decided to add an extra line.

In Python, pair it with Pydantic. In TypeScript, use Zod for validation and parsing. In Go, define a struct and unmarshal directly. Your future self will thank you.

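from typing import Literal

from pydantic import BaseModel

class AnalysisResult(BaseModel):
    risk_level: Literal["HIGH", "MEDIUM", "LOW", "NONE"]
    explanation: str
    recommendation: str

# llm.complete_structured stands in for your provider's structured-output call
result = llm.complete_structured(prompt, schema=AnalysisResult)
# result.risk_level is always there. Always a string. Always one of the four values.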
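
If your provider has no structured-output mode, you can enforce the same contract by hand. This is a stdlib-only sketch of roughly what Pydantic does for you; `parse_analysis`, `ALLOWED_RISK` and `REQUIRED_FIELDS` are hypothetical names, not part of any library.

```python
import json

ALLOWED_RISK = {"HIGH", "MEDIUM", "LOW", "NONE"}
REQUIRED_FIELDS = ("risk_level", "explanation", "recommendation")

def parse_analysis(raw: str) -> dict:
    """Parse model output and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(data.get(field), str):
            raise ValueError(f"Missing or non-string field: {field}")
    if data["risk_level"] not in ALLOWED_RISK:
        raise ValueError(f"Unexpected risk_level: {data['risk_level']!r}")
    # Drop any extra keys the model decided to add
    return {field: data[field] for field in REQUIRED_FIELDS}
```

The point is the failure mode: a bad response raises a clear `ValueError` you can retry on, instead of an `IndexError` three layers down.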

7. Putting everything in the context window and calling it RAG

RAG (Retrieval Augmented Generation) is genuinely useful. The idea is simple: instead of stuffing all your knowledge into the prompt, you retrieve only the relevant bits and include those.

But a lot of implementations I've seen are just... dumping documents into the prompt and calling it RAG.

Real RAG has a retrieval step. You embed the user's query, find the most semantically similar chunks from your knowledge base, and only send those. If you're sending 50 pages of documentation on every call, you're not doing RAG, you're doing expensive copy-paste.

The retrieval quality matters as much as the generation. Bad retrieval means irrelevant context, and irrelevant context means bad answers (no matter how good your model is).
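
Here is a toy version of that retrieval step, just to show the shape. The `embed` function is a bag-of-words stand-in I made up for the example; a real system would call an embedding model and a vector store instead.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. Swap in a real embedding model in practice."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Only the retrieved chunks go into the prompt, not the whole knowledge base."""
    context = "\n---\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Everything that makes RAG good or bad lives in `retrieve`: chunking, the embedding model, and how many results you keep.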

8. Not thinking about what happens when the model is wrong

LLMs hallucinate. They confidently state things that are false. This isn't a bug that will get patched, it's just how these models work.

So the question isn't "what if the model is wrong?"... it's "what happens in my system when the model is wrong?"

If you're using an LLM to triage support tickets, a wrong answer means someone spends extra time investigating. Annoying, not catastrophic. But if you're using it to generate financial reports or medical summaries, wrong answers have a very different impact.

Design your system with the assumption that the model will sometimes be wrong. Add human review steps where it matters. Log the inputs and outputs so you can audit. Don't pipe raw LLM output into anything irreversible.
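
A rough sketch of what that can look like in code. The action names, the `Decision` type and the 0.8 threshold are made up for the example; the idea is simply that every call is logged and anything irreversible or low-confidence goes to a person.

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger("llm_audit")

@dataclass
class Decision:
    action: str        # what the model proposes, e.g. "close_ticket"
    confidence: float  # score in [0, 1], however you choose to derive it

# Hypothetical examples of actions that cannot be undone
IRREVERSIBLE_ACTIONS = {"issue_refund", "delete_account"}

def route(prompt: str, decision: Decision, threshold: float = 0.8) -> str:
    """Log every call, and send risky or low-confidence decisions to a human."""
    logger.info(json.dumps({
        "prompt": prompt,
        "action": decision.action,
        "confidence": decision.confidence,
    }))
    if decision.action in IRREVERSIBLE_ACTIONS or decision.confidence < threshold:
        return "human_review"
    return "auto_apply"
```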

Wrapping up

None of this means LLMs aren't useful, they absolutely are. But they're a new kind of dependency with failure modes we're not used to. The good news is that most of the solid engineering practices we already have (validation, retries, observability, defensive design) apply here too. We just need to remember to actually use them.

One last thing: this space moves fast. What's a workaround today might be a built-in feature tomorrow. If you're already the kind of engineer who keeps up with tech, just make sure AI topics are in the mix. It's worth it.

If you've run into other footguns that aren't on this list, drop them in the comments. I'm curious what patterns other backend engineers have hit.

From 'The Bench' to 'Ready to Ship': How AI Redefined My Learning Curve

Joaquin Islas — Tue, 07 Apr 2026 19:46:52 +0000

The Hook: The Bench Moment

We’ve all been there: you’re on the bench, and the pressure is mounting. Two potential assignments land on your desk, but there’s a catch—they are built on tech stacks you’ve only touched on the surface. The traditional anxiety sets in: How fast can I get up to speed without slowing the team down? In the past, that moment felt like staring at a mountain you had to climb alone. But recently, I realized the climb has changed.

The Old Way vs. The New Way

Before AI, the process was linear and often isolating. You’d spend days digging through dense documentation and watching generic tutorials. Now, the dynamic is flipped. It’s no longer about consuming static content; it’s about on-demand, interactive tutoring.

Feature	Traditional Training	AI-Powered Learning	Impact
Learning Curve	Weeks of passive courses.	Just-in-Time learning.	40% reduction in non-productive time.
Problem Solving	Waiting for a mentor.	Instant 24/7 tutoring.	Saves hours of Senior developers' time.
Contextualization	Generic examples.	Adapts to company standards.	Lower initial error rate.
Feedback Quality	Errors found in PR reviews.	Real-time logic suggestions.	Less rework and debt.

Real-World Case: Bridging Java and Go

To illustrate this, I recently had to pivot from my Java background into a Go project. Instead of starting from zero, I used AI as a "paradigm translator."

💡 The Strategy: I asked the AI: "I'm used to Java's Spring Boot; how do I achieve this same pattern idiomatically in Go?"

Here is a concrete example of how I mapped a common Java pattern (Service/Repository) to idiomatic Go in minutes, thanks to the AI's guidance:

// What I knew: Java Implementation (Spring-like)
public class UserService {
    private final UserRepository repo;

    // Standard constructor injection
    public UserService(UserRepository repo) {
        this.repo = repo;
    }

    public User getUser(String id) {
        // Optional-based lookup: throw if the user is missing
        return repo.findById(id).orElseThrow(() -> new UserNotFoundException(id));
    }
}

// What I learned (Idiomatic Go): Guided by AI
package service

import "errors"

// ErrUserNotFound is a sentinel error callers can check with errors.Is.
var ErrUserNotFound = errors.New("user not found")

// User is a minimal placeholder for the domain type.
type User struct {
    ID string
}

// AI suggested defining an interface here for the dependency, 
// emphasizing Go's implicit interface satisfaction.
type UserRepository interface {
    FindByID(id string) (*User, error)
}

type UserService struct {
    repo UserRepository // AI explained composition over inheritance
}

// AI helped me write the constructor function
func NewUserService(r UserRepository) *UserService {
    return &UserService{repo: r}
}

func (s *UserService) GetUser(id string) (*User, error) {
    user, err := s.repo.FindByID(id)
    if err != nil {
        return nil, err // AI emphasized Go's explicit error handling pattern
    }
    if user == nil {
        return nil, ErrUserNotFound
    }
    return user, nil
}

✅ The Result: I quickly understood the shift from Object-Oriented inheritance to Go's composition and interfaces. This didn't just teach me syntax; it gave me the confidence to show up as a contributor, not a beginner, from week one.

The Metrics: Why This Matters for the Company

The shift isn't just a feeling; the data supports it:

🚀 Productivity Boost: According to GitHub Research, developers using AI complete tasks up to 55% faster.
⚖️ The Leveling Effect: A study by MIT and Stanford suggests that AI helps developers close the skills gap 43% faster, acting as a "great leveler" for the whole team.
📈 Continuous Learning: As highlighted by the Harvard Business Review, AI-powered training can reduce time-to-productivity by 40% compared to traditional, passive methods.

⚠ The Limits: Being Honest

However, let’s be clear: AI is not a silver bullet. It can teach you syntax and explain complex concepts, but it cannot replace human judgment. It doesn't know the specific business context or the long-term architectural risks of our project. AI provides the tools, but you provide the engineering intuition. Experience still matters; AI just lets you acquire it faster.

🛠️ A Practical Framework

If you're looking to replicate this, here is the routine I used:

🌉 Bridge the Gap: Always start by asking the AI to compare the new technology to a stack you already master.
🔀 The "English + AI" Combo: Use AI to simulate technical interviews or draft documentation in English while you learn the tech—it's a double win for professional growth.
💪 Contribution as Practice: Don't just watch videos. Take a small internal spike or task and use your AI assistant to guide your implementation.

🎯 The Closing Thought

In 2026, "being prepared" no longer means knowing everything by heart. It means being agile enough to adapt. AI has changed the definition of a Senior Engineer: it’s less about having all the answers and more about knowing how to leverage tools to bridge the gap between "I don't know this" and "I’m ready to ship."

References & Sources

EdTech Global (2025): AI in Corporate Education: Efficiency and ROI Statistics.
GitHub / Microsoft (2023): The impact of AI on developer productivity.
MIT & Stanford (NBER, 2024): Generative AI at Work and the Leveling Effect.
Harvard Business Review (2024): How Generative AI Is Changing the Way We Learn.

Stop Wasting Time on CVEs That Don't Affect You

Juan M. Altamirano — Mon, 16 Mar 2026 18:47:53 +0000

The Problem

Aren't you tired of pushing new code and then a few days later receiving an alert from GitHub's Dependabot? Well, I am.
The most annoying part is looking for the CVE, reviewing your code and then detecting that you aren't using the affected part.
Rinse and repeat for every single alert.

The solution?

That's why I built dep_shield — a CLI that I can plug into my common workflow (lint -> dep_shield -> tests -> sonar) and get a straight answer: "this CVE affects you" or "relax, you're fine."

How dep_shield Works

The flow is straightforward:

Parse dependencies — Read requirements.txt or pyproject.toml, extract packages and versions
Check for CVEs — Query the OSV database for known vulnerabilities
Find usage in code — Scan your Python files to see where you import vulnerable packages
AI-powered analysis — Send the CVE description + your import context to an LLM and ask: "Does this actually affect me?"

The Interesting Parts

Parsing Dependencies (Both Formats)

The tool supports requirements.txt and pyproject.toml — it figures out which one you're using.
Why both? Well, requirements.txt is still widely used, but uv is taking over fast.

Querying OSV (the free vulnerability database)

OSV over NVD or Snyk? No API key, no rate limits, no pricing tiers. NVD is slow and wants you to parse CPE identifiers. Snyk needs auth and has usage limits.
OSV just works — POST a package name, get vulnerabilities back. For a CLI tool meant to run locally and fast, that's exactly what I needed.

import httpx

OSV_API_URL = "https://api.osv.dev/v1/query"

def query_vulnerabilities(package_name: str, version: str | None) -> list[Vulnerability]:
    # Vulnerability and parse_vulnerabilities are defined elsewhere in dep_shield
    payload = {
        "package": {
            "name": package_name,
            "ecosystem": "PyPI"
        }
    }
    if version:
        payload["version"] = version

    response = httpx.post(OSV_API_URL, json=payload, timeout=10.0)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return parse_vulnerabilities(response.json())

A trimmed example response, here for jinja2:

{
    "vulns": [
        {
            "id": "GHSA-cpwx-vrp4-4pq7",
            "summary": "Jinja2 vulnerable to sandbox breakout through attr filter selecting format method",
            "details": "An oversight in how the Jinja sandboxed environment interacts with the `|attr` filter allows an attacker that controls the content of a template to execute arbitrary Python code.\n\nTo exploit the vulnerability, an attacker needs to control the content of a template. Whether that is the case depends on the type of application using Jinja. This vulnerability impacts users of applications which execute untrusted templates.\n\nJinja's sandbox does catch calls to `str.format` and ensures they don't escape the sandbox. However, it's possible to use the `|attr` filter to get a reference to a string's plain format method, bypassing the sandbox. After the fix, the `|attr` filter no longer bypasses the environment's attribute lookup.",
            "aliases": ["CVE-2025-27516"],
            "modified": "2026-02-04T04:14:58.595738Z",
            "published": "2025-03-05T20:40:14Z",
            "affected": [
                {
                    "package": { "name": "jinja2", "ecosystem": "PyPI", "purl": "pkg:pypi/jinja2" },
                    "ranges": [{ "type": "ECOSYSTEM", "events": [{ "introduced": "0" }, { "fixed": "3.1.6"}] }]
                }
            ]
        }
    ]
}

The full response has more fields (references, versions, etc.), but these are the relevant ones.
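
Turning that JSON into something the rest of the tool can use might look like the sketch below. `VulnSummary` and `summarize_osv_response` are hypothetical names standing in for the `Vulnerability` type and `parse_vulnerabilities` helper, whose real implementations are not shown in this post.

```python
from dataclasses import dataclass, field

@dataclass
class VulnSummary:  # hypothetical stand-in for the post's Vulnerability type
    id: str
    summary: str
    aliases: list[str] = field(default_factory=list)
    fixed_versions: list[str] = field(default_factory=list)

def summarize_osv_response(data: dict) -> list[VulnSummary]:
    """Pull the id, summary, aliases and fixed versions out of an OSV query response."""
    summaries = []
    for vuln in data.get("vulns", []):  # OSV returns {} when nothing is found
        fixed = [
            event["fixed"]
            for affected in vuln.get("affected", [])
            for rng in affected.get("ranges", [])
            for event in rng.get("events", [])
            if "fixed" in event
        ]
        summaries.append(VulnSummary(
            id=vuln["id"],
            summary=vuln.get("summary", ""),
            aliases=vuln.get("aliases", []),
            fixed_versions=fixed,
        ))
    return summaries
```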

Finding Where You Actually Use the Package

It's easy to forget where you actually use a dependency — especially in large projects with dozens of packages. And not all usage is equal: importing requests in your core API is very different from importing it in a one-off migration script.

import re
from pathlib import Path

def scan_file_for_package(file_path: Path, package_name: str) -> list[CodeUsage]:
    # Escape the name so characters like "." are matched literally
    name = re.escape(package_name)
    pattern_import = rf"^import\s+{name}(\s|,|$|\.)"
    pattern_from = rf"^from\s+{name}(\s|\.)"
    result = []

    with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
        for line_num, line in enumerate(file, 1):
            line = line.strip()
            if line.startswith("#"):
                continue

            if re.match(pattern_import, line):
                import_type = "import"
            elif re.match(pattern_from, line):
                import_type = "from"
            else:
                continue

            result.append(CodeUsage(
                file_path=str(file_path),
                line_number=line_num,
                line_content=line,
                import_type=import_type
            ))
    return result

The AI Part: Asking A Model If It Matters

Now, here's where it gets interesting. I send the LLM:

The CVE description
The import statements from your code

And ask: "Given how this code uses the package, does this vulnerability apply?"

Important: Only import lines are sent to the LLM — not your actual business logic. The model sees from requests import Session, not the body of your functions. Your code stays local.

_SYSTEM_PROMPT = """You are a security analyst reviewing Python dependency vulnerabilities.
You will be given a CVE description and import-level code evidence.
Your job is to determine if the vulnerability realistically affects this codebase.

Risk level definitions:
- HIGH: The imports directly expose the vulnerable code path
- MEDIUM: Package is imported but no clear evidence the vulnerable feature is used
- LOW: Package only imported in tests or dev tooling
- NONE: CVE conditions are not plausible in this codebase"""
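
The user message that goes with that system prompt is just the CVE text plus the import evidence. A sketch of how it could be assembled; `build_user_message` and its exact wording are my illustration, not the tool's real prompt.

```python
def build_user_message(
    cve_id: str,
    cve_details: str,
    import_lines: list[tuple[str, int, str]],
) -> str:
    """Combine the CVE text with import-level evidence: (file, line number, statement)."""
    if import_lines:
        evidence = "\n".join(
            f"- {path}:{line_no}: {stmt}" for path, line_no, stmt in import_lines
        )
    else:
        evidence = "- (declared as a dependency but never imported)"
    return (
        f"Vulnerability: {cve_id}\n"
        f"Description:\n{cve_details}\n\n"
        f"Imports found in the codebase:\n{evidence}\n\n"
        "Does this vulnerability realistically affect this codebase?"
    )
```

Note that only file paths, line numbers and import statements go in; no function bodies.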

I use Pydantic to force the model into a consistent format, no free-form text that I'd have to parse.
It either gives me a valid ImpactAnalysis or it fails. No "well, it depends..." answers.

class ImpactAnalysis(BaseModel):
    risk_level: Literal["HIGH", "MEDIUM", "LOW", "NONE"]
    explanation: str
    recommendation: str

Making It Smarter Over Time (RAG)

Here's the thing — the LLM doesn't know your codebase. It analyzes each CVE in isolation. But what if it could remember past analyses?

That's where ChromaDB comes in. Every time the tool analyzes a vulnerability, it stores the result. Next time a similar CVE shows up (same package, similar vulnerability type), it retrieves past analyses as context.

def store_analysis(vulnerability_id: str, package_name: str, analysis: ImpactAnalysis, code_context: str):
    collection.add(
        documents=[f"{vulnerability_id}: {analysis.explanation}"],
        metadatas=[{"risk": analysis.risk_level, "package": package_name}],
        ids=[vulnerability_id]
    )

def get_similar_analyses(vulnerability_description: str, limit: int = 3):
    results = collection.query(
        query_texts=[vulnerability_description],
        n_results=limit
    )
    return results

The result? The tool gets better the more you use it. If you analyzed a requests CVE last month and a new one appears, the model sees how you handled it before.
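
To close the loop, the retrieved analyses have to end up in the prompt. A sketch of that formatting step, assuming the usual ChromaDB query result shape (one inner list per query text) and the `risk` / `package` metadata keys stored above; `format_past_analyses` is a hypothetical helper.

```python
def format_past_analyses(results: dict) -> str:
    """Turn a ChromaDB query result into a context block for the prompt."""
    documents = (results.get("documents") or [[]])[0]
    metadatas = (results.get("metadatas") or [[]])[0]
    if not documents:
        return "No similar past analyses."
    if len(metadatas) != len(documents):
        metadatas = [None] * len(documents)  # tolerate missing metadata
    lines = ["Past analyses of similar vulnerabilities:"]
    for doc, meta in zip(documents, metadatas):
        meta = meta or {}
        lines.append(f"- [{meta.get('risk', 'UNKNOWN')}] {meta.get('package', '?')}: {doc}")
    return "\n".join(lines)
```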

What the Output Looks Like

What I Learned

Honestly? It reinforced an ancient rule: context is everything.
A CVE that sounds critical in the abstract becomes irrelevant when you realize you only import that package in a test helper. OSV turned out to be the perfect data source — free, fast, no API keys, solid Python coverage. And while the LLM does a surprisingly good job at triage, I still treat its output as a suggestion, not a verdict. The RAG layer (ChromaDB) was an afterthought, but it's become one of the most useful parts: the tool genuinely gets better as it sees more of your codebase.

What's Next

I have a few ideas in mind:

More languages — JavaScript and Go are probably the next ones
CI/CD integration — run it in your pipeline, make it fail the build if there is any 'HIGH' impact
Offline mode — run it against a local LLM

Try It Yourself

Give it a shot and let me know what breaks. Seriously — I'd appreciate any feedback.
dep_shield

Who tests the tests?

Lucas Gabriel Sánchez — Fri, 20 Feb 2026 17:05:58 +0000

This post is based on a GopherCon talk by Daniela Petruzalek: Who tests the tests?

A little bit of history

In the beginning, we checked our code manually, running the application and trying different inputs: we called that manual testing. Then we discovered that we could write code to test application code to check if it's correct: we called that automatic testing (unit, integration, functional, etc.)

Now that we are in the AI era, where we write fewer tests and even less code ourselves, we need a way to quickly check that the tests are correct and that they cover the cases we expect in our application.

How can we know if code written by AI is correct?

You can use automatic tests in the same way we used to check code written by humans.

One option would be to let the AI write the application code while you write the tests. You can even use TDD: you, the human, write the tests first, and the AI writes the implementation afterwards.

Another option is to let the AI write the application code and the tests, but then how can you be sure that those tests are testing the things you want or need? Reading and understanding the tests would be the best, but can we do that automatically?

Enter: mutation tests

Mutation testing is a way to check that the tests you have are testing the code the way you want by making small changes to the application code and checking that the tests fail as expected.

It's based on a concept known as a mutant: a version of your application with a small change.

How does it work?

The mutation testing cycle is this:

Create a mutant: change the application code by applying just one change
Run your test suite
Check how many mutants were killed

After running your tests, the ones that failed are said to have killed the mutant. Those tests are testing something related to the code you changed and are, from the perspective of that change, good tests.

After many changes, if a test never failed it means that test didn't kill any mutant and is a weak test. You should probably delete that test or write a better one.

If a mutant is never killed, then that code is either not being tested (no coverage) or is being tested poorly. That is an opportunity to write a test for that piece of code, if it needs one.

Do I have to make these changes manually?

You can do it manually, but there are some tools to aid you:

C#, TypeScript and Scala: Stryker
Go: go-gremlins
Java: pitest
Python: mutatest
Rust: mutants.rs

These tools provide a way to make those changes automatically and some of them run the tests for you.

Why do we need mutation testing?

Let's see a very simple example in Python:

def divide(a, b):
    if b == 0:
        raise ValueError("can't divide by 0")
    return a/b

A simple test suite we can have is:

import unittest
from divide import divide

class TestDivide(unittest.TestCase):
    def test_divide_error(self):
        with self.assertRaises(ValueError):
            divide(1, 0)

    def test_divide_success(self):
        self.assertEqual(1, divide(1, 1))

if __name__ == '__main__':
    unittest.main()

Run the tests and everything is fine:

$ python tests.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Here we are testing both flows: when an error is raised and when we complete a successful operation. Coverage is at 100%, but those tests are not great, and here's why. Let's change the implementation of divide from a/b to a*b:

def divide(a, b):
    if b == 0:
        raise ValueError("can't divide by 0")
    return a*b

The tests should fail now, right? They don't, because 1*1 is still 1:

$ python tests.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

This is a problematic test:

def test_divide_success(self):
    self.assertEqual(1, divide(1, 1))

Even when you have 100% test coverage, that doesn't mean you have 100% test case coverage; some cases may be missing or not being tested correctly.

What we did here was a mutation: from a/b to a*b. That mutation gives us information about our tests that we didn't have before.
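
The same experiment can be automated in a few lines. This is a hand-rolled sketch using Python's `ast` module, not how any of the tools listed earlier work internally; `DivToMult`, `weak_test` and `strong_test` are names I made up for the example.

```python
import ast

SOURCE = '''
def divide(a, b):
    if b == 0:
        raise ValueError("can't divide by 0")
    return a / b
'''

class DivToMult(ast.NodeTransformer):
    """Mutation operator: replace every `/` with `*`."""
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, ast.Div):
            node.op = ast.Mult()
        return node

def load_divide(source: str, mutate: bool):
    """Compile the source, optionally mutated, and return its divide function."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(DivToMult().visit(tree))
    namespace: dict = {}
    exec(compile(tree, "<divide>", "exec"), namespace)
    return namespace["divide"]

def weak_test(divide) -> bool:
    return divide(1, 1) == 1   # 1/1 and 1*1 are both 1

def strong_test(divide) -> bool:
    return divide(6, 3) == 2   # 6/3 is 2, but 6*3 is 18

def mutant_killed(test, source: str = SOURCE) -> bool:
    """A test kills the mutant if it passes on the original and fails on the mutant."""
    original = load_divide(source, mutate=False)
    mutant = load_divide(source, mutate=True)
    return test(original) and not test(mutant)
```

Here `mutant_killed(weak_test)` is `False` (the mutant survives) and `mutant_killed(strong_test)` is `True`, which is exactly the signal coverage alone could not give us.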

How to read the results of mutation testing?

When you run your test suite against mutated code, each test can do one of two things: kill the mutant (the test fails, so it detected the change) or let it survive (the test still passes, so it missed the change). After many mutations, you can summarize how often each test killed mutants, for example:

After 200 Mutations:
TestA: 140 Kills, 60 Survived
TestB: 200 Kills
TestC: 30 Kills, 170 Survived

How to interpret this:

TestA: is a good test because it killed most of the mutants.
TestB: is your strongest test, it detects every mutation you ran.
TestC: is the weak one, it usually doesn't detect mutations.
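
If you want a number to rank tests by, the kill rate is kills divided by total mutations. A small sketch over the figures above; the function names and the 0.5 cutoff for "weak" are arbitrary choices for the example.

```python
def kill_rates(results: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each test name to kills / (kills + survived)."""
    rates = {}
    for name, (kills, survived) in results.items():
        total = kills + survived
        rates[name] = kills / total if total else 0.0
    return rates

def weak_tests(results: dict[str, tuple[int, int]], threshold: float = 0.5) -> list[str]:
    """Tests whose kill rate falls below the (arbitrary) threshold."""
    return sorted(name for name, rate in kill_rates(results).items() if rate < threshold)

RESULTS = {"TestA": (140, 60), "TestB": (200, 0), "TestC": (30, 170)}
```

With these numbers TestA lands at 0.7, TestB at 1.0 and TestC at 0.15, so only TestC is flagged as weak.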

When not to use mutation testing?

In some projects or languages, running mutation tests can take hours. For each mutant, the tool has to compile (if needed) and run the full test suite. If your test suite takes 10 minutes, each mutation will take at least that long, so keep this in mind.

Use this approach on small projects or ones where the "edit, compile, test" cycle is short.

Closing

So who tests the tests? In practice, mutation testing does that: by checking that your tests react when the code is deliberately broken.

Bicycles Are All Your AI Agents Need

Federico Pascarella — Thu, 13 Nov 2025 12:47:16 +0000

From Condors to Code

Somewhere between a condor and a keyboard lies human genius. Steve Jobs once told a story about how humans are terrible movers compared to animals. The condor beats us easily in the race of energy efficiency, but put a person on a bicycle and they fly.
The bicycle, Jobs said, is "a tool that amplifies our efficiency." Computers, he added, are bicycles for the mind.
That thought never left me. And now, with AI agents evolving super-fast, I can't help seeing the same pattern repeat.
My humble view is that tools are still the key. Only this time, the cyclists are our AI agents, each with its brain (the LLM), and the bicycles are the functions we build for them.
With the right tools, an agent moves with purpose. With clumsy tools, it stalls.

The Engineering of Great Tools

Great agents sit on top of small, sharp Python functions. They are plain, predictable, and fast.

1. Single Responsibility

Specialize each function. Do one job well, then compose.

# Bad: Swiss-army function
def create_user(name, email, send_welcome=True, log=True):
    user = db.save(name, email)
    if send_welcome:
        send_email(email, "Welcome!")
    if log:
        logger.info(f"User {name} created")
    return user

# Good: Focused, composable tools
def save_user(name: str, email: str) -> dict:
    return db.save(name, email)

def send_welcome_email(email: str) -> bool:
    return send_email(email, "Welcome!")

2. Clear interfaces

Name things so intent is obvious. Keep arguments explicit. Return data instead of printing.

# Bad: Vague names and side effects
def discCalc(p, x, t=None):
    result = p - (p * x / 100)
    print(f"Discount applied: ${result}")

# Good: Straight names and returns
def calculate_discount(price: float, percentage: float) -> float:
    return price - (price * percentage / 100)

3. Structured outputs

Agents prefer structure. Return dicts or JSON, not prose.

# Bad: Unstructured string
def get_weather(city):
    temp = fetch_temperature(city)
    return f"It's about {temp} degrees in {city}, partly cloudy"

# Good: MCP tool with schema
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel, Field

mcp = FastMCP("weather")

class WeatherData(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Temperature in Celsius")
    condition: str = Field(description="Weather condition")
    humidity: int = Field(description="Humidity percentage")

@mcp.tool()
async def get_weather(city: str) -> WeatherData:
    """Return current weather for a city as structured data."""
    temp = await fetch_temperature(city)
    condition = await fetch_condition(city)
    # Humidity is hardcoded here as a placeholder
    return WeatherData(city=city, temperature=temp, condition=condition, humidity=65)

4. Efficiency

Use built-ins, cache where it helps, and profile before optimizing.

# Bad: Manual loops
def filter_active_users(users):
    result = []
    for user in users:
        if user.get("active"):
            result.append(user)
    return result

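# Good: Built-ins plus caching
from functools import lru_cache

def filter_active_users(users: list[dict]) -> list[dict]:
    return [u for u in users if u.get("active")]

# lru_cache needs hashable arguments, so cache on an ID rather than on a list of dicts
@lru_cache(maxsize=128)
def get_active_users(team_id: str) -> tuple[dict, ...]:
    # db.fetch_team is a hypothetical lookup returning a list of user dicts
    return tuple(filter_active_users(db.fetch_team(team_id)))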

5. Robustness

Validate inputs and fail loudly with helpful errors.

# Bad: No validation
def read_file(path):
    with open(path) as f:
        return f.read()

# Good: Validation and clear errors
def read_file(path: str) -> str:
    if not isinstance(path, str) or not path:
        raise ValueError("Path must be a non-empty string")
    try:
        with open(path, "r") as f:
            return f.read()
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {path}")

6. The micro-tooling mindset

Break big jobs into small tools you can test and swap. MCP benefits from chains of simple, named steps.

# Bad: Monolith
def process_user_data(user_id):
    user = db.fetch(user_id)
    validated = validate(user)
    enriched = api.enrich(validated)
    return transform(enriched)

# Good: Composable steps
def fetch_user(user_id: str) -> dict:
    return db.fetch(user_id)

def validate_user(user: dict) -> dict:
    return validate(user)

def enrich_user(user: dict) -> dict:
    return api.enrich(user)

7. Trade-offs

Hundreds of tiny tools can create orchestration overhead. Clear names, steady input and output shapes, and basic docs keep things manageable.

Show, Don't Tell: Two Decision Flows

A concrete example makes the difference clear. Here is the same task, done with weak tools and with strong tools.

Task: Extract newly signed customers from a CSV in cloud storage, enrich each with firmographic data, and email an account summary.

Agent with poorly designed tools

Calls a generic process_file() that auto-detects type and tries to parse everything.
Uses one do-everything enrich_user() that accepts many flags, then times out on third-party rate limits.
Prints logs to stdout, returns a mixed string summary, and the agent fails to decide what to send.

Decision flow with weak tools

Input: blob path
Branch: auto-detect format, guess schema
Loop: enrich with side effects
Output: unstructured string
Failure mode: retries loop, hallucinates missing fields, no clear errors

Agent with well-designed tools

load_csv(path, schema) returns a typed dataframe.
batch_enrich(users, provider, rate_limit) yields structured rows with retry metadata.
render_account_summary(users) returns JSON for send_email(to, subject, body_html).

Decision flow with strong tools

Input: explicit path and schema
Transform: strict parser
Enrich: idempotent, rate limited, returns status per row
Render: deterministic template
Output: email send result with IDs

Result: same goal, three clean steps, easy to test and to explain.

Conclusion

I believe that innovation often hides in simplicity. Building efficient AI agents isn't about giving them infinite intelligence; it's about giving them great tools. Write them clean, focused, and well-documented; think in micro-tooling: small parts, big impact.
So, next time you're debugging that stubborn Python function, just remember: you're not fixing a bug. You're tuning a bicycle for the mind of an AI.

WhatsApp + MCP: automatic audio transcription

German Burgardt — Mon, 29 Sep 2025 19:39:53 +0000

Introduction

MCP (Model Context Protocol) can look complicated until you ship something real with it. Let's use it on something practical: expose your WhatsApp voice notes with your own MCP server and turn them into transcripts.

What is MCP?

MCP is an open standard that connects AI agents with external systems.

It has a server and a client, and they have two different ways to talk to each other:

stdio (stdin/stdout): the standard Unix mechanism for a process to receive or send data to the environment or another process.
Server-Sent Events (SSE): an HTTP mechanism where the server keeps the connection open and streams events to the client (one-way).

Quick comparison of stdio and SSE transports in MCP.

MCP architecture

Host: Claude Desktop / Cursor / any AI agent. It coordinates the LLM, spins up MCP clients, and shows results.
MCP Client: an implementation embedded in the host that connects to your server. It speaks the protocol, opens/manages the connection, and sends/receives requests.
MCP Server: your program that exposes tools. It runs actions and returns data/events to the client.

An MCP server can expose different capabilities, but in this project we stick to tools (actions like transcribing audio). MCP also supports resources or prompts; we skip them here to keep the flow simple.

Diagram of the Host → MCP Client → MCP Server flow.

Building the WhatsApp MCP

WhatsApp Desktop on macOS stores everything locally: an SQLite database with chats and folders containing the media files.

Our MCP server will:

Read the WhatsApp database
Find audio files per contact
Transcribe them with Whisper
Send the text back to the Client (Cursor in this case)

The working code lives in the repository: mcp-whatsapp-whisper. Let's walk through the key pieces.

The STDIN/STDOUT connection

import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const transport = new StdioServerTransport();
await this.server.connect(transport);

With that the server listens to every client request on STDIN and replies through STDOUT.
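
To show what the SDK transport is doing for you, here is a stripped-down sketch of the wire format in Python: one JSON-RPC 2.0 message per line in, one per line out. It is illustrative only. It skips the initialization handshake and notifications that a real MCP server must handle, and `handle_line` and `serve` are hypothetical names.

```python
import json
from typing import TextIO

TOOLS = [{
    "name": "transcribeAudio",
    "description": "Transcribe an audio file",
    "inputSchema": {
        "type": "object",
        "properties": {"audioPath": {"type": "string"}},
        "required": ["audioPath"],
    },
}]

def handle_line(line: str) -> str:
    """Turn one JSON-RPC request line into one JSON-RPC response line."""
    request = json.loads(line)
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        path = request["params"]["arguments"]["audioPath"]
        # A real server would run the transcription here
        result = {"content": [{"type": "text", "text": f"(transcript of {path})"}]}
    else:
        error = {"code": -32601, "message": f"Method not found: {method}"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "result": result})

def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read requests line by line and write one response line for each."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_line(line) + "\n")
            stdout.flush()
```

Calling `serve(sys.stdin, sys.stdout)` would wire this to the real streams. It is also why a stdio server must never print logs to STDOUT: anything written there is treated as protocol traffic.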

We pick stdio because this MCP server runs locally. It's the simplest and most stable transport on desktop/CLI: no open ports, no HTTP dependency, avoids CORS/firewalls, and hosts (Claude Desktop/Cursor) support it natively. SSE makes sense when the server lives remotely behind HTTP.

Exposing capabilities

this.server = new Server(
  {
    name: "whatsapp-audio-mcp",
    version: "1.0.0",
  },
  {
    capabilities: {
      tools: {}, // We will expose actions
    },
  }
);

Designing the tools

The server is built around three tools, each with a specific role:

getRecentAudio(contactName, count?): pulls the latest audio paths for a contact.
searchAudios(query, date?): narrows the list by name or date when the history is large. We get filtering without touching SQLite directly.
transcribeAudio(audioPath): turns a path into text with Whisper. It finishes the loop by delivering the result we care about.

The goal was a minimal set: find, refine, transcribe. Each tool lines up with one of those stages.

{
  name: 'transcribeAudio',
  description: 'Transcribe an audio file using OpenAI Whisper (SDK)',
  inputSchema: {
    type: 'object',
    properties: {
      audioPath: {
        type: 'string',
        description: 'Path to the audio file',
      },
    },
    required: ['audioPath'],
  },
}

The schema follows JSON Schema. With it, Cursor knows which parameters to send.

Accessing WhatsApp

WhatsApp Desktop keeps everything under predictable paths:

this.dbPath = path.join(
  homeDir,
  "Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite"
);
this.mediaPath = path.join(
  homeDir,
  "Library/Group Containers/group.net.whatsapp.WhatsApp.shared/Message/Media"
);

The database is SQLite:

const query = `
  SELECT DISTINCT 
    ZCONTACTJID as jid,
    ZPARTNERNAME as name,
    ZLASTMESSAGEDATE as lastMessageDate
  FROM ZWACHATSESSION
  WHERE ZPARTNERNAME IS NOT NULL
  AND ZCONTACTJID NOT LIKE '%@g.us'  -- Exclude groups
`;

Audio files are organized per contact. We scan recursively:

const audioExtensions = [".opus", ".m4a", ".mp3", ".aac", ".wav"];

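// Excerpt: `fs` is node:fs/promises, `path` is node:path, and `audioFiles`
// is an array declared in the enclosing scope.
async function scanDirectory(dir: string): Promise<void> {
  const entries = await fs.readdir(dir, { withFileTypes: true });

  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);

    if (entry.isDirectory()) {
      // Descend into per-contact subfolders
      await scanDirectory(fullPath);
    } else if (audioExtensions.some((ext) => entry.name.endsWith(ext))) {
      // Found an audio file: record its path and modification time
      const stats = await fs.stat(fullPath);
      audioFiles.push({
        path: fullPath,
        filename: entry.name,
        modifiedDate: stats.mtime.toISOString(),
      });
    }
  }
}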
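Once the scan has collected files, getRecentAudio only needs the newest ones. A minimal sketch of that selection step follows; pickMostRecent is a hypothetical name, and the default of one file is an assumption, since the article does not say what count defaults to.

```typescript
interface AudioFile {
  path: string;
  filename: string;
  modifiedDate: string; // ISO 8601, as produced by toISOString()
}

// Hypothetical helper: newest files first, limited to `count` entries.
// The default of 1 is an assumption made for this sketch.
function pickMostRecent(files: AudioFile[], count = 1): AudioFile[] {
  // Copy before sorting so the caller's array is left untouched
  return [...files]
    .sort((a, b) => Date.parse(b.modifiedDate) - Date.parse(a.modifiedDate))
    .slice(0, Math.max(0, count));
}

const scanned: AudioFile[] = [
  { path: "/media/a.opus", filename: "a.opus", modifiedDate: "2025-01-01T10:00:00.000Z" },
  { path: "/media/b.opus", filename: "b.opus", modifiedDate: "2025-03-01T10:00:00.000Z" },
  { path: "/media/c.opus", filename: "c.opus", modifiedDate: "2025-02-01T10:00:00.000Z" },
];
console.log(pickMostRecent(scanned, 2).map((f) => f.filename));
```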

The transcription: FFmpeg + Whisper

WhatsApp ships audio in Opus, but OpenAI Whisper prefers MP3. We use FFmpeg:

const ffmpeg = spawn("ffmpeg", [
  "-i",
  inputPath, // WhatsApp Opus audio
  "-acodec",
  "mp3",
  "-b:a",
  "128k",
  outputPath, // Temporary MP3
]);
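The spawn call returns immediately, so the transcription step has to wait for FFmpeg to finish. One way to do that, sketched under assumptions: the helper names (tempMp3Path, buildFfmpegArgs, convertToMp3) are hypothetical, and the -y flag is added here so a stale temporary file is overwritten rather than blocking on a prompt.

```typescript
import { spawn } from "node:child_process";
import * as path from "node:path";

// Hypothetical helper: derive the temporary MP3 path from the input file name
function tempMp3Path(inputPath: string, tmpDir: string): string {
  const base = path.basename(inputPath, path.extname(inputPath));
  return path.join(tmpDir, `${base}.mp3`);
}

// Same flags as above, plus -y to overwrite an existing output file
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return ["-y", "-i", inputPath, "-acodec", "mp3", "-b:a", "128k", outputPath];
}

// Resolves when ffmpeg exits with code 0, rejects otherwise
function convertToMp3(inputPath: string, outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn("ffmpeg", buildFfmpegArgs(inputPath, outputPath), {
      stdio: "ignore", // ffmpeg logs heavily to stderr; we do not read it here
    });
    child.on("error", reject); // e.g. ffmpeg is not installed
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

A caller would then write await convertToMp3(inputPath, outputPath) before handing the MP3 to the transcription step.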

Then we transcribe with OpenAI Whisper (SDK):

import OpenAI from "openai";
import fs from "node:fs";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream(outputPath), // Temporary MP3
  model: "whisper-1",
});

const transcriptionText = transcription.text;

Configuring Cursor (the client)

In the Cursor config (~/.cursor/mcp.json) we add:

{
  "mcpServers": {
    "whatsapp": {
      "command": "node",
      "args": ["/path/to/mcp-whatsapp-whisper/dist/server.js"],
      "env": {
        "OPENAI_API_KEY": "YOUR_OPENAI_KEY"
      }
    }
  }
}

Cursor can now invoke our server whenever it needs to.

MCP in action

The user asks Cursor:

"Send me the transcript of Elian's last audio."

Cursor automatically:

Calls getRecentAudio(contactName: "elian")
Receives the audio file path
Calls transcribeAudio(audioPath: "/path/to/audio.opus")
Receives the transcription
Summarizes or shows the full text

The transcription flows through the OpenAI API; the temporary MP3 is sent to get the text back. Cursor orchestrates; your server prepares the file and makes the call.
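Under the hood, each of those calls is a JSON-RPC exchange. The sketch below shows roughly what the transcribeAudio round trip could look like on the wire; the id and the transcript text are made up, and extractText is a hypothetical helper.

```typescript
// Request the host sends to the server (id and path are illustrative)
const toolsCallRequest = {
  jsonrpc: "2.0",
  id: 7,
  method: "tools/call",
  params: {
    name: "transcribeAudio",
    arguments: { audioPath: "/path/to/audio.opus" },
  },
};

// Response the server writes back
const toolsCallResponse = {
  jsonrpc: "2.0",
  id: 7, // echoes the request id so the client can match the pair
  result: {
    content: [{ type: "text", text: "(transcribed text)" }],
  },
};

// Hypothetical helper: join all text parts of a tool result
function extractText(response: typeof toolsCallResponse): string {
  return response.result.content
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");
}

console.log(extractText(toolsCallResponse));
```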

Cursor showing the transcription returned by the WhatsApp MCP server.

Limitations: macOS only

This server is macOS only. The WhatsApp paths are specific to Mac.

It depends on:

WhatsApp Desktop installed
FFmpeg (brew install ffmpeg)
OpenAI SDK (npm i openai) with OPENAI_API_KEY configured
Internet connection

We also skip Prompts and Resource Templates.

Security depends on the host. Cursor can ask for approval before it runs tools.

Keep it running with PM2

Build the project once (npm run build) and keep the server alive with pm2 start ecosystem.config.cjs. The provided config watches the compiled dist/server.js and restarts it if it crashes.

Conclusion

Your AI agent can now reach your data, use your tools, work in your context.

The WhatsApp server is just one idea. Once you realize any program that speaks STDIN/STDOUT can be an MCP server, the possibilities get wild.

Next time you think "I wish Cursor could access...", remember: it probably can. You just need to build the bridge.

How AI Reflects Your Thinking

German Burgardt — Tue, 26 Aug 2025 13:36:38 +0000

When we code using AI we ask ourselves: "what's the best prompt?" or "what magic prompt should I use?".

We'd be better off asking "what kind of interaction is this?" and trying to understand the nature of the interaction between us and the model.

Maybe the problem isn't the technology, but us.

An Analogy

Imagine you hire a remote programmer. Brilliant, but with some quirks:

Never worked on your project before (0 context)
Extremely literal. If you don't explicitly tell them, they never assume anything.
Doesn't infer context
Completely loses their memory every day, returning to their initial state

How would you communicate with them?

You'd probably:

Explain all the necessary context, very detailed
Be very specific with requirements
Not assume they'll "figure out" anything. You explain everything
Expect some iterations before the final result
Maybe save context files to resend them every day

That's the best way to interact with an AI model.

AI As a Mirror

The model isn't just a task executor. It's also a mirror of your clarity when communicating a problem.

If you give it vague instructions, you get vague results simply because it faithfully reflects how vague your thinking was.

Most of the time when the model "doesn't understand" the problem isn't the model. It's that we ourselves weren't clear about what we wanted.

Clarity As a Skill

The real skill isn't "writing good prompts". It's thinking clearly about problems and communicating that clarity. This is a fundamental skill for any programmer.

Example

What we usually do:

Optimize this function

Why it fails: Optimize in what sense? Speed? Memory? Readability? There are no success criteria.

What we should do:

The processOrders() function in orders.js takes 5 seconds with 1000 orders.
I need it to take less than 1 second.
Orders come from the database already sorted by date.
You can assume there are no duplicate orders.
Logs: <<detailed logs>>

This is much clearer and less abstract. It describes:

The problem (5 seconds is too much)
The measurable goal (less than 1 second)
Constraints (already sorted)
Assumptions (no duplicates)

Breaking Down Problems

One of the skills that improves working with AI is breaking problems down into smaller pieces. AI won't save you the work of thinking. The clarification process itself is valuable work in programming.

Instead of:

Implement a complete authentication system

You learn to think:

Step 1: Define the User model with minimum required fields: <fields>
Step 2: Create the registration endpoint with basic validation (validation type, etc)
[etc...]

The Limitations

In my experience, AI handles only about 3-4 files well at a time. It's a limitation, but one with a bright side:

It forces you to keep responsibilities separated and create clear interfaces. You need to avoid coupling and think in small modules.

It incentivizes you to follow good architecture practices.

The Importance of Context

AI needs all the context you can give it. Don't skimp.

CONTEXT: Users report the checkout page hangs
SYMPTOM: The "Pay" button stays in "Processing..." state indefinitely
FILE: checkout.js, handlePayment() function
SUSPICION: Probably missing a catch to handle API errors
TASK: Add robust error handling and visual feedback to the user

The Value of Programming with AI

Programming with AI trains you in thinking clearly and communicating precisely. It forces you to break problems into manageable pieces and be explicit with your requirements while constantly verifying results.

These seem like fundamental skills for any dev regardless of language.

Final Reflection

AI doesn't save you from thinking, or at least you shouldn't use it that way. It's the opposite: every prompt you write is an opportunity to clarify your understanding. Every response you receive is feedback on your clarity. Every iteration is a chance to improve.

Next time you use AI and don't get the expected result, before blaming the model, ask yourself:

Did I really have clarity on what I wanted?
Did I break down the problem into manageable parts?

These models are honest, literal collaborators. They give you exactly what you ask for, but they demand clarity. Learning to be clear is learning to think well. AI used properly makes you a better programmer.

Automate Any Repetitive Task with MCP

German Burgardt — Mon, 28 Jul 2025 17:30:01 +0000

The Problem: Repetitive Detailed Prompting

Every time I start a new task in Claude Code / Cursor, I type a detailed prompt to guide the AI through an internal monologue before proceeding. For example:

"You will generate an internal monologue of 200 numbered lines where two thinkers debate the approach:

Pragmatic focuses on functionality and efficiency
Creative on innovation and elegance
Follow these rules: exactly 200 lines, each starting with [Pragmatic] or [Creative]
Be specific about code without abstractions
Reflect and question without solutions
Mention files/functions/variables
Consider edge cases/performance/maintainability/user experience
Debate simplicity vs functionality
Question decisions, no repeats, end without conclusion
Then address the task: [actual task here]."

Typing this 20+ times a day wastes time and disrupts focus.

As someone researching practical AI applications, I figured this could be fixed.

// Before: 200+ word prompt every time
// After: "internal monologue 200 lines - implement auth system"

Enter MCPs: The Missing Link

The Model Context Protocol (MCP) lets you extend AI agents with custom tools. Common examples include fetching data, browsing the web, or integrating with Slack; I used it in a less typical way, to automate my repetitive prompt.

From Repetition to Automation

I built an MCP server in my Remix app (essentially the same as plain Node.js) that generates these monologues on demand. Now, Claude detects the trigger and handles it automatically.

Here's a glimpse of what it generates:

1. [Pragmatic] We need to implement auth - start with basic JWT in middleware.js
2. [Creative] But what about OAuth? Users expect social login nowadays...
3. [Pragmatic] OAuth adds complexity - first nail down password flow, then extend
...

The difference:

Before: Type the full detailed prompt each time, then describe the task.
After: Simply say "internal monologue 200 lines about X - [task]", and Claude generates the monologue via the tool, then proceeds.

Time saved: ~2 minutes per task

Characters typed: 300+ → 40

Building Your Own Monologue MCP

Here's how to implement it in a Node.js server (adaptable from my Remix example).

Step 1: Install Dependencies

npm install @modelcontextprotocol/sdk zod @anthropic-ai/sdk

Step 2: Create the MCP Server Handler

Create app/lib/mcp-server.ts:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { z } from "zod";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export function createMCPServer() {
  const server = new Server(
    {
      name: "monologue-mcp",
      version: "1.0.0",
    },
    {
      capabilities: {
        tools: {},
      },
    }
  );

  // Define the monologue tool (handlers are registered by request schema, not by method string)
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
      {
        name: "generate-monologue",
        description:
          "Generate a reflective internal monologue in the style of Pragmatic vs Creative thinker",
        inputSchema: {
          type: "object",
          properties: {
            lines: {
              type: "number",
              description: "Number of lines in the monologue (default: 100)",
              default: 100,
            },
            context: {
              type: "string",
              description: "Current conversation context",
            },
            task: {
              type: "string",
              description: "Description of the task to perform",
            },
          },
          required: ["task"],
        },
      },
    ],
  }));

  // The actual tool implementation
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === "generate-monologue") {
      const ArgsSchema = z.object({
        lines: z.number().int().min(1).max(500).default(100),
        context: z.string().max(2000).optional().default(""),
        task: z.string().min(1).max(1000),
      });

      const { lines, context, task } = ArgsSchema.parse(
        request.params.arguments
      );

      try {
        const systemPrompt = `You are two thinkers having an internal dialogue about programming.
Pragmatic is focused on functionality and efficiency.
Creative is obsessive about innovation and elegance.

STRICT RULES:
1. Generate EXACTLY ${lines} numbered lines
2. Each line must start with [Pragmatic] or [Creative]
3. NO abstractions - be specific about the code
4. NO complete solutions - REFLECT and QUESTION
5. Mention specific files, functions, variables when relevant
6. Think about: edge cases, performance, maintainability, user experience
7. Debate simplicity vs functionality
8. Question every technical decision
9. NO repeated ideas - each line must add new value
10. End without a definitive conclusion - it's reflection, not decision`;

        const userPrompt = `${
          context ? `Previous context:\n${context}\n\n` : ""
        }Current task: ${task}

Generate an internal monologue of EXACTLY ${lines} numbered lines where the two thinkers debate the best way to approach this task.`;

        const response = await anthropic.messages.create({
          model: "claude-opus-4-20250514",
          max_tokens: 32000,
          temperature: 1,
          system: systemPrompt,
          messages: [
            {
              role: "user",
              content: userPrompt,
            },
          ],
        });

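        // Content blocks are a union type; narrow to a text block before reading it
        const firstBlock = response.content[0];
        const monologue = firstBlock?.type === "text" ? firstBlock.text : "";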

        return {
          content: [
            {
              type: "text",
              text: monologue,
            },
          ],
        };
      } catch (error: any) {
        return {
          content: [
            {
              type: "text",
              text: `Error generating monologue: ${error.message}`,
            },
          ],
          isError: true,
        };
      }
    }

    throw new Error(`Unknown tool: ${request.params.name}`);
  });

  return server;
}
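The system prompt asks for EXACTLY N numbered lines, but nothing in the handler confirms the model complied. As a hedged sketch, a small post-check could flag deviations before the monologue is returned. checkMonologue is a hypothetical helper and not part of the server above.

```typescript
// Hypothetical post-check: verifies line count, numbering, and speaker tags.
function checkMonologue(text: string, expectedLines: number): string[] {
  const problems: string[] = [];
  const lines = text
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);

  if (lines.length !== expectedLines) {
    problems.push(`expected ${expectedLines} lines, got ${lines.length}`);
  }

  lines.forEach((line, index) => {
    // Each line should look like "12. [Pragmatic] ..." or "12. [Creative] ..."
    const match = /^(\d+)\.\s*\[(Pragmatic|Creative)\]\s+\S/.exec(line);
    if (!match) {
      problems.push(`line ${index + 1} is not in "N. [Speaker] text" form`);
    } else if (Number(match[1]) !== index + 1) {
      problems.push(`line ${index + 1} is numbered ${match[1]}`);
    }
  });

  return problems;
}

const sampleMonologue =
  "1. [Pragmatic] Start with basic JWT in middleware.js\n" +
  "2. [Creative] But what about OAuth?";
console.log(checkMonologue(sampleMonologue, 2));
```

Whether to retry, truncate, or just pass the problems along to the client is a design choice the sketch leaves open.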

Step 3: Create the API Route

Create app/routes/api.mcp.ts:

The MCP server needs to be exposed as an HTTP endpoint. We use Bearer authentication to secure it. Only Claude (or other authorized clients) with the correct API key can access your server. This prevents random people from using your tools.

import type { LoaderFunctionArgs } from "@remix-run/node";
import { createMCPServer } from "~/lib/mcp-server";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

// SSE (Server Sent Events) keeps an open connection between Claude and your server
// This allows Claude to call your tools in real time without polling

// Simple auth check: fail closed if MCP_API_KEY is not configured,
// instead of falling back to a well-known default key
function verifyAuth(request: Request): boolean {
  const authHeader = request.headers.get("Authorization");
  const expectedKey = process.env.MCP_API_KEY;
  if (!expectedKey) return false;
  return authHeader === `Bearer ${expectedKey}`;
}

export async function loader({ request }: LoaderFunctionArgs) {
  if (!verifyAuth(request)) {
    return new Response("Unauthorized", { status: 401 });
  }

  const responseHeaders = new Headers({
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
  });

  const server = createMCPServer();

  const transport = new SSEServerTransport({
    endpoint: "/api/mcp",
    requestHeaders: Object.fromEntries(request.headers.entries()),
    responseHeaders: Object.fromEntries(responseHeaders.entries()),
  });

  const stream = new ReadableStream({
    async start(controller) {
      try {
        await server.connect(transport);

        // Keep connection alive (SSE connections timeout after 30 seconds of silence)
        const keepAlive = setInterval(() => {
          controller.enqueue(new TextEncoder().encode(": keepalive\n\n"));
        }, 30000);

        request.signal.addEventListener("abort", () => {
          clearInterval(keepAlive);
          controller.close();
        });
      } catch (error) {
        controller.error(error);
      }
    },
  });

  return new Response(stream, {
    headers: responseHeaders,
  });
}
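The bearer check in this route compares strings directly. As an optional hardening step, a constant-time comparison avoids leaking how much of the key matched. The sketch below assumes Node's built-in crypto module; safeEqual and verifyBearer are hypothetical names, not part of the route above.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hashing first gives both buffers the same length, which timingSafeEqual requires
function safeEqual(a: string, b: string): boolean {
  const digestA = createHash("sha256").update(a).digest();
  const digestB = createHash("sha256").update(b).digest();
  return timingSafeEqual(digestA, digestB);
}

// Hypothetical variant of verifyAuth that takes the header and key directly
function verifyBearer(authHeader: string | null, expectedKey: string | undefined): boolean {
  // Fail closed: no configured key or no header means no access
  if (!expectedKey || !authHeader) return false;
  return safeEqual(authHeader, `Bearer ${expectedKey}`);
}

console.log(verifyBearer("Bearer abc", "abc"));
```

In the loader this would be called as verifyBearer(request.headers.get("Authorization"), process.env.MCP_API_KEY).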

Step 4: Configure Environment Variables

Add to your .env:

ANTHROPIC_API_KEY=your-anthropic-api-key
MCP_API_KEY=a-secret-key-for-your-mcp

The ANTHROPIC_API_KEY lets your server call Claude's API to generate monologues. The MCP_API_KEY is your own secret: it's what Claude will use to authenticate with your server.

Step 5: Deploy and Connect

Deploy your changes (I use Vercel, but any platform works):

git add .
git commit -m "Add MCP server for internal monologues"
git push

Then connect from Claude:

claude mcp add --transport sse monologue https://yourdomain.com/api/mcp --header "Authorization: Bearer your-secret-key"

The sse transport tells Claude to use Server Sent Events (the streaming connection type we set up). Replace your-secret-key with the same MCP_API_KEY from your .env file.

How It Works in Practice

Now, when working in Claude:

> internal monologue 150 lines - design user experience for login flow

Claude detects the phrase, calls the MCP tool, generates the detailed monologue (e.g., a debate on intuitive interfaces vs secure processes, navigation logic, etc.), and uses it to design the feature thoughtfully.

A sample monologue excerpt:

1. [Creative] Login flow should be innovative and seamless – perhaps biometric integration for delight?
2. [Pragmatic] Biometrics add complexity; focus on reliable password handling in auth.js first.
3. [Creative] But user experience suffers with forms – question if we can animate transitions smoothly.
4. [Pragmatic] Animations might impact performance on mobile; consider edge cases in responsive design.
...

Why This Matters

This MCP setup saves time by making the planning step consistent and reusable. It also doubles as an experiment: an atypical use of MCP that helped me explore the protocol more creatively and deeply.

What's Next?

One could build other creative tools, such as one that fetches and analyzes server logs directly, or another that integrates with external APIs for real time data checks.

Your Turn

What repetitive tasks do you deal with in your daily work? Maybe you can create an MCP. The code is ready to adapt and build something.

Questions? Leave a comment below and I'll be happy to help!

A2A - Understanding the Basics and Building Multi-Agent Flight Management System

Eze Quiroga — Wed, 09 Jul 2025 03:33:15 +0000

🌟 Introduction

In the previous article, MCP - Understanding the Basics and Building a Research Paper Management Chatbot, I pointed out the growing need for a standard way to enable communication between agents and give them richer context for handling complex tasks through natural language. Now it's time to explore how agents, or even complete agentic systems, can communicate with each other in a standard way.

That's where Google's A2A (Agent-to-Agent) protocol comes in.
Announced by Google on April 9, 2025, this emerging protocol standardizes how AI agents communicate with each other, enabling them to share context, delegate tasks, and collaborate on complex objectives that require multiple specialized capabilities.

In this post, we'll walk through building a command-line multi-agent system using the A2A protocol. We'll learn how to:

Create A2A agents with their cards and skills
Configure how agents will return information
Use a centralized LangChain ReAct agent to call A2A agents

By the end, our chatbot will be able to:

(employee_flight_request_agent) Know the status of corporate flight orders (pending purchase, purchased, and associated with a specific person)
(airport_knowledge_base_agent) Obtain information about airports and cities
(flight_search_agent) Search for real flight information departing from a specific airport
Recommend airports for flights pending acquisition

Here's how we'll break it down:

Context
Local environment setup
What is A2A?
Core components
Communicating agents
Building A2A Agents
Our chatbot
Running our chatbot
Key features
Final thoughts
Resources

Let's get started! 🚀

Important note: Only the most relevant function signatures and docstrings are shown in this post. You can find the full implementation in ezequiroga/a2a-bases.

🤝 Context

The main objective of the project is to recommend departure airports for corporate flights. For this, we will create a chatbot and three A2A agents:

employee_flight_request_agent: Manages employee flight requests and booking status using an internal database. It returns results immediately: receives requests → processes → returns results.
airport_knowledge_base_agent: Acts as a knowledge database that provides airport information and city-airport mappings. Since the main purpose of this article is to explore A2A, this agent uses fuzzy matching to retrieve information. It uses streaming to return its results.
flight_search_agent: Performs real-time flight search using external aviation data from the Aviation Stack API. This agent uses a ReAct Agent from LangChain to create filters for the tool that interacts with the Aviation Stack API. It responds to requests by sending push notifications.

These three agents will be called through the chatbot, which uses a ReAct Agent from LangChain to interact with the user and decide which agent should be called.

Each agent uses a different communication method with our chatbot, so our chatbot needs to adapt to each of them.

🛠️ Local environment

🐍 Python 3.13.5

Install the required packages from requirements.txt using the uv add -r requirements.txt command or pip install -r requirements.txt.

IMPORTANT NOTE: The A2A protocol library must be installed using UV to avoid installation errors. This is the recommended approach according to the official A2A documentation. Using pip may result in dependency conflicts or incomplete installations.

Pro tip: Use Python virtual environments for cleaner dependency management.

🤔 What Is Agent2Agent (A2A)?

Let's explore what the A2A protocol is and how it enables seamless agent-to-agent communication. For more details, check out the Resources section at the end.

The A2A protocol was created by Google with the goal of standardizing and simplifying both communication and interoperability between AI Agents or even complete Agentic Systems.

As the official documentation states, A2A's key goals are:

Interoperability: Bridge the communication gap between disparate agentic systems.
Collaboration: Enable agents to delegate tasks, exchange context, and work together on complex user requests.
Discovery: Allow agents to dynamically find and understand the capabilities of other agents.
Flexibility: Support various interaction modes including synchronous request/response, streaming for real-time updates, and asynchronous push notifications for long-running tasks.
Security: Facilitate secure communication patterns suitable for enterprise environments, relying on standard web security practices.
Asynchronicity: Natively support long-running tasks and interactions that may involve human-in-the-loop scenarios.

The communication is based on HTTP(S) as the transport protocol and defines that each server exposes its services through a URL included in its AgentCard. All data exchange is based on JSON-RPC 2.0, ensuring that requests and responses follow a consistent and standard format, always with Content-Type: application/json.

And, for real-time updates, A2A supports streaming using Server-Sent Events (SSE). In these cases, the server returns continuous events with embedded JSON-RPC responses, allowing agents to maintain open communication flows for long-duration messages or tasks.
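To illustrate the streaming format independently of the Python SDK, here is a minimal sketch that turns a raw SSE body into the JSON-RPC responses embedded in it. parseSseEvents is a hypothetical helper and the payloads are made up; the official SDK handles this parsing for you.

```typescript
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: unknown;
  error?: { code: number; message: string };
}

// Hypothetical helper: each SSE event is a block of lines ended by a blank line;
// the JSON-RPC response lives in the event's "data:" line(s).
function parseSseEvents(raw: string): JsonRpcResponse[] {
  return raw
    .split(/\r?\n\r?\n/)
    .map((block) =>
      block
        .split(/\r?\n/)
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n")
    )
    .filter((data) => data.length > 0) // skips comment-only events such as keepalives
    .map((data) => JSON.parse(data) as JsonRpcResponse);
}

const rawStream =
  'data: {"jsonrpc":"2.0","id":1,"result":{"step":1}}\n\n' +
  ": keepalive\n\n" +
  'data: {"jsonrpc":"2.0","id":1,"result":{"step":2}}\n\n';
console.log(parseSseEvents(rawStream).length);
```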

Official SDK

The official SDK allows us to abstract away from writing JSON code by using classes and methods that facilitate communication. The recommended way to install the SDK is using UV by running the following:

uv add a2a-sdk

Important note: You need to initialize the uv project since our A2A servers will run using uv

🧩 Core Components

A2A communication is built around several key components that define the message structure required for proper agent interaction.

A2A Client: An application or agent that initiates requests to an A2A Server on behalf of a user or another system.
A2A Server (Remote Agent): An agent or agentic system that exposes an A2A-compliant HTTP endpoint, processing tasks and providing responses.
Agent Card: A JSON metadata document published by an A2A Server, describing its identity, capabilities, skills, service endpoint, and authentication requirements.
Task: The fundamental unit of work managed by A2A, identified by a unique ID. Tasks are stateful and progress through a defined lifecycle.
Message: A communication turn between a client and a remote agent, having a role ("user" or "agent") and containing one or more Parts.
Part: The smallest unit of content within a Message or Artifact (e.g., TextPart, FilePart, DataPart).
Artifact: An output (e.g., a document, image, structured data) generated by the agent as a result of a task, composed of Parts.

Important Note: The protocol is based on JSON-RPC 2.0, which means all messages are sent in JSON format. To simplify development, we will use the official SDK.

📡 Communicating Agents

A2A specifies three different communication patterns for A2A Servers to interact with A2A Clients:

Standard HTTP(S) Communication: The client sends a request and the server sends a response, completing the standard HTTP(S) protocol cycle
Streaming (SSE): Real-time, incremental updates for tasks (status changes, artifact chunks) delivered via Server-Sent Events
Push Notifications: Asynchronous task updates delivered via server-initiated HTTP POST requests to a client-provided webhook URL, for long-running or disconnected scenarios

You can see a communication sequence diagram in A2A Request Lifecycle and dive deeper into communication methods in Streaming & Asynchronous Operations in A2A.

The Agent's communication method is defined in its Agent Card. Our system has three A2A agents, each using a different communication approach. Let's go ahead and start creating our agents.

🏗️ Building A2A Agents

The first step in creating an A2A agent is defining its Agent Card. This card is essential as it describes the server's identity, capabilities, skills, service endpoint URL, and authentication requirements. Clients use the information in the Agent Card to understand how to interact with the agent.

Agent Card

As described previously, the Agent Card is a JSON metadata document published by the A2A Server, describing its identity, capabilities, skills, service endpoint, authentication and how clients should interact with it.

The recommended location for the Agent Card, following the well-known URI strategy, is http(s)://{server_domain}/.well-known/agent.json. Using the official SDK, the Agent Card will be available at that path automatically. Below, you can see the Agent Cards for each of our A2A Agents.

Agent Card: Employee Flight Request -> This card defines that our agent will respond to every request immediately. It also specifies that the agent has three skills: list_pending_requests_skill, list_booked_requests_skill and check_employee_request_skill. The protocol does not specify how the agent knows which skill should be performed upon a request - it's the agent's responsibility to determine which skill to execute.

public_agent_card = AgentCard(
    name='Employee Flight Request Management Agent',
    description='Agent for managing and checking employee flight requests and bookings',
    url='http://localhost:9992/',
    version='1.0.0',
    defaultInputModes=['text'],
    defaultOutputModes=['text'],
    capabilities=AgentCapabilities(streaming=False),
    skills=[
        list_pending_requests_skill,
        list_booked_requests_skill,
        check_employee_request_skill
    ],
    supportsAuthenticatedExtendedCard=False,
)

Agent Card: Airport Knowledge Base -> Here, the card specifies that the agent will stream the response to the client using streaming=True.

public_agent_card = AgentCard(
    name='Airport Knowledge Base Agent',
    description='Knowledge base agent for retrieving correct airport names and city-airport mappings',
    url='http://localhost:9991/',
    version='1.0.0',
    defaultInputModes=['text'],
    defaultOutputModes=['text'],
    capabilities=AgentCapabilities(streaming=True),
    skills=[airport_knowledge_skill],
)

Agent Card: Flight Search -> This agent, as its card declares with pushNotifications=True, will send push notifications to the clients.

public_agent_card = AgentCard(
    name='Flight Search Agent',
    description='Real-time flight search agent with push notification capabilities for aviation data',
    url='http://localhost:9993/',
    version='1.0.0',
    defaultInputModes=['text'],
    defaultOutputModes=['text'],
    capabilities=AgentCapabilities(streaming=True, pushNotifications=True),
    skills=[flight_search_skill],
    supportsAuthenticatedExtendedCard=False,
)

NOTE: Since the main purpose of this article and the project that implements the code explained here is to demonstrate the A2A Protocol, our chatbot knows beforehand how each agent will send the responses. However, in a real scenario, the client may need to implement a way to handle communications based on the agent cards.

Agent Skills

An Agent Skill is a specific capability, function, or area of expertise the agent can perform or address. An agent can define more than one skill - as our Employee Flight Request Management Agent does - in its card. Nevertheless, the protocol says nothing about how the agent knows which skill the user is trying to execute. Thus, it is the responsibility of the agent to determine which skill to perform based on the client's message.

Below is the definition of one skill used by the Employee Flight Request Management Agent - the other skills are defined in a similar fashion.

list_pending_requests_skill = AgentSkill(
    id='list_pending_requests',
    name='List Pending Flight Requests',
    description='List all employee flight requests that are not yet booked',
    tags=['flight', 'requests', 'pending', 'left', 'available', 'not booked', 'employee'],
    examples=[
        'list pending flight requests',
        'show pending requests',
        'which flights are not booked',
        'display remaining requests'
    ],
)
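Because the protocol leaves skill selection to the agent, each agent needs some routing logic of its own. The sketch below shows one naive option, keyword overlap against a skill's tags and examples, written in TypeScript independently of the Python SDK. pickSkill and tokenize are hypothetical names, and the booked skill's tags and example are invented for illustration. A real agent would more likely use an LLM or a classifier, since plain overlap is easily fooled (the "not booked" tag also matches the word "booked").

```typescript
interface SkillInfo {
  id: string;
  tags: string[];
  examples: string[];
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
}

// Hypothetical router: returns the id of the skill sharing the most words
// with the query, or null when nothing overlaps. Ties go to the earlier skill.
function pickSkill(query: string, skills: SkillInfo[]): string | null {
  const queryWords = tokenize(query);
  let bestId: string | null = null;
  let bestScore = 0;
  for (const skill of skills) {
    const vocabulary = tokenize(skill.tags.concat(skill.examples).join(" "));
    // Score = number of query words that also appear in the skill's tags or examples
    const score = queryWords.filter((w) => vocabulary.indexOf(w) !== -1).length;
    if (score > bestScore) {
      bestScore = score;
      bestId = skill.id;
    }
  }
  return bestId;
}

const knownSkills: SkillInfo[] = [
  {
    id: "list_pending_requests",
    tags: ["flight", "requests", "pending", "left", "available", "not booked", "employee"],
    examples: ["list pending flight requests", "show pending requests", "which flights are not booked", "display remaining requests"],
  },
  {
    // Tags and example below are invented for this sketch
    id: "list_booked_requests",
    tags: ["flight", "requests", "booked", "purchased", "employee"],
    examples: ["list booked flight requests"],
  },
];
console.log(pickSkill("which flights are not booked", knownSkills));
```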

Agent Executor

The Agent Executor is the central component that handles the processing logic of A2A agents and is responsible for processing incoming requests and generating corresponding responses. The SDK provides an abstract base class a2a.server.agent_execution.AgentExecutor that we must implement to create our agent. This class defines two main methods:

async def execute(self, context: RequestContext, event_queue: EventQueue): Handles incoming requests that expect a response or a stream of events.
async def cancel(self, context: RequestContext, event_queue: EventQueue): Handles requests to cancel an ongoing task.

The RequestContext provides information about the incoming request, and the EventQueue is used to send events back to the client.

This is the Agent Executor implementation for our Employee Flight Request Management Agent:

class EmployeeFlightRequestAgentExecutor(AgentExecutor):
    """Employee flight request management agent executor."""

    def __init__(self):
        self.agent = EmployeeFlightRequestAgent()

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        query = context.get_user_input()

        response = await self.agent.invoke(query)

        await event_queue.enqueue_event(new_agent_text_message(response))

    async def cancel(
        self, context: RequestContext, event_queue: EventQueue
    ) -> None:
        await event_queue.enqueue_event(new_agent_text_message("❌ Flight request operation cancelled"))

This line response = await self.agent.invoke(query) calls and executes the actual logic of our agent, querying the mocked database and returning the data.

Notice the line await event_queue.enqueue_event(new_agent_text_message(response)). This is really important because it's how the SDK lets the server respond to clients. Calling event_queue.enqueue_event is how messages are returned, even when streaming is False in the Agent Card.

Creating and sending messages

In this section we will explore how to create messages and send them to the clients. In the section Our chatbot we will describe how it handles each kind of communication.

The simplest way to create a message to send to an A2A Client is the function a2a.utils.new_agent_text_message(text: str, context_id: str | None = None, task_id: str | None = None) -> Message. This function returns the following object:

return Message(
    role=Role.agent,
    parts=[Part(root=TextPart(text=text))],
    messageId=str(uuid.uuid4()),
    taskId=task_id,
    contextId=context_id,
)

Our Employee Flight Request Management Agent uses this function to create the messages it sends back to the client. The created message is sent using the event_queue.enqueue_event method. See the code below.

class EmployeeFlightRequestAgentExecutor(AgentExecutor):

    ...

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        query = context.get_user_input()

        response = await self.agent.invoke(query)

        await event_queue.enqueue_event(new_agent_text_message(response))

Our next agent, Airport Knowledge Base, streams messages to the clients. To achieve this, we need to use another class provided by the SDK: a2a.server.tasks.TaskUpdater. This class allows agents to publish updates to a task's event queue. Based on this, the messages to stream must contain the task.id and task.contextId.

This is the Agent Executor for that agent:

class AirportKnowledgeBaseAgentExecutor(AgentExecutor):
    """Airport knowledge base agent executor."""

    def __init__(self):
        self.agent = AirportKnowledgeBaseAgent()

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        query = context.get_user_input()
        task = context.current_task
        if not task:
            task = new_task(context.message)

        updater = TaskUpdater(event_queue, task.id, task.contextId)
        await self.agent.invoke(task, updater, query)

Note the creation of the TaskUpdater instance: it takes the event_queue, the task id and the context id from the task. Then, within the method self.agent.invoke(...) we use the updater object to stream messages as follows:

async def invoke(self, task: Task, updater: TaskUpdater, query: str | None = None) -> None:
    """
    Retrieve airport information using fuzzy matching by name and municipality.

    Args:
        task: Task being processed; its id and contextId tag every message
        updater: TaskUpdater used to stream status updates to the client
        query: Search string (city or airport name)

    Returns:
        None. The top 5 airport names and top 5 cities with their airports
        are sent to the client through the updater.
    """
    if self.airport_knowledge.empty:
        await updater.update_status(
            TaskState.failed,
            new_agent_text_message(
                "Airport knowledge base not loaded. Please check the database files.",
                task.contextId,
                task.id,
            ),
            final=True
        )
        return

    await updater.update_status(
        TaskState.working,
        new_agent_text_message(
            "📚 Accessing airport knowledge base...",
            task.contextId,
            task.id,
        ),
    )

    ...

    part = TextPart(text=result_lines)
    message = Message(
        role=Role.agent,
        parts=[part],
        messageId=str(uuid.uuid4()),
    )
    await updater.complete(message=message)

This code also introduces some useful classes from the SDK (package a2a.types):

TaskState: enum representing the possible states of a Task
Role: enum representing a message sender's role
TextPart: represents a text segment within parts
Message: class that represents a single message exchanged between user and agent.
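
The streaming lifecycle above can be sketched with the standard library alone. This is an illustration under assumptions, not the SDK's implementation: StatusUpdate and SketchUpdater are hypothetical stand-ins for the status events and for TaskUpdater, and states are plain strings instead of TaskState members. The point is that every update carries the task id and context id, and that the reader stops at the update marked as final.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """Hypothetical stand-in for a task status event."""
    task_id: str
    context_id: str
    state: str   # "working", "completed" or "failed"
    text: str
    final: bool


class SketchUpdater:
    """Stand-in for TaskUpdater: stamps every update with the task and context ids."""

    def __init__(self, queue: asyncio.Queue, task_id: str, context_id: str):
        self.queue = queue
        self.task_id = task_id
        self.context_id = context_id

    async def update_status(self, state: str, text: str, final: bool = False) -> None:
        await self.queue.put(
            StatusUpdate(self.task_id, self.context_id, state, text, final)
        )

    async def complete(self, text: str) -> None:
        await self.update_status("completed", text, final=True)


async def stream_lookup(query: str) -> list[StatusUpdate]:
    queue: asyncio.Queue = asyncio.Queue()
    updater = SketchUpdater(queue, task_id="task-1", context_id="ctx-1")
    await updater.update_status("working", "Accessing airport knowledge base...")
    await updater.complete(f"Results for {query}")

    received = []
    while True:  # the reader keeps consuming until it sees the final update
        update = await queue.get()
        received.append(update)
        if update.final:
            break
    return received


def run_stream_demo() -> list[StatusUpdate]:
    return asyncio.run(stream_lookup("Madrid"))
```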

So far we've seen how agents respond in a single timeline: from receiving the request until sending their response, either through a single message or by streaming multiple messages until completing the cycle.

But if a task could take a long time to finish, it's not a good idea to make the client wait until the end while keeping a connection alive. For these cases, we can use push notifications. The last agent we will create does exactly this.

As a requirement imposed by the protocol, a client that wants to receive push notifications should explicitly specify the endpoint enabled for that purpose. We will see how to do that in the section Our chatbot. Here, we describe how the push notification process works.

In the section Building A2A Servers we will see that we need to use a request_handler for creating an A2A Server. The SDK provides us with the following implementation: a2a.server.request_handlers.DefaultRequestHandler. At the time this article was written (Jul 2025), that handler does not properly manage push notifications. Therefore, you can extend that class and override the methods within it. The class CustomRequestHandler does exactly that by overriding only one method: on_message_send_stream.

These are the most relevant parts of the new implementation:

class CustomRequestHandler(DefaultRequestHandler):
    """Custom request handler that extends DefaultRequestHandler.

    This handler maintains all default functionality while providing
    custom implementation for the streaming message send method.
    """

    async def on_message_send_stream(
        self,
        params: MessageSendParams,
        context: ServerCallContext | None = None,
    ) -> AsyncGenerator[Event]:
        """Custom handler for 'message/stream' (streaming).

        Starts the agent execution and yields events as they are produced
        by the agent.
        """
        task_manager = TaskManager(
            task_id=params.message.taskId,
            context_id=params.message.contextId,
            task_store=self.task_store,
            initial_message=params.message,
        )
        # Start new code #
        task = Task(
            id=params.message.taskId,
            contextId=params.message.contextId,
            status=TaskStatus(
                state=TaskState.submitted,
            )
        )
        task = await task_manager.save_task_event(task)
        task: Task | None = await task_manager.get_task()
        # End new code #

        if task:
            task = task_manager.update_with_message(params.message, task)

            if self.should_add_push_info(params):
                assert isinstance(self._push_notifier, PushNotifier)
                assert isinstance(
                    params.configuration, MessageSendConfiguration
                )
                assert isinstance(
                    params.configuration.pushNotificationConfig,
                    PushNotificationConfig,
                )
                await self._push_notifier.set_info(
                    task.id, params.configuration.pushNotificationConfig
                )
        else:
            queue = EventQueue()

        ...

        try:
            ...
            async for event in result_aggregator.consume_and_emit(consumer):
                ...
                if self._push_notifier and task_id:
                    latest_task = await result_aggregator.current_result
                    if isinstance(latest_task, Task):
                        await self._push_notifier.send_notification(latest_task)
                yield event
        except Exception as e:
            print(f"❌ {e}")
        finally:
            await self._cleanup_producer(producer_task, task_id)

Moreover, to enable the ability to send push notifications in our agent, we need to add push_notifier=InMemoryPushNotifier(httpx_client=httpx.AsyncClient()) when creating the request_handler for building the A2A Server. This is shown again in the section Running A2A servers.

That's all. If pushNotifications=True in the Agent Card, the PushNotifier is set in the request_handler and the client provides the push notification endpoint, the SDK will automatically send messages to the provided endpoint each time a Task changes its status. It's important to mention that an instance of the class Task is being pushed. This is relevant because the endpoint our chatbot exposes for listening to notifications will receive that object as JSON and should be able to parse it.

Because of this, the Agent Executor of our agent simply sends messages using the updater as follows:

class FlightSearchAgentExecutor(AgentExecutor):
    """Flight search agent executor with ReAct capabilities."""

    def __init__(self):
        self.agent = FlightSearchAgent()

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        """Execute flight search request."""
        query = context.get_user_input()

        updater = TaskUpdater(event_queue, context.current_task.id, context.current_task.contextId)
        await updater.update_status(
            TaskState.submitted,
            new_agent_text_message(
                "🤖 Flight Search Agent activated...",
                context.current_task.contextId,
                context.current_task.id
            )
        )

        message = await self.agent.invoke(context.current_task.id, context.current_task.contextId, query)

        await updater.update_status(TaskState.working, message)
        await updater.update_status(TaskState.completed)

The method self.agent.invoke returns the following object:

push_notification_payload = {
    "flights": final_response,
    "source": "flight_search_agent",
    "metadata": {"task_id": task_id, "context_id": context_id}
}

part = TextPart(text=json.dumps(push_notification_payload))
message = Message(
    role=Role.agent,
    parts=[part],
    messageId=str(uuid.uuid4()),
    taskId=task_id,
    contextId=context_id,
)
return message
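
On the receiving side, the pushed Task arrives as JSON and the flight data is itself a JSON document nested inside a text part, so it has to be decoded twice. The sketch below shows one way to do that with the standard library. It assumes the serialized Task exposes status.message.parts[0].text, which is the same shape the streaming chunks use later in this article; extract_payload and make_notification are hypothetical helpers, and the sample data is hand-built rather than captured from a real agent.

```python
import json


def extract_payload(task_json: str):
    """Return the decoded flight payload, or None if this notification has none."""
    task = json.loads(task_json)
    message = (task.get("status") or {}).get("message") or {}
    parts = message.get("parts") or []
    if not parts or "text" not in parts[0]:
        return None  # e.g. a status change that carries no message
    try:
        payload = json.loads(parts[0]["text"])  # second decode: the nested JSON
    except json.JSONDecodeError:
        return None  # plain-text update, such as the activation message
    return payload if isinstance(payload, dict) else None


def make_notification(state: str, text=None) -> str:
    """Hand-built sample notification (assumed shape, not real agent output)."""
    status = {"state": state}
    if text is not None:
        status["message"] = {"parts": [{"text": text}]}
    return json.dumps({"id": "t-1", "contextId": "c-1", "status": status})


sample_payload = {
    "flights": "sample flight data",
    "source": "flight_search_agent",
    "metadata": {"task_id": "t-1", "context_id": "c-1"},
}
```

Since the executor above sends the payload with the working state and then completes without a message, a receiver along these lines would ignore the submitted and completed notifications and pick up the data from the one in between.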

Building A2A Servers

At this point we have already achieved:

1️⃣ Creating agent cards specifying their endpoint (url: where the A2A service can be reached), skills (skills) and capabilities (capabilities: communication methods) among other properties

2️⃣ Adding agent skills to the agent cards

3️⃣ Implementing the agent executor for each agent

4️⃣ Composing and sending messages depending on the agent's communication method

Now, it's time to build the A2A servers. For this purpose, the SDK provides us with the class a2a.server.apps.A2AStarletteApplication.

To create an A2A Server, it's mandatory to use an AgentCard, a RequestHandler (to route incoming A2A RPC calls to the appropriate methods on your executor), an AgentExecutor (to execute the core logic of how agents process requests and generate responses) and a TaskStore (to manage the lifecycle of tasks).

As can be seen in the code below, the SDK provides useful implementations of these classes.

request_handler = DefaultRequestHandler(
    agent_executor=EmployeeFlightRequestAgentExecutor(),
    task_store=InMemoryTaskStore(),
)

server = A2AStarletteApplication(
    agent_card=public_agent_card,
    http_handler=request_handler,
)

uvicorn.run(server.build(), host='0.0.0.0', port=9992)

Note: Starlette and uvicorn are beyond the scope of this article. If you need more information, you can read about them in the Starlette and uvicorn documentation respectively.

Running A2A servers

The recommended directory structure for the server is as follows:

├── name_of_the_agent/
│   ├── __main__.py -> contains AgentSkill, AgentCard, CustomRequestHandler, A2AStarletteApplication and the line to start the uvicorn server
│   ├── agent_executor.py -> the Agent Executor implementation with the actual logic for processing incoming requests and generating responses
│   ├── ...
│   └── subdirectories/
│       └── ...

In the previous sections we already showed how to implement the Agent Executor. Below, you can see an example of the __main__.py file:

# ...IMPORTS...

if __name__ == '__main__':

    flight_search_skill = AgentSkill(
        ...
    )

    public_agent_card = AgentCard(
        ...
    )

    request_handler = CustomRequestHandler(
        agent_executor=FlightSearchAgentExecutor(),
        task_store=InMemoryTaskStore(),
        push_notifier=InMemoryPushNotifier(httpx_client=httpx.AsyncClient()) # ONLY needed if the Agent will send push notifications
    )

    server = A2AStarletteApplication(
        agent_card=public_agent_card,
        http_handler=request_handler,
    )

    uvicorn.run(server.build(), host='0.0.0.0', port=9993)

With all of that in place, we can run our A2A Server with:

cd name_of_the_agent/
uv run . --host 0.0.0.0

Important Note: To run the server this way, you need to initialize a uv project within the agent's folder. You can find more information in UV Projects.

🤖 Our chatbot

The final piece of our AI system is our chatbot: the entry point for user interaction. It has five important components, which we'll analyze below, plus a main function with the logic that lets users enter prompts and displays the responses from our A2A Agents.

The *Tool(BaseTool) classes extend langchain.tools.BaseTool, since they are tools that our LangChain Agent can invoke. They all override the methods def _run(self, query: str) -> str and async def _arun(self, query: str) -> str with the actual logic of the tool.

Note: The chat_agent.py file needs to be refactored to move the classes declared in it to independent files.

A2AAgentRegistry

A mock registry that manages A2A agent information, including their capabilities, base URLs, and initialized clients for communication.

To obtain the information present in the Agent Cards, we use a2a.client.A2ACardResolver, which automatically calls the endpoint http(s)://{server_domain}/.well-known/agent.json. We then initiate communication with our A2A Agents using a2a.client.A2AClient.

class A2AAgentRegistry:
    """Mock registry for A2A agents. Later will be replaced with real discovery."""

    def __init__(self):
        self.agents = {
            "airport_knowledge_base": {
                ...
                "base_url": "http://localhost:9991",
                ...
            },
            "employee_flight_requests": {
                ...
                "base_url": "http://localhost:9992",
                ...
            },
            "flight_search": {
                ...
                "base_url": "http://localhost:9993",
                ...
            }
        }

    async def initialize_agents(self, httpx_client: httpx.AsyncClient):
        """Initialize A2A clients for all registered agents."""
        for _, agent_info in self.agents.items():
            try:
                resolver = A2ACardResolver(
                    httpx_client=httpx_client,
                    base_url=agent_info["base_url"]
                )

                try:
                    card: AgentCard = await resolver.get_agent_card()
                    client = A2AClient(httpx_client=httpx_client, agent_card=card)

                    agent_info["card"] = card
                    agent_info["client"] = client
                    print(f"✅ Initialized {agent_info['name']} at {agent_info['base_url']}")
                    print(f"   📝 Description: {agent_info['description']}")

                except Exception as e:
                    print(f"⚠️  Could not connect to {agent_info['name']} at {agent_info['base_url']}: {e}")
                    agent_info["card"] = None
                    agent_info["client"] = None

            except Exception as e:
                print(f"❌ Failed to initialize {agent_info['name']}: {e}")

    def get_agent(self, agent_id: str) -> Optional[Dict[str, Any]]:
        """Get agent info by ID."""
        return self.agents.get(agent_id)

    def list_available_agents(self) -> List[str]:
        """List all agents that are available (have active clients)."""
        return [
            agent_id for agent_id, info in self.agents.items() 
            if info["client"] is not None
        ]
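
As a small illustration of the discovery step, the sketch below builds the agent card URL that the resolver is described as calling. agent_card_url is a hypothetical helper written for this example, not part of the SDK; the path and the base URLs are the ones mentioned in this article.

```python
WELL_KNOWN_PATH = "/.well-known/agent.json"

# Base URLs taken from the mock registry in this article.
BASE_URLS = {
    "airport_knowledge_base": "http://localhost:9991",
    "employee_flight_requests": "http://localhost:9992",
    "flight_search": "http://localhost:9993",
}


def agent_card_url(base_url: str) -> str:
    """Build the agent card URL for a base URL, with or without a trailing slash."""
    return base_url.rstrip("/") + WELL_KNOWN_PATH


card_urls = {agent_id: agent_card_url(url) for agent_id, url in BASE_URLS.items()}
```

A plain GET against one of these URLs is a quick way to check that an agent is up before starting the chatbot.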

EmployeeFlightRequestTool

A LangChain tool that checks the status of employee flight requests and booking information by communicating with the employee flight request agent. Our A2A Agent Employee Flight Request Management Agent uses capabilities=AgentCapabilities(streaming=False), so the message is sent and the response is awaited.

class EmployeeFlightRequestTool(BaseTool):
    """
    ...
    """

    ...

    def __init__(self, agent_registry: A2AAgentRegistry):
        super().__init__(agent_registry=agent_registry)

    async def _arun(self, query: str) -> str:
        """Async implementation to call the employee flight request agent."""
        agent_info = self.agent_registry.get_agent("employee_flight_requests")

        if not agent_info or not agent_info["client"]:
            return "❌ Employee flight request agent is not available. Please check if the service is running."

        try:
            part = TextPart(text=query)
            message = Message(
                role=Role.user,
                parts=[part],
                messageId=str(uuid4()),
            )

            send_message_payload = MessageSendParams(message=message)

            request = SendMessageRequest(
                id=str(uuid4()), 
                params=send_message_payload,
            )

            print(f"\n📋 Checking flight requests for: {query}")
            client = agent_info["client"]
            return await client.send_message(request)

        except Exception as e:
            return f"❌ Error calling employee flight request agent: {str(e)}"
    ...

AirportKnowledgeTool

A LangChain tool that retrieves airport information from the knowledge base agent when users ask about airport names or airports in specific cities. Our A2A Agent Airport Knowledge Base Agent uses capabilities=AgentCapabilities(streaming=True), so this tool must receive a stream of messages.

class AirportKnowledgeTool(BaseTool):
    """
    ...
    """

    ...

    def __init__(self, agent_registry: A2AAgentRegistry):
        super().__init__(agent_registry=agent_registry)

    async def _arun(self, query: str) -> str:
        """Async implementation to call the airport knowledge base agent."""
        agent_info = self.agent_registry.get_agent("airport_knowledge_base")

        if not agent_info or not agent_info["client"]:
            return "❌ Airport knowledge base agent is not available. Please check if the service is running."

        try:
            part = TextPart(text=query)
            message = Message(
                role=Role.user,
                parts=[part],
                messageId=str(uuid4()),
            )

            streaming_request = SendStreamingMessageRequest(
                id=str(uuid4()), 
                params=MessageSendParams(message=message)
            )

            client = agent_info["client"]
            stream_response = client.send_message_streaming(streaming_request)

            full_response = ""
            print(f"\n📚 Looking up airport information for: {query}")

            async for chunk in stream_response:
                json_chunk = chunk.model_dump(mode='json', exclude_none=True)
                if json_chunk['result']['status']['state'] == TaskState.completed:
                    full_response = f"\n✅ Knowledge base lookup completed\n{json_chunk['result']['status']['message']['parts'][0]['text']}\n"
                    break
                else:
                    print(f"📨 {json_chunk['result']['status']['message']['parts'][0]['text']}\n")

            return full_response if full_response else "✅ Knowledge base lookup completed - check the streaming output above."

        except Exception as e:
            return f"❌ Error calling airport knowledge base agent: {str(e)}"
    ...
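
The loop in this tool indexes straight into ['status']['message']['parts'][0]['text'], which would raise if a chunk arrived without a message. Whether the agent ever sends such a chunk is an assumption on our side, but a more defensive reader is cheap to write. The sketch below uses a hypothetical fake_stream generator in place of client.send_message_streaming and plain dicts shaped like the chunks read above.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream() -> AsyncIterator[dict]:
    """Hypothetical stream yielding chunks shaped like the ones read above."""
    updates = [
        ("working", "Accessing knowledge base..."),
        ("working", "Matching airports..."),
        ("completed", "Madrid-Barajas"),
    ]
    for state, text in updates:
        yield {"result": {"status": {"state": state,
                                     "message": {"parts": [{"text": text}]}}}}
        await asyncio.sleep(0)  # give control back to the event loop


def chunk_text(chunk: dict) -> str:
    """Read the first text part defensively; return '' when it is missing."""
    message = chunk.get("result", {}).get("status", {}).get("message") or {}
    parts = message.get("parts") or []
    return parts[0].get("text", "") if parts else ""


async def collect(stream: AsyncIterator[dict]) -> tuple[list[str], str]:
    """Gather progress messages until the completed chunk, then stop reading."""
    progress, final = [], ""
    async for chunk in stream:
        state = chunk.get("result", {}).get("status", {}).get("state")
        if state == "completed":
            final = chunk_text(chunk)
            break
        progress.append(chunk_text(chunk))
    return progress, final
```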

FlightSearchTool

A LangChain tool that searches for scheduled flights through the flight search agent, which uses the Aviation Stack API. Results are handled via push notifications, since our A2A Agent Flight Search Agent uses capabilities=AgentCapabilities(streaming=True, pushNotifications=True).

It's important to note how the message is created when using a2a.types.SendStreamingMessageRequest since in this case we need to tell our A2A Agent which endpoint we make available to receive the push notifications.

class FlightSearchTool(BaseTool):
    """
    ...
    """

    ...
    flight_search_callback_url: str = f"http://localhost:{HTTP_SERVER_PORT}{FLIGHTS_ENDPOINT_PATH}" # TODO: DO NOT hardcode the callback URL

    def __init__(self, agent_registry: A2AAgentRegistry):
        super().__init__(agent_registry=agent_registry)

    async def _arun(self, query: str) -> str:
        """Async implementation to call the flight search agent."""
        agent_info = self.agent_registry.get_agent("flight_search")

        if not agent_info or not agent_info["client"]:
            return "❌ Flight search agent is not available. Please check if the service is running."

        async def async_search():
            """Execute flight search asynchronously."""
            try:
                client: A2AClient = agent_info["client"]

                part = TextPart(text=query)
                message = Message(
                    role=Role.user,
                    parts=[part],
                    messageId=str(uuid4()),
                    contextId=str(uuid4()),
                    taskId=str(uuid4())
                )

                request = SendStreamingMessageRequest(
                    id=str(uuid4()),
                    params=MessageSendParams(
                        message=message,
                        configuration=MessageSendConfiguration(
                            acceptedOutputModes=["text"],
                            pushNotificationConfig=PushNotificationConfig(
                                url=self.flight_search_callback_url
                            )
                        )
                    )
                )

                response = client.send_message_streaming(request=request)

                async for chunk in response:
                    pass # we don't care about the stream since messages will be received through the exposed endpoint

            except Exception as e:
                print(f"❌ Error in background flight search: {str(e)}")

        asyncio.create_task(async_search()) # here we decouple this tool's execution from the LangChain flow

        print(f"🛫 Flight search initiated in background for: {query}")

        return "✅ Flight search initiated - results will be sent via push notification once completed"
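
One detail worth knowing about this fire-and-forget pattern: the asyncio documentation notes that the event loop only keeps weak references to tasks, so a task created with asyncio.create_task and not stored anywhere can be garbage collected before it finishes. A common remedy, sketched below, is to keep the task in a set until it is done. The names spawn, slow_search and tool_call are hypothetical and only mimic the shape of the tool above.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()


def spawn(coro) -> asyncio.Task:
    """Start a coroutine in the background and keep it referenced until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)                         # strong reference
    task.add_done_callback(background_tasks.discard)   # dropped once done
    return task


async def slow_search(results: list[str]) -> None:
    await asyncio.sleep(0.01)  # stands in for the streaming call to the agent
    results.append("flights found")


async def tool_call() -> tuple[str, list[str], list[str]]:
    results: list[str] = []
    task = spawn(slow_search(results))
    reply = "search initiated"   # this is what the tool returns immediately
    before = list(results)       # still empty: the search has not finished
    await task                   # awaited only so this demo can observe the result
    return reply, before, results
```

In the tool itself there would be no await on the task; the reply goes back to LangChain right away and the result arrives later through the push notification endpoint.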

ReactChatAgent

A LangGraph 'ReAct' Agent that orchestrates interactions between the user and our A2A agents through specialized tools. The most notable features are:

Creates a FastAPI server that exposes an endpoint to receive push notifications
Uses langgraph.checkpoint.memory.MemorySaver so our agent has short-term memory
Defines the setup_http_endpoints(...) method to listen for and process push notifications sent by the A2A Agent
Uses the langgraph.prebuilt.create_react_agent function to create a Compiled Graph from LangChain that acts as the "brain" of our chat (ReAct Agents are deprecated, so this function creates a graph that calls tools in a loop until a stopping condition is met)
Defines the chat(self, user_input: str, thread_id: str = "default") -> str method that makes the call to the LLM

class ReactChatAgent:
    """LangGraph ReAct agent that can interact with A2A agents through tools and receive external messages via HTTP."""

    def __init__(self):
        self.agent_registry = A2AAgentRegistry()

        api_key = os.getenv("ANTHROPIC_API_KEY")
        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY environment variable must be set")

        self.model = ChatAnthropic(
            model="claude-3-5-sonnet-20241022",
            temperature=0,
            api_key=api_key
        )

        self.memory = MemorySaver()

        self.agent_graph = None

        self.external_message_queue = Queue()

        self.app = FastAPI(title="ReAct Chat Agent API", version="1.0.0")
        self.setup_http_endpoints()

    def setup_http_endpoints(self):
        """Setup HTTP endpoints for receiving external messages."""

        @self.app.post(FLIGHTS_ENDPOINT_PATH)
        async def receive_flight_findings(flight_finding: Task):
            """Receive flight findings and add them to the message queue."""
            try:
                # full implementation in the GitHub repository
                ...

            except Exception as e:
                raise HTTPException(status_code=500, detail=f"Error processing flight findings: {str(e)}")
        ...

    async def initialize(self, httpx_client: httpx.AsyncClient):
        """Initialize the agent and its tools."""
        print("🤖 Initializing LangGraph ReAct Chat Agent with Anthropic Claude...")

        await self.agent_registry.initialize_agents(httpx_client)

        tools = [
            AirportKnowledgeTool(self.agent_registry),
            EmployeeFlightRequestTool(self.agent_registry),
            FlightSearchTool(self.agent_registry)
        ]

        system_prompt = f"""You are a helpful assistant that manages employee flight requests in a corporate environment and can search for scheduled flights.
        ...
        """

        self.agent_graph = create_react_agent(
            model=self.model,
            tools=tools,
            checkpointer=self.memory,
            prompt=system_prompt
        )

        available_agents = self.agent_registry.list_available_agents()
        print(f"✅ LangGraph ReAct Agent initialized with {len(available_agents)} available A2A agents: {available_agents}")
        print("🧠 Using Anthropic Claude as the reasoning engine")
        print(f"📡 HTTP endpoint available at: http://localhost:{HTTP_SERVER_PORT}{FLIGHTS_ENDPOINT_PATH}")

    async def process_external_message(self, external_msg: InternalMessage, thread_id: str | None = None) -> str:
        """Process an external message and add it to agent memory."""
        # full implementation in the GitHub repository

    async def chat(self, user_input: str, thread_id: str = "default") -> str:
        ...
        try:
            config = {"configurable": {"thread_id": thread_id}}

            messages = [("user", user_input)]

            last_message = ""
            async for chunk in self.agent_graph.astream(
                {"messages": messages},
                config=config
            ):
                print(chunk)
                last_message = chunk

            return last_message if last_message else "Response completed - check the output above."

        except Exception as e:
            return f"❌ Error processing request: {str(e)}"

`main` function

When the function starts, it creates and initializes our ReactChatAgent() which in turn calls the Agent Registry and creates our A2A Clients to communicate with our A2A Servers (A2A Agents).

It also runs the FastAPI server in a separate execution thread, which allows listening to push notifications as they arrive.

The code inside the while loop is a bit complex since it processes the message queue from push notifications, interleaved with responses to user prompts, as they arrive.

async def main():
    """Main CLI loop for the chat agent with HTTP endpoint integration."""
    print("🚀 Starting LangGraph ReAct Chat Agent with A2A Integration")
    print("🧠 Powered by Anthropic Claude")
    print("📡 HTTP API Server Enabled")
    print("=" * 60)

    logging.basicConfig(level=logging.WARNING)

    async with httpx.AsyncClient() as httpx_client:
        agent = ReactChatAgent()
        await agent.initialize(httpx_client)

        print(f"🌐 Starting HTTP server on port {HTTP_SERVER_PORT}...")
        http_thread = threading.Thread(target=run_http_server, args=(agent,))
        http_thread.daemon = True
        http_thread.start()

        print("\n💬 Chat Agent Ready! (Type 'quit' to exit)")
        ...

        thread_id = "console_session_" + str(uuid4())[:8]
        prompt_shown = False
        should_exit = False

        while not should_exit:
            try:
                while not agent.external_message_queue.empty():
                    external_msg = agent.external_message_queue.get()
                    await agent.process_external_message(external_msg)
                    prompt_shown = False

                if not prompt_shown:
                    print("\n👤 You: ", end="", flush=True)
                    prompt_shown = True

                if sys.stdin in select.select([sys.stdin], [], [], 0.1)[0]:
                    user_input = input().strip()
                    prompt_shown = False

                    if user_input.lower() in ['quit', 'exit', 'bye']:
                        try:
                            loop = asyncio.get_event_loop()
                            if loop.is_closed():
                                print("⚠️  Event loop is closed - cleaning up gracefully...")
                            else:
                                print("✅ Event loop is healthy")
                        except RuntimeError:
                            print("ℹ️  No event loop available in current context")

                        print("👋 Goodbye!")
                        should_exit = True

                    elif user_input:  # skip the LLM call when the user asked to quit
                        print("🤖 LLM: ...", end="\n", flush=True)
                        response = await agent.chat(user_input, thread_id)

                        if isinstance(response, dict):
                            for chunk_type, chunk_data in response.items():
                                if isinstance(chunk_data, dict) and 'messages' in chunk_data:
                                    for message in chunk_data['messages']:
                                        if hasattr(message, 'content'):
                                            print(f"\n**** 🤖 Agent pretty print *****\n{message.content}\n" + "*" * 31)
                                            break

            except KeyboardInterrupt:
                print("\n👋 Goodbye!")
                should_exit = True
            except Exception as e:
                ...

        sys.exit(0)
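
The loop above drains the notification queue with empty() followed by get(). That works here because a single consumer reads the queue, but the queue module documents that empty() gives no guarantee about a later get(). A small helper based on get_nowait() avoids relying on that. drain is a hypothetical name used for this sketch, not a function from the repository.

```python
import queue


def drain(message_queue: queue.Queue) -> list:
    """Remove and return everything currently queued, without blocking."""
    drained = []
    while True:
        try:
            drained.append(message_queue.get_nowait())
        except queue.Empty:
            return drained
```

Note also that polling sys.stdin with select.select, as the loop does, works on Unix-like systems only; on Windows, select accepts sockets but not file objects such as stdin.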

Finally, to run our chatbot we execute:

python3 chat_agent.py

🚀 Running our chatbot

It's time to run our chatbot. But first, we need to start each of our A2A Agents.

# Start the employee flight requests agent
cd employee_flight_requests_agent/
uv run . --host 0.0.0.0 &

---
✅ Initialized flight request database with 10 records
INFO:     Started server process [53208]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9992 (Press CTRL+C to quit)
INFO:     127.0.0.1:60396 - "GET /.well-known/agent.json HTTP/1.1" 200 OK  # This shows that our chatbot called the agent card

# Start the airport knowledge base agent
cd ../airport_knowledge_base_agent/
uv run . --host 0.0.0.0 &

---
✅ Loaded airport knowledge base: 8467 airports from 235 countries
INFO:     Started server process [53217]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9991 (Press CTRL+C to quit)
INFO:     127.0.0.1:60394 - "GET /.well-known/agent.json HTTP/1.1" 200 OK  # This shows that our chatbot called the agent card

# Start the flight search agent
cd ../flight_search_agent/
uv run . --host 0.0.0.0 &

---
✅ Initialized Flight Search ReAct Agent with Aviation Stack API
INFO:     Started server process [53228]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9993 (Press CTRL+C to quit)
INFO:     127.0.0.1:60398 - "GET /.well-known/agent.json HTTP/1.1" 200 OK  # This shows that our chatbot called the agent card

Now, we can run our chatbot:

python chat_agent.py

---
🚀 Starting LangGraph ReAct Chat Agent with A2A Integration
🧠 Powered by Anthropic Claude
📡 HTTP API Server Enabled
============================================================
🤖 Initializing LangGraph ReAct Chat Agent with Anthropic Claude...
✅ Initialized Airport Knowledge Base Agent at http://localhost:9991
   📝 Description: Knowledge base for airport information and city-airport mappings
✅ Initialized Employee Flight Request Agent at http://localhost:9992
   📝 Description: Check employee flight requests and booking status
✅ Initialized Flight Search Agent at http://localhost:9993
   📝 Description: Scheduled flight search using Aviation Stack
✅ LangGraph ReAct Agent initialized with 3 available A2A agents: ['airport_knowledge_base', 'employee_flight_requests', 'flight_search']
🧠 Using Anthropic Claude as the reasoning engine
📡 HTTP endpoint available at: http://localhost:9990/api/flights-findings
🌐 Starting HTTP server on port 9990...

💬 Chat Agent Ready! (Type 'quit' to exit)
You can ask about:
  - Airport knowledge base: 'find airports in Madrid'
  - Airport information: 'what airports are in Tokyo'
  - Employee flight requests: 'check pending flight requests'
  - Employee status: 'check John Smith flight request'
  - Flight search: 'search flights from AEP on 2025-11-20'
  - Real-time flights: 'find flights from JFK to LAX on 2025-12-01'

📡 HTTP Endpoints available:
  - POST http://localhost:9990/api/flights-findings
  - GET  http://localhost:9990/api/status
------------------------------------------------------------

👤 You:

First, we ask our agent "What are the pending flight requests?". Our agent will use the employee_flight_requests tool to perform the search and display the results.

👤 You: What are the pending flight requests?
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content=[{'text': "I'll help you check the pending flight requests using the employee_flight_requests tool."...
...
**** 🤖 Agent pretty print *****
Based on the results, there are 5 pending flight requests awaiting booking:

1. Robert Johnson: New York to Los Angeles (Dec 1, 2025)
2. Anna Thompson: London to Dublin (Oct 5, 2025)
3. Sophie Martin: Paris to Rome (July 12, 2025)
4. Elena Popov: Berlin to Amsterdam (Nov 18, 2025)
5. Lisa Anderson: Sydney to Melbourne (Aug 25, 2025)

All these requests are currently in "Awaiting booking" status. Would you like more specific information about any of these requests or would you like to check the status of booked flights as well?
*******************************

Since our agent has memory, we can ask a follow-up question that relies on the previous answer: "Which airports can Anna depart from?". In this case, we see that it uses the airport_knowledge_base tool.

👤 You: Which airports can Anna depart from?
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content=[{'text': "Since Anna's request is for a flight from London, I'll use the airport_knowledge_base tool to find the available airports in London."...
...
**** 🤖 Agent pretty print *****
For Anna's flight from London to Dublin, she can depart from any of these major London airports in the UK:

1. London Heathrow Airport (LHR) - The largest and most well-connected airport
2. London Gatwick Airport (LGW) - Second largest airport
3. London Stansted Airport (STN) - Major hub for low-cost carriers
4. London City Airport (LCY) - Convenient for business travelers, located in the city
5. London Biggin Hill Airport (BQH) - Smaller airport primarily for private aviation
6. RAF Northolt (NHT) - Military airport with limited civilian use

The most commonly used airports for commercial flights to Dublin would be Heathrow (LHR), Gatwick (LGW), or Stansted (STN). Would you like me to search for specific flights from any of these airports to Dublin for Anna's travel date (October 5, 2025)?
*******************************
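
To make the role of memory concrete, here is a toy illustration of why the follow-up about "Anna" can be resolved: earlier turns of the same conversation are sent along with the new question. This is only a sketch under that assumption. The `ConversationMemory` class and the `"demo"` thread id are hypothetical and are not LangGraph's checkpointer API or the project's actual code.

```python
from collections import defaultdict
from typing import Dict, List


class ConversationMemory:
    """Toy per-thread message history (illustrative, not LangGraph's API)."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[dict]] = defaultdict(list)

    def add(self, thread_id: str, role: str, content: str) -> None:
        """Append one message to the given conversation thread."""
        self._threads[thread_id].append({"role": role, "content": content})

    def context(self, thread_id: str) -> List[dict]:
        """Return a copy of everything the model would see on the next turn."""
        return list(self._threads[thread_id])


memory = ConversationMemory()
memory.add("demo", "user", "What are the pending flight requests?")
memory.add("demo", "assistant", "Anna Thompson: London to Dublin (Oct 5, 2025)")
memory.add("demo", "user", "Which airports can Anna depart from?")

# The follow-up travels with the earlier turns, so "Anna" can be tied to London.
for message in memory.context("demo"):
    print(f"{message['role']}: {message['content']}")
```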

As the final step, we ask our agent: "Yes, what flights are available from Heathrow?". As we can see, this time it uses the flight_search tool.

👤 You: "Yes, what flights are available from Heathrow?"
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content=[{'text': "I'll search for flights from London Heathrow (LHR) to Dublin (DUB) for Anna's travel date of October 5, 2025, using the flight_search tool."...
...
**** 🤖 Agent pretty print *****
I've initiated the flight search from London Heathrow (LHR) to Dublin (DUB) for October 5, 2025. The search has been started and the results will be sent via push notifications. Once we receive the results, you'll be able to see all available flights for that route and date, including:
- Flight numbers
- Departure and arrival times
- Airlines
- Aircraft types
- Terminal information

Please wait for the push notification with the detailed flight results, and then we can help select the most suitable flight for Anna's travel.
*******************************

Since the flight search A2A agent delivers its results through push notifications, our chatbot displays the messages as they arrive, even while the prompt is waiting for input.

👤 You: 
🔍 External message

------------------------------------------------------------
🤖 Processing external message...
{'agent': {'messages': [AIMessage(content="I'll help summarize the flight findings received. ...
...
**** 🤖 Agent Response to External Message pretty print *****
I'll help summarize the flight findings received. These are flights departing from London Heathrow (LHR) Terminal 2 at 06:00. Here's a breakdown of the available routes:

1. London (LHR) to Zurich (ZRH):
- Swiss/Air Canada codeshare flight LX345/AC6756
- Departure: 06:00, Terminal 2, Gate A18
- Arrival: 08:40, Terminal 2
- Aircraft: Airbus A220-100

2. London (LHR) to Vienna (VIE):
- Austrian Airlines flight OS458 (codeshared by Air Canada, ANA, and Asiana)
- Departure: 06:00, Terminal 2
- Arrival: 09:10, Terminal 3
- Aircraft: Airbus A320-271N

3. London (LHR) to Lisbon (LIS):
- TAP Air Portugal flight TP1363 (codeshared by Air Canada, Azul, Air India, and Azores Airlines)
- Departure: 06:00, Terminal 2, Gate A17
- Arrival: 08:45, Terminal 1
- Aircraft: Airbus A320-251N

All flights depart at the same time (06:00) from different gates at Terminal 2. These are primarily operated by European carriers with various codeshare agreements with other airlines.
*************************************************************
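
The flow above, where results show up while the prompt is idle, can be pictured as a producer and a consumer sharing a queue. The sketch below is a simplified stand-in for that idea, assuming an in-process `asyncio.Queue` in place of the real HTTP endpoint. The function names and the sample findings are illustrative and do not come from the project's code.

```python
import asyncio
from typing import List


async def push_sender(inbox: asyncio.Queue, findings: List[str]) -> None:
    """Stand-in for the flight search agent: delivers results some time later."""
    for finding in findings:
        await asyncio.sleep(0.01)  # simulate a long-running search
        await inbox.put(finding)
    await inbox.put(None)  # sentinel: no more notifications


async def chat_loop(inbox: asyncio.Queue) -> List[str]:
    """Stand-in for the chatbot: handles each external message as it arrives."""
    handled = []
    while True:
        message = await inbox.get()
        if message is None:
            break
        handled.append(f"Processing external message: {message}")
    return handled


async def main() -> List[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    # Illustrative payloads only; the real notification format is not shown here.
    sender = asyncio.create_task(
        push_sender(inbox, ["LHR -> ZRH 06:00", "LHR -> VIE 06:00"])
    )
    handled = await chat_loop(inbox)
    await sender
    return handled


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The point of the queue is that the sender never blocks the chat loop: the search can take as long as it needs, and each result is handled whenever it lands.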

We did it! We've successfully searched for available flights for one of the requested trips. To exit our chatbot, we type quit.

👤 You: quit
✅ Event loop is healthy
👋 Goodbye!
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content='Goodbye!...
...
**** 🤖 Agent pretty print *****
Goodbye! Let me know if you need any further assistance with flight requests, bookings, or airport information in the future.
*******************************

Note: If you're interested, you can see the entire output of the chatbot as well as the output of each of our agents in this document 👉 chatbot_responses.md.

✅ Key Features

Multi-Agent Communication: Seamless coordination between specialized A2A agents using different communication patterns (standard HTTP, streaming, and push notifications)
Protocol Standardization: Built on Google's A2A protocol, ensuring interoperability and scalability across different agentic systems
Real-time Flight Data: Integration with Aviation Stack API for live flight information and airport recommendations
Smart Agent Orchestration: LangGraph ReAct Agent that intelligently routes user requests to the appropriate A2A agents
Flexible Communication Methods: Demonstrates all three A2A communication patterns in a single system
Corporate Flight Management: Complete workflow for managing employee flight requests from pending to booked status
Interactive Chat Interface: Command-line interface powered by Anthropic Claude for natural language interactions
Push Notification Support: Asynchronous task handling for long-running operations without blocking the user experience

💬 Final thoughts

This post demonstrates how the A2A protocol can be used to build sophisticated multi-agent systems that coordinate and collaborate effectively. By standardizing agent-to-agent communication, A2A opens up new possibilities for creating complex AI workflows where specialized agents can work together seamlessly.

The flight management system we built showcases the power of combining different communication patterns within a single application. From immediate responses for flight request status to streaming airport information and asynchronous flight searches, each agent uses the communication pattern that best fits its workload.

As AI systems continue to evolve toward more distributed and specialized architectures, protocols like A2A will become increasingly important for enabling the next generation of collaborative AI applications.

📚 Resources

Full code of this post 👉 ezequiroga/a2a-bases

Google A2A Official documentation 👉 A2A Protocol

Google A2A Protocol JSON Specification 👉 A2A Protocol Specification

A2A Protocol Documentation 👉 A2A Protocol Documentation

Agent2Agent (A2A) Python SDK Tutorial 👉 A2A Protocol Documentation

Google GitHub SDK examples repository 👉 A2A Python SDK

Google Python SDK Reference 👉 Python SDK Reference

Agent2Agent (A2A) Samples 👉 A2A Samples