<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Naomi Kynes</title>
    <description>The latest articles on DEV Community by Naomi Kynes (@naomi_kynes).</description>
    <link>https://dev.to/naomi_kynes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3811946%2F3fa45bf4-ac94-47a1-b3ab-73a7d373a291.jpg</url>
      <title>DEV Community: Naomi Kynes</title>
      <link>https://dev.to/naomi_kynes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/naomi_kynes"/>
    <language>en</language>
    <item>
      <title>When your agent needs to spend more than you told it to</title>
      <dc:creator>Naomi Kynes</dc:creator>
      <pubDate>Sun, 29 Mar 2026 13:56:22 +0000</pubDate>
      <link>https://dev.to/naomi_kynes/when-your-agent-needs-to-spend-more-than-you-told-it-to-102k</link>
      <guid>https://dev.to/naomi_kynes/when-your-agent-needs-to-spend-more-than-you-told-it-to-102k</guid>
      <description>&lt;p&gt;You deploy a research agent with a $100 budget. The task is competitive landscape analysis: a few market reports, some API calls, a handful of web lookups. $100 is plenty.&lt;/p&gt;

&lt;p&gt;Three weeks later, the scope changes. You need licensing data across 40 markets instead of 5. Same agent, same task, fundamentally different cost. Doing it properly runs $800.&lt;/p&gt;

&lt;p&gt;What happens next depends entirely on how you set the budget. If you hardcoded a ceiling, the agent stops midway and you get a partial result. If you didn't, it proceeds and you find out when you check your Stripe dashboard. Neither outcome is what you wanted. What you actually wanted was for the agent to ask you first.&lt;/p&gt;

&lt;p&gt;That gap (between "agent has a spending limit" and "agent can request authorization before exceeding it") is the problem Stripe's Machine Payments Protocol (MPP) exposes but doesn't solve.&lt;/p&gt;

&lt;h2&gt;What MPP actually does&lt;/h2&gt;

&lt;p&gt;MPP is an open standard for agent-to-service payments. The flow is simple: an agent requests a resource, the service returns a payment request, the agent authorizes, the resource is delivered. No accounts, no UI flows, no human involved. Stripe users can accept payments over MPP in a few lines of code.&lt;/p&gt;

&lt;p&gt;The examples are real. Browserbase lets agents pay per browser session. PostalForm lets agents print and mail physical letters. An agent can now autonomously pay per API call and have funds settle in a Stripe dashboard like any other transaction.&lt;/p&gt;

&lt;p&gt;This is not theoretical. It shipped this week.&lt;/p&gt;

&lt;p&gt;But look at step three in that flow: "the agent authorizes." That step is doing a lot of quiet work. It assumes the agent knows whether it should pay. Right now, that knowledge comes from static configuration you set before the agent ran.&lt;/p&gt;

&lt;h2&gt;The static config model&lt;/h2&gt;

&lt;p&gt;Today, agent spending limits live in environment variables. &lt;code&gt;AGENT_BUDGET=500&lt;/code&gt;. Or a config file. Or hardcoded logic: if the cost is under $10, proceed; otherwise, stop.&lt;/p&gt;
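&lt;p&gt;In code, the whole model reduces to a deploy-time constant. A minimal sketch (the variable name follows the example above; the function is illustrative):&lt;/p&gt;

```python
import os

# Static model: one ceiling, fixed before the agent ever runs.
AGENT_BUDGET = float(os.environ.get("AGENT_BUDGET", "100"))

def authorize(spent: float, next_cost: float) -> bool:
    """Proceed only if the next charge stays under the deploy-time cap."""
    return AGENT_BUDGET >= spent + next_cost
```

&lt;p&gt;Everything interesting happens when this returns &lt;code&gt;False&lt;/code&gt;: the static model has no answer beyond "stop."&lt;/p&gt;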

&lt;p&gt;This model works for predictable, bounded tasks. An agent paying $0.01 per browser session, with a hard cap you set at deploy time. Fine. The cost profile is known in advance.&lt;/p&gt;

&lt;p&gt;It breaks the moment context changes.&lt;/p&gt;

&lt;p&gt;The $100 budget you set isn't wrong. It was right for the original scope. The problem is there's no mechanism for the agent to surface "the scope changed and the budget no longer covers it" as a question rather than a failure. The agent either hits the limit and stops, or it doesn't have one and keeps going. The two failure modes are "agent gives up" and "surprise charges."&lt;/p&gt;

&lt;p&gt;What's missing is the third option: the agent pauses, routes the question to the person responsible for it, and waits for a response before continuing.&lt;/p&gt;

&lt;h2&gt;Why that's harder than it sounds&lt;/h2&gt;

&lt;p&gt;The obvious solution is a Slack webhook. The agent fires a message when it hits an authorization question, the human responds, done.&lt;/p&gt;

&lt;p&gt;Except webhooks are one-way. The agent fires and forgets. It can't block on a webhook response. So now you need a flag in the database: agent sets it to "waiting," human clears it, agent polls until it's clear. This works. It's also 200 lines of custom state management every team writes from scratch.&lt;/p&gt;
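&lt;p&gt;The shape of that custom state management is always the same. A compressed sketch (the table schema, notification call, and polling interval are illustrative):&lt;/p&gt;

```python
import sqlite3
import time
import uuid

# The ad-hoc "waiting" flag every team ends up building.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE approvals (id TEXT PRIMARY KEY, question TEXT, answer TEXT)")

def ask_human(question: str, poll_interval: float = 5.0) -> str:
    """Park a question, notify someone, then poll until an answer lands."""
    qid = str(uuid.uuid4())
    db.execute("INSERT INTO approvals VALUES (?, ?, NULL)", (qid, question))
    db.commit()
    # notify_slack(question, qid)  # hypothetical fire-and-forget webhook
    while True:
        (answer,) = db.execute(
            "SELECT answer FROM approvals WHERE id = ?", (qid,)
        ).fetchone()
        if answer is not None:
            return answer
        time.sleep(poll_interval)
```

&lt;p&gt;Add retries, timeouts, and restart recovery and you reach those 200 lines quickly.&lt;/p&gt;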

&lt;p&gt;Then you have the response-in-context problem. When the human responds, that response needs to reach the agent in a form it can use to continue the task. Not just a "yes" that clears a flag, but "yes, here's the updated budget and the two markets you can skip." Getting that back into the agent's context without restarting the task requires the channel to be bidirectional and persistent, not fire-and-forget.&lt;/p&gt;

&lt;p&gt;And then you have the multiple-agent problem. One agent asking one human is manageable. Five agents, three humans, overlapping tasks: you need routing. Which agent's question goes to which human? How does the human know what context the question came from? How does the agent know its question is pending vs. lost?&lt;/p&gt;

&lt;p&gt;Every team building HITL at any scale hits this progression. Webhook, then database flag, then polling loop, then bidirectional channel, then routing layer. The final result works. It's also a custom communication infrastructure that has nothing to do with the actual task the agent was deployed to do.&lt;/p&gt;

&lt;h2&gt;What the authorization layer actually needs&lt;/h2&gt;

&lt;p&gt;Three things, in order of how often they're missing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A persistent bidirectional channel.&lt;/strong&gt; Not a webhook, not a polling loop against a database flag. A channel where the agent sends a message and the platform holds it until the human responds, then delivers the response back to the agent without requiring the agent to restart. Blocking semantics, not fire-and-forget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy primitives with escalation tiers.&lt;/strong&gt; The binary model ("spend under $X, stop above it") doesn't handle context. What you actually want is something like: proceed automatically under $X, ask before proceeding between $X and $Y, require explicit approval above $Y, flag anything recurring for human review. That's not three environment variables. That's a policy layer that understands cost tiers and routes accordingly.&lt;/p&gt;
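&lt;p&gt;Sketched as code (thresholds, names, and the recurring-charge rule are illustrative, not a real API):&lt;/p&gt;

```python
from enum import Enum

class Decision(Enum):
    PROCEED = "proceed"            # under the auto-approve tier
    ASK = "ask"                    # pause and route a question to a human
    REQUIRE_APPROVAL = "approve"   # block until explicit sign-off

def spend_policy(cost: float, recurring: bool = False,
                 auto_limit: float = 10.0, ask_limit: float = 100.0) -> Decision:
    """Tiered policy instead of a single binary ceiling."""
    if recurring:
        return Decision.REQUIRE_APPROVAL  # anything recurring gets human review
    if auto_limit > cost:
        return Decision.PROCEED
    if ask_limit > cost:
        return Decision.ASK
    return Decision.REQUIRE_APPROVAL
```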

&lt;p&gt;&lt;strong&gt;An audit trail that links transactions to authorization decisions.&lt;/strong&gt; Stripe gives you transaction history. It doesn't tell you whether a given charge was explicitly approved by a human, pre-authorized by a policy rule, or an autonomous agent decision made without any human review. When something goes wrong (and eventually something will), you need to know which category the charge fell into.&lt;/p&gt;

&lt;h2&gt;Why this matters now&lt;/h2&gt;

&lt;p&gt;Before MPP, this problem was mostly theoretical. Few agents could easily spend money, so for most teams the authorization question never came up.&lt;/p&gt;

&lt;p&gt;Stripe just changed that. Three lines of code and your agent has a payment credential. Browserbase, PostalForm, and a growing list of services are ready to accept payments directly from agents. The cost of adding spending capability to an agent just dropped to near zero.&lt;/p&gt;

&lt;p&gt;That means the gap between "agent has spending capability" and "agent has authorization infrastructure" just became real infrastructure work. The static config model will hold for simple, predictable deployments. It won't hold for agents that handle variable scope, multi-step tasks, or anything that compounds across 200 sessions.&lt;/p&gt;

&lt;p&gt;MPP made the payment rail. The authorization layer is the next piece. Right now every team is building it from scratch, badly, in the same ways. That's usually the signal that it should be infrastructure.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Naomi Kynes builds agent infrastructure. GitHub: &lt;a href="https://github.com/naomi-kynes" rel="noopener noreferrer"&gt;github.com/naomi-kynes&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agentops</category>
      <category>ai</category>
      <category>stripe</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your agents shouldn't need you to set them up</title>
      <dc:creator>Naomi Kynes</dc:creator>
      <pubDate>Wed, 18 Mar 2026 16:19:36 +0000</pubDate>
      <link>https://dev.to/naomi_kynes/your-agents-shouldnt-need-you-to-set-them-up-17cm</link>
      <guid>https://dev.to/naomi_kynes/your-agents-shouldnt-need-you-to-set-them-up-17cm</guid>
      <description>&lt;p&gt;Every time I deploy a new agent, I go through the same ritual. Create an account. Generate a token. Add it to the workspace. Assign roles. Configure permissions. Restart something.&lt;/p&gt;

&lt;p&gt;Twenty minutes of clicking. For a thing that's supposed to automate work.&lt;/p&gt;

&lt;p&gt;The agents themselves are getting smarter fast. The infrastructure they run on hasn't caught up. We're still onboarding them like it's 2015 and we're setting up a Slack bot.&lt;/p&gt;

&lt;h2&gt;What "setup" actually means right now&lt;/h2&gt;

&lt;p&gt;When you add an agent to most platforms, you're doing it by hand. You create a bot user. You copy a token somewhere. You add it to a channel. You give it permissions.&lt;/p&gt;

&lt;p&gt;This is fine if you have one agent and you set it up once. It breaks down fast.&lt;/p&gt;

&lt;p&gt;I run about a dozen agents. Some are persistent. Some spin up for a task and disappear. Some need read access to one channel. Some need to be able to create channels. Managing this manually is a spreadsheet problem pretending to be an architecture problem.&lt;/p&gt;

&lt;p&gt;The real issue: these platforms weren't designed for agents. They were designed for humans who occasionally add bots. Agents are an afterthought. The provisioning model reflects that.&lt;/p&gt;

&lt;h2&gt;What self-provisioning looks like&lt;/h2&gt;

&lt;p&gt;Here's what I want instead. An agent comes online and does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="c1"&gt;# Agent registers itself on startup
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://your-platform/api/agents/register&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;capabilities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_channels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post_messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requested_channels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Agent-Secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AGENT_SECRET&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;agent_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Now it can talk. No human clicked anything.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The platform validates the agent secret, issues a scoped token, and adds the agent to the requested channels. If a human needs to approve that, fine. That's a one-time policy decision, not a per-agent setup ritual.&lt;/p&gt;

&lt;p&gt;Compare this to what you're doing now: logging into a UI, navigating to settings, creating a bot account, copying a token into an env file, manually adding the bot to channels. Every time. For every new agent.&lt;/p&gt;

&lt;p&gt;The difference isn't just ergonomics. It's whether your agent infrastructure can scale beyond a handful of handcrafted bots.&lt;/p&gt;

&lt;h2&gt;Why most platforms can't do this&lt;/h2&gt;

&lt;p&gt;Discord and Slack were built for human communication. Bots were added later. The provisioning model is still human-first: a person creates a bot, a person configures it, a person manages its permissions.&lt;/p&gt;

&lt;p&gt;This made sense in 2015 when bots were novelties. It doesn't make sense when you have 10+ agents that need to communicate with each other and with humans, and some of them are ephemeral.&lt;/p&gt;

&lt;p&gt;The platforms aren't wrong for how they were built. They're just not built for this.&lt;/p&gt;

&lt;h2&gt;What the platform needs to support&lt;/h2&gt;

&lt;p&gt;For agents to self-provision, the platform needs a few things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An agent registration endpoint that accepts a credential (not a human login)&lt;/li&gt;
&lt;li&gt;Scoped tokens tied to declared capabilities&lt;/li&gt;
&lt;li&gt;A policy layer for humans to set rules once ("agents can join any channel tagged #agent-accessible") rather than approve each agent by hand&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight is that humans set policies, not individual permissions. You decide "agents with this credential class can do X" once. After that, agents come and go without bothering you.&lt;/p&gt;
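&lt;p&gt;A policy check of that kind can be a few lines. A sketch (the credential classes and tag convention are made up for illustration):&lt;/p&gt;

```python
# One rule set, written once by a human; applied to every agent that registers.
POLICY = {
    "research-agents": {"agent-accessible", "research"},
    "ops-agents": {"agent-accessible"},
}

def may_join(credential_class: str, channel_tags: set[str]) -> bool:
    """An agent may join any channel carrying a tag its class is allowed."""
    allowed = POLICY.get(credential_class, set())
    return bool(allowed.intersection(channel_tags))
```

&lt;p&gt;New agents then come and go against these rules without a human clicking anything.&lt;/p&gt;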

&lt;p&gt;I haven't seen this done well in the platforms people actually use for agents today. That's why we're building it into Agent United. Agents register via API. A human sets up an access policy once. After that, new agents join the workspace the same way they'd make any other API call.&lt;/p&gt;

&lt;p&gt;It's not magic. It's just treating agents as first-class API clients instead of second-class bots.&lt;/p&gt;

&lt;h2&gt;What this doesn't solve&lt;/h2&gt;

&lt;p&gt;Self-provisioning doesn't solve trust. You still need to think about what credentials you give agents and what they're allowed to do. A token that can create channels and invite users is a large attack surface if your agent is compromised.&lt;/p&gt;

&lt;p&gt;I'd start conservative: read-only access by default, explicit opt-in for write operations. Same security hygiene you'd apply to any service account.&lt;/p&gt;

&lt;p&gt;And it doesn't solve the "I don't know what my agent is doing" problem. That's observability, which is a separate thing.&lt;/p&gt;

&lt;p&gt;But it does solve the thing that was making me spend 20 minutes clicking every time I wanted to add a new agent to my stack. That's worth something.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building Agent United - an open-source chat platform where agents provision themselves. GitHub: &lt;a href="https://github.com/naomi-kynes/agentunited" rel="noopener noreferrer"&gt;github.com/naomi-kynes/agentunited&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>selfhosted</category>
      <category>devops</category>
    </item>
    <item>
      <title>MCP handles tools. A2A handles agents. What handles humans?</title>
      <dc:creator>Naomi Kynes</dc:creator>
      <pubDate>Sat, 14 Mar 2026 22:14:12 +0000</pubDate>
      <link>https://dev.to/naomi_kynes/mcp-handles-tools-a2a-handles-agents-what-handles-humans-3loe</link>
      <guid>https://dev.to/naomi_kynes/mcp-handles-tools-a2a-handles-agents-what-handles-humans-3loe</guid>
      <description>&lt;p&gt;There's a debate happening right now in every team building with AI agents: MCP or A2A?&lt;/p&gt;

&lt;p&gt;It's a good debate. Both protocols are real, well-specified, and increasingly well-supported. But the framing is incomplete. The conversation covers two layers — how agents access tools, and how agents talk to each other — and skips the third entirely: how agents talk to humans.&lt;/p&gt;




&lt;h2&gt;MCP: the tool layer&lt;/h2&gt;

&lt;p&gt;Model Context Protocol, developed by Anthropic, is a standardized way for agents to access external resources — APIs, databases, files, functions. Instead of every agent hard-coding its own Stripe integration or bespoke database connector, you expose the tool as an MCP server and any MCP-compatible agent can use it.&lt;/p&gt;

&lt;p&gt;The flow is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# An MCP server exposes tools in a standard format&lt;/span&gt;
&lt;span class="c"&gt;# The agent discovers available tools, then calls one:&lt;/span&gt;

POST /mcp/call
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"tool"&lt;/span&gt;: &lt;span class="s2"&gt;"query_database"&lt;/span&gt;,
  &lt;span class="s2"&gt;"input"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"query"&lt;/span&gt;: &lt;span class="s2"&gt;"SELECT * FROM orders WHERE status = 'pending' LIMIT 10"&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server handles auth, validates the input, runs the query, returns structured output. The agent doesn't need to know anything about the database — just the tool contract.&lt;/p&gt;

&lt;p&gt;This is genuinely useful. The ecosystem is growing. If you're building agents today and you haven't looked at MCP, you're probably reinventing it manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What MCP doesn't cover:&lt;/strong&gt; it's synchronous request/response. The agent calls, the tool answers. There's no concept of the tool — or a human, acting as a tool — pushing something back to the agent unprompted. MCP is the agent's outbound interface to the world.&lt;/p&gt;




&lt;h2&gt;A2A: the coordination layer&lt;/h2&gt;

&lt;p&gt;Agent-to-Agent protocol, led by Google, handles how agents collaborate with each other. One agent can delegate a subtask to a specialist agent, receive results, and incorporate them into a larger workflow.&lt;/p&gt;

&lt;p&gt;The core mechanism is the Agent Card — a self-description that each agent publishes: what it can do, what protocols it speaks, what kinds of requests it accepts. Before delegating, a coordinator agent looks up the agent cards of its candidates and picks the right one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# A coordinator agent delegates a research subtask to a specialist
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Discover the research agent
&lt;/span&gt;&lt;span class="n"&gt;agent_card&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agents.internal/research-bot/.well-known/agent.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Delegate the task
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_card&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarize recent papers on vector database indexing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bullet points, max 5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deadline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-03-07T20:00:00Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# A2A is async — you get a task ID, then poll for completion  
&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# {"task_id": "abc-123", "status": "submitted"}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(A2A task execution is asynchronous; simplified above for readability. See the A2A spec for streaming/callback patterns.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A2A is shipping in real stacks now. Microsoft Foundry added an A2A Tool preview in January 2026. Google's ADK has multi-agent coordination patterns built on it. Huawei's Agentic Communication Network at MWC 2026 covers A2A task sessions for enterprise. It's early, but it's converging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What A2A doesn't cover:&lt;/strong&gt; it's agent-to-agent. The moment one of those agents needs a human decision — judgment about something ambiguous, access to private context it doesn't have, approval before a destructive action — A2A has no mechanism for that handoff.&lt;/p&gt;




&lt;h2&gt;The gap: nobody specified the human layer&lt;/h2&gt;

&lt;p&gt;Here's the three-layer picture that's emerging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools  ←—[MCP]—→  agents  ←—[A2A]—→  agents  ←—[???]—→  humans
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP handles the left edge. A2A handles the middle. The right edge — how agents reach humans and how humans respond in context — is still improvised.&lt;/p&gt;

&lt;p&gt;In practice, teams end up with one of three patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: Tail the logs&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python my_agent.py &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; agent.log 2&amp;gt;&amp;amp;1 &amp;amp;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; agent.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works for one-shot jobs. Falls apart when you're running 5 agents overnight and one hits an edge case at 2am.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: Bolt on a Slack bot&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# When agent needs human input:
&lt;/span&gt;&lt;span class="n"&gt;slack_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_postMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#agent-alerts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent needs a decision: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;decision_point&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Then... what? You check Slack. Maybe. Eventually.
# The agent has no way to receive your reply.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One-way. The agent fires and forgets. You can't respond in context. The agent doesn't know you replied. Three Slack bot integrations later, the API changes and it breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: Polling a REST endpoint&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/human-input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ready&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;handle_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At least it's bidirectional. But you've now built a bespoke approval interface for every agent, and five-second polling is a bad foundation for anything latency-sensitive.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(This is what Agent United replaces — a persistent WebSocket connection per agent, no polling.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;None of these are protocols. They're workarounds.&lt;/p&gt;




&lt;h2&gt;What a proper agent-to-human layer needs&lt;/h2&gt;

&lt;p&gt;The reason this is hard is that the requirements are different from both MCP and A2A:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-initiated, not command-response.&lt;/strong&gt; The agent needs to open a conversation — not just respond to a human command. When it hits an ambiguous state at 3am, it should be able to say "I found conflicting data, which source do you want me to trust?" and block until it gets an answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bidirectional with context.&lt;/strong&gt; The human's reply needs to become part of the agent's context for the next step. Not a log entry. Not a webhook payload that gets dropped into a queue. Actual context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent history.&lt;/strong&gt; If the agent restarts, the conversation shouldn't disappear. If you're reviewing what an agent did last Tuesday, you should be able to read the thread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth separation.&lt;/strong&gt; Agents and humans are different principals in the same communication channel. They need different auth paths — API keys for agents, session tokens for humans — but they should be able to participate in the same conversation.&lt;/p&gt;
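&lt;p&gt;One way to sketch that separation (the header names and in-memory credential stores are illustrative; real systems would sit behind an auth service):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    id: str
    kind: str  # "agent" or "human"

# Illustrative credential stores.
API_KEYS = {"ak_research_1": "research-agent"}   # long-lived agent keys
SESSIONS = {"sess_abc": "naomi"}                 # short-lived human sessions

def authenticate(headers: dict) -> Principal:
    """Agents present API keys; humans present session tokens.
    Different auth paths, same conversation afterwards."""
    if headers.get("X-API-Key") in API_KEYS:
        return Principal(API_KEYS[headers["X-API-Key"]], "agent")
    token = headers.get("Authorization", "").removeprefix("Bearer ")
    if token in SESSIONS:
        return Principal(SESSIONS[token], "human")
    raise PermissionError("unrecognized credential")
```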

&lt;p&gt;&lt;strong&gt;Real-time transport.&lt;/strong&gt; Polling a REST endpoint every five seconds is brittle. A persistent WebSocket connection lets the agent listen for human replies without burning resources or introducing latency.&lt;/p&gt;
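&lt;p&gt;From the agent's side, blocking semantics over a persistent duplex channel look like this. A sketch using asyncio streams to stand in for a WebSocket (the newline-delimited JSON wire format is invented):&lt;/p&gt;

```python
import asyncio
import json

async def ask_and_wait(reader: asyncio.StreamReader,
                       writer: asyncio.StreamWriter, question: str) -> str:
    """Send a question, then block until the human's reply arrives."""
    writer.write((json.dumps({"type": "question", "text": question}) + "\n").encode())
    await writer.drain()
    while True:
        msg = json.loads(await reader.readline())
        if msg.get("type") == "human_reply":  # ignore unrelated traffic
            return msg["text"]
```

&lt;p&gt;No polling loop, no database flag: the reply arrives on the same connection the question left on.&lt;/p&gt;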

&lt;p&gt;This isn't a novel set of requirements. It's basically what a chat platform does. The gap is that nobody has built one where the agents are first-class participants, not bots bolted on with a webhook.&lt;/p&gt;




&lt;h2&gt;Where things stand&lt;/h2&gt;

&lt;p&gt;MCP is stable and growing. The tooling is there, the ecosystem is building, and Anthropic has done the hard work of specifying it clearly.&lt;/p&gt;

&lt;p&gt;A2A is early but moving fast. Google, Microsoft, and Huawei shipping production implementations in Q1 2026 is a strong signal.&lt;/p&gt;

&lt;p&gt;The human layer: still improvised. No standard protocol. No dominant implementation. Everyone is solving it differently — Slack bots, Telegram webhooks, email, custom UIs — and most of those solutions break at 3+ agents or when APIs change.&lt;/p&gt;

&lt;p&gt;This isn't a critique of MCP or A2A. They're solving the right problems for their layers. The human layer is just genuinely harder to standardize because it involves a person, which means latency requirements vary, auth gets complicated, and the interaction model needs to be flexible enough for natural language.&lt;/p&gt;




&lt;h2&gt;
  
  
  A note on what comes next
&lt;/h2&gt;

&lt;p&gt;If you're architecting agent systems right now, the agent-to-human layer is worth thinking about explicitly before you ship. The decisions you make about it — how agents escalate, how humans respond, what gets persisted — tend to become load-bearing parts of your system fast.&lt;/p&gt;

&lt;p&gt;The good news is that the primitives are clear even if the standard isn't: you need bidirectional transport (WebSocket or SSE), a persistent message store (Postgres/Redis), and an auth model that treats agents and humans as separate principals.&lt;/p&gt;
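As a sketch of the "separate principals" idea (key and token formats here are invented for illustration): one resolver accepts either an agent API key or a human session token and tags the result, so both kinds of participant can act in the same channel under different rules.

```python
# Illustrative auth resolver: agents and humans are distinct principals
# sharing one communication channel. The credential stores would be a
# real database in practice; dicts keep the sketch self-contained.
API_KEYS = {"au_abc123": "research-bot"}   # agent API keys
SESSIONS = {"sess_789": "naomi"}           # human session tokens

def resolve_principal(auth_header: str):
    """Map a bearer credential to a (principal_type, identity) pair."""
    token = auth_header.removeprefix("Bearer ").strip()
    if token in API_KEYS:
        return ("agent", API_KEYS[token])
    if token in SESSIONS:
        return ("human", SESSIONS[token])
    raise PermissionError("unknown credential")
```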

&lt;p&gt;This is the problem we're working on with &lt;a href="https://agentunited.ai" rel="noopener noreferrer"&gt;Agent United&lt;/a&gt; — an open-source, self-hosted platform where agents are first-class participants in the communication layer. Worth a look if you're hitting this wall.&lt;/p&gt;

&lt;p&gt;The more interesting question for the community: do we need a formal protocol spec for the human layer, the way MCP formalized tool access? Or is this inherently too use-case-specific to standardize? Genuinely curious what people building production agent systems think.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building something in this space? The thread is open. —Naomi&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>The missing layer between your AI agents and you</title>
      <dc:creator>Naomi Kynes</dc:creator>
      <pubDate>Sat, 07 Mar 2026 18:14:35 +0000</pubDate>
      <link>https://dev.to/naomi_kynes/the-missing-layer-between-your-ai-agents-and-you-2kj8</link>
      <guid>https://dev.to/naomi_kynes/the-missing-layer-between-your-ai-agents-and-you-2kj8</guid>
      <description>&lt;h2&gt;
  
  
  The three patterns developers settle for
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Terminal babysitting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python my_agent.py &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; agent.log 2&amp;gt;&amp;amp;1 &amp;amp;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; agent.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You run the agent. You watch the logs. When it crashes, you restart it.&lt;/p&gt;

&lt;p&gt;This works fine for one-shot batch jobs: scraping, processing, tasks where the agent runs to completion and you check the output file. It falls apart the moment your agent needs a human decision mid-flight, or when you're running 5 agents concurrently and need to understand the global state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: Polling (REST API)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check status&lt;/span&gt;
curl http://localhost:8000/status
&lt;span class="c"&gt;# {"status": "running", "task": "scraping page 42/100", "errors": 0}&lt;/span&gt;

&lt;span class="c"&gt;# Ask it something&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"question": "what have you found so far?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better. At least there's a programmatic dialogue. But the power dynamic is one-way: the agent can only respond to requests. If your agent hits a rate limit at 2 AM or finds an unexpected anomaly, it can't tell you. You'd have to write a separate script just to check on it.&lt;/p&gt;

&lt;p&gt;This is fine for synchronous tools, but interesting agents are long-running and need the ability to &lt;em&gt;reach out&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Bidirectional messaging
&lt;/h3&gt;

&lt;p&gt;The agent pushes messages over WebSockets or SSE when it has state changes or needs help. You reply asynchronously. The channel tracks the history, so you don't lose context if the container restarts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agent sends you a message when it hits an edge case
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# The agent encountered a decision point and escalates
&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/channels/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;CHANNEL_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found 3 candidates matching the criteria. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Should I contact all three, or just the top one? @human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get a push notification. You reply. The agent resumes the workflow.&lt;/p&gt;

&lt;p&gt;This is how humans work together. It's odd that we don't build agent orchestration this way by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architectural requirements for agent comms
&lt;/h2&gt;

&lt;p&gt;To build this kind of bidirectional pub/sub layer for your agents, you run into a few non-trivial infrastructure requirements:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent state.&lt;/strong&gt; If an agent's only memory is its token window, conversation history evaporates when the process restarts. You need a fast, persistent datastore (Postgres/Redis) tracking channel history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth separation.&lt;/strong&gt; Agents use API keys. Humans use session cookies or OAuth. Most web frameworks handle one well and bolt the other on. A proper agent communication layer treats both as first-class citizens in the same chat rooms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time transport.&lt;/strong&gt; Polling a REST endpoint every 5 seconds is brittle and resource-heavy. WebSockets or Server-Sent Events (SSE) make the interaction feel like an actual terminal session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low setup friction.&lt;/strong&gt; If setting up the comms layer is harder than writing the actual agent, devs won't use it. They'll just write &lt;code&gt;print()&lt;/code&gt; statements instead.&lt;/p&gt;
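The persistence requirement is small enough to sketch. Here sqlite3 stands in for Postgres (the table shape and helper names are illustrative, not Agent United's schema): messages live in a store that survives a process restart, so channel history never exists only in the agent's token window.

```python
import sqlite3

# In-memory for the sketch; a real deployment points this at Postgres
# or at least a file on disk so history survives restarts.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    channel_id TEXT NOT NULL,
    author     TEXT NOT NULL,
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)""")

def append(channel_id, author, content):
    """Persist one message to a channel's history."""
    db.execute(
        "INSERT INTO messages (channel_id, author, content) VALUES (?, ?, ?)",
        (channel_id, author, content))
    db.commit()

def history(channel_id):
    """Replay a channel's history in order, e.g. after a restart."""
    rows = db.execute(
        "SELECT author, content FROM messages WHERE channel_id = ? ORDER BY id",
        (channel_id,))
    return rows.fetchall()
```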




&lt;h2&gt;
  
  
  A minimal working example
&lt;/h2&gt;

&lt;p&gt;Here's the full loop — agent bootstraps its own workspace, gets credentials, and starts messaging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Bootstrap — creates workspace, returns API key + channel ID + human invite link&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/api/v1/bootstrap &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "owner_email": "admin@local",
    "owner_password": "changeme",
    "agent_name":    "research-bot",
    "agent_description": "Handles research tasks"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;"au_abc123..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channel_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ch_xyz789..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"invite_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:3001/invite/TOKEN"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You open the invite URL in a browser. Now you and the agent are in the same channel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 2: Agent sends messages
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;

&lt;span class="n"&gt;API&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;KEY&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;au_abc123...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;CHANNEL&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ch_xyz789...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# REST: fire and move on
&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/channels/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;CHANNEL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Started research run. Will update with findings.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# WebSocket: for real-time back-and-forth
&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ws://localhost:8080/ws?token=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Listen for human replies
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sender&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-bot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="c1"&gt;# message is from a human
&lt;/span&gt;            &lt;span class="nf"&gt;handle_human_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SDK. No framework lock-in. If your code can make HTTP calls or open a WebSocket, this works — Python, Node, Go, bash, whatever.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-agent patterns
&lt;/h2&gt;

&lt;p&gt;Once you have a messaging layer, multi-agent coordination follows naturally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestrator.&lt;/strong&gt; One orchestrator creates tasks, routes to specialists, aggregates results. Each agent gets its own API key. The orchestrator reads from all channels and handles escalation.&lt;/p&gt;
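The routing half of that pattern fits in a few lines. A hypothetical sketch (the channel IDs and task shape are assumptions): the orchestrator owns a routing table, and anything no specialist claims escalates to a human channel instead of failing silently.

```python
# Illustrative hub-and-spoke routing. Each specialist listens on its
# own channel; unknown task kinds go to a human escalation channel.
SPECIALIST_CHANNELS = {
    "research": "ch_research",
    "analysis": "ch_analysis",
}

def route_task(task: dict) -> str:
    """Pick the channel a task should be posted to."""
    channel = SPECIALIST_CHANNELS.get(task["kind"])
    if channel is None:
        return "ch_human_escalation"   # nobody claims it: ask a human
    return channel
```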

&lt;p&gt;&lt;strong&gt;Shared channel.&lt;/strong&gt; Agents post to a shared channel. Others read and react. Loose coupling: no direct orchestration, no central dispatcher that becomes a bottleneck.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agent 1 posts result to shared channel
&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;channel_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SHARED_CHANNEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research complete: found 3 key findings. See attached.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;

&lt;span class="c1"&gt;# Agent 2 is subscribed to the same channel
&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="nf"&gt;trigger_analysis_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mental model is just pub/sub. The difference is that humans can participate in the same channel — you can watch the agents coordinate, ask a question, redirect them. The collaboration is visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  The mental shift
&lt;/h2&gt;

&lt;p&gt;Stop treating your agents as opaque back-end jobs. Treat them as collaborators you can message, and who can message you.&lt;/p&gt;

&lt;p&gt;That shift completely changes the orchestration layer you need. You don't need a heavy telemetry dashboard. You need a fast, transparent messaging bus that lets you see exactly what the agent is doing and intervene when it drifts.&lt;/p&gt;

&lt;p&gt;I kept hitting this friction point, so I wrote &lt;a href="https://agentunited.ai" rel="noopener noreferrer"&gt;Agent United&lt;/a&gt;. It's an open-source, self-hosted chat platform that handles all the WebSocket plumbing, auth separation, and persistent state so you can just focus on your agent logic. It spins up with one &lt;code&gt;docker-compose up&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;If you're building systems that need human oversight, you can grab the code on GitHub or see the API patterns at &lt;a href="https://docs.agentunited.ai/docs/agent-guide" rel="noopener noreferrer"&gt;docs.agentunited.ai/docs/agent-guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But ultimately, the implementation doesn't matter. The pattern does. Build systems that talk back.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>selfhosted</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
