DEV Community: Rajeev Ramani

My Agent Did Everything Right, Then Gave Up

Rajeev Ramani — Wed, 18 Feb 2026 00:42:52 +0000

In my last post, I talked about the metadata problem — how deciding what to expose through MCP was harder than the protocol itself. I wrote about the token tax, about how tool definitions eat context before the model does anything useful.

I took that lesson to heart. Our MCP responses became minimal: {"ok": true} with a reference to what was created. Lean. Token-efficient. Exactly what I thought agents needed.

I was wrong about who I was optimising for.

The Trace

We’ve been testing two Python agents against Flowplane’s MCP server — an ops agent for diagnostics and a dev agent for deployments. I gave the dev agent a straightforward task:

Deploy httpbin at localhost:8001 on path /v1/api/users -> /anything at port 10018

Here’s what the agent did:

-> dev_preflight_check({"path":"/v1/api/users","listen_port":10018,"cluster_name":"httpbin-svc"})
OK: port available, no name conflicts

-> cp_create_cluster({"name":"httpbin-svc","endpoints":[{"address":"localhost","port":8001}]})
OK: {"ok":true,"ref":{"type":"cluster","name":"httpbin-svc","id":"a3f1..."}}

-> cp_create_route_config({"name":"httpbin-routes","virtual_hosts":[
     {"name":"httpbin-vhost","domains":["*"],"routes":[
       {"name":"httpbin-route","match":{"prefix":"/v1/api/users"},"route":{"cluster":"httpbin-svc","prefix_rewrite":"/anything"}}
     ]}
   ]})
OK: {"ok":true,"ref":{"type":"route_config","name":"httpbin-routes","id":"b7e2..."}}

Three calls in, the agent had done everything right. Correct creation order, good naming, inline virtual hosts and routes in a single call. The reasoning was flawless.

Then it fell apart.

-> cp_list_virtual_hosts({"route_config_id":"b7e2..."})
OK: [{"name":"httpbin-vhost","id":"c4d8..."}]

-> cp_get_virtual_host({"id":"c4d8..."})
OK: {"name":"httpbin-vhost","domains":["*"],...}

-> cp_list_routes({"virtual_host_id":"c4d8..."})
OK: [{"name":"httpbin-route","id":"d9a1..."}]

-> cp_get_cluster({"name":"httpbin-svc"})
OK: {"name":"httpbin-svc","endpoints":[...]}

-> cp_list_listeners({"route_config":"d9a1..."})
OK: []

Five verification calls. The agent was checking its own work — confirming that the virtual hosts, routes, and cluster it just created actually existed. It already knew they did. It created them. But it didn’t trust the response enough to move on.

Worse, look at that last call. It passed a route UUID where a route_config name was expected. Got zero results. And then it stopped. Never created the listener.

End result: cluster exists, route config exists with proper virtual hosts and routes — all sitting in the database, correctly configured. But the agent concluded the deployment failed because it couldn’t verify its own work.
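One way to keep an empty result from being read as failure is to have the lookup reject a reference it does not recognise instead of returning an empty list. Here is a minimal sketch of that idea, assuming a hypothetical in-memory store and a hypothetical list_listeners helper. Neither is Flowplane's actual code.

```python
import re

# Hypothetical in-memory state mirroring the trace: one route config, no listeners yet.
ROUTE_CONFIGS = {"httpbin-routes"}
LISTENERS = []  # each item: {"name": ..., "route_config": ...}

UUID_LIKE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def list_listeners(route_config: str) -> dict:
    """Return listeners for a route config, or an explicit error for an unknown reference."""
    if route_config not in ROUTE_CONFIGS:
        hint = "expected a route_config name"
        if UUID_LIKE.match(route_config):
            hint += ", but this looks like a UUID"
        return {"ok": False, "error": f"unknown route_config '{route_config}': {hint}"}
    matches = [item for item in LISTENERS if item["route_config"] == route_config]
    return {"ok": True, "listeners": matches}
```

With this shape, an empty list can only mean the route config exists and nothing is attached to it yet, which is a very different signal from "I could not find what you asked about".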

The Wrong Kind of Efficient

When I wrote about the token tax in the first post, I was thinking about tool definitions — the schemas and descriptions that eat context before anything happens. So we made our responses lean too. Create a route config with inline virtual hosts and routes? Here’s your confirmation:

{"ok": true, "ref": {"type": "route_config", "name": "httpbin-routes", "id": "b7e2..."}}

A human developer sees that and knows exactly what happened. They’ve read the docs, they understand the API contract, and they’ve written application code that handles this response based on that understanding. The response doesn’t need to be self-explanatory — the knowledge lives in the consuming application code, not in the payload.

An agent has none of that. It sees ok: true and a reference to the top-level object. Were the inline virtual hosts created? How many routes ended up in the database? Is the route config ready to be attached to a listener, or does it need more configuration? The response doesn’t say.

So the agent does what any reasonable system would do when it lacks confidence: it investigates. It calls cp_list_virtual_hosts to confirm they exist. It calls cp_get_virtual_host to check the details. It calls cp_list_routes to verify the routes landed. Each call burns tokens and introduces another point where things can go sideways. In this trace, the agent passed a route UUID to an endpoint that expected a route_config name, a consistency gap in our API surface that deserves its own post.

I was optimising for the wrong consumer. Token-efficient responses are great when your consumer already has the mental model. When your consumer is building the mental model from your responses alone, brevity becomes ambiguity.

What the Agent Actually Needed

After studying the trace, the fix was straightforward. Not more data — more relevant data:

{
  "ok": true,
  "ref": {"type": "route_config", "name": "httpbin-routes", "id": "b7e2..."},
  "created": {
    "virtual_hosts": 1,
    "routes": 1
  },
  "next_step": "Create a listener referencing route_config 'httpbin-routes' with cp_create_listener"
}

Three additions:

Confirmation of nested effects. The created field tells the agent that its inline virtual hosts and routes were actually persisted. No verification calls needed.

Next step guidance. The next_step field tells the agent what to do now. This sounds hand-holdy, but agents don’t have muscle memory. A DevOps engineer who’s deployed twenty services knows the listener comes next. An agent running this workflow for the first time — or the hundredth time with a blank context window — doesn’t.

Names, not just IDs. Notice the next step says route_config ‘httpbin-routes’, not route_config ‘b7e2...’. Our agent knew the name “httpbin-routes” because it chose that name. But the response ecosystem kept handing back UUIDs, and the agent started using those instead. When it passed a UUID to an endpoint expecting a name, it got zero results and assumed failure.
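The three additions can be sketched as a small response builder. This is an illustration rather than Flowplane's implementation: the create_response function and its parameters are hypothetical names, and the ID is left truncated exactly as it appears in the trace.

```python
from typing import Optional


def create_response(ref_type: str, name: str, resource_id: str,
                    created: Optional[dict] = None,
                    next_step: Optional[str] = None) -> dict:
    """Build a create acknowledgment that also reports nested effects and the next step."""
    response = {"ok": True, "ref": {"type": ref_type, "name": name, "id": resource_id}}
    if created:
        # Counts of nested resources persisted by this call, e.g. inline virtual hosts.
        response["created"] = dict(created)
    if next_step:
        # Refer to resources by the name the agent chose, not by generated IDs.
        response["next_step"] = next_step
    return response


resp = create_response(
    "route_config", "httpbin-routes", "b7e2...",
    created={"virtual_hosts": 1, "routes": 1},
    next_step="Create a listener referencing route_config 'httpbin-routes' with cp_create_listener",
)
```

Keeping created and next_step optional means simple resources still get the lean acknowledgment, and only calls with nested effects pay for the extra fields.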

The Verification Loop

There’s a pattern here worth naming: the verification loop. An agent creates a resource, gets a minimal acknowledgment, then spends 3-5 additional calls confirming what it just did. Each call costs tokens. Each call introduces a chance for ID/name confusion or hitting unexpected edge cases. And the information was available at creation time — we just didn’t return it.

In our trace, the verification loop consumed more tokens than the actual deployment. Five read calls to verify two create calls. The agent was doing more reading than writing, and all of it was unnecessary.

The fix isn’t to prevent agents from making verification calls. It’s to make them unnecessary. If your create response confirms what was created, includes the side effects, and points to the next step, the agent has no reason to look back.
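If you want to spot this pattern in your own traces, a rough check is to count how many read calls follow the last write. The sketch below assumes reads and writes can be told apart by tool-name prefix, which is a convention I am assuming for illustration. The tool names are the ones from the trace above.

```python
# Tool names from the trace above, in call order.
TRACE = [
    "dev_preflight_check",
    "cp_create_cluster",
    "cp_create_route_config",
    "cp_list_virtual_hosts",
    "cp_get_virtual_host",
    "cp_list_routes",
    "cp_get_cluster",
    "cp_list_listeners",
]

# Assumption: reads and writes can be recognised by name prefix.
READ_PREFIXES = ("cp_list_", "cp_get_")
WRITE_PREFIXES = ("cp_create_",)


def trailing_reads(trace: list[str]) -> int:
    """Count read calls made after the last write call."""
    count = 0
    for name in reversed(trace):
        if name.startswith(WRITE_PREFIXES):
            break
        if name.startswith(READ_PREFIXES):
            count += 1
    return count


writes = sum(name.startswith(WRITE_PREFIXES) for name in TRACE)
print(f"{writes} writes, {trailing_reads(TRACE)} reads after the last write")
```

Applied to the trace above, it finds two create calls followed by five reads, with no further write after them.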

The Tension

There’s a real tension here that I don’t think has a clean answer yet.

Token efficiency says: return less. Every byte in the response is a byte the model has to process. Keep it lean.

Agent confidence says: return more. Every ambiguity in the response triggers verification behaviour. The agent will spend those tokens anyway — either reading your response or making follow-up calls. Follow-up calls cost more.

My current thinking: responses should be informationally dense but structurally simple. A flat created object with counts is cheaper than the agent making three list calls. A one-line next_step string is cheaper than the agent reasoning about workflow ordering from scratch. You’re not adding bloat — you’re preventing it downstream.

What I Didn’t Expect

In the first post, I described a two-layer challenge: tool design (what to expose) and metadata design (how to describe it). I’d now add a third: response design — what you send back after the tool runs.

REST has conventions for status codes and resource representations. GraphQL lets clients specify what they want back. But neither tradition accounts for a consumer that needs to build confidence about what just happened and decide what to do next, all from a single response.

API design for agents will become its own discipline. It borrows from REST, from GraphQL, from conversational UI design — but it’s not quite any of them. The consumer isn’t rendering a page or populating a cache. It’s making a decision. Your response is the input to that decision.

We’re still early in figuring this out. The pattern I’ve landed on, confirmation of effects plus next-step guidance, works for our deployment workflows. Whether it generalises, I don’t know yet. But the principle feels right: design your responses for the consumer that has to reason about them, not the one that already knows what they mean.

If you're building MCP servers and running into similar patterns — or if you've found different solutions — I'd love to hear about it. You can find Flowplane at github.com/rajeevramani/flowplane, or connect with me on LinkedIn.

MCP Made Me Rethink Who My Software Serves

Rajeev Ramani — Tue, 10 Feb 2026 01:13:02 +0000

The Hard Part of MCP Isn’t the Protocol

The Model Context Protocol is everywhere. Claude Desktop, Cursor, Windsurf—everyone’s racing to connect AI to tools.

I spent the last few weeks adding MCP to Flowplane. What started as “expose some endpoints to Claude” became something I didn’t expect: a complete rethink of who my software serves.

Two Audiences, Two Servers

MCP itself is simple—JSON-RPC 2.0, well-defined message types. I had it working in a day.
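To give a sense of how small the protocol surface is, here is a sketch of a tool invocation as a JSON-RPC 2.0 message. The tools/call method and its name/arguments parameters come from the MCP specification. The tool_call_request helper is a hypothetical name, and the tool name and arguments are borrowed from the deployment trace in the post at the top of this page.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value works


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


print(tool_call_request("cp_create_cluster", {
    "name": "httpbin-svc",
    "endpoints": [{"address": "localhost", "port": 8001}],
}))
```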

The hard part was figuring out what to expose, and what and how much metadata an agent would need to use the platform effectively.

Flowplane manages Envoy proxies. My first instinct: wrap the infrastructure primitives—clusters, listeners, routes—in MCP tools. Ship it.

Then I realised the MCP layer in Flowplane wasn’t serving one kind of consumer. It had to serve two very different ones. A customer’s agent doesn’t want to “create a cluster with round-robin load balancing.” It wants to call getUser(id: "123"). It doesn’t care about Envoy; it cares about the APIs behind it. An internal DevOps agent, on the other hand, needs to create and manage APIs and their corresponding resources. I wasn’t just exposing tools. I was designing an AI-facing platform layer.

So I decided to serve two different categories of tools.

Control Plane tools for platform engineering agents who build the gateway—19 tools for managing clusters, listeners, routes, and filters.

Gateway API tools for everyone else: dynamically generated from OpenAPI specs or learned from traffic. When you call these tools, the requests go through Envoy with the same JWT validation and rate limiting as any other client.

The Metadata Problem

MCP tools need good metadata—names, descriptions, schemas. Without them, AI can’t figure out which tool to use.

The gap between a tool with an OpenAPI spec and a fallback generated from path patterns is brutal. One has rich descriptions and typed parameters. The other technically exists.
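To make that gap concrete, here is a sketch that builds a tool definition two ways: from an OpenAPI operation, and from nothing but a path pattern. The name, description, and inputSchema fields follow the MCP tool definition shape. Both functions and the sample operation are hypothetical and are not Flowplane's generator.

```python
import re


def tool_from_openapi(operation: dict) -> dict:
    """Build a tool definition from an OpenAPI operation object."""
    params = operation.get("parameters", [])
    return {
        "name": operation["operationId"],
        "description": operation.get("summary", ""),
        "inputSchema": {
            "type": "object",
            "properties": {p["name"]: p.get("schema", {}) for p in params},
            "required": [p["name"] for p in params if p.get("required")],
        },
    }


def tool_from_path(path: str, method: str) -> dict:
    """Fallback: derive a name from the path pattern; parameters are untyped strings."""
    params = re.findall(r"\{(\w+)\}", path)
    slug = re.sub(r"[^a-zA-Z0-9]+", "_", path).strip("_")
    return {
        "name": f"{method.lower()}_{slug}",
        "description": f"{method.upper()} {path}",
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": "string"} for p in params},
            "required": params,
        },
    }


operation = {
    "operationId": "getUser",
    "summary": "Fetch a user by ID",
    "parameters": [
        {"name": "id", "in": "path", "required": True, "schema": {"type": "string"}}
    ],
}
rich = tool_from_openapi(operation)
bare = tool_from_path("/users/{id}", "get")
```

The first result carries a meaningful name and summary. The second is named get_users_id and described only as "GET /users/{id}", which is the "technically exists" case.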

I built two approaches to close this gap:

OpenAPI extraction — When you import a spec, Flowplane pulls operationId, summaries, and request/response schemas automatically. Every route gets rich metadata immediately. Full confidence.
Schema learning — For APIs without specs, Flowplane observes traffic and infers schemas from actual requests and responses. Field types, required parameters, response shapes—all learned over time. Confidence grows with volume.

The goal: no route should ever be “not ready” for MCP. Whether you have perfect documentation or none at all, the tools should be usable.
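The schema-learning idea can be illustrated with a toy version: look at observed JSON objects, record which types each field takes, and treat a field as required only if it appears in every sample. This is a deliberately simplified sketch with hypothetical function names and sample data, not the algorithm Flowplane uses.

```python
from collections import defaultdict


def json_type(value) -> str:
    """Map a Python value decoded from JSON to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(samples: list[dict]) -> dict:
    """Infer top-level field types and required fields from observed JSON objects."""
    types = defaultdict(set)
    seen = defaultdict(int)
    for sample in samples:
        for field, value in sample.items():
            types[field].add(json_type(value))
            seen[field] += 1
    return {
        "type": "object",
        "properties": {f: {"type": sorted(t)} for f, t in types.items()},
        # A field counts as required only if every sample contained it.
        "required": sorted(f for f, n in seen.items() if n == len(samples)),
    }


samples = [
    {"id": "123", "name": "Ada", "age": 36},
    {"id": "456", "name": "Lin"},
]
schema = infer_schema(samples)
```

Here age shows up in only one of the two samples, so it is typed but not required. More traffic would tighten or relax that guess, which is where "confidence grows with volume" comes from.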

It’s not glamorous work. But it’s the difference between tools AI can use effectively and tools that just exist.

The Token Tax

Every MCP tool costs tokens before you use it.

Tool definitions—names, descriptions, parameter schemas—live in the model’s context window. 19 Control Plane tools are manageable. Hundreds of Gateway API tools, one per endpoint, are not.

With a large API surface, you’re spending context just describing what’s available. The model hasn’t done anything yet.
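A quick way to get a feel for the tax is to serialise the tool definitions and measure them. The sketch below uses a made-up generated tool and a crude four-characters-per-token rule of thumb, so treat both the make_tool shape and the ratio as assumptions, and the 300 as a stand-in for "hundreds". Real tokenizers and real schemas will differ.

```python
import json


def definition_chars(tools: list[dict]) -> int:
    """Total characters of the serialised tool definitions sent to the model."""
    return sum(len(json.dumps(tool, separators=(",", ":"))) for tool in tools)


def make_tool(i: int) -> dict:
    # Hypothetical generated tool, one per endpoint.
    return {
        "name": f"endpoint_{i}",
        "description": f"Call endpoint {i} through the gateway",
        "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
    }


for n in (19, 300):
    chars = definition_chars([make_tool(i) for i in range(n)])
    # Crude assumption: roughly 4 characters per token; real tokenizers vary.
    print(f"{n} tools: {chars} chars, ~{chars // 4} tokens")
```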

I’m still figuring out the right approach: tool categories, lazy loading, trimming verbose schemas.

The irony: good metadata helps AI pick the right tool, but too much crowds out the actual conversation. Right now, finding the balance feels more like art than science.

The Point

MCP isn’t complicated. Deciding what to expose is.

The usual API questions still apply—audience, capabilities, permissions. But MCP adds a new one: what information does the model need to choose the right tool, and how do I surface it?

That’s where I spent most of my time. The protocol took a day. The metadata problem is ongoing.

Everyone’s Building AI Apps. I Used AI to Build Infrastructure

Rajeev Ramani — Sun, 04 Jan 2026 07:10:28 +0000

If you’re paying attention to tech news, everyone’s building AI applications. RAG pipelines, agent frameworks, LLM wrappers. I went the other way.

I used AI to build infrastructure.

Specifically, I spent the last few months building a control plane for Envoy, the proxy that powers service meshes like Istio and sits behind most modern API gateways.

Here’s what makes this interesting: I’d never touched Envoy’s xDS protocol before starting—complete novice. xDS is Envoy’s configuration protocol—a set of gRPC APIs (LDS, RDS, CDS, EDS, SDS) that let you dynamically configure the proxy without restarts. It’s powerful. It’s also dense protobuf that demands deep protocol knowledge.

The thing that surprised me most was just how complex xDS actually is. Not conceptually—the ideas are straightforward. But the implementation details, the edge cases, the way resources reference each other. I’d underestimated it.

So I leaned heavily on Claude Code. And when I say heavily, I mean 70%+ of Flowplane was AI-assisted. Not just autocomplete suggestions—actual architectural decisions, debugging sessions, test generation.

What emerged is a control plane that lets you configure Envoy through REST APIs instead of writing raw protobuf. Create clusters, routes, listeners, and filters, all through JSON that Flowplane translates into xDS. It supports 13 filter types in total, including JWT auth, OAuth2, rate limiting, and header mutation. Resources are multi-tenant and team-scoped.

The most interesting part is the learning engine. Point it at traffic flowing through Envoy, and it infers API schemas from observed request/response patterns. No documentation? Watch the traffic, learn the shape.

The contrarian bet here isn’t Flowplane itself—it’s the approach. While everyone races to build the next AI application, there’s a whole layer of infrastructure that’s underserved. Envoy has millions of deployments. How many open source control planes exist for it? Not many.

Open source matters. Envoy proved that a proxy could be a commodity. Control planes exist too—Istio, Kuma, Contour, Envoy Gateway—all open source, all free. But they’re Kubernetes-first. Envoy Gateway recently added experimental standalone support, but it’s still built around CRDs and the Gateway API. If you want to run Envoy standalone—as an edge gateway, on VMs, without K8s—you’re either writing xDS config by hand or paying for enterprise solutions.

I’m not sure if Flowplane is the answer. But I’m increasingly convinced that AI doesn’t just help you build AI apps faster—it lets a single person tackle infrastructure problems that previously needed teams.

What infrastructure is underserved in your stack? If you’re running Envoy outside Kubernetes, or just curious about the project, reach out on GitHub or find me on LinkedIn.

I have been trying to solve a personal problem of configuring Envoy dynamically, and I think this may benefit others.

Rajeev Ramani — Wed, 24 Dec 2025 03:28:10 +0000

Flowplane: Configure Envoy with REST Instead of xDS

Rajeev Ramani ・ Dec 24 '25

#tooling #devops #networking #api

Flowplane: Configure Envoy with REST Instead of xDS

Rajeev Ramani — Wed, 24 Dec 2025 01:35:44 +0000

Envoy is powerful, but configuring it can be painful.

If you've ever tried to set up a dynamic Envoy control plane, you know the drill: work with xDS protocol libraries like go-control-plane, construct deeply nested resource definitions, implement state management and versioning, figure out why your CDS update isn't being picked up. It's a lot of ceremony just to route traffic to a backend.

I built Flowplane to skip all that.

What is Flowplane?

Flowplane is an open source control plane that exposes REST APIs for managing Envoy configuration. Instead of working with xDS libraries and nested structs, you POST JSON to create clusters, routes, and listeners. Flowplane handles the xDS protocol and serves configuration to your Envoy proxies via gRPC.

# Create a cluster
curl -X POST http://localhost:8080/api/v1/clusters \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team": "platform",
    "name": "httpbin-cluster",
    "endpoints": [{"host": "httpbin.org", "port": 443}],
    "useTls": true
  }'

That's it. No xDS library integration, no deeply nested resource construction required.

How is Flowplane different from Envoy Gateway?

If you're familiar with the Envoy ecosystem, you might be wondering how Flowplane compares to Envoy Gateway. They solve similar problems but take different approaches.

Envoy Gateway is the official Envoy project for managing proxies. It implements the Kubernetes Gateway API, so you configure it with CRDs like Gateway and HTTPRoute via kubectl apply. It's designed for teams already working in Kubernetes who think in those abstractions.

Flowplane exposes Envoy's xDS concepts directly through REST. You're still working with clusters, listeners, and routes — the same mental model as Envoy — but through JSON and HTTP rather than Go/Python code or Kubernetes manifests.

	Envoy Gateway	Flowplane
Config model	Kubernetes Gateway API	REST API (JSON)
Primary environment	Kubernetes-native	Runs anywhere
How you configure	`kubectl apply`	`curl`, CLI, or Web UI
Multi-tenancy	K8s namespace isolation	Built-in team scoping

Choose Envoy Gateway if: You're on Kubernetes and want tight integration with the Gateway API ecosystem.

Choose Flowplane if: You're running Envoy outside Kubernetes, want API-driven configuration for CI/CD integration, or prefer working with REST over YAML manifests.

Why I built this

Envoy is a battle-tested proxy that powers infrastructure at Lyft, Google, and Stripe. It's fast, extensible, and includes features other gateways charge for — OAuth2, JWT authentication, rate limiting, external authorization — all built in.

The problem is the learning curve.

Envoy's configuration is notoriously complex. You need to understand xDS protocols, work with verbose Go or Python libraries to construct resource definitions, and piece together listeners, clusters, and filter chains. For teams that just want to route traffic and secure their APIs, it's a steep hill to climb.

There are existing open source projects that help you build Envoy control planes, but most still require working directly with xDS libraries and understanding protocol intricacies, or they assume you're running Kubernetes. If you're on ECS, VMs, or bare metal and just want an API gateway with OAuth2 and rate limiting, a full service mesh is overkill.

I built Flowplane to bridge that gap — a RESTful abstraction that gives teams access to Envoy's powerful features without requiring Kubernetes or deep xDS expertise. POST some JSON, get a working proxy config.

Key features

REST API for everything — Clusters, listeners, routes, filters, secrets. All manageable through standard HTTP endpoints.

CLI support — Prefer the command line? Flowplane includes a CLI for managing resources.

13 HTTP filters out of the box — JWT auth, OAuth2, rate limiting, CORS, header mutation, external authorization. Configure them with JSON, not nested code structures.

Multi-tenant by default — Resources are scoped to teams with token-based auth. Useful when multiple teams share proxy infrastructure.

API schema learning — This one's a bit different. Flowplane can capture traffic samples and infer JSON schemas from observed requests/responses. Handy for documenting APIs that don't have specs.

Web UI included — A SvelteKit dashboard for managing resources if you prefer clicking over curling.

Quick start

Docker (Recommended)

docker run -d \
  --name flowplane \
  -p 8080:8080 \
  -p 50051:50051 \
  -v flowplane_data:/app/data \
  -e FLOWPLANE_DATABASE_URL=sqlite:///app/data/flowplane.db \
  ghcr.io/rajeevramani/flowplane:latest

Binary

Download from GitHub Releases:

# Linux (x86_64)
curl -LO https://github.com/rajeevramani/flowplane/releases/download/v0.0.11/flowplane-x86_64-unknown-linux-gnu.tar.gz
tar xzf flowplane-x86_64-unknown-linux-gnu.tar.gz

# macOS (Apple Silicon)
curl -LO https://github.com/rajeevramani/flowplane/releases/download/v0.0.11/flowplane-aarch64-apple-darwin.tar.gz
tar xzf flowplane-aarch64-apple-darwin.tar.gz

# Run
./flowplane-*/flowplane

Access points

Service	URL
API	http://localhost:8080/api/v1/
UI	http://localhost:8080/
Swagger UI	http://localhost:8080/swagger-ui/
xDS (gRPC)	localhost:50051

What's next

Flowplane is at v0.0.11 — functional but early. I'm using it for my own projects and looking for feedback from others who've felt the Envoy configuration pain.

If you've struggled with Envoy configuration or want a simpler path to its features, give it a try:

GitHub: github.com/rajeevramani/flowplane
Docs: Available in the repo under /docs

Star the repo, open an issue with feedback, or drop a comment here. I'd love to hear what's working and what's missing.