DEV Community

Olamide Adebayo

We Replaced Hours of Manual API Testing With an AI Agent Running Integration Tests in Real Time

— and it didn't just write the tests. It debugged Kubernetes RBAC, fixed race conditions, and shipped all 40 passing.



Let me tell you what happened this week, because I think it signals something real about where developer workflows are heading.

I'm building a PaaS that lets you deploy MCP (Model Context Protocol) servers on Kubernetes with a single API call. This week we shipped a new feature branch: replica management — the ability to start, stop, restart, and scale running MCP servers.

Standard stuff. But the way we tested it was anything but.


The Old Way: Postman (and Why It Slows You Down)

Most backend engineers know the drill.

You open Postman. You find the right collection. You manually update the Authorization header with a freshly copied token. You click Send on the first endpoint, copy an ID from the response, paste it into the next request, click Send again. You spend 20 minutes setting up a flow that tests 5 endpoints in sequence — not because the logic is hard, but because the tooling is manual.

And that's for tests you've already built.

For a new feature branch? You're starting from scratch. Writing request bodies, thinking through edge cases, figuring out what state needs to exist before you can even test step 3. It eats your afternoon.

Yes, Postman has an MCP server now. That's progress. But it still requires you to open an application, navigate a GUI, and manage state by hand. The interface changed; the friction didn't disappear.



Enter Bruno CLI — and the Shift That Changes Everything

Bruno is an open-source API client. What makes it different isn't the UI — it's that your entire collection lives as plain text .bru files, committed to your repo, right next to your source code.

That one decision unlocks everything.

Because when your tests are files in a repo, an AI agent can read them, write them, run them, and iterate on failures — all without you touching a browser.

Here's the command that ran our entire 10-step integration test suite:

```shell
bru run "Replica Management" --env Local --env-var authToken="$TOKEN"
```

That's it. One line. 40 assertions. 1.3 seconds.
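Each of those steps lives as a plain-text .bru file. As a rough sketch of what a single step looks like (the URL, body, and response fields here are illustrative, not our actual collection):

```bru
meta {
  name: Stop Server
  type: http
  seq: 3
}

post {
  url: {{baseUrl}}/v1/servers/{{serverId}}/stop
  body: none
  auth: inherit
}

assert {
  res.status: eq 200
  res.body.ready_replicas: eq 0
}
```

Because it's a file, it diffs, reviews, and versions like any other code.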


What the Agent Actually Did

We didn't just ask Claude to "write some tests." We gave it access to the running system — the codebase, kubectl, the live API — and let it work.

Here's what happened:

1. It read the feature branch code to understand the new stop/start/restart/scale endpoints before writing a single test.

2. It wrote 10 sequential test steps as .bru files, each building on the last — finding a live deployment, stopping it, checking status, starting it, restarting, scaling up to 2 replicas, scaling back down, validating that stdio transport correctly rejects a scale-to-2 request with a 400 error, then restoring state.

3. Tests failed. The agent debugged them in real time.

The first run showed 401 Unauthorized. The agent found that collection.bru had a hardcoded expired JWT and changed it to {{authToken}} — parameterized via env var.
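The corrected collection-level auth block ends up looking something like this (a sketch; Bruno resolves {{authToken}} from the environment at run time):

```bru
auth {
  mode: bearer
}

auth:bearer {
  token: {{authToken}}
}
```

The real token is injected per run via --env-var authToken="$TOKEN", so no secret ever gets committed.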

The second run showed 502 on the start endpoint. The agent ran kubectl to check the operator logs. Found the operator was stuck in Pending because a ClusterRole was missing the apply verb for services and statefulsets. It diagnosed it, documented the fix.
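For context: RBAC has no literal apply verb; server-side apply requests are authorized through patch (plus create for objects that don't exist yet). So the ClusterRole fix amounts to a rule along these lines (names illustrative, not our actual manifest):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mcp-operator   # illustrative name
rules:
  - apiGroups: [""]      # core group: Services
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["apps"]  # apps group: StatefulSets
    resources: ["statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
```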

The third run: 502 again, different error. The ScaleMCPServer function was calling GetScale on a Deployment, then UpdateScale — but the operator was reconciling the Deployment 5 times per second, changing its resourceVersion between our two calls. Classic Kubernetes optimistic concurrency conflict. The agent read the Go client code, identified that the retry loop was reusing a stale object, and refactored both ScaleMCPServer and UpdateMCPServer to re-fetch inside the retry closure on every attempt.

40/40. All green.



This Is Not About Replacing Tests — It's About Removing the Gap

Here's what I think gets missed in conversations about AI and testing.

The narrative is usually: "AI writes your unit tests." Fine. But unit tests mock everything. They're fast feedback on logic, not on the real system.

What we're talking about is different. This is an agent that:

  • Has live access to the running API
  • Understands the feature branch code
  • Writes tests that reflect actual system behavior
  • Runs them against a real Kubernetes cluster
  • Reads the failure output, forms a hypothesis, digs into source code or infrastructure, makes a fix, and retries

That's the workflow of a senior engineer doing a proper integration testing pass on a feature branch — compressed from hours into minutes.

And critically: the tests it writes don't disappear. They become part of the repo. Every future developer, every future CI run, every future AI agent gets the benefit.


The Deeper Shift: Tests as Living Documentation

When integration tests live as .bru files next to your source code, they stop being a separate concern. They become part of how you describe what your API does.

The test file for our stop endpoint doesn't just assert status: 200. It documents that:

  • The MCPServer CRD is preserved (not deleted) when stopped
  • ready_replicas drops to 0 even though replicas config stays at 1
  • The subdomain and ingress survive the stop for fast restart

That's knowledge that would otherwise live in someone's head, a Notion doc, or nowhere at all.
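In Bruno's assert syntax, those guarantees read as plain assertions (field names follow the response shape described above; everything else is illustrative):

```bru
assert {
  res.status: eq 200
  res.body.ready_replicas: eq 0
  res.body.replicas: eq 1
  res.body.subdomain: isDefined
}
```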


What This Means for Your Team

If you're still manually clicking through Postman for every feature branch test, consider this:

  1. Move your API collections to Bruno — plain text files, version controlled, portable
  2. Give your AI agent access to run them: bru run is a single shell command
  3. Let the agent write tests against new branches — not one-shot generation, but iterative: run → fail → diagnose → fix → rerun
  4. Commit the tests — they compound. Every feature ships with its integration tests checked in.

The tooling is already there. The AI capability is already there. The gap is just knowing they can work together.

We've published the Bruno skill we use so any Claude Code agent can pick it up instantly:

```shell
npx skills add https://github.com/olamide226/agent-skills --skill bruno
```

Full skill source: github.com/olamide226/agent-skills — bruno/SKILL.md


Join the waitlist for MCPLambda — infrastructure for deploying MCP servers at scale. If you're working on the MCP ecosystem or thinking about developer tooling for AI-native backends, I'd love to connect.


What's your current setup for integration testing feature branches? I'm curious how teams are handling this — drop a comment below.


Tags: #AI #DeveloperTools #APITesting #Kubernetes #BackendEngineering #MCP #ClaudeCode #DevEx #Bruno #IntegrationTesting
