I woke up to forty-seven merge conflicts on a Monday morning.
Let me set the scene. The toddler had, by some miracle, slept through the night. The coffee was brewing. The sun was doing that tentative Glasgow thing where it appears for exactly long enough to make you trust it before disappearing behind clouds that seem personally offended by optimism. I opened my laptop with the satisfaction of someone who'd spent the weekend being clever.
Two AI agents. Two terminals. Same repository. Parallel progress. Agent 1 handles the authentication refactor, Agent 2 builds the new dashboard endpoint. Monday morning, I review and merge like the productive developer I clearly am. Ship two features before breakfast. Tell my wife I'm basically a genius.
Both agents had decided the shared utility module needed updating. Both had done it differently. Both had committed at 2:47 AM — eighteen seconds apart. Three hours of cherry-picking later, I'd recovered maybe eighty percent of the work. The other twenty percent? Sacrificed to the git gods. Like my plans to tell my wife I was basically a genius.
"How's your productive Monday going?" she asked around 11 AM.
I showed her the terminal. She winced. My wife is also a developer. She knows what forty-seven merge conflicts cost.
I've been building a trading platform for six months — nine microservices in Python/FastAPI, a machine learning pipeline, around four hundred API endpoints, eighty-odd database tables. As a solo engineer, AI agents are how I make that work. Ten of them, eventually. Here are three things they taught me that I wish someone had mentioned earlier.
Parallel agents will destroy your repository without workspace isolation
Both agents had worked in the same directory. Different branches, yes. But same directory. Same .git/ folder. Same lock files. Same build artifacts. When Agent 1 ran an install, it modified the lock file. When Agent 2 ran a different install, it touched the same file. Neither knew the other existed.
It's like two cooks in the same kitchen, both reaching for the salt, except the salt is a forty-seven-thousand-line lock file and neither cook knows the other one is in the room.
I tried the naive fixes. Lock the repository — kills parallelism entirely. Use different branches — branches don't isolate working directories; branches are a git concept, working directories are a filesystem concept. Commit more frequently — actually makes things worse, because more commits mean more interleaved history.
The solution came from thinking about CI/CD pipelines. GitHub Actions doesn't use your local directory. It clones a fresh copy into an isolated container. So I gave each agent its own shallow clone with --reference to the source repo. Own directory. Own dependencies. Own git state. Two hundred and sixty-two lines of configuration.
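A minimal sketch of the per-agent clone idea, assuming a local source path. The function and directory names are illustrative, not the author's actual 262-line configuration, and the sketch omits the shallow-clone depth flag for simplicity:

```python
import subprocess


def clone_command(source_repo: str, agent: str, branch: str) -> list[str]:
    """Build the git command that gives one agent an isolated workspace."""
    dest = f"{source_repo}-workspace-{agent}"
    return [
        "git", "clone",
        "--reference", source_repo,  # borrow objects from the source repo
        "--branch", branch,          # each agent starts on its own branch
        source_repo, dest,
    ]


def make_workspace(source_repo: str, agent: str, branch: str) -> None:
    # Each agent ends up with its own directory, dependencies, lock files,
    # and .git state -- so parallel installs can no longer collide.
    subprocess.run(clone_command(source_repo, agent, branch), check=True)
```

The key point is that isolation happens at the filesystem level, not the branch level: two clones can run installs simultaneously because they no longer share a lock file.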
My wife looked at the diagram I'd sketched on the back of a toddler's colouring page. "So you've written a seating chart for robots."
After implementing it: eight agents working in parallel, zero collisions. The seating chart paid for itself in the first week. The lesson is unglamorous: the guardrails you'd build for a human team work for AI agents too, because the failure modes are identical. Race conditions don't care who's typing.
AI agents will write beautiful tests that test nothing
Here's something nobody warned me about: an AI agent, when asked to write tests, will sometimes produce beautiful, well-structured test files with descriptive names and proper setup — where the assertions check that True equals True.
I had a testing specialist agent. Clear instructions: TDD, red-green-refactor. The agent was diligent. Test suites appeared. They all passed. Coverage numbers looked healthy. I shipped code with confidence for three days.
Then a bug hit that should have been caught by six different test cases. I went back and read them. Every single one was what I now call a taxidermy test — it looked alive, but nothing was actually working inside. One test checked that calling the function didn't raise an exception. That was the entire assertion. Another checked that the response had a status field — not what the status was, just that it existed. A response of {"status": "everything_is_on_fire"} would have been a green tick.
The fix: every test must assert a specific expected value, and the agent must prove the test fails before the implementation exists. Coverage numbers mean nothing without substance.
But this applies far beyond tests. AI agents are extraordinarily good at pattern-matching what "done" looks like. A test file with imports and assert statements looks done. Documentation with headings looks done. Error handling with try-catch blocks looks done. If you don't verify the substance behind the structure, you'll build a codebase that looks professional and is held together by nothing.
An agent will "fix" a bug by making the entire system tolerate broken data
I had a data pipeline producing timestamps in the wrong format. UTC strings where epoch integers were expected. The consuming service was crashing. I pointed an agent at the crash and said: fix this.
Ninety seconds later, every test passed and the crash was gone.
The agent had added a type coercion layer to every consumer. If the timestamp was a string, parse it. If an integer, use it. If null, use the current time. If a list — somehow — take the first element. The function was called flexible_timestamp_parse and it would accept essentially any input and return something that looked like a valid timestamp.
The actual bug — the producer emitting strings instead of integers — was still there. Now invisible, because every consumer had learned to tolerate it. Like fixing a headache by teaching the patient to enjoy pain.
This is when I wrote the rule that has saved me more time than any other: trace, don't patch. Before writing any fix, trace the data flow from source to consumer. Find where the mismatch originates, not where it manifests. The contract is truth. If the producer doesn't match, fix the producer. Never make the consumer tolerate variations.
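In code, the two approaches look roughly like this. Both function bodies are reconstructions for illustration, not the agent's actual output:

```python
from datetime import datetime, timezone


# What the agent wrote (reconstructed): every consumer tolerates every shape.
def flexible_timestamp_parse(value):
    if isinstance(value, str):                # UTC string? parse it
        return int(datetime.fromisoformat(value).timestamp())
    if isinstance(value, int):                # integer? use it
        return value
    if isinstance(value, list):               # a list, somehow? first element
        return flexible_timestamp_parse(value[0])
    return int(datetime.now(timezone.utc).timestamp())  # null? current time


# Trace-don't-patch: fix the producer so it honours the contract instead.
def emit_timestamp(dt: datetime) -> int:
    """The contract says epoch seconds as an integer. Emit exactly that."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

The first function makes every consumer quietly absorb the producer's bug; the second is a one-line contract fix at the source, after which the coercion layer can be deleted.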
A human developer adding flexible_timestamp_parse would feel at least a twinge of guilt. The agent felt nothing. It solved the problem it was given. Next.
Six months in, the platform connects to a real broker, processes real market data, and enforces risk limits with a kill switch that fires in under five hundred milliseconds. Built by one developer and a collection of AI agents that I've spent six months learning to orchestrate.
My wife asked me last week if the robots were behaving themselves. "Mostly," I said. "Mostly," she repeated. "That's not reassuring."
It's not meant to be. Mostly is what production engineering with AI agents actually looks like. If you go in expecting to build systems that compensate for imperfect tools — which is, when you think about it, what engineering has always been — then it works. Not perfectly. Not in the way the demos suggest, where the developer leans back with the serene confidence of someone who has never been woken at 3 AM by a monitoring alert.
But it works. And six months in, that's enough.
Manju Kiran is a software engineer in Glasgow, Scotland. He writes about building production systems with AI agents on Medium and wrote a book about the experience: It AI'nt That Hard.
Top comments (1)
Author here. Senior developer, 15 years — spent the last six months building a multi-service trading platform (Python/FastAPI, 9 services, real broker connection) using Claude Code as the primary implementation tool. Not a side project — production code with real consequences.

Biggest lesson I didn't expect: AI agents are exceptional at following patterns and terrible at knowing when to break them. My workflow protocol grew to 262 lines, mostly from incidents where agents followed every rule perfectly and still produced the wrong output.

I wrote an 18-part blog series covering the technical side in more detail — all free: medium.com/@manju_kiran

This also became a book ("It AI'nt That Hard") which goes into the territory a blog can't — the craft crisis, the identity questions, the over-engineering debate. On Amazon and Leanpub if anyone's interested.

Happy to answer anything — especially about multi-agent coordination.