Barney Jackson

An Experiment with AI Coding Agents: The Good, the Gaps, and Why Developers Still Matter

Check out the Clidoku project here: https://github.com/barneyjackson/clidoku

Like many engineers, I've been watching the buzz around AI coding assistants grow (or "genies", as Kent Beck calls them). I've used tools like Copilot and ChatGPT at work and casually: they help with boilerplate, unblock tedious bits, and offer a second opinion when your brain's fried. But I hadn't properly tested a more autonomous AI agent in my own workflow.

That changed recently when I ran an experiment to see how far an AI coding agent could take me on a small, long-abandoned side project.

The project? A Python CLI sudoku game I started writing on a flight from London to San Francisco, entirely offline, using only Python internals and builtins.

The lack of internet access forced a neat constraint: no third-party libraries for the core functionality (though I used pytest, mypy, and rye for development support). Just me, the standard library, and a puzzle to pass the time at 38,000 feet.

The Origins of Clidoku

After demolishing the only sudoku puzzle I’d brought — torn from the back of a newspaper — I figured I might as well build an app to generate infinite sudokus.

Wanting to keep the exercise pure Python, I went for a simple command-line interface — a terminal-based game for terminal-based people.

There were enough technical challenges to keep it interesting (a rough sketch of one possible approach follows the list):

  • How best to represent the grid? (Nested arrays? Dictionaries? Over-engineered classes?)
  • How to cleanly build a CLI using just argparse?
  • How to persist game state so players could dip in and out?
  • And of course, the classic “don’t fry your machine with badly designed recursion” moment on puzzle generation
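
To make those questions concrete, here's a rough sketch of one way to answer them with nothing but the standard library: a flat dict keyed by "row,col" strings for the grid, JSON for saving and resuming games, and argparse subcommands for the interface. To be clear, this is illustrative only, not the actual Clidoku implementation; the function names, save path, and subcommands are my assumptions, and the backtracking puzzle generator is omitted entirely.

```python
# Illustrative sketch only -- not the actual Clidoku code. All names here
# (new_grid, is_valid_move, the save path, the subcommands) are assumptions.
from __future__ import annotations

import argparse
import json
from pathlib import Path

SAVE_PATH = Path.home() / ".clidoku_save.json"  # hypothetical save location


def new_grid() -> dict[str, int | None]:
    """Represent the 9x9 grid as a flat dict keyed by 'row,col' strings (JSON-friendly)."""
    return {f"{r},{c}": None for r in range(9) for c in range(9)}


def is_valid_move(grid: dict[str, int | None], row: int, col: int, value: int) -> bool:
    """Check the classic sudoku constraints: row, column, and 3x3 box."""
    for c in range(9):
        if c != col and grid[f"{row},{c}"] == value:
            return False
    for r in range(9):
        if r != row and grid[f"{r},{col}"] == value:
            return False
    box_r, box_c = 3 * (row // 3), 3 * (col // 3)
    for r in range(box_r, box_r + 3):
        for c in range(box_c, box_c + 3):
            if (r, c) != (row, col) and grid[f"{r},{c}"] == value:
                return False
    return True


def save_game(grid: dict[str, int | None]) -> None:
    """Persist the game state as JSON so players can dip in and out."""
    SAVE_PATH.write_text(json.dumps(grid))


def load_game() -> dict[str, int | None]:
    return json.loads(SAVE_PATH.read_text())


def main() -> None:
    # A bare-bones argparse CLI with subcommands -- standard library only.
    parser = argparse.ArgumentParser(prog="clidoku")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("new", help="start a new (empty) grid")
    mark = sub.add_parser("mark", help="place a value in a cell")
    mark.add_argument("cell", help="cell reference, e.g. 3,4")
    mark.add_argument("value", type=int, help="digit 1-9")
    args = parser.parse_args()

    if args.command == "new":
        save_game(new_grid())
        print("New grid saved.")
    elif args.command == "mark":
        grid = load_game()
        row, col = (int(x) for x in args.cell.split(","))
        if is_valid_move(grid, row, col, args.value):
            grid[f"{row},{col}"] = args.value
            save_game(grid)
            print("OK")
        else:
            print("Invalid move.")


if __name__ == "__main__":
    main()
```

Usage would look something like `python clidoku.py new` followed by `python clidoku.py mark 3,4 7`. Keeping the grid keys as strings means the whole state round-trips through json.dumps and json.loads with no custom encoder, which is one way of answering the persistence question cheaply.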

Progress came in bursts. I chipped away at it sporadically when I had the creative energy — classic dev behaviour — but it inevitably gathered dust after each short sprint. Another half-built repo abandoned in a folder.

Until now.

Enter: AI Agents

I’ve been building a SaaS product to help engineers improve at the parts of the job that aren’t pure coding — how they structure work, solve problems, collaborate, avoid dead ends. (More on that soon.)

And one thing's clear: as AI helps developers write more code, faster, the need for good structure, clarity, and engineering discipline only grows. More velocity means more risk if your foundations aren’t solid.

So, partly to stay current for my own product and partly out of curiosity, I decided to properly trial an AI coding agent. After almost zero research, I landed on Augment, signed up for the free trial, plugged it into VS Code, and looked for a project to test it on.

Clidoku ticked the boxes:

  • Small, existing codebase
  • Well-scoped, finite task
  • Enough surface area for design decisions
  • And frankly, I wanted it finished and out of my head

Setup was painless — Augment could edit files, run tests, lint, and commit changes. I didn’t enable full autonomous mode. Letting an agent run unchecked in your environment? Feels premature without trust and safeguards. No VM, no container isolation — just me babysitting commands.

But honestly? Babysitting quickly became rubber-stamping. The agent proposed edits, reviewed files, made plans — I’d skim, approve, repeat. Occasionally I’d stop to wonder: “How will it structure that? Will it use enums here?” But mostly, I was just clicking "run".

It felt eerily aligned with recent MIT research showing how AI assistants erode critical thinking. It’s all too easy to zone out.

Observations from the Experiment

In a couple of focused evenings (roughly 6-8 hours total), we — the agent and I — wrapped up the outstanding features, squashed bugs, and got Clidoku ready for the real world. It’s now a fully functional CLI Sudoku game, distributed via Homebrew.

Mission accomplished 🎉 A small win, but worth celebrating!

But I also now have a clearer view of AI coding agents, both their impressive potential and their real gaps:

  1. First Impressions — The Tech’s Undeniably Impressive

    Watching an agent step through file analysis, planning, implementation, testing — it’s hard not to feel the status quo shifting. After nearly 20 years writing code manually, this genuinely felt like a glimpse of the future.

  2. But — You Still Can’t Fully Trust It (Yet)

    Even for a simple, well-scoped project, the agent stumbled. It needed regular course correction, got stuck in self-correcting loops, ignored our TDD agreement unless reminded, and sometimes forgot persistent instructions (“memories”).

  3. Cost Models Still Feel Prohibitive

    The 14-day free trial (300 messages) seemed generous — I burned through half in two days. Sustained, real-world use? Expensive. But when weighed against the value of time, especially for solo founders, the equation gets interesting.

  4. You Still Need the Architect

    The vision, the constraints, the design choices — those lived in my head. The agent could execute, but without understanding context or trade-offs, it drifted quickly. Productive engineering isn’t just shipping code — it’s shipping value. On complex, novel, or legacy code? Unclear if these tools hold up.

  5. Entry-Level Engineers Should Be Nervous

    Not to be alarmist — but if you can’t effectively supervise an agent, this tech is unsettling. Hard to justify hiring juniors when an agent produces code faster. But without foot-in-the-door roles, where do our experts come from? I learned by trial, error, debugging, reading docs — many of those moments risk being glossed over now.

  6. It’s a Tool, Not a Silver Bullet

    Agents generate code, but they don’t navigate messy business realities — team dynamics, ambiguity, organisational friction. You can’t prompt your way out of that.

  7. Blind Automation is a Security Nightmare

    Handing agents unfettered terminal or repo access? Risky. Right now, we’re still far too trusting of AI vendors and tooling ecosystems IMO.

  8. Multitasking with Multiple Agents? Overhead Adds Up

    I’d like to test “agent teams”, but even supervising one demands constant attention. Scaling that feels cognitively expensive, fast.

The Personal Trade-Off

The biggest surprise? The experiment sucked the fun out of coding. No deep algorithm design, no exploring Python internals, no problem-solving flow. I became a supervisor — reviewing, approving, chasing the agent to follow TDD.

Problem-solving is why I became an engineer — discovering better patterns, wrestling complexity into clean code. Supervising an agent doesn’t scratch the same itch.

Maybe it’s a generational shift, like people who still drive manual or shoot film for the love of it. But those “manual problem-solving” moments feel in danger of becoming niche.

That’s why I'm leaning towards building AI-collaborative workflows, not AI-based ones. Workflows that emphasise a mindful, “think first” approach, as Anne-Laure Le Cunff at Ness Labs suggests in their interpretation of the MIT report.

The best outcomes come from staying engaged — not outsourcing thinking.

Closing Thoughts

For my main product? I’ll keep experimenting with agents — multiple agents, different models, new workflows. But consciously and cautiously. They’re powerful leverage for solo founders and small teams, but they also reshape your role, with cultural and personal trade-offs.

If you’re in the early stages of building a business or tech, ask yourself:

  • What ROI can I realistically expect from AI agents?
  • Can I still do my best work while multitasking at this level?
  • Is more code really my bottleneck?
  • Am I ready to lead an AI-enabled organisation, not just bolt tools on?

I’m still working through those answers. But one thing’s clear: as AI speeds up code production, the parts of engineering that aren’t coding — structure, collaboration, strategy, risk management — only become more important.

That’s exactly the space I’m working in with my new product. More on that soon...

For now? Clidoku is shipped. The learning continues.
