Sam Hartley
I Let AI Coding Agents Build My Side Projects for a Month — Here's My Honest Take

Last month I ran an experiment: instead of writing code myself, I delegated as much as possible to AI coding agents. Not just autocomplete — full autonomous agents that read files, run commands, and ship features.

I've been running a home lab (Mac Mini M4 + a Windows PC with GPUs + an Ubuntu box) for a while now, and I already had my dev workflow automated with AI agents. But this time I pushed further: what if the agents didn't just help me code, but actually wrote the code?

Here's what happened.

The Setup

I used a mix of tools:

  • Claude Code (CLI) — my go-to for complex, multi-file tasks
  • Codex (OpenAI) — good for one-shot generation with clear specs
  • Local models via Ollama — for quick iterations without burning API credits

The workflow: I'd describe what I wanted in plain English, point the agent at the right directory, and let it work. Sometimes I'd review the output. Sometimes I'd just run the tests and ship it.

What Actually Worked

1. Boilerplate and scaffolding — 10/10

This is where agents shine. "Set up a FastAPI project with SQLite, async endpoints, and Pydantic models for a subscriber management system." Done in 90 seconds. Would've taken me 20 minutes of copy-pasting from old projects.

2. Refactoring — 8/10

"Refactor this 400-line script into separate modules with proper error handling." The agents were surprisingly good at this. They understood the intent, split things logically, and even added type hints I'd been too lazy to write.

The two points I'm docking: they sometimes over-abstract. I'd ask for "cleaner code" and get an enterprise-grade factory pattern for a 50-line script. You still need taste.

3. Writing tests — 9/10

This was the biggest win I didn't expect. I hate writing tests. The agents love writing tests. "Write pytest tests for this module, cover edge cases" → comprehensive test suite in under a minute.

The catch: you need to actually read the tests. I caught a few that were testing the wrong thing — they'd pass, but they weren't testing what mattered. Still, it's a massive time saver.
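Here's the failure mode in miniature, with a hypothetical `slugify()` helper: the first test passes for almost any implementation, so it proves nothing; the second pins down concrete behavior.

```python
# Illustrative: two agent-written tests for a hypothetical slugify() helper.
# The first looks plausible but never checks a concrete value; the second
# actually constrains behavior.
import re

def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_slugify_vacuous():
    # Passes, but would also pass for many wrong implementations.
    assert isinstance(slugify("Hello World"), str)

def test_slugify_edge_cases():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  --  ") == ""          # no leading/trailing dashes
    assert slugify("Deja vu 2") == "deja-vu-2"

test_slugify_vacuous()
test_slugify_edge_cases()
```

When I review agent-written tests now, I look for exactly this: does each assertion compare against a concrete expected value, or just restate the implementation's shape?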

4. Debugging — 7/10

Hit or miss. For straightforward bugs ("this function returns None when it should return a list"), agents crush it. For subtle timing issues or race conditions? They'd suggest fixes that looked right but didn't address the root cause.

My rule now: agents for the first pass, then I debug the debugger's output.

5. Documentation — 9/10

README files, docstrings, API docs. Agents are better at this than I am, honestly. They're more thorough, more consistent, and they don't get lazy halfway through.

What Didn't Work

Complex architecture decisions

"Should I use Redis or just in-memory caching for this?" The agent will give you a perfectly reasonable answer either way. That's the problem — it doesn't know your constraints the way you do. How many users? What's your memory budget? Are you running on a Raspberry Pi or a 128GB server?

I stopped asking agents for architecture advice. I make the decision, then let them implement it.
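For what it's worth, when my own answer to the caching question was "one process, small working set," the in-memory version turned out to be about twenty lines. A sketch (hypothetical class name, and exactly the kind of constraint-driven call the agent can't make for you):

```python
# A minimal TTL cache for the "one process, small working set" case --
# the decision between this and Redis depends on constraints the agent
# doesn't know. Sketch only; TTLCache is a hypothetical name.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires, value = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict on read
            return default
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("plan", "in-memory")
print(cache.get("plan"))   # fresh: returns the value
time.sleep(0.06)
print(cache.get("plan"))   # expired: returns None
```

If the answer to "how many users?" had been different, none of this code would survive, which is exactly why the decision has to come before the implementation.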

Multi-service orchestration

When I needed to coordinate between my Telegram bot, a background worker, and a database — and they all needed to agree on a shared state model — the agent would nail each piece individually but miss the integration points. I'd end up with three perfectly written services that didn't quite talk to each other.
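What eventually fixed this for me, sketched below with illustrative names: define the shared state contract once, in one module every service imports, instead of letting the agent re-derive it per service.

```python
# Sketch: one module that defines the state contract the Telegram bot,
# the worker, and the DB layer all import. JobState/JobStatus are
# illustrative names, not my actual schema.
from dataclasses import dataclass, asdict
from enum import Enum
import json

class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

@dataclass(frozen=True)
class JobState:
    """The one shape every service agrees on."""
    job_id: str
    status: JobStatus
    detail: str = ""

    def to_json(self) -> str:
        d = asdict(self)
        d["status"] = self.status.value
        return json.dumps(d)

    @classmethod
    def from_json(cls, raw: str) -> "JobState":
        d = json.loads(raw)
        return cls(job_id=d["job_id"], status=JobStatus(d["status"]),
                   detail=d.get("detail", ""))

# Round-trip: what the worker writes is exactly what the bot reads back.
state = JobState("job-42", JobStatus.RUNNING)
assert JobState.from_json(state.to_json()) == state
```

With the contract pinned down first, I could then hand the agent each service separately and the pieces actually composed.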

Anything involving my specific hardware

"Configure this for my RTX 3060 with 12GB VRAM running on Windows with WSL2." The agents would give me generic CUDA setup instructions instead of what actually works on my rig. Local knowledge is still human knowledge.

The Numbers

Over the month, across 4 side projects:

| Metric | Before (manual) | After (agent-assisted) |
| --- | --- | --- |
| Time to MVP | ~2 weeks | ~4 days |
| Lines written by me | ~80% | ~30% |
| Bugs in first deploy | ~8-12 | ~5-8 |
| Time spent reviewing agent code | 0 | ~2h/day |
| Overall velocity | 1x | ~2.5x |

The 2.5x multiplier is real but misleading. I'm faster at producing code, but I spend more time reviewing code. The net gain is still significant — maybe 1.8x when you account for review time.

What I Changed About My Workflow

1. I write specs, not code.
My job shifted from "programmer" to "technical product manager." I write clear descriptions of what I want, define the interfaces, and let the agent fill in the implementation.

2. I review harder.
When I wrote the code myself, I'd eyeball it and move on. When an agent writes it, I actually read every line. Paradoxically, this has made me a better reviewer.

3. I prototype faster, throw away more.
Since generating a prototype takes minutes instead of hours, I build 2-3 approaches and pick the best one. This was a luxury I couldn't afford before.

4. I still write the hard parts.
State machines, complex business logic, performance-critical paths — I write these myself. Not because the agents can't, but because I need to understand them deeply to maintain them later.
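To make "state machines" concrete, this is the kind of thing I keep writing by hand, sketched with a hypothetical publishing workflow: an explicit transition table is trivial to audit a year later, which is the whole point of writing it myself.

```python
# Sketch of a hand-written state machine: every legal (state, event) pair
# is listed in one table; anything else raises. The publishing workflow
# here is a hypothetical example.
from enum import Enum, auto

class State(Enum):
    DRAFT = auto()
    REVIEW = auto()
    PUBLISHED = auto()
    ARCHIVED = auto()

TRANSITIONS: dict[tuple[State, str], State] = {
    (State.DRAFT, "submit"): State.REVIEW,
    (State.REVIEW, "approve"): State.PUBLISHED,
    (State.REVIEW, "reject"): State.DRAFT,
    (State.PUBLISHED, "archive"): State.ARCHIVED,
}

def step(state: State, event: str) -> State:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}")

s = step(State.DRAFT, "submit")   # DRAFT -> REVIEW
s = step(s, "approve")            # REVIEW -> PUBLISHED
```

An agent will happily generate something like this too; the difference is that when I've typed the table myself, I know every edge that isn't in it.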

The Uncomfortable Truth

Here's what nobody in the "AI will replace developers" discourse talks about: you need to be a good developer to use AI coding agents well.

Every time the agent produced something subtly wrong, I caught it because I knew what right looked like. Every time it over-engineered a solution, I could simplify it because I understood the problem. Every time it picked the wrong library or pattern, I could redirect it because I had opinions forged by years of mistakes.

Junior devs using these tools will ship faster. They'll also ship more bugs, more over-engineering, and more "it works but nobody can maintain it" code. The tools amplify whatever skill level you bring.

My Recommendation

If you're not using AI coding agents yet, start with:

  1. Tests first. Let agents write your test suites. Low risk, high reward.
  2. Boilerplate second. Project setup, CRUD endpoints, config files.
  3. Refactoring third. Point it at your worst file and say "clean this up."
  4. Complex features last. Only after you trust the tool and know its limits.

And always, always review the output. The agent is your fastest junior developer. It's also your most confident one — and confidence without experience is how bugs get shipped.


I write about building things with AI, self-hosting, and turning side projects into income. If you're into that, I post a new article every few days.

Running my own AI setup locally? Here's how I do it for $0/month.
