DEV Community

Augmented Mike

I Built an AI Agent That Writes All My Production Code. Here's What I Learned.

Mike ONeal — software engineer building autonomous AI systems

I'm Mike. I've been writing software for 25+ years — systems programming, web, mobile, cloud, and now AI. I'm currently building two things that I think this community would find interesting, so I figured I'd introduce myself and share what I've learned.

AM — my autonomous coding agent

[Image: AM kanban board showing autonomous agent task management]

AM is an AI agent that writes all of my production code. Not copilot-style autocomplete — fully autonomous task execution. I give it a ticket, it reads the codebase, writes the code, runs the tests, commits, and moves on to the next ticket.

It built my entire portfolio site. Every page, every component, every deployment. I direct strategy and make architecture decisions. AM executes.

helloam.bot

The interesting engineering behind it:

Stateless by design. AM carries no memory between invocations. Every run is one-shot: read the state from files, do one unit of work, write the state back, exit. All state lives in markdown files and git history — todo.md, criteria.md, iteration logs. This sounds like a limitation but it's actually the key to reliability. There's no context window drift, no accumulated hallucinations, no state corruption. If a run fails, you just run it again. The files are the source of truth.
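Here's a minimal sketch of that one-shot pattern. The `todo.md` file name comes from the post; the checklist task format and `run_once` helper are hypothetical stand-ins, not AM's actual implementation.

```python
from pathlib import Path

def run_once(workdir: str) -> bool:
    """One stateless invocation: read state from files, do one unit
    of work, write state back, exit. No memory between runs."""
    todo = Path(workdir) / "todo.md"
    lines = todo.read_text().splitlines() if todo.exists() else []
    # Find the first unchecked task; "- [ ]" marks pending work.
    for i, line in enumerate(lines):
        if line.startswith("- [ ] "):
            task = line[6:]
            # ... do one unit of work for `task` here (call the model, run tests) ...
            lines[i] = "- [x] " + task  # record completion back to the file
            todo.write_text("\n".join(lines) + "\n")
            return True   # made progress; caller may invoke again
    return False          # nothing left to do
```

Because the file is the only state, a failed run leaves nothing to clean up — you just invoke it again.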

Three-tier memory. Short-term memory is markdown rules that get injected into every session — things like "never use deprecated API X" or "the database schema changed, use the new column name." Long-term memory is a SQLite FTS5 database with ranked search — lessons learned across projects. Episodic memory is git history and iteration logs. The system is modeled loosely on how human memory works: working memory, declarative memory, and episodic recall.
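The long-term tier can be sketched with a few lines of SQLite. The schema and column names below are illustrative, not AM's actual schema — the point is that FTS5 gives you BM25-ranked recall for free.

```python
import sqlite3

# Illustrative long-term memory: an FTS5 table of lessons with ranked search.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE lessons USING fts5(project, lesson)")
con.executemany(
    "INSERT INTO lessons VALUES (?, ?)",
    [("portfolio", "never use deprecated API X"),
     ("claimhawk", "the database schema changed, use the new column name")],
)

def recall(query: str, limit: int = 3):
    """Return lessons ranked by BM25 relevance to the query."""
    rows = con.execute(
        "SELECT project, lesson FROM lessons WHERE lessons MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

A session start can then inject `recall("deprecated API")` hits into the prompt alongside the short-term markdown rules.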

Gated state machine. Tasks move through backlog → in-progress → in-review → shipped with verification gates at each transition. The gates are enforced by code, not self-reported by the agent. "In-review" means every acceptance criterion has been verified against the actual output. The agent can't advance a task by saying "I think this works" — it has to prove it.
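The core of a code-enforced gate is small. This is a sketch under assumed names — the states match the post, but the task shape and `advance` helper are hypothetical.

```python
# A task may only advance when every acceptance-criterion check passes.
# The checks are real code, not the agent's self-report.
STATES = ["backlog", "in-progress", "in-review", "shipped"]

def advance(task: dict) -> dict:
    """Move a task to the next state iff all its gates pass; otherwise raise."""
    idx = STATES.index(task["state"])
    if idx == len(STATES) - 1:
        raise ValueError("task already shipped")
    failed = [name for name, check in task["gates"].items() if not check(task)]
    if failed:
        # The agent cannot say "I think this works" -- the gate refuses.
        raise ValueError(f"gates failed: {failed}")
    return {**task, "state": STATES[idx + 1]}
```

The gates themselves would run the test suite, diff the output against acceptance criteria, and so on.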

Worktree isolation. Each task gets its own git worktree. Multiple agents can run simultaneously on different tasks without stepping on each other. When a task ships, the worktree gets squash-merged into the integration branch. Clean linear history, no merge conflicts between agents.
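In git terms, the lifecycle looks like the commands below. The branch and path naming scheme is my own illustration; the git subcommands are standard.

```python
# Sketch of per-task worktree isolation: each task gets its own worktree and
# branch, and ships via a squash merge into the integration branch.
def worktree_cmds(task_id: str, repo_dir: str = "."):
    """Commands to create an isolated worktree for one task."""
    branch = f"task/{task_id}"
    path = f"../wt-{task_id}"
    return [["git", "-C", repo_dir, "worktree", "add", "-b", branch, path]]

def ship_cmds(task_id: str, integration: str = "main", repo_dir: str = "."):
    """Commands to squash-merge a finished task and clean up its worktree."""
    branch = f"task/{task_id}"
    path = f"../wt-{task_id}"
    return [
        ["git", "-C", repo_dir, "checkout", integration],
        ["git", "-C", repo_dir, "merge", "--squash", branch],
        ["git", "-C", repo_dir, "commit", "-m", f"ship {task_id}"],
        ["git", "-C", repo_dir, "worktree", "remove", path],
        ["git", "-C", repo_dir, "branch", "-D", branch],
    ]
```

Squash-merging is what keeps the integration branch linear: each shipped task lands as exactly one commit, regardless of how many iterations the agent took inside its worktree.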

The whole system is open-source. You can see it at helloam.bot.

ClaimHawk — AI automation with vision and action models

[Image: ClaimHawk architecture — AI dental claims automation pipeline]

The other thing I built is ClaimHawk, which automates dental insurance claims processing. This one pushed me into territory that most AI projects don't touch: vision models and action models working together in a production pipeline.

Here's the problem: dental practices lose tens of thousands of dollars a year to denied insurance claims. Not because the work wasn't done — because the claim was submitted with the wrong code, a missing attachment, or a formatting error. Staff spend hours every week doing this manually, and they make mistakes because they're processing hundreds of claims.

The OCR challenge. Insurance documents (EOBs — Explanation of Benefits) are a mess. Multi-column layouts, inconsistent fonts, degraded scans, tables mixed with free text. Generic OCR engines like Tesseract choke on them. I'm using ChandraOCR, which handles the layout complexity that dental insurance documents throw at you. It runs locally — no document data leaves the network, which matters for HIPAA.

The vision/action model stack. Here's the part that surprised me the most. Insurance carrier portals don't have APIs. They have websites built in 2008 with session timeouts and CAPTCHA gates. ClaimHawk uses computer vision to navigate these portals the way a human would — reading the screen, clicking buttons, filling forms, uploading attachments. When a portal redesigns its UI (which happens constantly), the vision model adapts because it's reading the interface semantically, not relying on CSS selectors that break every time someone changes a class name.
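The control loop behind that is conceptually simple: screenshot in, action out, repeat. The sketch below uses hypothetical stand-ins (`capture_screen`, `vision_model`, the action vocabulary) — it is not ClaimHawk's actual API, just the shape of a selector-free navigator.

```python
# Screenshot -> vision model -> action loop. The model reads the rendered
# interface semantically, so there are no CSS selectors to break.
def navigate(goal: str, capture_screen, vision_model, actions, max_steps: int = 20):
    """Drive a portal toward `goal` by reading the screen, not the DOM."""
    for _ in range(max_steps):
        screenshot = capture_screen()
        # The model returns the next action as data, e.g.
        # {"op": "click", "x": 120, "y": 300} or {"op": "done"}.
        step = vision_model(screenshot, goal)
        if step["op"] == "done":
            return True
        actions[step["op"]](step)  # dispatch: click, type, upload, ...
    return False                   # safety limit hit; flag for human review
```

The `max_steps` cap and the fallback to human review matter in production: a confused model should stall loudly, not click forever.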

Local models, not cloud APIs. ClaimHawk runs on fine-tuned Qwen3 models, not GPT or Claude. Patient health data can't leave the building under HIPAA. Open-weight models trained with RLHF on real dental claim data run on hardware the practice controls. The models understand dental terminology, CDT coding, and carrier-specific appeal formats because they were trained on thousands of real claims.

The results so far: 67% fewer denials, 4x faster payment cycles.

What I've learned that might be useful to you

If you're building AI systems that need to interact with the real world (not just generate text), here are the lessons that cost me the most time:

Vision models are more resilient than web scraping. I fought with Playwright and CSS selectors for months before switching to computer vision for portal navigation. The vision approach handles UI changes that would break any selector-based scraper. The initial investment is higher, but the maintenance cost drops to near zero.

Statelessness is a feature. Every agent framework I evaluated tried to maintain state in the model's context window. That approach creates debugging nightmares, runs into context-window limits, and accumulates errors over long sessions. Making the agent stateless and putting all state in files made everything simpler — auditing, recovery, parallelism, all of it.

Local models are production-ready. The assumption that you need GPT-4 or Claude for everything is wrong. Fine-tuned open-weight models outperform general-purpose frontier models on domain-specific tasks, cost a fraction as much to run, and give you complete control over your data. The fine-tuning pipeline is the investment, but once you have it, you own the capability.

I'll be posting more about both of these projects — the engineering decisions, the mistakes, and the stuff that surprised me. Happy to answer questions about any of it.


ML/AI Development & Fractional Services | github.com/augmentedmike | youtube.com/@augmentedmike | x.com/_augmentedmike
