DEV Community

Jonathan Tavares

What I Learned Automating Software Development (After 20 Years of Doing It Manually)

In Part 1, I told the story of building OpenLoop — an open-source feedback platform — by emailing an AI agent for 5 days. 160+ emails, $15 in tokens, zero lines of human-written code, and a working product at the end.

Now let's talk about what I actually learned.

What the AI Was Good At


Let's start with the positive, because there's plenty of it.

Scaffolding speed is unreal. Within 90 minutes of the first email, the AI had a working Astro + React + Tailwind project, a Supabase schema with six tables and row-level security, a feedback widget, public roadmap and announcements pages, and an admin dashboard skeleton. That's not a weekend project — that's a weekend project done before my coffee got cold.

It follows explicit instructions well. "No Next.js, go Astro" — done. "Name is OpenLoop" — rebranded everything. "Widget takes a userId, not email" — refactored. When you're clear about what you want, the AI delivers. The problems start when there's ambiguity, but that's true of any team member, human or android.

Common patterns: Auth flows, RLS policies, webhook handlers — it knows how these are supposed to look. You don't explain what a protected route is. You just say "add auth to the admin panel" and it does it correctly.

Research: When it hit something unfamiliar, it would go find working examples and documentation, read how the library expected to be used, and implement it. Not hallucinate an API. We've come a long way since the first versions of GitHub Copilot with GPT-3.

What the AI Was Bad At

Equally important to be honest about.

"It builds" is not "it works." This was the single biggest recurring issue. The AI would run npm run build, see it pass, and declare the job done. But a successful build tells you nothing about whether a human can actually use the thing. Buttons that don't do anything, pages that render blank, widgets nested inside widgets — the AI couldn't see any of that. It was testing from the server's perspective, never from the user's chair.

Context amnesia is brutal. The conversation hit its window limit three times. Each restart meant partial forgetting — re-checking the database, re-discovering the file structure, occasionally redoing things that already worked. Imagine onboarding the same developer three times during a five-day project.

Tooling gaps are real. The AI had Supabase credentials but kept emailing me SQL to paste into the dashboard manually instead of just running it. There's a meaningful difference between having access to something and knowing how to use it — and right now that gap shows up constantly.

The pace pressure belongs here too. 165+ emails, 98 sessions — it kept moving, and that sounds great until you're the one who has to validate everything it shipped while it's already three tasks ahead. I felt the dread building: I knew from experience what it was probably getting wrong, and there was no way to check fast enough.

The Shift: From Coder to Indie PM


I've been writing code professionally for over 20 years. HTML tables, jQuery spaghetti, the rise of React, the TypeScript migration, the everything-is-a-microservice phase — I've been through the cycles.

Normally my day is: think, plan, code, review, repeat. This was just think and review. The coding was gone, and so was the planning to some extent. So I had to plan without building, which turns out to be a weird skill to exercise on its own.

My job was entirely different: setting direction, reporting bugs, gating quality, unblocking the AI when it got stuck. That's not a developer's job description. That's a product manager's job. (Or at least that's what they're supposed to do...)

The "typing code" part is essentially free now. What stayed for me was the experience around building — the architecture instincts, the design decisions, the gut feeling for what will break in production. I could guide this AI because I'd spent 20 years making the exact mistakes it was making. I knew the multi-tenancy bug was coming because I'd shipped that bug before. I knew there was no input sanitization because that's what rookies skip first.

That's the real risk — not that AI takes our jobs, but that if we stop writing code, we lose the ability to steer the thing that writes it for us.


Why Email?

People always ask: why email? Why not a chat interface, a VS Code plugin, a CLI tool?

Email is one of the oldest building blocks of the internet. That's exactly why it works.

Async by design. Agents on schedules don't need real-time interaction. You send a task, go do something else, come back to a result. Chat assumes you're there. Email assumes you're "eventually" there. For an agent running on an hourly loop, that's the right default. It reacts to the latest emails, or defaults to its task list if nothing new has come in.
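That loop is simple enough to sketch in a few lines of Python. Everything here is hypothetical — `run_cycle` and the stubbed inbox are illustrations, not Broodnet's actual API:

```python
from typing import Callable, List

def run_cycle(
    fetch_new: Callable[[], List[str]],
    handle: Callable[[str], None],
    default_tasks: List[str],
) -> List[str]:
    """One tick of the hourly loop: react to new mail if any,
    otherwise fall back to the standing task list."""
    work = fetch_new() or default_tasks
    for item in work:
        handle(item)
    return work

# Stubbed wiring: one new email in the inbox, a handler that just logs.
inbox = ["Fix the widget z-index bug"]
handled = []
worked = run_cycle(lambda: inbox, handled.append, ["Continue the roadmap page"])
```

The point is the fallback: the scheduler fires every hour whether or not you've written anything, so the agent needs a sensible default when the inbox is quiet.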

The thread is the prompt. Every reply carries the full quoted history forward. You're not manually managing context or stuffing state into a system prompt — the thread does it for you. When the AI's context window resets, the conversation is still there in the next incoming message. It's not a perfect solution to amnesia, but it's a lifeline that a stateless API call doesn't have.
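The mechanism is worth making concrete. Assuming plain-text bodies, a reply that quotes its thread is the entire prompt — function names here are illustrative:

```python
def quote(body: str) -> str:
    # Prefix every line with "> ", the way mail clients quote replies.
    return "\n".join("> " + line for line in body.splitlines())

def reply(thread: str, new_message: str) -> str:
    # The new instruction goes on top; the full quoted history rides along,
    # so the latest email alone is a complete prompt for a stateless agent.
    if not thread:
        return new_message
    return new_message + "\n\n" + quote(thread)

t1 = reply("", "Build the feedback widget.")
t2 = reply(t1, "Widget takes a userId, not email.")
```

After two rounds, `t2` contains the new instruction plus the whole quoted history — no state stored anywhere except the message itself.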

You already know how to do it. No new tool to learn, no IDE extension, no CLI flags. You've been doing it since you started using the internet. The same skills you use to manage remote teams — clear instructions, setting expectations, following up when things go quiet — are exactly what you need here.

It forces clarity. Chat is fast and sloppy; you fire off half-formed thoughts. Email has a slightly higher bar. You write more complete instructions because the other side isn't waiting to ask clarifying questions. With an agent on a schedule, a vague message is just a wasted session.

Threads fork naturally. Reply to the same email twice and you get two separate agent threads, each carrying its own history forward. That's parallel workstreams with zero tooling. I didn't plan for this — it just happened, and it worked.

The audit trail is free. Every instruction, every bug report, every decision is logged automatically. At the end of five days I had a complete record of how the product was built, what broke, and what I decided. That's not something you get from a chat window.

It scales to a team. CC another agent, forward a thread, delegate a subtask. Email already has all the primitives for managing multiple workers asynchronously. You don't always need an orchestration framework.

In Part 1 I described this whole thing as "emailing another department." That metaphor held up better than I expected. The collaboration pattern is identical: clear brief, structured feedback, knowing when to escalate. The tools aren't new. The colleague is.


What I'd Do Differently


Tighter initial prompt. My first email was loose — "look at the landscape for tools." Fine for research, but when it transitioned into "build this" I should have front-loaded more constraints: specific routes, schema decisions, deployment target. The more you specify upfront, the less telephone game you play later.

Visual testing from day one. The AI can build; it can't see. Most of the bugs I caught were visual — blank pages, misaligned layouts, duplicate elements. I should have set up automated visual regression testing early on. Skipping e2e testing is a form of developer negligence that prioritizes fast code over a functional user experience.
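A real setup would use something like Playwright screenshots with a perceptual diff, but the core baseline check fits in a few lines of Python. This is a simplification — the names are mine, and an exact-hash comparison is far stricter than you'd want in practice:

```python
import hashlib
from pathlib import Path

def check_against_baseline(screenshot: bytes, baseline: Path) -> bool:
    """First run stores the screenshot as the baseline;
    every later run flags any change against it."""
    if not baseline.exists():
        baseline.write_bytes(screenshot)
        return True
    new = hashlib.sha256(screenshot).digest()
    old = hashlib.sha256(baseline.read_bytes()).digest()
    return new == old
```

In practice the `screenshot` bytes would come from something like Playwright's `page.screenshot()`, and you'd compare with a pixel tolerance rather than a hash — otherwise every anti-aliasing quirk fails the build.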

Structured task format. Freeform email worked, but a more structured format — task ID, acceptance criteria, done-when — would have cut the back-and-forth significantly.
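Something as small as a dataclass rendered into the email body would have done it. A hypothetical sketch of the format I have in mind — field names and the `TASK` prefix are just one way to slice it:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    task_id: str
    summary: str
    acceptance: List[str] = field(default_factory=list)
    done_when: str = ""

    def to_email(self) -> str:
        # Render a task as a plain-text email body the agent can parse.
        lines = [f"TASK {self.task_id}: {self.summary}", "", "Acceptance criteria:"]
        lines += [f"- {c}" for c in self.acceptance]
        lines += ["", f"Done when: {self.done_when}"]
        return "\n".join(lines)

body = Task(
    task_id="OL-42",
    summary="Widget accepts a userId instead of an email",
    acceptance=["No email field in the payload", "Demo page still submits"],
    done_when="I can submit feedback from the demo page as a logged-in user",
).to_email()
```

The "done when" line is the important part — it replaces "it builds" with a user-visible definition of done.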

Tiered models. MiniMax 2.5 handled 95% of the work fine — scaffolding, repetitive component work, grinding through a list. For the hard 5% — auth edge cases, iframe security, database drift — I needed Claude. Next time I'd plan that split from the start: cheap model for volume, capable model for complexity.


The Stack That Made This Possible

The agent that built OpenLoop had its own email inbox on Broodnet — an email infrastructure project we're building specifically for personal AI agents and conscious operators. Each agent gets its own address. I emailed it tasks, it emailed back results, a scheduler triggered it every hour. No workflow engine, no custom integrations. Just an email inbox, works with IMAP and SMTP.


If you want to try this kind of setup with your own agents, that's exactly what Broodnet is built for. It handles the mail server side so you can skip straight to the experiment: https://broodnet.com/

OpenLoop itself is fully open source — fork it, self-host it, make it yours: https://github.com/we-are-singular/OpenLoop


The agent never once asked if we should re-write in Rust. Truly the best coworker I've ever had.
