Mario Hayashi

Posted on Apr 15 • Edited on Apr 30 • Originally published at blog.mariohayashi.com

The Factory Must Grow: I Replaced Myself With AI. Now What?

#agents #ai #automation #softwareengineering

tl;dr -- I made an orchestration system that creates PRDs, writes code, opens PRs and handles review feedback. And then I realised I'd automated myself out of the parts I once called my job.

The PR That Changed Things

My orchestration system opened a PR. The tests were passing and the commit messages were better than any I could write. The code was clean. I left some feedback on the PR, but the work was solid. After that first PR, I started feeding the task pipeline more ideas while it churned out PRDs, implemented them and addressed feedback, with little of my attention.

That was the moment it clicked for me: I am just feeding product ideas to the machine. The system handles everything else. Ideas really are cheap now.

If 2025 was the year of agentic AI, 2026 will be the year agents are operationalised at scale. It's both exciting and scary to see half of what I used to call a job automated away.

Step #1: It starts with just a thought: a note in GitHub

My Slow Start

Most engineers will have been following the rise of agentic AI very closely in 2025. Not me. I was changing nappies. I have a young son and didn't have the bandwidth to follow what was happening. I leaned on Cursor's tooling, had AI help me generate code, but never graduated beyond one-off agentic work. I was comfortable under my rock.

"Just Try Building One Yourself"

An old friend, a software engineer I worked with several times over the years, changed my mind. He had kept a close watch on AI while I was busy being tired from parenting. His advice was simple: build an agent yourself.

I let it sit at the back of my mind for a few weeks. Then I got curious and decided to build it. The gap between reading about agents and watching one work on your own codebase is the gap between reading a recipe and tasting the food. The understanding only clicks when you see an AI agent produce PRs in your codebase, following your conventions. And cheaply, too. I'm on Claude's Max plan, so I pay a flat subscription. Over the last week my orchestrator ran 300 worker runs across four workflows and burned through at least $240 of API-equivalent tokens. I say "at least" because crashed runs (of which there were many) never emit a final cost event, so the real number is higher.

At that pace it would cost north of $1,000 a month without the subscription. Instead I pay the flat Max fee and the system keeps going. A full pipeline -- PRD, code, review, fix -- averages around $4 in API-equivalent cost per shipped PR. I hope subscriptions remain this affordable, but I know pricing could change at any time. I'll park that thought for now.
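The back-of-the-envelope maths above is easy to reproduce. A minimal sketch, assuming a month averages 52/12 weeks and using only the figures from this post ($240 of API-equivalent spend and 300 worker runs in one week); the helper names are hypothetical.

```typescript
// Hypothetical helpers for back-of-the-envelope cost maths.
const WEEKS_PER_MONTH = 52 / 12; // ≈ 4.33, an averaging assumption

// Project a weekly API-equivalent spend to a monthly figure.
function monthlyFromWeekly(weeklyUsd: number): number {
  return weeklyUsd * WEEKS_PER_MONTH;
}

// Average API-equivalent cost of a single worker run.
function costPerRun(totalUsd: number, runs: number): number {
  if (runs <= 0) throw new Error("runs must be positive");
  return totalUsd / runs;
}

const weeklySpend = 240; // a lower bound: crashed runs emit no final cost event
console.log(`~$${monthlyFromWeekly(weeklySpend).toFixed(0)}/month`); // ~$1040/month
console.log(`~$${costPerRun(weeklySpend, 300).toFixed(2)} per worker run`); // ~$0.80 per worker run
```

Note that a run is not a PR: one shipped PR spans several runs (PRD, code, review, fix), so the per-run and per-PR averages aren't directly comparable.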

From Bash Scripts to an Orchestration System

Before the orchestration system, I built basic agents.

I wrote about that first version in "An autonomous dev pipeline for one": bash scripts, cron, tmux and Claude glued together until the behaviour was reliable enough to trust. It picked up tasks, wrote code, opened pull requests and handled review feedback. Beginner-level agentic work, held together with glue and hope.

It worked well enough, but this time I rebuilt the whole thing in TypeScript, with proper architecture and state management. Where bash was duct tape and slightly chaotic, TypeScript is steel: typed interfaces, phase boundaries and methodical error handling.

Managing Many Agents

One agent was manageable. Then I wanted a specialised one for planning, another for code review and another for fixing PR review feedback. Managing the agents became tricky. Capability isn't the bottleneck; orchestration is. I needed a conductor for my orchestra.

Put another way, the factory must grow.

If you recognise that phrase, you already understand why I felt the need to get this right. What started as "let me just automate this one thing" became a full orchestration system: idea goes in, PRD comes out, issues get created, workers pick them up, code gets written, PRs get opened, reviews get addressed, fixes get pushed. I ideate and check in. The system handles the rest.

Factorio: The Factory Must Grow. Source: https://commons.wikimedia.org/wiki/File:Factorio_Space_Age_Gleba_Screenshot.jpg

The pipeline looks something like this:

idea > PRD > issues > code > tests > PR > review > fixes > merge

I'm still figuring this out, but the pattern is clear. Each step is a phase with defined inputs, outputs and failure modes. The system retries, backs off and stops to ask for help as needed.
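One way to model "a phase with defined inputs, outputs and failure modes" in TypeScript is a discriminated union of outcomes. This is a hypothetical sketch, not the author's code: the `Phase` type, the three outcome kinds and the two toy phases are assumptions for illustration, and the "ambiguity" check is deliberately crude.

```typescript
// Hypothetical outcome of running one phase.
type PhaseOutcome<O> =
  | { kind: "ok"; output: O }
  | { kind: "retry"; reason: string } // transient failure: try again later
  | { kind: "needsHuman"; question: string }; // blocked: stop and ask for help

// A phase maps a typed input to a typed outcome.
interface Phase<I, O> {
  name: string;
  run(input: I): PhaseOutcome<O>;
}

// Chain two phases: the second only runs if the first succeeded;
// any other outcome is passed through unchanged.
function chain<A, B, C>(first: Phase<A, B>, second: Phase<B, C>): Phase<A, C> {
  return {
    name: `${first.name} > ${second.name}`,
    run(input: A): PhaseOutcome<C> {
      const result = first.run(input);
      return result.kind === "ok" ? second.run(result.output) : result;
    },
  };
}

type Idea = { text: string };
type Prd = { title: string; requirements: string[] };
type Issue = { title: string };

// Toy PRD phase: treats any idea containing "?" as ambiguous.
const writePrd: Phase<Idea, Prd> = {
  name: "PRD",
  run: (idea) =>
    idea.text.includes("?")
      ? { kind: "needsHuman", question: `Clarify: ${idea.text}` }
      : { kind: "ok", output: { title: idea.text, requirements: [idea.text] } },
};

// Toy issue phase: one issue per requirement.
const createIssues: Phase<Prd, Issue[]> = {
  name: "issues",
  run: (prd) => ({ kind: "ok", output: prd.requirements.map((r) => ({ title: r })) }),
};

const ideaToIssues = chain(writePrd, createIssues);
console.log(ideaToIssues.run({ text: "Add dark mode" }));
```

Because every failure mode is a value rather than an exception, the orchestrator can decide per phase whether to retry, back off or ask a human.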

Under the Hood

The architecture is simpler than you would expect. GitHub is my system of record, my state machine. Each column in the Project board is a state. The orchestrator polls the board, picks up eligible issues and spawns workers.
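Treating board columns as states can be sketched as a transition table plus an eligibility filter. The post doesn't list its columns, so the column names, the `BoardIssue` shape, the set of "workable" columns and the `maxConcurrent` limit are all assumptions; a real version would read the board through GitHub's API instead of an in-memory array.

```typescript
// Hypothetical board columns; each one is a state in the machine.
type Column = "Ideas" | "PRD" | "Ready" | "In Progress" | "In Review" | "Done";

// Allowed moves between columns (a simple transition table).
const transitions: Record<Column, Column[]> = {
  Ideas: ["PRD"],
  PRD: ["Ready", "Ideas"],
  Ready: ["In Progress"],
  "In Progress": ["In Review", "Ready"],
  "In Review": ["Done", "In Progress"],
  Done: [],
};

interface BoardIssue {
  number: number;
  column: Column;
  claimed: boolean; // true while a worker owns the issue
}

// Columns that have a worker type attached (an assumption for this sketch).
const workable: ReadonlySet<Column> = new Set<Column>(["Ideas", "Ready", "In Review"]);

function canMove(from: Column, to: Column): boolean {
  return transitions[from].includes(to);
}

// Pick unclaimed issues in workable columns, up to the free worker slots.
function eligible(issues: BoardIssue[], running: number, maxConcurrent: number): BoardIssue[] {
  const free = Math.max(0, maxConcurrent - running);
  return issues
    .filter((i) => workable.has(i.column) && !i.claimed)
    .sort((a, b) => a.number - b.number) // oldest issue first
    .slice(0, free);
}
```

Keeping the transition table explicit means an agent can't, for example, move an issue straight from Ready to Done without passing through review.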

Step #2: After the idea/note is processed, the agent clarifies ambiguities with me and then creates a PRD

Step #3: The agent creates a PR

Step #4: The agent reviews its own code; only then do I review it

Step #5: Approve and merge the work

Each worker gets its own fresh workspace directory, where it can read, write and commit without stepping on other agents' toes. Each worker is a Claude CLI subprocess streaming JSON events, and the orchestrator watches that stream for completion.
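Watching a worker's stream for completion comes down to parsing newline-delimited JSON and looking for a terminal event. A minimal sketch: the event shape (`type: "result"`, `is_error`, `total_cost_usd`) is modelled loosely on the Claude CLI's streaming JSON output but should be treated as an assumption, so check the CLI documentation for the real fields. It also shows why crashed runs leave no cost behind: if the stream ends without a result event, there is nothing to record.

```typescript
// Assumed event shape; verify field names against the CLI you run.
interface StreamEvent {
  type: string;
  is_error?: boolean;
  total_cost_usd?: number;
}

type RunOutcome =
  | { status: "completed"; costUsd: number | undefined }
  | { status: "failed"; costUsd: number | undefined }
  | { status: "crashed" }; // stream ended with no result event, so no cost

// Consume NDJSON lines (one event per line) and classify the run.
function watchStream(lines: Iterable<string>): RunOutcome {
  for (const line of lines) {
    if (line.trim() === "") continue;
    let event: StreamEvent;
    try {
      event = JSON.parse(line) as StreamEvent;
    } catch {
      continue; // tolerate partial or non-JSON lines
    }
    if (event.type === "result") {
      return event.is_error
        ? { status: "failed", costUsd: event.total_cost_usd }
        : { status: "completed", costUsd: event.total_cost_usd };
    }
  }
  return { status: "crashed" };
}
```

In a live orchestrator the lines would come from the subprocess's stdout, for instance with an async (`for await`) version of the same loop over `readline.createInterface({ input: child.stdout })`, plus a timeout for workers that hang without exiting.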

The part that took the longest to get right was retries. The interesting failures are the ones that look like success. An agent finishes its work, reports success, the orchestrator moves the issue to Done, and nothing has actually been pushed: a pre-commit hook silently rejected the commit, and the work sits orphaned in the workspace directory. Or the agent gets close to finishing, pauses on a turn limit, resumes, gets close again, pauses again, and repeats ad infinitum. Or the agent burns through its cost cap without ever committing. Oops.
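Catching the "success" that isn't one can be as simple as asking git before trusting the agent. A minimal sketch that parses the output of `git status --porcelain=v1 --branch`, whose first line looks like `## branch...origin/branch [ahead N]`; the function name and the verdict labels are hypothetical, and actually running the command inside the worker's workspace is left out so the parser stays testable.

```typescript
type PushVerdict = "pushed" | "uncommitted" | "unpushed" | "noUpstream";

// Parse `git status --porcelain=v1 --branch` output to decide whether
// a reported success actually reached the remote.
function verifyPushed(porcelain: string): PushVerdict {
  const lines = porcelain.split("\n").filter((l) => l.length > 0);
  const header = lines.find((l) => l.startsWith("## ")) ?? "";
  const changes = lines.filter((l) => !l.startsWith("## "));

  if (changes.length > 0) return "uncommitted"; // e.g. a hook rejected the commit
  if (!header.includes("...")) return "noUpstream"; // branch was never pushed
  if (/\[ahead \d+/.test(header)) return "unpushed"; // commits exist only locally
  return "pushed";
}
```

This only compares the local branch with its upstream; confirming that a PR was actually opened would be a separate check.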

There are two kinds of retry. A continuation happens when the agent pauses mid-task because it has hit its turn limit; it resumes with its full conversation history and picks up where it left off. A failure retry happens when the agent crashes; it starts fresh, with no memory, after a backoff delay. Continuations and failure retries both have per-issue caps now. The orchestrator also checks git status after every reported success and abandons any issue that breaches the (configurable) dollar budget. Most of this retry logic exists because one of these failures actually happened.
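The two retry paths, the per-issue caps and the budget check fit into one decision function. A minimal sketch, assuming exponential backoff with a ceiling; the field names, the caps and the delays are hypothetical, not the values the author uses.

```typescript
interface RetryPolicy {
  maxContinuations: number; // cap on resumes after a turn limit
  maxFailureRetries: number; // cap on fresh-start retries after a crash
  budgetUsd: number; // per-issue dollar budget
  baseDelayMs: number;
  maxDelayMs: number;
}

interface IssueRunState {
  continuations: number;
  failureRetries: number;
  spentUsd: number;
}

type WorkerExit = "turnLimit" | "crashed";

type Decision =
  | { action: "continue" } // resume with full conversation history
  | { action: "retry"; delayMs: number } // fresh start, no memory, after backoff
  | { action: "abandon"; reason: string };

function decide(exit: WorkerExit, state: IssueRunState, policy: RetryPolicy): Decision {
  // Budget first: an issue at or over budget stops, however it exited.
  if (state.spentUsd >= policy.budgetUsd) {
    return { action: "abandon", reason: "budget exceeded" };
  }
  if (exit === "turnLimit") {
    return state.continuations < policy.maxContinuations
      ? { action: "continue" }
      : { action: "abandon", reason: "continuation cap reached" };
  }
  if (state.failureRetries >= policy.maxFailureRetries) {
    return { action: "abandon", reason: "failure retry cap reached" };
  }
  // Exponential backoff: base * 2^attempt, capped at maxDelayMs.
  const delayMs = Math.min(policy.baseDelayMs * 2 ** state.failureRetries, policy.maxDelayMs);
  return { action: "retry", delayMs };
}
```

The continuation cap is what breaks the "gets close, pauses, resumes, repeats" loop; with a 1 s base, crash retries wait 1 s, 2 s, 4 s and so on until the cap.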

The entire workflow configuration (one workflow each for PRD, dev and review) lives in a single file, covering states, cost limits and prompt templates. When you change the file, the orchestrator hot-reloads it. I spent a good amount of time tuning these workflows, but they're starting to work.
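Hot-reloading is safer when a bad edit can't take the orchestrator down. A minimal sketch, assuming the workflow file is JSON (the post doesn't say which format it uses) with hypothetical field names mirroring the states, cost limits and prompt templates mentioned above; it validates each new version and keeps the last good config if parsing or validation fails. Hooking `reload` up to a file watcher, such as Node's `fs.watch`, is left out to keep the sketch self-contained.

```typescript
interface WorkflowConfig {
  name: string; // e.g. "prd", "dev", "review"
  states: string[]; // board columns this workflow acts on
  costLimitUsd: number;
  promptTemplate: string;
}

// Validate untrusted JSON text into a WorkflowConfig, or throw.
function parseWorkflow(text: string): WorkflowConfig {
  const raw = JSON.parse(text) as Partial<WorkflowConfig>;
  if (typeof raw.name !== "string") throw new Error("name must be a string");
  if (!Array.isArray(raw.states) || raw.states.length === 0 || !raw.states.every((s) => typeof s === "string")) {
    throw new Error("states must be a non-empty array of strings");
  }
  if (typeof raw.costLimitUsd !== "number" || raw.costLimitUsd <= 0) throw new Error("costLimitUsd must be positive");
  if (typeof raw.promptTemplate !== "string") throw new Error("promptTemplate must be a string");
  return { name: raw.name, states: raw.states, costLimitUsd: raw.costLimitUsd, promptTemplate: raw.promptTemplate };
}

// Holds the active config; a failed reload keeps the previous one.
class ConfigStore {
  private current: WorkflowConfig;

  constructor(initialText: string) {
    this.current = parseWorkflow(initialText); // fail fast on startup
  }

  get config(): WorkflowConfig {
    return this.current;
  }

  // Call this with the file's new contents whenever it changes.
  reload(text: string): boolean {
    try {
      this.current = parseWorkflow(text);
      return true;
    } catch (err) {
      console.error(`Keeping previous config: ${(err as Error).message}`);
      return false;
    }
  }
}
```

Failing fast at startup but leniently on reload means a typo mid-session only produces a log line, while running workers carry on with the last valid settings.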

                    ┌───────────────────────────────┐
                    │       ORCHESTRATOR TICK       │◄──────────┐
                    └───────────────┬───────────────┘           │
                                    │                           │
                    ┌───────────────▼───────────────┐           │
                    │  1. RECONCILE running workers │           │
                    └───────────────┬───────────────┘           │
                                    │                           │
                    ┌───────────────▼───────────────┐           │
                    │  2. GATE on rate limits       │           │
                    └───────────────┬───────────────┘           │
                                    │                           │
                    ┌───────────────▼───────────────┐           │
                    │  3. FETCH candidate issues    │           │
                    └───────────────┬───────────────┘           │
                                    │                           │
                    ┌───────────────▼───────────────┐           │
                    │  4. DISPATCH workers          │           │
                    └───────────────┬───────────────┘           │
                                    │                           │
                                    └──── schedule next tick ───┘
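The four steps of a tick map naturally onto a small loop with injected dependencies. A minimal sketch: `OrchestratorDeps`, its methods, `maxWorkers` and the fixed polling interval are hypothetical names, and a real version would make each step asynchronous (GitHub API calls, subprocess management).

```typescript
interface Candidate {
  id: number;
}

interface OrchestratorDeps {
  reconcile(): number; // check running workers; returns how many are still alive
  rateLimited(): boolean; // true if GitHub or model limits say "wait"
  fetchCandidates(): Candidate[]; // eligible issues from the board
  dispatch(issue: Candidate): void; // spawn a worker for one issue
}

// One pass of the loop: reconcile, gate, fetch, dispatch.
// Returns the number of workers dispatched this tick.
function tick(deps: OrchestratorDeps, maxWorkers: number): number {
  const running = deps.reconcile(); // 1. RECONCILE
  if (deps.rateLimited()) return 0; // 2. GATE: skip dispatching this tick
  const slots = Math.max(0, maxWorkers - running);
  const batch = deps.fetchCandidates().slice(0, slots); // 3. FETCH
  batch.forEach((issue) => deps.dispatch(issue)); // 4. DISPATCH
  return batch.length;
}

// Schedule the next tick only after this one finishes, so ticks never overlap.
function runForever(deps: OrchestratorDeps, maxWorkers: number, intervalMs: number): void {
  const loop = (): void => {
    try {
      tick(deps, maxWorkers);
    } catch (err) {
      console.error("tick failed", err); // one bad tick shouldn't stop the loop
    }
    setTimeout(loop, intervalMs);
  };
  loop();
}
```

Using `setTimeout` after each tick, rather than a fixed `setInterval`, keeps a slow tick from overlapping the next one.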

The Thinking Is Still Mine (For Now)

I've replaced (most of) myself. But I can still decide what to build. And what not to build. I can still judge whether code is correct or has a code smell. I can feed the pipeline with ideas that are worth pursuing and axe the ones that are not. I can judge whether the output matches the intent. Being less in the weeds, I have more time to think about strategy.

The orchestration system handles the execution. The thinking, taste and judgement are still mine, for now.

On the flip side, every layer of abstraction creates demand for someone who understands the layer below it. The more software we automate, the more we need people who can fix the pipes when they burst.

We've all been replaced in small ways before. I started my career writing vanilla JS and jQuery, code that has since been replaced by the higher-level libraries and frameworks powering today's web apps. Abstractions make yesterday's hard problems trivial. Each time, the work shifts slightly. The only difference this time is that the shift is... vast.

I am the product and tech strategy machine now. I will feed the machine with ideas until I am replaced again, and then I'll have to carve out another higher-level role. The factory must grow.

If you're experimenting with AI in your workflow, I'd love to hear from you! I write more like this at blog.mariohayashi.com, and feel free to follow me on X: @logicalicy.