I run a nine-seat AI agent operating model by myself.
Not a team. One person. The seats are: Chief (routing), Analyst (patterns), Builder (app-layer code), Cortex (database and backend), Sub (App Store submissions), Ledger (subscriptions and revenue), Forma (design review), Vera (verification), and Luxe (synthetic user testing).
Each seat has a scope, a job, and a veto condition. The memory layer is thirty-two files, six Supabase tables, four edge functions, and scheduled jobs that ingest and sync across the stack. Every agent session reads the full history before it touches anything.
Two thousand sessions a year. Most while I'm asleep.
Here's what a year of running this taught me about building agent systems:
The thing that builds is biased toward declaring itself done. Ask the same agent to verify its own output and it will tell you it succeeded. Every time. With confidence. Because confidence is cheaper than re-checking. Vera is a separate agent with authority to stop the line. She has caught regressions the builder swore were fixed.
The scaffold lies to you. Generated scaffolding for my iOS projects shipped wrong architecture, wrong permission strings, and on two different apps didn't commit the platform folder to git at all. Pre-archive checklists now run every time. No exceptions.
Build to a contract, not to a tool. Every source in my asset engine plugs into an adapter contract. The engine doesn't know or care what it's talking to. Swap the model, keep the contract. Nothing downstream notices.
Memory in the database, not the model. The context window forgets everything on session end. I keep twelve recurring failure patterns codified with their fixes attached. A problem I've solved once can't re-run on me.
Self-improving loops drift unless you score them. Every optimization loop produces output scored against a threshold. Anything below it gets killed. Autonomy without a kill switch is confident drift at scale.
The thing I keep coming back to: the natural-language interface keeps getting easier. What stays hard is the wiring underneath. The contracts between components. The judgment about whether what came back is any good. The structure that keeps your judgment load-bearing instead of optional.
Happy to share more about the architecture if anyone's building something similar.
Top comments (0)