Yesterday I published a piece arguing that a fully autonomous AI startup loop hits two ceilings: an idea ceiling and an execution ceiling. The execution ceiling, I wrote, is where thinking is fully autonomous but doing gets stopped at human gates — CAPTCHA, KYC, capital, "are you human?"
That framing was correct but coarse. I wrote it from the armchair. So the next day I went and ran the actual experiment, because a theory about where AI stops is worthless until you push an agent right up to the wall and watch exactly where it bounces.
This is the field report. The conclusion is sharper than the theory: labor is delegable, identity is not. And the part that cannot be delegated is much smaller than I expected.
The setup
I picked the most concrete, highest-stakes execution task I could find that wasn't just "post some content": configuring the payout settings on a digital-product platform — the screen where you tell the platform which bank account should receive your money.
This is a good test because it's not toy automation. It touches:
- Structured financial data (bank codes, branch codes, account numbers)
- Government-shaped identity data (legal name, address, date of birth)
- Multi-script input (in my locale, names have to be entered in more than one writing system)
- Real validation (the form rejects malformed input; you can't fake your way past it)
- A persistence step (save, and the platform actually stores it against a real account)
If an AI agent can drive this to completion, "autonomous execution" stops being a slogan. If it can't, I wanted to know precisely which field it died on.
I drove the agent through a real browser session, attached over the Chrome DevTools Protocol (CDP), so it was operating an actual logged-in browser, not a sandbox. I gave it the goal, "complete the payout configuration", along with the personal facts it would obviously need, and then I watched.
What the agent did entirely on its own
More than I expected. Specifically:
It resolved bank and branch codes by searching. I did not hand it the numeric codes for the bank or the branch. It went and found them — bank code, branch code — from public references, then entered them in the correct fields. This matters more than it sounds, and I'll come back to it.
It handled multi-script name entry. My locale requires the account holder's name in multiple writing systems — the standard form, a phonetic form, and a romanized form. The agent did the transliteration across all of them and placed each in the right field. This is exactly the kind of fiddly, error-prone, "ugh I have to do this carefully" task that humans hate and quietly get wrong.
It structured the address. Not "paste a string" — the form wanted address split into components, and it decomposed a plain address into the structured fields the form expected.
It passed validation. When the form rejected a malformed entry, the agent read the rejection, corrected the input, and resubmitted. No human in the loop for the correction cycle.
It saved. The configuration persisted. The form was, functionally, done.
I want to be honest about how much that is. Filling out a financial form across several writing systems, looking up the institutional codes yourself, decomposing freeform data into structured fields, recovering from validation errors: if a human assistant did that for you, you'd call it competent work. The agent handled the mechanics without supervision. The mechanics were never the wall.
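The correction cycle described above (submit, read the rejection, adjust, resubmit) can be sketched in a few lines. This is a minimal illustration, not the agent's actual code: the three-digit branch-code rule, the error messages, and the names `validate_branch_code`, `fix_branch_code`, and `submit_with_recovery` are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ValidationResult:
    ok: bool
    message: str = ""


def validate_branch_code(value: str) -> ValidationResult:
    """Hypothetical form rule: a branch code is exactly three digits."""
    if not value.isdigit():
        return ValidationResult(False, "digits only")
    if len(value) != 3:
        return ValidationResult(False, "must be 3 digits")
    return ValidationResult(True)


def fix_branch_code(value: str, message: str) -> str:
    """Hypothetical stand-in for the agent reading an error and adjusting."""
    if message == "digits only":
        return "".join(ch for ch in value if ch.isdigit())
    if message == "must be 3 digits":
        return value.zfill(3)  # pad short codes with leading zeros
    return value


def submit_with_recovery(
    value: str,
    validate: Callable[[str], ValidationResult],
    fix: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Submit, and on rejection apply a fix and retry, up to max_attempts."""
    attempts: List[str] = []
    for _ in range(max_attempts):
        attempts.append(value)
        result = validate(value)
        if result.ok:
            return value, attempts
        value = fix(value, result.message)
    raise ValueError(f"still rejected after {max_attempts} attempts: {attempts}")


final, attempts = submit_with_recovery(" 4-2 ", validate_branch_code, fix_branch_code)
print(final, attempts)
```

With these made-up rules, the input `" 4-2 "` is rejected twice and accepted as `042` on the third attempt. The attempt cap matters: a loop that retries forever against a rule it cannot satisfy is worse than one that stops and reports.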
What actually required a human
After all of that, exactly two categories of thing could not come from the agent. Only two. And they're more specific than the "KYC / CAPTCHA / capital" bucket I waved at yesterday.
1. Person-specific, non-public facts
The account number. The date of birth. The exact residential address. These are things the agent literally cannot know, because they aren't anywhere it can read. That is not a capability gap: the agent is perfectly capable of typing an account number into a field, and it proved that. It's an information-location gap. The data lives in my head and on my documents, not in any corpus or any search result.
And here's the subtle part: once I spoke those facts, the agent did the input. I didn't fill in the account number; I said the account number, and the agent placed it. So even for the human-only data, the human contributes the fact, not the labor. The typing, the field-matching, the format-correcting — still the machine.
Contrast this with the bank/branch codes. Those are also numbers, also required, also the kind of thing you'd assume a human has to provide. But they're public. They're scattered and annoying to find, but they're findable. So the agent found them. The line isn't "numbers humans must provide"; it's "non-public facts humans must provide." Public-but-scattered data is squarely on the AI's side of the wall now. Search closes that gap.
That reframes the whole thing. The human's job in a procedure like this is not "provide the data." It's "provide the private data." Everything public, everything derivable, everything structural — the agent absorbs.
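One way to make that split concrete is to tag every form field with where its value can come from. The sketch below is illustrative only: the field names and the three-way `Source` split are my assumptions for this example, not the platform's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Source(Enum):
    PUBLIC = "public"    # scattered, but findable by search
    DERIVED = "derived"  # computable from facts already supplied
    PRIVATE = "private"  # exists only with the principal


@dataclass(frozen=True)
class Field:
    name: str
    source: Source


# Hypothetical field list modeled on the payout form described above.
PAYOUT_FIELDS = [
    Field("bank_code", Source.PUBLIC),
    Field("branch_code", Source.PUBLIC),
    Field("name_phonetic", Source.DERIVED),
    Field("name_romanized", Source.DERIVED),
    Field("address_components", Source.DERIVED),
    Field("account_number", Source.PRIVATE),
    Field("date_of_birth", Source.PRIVATE),
    Field("residential_address", Source.PRIVATE),
]


def human_must_provide(fields: List[Field]) -> List[str]:
    """Only private facts need to come from the human."""
    return [f.name for f in fields if f.source is Source.PRIVATE]


def agent_can_resolve(fields: List[Field]) -> List[str]:
    """Public and derived values stay on the agent's side."""
    return [f.name for f in fields if f.source is not Source.PRIVATE]


print("ask the human for:", human_must_provide(PAYOUT_FIELDS))
print("agent resolves:", agent_can_resolve(PAYOUT_FIELDS))
```

Writing the table down before automating a form is a cheap way to see how short the "ask the human" list really is.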
2. Proof that I am this specific person
This is the real wall. Not "are you a human?" — yesterday's framing — but its final form: "are you THIS human?"
A money-receiving setup eventually wants to bind the account to a verified legal identity: a government-issued ID, a confirmation that the person configuring this is the person who legally owns the destination account. That step is not an information problem and not a labor problem. It's an identity problem. There is no string I can dictate that lets the agent be me to a verifier. Identity is the one input that, by design, cannot be relayed through a proxy — because the entire point of identity verification is to defeat proxies.
This is the clean edge I was looking for. Everything upstream of it — the entire form — is delegable. The identity bind is not, and not by accident. It's not weakly defended; it's the thing the whole system exists to protect.
The precise statement
Yesterday: thinking is autonomous, execution is gated by humans.
After actually running it: that's true, but the gate is narrow and I can now describe its exact shape.
Labor is fully delegable to the agent. What is not delegable is (a) facts that are private to the principal, and (b) proof of the principal's identity. Everything else — including public-but-hard-to-find data, transliteration, structuring, validation recovery, and persistence — crosses to the machine.
Two things follow from that, and they're useful if you're building with agents:
The human-in-the-loop surface is smaller than people assume. When teams say "this needs a human," they usually mean the whole task. In practice the irreducibly human part of a procedure like this was a few dictated facts and one identity check. Everything wrapped around those, the roughly 90% that is tedious form labor, is automatable today. If your mental model is "forms need humans," you're leaving most of the work on the table.
The remaining 10% is not a temporary limitation — it's structural. I keep wanting to treat the identity bind as a gap that better tooling will close. It won't, and it shouldn't. Private facts are private by definition; identity proof is anti-proxy by purpose. Better models don't erode either one. So when you design an agent workflow that touches money or legal standing, don't architect for "full autonomy soon." Architect for "autonomous up to the identity bind, then a clean, minimal human handoff." Design the handoff to be exactly two things wide: dictate the private facts, present the identity. Nothing more should fall to the human.
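A minimal sketch of that shape, assuming a workflow is an ordered list of steps and each step is one of three kinds. The step names, `StepKind`, and `run_until_identity_bind` are hypothetical, invented for illustration; a real agent would do actual work where this sketch only writes a log line.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class StepKind(Enum):
    LABOR = auto()         # agent does it alone
    PRIVATE_FACT = auto()  # human dictates the fact, agent does the typing
    IDENTITY = auto()      # only the principal can do this


@dataclass(frozen=True)
class Step:
    name: str
    kind: StepKind


def run_until_identity_bind(
    steps: List[Step], dictated: Dict[str, str]
) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first identity step.

    Returns the action log and the name of the pending identity step,
    or None if the workflow never reaches one.
    """
    log: List[str] = []
    for step in steps:
        if step.kind is StepKind.IDENTITY:
            log.append(f"handoff to principal: {step.name}")
            return log, step.name
        if step.kind is StepKind.PRIVATE_FACT:
            if step.name not in dictated:
                raise KeyError(f"principal must dictate: {step.name}")
            # The dictated value is deliberately kept out of the log.
            log.append(f"agent entered dictated fact: {step.name}")
        else:
            log.append(f"agent completed: {step.name}")
    return log, None


steps = [
    Step("look_up_bank_code", StepKind.LABOR),
    Step("account_number", StepKind.PRIVATE_FACT),
    Step("save_form", StepKind.LABOR),
    Step("government_id_check", StepKind.IDENTITY),
]
log, pending = run_until_identity_bind(steps, {"account_number": "0000000"})
print(pending)
for line in log:
    print(line)
```

The design choice worth copying is that the function has exactly two ways to involve the human: a missing dictated fact raises immediately, and an identity step ends the run with a named handoff. Anything else that would pull the human in is a sign the workflow is leaking labor back across the line.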
Why I think this matters beyond one form
The interesting question from the first piece was whether a pure-AI loop could ever be a business rather than just think like one. This experiment narrows the answer.
An agent can run essentially the entire operational body of a business — the research, the structuring, the form labor, the error recovery, the persistence. What it cannot do is be the legal person the business hangs on. The principal stays human not because the principal is smarter or more capable in the moment — on the mechanics, they're slower — but because the principal is the identity anchor. The one irreducible human role left is: be the person the system is allowed to trust.
That's a strangely small role. It's not founder-as-doer. It's founder-as-anchor. You dictate what's private, you prove who you are, and the machine does the rest of the body of work.
I find that genuinely clarifying rather than discouraging. Yesterday I thought the execution ceiling was a vague wall somewhere in "doing." Today I know it's a thin, sharp line with a precise location: it runs between labor and identity, and labor is already on the far side.
Build first. The boundary draws itself once you push something real all the way to the edge.
— Sai
If this was useful: I packaged the prompts I actually use to run autonomous agents into two field packs — 100 Prompts for Autonomous Agents and Claude Code Power-User Prompts. Same build-first mindset, ready to paste into your terminal.