I run a small fleet of AI coding agents. They build and maintain my projects, one writes code, one reviews it, one verifies, and one routes the work between them. Most people point a setup like this at "ship faster." Last week I pointed mine at a different question: out of everything I've built, what's the one thing worth turning into a real product?
The most useful thing they did was tell me no. Twice. With citations.
Here's how that went, because the discipline behind it is more useful than the answer.
The first idea I was wrong about
I had a favorite. A small memory layer for AI agents, the kind of thing that watches what your assistant does and remembers the patterns. I'd already built it, it was on PyPI, and I'd half-convinced myself it was the one. So I did the thing I've learned to always do before I get attached: I handed it to one of the agents and told it to take the idea apart. Not to help me improve it. To refute it.
It came back having broken every reason I had.
The one that ended the argument: the platform my tool runs on now ships that exact feature natively. Free, automatic, zero setup, doing the thing my tool asked you to do by hand. You can't win a market where the host gives your product away, better, for nothing. The second blow was the distribution plan I'd been quietly counting on, the registries where people "discover" these tools. I'd assumed they were a discovery channel. The data said they're a graveyard. Most listings get zero installs, and the only thing that ranks you is the reputation you already had before you showed up. The channel I thought would do my marketing was just my marketing problem with extra steps.
I dropped the idea the same afternoon. It stung a little. It also saved me a few weeks of building something dead on arrival.
The second idea, where I caught myself cheating
So I pivoted to what looked like the obvious survivor: a security tool I'd already launched, the verifiable, runs-in-your-browser, nothing-leaves-your-machine kind. This time I didn't trust a single agent. I ran three in parallel. One mapped the competition. One measured whether the underlying problem was even real. And one had a single job: kill the idea. Default to "this won't work," and only fail to kill it after genuinely trying.
The problem turned out to be very real. There's a documented trail of engineers pasting secrets into chatbots, companies banning the tools over it, hard numbers on how often it happens. Demand was not the issue.
The slot was. While I'd been admiring my own demo, a well-funded incumbent had shipped the exact retention feature I was planning, free for individual developers, already sitting at number one in the marketplace I'd have to compete in. The clever "verify it yourself" angle I thought was my moat? Nine other free tools already make the identical claim, word for word. My differentiator was table stakes.
Dead again. But the part worth keeping is what the third agent flagged on its way out: my framing was rigged. I had quietly demoted the stronger candidate to make room for the one I found more exciting. I'd picked the idea I wanted to be true over the one the evidence supported, and I'd written the framing to hide that from myself. The agent's word for it was "force-fit confirmed." It was right.
The discipline, not the verdict
None of this works if the agent's job is to make you feel good. The whole thing turns on one instruction: try to break it, default to skeptical, and if you can't break it after real effort, that absence is the only signal worth trusting. I make them list what they couldn't verify, so a confident-sounding answer can't smuggle in a guess. I make them check my framing for motivated reasoning, because the failure mode isn't a wrong fact, it's me arranging true facts into a flattering shape.
That last one is the part I can't do reliably on my own. I'm too close to my own ideas to notice when I'm protecting them. An adversary with no stake in my ego, pointed at my reasoning instead of my code, turns out to be the highest-leverage thing in the whole setup.
What I actually learned
The hype version of AI assistance is "it builds whatever you ask, faster." The version that earned its keep this week did the opposite. It refused to build, twice, and showed its work both times.
Agreement is cheap. Anything will agree with you. An agent that disagrees with you and brings receipts is rare, and it's worth more than ten that ship your bad idea at high speed. Most people aim these tools at "build my idea." The move that paid off was aiming one at "now try to kill it."
It cost me two ideas I was attached to. It saved me from building either one. I'll take that trade every time.
I run a fleet of AI agents that maintain a zero-dependency security monorepo. The open-source pieces live at github.com/WRG-11.
Top comments (0)