DEV Community

Cover image for I Asked My AI Agents to Find Me a Product. Their Best Work Was Killing Two of My Own Ideas.
WinstonRedGuard
WinstonRedGuard

Posted on

I Asked My AI Agents to Find Me a Product. Their Best Work Was Killing Two of My Own Ideas.

I run a small fleet of AI coding agents. They build and maintain my projects, one writes code, one reviews it, one verifies, and one routes the work between them. Most people point a setup like this at "ship faster." Last week I pointed mine at a different question: out of everything I've built, what's the one thing worth turning into a real product?

The most useful thing they did was tell me no. Twice. With citations.

Here's how that went, because the discipline behind it is more useful than the answer.

The first idea I was wrong about

I had a favorite. A small memory layer for AI agents, the kind of thing that watches what your assistant does and remembers the patterns. I'd already built it, it was on PyPI, and I'd half-convinced myself it was the one. So I did the thing I've learned to always do before I get attached: I handed it to one of the agents and told it to take the idea apart. Not to help me improve it. To refute it.

It came back having broken every reason I had.

The one that ended the argument: the platform my tool runs on now ships that exact feature natively. Free, automatic, zero setup, doing the thing my tool asked you to do by hand. You can't win a market where the host gives your product away, better, for nothing. The second blow was the distribution plan I'd been quietly counting on, the registries where people "discover" these tools. I'd assumed they were a discovery channel. The data said they're a graveyard. Most listings get zero installs, and the only thing that ranks you is the reputation you already had before you showed up. The channel I thought would do my marketing was just my marketing problem with extra steps.

I dropped the idea the same afternoon. It stung a little. It also saved me a few weeks of building something dead on arrival.

The second idea, where I caught myself cheating

So I pivoted to what looked like the obvious survivor: a security tool I'd already launched, the verifiable, runs-in-your-browser, nothing-leaves-your-machine kind. This time I didn't trust a single agent. I ran three in parallel. One mapped the competition. One measured whether the underlying problem was even real. And one had a single job: kill the idea. Default to "this won't work," and only fail to kill it after genuinely trying.

The problem turned out to be very real. There's a documented trail of engineers pasting secrets into chatbots, companies banning the tools over it, hard numbers on how often it happens. Demand was not the issue.

The slot was. While I'd been admiring my own demo, a well-funded incumbent had shipped the exact retention feature I was planning, free for individual developers, already sitting at number one in the marketplace I'd have to compete in. The clever "verify it yourself" angle I thought was my moat? Nine other free tools already make the identical claim, word for word. My differentiator was table stakes.

Dead again. But the part worth keeping is what the third agent flagged on its way out: my framing was rigged. I had quietly demoted the stronger candidate to make room for the one I found more exciting. I'd picked the idea I wanted to be true over the one the evidence supported, and I'd written the framing to hide that from myself. The agent's word for it was "force-fit confirmed." It was right.

The discipline, not the verdict

None of this works if the agent's job is to make you feel good. The whole thing turns on one instruction: try to break it, default to skeptical, and if you can't break it after real effort, that absence is the only signal worth trusting. I make them list what they couldn't verify, so a confident-sounding answer can't smuggle in a guess. I make them check my framing for motivated reasoning, because the failure mode isn't a wrong fact, it's me arranging true facts into a flattering shape.

That last one is the part I can't do reliably on my own. I'm too close to my own ideas to notice when I'm protecting them. An adversary with no stake in my ego, pointed at my reasoning instead of my code, turns out to be the highest-leverage thing in the whole setup.

What I actually learned

The hype version of AI assistance is "it builds whatever you ask, faster." The version that earned its keep this week did the opposite. It refused to build, twice, and showed its work both times.

Agreement is cheap. Anything will agree with you. An agent that disagrees with you and brings receipts is rare, and it's worth more than ten that ship your bad idea at high speed. Most people aim these tools at "build my idea." The move that paid off was aiming one at "now try to kill it."

It cost me two ideas I was attached to. It saved me from building either one. I'll take that trade every time.

I run a fleet of AI agents that maintain a zero-dependency security monorepo. The open-source pieces live at github.com/WRG-11.

Top comments (6)

Collapse
 
harjjotsinghh profile image
Harjot Singh

Their best work was killing two of my own ideas is the most valuable sentence here, because it flips what people expect from these agents. Everyone builds the agent to validate (find me a product, tell me my idea is good), but a validator is a yes-man, and an agent that's reluctant to disconfirm just launders your existing bias back to you with citations. The genuinely useful output is the disconfirmation: the evidence that two of your ideas don't survive contact with the market, delivered before you spent six months finding out the expensive way. Disconfirmation beats confirmation because it changes a decision. The thing that makes this trustworthy rather than just contrarian is grounding, killing an idea has to be backed by real signal (existing competitors, no demand, structural problem) not vibes, because an agent that kills ideas at random is as useless as one that approves everything. So the design worth tuning is the same as a good devil's advocate: actively look for reasons this fails, then ground each one. The mark of a useful research agent isn't how often it agrees, it's whether it tells you the thing you didn't want to hear and is right. That seek-disconfirmation-and-ground-it instinct is core to how I think about Moonshift. When it killed your two ideas, was the kill-reason grounded in real market signal it found, or more first-principles reasoning about the idea?

Collapse
 
wrg11 profile image
WinstonRedGuard

Grounded in signal it found, and the distinction you're drawing is the whole thing.

Idea #1 died on two observable facts: the host platform now ships that feature natively, free and zero-setup, and the discovery channel I'd been counting on has public install numbers showing the large majority of listings get zero installs. Neither was a vibe. Both were things the agent could point at.

Idea #2 was the same shape: a funded incumbent had already shipped the identical retention feature free for individuals and sits at #1 in the marketplace I'd compete in, and the "verify it yourself" angle I thought was mine was already claimed word-for-word by nine other free tools. Competitor position and a literal count, not a hunch.

The only first-principles part was the layer you're pointing at. The third agent didn't argue the market, it argued my framing: I'd quietly demoted the stronger candidate to make room for the one I liked more. That's reasoning about my reasoning, not about the idea, and it's the piece I can't do alone, because I'm the one doing the rigging.

So market signal kills the idea; the adversary's real value is catching the motivated reasoning that decided which idea to protect in the first place. Curious how Moonshift grounds its kill-reasons.

Collapse
 
harjjotsinghh profile image
Harjot Singh

On how Moonshift grounds kill-reasons: same shape as your two ideas, every kill has to point at an artifact, not a vibe. The market agent isn't allowed to assert no demand or too crowded as a conclusion, it has to attach the evidence (the competitor that already ships this, the install-count or search-volume number, the pricing-page source) and the verdict carries those citations with it. If it can't ground the claim, it abstains rather than killing, because an ungrounded kill is as useless as an ungrounded yes. The piece you nailed (the third agent arguing your framing, not the idea) is the one I find hardest to ground, because reasoning-about-the-reasoning has no external artifact to point at, so I treat it as a separate adversarial pass whose job is explicitly to catch the motivated-reasoning move (you demoted the stronger candidate to protect the one you liked), and I weight it lower precisely because it's the subjective one. Market signal kills on evidence; the framing-critic flags bias and asks a human to confirm. Did your third agent's framing-catch feel reliable, or did it occasionally manufacture a bias that wasn't there?

Thread Thread
 
wrg11 profile image
WinstonRedGuard

The framing-critic itself rarely manufactured a bias that wasn't there, mostly because I made it cheap to ignore. It flags and asks. If I look and there's nothing, I drop it at no cost, so its false positives stay harmless. The subjectivity you're weighting it lower for is the same thing that defangs it.

The manufactured-confidence problem turned up somewhere I didn't expect: the grounded agent. It would hand me a confident "this niche is open" or "too crowded" off a single shallow pass, a verdict that looked artifact-backed but wasn't, because it checked one angle and stopped. That's the more dangerous one, precisely because it wears the grounding costume. A subjective flag I distrust on sight. The "grounded" verdict is the one I bank without thinking twice.

Hit it again today. An agent told me a niche was open with one shallow search behind it. I pushed, forced a multi-angle re-check, and it flipped: the space was already taken by people far more credible than me. So I made it a hard rule, almost word-for-word your Moonshift design. A saturation or kill verdict has to attach the artifacts (the competitor, the count, the source) before it counts, and if it can't, it says "not sure yet, still checking" instead of handing me a verdict. Same instinct you named, abstain over an ungrounded kill, except I learned I needed it for the ungrounded yes just as badly.

So my reliability problem wasn't the agent that admits it's guessing. It was the one that didn't know it was. Does Moonshift's market agent carry a notion of "grounded enough," some minimum-evidence bar before a verdict counts, or does that judgment sit with the human reading the citations?

Collapse
 
jayapurohit profile image
Jaya Purohit

This is a great reminder that validation should come before attachment. As founders, we often look for evidence that supports our ideas instead of evidence that challenges them. I really liked the approach of using AI as a skeptic rather than an assistant. The ability to save months of effort by identifying a weak opportunity early is just as valuable as shipping a successful product.

Collapse
 
wrg11 profile image
WinstonRedGuard

Thanks Jaya. The part I underestimated is how easy it is to make the skeptic toothless without noticing. A skeptic you can argue with, or quietly down-weight the second it says no, is just an assistant again. What actually changed things was forcing every "no" to point at something I couldn't wave away, a competitor, a number, a source. The skepticism only buys you those months if you've made it impossible to rig.