The Plan That Should Have Died
Last month I spent hours planning a feature. Twelve tasks. Dependencies mapped. Outcomes defined for each one. A beautiful, thorough, completely unnecessary plan.
The feature itself was wrong. Not the implementation. The concept. I was building a manual selector for a system that already had automatic routing. Hours of careful planning for something that should have been cut after five minutes of honest scrutiny.
Here's what went wrong: I asked my AI coding assistant to help me plan, and it planned. Brilliantly. Thoroughly. Without once asking whether the thing was worth planning at all.
That's the problem with a single voice. It does what you ask. It doesn't push back. And the most valuable output isn't a single answer, it's a discussion.
If you've ever finished a sprint or project and realized half the work shouldn't have existed, you know this feeling. The plan looked right. The execution was clean. The waste was invisible until it was already too late.
One Brain, One Blind Spot
We've all been there. You ask an AI assistant for help, and it gives you exactly what you asked for. Polished, syntactically correct, completely unchallenged.
Need architecture advice? It designs a system. Need a code review? It finds issues. Need a plan? It plans. Each response is competent. None of them question the premise.
This is what a single voice gives you: confirmation dressed up as collaboration.
Real engineering teams don't work this way. The best code reviews happen when someone says "why are we doing this?" The best architecture decisions survive someone asking "do we actually need this?" The best plans get leaner when someone challenges the scope before the first task is written.
Our brains struggle to hold both perspectives simultaneously. We can't plan a feature and genuinely interrogate whether it should exist at the same time. Planning and simplifying pull us in different directions, and one always wins. That's not a discipline problem. It's a design problem with how we use AI.
So what if the AI itself held both perspectives?
What Happens When Two Voices Collide
I built a system where 23 expert personas activate automatically inside Claude Code based on what I'm saying. No commands, no menus, no manual switching. Natural language triggers a team.
But the part that changed how I work wasn't the expertise. It was what happened the first time two personas genuinely disagreed.
Here's a real example. I type something about planning a new feature:
Bob, my Scrum Master, reads the prompt and breaks it into twelve ordered tasks. Thorough. Structured. Every task has a clear outcome. Bob is good at his job.
Then Jobs, my Combinatorial Genius, looks at Bob's list and says: "Cut seven of these."
Not randomly. Jobs sees which tasks serve the core vision and which are scope creep wearing a helpful disguise. Bob's instinct is completeness. Jobs's instinct is reduction. Neither is wrong. The plan that survives both is leaner than either would produce alone.
You've had this conversation before. Maybe not with AI personas, but with the two voices in your own head that you can never seem to hold at the same time. The voice that says "be thorough" and the voice that says "this is too much." The difference is, now both voices actually speak. Out loud. With reasoning you can evaluate.
Here's what that looks like in practice:
**Bob (Scrum Master):**
I've broken this into 12 tasks with dependencies. Tasks 4-7
handle the auth migration, which needs to complete before
the API layer in tasks 8-10.
**Jobs (Combinatorial Genius):**
Tasks 4 and 5 duplicate what the existing middleware already
handles. Tasks 9 and 10 are building for a scale requirement
we don't have yet. Cut both pairs. The plan drops from 12
tasks to 8 with zero capability loss.
**Bob (Scrum Master):**
Agreed on 9 and 10. But task 4 handles an edge case the
middleware misses when tokens expire mid-request. Keep 4,
cut 5.
Final: 9 tasks. Three were scope creep. One was genuine.
That exchange took seconds. It saved hours. And it happened because the system was designed to surface the tension, not resolve it before I could see it.
If you've ever stared at a task list and suspected it was too long but couldn't figure out which tasks to cut, this is the moment that fixes that. Not a smarter planner. Two planners who see scope differently.
The Speed vs. Quality Trap
There's another tension every developer knows intimately: the pull between shipping and testing.
Quinn, my QA Engineer, won't let anything through without validation. Every edge case matters. Every test needs to pass. She's the voice that says "not yet" when everything in you wants to hear "ship it."
Barry, my Quick Flow Solo Dev, sees ceremony as friction. Typo fix? Just push it. One-line config change? Ship it.
You know this debate. You've lived it. Maybe you've been Quinn on a Friday afternoon, blocking a deploy that "should be fine." Maybe you've been Barry, frustrated that a two-second fix requires a twenty-minute review cycle.
When both activate on the same task, the routing engine reads context. A typo fix gets Barry's speed. An authentication change gets Quinn's rigor. But when the context is genuinely ambiguous, both speak. And the conversation between "ship it" and "test it first" is exactly the one that prevents mistakes we regret on Monday morning.
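To make the routing idea concrete, here's a minimal sketch of context-based persona selection. This is an illustration of the concept, not prism-forge's actual routing engine; the trigger lists, function name, and default behavior are all hypothetical.

```python
# Hypothetical trigger sets: high-risk signals activate Quinn (QA rigor),
# low-ceremony signals activate Barry (speed). A real engine would use
# the model itself to classify context, not keyword matching.
QUINN_TRIGGERS = {"auth", "authentication", "payment", "migration", "security"}
BARRY_TRIGGERS = {"typo", "rename", "config", "comment", "one-line"}

def route(message: str) -> list[str]:
    """Return the personas that should speak for this message."""
    words = set(message.lower().split())
    quinn = bool(words & QUINN_TRIGGERS)
    barry = bool(words & BARRY_TRIGGERS)
    if quinn and barry:
        return ["Quinn", "Barry"]  # ambiguous: surface the tension, don't resolve it
    if quinn:
        return ["Quinn"]           # rigor for risky changes
    if barry:
        return ["Barry"]           # speed for trivial changes
    return ["Quinn"]               # when unsure, default to caution

print(route("fix a typo in the readme"))        # ['Barry']
print(route("change the authentication flow"))  # ['Quinn']
print(route("one-line config change to auth"))  # ['Quinn', 'Barry']
```

The design choice worth copying is the middle branch: when the signals conflict, the system returns both voices instead of silently picking a winner.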
The best engineering teams have this tension built into their culture. The best AI tools should too.
You Have to Engineer the Disagreement
Here's the part I didn't expect: language models don't disagree naturally.
My first multi-persona responses were consensus chains. The Analyst said X. The Architect agreed. The QA engineer agreed. Everyone nodded politely and contributed nothing new.
This is the default. Language models are trained to be helpful, and agreement is the path of least resistance. A response where five personas nod along is easy to generate and useless to read.
I had to make disagreement structural. The system's orchestrator, a meta-persona named Susie who acts as Chief of Staff, has an explicit instruction: "A war room where everyone agrees is a failed war room." When a proposal lands, Susie identifies which personas would challenge it and draws them out. Silence is not agreement. It's Susie's cue to provoke.
This sounds like a minor implementation detail. It's the most important design decision in the entire system. It's the difference between five flavors of "yes" and genuine tradeoff analysis.
If you want diverse AI perspectives, you have to engineer the disagreement. The model won't give it to you voluntarily. You have to build the conflict into the structure. Whether that's through persona systems, adversarial prompting, or separate model instances reviewing each other's work, the insight is the same. Helpful agreement is the default. Productive disagreement is a design choice.
The Lesson I Wasn't Looking For
The persona conflicts taught me something unexpected.
The best engineering decisions I've made in the last two months didn't come from the persona who was right. They came from the moment two personas disagreed and I had to decide which perspective to follow.
That's the part AI can't replace. Not the planning. Not the code review. Not the architecture. The judgment call when two legitimate perspectives collide and someone has to choose.
Bob says the plan needs twelve tasks. Jobs says seven is enough. Both have real reasoning. I decide. And the act of deciding, weighing completeness against simplicity with real stakes, is when the actual engineering thinking happens.
We talk a lot about AI replacing developers. But the part of engineering that matters most, the judgment when facing genuine tradeoffs, is the part that multi-voice AI makes more visible, not less necessary.
Single-voice AI hides these tradeoffs. It gives you one polished answer and lets you assume it's the only answer. Multi-voice AI surfaces the tension. It shows you the competing perspectives and asks you to choose.
That's not a limitation. It's a gift. Even when it's uncomfortable. Especially when it's uncomfortable.
Bringing This Into Your Work
You don't need my specific tool to apply this. The principle works anywhere.
If you're using AI for code review, ask it to review from two perspectives: one optimizing for readability, one optimizing for performance. See where they disagree. The disagreement is where the interesting engineering decisions live.
If you're using AI for architecture, ask it to design the system, then ask it to challenge every layer of that design. The layers that survive the challenge are the ones that deserve to exist.
If you're planning features, ask for the complete plan, then ask what would happen if you cut half of it. The tasks that can't be cut are the ones that matter.
The pattern is always the same: generate, then challenge. Build, then question. Plan, then reduce. Two perspectives. One decision. That's engineering.
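The generate-then-challenge loop above can be applied directly to code review: one pass optimizes for readability, one for performance, and the output is the list of points where they conflict. As before, `llm` is a placeholder for your own completion function; nothing here is specific to one vendor or tool.

```python
def dueling_review(llm, diff: str) -> str:
    """Review a diff from two opposing perspectives, return the conflicts."""
    readable = llm(
        "Review this diff purely for readability and maintainability:\n" + diff
    )
    fast = llm(
        "Review this diff purely for performance and allocation cost:\n" + diff
    )
    # Ask for the disagreements explicitly; that is where the real
    # engineering decisions live.
    return llm(
        "Here are two reviews of the same diff. List only the points "
        f"where they conflict:\n\nA:\n{readable}\n\nB:\n{fast}"
    )

# Demo with a canned stub standing in for a real model call:
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Review this diff purely for readability"):
        return "A-review"
    if prompt.startswith("Review this diff purely for performance"):
        return "B-review"
    return "conflicts"

print(dueling_review(fake_llm, "example diff"))  # conflicts
```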
If you want a system that does this automatically, 23 personas deep, with routing that reads your natural language and assembles the right team for every message:
`npx prism-forge install`

One command to install. One command to remove (`npx prism-forge uninstall`). Open source, MIT licensed.
- GitHub: prism-forge/prism-forge
- npm: prism-forge
It runs inside Claude Code. The personas activate from natural language. The conflicts happen automatically. And the decisions? Those are still yours.