<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Drakko Tarkin</title>
    <description>The latest articles on DEV Community by Drakko Tarkin (@drakkotarkin).</description>
    <link>https://dev.to/drakkotarkin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837798%2F471456d7-1409-4c64-90d7-3e60b392980e.jpeg</url>
      <title>DEV Community: Drakko Tarkin</title>
      <link>https://dev.to/drakkotarkin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/drakkotarkin"/>
    <language>en</language>
    <item>
      <title>We Made AI Too Agreeable. Here's What It Cost Us.</title>
      <dc:creator>Drakko Tarkin</dc:creator>
      <pubDate>Sun, 05 Apr 2026 19:06:25 +0000</pubDate>
      <link>https://dev.to/drakkotarkin/we-made-ai-too-agreeable-heres-what-it-cost-us-apc</link>
      <guid>https://dev.to/drakkotarkin/we-made-ai-too-agreeable-heres-what-it-cost-us-apc</guid>
      <description>&lt;h2&gt;
  
  
  The Plan That Should Have Died
&lt;/h2&gt;

&lt;p&gt;Last month I spent hours planning a feature. Twelve tasks. Dependencies mapped. Outcomes defined for each one. A beautiful, thorough, completely unnecessary plan.&lt;/p&gt;

&lt;p&gt;The feature itself was wrong. Not the implementation. The concept. I was building a manual selector for a system that already had automatic routing. Hours of careful planning for something that should have been cut after five minutes of honest scrutiny.&lt;/p&gt;

&lt;p&gt;Here's what went wrong: I asked my AI coding assistant to help me plan, and it planned. Brilliantly. Thoroughly. Without once asking whether the thing was worth planning at all.&lt;/p&gt;

&lt;p&gt;That's the problem with a single voice. It does what you ask. It doesn't push back. And the most valuable output isn't a single answer, it's a discussion.&lt;/p&gt;

&lt;p&gt;If you've ever finished a sprint or project and realized half the work shouldn't have existed, you know this feeling. The plan looked right. The execution was clean. The waste was invisible until it was already too late.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Brain, One Blind Spot
&lt;/h2&gt;

&lt;p&gt;We've all been there. You ask an AI assistant for help, and it gives you exactly what you asked for. Polished, syntactically correct, completely unchallenged.&lt;/p&gt;

&lt;p&gt;Need architecture advice? It designs a system. Need a code review? It finds issues. Need a plan? It plans. Each response is competent. None of them question the premise.&lt;/p&gt;

&lt;p&gt;This is what a single voice gives you: confirmation dressed up as collaboration.&lt;/p&gt;

&lt;p&gt;Real engineering teams don't work this way. The best code reviews happen when someone says "why are we doing this?" The best architecture decisions survive someone asking "do we actually need this?" The best plans get leaner when someone challenges the scope before the first task is written.&lt;/p&gt;

&lt;p&gt;Our brains struggle to hold both perspectives simultaneously. We can't plan a feature and genuinely interrogate whether it should exist at the same time. Planning and simplifying pull us in different directions, and one always wins. That's not a discipline problem. It's a design problem with how we use AI.&lt;/p&gt;

&lt;p&gt;So what if the AI itself held both perspectives?&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens When Two Voices Collide
&lt;/h2&gt;

&lt;p&gt;I built a system where 23 expert personas activate automatically inside Claude Code based on what I'm saying. No commands, no menus, no manual switching. Natural language triggers a team.&lt;/p&gt;

&lt;p&gt;But the part that changed how I work wasn't the expertise. It was what happened the first time two personas genuinely disagreed.&lt;/p&gt;

&lt;p&gt;Here's a real example. I type something about planning a new feature:&lt;/p&gt;

&lt;p&gt;Bob, my Scrum Master, reads the prompt and breaks it into twelve ordered tasks. Thorough. Structured. Every task has a clear outcome. Bob is good at his job.&lt;/p&gt;

&lt;p&gt;Then Jobs, my Combinatorial Genius, looks at Bob's list and says: "Cut seven of these."&lt;/p&gt;

&lt;p&gt;Not randomly. Jobs sees which tasks serve the core vision and which are scope creep wearing a helpful disguise. Bob's instinct is completeness. Jobs's instinct is reduction. Neither is wrong. The plan that survives both is leaner than either would produce alone.&lt;/p&gt;

&lt;p&gt;You've had this conversation before. Maybe not with AI personas, but with the two voices in your own head that you can never seem to hold at the same time. The voice that says "be thorough" and the voice that says "this is too much." The difference is, now both voices actually speak. Out loud. With reasoning you can evaluate.&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Bob (Scrum Master):**
I've broken this into 12 tasks with dependencies. Tasks 4-7
handle the auth migration, which needs to complete before
the API layer in tasks 8-10.

**Jobs (Combinatorial Genius):**
Tasks 4 and 5 duplicate what the existing middleware already
handles. Tasks 9 and 10 are building for a scale requirement
we don't have yet. Cut both pairs. The plan drops from 12
tasks to 8 with zero capability loss.

**Bob (Scrum Master):**
Agreed on 9 and 10. But task 4 handles an edge case the
middleware misses when tokens expire mid-request. Keep 4,
cut 5.

Final: 9 tasks. Three were scope creep. One was genuine.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That exchange took seconds. It saved hours. And it happened because the system was designed to surface the tension, not resolve it before I could see it.&lt;/p&gt;

&lt;p&gt;If you've ever stared at a task list and suspected it was too long but couldn't figure out which tasks to cut, this is the moment that fixes that. Not a smarter planner. Two planners who see scope differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Speed vs. Quality Trap
&lt;/h2&gt;

&lt;p&gt;There's another tension every developer knows intimately: the pull between shipping and testing.&lt;/p&gt;

&lt;p&gt;Quinn, my QA Engineer, won't let anything through without validation. Every edge case matters. Every test needs to pass. She's the voice that says "not yet" when everything in you wants to hear "ship it."&lt;/p&gt;

&lt;p&gt;Barry, my Quick Flow Solo Dev, sees ceremony as friction. Typo fix? Just push it. One-line config change? Ship it.&lt;/p&gt;

&lt;p&gt;You know this debate. You've lived it. Maybe you've been Quinn on a Friday afternoon, blocking a deploy that "should be fine." Maybe you've been Barry, frustrated that a two-second fix requires a twenty-minute review cycle.&lt;/p&gt;

&lt;p&gt;When both activate on the same task, the routing engine reads context. A typo fix gets Barry's speed. An authentication change gets Quinn's rigor. But when the context is genuinely ambiguous, both speak. And the conversation between "ship it" and "test it first" is exactly the one that prevents mistakes we regret on Monday morning.&lt;/p&gt;

&lt;p&gt;The best engineering teams have this tension built into their culture. The best AI tools should too.&lt;/p&gt;

&lt;h2&gt;
  
  
  You Have to Engineer the Disagreement
&lt;/h2&gt;

&lt;p&gt;Here's the part I didn't expect: language models don't disagree naturally.&lt;/p&gt;

&lt;p&gt;My first multi-persona responses were consensus chains. The Analyst said X. The Architect agreed. The QA engineer agreed. Everyone nodded politely and contributed nothing new.&lt;/p&gt;

&lt;p&gt;This is the default. Language models are trained to be helpful, and agreement is the path of least resistance. A response where five personas nod along is easy to generate and useless to read.&lt;/p&gt;

&lt;p&gt;I had to make disagreement structural. The system's orchestrator, a meta-persona named Susie who serves as Chief of Staff, has an explicit instruction: "A war room where everyone agrees is a failed war room." When a proposal lands, Susie identifies which personas would challenge it and draws them out. Silence is not agreement. It's Susie's cue to provoke.&lt;/p&gt;

&lt;p&gt;This sounds like a minor implementation detail. It's the most important design decision in the entire system. It's the difference between five flavors of "yes" and genuine tradeoff analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want diverse AI perspectives, you have to engineer the disagreement.&lt;/strong&gt; The model won't give it to you voluntarily. You have to build the conflict into the structure. Whether that's through persona systems, adversarial prompting, or separate model instances reviewing each other's work, the insight is the same. Helpful agreement is the default. Productive disagreement is a design choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lesson I Wasn't Looking For
&lt;/h2&gt;

&lt;p&gt;The persona conflicts taught me something unexpected.&lt;/p&gt;

&lt;p&gt;The best engineering decisions I've made in the last two months didn't come from the persona who was right. They came from the moment two personas disagreed and I had to decide which perspective to follow.&lt;/p&gt;

&lt;p&gt;That's the part AI can't replace. Not the planning. Not the code review. Not the architecture. The judgment call when two legitimate perspectives collide and someone has to choose.&lt;/p&gt;

&lt;p&gt;Bob says the plan needs twelve tasks. Jobs says seven is enough. Both have real reasoning. I decide. And the act of deciding, weighing completeness against simplicity with real stakes, is when the actual engineering thinking happens.&lt;/p&gt;

&lt;p&gt;We talk a lot about AI replacing developers. But the part of engineering that matters most, the judgment when facing genuine tradeoffs, is the part that multi-voice AI makes more visible, not less necessary.&lt;/p&gt;

&lt;p&gt;Single-voice AI hides these tradeoffs. It gives you one polished answer and lets you assume it's the only answer. Multi-voice AI surfaces the tension. It shows you the competing perspectives and asks you to choose.&lt;/p&gt;

&lt;p&gt;That's not a limitation. It's a gift. Even when it's uncomfortable. Especially when it's uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bringing This Into Your Work
&lt;/h2&gt;

&lt;p&gt;You don't need my specific tool to apply this. The principle works anywhere.&lt;/p&gt;

&lt;p&gt;If you're using AI for code review, ask it to review from two perspectives: one optimizing for readability, one optimizing for performance. See where they disagree. The disagreement is where the interesting engineering decisions live.&lt;/p&gt;

&lt;p&gt;If you're using AI for architecture, ask it to design the system, then ask it to challenge every layer of that design. The layers that survive the challenge are the ones that deserve to exist.&lt;/p&gt;

&lt;p&gt;If you're planning features, ask for the complete plan, then ask what would happen if you cut half of it. The tasks that can't be cut are the ones that matter.&lt;/p&gt;

&lt;p&gt;The pattern is always the same: generate, then challenge. Build, then question. Plan, then reduce. Two perspectives. One decision. That's engineering.&lt;/p&gt;

&lt;p&gt;If you want a system that does this automatically, 23 personas deep, with routing that reads your natural language and assembles the right team for every message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx prism-forge &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command to install. One command to remove (&lt;code&gt;npx prism-forge uninstall&lt;/code&gt;). Open source, MIT licensed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/prism-forge/prism-forge" rel="noopener noreferrer"&gt;prism-forge/prism-forge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/prism-forge" rel="noopener noreferrer"&gt;prism-forge&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;It runs inside Claude Code. The personas activate from natural language. The conflicts happen automatically. And the decisions? Those are still yours.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>claudecode</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How We Cut Claude Code Session Overhead with Lazy-Loaded Personas</title>
      <dc:creator>Drakko Tarkin</dc:creator>
      <pubDate>Thu, 02 Apr 2026 19:27:11 +0000</pubDate>
      <link>https://dev.to/drakkotarkin/how-we-cut-claude-code-session-overhead-with-lazy-loaded-personas-3ann</link>
      <guid>https://dev.to/drakkotarkin/how-we-cut-claude-code-session-overhead-with-lazy-loaded-personas-3ann</guid>
      <description>&lt;p&gt;If you use Claude Code with a heavily customized &lt;code&gt;CLAUDE.md&lt;/code&gt;, every message you send carries that full file as context. Not just once at session start — on every turn.&lt;/p&gt;

&lt;p&gt;That matters more than most people realize.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Eager-Loading Everything
&lt;/h2&gt;

&lt;p&gt;The naive approach to building a multi-persona system in Claude Code is to define all your personas directly in &lt;code&gt;CLAUDE.md&lt;/code&gt;. It feels clean — everything in one place, always available.&lt;/p&gt;

&lt;p&gt;The cost: if you have 23 specialist personas, each defined in 150-200 lines, you're looking at 3,000-5,000 tokens of persona definitions loaded on every single message — regardless of whether the current task has anything to do with a UX designer or a financial analyst.&lt;/p&gt;

&lt;p&gt;Claude Code's &lt;code&gt;CLAUDE.md&lt;/code&gt; is not a one-time setup file. It is re-injected into context on every turn. The larger it is, the more tokens you burn before you type a word.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Route First, Load on Demand
&lt;/h2&gt;

&lt;p&gt;The fix is the same pattern software engineers have used for decades: don't load what you don't need until you need it.&lt;/p&gt;

&lt;p&gt;Instead of embedding persona definitions in &lt;code&gt;CLAUDE.md&lt;/code&gt;, you define a lightweight routing engine that reads signal words from the user's message and loads the relevant persona file on demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eager approach (expensive):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Personas&lt;/span&gt;

&lt;span class="gu"&gt;## Mary (Business Analyst)&lt;/span&gt;
Mary is a meticulous analyst who investigates existing state...
[150 more lines]

&lt;span class="gu"&gt;## Amelia (Developer Agent)&lt;/span&gt;
Amelia is an execution-focused developer who builds and edits files...
[150 more lines]

&lt;span class="gu"&gt;## Winston (Architect)&lt;/span&gt;
Winston designs systems, data flows, and infrastructure...
[150 more lines]

&lt;span class="gh"&gt;# ... 20 more persona blocks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lazy approach (efficient):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Persona Routing&lt;/span&gt;
Read routing-engine.md on every session.
Load personas on demand from ~/.claude/prism/ when triggered by signal words.
Only the active persona's file is in context.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this structure, &lt;code&gt;CLAUDE.md&lt;/code&gt; stays lean. The routing engine (&lt;code&gt;routing-engine.md&lt;/code&gt;) is a single file that maps signal words to persona file paths. When a message contains "architecture" or "schema," Claude reads &lt;code&gt;persona-architect-winston.md&lt;/code&gt;. When it contains "brainstorm" or "ideate," it reads &lt;code&gt;persona-brainstorm-coach-carson.md&lt;/code&gt;. Everything else stays off-context.&lt;/p&gt;
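&lt;p&gt;To make the mechanism concrete, here is a minimal Python sketch of what a signal-word lookup like this amounts to. The file names come from the paragraph above; the signal words and matching logic are illustrative assumptions, not the actual routing engine.&lt;/p&gt;

```python
# Minimal sketch of signal-word routing: map trigger words to persona files
# and load only the files whose signals appear in the message.
# Signal words and matching behavior are illustrative assumptions.
SIGNAL_ROUTES = {
    "architecture": "persona-architect-winston.md",
    "schema": "persona-architect-winston.md",
    "brainstorm": "persona-brainstorm-coach-carson.md",
    "ideate": "persona-brainstorm-coach-carson.md",
}

def personas_to_load(message: str) -> set[str]:
    """Return only the persona files triggered by signals in the message."""
    text = message.lower()
    return {path for signal, path in SIGNAL_ROUTES.items() if signal in text}

# "schema" and "architecture" both route to Winston, so exactly one
# persona file enters context for this message.
print(personas_to_load("Can you sketch the schema and overall architecture?"))
```

&lt;p&gt;Everything the lookup doesn't return stays off-context, which is the entire point of the pattern.&lt;/p&gt;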

&lt;h2&gt;
  
  
  Why This Matters Right Now
&lt;/h2&gt;

&lt;p&gt;In April 2026, Claude Code users started reporting session costs 10-20x higher than expected. The root cause: a caching bug where context that should be served from cache is being re-tokenized and re-charged on every turn.&lt;/p&gt;

&lt;p&gt;Eager-loading large &lt;code&gt;CLAUDE.md&lt;/code&gt; files makes this worse. The bigger your baseline context, the higher your exposure when the cache misses. A 5,000-token persona block that should cost fractions of a cent per session can become a material cost per message when caching breaks.&lt;/p&gt;
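&lt;p&gt;The arithmetic is worth seeing. Assuming an illustrative input price of $3 per million tokens and a 10x cached-read discount (both assumptions for the sake of the sketch, not current Anthropic rates), a 5,000-token block re-sent over a 50-turn session looks like this:&lt;/p&gt;

```python
# Back-of-envelope cost of a 5,000-token block re-sent on every turn.
# Prices and discount are illustrative assumptions, not actual rates.
BLOCK_TOKENS = 5_000
TURNS = 50
INPUT_PRICE_PER_MTOK = 3.00   # assumed base input price, USD per million tokens
CACHE_READ_MULTIPLIER = 0.1   # assumed cached-read discount

full_price = BLOCK_TOKENS * TURNS * INPUT_PRICE_PER_MTOK / 1_000_000
cached_price = full_price * CACHE_READ_MULTIPLIER

print(f"cache working: ${cached_price:.3f}  cache broken: ${full_price:.3f}")
```

&lt;p&gt;A broken cache turns the cached figure into the full figure on every session, and the gap scales linearly with your baseline context size.&lt;/p&gt;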

&lt;p&gt;Lazy-loading is not a fix for the cache bug. It is a structural hedge. Smaller baseline context means less blast radius when something goes wrong with token accounting — and it means lower costs even when everything works correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Apply This Pattern
&lt;/h2&gt;

&lt;p&gt;You don't need a 23-persona routing system to benefit from this. Three steps work for any Claude Code setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Audit your CLAUDE.md token weight.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Paste it into a tokenizer (Anthropic's tokenizer playground, or &lt;code&gt;tiktoken&lt;/code&gt; for a rough proxy) or run &lt;code&gt;wc -w CLAUDE.md&lt;/code&gt; as a fast estimate. If you're over 1,000 words, you have room to trim.&lt;/p&gt;
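&lt;p&gt;If you want a quick scripted estimate, a common rule of thumb is roughly 1.3 tokens per English word. This is an approximation, not a substitute for a real tokenizer:&lt;/p&gt;

```python
# Rough token estimate from word count, using the ~1.3 tokens-per-word
# rule of thumb as an assumption.
def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * 1.3)

claude_md = "Always run the linter before committing. Prefer small focused diffs."
print(estimate_tokens(claude_md))  # → 13
```

&lt;p&gt;By this estimate, a 1,000-word &lt;code&gt;CLAUDE.md&lt;/code&gt; lands around 1,300 tokens per injection.&lt;/p&gt;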

&lt;p&gt;&lt;strong&gt;2. Move reference content to separate files.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anything that isn't needed on every turn belongs in its own file. Coding style guides, persona definitions, workflow references, architecture docs — pull them out of &lt;code&gt;CLAUDE.md&lt;/code&gt; and into named files in your &lt;code&gt;.claude/&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Add a routing section that tells Claude what to load and when.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Reference Files (load on demand)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Coding standards: Read ~/.claude/reference/coding-style.md when writing or reviewing code
&lt;span class="p"&gt;-&lt;/span&gt; Architecture patterns: Read ~/.claude/reference/architecture.md when designing systems
&lt;span class="p"&gt;-&lt;/span&gt; Deployment guide: Read ~/.claude/reference/deployment.md when working on CI/CD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code follows these instructions literally. The file only enters context when the task requires it.&lt;/p&gt;

&lt;h2&gt;
  
  
  PRISM Forge
&lt;/h2&gt;

&lt;p&gt;This pattern is the foundation of PRISM Forge, an open-source Claude Code persona routing system with 23 specialist personas that load on-demand via signal-word routing. The full implementation is at &lt;a href="https://github.com/prism-forge/prism" rel="noopener noreferrer"&gt;github.com/prism-forge/prism&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The token savings are real. The architecture is simple. And the pattern applies to any Claude Code setup — no persona system required.&lt;/p&gt;

&lt;p&gt;If you're building autonomous Claude Code workflows and want this architecture set up for your team, &lt;a href="https://www.linkedin.com/in/anthonyjhipp" rel="noopener noreferrer"&gt;reach out on LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>devtools</category>
      <category>performance</category>
    </item>
    <item>
      <title>Context Hygiene for Claude Code Power Users (And Why the Token Crisis Is Worse Than 7%)</title>
      <dc:creator>Drakko Tarkin</dc:creator>
      <pubDate>Thu, 02 Apr 2026 16:58:24 +0000</pubDate>
      <link>https://dev.to/drakkotarkin/context-hygiene-for-claude-code-power-users-and-why-the-token-crisis-is-worse-than-7-4e7c</link>
      <guid>https://dev.to/drakkotarkin/context-hygiene-for-claude-code-power-users-and-why-the-token-crisis-is-worse-than-7-4e7c</guid>
      <description>&lt;p&gt;Anthropic's April 1 communication about token usage changes referenced "7% of users" seeing higher costs during peak hours. That framing buried the real story. The policy change is minor. Two cache bugs are not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual problem
&lt;/h2&gt;

&lt;p&gt;Two separate bugs are causing 10–20x token inflation for power users, and neither has anything to do with peak-hour pricing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 1 — Sentinel replacement bug (standalone binary)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you installed Claude Code via the 228MB standalone installer, a sentinel replacement bug triggers a full cache rebuild on every request. Your context isn't being read from cache — it's being rewritten from scratch each time. The standalone binary distribution is the trigger. This is tracked in GitHub issue #40524.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2 — Resume session bug (v2.1.69+)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you use &lt;code&gt;--resume&lt;/code&gt; or &lt;code&gt;--continue&lt;/code&gt; flags on v2.1.69 or later, a cache prefix mismatch causes the entire conversation history to be rewritten instead of read. The cache exists. Claude just isn't reading it correctly — it's appending instead of resuming.&lt;/p&gt;

&lt;p&gt;If you run autonomous loops with &lt;code&gt;--continue&lt;/code&gt;, or regularly resume long sessions, you're likely burning 10–20x what you should be. The 7% headline does not describe your situation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who is affected
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anyone using the standalone binary installer (not npm)&lt;/li&gt;
&lt;li&gt;Anyone running &lt;code&gt;--resume&lt;/code&gt; or &lt;code&gt;--continue&lt;/code&gt; on v2.1.69+&lt;/li&gt;
&lt;li&gt;Power users with long sessions, multi-turn autonomous workflows, or heavy CLAUDE.md files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running Claude Code via npm and never use resume flags, your exposure is limited to the peak-hour policy change, which is comparatively small.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 workarounds you can apply today
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Switch from standalone binary to npm&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sidesteps the sentinel replacement bug entirely. If you installed via the 228MB standalone installer, uninstall it and switch to the npm package. Same CLI, different distribution path, no cache rebuild on every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Avoid &lt;code&gt;--resume&lt;/code&gt; and &lt;code&gt;--continue&lt;/code&gt; until patched&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Until the fix ships, these flags are a token sink. For tasks that span sessions, use &lt;code&gt;/clear&lt;/code&gt; to start a clean context rather than resuming a broken one. The resume flag is supposed to save tokens. Right now it costs them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Shrink your CLAUDE.md under 800 tokens&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CLAUDE.md loads on every session start, before cache has anything to offer. If your file is 3,000 tokens of instructions, that's 3,000 tokens of cold-load on every session. When the cache breaks, that entire block gets reloaded repeatedly.&lt;/p&gt;

&lt;p&gt;Audit what's actually required at session start versus what can be loaded on demand. Cut references to documentation you rarely need. Move heavy reference material to separate files and load them explicitly when relevant. A lean CLAUDE.md is a structural hedge against cache failures — not just an optimization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Move heavy autonomous work outside 8am–2pm ET&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The peak-hour multiplier policy is real, even if smaller than the cache bugs. Autonomous loops that run for hours are better scheduled off-peak. This doesn't fix the bugs, but it reduces the compounding effect of running a token-inflating workflow during an already-expensive window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Measure your actual burn&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx ccusage@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this before and after applying the workarounds above. You can't manage what you're not measuring. ccusage pulls your actual usage data and gives you a clear picture of where tokens are going.&lt;/p&gt;

&lt;h2&gt;
  
  
  The structural principle: lazy-load your context
&lt;/h2&gt;

&lt;p&gt;The cache bugs hit hardest when there's a large baseline context — instructions, personas, reference material — all loaded upfront regardless of what you're actually doing. The more you front-load, the more you lose when cache breaks.&lt;/p&gt;

&lt;p&gt;The pattern that holds up under cache failure is lazy-loading: only load what the current task requires, defer everything else. If you want to see this in practice, &lt;a href="https://github.com/prism-forge/prism" rel="noopener noreferrer"&gt;PRISM Forge&lt;/a&gt; takes this approach with 23 personas — none are loaded at session start, each is loaded on demand when a signal fires. That's not a fix for the cache bug, but it means the baseline context stays small, so cache failures are cheaper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this lands
&lt;/h2&gt;

&lt;p&gt;Anthropic is actively working on a fix — GitHub #40524 is open and patches are in progress. These bugs will be resolved.&lt;/p&gt;

&lt;p&gt;The hygiene habits are worth keeping anyway. Lean CLAUDE.md, lazy-loaded context, measured token usage — these aren't workarounds for a temporary bug. They're how you keep costs predictable as sessions get longer and workflows get more autonomous.&lt;/p&gt;

&lt;p&gt;The 7% framing will probably age badly once the cache fixes ship and people can see what was actually happening. Build your setup like you're the 10–20x case, because for a subset of power users, that's exactly what it is.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>devtools</category>
      <category>productivity</category>
      <category>ai</category>
    </item>
    <item>
      <title>How Signal-Based Routing Actually Works (and the 3 Times It Broke)</title>
      <dc:creator>Drakko Tarkin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 20:45:59 +0000</pubDate>
      <link>https://dev.to/drakkotarkin/how-signal-based-routing-actually-works-and-the-3-times-it-broke-5eop</link>
      <guid>https://dev.to/drakkotarkin/how-signal-based-routing-actually-works-and-the-3-times-it-broke-5eop</guid>
      <description>&lt;h2&gt;
  
  
  You Shouldn't Have to Tell the AI Who to Be
&lt;/h2&gt;

&lt;p&gt;Last week I wrote about typing "act as a senior architect" 47 times per week. The friction of manually assigning roles to an AI that should be able to figure it out.&lt;/p&gt;

&lt;p&gt;This week I want to go deeper. Not the pitch -- the plumbing. How does signal-based routing actually work inside &lt;a href="https://github.com/prism-forge/prism-forge" rel="noopener noreferrer"&gt;PRISM Forge&lt;/a&gt;? What does the routing engine look at? How does it decide?&lt;/p&gt;

&lt;p&gt;And -- because build-in-public means showing the dents -- the 3 times it broke in ways I didn't expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Idea: Intent Detection, Not Commands
&lt;/h2&gt;

&lt;p&gt;Most persona systems work like menus. Pick a role from a list. Type a command. Toggle a mode.&lt;/p&gt;

&lt;p&gt;Signal-based routing works differently. You talk naturally. The system listens for signals in your language and assembles the right team.&lt;/p&gt;

&lt;p&gt;"I'm stuck on this auth bug" contains two signals: &lt;strong&gt;stuck&lt;/strong&gt; (creative problem-solving) and &lt;strong&gt;bug&lt;/strong&gt; (debugging/validation). The routing engine reads both, decides who leads, and who supports.&lt;/p&gt;

&lt;p&gt;No slash commands. No mode toggles. No "act as."&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of the Routing Engine
&lt;/h2&gt;

&lt;p&gt;PRISM Forge's routing engine has four layers that fire on every single message:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Hard Overrides
&lt;/h3&gt;

&lt;p&gt;Three things bypass all logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"War room"&lt;/strong&gt; -- loads all 27 personas into active debate. Susie moderates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit name&lt;/strong&gt; -- "Winston, how should this be architected?" Always routes to the named persona.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mode switch&lt;/strong&gt; -- system context changes (Plan mode, Agent mode) activate mode defaults.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are cheap checks. If none fire, the engine moves to intent classification.&lt;/p&gt;
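&lt;p&gt;As a sketch, the override pass is just a few short-circuiting string checks that run before any classification. The persona names come from the article; the matching logic here is an illustrative assumption:&lt;/p&gt;

```python
# Sketch of the hard-override pass: cheap checks that short-circuit
# before intent classification. Matching logic is an illustrative assumption.
PERSONA_NAMES = ["Winston", "Susie", "Quinn", "Bob"]

def hard_override(message: str):
    if "war room" in message.lower():
        return "war-room"                   # all personas, Susie moderates
    for name in PERSONA_NAMES:
        if message.startswith(name + ","):  # e.g. "Winston, how should..."
            return name                     # explicit name always wins
    return None                             # fall through to classification

print(hard_override("Winston, how should this be architected?"))  # → Winston
```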

&lt;h3&gt;
  
  
  Layer 2: Intent Classification
&lt;/h3&gt;

&lt;p&gt;Every message maps to one of 9 intent categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Intent&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;Who typically leads&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build&lt;/td&gt;
&lt;td&gt;Write code, create files&lt;/td&gt;
&lt;td&gt;Amelia (Developer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Investigate&lt;/td&gt;
&lt;td&gt;Analyze, explore existing state&lt;/td&gt;
&lt;td&gt;Mary (Analyst)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan&lt;/td&gt;
&lt;td&gt;Structure work, break down tasks&lt;/td&gt;
&lt;td&gt;Bob (Scrum Master) + John (PM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validate&lt;/td&gt;
&lt;td&gt;Test, verify, review&lt;/td&gt;
&lt;td&gt;Quinn (QA Engineer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create&lt;/td&gt;
&lt;td&gt;Brainstorm, ideate, design&lt;/td&gt;
&lt;td&gt;Carson (Brainstorm Coach)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Challenge&lt;/td&gt;
&lt;td&gt;Question approach, simplify&lt;/td&gt;
&lt;td&gt;Victor (Strategist) or Jobs (Genius)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orient&lt;/td&gt;
&lt;td&gt;Catch up, understand context&lt;/td&gt;
&lt;td&gt;Susie (Chief of Staff)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document&lt;/td&gt;
&lt;td&gt;Write up, explain, describe&lt;/td&gt;
&lt;td&gt;Paige (Technical Writer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Narrate&lt;/td&gt;
&lt;td&gt;Tell story, frame metrics&lt;/td&gt;
&lt;td&gt;Sophia (Storyteller)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's the critical design decision: &lt;strong&gt;a single message can span multiple intents.&lt;/strong&gt; "Plan how to refactor the auth module -- I think it's too complex" spans Plan + Build + Challenge. The engine evaluates holistically, not sequentially. It doesn't stop at the first match.&lt;/p&gt;
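&lt;p&gt;Holistic evaluation can be sketched as multi-label keyword matching, where every matching category is kept rather than short-circuiting on the first hit. The keyword lists here are illustrative assumptions:&lt;/p&gt;

```python
# Sketch of multi-label intent detection: a message can match several
# intent categories at once; all matches are kept, not just the first.
# Keyword lists are illustrative assumptions.
INTENT_KEYWORDS = {
    "Plan": ["plan", "break down", "tasks"],
    "Build": ["refactor", "implement", "write code"],
    "Challenge": ["too complex", "simplify", "do we need"],
}

def classify(message: str) -> set[str]:
    text = message.lower()
    return {intent for intent, kws in INTENT_KEYWORDS.items()
            if any(kw in text for kw in kws)}

print(classify("Plan how to refactor the auth module, I think it's too complex"))
# → {'Plan', 'Build', 'Challenge'} (set order may vary)
```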

&lt;h3&gt;
  
  
  Layer 3: Signal Detection
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. The engine maintains two signal tables:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared signals&lt;/strong&gt; co-activate multiple personas. When someone says "audit," three personas are relevant: Mary (investigation), Quinn (validation), and Boris (structural conformance). The engine decides who leads based on the full message context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specialist signals&lt;/strong&gt; are exclusive triggers. "First principles" activates Musk (Radical Reductionist). "Hero's journey" activates Campbell (Mythic Storyteller). These are unambiguous.&lt;/p&gt;

&lt;p&gt;There are 24 shared signals and 23 specialist signal groups across all 23 personas. Every persona has a documented activation path. Nothing is magic -- it's a lookup table with contextual judgment on top.&lt;/p&gt;
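&lt;p&gt;As a rough picture of "a lookup table with contextual judgment on top", here is an illustrative JavaScript sketch of the two tables -- the entries are examples from this post, not the real registry:&lt;/p&gt;

```javascript
// Hypothetical sketch: shared signals fan out to several candidate
// personas; specialist signals map to exactly one persona.
const SHARED_SIGNALS = { audit: ["Mary", "Quinn", "Boris"] };
const SPECIALIST_SIGNALS = {
  "first principles": "Musk",
  "hero's journey": "Campbell",
};

function candidatePersonas(message) {
  const text = message.toLowerCase();
  const shared = Object.entries(SHARED_SIGNALS)
    .filter(([signal]) => text.includes(signal))
    .flatMap(([, personas]) => personas);
  const specialist = Object.entries(SPECIALIST_SIGNALS)
    .filter(([signal]) => text.includes(signal))
    .map(([, persona]) => persona);
  // Deduplicate: a persona may be reachable through more than one signal.
  return [...new Set([...shared, ...specialist])];
}
```

&lt;p&gt;The contextual judgment -- who actually leads -- sits on top of this lookup, not inside it.&lt;/p&gt;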

&lt;h3&gt;
  
  
  Layer 4: Team Assembly
&lt;/h3&gt;

&lt;p&gt;The engine doesn't just pick one persona. It assembles a team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary&lt;/strong&gt; -- one persona leads the response. Sets the structure, answers the core question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supporting&lt;/strong&gt; -- additional personas contribute distinct perspectives. Each must earn its seat by adding something genuinely different.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: you say "the auth module is too complex, let's simplify and rebuild it."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Jobs (Combinatorial Genius)&lt;/strong&gt; leads -- "simplify" and "too complex" are his signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amelia (Developer)&lt;/strong&gt; supports -- "rebuild" means implementation is coming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Winston (Architect)&lt;/strong&gt; supports -- architectural implications of simplification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three perspectives in one response. Each attributed. Each distinct. No blended consensus voice.&lt;/p&gt;
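&lt;p&gt;The assembly step itself can be sketched in a few lines -- hypothetical JavaScript, with the ranking and the "distinct perspective" judgment assumed to happen upstream:&lt;/p&gt;

```javascript
// Hypothetical sketch of team assembly: the top-ranked candidate leads;
// the rest join only if they bring a genuinely different perspective.
function assembleTeam(ranked) {
  const [lead, ...rest] = ranked;
  return {
    primary: lead.persona,
    supporting: rest.filter((c) => c.distinct).map((c) => c.persona),
  };
}
```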

&lt;h2&gt;
  
  
  The Orchestrator: Susie
&lt;/h2&gt;

&lt;p&gt;All four layers are orchestrated by Susie, the Chief of Staff. She's not a persona you talk to -- she's the intelligence running behind every turn.&lt;/p&gt;

&lt;p&gt;On Turn 1 of every session, Susie activates unconditionally. She scans your project state (git, todos, handoffs, memory) and delivers a sitrep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Active:&lt;/strong&gt; what you're working on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blocked:&lt;/strong&gt; what's stuck&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale:&lt;/strong&gt; what hasn't moved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommended:&lt;/strong&gt; who should lead this session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After Turn 1, Susie evaluates every message, re-assembles the team if needed, and only announces when the roster changes. If you send 5 messages about planning, the team stays stable. No noise.&lt;/p&gt;

&lt;p&gt;The key insight: Susie evaluates &lt;strong&gt;holistically&lt;/strong&gt;. All signals, all domain matches, all cross-workflow hooks are inputs to one team assembly decision. She doesn't process them sequentially and stop at the first match. She considers everything and picks the best team.&lt;/p&gt;
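&lt;p&gt;The "only announce when the roster changes" behavior is simple enough to sketch -- again illustrative JavaScript, not the actual orchestration rules:&lt;/p&gt;

```javascript
// Hypothetical sketch of roster stability: re-evaluate every message,
// but emit an announcement only when the team actually changes.
function announceIfChanged(prev, next) {
  const same = prev.join("|") === next.join("|");
  return same ? null : "Team: " + next.join(", ");
}
```

&lt;p&gt;Five planning messages in a row produce one announcement, then silence.&lt;/p&gt;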

&lt;h2&gt;
  
  
  The 3 Times It Broke
&lt;/h2&gt;

&lt;p&gt;Here's where build-in-public gets honest.&lt;/p&gt;

&lt;h3&gt;
  
  
  Break 1: Signal Collision
&lt;/h3&gt;

&lt;p&gt;Early in development, I typed: "Let's brainstorm how to simplify the architecture."&lt;/p&gt;

&lt;p&gt;Three strong signals fired simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"brainstorm" → Carson (Brainstorming Coach)&lt;/li&gt;
&lt;li&gt;"simplify" → Jobs (Combinatorial Genius)&lt;/li&gt;
&lt;li&gt;"architecture" → Winston (Architect)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three had legitimate claims to lead. The result? A response that tried to be all three at once. Brainstorming structure with simplification opinions wrapped in architectural framing. It was a mess. No clear voice. No clear direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Explicit primary selection with a tiebreaker. The engine now picks ONE primary based on which signal is strongest in context. In this case, "brainstorm" as the verb (the action the user wants) beats "simplify" (the lens) and "architecture" (the domain). Carson leads. Jobs and Winston support. Clear hierarchy, clear voices.&lt;/p&gt;
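&lt;p&gt;The tiebreaker reduces to a small ranking -- a hypothetical JavaScript sketch, with the verb/lens/domain classification assumed to happen before this function runs:&lt;/p&gt;

```javascript
// Hypothetical sketch of the tiebreaker: the verb (what the user wants
// done) outranks the lens, which outranks the domain.
const ROLE_RANK = { verb: 3, lens: 2, domain: 1 };

function pickPrimary(matches) {
  return matches.reduce((best, m) =>
    ROLE_RANK[m.role] > ROLE_RANK[best.role] ? m : best
  ).persona;
}
```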

&lt;h3&gt;
  
  
  Break 2: The Consensus Trap
&lt;/h3&gt;

&lt;p&gt;I built a "war room" feature where all personas discuss a decision. Early version: everyone agreed with each other.&lt;/p&gt;

&lt;p&gt;Jobs would say "simplify." Victor would say "I agree with Jobs." Winston would say "Jobs is right, and architecturally this supports his point." Twenty-three personas producing one opinion. Useless.&lt;/p&gt;

&lt;p&gt;The problem was subtle. The AI's default behavior is to be agreeable. When you load multiple personas into one context, they naturally converge because the underlying model wants to be helpful and non-contradictory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Susie became an active moderator with an explicit mandate: &lt;strong&gt;surface disagreement&lt;/strong&gt;. Her orchestration rules now include: "A war room where everyone agrees is a failed war room." She actively draws out dissent: "Jobs, would you cut this? Victor, is this the right approach at all? Dali, what assumption are we not questioning?"&lt;/p&gt;

&lt;p&gt;The result: war rooms now produce genuine tension. Jobs wants to cut scope. Carson says cutting is premature -- we haven't explored alternatives. Winston sides with Jobs on scope but wants Carson's option analysis first. Real tradeoffs. Real decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Break 3: Over-Routing
&lt;/h3&gt;

&lt;p&gt;For a while, the engine was too eager. A simple question like "what does this function do?" would activate Mary (investigation), Winston (architecture), Quinn (validation), and Paige (documentation) -- four personas to answer what should have been a one-voice response.&lt;/p&gt;

&lt;p&gt;The engine was technically correct. The question touches investigation, architecture, validation, and documentation domains. But four voices for a simple question is noise, not depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Team size guidance. Every persona on the team must &lt;em&gt;earn their seat&lt;/em&gt;. A team of 2 with genuine perspectives beats a team of 8 with filler. The engine now considers message complexity. Simple questions get one voice. Complex, multi-faceted problems get the full team. The threshold isn't a rule -- it's judgment, guided by the principle that extra voices must add something the primary couldn't provide alone.&lt;/p&gt;
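&lt;p&gt;To make the idea concrete, here is a deliberately crude sketch of complexity gating -- the clause-count heuristic is purely illustrative, not the engine's real judgment:&lt;/p&gt;

```javascript
// Hypothetical sketch: a simple one-clause question gets a single voice;
// multi-part messages keep the full candidate team. Clause counting is a
// crude stand-in for the engine's contextual judgment.
function capTeam(candidates, message) {
  const clauses = message.split(/[,;]|--| and /).length;
  return clauses === 1 ? candidates.slice(0, 1) : candidates;
}
```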

&lt;h2&gt;
  
  
  What I'd Build Differently
&lt;/h2&gt;

&lt;p&gt;If I started over today, three things would change:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Signal weights, not signal matches.&lt;/strong&gt; The current system treats all signal matches equally and relies on contextual judgment for tiebreaking. A weighted system where "the verb matters more than the noun" would have prevented Break 1 from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Disagreement scaffolding from day one.&lt;/strong&gt; I bolted on active moderation after the consensus trap. It should have been foundational. Any multi-agent system that doesn't explicitly design for disagreement will produce consensus by default. The AI wants to agree. You have to engineer the tension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Graduated complexity.&lt;/strong&gt; Simple questions should feel simple. Complex questions should feel like a team. The transition between these modes took longer to get right than the routing engine itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;PRISM Forge is MIT licensed and free:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx prism-forge &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;23 personas. Signal-based routing. No commands to memorize.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/prism-forge/prism-forge" rel="noopener noreferrer"&gt;prism-forge/prism-forge&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of a 12-post series on building PRISM Forge. Last week: the problem with manual persona prompting. Next week: meet John, the product manager who asks "why are we doing this?" before anyone writes a line of code -- and what happens when he clashes with Bob over planning vs simplification.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow along: &lt;a href="https://x.com/DrakkoTarkin" rel="noopener noreferrer"&gt;@DrakkoTarkin on X&lt;/a&gt; | &lt;a href="https://bsky.app/profile/drakkotarkin.bsky.social" rel="noopener noreferrer"&gt;@drakkotarkin.bsky.social on Bluesky&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>claudecode</category>
      <category>devtools</category>
    </item>
    <item>
      <title>I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).</title>
      <dc:creator>Drakko Tarkin</dc:creator>
      <pubDate>Sun, 22 Mar 2026 02:51:19 +0000</pubDate>
      <link>https://dev.to/drakkotarkin/i-was-wrong-about-ai-coding-assistants-heres-what-changed-my-mind-and-what-i-built-about-it-10lf</link>
      <guid>https://dev.to/drakkotarkin/i-was-wrong-about-ai-coding-assistants-heres-what-changed-my-mind-and-what-i-built-about-it-10lf</guid>
      <description>&lt;p&gt;3 months ago I was manually typing "act as a software architect" every time I needed architecture advice from Claude Code.&lt;/p&gt;

&lt;p&gt;3 days ago I shipped an open-source routing engine that makes that unnecessary -- forever.&lt;/p&gt;

&lt;p&gt;This is the story of building &lt;a href="https://github.com/prism-forge/prism-forge" rel="noopener noreferrer"&gt;PRISM Forge&lt;/a&gt;, what went wrong along the way, and 4 lessons about AI-first software engineering that I didn't expect to learn.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem I Couldn't Unsee
&lt;/h2&gt;

&lt;p&gt;AI coding assistants give you one voice. Smart, capable, generalist. But one voice.&lt;/p&gt;

&lt;p&gt;Need architecture advice? "Act as a software architect and..."&lt;br&gt;
Need QA? "Now act as a QA engineer and..."&lt;br&gt;
Need a code review? "Switch to code reviewer mode and..."&lt;/p&gt;

&lt;p&gt;I used &lt;a href="https://github.com/bmad-code-org/BMAD-METHOD" rel="noopener noreferrer"&gt;BMAD-METHOD&lt;/a&gt; for 3 months -- a persona framework that requires manual invocation (&lt;code&gt;*analyst&lt;/code&gt;, &lt;code&gt;*pm&lt;/code&gt;, &lt;code&gt;*architect&lt;/code&gt;). The concept was powerful. The friction was constant. I wanted the experts to just... show up.&lt;/p&gt;

&lt;p&gt;Once I saw the pattern, I couldn't unsee it. Every manual prompt-switch was a tiny failure of the tool to understand what I actually needed.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: Signal-Based Routing
&lt;/h2&gt;

&lt;p&gt;PRISM Forge installs 23 expert personas into Claude Code that activate automatically based on natural language signals.&lt;/p&gt;
&lt;h3&gt;
  
  
  How the Routing Engine Works
&lt;/h3&gt;

&lt;p&gt;A meta-persona named Susie (Chief of Staff) evaluates every user message:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intent classification&lt;/strong&gt; -- 9 categories: Build, Investigate, Plan, Validate, Create, Challenge, Orient, Document, Narrate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain matching&lt;/strong&gt; -- 23 work types, each mapped to a primary persona and optional supporting personas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signal detection&lt;/strong&gt; -- 24 shared signals (co-activate multiple personas) + 18 specialist signals (exclusive triggers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team assembly&lt;/strong&gt; -- one primary persona leads, supporting personas contribute attributed perspectives&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The critical design decision: &lt;strong&gt;routing is deterministic.&lt;/strong&gt; Same message always activates the same persona team. The AI reads the routing rules and follows them like a state machine. No LLM randomness in the routing decision.&lt;/p&gt;
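&lt;p&gt;"Deterministic" here means routing behaves like a pure function over static lookup tables. A hypothetical JavaScript sketch (table entries are examples, not the real domain registry):&lt;/p&gt;

```javascript
// Hypothetical sketch of deterministic routing: a pure function over a
// static table with stable ordering, so the same message always produces
// the same persona team.
const DOMAIN_MAP = { architect: "Winston", bug: "Dr. Quinn" };

function route(message) {
  const text = message.toLowerCase();
  return Object.entries(DOMAIN_MAP)
    .filter(([signal]) => text.includes(signal))
    .map(([, persona]) => persona)
    .sort(); // stable ordering: no randomness anywhere in the decision
}
```

&lt;p&gt;Calling &lt;code&gt;route&lt;/code&gt; twice with the same message yields byte-identical output -- that property is the whole point.&lt;/p&gt;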
&lt;h3&gt;
  
  
  What It Looks Like
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How should this be architected?"
--&amp;gt; Winston (Architect) activates

"I'm stuck on this bug"
--&amp;gt; Dr. Quinn (Creative Problem Solver) steps in

"Plan how to refactor the auth module -- it's too complex"
--&amp;gt; Bob (Scrum Master) leads + Jobs (scope reduction) + Victor (approach) + Amelia (execution)

"War room"
--&amp;gt; All 23 personas load and debate with genuine disagreement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Multi-persona responses keep voices distinct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Mary (Business Analyst):**
The data shows three usage patterns...

**Winston (Architect):**
That maps to a service-per-pattern architecture...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Build: Directing AI Agents to Build an AI Agent System
&lt;/h2&gt;

&lt;p&gt;Here's where it gets recursive. I built a 23-persona AI routing engine... by directing AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;~30 commits. 3 days. 7 development phases. Zero lines of code written by hand.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I don't write code. I orchestrate AI agents to build software. That's not a hypothetical future -- it's how this shipped.&lt;/p&gt;

&lt;h3&gt;
  
  
  My Role vs. The AI's Role
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;I did:&lt;/strong&gt; Set direction. Challenged requirements. Ran war room sessions for architecture decisions. Scope-reviewed every phase. Made every judgment call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code did:&lt;/strong&gt; Ported 23 persona files. Audited every persona for balance and accuracy. Redesigned the routing architecture. Built a 76-check structural audit. Implemented the CLI installer. Wrote all documentation. Created CI/CD workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 7 Phases
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;Key Outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Legal + Scaffold&lt;/td&gt;
&lt;td&gt;MIT license, dual copyright, repo structure&lt;/td&gt;
&lt;td&gt;Legal foundation clean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Cleansing Port&lt;/td&gt;
&lt;td&gt;22 persona files ported, all references cleaned&lt;/td&gt;
&lt;td&gt;Zero stale references&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Deep Content Audit&lt;/td&gt;
&lt;td&gt;Every persona individually audited&lt;/td&gt;
&lt;td&gt;6 signal mismatches caught&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.1. Dynamic Orchestrator&lt;/td&gt;
&lt;td&gt;Routing redesigned as intent-driven orchestration&lt;/td&gt;
&lt;td&gt;18/18 routing tests pass&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Structural Audit&lt;/td&gt;
&lt;td&gt;Audit checklist rewritten (48 to 76 checks)&lt;/td&gt;
&lt;td&gt;66/66 testable checks pass&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. CLI Installer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npx prism-forge install/uninstall/verify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3-command install experience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6. Documentation&lt;/td&gt;
&lt;td&gt;README, architecture guide, CONTRIBUTING&lt;/td&gt;
&lt;td&gt;Publication-ready docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7. Org + Publish&lt;/td&gt;
&lt;td&gt;GitHub org, npm publish, CI/CD&lt;/td&gt;
&lt;td&gt;v1.0.0 live&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I Learned (The Part That Actually Matters)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The routing is the product, not the personas
&lt;/h3&gt;

&lt;p&gt;This was the biggest surprise. Anyone can write 23 persona files. Paste a system prompt, give it a name, done.&lt;/p&gt;

&lt;p&gt;The innovation is the routing layer -- the intelligence that decides which experts activate and when. Getting the signal tables, domain registry, and team assembly logic right took more iteration than writing any individual persona.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway:&lt;/strong&gt; If you're building a persona system, spend 80% of your time on the routing logic and 20% on persona content. Most people do the opposite.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Disagreement had to be engineered in
&lt;/h3&gt;

&lt;p&gt;My first multi-persona responses were consensus chains. The Analyst said X. The Architect agreed. The QA engineer agreed. Everyone agreed. It was useless.&lt;/p&gt;

&lt;p&gt;The problem: LLMs default to agreement. They want to be helpful. Multiple personas agreeing is the path of least resistance.&lt;/p&gt;

&lt;p&gt;The fix: the routing engine now explicitly instructs the orchestrator to surface tension. "A war room where everyone agrees is a failed war room." I had to make disagreement a structural requirement, not just a possibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway:&lt;/strong&gt; If you want diverse AI perspectives, you have to engineer the disagreement. The LLM won't give it to you voluntarily.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Structural audits are non-negotiable at scale
&lt;/h3&gt;

&lt;p&gt;23 personas. 24 shared signals. 18 specialist signals. 23 domain registry entries. 18 specialist load directives. That's hundreds of cross-references.&lt;/p&gt;

&lt;p&gt;During the content audit phase, we caught 6 signal mismatches -- personas claiming signals in their own file that weren't registered in the routing engine's signal table. Without the 76-check audit system, those would have been silent failures where a persona never activated despite having the right signals defined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway:&lt;/strong&gt; Any system with cross-referencing config files needs automated consistency checks. Drift prevention is always cheaper than drift correction.&lt;/p&gt;
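&lt;p&gt;One such check -- the one that caught the 6 mismatches -- can be sketched like this (illustrative JavaScript; the real audit is a 76-check checklist, not this function):&lt;/p&gt;

```javascript
// Hypothetical sketch of one consistency check: every signal a persona
// file claims must also exist in the engine's registered signal table,
// otherwise the persona silently never activates.
function findUnregisteredSignals(personaSignals, registeredSignals) {
  const registered = new Set(registeredSignals);
  return Object.entries(personaSignals).flatMap(([persona, signals]) =>
    signals.filter((s) => !registered.has(s)).map((s) => persona + ": " + s)
  );
}
```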

&lt;h3&gt;
  
  
  4. AI-first engineering is real -- but the human role doesn't shrink
&lt;/h3&gt;

&lt;p&gt;I didn't write code. But I made every decision that mattered: what to build, how to architect it, when to cut scope, what quality standard to hold.&lt;/p&gt;

&lt;p&gt;The AI is an extraordinary implementation partner. Fast, thorough, tireless. It is not a product thinker. It doesn't challenge requirements. It doesn't ask "should we build this at all?" It doesn't feel the friction that motivates the right design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway:&lt;/strong&gt; AI-first engineering doesn't remove the human. It changes what the human does -- from typing code to making decisions, challenging assumptions, and setting direction. That's harder, not easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;One command to install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx prism-forge &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command to remove if it's not for you: &lt;code&gt;npx prism-forge uninstall&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then use Claude Code normally. The personas activate from your natural language -- no new commands to learn.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/prism-forge/prism-forge" rel="noopener noreferrer"&gt;prism-forge/prism-forge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/prism-forge" rel="noopener noreferrer"&gt;prism-forge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture guide:&lt;/strong&gt; &lt;a href="https://github.com/prism-forge/prism-forge/blob/main/docs/architecture.md" rel="noopener noreferrer"&gt;docs/architecture.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. Contributions welcome -- including new personas via the built-in create-persona skill.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;PRISM Forge is derived from &lt;a href="https://github.com/bmad-code-org/BMAD-METHOD" rel="noopener noreferrer"&gt;BMAD-METHOD&lt;/a&gt; (MIT licensed, BMad Code LLC). The signal-based routing engine, dynamic orchestration, structural audit system, and CLI installer are original work.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
