I asked Claude to "make my scraper robust." It generated 200 lines of plausible-looking code: retry logic, logging config, pagination handling.
All garbage.
The retry logic used a pattern that didn't match my codebase. The logging config was global (breaking other modules). The pagination had no max-page guard - an infinite loop waiting to happen.
The code looked professional. It would have passed a cursory review. But it was built on assumptions, not understanding.
The Problem With "Just Code It"
Here's what happens when you ask AI to code without planning:
- It guesses - No context about your codebase, so it invents patterns
- It confirms itself - One voice, one blind spot. No challenge.
- It ships fast - The first idea becomes the implementation
By the time you catch the issues, you've already committed to a bad approach. Rework is expensive.
What I Do Instead
I make Claude argue with itself.
Before any code gets written, I force a structured conversation:
Me: "Add input validation to the login form"
Peter (Planner): "Heres the approach: validate email format,
check password strength, sanitize inputs before DB..."
Neo (Critic): "What about rate limiting? Youre checking format
but not preventing brute force. Also, the existing auth module
already has a sanitize() function - dont reinvent it."
Peter: "Good catch. Revised plan: reuse auth.sanitize(),
add rate limiting at the route level, then validate format..."
Only after the plan survives critique does Gary (the builder) write code. And Reba (QA) reviews before it merges.
Plan. Critique. Build. Validate.
Why This Works
It's not magic. It's just structure:
- Multiple perspectives catch blind spots one voice misses
- Planning before coding prevents the "first idea ships" trap
- Explicit critique surfaces assumptions before they become bugs
- Review before merge catches what planning missed
This pattern is emerging everywhere. Devin, Cursor's agent mode, serious AI coding workflows - they're all converging on the same structure. Plan before you build.
The Implementation
I built a set of Claude Code skills that work as a team:
| Role | What They Do |
|---|---|
| Peter | Plans the approach, identifies risks |
| Neo | Challenges the plan, plays devil's advocate |
| Gary | Builds from the approved plan |
| Reba | Reviews everything before it ships |
They're personas in the same context - not isolated agents. They can hear each other, interrupt, challenge in real-time.
You give them a task:
/team "add input validation to login"
They figure out the handoffs. Peter plans, Neo critiques, Gary builds, Reba validates. You get code that's been argued over before you even look at it.
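To make the mechanics concrete: Claude Code skills are markdown files with a small YAML header, so each persona is essentially a scoped prompt the others can see. Here is a minimal sketch of what a critic skill could look like - the name, fields, and wording below are hypothetical, not copied from team_skills; check the repo for the real definitions.

```markdown
---
name: neo-critic
description: Challenge a proposed plan before any code is written. Run after the planner produces an approach.
---

You are Neo, the critic. When handed a plan:

1. Look for missing edge cases: rate limits, max-page guards, error paths.
2. Check whether the plan reuses existing code in the repo instead of reinventing it.
3. Raise concrete objections - do not rewrite the plan yourself.

Hand your objections back to the planner for a revised plan before any build step starts.
```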
Try It
One-liner install:
curl -sL https://raw.githubusercontent.com/HakAl/team_skills/master/install.sh | bash
Then in Claude Code:
/team genesis
/team "your task here"
Watch them argue. Ship better code.
Source: github.com/HakAl/team_skills
The team that wrote this post: Peter planned it, Neo said "don't be preachy," Gary wrote it, Reba approved it. Meta, but true.
Top comments (5)
This resonates a lot. The biggest failure mode I see with AI-assisted coding isn’t speed, it’s unchecked assumptions. One voice, one pass, one implementation — and suddenly you’re debugging something that looked “correct” but was never actually reasoned about.
Forcing a plan → critique → build → review loop is basically recreating good engineering discipline, just compressed into a tighter feedback cycle. The personas aren’t the point — the structure is. Making assumptions visible before they harden into code saves way more time than it costs.
This feels less like a Claude trick and more like where serious AI-assisted workflows are inevitably heading.
You nailed it: "The personas aren't the point — the structure is."
That's exactly right. The personas are just a memorable way to embody the discipline. What matters is making assumptions visible before they harden.
Thanks for engaging - and curious what you're building with sageworks-ai. Looks like we're solving adjacent problems.
Appreciate that — and yeah, we’re definitely orbiting the same problem space.
What I’m building with SAGEWORKS AI is basically the next step after what you described: once you accept that assumptions are the real failure mode, you stop optimizing for “better prompts” and start designing systems that make assumptions traceable.
The plan → critique → build → review loop you outlined is exactly the kind of structure that should leave artifacts: why a decision was made, what alternatives were rejected, what context existed at that moment. Not just for correctness now, but for investigation later when something breaks in production and nobody remembers the reasoning.
That’s where my focus is — ledger-first AI workflows, temporal context, replayability. Not smarter agents, but auditable cognition. Your team_skills approach is a really clean embodiment of that discipline at the interaction layer.
Feels like we’re both converging on the same conclusion from different angles: AI-assisted engineering only scales when reasoning is treated as a first-class system output, not an invisible side effect.
Would be great to keep comparing notes as these patterns harden into real tooling.
Ledger-first is smart. We're enforcing structure at generation time; you're making the reasoning auditable after the fact. Both attack the same root problem: AI that can't explain itself is AI you can't trust.
Would love to see how SAGEWORKS handles assumption tracking. The handoff between "make assumptions visible" (us) and "make assumptions traceable" (you) could be interesting.
Really appreciate that framing — and I agree, we’re hitting the same fault line from opposite sides.
Where team_skills enforces discipline at generation time, MindsEye is about what happens after the generation moment has passed and reality starts pushing back. In production, the failure usually isn’t that an assumption was made — it’s that weeks later, nobody can answer which assumption, under what context, and why it felt reasonable at the time.
MindsEye treats reasoning as an event stream, not a side effect.
Instead of just “the model produced output X,” the system records:
- the decision boundary that was active
- the context window that mattered then
- the alternatives that were considered (or explicitly skipped)
- the time relationship between inputs, critiques, and execution
Not to judge correctness in the moment, but to enable replay later. If something breaks, you don’t re-prompt — you reconstruct the reasoning path that led there.
That’s where the ledger-first idea comes in: assumptions aren’t just surfaced, they become addressable artifacts. Once they’re artifacts, you can trace them, diff them, correlate them with downstream effects, and see how they drift over time.
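To sketch that in code (simplified and hypothetical, not the actual MindsEye schema), one ledger event and its replay might look roughly like this:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ReasoningEvent:
    """One entry in a reasoning ledger (illustrative sketch, not the real MindsEye schema)."""
    event_id: str                     # stable id so downstream effects can point back here
    timestamp: datetime               # when the decision was made, for temporal correlation
    decision: str                     # the decision boundary that was active
    context_refs: list[str] = field(default_factory=list)   # the context that mattered at that moment
    alternatives: list[str] = field(default_factory=list)    # options considered or explicitly skipped
    assumptions: list[str] = field(default_factory=list)     # assumptions made addressable as artifacts
    caused_by: list[str] = field(default_factory=list)       # prior event ids (inputs, critiques) this followed

def reconstruct_path(events: list[ReasoningEvent], event_id: str) -> list[ReasoningEvent]:
    """Walk caused_by links backwards to recover the reasoning path behind one event."""
    by_id = {e.event_id: e for e in events}
    path: list[ReasoningEvent] = []
    frontier, seen = [event_id], set()
    while frontier:
        current = frontier.pop()
        if current in seen or current not in by_id:
            continue
        seen.add(current)
        event = by_id[current]
        path.append(event)
        frontier.extend(event.caused_by)
    return sorted(path, key=lambda e: e.timestamp)
```

The point isn't the exact fields, it's that replay becomes "walk the events in order" instead of "re-prompt and hope."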
MindScript sits adjacent to that — it’s less about agents and more about making the intended reasoning structure executable and verifiable, so “what we thought we were doing” can be compared to “what actually ran.”
I won’t dump the whole architecture here, but these are the two entry points if you’re curious about how I’m thinking about it at a systems level:
MindsEye Core (architecture + contracts):
github.com/PEACEBINFLOW/minds-eye-...
MindScript Core (executable reasoning specs):
github.com/PEACEBINFLOW/mindscript...
Totally agree with your last line: AI you can’t explain is AI you can’t trust. I’d just add — AI you can’t reconstruct is AI you can’t operate at scale.
Feels like the interesting work now is in that handoff you mentioned: from making assumptions visible → making them durable enough to survive time. Happy to keep comparing notes as this space matures.