The Fallacies of GenAI Development named eight assumptions that break AI-assisted development. The resolutions were framed as human knowledge — things the engineer must understand and apply.
But the resolutions don't have to live in the engineer's head. They can live in the agent's workflow.
Projects like Superpowers proved that agents can follow structured methodologies — brainstorm before coding, write tests before implementation, review against specs before declaring success. The skills are mandatory workflows, not suggestions. The agent checks for relevant skills before any task.
The same approach works for the Fallacies resolutions. Each one can be encoded as an agent skill that fires automatically. The engineer doesn't need to remember "check for existing libraries before generating." The agent does it as a mandatory step.
Here are the eight skills. Each one resolves one fallacy. Each one is achievable today with current agent capabilities.
But first, a critical boundary.
The line between agent skill and human judgment
Not everything in a Fallacy resolution should be automated. The Fallacies series itself warns against this — Fallacy #3 (AI can't verify AI) and Fallacy #4 (dropping review) exist because teams automated judgment calls that should have stayed with humans.
Each skill below has two halves:
The mechanical half (agent does this): Search for existing libraries. Run the compiler. Execute the linter. Read the specification file. Count the boundaries. These are deterministic actions with deterministic outputs. The agent executes them. No judgment required.
The judgment half (agent surfaces this to the human): "Is this the right library?" "Does this architectural constraint still apply?" "Should this uncovered decision become a new spec?" These require context, domain knowledge, and strategic thinking. The agent surfaces the question. The human answers it.
The agent does the LEGWORK. The human makes the CALL. The agent that tries to make the call is Fallacy #3 — using AI to verify AI. The agent that doesn't do the legwork is wasting human attention on mechanical work (Fallacy #4).
WRONG: Agent decides "this library is the right choice" → Fallacy #3
WRONG: Human searches for libraries manually → Fallacy #4
RIGHT: Agent searches, presents 3 options with tradeoffs → human picks
Every skill below respects this boundary. Watch for the split: steps marked [MECHANICAL] are what the agent does autonomously. Steps marked [SURFACE] are what the agent presents for human decision.
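One way an agent harness could enforce this split is to tag every step with its kind. The runner then executes mechanical steps itself and only collects the questions raised by surface steps. The Go sketch assumes hypothetical `Step`, `Kind`, and `Execute` names. It is not Superpowers' actual format.

```go
package main

import "fmt"

// Kind marks who owns a step: the agent (Mechanical) or the human (Surface).
type Kind int

const (
	Mechanical Kind = iota
	Surface
)

// Step is one unit of a skill's workflow (hypothetical shape).
type Step struct {
	Kind     Kind
	Name     string
	Run      func() string // mechanical action; returns a result for the report
	Question string        // what to ask the human on a Surface step
}

// Execute runs mechanical steps and collects, but never answers,
// the questions raised by surface steps.
func Execute(steps []Step) (results, questions []string) {
	for _, s := range steps {
		switch s.Kind {
		case Mechanical:
			results = append(results, s.Name+": "+s.Run())
		case Surface:
			questions = append(questions, s.Question)
		}
	}
	return results, questions
}

func main() {
	results, questions := Execute([]Step{
		{Kind: Mechanical, Name: "search libraries", Run: func() string { return "2 candidates" }},
		{Kind: Surface, Question: "Which library should I compose from?"},
	})
	fmt.Println(results)   // [search libraries: 2 candidates]
	fmt.Println(questions) // [Which library should I compose from?]
}
```

The runner never supplies an answer for a surface step. The questions go back to the human unchanged.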
Skill 1: Compose-First
Resolves Fallacy #1: Faster generation ≠ faster engineering
When it fires: Before generating any implementation code.
What the agent does:
1. [MECHANICAL] Parse the task: what capability is needed?
2. [MECHANICAL] Search for existing functions in the codebase that already provide it
3. [MECHANICAL] Search for well-maintained upstream libraries that provide it
4. [SURFACE] Present findings: "Found 2 existing options: [library A] (last updated
3 days ago, 12k stars) and [library B] (last updated 8 months ago,
200 stars). Also found internal utils/retry.go with similar logic.
Shall I compose from one of these, or generate new?"
5. [MECHANICAL] After human picks: write the import + glue code
6. [MECHANICAL] Log the decision: "Composed from [library] per human approval"
Why this is a superpower: The agent that composes instead of generating produces 80-95% less code for the same capability. Less code = less to maintain, test, debug, secure. The agent becomes a librarian, not a typist. The codebase shrinks while capabilities grow.
What it looks like in practice:
Without skill:
Task: "Add HTTP retry logic"
Agent: generates 150 lines of retry implementation
With skill:
Task: "Add HTTP retry logic"
Agent: "Found existing retry library in go.mod dependencies.
Writing 6 lines of configuration instead of 150 lines
of implementation."
Skill 2: Property-Check
Resolves Fallacy #2: Plausible ≠ correct
When it fires: After generating any code, before presenting to the human.
What the agent does:
1. [MECHANICAL] Read the project's property definitions (if they exist):
.properties/ directory, INVARIANTS.md, CI check configs,
type constraints, API contracts, schema definitions
2. [MECHANICAL] Run available mechanical checks:
type checker, linter rules, contract tests
3. [MECHANICAL] For properties with clear pass/fail: evaluate and report result
4. [SURFACE] For properties requiring judgment: "Generated code touches user
data. INVARIANT says 'all user-data endpoints require auth
middleware.' I added the auth wrapper — please verify this is
the correct middleware for this endpoint."
5. [MECHANICAL] Report: "Mechanically verified: N properties passed.
Flagged for human review: M properties (listed above)."
Why this is a superpower: The agent doesn't just generate plausible code. It checks its own output against declared properties before the human ever sees it. The human receives code that's already been evaluated against the team's safety boundaries — not just code that looks right.
What it looks like in practice:
Without skill:
Agent generates API endpoint. Looks correct. Human merges.
Endpoint returns PII without authentication. Discovered in production.
With skill:
Agent generates API endpoint.
Property check: "INVARIANT: All endpoints returning user data
require authentication middleware."
Agent: "Generated endpoint does not include auth middleware.
Adding authentication wrapper before presenting."
Skill 3: Mechanical-Verify
Resolves Fallacy #3: AI can't verify AI
When it fires: When the agent needs to verify its own output.
What the agent does:
1. [MECHANICAL] Classify each property to verify:
Type constraint → run compiler
API contract → run contract test
Structural → run linter/static analysis
Universal → run property-based test
Subjective → flag for human (NOT self-review)
2. [MECHANICAL] Run ALL mechanical checks. Collect results.
3. [SURFACE] For subjective properties: "This error message says
'invalid input.' Is that clear enough for your users,
or should it specify what's invalid?"
4. [MECHANICAL] Report: "Mechanically verified: [list with pass/fail].
Human review needed: [subjective items listed above]."
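The classification in step 1 is a lookup, not a judgment call, which is what makes it safe to automate. The sketch uses hypothetical names. Its `default` branch sends anything unclassified to the human rather than to self-review.

```go
package main

import "fmt"

// PropertyKind mirrors the classification in step 1.
type PropertyKind int

const (
	TypeConstraint PropertyKind = iota
	APIContract
	Structural
	Universal
	Subjective
)

// Route returns the deterministic check for a property kind, or ok=false
// when the property must go to a human instead of agent self-review.
func Route(k PropertyKind) (check string, ok bool) {
	switch k {
	case TypeConstraint:
		return "compiler", true
	case APIContract:
		return "contract test", true
	case Structural:
		return "linter/static analysis", true
	case Universal:
		return "property-based test", true
	default:
		return "", false // Subjective (or unknown): never self-reviewed
	}
}

func main() {
	for _, k := range []PropertyKind{TypeConstraint, Universal, Subjective} {
		if check, ok := Route(k); ok {
			fmt.Println("run", check)
		} else {
			fmt.Println("flag for human")
		}
	}
}
```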
Why this is a superpower: The agent stops pretending it can judge its own output. It runs every mechanical check available and presents the RESULTS, not its OPINION. For subjective properties, it doesn't self-review — it surfaces the question to the human. The agent knows what it can verify deterministically and what it can't.
What it looks like in practice:
Without skill:
Agent reviews its own code: "This looks correct."
The code has a subtle type mismatch the agent doesn't catch
because it pattern-matches appearance, not logic.
With skill:
Agent: "Running compiler... type mismatch on line 47:
expected []byte, got string. Fixing before presenting."
The compiler caught what the agent's self-review would miss.
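The "run compiler" step doesn't need the model at all. As an illustration, Go's standard `go/types` package can type-check a generated file in-process. The sketch assumes the file has no imports. An agent could equally shell out to `go build` or `go vet`.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// TypeCheck runs the Go type checker over a single self-contained file
// (no imports) and returns the first error, or nil if the code type-checks.
func TypeCheck(src string) error {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "gen.go", src, 0)
	if err != nil {
		return err
	}
	var conf types.Config
	_, err = conf.Check("gen", fset, []*ast.File{f}, nil)
	return err
}

func main() {
	generated := `package gen

func payload(s string) []byte {
	var b []byte = s // looks plausible, but string is not []byte
	return b
}
`
	fmt.Println(TypeCheck(generated))
}
```

The exact error text depends on the Go version. Either way, the type checker produces it, not the model's opinion of its own code.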
Skill 4: Spec-Before-Code
Resolves Fallacy #4: Dropping review ≠ removing bottleneck
When it fires: Before writing any implementation, after the brainstorming/planning phase.
What the agent does:
1. [MECHANICAL] Read all specifications that govern the target module:
module interface, API contract, database schema, ADRs, conventions
2. [MECHANICAL] Extract the list of constraints that apply
3. [SURFACE] Present constraints to human: "Before I write code, these
constraints apply to this module: [list]. Are these correct?
Any I'm missing?"
4. [MECHANICAL] After human confirms: generate within those constraints
5. [MECHANICAL] After generating: verify output satisfies each constraint
using available mechanical checks
6. [SURFACE] If any constraint can't be mechanically verified: "I couldn't
confirm compliance with [constraint]. Please review this
specific aspect."
Why this is a superpower: The agent doesn't just generate code and hope someone reviews it. It reads the existing specifications, confirms the constraints with the human, generates within those constraints, and verifies compliance. The human reviews the constraint list (small, fast) instead of the code (large, slow). The review moves to the right level — specifications for humans, code verification for the agent.
What it looks like in practice:
Without skill:
Agent generates code. Human reviews 200 lines. Takes 45 minutes.
Misses that the code uses exceptions instead of result types.
With skill:
Agent: "Module conventions require result types for error handling
(from ADR-007). Confirm, and I'll generate with result types."
Human reviews: "Yes, those constraints are correct. Go ahead."
Agent generates. Agent verifies against constraints.
Human reviews the 3-line constraint confirmation, not the 200-line implementation.
Skill 5: Output-Audit
Resolves Fallacy #5: Better context ≠ correct output
When it fires: After generation, specifically checking output against properties that AREN'T in the retrieved context.
What the agent does:
1. [MECHANICAL] After generating code, search project docs for architectural
properties that apply but weren't in the original context:
- ADRs mentioning timeout, authentication, PII, concurrency
- CI check configurations
- CLAUDE.md / CONVENTIONS.md constraints
2. [MECHANICAL] For each found property: check if the generated code
violates it using available mechanical tools
3. [SURFACE] Present findings: "Found 3 architectural properties not in
my original context. Timeout policy (ADR-012) applies —
I added context.WithTimeout. PII handling policy applies —
please verify I'm not logging the user email on line 34.
Encryption-at-rest policy does not apply to this code path."
Why this is a superpower: The agent compensates for its own context limitation. RAG retrieves documents that are semantically similar to the task. Architectural properties are semantically DISTANT from the code they govern. This skill explicitly searches for the properties that RAG would miss — because they live in ADRs, convention documents, and CI configurations that aren't similar to the implementation task in vector space.
What it looks like in practice:
Without skill:
RAG retrieves correct API docs. Agent generates correct API call.
Code makes synchronous external call without timeout inside a
transaction. Timeout policy is in ADR-012, never retrieved.
Connection pool exhausts in production.
With skill:
Agent generates API call. Output-audit fires.
Agent: "Checking timeout policies... Found ADR-012:
'All external calls require context.WithTimeout(5s).'
Generated code lacks timeout. Adding before presenting."
Skill 6: Deletion-Aware
Resolves Fallacy #6: Generated code is a liability
When it fires: During implementation and refactoring tasks.
What the agent does:
1. [MECHANICAL] Before generating, search the codebase for existing
implementations of the same or similar functionality
2. [SURFACE] If duplicates found: "Found 3 existing implementations
of date formatting: utils/dates.go, handlers/format.go,
api/helpers.go. Recommend consolidating to one. Which
should be the canonical version, or should I extract a
new shared function?"
3. [MECHANICAL] After human decides: implement the consolidation
4. [MECHANICAL] After completing any task, report additions AND deletions:
"Added 45 lines. Deleted 120 lines. Net: -75 lines."
Why this is a superpower: The agent actively shrinks the codebase. Instead of the default behavior (generate new code for every task), the agent searches for duplication, extracts shared functions, and deletes redundant implementations. The deletions-to-additions ratio improves. The maintenance burden decreases with each task instead of increasing.
What it looks like in practice:
Without skill:
Five developers prompt agents for date formatting over a month.
Five implementations exist. A bug is found and fixed in one. The other four remain broken.
With skill:
Agent: "Found existing formatDate() in utils/dates.go.
Using existing implementation instead of generating new one.
Also found two other formatDate variants in handlers/ and api/.
Recommend consolidating to the utils/ version. Shall I refactor?"
Skill 7: Boundary-Read
Resolves Fallacy #7: Specs already exist; they're not new work
When it fires: At the start of every task, before any code generation.
What the agent does:
1. [MECHANICAL] Identify which module/package the task targets
2. [MECHANICAL] Read the module's existing boundaries:
exported interface, import restrictions, API contract,
database schema, configuration schema
3. [MECHANICAL] Generate code that satisfies all identified boundaries
4. [MECHANICAL] After generating: run linter/depguard to verify
no boundary is violated
5. [SURFACE] If the task REQUIRES changing a boundary: "This task
needs a new exported function in pkg/stave/. Changing
the public interface. Please confirm this is intended
— it affects all consumers of this package."
6. [MECHANICAL] Report: "Module has N boundaries. Implementation
satisfies all N. No cross-boundary imports."
Why this is a superpower: The agent treats existing specifications as first-class constraints instead of ignoring them. Most agents generate code that happens to match the module's style. This agent READS the interface definition, KNOWS what's exported and what isn't, and REFUSES to generate code that violates the boundary. The Parnas boundaries the team already built are finally enforced — by the agent itself.
What it looks like in practice:
Without skill:
Agent generates code that imports an internal package from
a different module. No linter catches it. The hexagonal
architecture erodes silently.
With skill:
Agent: "Module internal/core/ is restricted — depguard rule
prevents imports from internal/app/. Generating without
cross-boundary import. Using the public interface in pkg/stave/ instead."
Skill 8: Protocol-Sync
Resolves Fallacy #8: More agents ≠ more productivity
When it fires: At the start of every task, when multiple agents are working on the same codebase.
What the agent does:
1. [MECHANICAL] Read the shared specification repo:
naming conventions, error handling strategy, retry policy,
API contract versions, architecture decision records
2. [MECHANICAL] Check the specification VERSION — confirm it matches
what other agents are reading (prevent split-brain)
3. [MECHANICAL] Generate code that conforms to ALL shared specifications
4. [MECHANICAL] After generating: verify conformance using linter rules
and convention checks
5. [SURFACE] If a decision isn't covered by any specification:
"This task requires choosing a serialization format for
the new event type. No convention covers this. Options:
JSON (consistent with existing events) or Protobuf
(consistent with the gRPC migration plan in ADR-015).
Which should I use? Should this become a new convention?"
6. [MECHANICAL] Report: "Conforming to spec version 2.4. All conventions
match. One uncovered decision flagged above."
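Step 2 reduces to a comparison once each agent records which version it loaded. The sketch assumes a hypothetical map from agent name to the spec version it read. In practice, that version could be a version file or the commit hash of the spec repo.

```go
package main

import (
	"fmt"
	"sort"
)

// CheckSpecVersion compares the spec version each agent loaded and returns an
// error naming the disagreement, so no agent generates against a stale spec.
func CheckSpecVersion(loaded map[string]string) error {
	seen := map[string][]string{} // version -> agents that read it
	for agent, version := range loaded {
		seen[version] = append(seen[version], agent)
	}
	if len(seen) <= 1 {
		return nil
	}
	var parts []string
	for version, agents := range seen {
		sort.Strings(agents)
		parts = append(parts, fmt.Sprintf("%s: %v", version, agents))
	}
	sort.Strings(parts)
	return fmt.Errorf("split-brain: agents read different spec versions %v", parts)
}

func main() {
	fmt.Println(CheckSpecVersion(map[string]string{"agent-a": "2.4", "agent-b": "2.4"})) // <nil>
	fmt.Println(CheckSpecVersion(map[string]string{"agent-a": "2.4", "agent-b": "2.3"}))
}
```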
Why this is a superpower: The agent doesn't make invisible architectural decisions. It reads the coordination protocols (shared specifications), follows them, and flags decisions that aren't covered. Multiple agents producing code that follows the same conventions, same error handling, same retry strategy — without any coordination meetings. The specifications are the protocols. The agents are the nodes. The distributed system is consistent.
What it looks like in practice:
Without skill:
Agent A uses camelCase for JSON fields.
Agent B uses snake_case.
Integration breaks silently. Bug takes two days to trace.
With skill:
Both agents read conventions.md at task start.
Both generate camelCase JSON fields.
Integration works on the first try.
What makes these different from "just prompting better"
These aren't prompt improvements. They're structural workflow changes with a clear boundary between agent action and human judgment:
Prompting: "Remember to check for existing libraries"
→ The agent might or might not. Probabilistic.
Skill: compose-first fires automatically before every
implementation task. The agent MUST search before
generating. Mandatory workflow, not suggestion.
BUT: The agent searches [MECHANICAL].
The human picks which library to use [SURFACE].
The agent never decides "this library is fine" on its own.
This boundary separates these skills from Fallacy #3 (AI verifying AI). The agent does exhaustive, deterministic legwork — searching, reading, running checks, collecting results. The human makes judgment calls — which library, whether a constraint applies, whether an uncovered decision should become a new specification. The agent that crosses this boundary is automating judgment with correlated failure modes. The agent that respects it is doing the mechanical work that frees human judgment for where it matters.
The Superpowers framework proved this distinction. Skills aren't suggestions. They're mandatory workflows that fire based on triggers. The agent checks for relevant skills before any task. The skills execute as part of the agent's process, not as afterthoughts.
Each skill above has:
- A trigger (when it fires)
- A process (what the agent does, step by step)
- A verification (how to confirm the skill executed correctly)
- A report (what the agent tells the human)
This is the same structure Superpowers uses for brainstorming, TDD, and code review. The Fallacy resolutions fit the same framework — because they're the same kind of thing: structured workflows that prevent a known failure mode.
The compound effect
An agent with all eight skills installed:
- Searches for existing implementations before generating (Compose-First)
- Reads module boundaries and specifications (Boundary-Read)
- Reads shared conventions (Protocol-Sync)
- Confirms constraints with the human (Spec-Before-Code)
- Generates within all identified constraints
- Runs mechanical verification (Mechanical-Verify)
- Checks output against architectural properties not in context (Output-Audit)
- Evaluates against declared properties (Property-Check)
- Searches for redundant code to delete (Deletion-Aware)
- Reports what was composed, generated, verified, and flagged
This agent produces less code, more capability, fewer violations, and better architectural coherence than an agent running without these skills — using the SAME model, the SAME context, the SAME prompts. The difference isn't the AI. It's the workflow around the AI.
The Fallacies identified what breaks. The skills fix it, not by making the human smarter, but by making the agent's process better. The engineer who installs these skills gives their agent the architectural judgment that the model doesn't have. The model provides the generation. The skills provide the discipline. The combination is what "AI-assisted development that works" looks like.
The Fallacies of GenAI Development: Index · #1 Faster Generation · #2-#8 linked from index.
Superpowers by Jesse Vincent — the agentic skills framework that proved agents can follow mandatory structured workflows. 209k stars. The skills above follow the same pattern.
Stave — the specification gate that implements Property-Check and Boundary-Read for cloud infrastructure. 2,662 safety invariants. Deterministic verification.