DEV Community

Bala Paranj
From Fallacies to Superpowers: Eight Agent Skills That Make AI-Assisted Development Work

The Fallacies of GenAI Development named eight assumptions that break AI-assisted development. The resolutions were framed as human knowledge — things the engineer must understand and apply.

But the resolutions don't have to live in the engineer's head. They can live in the agent's workflow.

Projects like Superpowers proved that agents can follow structured methodologies — brainstorm before coding, write tests before implementation, review against specs before declaring success. The skills are mandatory workflows, not suggestions. The agent checks for relevant skills before any task.

The same approach works for the Fallacies resolutions. Each one can be encoded as an agent skill that fires automatically. The engineer doesn't need to remember "check for existing libraries before generating." The agent does it as a mandatory step.

Here are the eight skills. Each one resolves one fallacy. Each one is achievable today with current agent capabilities.

But first, a critical boundary.


The line between agent skill and human judgment

Not everything in a Fallacy resolution should be automated. The Fallacies series itself warns against this — Fallacy #3 (AI can't verify AI) and Fallacy #4 (dropping review) exist because teams automated judgment calls that should have stayed with humans.

Each skill below has two halves:

The mechanical half (agent does this): Search for existing libraries. Run the compiler. Execute the linter. Read the specification file. Count the boundaries. These are deterministic actions with deterministic outputs. The agent executes them. No judgment required.

The judgment half (agent surfaces this to the human): "Is this the right library?" "Does this architectural constraint still apply?" "Should this uncovered decision become a new spec?" These require context, domain knowledge, and strategic thinking. The agent surfaces the question. The human answers it.

The agent does the LEGWORK. The human makes the CALL. The agent that tries to make the call is Fallacy #3 — using AI to verify AI. The agent that doesn't do the legwork is wasting human attention on mechanical work (Fallacy #4).

WRONG:  Agent decides "this library is the right choice" → Fallacy #3
WRONG:  Human searches for libraries manually             → Fallacy #4
RIGHT:  Agent searches, presents 3 options with tradeoffs → human picks

Every skill below respects this boundary. Watch for the split: steps marked [MECHANICAL] are what the agent does autonomously. Steps marked [SURFACE] are what the agent presents for human decision.
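
The split can be sketched in a few lines of Python. This is a minimal illustration, not how Superpowers or any real agent framework implements it: the names `Kind`, `Step`, `run_steps`, and `ask_human` are all hypothetical, and the mechanical step only logs a description where a real agent would run a tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Kind(Enum):
    MECHANICAL = "MECHANICAL"  # agent runs this autonomously
    SURFACE = "SURFACE"        # agent asks; the human decides


@dataclass
class Step:
    kind: Kind
    description: str


def run_steps(steps: List[Step], ask_human: Callable[[str], str]) -> List[str]:
    """Run mechanical steps directly; route every SURFACE step to the human."""
    log = []
    for step in steps:
        if step.kind is Kind.MECHANICAL:
            # Placeholder: a real agent would invoke a search, compiler, or linter here.
            log.append(f"[MECHANICAL] did: {step.description}")
        else:
            # The agent never answers its own SURFACE question.
            answer = ask_human(step.description)
            log.append(f"[SURFACE] human answered: {answer}")
    return log


steps = [
    Step(Kind.MECHANICAL, "search for existing retry libraries"),
    Step(Kind.SURFACE, "Which of these 3 options should I use?"),
]
log = run_steps(steps, ask_human=lambda question: "option A")
```

The point of the design is that the only route to an answer for a `SURFACE` step runs through the `ask_human` callback.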


Skill 1: Compose-First

Resolves Fallacy #1: Faster generation ≠ faster engineering

When it fires: Before generating any implementation code.

What the agent does:

1. [MECHANICAL] Parse the task: what capability is needed?
2. [MECHANICAL] Search for existing functions in the codebase that already provide it
3. [MECHANICAL] Search for well-maintained upstream libraries that provide it
4. [SURFACE]    Present findings: "Found 2 existing options: [library A] (last updated 
                3 days ago, 12k stars) and [library B] (last updated 8 months ago, 
                200 stars). Also found internal utils/retry.go with similar logic.
                Shall I compose from one of these, or generate new?"
5. [MECHANICAL] After human picks: write the import + glue code
6. [MECHANICAL] Log the decision: "Composed from [library] per human approval"

Why this is a superpower: The agent that composes instead of generating produces 80-95% less code for the same capability. Less code = less to maintain, test, debug, secure. The agent becomes a librarian, not a typist. The codebase shrinks while capabilities grow.

What it looks like in practice:

Without skill:
    Task: "Add HTTP retry logic"
    Agent: generates 150 lines of retry implementation

With skill:
    Task: "Add HTTP retry logic"  
    Agent: "Found existing retry library in go.mod dependencies.
            Writing 6 lines of configuration instead of 150 lines
            of implementation."
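
Steps 2 and 3 could look roughly like the sketch below. It assumes a Go project (as in the example above) whose sources and `go.mod` are passed in as strings. The function name `find_candidates`, the regex, and the module paths under `example.com/hypothetical/` are illustrative assumptions, and the keyword match is deliberately naive.

```python
import re
from typing import Dict, List

# Matches Go function and method declarations and captures the name.
FUNC_DECL = re.compile(r"^func\s+(?:\([^)]*\)\s*)?(\w+)", re.MULTILINE)


def find_candidates(keyword: str, sources: Dict[str, str], go_mod: str) -> List[str]:
    """Collect existing options for a capability; the human picks among them."""
    found = []
    needle = keyword.lower()
    # Internal code: declared functions whose name contains the keyword.
    for path, text in sorted(sources.items()):
        for name in FUNC_DECL.findall(text):
            if needle in name.lower():
                found.append(f"internal: {path}:{name}")
    # Declared dependencies: go.mod lines of the form "<module path> <version>".
    for line in go_mod.splitlines():
        parts = line.split()
        if parts and parts[0] == "require":
            parts = parts[1:]  # single-line "require path version" form
        if len(parts) >= 2 and needle in parts[0].lower():
            found.append(f"dependency: {parts[0]} {parts[1]}")
    return found


sources = {
    "utils/retry.go": "package utils\n\nfunc RetryWithBackoff(n int) error { return nil }\n",
    "handlers/user.go": "package handlers\n\nfunc GetUser() {}\n",
}
go_mod = (
    "module example.com/app\n\n"
    "require (\n"
    "\texample.com/hypothetical/retry v1.2.0\n"
    "\texample.com/hypothetical/log v0.3.1\n"
    ")\n"
)
options = find_candidates("retry", sources, go_mod)
```

The function only returns the options. Choosing between them stays a [SURFACE] step.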

Skill 2: Property-Check

Resolves Fallacy #2: Plausible ≠ correct

When it fires: After generating any code, before presenting to the human.

What the agent does:

1. [MECHANICAL] Read the project's property definitions (if they exist):
                .properties/ directory, INVARIANTS.md, CI check configs,
                type constraints, API contracts, schema definitions
2. [MECHANICAL] Run available mechanical checks:
                type checker, linter rules, contract tests
3. [MECHANICAL] For properties with clear pass/fail: evaluate and report result
4. [SURFACE]    For properties requiring judgment: "Generated code touches user 
                data. INVARIANT says 'all user-data endpoints require auth 
                middleware.' I added the auth wrapper — please verify this is 
                the correct middleware for this endpoint."
5. [MECHANICAL] Report: "Mechanically verified: N properties passed. 
                Flagged for human review: M properties (listed above)."

Why this is a superpower: The agent doesn't just generate plausible code. It checks its own output against declared properties before the human ever sees it. The human receives code that's already been evaluated against the team's safety boundaries — not just code that looks right.

What it looks like in practice:

Without skill:
    Agent generates API endpoint. Looks correct. Human merges.
    Endpoint returns PII without authentication. Discovered in production.

With skill:
    Agent generates API endpoint.
    Property check: "INVARIANT: All endpoints returning user data 
    require authentication middleware."
    Agent: "Generated endpoint does not include auth middleware. 
    Adding authentication wrapper before presenting."
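
One way to picture the pass/fail/flag split is the sketch below. The `Invariant` type, the `requireAuth` wrapper name, and the string-matching check are assumptions made for illustration; a real project would plug in its own type checker, linter rules, or contract tests in place of `routes_are_wrapped`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Invariant:
    text: str
    # A deterministic pass/fail check, or None when the property needs judgment.
    check: Optional[Callable[[str], bool]] = None


def routes_are_wrapped(code: str) -> bool:
    """Naive check: every HandleFunc registration goes through requireAuth."""
    routes = [line for line in code.splitlines() if "HandleFunc(" in line]
    return all("requireAuth(" in line for line in routes)


def property_check(
    code: str, invariants: List[Invariant]
) -> Tuple[List[str], List[str], List[str]]:
    """Sort invariants into passed, failed, and flagged-for-human."""
    passed, failed, flagged = [], [], []
    for inv in invariants:
        if inv.check is None:
            flagged.append(inv.text)  # never self-judged
        elif inv.check(code):
            passed.append(inv.text)
        else:
            failed.append(inv.text)
    return passed, failed, flagged


generated = (
    'mux.HandleFunc("/users", requireAuth(listUsers))\n'
    'mux.HandleFunc("/export", exportUsers)\n'
)
invariants = [
    Invariant("No TODO markers in generated code", lambda c: "TODO" not in c),
    Invariant("All user-data endpoints require auth middleware", routes_are_wrapped),
    Invariant("The chosen middleware is the right one for this endpoint"),
]
passed, failed, flagged = property_check(generated, invariants)
report = (
    f"Mechanically verified: {len(passed)} passed, {len(failed)} failed. "
    f"Flagged for human review: {len(flagged)}."
)
```

Here the unwrapped `/export` route makes the auth invariant fail before the human sees the code, while the question of whether the middleware is the right one is only flagged.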

Skill 3: Mechanical-Verify

Resolves Fallacy #3: AI can't verify AI

When it fires: When the agent needs to verify its own output.

What the agent does:

1. [MECHANICAL] Classify each property to verify:
                Type constraint → run compiler
                API contract    → run contract test
                Structural      → run linter/static analysis
                Universal       → run property-based test
                Subjective      → flag for human (NOT self-review)
2. [MECHANICAL] Run ALL mechanical checks. Collect results.
3. [SURFACE]    For subjective properties: "This error message says
                'invalid input.' Is that clear enough for your users, 
                or should it specify what's invalid?"
4. [MECHANICAL] Report: "Mechanically verified: [list with pass/fail].
                Human review needed: [subjective items listed above]."

Why this is a superpower: The agent stops pretending it can judge its own output. It runs every mechanical check available and presents the RESULTS, not its OPINION. For subjective properties, it doesn't self-review — it surfaces the question to the human. The agent knows what it can verify deterministically and what it can't.

What it looks like in practice:

Without skill:
    Agent reviews its own code: "This looks correct."
    The code has a subtle type mismatch the agent doesn't catch
    because it pattern-matches appearance, not logic.

With skill:
    Agent: "Running compiler... type mismatch on line 47:
    expected []byte, got string. Fixing before presenting."
    The compiler caught what the agent's self-review would miss.
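
The classification in step 1 is essentially a lookup table. The sketch below assumes each property arrives already labeled with a kind; the `CHECKERS` table and `plan_verification` are hypothetical names, and the tool names are just the ones listed in the steps above.

```python
from typing import Dict, List, Tuple

# Property kind -> the deterministic tool that verifies it.
CHECKERS = {
    "type": "compiler",
    "contract": "contract test",
    "structural": "linter/static analysis",
    "universal": "property-based test",
}


def plan_verification(properties: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Route each (name, kind) pair to a mechanical tool or to the human."""
    plan: Dict[str, List[str]] = {"mechanical": [], "human": []}
    for name, kind in properties:
        tool = CHECKERS.get(kind)
        if tool is None:
            # Subjective or unrecognized kinds go to the human, never to self-review.
            plan["human"].append(name)
        else:
            plan["mechanical"].append(f"{name} -> {tool}")
    return plan


plan = plan_verification([
    ("payload is []byte", "type"),
    ("response matches API schema", "contract"),
    ("error message is clear to users", "subjective"),
])
```

Anything the table cannot map to a tool falls through to the human by default, so a new or mislabeled property kind is surfaced and not silently self-reviewed.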

Skill 4: Spec-Before-Code

Resolves Fallacy #4: Dropping review ≠ removing bottleneck

When it fires: Before writing any implementation, after the brainstorming/planning phase.

What the agent does:

1. [MECHANICAL] Read all specifications that govern the target module:
                module interface, API contract, database schema, ADRs, conventions
2. [MECHANICAL] Extract the list of constraints that apply
3. [SURFACE]    Present constraints to human: "Before I write code, these 
                constraints apply to this module: [list]. Are these correct? 
                Any I'm missing?"
4. [MECHANICAL] After human confirms: generate within those constraints
5. [MECHANICAL] After generating: verify output satisfies each constraint
                using available mechanical checks
6. [SURFACE]    If any constraint can't be mechanically verified: "I couldn't
                confirm compliance with [constraint]. Please review this 
                specific aspect."

Why this is a superpower: The agent doesn't just generate code and hope someone reviews it. It reads the existing specifications, confirms the constraints with the human, generates within those constraints, and verifies compliance. The human reviews the constraint list (small, fast) instead of the code (large, slow). The review moves to the right level — specifications for humans, code verification for the agent.

What it looks like in practice:

Without skill:
    Agent generates code. Human reviews 200 lines. Takes 45 minutes.
    Misses that the code uses exceptions instead of result types.

With skill:
    Agent: "Module conventions require result types for error handling
    (from ADR-007). Generating with result types."
    Human reviews: "Yes, those constraints are correct. Go ahead."
    Agent generates. Agent verifies against constraints. 
    Human reviews the 3-line constraint confirmation, not the 200-line implementation.

Skill 5: Output-Audit

Resolves Fallacy #5: Better context ≠ correct output

When it fires: After generation, specifically checking output against properties that AREN'T in the retrieved context.

What the agent does:

1. [MECHANICAL] After generating code, search project docs for architectural
                properties that apply but weren't in the original context:
                - ADRs mentioning timeout, authentication, PII, concurrency
                - CI check configurations
                - CLAUDE.md / CONVENTIONS.md constraints
2. [MECHANICAL] For each found property: check if the generated code 
                violates it using available mechanical tools
3. [SURFACE]    Present findings: "Found 3 architectural properties not in 
                my original context. Timeout policy (ADR-012) applies — 
                I added context.WithTimeout. PII handling policy applies — 
                please verify I'm not logging the user email on line 34.
                Encryption-at-rest policy does not apply to this code path."

Why this is a superpower: The agent compensates for its own context limitation. RAG retrieves documents that are semantically similar to the task. Architectural properties are semantically DISTANT from the code they govern. This skill explicitly searches for the properties that RAG would miss — because they live in ADRs, convention documents, and CI configurations that aren't similar to the implementation task in vector space.

What it looks like in practice:

Without skill:
    RAG retrieves correct API docs. Agent generates correct API call.
    Code makes synchronous external call without timeout inside a
    transaction. Timeout policy is in ADR-012, never retrieved.
    Connection pool exhausts in production.

With skill:
    Agent generates API call. Output-audit fires.
    Agent: "Checking timeout policies... Found ADR-012: 
    'All external calls require context.WithTimeout(5s).'
    Generated code lacks timeout. Adding before presenting."
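
A rough sketch of the search in step 1 follows. It derives policy topics from signals in the generated code, then scans project docs that were not in the original context. The `SIGNALS` mapping, the doc names other than ADR-012, and the keyword matching are assumptions for illustration; a real audit would need a richer mapping than substring search.

```python
from typing import Dict, List, Set

# Hypothetical mapping from code signals to the policy topic they touch.
SIGNALS = {
    "http.Get": "timeout",
    "sql.Open": "timeout",
    "log.Printf": "PII",
}


def audit(code: str, docs: Dict[str, str], in_context: Set[str]) -> List[str]:
    """List docs outside the original context that mention a relevant topic."""
    topics = {topic for signal, topic in SIGNALS.items() if signal in code}
    hits = []
    for name, text in sorted(docs.items()):
        if name in in_context:
            continue  # already retrieved; the audit targets what retrieval missed
        for topic in sorted(topics):
            if topic.lower() in text.lower():
                hits.append(f"{name}: {topic}")
    return hits


code = 'resp, err := http.Get(url)\nlog.Printf("fetched %s", url)\n'
docs = {
    "ADR-012.md": "All external calls require context.WithTimeout(5s).",
    "ADR-020.md": "Never log PII such as user email.",
    "api-docs.md": "GET /users returns a list.",
}
hits = audit(code, docs, in_context={"api-docs.md"})
```

The search is keyed on what the code does, not on similarity to the task description, which is how it reaches documents that similarity-based retrieval tends to skip.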

Skill 6: Deletion-Aware

Resolves Fallacy #6: Generated code is a liability

When it fires: During implementation and refactoring tasks.

What the agent does:

1. [MECHANICAL] Before generating, search the codebase for existing 
                implementations of the same or similar functionality
2. [SURFACE]    If duplicates found: "Found 3 existing implementations 
                of date formatting: utils/dates.go, handlers/format.go, 
                api/helpers.go. Recommend consolidating to one. Which 
                should be the canonical version, or should I extract a 
                new shared function?"
3. [MECHANICAL] After human decides: implement the consolidation
4. [MECHANICAL] After completing any task, report additions AND deletions: 
                "Added 45 lines. Deleted 120 lines. Net: -75 lines."

Why this is a superpower: The agent actively shrinks the codebase. Instead of the default behavior (generate new code for every task), the agent searches for duplication, extracts shared functions, and deletes redundant implementations. The deletions-to-additions ratio improves. The maintenance burden decreases with each task instead of increasing.

What it looks like in practice:

Without skill:
    Five developers prompt agents for date formatting over a month.
    Five implementations exist. Bug found in one. Other four remain broken.

With skill:
    Agent: "Found existing formatDate() in utils/dates.go.
    Using existing implementation instead of generating new one.
    Also found two other formatDate variants in handlers/ and api/.
    Recommend consolidating to the utils/ version. Shall I refactor?"
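
The additions-and-deletions report in step 4 can be computed from a before/after comparison. This sketch uses Python's standard `difflib.SequenceMatcher`; the function name `line_report` and the sample file contents are hypothetical, and a real agent would more likely read the numbers from its version control diff.

```python
import difflib


def line_report(before: str, after: str) -> str:
    """Count added and deleted lines between two versions of a file."""
    a, b = before.splitlines(), after.splitlines()
    added = deleted = 0
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1  # lines present only in the old version
        if tag in ("replace", "insert"):
            added += j2 - j1    # lines present only in the new version
    net = added - deleted
    return f"Added {added} lines. Deleted {deleted} lines. Net: {net:+d} lines."


before = (
    "import a\n"
    "func formatDateA() {}\n"
    "func formatDateB() {}\n"
    "func formatDateC() {}\n"
    "func main() {}\n"
)
after = (
    "import a\n"
    "import utils\n"
    "func main() {}\n"
)
summary = line_report(before, after)
```

Reporting the net figure on every task makes it visible whether the codebase is growing or shrinking.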

Skill 7: Boundary-Read

Resolves Fallacy #7: Specs already exist; they're not new work

When it fires: At the start of every task, before any code generation.

What the agent does:

1. [MECHANICAL] Identify which module/package the task targets
2. [MECHANICAL] Read the module's existing boundaries:
                exported interface, import restrictions, API contract, 
                database schema, configuration schema
3. [MECHANICAL] Generate code that satisfies all identified boundaries
4. [MECHANICAL] After generating: run linter/depguard to verify 
                no boundary is violated
5. [SURFACE]    If the task REQUIRES changing a boundary: "This task 
                needs a new exported function in pkg/stave/. Changing 
                the public interface. Please confirm this is intended 
                — it affects all consumers of this package."
6. [MECHANICAL] Report: "Module has N boundaries. Implementation 
                satisfies all N. No cross-boundary imports."

Why this is a superpower: The agent treats existing specifications as first-class constraints instead of ignoring them. Most agents generate code that happens to match the module's style. This agent READS the interface definition, KNOWS what's exported and what isn't, and REFUSES to generate code that violates the boundary. The Parnas boundaries the team already built are finally enforced — by the agent itself.

What it looks like in practice:

Without skill:
    Agent generates code that imports an internal package from
    a different module. No linter catches it. The hexagonal
    architecture erodes silently.

With skill:
    Agent: "Module internal/core/ is restricted — depguard rule
    prevents imports from internal/app/. Generating without
    cross-boundary import. Using the public interface in pkg/stave/ instead."
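
The check in step 4 is what a linter such as depguard performs. As a toy stand-in, the sketch below scans Go import lines against a deny list. The rule format, the module path `example.com/app`, and the function name are assumptions made for illustration and do not reflect depguard's actual configuration.

```python
import re
from typing import Dict, List

# Naive matcher for Go import lines, with or without "import" and an alias.
IMPORT_LINE = re.compile(
    r'^[ \t]*(?:import[ \t]+)?(?:[\w.]+[ \t]+)?"([^"]+)"[ \t]*$', re.MULTILINE
)


def boundary_violations(path: str, source: str, deny: Dict[str, List[str]]) -> List[str]:
    """Return imports in `source` that the deny rules forbid for this path."""
    violations = []
    imports = IMPORT_LINE.findall(source)
    for pkg_prefix, banned in deny.items():
        if not path.startswith(pkg_prefix):
            continue  # rule does not govern this file
        for imp in imports:
            if any(imp == b or imp.startswith(b + "/") for b in banned):
                violations.append(f"{path} imports {imp}")
    return violations


source = (
    "package core\n\n"
    "import (\n"
    '\t"context"\n'
    '\t"example.com/app/internal/app/handlers"\n'
    '\t"example.com/app/pkg/stave"\n'
    ")\n"
)
# Hypothetical rule: files under internal/core/ may not import internal/app.
deny = {"internal/core/": ["example.com/app/internal/app"]}
violations = boundary_violations("internal/core/service.go", source, deny)
```

An empty result is what lets the agent report "No cross-boundary imports". A non-empty one means the code gets regenerated against the public interface.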

Skill 8: Protocol-Sync

Resolves Fallacy #8: More agents ≠ more productivity

When it fires: At the start of every task, when multiple agents are working on the same codebase.

What the agent does:

1. [MECHANICAL] Read the shared specification repo:
                naming conventions, error handling strategy, retry policy,
                API contract versions, architecture decision records
2. [MECHANICAL] Check the specification VERSION — confirm it matches
                what other agents are reading (prevent split-brain)
3. [MECHANICAL] Generate code that conforms to ALL shared specifications
4. [MECHANICAL] After generating: verify conformance using linter rules
                and convention checks
5. [SURFACE]    If a decision isn't covered by any specification: 
                "This task requires choosing a serialization format for 
                the new event type. No convention covers this. Options: 
                JSON (consistent with existing events) or Protobuf 
                (consistent with the gRPC migration plan in ADR-015). 
                Which should I use? Should this become a new convention?"
6. [MECHANICAL] Report: "Conforming to spec version 2.4. All conventions 
                match. One uncovered decision flagged above."

Why this is a superpower: The agent doesn't make invisible architectural decisions. It reads the coordination protocols (shared specifications), follows them, and flags decisions that aren't covered. Multiple agents produce code that follows the same conventions, the same error handling, and the same retry strategy, without any coordination meetings. The specifications are the protocols. The agents are the nodes. The distributed system is consistent.

What it looks like in practice:

Without skill:
    Agent A uses camelCase for JSON fields.
    Agent B uses snake_case.
    Integration breaks silently. Bug takes two days to trace.

With skill:
    Both agents read conventions.md at task start.
    Both generate camelCase JSON fields.
    Integration works on the first try.
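
Steps 2 and 4 can be sketched as two small checks: compare the spec version each agent read, then verify a convention mechanically. The function name, the agent labels, and the camelCase regex below are assumptions for illustration; how agents actually publish the version they read is left open.

```python
import re
from typing import Dict, List

# camelCase: starts lowercase, then letters and digits only (no underscores).
CAMEL = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def sync_check(
    my_version: str, peer_versions: Dict[str, str], json_fields: List[str]
) -> List[str]:
    """Report spec-version mismatches and naming-convention violations."""
    problems = []
    for agent, version in sorted(peer_versions.items()):
        if version != my_version:
            problems.append(
                f"spec version mismatch: {agent} reads {version}, I read {my_version}"
            )
    for field in json_fields:
        if not CAMEL.match(field):
            problems.append(f"field '{field}' is not camelCase")
    return problems


problems = sync_check(
    "2.4",
    {"agent-b": "2.4", "agent-c": "2.3"},
    ["userId", "created_at"],
)
```

A version mismatch is worth stopping on before any code is generated, since two agents conforming to different spec versions reproduce the camelCase versus snake_case failure above.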

What makes these different from "just prompting better"

These aren't prompt improvements. They're structural workflow changes with a clear boundary between agent action and human judgment:

Prompting:    "Remember to check for existing libraries"
              → The agent might or might not. Probabilistic.

Skill:        compose-first fires automatically before every
              implementation task. The agent MUST search before
              generating. Mandatory workflow, not suggestion.

BUT:          The agent searches [MECHANICAL].
              The human picks which library to use [SURFACE].
              The agent never decides "this library is fine" on its own.

This boundary separates these skills from Fallacy #3 (AI verifying AI). The agent does exhaustive, deterministic legwork — searching, reading, running checks, collecting results. The human makes judgment calls — which library, whether a constraint applies, whether an uncovered decision should become a new specification. The agent that crosses this boundary is automating judgment with correlated failure modes. The agent that respects it is doing the mechanical work that frees human judgment for where it matters.

The Superpowers framework proved this distinction. Skills aren't suggestions. They're mandatory workflows that fire based on triggers. The agent checks for relevant skills before any task. The skills execute as part of the agent's process, not as afterthoughts.

Each skill above has:

  • A trigger (when it fires)
  • A process (what the agent does, step by step)
  • A verification (how to confirm the skill executed correctly)
  • A report (what the agent tells the human)

This is the same structure Superpowers uses for brainstorming, TDD, and code review. The Fallacy resolutions fit the same framework — because they're the same kind of thing: structured workflows that prevent a known failure mode.
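
The four-part structure maps naturally onto a small record type. The sketch below is an assumption about how such a registry could be modeled, not the Superpowers format; the `Skill` fields mirror the four bullets above, and the event names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Skill:
    name: str
    trigger: Callable[[str], bool]  # when it fires
    process: List[str]              # what the agent does, step by step
    verification: str               # how to confirm the skill executed
    report: str                     # what the agent tells the human


def skills_for(event: str, registry: List[Skill]) -> List[str]:
    """Return the names of every skill whose trigger matches the event."""
    return [skill.name for skill in registry if skill.trigger(event)]


registry = [
    Skill(
        name="compose-first",
        trigger=lambda event: event == "before_implementation",
        process=["parse task", "search codebase", "search libraries", "surface options"],
        verification="decision log contains a compose-or-generate entry",
        report="Composed from [library] per human approval",
    ),
    Skill(
        name="property-check",
        trigger=lambda event: event == "after_generation",
        process=["read invariants", "run checks", "surface judgment items"],
        verification="every declared property has a pass, fail, or flagged status",
        report="Mechanically verified: N properties passed.",
    ),
]
fired = skills_for("before_implementation", registry)
```

Because triggers are evaluated by the harness and not left to the model, whether a skill fires does not depend on the model remembering it.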

The compound effect

An agent running all eight skills together:

  1. Searches for existing implementations before generating (Compose-First)
  2. Reads module boundaries and specifications (Boundary-Read)
  3. Reads shared conventions (Protocol-Sync)
  4. Confirms constraints with the human (Spec-Before-Code)
  5. Generates within all identified constraints
  6. Runs mechanical verification (Mechanical-Verify)
  7. Checks output against architectural properties not in context (Output-Audit)
  8. Evaluates against declared properties (Property-Check)
  9. Searches for redundant code to delete (Deletion-Aware)
  10. Reports what was composed, generated, verified, and flagged

This agent produces less code, more capability, fewer violations, and better architectural coherence than an agent running without these skills — using the SAME model, the SAME context, the SAME prompts. The difference isn't the AI. It's the workflow around the AI.

The Fallacies identified what breaks. The skills fix it, not by making the human smarter, but by making the agent's process better. The engineer who installs these skills gives their agent the architectural judgment that the model doesn't have. The model provides the generation. The skills provide the discipline. The combination is what "AI-assisted development that works" looks like.


The Fallacies of GenAI Development: Index · #1 Faster Generation · #2-#8 linked from index.

Superpowers by Jesse Vincent — the agentic skills framework that proved agents can follow mandatory structured workflows. 209k stars. The skills above follow the same pattern.

Stave — the specification gate that implements Property-Check and Boundary-Read for cloud infrastructure. 2,662 safety invariants. Deterministic verification.
