Juan Torchia

Posted on • Originally published at juanchi.dev

Specsmaxxing: I Wrote YAML Specs for My AI Agents — Here's What Changed (and What Didn't)

A YAML spec for an AI agent is basically the blueprint you leave for the contractor when you can't be on-site. If the blueprint is solid, they build exactly what you want. If there's one ambiguous detail — "wall at the back" with no measurements — they make a call, and when you show up, the wall is in the wrong place.

Once you see it that way, you can't unsee it in every prompt you throw at an agent.

Three weeks ago I read the Hacker News thread on specsmaxxing — the idea of writing formal YAML specs as an antidote to "AI psychosis": that feeling of total loss of control when agents generate code without clear context, each doing its own thing, and suddenly you have a system that doesn't feel like yours anymore. The idea hit me with the same mix of "this actually makes sense" and "do I really need another YAML in my life?" that almost everything I read at 11pm on a Monday hits me with.

I implemented it anyway. Here's what I found.


The Real Problem: Why Nobody Talks About the YAML Itself

First, let's be precise about what we're actually discussing.

AI psychosis isn't a clinical term or marketing fluff. It's the concrete experience of opening a PR generated by an agent, seeing that it made 47 decisions you never specified, and realizing that 80% of it is fine but the remaining 20% is so woven into the 80% that you can't extract it without throwing everything away. This happened to me in production. It happened to people on my team. And the pattern I kept seeing in the Claude Code logs was always the same: the agent wasn't broken — it was poorly briefed.

Specsmaxxing proposes this: before the agent touches a single line of code, you hand it a YAML file with the complete feature spec, the constraints, the expected patterns, and the success criteria. Not a long prompt. A versioned, reviewable, auditable structure.

The promise is legitimate. My thesis is that specsmaxxing solves the human-to-agent communication problem, but it displaces the quality problem into the YAML itself — and nobody audits the YAML, nobody tests it, and nobody talks about it like it's a first-class artifact.


How I Implemented It: Real Structure and Annotated Code

My current stack: Next.js, TypeScript, PostgreSQL on Railway, Claude Code as the primary agent. The first feature I tested specsmaxxing on was refactoring an authentication module carrying technical debt since 2023.

This is the spec structure I ended up using:

# spec-auth-refactor.yaml
# Version: 1.0.0 — Juanchi, 2025
# IMPORTANT: this file is the source of truth for the agent.
# Any ambiguity here becomes an arbitrary decision by the agent.

meta:
  feature: "Refactor authentication module"
  owner: "juan@juanchi.dev"
  priority: high
  context: >
    The current module mixes session logic with business logic.
    There are tests that depend on global state. Do not touch the public interface.

constraints:
  language: TypeScript
  framework: Next.js 14 (App Router)
  do_not_break:
    - "Public API of useAuth()"
    - "Compatibility with existing tokens in production"
  required_patterns:
    - "Repository pattern for DB access"
    - "Typed errors, never throw strings"
  forbidden_patterns:
    - "any in new types"
    - "console.log in production code"
    - "Business logic in middleware"

success_criteria:
  - "Existing tests pass without modification"
  - "Coverage does not drop below current 78%"
  - "No new circular dependencies (verify with madge)"
  - "Clean Next.js build"

expected_output:
  new_files:
    - "src/lib/auth/repository.ts"
    - "src/lib/auth/session.service.ts"
  modified_files:
    - "src/hooks/useAuth.ts" # Internals only, public interface untouched
  do_not_touch:
    - "src/middleware.ts"
    - "src/app/api/auth/**"

The agent got this alongside a 3-line prompt: "Implement the refactor according to the attached spec. If you hit any ambiguity, stop and ask before continuing."

What improved:

The agent stopped inventing names. The patterns I defined as required appeared consistently. The public interface wasn't touched. Coverage landed at 81%, above the minimum. I estimate that saved me somewhere between 40 and 60 minutes of review time that in previous refactors I'd spend correcting naming and architecture decisions that weren't wrong, just not what I would have done.

What didn't improve:

The agent followed the spec to the letter — including where the spec was wrong. I'd written "Typed errors, never throw strings" and the agent created an error type hierarchy so granular I ended up with 14 distinct error classes for a module with 6 real cases. Technically correct per what I asked. Completely overengineered in practice.

The problem wasn't the agent. It was the YAML.


The Real Gotchas: Where Specsmaxxing Bites You

1. The YAML Inherits the Ambiguity of the Prompt

"Typed errors" can mean a 3-level hierarchy or a 14-level one. The spec didn't bound it. The agent chose to maximize. The result was correct and excessive at the same time.

The lesson: every item in the YAML needs an example or a numeric limit. Stating the "what" isn't enough; you also need the "how much" and the "up to where".
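Here's roughly what a bounded version of that entry could look like. The field names (pattern, example, limit) and the concrete values are illustrative, not a standard schema:

required_patterns:
  - pattern: "Typed errors, never throw strings"
    # Bound the constraint: one concrete example plus a hard ceiling
    example: "class TokenExpiredError extends AuthError"
    limit: "At most one error class per real failure case (this module has 6)"

An agent has a much harder time over-engineering something you've already capped.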

2. Negative Constraints Are Harder to Audit

forbidden_patterns are easier to define than to verify. You can write "any in new types" and the agent will respect it in files it creates — but if it modifies an existing file that already had any, the constraint falls into a gray zone. I discovered this when the agent touched useAuth.ts and let an existing any pass through, because its interpretation was "don't introduce new any."

Was it right? Technically yes. Was it what I wanted? No.
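One way to spell out that gray zone in the spec itself, sketched with field names I invented for illustration:

forbidden_patterns:
  - pattern: "any in new types"
    scope: "new files AND every file listed in modified_files"
    on_existing_violation: "Flag it in the PR description; do not silently preserve it"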

3. The Spec Goes Stale Faster Than the Code

In small projects this doesn't hurt much. In projects with multiple agents running in parallel — something I already documented when I tested parallel agents in Zed — yesterday's spec no longer reflects today's repo state. And an agent working from a stale spec is worse than an agent with no spec at all, because it has misplaced confidence.
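One way to make the staleness at least visible: pin the spec to the repo state it was written against. The fields and the SHA below are illustrative, not a standard:

meta:
  feature: "Refactor authentication module"
  # Make staleness detectable: record what the spec was written against
  written_against_commit: "abc1234"   # placeholder SHA
  review_before: "2025-07-01"         # treat the spec as stale after this date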

4. Nobody Versions the Spec With the Same Rigor as the Code

This one bothers me the most. The spec lives in the repo, sure. But the success criteria have no automated tests. "Coverage does not drop below 78%"? I checked that manually. "No new circular dependencies"? I ran that manually too:

# Circular dependency check post-refactor
npx madge --circular src/lib/auth/
# Output was clean: no circular dependencies detected

Fine. But if I don't automate that in CI, the next agent iteration can break it and I won't know until someone runs it by hand again.
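A minimal sketch of what that automation could look like, assuming GitHub Actions and Jest; the file name, job name, and paths are illustrative. madge exits non-zero when it finds a cycle, so the job fails on a new circular dependency, and the coverage floor can live in the test runner's threshold config instead of in my memory:

# .github/workflows/spec-checks.yml (a sketch, not my actual pipeline)
name: spec-checks
on: [pull_request]

jobs:
  verify-spec-criteria:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # "No new circular dependencies": madge exits non-zero when it finds a cycle
      - run: npx madge --circular src/lib/auth/
      # "Coverage does not drop below 78%": enforce it via the test runner's
      # coverage threshold config (Jest supports this) instead of reading the report by eye
      - run: npm test -- --coverage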

5. Goodhart's Law Applies Directly

When a measure becomes a target, it stops being a good measure. I asked the agent not to let coverage drop, and the agent wrote tests that cover lines but not behavior. They all passed. Coverage hit 81%. And three of those tests are basically expect(true).toBe(true) with extra steps. I caught this reviewing the tests by hand — something I don't always have time to do. The spec cannot replace human judgment about real quality.


FAQ: YAML Specs for AI Agents

Is specsmaxxing the same as writing a classic PRD?

Not exactly. A traditional PRD is written for humans: it has narrative context, business justification, user stories. A YAML spec for agents is written to be parsed and interpreted by a language model — it's more declarative, more restrictive, closer to a schema than a document. There's overlap, but the format and precision level are different.

What happens if the agent ignores parts of the spec?

Depends on the agent and how you deliver the spec. In my experience with Claude Code, if the spec is in the prompt context and you reference it explicitly, compliance is high — close to 90% in my informal measurements. The remaining 10% are interpretations in ambiguous zones, not active ignorance. If the agent is systematically ignoring the spec, the problem is usually in how you're delivering it, not in the agent.

Is it worth it for small projects, or does it only scale for teams?

For personal projects touching a couple of files, the overhead of writing the spec outweighs the benefit. I start finding it useful when a feature touches more than 5 files or when the agent is going to make more than 10 design decisions. Below that, a well-written prompt is enough.

How do you version specs alongside the code?

I keep them in a /specs folder at the repo root, named by feature and date: spec-auth-refactor-2025-06.yaml. When the feature closes, the spec stays as historical documentation. I don't delete them because they're useful for understanding why the code ended up the way it did — something I touched on when I audited who owns the code Claude writes.

Is there a risk the agent uses the spec to do things you didn't expect?

Yes, and it's the least-discussed risk. A spec that defines expected_output with new files can lead the agent to create those files even if during implementation it becomes clear they're unnecessary. The agent optimizes to satisfy the spec, not to find the simplest solution. I had to explicitly add "If a file listed in expected_output turns out to be unnecessary, flag it before omitting it" after an iteration where the agent created an empty file just to check off the list.
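In spec form, that rule can live right next to the list it guards. The rules field is my own convention, not something any agent requires; the sentence is the one I added:

expected_output:
  new_files:
    - "src/lib/auth/repository.ts"
    - "src/lib/auth/session.service.ts"
  # Added after the empty-file incident
  rules:
    - "If a file listed in expected_output turns out to be unnecessary, flag it before omitting it"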

Does this change anything about supply chain risks in my dependencies?

Indirectly, yes. When the agent has freedom to choose dependencies, it can introduce packages that haven't gone through my audit process. I saw this when I simulated the same supply chain attack vector against my ML dependencies. I now have an allowed_dependencies section in the spec with an explicit allowlist, and one rule: "Any new dependency requires explicit approval before it gets added."
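The section looks roughly like this; the package names are placeholders, and the allowlist should mirror whatever your package.json already contains:

allowed_dependencies:
  policy: "Any new dependency requires explicit approval before it gets added"
  allowlist:
    - "next"
    - "zod"   # placeholder: mirror what package.json already contains
  on_missing: "Stop and ask; do not install or import anything outside the allowlist"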


What I Accept, What I Don't Buy, and What's Still Pending

Specsmaxxing is an honest idea that solves a real problem: agents need structured context to stop inventing things. My own logs confirm this. Review time dropped, naming consistency improved, the patterns I asked for showed up.

But there's something I don't buy in the enthusiastic HN narrative: that the YAML is the solution to the quality problem. It's not. The YAML displaces the problem — from the prompt to the file, from execution time to writing time. And writing quality specs is a skill you have to develop the same way you develop good tests or good prompts. It's not free, it's not obvious, and nobody is teaching it yet.

Here's my point: if you start doing specsmaxxing and everything works perfectly on the first try, either the spec is too simple or you're not looking at it critically enough. The real maturity of this approach will come when we have linters for specs, when CI can verify that the agent's output actually satisfies the declared criteria, and when we treat the YAML with the same rigor we treat production code.

Until then, it's a powerful tool with a big blind spot. Use it with your eyes open.

If you're running agents in production and want to compare how you're structuring specs, reach out. I have strong opinions and concrete cases, and I'm genuinely curious whether the pattern I found applies beyond my stack.


Original source: Hacker News


This article was originally published on juanchi.dev
