DEV Community: Hunter Wiginton

# AI Coding Assistants Aren't Magicians: Why Pattern Matching Can't Replace Engineering Judgment

Hunter Wiginton — Wed, 20 May 2026 10:36:02 +0000

The technology is transformative. The hype is dangerous.

"Software engineering is dead." "AI will replace all coders by 2027." "You don't need to understand the code anymore."

I've watched talented people believe this. And I've watched them ship broken systems because of it.

Let me be clear upfront: I love AI. I use Claude Code, OpenCode, and various AI assistants every single day. I build production AI agents that process thousands of requests. This technology is genuinely transformative, and it's changing how we build software on a fundamental level. It's here to stay, even if some or even most of its current applications end up going away when the bubble pops.

But there's a dangerous narrative taking hold. It says that AI coding assistants are so powerful that engineering knowledge is becoming obsolete. That anyone can ship software now. That the deep understanding engineers have of systems, architecture, and the specific machines they work on is no longer necessary.

This is wrong. And believing it will cost you a lot more than $20 per month. 🛠️

What AI Assistants Actually Are

Here's what the marketing says: "AI that understands your code."

Here's what's actually happening: statistical pattern completion based on training data.

AI coding assistants work by predicting the most likely next tokens based on the context you've provided. They draw from patterns seen in millions of code examples. They're incredibly good at recognizing "code that looks like this usually has code that looks like that."

But they have no actual understanding of what the code does. They can't reason about runtime behavior, only textual patterns.

This distinction matters enormously:

Pattern matching: "This looks like a database query, so I'll suggest connection pooling because I've seen that pattern before."

Engineering judgment: "Given our load patterns, latency requirements, and the fact that our database is across a WAN link, connection pooling with these specific settings will work, but we also need circuit breakers because this connection will fail during the 2 AM maintenance window."

The AI knows patterns. The engineer knows this system.

The Context Window Problem Nobody Discusses

Even the best AI models have context limits, typically somewhere between 100K and 200K tokens. That sounds like a lot until you realize your production system has millions of lines of code across dozens of services, years of git history encoding institutional knowledge, and countless implicit assumptions baked into deployment pipelines.

AI literally cannot see your full architecture. It's working with a keyhole view of a mansion.

What this means in practice:

AI doesn't know about the service three hops away that depends on your API contract
AI doesn't see the monitoring dashboards that will break when you rename that field
AI can't understand the implicit assumptions in your CI/CD pipeline
AI has no idea that the "simple refactor" it's suggesting will break integration tests in a repo it's never seen

The danger? Non-engineers (and engineers who've gotten sloppy) assume that if the AI didn't warn about a problem, the problem doesn't exist.

It does. The AI just can't see it.

Pattern Matching Fails at the Edges

AI excels at common patterns. CRUD operations, authentication flows, standard algorithms, well-documented libraries. It's great at anything that looks like code it's seen millions of times.

AI fails catastrophically at the edges. Novel architecture decisions, system-specific edge cases, business logic that isn't in any training data, performance optimization for your specific load profile, security implications unique to your data model.

I've seen this firsthand building production AI agents:

Hallucinated parameters. One of my agents started calling tools with parameters I never defined. The AI "pattern matched" from similar tools in its training data and invented fields that didn't exist in my schema. The system crashed with validation errors before the tool could even execute.

Null assumption failures. AI-generated code assumed a timestamp field would always be present because timestamps always exist in the patterns it learned from. Production data disagreed. Records without that field caused null pointer exceptions. Users got error screens.

Context staleness. An AI agent made decisions based on cached data it couldn't know was stale. Users saw incorrect counts. Trust eroded.

The pattern is clear: AI fails at exactly the places where engineering judgment matters most. The edges. The exceptions. The "it depends" decisions that separate working systems from broken ones.

The Architecture Blindspot

Here's what engineers actually do that AI cannot:

Hold the whole system in their head. How services interact. Where bottlenecks hide. Which components are fragile. What happens when X fails while Y is under load. This holistic understanding doesn't fit in a context window.

Make tradeoff decisions with incomplete information. Consistency versus availability. Speed versus correctness. Technical debt versus shipping this quarter. These aren't pattern-matching problems, they're judgment calls that require understanding business context, team capabilities, and organizational priorities.

Anticipate failure modes. "What happens when the database is slow?" "What if this queue backs up?" "What if a user does something unexpected?" Pattern matching only knows happy paths from training data. Engineers have been burned enough to think adversarially. Some models are getting better at this, but it's far from what you get from an experienced engineer.

Understand business context. Why this feature matters. What "done" actually means. Which corners can be cut and which must be protected. AI has zero business context. It just has code patterns.

When you skip the engineer, or when engineers skip their own judgment, you ship code that works in demos and breaks in production. And when it breaks, nobody understands why.

The Real Danger: Learned Helplessness

Here's what concerns me most. I'm watching a generation of developers become dependent in ways that will hurt them.

Junior devs who can't debug without AI because they've always had AI fix things for them
Engineers who don't understand the code they've shipped because they just accepted AI suggestions
Teams where nobody actually knows how the system works because it was built through AI prompts
Technical debt mounting in ways nobody can untangle because it's AI-generated code that nobody fully grasped

The vicious cycle looks like this:

AI generates code
It mostly works
Engineer doesn't fully understand it but ships anyway
Bug appears in production
Engineer asks AI to fix it
AI patches the symptom without understanding root cause
Technical debt compounds
System becomes increasingly fragile
Eventually, a full rewrite is required by engineers who actually understand what they're building

The warning: If you can't debug the code without AI, you don't understand the system. And systems you don't understand will eventually betray you in ways you can't predict or fix.

How to Use AI Without Losing Your Edge

I'm not saying stop using AI. I use it constantly and it makes me significantly more productive. However, I use it with a specific mental model.

AI is a junior developer with perfect memory and zero judgment.

It can recall syntax and patterns instantly. It works fast. It never gets tired. But it also doesn't understand why it's suggesting what it's suggesting. It can't evaluate whether its suggestion is appropriate for your specific context.

That being said, here's how you should be treating your LLM coding assistant:

Review everything like you'd review a junior's PR. Trust the syntax, verify the logic. Never approve code you don't understand just because AI wrote it.
Use AI for acceleration, not replacement. Boilerplate generation? Great. Architecture decisions? That's your job. Test generation? Yes, then review every assertion. Business logic? Verify line by line.
Maintain your understanding. If AI writes it, you read it thoroughly. If you can't explain why the code works, don't ship it. Keep your debugging skills sharp, and practice without AI regularly so you don't atrophy.
Know where the edges are. AI shines on common patterns. Your value is the uncommon ones. Focus your attention on integration points, failure modes, and business logic.
Question the confident answers. AI sounds confident even when it's hallucinating. Especially verify suggestions that seem "too easy." If it feels like magic, it's probably wrong.

The Irreplaceable Value of Engineering Judgment

Here's what we actually get paid for as engineers:

Making decisions with incomplete information
Anticipating problems before they happen
Understanding systems holistically
Translating business needs into technical solutions
Knowing when to push back on requirements that don't make sense

None of this is pattern matching. 💡

The uncomfortable truth is that AI doesn't diminish the value of engineering, it raises the bar. The routine work gets automated. What remains is the hard stuff: judgment, architecture, tradeoffs, understanding.

Engineers who relied primarily on knowing syntax and common patterns are going to struggle. That work is being commoditized.

Engineers who relied on judgment, on understanding systems deeply, on making hard calls under uncertainty are going to thrive. That work is more valuable than ever because it's exactly what AI can't do.

The Balanced Take

Again, AI coding assistants are genuinely powerful. I use them daily. They save me hours on boilerplate, test generation, documentation, and code exploration. They've fundamentally changed how I work, and I wouldn't go back.

But they cannot replace the understanding that makes engineers valuable. They can't see your whole system. They can't make judgment calls. They can't anticipate failure modes unique to your architecture. They can't understand why the business needs what it needs.

If you're a non-technical founder thinking you can skip engineers and just prompt your way to a product, here is your warning. You're building a house of cards. It might stand for a while, but it will collapse eventually. And when it does, you'll need engineers who actually understand things to rebuild it.

If you're an engineer getting sloppy, accepting AI suggestions without understanding them, losing your debugging skills, the market will eventually correct for this. The engineers who maintain their judgment while leveraging AI speed will outcompete those who became dependent.

If you're learning to code: use AI to accelerate your learning. It's an incredible tool for exploration and getting unstuck. But learn the fundamentals. Understand what's happening under the hood. Those skills will save you when the AI fails, and it will fail, at exactly the moment you need it most.

The Future Belongs to Amplified Engineers

The narrative that AI will replace engineers misses the point. AI is a tool, albeit an extraordinarily powerful one. Like all tools, it amplifies what you bring to it.

If you bring deep understanding, sound judgment, and knowledge of your specific systems, AI amplifies that. You become dramatically more productive while maintaining the quality that matters.

If you bring nothing but the ability to prompt and accept suggestions, AI amplifies that too. You become fast at shipping code nobody understands, building systems that will eventually crumble.

The choice is yours.

Use the tool. Love the tool. But don't mistake the tool for the craftsperson.

Where have you seen AI-generated code fail in ways only engineering judgment could catch? I'm collecting war stories, so drop yours in the comments.

Debugging AI Agent Hallucinations: A Checklist from Production

Hunter Wiginton — Wed, 13 May 2026 12:10:00 +0000

The systematic approach I use after building agents that process thousands of requests daily

Your AI agent worked perfectly in testing. Then production happened — and suddenly it's inventing parameters that don't exist, calling tools with impossible values, and confidently returning nonsense.

Welcome to the hallucination problem nobody warns you about.

I build production AI agents that handle order processing, failure detection, and automated remediation. After debugging more hallucination incidents than I'd like to admit, I've developed a systematic checklist for tracking down these issues. This isn't about prompt engineering tricks. It's about building systems that don't let hallucinations happen in the first place.

The Problem Nobody Warns You About

When people talk about AI hallucinations, they usually mean factual errors — the model making up statistics or citing papers that don't exist. But agents have a worse problem: structural hallucinations.

Your agent doesn't just hallucinate facts. It hallucinates tool parameters. It invents API fields. It calls functions with arguments you never defined. And unlike factual hallucinations, structural ones break your system immediately and catastrophically.

The debugging checklist below comes from real production incidents. Each item addresses a specific failure mode I've encountered. 🛠️

The Checklist

1. Validate Your Tool Schemas Are Actually Being Followed

The Problem: The model invents parameters not in your schema.

I discovered this the hard way when one of my agents started failing intermittently. The logs showed tool calls with parameters I'd never defined. The model was actually hallucinating input fields that didn't exist in my schema, causing validation errors before the tool could even execute.

What to Check:

Are your schema definitions strict? If you're allowing additionalProperties: true, you're inviting hallucinations.
Is your model known for reliable tool-calling? Some models are significantly better than others at respecting schemas.
Are your parameter names unambiguous? Names like data or input invite creative interpretation.

Quick Fix: Log raw tool calls before execution. Compare against your schema. You'll often catch the hallucination before it causes downstream failures.
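As a rough illustration of that comparison step, here is a minimal Python sketch that checks a raw tool call against a strict schema before execution. The schema shape, the `validate_tool_call` helper, and the parameter names are all hypothetical; a real agent framework or a JSON Schema validator would likely replace this hand-rolled check.

```python
from typing import Any

# Hypothetical tool schema: parameter name -> (expected type, required?)
FETCH_FAILED_TASKS_SCHEMA: dict[str, tuple[type, bool]] = {
    "status": (str, True),
    "limit": (int, False),
}


def validate_tool_call(
    schema: dict[str, tuple[type, bool]], args: dict[str, Any]
) -> list[str]:
    """Return a list of problems; an empty list means the call matches the schema."""
    problems = []
    # Reject anything the schema does not define (the strict, no-extra-properties rule).
    for name in args:
        if name not in schema:
            problems.append(f"unknown parameter: {name}")
    # Check required parameters and basic types.
    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                problems.append(f"missing required parameter: {name}")
            continue
        if not isinstance(args[name], expected_type):
            problems.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return problems


# The last field is invented by the model and is not in the schema.
raw_call = {"status": "FAILED", "limit": 25, "include_archived": True}
print(validate_tool_call(FETCH_FAILED_TASKS_SCHEMA, raw_call))
# prints ['unknown parameter: include_archived']
```

Logging the returned problems next to the raw call makes it easy to see which model and which prompt produced the invented field.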

2. Handle Null and Missing Fields Defensively

The Problem: The agent assumes data exists that doesn't.

One of my agents processes failed tasks from an external system. It worked great until we hit records where the timestamp was null. The agent tried to access properties on null values, but when the lookup came back empty it didn't crash. It created a timestamp out of thin air and left the user staring at data that didn't make sense.

The API documentation said the field would always be present. Production disagreed.

What to Check:

Are you validating API responses before passing them to the agent?
Does your tool return structured errors rather than raw exceptions?
Are optional fields actually marked optional in your types?

Quick Fix: Add null checks in your tool invocation layer. When data is missing, return empty results, not hallucinations. Let the agent work with "no data found" rather than "there should always be data here."
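One way to picture that tool invocation layer is the sketch below. The record shape (`id`, `timestamp`) and the function names are assumptions for illustration, not the actual production code.

```python
from typing import Any, Optional


def normalize_task(record: dict[str, Any]) -> dict[str, Any]:
    """Copy known fields and make a missing timestamp explicit instead of guessable."""
    timestamp: Optional[str] = record.get("timestamp")  # may be absent or None
    return {
        "id": record.get("id"),
        "timestamp": timestamp,
        "timestamp_missing": timestamp is None,
    }


def fetch_failed_tasks(api_response: Optional[list[dict[str, Any]]]) -> dict[str, Any]:
    """Tool-layer wrapper: an empty or missing response becomes an explicit 'no data found'."""
    if not api_response:
        return {"tasks": [], "message": "no data found"}
    return {"tasks": [normalize_task(r) for r in api_response], "message": None}


# Hypothetical upstream response where one record has no timestamp.
result = fetch_failed_tasks([{"id": "t-1", "timestamp": None}, {"id": "t-2", "timestamp": "2026-05-01T10:00:00Z"}])
print(result["tasks"][0]["timestamp_missing"])  # prints True
```

The explicit `timestamp_missing` flag is a design choice: it gives the agent something concrete to report rather than a gap it might fill in on its own.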

3. Audit What the Agent Actually Sees (Context Debugging)

The Problem: The agent works with stale or incorrect context.

This one was subtle. My agent showed users a count of failed tasks: "You have 25 tasks requiring review." But after they fixed a few tasks and returned to the review screen, the count still showed 25 even though only 15 remained.

The agent was using cached context variables instead of re-fetching fresh data. It made decisions based on a world that no longer existed.

What to Check:

Is context being refreshed or cached between interactions?
When does the agent re-fetch data vs. use stored values?
Are there race conditions between user actions and agent reads?

Quick Fix: Log context state at every decision point. Add timestamps to cached data so you can see when staleness becomes a problem.
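A minimal sketch of timestamped cache entries, assuming a simple in-memory dictionary as the cache. `CachedValue`, `get_fresh`, and the `max_age_seconds` threshold are hypothetical names, and the right threshold depends on how quickly your data changes.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CachedValue:
    value: Any
    # Record when the value was fetched so staleness is visible in logs.
    fetched_at: float = field(default_factory=time.monotonic)

    def age_seconds(self) -> float:
        return time.monotonic() - self.fetched_at


def get_fresh(
    cache: dict[str, CachedValue],
    key: str,
    fetch: Callable[[], Any],
    max_age_seconds: float,
) -> Any:
    """Return the cached value, re-fetching when it is missing or older than the limit."""
    entry = cache.get(key)
    if entry is None or entry.age_seconds() > max_age_seconds:
        entry = CachedValue(fetch())
        cache[key] = entry
    return entry.value


# Usage: the count is re-fetched once the cached copy is older than 30 seconds.
cache: dict[str, CachedValue] = {}
count = get_fresh(cache, "failed_task_count", lambda: 25, max_age_seconds=30)
```

Logging `age_seconds()` at each decision point shows how old the data was when the agent acted on it.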

4. Test the Specific Model, Not Just "An LLM"

The Problem: Different models hallucinate differently.

I had an agent that worked flawlessly with one model. When we switched to a faster, cheaper model for cost optimization, the hallucination rate spiked. The new model was inventing tool parameters the old one never did, and it was caching context more aggressively.

Same prompts. Same schemas. Different model. Different failures.

What to Check:

Have you tested YOUR specific tool schemas with YOUR specific model?
Are you using a model optimized for tool use, or a general chat model?
Does the model respect required vs optional parameter distinctions?

Quick Fix: Create a tool-calling test suite that runs against each model you're considering. What works for GPT-4 might fail with Gemini, and vice versa. Test before you deploy. Also, this is where agent observability platforms can really save your bacon.
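Such a test suite could start as small as the sketch below. Here a "model" is any callable that takes a prompt and returns tool-call arguments; the two stand-in functions are fakes that simulate a well-behaved and a hallucinating model, and in practice each would wrap a real model client.

```python
from typing import Any, Callable

# A "model" is any callable mapping a prompt to tool-call arguments (an assumption).
ModelFn = Callable[[str], dict[str, Any]]

ALLOWED_PARAMS = {"status", "limit"}  # hypothetical schema for one tool


def hallucinated_params(model: ModelFn, prompts: list[str]) -> dict[str, set[str]]:
    """Map each prompt to the parameters the model invented for it."""
    failures: dict[str, set[str]] = {}
    for prompt in prompts:
        extra = set(model(prompt)) - ALLOWED_PARAMS
        if extra:
            failures[prompt] = extra
    return failures


# Stand-ins for real model clients.
def careful_model(prompt: str) -> dict[str, Any]:
    return {"status": "FAILED", "limit": 25}


def creative_model(prompt: str) -> dict[str, Any]:
    return {"status": "FAILED", "sort_by": "date"}  # "sort_by" is not in the schema


prompts = ["Show me failed tasks", "List tasks needing review"]
for name, model in {"careful": careful_model, "creative": creative_model}.items():
    print(name, hallucinated_params(model, prompts))
```

Running the same prompts against every candidate model gives a per-model hallucination count you can compare before switching.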

5. Make Errors Parseable, Not Exceptional

The Problem: Raw errors confuse the agent, leading to hallucinated recovery.

When my tools threw exceptions, the agent received error stack traces. It would then try to "interpret" what went wrong and guess what the correct response should have been. Sometimes it guessed right. Usually it didn't.

The agent was hallucinating recovery strategies for errors it didn't understand.

What to Check:

Do your tools return structured error responses?
Can the agent distinguish "no results found" from "error occurred"?
Are error messages actionable, or just stack traces?

Quick Fix: Wrap all tools to return a consistent structure:

{
  "success": true,
  "data": [...],
  "error": null
}

Or on failure:

{
  "success": false,
  "data": null,
  "error": "Task ID not found in database"
}

The agent can reason about structured errors. It cannot reason about NullPointerException at line 247.
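One possible way to wrap tools into that structure is a decorator, sketched here in Python. The `structured_tool` name and the in-memory task lookup are assumptions for illustration; a production version would likely also log the original exception.

```python
import functools
from typing import Any, Callable


def structured_tool(func: Callable[..., Any]) -> Callable[..., dict[str, Any]]:
    """Wrap a tool so it always returns {success, data, error} instead of raising."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return {"success": True, "data": func(*args, **kwargs), "error": None}
        except Exception as exc:
            # Pass a short, readable message to the agent, not a stack trace.
            return {"success": False, "data": None, "error": str(exc)}

    return wrapper


@structured_tool
def get_task(task_id: str) -> dict[str, str]:
    tasks = {"t-1": {"id": "t-1", "status": "FAILED"}}  # hypothetical data store
    if task_id not in tasks:
        raise LookupError("Task ID not found in database")
    return tasks[task_id]


print(get_task("t-1"))
# prints {'success': True, 'data': {'id': 't-1', 'status': 'FAILED'}, 'error': None}
print(get_task("missing"))
# prints {'success': False, 'data': None, 'error': 'Task ID not found in database'}
```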

6. Constrain the Solution Space

The Problem: Too much freedom equals too much hallucination.

When I let my agent fetch "all failed tasks," it sometimes returned hundreds of items and then hallucinated patterns in the data that didn't exist. Limiting the fetch to 25 items at a time dramatically reduced hallucination rates.

Less data to process meant less opportunity for creative interpretation.

What to Check:

Are response sizes bounded?
Are enum values explicitly listed in your schema, or are you using free-form strings?
Does the agent have "escape hatches" that encourage invention?

Quick Fix: Add explicit limits everywhere. Use enums instead of strings where possible. The tighter the constraints, the less room for hallucination.
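Both constraints can be enforced at the point where tool arguments are parsed. In this sketch the page size of 25 comes from the example above, while `TaskStatus`, its members other than FAILED, and `parse_fetch_args` are hypothetical.

```python
from enum import Enum

MAX_PAGE_SIZE = 25  # batch size taken from the example above


class TaskStatus(Enum):
    FAILED = "FAILED"
    PENDING = "PENDING"      # hypothetical status
    COMPLETED = "COMPLETED"  # hypothetical status


def parse_fetch_args(status: str, limit: int = MAX_PAGE_SIZE) -> tuple[TaskStatus, int]:
    """Reject statuses outside the enum and clamp the limit to a bounded page size."""
    parsed_status = TaskStatus(status)  # raises ValueError for unknown values
    bounded_limit = max(1, min(limit, MAX_PAGE_SIZE))
    return parsed_status, bounded_limit


print(parse_fetch_args("FAILED", 500))
# prints (<TaskStatus.FAILED: 'FAILED'>, 25)
```

Clamping silently is one option; rejecting an oversized limit with a structured error is another, and it makes the constraint visible to the agent.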

7. Log at the Boundary, Not Just the Output

The Problem: You see the hallucination but not what caused it.

The agent returned wrong data. But was it a hallucination in reasoning? A bad tool response? Stale context? Without boundary logging, you're debugging blind.

What to Log:

Raw input to the agent (full context)
Tool call request (what the agent asked for)
Tool call response (what it received)
Agent's reasoning (if your framework exposes it)

Quick Fix: Implement structured logging with correlation IDs. When something fails, you should be able to replay the exact sequence: context → tool call → response → output. 💡
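A small sketch of boundary logging with a correlation ID, using only the Python standard library. The stage names and the `log_boundary` helper are assumptions; most agent frameworks and observability platforms offer their own hooks for this.

```python
import json
import logging
import uuid
from typing import Any

logger = logging.getLogger("agent.boundary")


def new_correlation_id() -> str:
    """One ID per agent run, shared by every log line in that run."""
    return uuid.uuid4().hex


def log_boundary(correlation_id: str, stage: str, payload: Any) -> str:
    """Emit one JSON log line for a boundary crossing and return it."""
    line = json.dumps(
        {"correlation_id": correlation_id, "stage": stage, "payload": payload},
        default=str,  # fall back to str() for values JSON cannot encode
    )
    logger.info(line)
    return line


# Usage: the same ID ties the whole sequence together for replay.
cid = new_correlation_id()
log_boundary(cid, "context", {"failed_task_count": 25})
log_boundary(cid, "tool_call_request", {"tool": "fetch_failed_tasks", "args": {"limit": 25}})
log_boundary(cid, "tool_call_response", {"success": True, "count": 15})
log_boundary(cid, "output", "You have 15 tasks requiring review.")
```

Filtering logs by one `correlation_id` then reconstructs the sequence for a single request in order.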

The Meta-Lesson

Here's what debugging dozens of hallucination incidents taught me:

Agents amplify your architecture's weaknesses.

If your API has inconsistent null handling, agents will stumble on it
If your schemas are ambiguous, agents will interpret creatively
If your error handling is sloppy, agents will hallucinate recovery

The fix isn't better prompts. It's better systems.

Every hallucination I've debugged traced back to a system weakness — loose schemas, missing validation, stale caches, inconsistent error handling. The agent just exposed what was already broken.

Quick Reference Checklist

Save this for your next debugging session:

□ Tool schemas are strict (no extra properties allowed)
□ Null/missing fields handled before agent sees them
□ Context is fresh at decision points
□ Model tested specifically for tool-calling
□ Errors return structured responses, not exceptions
□ Response sizes are bounded
□ Logging captures: input → tool call → response → output

Your Turn

What's the weirdest hallucination you've debugged in production? Drop it in the comments — I'm collecting war stories.

TDD is Backwards: Why Prototype-First Development Ships Better Software

Hunter Wiginton — Thu, 07 May 2026 15:23:52 +0000

Stop writing tests before you know what you're building

You're about to build a new feature. The TDD playbook says write the tests first. But what tests? You don't even know what the API should look like yet. You don't know if this approach will work. You spend 2 hours writing tests for an interface that you'll rewrite in 30 minutes once you actually understand the problem.

This isn't learning. It's cargo-cult development.

I've spent the last year building multiple production tools: a CLI for repository intelligence, a suite of workflow automation scripts, and production agents for an enterprise system. Not one started with tests. All shipped successfully. Recently, my team at work switched from Behavior Driven Development (BDD) to Specification Driven Development (SDD), and the lightbulb finally clicked.

There's a better path: build the prototype first, formalize it with specifications, then let those specs drive your tests. This isn't cowboy coding, it's pragmatic engineering that respects how software actually evolves.

The TDD Ritual We Keep Performing

Test Driven Development has become religious dogma. The ritual goes like this:

Write a failing test (red)
Write minimal code to pass (green)
Refactor
Repeat

The benefits sound compelling: testable code, thoughtful interfaces, regression safety, no over-engineering. It's been called an "industry best practice" for so long that questioning it feels like heresy.

But here's the hidden assumption that breaks everything: TDD assumes you already know what you're building.

When you're implementing a known algorithm like sorting, searching, or a standard data structure, TDD works beautifully. The interface is predetermined. The behavior is well-defined. You're translating a spec that exists in your head (or a textbook) into code.

But when you're exploring a new problem space, and you don't know if your approach will even work, TDD falls apart.

The evidence is everywhere if you look:

Surveys consistently show fewer than 30% of developers practice strict TDD
Successful open-source projects rarely start with comprehensive test suites
Early-stage startups ship working prototypes first, tests later
Even TDD advocates describe it as "difficult" and requiring "discipline", which is usually just code for "this doesn't feel natural"

What Actually Happens When Building Something New

Let me show you what really happens when you're solving a novel problem.

RepoG: Repository Intelligence CLI

I built RepoG, a CLI tool that provides semantic search and AI-powered analysis over your git repositories. It's now published to Homebrew with real users.

I didn't write a single test during initial exploration.

Here's what the development actually looked like:

Week 1: Built repog init, repog sync, and repog embed commands by trying different approaches. I experimented with three different chunking strategies before finding one that actually worked for code.

Week 2: Evaluated vector databases. I tried Pinecone. I tried Weaviate. I tried Qdrant. Then I settled on SQLite with the sqlite-vec extension. Each attempt involved real code, real API calls, real performance testing.

Week 3: Discovered the API surface that made sense. I added tests during v0.1.0 finalization, only after I understood what the tool actually needed to do.

Result: Shipped to production. Published to Homebrew.

If I'd started with TDD:

All tests for chunking strategy #1 would be deleted
All tests for Pinecone integration would be deleted
All tests for the original API design would be rewritten
I would have wasted hours testing interfaces that never shipped

The tests I eventually wrote? Rock solid. Why? Because they validated a stable API that I understood deeply after building it.

Staksmith: My Personal Workflow Automation

I built five workflow automation skills: Inbox Gradient Accelerator (auto-classifies notes using AI), Weekly Momentum Report (aggregates git commits and tasks), Code-to-Docs Sync (detects documentation drift), and two others.

Zero tests across all five skills.

Why? They're exploratory bash scripts combined with AI prompts. The "test" is simple: does this actually save time in my workflow?

I iterated on prompt engineering, confidence thresholds, and output formats based on real usage. Each script was rewritten 3-5 times as I discovered what actually mattered.

Tests would have been rewritten alongside every iteration. Or worse—I would have felt pressure to keep a bad design just because I'd invested time writing tests for it.

Enterprise Agent Development

At work, I build production agents that process thousands of requests daily. Recently, I built an agent that identifies failed dispatch tasks requiring manual intervention.

The development process:

Tried one AI model, discovered it hallucinated tool parameters
Switched models, refined tool schemas based on actual API behavior
Discovered edge cases: null timestamp fields, missing triggered dates
Refined error handling based on production data

Tests written upfront would have validated hallucinated interfaces that never existed in production.

The pattern is clear: when you don't know what you're building, tests are documentation of ignorance.

Specification-Driven Development: The Missing Link

In April 2026, my team made a subtle but profound shift from Behavior Driven Development (BDD) to Specification Driven Development (SDD).

BDD said: Write behavior specs in Gherkin format, let those drive tests.

SDD says: Write comprehensive product specifications, let those drive everything.

The critical difference? BDD still wants you to specify behavior before understanding the problem deeply. SDD acknowledges you need a working prototype to write meaningful specifications.

The SDD Workflow

Phase 1: Prototype (Exploration)

Build a working proof-of-concept. Try different approaches. Understand the actual problem space.

No tests yet. You're learning.

Phase 2: Specify (Formalization)

Once you have a working prototype, document what it should do, not just what it currently does.

Define clear boundaries and constraints, specify edge cases and error handling, outline expected behaviors and outcomes, and create a formal specification document.

Here's what a real specification looks like (simplified from one used by real engineers at a real software company):

# Failed Task Identifier Specification

## Purpose
Identify failed tasks requiring manual intervention

## Input Constraints
- Must handle null timestamp fields
- Must validate before making API calls
- Must return structured error responses (not raw errors)

## Expected Behaviors
- Fetch tasks from last 7 days by default
- Filter by status: FAILED
- Return count + task details
- Handle API errors gracefully (return empty list, not error)

## Success Criteria
- Zero hallucinated parameters
- Consistent counts across multiple invocations
- Proper null checking prevents runtime errors

The specification becomes your source of truth.

Phase 3: Test (Validation)

Now you write tests based on the specification.

Tests validate the spec, not your exploration. Tests document intended behavior, not implementation accidents. Tests remain stable as implementation details change.
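To make that concrete, here is a sketch of tests derived from a few lines of the sample specification above (filter by status FAILED, return count plus task details, handle null timestamps, consistent counts). The `identify_failed_tasks` function is a hypothetical stand-in for the real agent tool, and the tests are written as plain functions that a runner such as pytest could collect.

```python
from typing import Any


def identify_failed_tasks(tasks: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical implementation under test: count plus details of FAILED tasks."""
    failed = [t for t in tasks if t.get("status") == "FAILED"]
    return {"count": len(failed), "tasks": failed}


SAMPLE_TASKS = [
    {"id": "t-1", "status": "FAILED", "timestamp": None},  # null timestamp per spec
    {"id": "t-2", "status": "COMPLETED", "timestamp": "2026-05-01T10:00:00Z"},
    {"id": "t-3", "status": "FAILED", "timestamp": "2026-05-02T09:30:00Z"},
]


def test_filters_by_failed_status() -> None:
    # Spec: "Filter by status: FAILED" and "Return count + task details"
    result = identify_failed_tasks(SAMPLE_TASKS)
    assert result["count"] == 2
    assert [t["id"] for t in result["tasks"]] == ["t-1", "t-3"]


def test_handles_null_timestamp() -> None:
    # Spec: "Must handle null timestamp fields"
    result = identify_failed_tasks([{"id": "t-1", "status": "FAILED", "timestamp": None}])
    assert result["count"] == 1
    assert result["tasks"][0]["timestamp"] is None  # not replaced with an invented value


def test_counts_are_consistent_across_invocations() -> None:
    # Spec: "Consistent counts across multiple invocations"
    first = identify_failed_tasks(SAMPLE_TASKS)["count"]
    second = identify_failed_tasks(SAMPLE_TASKS)["count"]
    assert first == second
```

Each test names the spec line it covers, so a spec change points directly at the tests that need updating.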

Phase 4: Iterate (Refinement)

When requirements change:

Update the specification
Update tests to match new spec
Refactor implementation knowing spec + tests protect you

Why This Works

Specifications require domain understanding, and you get that from prototyping.

Tests validate specifications (which are stable), not implementations (which change frequently during exploration).

The spec becomes living documentation that guides future development.

I converted the failed task agent to SDD after building it. The specification revealed gaps I'd missed: inadequate error handling, missing validation, inconsistent behavior under edge cases. Now the tests validate against the spec, and when I refactor the implementation, the tests don't break because they're testing behavior, not structure.

When TDD Actually Makes Sense

I'm not anti-testing. I'm anti-premature-testing.

TDD is excellent for:

1. Implementing Known Algorithms

Sorting, searching, data structure operations. The interface is predetermined, and the behavior is well-defined. You're just translating a known specification into code.

2. Bug Fixes with Regression Tests

Write a test that reproduces the bug. Fix the bug. Test prevents regression. This is actually where TDD came from.

3. API Contract Enforcement

Public APIs with versioning commitments. Breaking changes are expensive. Tests document and enforce the contract.

4. Refactoring Existing Code

You know what it should do because it already does it. Tests ensure behavior preservation during refactoring.

The key distinction:

TDD works when the problem is known
Prototype-first works when the problem is unknown
SDD bridges the gap between exploration and formalization

The Modern Development Workflow 🛠️

Here's the pragmatic approach that respects how software actually evolves:

Unknown Problem → Prototype  → Specify     → Test       → Production
   (Explore)      (Discover)   (Formalize)   (Validate)   (Maintain)

Stage 1: Prototype

Goal: Does this approach even work?

Tools: REPL, throwaway scripts, experimental code

Output: Working proof-of-concept

Tests: None yet

Stage 2: Specify

Goal: What should this do?

Tools: Specification documents, Architecture Decision Records (ADRs)

Output: Formal requirements and constraints

Tests: Not yet, spec comes first

Stage 3: Test

Goal: Does it meet the specification?

Tools: Unit tests, integration tests, end-to-end tests

Output: Test suite validating spec compliance

Tests: Now write tests driven by the specification

Stage 4: Iterate

Goal: Maintain and evolve

Process: Spec change → Test update → Implementation update

Tests remain stable because they validate the spec, not implementation details.

Real Examples

ProtoFlow (my subscription-based prototyping service):

Started with comprehensive implementation plan (specification-first)
Building features based on spec
Tests will validate: subscription tier limits, request workflows, file handling
Spec written before code because the problem is well understood (I've seen similar apps)

RepoG (my repository intelligence CLI):

Problem was novel (semantic search over repos with multi-model support)
Prototyped first, discovered constraints, then formalized
Tests written after understanding the actual requirements

Different problems require different approaches. That's the point.

Why This Matters in the AI Era 💡

AI coding assistants like Claude Code, GitHub Copilot, and Cursor have fundamentally changed the economics of software development.

With these tools:

Generating tests is trivial
Generating implementation code is trivial
Understanding what to build is not trivial

The new bottleneck is specification, not implementation.

What AI Can't Do

Decide what problem to solve
Determine the right abstraction level
Make architectural trade-offs
Write meaningful specifications (requires deep domain understanding)

What AI Excels At

Generating tests from specifications
Implementing code to match specs
Refactoring while preserving behavior
Finding edge cases in specifications

The Economic Shift

Old world: Tests were expensive to write, so write them first to ensure good design.

AI world: Tests are cheap to generate, but specifications are expensive to write well. Prototype first to inform specifications.

Your time is better spent:

Building prototypes to understand the problem space (AI assists)
Writing clear specifications based on what you learned (human insight)
Letting AI generate tests that validate the spec (AI excels)
Iterating on real usage (human judgment)

I use Claude Code daily. It can generate a comprehensive test suite from my specification in minutes. It cannot tell me if I'm solving the right problem.

Handling the Pushback

"But TDD forces better design!"

No. Specifications force better design. TDD just forces testable code, which isn't the same thing.

Testable code can still have terrible abstractions, leaky boundaries, and solve the wrong problem. Prototype-first lets you discover the right design through exploration, then formalize it with specifications.

"Without tests first, you'll write untestable code!"

Only if you never write tests. The SDD workflow includes tests; they're just written after you understand what you're testing.

Modern refactoring tools (especially AI-assisted) make it straightforward to retrofit testability. I've refactored entire modules to be more testable after the fact using Claude Code. It took hours, not weeks.

"This is just cowboy coding with extra steps!"

Let's be clear about the differences:

Cowboy coding: No tests, no specs, ship and pray

This approach: Prototype → Specify → Test → Ship with confidence

The specification step is the discipline. It's actually more rigorous than TDD because it forces you to think about the problem holistically. You're not just thinking about testable interfaces, but the entire behavior, edge cases, error handling, and success criteria.

"What about code coverage?"

Code coverage is a metric, not a goal.

100% coverage of the wrong abstraction is worthless. Better: 80% coverage of well-specified behavior after understanding the problem deeply.

I've seen codebases with 95% test coverage that were impossible to change because tests were coupled to implementation details. I've seen codebases with 60% test coverage that were easy to maintain because tests validated behavior through specifications.

Test the right things.
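To make the coupling problem concrete, here is a small illustrative sketch. The `CartV1` and `CartV2` classes are invented for this example. The behavior check passes against both implementations, while the check that peeks at a private attribute only survives the original one.

```python
class CartV1:
    """Original implementation: stores prices (in cents) in a list."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def add(self, name: str, cents: int) -> None:
        self._items.append(cents)

    def total(self) -> int:
        return sum(self._items)


class CartV2:
    """Refactored implementation: stores prices in a dict keyed by name."""

    def __init__(self) -> None:
        self._prices: dict[str, int] = {}

    def add(self, name: str, cents: int) -> None:
        self._prices[name] = self._prices.get(name, 0) + cents

    def total(self) -> int:
        return sum(self._prices.values())


def behavior_check(cart) -> bool:
    """Validates specified behavior through the public interface only."""
    cart.add("tea", 300)
    cart.add("mug", 700)
    return cart.total() == 1000


def coupled_check(cart) -> bool:
    """Reaches into private storage, so it breaks when storage changes."""
    cart.add("tea", 300)
    return getattr(cart, "_items", None) == [300]
```

The refactor from list to dict changes nothing a user can observe, yet the coupled check fails on `CartV2`. That is the kind of test that makes a high-coverage codebase hard to change.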

Practical Guidelines

Use the right tool for the job.

When to prototype first:

Building something new or novel
Unclear problem space
Evaluating multiple approaches
Early-stage product development
Exploratory automation and tooling

When to specify first (SDD):

Well-understood problem
Clear requirements upfront
Regulated industries
Public APIs
Team collaboration on defined features

When to use TDD:

Implementing known algorithms
Bug fixes
Refactoring existing code
API contract preservation

Red Flags You're Doing TDD Wrong

Rewriting tests multiple times during initial development
Tests that just mirror implementation
"Testing" private methods
Extensive mocking to make tests pass
Tests that break on every refactor

Green Flags You're Doing SDD Right

Specification is readable by non-programmers
Tests validate specification, not implementation details
Specification includes edge cases discovered during prototyping
Specification guides future development decisions
Tests remain stable as implementation evolves

A Template for Specifications

If you're wondering what a full SDD specification looks like, I've created a generalized template based on what my team uses at work (adapted to be product-agnostic rather than agent-specific).

The template includes sections for:

Purpose & Success Metrics: What this does and how you'll measure success
Context: When to use this (and when not to use it)
Dependencies: What you need from other teams/systems
User Workflow: End-to-end flow with error handling
Technical Specification: API contracts, data models, external dependencies
Acceptance Criteria: Happy path, edge cases, error handling
Examples: Real inputs/outputs with business value explanations
Testing Strategy: What to test and how

Get the complete SDD Template — includes both Markdown and PDF versions.

The key insight: you fill this out after prototyping, when you actually understand the problem. Then the specification drives your tests.

Test the Right Thing at the Right Time

The TDD dogma assumes we know what we're building before we start. But most interesting software problems require exploration first, formalization second.

The modern workflow respects this reality:

Prototype when the problem is unclear
Specify once you understand what you're building
Test to validate the specification
Iterate with confidence

This isn't abandoning testing. It's testing smarter.

Specifications informed by working prototypes lead to better tests than tests written in a vacuum. Tests that validate specifications remain stable as implementations evolve. Tests that validate implementation details break constantly.

Your job as an engineer is to solve problems, not to follow rituals.

Sometimes that means writing tests first—when you're implementing a known algorithm, fixing a bug, or enforcing an API contract.

Often it means building a working prototype, understanding what you learned, formalizing it with specifications, and then writing tests that validate those specifications.

Test the right thing at the right time. Everything else is dogma.

Which approach matches how you actually work — TDD, prototype-first, or somewhere in between? Drop your take in the comments. 💡

Claude Code + Obsidian: How I Built an AI-Powered Second Brain

Hunter Wiginton — Wed, 29 Apr 2026 19:13:19 +0000

I've been using Obsidian with the PARA method for a while now. It's great for organizing notes, but I always felt like I was only scratching the surface of what a personal knowledge management system could do. The notes were there, but finding connections, processing my inbox, and actually using my accumulated knowledge required more manual effort than I wanted.

Then I discovered that Claude Code — Anthropic's CLI tool — could be pointed at any directory, not just code repositories. That's when things got interesting. 🛠️

Over the past few weeks, I've built a set of custom slash commands that turn Claude into an intelligent assistant for my Obsidian vault. It can now process my inbox using PARA principles, trace how ideas have evolved over time, find unexpected connections between topics, and even answer questions the way I would based on my own writing. This article walks through exactly how I set it up and the commands I created.

The Foundation: CLAUDE.md

Before creating custom commands, you need to give Claude context about your vault. Claude Code looks for a CLAUDE.md file in the root of whatever directory it's working in. This file teaches Claude how your system works.

Here's the structure I use:

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is an Obsidian vault organized using the PARA method (Projects, Areas, Resources, Archive). All notes are in Markdown format with Obsidian-specific syntax.

## Folder Structure

- `0. Inbox/` - Unsorted notes and incoming content
- `0.1 Tasks_List/` - Master task aggregation using Obsidian Tasks plugin
- `1. Projects/` - Active projects with deadlines
- `2. Areas/` - Ongoing responsibilities, no end date
- `3. Resources/` - Reference materials, topics of interest
- `4. Archive/` - Completed/inactive items
- `_templates/` - Obsidian templates for new notes
- `_Weekly/` - Weekly notes organized by year (YYYY-WXX format)

## Obsidian-Specific Syntax

### Task Queries
The vault uses the Obsidian Tasks plugin. Task queries look like:

```tasks
not done
path includes 1. Projects/ProjectName
```

### Internal Links
Standard Obsidian `[[wikilinks]]` are used for note linking.

The key is to explain your organizational system, any plugins you use, and the syntax patterns Claude should expect. This context makes every subsequent interaction more useful.

Creating Custom Slash Commands

Claude Code supports custom slash commands through markdown files in .claude/commands/. The filename becomes the command name—so trace.md becomes /trace. Each file contains instructions that Claude follows when you invoke the command.

Here's the directory structure:

.claude/
└── commands/
    ├── trace.md
    ├── sync.md
    ├── connect.md
    ├── inbox.md
    ├── graduate.md
    ├── ghost.md
    ├── challenge.md
    └── ideas.md

Here's each command I created and why I built it.

Command 1: /sync — Load Your Full Context

The Problem: Every time I started a new Claude Code session, I had to re-explain what I was working on, what my priorities were, and what projects were active.

The Solution: The /sync command loads my entire current context in one shot.

Usage:

/sync

What it does:

Reads recent weekly notes (last 7 days)
Scans all active project folders
Loads the Master Task List
Checks recent inbox items
Finds all notes modified in the last 7 days
Searches for priority indicators (focus, urgent, important)
Outputs a structured summary

The output looks like:

# Current Context Sync

## Active Projects
- RepoG - Working on bug fixes
- BillScribe - MVP feature complete, testing phase

## Current Focus
Semantic search

## Open Tasks
### High Priority
- Complete API documentation
- Review PR for auth flow

## Recent Activity (Last 7 Days)
[Summary of what's been worked on]

I run this at the start of every session. It's like giving Claude a brain dump of my current state so we can pick up right where I left off.
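Claude does the scanning with its own file tools, but the "modified in the last 7 days" step is easy to picture as code. This is a rough Python equivalent, not what the command actually runs; the function name is made up, and skipping dot-folders such as `.obsidian/` is my assumption.

```python
import time
from pathlib import Path


def recently_modified(vault: Path, days: int = 7) -> list[Path]:
    """Return Markdown notes changed within `days`, newest first."""
    cutoff = time.time() - days * 86400
    notes = []
    for path in vault.rglob("*.md"):
        # Skip hidden folders such as .obsidian/ or .claude/ (assumption).
        if any(part.startswith(".") for part in path.relative_to(vault).parts):
            continue
        if path.stat().st_mtime >= cutoff:
            notes.append(path)
    return sorted(notes, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `recently_modified(Path("."))` from the vault root would list the notes a sync-style summary should look at first.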

Command 2: /trace — Track How Ideas Evolve

The Problem: I'd have a vague sense that I'd written about something before, but couldn't remember where or how my thinking had changed over time.

The Solution: The /trace command builds a timeline of any topic across my vault.

Usage:

/trace recursion
/trace "knowledge graphs"
/trace AI agents

What it does:

Searches the vault for all mentions of the topic
Gathers file creation and modification dates
Extracts [[wikilinks]] to find connections
Outputs a timeline showing first appearance, evolution, and current connections

The output looks like:

# Idea Timeline: [Topic]

### First Appearance
- **Date**: 2025-08-15
- **File**: 2. Areas/Software_Engineering/Recursion.md
- **Context**: Initial notes from algorithm course

### Evolution
- **2025-09-22** - Applied in OMS_Agents project
- **2025-11-03** - Connected to knowledge graph traversal

### Current State
- Total mentions: 12
- Most connected notes: [[Graph Traversal]], [[Algorithm Patterns]]

This has been invaluable for writing and for understanding how my thinking develops over time.
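For a sense of the mechanics behind a timeline like this, here is a hedged Python sketch of the search and link extraction steps. It is not the command itself (that is plain-English instructions for Claude). The helper names are hypothetical, and it uses modification time only, since file creation time is not portable across operating systems.

```python
import re
from pathlib import Path

# Matches [[Note]], [[Note|alias]], [[Note#Heading]] and captures "Note".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(text: str) -> list[str]:
    """Return the target note names of all wikilinks in `text`."""
    return [match.group(1).strip() for match in WIKILINK.finditer(text)]


def trace(vault: Path, topic: str) -> list[tuple[float, str, list[str]]]:
    """Return (mtime, relative path, wikilinks) for notes mentioning `topic`,
    ordered oldest first."""
    hits = []
    for path in vault.rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if topic.lower() in text.lower():
            relative = path.relative_to(vault).as_posix()
            hits.append((path.stat().st_mtime, relative, extract_wikilinks(text)))
    return sorted(hits)
```

A plain substring match misses synonyms and paraphrases, which is exactly where having Claude read the notes adds value over a script.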

Command 3: /connect — Find Unexpected Relationships

The Problem: I suspected two ideas were related but couldn't see the connection. Or I wanted to discover relationships I hadn't noticed.

The Solution: The /connect command traces paths through my wikilink graph.

Usage:

/connect recursion and machine learning
/connect AI agents, knowledge graphs
/connect

What it does (with topics):

Builds a link graph from all [[wikilinks]]
Finds notes mentioning each topic
Traces connection paths (direct, one-hop, two-hop)
Identifies bridge notes connecting both domains

What it does (without topics):

Maps the entire vault's link structure
Identifies isolated clusters of notes
Finds semantically similar but unlinked notes
Suggests bridge opportunities

Running /connect with no arguments is like getting a health check on your knowledge graph. It shows you orphan notes, isolated clusters, and connections you might want to make.
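The path tracing and orphan detection described above map onto a standard breadth-first search over the link graph. This sketch assumes the wikilinks have already been collected into a dict of note name to linked note names; the function names and sample notes are illustrative only.

```python
from collections import deque
from typing import Optional


def undirected(links: dict[str, list[str]]) -> dict[str, set[str]]:
    """Turn note -> outgoing links into a symmetric adjacency map."""
    graph: dict[str, set[str]] = {note: set() for note in links}
    for source, targets in links.items():
        for target in targets:
            graph.setdefault(target, set())  # target may have no note entry
            graph[source].add(target)
            graph[target].add(source)
    return graph


def shortest_path(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search; returns the shortest chain of notes or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def orphans(graph: dict[str, set[str]]) -> list[str]:
    """Notes with no links in or out."""
    return sorted(note for note, neighbors in graph.items() if not neighbors)


links = {
    "Recursion": ["Graph Traversal"],
    "Graph Traversal": ["Knowledge Graphs"],
    "Knowledge Graphs": [],
    "Grocery List": [],
}
graph = undirected(links)
print(shortest_path(graph, "Recursion", "Knowledge Graphs"))
print(orphans(graph))
```

Here "Graph Traversal" is the bridge note between the two topics, and "Grocery List" shows up as an orphan. Finding semantically similar but unlinked notes is beyond a graph walk and is left to Claude.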

Command 4: /inbox — PARA-Aware Inbox Processing

The Problem: My inbox would accumulate notes faster than I could process them. Deciding where each note should go required mentally loading my entire folder structure.

The Solution: The /inbox command processes each note using PARA principles and asks for confirmation before moving anything.

Usage:

/inbox

What it does:

Inventories all notes in 0. Inbox/
Maps existing structure in Projects, Areas, Resources, Archive
For each note, presents a recommendation:

## Building_A2A_Compatible_Agents.md

**Content Summary:** Article highlights about A2A agent protocols

**Recommended Destination:** 3. Resources/Software_Engineering/AI_ML_&_Agents/
**Reason:** Reference material about AI development patterns

**Alternative Locations:**
- 1. Projects/OMS_Agents/ - relates to active project
- 2. Areas/Lorien_AI/ - relates to ongoing AI work

**Action?**
1. Move to recommended location
2. Move to alternative 1
3. Move to alternative 2
4. Skip (leave in inbox)

Waits for confirmation before moving each file
For multi-relevance notes, moves to Resources and creates links in other locations

The per-file confirmation is crucial. I don't want an AI bulk-moving my notes to the wrong places. This way I stay in control while Claude does the heavy lifting of analyzing content and suggesting destinations.

Command 5: /graduate — Extract Ideas from Weekly Notes

The Problem: My weekly notes were full of half-formed thoughts that deserved their own space, but I never went back to develop them.

The Solution: The /graduate command scans weekly notes for undeveloped ideas and promotes them to standalone files.

Usage:

/graduate        # Last 4 weeks
/graduate 2      # Last 2 weeks
/graduate all    # All weekly notes

What it looks for:

Standalone observations not tied to tasks
Unanswered questions
"I think...", "Maybe...", "What if..." statements
Parenthetical asides with novel thoughts
Reflections and realizations
Half-finished thoughts

What it creates:

# [Core Claim as Title]

**Graduated from**: [[2026-W11]]
**Date**: 2026-03-19
**Status**: Seedling

## Core Claim
[One clear sentence stating the idea]

## Context
[What prompted this thought]

## Original Excerpt
> [Quote from the weekly note]

## Connections
- [[Related Note]] - [how it connects]

## Questions to Explore
- [Questions this raises]

Graduated notes go to 0. Inbox/Graduates/ so they can be processed by /inbox later. This creates a nice pipeline: ideas surface in weekly notes, get graduated to their own files, then get filed into the appropriate PARA location.
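As a rough illustration of the "what it looks for" list, here is a keyword heuristic in Python. The real command relies on Claude's judgment rather than a regex, so treat this as a sketch of the simplest possible version; the pattern and function name are my own assumptions.

```python
import re

# Lines that open with a tentative phrase, optionally after a bullet marker.
CANDIDATE = re.compile(r"^\s*(?:[-*]\s+)?(I think|Maybe|What if)\b", re.IGNORECASE)


def find_candidates(note_text: str) -> list[str]:
    """Return lines from a weekly note that look like undeveloped ideas."""
    candidates = []
    for line in note_text.splitlines():
        stripped = line.strip()
        # Checkbox tasks are tracked elsewhere, so they are not ideas.
        if stripped.startswith("- [ ]") or stripped.startswith("- [x]"):
            continue
        # Tentative openers and open questions both count as candidates.
        if CANDIDATE.match(line) or stripped.endswith("?"):
            candidates.append(stripped.lstrip("-* ").strip())
    return candidates
```

A heuristic like this catches the obvious phrasing but misses reflections and parenthetical asides, which need actual reading comprehension.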

Command 6: /ghost — Answer Questions in Your Voice

The Problem: Sometimes I need to draft a response or think through a question, but I want it to sound like me and reflect my actual beliefs.

The Solution: The /ghost command answers questions based on my writing style and stated beliefs.

Usage:

/ghost What's the best way to learn a new programming language?
/ghost Should startups use microservices?
/ghost How do I balance work and side projects?

What it does:

Searches for relevant notes on the topic
Analyzes my writing style (tone, argument patterns, vocabulary)
Extracts my stated beliefs with source citations
Synthesizes an answer in my voice
References specific notes naturally

The output:

# How I Would Answer: "Should startups use microservices?"

[Answer written in my voice, referencing my actual opinions]

## Sources Used
- [[Microservices_Out_Monoliths_Back_In]] - skepticism about microservices for small teams
- [[Infrastructure_Design_Decisions]] - preference for simplicity

## Voice Notes
- **Tone**: Direct, practical, slightly contrarian
- **Key principles applied**: Simplicity over scalability; avoid premature optimization
- **Confidence level**: High (multiple notes on this topic)

This is great for drafting emails, preparing for discussions, or just externalizing my thinking on a topic.

Command 7: /challenge — Stress-Test Your Beliefs

The Problem: Before making big decisions, I wanted to pressure-test my thinking. Where are my blind spots? What assumptions am I making?

The Solution: The /challenge command finds contradictions and weak points in my beliefs on any topic.

Usage:

/challenge microservices architecture
/challenge my approach to time management
/challenge the decision to change jobs

What it finds:

Direct contradictions: Note A says X, Note B says not-X
Hidden assumptions: Unstated premises my beliefs depend on
Reasoning weaknesses: Logical gaps, unsupported leaps
Missing perspectives: Viewpoints I haven't considered

The output:

# Belief Stress Test: [Topic]

## Your Current Position
[Summary of stated beliefs]

## Contradictions Found
### Contradiction 1: Simplicity vs. Scalability
**Position A:** "Always start with a monolith"
**Position B:** "Design for scale from day one"
**The tension:** These can conflict when...

## Hidden Assumptions
### Assumption 1: Team size stays small
**You're assuming:** Your team won't grow significantly
**But what if:** You need to onboard 10 engineers next quarter?

## Questions Worth Sitting With
1. What would change your mind about this?
2. Who disagrees with you that you respect?

## Overall Assessment
- **Belief coherence:** Medium
- **Assumption risk:** High on team size assumption
- **Recommended action:** Clarify conditions under which each approach applies

Running /challenge before a big decision has already saved me from a few mistakes. It's like having a thoughtful devil's advocate on demand.

Command 8: /ideas — Generate Fresh Ideas from Your Patterns

The Problem: When I wanted inspiration for what to build, write, or explore next, I'd either stare at a blank page or browse the internet for ideas that had nothing to do with my actual interests.

The Solution: The /ideas command mines my vault for patterns and generates ideas grounded in what I'm already curious about.

Usage:

/ideas

What it does:

Scans recent activity (last 30 days)
Analyzes weekly notes, projects, areas, and resources
Identifies recurring themes, frustrations, and unanswered questions
Finds people mentioned but not contacted
Spots tool opportunities from manual processes
Surfaces writing topics based on opinions and experiences

The output:

# Ideas Report

## Tools to Build
### High Potential
#### 1. Vault Link Validator
**The Problem:** Broken wikilinks accumulate over time
**Evidence:** Found complaints in [[2026-W10]], [[2026-W08]]
**Your Advantage:** Already familiar with Obsidian plugin API
**First Step:** Audit current broken links

## People to Reach Out To
### High Priority
#### 1. [Expert in Knowledge Graphs]
**Why:** Directly relevant to OMS work
**Context:** Mentioned in [[AI_Agents_Landscape]]
**Angle:** Ask about graph traversal patterns

## Topics to Investigate
### Deep Dives Needed
#### 1. Vector Embeddings for Note Retrieval
**Current Understanding:** Basic concept only
**Gap:** Implementation details for local-first apps
**Why It Matters:** Could improve /connect command

## Things to Write
### Ready to Write
#### 1. "Why Weekly Notes Beat Daily Notes"
**Core Argument:** Less pressure, better reflection
**Supporting Notes:** [[How_I_Never_Forget_Anything]], weekly templates
**Unique Angle:** PARA integration perspective

The best part is that every idea comes with evidence from my own notes. It's not generic brainstorming—it's pattern recognition on my actual interests.

The Weekly Review Workflow

These commands work together in my weekly review:

Start with /sync to load current context
Run /graduate to extract ideas from weekly notes
Run /inbox to process any accumulated notes
Use /connect (no args) to check for orphan notes and missed connections
Run /challenge on any decisions I'm considering
Run /ideas monthly to generate fresh directions based on patterns

This workflow keeps my vault healthy while surfacing ideas that might otherwise get lost.

Getting Started 💡

If you want to set this up for your own vault:

Install Claude Code - Follow the instructions at claude.ai/code
Create CLAUDE.md in your vault root with your folder structure and syntax patterns
Create .claude/commands/ directory
Add command files - Each .md file becomes a slash command
Run Claude Code from your vault directory: claude
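The file setup in the steps above can be scripted. This is a minimal Python sketch under a few assumptions: the `/review` command text and the `CLAUDE.md` stub are placeholders I made up, and `$ARGUMENTS` is the placeholder Claude Code's custom commands use for text typed after the command name (worth confirming against the current docs).

```python
from pathlib import Path

# Placeholder content; replace with a description of your own vault.
CLAUDE_MD = """# CLAUDE.md

## Overview
This is an Obsidian vault organized using the PARA method.
"""

# Hypothetical command file; the instructions are plain English for Claude.
REVIEW_COMMAND = """Scan the notes in `_Weekly/` from the past month.
Summarize what was accomplished, what is still in progress, and what was learned.
Organize findings by project.

Additional focus from the user: $ARGUMENTS
"""


def scaffold(vault: Path) -> list[Path]:
    """Create CLAUDE.md and one command file; never overwrite existing files."""
    files = {
        vault / "CLAUDE.md": CLAUDE_MD,
        vault / ".claude" / "commands" / "review.md": REVIEW_COMMAND,
    }
    created = []
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(content, encoding="utf-8")
            created.append(path)
    return created
```

Running `scaffold(Path("."))` from the vault root and then starting `claude` there should make `/review` available, assuming the filename-to-command mapping described earlier.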

The commands I've shared are tuned for my PARA setup, but the patterns transfer to any organizational system. The key insight is that Claude Code isn't just for code—it's for any directory of text files. And Obsidian vaults are exactly that.

Creating Your Own Commands

The best part about this setup is that you don't need to write the command files yourself. Just describe what you want to Claude Code, and it will create the command for you.

Here's an example prompt:

"Please create a slash command called /review that scans my weekly notes from the past month and generates a summary of what I accomplished, what's still in progress, and what I learned. It should organize findings by project and highlight any recurring themes or blockers."

Or something simpler:

"Create a command called /random that picks a random note from my vault that I haven't opened in over 30 days and suggests why I might want to revisit it."

The key elements of a good command prompt:

Name the command - What you'll type to invoke it (/review, /random)
Describe the input - What it should scan or take as arguments
Specify the output - What format you want the results in
Add constraints - Any rules or exceptions to follow

Claude will create the .md file in .claude/commands/ with detailed instructions. You can then refine it by asking for changes or editing the file directly.

What's Next

I'm still experimenting with new commands. Some ideas I'm exploring:

/weekly - Generate the weekly note template with pre-filled context
/research - Deep dive into a topic using both vault content and web search
/publish - Prepare a note for publishing by checking links and formatting

The meta-insight here is that your knowledge management system can be programmable. Instead of just storing and linking notes, you can build workflows that actively work with your knowledge. Claude Code makes this accessible without needing to write actual code — you just write instructions in plain English.

In 2026, there's no reason your notes should just sit there. Put them to work.

If you're using Obsidian and want to try this setup, start with just /sync and /inbox. Those two commands alone will change how you interact with your vault.

Which command would be most useful for your workflow? Drop it in the comments — I'm curious what problems you'd solve first. ✨