DEV Community

keeper

You Know Where to Stand. Here's How to Build the Ground.

The Conversation So Far

Two posts ago, I asked an innocent question: how do you test AI-generated code?

That question metastasized. It led through epistemology (what does AI actually know?), through cognitive science (what does it mean to have knowledge vs. process information?), and into a surprising terminus: the five-layer framework. A map of what AI can replace, what it can approach, and what it structurally cannot reach — because some knowledge is earned through lived time, not compressible into tokens.

That post was the epistemology of the AI era. It answered: what is the difference between human and machine knowledge?

Last post, I turned the map sideways. If the layers describe what AI can and cannot do, then they also describe where the market is going. Layer 1 is a blood-red ocean (application knowledge, commoditizing now). Layer 4 and Layer 0a are deep blue (meta-cognitive creation, embodied grounding — structurally irreplaceable).

That post was the strategy of the AI era. It answered: where should you stand?

The conclusion was sharp: stand perpendicular to AI's penetration direction. Never compete on the layer AI is currently eating.

But here's the thing about standing perpendicular — it's a direction, not a path. You know where to stand. You don't yet know how to build the ground under your feet.

This post closes the trilogy. It's the methodology: the operating system you install to keep yourself safe as the ground shifts beneath you.


The Five-Step Operating Cycle

The framework isn't a one-time insight. It's a living map that needs maintenance. Here's the cycle I've been running for the past year — five steps, repeated at different cadences.

Step 1: Map — Draw Your Domain on the Framework

Every quarter, I spend two hours doing one thing: redrawing the map of my domain against the five layers.

Not the generic framework. My specific domain in this quarter.

Here's what that looks like for me right now, as someone building AI quality infrastructure:

| Layer | What It Looks Like in My World | State This Quarter |
|---|---|---|
| Layer 4 (Meta-Cognition) | Can I create a new category of quality tool? | Exploring: "verification-as-learning" is a new framing |
| Layer 3 (Meta-Domain) | How do I design verification loops that improve over time? | Building: ai-qc's core loop |
| Layer 2 (Craft) | Writing Python, infrastructure, CI/CD | Maintenance: well-understood |
| Layer 1 (Application) | Prompt engineering, API patterns, tool usage | Commoditized: not worth investing in |
| Layer 0b (Instrumental) | N/A in my domain | Skipped |
| Layer 0a (Embodied) | The 15 years of shipped products, outages, and scars that inform my judgment | Deepest asset: irreplaceable |

The mapping has two rules:

Rule 1: Be brutal about Layer 1. Anything you could learn from a manual, a course, or a 30-minute YouTube tutorial is Layer 1. If you're spending energy there, stop. AI will do it cheaper tomorrow.

Rule 2: Be honest about Layer 0a. Your experience is valuable only if you actually learned from it. Five years of repeating the same mistakes is not embodied grounding. It's fossilized habit.

Every quarter, I ask: has anything moved? Has AI crossed into a layer that was blue last quarter? Has a new capability opened a position I couldn't occupy before?

The output of this step is a list of exactly two bets: one investment (the layer I'm pushing into) and one divestment (the layer I'm letting go of).
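
One way to make the quarterly "has anything moved?" question mechanical is to keep each quarter's map as data and diff it against the previous one. The sketch below is a minimal illustration, not part of ai-qc: the `LayerEntry` type, the `changed_layers` helper, and the sample states for last quarter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerEntry:
    layer: str  # e.g. "Layer 3"
    state: str  # free-text state label, e.g. "Building"


def changed_layers(previous, current):
    """Return (layer, old_state, new_state) for every layer whose state moved."""
    old = {entry.layer: entry.state for entry in previous}
    changes = []
    for entry in current:
        before = old.get(entry.layer)
        if before != entry.state:
            changes.append((entry.layer, before, entry.state))
    return changes


# Hypothetical maps for two consecutive quarters.
last_quarter = [
    LayerEntry("Layer 3", "Exploring"),
    LayerEntry("Layer 2", "Maintenance"),
    LayerEntry("Layer 1", "Maintenance"),
]
this_quarter = [
    LayerEntry("Layer 3", "Building"),
    LayerEntry("Layer 2", "Maintenance"),
    LayerEntry("Layer 1", "Commoditized"),
]

for layer, before, after in changed_layers(last_quarter, this_quarter):
    print(f"{layer}: {before} -> {after}")
```

The layers that show up in the diff are natural candidates for the investment and divestment bets; choosing between them remains a judgment call.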

Step 2: Position — Pick Your Next Six Months

The map gives you clarity. The positioning step gives you focus.

Using the three principles from the strategy post:

  1. Margin collapses where AI penetrates. So where is AI penetrating right now in your specific domain?
  2. Premiums shift to the layer above. So what's the layer above your current center of gravity?
  3. Stand perpendicular. So what dimension of value can you offer that is orthogonal to AI's current trajectory?

For me, the positioning for H2 2026 looks like:

  • Divest from: building generic AI tools (Layer 1-2) — the market is saturated
  • Invest in: building domain-specific validation systems (Layer 3) — each industry needs its own verification grammar
  • Experiment with: writing the philosophy of why this matters (Layer 4) — which creates the demand for Layer 3

The output of this step is a single sentence: "For the next six months, my center of gravity is Layer X, serving domain Y, through mechanism Z."

Step 3: Defend — Audit Your Moat

This is the step most people skip. They map. They position. Then they rush to build without checking whether their moat is real.

I use what I call the Three Incompressibles — checks that your position is structurally defensible:

1. The Garbage Time Check. Is there a chunk of work in your process that cannot be automated because it requires human judgment that can only be earned through doing it badly first?

If your entire workflow could be handed to an intern with an AI tool and a checklist, your moat is thin. The things that require someone who has made the mistake before, who feels the warning signs — those are your moat.

For a senior engineer, this is debugging at 3 AM. For a doctor, it's the diagnosis that doesn't fit any textbook. For a lawyer, it's the case where precedent is silent and you have to argue from principle.

2. The Long-Tail Failure Check. Can you enumerate all the ways your system will fail?

If you can, AI can handle it. The value is in the failures you cannot predict — the emergent behaviors, the novel edge cases, the systemic side effects that only reveal themselves under real conditions. Your ability to see those coming (and respond when you can't) is irreplaceable.

3. The Trust Credit Check. If you handed your output to someone who trusts you, what are they trusting you for?

Trust is the ultimate incompressible. It takes time to build, cannot be transferred, and is destroyed in an instant. The trust someone places in your judgment specifically — not in the general capability of AI systems — is pure Layer 0a.

The output of this step is a concrete list: these are the three things I protect. Everything else can be automated, outsourced, or AI-augmented.

Step 4: Build — Systematize Your Judgment

This is where method meets craft. The goal: encode your judgment into systems that scale beyond you.

The key insight is subtle but critical:

A judgment that only you can make is a bottleneck. A judgment that you can systematize is a lever.

Your goal is not to make yourself irreplaceable as an individual. That's a trap. Your goal is to make your approach repeatable, so that:

  • Your team benefits from your judgment when you're not in the room
  • Your tools catch what you've learned to catch
  • Your next level of judgment has room to emerge because the current level is handled

I'll show three concrete system patterns in the next section.

The output of this step is a working system — code, process, documentation, or some combination — that encodes a chunk of your judgment.

Step 5: Loop — Set the Cadence

The final step is the meta-step: set the rhythm at which you repeat Steps 1-4.

My cadence:

  • Quarterly: Full Map → Position cycle (4 hours total)
  • Monthly: Defend audit (30 minutes) — check if any of my three incompressibles have eroded
  • Weekly: Build checkpoint (1 hour) — am I actually systematizing my judgment, or just firefighting?
  • Daily: The Loop check — did I learn something today that changes my map?

The specific cadence matters less than the commitment to the rhythm. Without a loop, the framework is just a blog post. With the loop, it's an operating system.
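
A cadence like this can be kept honest with a very small reminder script. The following is a sketch under stated assumptions: the review names, the 91-day approximation of a quarter, and the sample dates are all illustrative, and `due_reviews` is a hypothetical helper.

```python
from datetime import date, timedelta

# Intervals follow the cadence above; "quarterly" is approximated as 91 days.
CADENCE = {
    "map_and_position": timedelta(days=91),
    "defend_audit": timedelta(days=30),
    "build_checkpoint": timedelta(days=7),
    "loop_check": timedelta(days=1),
}


def due_reviews(last_done, today):
    """Return the names of reviews whose interval has elapsed, in CADENCE order."""
    due = []
    for name, interval in CADENCE.items():
        last = last_done.get(name)
        # A review that has never been done is always due.
        if last is None or today - last >= interval:
            due.append(name)
    return due


# Hypothetical record of when each review was last completed.
last_done = {
    "map_and_position": date(2026, 4, 1),
    "defend_audit": date(2026, 6, 1),
    "build_checkpoint": date(2026, 6, 29),
    "loop_check": date(2026, 7, 1),
}

print(due_reviews(last_done, date(2026, 7, 2)))
# ['map_and_position', 'defend_audit', 'loop_check']
```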


Three System Patterns

Here are the three patterns I've built and used in practice. Each corresponds to a different level of systematization.

Pattern 1: The Verification Loop (L1-L4 Layered Validation)

This is the pattern behind ai-qc — a system that doesn't just test AI output, but learns from the testing process.

The architecture is layered:

┌─────────────────────────────────────────────┐
│  Layer 4: Meta-Evaluation                    │
│  "Is the verification strategy itself        │
│   catching what matters?"                    │
│  → Human reviews a sample of false negatives │
├─────────────────────────────────────────────┤
│  Layer 3: Property-Based Oracle              │
│  "What invariants must ALWAYS hold?"         │
│  → Hypothesis-style property specs           │
│    (human-authored, AI-assisted)             │
├─────────────────────────────────────────────┤
│  Layer 2: Behavioral Tests                   │
│  "Does the output meet the spec?"            │
│  → Traditional test suite                    │
│    (AI-generated, human-reviewed)            │
├─────────────────────────────────────────────┤
│  Layer 1: Syntax & Format                    │
│  "Does it compile? Is it well-formed?"       │
│  → Automated linting, type checking,         │
│    schema validation                         │
└─────────────────────────────────────────────┘

The loop runs from bottom to top. Layer 1 catches the obvious stuff (80% of errors). Layer 2 catches behavioral mismatches (15%). Layer 3 catches deep logical errors (4.9%). Layer 4 catches the 0.1% that changes your entire approach.

The key design principle: each layer feeds the layer above. When Layer 3 catches a property violation, the human doesn't just fix it — they ask "does this violation indicate a missing property in my specification framework?" That question is the Layer 4 signal.

In practice, this means the system gets better over time. The first week catches a lot. By week 12, it's catching things that would surprise even the domain expert.
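
The bottom-to-top flow can be sketched as an ordered list of checks where the first failing layer is recorded. This is a simplified illustration, not ai-qc's actual implementation: the `run_layers` helper, the dict-shaped outputs, and the three lambda checks are hypothetical stand-ins for real linting, test, and property stages.

```python
from collections import Counter


def run_layers(output, layers):
    """Run checks bottom to top; return the first layer that rejects, or None."""
    for name, check in layers:
        if not check(output):
            return name
    return None


# Hypothetical stand-in checks on a dict-shaped "AI output".
layers = [
    ("L1 syntax", lambda o: isinstance(o.get("rows"), list)),
    ("L2 behavior", lambda o: len(o["rows"]) == o.get("expected_count")),
    ("L3 invariant", lambda o: all(r >= 0 for r in o["rows"])),
]

outputs = [
    {"rows": "oops", "expected_count": 2},   # malformed: rejected at L1
    {"rows": [1], "expected_count": 2},      # wrong row count: rejected at L2
    {"rows": [1, -5], "expected_count": 2},  # negative value: rejected at L3
    {"rows": [1, 2], "expected_count": 2},   # passes every automated layer
]

# Tally of which layer caught each output; None means nothing rejected it.
caught_by = Counter(run_layers(o, layers) for o in outputs)
print(caught_by)
```

A tally like `caught_by` is the kind of raw material a Layer 4 review could start from: if one layer never catches anything, or the outputs that pass everything still cause trouble downstream, the verification strategy itself may need revisiting.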

Real example: A client was using AI to generate data pipeline transformations. The AI was flawless at syntax (Layer 1) and good at behavioral correctness (Layer 2). But it kept introducing subtle semantic drift: renaming columns in ways that downstream consumers didn't expect. We added a Layer 3 property: schema stability across the pipeline. The AI itself helped implement the check. The result? The error rate dropped from 0.1% to below 0.01%, and the human's role shifted from catching bugs to designing better invariants.
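
A schema-stability property of the kind described above could look roughly like the following. This is a hand-rolled, standard-library sketch rather than the check the client used: the column names, the two example transforms, and the `schema_is_stable` helper are hypothetical. A property-based testing library such as Hypothesis would generate the inputs instead of `random_rows`.

```python
import random


def schema_is_stable(transform, rows):
    """Property: the transform must not add, drop, or rename columns."""
    # Each row is copied first so a mutating transform cannot alter the inputs.
    return all(set(transform(dict(row))) == set(row) for row in rows)


def random_rows(n, seed=0):
    """Generate n hypothetical rows with a fixed column set."""
    rng = random.Random(seed)
    return [
        {"user_id": rng.randint(1, 1000), "amount": rng.random() * 100}
        for _ in range(n)
    ]


def good_transform(row):
    row["amount"] = round(row["amount"], 2)
    return row


def drifting_transform(row):
    # Silent rename of the kind described above: "amount" becomes "total".
    row["total"] = row.pop("amount")
    return row


rows = random_rows(50)
print(schema_is_stable(good_transform, rows))      # True
print(schema_is_stable(drifting_transform, rows))  # False
```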

Pattern 2: The Decision Matrix (Judgment Frameworks)

Some judgments can't be fully automated. But they can be structured.

I built a decision matrix for a common question: should I accept this AI-generated output, or reject it and regenerate?

def accept_or_regenerate(output, context):
    score = 0

    # Layer 1 checks (automated)
    if passes_syntax_check(output): score += 1
    if passes_type_check(output): score += 1

    # Layer 2 checks (semi-automated)
    if matches_behavioral_spec(output): score += 2
    if handles_edge_cases(output): score += 1

    # Layer 3 checks (human-in-the-loop)
    if aligns_with_domain_invariants(output, context):
        score += 3

    # Layer 0a check (pure human judgment)
    if feels_wrong(output):  # Gut check
        score -= 2

    return "accept" if score >= 5 else "regenerate"

The matrix is simple enough to be explained in 5 minutes. But its value is not in the algorithm — it's in the conversation it enables:

  • Why did you regenerate? "Because the matrix said score was 3."
  • Should we adjust the threshold? "We've been rejecting too much. Let's recalibrate."

The matrix makes judgment visible and debatable instead of being a black box in someone's head.
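
To support that conversation, the matrix can record why it reached a verdict, not just the verdict. The sketch below reuses the weights and threshold from the function above but takes precomputed check results as a dict. The `Decision` type, the `decide` function, and the key names are illustrative assumptions, and the human gut check is passed in as an ordinary boolean.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    verdict: str
    score: int
    reasons: list = field(default_factory=list)


# Weights mirror the matrix above; keys are hypothetical check names.
WEIGHTS = {
    "syntax": 1,
    "types": 1,
    "behavioral_spec": 2,
    "edge_cases": 1,
    "domain_invariants": 3,
    "feels_wrong": -2,  # human gut check, recorded as a boolean
}


def decide(results, threshold=5):
    """Score a dict of check outcomes and keep each check's contribution."""
    score, reasons = 0, []
    for name, weight in WEIGHTS.items():
        if results.get(name, False):
            score += weight
            reasons.append(f"{name}: {weight:+d}")
    verdict = "accept" if score >= threshold else "regenerate"
    return Decision(verdict, score, reasons)


decision = decide({
    "syntax": True,
    "types": True,
    "behavioral_spec": True,
    "edge_cases": True,
    "domain_invariants": False,
    "feels_wrong": True,
})
print(decision.verdict, decision.score)  # regenerate 3
print(decision.reasons)
```

With the reasons attached, "the matrix said score was 3" becomes "the domain invariant check failed and the reviewer flagged it", which gives a threshold recalibration something concrete to argue about.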

Real example: In our ai-qc development, we used this matrix to decide when to merge PRs from AI-coworkers. The matrix caught something interesting: human reviewers consistently rated AI output lower than the automated checks did. The human "feels wrong" factor (Layer 0a) was the strongest signal. We didn't automate it away — we used it as a forcing function to improve our invariants. When the human said "this feels wrong" and the matrix said "it's fine," we asked why. That question improved the system more than any automated tweak could.

Pattern 3: The Teaching System (Scaling Your Judgment)

The most powerful system is the one that teaches others to make the judgment you make.

I've been building what I call a judgment curriculum — a structured set of exercises that encode my decision patterns into teachable form.

The format:

  1. Principles (3-5 max, concrete, with examples)
  2. Patterns (reusable templates for common decisions)
  3. Practice scenarios (real cases where judgment was required, stripped of identifying details)
  4. Feedback loops (how to tell if you're improving)

For the ai-qc project, the curriculum looks like:

  • Principle 1: Always test the invariant, not the implementation.
  • Principle 2: When AI surprises you, update your invariants before your prompts.
  • Principle 3: The cost of a false negative is always higher than the cost of a false positive in Layer 3.

Each principle has a story behind it. Each story is a case where I made the wrong call and learned something.

The curriculum doesn't replace experience. But it compresses the early part of the learning curve. Someone who works through it will make fewer of the mistakes I made — and will recognize the novel mistakes faster.
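
Principle 1 can be made concrete with a small exercise. The example below is an illustration chosen for brevity, not taken from the ai-qc curriculum: it checks a sort by the invariants any correct sort must satisfy, instead of by how the sort is written. `is_valid_sort` and `buggy_sort` are hypothetical names.

```python
from collections import Counter


def is_valid_sort(original, result):
    """Invariants any correct sort must satisfy, regardless of algorithm."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(original) == Counter(result)
    return ordered and same_items


def buggy_sort(items):
    # Hypothetical AI-generated sort that silently drops duplicates.
    return sorted(set(items))


data = [3, 1, 2, 3]
print(is_valid_sort(data, sorted(data)))      # True
print(is_valid_sort(data, buggy_sort(data)))  # False
```

A test that only compared against one hand-picked expected list without duplicates would pass both versions; the invariant check rejects the second one because an item went missing.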

Real example: A junior engineer on my team went from "AI output? Looks fine to me" to "this transformation is semantically correct but structurally fragile — we need a different invariant" in 6 weeks. Not because they were brilliant (though they are). Because the curriculum gave them a structured way to see what they were looking at. The system didn't replace their judgment. It enabled it.


The Hardest Part: Letting Go

I've saved this for the end because it's the part nobody wants to talk about.

Frameworks have shelf lives.

The five-layer framework is useful today. It will not be useful forever. The boundary between Layer 1 and Layer 2 is blurring. Layer 3 is becoming accessible to frontier models. Layer 4 — the creation of genuinely new frameworks — is the only layer that feels structurally safe. But I've been wrong before.

Here are the signals I watch for that tell me a framework is degrading:

Signal 1: The Framework Becomes a Religion

When you find yourself defending the framework instead of using it, alarm bells should ring. The question shifts from "does this help me see?" to "is this consistent with the framework?" That's the death of insight.

Antidote: Every quarter, try to falsify the framework. Spend 30 minutes looking for something the framework cannot explain or handles poorly. If you find it, don't patch the framework — revise it.

Signal 2: AI Crosses a Layer Boundary

This is the concrete signal. You're watching AI capabilities, and one day you notice: the thing I said AI couldn't do last year, it can now do, passably.

This happened to me with Layer 3. A year ago, designing verification loops felt structurally safe. Today, frontier models can propose reasonable verification strategies. They're not great yet, but they're passable. That's enough to compress margins.

Antidote: When AI crosses a boundary, don't fight the slide. Accept that the layer above just became your new Layer 1. Run the cycle again. Map. Position. Defend. Build. Loop.

Signal 3: You Stop Learning

The most personal signal. If you find yourself applying the same frameworks, making the same kinds of judgments, not surprised by anything — your operating system is stale. The world has changed and you haven't noticed.

Antidote: Deliberately expose yourself to something the framework doesn't handle. Read outside your field. Talk to someone who disagrees with you about AI risk. Build something that violates your own principles.

How to Swap Frameworks Safely

When the time comes to replace the framework — and it will — here's the safe process:

  1. Don't abandon the old framework until the new one is delivering. Run them in parallel for one cycle.
  2. Keep the parts that still work. A framework replacement isn't a revolution. It's a refactor. The invariants you've built around L3 validation? Those probably survive. The specific positioning advice? Probably needs updating.
  3. Write the obituary before the funeral. I've started writing a document called "What the Five-Layer Framework Got Wrong." It's mostly empty today. But writing it makes me look for the gaps, which keeps the framework honest.

From Trilogy to Book

This trilogy started with a question about testing AI code. It became a framework, then a strategy, then an operating system.

There's a book in this. I'm calling it the Five-Layer Operating System, and it'll be the core volume in a series about building judgment that machines cannot replicate.

The book will cover:

  • Diagnostics: How to map any domain against the layers
  • Strategy: How to position yourself and your organization
  • Methodology: The systems and cycles in this post, expanded with more patterns
  • Philosophy: The uncomfortable questions about time, embodiment, and what we're optimizing for

If that sounds like something you'd read, let me know. The conversations this trilogy has sparked — the pushback, the refinements, the stories of people who've used the framework in unexpected ways — are already shaping what the book becomes.


The Trilogy In One Sentence, If You Need One

The epistemology told you what AI cannot know. The strategy told you where to stand. The methodology tells you how to build the ground under your feet, and when to move it.

The ground will move. That's the point. The danger is not the moving ground — it's the belief that the ground doesn't move.

Keep mapping. Keep positioning. Keep defending. Keep building. Keep looping.

That's not a career strategy. That's an operating system for staying alive in a world where the only constant is that the layers keep shifting.


This is the third and final post in the trilogy. Read the first: From "How to Test AI Code" to "What Makes Us Human". Read the second: AI Is Eating the World Layer by Layer — Here's Where to Stand.

Code examples reference ai-qc, an open-source framework for property-based verification of AI-generated code.

Tags: ai, career, productivity, philosophy
