Bala Paranj

Posted on Jun 29

The 3C Protocol: Paying Down the Debt You Didn't Know You Were Taking On

#ai #softwaredevelopment #architecture #engineering

✓ Human-authored analysis; AI used for formatting and proofreading.

The examples use Stave as the concrete case, but the protocol applies to any system built with AI assistance faster than its builder's understanding could follow.

Cognitive Debt

You built something with AI. It works. It passes tests. Users can run it. But a few months later, you can't fully explain how parts of it work, because the code arrived faster than your mental model could absorb it.

Cognitive debt is the gap between what the system does and what you can explain, debug, change, and defend.

In traditional technical debt, the code is messy. In cognitive debt, the code might be clean — well-structured, passing CI, even well-commented — but your mental model of it is messy or non-existent. The debt isn't in the artifact. It's in you.

Every developer building with AI has this debt. It compounds silently. The moment it surfaces is when something breaks and you can't trace why, or when someone asks how a subsystem works and you reach for the AI to explain your own code back to you. At that point the AI isn't your tool anymore. You're its operator.

The 3C Protocol is a structured way to pay it down: Compress the system into mental models you can hold, Compile mastery by stress-testing the logic against your understanding, Consolidate what you've learned into durable external memory — both documents and executable tests — so you don't re-learn it every time you open the codebase.

1. COMPRESS: Turn the complexity into mental models

The system feels like a disconnected pile of modules, configs, and scripts. Compress the noise into maps you can reason about.

The Value Chain Map

Answer one question: how does a raw input become the system's output? Trace the logic path: what transformation happens at each stage, and what does each stage assume about what came before it?

Here's the value chain of a cloud security tool that evaluates configuration snapshots against safety invariants:

Config snapshot (JSON)
  → Predicate evaluation (CEL engine)
    → Individual findings (one per violated control)
      → Compound chain composition (risk engine)
        → Compound findings (multiple controls composing
           into a single cross-resource risk)

And a parallel export path:

Config snapshot
  → Fact projector
    → Structured triples (JSONL) or logic assertions (SMT-LIB)
      → External reasoning engines consume independently

Drawing this map surfaces two insights you wouldn't get from reading the code. First, the primary detection mechanism is the predicate evaluator. Second, the structured export is an output for downstream engines, not part of detection, a distinction the code's directory structure doesn't make obvious. If you can't draw this map for your own system, the debt is in the value chain itself, and reading individual functions won't pay it down.
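To make the map concrete, here is a minimal Python sketch of both paths as plain functions. Everything in it is hypothetical: the snapshot shape, the control and chain IDs, the helper names, and the lambdas standing in for CEL predicates. It is not Stave's code. What it shows is the shape the map describes: each stage documents what it assumes about its input, and the export path is a separate projection rather than a detection step.

```python
import json

# Hypothetical controls; each lambda stands in for a CEL predicate.
CONTROLS = {
    "CTL.PUBLIC_READ": lambda r: r.get("public_read") is True,
    "CTL.NO_ENCRYPTION": lambda r: r.get("encrypted") is False,
}

# Hypothetical compound chain: fires when every listed control is violated on one resource.
CHAINS = {"CHAIN.EXPOSED_PLAINTEXT": ("CTL.PUBLIC_READ", "CTL.NO_ENCRYPTION")}


def evaluate_controls(snapshot):
    """Snapshot -> individual findings, one per violated control.
    Assumes each resource has an "id" and the properties the predicates read."""
    return [
        {"control": cid, "resource": res["id"]}
        for res in snapshot["resources"]
        for cid, predicate in CONTROLS.items()
        if predicate(res)
    ]


def compose_chains(findings):
    """Individual findings -> compound findings.
    Assumes the findings list is complete: a control that never fired is treated as passing."""
    violated = {}
    for f in findings:
        violated.setdefault(f["resource"], set()).add(f["control"])
    return [
        {"chain": name, "resource": rid, "controls": list(required)}
        for name, required in CHAINS.items()
        for rid, got in sorted(violated.items())
        if set(required) <= got
    ]


def project_facts(snapshot):
    """Parallel export path: snapshot -> JSONL (subject, predicate, object) triples.
    It detects nothing; downstream engines see only what is projected here."""
    lines = []
    for res in snapshot["resources"]:
        for key, value in sorted(res.items()):
            if key != "id":
                lines.append(json.dumps([res["id"], key, value]))
    return "\n".join(lines)
```

With a snapshot where `bucket-a` is public and unencrypted, `evaluate_controls` returns one finding per violated control and `compose_chains` returns a single compound finding for `bucket-a`. Note the `is False` in `CTL.NO_ENCRYPTION`: if the collector never populated `encrypted`, that control stays silent. An assumption like that is exactly what the map should make you ask about.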

The Path Not Taken

The biggest risk with AI-generated code is that the AI often chooses an implementation because the pattern was common in its training data, not because it's the best fit for your constraints. The code works, so you accept it. But ownership requires understanding not just what the code does, but why it doesn't do it another way.

For every major architectural decision, ask: what are two other ways to implement this, and why is the current approach better or worse?

For example: the tool above uses CEL (Common Expression Language) for predicate evaluation. Why not Rego? Why not raw Go functions? CEL is non-Turing-complete, side-effect-free, and translates cleanly to Z3 assertions, a combination Rego doesn't offer as directly and raw Go functions can't guarantee at all. If you can articulate the trade-off, you own the decision. If you can only say "the AI chose CEL," you don't own it; you inherited it.
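A toy illustration of why those properties matter for the Z3 point. This is not CEL; it is a hypothetical four-node boolean expression tree in Python. Because each predicate is data rather than an arbitrary function, the same tree can be evaluated against a snapshot and rendered as an SMT-LIB script. An opaque function body offers neither guarantee: it may loop, perform I/O, or hide logic a solver cannot see.

```python
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Field:
    name: str  # a boolean property of the snapshot


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Field, Not, And, Or]


def evaluate(expr: Expr, facts: Mapping[str, bool]) -> bool:
    """Structural recursion over a finite tree: always terminates, reads only `facts`."""
    if isinstance(expr, Field):
        return facts[expr.name]
    if isinstance(expr, Not):
        return not evaluate(expr.arg, facts)
    if isinstance(expr, And):
        return evaluate(expr.left, facts) and evaluate(expr.right, facts)
    if isinstance(expr, Or):
        return evaluate(expr.left, facts) or evaluate(expr.right, facts)
    raise TypeError(f"unknown node: {expr!r}")


def to_smtlib(expr: Expr) -> str:
    """The same tree, rendered as an SMT-LIB boolean term."""
    if isinstance(expr, Field):
        return expr.name
    if isinstance(expr, Not):
        return f"(not {to_smtlib(expr.arg)})"
    if isinstance(expr, And):
        return f"(and {to_smtlib(expr.left)} {to_smtlib(expr.right)})"
    if isinstance(expr, Or):
        return f"(or {to_smtlib(expr.left)} {to_smtlib(expr.right)})"
    raise TypeError(f"unknown node: {expr!r}")


def fields(expr: Expr) -> set:
    """Every property name the predicate reads."""
    if isinstance(expr, Field):
        return {expr.name}
    if isinstance(expr, Not):
        return fields(expr.arg)
    return fields(expr.left) | fields(expr.right)


def smtlib_script(expr: Expr) -> str:
    """Declare each field, assert the predicate, and ask whether it is satisfiable."""
    decls = [f"(declare-const {name} Bool)" for name in sorted(fields(expr))]
    return "\n".join(decls + [f"(assert {to_smtlib(expr)})", "(check-sat)"])
```

Piping `smtlib_script(And(Field("public_read"), Not(Field("encrypted"))))` into an SMT solver such as Z3 asks whether any assignment of those two properties satisfies the predicate, a question you cannot ask of an arbitrary function.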

This is the hallmark of seniority: not just knowing what the code does, but knowing what it chose not to do and why.

The Reasoning Skeleton

Identify the five to seven core invariant families: the kinds of things your system checks. For each, write:

The rule, in one sentence anyone could follow.
How it's detected, mechanically (which component, what logic).
What evidence it produces (what the output names and traces).

For example: "No unauthenticated path may reach data tagged as sensitive. Detected by: predicate evaluation of each control in the chain, composed by the risk engine when threshold is met. Evidence: the finding names the identity pool, the role, and the data store, with trace links to specific config properties."

If you can write this table for every invariant family, you own the reasoning. If you can't write a row, the debt lives in that row.
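One way to keep the skeleton honest is to store each row as data and check real findings against it. A minimal sketch, assuming findings are plain dictionaries; the `InvariantFamily` type, the evidence key names, and the `missing_evidence` and `unwritten_rows` helpers are hypothetical, not Stave's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvariantFamily:
    name: str
    rule: str                  # one sentence anyone could follow
    detected_by: str           # which component, what logic
    evidence: tuple[str, ...]  # what every finding in this family must name


def unwritten_rows(families):
    """Families with a blank column: that is where the debt lives."""
    return [f.name for f in families if not (f.rule and f.detected_by and f.evidence)]


def missing_evidence(family, finding):
    """Evidence keys the skeleton promises but this finding leaves empty."""
    return [key for key in family.evidence if not finding.get(key)]


UNAUTH_TO_SENSITIVE = InvariantFamily(
    name="unauthenticated-access",
    rule="No unauthenticated path may reach data tagged as sensitive.",
    detected_by="Predicate evaluation of each control in the chain, composed by the risk engine.",
    evidence=("identity_pool", "role", "data_store", "trace_links"),
)
```

Here `missing_evidence` flags a finding whose `trace_links` list is empty, and `unwritten_rows` flags any family with a blank column; both point at the specific row where the debt lives.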

The Ownership Triage

Map every major subsystem to a row in a table with four columns: what it does, your understanding (percent), AI dependency (how much was AI-generated), and risk if you're wrong about it.

The third column, AI dependency, is the multiplier. Score understanding honestly: a subsystem you built with AI and never traced end-to-end sits at whatever percentage of it you could explain under questioning without opening the code. High AI dependency combined with high risk and low understanding is critical debt. That's your first target.

The triage tells you where to focus: critical debt first, then high risk and low understanding. Read the subsystems where being wrong is most dangerous, where your understanding is weakest, and where the AI did the most work you never verified.
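The triage ordering can be written down as a sort key so it stays consistent between sessions. A sketch with arbitrary thresholds (50% understanding, 70% AI dependency) and a hypothetical `TriageRow` type; the tiers follow the order described above: critical debt, then high risk with low understanding, then everything else.

```python
from dataclasses import dataclass

LOW_UNDERSTANDING = 50   # percent; arbitrary threshold, tune to taste
HIGH_AI_DEPENDENCY = 70  # percent; arbitrary threshold
RISK_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class TriageRow:
    subsystem: str
    does: str
    understanding: int  # percent you could explain under questioning, code closed
    ai_dependency: int  # rough percent of the code the AI generated
    risk: str           # "high", "medium", or "low": the cost of being wrong about it


def tier(row):
    """0 = critical debt, 1 = high risk with low understanding, 2 = the rest."""
    weak = row.understanding < LOW_UNDERSTANDING
    if row.risk == "high" and weak and row.ai_dependency >= HIGH_AI_DEPENDENCY:
        return 0
    if row.risk == "high" and weak:
        return 1
    return 2


def triage(rows):
    """Order rows by tier, then risk, then weakest understanding, then most AI-generated."""
    return sorted(rows, key=lambda r: (tier(r), RISK_RANK[r.risk], r.understanding, -r.ai_dependency))
```

With a high-risk risk engine at 30% understanding and 90% AI-generated, a high-risk exporter at 40% understanding and 20% AI-generated, and a low-risk CLI, `triage` returns them in that order.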

2. COMPILE: Rebuild mastery by stress-testing the logic

Compression gives you maps. Compilation gives you ownership. The word is deliberate: in software, compilation is automated. Here it is not. Mental compilation is the work of turning the AI's source code — its output — into a mental executable: your ability to predict what the system will do in a case you haven't seen. No tool does this for you.

The No-Blind-Execution Rule

When you use the AI to pay down debt, ask it to explain and to challenge your understanding:

"Explain the evaluation chain that produces this specific finding."
"What input property must be true for this check to fire? What if the upstream collector didn't populate it?"
"Show me the edge case where this logic produces a false positive."
"Generate a failing test fixture that should trigger this check but doesn't."

The AI shifts from ghostwriter to Socratic tutor. Tell it explicitly: "Quiz me on how this subsystem handles [edge case X]." Ask it to present you with a scenario and make you predict the output before revealing the answer. You're still using the AI, but the cognitive work — understanding, verifying, predicting — stays with you. The AI provides the raw material; you do the reasoning.

The Compile Loop

For every subsystem, run five steps:

1. Trace one output end-to-end. Pick a real output your system produced. Trace it backward through every stage of the value chain, from the final output, through each transformation, to the raw input. At each stage, verify you understand why this stage produced what it did.

2. Break it deliberately. Change one input property and predict what will happen. Then run it and check. If you predicted correctly, you own that path. If you didn't, the gap between your prediction and reality is the debt — and now it's a specific, nameable gap.

3. Trace the export path. If your system produces structured output for downstream consumers, trace one fact from its source through the export logic to the downstream consumer. If you can write a new downstream rule that fires on the exported fact, you own the export pipeline. If you can't, you don't.

4. Reconstruct the requirements. Look at a component's inputs and outputs. Without looking at the implementation, try to write the prompt that would generate that logic from scratch. Describe what it should accept, what it should produce, and the constraints it must satisfy. If you can write the prompt, you understand the requirements. If you can't — if you'd need to peek at the code to describe what it's supposed to do — you don't understand the problem, let alone the solution. The implementation might be correct, but you can't verify that because you don't own the specification.

5. Name what you find. If a variable is named res1, rename it to what it represents. This is the Aha! Refactor: the moment you understand what something does, you are obligated to rename it. The rename is a physical signal that debt has been paid. If you can name it, you own it. If you can't name it, you don't understand it yet. The inability to name is the most precise diagnostic of cognitive debt.
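Step 2 is the easiest to skip, because an unrecorded prediction can always be revised after the fact. A minimal harness sketch makes you commit first. Assumptions: the system under test can be called as a function from a snapshot to a set of finding IDs (`evaluate` here is a hypothetical one-check stand-in), and each experiment changes a single property at a key path.

```python
import copy


def break_it(evaluate, snapshot, path, new_value, predicted):
    """Change one property, run the system, and compare against the prediction
    you wrote down first. The snapshot passed in is left untouched."""
    mutated = copy.deepcopy(snapshot)
    target = mutated
    for key in path[:-1]:
        target = target[key]
    target[path[-1]] = new_value
    actual = set(evaluate(mutated))
    return {
        "changed": "/".join(str(k) for k in path),
        "predicted": set(predicted),
        "actual": actual,
        "owned": set(predicted) == actual,
    }


# Hypothetical stand-in for the real pipeline: one check on one bucket.
def evaluate(snapshot):
    bucket = snapshot["buckets"][0]
    findings = set()
    if bucket.get("public_read") and not bucket.get("encrypted"):
        findings.add("EXPOSED_PLAINTEXT")
    return findings
```

Setting `encrypted` to `False` on a public bucket and predicting `{"EXPOSED_PLAINTEXT"}` comes back owned. Setting it to `None` (the collector didn't populate it) while predicting silence does not: this stand-in treats a missing value as unencrypted. That mismatch is the specific, nameable gap step 2 describes.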

3. CONSOLIDATE: Build durable external memory

You'll pay down debt in a focused session, and three weeks later the understanding will have faded, because it lived only in your head, which has a limited context window. Consolidation means building external memory so the understanding persists, in documents you read and in tests that enforce what you learned.

Five documents that hold the system

These documents are for you: they let you reload your own context when you come back to a subsystem you haven't touched in a month.

SYSTEM_OVERVIEW.md — The Why. The value chain map from step 1, in prose. Why the system is shaped this way, what each stage does and the architectural boundaries. The path-not-taken decisions: what alternatives were considered and why this shape won. One page. When you open the codebase cold, you read this first.

DOMAIN_MODEL.md — The What. The reasoning skeleton — the invariant families, the rule each one enforces, the evidence each one produces. The definitions of terms the system uses (what is a "compound risk," what is a "ghost reference," what is a "shadow admin" in this system's vocabulary). When someone asks "what does it check," this is the answer.

ENGINES.md — The How. What the primary evaluation engine can express and what it cannot express. What the export layer does and why it exists (to let external engines reason about properties the primary engine structurally can't reach). The known limits — if a property isn't projected into the export, downstream engines can't see it. When you're debugging a false negative, this tells you whether the problem is in detection or in export.

RUNBOOK.md — The Action. How to add a new check, a new compound chain, a new export predicate, a new downstream rule. How to debug a finding that fires when it shouldn't or doesn't fire when it should. Step by step, tested, current. When you need to change the system, you follow this rather than re-deriving the process from scratch.

ARCHITECTURE.md — The Bones. The dependency rule (what depends on what, and what must never depend on what). Where domain data lives versus where domain logic lives. The hexagonal boundary. When the AI proposes a change that violates an architectural constraint, this lets you catch it.
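The dependency rule can also be checked mechanically, so ARCHITECTURE.md and CI agree. A minimal sketch, assuming you already have a map from each module to the modules it imports (for a Go codebase, `go list -json ./...` reports each package's `Imports`; that extraction is not shown). The layer names and the `FORBIDDEN` table are hypothetical.

```python
# Hypothetical hexagonal layering: the domain sits at the center and must not
# reach outward; the application layer may use the domain but not adapters or the CLI.
FORBIDDEN = {
    "domain": {"app", "adapters", "cli"},
    "app": {"adapters", "cli"},
}


def layer_of(module):
    """Assumes module paths start with their layer, e.g. 'domain/risk'."""
    return module.split("/", 1)[0]


def violations(deps):
    """Return sorted (importer, imported) pairs that break the dependency rule."""
    found = []
    for src, targets in sorted(deps.items()):
        banned = FORBIDDEN.get(layer_of(src), set())
        for dst in sorted(targets):
            if layer_of(dst) in banned:
                found.append((src, dst))
    return found
```

Run against a map in which `domain/risk` imports `adapters/aws`, `violations` returns that single pair; an adapter importing the domain is allowed, because dependencies point inward.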

Invariants as tests

Documents rot. They describe intent at the time of writing, and the code drifts away from them silently. The second form of consolidation is encoding what you learned into executable tests that break when the system violates the invariant you understood.

These aren't unit tests that verify the code works. They're semantic tests that verify the reasoning still holds.

If an invariant is "no unauthenticated path may reach sensitive data," write a test that specifically attempts to bypass authentication to reach that data. If the invariant is "every compound finding must trace back to at least two independent controls," write a test that asserts the trace link count on every compound output. If the invariant is "the export layer must project every fact the downstream engine needs," write a test that runs a downstream rule and asserts it receives the expected input.
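A sketch of two of these invariants as plain Python test functions (pytest would collect them as-is). `run_pipeline`, `project`, `downstream_rule`, and the fixture are hypothetical stand-ins, kept tiny so the block runs on its own; in a real suite they would call your system's actual entry points. Note what the assertions target: a property of every output, not the return value of one function.

```python
# Hypothetical stand-ins for the system under test; a real suite would import yours.
def run_pipeline(snapshot):
    """Toy pipeline: one compound finding per resource that violates two or more controls."""
    compounds = []
    for res in snapshot["resources"]:
        trace = sorted(c for c, violated in res["violations"].items() if violated)
        if len(trace) >= 2:
            compounds.append({"resource": res["id"], "trace": trace})
    return compounds


def project(snapshot):
    """Toy export layer: one (subject, predicate, object) fact per resource."""
    return {(r["id"], "public_read", r["public_read"]) for r in snapshot["resources"]}


def downstream_rule(facts):
    """Toy downstream rule: every subject with a public_read=True fact."""
    return {s for (s, p, o) in facts if p == "public_read" and o is True}


FIXTURE = {
    "resources": [
        {"id": "bucket-a", "public_read": True,
         "violations": {"CTL.PUBLIC_READ": True, "CTL.NO_ENCRYPTION": True}},
        {"id": "bucket-b", "public_read": False,
         "violations": {"CTL.PUBLIC_READ": False, "CTL.NO_ENCRYPTION": True}},
    ]
}


# Semantic tests: they assert the invariant you understood, not one function's output.
def test_every_compound_traces_to_two_independent_controls():
    findings = run_pipeline(FIXTURE)
    assert findings, "fixture must exercise at least one compound chain"
    for finding in findings:
        assert len(set(finding["trace"])) >= 2, finding


def test_export_projects_what_the_downstream_rule_needs():
    # If the projector ever stops emitting public_read, the downstream rule goes
    # blind without raising an error; this assertion turns that silence into a failure.
    assert downstream_rule(project(FIXTURE)) == {"bucket-a"}
```

The `assert findings` guard matters: without it, a fixture that produces no compound findings would pass the first test vacuously.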

These tests consolidate your understanding into the CI/CD pipeline. They don't replace the documents — you still need the prose to reload your mental model. But the tests catch the moment the system diverges from the model you documented. When a semantic test breaks, it tells you exactly which invariant was violated, which means you know exactly which part of your understanding needs to be re-examined. Documents tell you what the system should do. Semantic tests tell you when it stops doing it.

The five documents plus the semantic test suite form the complete external memory. The documents are for you. The tests are for the system. Together, they make your understanding durable across time, across context switches, and across the AI-assisted changes that will continue arriving faster than your mental model can absorb them.

Two kinds of session

Once the protocol is internalized, the ongoing practice splits into two complementary session types:

Session A — Technical debt (code). Pick one row from the ownership triage, prioritizing critical debt (high risk, high AI dependency, low understanding). Read every function in that directory. For each function, write a one-line comment explaining what it does in your own words. If you can't write the comment, you don't own the function yet. Ask the AI to explain it, then rewrite the explanation in your own words. The AI's explanation is a loan. Your rewrite is the repayment. Try the Black-Box Reconstruction: can you describe this function's contract well enough to regenerate it from a prompt? If yes, you own it. If no, the gap is named. Goal: move the understanding percentage up for one row.

Session B — Domain debt (the problem space). Pick one invariant family or compound chain. Read every check's predicate. Understand what input properties each one requires. Manually create an input that should trigger the chain. Run it and verify. Then create an input that should not trigger it and verify silence. If both work as expected, you own that reasoning path. If either surprises you, the surprise is the debt, and now it has a name and a location. Write a semantic test that encodes what you just verified — so the next time someone changes this path, the test catches any regression in the invariant you now understand.
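The trigger-and-silence pair from Session B translates directly into semantic tests. A sketch, assuming a hypothetical `chain_fires` check for the unauthenticated-to-sensitive chain; the fixture fields (`allows_unauthenticated`, `assumable_role`, `can_read`, `sensitive`) are invented for illustration. Each negative fixture differs from the positive one in exactly one property, so a pass shows that property is one the chain actually depends on.

```python
import copy


def chain_fires(snapshot):
    """Hypothetical chain: an identity pool that allows unauthenticated access
    can assume a role that can read a data store tagged as sensitive."""
    pool = snapshot["identity_pool"]
    if not pool["allows_unauthenticated"]:
        return False
    role = snapshot["roles"][pool["assumable_role"]]
    return any(snapshot["data_stores"][name]["sensitive"] for name in role["can_read"])


POSITIVE = {
    "identity_pool": {"allows_unauthenticated": True, "assumable_role": "guest"},
    "roles": {"guest": {"can_read": ["customer-db"]}},
    "data_stores": {"customer-db": {"sensitive": True}},
}


def test_chain_fires_on_positive_fixture():
    assert chain_fires(POSITIVE)


def test_chain_is_silent_when_authentication_is_required():
    negative = copy.deepcopy(POSITIVE)
    negative["identity_pool"]["allows_unauthenticated"] = False  # the one changed property
    assert not chain_fires(negative)


def test_chain_is_silent_when_data_is_not_sensitive():
    negative = copy.deepcopy(POSITIVE)
    negative["data_stores"]["customer-db"]["sensitive"] = False  # the one changed property
    assert not chain_fires(negative)
```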

Alternate between A and B. Technical debt without domain understanding produces "I know how the code works but not what it means." Domain debt without technical understanding produces "I know what it should do but not how it does it." You need both, and they reinforce each other.

The mindset shift

There's one change that matters more than any process step:

When you can say "I designed the tool to reason this way" — and defend it under questioning — the debt is paid. That's the Socratic test of ownership: not whether you can recite what the system does, but whether you can survive a skeptic's cross-examination of why.

The AI is still useful throughout — as an explainer, a challenger, a fixture generator, a quizmaster. What changes is the relationship. You're directing a research assistant to help you understand a system you architect. The difference is whether the understanding lives in you or only in the model. If it lives only in the model, you don't own the system. You operate it.

Ownership means: you can change it without fear, debug it without the AI, explain it to a skeptic, and defend its reasoning under questioning. Everything short of that is debt. The 3C Protocol is how you pay it down, one subsystem at a time, without starting over.

The examples in this article use Stave, an open-source cloud security tool (Apache 2.0, early-stage). The 3C Protocol itself is general. It applies to any system built with AI assistance where the builder's understanding lagged behind the code's arrival. If you've found a different method for paying down cognitive debt in AI-assisted codebases, I'd like to hear it. This is a problem the industry has named but hasn't yet solved well.

DEV Community