Nic Lydon

Posted on Jun 9

Adversarial Review Is Not a Vibe Check

I had a security review that was technically complete and still not good enough. The code had the controls. The tests existed. The mitigations covered the risks. The final decision was reasonable.

But the review did not preserve the mapping between the adversarial prompt, the findings, the tests, and the decision. A future reader could see that the system passed its security gates, but could confirm what those gates covered only by reconstructing the relationship between scattered pieces of evidence.

That is not a review. That is archaeology with better formatting.

A vibe check can say "looks good." A real adversarial review has to prove what it looked at.

So I went back and made the review explicit. Not because the code changed. It did not. The review needed to prove what it had reviewed.

That was the first lesson.

The second came later: once agents are doing real work, adversarial review cannot remain a markdown ritual. It needs durable workflow state: evidence rows, handoffs, close gates, terminal markers, and a way to keep "pending review" separate from "reviewed and safe to close."

The document shape mattered first.

Then the backlog shape mattered.

The setup

I am building a private connector between Nexus, my personal intelligence substrate, and Imprint, a public profile/export system.

You do not need to care about the names. The boundary is the important part.

Nexus contains private source material. Imprint exports structured profile data. The connector's job is not just to move records from one place to another. It has to enforce consent, privacy, replayability, auditability, and public/private separation while doing it.

The Sprint 15 adversarial review started from a prompt with seven attack surfaces: config smuggling, Nexus possession as consent, dry-run output leakage, replay manifests, audit logs, connector authority creep, and the public/private boundary.

The first review grouped the findings into four broad sections: config smuggling, consent enforcement, dry-run output leakage, and connector authority creep. Those were real risks. The mitigations were real. The tests were real.

But three of the seven attack surfaces were only covered indirectly. Replay manifests, audit logs, and the public/private boundary were present in the evidence, but they were not named as first-class findings.

That was the gap.

Not a security gap in the implementation. A review-completeness gap.

For adversarial review, that still matters.

Implicit coverage is where bad reviews go to look acceptable

You ask a system, reviewer, or agent to assess a set of risks. It comes back with plausible findings. The findings are true. The tests pass. The conclusion may even be correct.

But the structure does not let you answer the basic question:

Did we actually review every attack surface we said we were going to review?

In this case, the answer was "yes, but you have to know where to look."

That is not good enough.

If an adversarial prompt names seven attack surfaces, the review should preserve those seven surfaces. Each one should have a risk statement, mitigation, evidence, and disposition. If one is not applicable, say why. If one is covered by architecture rather than a unit test, name the boundary.

The pattern is simple:

attack surface -> risk -> mitigation -> evidence -> decision

That is the shape I want the review to preserve. Not because it is elegant. Because it keeps the review from compressing a specific adversarial contract into a confident paragraph.
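The pattern above can be sketched as a small data structure plus a coverage check. This is a minimal illustration, not the review tooling itself: the `Finding` class, `is_mapped`, `unmapped_surfaces`, and the placeholder risk and evidence strings are all hypothetical names introduced here. Only the seven surface names come from the prompt.

```python
from __future__ import annotations
from dataclasses import dataclass

# The seven attack surfaces named in the adversarial prompt.
PROMPT_SURFACES = [
    "config smuggling",
    "Nexus possession as consent",
    "dry-run output leakage",
    "replay manifests",
    "audit logs",
    "connector authority creep",
    "public/private boundary",
]


@dataclass(frozen=True)
class Finding:
    surface: str
    risk: str
    mitigation: str
    evidence: tuple[str, ...]  # test names, artifacts, or a named architectural boundary
    decision: str              # e.g. "mitigated" or "not applicable: <reason>"


def is_mapped(finding: Finding) -> bool:
    """A finding counts only if it carries a decision and, unless it is
    explicitly marked not applicable with a reason, at least one piece of evidence."""
    if not finding.decision.strip():
        return False
    if finding.decision.startswith("not applicable:"):
        return True
    return bool(finding.evidence)


def unmapped_surfaces(prompt_surfaces, findings):
    """Return the prompt surfaces with no mapped finding, in prompt order."""
    mapped = {f.surface for f in findings if is_mapped(f)}
    return [s for s in prompt_surfaces if s not in mapped]


def placeholder(surface: str) -> Finding:
    # Hypothetical stand-in content; the real review text is not reproduced here.
    return Finding(surface, "placeholder risk", "placeholder mitigation",
                   ("placeholder_test",), "mitigated")


# The first review's four broad sections, expressed against the prompt's names.
first_review = [placeholder(s) for s in (
    "config smuggling",
    "Nexus possession as consent",  # the first review called this "consent enforcement"
    "dry-run output leakage",
    "connector authority creep",
)]

print(unmapped_surfaces(PROMPT_SURFACES, first_review))
# → ['replay manifests', 'audit logs', 'public/private boundary']
```

The point of the check is that it fails loudly on surfaces that are "covered somewhere." Indirect coverage does not count until it has its own row.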

Otherwise, a future reader has to infer coverage from nearby prose. And future readers are tired, distracted, and often you six weeks later, wondering why your past self chose violence.

Making the mapping explicit

I reorganized the review from four broad findings into seven explicit sections, A1 through A7, matching the original adversarial prompt.

The code did not change. The tests did not change. The mitigations were already there. What changed was traceability.

Replay manifests are the clearest example. They were named in the adversarial prompt, but not called out as their own finding in the first review. That matters because replay artifacts need to preserve enough state to make a run reproducible without becoming private-data leaks.

The improved review asks whether replay manifests leak private state or omit compatibility fields. It answers with the design: replay manifests serialize only the redacted configuration shape, including connector name, provider type, enabled families, connector version, and policy version. They do not serialize provider paths, credentials, raw record IDs, fixture names, or private source content.

Then it points to test_replay_manifest_uses_redacted_config_shape, which verifies that the manifest remains deterministic enough for replay compatibility while avoiding paths and record file names.
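To make the mitigation concrete, here is a rough sketch of an allowlist-based manifest. It is not the connector's code or the test named above. The field keys, `build_replay_manifest`, and every value in the sample config are assumptions chosen for illustration; only the list of what is and is not serialized follows the review.

```python
import json

# Allowlist drawn from the review's description of the redacted config shape.
# The exact key names are hypothetical.
ALLOWED_MANIFEST_FIELDS = (
    "connector_name",
    "provider_type",
    "enabled_families",
    "connector_version",
    "policy_version",
)


def build_replay_manifest(config: dict) -> str:
    """Serialize only allowlisted fields, in a deterministic form."""
    missing = [k for k in ALLOWED_MANIFEST_FIELDS if k not in config]
    if missing:
        # Omitting a compatibility field is the other half of the risk.
        raise ValueError(f"manifest would omit compatibility fields: {missing}")
    manifest = {k: config[k] for k in ALLOWED_MANIFEST_FIELDS}
    # Sort the family list so input ordering cannot change the output.
    manifest["enabled_families"] = sorted(manifest["enabled_families"])
    return json.dumps(manifest, sort_keys=True)


# Hypothetical config: the last two keys must never reach the manifest.
config = {
    "connector_name": "nexus",
    "provider_type": "local",
    "enabled_families": ["notes", "bookmarks"],
    "connector_version": "0.1.0",
    "policy_version": "1",
    "provider_path": "/private/nexus/data",
    "credentials": "secret-token",
}

manifest = build_replay_manifest(config)
assert "/private" not in manifest and "secret-token" not in manifest
```

An allowlist is the safer default here: a new private field added to the config stays out of the manifest unless someone deliberately adds it.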

That is the difference between "we probably covered replay safety under dry-run leakage" and "A4 covers replay manifests; here is the risk, mitigation, and test."

Audit logs had a similar issue. They were covered under dry-run leakage, but audit logs are a different failure surface. Dry-run output is what the operator sees directly. Audit logs are what the system preserves. The updated review asks whether audit logs expose warnings, raw errors, metadata, provider details, or private record material, then points to test_audit_log_public_safe_summary_hides_raw_text_and_paths.
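One way to keep what the system preserves separate from what it processed is to reduce raw events to aggregates before anything is written. The sketch below assumes a hypothetical event shape (`outcome`, `raw_text`, `path`) and a hypothetical `public_safe_audit_summary` helper; it is not the code behind the test named above.

```python
from collections import Counter


def public_safe_audit_summary(events, manifest_ref):
    """Reduce raw audit events to outcome counts plus a manifest reference.

    Nothing else from the events (text, paths, errors) is carried over.
    """
    counts = Counter(event["outcome"] for event in events)
    return {"counts": dict(sorted(counts.items())), "manifest_ref": manifest_ref}


# Hypothetical raw events containing material that must not be preserved.
events = [
    {"outcome": "exported", "raw_text": "private note", "path": "/private/a.json"},
    {"outcome": "excluded", "raw_text": "another note", "path": "/private/b.json"},
    {"outcome": "exported", "raw_text": "third note", "path": "/private/c.json"},
]

summary = public_safe_audit_summary(events, manifest_ref="manifest-001")
print(summary)
# → {'counts': {'excluded': 1, 'exported': 2}, 'manifest_ref': 'manifest-001'}
```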

The public/private boundary was covered by design, but not named clearly enough. This connector lives in a private package. Public Imprint should not import Nexus-specific code, schema assumptions, private fixtures, or pipeline authority. The mitigation is architectural: the Nexus connector remains isolated in a private package, and public Imprint imports only generic connector interfaces.

That matters because not every security boundary is a runtime assertion. Some boundaries are repo shape, package ownership, import direction, and naming conventions. If the review only recognizes unit tests as evidence, it will miss those controls.
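Import direction is one architectural boundary that can still be turned into checkable evidence. The sketch below uses Python's standard `ast` module to flag imports of a private package from public source. The package names `nexus_connector` and `imprint.connectors` are hypothetical, and this is an illustration of the idea rather than a check the project is known to run.

```python
import ast

# Hypothetical name for the private connector package.
PRIVATE_PREFIXES = ("nexus_connector",)


def forbidden_imports(source, private_prefixes=PRIVATE_PREFIXES):
    """Return imported module names in `source` that fall under a private prefix."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # bare relative imports have no module name
        else:
            continue
        for name in names:
            # Match the package itself or any submodule, but not lookalike names.
            if any(name == p or name.startswith(p + ".") for p in private_prefixes):
                hits.append(name)
    return hits


public_ok = "import json\nfrom imprint.connectors import Connector\n"
public_bad = "import nexus_connector.pipeline\nfrom nexus_connector import schema\n"

print(forbidden_imports(public_ok))           # → []
print(sorted(forbidden_imports(public_bad)))  # → ['nexus_connector', 'nexus_connector.pipeline']
```

Run over every file in the public package, a check like this gives the review something to cite for a boundary that otherwise exists only as repo shape.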

Evidence is the review

The improved review includes an evidence table mapping each attack surface to the tests or artifacts that support it.

That table is not decoration. It is the review.

A review should not just say "passed." It should say what passed, against which threat, with what evidence.

The GO decision needs the same treatment. The original decision was accurate:

GO for private deployment packaging after git/remote/deploy target is confirmed. The local implementation passes synthetic fixture, privacy, replay, audit, consent, and public Imprint compatibility gates.

That is fine as a conclusion. But conclusions are not controls.

The updated review clarified each gate. Synthetic counts matched policy. Dry-run output exposed no raw text, user IDs, fixture paths, SQL, or provider internals. Replay used a deterministic redacted config shape. Audit exported only counts and a manifest reference. Consent exclusions happened before support. Public compatibility meant the Nexus connector stayed private.
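The same idea can be expressed as data: a decision derived from named gates, each carrying its claim and its evidence. In this sketch the `Gate` class and `decide` function are hypothetical. The claims paraphrase the gates above and the two test names are the ones cited earlier, but pairing them with these gates is an assumption. The third gate is deliberately left without evidence to show the failure mode.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    claim: str
    evidence: tuple[str, ...]  # tests or artifacts that support the claim
    passed: bool


def decide(gates):
    """Return ("GO", []) only if every gate passed and cites evidence."""
    problems = [g.name for g in gates if not (g.passed and g.evidence)]
    return ("NO-GO", problems) if problems else ("GO", [])


gates = [
    Gate("replay", "replay used a deterministic redacted config shape",
         ("test_replay_manifest_uses_redacted_config_shape",), True),
    Gate("audit", "audit exported only counts and a manifest reference",
         ("test_audit_log_public_safe_summary_hides_raw_text_and_paths",), True),
    # Hypothetical: marked as passed but with no evidence attached.
    Gate("consent", "consent exclusions happened before support", (), True),
]

print(decide(gates))
# → ('NO-GO', ['consent'])
```

A gate that "passed" without evidence is treated the same as a gate that failed, which is the difference between a stamp and a trail.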

That turns the GO decision from a stamp into an audit trail.

Agents make almost-right failures easier

This matters more once agents are doing the work.

A human reviewer can be sloppy. An agent can be sloppy at machine speed, with excellent tone, confident formatting, and enough true statements to hide the gaps.

The dangerous failure mode is not that the agent makes everything up. It is that the agent is almost right. It reviews five of seven risks, points to real tests, gives the right conclusion, and never tells you which requested surfaces were not explicitly mapped.

That kind of output passes casual inspection.

It should not pass review.

A passing test suite tells you what you asserted.

Adversarial review asks what you forgot to assert.

The control is not "ask the model to be more critical." The control is structure.

No silent merging. No broad "covered under privacy." No "looks good overall."

No vibe checks.

Then the review became a workflow

The Sprint 15 review fixed the artifact shape, but that was only half the problem.

In Nexus, I also run an Operator Backlog: a Postgres-backed queue of work items that agents can investigate, implement, block, or close depending on tags, scopes, and evidence. Once agent work started moving through that backlog, adversarial review could not live only in sprint files.

It needed to become a lane.

That became ProjectAR.

ProjectAR is a task-only adversarial review role. It does not implement the fix. It does not own the backlog. It reviews the latest investigation, handoff artifact, or close candidate produced by another role.

The workflow shape is the backlog version of the document pattern:

candidate -> investigation -> adversarial review -> approval or block -> close evidence

In plain English: the worker can produce evidence that an item is ready to close, but a separate review lane has to decide whether that evidence is sufficient. The reviewer can approve it, block it, or route it somewhere else. That decision has to survive as part of the item history.

Otherwise, "implemented" quietly becomes "reviewed."
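A minimal in-memory sketch of that rule, assuming an append-only history per item. None of this reflects the actual Postgres schema: `BacklogItem`, the event names, and the `worker` role are hypothetical, and only the ProjectAR role name and the approve/block/route outcomes come from the description above.

```python
from dataclasses import dataclass, field

REVIEW_DECISIONS = {"approve", "block", "route"}


@dataclass
class BacklogItem:
    title: str
    # Append-only list of (role, event, detail) tuples.
    history: list = field(default_factory=list)


def submit_close_candidate(item, worker_role, evidence):
    item.history.append((worker_role, "close_candidate", evidence))


def record_review(item, reviewer_role, decision, reason):
    if decision not in REVIEW_DECISIONS:
        raise ValueError(f"unknown review decision: {decision}")
    candidates = [h for h in item.history if h[1] == "close_candidate"]
    if not candidates:
        raise ValueError("nothing to review: no close candidate recorded")
    if candidates[-1][0] == reviewer_role:
        raise ValueError("the role that produced the candidate cannot review it")
    item.history.append((reviewer_role, f"review:{decision}", reason))


def can_close(item):
    """True only if the newest close candidate has an approving review after it."""
    for _role, event, _detail in reversed(item.history):
        if event == "close_candidate":
            return False  # newest candidate has not been reviewed yet
        if event.startswith("review:"):
            return event == "review:approve"
    return False


item = BacklogItem("harden replay manifest")
submit_close_candidate(item, "worker", "tests pass")
print(can_close(item))  # → False
record_review(item, "ProjectAR", "approve", "evidence maps to every surface")
print(can_close(item))  # → True
```

Because the decision is an entry in the history rather than a mutable flag, a later close candidate invalidates an earlier approval instead of inheriting it.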

The tag model mattered more than I expected. I ended up with two concepts: adversarial_review as the active pending lane, and adversarial_reviewed as the terminal marker that says the review happened.

The active tag asks whether a review role should pick this up. The terminal marker records whether the item passed through review before closure.

The backlog needs both because those are different states. If the same marker means "needs review" and "was reviewed," the queue cannot tell whether an item is pending, blocked, approved, or merely carrying historical evidence. If the active tag disappears at closure without a terminal marker replacing it, the system loses proof that the review ever happened.
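The two-tag model can be sketched as a pair of small functions over a tag set. The tag names are the ones described above. Everything else is an assumption made for illustration: the function names, the `requires_review` flag, and the choice to swap the tags in a single step.

```python
PENDING = "adversarial_review"     # active lane: a review role should pick this up
REVIEWED = "adversarial_reviewed"  # terminal marker: the review happened


def complete_review(tags):
    """Swap the active tag for the terminal marker in one step,
    so the item is never left with neither."""
    if PENDING not in tags:
        raise ValueError("item is not in the adversarial review lane")
    return (set(tags) - {PENDING}) | {REVIEWED}


def close_blockers(tags, requires_review=True):
    """Return the reasons an item cannot be closed yet (empty list means it can)."""
    blockers = []
    if PENDING in tags:
        blockers.append("review still pending")
    if requires_review and REVIEWED not in tags:
        blockers.append("no terminal review marker")
    return blockers


tags = {"connector", PENDING}
print(close_blockers(tags))
# → ['review still pending', 'no terminal review marker']

tags = complete_review(tags)
print(close_blockers(tags))
# → []
```

The second blocker is the one that catches the silent case: an item whose active tag was dropped somewhere along the way, leaving no proof that a review occurred.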

That is the workflow version of implicit coverage.

At the document layer, the review covered the risk, but the mapping was not obvious. At the workflow layer, the item may have been reviewed, but the state did not make that visible.

Both force future readers to take it on trust that the system did the right thing, because the artifact no longer proves it.

The bar

No code changed in the Sprint 15 review-completeness pass. No tests changed. No mitigation changed.

The review changed.

That sounds cosmetic until you need to rely on the review later. Then it becomes the difference between "I think we looked at that" and "A5 covers audit logs; here is the exact test and why the GO decision includes it."

ProjectAR pushed the same idea into the backlog. It made adversarial review visible as operational state, not just a section heading.

That is the bar I want for agent-assisted review.

Not perfect certainty. Not theater. Not a longer markdown file because security people enjoy suffering.

A traceable adversarial review.

One where every claimed boundary has evidence, every requested attack surface has a disposition, every reviewed backlog item preserves its review state, and the final decision can be reconstructed without trusting the reviewer's confidence.

Because confidence is cheap.

Mapping is the control.

And state is what keeps the control alive after the markdown scrolls offscreen.