Alan West

Why binary AI disclosure systems fail (and how to design better ones)

Last month a friend on a small engineering team showed me their internal AI policy. Every commit had to be tagged with a yes/no flag: was AI used? If yes, the commit went through extra review, was sometimes rewritten, and was occasionally rejected outright. Within six weeks, the team had quietly stopped tagging anything as AI-assisted. Not because they stopped using AI, but because tagging it accurately had become a tax on their day.

This is a pattern I've now seen at three companies. The disclosure system gets built with good intentions, the enforcement becomes punitive, and the data the system collects ends up worse than no data at all. People aren't lying because they're dishonest — they're routing around a system that punishes the truth.

Let me walk through why this happens at a systems level, and what a saner design looks like.

The problem with binary flags

The core failure mode is that AI usage isn't binary. "Did you use AI for this commit?" collapses a wide spectrum into a single bit. Consider how differently these scenarios feel in practice:

  • You wrote everything yourself, then asked an LLM to explain a confusing error message
  • You generated a regex with AI, tested it thoroughly, then committed it
  • You accepted a long autocomplete suggestion for a boilerplate function
  • You had AI write a whole module, then refactored half of it
  • You pasted the entire failing function into a chatbot and used the response verbatim

All of these get the same ai-assisted: true tag. The flag carries no information about what was done, how it was verified, or how confident the author is in the output. So when reviewers see the flag, they treat all of them like the worst case. And authors learn that the safest move is to either skip the flag or quietly minimize what they report.

This is a classic measurement problem: when the metric is too coarse to reflect reality, people don't behave more honestly — they stop engaging with the metric.

Root cause: the system measures the wrong thing

The deeper issue is that most disclosure systems measure intent ("did you mean to use AI?") rather than artifact properties ("what is true about this code?"). Intent is unverifiable and self-reported. Artifact properties can be observed.

What the reviewer actually cares about is something like:

  • Has this code been executed?
  • Are there tests covering it?
  • Does it match patterns the codebase already uses?
  • Has the author read every line, and can they defend the choices?

None of those questions need a disclosure flag. They need tooling. And once you have the tooling, the disclosure becomes mostly redundant — or at least drops from a gate to a signal.
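
To make "tooling" concrete, here's a rough sketch of a script that inspects the artifact rather than asking about intent. The test-naming convention (foo.ts covered by foo.test.ts) and the script name are assumptions for illustration; substitute whatever your codebase actually uses.

// check-artifacts.js — a sketch: observe the artifact, don't ask about intent.
// Assumes a hypothetical convention that src/foo.ts is covered by src/foo.test.ts.
const { execSync } = require('child_process');

// Files touched by the commit under inspection (HEAD by default)
const commit = process.argv[2] || 'HEAD';
const changed = execSync(`git diff-tree --no-commit-id --name-only -r ${commit}`)
  .toString().trim().split('\n').filter(Boolean);

const sources = changed.filter(f => f.endsWith('.ts') && !f.endsWith('.test.ts'));
const tests = new Set(changed.filter(f => f.endsWith('.test.ts')));

for (const file of sources) {
  const expected = file.replace(/\.ts$/, '.test.ts');
  console.log(`${file}: ${tests.has(expected) ? 'matching test touched' : 'no matching test touched'}`);
}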

A better approach: structured provenance, not confessional flags

Here's the shape of a system I helped a team migrate to last year. Instead of one boolean, commits carry structured metadata in a trailer. Git already supports this natively:

# Example commit trailer format
# Stored as plain trailers, parseable by `git interpret-trailers`
git commit -m "Add retry logic to payment webhook handler

Assist-Tool: copilot-inline
Assist-Scope: scaffolding
Author-Reviewed: full
Tests-Added: webhooks/payment.test.ts"

A few things change with this shape. First, Assist-Scope lets the author say what the assist was — scaffolding, naming, a regex, a whole function. Second, Author-Reviewed separates "AI wrote it" from "I understand it." Third, Tests-Added makes verification a first-class field. You're no longer asking the author to confess; you're asking them to describe.
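
Because these are standard trailers, you can also query them straight from git log using the trailers pretty-format option (reasonably recent git, 2.22+, for key and valueonly), with no extra database needed. The -500 window here is arbitrary:

# Tally assist scopes over recent history
git log -500 --format='%(trailers:key=Assist-Scope,valueonly)' \
  | grep -v '^$' | sort | uniq -c | sort -rn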

Parsing these is straightforward. Here's a tiny pre-push hook that warns when scope is full-function but no tests are listed:

#!/usr/bin/env node
// .git/hooks/pre-push (Node) — must be executable (chmod +x)
const { execSync } = require('child_process');

// A real pre-push hook receives the pushed refs on stdin; for this
// sketch we simply check everything that's ahead of origin/main
const range = 'origin/main..HEAD';
const log = execSync(`git log ${range} --format=%B%x00`).toString();

const commits = log.split('\x00').filter(Boolean);
let warnings = 0;

for (const msg of commits) {
  // git interpret-trailers handles the parsing edge cases for us
  const trailers = execSync('git interpret-trailers --parse', { input: msg })
    .toString()
    .trim()
    .split('\n')
    .filter(Boolean);

  const fields = Object.fromEntries(
    trailers.map(t => {
      // String.prototype.split's limit truncates rather than keeping the
      // remainder, so split on the first ': ' by hand
      const i = t.indexOf(': ');
      return [t.slice(0, i), t.slice(i + 2)];
    })
  );

  // Heuristic: full-function assists without tests get a soft warning
  if (fields['Assist-Scope'] === 'full-function' && !fields['Tests-Added']) {
    console.warn(`Warning: "${msg.split('\n')[0]}" claims a full-function assist but lists no tests`);
    warnings++;
  }
}

if (warnings > 0) {
  // Don't block — just inform. Blocking is what breaks these systems.
  console.warn(`${warnings} commit(s) flagged. Push proceeding.`);
}

Notice the hook doesn't block the push. That's deliberate. The moment the hook starts rejecting commits, you've recreated the exact incentive problem we started with. The point is to surface information, not to gatekeep.

Step-by-step migration

If you've inherited a binary disclosure system that's already producing bad data, here's the order I'd recommend changing things in:

  1. Stop punishing the flag. Before anything else, decouple the AI-usage flag from review severity. Until you do this, no replacement system will collect honest data either.
  2. Add structured trailers as optional. Roll out the new schema as additive. People who keep using the old flag still work. People who want to give richer context can.
  3. Build dashboards on the new fields. Once you have a few weeks of structured data, you can see things like "what percent of full-function assists ship without tests" (sketched after this list). That's actionable.
  4. Retire the binary flag. Only after the new system is producing useful data should you sunset the old one. Don't do this on day one.
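
To illustrate step 3, here's a rough sketch of that measurement, reusing the same interpret-trailers parsing as the hook above. The script name is made up, and walking all of history is illustrative; scope the log range to whatever window your dashboard needs:

// assist-stats.js — what percent of full-function assists ship without tests?
const { execSync } = require('child_process');

const commits = execSync('git log --format=%B%x00')
  .toString().split('\x00').filter(Boolean);

let fullFunction = 0;
let withoutTests = 0;

for (const msg of commits) {
  const trailers = execSync('git interpret-trailers --parse', { input: msg })
    .toString().trim().split('\n').filter(Boolean);

  const fields = Object.fromEntries(trailers.map(t => {
    const i = t.indexOf(': ');
    return [t.slice(0, i), t.slice(i + 2)];
  }));

  if (fields['Assist-Scope'] === 'full-function') {
    fullFunction++;
    if (!fields['Tests-Added']) withoutTests++;
  }
}

const pct = fullFunction ? ((withoutTests / fullFunction) * 100).toFixed(1) : '0.0';
console.log(`${withoutTests} of ${fullFunction} full-function assists list no tests (${pct}%)`);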

The order matters. Most failed rollouts of this kind try to introduce the new system while still enforcing the old one, and end up with the worst of both.

Prevention: design for the steady state, not the launch

The meta-lesson here applies to any policy that depends on self-reporting. Ask yourself: in the steady state, six months in, what's the path of least resistance for a tired engineer at 5pm on a Friday? If the honest path is more painful than the dishonest one, the data you collect will be garbage. It doesn't matter how clearly you wrote the policy doc.

A few prevention tips I've landed on:

  • Make accurate reporting cheaper than dishonest reporting. Trailers added by an editor extension or a commit hook are nearly free (sketched after this list). Filling out a Jira ticket for every AI-assisted commit is not.
  • Separate observation from judgment. Collect data first; decide what to do with it later. Mixing the two corrupts both.
  • Default to information, not gates. Hooks that warn beat hooks that block, until you have strong evidence a gate is needed.
  • Audit what people actually do, not what they say. Periodic spot-checks of commits — reading the code, asking the author to explain it — catch far more than any flag system.
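
On the first point, here's a sketch of one cheap option: a prepare-commit-msg hook that pre-fills the trailer skeleton so authors only fill in values. A real editor extension could go further and populate Assist-Tool automatically; that part is left out here.

#!/usr/bin/env node
// .git/hooks/prepare-commit-msg — pre-fill the trailer skeleton
const fs = require('fs');

// Git passes the path of the commit-message file as the first argument
const msgFile = process.argv[2];
let msg = fs.readFileSync(msgFile, 'utf8');

// Skip if the skeleton is already there (e.g. on --amend)
if (!msg.includes('Assist-Tool:')) {
  // Appending below git's comment block is fine: comment lines are
  // stripped at commit time, and authors delete fields that don't apply
  msg += '\nAssist-Tool: \nAssist-Scope: \nAuthor-Reviewed: \nTests-Added: \n';
  fs.writeFileSync(msgFile, msg);
}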

The Reddit post that prompted this article was venting about disclosure systems that effectively beg people to lie. The author was right about the symptom. The cure isn't stricter enforcement or more honest engineers. It's better-designed systems that don't force people to choose between honesty and getting their work done.
