Praveen

Posted on Jun 13

"Co-authored-by: Copilot" Is Not an Audit Trail — Here's What One Actually Looks Like

#ai #programming #security #discuss

In late April 2026, Microsoft shipped VS Code 1.117. Buried in the release was a change: the default for the github.copilot.chat.generateCommitMessage.addCoAuthoring setting was flipped from off to all. That meant "Co-authored-by: Copilot <copilot@github.com>" was now being appended to every commit message in the background: silently, without showing up in the commit message editor, and, critically, without verifying that Copilot had generated any of the code.

Developers noticed within days. The backlash was significant. VS Code 1.119 shipped May 3 with the default reverted and a consent requirement added. Microsoft apologized.

The technical fix was straightforward. The governance question it exposed is not.

What the incident actually revealed

The developer anger wasn't really about attribution credit. It was about consent and accuracy. The co-author trailer was added to commits where AI features were disabled. It was added when developers had manually written every line. It credited Copilot with work Copilot had not done.

But underneath that anger is a more important problem: even when Copilot does write code, a "Co-authored-by" git trailer tells you almost nothing useful from a governance or security standpoint.

It tells you that a tool called Copilot existed somewhere in the developer's editor during some portion of the work that eventually became this commit. That's it.

It doesn't tell you which model generated which lines. It doesn't tell you what the developer prompted for. It doesn't contain the raw model response. It doesn't tell you whether any of the AI-generated lines touched authentication paths, hardcoded credentials, or SQL construction. It says nothing about when generation happened relative to the commit. It carries no risk score.

If you had to defend a specific commit in a security audit six months from now — "which parts of this function were AI-generated, under what prompt, using what model?" — a git trailer gets you nowhere.

What a real provenance record contains

LineageLens captures provenance at insertion time, not commit time. Each AI code insertion generates a ProviderAgnosticProvenanceEvent structured around schema version lineagelens.provenance-event.v1. Here is what that record contains:

type ProviderAgnosticProvenanceEvent = {
  schemaVersion: 'lineagelens.provenance-event.v1';
  eventId: string;

  timestamps: {
    observedAtIso: string;         // when the extension saw the insertion
    insertedAtIso: string;         // when the text hit the buffer
    requestAtIso: string | null;   // when the proxy saw the outbound request
    responseAtIso: string | null;  // when the model responded
  };

  source: {
    ide: string | null;           // 'vscode'
    shim: string;                 // which capture path fired
    toolName: string | null;      // 'Edit', 'Write', 'apply_patch', etc.
    provider: string | null;      // 'anthropic', 'openai', 'google'
    adapterName: string | null;   // 'claude-code', 'copilot', 'cursor', etc.
  };

  capture: {
    level: CaptureStatus;         // 'full' | 'metadata_only' | 'tunnel_only' | 'file_diff'
    promptStatus: 'captured' | 'not-captured';
    capabilities: ProvenanceEventCapability[];  // 10 named slots
  };

  model: {
    name: unknown;
    parameters: Record<string, unknown> | null;  // temperature, max_tokens, etc.
  };

  prompt: {
    body: unknown;    // the full prompt messages array
    system: unknown;  // the system prompt
  };

  diff: {
    insertedText: string;
    chunks: ProvenanceInsertedChunk[];
    netAddedLines: number;
  };

  correlation: {
    confidence: number;                     // 0.0–1.0
    timingDifferenceMs: number | null;
    contentSimilarityScore: number | null;
    fileContextMatched: boolean;
  };
};

Compare that to what a git trailer gives you: a tool name and an email address.
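To make the contrast concrete, here is a minimal sketch of everything a parser can pull out of a trailer. It assumes the standard git form "Co-authored-by: Name <email>"; the parseCoAuthors function and the CoAuthor type are names made up for this illustration, not part of git, VS Code, or LineageLens.

```typescript
// Everything a Co-authored-by trailer can yield: a display name and an email.
// CoAuthor and parseCoAuthors are hypothetical names used for illustration only.
type CoAuthor = { name: string; email: string };

// Matches lines of the form "Co-authored-by: Name <email>" (key is case-insensitive).
const CO_AUTHOR_LINE = /^co-authored-by:\s*(.+?)\s*<([^<>\s]+)>\s*$/i;

function parseCoAuthors(commitMessage: string): CoAuthor[] {
  const coAuthors: CoAuthor[] = [];
  for (const line of commitMessage.split(/\r?\n/)) {
    const match = CO_AUTHOR_LINE.exec(line.trim());
    if (match && match[1] && match[2]) {
      coAuthors.push({ name: match[1], email: match[2] });
    }
  }
  return coAuthors;
}

const message = [
  'Fix token refresh race',
  '',
  'Co-authored-by: Copilot <copilot@github.com>',
].join('\n');

console.log(parseCoAuthors(message));
// [ { name: 'Copilot', email: 'copilot@github.com' } ]
```

Two strings come back. No model, no prompt, no diff, no timestamps: there is nothing else in the trailer to parse.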

The 10 capability slots

The most important part of the schema is the capture.capabilities array. Every provenance event gets 10 named capability assessments:

prompt-body       — was the full prompt captured?
response-body     — was the raw model response captured?
headers           — were the request headers available?
request-id        — was a UUID present to link request to insertion?
session-id        — was there session context?
model             — was the model name captured?
user-agent        — was the tool's user-agent available?
file-diff         — was the inserted diff captured? (always 'provided')
file-context      — did the file context match to the capture?
workspace         — was workspace context available?

Each entry carries a status: provided, missing, or unknown.

This matters because it tells you precisely what you know about a given insertion and what you don't. A record with prompt-body: missing and promptStatus: 'not-captured' is not the same as no record — it is an explicit declaration that the prompt gap exists. That gap is auditable. An audit trail with explicit gaps is categorically more useful than a label with no gaps declared.
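As a rough sketch of how a reviewer might read those declarations, the snippet below groups the 10 slots by status. The Capability shape (a name plus a status) is an assumption based on the list above, since the real ProvenanceEventCapability type is not shown in this post, and declaredGaps is a hypothetical helper, not a LineageLens API.

```typescript
type CapabilityStatus = 'provided' | 'missing' | 'unknown';

// The 10 slot names listed above.
const CAPABILITY_NAMES = [
  'prompt-body', 'response-body', 'headers', 'request-id', 'session-id',
  'model', 'user-agent', 'file-diff', 'file-context', 'workspace',
] as const;

type CapabilityName = (typeof CAPABILITY_NAMES)[number];

// ASSUMED shape: the real ProvenanceEventCapability may carry more fields.
type Capability = { name: CapabilityName; status: CapabilityStatus };

type GapReport = Record<'missing' | 'unknown' | 'undeclared', CapabilityName[]>;

// Hypothetical helper: lists every slot that is not 'provided'.
// 'undeclared' catches slots absent from the array altogether.
function declaredGaps(capabilities: Capability[]): GapReport {
  const byName = new Map(capabilities.map((c) => [c.name, c.status] as const));
  const gaps: GapReport = { missing: [], unknown: [], undeclared: [] };
  for (const name of CAPABILITY_NAMES) {
    const status = byName.get(name);
    if (status === undefined) {
      gaps.undeclared.push(name);
    } else if (status !== 'provided') {
      gaps[status].push(name);
    }
  }
  return gaps;
}

// Example: an insertion captured without proxy data.
const extensionOnly: Capability[] = [
  { name: 'file-diff', status: 'provided' },
  { name: 'file-context', status: 'provided' },
  { name: 'workspace', status: 'provided' },
  { name: 'prompt-body', status: 'missing' },
  { name: 'response-body', status: 'missing' },
  { name: 'model', status: 'unknown' },
];

console.log(declaredGaps(extensionOnly));
```

The output is the point: an auditor gets an explicit list of what was not captured, instead of having to infer it from silence.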

The VS Code co-author trailer has no gap declarations. It has no granularity at all, so there is nothing in it that could be marked as missing.

Capture time vs. commit time

The harder architectural point: by the time you are in a git commit, you have already lost the evidence.

The prompt body is not stored anywhere after generation. The model name was in the HTTP response header. The raw response body was discarded after the tool processed it. The timing data exists only in the interval between the request and the file write.

None of that is in the commit. None of it can be recovered retroactively.

LineageLens captures the ProviderAgnosticProvenanceEvent at the insertion event — before the diff even exists as a file change. The observedAtIso timestamp records when the VS Code extension detected the text entering the buffer. The requestAtIso and responseAtIso timestamps come from the proxy intercept that happened seconds or minutes before. By the time you type a commit message, the provenance record has already been stored.
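The four timestamps can be sanity-checked against each other. The sketch below is an illustration under stated assumptions, not the LineageLens correlation logic, which this post does not describe: it assumes timingDifferenceMs means insertion time minus model response time, and both function names are hypothetical.

```typescript
// Same field names as the timestamps block in the schema above.
type Timestamps = {
  observedAtIso: string;
  insertedAtIso: string;
  requestAtIso: string | null;
  responseAtIso: string | null;
};

// ASSUMPTION: timing difference = insertedAt - responseAt, in milliseconds.
// Returns null when there was no proxy capture or a timestamp is unparseable.
function timingDifferenceMs(t: Timestamps): number | null {
  if (t.responseAtIso === null) return null;
  const inserted = Date.parse(t.insertedAtIso);
  const responded = Date.parse(t.responseAtIso);
  if (Number.isNaN(inserted) || Number.isNaN(responded)) return null;
  return inserted - responded;
}

// Hypothetical check: every captured timestamp falls in the expected order,
// request -> response -> inserted -> observed. Null entries are skipped.
function isChronological(t: Timestamps): boolean {
  const captured = [t.requestAtIso, t.responseAtIso, t.insertedAtIso, t.observedAtIso]
    .filter((iso): iso is string => iso !== null)
    .map((iso) => Date.parse(iso));
  let previous = -Infinity;
  for (const ms of captured) {
    if (Number.isNaN(ms) || ms < previous) return false;
    previous = ms;
  }
  return true;
}

const example: Timestamps = {
  requestAtIso: '2026-05-01T10:00:00.000Z',
  responseAtIso: '2026-05-01T10:00:02.000Z',
  insertedAtIso: '2026-05-01T10:00:02.350Z',
  observedAtIso: '2026-05-01T10:00:02.360Z',
};

console.log(timingDifferenceMs(example)); // 350
console.log(isChronological(example));    // true
```

None of these inputs exist at commit time, which is why a check like this can only run against records captured at insertion.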

A git trailer is a retroactive label. Provenance is an evidence chain that exists before the label does.

What actually changed after the VS Code incident

Microsoft reverted the default. They added a consent gate. They clarified that disableAIFeatures: true now also disables the co-authoring trailer.

None of that gives you a provenance record. You still do not know which lines in a given commit were AI-generated. You still cannot answer "what did Copilot generate in auth.py last month" from git history alone.
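With stored provenance events, that question becomes a plain filter. The sketch below is illustrative only: StoredEvent is a trimmed-down record, its filePath field is hypothetical (the schema excerpt above does not show where the file path lives), and insertionsFor is not a LineageLens API.

```typescript
// Trimmed-down record for illustration. The filePath field is HYPOTHETICAL:
// the schema excerpt in this post does not show where the file path is stored.
type StoredEvent = {
  eventId: string;
  filePath: string;
  timestamps: { insertedAtIso: string };
  source: { adapterName: string | null };
  diff: { insertedText: string; netAddedLines: number };
};

type InsertionQuery = {
  adapterName: string; // e.g. 'copilot'
  fileSuffix: string;  // e.g. 'auth.py'
  fromIso: string;     // inclusive start of the window
  toIso: string;       // exclusive end of the window
};

// Hypothetical helper: all insertions by one adapter into one file in a time window.
function insertionsFor(events: StoredEvent[], query: InsertionQuery): StoredEvent[] {
  const from = Date.parse(query.fromIso);
  const to = Date.parse(query.toIso);
  return events.filter((event) => {
    const insertedAt = Date.parse(event.timestamps.insertedAtIso);
    return (
      event.source.adapterName === query.adapterName &&
      event.filePath.endsWith(query.fileSuffix) &&
      insertedAt >= from &&
      insertedAt < to
    );
  });
}

const events: StoredEvent[] = [
  {
    eventId: 'evt-1',
    filePath: 'services/auth.py',
    timestamps: { insertedAtIso: '2026-05-12T09:30:00.000Z' },
    source: { adapterName: 'copilot' },
    diff: { insertedText: 'def verify_token(token):\n    ...\n', netAddedLines: 2 },
  },
  {
    eventId: 'evt-2',
    filePath: 'services/billing.py',
    timestamps: { insertedAtIso: '2026-05-14T11:00:00.000Z' },
    source: { adapterName: 'copilot' },
    diff: { insertedText: 'TAX_RATE = 0.2\n', netAddedLines: 1 },
  },
];

const hits = insertionsFor(events, {
  adapterName: 'copilot',
  fileSuffix: 'auth.py',
  fromIso: '2026-05-01T00:00:00.000Z',
  toIso: '2026-06-01T00:00:00.000Z',
});

console.log(hits.map((event) => event.eventId)); // [ 'evt-1' ]
```

The same query against git history has nothing to filter on beyond the trailer's name and email.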

The incident forced consent around labeling. That is progress. It did not touch the underlying gap: labeling that something was AI-assisted is not the same as recording what the AI actually did.

The practical implication

If you are on a team shipping AI-generated code — and 84% of development teams are, per the 2026 Stack Overflow Developer Survey — you are almost certainly making three implicit assumptions:

That your CI pipeline or git history contains enough attribution information to answer an audit question.
That "Co-authored-by" or an equivalent label satisfies your traceability obligations.
That you could reconstruct the provenance of a specific function if you had to.

All three assumptions are likely wrong for the same reason: commit-time labeling cannot carry insertion-time evidence.

The enforcement window for Articles 11 and 12 of the EU AI Act opens in August 2026. Being able to answer "which AI model generated this code, under what prompt, at what risk level?" is going to become a routine compliance requirement.

When it does, a git trailer is not going to be a defensible answer.

Try it

LineageLens Base is a free VS Code extension that starts capturing provenance events at insertion time today, even without proxy infrastructure. Lite, Plus, and Max tiers add proxy capture for full prompt, model, and response-body fields. The full architecture details are at lineage-website.vercel.app. The Hashnode post goes deeper on schema design tradeoffs.

One question for the comments: what data would your team actually need to survive a security audit of your AI-generated code? Not in theory — what specific fields would an auditor ask for?

Top comments (3)

Yunetzi • Jun 13

Been there—thinking a co-authored-by tag means real ownership, until audits demand verifiable history. This piece nails the difference and gives a much-needed wake-up call.

Praveen • Jun 13

Thanks! I think that's exactly the gap the incident exposed. A co-author tag can indicate AI involvement, but when audit, security, or compliance questions show up later, teams usually need evidence, context, and traceability—not just attribution. That's the distinction I wanted to explore here.

Praveen • Jun 13

What do you think about it?