DEV Community: Praveen

Read the new post!!

Praveen — Mon, 13 Jul 2026 06:21:11 +0000

Slopsquatting: Your Coding Agent Is a Supply-Chain Attack Vector (and How We Gate It)

#discuss #devops #security #news

Praveen — Mon, 13 Jul 2026 06:20:42 +0000

There's a category of supply-chain attack that only exists because of AI coding agents, and it has
one of the better names in security: slopsquatting.

The mechanic takes one paragraph to explain. Large language models hallucinate package names. Not
randomly — consistently. Ask enough models to scaffold a FastAPI service and a measurable fraction
will import helper packages that don't exist, and the same phantom names recur across models and
prompts. Attackers noticed. They harvest commonly hallucinated names, register them on npm and PyPI
with malicious install hooks, and wait. The next time an agent hallucinates that package,
npm install doesn't fail — it succeeds, and the attacker's postinstall script runs with whatever
credentials your dev machine or CI runner holds.

Why this defeats code review

Every classic review signal is green. The import statement is idiomatic. The package resolves. The
lockfile updates cleanly. There is no suspicious diff hunk to squint at, because the malicious code
isn't in the diff — it's on the registry, behind a name the model made up.

The question a reviewer would need to ask is not "is this code correct?" but "why did this
dependency appear, and did a human decide to trust it?" That's a provenance question, and diffs
don't carry provenance.

Typosquatting is the same problem with an older pedigree — crossenv vs cross-env,
reqeusts vs requests — and agents make it worse, because an LLM generating a package name
token-by-token is exactly the kind of writer that produces near-miss strings at scale.

How LineageLens Trellis gates this

LineageLens Trellis is a governed runner for AI coding agents (Claude Code, Codex CLI, Gemini CLI,
OpenCode) — it correlates what the model claimed to do (proxy traffic), what actually changed
on disk (content hashes), and what persisted into git, into one record per edit. That correlation
layer is what makes dependency gating more than a linter.

When an agent edit touches a manifest (package.json, requirements.txt, lockfiles), the
supply-chain guardrail in packages/core runs two independent checks:

1. Registry verification (online). Does the package exist? How old is it? How many releases,
how many maintainers? A package registered eleven days ago with one release and a postinstall
hook, introduced by an agent rather than a human, is a fundamentally different risk object than
lodash. Hallucinated-but-unregistered names fail outright — the best possible outcome, because
you caught the hallucination before an attacker registered it.

2. Edit-distance typosquat detection (offline). Every introduced name is compared against a
corpus of high-download known packages. Small Levenshtein distance to a popular package + the
popular package not being the one imported = flag. This check is deliberately offline: it must
work in air-gapped CI, add zero network latency to the hot path, and never leak your dependency
graph to a third-party API.
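The offline check in step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the code in packages/core: the POPULAR corpus, the function names, and the max_distance=2 threshold are all assumptions made for the example.

```python
# Hypothetical corpus; a real one would be built from registry download statistics.
POPULAR = {"requests", "numpy", "pandas", "lodash", "express", "cross-env"}


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def typosquat_candidates(name: str, corpus=POPULAR, max_distance: int = 2) -> list:
    """Return popular packages that `name` is suspiciously close to."""
    if name in corpus:
        return []  # an exact match is the popular package itself
    return sorted(p for p in corpus if levenshtein(name, p) <= max_distance)


print(typosquat_candidates("crossenv"))
print(typosquat_candidates("reqeusts"))
```

Plain Levenshtein counts a swapped pair of letters as two edits, which is why the sketch uses a threshold of 2 rather than 1. No network call is involved, so the check runs the same way in an air-gapped runner.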

agent edit → manifest touched?
              ├─ yes → registry check ──┐
              │        edit-distance ───┼→ risk event (attributed to
              │        check (offline)  │   session + prompt + tool-call)
              │                         └→ pre-commit hook blocks if risky
              └─ no  → normal provenance pipeline

Because the flag lands in the same provenance timeline as everything else, the alert isn't
"suspicious package somewhere in the repo." It's "session #47, running Codex CLI, prompted to
'add retry logic', attempted to introduce requets at 14:32, blocked at pre-commit." Attribution
turns an incident into a two-minute investigation.

The practical takeaway

Even if you never touch our tool, do these three things if agents write code in your org:

1. Gate manifest edits separately from code edits. A dependency introduction is a trust decision, not a code change. Route it to a human.
2. Check package age and release count at install time, not audit time. npm audit runs after the postinstall hook already fired.
3. Record which agent/prompt introduced each dependency. When (not if) something gets flagged, attribution is the difference between an afternoon and a week.
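As a rough illustration of the second item, here is a sketch of a pre-install check over registry metadata. Everything in it is hypothetical: the metadata keys, the function name, and the 30-day and 3-release thresholds are assumptions for the example, and fetching the metadata from npm or PyPI is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def assess_new_dependency(meta: Optional[dict], now: datetime,
                          min_age_days: int = 30, min_releases: int = 3) -> list:
    """Return reasons to hold a dependency for human review; empty means no flag.

    `meta` is None when the registry has no such package. Otherwise it is a dict
    with hypothetical keys: first_published (datetime), release_count (int),
    has_install_hook (bool).
    """
    if meta is None:
        return ["package is not registered (possible hallucinated name)"]
    reasons = []
    age = now - meta["first_published"]
    if age < timedelta(days=min_age_days):
        reasons.append(f"package is only {age.days} days old")
    if meta["release_count"] < min_releases:
        reasons.append(f"only {meta['release_count']} release(s) published")
    if meta.get("has_install_hook"):
        reasons.append("package declares an install hook")
    return reasons


now = datetime(2026, 7, 13, tzinfo=timezone.utc)
suspicious = {
    "first_published": now - timedelta(days=11),
    "release_count": 1,
    "has_install_hook": True,
}
for reason in assess_new_dependency(suspicious, now):
    print("HOLD:", reason)
```

The point of running this before install rather than in a later audit is ordering: the decision has to be made before any install hook gets a chance to execute.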

Trellis is open source — the detection lives in packages/core/src/dependencies.ts if you want to
read or steal the heuristics: https://github.com/karnati-praveen/lineagelens. Deeper dive into the
design tradeoffs (why offline, why edit distance over ML classifiers) is on my Hashnode:
[Hashnode link].

Question for the comments: if you run coding agents in CI today, does anything stand between
the agent and npm install? I'm collecting real-world setups — most answers I've heard so far are
"nothing, honestly."

81% of Engineering Teams Don't Know Where Their AI-Generated Code Lives in Production

Praveen — Sat, 20 Jun 2026 05:22:19 +0000

The Cloud Security Alliance published a research note this month with a number that should make every engineering manager uncomfortable: 81% of organizations surveyed had no complete visibility into where AI-generated code lives in their production systems.

Not "limited visibility." Complete absence.

Every one of those teams uses Copilot, Cursor, Claude Code, or one of a dozen other AI coding tools. Code is being generated at speed. It's landing in commits, getting merged, and running in production. But the teams cannot answer a basic audit question: which lines in src/routes/auth.py were written by an AI?

This is not a hypothetical governance problem. It's a production security problem.

Why git blame doesn't solve this

git blame returns a username and a commit SHA. It was designed to answer "who wrote this line" in a world where humans wrote all code.

AI coding tools broke that assumption without replacing the tooling. When a developer accepts a Copilot suggestion, the commit author is the developer. When Claude Code generates a 60-line authentication function in an agentic session, the commit author is still the developer. The AI is invisible in the commit history.

The provenance metadata -- which model, which prompt, what the AI was asked to do -- exists for exactly one window: the moment of generation. The AI tool has it. The moment the developer moves on, it's gone. Commit by commit, sprint by sprint, code with no recoverable provenance accumulates into an unattributed backlog that reaches further and further back through your git history.

What March 2026 showed us

Georgia Tech's Vibe Security Radar tracked 35 CVEs in March 2026 alone that were directly attributable to AI-generated code. Researchers estimate the true count is 5 to 10 times higher, because most vulnerabilities are never traced back to an AI tool -- you need the generation record to do that, and almost no team has one.

Separately, Pillar Security disclosed a "Rules File Backdoor" class of attack: hidden unicode characters in Cursor .cursorrules files and Copilot config files that silently steer the model to inject malicious code during generation. The generated code looks normal. The review sees nothing unusual. The AI did exactly what it was secretly instructed to do.

If you can't distinguish AI-generated lines from human-written lines, you can't contain this attack class. The blast radius is "everything an AI tool touched" -- and without provenance records, you don't know what that is.

How LineageLens closes the gap

LineageLens sits as a proxy between your AI coding tool and the upstream model API. Every edit event -- tool_use blocks from Claude, apply_patch DSL from Codex CLI, functionCall from Gemini -- gets captured as a provenance record with model, prompt, file path, timestamp, risk category, and a confidence score.

Those records get mapped back onto current file contents by the blame engine. The matching algorithm is whitespace-normalized contiguous block matching (newest record wins, exactly like git blame), with a fuzzy per-line fallback for blocks that were edited after insertion.
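To make the matching idea concrete, here is a small Python sketch. It is not the LineageLens blame engine: it uses difflib from the standard library as a stand-in for the contiguous block matcher, and the record shape (model, timestamp, lines) and the 0.8 fuzzy threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(line: str) -> str:
    """Collapse whitespace so re-indentation does not break a match."""
    return " ".join(line.split())


def blame(file_lines, records, fuzzy_threshold=0.8):
    """Return one entry per file line: ("AI", model), ("AI?", model), or None."""
    norm_file = [normalize(line) for line in file_lines]
    result = [None] * len(file_lines)
    # Newest record first; a line that is already attributed is never overwritten.
    for rec in sorted(records, key=lambda r: r["timestamp"], reverse=True):
        block = [normalize(line) for line in rec["lines"]]
        matched = set()
        # Pass 1: contiguous runs of identical (normalized) lines.
        matcher = SequenceMatcher(None, block, norm_file, autojunk=False)
        for a, b, size in matcher.get_matching_blocks():
            for k in range(size):
                if block[a + k] and result[b + k] is None:
                    result[b + k] = ("AI", rec["model"])
                    matched.add(a + k)
        # Pass 2: fuzzy per-line fallback for record lines edited after insertion.
        for i, rec_line in enumerate(block):
            if i in matched or not rec_line:
                continue
            for j, file_line in enumerate(norm_file):
                if result[j] is not None or not file_line:
                    continue
                if SequenceMatcher(None, rec_line, file_line).ratio() >= fuzzy_threshold:
                    result[j] = ("AI?", rec["model"])
                    break
    return result


current = [
    'def authenticate_user(token: str):',
    '    payload = jwt.decode(token, SECRET, algorithms=["HS256"])',
    '    return db.query(User).filter_by(id=payload["sub"]).first()',
    '',
    'def logout(session_id: str):',
]
captured = [{
    "model": "claude-opus-4-8",
    "timestamp": "2026-06-10",
    "lines": [
        'def authenticate_user(token: str):',
        '  payload = jwt.decode(token, SECRET, algorithms=["HS256"])',
        '  return db.query(User).filter_by(id=payload["sub"]).one()',
    ],
}]
for marker, line in zip(blame(current, captured), current):
    print(marker, "|", line)
```

The first two lines match exactly after whitespace normalization, the third was edited after insertion and only matches fuzzily, and the remaining lines stay unattributed.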

The output for a single file:

-- lineagelens blame - src/routes/auth.py

AI   claude-opus-4-8  2026-06-10   42 | def authenticate_user(token: str):
AI   claude-opus-4-8  2026-06-10   43 |     payload = jwt.decode(token, SECRET, algorithms=["HS256"])
AI?  claude-opus-4-8  2026-06-10   44 |     return db.query(User).filter_by(id=payload["sub"]).first()
                                    45 |
                                    46 | def logout(session_id: str):

AI means exact contiguous match (high confidence). AI? means the line was in the original AI insertion but has been edited since. Lines with no marker are attributed to human authorship.

For a whole repository, the Risk Discovery command is one line:

lineagelens report . \
  --url https://lineagelens.internal \
  --token "$JWT" \
  --workspace my-team \
  --review-status unreviewed \
  --category auth

This walks every non-binary file in the repo, matches live content against provenance records, filters to records with no associated human review, and filters further to the auth risk category. The output is a ranked table: which files have the most unreviewed AI-generated auth code, right now, in the live tree.

-- lineagelens report - my-repo -- filter: unreviewed, auth

  src/routes/auth.py     ████████████████░░░░  80.3%  53/66 lines
  src/middleware/jwt.py  ██████████░░░░░░░░░░  51.1%  23/45 lines
  src/utils/tokens.py    ████░░░░░░░░░░░░░░░░  18.4%   9/49 lines

  Repo total: 85/160 lines AI-attributed (53.1%) across 3 of 89 scanned -- filter: unreviewed, auth

That result makes two claims. The "in the live tree" half comes from the blame engine running against live file contents, not git history. The "unreviewed" half comes from the reviewStatus filter against the provenance record store.

The --json flag for CI

Scriptable output with --json:

PERCENT=$(lineagelens --json report . \
  --review-status unreviewed --category auth | jq '.stats.percent')

if (( $(echo "$PERCENT > 50" | bc -l) )); then
  echo "::error::Repo has ${PERCENT}% unreviewed AI auth code -- block merge"
  exit 1
fi

The install path

The CLI works on every tier, including the Base tier (free VS Code extension, no backend required). For the --review-status and --category filters, you need backend mode.

pip install lineagelens-cli

# Base mode: export from the VS Code extension first
lineagelens report . --input captures.json

# Backend mode: Risk Discovery with filters
lineagelens report . \
  --url https://your-lineagelens-instance \
  --token "$JWT" \
  --workspace your-workspace \
  --review-status unreviewed \
  --category auth

The practical question

81% of teams don't have this answer. Most won't know until an incident forces the archaeology. The archaeology is slow, incomplete, and almost always comes back inconclusive -- because the generation records are gone.

If a CVE was traced to an AI-generated line in your auth path tomorrow, how would you reconstruct what was generated, by which model, with what prompt context, and whether any human reviewed it before it shipped? If the answer is "we'd look at git blame and the PR comments," you're working with evidence that was never designed to answer that question.

Star us on GitHub: github.com/lineagelens

Try the CLI: pip install lineagelens-cli

More on the architecture: lineage-website.vercel.app

What's the highest-risk unreviewed AI code you've found in your own codebase? Auth paths, payment handlers, or something weirder -- drop it in the comments.

think about this !!

Praveen — Wed, 17 Jun 2026 04:06:23 +0000

"Approved" Is Not Evidence of a Review — Here Is What Evidence Actually Looks Like

#ai #mcp #developers #cli

Praveen — Wed, 17 Jun 2026 04:06:07 +0000

Here is a scenario that happens constantly on teams using AI coding tools.

A developer is wrapping up before a standup. GitHub shows a PR open for 18 hours — 340 lines added to src/routes/auth.py, mostly AI-generated by Copilot. They open it on their second monitor, scroll through to the bottom in about four seconds, and click Approve. "Looks fine." PR merges. Status: approved.

Two weeks later someone asks: was this auth refactor actually reviewed?

The answer stored in every system in your stack is: yes.

The real answer is: not in any meaningful sense.

The gap between "approved" and "reviewed"

The problem is structural, not behavioral. Most teams do not have negligent developers — they have too many PRs, too many AI-generated diffs that look syntactically correct at a glance, and no system that distinguishes a 30-second rubber-stamp from a 20-minute line-by-line review. Both produce the same approved status.

For non-AI-generated code, this has always been a problem. For AI-generated code, it is a materially different one. Human-written code embeds the author's context — you can often infer the intent from the surrounding code, the variable names, the structure. AI-generated code can look completely idiomatic while doing something the developer who triggered it did not fully anticipate. The code is a downstream artifact of a prompt. Without the prompt context and a real review of what the model produced, "approved" is a label on a box you did not open.

What LineageLens captures when a review happens

LineageLens's backend has a POST /review/attest endpoint that records a human review as a signed attestation. The payload includes four fields that go beyond a simple verdict:

{
  "scopeRef": "pr/12345",
  "linesReviewed": 340,
  "secondsOnDiff": 42,
  "commentCount": 0,
  "verdict": "approved"
}

linesReviewed, secondsOnDiff, and commentCount are the raw behavioral signals from the code review session. The endpoint then computes a depth_signal from these — not a replacement for the verdict, but an additional classification layer that runs on top of it.

The depth_signal formula

The full formula is documented in lineagelens-backend/app/services/human_review_service.py. It is intentionally transparent — if you are going to use a score to block merges, the thresholds should be auditable:

# Input signals
time_per_line  = seconds_on_diff / max(lines_reviewed, 1)
comment_count  = inline or PR comments left by the reviewer
lines_reviewed = AI-flagged lines the reviewer claims to have seen

# Scoring (0–100):
time_score     = min(time_per_line / 5.0, 1.0) × 40   → max 40 pts at ≥ 5 s/line
comment_score  = min(comment_count  / 3.0, 1.0) × 30   → max 30 pts at ≥ 3 comments
coverage_score = min(lines_reviewed / 50.0, 1.0) × 30  → max 30 pts at ≥ 50 lines

# Bands:
#   shallow   raw_score < 35
#   adequate  35 ≤ raw_score < 70
#   deep      raw_score ≥ 70

The time signal gets the most weight (40 points) because time-per-line is the hardest signal to fake without actually reading. A developer who spent 30 seconds on a 340-line diff clocked 0.088 seconds per line. To score maximum time points on that diff, they would need to spend 28 minutes on it — roughly 5 seconds per line.

The comment signal (30 points) captures engagement. A reviewer who left three or more inline comments clearly engaged with the diff content. Zero comments on 340 AI-generated lines in an auth file is a meaningful absence.

The coverage signal (30 points) is the least informative in isolation — linesReviewed is self-reported. But combined with the time signal, it creates a consistency check: if you claim to have reviewed 340 lines in 42 seconds, the time_per_line is 0.12 seconds. That floors the time_score and caps the achievable total regardless of coverage.

The rubber-stamp override

The most important rule in the formula is not the scoring — it is this:

# Implausibly-fast override: time_per_line < 1.0 s → always "shallow"
# (flags rubber-stamp approvals of large diffs, e.g. 3 s for 400 lines).

if time_per_line < 1.0:
    return "shallow", 0.0

If the reviewer spent less than one second per line — regardless of comment count, regardless of lines_reviewed, regardless of verdict — the result is shallow with a raw score of zero.

This catches the scenario at the top of this article. 4 seconds on 340 lines = 0.012 seconds per line. The gate returns shallow. The signed attestation records it. A merge gate configured to require adequate or higher rejects the PR.

The threshold of 1.0 second per line is a judgment call, not a scientific constant. A genuinely fast reader who knows the codebase well might clock 2 seconds per line. A reviewer skimming for obvious errors might spend 1.5 seconds. The override is calibrated to catch clearly impossible review speeds — not to penalize fast reviewers who know what they are looking at.
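Putting the scoring rules and the override together, the classification can be sketched as a single function. This is reconstructed from the formula as described in this post, not copied from human_review_service.py, so treat the function signature and the return shape as assumptions.

```python
def compute_depth_signal(lines_reviewed: int, seconds_on_diff: float,
                         comment_count: int):
    """Return (band, raw_score) using the weights and bands described above."""
    time_per_line = seconds_on_diff / max(lines_reviewed, 1)

    # Rubber-stamp override: implausibly fast reviews are always shallow.
    if time_per_line < 1.0:
        return "shallow", 0.0

    time_score = min(time_per_line / 5.0, 1.0) * 40       # max 40 at >= 5 s/line
    comment_score = min(comment_count / 3.0, 1.0) * 30    # max 30 at >= 3 comments
    coverage_score = min(lines_reviewed / 50.0, 1.0) * 30 # max 30 at >= 50 lines
    raw = time_score + comment_score + coverage_score

    if raw < 35:
        band = "shallow"
    elif raw < 70:
        band = "adequate"
    else:
        band = "deep"
    return band, raw


# The payload from earlier in the post: 340 lines, 42 seconds, no comments.
print(compute_depth_signal(340, 42, 0))
# A 28-minute review of the same diff with three comments.
print(compute_depth_signal(340, 1700, 3))
```

The override runs before any scoring, so comments and coverage cannot rescue a review that was too fast to be plausible.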

What the attestation record looks like

After compute_depth_signal() runs, record_review() signs the result and persists two rows:

Attestation row:
  subject_type: "review"
  subject_id: "pr/12345"
  statement_json: {
    "reviewer": "user-uuid",
    "lines_reviewed": 340,
    "seconds_on_diff": 42,
    "comment_count": 0,
    "depth_signal": "shallow",
    "depth_score": 0.0,
    "verdict": "approved"
  }
  signature: <Ed25519 signature over canonical JSON>
  public_key_id: <16-hex fingerprint>

HumanReviewAttestation row:
  scope_ref: "pr/12345"
  depth_signal: "shallow"
  verdict: "approved"
  attestation_id: <FK to Attestation>

The signed statement means the depth classification cannot be retroactively altered without invalidating the signature. The audit trail is not just what someone claimed — it is what the behavioral signals said, cryptographically bound to the review event.

This is a different kind of evidence than a closed PR. A closed PR tells you someone clicked approve. The attestation tells you how long they spent, what depth that corresponded to, and whether the classification was shallow, adequate, or deep.

The CI merge gate

The gate endpoint sits at POST /review/gate/{pr_ref}?min_depth=adequate. It is designed to be called by CI (X-API-Key auth, not JWT) at the point a PR is ready to merge:

POST /review/gate/pr/12345?min_depth=adequate

→ 200 OK
{
  "prRef": "pr/12345",
  "passed": false,
  "reason": "Review depth 'shallow' is below minimum required 'adequate'.",
  "depthSignal": "shallow",
  "verdict": "approved",
  "minDepthRequired": "adequate"
}

The gate passes only when depth_signal >= min_depth AND verdict == "approved". Three valid values for min_depth: shallow, adequate, deep. The depth ranking is ordinal: shallow=0, adequate=1, deep=2.
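The pass condition is small enough to sketch directly. The function name below is hypothetical and the response keys mirror the example response above; the reason text for a non-approved verdict is my own placeholder, since the post only shows the depth failure message.

```python
DEPTH_RANK = {"shallow": 0, "adequate": 1, "deep": 2}


def evaluate_gate(pr_ref: str, depth_signal: str, verdict: str,
                  min_depth: str = "adequate") -> dict:
    """Pass only when depth_signal >= min_depth AND verdict == 'approved'."""
    if min_depth not in DEPTH_RANK:
        raise ValueError(f"min_depth must be one of {sorted(DEPTH_RANK)}")

    deep_enough = DEPTH_RANK[depth_signal] >= DEPTH_RANK[min_depth]
    passed = deep_enough and verdict == "approved"

    if passed:
        reason = None
    elif not deep_enough:
        reason = (f"Review depth '{depth_signal}' is below "
                  f"minimum required '{min_depth}'.")
    else:
        reason = f"Verdict is '{verdict}', not 'approved'."  # placeholder wording

    return {
        "prRef": pr_ref,
        "passed": passed,
        "reason": reason,
        "depthSignal": depth_signal,
        "verdict": verdict,
        "minDepthRequired": min_depth,
    }


print(evaluate_gate("pr/12345", "shallow", "approved"))
```

Because the ranking is ordinal, a deep review satisfies an adequate gate, but an approved verdict never compensates for insufficient depth.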

The practical effect: for high-sensitivity paths (auth, payments, security), you set min_depth=deep and require at least 70 points. Comments and coverage together top out at 60, so some time score is always needed: for example, roughly 5 seconds per line plus either 3+ comments or 50+ lines reviewed, or at least 1.25 seconds per line with both 3+ comments and 50+ lines reviewed. A perfect 100 needs all three signals maxed. For standard paths, min_depth=adequate at 35+ points is a reasonable floor that blocks 3-second approvals while allowing fast but engaged reviewers through.

ASCII flow diagram

Developer reviews AI-generated PR
          │
          ▼
  POST /review/attest
  { linesReviewed, secondsOnDiff, commentCount, verdict }
          │
          ▼
  compute_depth_signal()
  ┌─────────────────────────────────────────────────────┐
  │  time_per_line = seconds / lines                    │
  │  if time_per_line < 1.0 → return ("shallow", 0.0)  │
  │  time_score     = min(tpl/5.0, 1.0) × 40           │
  │  comment_score  = min(cc/3.0, 1.0) × 30            │
  │  coverage_score = min(lr/50.0, 1.0) × 30           │
  │  raw = sum → band: shallow / adequate / deep        │
  └─────────────────────────────────────────────────────┘
          │
          ▼
  sign_attestation() [Ed25519]
  persist Attestation + HumanReviewAttestation
          │
          ▼
  CI calls POST /review/gate/{pr_ref}?min_depth=adequate
          │
     ┌────┴─────┐
     │          │
  passed     blocked
  (merge)    (reason: "Review depth 'shallow' is below 'adequate'")

What this does not solve

The depth signal is a behavioral proxy, not a comprehension test. A developer who knows how to game the system can sit on a diff for 30 minutes, leave three generic comments, and achieve deep classification without actually understanding what the AI produced. This is not a flaw unique to this design — any measurable signal can be gamed. The value is not that the system is ungameable; it is that it sets a minimum bar that catches the most common failure mode (pure rubber-stamping) and creates an auditable record of the behavioral signals.

The formula is also calibrated for AI-generated code review, not general PR review. The 5-seconds-per-line maximum for time_score reflects the assumption that AI-generated code may require more careful reading than human-written code where you know the author's patterns. Teams reviewing dense algorithm implementations might reasonably argue the bar should be higher. The comment threshold of 3 might be too low for a 340-line diff. These are parameters worth arguing about — and they should be.

The connection to Risk Discovery

The depth_signal system exists because reviewStatus alone is not a sufficient filter. The full Risk Discovery query in LineageLens is:

lineagelens report . --unreviewed --category auth

Without the depth classification, "unreviewed" only catches code with no review record at all. With it, you can extend the definition: code where the only review on record is shallow is operationally closer to unreviewed than to reviewed. The filter becomes meaningfully stricter.

More at lineage-website.vercel.app. The Hashnode post goes deeper on the design tradeoffs I considered before settling on this formula.

One question for the comments: Is 1 second per line the right floor for the rubber-stamp override? What threshold would you set for auth-path code — and what signals am I missing that would make this more reliable?

Let's discuss it!!

Praveen — Mon, 15 Jun 2026 05:49:20 +0000

One OpenAI-Compatible Adapter. Seven AI Coding Tools. Three Very Different Wire Formats.

#discuss #security #showdev #startup

Praveen — Mon, 15 Jun 2026 05:49:05 +0000

If your team uses Aider, Cline, Continue, Copilot CLI, Goose, or Windsurf, plus any OpenAI-compatible backend such as Azure, Groq, Fireworks, Mistral, or together.ai, you now have a single provenance capture layer for all of them. It shipped this week in lineagelens-proxy/adapters/openai_chat.py.

But writing that adapter revealed something worth documenting: "speaks the Chat Completions format" does not mean "expresses code edits the same way." There are three distinct patterns in the wild, and missing any one of them produces silent capture gaps.

The three edit expression patterns

Pattern A: Tool-call edits

The canonical OpenAI way. The model emits choices[].message.tool_calls[], each with a function name and JSON-serialized arguments. File edits arrive as structured arguments to tools like write_file, str_replace_editor, or apply_patch.

In streaming mode this means assembling argument JSON fragments across SSE chunks — each tool_calls[].function.arguments delta is a partial string that only makes sense when concatenated with every prior delta for the same tool_calls[].index. The adapter accumulates these per (choice_index, tool_call_index) before parsing.
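A minimal sketch of that accumulation step, assuming chunks have already been parsed from SSE into dicts in the Chat Completions streaming shape. The function name and the returned record shape are hypothetical, not the adapter's own.

```python
import json
from collections import defaultdict


def assemble_tool_calls(chunks):
    """Concatenate streamed argument fragments per (choice_index, tool_call_index)."""
    pending = defaultdict(lambda: {"id": None, "name": None, "arguments": ""})
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = pending[(choice.get("index", 0), tc.get("index", 0))]
                fn = tc.get("function") or {}
                # id and name usually arrive once; argument text arrives in pieces.
                slot["id"] = tc.get("id") or slot["id"]
                slot["name"] = fn.get("name") or slot["name"]
                slot["arguments"] += fn.get("arguments") or ""

    calls = []
    for key in sorted(pending):
        slot = pending[key]
        try:
            args = json.loads(slot["arguments"])
        except json.JSONDecodeError:
            args = None  # keep the record, but do not raise into the caller
        calls.append({**slot, "arguments": args})
    return calls


stream = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "write_file", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"path": "a.py", '}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"content": "x = 1"}'}}]}}]},
]
print(assemble_tool_calls(stream))
```

Parsing only after the last fragment matters: the intermediate strings are not valid JSON on their own.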

The adapter also handles the legacy singular function_call field, which OpenAI deprecated but older tool integrations still emit.

Pattern B: Text-content edits

This is where it gets interesting. Several major tools — Aider being the most prominent — do not use tool calls at all. They send edits as structured text inside the assistant message content. Three sub-formats exist:

1. Aider SEARCH/REPLACE blocks: <<<<<<< SEARCH, =======, >>>>>>> REPLACE delimiters. The filename is on the line immediately before the opening fence, not inside the block itself. The adapter has to look backwards in the preceding text to find it.
2. Unified diffs: Standard --- a/file / +++ b/file / @@ format. The adapter parses these into per-file edit records.
3. Fenced code blocks: The fallback. A block with a path= hint in the info string, or a # file: path/to/file comment on the first line of the code. If neither is present, the block is still captured as file_path="proxy-capture"; it is never silently dropped.

The priority order matters: apply-patch DSL is tried first, then Aider SEARCH/REPLACE, then unified diff. Fenced code blocks only run when none of the structured formats matched. This ensures a single edit is never recorded twice.
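Here is a rough sketch of the SEARCH/REPLACE path and the first-match-wins ordering. It is illustrative only: the regex, the helper names, and the reuse of "proxy-capture" as the filename fallback are my assumptions, and the backwards filename lookup is a simple heuristic that would mistake a line of prose for a path.

```python
import re

FENCE = "`" * 3  # Markdown code fence marker

BLOCK_RE = re.compile(
    r"^<{5,9} SEARCH\n(?P<search>.*?)^={5,9}\n(?P<replace>.*?)^>{5,9} REPLACE",
    re.MULTILINE | re.DOTALL,
)


def filename_before(text: str, pos: int) -> str:
    """Walk backwards from the block to the nearest non-empty, non-fence line."""
    for line in reversed(text[:pos].splitlines()):
        line = line.strip()
        if line and not line.startswith(FENCE):
            return line
    return "proxy-capture"


def parse_search_replace(text: str) -> list:
    return [
        {
            "tool_name": "aider_search_replace",
            "file_path": filename_before(text, m.start()),
            "search": m["search"],
            "replace": m["replace"],
        }
        for m in BLOCK_RE.finditer(text)
    ]


def first_match(text: str, parsers) -> list:
    """Try parsers in priority order; the first one that finds edits wins."""
    for parse in parsers:
        edits = parse(text)
        if edits:
            return edits
    return []


message = (
    "src/routes/auth.py\n"
    + FENCE + "python\n"
    + "<<<<<<< SEARCH\nold = 1\n=======\nnew = 2\n>>>>>>> REPLACE\n"
    + FENCE + "\n"
)
print(first_match(message, [parse_search_replace]))
```

Stopping at the first parser that returns edits is what keeps a SEARCH/REPLACE block inside a code fence from also being recorded as a generic fenced block.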

Pattern C: Mixed responses

A single response can carry both tool calls and structured text content. This is more common than you might expect — some tools emit a tool call for the primary edit and then add context or diff output as text. The adapter runs both paths against every response.

The session key problem

When two developers are using the same proxy simultaneously and happen to send requests that produce overlapping tool_call_id values, you get an aliasing bug: two unrelated edits get cross-attributed.

The fix is a session fingerprint derived from the request, not a counter or timestamp:

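import hashlib

def _openai_chat_session_key(body_dict: dict, headers: dict) -> str:
    # Text of the first system/developer message, if any.
    # _content_to_text is a helper defined elsewhere in the adapter.
    system = ""
    for msg in (body_dict.get("messages") or []):
        if isinstance(msg, dict) and msg.get("role") in ("system", "developer"):
            system = _content_to_text(msg.get("content"))
            break

    # First 24 characters of whichever auth header is present.
    auth = ""
    for header_name in ("authorization", "x-api-key", "Authorization", "X-Api-Key"):
        v = headers.get(header_name)
        if v:
            auth = v[:24]
            break

    # Fingerprint: first 16 hex chars (64 bits) of SHA-256 over the combined string.
    raw = f"openai-chat|{system[:4096]}|{auth}"
    return hashlib.sha256(raw.encode("utf-8", errors="replace")).hexdigest()[:16]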

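SHA-256 over the system message prefix and the first 24 characters of the auth header. The result is a 16-hex-char (64-bit) fingerprint. The pending-edits store keys on (session_key, tool_call_id) tuples, which makes aliasing between sessions with different system prompts or credentials astronomically unlikely (about 1 in 1.8 × 10^19 for any given pair). Sessions that share both the system prompt and the auth prefix still map to the same key.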

The fail-open rule

Every parse path in the adapter is wrapped in exception handling that fails open: a JSON decode error, a malformed SSE chunk, a tool-call argument that does not map to any known edit shape — none of these surface as errors to the forwarding path. The response always passes through to the tool.

# Everything is fail-open: a parse error never raises into the forwarding path.

This is not a compromise. It is the design requirement. A governance layer that crashes the developer's tool gets disabled within hours of installation. The only viable architecture for a capture layer that developers leave running is one that can never interrupt the thing they care about — their coding session.
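In code, the rule amounts to a wrapper like the following. This is a sketch of the pattern, not the adapter's implementation; the function names and the logger name are hypothetical.

```python
import json
import logging

log = logging.getLogger("capture")


def capture_fail_open(parse, payload):
    """Run a capture parser; on any error, log it and report no edits."""
    try:
        return parse(payload)
    except Exception:
        log.exception("capture parse failed; forwarding response unchanged")
        return []


def forward(response_body: bytes, parse):
    """The response bytes are returned untouched whether or not capture worked."""
    edits = capture_fail_open(parse, response_body)
    return response_body, edits


# A malformed body: the parser raises, the caller still gets the original bytes.
body, edits = forward(b"{oops", lambda raw: json.loads(raw)["edits"])
print(body, edits)
```

The trade-off is explicit: a parse failure costs one missing provenance record, which is logged, rather than one broken coding session.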

What this enables for Risk Discovery

The concrete payoff is in the lineagelens report CLI. With full coverage across Aider, Cline, Windsurf, and every groq/mistral/Azure backend, you can now run:

lineagelens report . --unreviewed --category auth

And get back every AI-written line in your auth paths — regardless of which tool wrote it — that has never been reviewed. This query is only meaningful when your capture layer actually covers your whole stack. An adapter that misses Aider's text-content edits silently excludes every Aider-generated file from that result.

The path annotation on every captured record looks like this:

tool_name: "aider_search_replace" | "text_codeblock" | "str_replace_editor"
file_path: "src/routes/auth.py"
verb: "replace" | "write" | "add"
prompt_context: { model: "gpt-4o", system: "...", messages: [...] }

Three things that git blame will never give you: which tool made this change, what the model was, and what the developer asked for.

What is not yet covered

The adapter comment is explicit about the gaps:

# TODO(bedrock/vertex): Claude-via-Bedrock uses /model/{id}/invoke[-with-response-stream]
# on *.bedrock-runtime.*.amazonaws.com and Gemini-via-Vertex uses
# :generateContent / :streamGenerateContent on *-aiplatform.googleapis.com.
# Those are NOT chat/completions and are NOT captured here — they need their own adapters.

If your team routes through Bedrock or Vertex, this adapter does not cover you. Separate adapters are needed. This is the right design — assuming coverage for endpoints you have not explicitly tested is worse than documenting the gap.

Getting started

The adapter is part of the LineageLens proxy, which runs alongside your AI coding tool and intercepts traffic at the network layer. No tool modification required for tools that already target a configurable API endpoint.

Cross-reference: for Hashnode readers, I wrote about the broader proxy modularization and what each adapter handles at lineage-website.vercel.app.

Full source at lineage-website.vercel.app.

Which tools in your stack are you most worried about provenance gaps for — the chat-completions tools, or the proprietary backends (Cursor, GitHub Copilot) that cannot be proxied at all?

What's your opinion on this?

Praveen — Sun, 14 Jun 2026 05:14:12 +0000

Your Code Review Process Is Verbal. Here's What a Machine-Verifiable Proof of AI Code Safety Looks Like.

#showdev #ai #security #news

Praveen — Sun, 14 Jun 2026 05:13:51 +0000

Most code review processes produce one artifact: a merged PR. Someone approved it. The review presumably happened. But if an auditor asks you to prove that the AI-generated function in your auth service passed your risk policy — that the model was on your allowlist, that the risk score was below threshold, that a human actually approved it — what do you hand them?

A closed PR is not evidence of the above. It is evidence that someone clicked "Approve." The model identity, the risk state at merge time, whether the reviewer read the AI context or just the diff — none of that is in the PR.

This is the gap that machine-verifiable AI code certificates close.

The Problem Is Structural

When you merge AI-generated code today, you lose the generation context permanently. The commit records the diff. Git blame records the author. Nothing records which model generated it, what the prompt was, what the risk score was at insertion, or whether the human reviewer actually engaged with the full AI context.

Post-merge, you're reconstructing from memory and process documentation. That reconstruction will not survive a security audit. It certainly won't survive an EU AI Act compliance review, where Article 12 requires high-risk AI systems to keep automatic logs of events over their lifetime.

What an Indemnity Certificate Actually Contains

LineageLens's indemnity system issues certificates at three scopes: per-record, per-PR, and per-release. The evaluation runs against your workspace's active IndemnityPolicy.

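A policy has five configurable rules, plus a certificate lifetime (cert_ttl_days):

from pydantic import BaseModel, Field

class PolicyRules(BaseModel):
    max_risk_score: int = Field(default=70, ge=0, le=100)      # records scoring above this fail
    require_license_clean: bool = Field(default=False)
    require_human_review: bool = Field(default=False)
    allowed_models: list[str] = Field(default_factory=list)    # empty list = no allowlist check
    unknown_review_pass: bool = Field(default=False)           # escape hatch for missing review status
    cert_ttl_days: int = Field(default=90, ge=1, le=3650)      # certificate lifetime, not a rule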

When you call POST /indemnity/certificate with scope=pr and scope_ref=PR-442, the service fetches every provenance record tagged pr:PR-442, evaluates each one against all five rules, and either issues a certificate or returns a structured list of reasons why eligibility failed.

The evaluation is explicit:

# Risk check
if record.risk_score is not None and record.risk_score > max_risk:
    eligible = False
    reasons.append(
        f"Record {uid}: risk score {record.risk_score} exceeds policy maximum {max_risk}."
    )

# Model allowlist check
if allowed_models and record.model_name:
    if record.model_name not in allowed_models:
        eligible = False
        reasons.append(
            f"Record {uid}: model '{record.model_name}' is not in the policy allowed-models list."
        )

Notice what this requires: the model name must be captured at generation time, the risk score must have been computed at insertion, and human review status must exist in ReviewQueue. In the two checks shown above, a missing risk score or model name means the check is skipped rather than failed. For review status the choice is yours at the policy level: an unknown status either fails the certificate or passes through the unknown_review_pass escape hatch.
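
The human-review rule is not shown in the excerpt above, so here is a minimal sketch of how it could behave. The Record class, the review_status field, and the check_human_review function are all hypothetical names used for illustration; only require_human_review and unknown_review_pass come from the policy definition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical stand-in for a provenance record; field names are assumptions.
    uid: str
    review_status: Optional[str]  # "approved", "pending", ... or None if unknown


def check_human_review(
    records: List[Record],
    require_human_review: bool,
    unknown_review_pass: bool,
) -> Tuple[bool, List[str]]:
    """Evaluate only the human-review rule and return (eligible, reasons)."""
    eligible, reasons = True, []
    if not require_human_review:
        return eligible, reasons
    for record in records:
        if record.review_status == "approved":
            continue
        if record.review_status is None:
            if unknown_review_pass:
                continue  # escape hatch: an unknown status is tolerated
            eligible = False
            reasons.append(f"Record {record.uid}: human-review status is unknown.")
        else:
            eligible = False
            reasons.append(
                f"Record {record.uid}: human-review status is "
                f"'{record.review_status}'; policy requires 'approved'."
            )
    return eligible, reasons


records = [Record("abc-123", "approved"), Record("def-456", None)]
strict = check_human_review(records, require_human_review=True, unknown_review_pass=False)
lenient = check_human_review(records, require_human_review=True, unknown_review_pass=True)
print(strict)
print(lenient)
```

With the same two records, the strict policy fails on the record whose status is unknown, while the lenient policy lets it through. Which of the two is right for a team is a policy decision, not a technical one.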

The Cryptographic Layer

When eligibility passes, the system builds a canonical attestation statement and signs it with Ed25519:

def sign_attestation(statement: dict) -> SignedAttestation:
    private_key = _load_private_key()
    canonical = json.dumps(statement, sort_keys=True, default=str).encode()
    sig_bytes = private_key.sign(canonical)
    return SignedAttestation(
        statement=statement,
        signature=sig_bytes.hex(),
        public_key_id=_get_public_key_id(private_key),
    )

The sort_keys=True ensures canonical key ordering regardless of Python dict insertion order — without this, semantically identical statements could produce different byte sequences and fail verification. The corresponding public key is available unauthenticated at GET /attestations/{public_ref}/verify, so any third party can verify the certificate without workspace credentials.
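
To make the ordering point concrete, here is a small standard-library sketch. The statement fields are made up for illustration; only the json.dumps call mirrors the signing code above.

```python
import json

# Two semantically identical statements built in a different key order.
a = {"scope": "pr", "scope_ref": "PR-442", "eligibility": "eligible"}
b = {"eligibility": "eligible", "scope_ref": "PR-442", "scope": "pr"}


def canonical_bytes(statement: dict) -> bytes:
    # Same serialization call as in sign_attestation above.
    return json.dumps(statement, sort_keys=True, default=str).encode()


# Without sort_keys, the bytes follow insertion order and differ.
same_plain = json.dumps(a).encode() == json.dumps(b).encode()

# With sort_keys, both statements serialize to the same bytes.
same_canonical = canonical_bytes(a) == canonical_bytes(b)

print(same_plain, same_canonical)
```

One caveat worth keeping in mind: sort_keys only fixes key order. A verifier written in another language also has to reproduce Python's default separators (", " and ": ") and its number and string escaping, or the bytes will not match.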

The attestation also includes prev_hash — the record_hash of the most recent hash-chained provenance record in the workspace. This anchors the certificate to the workspace's provenance history at the moment of issuance. You cannot retroactively alter the provenance records that supported it without breaking the chain.
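
A toy version of that anchoring, as a sketch only. It assumes SHA-256 over a sorted-key JSON body and an all-zero genesis hash; the actual record_hash construction in LineageLens is not described here, and build_chain and verify_chain are hypothetical helpers.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record


def record_hash(payload: dict, prev_hash: str) -> str:
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def build_chain(payloads):
    chain, prev = [], GENESIS
    for payload in payloads:
        current = record_hash(payload, prev)
        chain.append({"payload": payload, "prev_hash": prev, "record_hash": current})
        prev = current
    return chain


def verify_chain(chain, anchored_hash: str) -> bool:
    """Recompute every hash and compare the tip with the certificate's prev_hash."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if record_hash(entry["payload"], prev) != entry["record_hash"]:
            return False
        prev = entry["record_hash"]
    return prev == anchored_hash


chain = build_chain([
    {"uid": "abc-123", "risk_score": 41},
    {"uid": "def-456", "risk_score": 12},
])
anchor = chain[-1]["record_hash"]  # what the attestation would carry as prev_hash

ok_before = verify_chain(chain, anchor)
chain[0]["payload"]["risk_score"] = 5  # retroactive edit to an old record
ok_after = verify_chain(chain, anchor)
print(ok_before, ok_after)
```

Editing an old record changes its recomputed hash, so verification fails at that entry. Re-hashing the whole chain to hide the edit would change the tip, which no longer matches the hash signed into the certificate.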

What Ineligibility Looks Like

An ineligible evaluation is equally useful. The system issues an unsigned certificate with a structured reasons list:

{
  "eligibility": "ineligible",
  "reasons": [
    "Record abc-123: risk score 84 exceeds policy maximum 70.",
    "Record def-456: model 'gpt-4o-mini' is not in the policy allowed-models list.",
    "Record ghi-789: human-review status is 'pending' — policy requires 'approved'."
  ]
}

This is a machine-generated record of exactly which AI code insertions failed your policy and why — before the code shipped.
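
One way a CI step might consume that response is sketched below. Two things here are assumptions: that the passing value of eligibility is "eligible", and that the reason strings keep a stable "Record <id>: ..." prefix that is safe to parse. The sample body reuses two of the reasons shown above.

```python
import json
import re

response_body = """
{
  "eligibility": "ineligible",
  "reasons": [
    "Record abc-123: risk score 84 exceeds policy maximum 70.",
    "Record def-456: model 'gpt-4o-mini' is not in the policy allowed-models list."
  ]
}
"""

certificate = json.loads(response_body)

# Group failure reasons by record id so each insertion can be triaged on its own.
failures = {}
for reason in certificate.get("reasons", []):
    match = re.match(r"Record (\S+): (.+)", reason)
    record_id = match.group(1) if match else "unparsed"
    detail = match.group(2) if match else reason
    failures.setdefault(record_id, []).append(detail)

# Block the merge or release unless the certificate is explicitly eligible.
should_block = certificate.get("eligibility") != "eligible"

for record_id, details in failures.items():
    print(record_id, details)
print("block:", should_block)
```

Treating anything other than an explicit "eligible" as a failure keeps the gate closed when the response is malformed or a field is missing.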

The Connection to Capture Quality

None of this works without the provenance capture layer. If a record has model_name = null because the insertion went through a path LineageLens couldn't proxy, the model allowlist check cannot run. If risk_score is null, the risk check cannot run. The certificate is only as strong as the capture underneath it.

This is the compounding argument for installing early: every day of uncaptured AI code is a day of records that cannot support a certificate.

LineageLens is open source and free to install. The indemnity endpoint is part of the Plus/Max tier backend. The deeper cryptographic design walkthrough covers why Ed25519 over HMAC, the key derivation fallback, and where the policy gate should actually live.

What does your team produce as evidence when AI-generated code goes to production? And would it survive a structured audit?

What do you think about it?

Praveen — Sat, 13 Jun 2026 05:46:30 +0000

Praveen

Jun 13

"Co-authored-by: Copilot" Is Not an Audit Trail — Here's What One Actually Looks Like

#discuss #ai #programming #security

5 min read

"Co-authored-by: Copilot" Is Not an Audit Trail — Here's What One Actually Looks Like

Praveen — Sat, 13 Jun 2026 05:45:34 +0000

In late April 2026, Microsoft shipped VS Code 1.117. Buried in the release was a change: the github.copilot.chat.generateCommitMessage.addCoAuthoring setting was flipped from off to all by default. That meant "Co-authored-by: Copilot <copilot@github.com>" was now appended to every commit message in the background: silently, without showing up in the commit message editor, and critically, without verifying that Copilot had generated any of the code.

Developers noticed within days. The backlash was significant. VS Code 1.119 shipped May 3 with the default reverted and a consent requirement added. Microsoft apologized.

The technical fix was straightforward. The governance question it exposed is not.

What the incident actually revealed

The developer anger wasn't really about attribution credit. It was about consent and accuracy. The co-author trailer was added to commits where AI features were disabled. It was added when developers had manually written every line. It attributed work that wasn't done.

But underneath that anger is a more important problem: even when Copilot does write code, a "Co-authored-by" git trailer tells you almost nothing useful from a governance or security standpoint.

It tells you that a tool called Copilot existed somewhere in the developer's editor during some portion of the work that eventually became this commit. That's it.

It doesn't tell you which model generated which lines. It doesn't tell you what the developer prompted for. It doesn't contain the raw model response. It doesn't tell you whether any of the AI-generated lines touched authentication paths, hardcoded credentials, or SQL construction. It says nothing about when generation happened relative to the commit. It carries no risk score.

If you had to defend a specific commit in a security audit six months from now — "which parts of this function were AI-generated, under what prompt, using what model?" — a git trailer gets you nowhere.

What a real provenance record contains

LineageLens captures provenance at insertion time, not commit time. Each AI code insertion generates a ProviderAgnosticProvenanceEvent structured around schema version lineagelens.provenance-event.v1. Here is what that record contains:

type ProviderAgnosticProvenanceEvent = {
  schemaVersion: 'lineagelens.provenance-event.v1';
  eventId: string;

  timestamps: {
    observedAtIso: string;         // when the extension saw the insertion
    insertedAtIso: string;         // when the text hit the buffer
    requestAtIso: string | null;   // when the proxy saw the outbound request
    responseAtIso: string | null;  // when the model responded
  };

  source: {
    ide: string | null;           // 'vscode'
    shim: string;                 // which capture path fired
    toolName: string | null;      // 'Edit', 'Write', 'apply_patch', etc.
    provider: string | null;      // 'anthropic', 'openai', 'google'
    adapterName: string | null;   // 'claude-code', 'copilot', 'cursor', etc.
  };

  capture: {
    level: CaptureStatus;         // 'full' | 'metadata_only' | 'tunnel_only' | 'file_diff'
    promptStatus: 'captured' | 'not-captured';
    capabilities: ProvenanceEventCapability[];  // 10 named slots
  };

  model: {
    name: unknown;
    parameters: Record<string, unknown> | null;  // temperature, max_tokens, etc.
  };

  prompt: {
    body: unknown;    // the full prompt messages array
    system: unknown;  // the system prompt
  };

  diff: {
    insertedText: string;
    chunks: ProvenanceInsertedChunk[];
    netAddedLines: number;
  };

  correlation: {
    confidence: number;                     // 0.0–1.0
    timingDifferenceMs: number | null;
    contentSimilarityScore: number | null;
    fileContextMatched: boolean;
  };
};

Compare that to what a git trailer gives you: a tool name and an email address.

The 10 capability slots

The most important part of the schema is the capture.capabilities array. Every provenance event gets 10 named capability assessments:

prompt-body       — was the full prompt captured?
response-body     — was the raw model response captured?
headers           — were the request headers available?
request-id        — was a UUID present to link request to insertion?
session-id        — was there session context?
model             — was the model name captured?
user-agent        — was the tool's user-agent available?
file-diff         — was the inserted diff captured? (always 'provided')
file-context      — did the file context match to the capture?
workspace         — was workspace context available?

Each entry carries a status: provided, missing, or unknown.

This matters because it tells you precisely what you know about a given insertion and what you don't. A record with prompt-body: missing and promptStatus: 'not-captured' is not the same as no record — it is an explicit declaration that the prompt gap exists. That gap is auditable. An audit trail with explicit gaps is categorically more useful than a label with no gaps declared.
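
As a rough illustration of how those declared gaps could be surfaced in an audit report, here is a short sketch. The {name, status} entry shape and the declared_gaps helper are assumptions; only the slot names and the three status values come from the list above.

```python
# Hypothetical capabilities array for one metadata-only capture.
capabilities = [
    {"name": "prompt-body", "status": "missing"},
    {"name": "response-body", "status": "missing"},
    {"name": "headers", "status": "unknown"},
    {"name": "request-id", "status": "provided"},
    {"name": "session-id", "status": "provided"},
    {"name": "model", "status": "provided"},
    {"name": "user-agent", "status": "unknown"},
    {"name": "file-diff", "status": "provided"},
    {"name": "file-context", "status": "provided"},
    {"name": "workspace", "status": "provided"},
]


def declared_gaps(capabilities):
    """Return the slots that are explicitly missing or unknown, by status."""
    gaps = {"missing": [], "unknown": []}
    for capability in capabilities:
        if capability["status"] in gaps:
            gaps[capability["status"]].append(capability["name"])
    return gaps


gaps = declared_gaps(capabilities)
print(gaps)
```

For this record, an auditor can see at a glance that the prompt and response bodies were not captured, while the model name and the diff were.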

The VS Code co-author trailer declares no gaps and offers no granularity. It is a single label with nothing behind it.

Capture time vs. commit time

The harder architectural point: by the time you are in a git commit, you have already lost the evidence.

The prompt body does not live anywhere post-generation. The model name was in the HTTP response header. The raw response body was discarded after the tool processed it. The timing data only exists in the milliseconds between request and file write.

None of that is in the commit. None of it can be recovered retroactively.

LineageLens captures the ProviderAgnosticProvenanceEvent at the insertion event — before the diff even exists as a file change. The observedAtIso timestamp records when the VS Code extension detected the text entering the buffer. The requestAtIso and responseAtIso timestamps come from the proxy intercept that happened seconds or minutes before. By the time you type a commit message, the provenance record has already been stored.
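
The four timestamps also make the ordering checkable after the fact. The sketch below uses made-up values and assumes the timing difference is measured from model response to buffer insertion; how LineageLens actually computes timingDifferenceMs is not stated here.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for a single event, as ISO 8601 strings with offsets.
timestamps = {
    "requestAtIso": "2026-06-13T05:40:00.000+00:00",
    "responseAtIso": "2026-06-13T05:40:02.500+00:00",
    "insertedAtIso": "2026-06-13T05:40:03.100+00:00",
    "observedAtIso": "2026-06-13T05:40:03.120+00:00",
}

parsed = {key: datetime.fromisoformat(value) for key, value in timestamps.items()}

# The proxy request should precede the response, which should precede the insertion.
in_order = (
    parsed["requestAtIso"]
    <= parsed["responseAtIso"]
    <= parsed["insertedAtIso"]
    <= parsed["observedAtIso"]
)

# Whole milliseconds between the model response and the text hitting the buffer.
timing_difference_ms = (
    parsed["insertedAtIso"] - parsed["responseAtIso"]
) // timedelta(milliseconds=1)

print(in_order, timing_difference_ms)
```

A small gap with the expected ordering is the kind of signal that can feed a correlation confidence score. None of it can be rebuilt from a commit timestamp.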

A git trailer is a retroactive label. Provenance is an evidence chain that exists before the label does.

What actually changed after the VS Code incident

Microsoft reverted the default. They added a consent gate. They clarified that disableAIFeatures: true now also disables the co-authoring trailer.

None of that gives you a provenance record. You still do not know which lines in a given commit were AI-generated. You still cannot answer "what did Copilot generate in auth.py last month" from git history alone.

The incident forced consent around labeling. That is progress. It did not touch the underlying gap: labeling that something was AI-assisted is not the same as recording what the AI actually did.

The practical implication

If you are on a team shipping AI-generated code — and 84% of development teams are, per the 2026 Stack Overflow Developer Survey — you are almost certainly making three implicit assumptions:

That your CI pipeline or git history contains enough attribution information to answer an audit question.
That "Co-authored-by" or an equivalent label satisfies your traceability obligations.
That you could reconstruct the provenance of a specific function if you had to.

All three assumptions are likely wrong for the same reason: commit-time labeling cannot carry insertion-time evidence.

The EU AI Act Articles 11 and 12 enforcement window opens in August 2026. The question "which AI model generated this code, under what prompt, at what risk level?" is going to become a routine compliance requirement.

When it does, a git trailer is not going to be a defensible answer.

Try it

LineageLens Base is a free VS Code extension that starts capturing provenance events at insertion time today, even without proxy infrastructure. Lite, Plus, and Max tiers add proxy capture for full prompt, model, and response-body fields. The full architecture details are at lineage-website.vercel.app. The Hashnode post goes deeper on schema design tradeoffs.

One question for the comments: what data would your team actually need to survive a security audit of your AI-generated code? Not in theory — what specific fields would an auditor ask for?

What's your opinion?

Praveen — Fri, 12 Jun 2026 09:19:33 +0000

Praveen

Jun 12

Why Your AI Capture Store Needs Two Security Layers (Not One)

#discuss #showdev #news #startup

7 min read