<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Praveen</title>
    <description>The latest articles on DEV Community by Praveen (@pn_28428886923dfc665).</description>
    <link>https://dev.to/pn_28428886923dfc665</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940098%2F8a7a4942-5b0d-4847-9a7a-2eaf76d0ce30.png</url>
      <title>DEV Community: Praveen</title>
      <link>https://dev.to/pn_28428886923dfc665</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pn_28428886923dfc665"/>
    <language>en</language>
    <item>
      <title>You Cannot Retroactively Capture AI Code Provenance. Here Is What You Lose Every Day You Wait.</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:05:15 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/you-cannot-retroactively-capture-ai-code-provenance-here-is-what-you-lose-every-day-you-wait-5gmc</link>
      <guid>https://dev.to/pn_28428886923dfc665/you-cannot-retroactively-capture-ai-code-provenance-here-is-what-you-lose-every-day-you-wait-5gmc</guid>
      <description>&lt;p&gt;There is a failure mode in AI code governance that does not get enough attention because it is invisible until it isn't.&lt;/p&gt;

&lt;p&gt;It is not a vulnerability. It is not a misconfiguration. It is not something a security scan will catch.&lt;/p&gt;

&lt;p&gt;It is a gap in time: the period between "your team started using AI coding tools" and "your team started recording what those tools did." Every line of&lt;br&gt;
  code generated in that gap is permanently unattributable. The prompt is gone. The model is gone. Whether the suggestion was reviewed, modified, or&lt;br&gt;
  auto-accepted is gone.&lt;/p&gt;

&lt;p&gt;You cannot go back. Retroactive provenance capture does not exist.&lt;/p&gt;




&lt;p&gt;## The One-Way Door&lt;/p&gt;

&lt;p&gt;When a developer prompts Claude Code and accepts a suggestion, the following information exists briefly in memory and in transit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full prompt sent to the model&lt;/li&gt;
&lt;li&gt;The model identifier and version&lt;/li&gt;
&lt;li&gt;The generated code, before any human edits&lt;/li&gt;
&lt;li&gt;Whether the insertion was accepted, rejected, or applied with modifications&lt;/li&gt;
&lt;li&gt;The file path and surrounding context at the time of generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the moment the developer's editor applies the change and moves on, most of that information disappears. Git records the diff and the commit author.&lt;br&gt;
  Nothing else records provenance by default.&lt;/p&gt;

&lt;p&gt;The commit is not the generation event. These happen at different times with different context. Understanding this distinction is the precondition for any&lt;br&gt;
  serious AI governance posture.&lt;/p&gt;

&lt;p&gt;If you were not capturing at the moment of generation, you were not capturing. There is no reconstruct operation.&lt;/p&gt;




&lt;p&gt;## What You Actually Lose&lt;/p&gt;

&lt;p&gt;Let's be specific about what "unattributable" means in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: The incident trace.&lt;/strong&gt;&lt;br&gt;
  A bug surfaces in src/payments/processor.py. You trace it to a block inserted six weeks ago. Git blame gives you a developer name and a commit hash. What&lt;br&gt;
  you cannot recover: the prompt that produced the block, the model that generated it, whether the developer reviewed it or accepted it from the first&lt;br&gt;
  suggestion, and what the risk patterns in that insertion looked like at generation time. You are debugging code whose origin is opaque.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: The compliance question.&lt;/strong&gt;&lt;br&gt;
  Your company is asked — by an enterprise customer, an auditor, or your own legal team — to document AI usage in the SDLC. The EU AI Act Articles 11, 12,&lt;br&gt;
  and 14 have enforcement teeth from August 2026. The question is: which code was AI-generated, which model produced it, and what review occurred?&lt;/p&gt;

&lt;p&gt;If you have been capturing since January, you have a full audit trail. If you started capturing this week because someone asked the question, you have a&lt;br&gt;
  full audit trail from this week forward and nothing before that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: The departing engineer.&lt;/strong&gt;&lt;br&gt;
  A developer who used AI tools heavily leaves the team. Their code is in the codebase, some of it AI-generated. The team has no record of which blocks were&lt;br&gt;
  AI-generated, what was prompted, or what the risk posture of those blocks is. Onboarding the next developer is a code archaeology project that cannot be&lt;br&gt;
  fully resolved.&lt;/p&gt;




&lt;p&gt;## What Starting Looks Like — Zero Configuration Required&lt;/p&gt;

&lt;p&gt;The reason the irreversibility argument matters so much is that the cost of starting is zero.&lt;/p&gt;

&lt;p&gt;LineageLens Base installs in one command and starts capturing immediately:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;code --install-extension karnatipraveen.lineagelens-base&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No backend. No proxy. No account. No configuration. No API key.&lt;/p&gt;

&lt;p&gt;The extension activates the moment it installs. It hooks into VS Code's onDidChangeTextDocument event and watches for insertions of 4 or more lines. When&lt;br&gt;
  a qualifying insertion occurs, it captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File path and language&lt;/li&gt;
&lt;li&gt;Inserted code block&lt;/li&gt;
&lt;li&gt;Net lines added&lt;/li&gt;
&lt;li&gt;Confidence score (0.0–1.0)&lt;/li&gt;
&lt;li&gt;Source classification (cursor, copilot, unknown, etc.)&lt;/li&gt;
&lt;li&gt;UTC timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Records are stored in VS Code global state — a local JSON store on the developer's machine. No data leaves the machine. Status bar shows LL: Easy (local).&lt;/p&gt;

&lt;p&gt;From this moment forward, every AI insertion of 4+ lines has a record. The record is sparse (no prompt, no model name — those require the proxy) but it&lt;br&gt;
  exists, and its timestamp is authoritative.&lt;/p&gt;




&lt;p&gt;## Upgrading Record Quality Without Reinstalling&lt;/p&gt;

&lt;p&gt;When you want full prompt and model capture, add the Lite proxy alongside it:&lt;/p&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/karnati-praveen/lineagelens" rel="noopener noreferrer"&gt;https://github.com/karnati-praveen/lineagelens&lt;/a&gt;&lt;br&gt;
  cd lineagelens&lt;br&gt;
  bash lineagelens-scripts/quickstart-lite.sh&lt;/p&gt;

&lt;p&gt;Open &lt;a href="http://localhost:8787/setup" rel="noopener noreferrer"&gt;http://localhost:8787/setup&lt;/a&gt;, create your admin account in three browser steps, then set one environment variable:&lt;/p&gt;

&lt;p&gt;export ANTHROPIC_BASE_URL=&lt;a href="http://localhost:8788" rel="noopener noreferrer"&gt;http://localhost:8788&lt;/a&gt;&lt;br&gt;
  export OPENAI_BASE_URL=&lt;a href="http://localhost:8788" rel="noopener noreferrer"&gt;http://localhost:8788&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The extension polls /proxy-health every 30 seconds. When it detects the proxy, the status bar switches from LL: Easy (local) to LL: Power automatically.&lt;br&gt;
  Records captured after that point include the full prompt, model identifier, and applied/rejected status.&lt;/p&gt;

&lt;p&gt;Records captured in Easy Mode before the proxy was running remain in the store with capture_status: file_diff and confidence ~0.35. They are not&lt;br&gt;
  retroactively enriched. But they exist. That is the point.&lt;/p&gt;




&lt;p&gt;## The Confidence Gradient&lt;/p&gt;

&lt;p&gt;Here is what record quality looks like across capture configurations:&lt;/p&gt;

&lt;p&gt;Capture mode         capture_status    Confidence     Prompt captured?&lt;br&gt;
  ---Proxy (Power Mode)   full              0.80 – 1.00    Yes&lt;br&gt;
  Proxy (tunneled)     tunnel_only       0.60 – 0.80    Partial&lt;br&gt;
  Extension only       file_diff         0.25 – 0.45    No&lt;br&gt;
  No capture           —                 —              —&lt;/p&gt;

&lt;p&gt;A file_diff record at confidence 0.35 is not a rich provenance record. But it is infinitely better than no record for a specific reason: it is timestamped&lt;br&gt;
  and file-attributed at generation time, not commit time.&lt;/p&gt;

&lt;p&gt;When an incident occurs six months later and you are tracing a bug to a specific block, a file_diff record tells you approximately when this code appeared&lt;br&gt;
  in the file, that it passed the threshold for "likely AI-generated," which AI tool extension was probably active, and what the file path was. That is&lt;br&gt;
  enough to narrow a 90-day window to a specific week. Without the record, the investigation starts from zero.&lt;/p&gt;




&lt;p&gt;## For Teams: The Asymmetry Compounds&lt;/p&gt;

&lt;p&gt;If your team has ten developers using a mix of Cursor, Copilot, and Claude Code, and none of them are running LineageLens, you have a growing body of&lt;br&gt;
  unattributable AI-generated code accumulating daily. Every sprint without capture is a sprint of records that cannot be recovered.&lt;/p&gt;

&lt;p&gt;LineageLens Lite adds a shared backend without requiring Postgres:&lt;/p&gt;

&lt;p&gt;bash lineagelens-scripts/quickstart-lite.sh&lt;/p&gt;

&lt;p&gt;Single Docker container. SQLite. Runs on a $5 VPS or a spare machine. The setup wizard creates the admin account and workspace in three browser steps.&lt;br&gt;
  Share the proxy URL and one environment variable with the team. From that point forward, every developer's AI tool traffic is captured and stored&lt;br&gt;
  centrally.&lt;/p&gt;




&lt;p&gt;## The Direct Argument&lt;/p&gt;

&lt;p&gt;LineageLens might be the right tool for your team or it might not be. That is worth evaluating. But the evaluation should happen today, not when you&lt;br&gt;
decide you need it — because the cost of deciding you need it after the fact is permanent.&lt;br&gt;
Base is free.Both are MIT-licensed and fully self-hosted.&lt;br&gt;
The cost of starting is 30 seconds and one command:&lt;/p&gt;

&lt;p&gt;*&lt;em&gt;code --install-extension karnatipraveen.lineagelens-base&lt;br&gt;
*&lt;/em&gt;&lt;br&gt;
  After you try it: what is the oldest piece of AI-generated code in your codebase that you cannot explain the origin of? How far back does your&lt;br&gt;
  unattributable window go?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>showdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>How LineageLens Scores Risk on Every AI Code Insertion — and Why Missing the Prompt Makes It Worse</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Mon, 08 Jun 2026 06:00:35 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/how-lineagelens-scores-risk-on-every-ai-code-insertion-and-why-missing-the-prompt-makes-it-worse-26em</link>
      <guid>https://dev.to/pn_28428886923dfc665/how-lineagelens-scores-risk-on-every-ai-code-insertion-and-why-missing-the-prompt-makes-it-worse-26em</guid>
      <description>&lt;p&gt;You accepted a 35-line AI suggestion into &lt;code&gt;auth.py&lt;/code&gt;. It looks clean. No red flags. Your linter passes.&lt;/p&gt;

&lt;p&gt;LineageLens has already scored it 54 out of 100.&lt;/p&gt;

&lt;p&gt;That number does not come from a model, a semantic classifier, or vibes. It comes from a small set of deterministic rules applied at the moment the record is ingested — and understanding how those rules compose tells you something useful about what "AI code risk" actually means in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: two functions, two moments
&lt;/h2&gt;

&lt;p&gt;The risk scoring in LineageLens lives entirely in &lt;code&gt;lineagelens-backend/app/services/risk_service.py&lt;/code&gt;. It exposes exactly two public functions:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
def compute_risk_score(
    inserted_code: str,
    prompt_messages: object | None = None,
    model_name: str | None = None,
    file_path: str | None = None,
) -&amp;gt; tuple[int, list[str]]:
    """Ingest-time. All tiers. Raw fields."""


def compute_risk_from_record(
    record: dict[str, Any],
    is_agentic: bool = False,
) -&amp;gt; tuple[int, list[str], set[str]]:
    """Insights-time. Plus/Max only. Serialized record."""
The split is not cosmetic. compute_risk_score runs at ingest time — the moment a provenance record arrives at the backend. It must be fast, stateless, and operate only on what is available right now.

compute_risk_from_record runs at insights time — when a user queries the insights endpoint. By then, correlation confidence is computed and the record is fully enriched. This path is gated by require_non_solo — Lite's SQLite backend never reaches it.

The base score and what it means
Every record starts at 12 before any signal fires. A score of exactly 12 means the quietest possible record — clean code, unremarkable file path, small insertion, prompt captured. That is the floor.

Code pattern rules
Six rule groups apply to the inserted block via re.search with re.IGNORECASE:

Python
_CODE_PATTERN_RULES = [
    ([r"api[_-]?key", r"access[_-]?token", r"private[_-]?key"],
     28, "Credential-like material in the block.", "security"),

    ([r"\beval\b", r"\bFunction\s*\(", r"new Function",
      r"\bexec\s*\(", r"\bexecSync\s*\("],
     24, "Dynamic code execution present.", "security"),

    ([r"\bsubprocess\.", r"\bos\.system\b",
      r"\bchild_process\b", r"\bspawn(?:Sync)?\b"],
     22, "Shell or process execution introduced.", "security"),

    ([r"dangerouslySetInnerHTML", r"\binnerHTML\s*="],
     20, "Unsafe DOM mutation patterns.", "security"),

    ([r"\bSELECT\s+.+\bFROM\b", r"\bINSERT\s+INTO\b",
      r"\bUPDATE\s+\w+\s+SET\b", r"\bDELETE\s+FROM\b"],
     16, "Raw SQL in the block.", "reliability"),

    ([r"\bpassword\b", r"\btoken\b", r"\bauth\b", r"\bcredential\b"],
     12, "Auth/credential handling present.", "compliance"),
]
The deltas rank by failure-mode severity. Credentials score +28 because AI-generated code containing credential-like strings has a disproportionate blast radius. Dynamic execution scores +24 because eval() in AI-generated code has appeared in supply chain incidents repeatedly.

Rules are not mutually exclusive. Both can fire on the same block. The score accumulates. Final result is capped at min(score, 100), but the reasons list preserves every contributing signal.

File path rules
Python
_FILE_PATTERN_RULES = [
    ([r"auth", r"security", r"permission", r"oauth",
      r"token", r"secret", r"credential"],
     14, "Security-sensitive file path.", "compliance"),

    ([r"payment", r"billing", r"invoice", r"ledger", r"finance"],
     14, "Financially sensitive file path.", "compliance"),
]
These fire on destination, not content. A clean utility function inserted into src/routes/auth.py picks up +14 purely because of where it landed. This is deliberate.

Line count
Python
if net_lines &amp;gt;= 80:
    score += 18
elif net_lines &amp;gt;= 30:
    score += 10
Larger AI-generated blocks are harder to review thoroughly. The signal acknowledges that reality.

Working through the example
Scenario: Claude Code inserts 35 lines into src/routes/auth.py that include an eval() call to dynamically dispatch a handler. Proxy is running, prompt was captured.

Base score: 12

File path (auth pattern): +14

Lines in [30, 80) range: +10

eval() code pattern: +24

Total: 60 → HIGH

The missing-prompt penalty — and why it is the most important signal
Now remove the proxy. Same code, same file, same 35 lines — but prompt_messages=None.

Python
if prompt_messages is None:
    score += 24
    reason_set.add(
        "Prompt capture is missing, which reduces auditability "
        "and reviewer confidence."
    )
Previous score: 60

Missing prompt: +24

New total: 84 → borderline CRITICAL

Same code. Same file. 24 points higher because the audit trail is incomplete.

The score is not measuring code danger in isolation. It is measuring auditable risk — the risk that something went wrong and you would not be able to reconstruct what happened. Without the prompt you cannot detect:

A model that exceeded its scope

Prompt injection via context

Scope creep from an agentic session

The +24 penalty is the scoring engine's way of saying: you are less auditable here, and that is a risk.

Easy Mode (extension only, no proxy) → scores higher for the same code than Power Mode (proxy running, prompt captured). That gap is intentional. The architecture makes the tradeoff legible through a number.

Bash
# Step 1: install extension (free, zero config)
code --install-extension karnatipraveen.lineagelens

# Step 2: add prompt capture (eliminates the +24 penalty)
git clone [https://github.com/karnati-praveen/lineagelens](https://github.com/karnati-praveen/lineagelens)
bash lineagelens-scripts/quickstart-lite.sh
export ANTHROPIC_BASE_URL=http://localhost:8788
Insights-time signals (Plus/Max only)
compute_risk_from_record adds signals not available at ingest:

Correlation confidence:

Python
if correlation_confidence &amp;lt; 0.4:
    score += 16
elif correlation_confidence &amp;lt; 0.65:
    score += 8
Agentic session:

Python
if is_agentic:
    score += 6
These require the insights endpoint because correlation confidence is downstream of ingest — blocking ingest on that would add 20–200ms of latency to every capture.

What this is not
A score of 80 does not mean the code has a vulnerability. A score of 20 does not mean it is safe. This is heuristic risk scoring, not SAST. Pair it with Snyk or Semgrep. What it provides is a provenance-aware signal: one that factors in not just what was written, but where it landed, how much was generated at once, and whether the intent behind it is recoverable.

That last factor — auditability — is what makes this different from a linter rule.

What pattern would you add to the detection list? I'm particularly curious about lightweight CFG analysis at ingest time — detecting whether a flagged eval() is reachable from an untrusted input path. The precision gain is real; the question is whether 20–40ms of analysis per insertion is acceptable in the ingest path.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>opensource</category>
      <category>discuss</category>
      <category>news</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Logs Are Not Audit Artifacts: Why AI-Generated Code Needs a Signed AI BOM</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Wed, 03 Jun 2026 05:24:18 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/logs-are-not-audit-artifacts-why-ai-generated-code-needs-a-signed-ai-bom-28mo</link>
      <guid>https://dev.to/pn_28428886923dfc665/logs-are-not-audit-artifacts-why-ai-generated-code-needs-a-signed-ai-bom-28mo</guid>
      <description>&lt;p&gt;Most teams are trying to solve AI provenance with dashboards. That is the wrong object.&lt;/p&gt;

&lt;p&gt;A dashboard is useful while the system is live. It tells you what happened, who touched what, and where the risk seems concentrated. But audit work happens later, after the prompt is forgotten, the model changes, the PR is merged, and someone asks a question the dashboard was never designed to answer:&lt;/p&gt;

&lt;p&gt;Can you prove this record still means what it said it meant?&lt;/p&gt;

&lt;p&gt;That is a different problem.&lt;/p&gt;

&lt;p&gt;A log answers one question: what did the system emit at the time?&lt;/p&gt;

&lt;p&gt;A provenance record answers a better one: what happened, with enough context to reconstruct the event?&lt;/p&gt;

&lt;p&gt;An AI BOM answers the hardest one: can I trust this exported summary later, after the data has moved, been redacted, or been shared with another team?&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;Why logs are not enough&lt;br&gt;
Logs are operational data. They are great for debugging and incident response. They are not naturally shaped for later trust.&lt;/p&gt;

&lt;p&gt;They are usually mutable. They are often provider-specific. They are usually optimized for collection, not for a later proof. And in the AI code provenance case, they miss the exact thing people ask for in review:&lt;br&gt;
What model produced this? What prompt triggered it? Was the record modified after the fact?&lt;/p&gt;

&lt;p&gt;If the answer is “we have logs,” that is usually a sign the system still treats provenance like telemetry.&lt;/p&gt;

&lt;p&gt;That is why I prefer a different frame.&lt;/p&gt;

&lt;p&gt;AI provenance needs an artifact.&lt;/p&gt;

&lt;p&gt;Not a chart. Not a usage counter. Not a dashboard screenshot.&lt;/p&gt;

&lt;p&gt;An artifact.&lt;/p&gt;

&lt;p&gt;What an AI BOM is&lt;br&gt;
AI BOM means AI Bill of Materials.&lt;/p&gt;

&lt;p&gt;It is the same general idea as an SBOM, but for AI-generated code. Instead of listing dependencies, it lists AI-generated provenance: what got changed, which model produced it, whether the prompt was captured or redacted, and whether the export itself still verifies.&lt;/p&gt;

&lt;p&gt;A useful AI BOM should answer, at minimum:&lt;/p&gt;

&lt;p&gt;What was changed?&lt;br&gt;
Which file was touched?&lt;br&gt;
Which model generated the code?&lt;br&gt;
Was the prompt captured, hashed, or redacted?&lt;br&gt;
Is the provenance chain intact?&lt;br&gt;
Can the exported document be verified later?&lt;br&gt;
That is the object we are trying to produce.&lt;/p&gt;

&lt;p&gt;In LineageLens, that means the backend now does two things in Plus/Max mode:&lt;/p&gt;

&lt;p&gt;It chains provenance records together with prev_hash and record_hash.&lt;br&gt;
It exports a signed AI BOM that can be checked later.&lt;br&gt;
The point is not to make the dashboard prettier. The point is to make the record trustable.&lt;/p&gt;

&lt;p&gt;What the integrity layer looks like&lt;br&gt;
Here is the basic flow:&lt;/p&gt;

&lt;p&gt;AI tool / editor&lt;br&gt;
      |&lt;br&gt;
      v&lt;br&gt;
capture layer / proxy&lt;br&gt;
      |&lt;br&gt;
      v&lt;br&gt;
provenance record&lt;br&gt;
      |&lt;br&gt;
      +--&amp;gt; record_hash + prev_hash&lt;br&gt;
      |&lt;br&gt;
      +--&amp;gt; signed AI BOM export&lt;br&gt;
      |&lt;br&gt;
      v&lt;br&gt;
verifier / auditor / downstream system&lt;br&gt;
The important part is that the exported document is not just a dump of rows. It is a signed summary of the provenance state.&lt;/p&gt;

&lt;p&gt;In the current implementation, each provenance record gets a chain hash built from a canonical set of fields. The prompt itself is not copied into the AI BOM export. Instead, the export uses a prompt hash, which gives you disclosure tracking without leaking raw prompt text into the artifact.&lt;/p&gt;

&lt;p&gt;A simplified version looks like this:&lt;br&gt;
canonical = {&lt;br&gt;
    "uuid": record.uuid,&lt;br&gt;
    "workspace_id": record.workspace_id,&lt;br&gt;
    "file_path": record.file_path,&lt;br&gt;
    "inserted_code": record.inserted_code or "",&lt;br&gt;
    "model_name": record.model_name or "",&lt;br&gt;
    "prompt_sha256": sha256(prompt_messages),&lt;br&gt;
    "timestamp_iso": record.timestamp_iso,&lt;br&gt;
    "prev_hash": previous_hash or "",&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;record_hash = sha256(json.dumps(canonical, sort_keys=True).encode()).hexdigest()&lt;br&gt;
If any field changes, the hash changes.&lt;/p&gt;

&lt;p&gt;That is the point.&lt;/p&gt;

&lt;p&gt;The verify path then recomputes the chain and stops at the first mismatch. If something in the middle was altered, the first broken UUID tells you exactly where the trust boundary failed.&lt;/p&gt;

&lt;p&gt;Why prompt hashes instead of raw prompts&lt;br&gt;
This part matters more than people think.&lt;/p&gt;

&lt;p&gt;Raw prompts are useful inside the provenance store. They are often essential for internal debugging, review, and governance workflows. But the exported artifact has a different job.&lt;/p&gt;

&lt;p&gt;An export is meant to be shared.&lt;/p&gt;

&lt;p&gt;Once you are sharing, raw prompts become a liability. They can contain code, internal names, API structure, or sensitive business context. That is why the AI BOM uses a prompt hash in the export. It gives you fingerprinting without dumping private text into every report.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>discuss</category>
      <category>news</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Why AI provenance records need capture levels, not just logs</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Mon, 01 Jun 2026 06:36:22 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/why-ai-provenance-records-need-capture-levels-not-just-logs-467l</link>
      <guid>https://dev.to/pn_28428886923dfc665/why-ai-provenance-records-need-capture-levels-not-just-logs-467l</guid>
      <description>&lt;p&gt;Monday is when the gap shows up.&lt;/p&gt;

&lt;p&gt;By the time a repo has been touched by Claude Code, Cursor, Copilot, and one or two manual edits, the question is no longer “did AI touch this file?” The real question is: what evidence do we actually have about that change?&lt;/p&gt;

&lt;p&gt;That sounds subtle until you try to build a record around it.&lt;/p&gt;

&lt;p&gt;One assistant gives you a full prompt and response through a proxy. Another leaves only an editor diff. A third may expose metadata, but not enough to reconstruct the prompt body. If you flatten those paths into the same log shape, you create a record that looks authoritative and is not.&lt;/p&gt;

&lt;p&gt;That is the problem I wanted LineageLens to solve: provenance needs to describe capture quality, not just capture events.&lt;/p&gt;

&lt;p&gt;The wrong abstraction: one log to rule them all&lt;br&gt;
The tempting design is to make one ai_audit_log table and push everything into it.&lt;/p&gt;

&lt;p&gt;That works for about five minutes.&lt;/p&gt;

&lt;p&gt;The moment you mix tools, you get different evidence surfaces:&lt;/p&gt;

&lt;p&gt;a proxy path can see prompt, model, response, request headers, and timing&lt;br&gt;
an editor-side hook can see the file diff and local context&lt;br&gt;
some tools expose session IDs, some do not&lt;br&gt;
some paths are fully available, some are metadata-only, and some are effectively unavailable&lt;br&gt;
f all of that ends up in one record without a capture-level field, downstream consumers cannot tell the difference between “full evidence” and “best effort.” That is how audit trails become storytelling tools.&lt;/p&gt;

&lt;p&gt;The better model is to treat capture depth as part of the record itself.&lt;br&gt;
Capture level should be first-class&lt;br&gt;
In LineageLens, capture is not a Boolean. It is a state:&lt;br&gt;
type CaptureLevel = 'full' | 'metadata_only' | 'tunnel_only' | 'hook' | 'unavailable';&lt;/p&gt;

&lt;p&gt;type ProvenanceCapture = {&lt;br&gt;
  level: CaptureLevel;&lt;br&gt;
  promptStatus: 'captured' | 'not-captured';&lt;br&gt;
  capabilities: Array&amp;lt;{&lt;br&gt;
    name:&lt;br&gt;
      | 'prompt-body'&lt;br&gt;
      | 'response-body'&lt;br&gt;
      | 'headers'&lt;br&gt;
      | 'request-id'&lt;br&gt;
      | 'session-id'&lt;br&gt;
      | 'model'&lt;br&gt;
      | 'user-agent'&lt;br&gt;
      | 'file-diff'&lt;br&gt;
      | 'file-context'&lt;br&gt;
      | 'workspace';&lt;br&gt;
    status: 'provided' | 'missing' | 'unknown';&lt;br&gt;
  }&amp;gt;;&lt;br&gt;
};&lt;br&gt;
That shape does two useful things.&lt;/p&gt;

&lt;p&gt;First, it tells you what the system actually saw.&lt;/p&gt;

&lt;p&gt;Second, it refuses to overstate certainty.&lt;/p&gt;

&lt;p&gt;A record can be full and still have low correlation confidence. A record can be metadata_only and still be useful. A record can be unavailable and still matter, because the absence of evidence is itself evidence.&lt;/p&gt;

&lt;p&gt;That is the part most tools skip.&lt;/p&gt;

&lt;p&gt;Normalize the core, preserve the edge&lt;br&gt;
The core record should be provider-agnostic.&lt;/p&gt;

&lt;p&gt;That means the dashboard, search layer, export layer, and governance layer all speak the same language, even if the source was a proxy capture, an editor hook, or a lightweight adapter. The record becomes the stable contract. The raw payloads become supporting evidence, not the contract itself.&lt;/p&gt;

&lt;p&gt;A simplified shape looks like this:&lt;/p&gt;

&lt;p&gt;capture.level&lt;br&gt;
capture.promptStatus&lt;br&gt;
capture.capabilities&lt;br&gt;
source&lt;br&gt;
session&lt;br&gt;
model&lt;br&gt;
prompt&lt;br&gt;
response&lt;br&gt;
file&lt;br&gt;
diff&lt;br&gt;
context&lt;br&gt;
correlation&lt;br&gt;
confidence&lt;br&gt;
extensions&lt;br&gt;
That gives you one query surface for mixed AI tools without forcing every integration to pretend it exposes the same metadata.&lt;/p&gt;

&lt;p&gt;Here is the flow, in plain English:&lt;br&gt;
AI tool / editor&lt;br&gt;
   ├─ proxy path -&amp;gt; prompt + response + model&lt;br&gt;
   └─ hook path  -&amp;gt; inserted diff + context&lt;br&gt;
         ↓&lt;br&gt;
   correlation + adapter detection&lt;br&gt;
         ↓&lt;br&gt;
 provider-agnostic provenance event&lt;br&gt;
         ↓&lt;br&gt;
 storage / search / review&lt;br&gt;
The important part is that both paths end up in the same normalized event. The path is different. The contract is the same.&lt;/p&gt;

&lt;p&gt;Why this matters in practice&lt;br&gt;
This is not just a schema preference.&lt;/p&gt;

&lt;p&gt;It changes how you use the data.&lt;/p&gt;

&lt;p&gt;If a record says full, you can review the prompt, response, and file change together.&lt;/p&gt;

&lt;p&gt;If it says metadata_only, you know the system saw something, but not everything.&lt;/p&gt;

&lt;p&gt;If it says unavailable, you know not to trust the absence of prompt data as if it were evidence of no prompt.&lt;/p&gt;

&lt;p&gt;That matters for:&lt;/p&gt;

&lt;p&gt;incident response, where teams need to know what the system actually observed&lt;br&gt;
code review, where a partial record is still better than a confident lie&lt;br&gt;
governance, where audit trails need to be precise about what exists and what does not&lt;br&gt;
search, where filters like “show me non-full captures from Monday” are useful instead of hidden footnotes&lt;br&gt;
This is why I prefer “honest partials” over silent omissions.&lt;/p&gt;

&lt;p&gt;A provenance system that says “prompt missing” is more trustworthy than a system that fills the field with a guess or quietly hides the record.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>discuss</category>
      <category>news</category>
      <category>showdev</category>
    </item>
    <item>
      <title>The enterprise AI control that is still missing: code provenance</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Sun, 31 May 2026 07:05:49 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/the-enterprise-ai-control-that-is-still-missing-code-provenance-57ie</link>
      <guid>https://dev.to/pn_28428886923dfc665/the-enterprise-ai-control-that-is-still-missing-code-provenance-57ie</guid>
      <description>&lt;p&gt;Enterprise AI governance keeps getting framed as a policy problem. Write acceptable-use rules. Turn on SSO. Add RBAC. Review risky PRs more carefully. That is all useful, but it still misses the one thing auditors, security teams, and incident responders actually need when AI-generated code reaches production: provenance.&lt;/p&gt;

&lt;p&gt;Not “did someone use AI.” Not “did the vendor log usage.” Provenance.&lt;/p&gt;

&lt;p&gt;When a critical bug lands in production, the question is not theoretical. Someone has to answer:&lt;/p&gt;

&lt;p&gt;What was generated?&lt;br&gt;
What was asked?&lt;br&gt;
Which model produced it?&lt;br&gt;
Which file did it land in?&lt;br&gt;
Who accepted it?&lt;br&gt;
Was it reviewed?&lt;br&gt;
Can we trace that decision later?&lt;/p&gt;

&lt;p&gt;Git blame does not answer those questions. Vendor audit logs usually do not either. In most enterprise setups, you end up with three separate blind spots:&lt;/p&gt;

&lt;p&gt;A commit history that shows authorship, not generation.&lt;br&gt;
A Copilot-style usage log that only covers one tool.&lt;br&gt;
A pile of PR comments and comments in code that rely on human discipline.&lt;/p&gt;

&lt;p&gt;That is not an audit trail. It is a loose collection of hints.&lt;/p&gt;

&lt;p&gt;The missing control is code provenance.&lt;/p&gt;

&lt;p&gt;LineageLens is built around that gap. It records the prompt, the model, the tool, the target file, the inserted code, and whether the edit was accepted or rejected. It does that in a self-hosted way, so the provenance stays inside your infrastructure instead of becoming another SaaS data trail.&lt;br&gt;
This is also where most generic logging strategies break down. Datadog and Splunk are excellent when you already know what to instrument. They are not purpose-built for AI provenance. If you want them to solve this problem, you have to build custom instrumentation, define your own schema, and keep that instrumentation working across multiple coding tools as their protocols change.&lt;/p&gt;

&lt;p&gt;That is why I do not think the enterprise answer is “use your observability stack.” Observability tells you what happened at runtime. Provenance tells you how code entered the repository.&lt;/p&gt;

&lt;p&gt;That distinction matters more as AI coding becomes normal.&lt;/p&gt;

&lt;p&gt;If your team uses one tool, maybe you can tolerate a partial log. If your team uses Cursor in the morning, Claude Code for refactors, and Copilot in the editor, partial logging becomes a governance gap. The risk is not just productivity drift. It is that nobody can later say, with evidence, how the code got there.&lt;/p&gt;

&lt;p&gt;LineageLens is not a static analysis scanner and it is not a compliance certification product. It does not replace review, SAST, or policy enforcement. It does one narrower job: it records the provenance trail that those systems need but do not create.&lt;/p&gt;

&lt;p&gt;That is why the product has multiple deployment modes. Base is local and offline. Lite is a single Docker container with SQLite. Plus adds PostgreSQL, semantic search, team visibility, and governance. Max adds graph lineage for teams that need ancestry across tools and sessions. Different orgs need different operational weight, but the underlying question is the same: can you prove where AI-generated code came from?&lt;/p&gt;

&lt;p&gt;For enterprise teams, I think this is the right way to frame the conversation:&lt;/p&gt;

&lt;p&gt;If the code is not provenance-tagged, then your review process is partly guesswork.&lt;br&gt;
If the prompt is missing, then your audit trail is incomplete.&lt;br&gt;
If the record is not self-hosted, then your governance data lives somewhere else.&lt;br&gt;
If you only track one vendor, then you are not tracking the team.&lt;/p&gt;

&lt;p&gt;That is the argument I would want to make in a security review.&lt;br&gt;
If you want the deeper technical breakdown, I wrote a longer companion post for Hashnode and the product overview is on lineagelens-website.vercel.app.&lt;/p&gt;

&lt;p&gt;Tags: ai, security, devops, opensource&lt;/p&gt;

&lt;p&gt;End question: What is your team using today to prove that AI-generated code is actually traceable six months later?&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>discuss</category>
      <category>news</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Enterprise AI Governance Starts With Identity, Not Inference</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Sat, 30 May 2026 06:41:51 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/enterprise-ai-governance-starts-with-identity-not-inference-10ng</link>
      <guid>https://dev.to/pn_28428886923dfc665/enterprise-ai-governance-starts-with-identity-not-inference-10ng</guid>
      <description>&lt;p&gt;The mistake most teams make with AI governance is starting in the wrong place.&lt;/p&gt;

&lt;p&gt;They start with model choice, prompt logging, or a dashboard that shows usage counts. That is useful, but it is not the enterprise problem. The enterprise problem is this: who had access to a workspace when the code was generated, how was that access granted, how is it revoked, and where does the evidence live after the developer moves on?&lt;/p&gt;

&lt;p&gt;That is the lens I use when I look at LineageLens now. The codebase is not just a capture system. It is a control plane.&lt;/p&gt;

&lt;p&gt;It has to be, because AI-generated code becomes sensitive the moment it crosses team boundaries. A prompt often contains internal names, architecture details, hidden assumptions, or even snippets of implementation. If that prompt turns into code, the organization needs more than “we saw the model output once.” It needs a reproducible record tied to identity, scope, and storage.&lt;br&gt;
The first thing the backend now does is protect the boot sequence itself. A setup guard keeps the app behind /setup until the first admin exists. That is not a cosmetic detail. It is the difference between “we shipped a service” and “we shipped a service that knows when it is safe to expose itself.”&lt;/p&gt;

&lt;p&gt;There is also a path for unattended installs. Admin seeding lets an operator predefine the first admin account and workspace, which is exactly what you want for repeatable deployments and test environments. The important part is that the bootstrap is explicit. No hidden account. No default credential. No mystery state.&lt;/p&gt;

&lt;p&gt;That same principle shows up again in auth.&lt;/p&gt;

&lt;p&gt;LineageLens does not treat login as a single long-lived session. Tokens carry versioning. Refresh tokens carry a unique identifier. Logout increments the token version. Password changes can invalidate old sessions too. That means old credentials do not linger in the background after an offboarding event or a reset.&lt;/p&gt;

&lt;p&gt;For enterprise teams, that is the real question: can you revoke access cleanly?&lt;/p&gt;

&lt;p&gt;A lot of products can issue a token. Far fewer can answer what happens when the employee leaves, the contractor finishes, or an admin account is rotated after an incident. If your “governance” tool cannot revoke trust, it is only recording history. It is not governing access.&lt;/p&gt;

&lt;p&gt;The current codebase also scopes trust by workspace. Registration can be disabled. Invites stay within the admin’s own workspace. Self-registration creates a workspace and an admin together. That means the unit of control is not “the entire installation” in the abstract. It is the workspace that the organization actually wants to isolate.&lt;/p&gt;

&lt;p&gt;That distinction matters in real enterprise environments.&lt;br&gt;
A platform can be installed once and still serve multiple operational boundaries:&lt;/p&gt;

&lt;p&gt;one product team&lt;br&gt;
one regulated department&lt;br&gt;
one customer environment&lt;br&gt;
one contractor group&lt;br&gt;
one air-gapped deployment&lt;br&gt;
If the workspace boundary is weak, everything else gets blurry. Search becomes too broad. Audit export becomes too noisy. Access control turns into a guessing game. The codebase avoids that by making workspace ownership explicit and tying tokens to workspace scope.&lt;/p&gt;

&lt;p&gt;There is another layer here that is easy to miss: the transport and surface hardening around the app.&lt;/p&gt;

&lt;p&gt;The backend adds rate limits, body-size limits, trusted-host checks, CORS rules, and security headers. That sounds mundane, but enterprise software is mostly mundane in the places that matter. Real systems fail because of overly broad trust, not because of one dramatic exploit. Rate limiting on auth endpoints, header hardening, and explicit host validation are all the kind of controls that make self-hosted software survivable in real environments.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>security</category>
      <category>discuss</category>
    </item>
    <item>
      <title>From "Who Wrote This?" to "Provenance, Actioned": Making AI-origin code obvious during review</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Fri, 29 May 2026 07:21:22 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/from-who-wrote-this-to-provenance-actioned-making-ai-origin-code-obvious-during-review-531a</link>
      <guid>https://dev.to/pn_28428886923dfc665/from-who-wrote-this-to-provenance-actioned-making-ai-origin-code-obvious-during-review-531a</guid>
      <description>&lt;p&gt;TL;DR: The most useful provenance is actionable provenance. Instead of storing prompts like a dusty audit log, surface them where decision-makers work: the code review. Recent UX and correlation work in LineageLens — sidebar captures, drag/drop, click-to-insert, and a confidence engine — demonstrate how provenance can shorten review cycles and reduce reverts.&lt;br&gt;
The problem (why it matters)&lt;br&gt;
By 2026, AI is a first-class development tool. Good suggestions become accepted edits, then commits. When reviewers see unfamiliar code they ask the obvious questions: who wrote this, why was it accepted, and was it audited? Git blame shows an author, but not the conversational context that generated the code. That missing context causes three predictable costs:&lt;br&gt;
Time to reproduce: reviewers re-run prompts or attempt to reproduce edits.&lt;br&gt;
Conservative reverts: unknown edits get reverted, losing useful fixes.&lt;br&gt;
Risk hiding: sensitive changes slip through without proper checks.&lt;br&gt;
What "actionable provenance" looks like&lt;br&gt;
Actionable provenance answers reviewer questions immediately:&lt;br&gt;
Who/what produced this block (adapter + model)&lt;br&gt;
The original prompt text&lt;br&gt;
Confidence that the prompt maps to the inserted code&lt;br&gt;
Quick actions: insert into the editor, copy prompt to PR comment, or open the capturing session&lt;br&gt;
Minimal latency, clear UI, and one-click actions are the difference between “archival” and “actionable”.&lt;br&gt;
Recent product signals that make it practical (evidence from the repo)&lt;br&gt;
Drag-and-drop + click-to-insert (sidebar improvements): let reviewers place the original generated block into a temporary editor buffer or paste the prompt into the PR comment box.&lt;br&gt;
Confidence engine and dynamic routing: correlation scores and better adapter matching reduce false positives so reviewers can rely on the provenance instead of treating it as noise.&lt;br&gt;
UX fixes (trash/clear buttons, reorder, inline hover actions): small changes that keep the capture panel usable during real reviews.&lt;br&gt;
See the architecture summary in the repo: architecture.md:1 and the product README (README.md:1).&lt;br&gt;
Concrete workflow: a reviewer’s day with actionable provenance&lt;br&gt;
PR opens. Reviewer scans diffs.&lt;br&gt;
A capture badge is shown next to changed hunks indicating "Provenance: available — confidence 88%".&lt;br&gt;
Click: the capture sidebar opens to the exact prompt, model, and surrounding context snapshot.&lt;br&gt;
Action buttons:&lt;br&gt;
"Insert at cursor" — drop the generated block into a temp editor to run tests locally.&lt;br&gt;
"Copy prompt" — paste into a PR comment template to ask follow-ups.&lt;br&gt;
"Annotate PR" — append an auto-formatted provenance note (prompt, model, confidence).&lt;br&gt;
Outcome: reviewer spends &amp;lt;5 minutes to triage instead of 30–120.&lt;/p&gt;

&lt;p&gt;UX trade-offs and governance constraints&lt;br&gt;
Confidence thresholds: too low → noisy provenance, too high → missed attributions. Tune by starting conservative (show medium/high only) and lower threshold based on false-negative feedback.&lt;br&gt;
Privacy and storage: prompts can be sensitive. Default to workspace-local storage and make PR annotation an explicit reviewer action.&lt;br&gt;
Reviewer training: a short guideline (one paragraph) in your PR template — "If provenance shows 'high confidence', prefer triage over revert" — makes a measurable difference.&lt;br&gt;
Quick checklist to evaluate your team's readiness&lt;br&gt;
Are prompts stored in your control plane or vendor logs? If not, you may lack necessary evidence.&lt;br&gt;
Can you surface provenance inline in PRs or via your editor? If not, archival logs won't help reviewers.&lt;br&gt;
Do you have a workflow for annotating PRs with provenance? If not, create a two-line PR template snippet now.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>security</category>
      <category>discuss</category>
      <category>news</category>
    </item>
    <item>
      <title>How LineageLens routes LLM requests for cost savings — without losing provenance</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Thu, 28 May 2026 05:45:32 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/how-lineagelens-routes-llm-requests-for-cost-savings-without-losing-provenance-4oo6</link>
      <guid>https://dev.to/pn_28428886923dfc665/how-lineagelens-routes-llm-requests-for-cost-savings-without-losing-provenance-4oo6</guid>
      <description>&lt;p&gt;Problem: LLM usage multiplies cost and variance across models. Teams want cheaper defaults but must keep an auditable trail of what model was used for each applied edit.&lt;br&gt;
Approach: We added deterministic request classification (classify_request), a backend-backed routing policy cache, and an in-proxy rewrite that records every routing decision into the provenance payload.&lt;br&gt;
Implementation notes&lt;br&gt;
Classifier: classify_request lives in classifier.py and evaluates tools/functions, prompt size, system keywords, and code fences to return simple|standard|complex.&lt;br&gt;
Policy cache: routing_cache.py fetches workspace-level RoutingPolicy from the backend and refreshes in the background; use get_policy(workspace_id, provider) to read a policy at request-time.&lt;br&gt;
Proxy integration: the proxy calls the classifier, consults the cache, rewrites the outbound model when a policy is enabled, and attaches the decision to every pending edit so the backend stores provenance_records.routing_decision — see proxy.py.&lt;br&gt;
Savings estimate: pricing.py contains the static pricing table and estimate_savings() used for dashboard cards like “AI Cost Saved by Routing (30d)”.&lt;br&gt;
Tests: Run pytest targets in test_routing.py and test_routing_integration.py to validate classification, mapping, and savings calculation.&lt;br&gt;
Operational considerations&lt;br&gt;
V1 intentionally avoids cross-provider rewrites and automatic fallbacks — that keeps timing and correlation simpler for audit logs.&lt;br&gt;
Policy propagation is cached; policy edits take up to the cache TTL to reach every proxy instance.&lt;br&gt;
Watch for parity issues: rewrites may change model behavior (latency/quality). Evaluate on shadow traffic or enable routing only for low-risk simple tiers first.&lt;br&gt;
Practical takeaway: Flip on routing for simple tier first, measure savings via the backend card, and expand mappings once you validate quality parity.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>security</category>
      <category>discuss</category>
      <category>agents</category>
    </item>
    <item>
      <title>How teams can add a custom LineageLens adapter — a practical, code-free guide</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Wed, 27 May 2026 10:41:37 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/how-teams-can-add-a-custom-lineagelens-adapter-a-practical-code-free-guide-4851</link>
      <guid>https://dev.to/pn_28428886923dfc665/how-teams-can-add-a-custom-lineagelens-adapter-a-practical-code-free-guide-4851</guid>
      <description>&lt;p&gt;Problem&lt;br&gt;
Many engineering teams run private LLMs or internal CLI tools that do not emit vendor telemetry. Without an explicit adapter, AI-generated edits appear in the editor but lack prompt, model, and session context. That gap reduces the usefulness of provenance for audits and PR reviews.&lt;br&gt;
High-level approach&lt;/p&gt;

&lt;p&gt;Define the signals you can trust: header signatures, stable user‑agent tokens, unique payload fields, or session identifiers. Rank them by trustworthiness.&lt;br&gt;
Design evidence rules: for each detectable signal, document a short justification, the expected field, and an evidence weight (high for signed headers, lower for heuristics).&lt;br&gt;
Implement a conservative detection rule that returns a detection only when combined evidence passes a clear threshold.&lt;br&gt;
Register the adapter so the registry can consider it; ensure its declared priority (ordering) sits ahead of fallback heuristics but after core, high-trust adapters.&lt;br&gt;
Validate with unit fixtures and an end‑to‑end replay through the local proxy and dashboard.&lt;br&gt;
Detection principles&lt;/p&gt;

&lt;p&gt;Precision over recall: prefer missing a match to declaring an incorrect attribution.&lt;br&gt;
Explainability: every detection must carry evidence that an auditor can inspect.&lt;br&gt;
Performance: keep per-detection logic cheap; avoid heavy parsing in the hot path.&lt;br&gt;
Redaction: strip or hash any secrets before saving evidence.&lt;br&gt;
Testing and validation&lt;/p&gt;

&lt;p&gt;Unit tests: supply positive and negative recorded request/response fixtures to assert match/no-match behavior and to lock down the confidence threshold.&lt;br&gt;
Integration test: replay a recorded proxy request/response along with the inserted text through a local quickstart and confirm the dashboard displays your adapter name and evidence.&lt;br&gt;
Canary rollout: enable the adapter in logging-only mode for a short period, measure false positives, and adjust weights before enabling in alerts or PR gates.&lt;br&gt;
Operational checklist&lt;/p&gt;

&lt;p&gt;Document the adapter’s evidence rules and ordering so future maintainers can tune it.&lt;br&gt;
Ensure stored evidence is redacted or hashed as needed for compliance.&lt;br&gt;
Add telemetry around low‑confidence matches for manual labeling and continuous improvement.&lt;br&gt;
Add CI guards: regression tests preventing accidental broadening of detection heuristics.&lt;br&gt;
Practical takeaway&lt;br&gt;
Custom adapters let teams capture private tools with auditability — but only if they are designed for precision, documented for explainability, and validated with unit and replay tests. Start conservative, collect samples, and iterate on evidence weights rather than broadening heuristics.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>tooling</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What enterprises are actually buying when they adopt LineageLens</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Tue, 26 May 2026 06:29:16 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/what-enterprises-are-actually-buying-when-they-adopt-lineagelens-4efd</link>
      <guid>https://dev.to/pn_28428886923dfc665/what-enterprises-are-actually-buying-when-they-adopt-lineagelens-4efd</guid>
      <description>&lt;p&gt;The mistake is to think enterprises adopt LineageLens because they want more visibility into AI. They already have too many tools that show “usage.” What they actually need is a record they can trust six months later, when a bug, a review dispute, or an audit question comes back to the same code.&lt;/p&gt;

&lt;p&gt;Git blame is not enough. Vendor logs are not enough. A commit history tells you who changed a file. It does not tell you what was asked, which model responded, whether the change was applied, or whether the same pattern showed up in another file next week. That is the gap LineageLens fills.&lt;/p&gt;

&lt;p&gt;The reason the product is split into Base, Lite, Plus, and Max is that enterprise adoption is not one decision. It is a sequence of control boundaries.&lt;/p&gt;

&lt;p&gt;Base is the local record. It is the easiest place to start because it gives an engineer a private provenance trail without asking for a backend, a cluster, or a policy debate. That makes it useful for pilots, skeptics, and teams that want proof before process.&lt;/p&gt;

&lt;p&gt;Lite is the shared capture layer. This is where the transparent proxy starts paying off. If your team uses terminal-based tools, the proxy can see the prompt and the generated edit. That matters because an enterprise cannot govern what it cannot reconstruct. File-only capture is useful, but prompt capture is what makes provenance actionable.&lt;/p&gt;

&lt;p&gt;Plus and Max are where the enterprise conversation really begins. That is where auth, permissions, retention, search, review workflows, and identity integrations such as SSO belong. If you are going to store provenance data for an entire team, you need to decide who can see it, how long to keep it, and what happens when a policy says a record needs review or deletion. Those are backend concerns, not editor concerns.&lt;/p&gt;

&lt;p&gt;That separation is the design point. LineageLens keeps the capture layer close to the developer and the governance layer close to the org. The proxy is transparent pass-through. The backend is where policy lives. The data model stays consistent across both, so a record captured locally can still be part of a larger enterprise audit trail later.&lt;/p&gt;

&lt;p&gt;That also makes the tradeoffs honest. Base is the lightest path, but it cannot give you prompt-level provenance everywhere. Lite is the quickest way to centralize team records, but it still assumes you are willing to point tools at a local proxy. Plus and Max add infrastructure, but they give you the controls that enterprise buyers actually ask for: access control, retention, export, policy, and the ability to keep everything on your side of the boundary.&lt;/p&gt;

&lt;p&gt;If you want a practical rollout pattern, it looks like this:&lt;/p&gt;

&lt;p&gt;Prove the record exists with Base on a small group.&lt;br&gt;
Expand to Lite for one team that already uses AI tools heavily.&lt;br&gt;
Add governance controls once security and platform teams ask for them.&lt;br&gt;
Use Max only when cross-file lineage and audit depth are real requirements.&lt;br&gt;
The conclusion is simple. Enterprises are not buying a prettier dashboard. They are buying a way to make AI-generated code attributable, retainable, and reviewable without sending the underlying prompts to a third-party cloud. That is a different product category, and it is why LineageLens is structured the way it is.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>devops</category>
      <category>opensource</category>
      <category>startup</category>
    </item>
    <item>
      <title>AI provenance systems need one shared record contract</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Mon, 25 May 2026 04:59:29 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/ai-provenance-systems-need-one-shared-record-contract-55h7</link>
      <guid>https://dev.to/pn_28428886923dfc665/ai-provenance-systems-need-one-shared-record-contract-55h7</guid>
      <description>&lt;p&gt;AI provenance systems usually fail in a boring way: the data exists, but different parts of the product disagree about what it means. In LineageLens, the extension captures the insertion, the backend stores the provenance record, and the MCP server answers questions from that same record. If one layer renames a field, drops a status, or normalizes a value differently, the record still exists, but the product stops telling one coherent story.&lt;br&gt;
The fix is to treat provenance as a contract, not a payload. Canonical fields matter: prompt, model, tool, file path, line range, capture method, risk score, and outcome. The outcome is especially important because applied, rejected, and errored are not the same thing. Neither are full, metadata_only, tunnel_only, and unavailable. Those labels drive dashboards, exports, and assistant answers.&lt;/p&gt;

&lt;p&gt;The practical takeaway is simple: version the schema, validate at the boundary, and test every surface against the same fixture. If you are building any internal system where capture, storage, and assistant access all sit on top of the same record, consistency is a feature, not a nice-to-have.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
      <category>automation</category>
    </item>
    <item>
      <title>Why AI provenance tools fail when their layers disagree</title>
      <dc:creator>Praveen</dc:creator>
      <pubDate>Sun, 24 May 2026 05:05:12 +0000</pubDate>
      <link>https://dev.to/pn_28428886923dfc665/why-ai-provenance-tools-fail-when-their-layers-disagree-306j</link>
      <guid>https://dev.to/pn_28428886923dfc665/why-ai-provenance-tools-fail-when-their-layers-disagree-306j</guid>
      <description>&lt;p&gt;Most people think the hard part of an AI provenance tool is capturing the prompt or parsing the model output. That is only the first layer of the problem. The more serious failure appears after the system has multiple moving parts: an editor extension, a backend, and an assistant-facing API all trying to describe the same event.&lt;/p&gt;

&lt;p&gt;That is where trust starts to break.&lt;/p&gt;

&lt;p&gt;A provenance system is supposed to answer a simple question: what happened to this change, and how did it get here? But once the extension, backend, and MCP server all participate in that answer, any mismatch in response shape, error handling, or mode-specific behavior becomes user-visible. A redirect that is helpful during setup can become opaque during login. A workspace response that is technically correct can still be formatted incorrectly for the MCP layer. A Lite-only feature gate can look like an authentication failure if the error mapping is too generic. None of those are parsing bugs. They are consistency bugs.&lt;/p&gt;

&lt;p&gt;This is why contract drift matters so much in AI infrastructure tools. The system is not just moving data. It is narrating reality across surfaces. If one surface says “setup needed,” another says “login failed,” and a third says “feature unavailable,” the user no longer knows which layer to believe.&lt;/p&gt;

&lt;p&gt;In LineageLens, the recent fixes were all about reducing that kind of ambiguity. Fresh installs now get a real auth response instead of an opaque redirect. The MCP server matches the workspace response shape that the backend actually returns. Lite-mode 403s surface the backend’s upgrade message instead of a misleading auth template. Ingest warnings and duplicate storage status now reach the user instead of disappearing silently. Even the token lifecycle became more robust by supporting refresh before falling back to password re-login.&lt;/p&gt;

&lt;p&gt;That is the practical lesson: once a product spans multiple clients, you need contract discipline at the boundaries. Not just tests for the core logic, but tests for the truth that each layer tells the next one.&lt;/p&gt;

&lt;p&gt;For AI provenance tools, that truth is the product. If the extension, backend, and MCP server disagree, the audit trail becomes noisy instead of useful. And if the audit trail is noisy, the whole category loses value.&lt;/p&gt;

&lt;p&gt;The fix is not glamorous. It is boundary work: stable payloads, mode-aware errors, better token handling, and fewer assumptions about what another layer “probably meant.” But that is exactly the kind of work that makes a provenance system trustworthy.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
      <category>security</category>
    </item>
  </channel>
</rss>
