How LineageLens Scores Risk on Every AI Code Insertion — and Why Missing the Prompt Makes It Worse

#opensource #discuss #news #showdev

You accepted a 35-line AI suggestion into auth.py. It looks clean. No red flags. Your linter passes.

LineageLens has already scored it 54 out of 100.

That number does not come from a model, a semantic classifier, or vibes. It comes from a small set of deterministic rules applied at the moment the record is ingested — and understanding how those rules compose tells you something useful about what "AI code risk" actually means in practice.

The architecture: two functions, two moments

The risk scoring in LineageLens lives entirely in lineagelens-backend/app/services/risk_service.py. It exposes exactly two public functions:


python
def compute_risk_score(
    inserted_code: str,
    prompt_messages: object | None = None,
    model_name: str | None = None,
    file_path: str | None = None,
) -> tuple[int, list[str]]:
    """Ingest-time. All tiers. Raw fields."""


def compute_risk_from_record(
    record: dict[str, Any],
    is_agentic: bool = False,
) -> tuple[int, list[str], set[str]]:
    """Insights-time. Plus/Max only. Serialized record."""
The split is not cosmetic. compute_risk_score runs at ingest time — the moment a provenance record arrives at the backend. It must be fast, stateless, and operate only on what is available right now.

compute_risk_from_record runs at insights time — when a user queries the insights endpoint. By then, correlation confidence is computed and the record is fully enriched. This path is gated by require_non_solo — Lite's SQLite backend never reaches it.

The base score and what it means
Every record starts at 12 before any signal fires. A score of exactly 12 means the quietest possible record — clean code, unremarkable file path, small insertion, prompt captured. That is the floor.

Code pattern rules
Six rule groups apply to the inserted block via re.search with re.IGNORECASE:

Python
_CODE_PATTERN_RULES = [
    ([r"api[_-]?key", r"access[_-]?token", r"private[_-]?key"],
     28, "Credential-like material in the block.", "security"),

    ([r"\beval\b", r"\bFunction\s*\(", r"new Function",
      r"\bexec\s*\(", r"\bexecSync\s*\("],
     24, "Dynamic code execution present.", "security"),

    ([r"\bsubprocess\.", r"\bos\.system\b",
      r"\bchild_process\b", r"\bspawn(?:Sync)?\b"],
     22, "Shell or process execution introduced.", "security"),

    ([r"dangerouslySetInnerHTML", r"\binnerHTML\s*="],
     20, "Unsafe DOM mutation patterns.", "security"),

    ([r"\bSELECT\s+.+\bFROM\b", r"\bINSERT\s+INTO\b",
      r"\bUPDATE\s+\w+\s+SET\b", r"\bDELETE\s+FROM\b"],
     16, "Raw SQL in the block.", "reliability"),

    ([r"\bpassword\b", r"\btoken\b", r"\bauth\b", r"\bcredential\b"],
     12, "Auth/credential handling present.", "compliance"),
]
The deltas rank by failure-mode severity. Credentials score +28 because AI-generated code containing credential-like strings has a disproportionate blast radius. Dynamic execution scores +24 because eval() in AI-generated code has appeared in supply chain incidents repeatedly.

Rules are not mutually exclusive. Both can fire on the same block. The score accumulates. Final result is capped at min(score, 100), but the reasons list preserves every contributing signal.

File path rules
Python
_FILE_PATTERN_RULES = [
    ([r"auth", r"security", r"permission", r"oauth",
      r"token", r"secret", r"credential"],
     14, "Security-sensitive file path.", "compliance"),

    ([r"payment", r"billing", r"invoice", r"ledger", r"finance"],
     14, "Financially sensitive file path.", "compliance"),
]
These fire on destination, not content. A clean utility function inserted into src/routes/auth.py picks up +14 purely because of where it landed. This is deliberate.

Line count
Python
if net_lines >= 80:
    score += 18
elif net_lines >= 30:
    score += 10
Larger AI-generated blocks are harder to review thoroughly. The signal acknowledges that reality.

Working through the example
Scenario: Claude Code inserts 35 lines into src/routes/auth.py that include an eval() call to dynamically dispatch a handler. Proxy is running, prompt was captured.

Base score: 12

File path (auth pattern): +14

Lines in [30, 80) range: +10

eval() code pattern: +24

Total: 60 → HIGH

The missing-prompt penalty — and why it is the most important signal
Now remove the proxy. Same code, same file, same 35 lines — but prompt_messages=None.

Python
if prompt_messages is None:
    score += 24
    reason_set.add(
        "Prompt capture is missing, which reduces auditability "
        "and reviewer confidence."
    )
Previous score: 60

Missing prompt: +24

New total: 84 → borderline CRITICAL

Same code. Same file. 24 points higher because the audit trail is incomplete.

The score is not measuring code danger in isolation. It is measuring auditable risk — the risk that something went wrong and you would not be able to reconstruct what happened. Without the prompt you cannot detect:

A model that exceeded its scope

Prompt injection via context

Scope creep from an agentic session

The +24 penalty is the scoring engine's way of saying: you are less auditable here, and that is a risk.

Easy Mode (extension only, no proxy) → scores higher for the same code than Power Mode (proxy running, prompt captured). That gap is intentional. The architecture makes the tradeoff legible through a number.

Bash
# Step 1: install extension (free, zero config)
code --install-extension karnatipraveen.lineagelens

# Step 2: add prompt capture (eliminates the +24 penalty)
git clone [https://github.com/karnati-praveen/lineagelens](https://github.com/karnati-praveen/lineagelens)
bash lineagelens-scripts/quickstart-lite.sh
export ANTHROPIC_BASE_URL=http://localhost:8788
Insights-time signals (Plus/Max only)
compute_risk_from_record adds signals not available at ingest:

Correlation confidence:

Python
if correlation_confidence < 0.4:
    score += 16
elif correlation_confidence < 0.65:
    score += 8
Agentic session:

Python
if is_agentic:
    score += 6
These require the insights endpoint because correlation confidence is downstream of ingest — blocking ingest on that would add 20–200ms of latency to every capture.

What this is not
A score of 80 does not mean the code has a vulnerability. A score of 20 does not mean it is safe. This is heuristic risk scoring, not SAST. Pair it with Snyk or Semgrep. What it provides is a provenance-aware signal: one that factors in not just what was written, but where it landed, how much was generated at once, and whether the intent behind it is recoverable.

That last factor — auditability — is what makes this different from a linter rule.

What pattern would you add to the detection list? I'm particularly curious about lightweight CFG analysis at ingest time — detecting whether a flagged eval() is reachable from an untrusted input path. The precision gain is real; the question is whether 20–40ms of analysis per insertion is acceptable in the ingest path.

Top comments (3)

HARD IN SOFT OUT • Jun 8

The missing‑prompt penalty is the most underrated signal in AI code safety

I've been following the "auditability gap" for a while – the fact that two identical code blocks can have completely different risk profiles depending on whether you know why they were written. Turning that intuition into a deterministic +24 point penalty is clever. It's not saying "missing prompt = bad code." It's saying "missing prompt = you can't prove it's safe."

The base score of 12 for a clean, small, prompt‑captured insertion is a nice psychological anchor. It sets a healthy default instead of starting at 0.

One thing I'm curious about: have you considered a context‑aware multiplier for file paths? For example, a credential pattern in test_auth.py (unit tests) is less risky than the same pattern in production/auth.py. Maybe a _TEST_PATH_PATTERN exception list that subtracts points instead of adding?

Also, the 20–40ms CFG analysis question – I'd vote opt‑in only for ingest time, maybe a config flag. For most users, the current heuristic rules + prompt capture are enough. But for high‑security environments, they'd accept the latency.

Great project. Making the invisible cost of "no prompt" visible as a number is a real contribution to the AI engineering workflow.

Cheers,

Jack

DEV.to/ggle.in

Praveen • Jun 9

This is a really thoughtful read of the tradeoff. The core idea behind the missing-prompt penalty was exactly what you described: not “this code is malicious,” but “the evidentiary surface around this code is weak.” In practice, unverifiable intent becomes its own operational risk once teams start relying on provenance for debugging, review, or compliance.

I also like the idea of contextual path weighting. A credential-like pattern inside a test harness and the same pattern inside a production auth path clearly carry different operational implications, even if the raw heuristic match is identical. I’ve been thinking more about risk as “policy context + evidence quality + code pattern” rather than static signatures alone.

And agreed on CFG analysis. My instinct right now is that deep structural analysis probably belongs behind an opt-in or high-security mode boundary. Most users care more about low-friction observability, while heavily regulated environments are much more willing to trade latency for stronger semantic analysis.

Praveen • Jun 8

Drop the Questions below !!

Some comments may only be visible to logged-in visitors. Sign in to view all comments.