DEV Community

Praveen
Praveen

Posted on

How LineageLens Scores Risk on Every AI Code Insertion — and Why Missing the Prompt Makes It Worse

You accepted a 35-line AI suggestion into auth.py. It looks clean. No red flags. Your linter passes.

LineageLens has already scored it 54 out of 100.

That number does not come from a model, a semantic classifier, or vibes. It comes from a small set of deterministic rules applied at the moment the record is ingested — and understanding how those rules compose tells you something useful about what "AI code risk" actually means in practice.

The architecture: two functions, two moments

The risk scoring in LineageLens lives entirely in lineagelens-backend/app/services/risk_service.py. It exposes exactly two public functions:


python
def compute_risk_score(
    inserted_code: str,
    prompt_messages: object | None = None,
    model_name: str | None = None,
    file_path: str | None = None,
) -> tuple[int, list[str]]:
    """Ingest-time. All tiers. Raw fields."""


def compute_risk_from_record(
    record: dict[str, Any],
    is_agentic: bool = False,
) -> tuple[int, list[str], set[str]]:
    """Insights-time. Plus/Max only. Serialized record."""
The split is not cosmetic. compute_risk_score runs at ingest time — the moment a provenance record arrives at the backend. It must be fast, stateless, and operate only on what is available right now.

compute_risk_from_record runs at insights time — when a user queries the insights endpoint. By then, correlation confidence is computed and the record is fully enriched. This path is gated by require_non_solo — Lite's SQLite backend never reaches it.

The base score and what it means
Every record starts at 12 before any signal fires. A score of exactly 12 means the quietest possible record — clean code, unremarkable file path, small insertion, prompt captured. That is the floor.

Code pattern rules
Six rule groups apply to the inserted block via re.search with re.IGNORECASE:

Python
_CODE_PATTERN_RULES = [
    ([r"api[_-]?key", r"access[_-]?token", r"private[_-]?key"],
     28, "Credential-like material in the block.", "security"),

    ([r"\beval\b", r"\bFunction\s*\(", r"new Function",
      r"\bexec\s*\(", r"\bexecSync\s*\("],
     24, "Dynamic code execution present.", "security"),

    ([r"\bsubprocess\.", r"\bos\.system\b",
      r"\bchild_process\b", r"\bspawn(?:Sync)?\b"],
     22, "Shell or process execution introduced.", "security"),

    ([r"dangerouslySetInnerHTML", r"\binnerHTML\s*="],
     20, "Unsafe DOM mutation patterns.", "security"),

    ([r"\bSELECT\s+.+\bFROM\b", r"\bINSERT\s+INTO\b",
      r"\bUPDATE\s+\w+\s+SET\b", r"\bDELETE\s+FROM\b"],
     16, "Raw SQL in the block.", "reliability"),

    ([r"\bpassword\b", r"\btoken\b", r"\bauth\b", r"\bcredential\b"],
     12, "Auth/credential handling present.", "compliance"),
]
The deltas rank by failure-mode severity. Credentials score +28 because AI-generated code containing credential-like strings has a disproportionate blast radius. Dynamic execution scores +24 because eval() in AI-generated code has appeared in supply chain incidents repeatedly.

Rules are not mutually exclusive. Both can fire on the same block. The score accumulates. Final result is capped at min(score, 100), but the reasons list preserves every contributing signal.

File path rules
Python
_FILE_PATTERN_RULES = [
    ([r"auth", r"security", r"permission", r"oauth",
      r"token", r"secret", r"credential"],
     14, "Security-sensitive file path.", "compliance"),

    ([r"payment", r"billing", r"invoice", r"ledger", r"finance"],
     14, "Financially sensitive file path.", "compliance"),
]
These fire on destination, not content. A clean utility function inserted into src/routes/auth.py picks up +14 purely because of where it landed. This is deliberate.

Line count
Python
if net_lines >= 80:
    score += 18
elif net_lines >= 30:
    score += 10
Larger AI-generated blocks are harder to review thoroughly. The signal acknowledges that reality.

Working through the example
Scenario: Claude Code inserts 35 lines into src/routes/auth.py that include an eval() call to dynamically dispatch a handler. Proxy is running, prompt was captured.

Base score: 12

File path (auth pattern): +14

Lines in [30, 80) range: +10

eval() code pattern: +24

Total: 60 → HIGH

The missing-prompt penalty — and why it is the most important signal
Now remove the proxy. Same code, same file, same 35 lines — but prompt_messages=None.

Python
if prompt_messages is None:
    score += 24
    reason_set.add(
        "Prompt capture is missing, which reduces auditability "
        "and reviewer confidence."
    )
Previous score: 60

Missing prompt: +24

New total: 84 → borderline CRITICAL

Same code. Same file. 24 points higher because the audit trail is incomplete.

The score is not measuring code danger in isolation. It is measuring auditable risk — the risk that something went wrong and you would not be able to reconstruct what happened. Without the prompt you cannot detect:

A model that exceeded its scope

Prompt injection via context

Scope creep from an agentic session

The +24 penalty is the scoring engine's way of saying: you are less auditable here, and that is a risk.

Easy Mode (extension only, no proxy) → scores higher for the same code than Power Mode (proxy running, prompt captured). That gap is intentional. The architecture makes the tradeoff legible through a number.

Bash
# Step 1: install extension (free, zero config)
code --install-extension karnatipraveen.lineagelens

# Step 2: add prompt capture (eliminates the +24 penalty)
git clone [https://github.com/karnati-praveen/lineagelens](https://github.com/karnati-praveen/lineagelens)
bash lineagelens-scripts/quickstart-lite.sh
export ANTHROPIC_BASE_URL=http://localhost:8788
Insights-time signals (Plus/Max only)
compute_risk_from_record adds signals not available at ingest:

Correlation confidence:

Python
if correlation_confidence < 0.4:
    score += 16
elif correlation_confidence < 0.65:
    score += 8
Agentic session:

Python
if is_agentic:
    score += 6
These require the insights endpoint because correlation confidence is downstream of ingest — blocking ingest on that would add 20–200ms of latency to every capture.

What this is not
A score of 80 does not mean the code has a vulnerability. A score of 20 does not mean it is safe. This is heuristic risk scoring, not SAST. Pair it with Snyk or Semgrep. What it provides is a provenance-aware signal: one that factors in not just what was written, but where it landed, how much was generated at once, and whether the intent behind it is recoverable.

That last factor — auditability — is what makes this different from a linter rule.

What pattern would you add to the detection list? I'm particularly curious about lightweight CFG analysis at ingest time — detecting whether a flagged eval() is reachable from an untrusted input path. The precision gain is real; the question is whether 20–40ms of analysis per insertion is acceptable in the ingest path.
Enter fullscreen mode Exit fullscreen mode

Top comments (1)

Collapse
 
pn_28428886923dfc665 profile image
Praveen

Drop the Questions below !!