DEV Community

Ajay Devineni
Ajay Devineni

Posted on

MCP Security in Action: Decision-Lineage Observability

Traditional observability tells you what broke.
Agentic observability must tell you why the agent decided to break it — before the decision cascades into production.
After sharing the risk-classification framework (Part 1) and the Cloud Security Alliance's Six Pillars of MCP Security (Part 2), the obvious next question was: how do we actually observe and audit why an agent made a particular change?
This post covers the decision-lineage architecture I shipped in a regulated cloud-native environment over the past two weeks, and the results.

The Gap in Current Agentic AI Security
When an AI agent proposes a Terraform change, an Auto Scaling adjustment, or a firewall rule modification — do you know:

Why it made that specific decision?
Which context it was operating from?
Whether that context was clean (i.e., not poisoned or injected)?

If your answer is "we have prompt logs" — you're one prompt-injection incident away from a very difficult post-mortem.
Prompt logs capture what was said. Decision lineage captures why the agent chose to act, at every step of the reasoning chain.

What Decision-Lineage Observability Actually Looks Like
The reasoning chain I instrument:
Goal → Context ingestion → Tool selection → Proposed action → Policy check → Execute / Quarantine
For each step, we capture:

The deterministic trace ID tying the step to its session and goal
A hash of the context at that moment (tamper-evidence)
The tool selected and the reasoning for selecting it
The proposed action and its blast-radius classification
The policy check result
Implementation: A Thin Layer on Top of OpenTelemetry
No new infrastructure. This wraps your existing observability stack.
Step 1: Wrap Every MCP Tool Call with a Deterministic Trace ID
pythonimport hashlib
import time
from dataclasses import dataclass

@dataclass
class LineageTraceId:
session_id: str
goal_hash: str
sequence: int
timestamp_ns: int

def __str__(self):
    payload = f"{self.session_id}:{self.goal_hash}:{self.sequence}:{self.timestamp_ns}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
Enter fullscreen mode Exit fullscreen mode

This ID is deterministic — you can reconstruct it from known inputs during incident investigation, even if the log store is unreachable.
Step 2: Write Reasoning Steps to an Append-Only Store
pythondef write_lineage_record(trace_id: str, record: dict):
s3.put_object(
Bucket=LINEAGE_BUCKET,
Key=f"decision-lineage/{date_prefix}/{trace_id}.json",
Body=json.dumps({
"trace_id": trace_id,
"timestamp": datetime.utcnow().isoformat(),
"reasoning_chain": record["reasoning_chain"],
"tool_selected": record["tool_selected"],
"proposed_action": record["proposed_action"],
"context_hash": record["context_hash"],
"blast_radius_tier": record["blast_radius_tier"],
"policy_result": record["policy_result"],
}),
)
S3 + Glacier with Object Lock (WORM) for 90-day retention. The immutability is the point — a lineage store you can modify after the fact is a liability, not an asset.
Step 3: Run Three Parallel Policy Checks Before Execution
pythonasync def run_policy_checks(proposed_action, context, tool_output):
results = await asyncio.gather(
check_blast_radius(proposed_action, context["approved_tier"]),
check_behavioral_consistency(context["tool_name"], tool_output, context["hash"]),
check_context_integrity(context, tool_output),
)
return {
"passed": all(r[0] for r in results),
"checks": {
"blast_radius": results[0],
"behavioral_consistency": results[1],
"context_integrity": results[2],
}
}
Blast radius check: Does the proposed action match the approved tier for this agent session?
Behavioral consistency check: Is the tool output consistent with historical baselines for this context? Significant deviations are flagged — they can indicate tool compromise or context drift.
Context integrity check: Pattern matching against known prompt injection signatures across the full context + tool output payload.
All three run in parallel (async). Overhead is under 50ms for most checks.
Step 4: Safe Degradation on Any Failure
pythondef handle_policy_result(policy_result, proposed_action, trace_id):
if policy_result["passed"]:
attach_lineage_to_pr(trace_id, proposed_action) # Attach "why" to the change record
execute_action(proposed_action)
else:
quarantine_action(proposed_action, trace_id)
create_human_review_ticket(action=proposed_action, trace_id=trace_id)
return safe_degradation_response(trace_id)
Quarantined changes are never silently dropped — they create a human review ticket with the full lineage record attached. The agent receives a safe fallback response explaining why the action was held.

Results After a 2-Week Pilot
MetricResultAI-proposed changes with full "why" traceability100%Poisoned-tool incidents caught pre-execution3SRE on-call pages–40%Compliance audit query time~3 days → ~2 hours (self-serve)
The SRE page reduction was unexpected. Because every change now carries its reasoning chain, on-call engineers spend far less time reconstructing why something changed during incident response. The agent essentially writes its own incident context in advance.
The compliance improvement was the immediate business win — the audit team can query the lineage store directly via a simple CLI instead of opening a ticket with engineering.

The Three Lessons That Surprised Me

  1. Immutability is your integrity primitive, not a compliance checkbox. A lineage store that can be modified is a liability. The moment you apply WORM constraints, the audit value multiplies because any tampering becomes detectable.
  2. Context hashing > content logging. Logging the full context at each step is expensive and creates its own data privacy surface. Hashing the context gives you tamper-evidence without logging sensitive payloads. You only need to store the full context for flagged events.
  3. The lineage layer becomes your incident response system. Build the query interface for operators first, compliance second. If it's hard for SREs to use during an incident, it won't be used — and the value disappears.

What's Coming: Open-Source Reference Implementation
Next week I'll publish the reference implementation. It will include:

Drop-in OpenTelemetry instrumentation for common MCP-compatible agent frameworks
Pre-built policy checks (blast radius classification, behavioral baseline builder, injection pattern library)
CDK + Terraform modules for the storage/eventing infrastructure
A query CLI designed for operators (not just compliance teams)

It's designed to be framework-agnostic — if your agent emits OpenTelemetry spans, you can instrument it.

Where Are You on This?
If you're running agentic AI against production infrastructure — even in shadow mode — what's your current approach to decision auditability?
Specifically curious about:

Are you correlating agent decisions to change records (PRs, CRs, tickets)?
How are you handling prompt injection detection at the tool boundary?
What does "audit-ready" look like in your compliance context?

Drop your approach in the comments. This is an area where the community is still building the playbook, and I'd rather share notes than solve it in isolation.

Part 1: Risk Classification Framework for MCP Tool Calls
Part 2: The Cloud Security Alliance's Six Pillars of MCP Security
Part 3: Decision-Lineage Observability (this post)

Top comments (0)