MCP Security in Action: Decision-Lineage Observability

#ai #sre #devops #security

Traditional observability tells you what broke.
Agentic observability must tell you why the agent decided to break it — before the decision cascades into production.
After sharing the risk-classification framework (Part 1) and the Cloud Security Alliance's Six Pillars of MCP Security (Part 2), the obvious next question was: how do we actually observe and audit why an agent made a particular change?
This post covers the decision-lineage architecture I shipped in a regulated cloud-native environment over the past two weeks, and the results.

The Gap in Current Agentic AI Security
When an AI agent proposes a Terraform change, an Auto Scaling adjustment, or a firewall rule modification — do you know:

Why it made that specific decision?
Which context it was operating from?
Whether that context was clean (i.e., not poisoned or injected)?

If your answer is "we have prompt logs" — you're one prompt-injection incident away from a very difficult post-mortem.
Prompt logs capture what was said. Decision lineage captures why the agent chose to act, at every step of the reasoning chain.

What Decision-Lineage Observability Actually Looks Like
The reasoning chain I instrument:
Goal → Context ingestion → Tool selection → Proposed action → Policy check → Execute / Quarantine
For each step, we capture:

The deterministic trace ID tying the step to its session and goal
A hash of the context at that moment (tamper-evidence)
The tool selected and the reasoning for selecting it
The proposed action and its blast-radius classification
The policy check result
Implementation: A Thin Layer on Top of OpenTelemetry
No new infrastructure. This wraps your existing observability stack.
Step 1: Wrap Every MCP Tool Call with a Deterministic Trace ID
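```python
import hashlib
import time
from dataclasses import dataclass


@dataclass
class LineageTraceId:
    session_id: str
    goal_hash: str     # hash of the goal that started the session
    sequence: int      # step number within the session
    timestamp_ns: int  # e.g. time.time_ns() when the step began

    def __str__(self) -> str:
        # Pure function of the four fields: same inputs, same ID.
        payload = f"{self.session_id}:{self.goal_hash}:{self.sequence}:{self.timestamp_ns}"
        return hashlib.sha256(payload.encode()).hexdigest()[:16]


# Example: the ID for step 3 of a session
goal_hash = hashlib.sha256(b"reduce p95 latency on checkout").hexdigest()
trace_id = str(LineageTraceId("sess-42", goal_hash, 3, time.time_ns()))
```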

This ID is deterministic: it is a pure function of the session ID, goal hash, sequence number, and timestamp, so you can recompute it from those known inputs during an incident investigation, even if the log store is unreachable.
Step 2: Write Reasoning Steps to an Append-Only Store
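```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
LINEAGE_BUCKET = "decision-lineage"  # placeholder; bucket created with Object Lock enabled


def write_lineage_record(trace_id: str, record: dict) -> None:
    now = datetime.now(timezone.utc)        # timezone-aware; utcnow() is deprecated in 3.12
    date_prefix = now.strftime("%Y/%m/%d")  # assumed layout: day-partitioned keys
    s3.put_object(
        Bucket=LINEAGE_BUCKET,
        Key=f"decision-lineage/{date_prefix}/{trace_id}.json",
        ContentType="application/json",
        Body=json.dumps({
            "trace_id": trace_id,
            "timestamp": now.isoformat(),
            "reasoning_chain": record["reasoning_chain"],
            "tool_selected": record["tool_selected"],
            "proposed_action": record["proposed_action"],
            "context_hash": record["context_hash"],
            "blast_radius_tier": record["blast_radius_tier"],
            "policy_result": record["policy_result"],
        }),
    )
```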
Records land in S3 + Glacier with Object Lock (WORM) and 90-day retention. The immutability is the point: a lineage store you can modify after the fact is a liability, not an asset.
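For reference, the storage side might be configured roughly like this with boto3. COMPLIANCE mode and the 30-day Glacier transition are assumptions (the post only specifies WORM with 90-day retention), and the function names are hypothetical. The config builders return plain dicts so they can be reviewed or tested without AWS credentials. Object Lock requires versioning; creating the bucket with Object Lock enabled turns versioning on automatically.

```python
def object_lock_config(retention_days: int = 90, mode: str = "COMPLIANCE") -> dict:
    # COMPLIANCE: no user, including root, can delete a locked version or shorten retention.
    # GOVERNANCE: users with s3:BypassGovernanceRetention can override it.
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError(f"unsupported Object Lock mode: {mode}")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": retention_days}},
    }


def lifecycle_config(prefix: str = "decision-lineage/", glacier_after_days: int = 30) -> dict:
    # Move older lineage records to Glacier; Object Lock protection survives transitions.
    return {
        "Rules": [{
            "ID": "lineage-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        }]
    }


def apply_lineage_bucket_config(bucket: str, region: str = "us-east-1") -> None:
    import boto3  # deferred so the builders above work without boto3 installed

    s3 = boto3.client("s3", region_name=region)
    create_args = {"Bucket": bucket, "ObjectLockEnabledForBucket": True}
    if region != "us-east-1":
        # Outside us-east-1, S3 requires an explicit location constraint.
        create_args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**create_args)
    s3.put_object_lock_configuration(Bucket=bucket, ObjectLockConfiguration=object_lock_config())
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=lifecycle_config())
```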
Step 3: Run Three Parallel Policy Checks Before Execution
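```python
import asyncio


async def run_policy_checks(proposed_action, context, tool_output):
    # Each check is assumed to return a (passed: bool, details) tuple.
    results = await asyncio.gather(
        check_blast_radius(proposed_action, context["approved_tier"]),
        check_behavioral_consistency(context["tool_name"], tool_output, context["hash"]),
        check_context_integrity(context, tool_output),
        return_exceptions=True,  # a crashing check must not bypass the gate
    )
    # Fail closed: an exception counts as a failed check.
    results = [
        (False, f"check raised: {r!r}") if isinstance(r, BaseException) else r
        for r in results
    ]
    return {
        "passed": all(r[0] for r in results),
        "checks": {
            "blast_radius": results[0],
            "behavioral_consistency": results[1],
            "context_integrity": results[2],
        },
    }
```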
Blast radius check: Does the proposed action match the approved tier for this agent session?
Behavioral consistency check: Is the tool output consistent with historical baselines for this context? Significant deviations are flagged — they can indicate tool compromise or context drift.
Context integrity check: Pattern matching against known prompt injection signatures across the full context + tool output payload.
All three run concurrently via asyncio.gather. Overhead is under 50 ms for most checks.
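The post doesn't show the checks themselves. Below is one possible shape for each, assuming every check returns a `(passed, details)` tuple as `run_policy_checks` expects and that `proposed_action` is a dict carrying a `blast_radius_tier` key. The tier names, the output-size baseline, the z-score threshold of 3, and the three injection regexes are illustrative assumptions, not the author's rules or a complete signature library.

```python
import asyncio
import json
import re
import statistics

# Hypothetical tiers, ordered from smallest to largest blast radius.
TIER_RANK = {"read-only": 0, "low": 1, "medium": 2, "high": 3}

# Hypothetical baseline: historical output sizes (characters) per tool.
BASELINES: dict[str, list[int]] = {}

# A tiny illustrative signature set; a real library would be far larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]


async def check_blast_radius(proposed_action: dict, approved_tier: str):
    # Unknown action tier counts as worst case; unknown approved tier approves nothing.
    action_rank = TIER_RANK.get(proposed_action.get("blast_radius_tier"), max(TIER_RANK.values()))
    approved_rank = TIER_RANK.get(approved_tier, -1)
    return action_rank <= approved_rank, {"action_rank": action_rank, "approved_rank": approved_rank}


async def check_behavioral_consistency(tool_name: str, tool_output, context_hash: str,
                                       z_threshold: float = 3.0):
    history = BASELINES.get(tool_name, [])
    size = len(str(tool_output))
    if len(history) < 5:
        # Too little history to judge: fail closed and let a human look.
        return False, {"reason": "insufficient baseline", "context_hash": context_hash}
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    z = abs(size - mean) / stdev if stdev else (0.0 if size == mean else float("inf"))
    return z <= z_threshold, {"z_score": z, "context_hash": context_hash}


async def check_context_integrity(context: dict, tool_output):
    # Scan the full context plus tool output as one payload.
    payload = json.dumps(context, default=str, ensure_ascii=False) + "\n" + str(tool_output)
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(payload)]
    return not hits, {"matched_signatures": hits}
```

Failing closed on a missing baseline is a deliberate choice here; it trades some extra review tickets for never waving through a tool the system has not seen behave before.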
Step 4: Safe Degradation on Any Failure
```python
def handle_policy_result(policy_result, proposed_action, trace_id):
    if policy_result["passed"]:
        attach_lineage_to_pr(trace_id, proposed_action)  # Attach "why" to the change record
        execute_action(proposed_action)
    else:
        # Any failed check: hold the action and route it to a human with full lineage.
        quarantine_action(proposed_action, trace_id)
        create_human_review_ticket(action=proposed_action, trace_id=trace_id)
        return safe_degradation_response(trace_id)
```
Quarantined changes are never silently dropped — they create a human review ticket with the full lineage record attached. The agent receives a safe fallback response explaining why the action was held.
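Both `attach_lineage_to_pr` and `create_human_review_ticket` have to turn a lineage record into something a reviewer can read. A hypothetical renderer and fallback message might look like this. The record fields follow `write_lineage_record`, and `policy_result["checks"]` holds the `(passed, details)` pairs from `run_policy_checks`; the Markdown layout is an assumption, and this variant of `safe_degradation_response` also takes `policy_result` so it can name the failed checks.

```python
def failed_checks(policy_result: dict) -> list[str]:
    # Works for tuples or JSON-round-tripped two-element lists.
    return [name for name, (passed, _) in policy_result["checks"].items() if not passed]


def render_lineage_markdown(record: dict) -> str:
    # Human-readable "why" for a PR description or review ticket.
    lines = [
        f"### Decision lineage `{record['trace_id']}`",
        f"- Tool: `{record['tool_selected']}`",
        f"- Blast-radius tier: {record['blast_radius_tier']}",
        f"- Context hash: `{record['context_hash']}`",
        "",
        "Reasoning chain:",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(record["reasoning_chain"], start=1)]
    lines += ["", f"Failed checks: {', '.join(failed_checks(record['policy_result'])) or 'none'}"]
    return "\n".join(lines)


def safe_degradation_response(trace_id: str, policy_result: dict) -> dict:
    # What the agent receives instead of an execution result: held, not dropped.
    return {
        "status": "quarantined",
        "trace_id": trace_id,
        "failed_checks": failed_checks(policy_result),
        "message": "Action held for human review; do not retry or work around it.",
    }
```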

Results After a 2-Week Pilot
| Metric | Result |
|---|---|
| AI-proposed changes with full "why" traceability | 100% |
| Poisoned-tool incidents caught pre-execution | 3 |
| SRE on-call pages | –40% |
| Compliance audit query time | ~3 days → ~2 hours (self-serve) |
The SRE page reduction was unexpected. Because every change now carries its reasoning chain, on-call engineers spend far less time reconstructing why something changed during incident response. The agent essentially writes its own incident context in advance.
The compliance improvement was the immediate business win — the audit team can query the lineage store directly via a simple CLI instead of opening a ticket with engineering.

The Three Lessons That Surprised Me

Immutability is your integrity primitive, not a compliance checkbox. A lineage store that can be modified is a liability. The moment you apply WORM constraints, the audit value multiplies because any tampering becomes detectable.
Context hashing > content logging. Logging the full context at each step is expensive and creates its own data privacy surface. Hashing the context gives you tamper-evidence without logging sensitive payloads. You only need to store the full context for flagged events.
The lineage layer becomes your incident response system. Build the query interface for operators first, compliance second. If it's hard for SREs to use during an incident, it won't be used — and the value disappears.
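A sketch of the second lesson's rule, "hash always, store the full context only when flagged". The record shape, treating any failed policy check as "flagged", and the `context_snapshot` field name are assumptions for illustration; in practice the snapshot would likely go to a separate, access-controlled location.

```python
import hashlib
import json


def context_digest(context: dict) -> str:
    # Canonical JSON so the digest does not depend on key order or whitespace.
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_step_record(trace_id: str, context: dict, policy_result: dict) -> dict:
    record = {
        "trace_id": trace_id,
        "context_hash": context_digest(context),  # always kept: cheap, no sensitive payload
        "policy_result": policy_result,
    }
    if not policy_result["passed"]:
        # Flagged: keep the full payload so reviewers can see what the agent saw.
        record["context_snapshot"] = context
    return record


def verify_snapshot(record: dict) -> bool:
    # Tamper check: a stored snapshot must still hash to the recorded digest.
    snapshot = record.get("context_snapshot")
    return snapshot is not None and context_digest(snapshot) == record["context_hash"]
```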

What's Coming: Open-Source Reference Implementation
Next week I'll publish the reference implementation. It will include:

Drop-in OpenTelemetry instrumentation for common MCP-compatible agent frameworks
Pre-built policy checks (blast radius classification, behavioral baseline builder, injection pattern library)
CDK + Terraform modules for the storage/eventing infrastructure
A query CLI designed for operators (not just compliance teams)

It's designed to be framework-agnostic — if your agent emits OpenTelemetry spans, you can instrument it.

Where Are You on This?
If you're running agentic AI against production infrastructure — even in shadow mode — what's your current approach to decision auditability?
Specifically curious about:

Are you correlating agent decisions to change records (PRs, CRs, tickets)?
How are you handling prompt injection detection at the tool boundary?
What does "audit-ready" look like in your compliance context?

Drop your approach in the comments. This is an area where the community is still building the playbook, and I'd rather share notes than solve it in isolation.

Part 1: Risk Classification Framework for MCP Tool Calls
Part 2: The Cloud Security Alliance's Six Pillars of MCP Security
Part 3: Decision-Lineage Observability (this post)

Top comments (1)

Hermetic Dev • Apr 15

Really solid architecture. The three parallel policy checks before execution (blast radius, behavioral consistency, context integrity) map well to how defense-in-depth should work for agentic systems — you're validating the decision independently from validating the authorization.

One pattern I've found useful when implementing similar tamper-evident chains: using the previous entry's HMAC as input to the current entry's HMAC (chained HMACs rather than independent context hashes). This gives you not just per-entry tamper evidence but also ordering evidence — you can prove no entries were inserted, deleted, or reordered after the fact. Your S3 Object Lock handles the immutability side, but if someone ever needs to verify the chain without access to the WORM store metadata, chained HMACs give you a self-verifying data structure.
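A minimal sketch of the chained-HMAC structure described in this comment, using Python's standard `hmac` module. The single in-memory key, the all-zero genesis value, and serializing entries as canonical JSON are assumptions; a real deployment would keep the key in a KMS or HSM. Note that dropping entries from the end of the chain still verifies, so the latest MAC needs to be anchored somewhere else (for example, in the WORM store).

```python
import hashlib
import hmac
import json

GENESIS = b"\x00" * 32  # hypothetical starting value for an empty chain


def entry_mac(key: bytes, prev_mac: bytes, entry: dict) -> bytes:
    # Each MAC covers the previous MAC, so it commits to the whole history.
    body = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_mac + body, hashlib.sha256).digest()


def build_chain(key: bytes, entries: list[dict]) -> list[bytes]:
    macs, prev = [], GENESIS
    for entry in entries:
        prev = entry_mac(key, prev, entry)
        macs.append(prev)
    return macs


def verify_chain(key: bytes, entries: list[dict], macs: list[bytes]) -> bool:
    if len(entries) != len(macs):
        return False
    prev = GENESIS
    for entry, mac in zip(entries, macs):
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(entry_mac(key, prev, entry), mac):
            return False
        prev = mac
    return True
```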

Your behavioral consistency check is the piece most teams miss. Comparing tool output against historical baselines catches a class of attack (tool compromise, context drift) that static policy checks can't — because the action looks authorized but the pattern is anomalous. Have you considered extending this to credential-usage patterns? E.g., an agent that normally uses a read-only DB credential suddenly requesting a write credential 10x in a minute is a signal even if each individual request passes policy.

I work on Hermetic (hermeticsys.com), which handles the credential-isolation side of this problem — making sure agents never see raw secrets, only opaque handles bound to specific domains and TTLs. Your lineage architecture and our credential audit chain would compose naturally (correlating a lineage trace ID with a credential handle redemption event gives you the full picture: why the agent decided to act AND what credentials it used to do it). Would be interested in comparing notes on the audit query interface design when you publish the reference impl.