The most honest admission at RSAC 2026 came from CrowdStrike's own CTO.
Elia Zaitsev told VentureBeat: "It looks indistinguishable if an agent runs Louis's web browser versus if Louis runs his browser."
This wasn't a confession of failure. It was an accurate description of the state of agent security in 2026. CrowdStrike shipped Charlotte AI AgentWorks at RSAC — a sophisticated platform that opens its infrastructure to Anthropic, OpenAI, Deloitte, and NVIDIA. Cisco reported that 85% of its enterprise customers have AI agent pilots underway, but only 5% have moved to production. Palo Alto Networks shipped Prisma AIRS 3.0 with artifact scanning, agent red teaming, and memory poisoning detection.
And none of them shipped an agent behavioral baseline.
That's the gap. And it's not a niche gap: it's a large part of why an 85% pilot rate collapses to a 5% production rate.
The Static Scoring Problem
Before behavioral baselines became the urgent problem, the MCP ecosystem tried to solve trust with static scoring. Quality scores based on GitHub stars. Maintenance ratings. Provenance checks. These systems look at an MCP server and ask: does it have a good reputation?
Runtime behavioral analysis asks a different question: does it actually behave the way it should?
The difference matters more than it seems. Research on dynamic vs static analysis methods found that dynamic behavioral scoring consistently outperforms static methods by 36.2 points in detecting anomalies that would affect agent reliability. Static quality scores measure a server's history. Behavioral trust scores measure what it does when an agent calls it right now.
A compromised or degraded MCP server doesn't need to attack your agent. It just needs to behave differently than your agent expects — returning subtly wrong data, injecting inconsistent schemas, or timing out at critical junctures. None of this shows up in a GitHub star count.
The EU AI Act Forces the Issue
The August 2, 2026 deadline for EU AI Act Article 13 compliance isn't abstract for enterprise teams. Any organization deploying AI agents in EU operations must maintain automated logs of agent actions — which tools were called, what they returned, whether behavior was consistent with baseline expectations.
Static quality scores don't satisfy this requirement. Runtime behavioral logs do.
Singapore's IMDA Agentic AI Governance Framework (January 2026) maps to the same requirement: traceability and accountability for agent-initiated actions. The compliance window is now under six months and closing.
Here is the structural problem: most enterprise teams know they need behavioral logs, but they're treating it as a post-production concern. By the time your agent fleet is in production and the regulator asks for behavioral records, it's too late to retroactively establish what "normal" looked like.
The baseline has to be built before the agent ships.
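What does "building the baseline before the agent ships" look like concretely? A minimal sketch of a per-call behavioral log record, the kind of append-only trail Article 13-style logging requirements point toward. The field names here are illustrative assumptions, not a compliance schema:

```typescript
// Hypothetical behavioral log record for agent tool calls.
// One JSON line per call, written from day one of the pilot so that
// "normal" is established before production (and before a regulator asks).
interface ToolCallLog {
  timestamp: string;              // ISO 8601, when the call was made
  agentId: string;                // which agent initiated the action
  server: string;                 // MCP server that handled the call
  tool: string;                   // tool invoked on that server
  durationMs: number;             // observed latency for this call
  schemaMatchedBaseline: boolean; // did the response match the expected shape?
}

function logToolCall(entry: ToolCallLog): string {
  // Append-only JSON Lines: serialize one record per call.
  return JSON.stringify(entry);
}
```

Each line answers the regulator's three questions directly: which tool was called, what it returned (shape-wise), and whether that was consistent with the baseline.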
Why This Survived RSAC
CrowdStrike, Cisco, and Palo Alto are solving a real problem: securing the agent itself from adversarial attack, credential theft, and memory poisoning. Charlotte AI AgentWorks, Prisma AIRS 3.0, and Cisco's agentic SOC tools are serious products for serious threats.
The behavioral baseline gap is orthogonal. It's the question: when your trusted agent calls an MCP server, can you trust what that server does?
A compromised or degraded MCP server can poison an agent's context without ever touching the agent's credentials. Response schemas drift. Tool outputs shift. Timeout behavior changes. None of it triggers a security alert. It just makes your agent wrong.
That's the behavioral baseline gap. Three Tier-1 security vendors confirmed at RSAC 2026 that they haven't filled it.
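The schema-drift failure mode described above is easy to sketch. This is a hypothetical check, not Observatory's implementation: reduce each response to its structural signature and flag any call whose shape deviates from the baseline set by earlier calls.

```typescript
// Reduce a JSON value to its structural shape, so two responses can be
// compared without comparing their data.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function schemaSignature(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    return `[${value.length ? schemaSignature(value[0]) : "?"}]`;
  }
  if (typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${k}:${schemaSignature(value[k])}`).join(",")}}`;
  }
  return typeof value; // "boolean" | "number" | "string"
}

// The first call establishes the baseline; later calls are compared against it.
function driftDetector(): (response: Json) => boolean {
  let baseline: string | null = null;
  return (response) => {
    const sig = schemaSignature(response);
    if (baseline === null) {
      baseline = sig;
      return false;
    }
    return sig !== baseline; // true = schema drifted from baseline
  };
}
```

Note what a star count can't see: a server that returns `{"price": 2}` for a thousand calls and then `{"price": "2"}` on call 1,001 has drifted, and only a runtime check of this kind notices.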
What the Dominion Observatory Does
Dominion Observatory is a free runtime behavioral trust API for MCP servers. It currently tracks 4,400+ servers across 13 categories — not GitHub metadata, but actual production behavioral patterns:
- Response consistency: Does the server return structured data with consistent schema across calls?
- Signature variance: Does the server's behavior drift between invocations of the same tool?
- Timeout anomaly rate: Is response time stable, or does the server show erratic timing patterns?
- Category baseline: How does this server's behavioral profile compare to peers in its category?
The trust score is a 0–100 composite updated continuously as agents interact with registered servers. It's the closest thing currently available to a behavioral baseline for MCP.
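Observatory's exact weighting is not published, so the following is only a sketch of how the four signals above could be folded into a 0-100 composite. The signal names, weights, and normalization are all assumptions for illustration:

```typescript
// Hypothetical composite trust score. Each signal is normalized to [0, 1]
// where 1 = healthy behavior, then weighted and scaled to 0-100.
interface BehavioralSignals {
  responseConsistency: number; // share of calls with a baseline-matching schema
  signatureStability: number;  // 1 - variance across invocations of the same tool
  timeoutStability: number;    // 1 - timeout anomaly rate
  categoryAlignment: number;   // similarity to the category's baseline profile
}

function trustScore(s: BehavioralSignals): number {
  const clamp = (x: number) => Math.min(1, Math.max(0, x));
  const weighted =
    0.35 * clamp(s.responseConsistency) +
    0.25 * clamp(s.signatureStability) +
    0.25 * clamp(s.timeoutStability) +
    0.15 * clamp(s.categoryAlignment);
  return Math.round(weighted * 100);
}
```

The design point is that every input is a runtime observation; nothing in the composite depends on stars, download counts, or any other static reputation signal.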
Getting Started
Observatory API is free. To retrieve a trust score for any tracked MCP server:
```
GET https://levylens.co/api/trust/{server-name}
Authorization: Bearer {api-key}
```
The response includes: trust_score, behavioral_category, last_checked, variance_flag, and baseline_deviation.
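A minimal typed client for that endpoint might look like the sketch below. The response field names follow the list above; the validation logic and function names are assumptions:

```typescript
// Sketch of a typed client for the trust endpoint (field names from the
// documented response; everything else is illustrative).
interface TrustResponse {
  trust_score: number;
  behavioral_category: string;
  last_checked: string;
  variance_flag: boolean;
  baseline_deviation: number;
}

function parseTrustResponse(body: unknown): TrustResponse {
  const o = body as Record<string, unknown>;
  if (typeof o?.trust_score !== "number" || o.trust_score < 0 || o.trust_score > 100) {
    throw new Error("invalid trust_score in response");
  }
  return o as unknown as TrustResponse;
}

async function getTrust(server: string, apiKey: string): Promise<TrustResponse> {
  const res = await fetch(`https://levylens.co/api/trust/${encodeURIComponent(server)}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`trust lookup failed: ${res.status}`);
  return parseTrustResponse(await res.json());
}
```

An agent runtime could call `getTrust` before routing a request to an unfamiliar server and refuse, or degrade gracefully, below a chosen score threshold.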
If a server isn't yet tracked, registration takes 30 seconds and starts the behavioral sampling cycle immediately.
Full API documentation: levylens.co
Smithery listing: search "dominion-observatory" on smithery.ai
npm: `npm install dominion-observatory`
The Next Six Months
EU AI Act hard deadline: August 2, 2026. Singapore IMDA framework: live since January 2026. The enterprise pilot-to-production gap: 80 percentage points and closing.
Three of the world's largest security vendors just validated at RSAC 2026 that the behavioral baseline gap exists and that they haven't filled it. That's not a competitive threat — it's market confirmation.
Observatory is the free runtime layer that starts filling it, beginning with MCP servers.
If you're building on MCP and want trust data before your agents call unknown tools: levylens.co
Dominion Observatory is a runtime behavioral trust layer for the MCP ecosystem, built in Singapore.