OutSystems just published a survey of 1,900 global IT leaders. Ninety-six percent of enterprises are already running AI agents. Ninety-seven percent are pursuing system-wide agentic strategies. And 12% — one in eight — have implemented centralized governance to manage them.
That number — 12% — is not a survey artifact. It's an accurate picture of a structural problem: the governance approaches most organizations reach for were designed for one agent, and they stop working at fleet scale.
The other 88% aren't ignoring governance. They have monitoring. They have system prompts. They have team-level policies and access controls that made sense when there was one agent, one team, one deployment. The problem is that none of those things constitute centralized governance — and as agent counts climb from one to ten to hundreds, the gap between "we have monitoring" and "we have governance" becomes the gap between "we know what happened" and "we have control over what happens."
Agentic governance is the set of runtime policies and enforcement mechanisms that control what autonomous AI agents are permitted to access, spend, output, and execute — enforced at the infrastructure layer, evaluated before each agent action, independent of the agent's own reasoning. Enterprise agentic governance extends this across agent fleets: a centralized control layer that applies consistent policies across every agent regardless of which team built it, which framework it runs on, or how many agents are running simultaneously. Without it, each agent operates under whatever governance the team that built it chose to implement — which produces 96% of enterprises running agents and 12% controlling them.
What does "agent sprawl" actually look like inside an organization?
The OutSystems research found that 94% of enterprises report concern that AI sprawl is increasing complexity, technical debt, and security risk. Thirty-eight percent are mixing custom-built and pre-built agents, creating stacks too fragmented to standardize and secure.
Sprawl doesn't usually start as a governance failure. It starts as success.
A support team ships a ticket-routing agent and it works. A sales team builds a CRM enrichment agent. A finance team adds a reporting assistant. A product team stands up a research agent. Each of these runs fine in isolation. Each team applied whatever governance they thought appropriate — usually a system prompt with behavioral instructions and some dashboards they check when something seems off.
At some point, the organization has forty agents. Then a hundred. Then more, as vendors ship agents pre-embedded in tools that don't announce themselves as agents. Gravitee research found that of the roughly 3 million AI agents active in US and UK enterprises, approximately 1.5 million are running without any oversight or security controls — most deployed without a centralized inventory, many without any formal approval process.
The governance problem that emerges isn't any single agent behaving badly. It's that you can no longer answer basic questions about your fleet: Which agents have access to production databases? Which agents can make external API calls? Which agents processed PII in the last 30 days? Which agents are currently running?
Separate CyberArk research found that 91% of organizations report at least half of their privileged access is consumed by always-on AI-driven identities — machine accounts that don't log off, don't expire, and rarely appear in standard identity audits. You can't govern what you can't see, and at fleet scale, most organizations can't see the full scope of what their agents can access.
Why does governance fail when you have more than one agent?
The answer is architectural. The governance mechanisms that work for a single agent are per-agent by design — they don't compose when you need consistent control across a fleet.
System prompts don't scale as policies. A system prompt that says "do not transmit customer PII to external APIs" works — until it doesn't, due to context window limits, adversarial injection, or a model update that shifts compliance behavior. More critically: if you have 40 agents, you have 40 system prompts, each slightly different, each maintained by a different team, each with its own interpretation of what "external API" means. That's not a policy. That's 40 separate agreements that may or may not hold.
Monitoring without enforcement is not governance. LangSmith, Helicone, Arize, and Braintrust all produce excellent observability. You can see what every agent called, what it spent, what it returned. What none of these tools do is intercept an action before it executes. If your monitoring tells you an agent routed PII to an external endpoint at 2 PM, that's useful forensics. It's not governance — the data left at 2 PM, and you found out at 3.
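The distinction is easy to make concrete in code. The sketch below is illustrative only: the names (`PII_PATTERN`, `monitored_send`, `governed_send`) are assumptions, not any product's API. The monitored path records the send after the payload is already gone; the governed path evaluates the policy first and can refuse.

```python
import re

# Illustrative pattern: matches a US SSN shape (assumption, not a
# production-grade PII detector).
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

audit_log = []

def monitored_send(payload: str) -> str:
    """Observability only: the payload leaves first, the log comes after."""
    audit_log.append(("sent", payload))
    return "delivered"

def governed_send(payload: str) -> str:
    """Enforcement: the policy evaluates before the payload leaves."""
    if PII_PATTERN.search(payload):
        audit_log.append(("blocked", payload))
        raise PermissionError("policy violation: PII in outbound payload")
    audit_log.append(("allowed", payload))
    return "delivered"
```

With monitoring alone, the forensic trail exists but the data is gone; with enforcement, the blocked event appears in the same trail and the data never left.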
Team-level policies don't produce fleet-level consistency. When each team governs its own agents, you get policies that reflect each team's risk tolerance and knowledge level. The team that built the CRM enrichment agent applied the constraints that seemed reasonable to them. The team that built the finance reporting assistant applied different constraints. Neither set of constraints was evaluated against the organization's full compliance requirements. Nobody knows if the constraints are consistent with each other.
The technical name for what you need instead is a governance plane — a layer that sits above agent implementations, enforces consistent policies across all agents regardless of who built them, and applies those policies at the execution layer before actions run.
What does centralized governance actually require technically?
The 12% who have centralized governance aren't necessarily more sophisticated than the 88%. They've made specific architectural choices that the majority haven't made yet.
Infrastructure-layer enforcement, not prompt-layer. The distinction matters. Governance baked into system prompts lives inside the agent — subject to everything that can go wrong with the agent's reasoning. Infrastructure-layer governance operates outside the agent's code, wrapping its execution surface. A runtime governance policy that blocks outbound requests containing detected PII patterns fires at the API call layer, before the request leaves the system. The agent never gets the chance to decide whether to comply.
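A rough sketch of what wrapping the execution surface looks like, with an illustrative decorator and policy (nothing here is a specific vendor's API): the policy runs outside the tool function, so the agent's reasoning never participates in the allow/block decision.

```python
import functools

def enforce(policy):
    """Wrap a tool function so a policy evaluates before every call."""
    def wrap(tool_fn):
        @functools.wraps(tool_fn)
        def guarded(*args, **kwargs):
            verdict = policy(tool_fn.__name__, args, kwargs)
            if not verdict["allow"]:
                raise PermissionError(f"blocked by policy: {verdict['reason']}")
            return tool_fn(*args, **kwargs)
        return guarded
    return wrap

def no_prod_db(tool_name, args, kwargs):
    # Example policy (assumption): deny any call whose arguments mention
    # a production database.
    if any("prod" in str(a) for a in list(args) + list(kwargs.values())):
        return {"allow": False, "reason": "production database access"}
    return {"allow": True, "reason": "in scope"}

@enforce(no_prod_db)
def query_db(dsn: str, sql: str) -> str:
    # Stand-in for a real database call.
    return f"rows from {dsn}"
```

Because the check lives in the wrapper, not in the prompt, a model update or an injected instruction inside the agent cannot turn it off.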
Microsoft's newly released Agent Governance Toolkit (April 2026) takes exactly this approach — sub-millisecond deterministic policy enforcement that hooks into agent frameworks at the execution layer, not the prompt layer. The OWASP Agentic AI Top 10, published in December 2025, formalized the attack surface this architecture addresses: goal hijacking, tool misuse, memory poisoning, identity abuse. None of those attack vectors can be reliably blocked by system prompt instructions. They require enforcement at the execution surface.
Framework-agnostic instrumentation. Most enterprises run agents built on multiple frameworks: LangChain agents, CrewAI pipelines, vendor-embedded agents, custom Python. Centralized governance only works if it's framework-agnostic — if the same policies apply whether the agent runs on LangChain or not, built in-house or purchased from a vendor. The 88% who lack centralized governance typically have framework-specific observability that covers some agents and misses others. Consistent control requires consistent instrumentation, which means the governance layer has to be above the framework, not inside it.
Fleet-wide policy management with deployment-free updates. When a compliance requirement changes — and with EU AI Act enforcement arriving in August 2026, requirements will change — you need to update policies once and have the change propagate across every agent. Per-agent governance means updating 40 system prompts across 40 deployments, with the risk that some get updated and some don't. A fleet-wide governance plane lets you define a policy once and enforce it everywhere without touching agent code.
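Sketched in Python, under the assumption of a shared policy store that the governance plane reads at evaluation time (`POLICY_STORE` stands in for a central service or database):

```python
# Central policy store: one definition, read by every agent's
# governance check at evaluation time.
POLICY_STORE = {"max_cost_per_session_usd": 5.00}

def within_cost_ceiling(session_spend_usd: float) -> bool:
    # The ceiling is looked up at evaluation time, never baked into agent
    # code, so one write to the store changes every agent's limit at once.
    return session_spend_usd <= POLICY_STORE["max_cost_per_session_usd"]
```

Tightening the ceiling is a single write to the store; no agent is redeployed, and no per-agent prompt needs editing.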
A durable enforcement record. For compliance, governance needs to be auditable — not just logs of what agents did, but records showing that specific policies were evaluated before specific actions, what was allowed, and what was blocked. That distinction matters to regulators. A log that shows an agent accessed a customer record is evidence of behavior. A record that shows a policy evaluated that access, confirmed it was within authorized scope, and allowed it is evidence of governance. The two look different under audit review.
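The difference is visible in the data itself. Below, the same event expressed as a behavior log entry and as an enforcement record; the field names are illustrative assumptions, not a standard schema.

```python
# A behavior log entry: evidence that something happened.
behavior_log_entry = {
    "timestamp": "2026-04-10T14:02:11Z",
    "agent": "crm_enrichment",
    "action": "read_customer_record",
    "target": "customer/4821",
}

# An enforcement record: the same event, plus the fields an auditor
# looks for, captured before the action executed.
enforcement_record = {
    **behavior_log_entry,
    "policy_id": "pii-access-v3",
    "evaluated_before_action": True,
    "scope_check": "agent authorized for customer records",
    "verdict": "allowed",
}
```

The first shape answers "what did the agent do?"; the second also answers "which policy evaluated it, and what was the verdict?", which is what an audit review asks for.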
What the August 2026 deadline means for teams still in the gap
The EU AI Act's enforcement phase for high-risk AI systems takes effect August 2, 2026 — less than four months away. High-risk systems include AI operating in financial services, healthcare, employment, critical infrastructure, and law enforcement. Penalties for non-compliant deployment reach €15 million or 3% of global annual turnover for violations, and €35 million or 7% for the most serious categories.
For organizations in the 88%, the August deadline doesn't require perfect fleet governance by August 1. It requires demonstrating that high-risk AI systems operate within defined constraints with adequate human oversight and documented compliance controls. What it rules out is the status quo in most organizations: agents running in high-risk domains under ad-hoc per-team governance with no cross-fleet audit trail.
The Colorado AI Act becomes enforceable June 30, 2026. State-level AI regulation in the US is fragmenting faster than most legal teams anticipated — and the enforcement dates are arriving faster too. The organizations building fleet governance infrastructure now are building a compliance asset, not just a technical one.
How Waxell handles this
Waxell is built for the fleet governance case, not just the single-agent case. Three lines of SDK code instrument any agent, whether LangChain, CrewAI, custom Python, or a vendor-embedded agent your team didn't write:
```python
from waxell import WaxellSDK
from openai import OpenAI

waxell = WaxellSDK(api_key="...")
client = OpenAI()

task = "Route this support ticket to the right queue"  # example input

with waxell.trace("support_agent"):
    # Waxell evaluates fleet-wide policies before each tool call
    # and output — no changes to agent code required
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}],
    )
```
The runtime governance policies evaluate before each tool call and output. A PII policy defined once applies to every agent in the fleet the moment it deploys. A cost threshold update propagates across every agent's per-session ceiling without touching a single deployment. Audit records embed enforcement events directly in each execution trace — showing not just what agents did, but which policies evaluated each action and whether they allowed or blocked it. That's the enforcement documentation that separates governance from monitoring, and the difference that shows up when compliance reviews ask to see evidence of control, not just logs of behavior.
If you're currently in the 88% — with monitoring but not governance, with per-agent constraints but no fleet-wide control layer — get early access to see what centralized governance looks like in practice.
Frequently Asked Questions
What is enterprise AI agent governance?
Enterprise AI agent governance is a centralized control layer that enforces consistent policies across all AI agents in an organization — regardless of which team built them, which framework they run on, or how many agents are running. It operates at the infrastructure layer, evaluating policies before each agent action executes, and produces audit records showing what was allowed, what was blocked, and why. It is distinct from per-agent monitoring (which records what agents did) and from system prompt instructions (which tell agents what to do, but don't enforce it). Most enterprises have monitoring; only 12% have centralized governance.
What is AI agent sprawl?
AI agent sprawl is the uncontrolled proliferation of AI agents across an enterprise, typically the result of teams independently deploying agents without a shared governance framework, inventory, or approval process. It produces organizations where dozens or hundreds of agents are running with inconsistent policies, overlapping tool access, and no single team with visibility across the fleet. The OutSystems State of AI Development survey (April 2026) found that 94% of enterprises report concern about agent sprawl increasing complexity, technical debt, and security risk — and only 12% have centralized governance to address it.
Why do most enterprises lack centralized AI agent governance?
The primary reason is architectural: the governance mechanisms most teams deploy were designed for single agents. System prompts, team-level monitoring, and per-agent access controls work when there's one agent. When the fleet grows to tens or hundreds, those mechanisms don't compose — each agent operates under whatever governance its team implemented, with no cross-fleet policy consistency, no fleet-wide audit trail, and no mechanism to update constraints across all agents simultaneously. Centralized governance requires infrastructure-layer enforcement that sits above agent implementations, which is a different architectural investment than the per-agent observability most teams have.
What does the EU AI Act require for AI agents?
The EU AI Act's enforcement phase for high-risk AI systems takes effect August 2, 2026. For organizations deploying AI agents in high-risk domains (financial services, healthcare, employment, critical infrastructure), the Act requires documented risk management, data governance controls, human oversight mechanisms, technical documentation, and ongoing post-market monitoring. Critically, it requires evidence that agents operated within defined constraints — not just logs of what they did, but records showing that controls were evaluated and enforced. Organizations that can only show monitoring logs, not enforcement records, face a compliance gap under the Act's requirements.
What is the difference between AI agent monitoring and AI agent governance?
Monitoring records what agents did after the fact: which tools they called, what they cost, what they returned. Governance controls what agents are allowed to do before actions execute: blocking tool calls that violate policy, terminating sessions that exceed cost limits, requiring human approval before sensitive operations. You can have complete monitoring with zero governance — you'll know exactly what went wrong after it happens. Governance is the enforcement layer between an agent's intent and real-world consequences. The 88% of enterprises without centralized governance typically have monitoring; they lack the enforcement layer.
Sources
- OutSystems, State of AI Development 2026: Agentic AI Goes Mainstream in the Enterprise (April 2026) — https://www.businesswire.com/news/home/20260407749542/en/Agentic-AI-Goes-Mainstream-in-the-Enterprise-but-94-Raise-Concern-About-Sprawl-OutSystems-Research-Finds
- Microsoft, Introducing the Agent Governance Toolkit: Open-source runtime security for AI agents (April 2026) — https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/
- CyberArk, New Study: Only 1% of Organizations Have Fully Adopted Just-in-Time Privileged Access as AI-Driven Identities Rapidly Increase (2026) — https://www.cyberark.com/press/new-study-only-1-of-organizations-have-fully-adopted-just-in-time-privileged-access-as-ai-driven-identities-rapidly-increase/ (91% always-on AI identity stat)
- InfoSecurity Magazine, Governance Gaps Emerge as AI Agents Drive 76% Increase in NHIs (2026) — https://www.infosecurity-magazine.com/news/governance-gaps-agents-76-increase/
- Artificial Intelligence News, Agentic AI's governance challenges under the EU AI Act in 2026 — https://www.artificialintelligence-news.com/news/agentic-ais-governance-challenges-under-the-eu-ai-act-in-2026/
- Centurian AI, EU AI Act 2026: What Your AI Agents Must Prove by August 2 — https://centurian.ai/blog/eu-ai-act-compliance-2026
- Gravitee / Security Boulevard, The 'Invisible Risk': 1.5 Million Unmonitored AI Agents Threaten Corporate Security (February 2026) — https://securityboulevard.com/2026/02/the-invisible-risk-1-5-million-unmonitored-ai-agents-threaten-corporate-security/
- OWASP GenAI Security Project, OWASP Top 10 for Agentic Applications 2026 (December 2025) — https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
- NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0) (2023) — https://doi.org/10.6028/NIST.AI.100-1