Sebastian Buzdugan

Posted on Jun 19 • Originally published at Medium

Governing and Monitoring Enterprise AI Usage: Allow and Deny Controls for AI Apps on Managed Devices

#ai #llm #security #devops

Your gateway logs look clean. Every API call is accounted for, rate-limited, and piped to your SIEM. You have virtual keys per team, budgets per model, guardrails on PII. The compliance box is checked.

Then someone asks: which AI apps are actually running on our laptops right now?

Nobody knows.

The gateway only sees traffic that goes through it. Claude Desktop, ChatGPT, Cursor, Codex, and a browser full of AI sidebars talk directly to their own endpoints, using API keys that live in the user's home directory. None of that traffic has ever touched your proxy.

This isn't a configuration problem. It's a structural one. And it has a clean fix that most enterprise teams haven't wired up yet.

The gap nobody talks about

A gateway governs the traffic that flows through it. That is exactly what it is for, and Bifrost's gateway does it well: virtual keys, budgets, routing, guardrails, and audit logs, enforced on every request your infrastructure sends.

But your developers are not only calling AI through your infrastructure. They are running the Claude app on their laptop. They are using Cursor with a personal API key. They installed an MCP server that a GitHub README told them would make their coding agent smarter. That traffic leaves the machine straight for the provider. It never had a reason to pass through your gateway, so it doesn't.

This is not a gateway shortcoming. It is a question of where the traffic starts. A centralized proxy can only govern what reaches it, and a desktop app talking directly to api.anthropic.com was never going to reach it on its own. The managed device is simply a second surface, and until something sits on the machine, that surface is dark.

Here is what lives there:

Desktop chat apps: Claude Desktop, ChatGPT, calling home directly
Browser AI: ChatGPT and Claude on the web, running over HTTPS
Coding agents: Claude Code, Cursor, Codex, OpenCode, configured per-user, per-project
MCP servers: local processes that give those agents filesystem access, shell execution, database connections, and external API calls, spun up from a few lines of JSON

That last one deserves a pause. An MCP server is not an app in the traditional sense. It's a process your developer launched that tells their AI agent: here are the tools you can use. File reads, shell commands, web fetches. All of it governed by whatever the developer put in a config file, with no visibility for your security team and no logs anywhere.
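To make "a few lines of JSON" concrete, here is a minimal sketch of what such a config can look like and how little it takes to enumerate it. The `mcpServers` layout follows the convention used by Claude Desktop's config file; the server names, the package name, and the URL are invented for illustration, and the exact shape of remote entries varies by client. This is not how Edge performs discovery, only an illustration of the surface.

```python
import json

# Hypothetical config in the common "mcpServers" layout; names and URL are invented.
EXAMPLE_CONFIG = """
{
  "mcpServers": {
    "local-files": {
      "command": "npx",
      "args": ["-y", "example-filesystem-server", "/Users/dev"]
    },
    "remote-tools": {
      "url": "https://mcp.example.com/sse"
    }
  }
}
"""


def list_mcp_servers(config_text: str) -> list[dict]:
    """Return one record per configured server, noting how it connects."""
    config = json.loads(config_text)
    servers = []
    for name, spec in sorted(config.get("mcpServers", {}).items()):
        if "command" in spec:
            # Local server: the client launches this command as a child process.
            transport = "local command"
            target = " ".join([spec["command"], *spec.get("args", [])])
        else:
            # Remote server: the client connects to a URL instead.
            transport = "remote URL"
            target = spec.get("url", "")
        servers.append({"name": name, "transport": transport, "target": target})
    return servers


for server in list_mcp_servers(EXAMPLE_CONFIG):
    print(f"{server['name']}: {server['transport']} -> {server['target']}")
```

Two entries, one of which hands an agent a directory on disk. Nothing in that file is visible to a network proxy until the agent starts using it.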

A coding agent with shell access, connected to an ungoverned MCP server, can exfiltrate credentials before your SIEM sees a single packet.

That is the surface Edge is built to cover, and it covers it by extending the gateway you already run, not by replacing it.

What your gateway sees vs. what's actually running on managed devices. The gap is everything on the right.

Two layers, one control plane

Bifrost, the open-source AI gateway built by Maxim and available on GitHub, splits the governance problem into the two layers it actually has.

The gateway is the centralized control plane. It sits between your infrastructure and 1000+ models behind one unified API, and enforces policy on every request: virtual keys, team budgets, rate limits, model routing, PII detection, secrets scanning, content guardrails, and audit logs that feed SOC 2 Type 2, GDPR, and HIPAA stacks. It is built to stay out of the way. In Bifrost's own t3.xlarge benchmark at 5,000 requests per second, the gateway added 11 microseconds of overhead per request at a 100 percent success rate.

The OSS version is self-hosted and drops in as a base URL change. You point your existing SDK at it and leave the rest of your code alone:

import anthropic

# before: the SDK talks to the provider directly
client = anthropic.Anthropic(api_key="your-anthropic-key")

# after: same code, traffic now governed by the gateway
client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="dummy-key",  # real keys live in Bifrost
)

No agent on the machine, no per-app config. This is the layer most teams already have, or can stand up in an afternoon.

Bifrost Edge is the endpoint enforcement layer. It is an Enterprise feature, not part of the open-source gateway. Edge runs locally on every managed machine and intercepts AI traffic at the OS level, before it reaches any application's networking stack. That traffic gets routed through your Bifrost gateway for policy evaluation, and the gateway's verdict comes back to Edge, which enforces it on the device.

The frame that makes this click: the gateway centralizes governance, Edge enforces it on every machine.

Every desktop app, every browser extension, every MCP server suddenly inherits the same virtual keys, budgets, guardrails, and audit logging that your API infrastructure already has. Zero per-app configuration. You install Edge once and governance follows the user.

Bifrost Edge on every managed machine, all routing to the same central gateway. Policy changes propagate to the fleet without touching individual devices.

How Edge sits on the machine

Installation is a single MDM push. Edge ships managed config for Jamf, Intune, Kandji, Omnissa Workspace ONE, and JumpCloud, so it rolls out across macOS, Windows, and Linux with no end-user action. The managed config carries only non-sensitive connection settings. The machine arrives pre-pointed at your Bifrost; identity and keys come from the user's sign-in, not from the device profile.

After install, the user signs in once through a browser SSO flow. That sign-in links the machine to the user and pulls down every policy assigned to them. From then on, Edge runs as a menu-bar agent on macOS or a system-tray process on Windows and Linux.

Interception happens at the machine level, not per application. Edge does not ask you to change base URLs or touch anything inside the AI apps. It uses an organization certificate, managed centrally from the Bifrost console, to route each app's AI calls through your gateway before they leave the machine. When Claude Desktop calls api.anthropic.com, Edge sees the request, sends it through Bifrost for policy evaluation, and hands the response back. The app never knows the difference.

That is what makes this work at scale. You have a hundred AI tools across the fleet; you configure one thing, and all hundred are governed.

Allow, deny, and the state in between

Edge discovers AI apps and MCP servers across the fleet and rolls them into a deduplicated catalog: the same tool on 200 machines shows up once. Every entry sits in one of three states.

Pending: discovered and awaiting review. It keeps working in the meantime, so turning Edge on does not break anyone's workflow on day one.
Approved: explicitly allowed. Traffic runs normally and stays under the gateway's full policy stack.
Denied: blocked. Edge stops it on the device, and the user sees a notice that the app is not permitted on company machines.

That default-pending posture is the part worth sitting with. You are not forced to choose between locking everything down up front or flying blind. New tools surface as Pending while you decide, so you get visibility immediately and enforcement when you are ready.
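The three states and the deduplicated catalog can be sketched in a few lines. This is a toy model under stated assumptions, not Edge's implementation: the class and method names are hypothetical, and the only behavior taken from the description above is that new tools start as Pending, Pending keeps working, and only Denied is blocked.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class CatalogEntry:
    name: str
    status: Status = Status.PENDING  # new discoveries start as Pending
    devices: set[str] = field(default_factory=set)


class Catalog:
    """Deduplicated catalog: one entry per tool, however many devices report it."""

    def __init__(self) -> None:
        self.entries: dict[str, CatalogEntry] = {}

    def report(self, device: str, tool: str) -> None:
        # Reuse the existing entry if the tool is already known.
        entry = self.entries.setdefault(tool, CatalogEntry(tool))
        entry.devices.add(device)

    def set_status(self, tool: str, status: Status) -> None:
        self.entries[tool].status = status

    def is_allowed(self, tool: str) -> bool:
        # Pending keeps working; only Denied is blocked.
        entry = self.entries.get(tool)
        return entry is None or entry.status is not Status.DENIED


catalog = Catalog()
for n in range(200):
    catalog.report(f"laptop-{n}", "new-coding-agent")

print(len(catalog.entries), len(catalog.entries["new-coding-agent"].devices))
```

The same tool reported by 200 laptops yields one entry with 200 devices attached, and it stays usable until someone explicitly moves it to Denied.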

The workflow in practice: a developer installs a new coding agent. It appears as Pending in the console with its name, the devices running it, who last changed its status, and any notes. You approve it, deny it, or select every Pending item and act in bulk. Decisions take effect on each device at its next check-in, which is set by a sync interval you control centrally, down to a few seconds when you need policy to move fast.
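The timing here is simple arithmetic. A rough sketch, assuming each device checks in on a fixed schedule (the function name and the numbers are illustrative, not Edge's actual scheduler): a decision lands at the first check-in at or after it, so the delay on any one device is always less than one sync interval.

```python
def applied_at(decision_time: int, first_check_in: int, sync_interval: int) -> int:
    """Return the time (in seconds) of the first check-in at or after the decision."""
    if decision_time <= first_check_in:
        return first_check_in
    elapsed = decision_time - first_check_in
    # Integer ceiling division: number of intervals needed to reach the decision time.
    intervals = -(-elapsed // sync_interval)
    return first_check_in + intervals * sync_interval


# Three hypothetical devices with staggered check-ins, all on a 30-second interval.
decision_time = 100
delays = [applied_at(decision_time, offset, 30) - decision_time for offset in (0, 7, 25)]
print(delays)
```

Shortening the interval shrinks the worst case proportionally, which is the lever to pull when a decision needs to reach the fleet quickly.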

Supported surfaces today: Claude Desktop, ChatGPT, Cursor, and Codex on the desktop; Claude Code, Codex CLI, and OpenCode as coding agents; ChatGPT and Claude on the web. If something you run is not on the list yet, the console has a "request support for an application" path.

MCP governance is its own layer

MCP servers get their own governance surface in Edge, separate from application-level allow/deny. That separation matters.

You might want Cursor allowed but a specific MCP server denied. Maybe it has filesystem write access your security team hasn't signed off on. Maybe it was pulled from an untrusted source. Edge lets you approve or deny at the server level, not just the app level.

The workflow:

Edge discovers MCP servers configured in supported AI apps across the fleet
The full inventory surfaces in the Bifrost admin console
Administrators approve or deny each server individually
Edge enforces those decisions at the device level

Default posture applies here too. Set Edge to deny all unknown MCP servers by default and new servers require explicit approval before anyone in the fleet can use them.
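The separation between app-level and server-level policy, plus the deny-unknown default, can be expressed as a small decision function. This is a hedged sketch: the function, the policy dictionaries, and the server names are hypothetical, and Edge's real evaluation happens through the gateway rather than in a local lookup like this.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def evaluate_mcp_call(app, server, app_policy, server_policy,
                      unknown_server_default=Decision.DENY):
    """Decide whether `app` may use MCP `server`.

    App-level and server-level policy are separate: a denied app blocks
    everything, but an allowed app can still be refused a specific server.
    """
    if app_policy.get(app) is Decision.DENY:
        return Decision.DENY
    # Servers with no explicit decision fall back to the configured default.
    return server_policy.get(server, unknown_server_default)


app_policy = {"Cursor": Decision.ALLOW}
server_policy = {"docs-search": Decision.ALLOW, "fs-writer": Decision.DENY}

print(evaluate_mcp_call("Cursor", "fs-writer", app_policy, server_policy))
print(evaluate_mcp_call("Cursor", "brand-new-server", app_policy, server_policy))
```

With the default set to deny, a server nobody has reviewed is treated the same as one that was explicitly refused, while Cursor itself stays allowed.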

The fleet-wide MCP inventory alone is something most security teams have never had. Ask a room of security leads which MCP servers run on their developer machines and the honest answer is nobody knows. Edge surfaces each one with its name, whether it connects through a local command or a remote URL, and the exact tools it exposes, then gives you per-server approve and deny. MCP discovery currently covers Claude Code, Claude Desktop, Cursor, Gemini CLI, OpenCode, and Codex.

MCP governance in three steps: Edge discovers servers fleet-wide, admin sets per-server policy, Edge enforces on every device.

What you can see

With Edge deployed, the console becomes a live inventory of AI on the fleet.

The Devices view lists every machine running the agent: hostname, owner (name and email), platform with OS version and architecture, agent version, how many AI apps and MCP servers it has (versions on hover), when it was first seen, and its last check-in. You can filter by platform, by a specific app or MCP server, or by approval status, so "which machines are still running the app we denied" is a filter, not an investigation. Open a single device and you get its installed apps, its configured MCP servers and the tools they expose, and one-click approve, deny, or remove.
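"A filter, not an investigation" is easy to picture as code. The sketch below models a device inventory with a handful of the fields listed above and filters it by app and platform. The record shape, the hostnames, and the app name "ShadyChat" are all made up; the real console exposes this as UI filters, and nothing here reflects an actual Edge API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    hostname: str
    owner_email: str
    platform: str
    apps: frozenset


def devices_running(devices, app: str, platform: Optional[str] = None) -> list:
    """Hostnames of devices that have `app` installed, optionally on one platform."""
    return sorted(
        d.hostname
        for d in devices
        if app in d.apps and (platform is None or d.platform == platform)
    )


# Hypothetical fleet snapshot.
FLEET = [
    Device("mbp-ana", "ana@example.com", "macOS", frozenset({"Cursor", "ShadyChat"})),
    Device("win-raj", "raj@example.com", "Windows", frozenset({"ChatGPT"})),
    Device("lin-kim", "kim@example.com", "Linux", frozenset({"ShadyChat"})),
]

print(devices_running(FLEET, "ShadyChat"))
print(devices_running(FLEET, "ShadyChat", platform="macOS"))
```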

Underneath the inventory, the guardrails you already configured in the gateway now apply to all of this traffic, in both directions, before a prompt reaches a model and before a response comes back. That includes Gitleaks-backed secrets detection for leaked API keys, tokens, and private keys, PII redaction, and any third-party content-safety integration you run (AWS Bedrock, Azure, Google Model Armor, CrowdStrike, Patronus, and others). Edge does not re-implement any of it. It brings ChatGPT, Claude, and your coding agents under the same protection your API traffic already has.
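For a feel of what secrets detection on a prompt involves, here is a deliberately tiny sketch. It is not Gitleaks and does not use its rule set: the two patterns (the well-known `AKIA` prefix format of AWS access key IDs and PEM private-key headers) and the function name are assumptions chosen for illustration, and a real scanner covers far more formats with better precision.

```python
import re

# Illustrative patterns only; a production scanner ships a much larger rule set.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def redact_secrets(text: str):
    """Replace anything matching a known pattern and report which rules fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text, findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
prompt = "Why does this fail? key=AKIAIOSFODNN7EXAMPLE region=us-east-1"
clean, fired = redact_secrets(prompt)
print(clean)
print(fired)
```

The point of running this at the gateway is that the same check applies whether the prompt came from a backend service or from a chat window on someone's laptop.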

Every routed request also inherits the gateway's structured logging: timestamp, user, model, token count, guardrail outcome. Pair that with the gateway's Prometheus, OpenTelemetry, and Datadog exports and you have one view over all AI usage, infrastructure and endpoint alike.
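Once every request carries the same fields, cross-cutting questions become one-liners. The sketch below assumes a JSON-lines log whose field names mirror the list above; the actual schema, the field names, and the sample values are hypothetical, so check the gateway's logging documentation for the real shape.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class RequestLog:
    # Field names are assumptions that mirror the fields listed in the text.
    timestamp: str
    user: str
    model: str
    token_count: int
    guardrail_outcome: str


def tokens_by_user(log_lines) -> dict:
    """Sum token usage per user across JSON-lines log records."""
    totals = Counter()
    for line in log_lines:
        record = RequestLog(**json.loads(line))
        totals[record.user] += record.token_count
    return dict(totals)


# Hypothetical records: two from a laptop app, one from backend infrastructure.
LINES = [
    json.dumps(asdict(RequestLog("2025-06-19T10:00:00Z", "ana@example.com", "example-model", 1200, "passed"))),
    json.dumps(asdict(RequestLog("2025-06-19T10:01:00Z", "ana@example.com", "example-model", 500, "redacted"))),
    json.dumps(asdict(RequestLog("2025-06-19T10:02:00Z", "raj@example.com", "example-model", 300, "passed"))),
]

print(tokens_by_user(LINES))
```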

The containment argument

Here is the scenario that makes the security case concrete.

Your team discovers a new AI desktop app is exfiltrating conversation history to an undisclosed third party. You need it off every managed laptop, now.

Without Edge: you're filing an MDM change request, writing browser extension policies, opening a firewall ticket, coordinating across three teams, and probably missing the macOS desktop app entirely. Timeline: days to weeks.

With Edge: you open the console, set the app to Denied, and apply it to the fleet. Every machine picks it up at its next check-in, governed by the sync interval you set. From that check-in on, Edge blocks the app on the device, so nothing further leaves through it. Audited, reversible, and fleet-wide in seconds to minutes, depending on that interval.

Same logic for MCP servers. A new server with dangerous tool permissions shows up in a GitHub repo. You deny it fleet-wide before any developer on your team installs it.

The governance speed matters as much as the governance capability. The threat landscape for AI tooling is moving fast. Waiting weeks to roll out a block is not a viable security posture.

The honest part

Edge is in limited alpha. You cannot self-serve into production today. You register, you get access when the alpha opens up. That is the reality of where it sits.

The capability set described here is what the alpha delivers. All of it is live and usable for teams in the program. But if you need this in production next week with an SLA behind it, that conversation starts with the Bifrost team, not a self-serve dashboard.

Also worth naming: the lists of supported apps and MCP discovery targets are current as of this writing. New ones get added over time, but if you need a specific tool covered, check the supported applications page before assuming it's there.

Getting started

The Bifrost gateway (OSS) is available now. It is a drop-in base URL replacement for any OpenAI-compatible SDK call. If you are not running a centralized gateway yet, that is the right starting point. Get visibility on your API traffic first, then add Edge for the endpoint layer.

For Edge, register for alpha access at docs.getbifrost.ai/edge/overview.

The path that makes sense:

1. Deploy the Bifrost gateway, route your team's API traffic through it
2. Configure virtual keys, budgets, and audit logging
3. Roll out Bifrost Edge via MDM to bring endpoint AI traffic under the same governance

Steps 1 and 2 give you coverage on everything your infrastructure controls. Step 3 closes the gap.

Every technical detail here is drawn from the Bifrost Edge documentation and the gateway overview. If something in this piece doesn't match what you see in the docs, trust the docs.

