Bifrost and TrueFoundry both market themselves as AI gateways in 2026, and if you look at a feature grid, they cover a lot of the same ground: LLM routing, MCP support, guardrails, observability, rate limiting, cost attribution, agent execution. The overlap is real.
But the overlap obscures the actual decision. Bifrost is a single Go binary you run. TrueFoundry is a Kubernetes-native control plane whose gateway is one layer of a larger platform. These are different architectural bets, and the right choice depends almost entirely on where your team is starting from and how far you expect to grow.
Here's what I found running both.
What you're actually deploying
Bifrost starts in seconds. One binary, no external dependencies — it initializes a local SQLite store and stands up on first run. The startup log from a v1.5.7 instance shows exactly what it spins up: config, logs, and governance stores; a per-user OAuth sweep worker; a pricing sync worker; and its model catalog. Everything in one process. Apache 2.0 licensed, self-hosted.
TrueFoundry inverts this entirely. There's no binary to download — the gateway installs into Kubernetes as part of a control plane, configured via YAML through the TrueFoundry CLI. Available as managed SaaS, VPC deployment, on-prem, or air-gapped. That's more operational surface area at the start. In return, you get deployment options with documented compliance posture, and a control plane that handles things no single binary can: multi-team RBAC tied to your identity provider, SCIM-driven provisioning, and managed model hosting alongside the gateway.
The TrueFoundry gateway itself is stateless: it is built on the Hono framework and synced from the control plane over a NATS queue. Auth, RBAC, and rate limiting run in memory; logs write asynchronously to ClickHouse. Their published benchmarks claim ~250 RPS on a 1 vCPU / 1 GB pod, reaching ~350 RPS before saturation, with roughly 7 ms of added latency (closer to 12 ms with full tracing enabled). These are vendor-stated figures measured under vendor-specified conditions.
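To make the vendor-stated throughput figure concrete, here is a rough sizing sketch. It assumes the ~250 RPS per pod number holds for your workload, which the article has not verified, and the `headroom` parameter is a hypothetical safety margin of my own, not a TrueFoundry recommendation.

```python
import math

# Vendor-stated figure quoted above: roughly 250 RPS per 1 vCPU / 1 GB pod.
VENDOR_RPS_PER_POD = 250


def pods_needed(target_rps: float,
                rps_per_pod: float = VENDOR_RPS_PER_POD,
                headroom: float = 0.3) -> int:
    """Estimate the gateway pod count for a target request rate.

    `headroom` is the fraction of each pod's capacity left unused.
    It is an illustrative assumption, not a vendor number.
    """
    if target_rps < 0 or rps_per_pod <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid sizing inputs")
    usable_rps = rps_per_pod * (1 - headroom)
    # Always run at least one pod, even for negligible traffic.
    return max(1, math.ceil(target_rps / usable_rps))


if __name__ == "__main__":
    for rps in (100, 1000, 5000):
        print(f"{rps} RPS -> {pods_needed(rps)} pods")
```

Treat the result as a starting point for a load test rather than a capacity plan; real throughput depends on payload size, tracing, and guardrail configuration.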
Model access: both are OpenAI-compatible drop-ins
The code change to adopt either is minimal — a base_url swap:
Bifrost (from a running v1.5.7 instance):
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-api-key",  # handled by Bifrost
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List files in current directory"}],
)
TrueFoundry (same SDK, gateway endpoint):
from openai import OpenAI

client = OpenAI(
    base_url="https://<org>.truefoundry.com/api/llm",
    api_key="tfy-...",
)
response = client.chat.completions.create(
    model="openai-main/gpt-4o",  # provider/model set in GitOps YAML
    messages=[{"role": "user", "content": "List files in current directory"}],
)
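Because the only differences between the two snippets are the base URL, the key, and the model name, the choice of gateway can live in one small config table. The sketch below is one way to do that. The environment variable names (`BIFROST_API_KEY`, `TFY_API_KEY`) and the `client_kwargs` helper are hypothetical, and the `<org>` placeholder is left as the article has it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayConfig:
    base_url: str
    model: str
    api_key_env: str            # hypothetical environment variable name
    default_key: Optional[str]  # placeholder used when the variable is unset


GATEWAYS = {
    "bifrost": GatewayConfig(
        base_url="http://localhost:8080/openai",
        model="gpt-4o",
        api_key_env="BIFROST_API_KEY",
        default_key="dummy-api-key",  # same placeholder as the snippet above
    ),
    "truefoundry": GatewayConfig(
        base_url="https://<org>.truefoundry.com/api/llm",
        model="openai-main/gpt-4o",
        api_key_env="TFY_API_KEY",
        default_key=None,  # a real key is required
    ),
}


def client_kwargs(name: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for openai.OpenAI(...) for the named gateway."""
    cfg = GATEWAYS[name]
    key = env.get(cfg.api_key_env, cfg.default_key)
    if key is None:
        raise KeyError(f"set {cfg.api_key_env} to use the {name} gateway")
    return {"base_url": cfg.base_url, "api_key": key}
```

With the OpenAI SDK installed, `openai.OpenAI(**client_kwargs("bifrost"))` would then build the client, and `GATEWAYS["bifrost"].model` supplies the matching model string.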
In the v1.5.7 instance I ran, Bifrost's catalog showed 3,020 models across 89 providers. Their current public docs say "1000+ models" and "23+ providers". Those numbers don't match what I observed in the instance, and I can't reconcile them. Regardless of the exact count, both catalogs are large: TrueFoundry documents 1,600+ managed models plus self-hosted options. For most teams, the difference in model coverage won't be a deciding factor.
MCP and agent execution: closer than expected
Both gateways have invested meaningfully in MCP, and the design pattern is similar.
Bifrost offers two modes: Manual Tool Execution, where the client calls and approves each tool, and Agent Mode, where the gateway auto-executes whitelisted tools. You configure this with tools_to_execute (what can be called) and tools_to_auto_execute (what runs without approval):
# Bifrost agent-mode pattern (from docs)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "tools_to_execute": ["list_files", "read_file"],
        "tools_to_auto_execute": ["list_files"],  # runs without approval
    },
)
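The relationship between the two lists can be read as a three-way decision per tool call. The sketch below illustrates that reading only; it is not Bifrost's implementation. The `Decision` enum and `decide` function are hypothetical names, and the rule that a tool missing from `tools_to_execute` is denied even if it appears in `tools_to_auto_execute` is my assumption.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    AUTO_EXECUTE = "auto_execute"      # gateway runs the tool itself
    NEEDS_APPROVAL = "needs_approval"  # client must approve the call
    DENIED = "denied"                  # tool is not callable at all


def decide(tool: str,
           tools_to_execute: Iterable[str],
           tools_to_auto_execute: Iterable[str]) -> Decision:
    """Classify a requested tool call against the two allow-lists."""
    allowed = set(tools_to_execute)
    auto = set(tools_to_auto_execute)
    if tool not in allowed:
        # Assumption: the callable list is the outer boundary.
        return Decision.DENIED
    if tool in auto:
        return Decision.AUTO_EXECUTE
    return Decision.NEEDS_APPROVAL


if __name__ == "__main__":
    allowed = ["list_files", "read_file"]
    auto = ["list_files"]
    for name in ("list_files", "read_file", "delete_file"):
        print(name, "->", decide(name, allowed, auto).value)
```

With the lists from the example above, `list_files` runs without approval, `read_file` waits for the client, and anything else is rejected.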
TrueFoundry frames the same control differently: Virtual MCP Servers (curated subsets of tools exposed to specific teams), per-team RBAC on tool access, and pre/post-call MCP guardrails. It also ships prebuilt connectors for Slack, Confluence, Sentry, and Datadog, and can wrap any REST/OpenAPI service as an MCP server.
Where Bifrost has a clear advantage: if you want lean, self-hosted MCP execution without standing up a full platform, Bifrost's implementation is immediately runnable. Where TrueFoundry adds value: when you need org-level identity on every tool call — one auto-refreshed OAuth token per user across all MCP servers — or multi-agent, session-aware workflows through their Agent Gateway.
For teams that just need reliable tool calling without enterprise identity overhead, Bifrost's approach is genuinely simpler.
Identity, compliance, and deployment options
This is where the two products diverge most clearly.
Bifrost's auth model is built around per-user OAuth with automatic token refresh — visible as a running worker in the boot log. Its Enterprise tier adds SAML-based SSO, RBAC, and OIDC directory sync. Compliance claims (SOC 2 Type II, HIPAA, ISO 27001, GDPR) are marketed as part of the Enterprise Governance module with immutable audit logs. Pricing for the Enterprise tier isn't publicly documented.
TrueFoundry's identity model operates primarily at the org level: SSO via OIDC or SAML 2.0 through any major IdP, optional SCIM provisioning for automated user/group sync, and RBAC. On its higher-tier on-prem Enterprise plan, auth traffic can be routed directly to your IdP without touching TrueFoundry's servers.
TrueFoundry's compliance claims: SOC 2 Type II, HIPAA, GDPR, and ITAR for export-controlled defense/aerospace workloads.
The honest take on compliance parity: both vendors market SOC 2 and HIPAA, and both offer VPC, on-prem, and air-gapped deployment. As is standard, certifications attach to the managed, audited environment. For self-hosted deployments, compliance also depends on your own controls.
The genuine differentiators:
ITAR: TrueFoundry claims ITAR-compliant deployments; Bifrost does not advertise this. If this matters for your work, you'll need to verify TrueFoundry's current ITAR posture directly — I haven't independently confirmed the scope.
SCIM: TrueFoundry offers SCIM-driven provisioning for automated team/user management. Bifrost's Enterprise tier has OIDC directory sync, which is directionally similar, but SCIM is the more standardized enterprise provisioning protocol.
Per-user OAuth at the tool level: Bifrost's built-in per-user OAuth is cleaner for MCP tool authentication where individual user credentials need to flow through to downstream services. TrueFoundry handles this differently — worth evaluating which model fits your auth architecture.
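For readers unfamiliar with the pattern, per-user OAuth with automatic refresh boils down to a token cache keyed by user that renews a token shortly before it expires. The sketch below shows that general pattern, not either vendor's implementation. `PerUserTokenCache`, the `refresh` callback, and the 60-second `skew` are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Token:
    value: str
    expires_at: float  # seconds, on the same clock as `now`


class PerUserTokenCache:
    """Keep one token per user and refresh it shortly before expiry."""

    def __init__(self,
                 refresh: Callable[[str], Token],
                 skew: float = 60.0,
                 now: Callable[[], float] = time.time) -> None:
        self._refresh = refresh  # would call the IdP's token endpoint
        self._skew = skew        # refresh this many seconds early
        self._now = now          # injectable clock, useful for tests
        self._tokens: Dict[str, Token] = {}

    def get(self, user_id: str) -> str:
        token = self._tokens.get(user_id)
        if token is None or token.expires_at - self._now() <= self._skew:
            token = self._refresh(user_id)
            self._tokens[user_id] = token
        return token.value
```

A gateway following this pattern would call `get(user_id)` before forwarding each tool call, so the downstream service sees the individual user's credentials rather than a shared service key.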
Observability, cost, and prompts
Observability: Bifrost ships with a dashboard, LLM logs, MCP logs, and 365-day log retention. It integrates with Maxim's evaluation platform for evals. TrueFoundry is fully OpenTelemetry-compliant with metadata tagging and a dedicated tracing product. Both are solid; TrueFoundry's OTel compliance means easier integration with existing monitoring infrastructure.
Cost attribution: Bifrost initializes a governance store at boot and applies budgets and rate limits per user/team. TrueFoundry enforces budgets at user/team/model level with chargeback. Comparable in intent; TrueFoundry goes deeper on multi-level attribution and consolidated reporting across the broader platform.
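As an illustration of what per-team budgets with chargeback involve, here is a minimal in-memory sketch. It reflects neither product's data model: `BudgetLedger` and its methods are hypothetical, and denying teams that have no configured budget is my assumption.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple


class BudgetLedger:
    """Track spend per (team, user) and enforce a budget per team."""

    def __init__(self, team_budgets_usd: Mapping[str, float]) -> None:
        self._budgets = dict(team_budgets_usd)
        self._spend: Dict[Tuple[str, str], float] = defaultdict(float)

    def record(self, team: str, user: str, cost_usd: float) -> None:
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        self._spend[(team, user)] += cost_usd

    def team_spend(self, team: str) -> float:
        return sum(cost for (t, _), cost in self._spend.items() if t == team)

    def allow(self, team: str, estimated_cost_usd: float = 0.0) -> bool:
        """Return True if the request fits within the team's budget."""
        budget = self._budgets.get(team)
        if budget is None:
            return False  # assumption: teams without a budget are denied
        return self.team_spend(team) + estimated_cost_usd <= budget

    def chargeback(self) -> Dict[str, float]:
        """Total spend per team, the figure a chargeback report starts from."""
        totals: Dict[str, float] = defaultdict(float)
        for (team, _), cost in self._spend.items():
            totals[team] += cost
        return dict(totals)
```

A production gateway would persist this state and price each request from token counts, but the check-then-record shape is the same.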
Prompt management: Both treat prompts as managed artifacts — Bifrost has a Prompt Repository; TrueFoundry offers prompt lifecycle management with versioning, rollback, and publishing. This is closer to parity than either vendor's marketing implies.
Where each wins
Bifrost is the better choice when:
You want open-source code you can read, audit, and fully own
A zero-dependency, single-binary setup matters — operationally or philosophically
You need MCP + agent-mode auto-execution without adopting a broader platform
Your infra doesn't run Kubernetes
Per-user OAuth flowing through to downstream MCP servers is the auth model you need
You don't need vendor-documented ITAR compliance
TrueFoundry is the better choice when:
You need ITAR/export-controlled deployment posture (if confirmed for your use case)
You want SCIM-driven org provisioning and centralized identity management
You need to consolidate gateway + model deployment/training + MCP hosting + multi-agent workflows
You're governing AI tool access across many teams from a single control plane
Your org is already Kubernetes-native and wants a managed SaaS or VPC option with vendor support
What this comparison doesn't settle
Version gap: I evaluated Bifrost v1.5.7. The Helm chart is at v2.1.22 and the project has 86 contributors and thousands of commits. Several capabilities described above may have changed. Check the Bifrost GitHub for current state before drawing conclusions.
Security research controversy: A dev.to post raised concerns about Bifrost and Maxim AI (H3 Labs) potentially fitting patterns of API key harvesting services. I have not verified these claims, but anyone evaluating Bifrost for production workloads involving sensitive API keys should read that post and form their own judgment. [Link: dev.to/bradleymatera/research-why-bifrost-maxim-ai-h3-labs-inc-fits-the-exact-pattern-of-api-key-harvesting-2844]
Pricing: Neither vendor publishes clear pricing for enterprise tiers. This matters if you're comparing total cost. Bifrost's core is open source with no licensing cost; TrueFoundry is a commercial product. Get quotes before assuming either fits your budget.
Actual benchmark comparison: This article doesn't include a head-to-head latency comparison between the two gateways. Bifrost's own marketing claims "<100 µs overhead at 5k RPS." TrueFoundry publishes "+7 ms at 250 RPS." These are measured under different conditions and aren't comparable without a controlled test. Settling it would take a head-to-head benchmark under identical load and configuration.
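For anyone who wants to run that controlled test, the measurement shape is simple: send the same request directly to the provider and through each gateway, then compare medians. The sketch below shows the bookkeeping only. `time_calls` and `median_overhead_ms` are hypothetical helpers, the `send` callable is whatever request function you supply, and sequential timing like this does not reproduce the concurrent load (5k RPS or 250 RPS) behind the vendor figures.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_calls(send: Callable[[], object], n: int = 100) -> List[float]:
    """Call `send` n times sequentially; return each duration in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        send()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def median_overhead_ms(direct_ms: Sequence[float],
                       gateway_ms: Sequence[float]) -> float:
    """Median gateway latency minus median direct latency, in milliseconds."""
    return statistics.median(gateway_ms) - statistics.median(direct_ms)


def us_to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds so claims share one unit."""
    return microseconds / 1000.0
```

Putting both vendor claims in one unit (`us_to_ms(100)` is 0.1 ms against 7 ms) shows how far apart the headline numbers are, which is a reason to measure both under the same conditions before trusting either.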
If you've run either in production at scale and have data on latency, memory usage, or operational overhead that differs from what's documented here, I'd be interested in hearing about it in the comments.