Kuldeep Paul

Posted on May 24

Managing Virtual Keys and Budgets in Bifrost: A Complete Guide

See how virtual keys and budgets in Bifrost deliver layered cost control, rate limits, and access governance for LLM workloads across every provider.

AI spend has become the budget line that grows fastest inside most engineering organizations, and the governance tooling around it has not kept pace. Gartner projects worldwide AI outlay reaching $2.5 trillion in 2026, and AI workloads now make up roughly 22% of total cloud spend at SaaS and IT companies. With no structured access control in place, every developer-held API key turns into a potential cost incident waiting to happen. Bifrost answers this with virtual keys and budgets that platform teams can use to govern LLM spend at the gateway layer, applying hierarchical budgets, per-provider rate limits, and model-level access rules uniformly across every supported provider. The sections below walk through how virtual keys operate, how the budget hierarchy is layered, and how to set both up for production workloads.

How Virtual Keys Work in Bifrost

Within Bifrost, the virtual key serves as the central governance entity. It identifies a consumer (which can be a developer, an application, an internal team, or an external customer) and applies a defined permission set: the providers it can call, the models it can invoke, the spend ceiling it must respect, and the token or request volume permitted inside a given window. By replacing the practice of handing out raw provider API keys, a single virtual key closes off one of the biggest paths through which AI costs typically leak.

Each Bifrost-issued virtual key carries an sk-bf-* prefix, and the gateway accepts it through several header formats so that existing SDK conventions remain unchanged:

x-bf-vk for Bifrost-native clients
Authorization: Bearer sk-bf-* for OpenAI-style SDKs
x-api-key for Anthropic-style SDKs
x-goog-api-key for Google Gemini-style SDKs

The practical implication: Bifrost slots in as a drop-in replacement for any existing SDK, with no authentication refactor needed on the application side. Switching the base URL is all it takes to layer governance on top.
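As a rough illustration of the header formats listed above, the sketch below maps each SDK convention to the header that would carry the virtual key. The function name `auth_headers` and the style labels ("bifrost", "openai", "anthropic", "gemini") are made up for this example; only the header names and the sk-bf- prefix come from the text above.

```python
def auth_headers(style: str, virtual_key: str) -> dict:
    """Return the header that carries a virtual key for a given SDK convention.

    The style labels are hypothetical names used only in this sketch.
    """
    if not virtual_key.startswith("sk-bf-"):
        raise ValueError("Bifrost virtual keys carry an sk-bf- prefix")
    formats = {
        "bifrost": {"x-bf-vk": virtual_key},
        "openai": {"Authorization": f"Bearer {virtual_key}"},
        "anthropic": {"x-api-key": virtual_key},
        "gemini": {"x-goog-api-key": virtual_key},
    }
    if style not in formats:
        raise ValueError(f"unknown SDK style: {style}")
    return formats[style]
```

In every case the value is the same virtual key; only the header name changes, which is why existing SDK auth code can stay as it is.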

The Case for Virtual Keys and Budgets in AI Cost Control

The CloudBees 2026 State of Code Abundance Report captures something platform teams already know first-hand: scaling AI consumption is easy, but forecasting and governing it remains hard, and most organizations still operate without mature controls around token usage, automated governance, or cost attribution. Three failure patterns show up repeatedly when AI overspend happens:

Shared provider keys with no attribution. When one OpenAI or Anthropic key is circulated across an engineering organization, attributing spend back to any specific team becomes impossible.
No model-level restrictions. Without a constraint in place, developers reach for the most powerful (and most expensive) model by default, even when a smaller one would do the job.
No real-time enforcement. Provider invoices land monthly, well after the damage is done, and offer no mechanism for cutting off a runaway workload in flight.

All three are addressed directly by Bifrost's virtual keys. Every key is scoped to a fixed list of providers, an allow-list of models, an independent budget with a configurable reset window, and rate limits applied at both the request count and token count levels. The moment any limit is hit, Bifrost rejects the request and returns a structured error, which gives platform teams hard enforcement instead of soft warnings.

Bifrost's Budget Hierarchy: From Customer Down to Provider

Bifrost lays out budgets across a four-tier hierarchy that closely mirrors the way enterprises already think about spend allocation:

Customer/Business Unit (organization-level budget)
    ↓
Team (department-level budget)
    ↓
Virtual Key (consumer-level budget + rate limits)
    ↓
Provider Config (per-provider budget + rate limits)

Each tier carries its own independent budget. Whenever a request comes in with a virtual key attached, Bifrost evaluates every applicable budget separately, and the call is only allowed through if every level still has sufficient balance remaining. Once the request completes, its cost is deducted from every relevant tier automatically, calculated from the model catalog's live provider pricing, the count of input and output tokens, the request type, and any cache hit status.
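To make the cost inputs concrete, here is a minimal sketch of a per-request cost calculation. The exact formula Bifrost uses is not given in this article, so the `ModelPrice` fields, the separate cached-token rate, and the prices in the usage note are all assumptions for illustration; the request type is ignored here for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical catalog entry: USD per one million tokens."""
    input_per_million: float
    output_per_million: float
    cached_input_per_million: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (assumed formula, not Bifrost's)."""
    if cached_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")
    uncached = input_tokens - cached_tokens
    total = (uncached * price.input_per_million
             + cached_tokens * price.cached_input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000
```

With a made-up price of $1 per million input tokens and $2 per million output tokens, a request with 1,000 input and 500 output tokens would cost about $0.002 under this formula.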

Attachment is flexible at every level. A virtual key can sit under a team (which itself can sit under a customer), attach directly to a customer, or operate standalone. Team attachment and customer attachment are mutually exclusive on any one virtual key. The result: the same governance primitives can model both an internal engineering org and an external SaaS customer base. Full configuration patterns live in the Bifrost governance reference.
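The attachment rule can be sketched as a small validation step. The `team_id` field appears in the API example in this article; `customer_id` and the function name are assumptions made for this illustration.

```python
from typing import Optional


def attachment_kind(team_id: Optional[str], customer_id: Optional[str]) -> str:
    """Classify how a virtual key is attached, rejecting the invalid combination."""
    if team_id is not None and customer_id is not None:
        raise ValueError("team and customer attachment are mutually exclusive")
    if team_id is not None:
        return "team"
    if customer_id is not None:
        return "customer"
    return "standalone"
```

A key attached to a team still inherits the customer-level budget indirectly when that team sits under a customer, so the direct customer link is only needed when no team is involved.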

Budget Evaluation in Practice

Take a virtual key configured with provider-specific budgets plus an overall VK budget, attached to a team that nests under a customer. Before any request is actually dispatched, Bifrost runs the following evaluation:

The provider config budget tied to the selected provider
The virtual key's own budget
The team-level budget above it
The customer-level budget above that

A failure at any single level blocks the request and returns a 402 budget_exceeded error. Once a request does succeed, the same cost is deducted from every applicable tier. If the only exhausted budget belongs to one specific provider, that provider is dropped from routing while the others under the same virtual key stay available, which keeps applications running against a fallback while still respecting the cost ceiling.
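The check-then-deduct flow described above can be modeled in a few lines. This is a simplified sketch, not Bifrost's implementation: the `Budget` class, the `BudgetExceeded` exception (standing in for the 402 budget_exceeded response), and the rule that a tier passes while its usage is below its limit are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_limit: float  # USD
    used: float = 0.0


class BudgetExceeded(Exception):
    """Stands in for the 402 budget_exceeded response in this sketch."""

    def __init__(self, tier: str):
        super().__init__(f"budget exceeded at tier: {tier}")
        self.tier = tier


def check_budgets(chain: list) -> None:
    """Raise if any tier in the chain has no balance left (assumed rule)."""
    for budget in chain:
        if budget.used >= budget.max_limit:
            raise BudgetExceeded(budget.name)


def record_cost(chain: list, cost: float) -> None:
    """Deduct the same cost from every tier after a successful request."""
    for budget in chain:
        budget.used += cost
```

The chain would be ordered provider config, virtual key, team, customer, and `check_budgets` would run before dispatch while `record_cost` runs after the response comes back.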

Pairing Rate Limits with Budgets

Spend over time is what budgets control, but they offer no defense against a sudden traffic burst that can either saturate provider rate limits or rack up unexpected charges inside a few minutes. Bifrost handles this with rate limits that run in parallel, enforced at both the virtual key level and the provider config level. Two limit types operate side by side:

Request limits cap how many API calls land inside a reset window (for example, 100 requests every minute).
Token limits cap the combined prompt and completion token volume inside a reset window (for example, 50,000 tokens every hour).

A request only goes through when both limits pass. Reset durations are flexible: 1m, 5m, 1h, 1d, 1w, 1M, and 1Y are all valid, and budgets can additionally carry a calendar_aligned flag so that resets snap to the start of each UTC calendar period (00:00 UTC for daily, Monday 00:00 UTC for weekly, the first of the month for monthly, January 1 for annual). Calendar alignment only applies to day, week, month, and year intervals. Sub-day durations like 1h or 30m operate as rolling windows.
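Calendar alignment is easier to see with dates. The sketch below computes the start of the current UTC calendar period for the four intervals that support alignment; the function name `period_start` is made up here, and how Bifrost computes these boundaries internally is not covered in this article.

```python
from datetime import datetime, timedelta, timezone


def period_start(now: datetime, duration: str) -> datetime:
    """Start of the current UTC calendar period for 1d, 1w, 1M, or 1Y."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "1d":
        return midnight
    if duration == "1w":
        # weekday() is 0 on Monday, so this steps back to Monday 00:00 UTC.
        return midnight - timedelta(days=midnight.weekday())
    if duration == "1M":
        return midnight.replace(day=1)
    if duration == "1Y":
        return midnight.replace(month=1, day=1)
    raise ValueError("calendar alignment only applies to 1d, 1w, 1M, and 1Y")
```

A rolling window, by contrast, would start counting from whenever the limit was created or last reset rather than from a calendar boundary.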

Provider-level rate limits also enable patterns that flat per-key limits cannot. On a single virtual key holding both an OpenAI and an Anthropic provider config, the OpenAI side can carry a 1,000-request-per-hour ceiling while Anthropic carries an independent 500-request-per-hour ceiling. When one provider tops out, the other keeps serving traffic, and the virtual key as a whole stays operational. Full details are on the Bifrost budget and rate limits reference.
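A toy model of independent per-provider limits shows why one provider topping out does not take the whole key down. The class and function names are hypothetical, window resets are left out for brevity, and the sketch assumes the token count is known when the check runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    max_limit: int
    used: int = 0

    def allows(self, amount: int) -> bool:
        return self.used + amount <= self.max_limit


@dataclass
class ProviderLimits:
    requests: Window
    tokens: Window

    def try_consume(self, tokens: int) -> bool:
        """Consume one request and its tokens only if both limits pass."""
        if not (self.requests.allows(1) and self.tokens.allows(tokens)):
            return False
        self.requests.used += 1
        self.tokens.used += tokens
        return True


def first_available(limits: dict, tokens: int) -> Optional[str]:
    """Return the first provider whose limits still admit this request."""
    for name, provider_limits in limits.items():
        if provider_limits.try_consume(tokens):
            return name
    return None
```

Because each provider keeps its own counters, exhausting the OpenAI window simply moves the next request to Anthropic instead of failing it.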

Three Ways to Configure Virtual Keys and Budgets

The same governance primitives are exposed through three different configuration interfaces in Bifrost, which lets platform teams pick between point-and-click setup, programmatic provisioning, or declarative configuration as needed.

Web UI

Inside the Bifrost dashboard, a Virtual Keys management page offers expandable provider cards, budget controls with reset period selection, separate token and request rate limit controls, model filtering per provider, and weight distribution indicators for load balancing. Configuration errors surface immediately through real-time validation, and an info sheet on each virtual key exposes live budget consumption, rate limit usage, and provider availability status.

HTTP API

All of the governance primitives (virtual keys, teams, customers, budgets, and rate limits) can be managed through the /api/governance/* endpoints. A typical creation request is shaped like this:

curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Team API",
    "provider_configs": [
      {
        "provider": "openai",
        "weight": 0.5,
        "allowed_models": ["gpt-4o-mini"]
      },
      {
        "provider": "anthropic",
        "weight": 0.5,
        "allowed_models": ["claude-3-sonnet-20240229"]
      }
    ],
    "team_id": "team-eng-001",
    "budget": {
      "max_limit": 100.00,
      "reset_duration": "1M"
    },
    "rate_limit": {
      "token_max_limit": 10000,
      "token_reset_duration": "1h",
      "request_max_limit": 100,
      "request_reset_duration": "1m"
    },
    "is_active": true
  }'
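For teams provisioning keys from scripts, the same request can be built with Python's standard library. This is a sketch that mirrors the curl call above: the endpoint path, host, and payload fields are copied from it, while the helper name `build_create_vk_request` is made up. Nothing is sent until `urlopen` is called.

```python
import json
import urllib.request


def build_create_vk_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a virtual key."""
    return urllib.request.Request(
        url=f"{base_url}/api/governance/virtual-keys",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "name": "Engineering Team API",
    "provider_configs": [
        {"provider": "openai", "weight": 0.5, "allowed_models": ["gpt-4o-mini"]},
        {"provider": "anthropic", "weight": 0.5,
         "allowed_models": ["claude-3-sonnet-20240229"]},
    ],
    "team_id": "team-eng-001",
    "budget": {"max_limit": 100.00, "reset_duration": "1M"},
    "rate_limit": {
        "token_max_limit": 10000,
        "token_reset_duration": "1h",
        "request_max_limit": 100,
        "request_reset_duration": "1m",
    },
    "is_active": True,
}

request = build_create_vk_request("http://localhost:8080", payload)

# To send it against a running gateway:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```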

config.json

For GitOps-style workflows, the same setup can be expressed declaratively inside config.json. Budgets and rate limits sit as top-level arrays inside governance and are referenced by ID from virtual keys and provider configs, which makes the configuration composable and easy to reuse across multiple keys.
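The reference-by-ID layout lends itself to a quick pre-deploy check. The structure below is only a guess at the shape: the key names `budgets`, `rate_limits`, `virtual_keys`, `id`, `budget_id`, and `rate_limit_id` are assumptions for illustration, so consult the Bifrost governance reference for the real schema before relying on them.

```python
# Hypothetical shape of the "governance" section; field names are assumptions.
governance = {
    "budgets": [
        {"id": "budget-eng-monthly", "max_limit": 100.00, "reset_duration": "1M"},
    ],
    "rate_limits": [
        {"id": "rl-eng", "request_max_limit": 100, "request_reset_duration": "1m"},
    ],
    "virtual_keys": [
        {"id": "vk-eng", "name": "Engineering Team API",
         "budget_id": "budget-eng-monthly", "rate_limit_id": "rl-eng"},
    ],
}


def dangling_references(section: dict) -> list:
    """List virtual-key references that point at an undefined budget or rate limit."""
    budget_ids = {b["id"] for b in section.get("budgets", [])}
    rate_limit_ids = {r["id"] for r in section.get("rate_limits", [])}
    problems = []
    for vk in section.get("virtual_keys", []):
        if vk.get("budget_id") and vk["budget_id"] not in budget_ids:
            problems.append(f"{vk['id']}: unknown budget {vk['budget_id']}")
        if vk.get("rate_limit_id") and vk["rate_limit_id"] not in rate_limit_ids:
            problems.append(f"{vk['id']}: unknown rate limit {vk['rate_limit_id']}")
    return problems
```

Running a check like this in CI catches a renamed or deleted budget before the configuration reaches the gateway.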

Model, Provider, and MCP Tool Restrictions on a Per-Key Basis

On top of budgets and rate limits, three additional access controls travel with every virtual key, and platform teams use them to prevent the kinds of misuse that drive up costs:

Allowed models. An allowed_models array lives inside each provider config, and any request targeting a model outside that list returns 403 model_blocked. This stops a key originally issued for cheap prototyping models from being quietly redirected to a frontier model.
Key ID restrictions. The key_ids field pins a virtual key to specific underlying provider API keys, which is helpful for keeping development, staging, and production environments cleanly separated.
MCP tool filtering. When Bifrost is acting as an MCP gateway, each virtual key can be locked down to a specific allow-list of MCP tools. For agent workloads, where tool access carries both security and cost implications, this control is critical. Teams building broader tool governance can reference Bifrost's MCP gateway resource page for additional context.
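The allowed-models rule amounts to a lookup against the provider config. The sketch below reuses the `provider_configs` shape from the API example in this article; the function name is made up, and treating an unconfigured provider as blocked is an assumption.

```python
def is_model_allowed(provider_configs: list, provider: str, model: str) -> bool:
    """Return True only if the model is on the provider's allow-list.

    A False result corresponds to the 403 model_blocked case described above.
    Treating a provider with no config on the key as blocked is an assumption.
    """
    for config in provider_configs:
        if config["provider"] == provider:
            return model in config["allowed_models"]
    return False


provider_configs = [
    {"provider": "openai", "weight": 0.5, "allowed_models": ["gpt-4o-mini"]},
    {"provider": "anthropic", "weight": 0.5,
     "allowed_models": ["claude-3-sonnet-20240229"]},
]
```

With this config, a request for gpt-4o-mini on OpenAI passes, while a request for any other OpenAI model on the same key is rejected.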

Layered together with weighted load balancing across providers and automatic fallback when a provider exceeds its own limits, these controls let platform teams ship a single governed API surface that consumer teams can adopt without sacrificing flexibility.

Production Patterns for Virtual Keys and Budgets

Three configuration patterns show up over and over in production Bifrost deployments:

Per-team monthly budgets paired with daily rate limits. Each engineering team receives a virtual key carrying a $1,000 monthly budget, a 10,000-requests-per-day ceiling, and access to OpenAI and Anthropic under weighted routing. The monthly budget handles cost containment, while the daily rate limit provides abuse protection.
Tiered access for cost optimization. A single virtual key holds two provider configs simultaneously: a cheaper model gets a high routing weight and a $50 daily budget, while a premium model gets a low weight and a $200 daily budget. Once the cheap budget runs out, traffic automatically fails over to the premium provider until the next reset.
Customer-attached virtual keys for SaaS resale. Companies layering AI features on top of LLMs attach virtual keys directly to customer entities, set an org-wide budget, and pass usage through into invoicing. Each customer's spend is isolated from every other customer's, so a single runaway integration cannot affect anyone else.
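The tiered-access pattern can be approximated as weighted selection over whichever provider configs still have budget left. This is a simplified model rather than Bifrost's routing code: the provider labels and the 0.9/0.1 weights are invented for the example, while the $50 and $200 daily budgets come from the pattern above.

```python
import random
from typing import Optional


def pick_provider(configs: list, rng=random) -> Optional[str]:
    """Pick a provider by weight, skipping any whose budget is exhausted."""
    available = [c for c in configs if c["used"] < c["budget"]]
    if not available:
        return None
    weights = [c["weight"] for c in available]
    return rng.choices(available, weights=weights, k=1)[0]["provider"]


configs = [
    # Labels and weights are hypothetical; budgets follow the pattern above.
    {"provider": "cheap-model", "weight": 0.9, "budget": 50.0, "used": 0.0},
    {"provider": "premium-model", "weight": 0.1, "budget": 200.0, "used": 0.0},
]
```

While both budgets have headroom, most traffic lands on the cheap config; once its usage reaches $50, only the premium config remains eligible until the next reset.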

All of these patterns ship in the open-source build, with no enterprise contract required. The governance resource page for Bifrost documents how that OSS foundation scales up to enterprise RBAC, SSO, and SAML once those become hard requirements.

Bring Virtual Keys and Budgets to Your AI Workloads

For AI workloads at scale, the cost discipline that virtual keys and budgets in Bifrost provide has stopped being optional. Hierarchical budget management across customers, teams, virtual keys, and providers, parallel token and request rate limits, model and provider filtering, and calendar-aligned reset windows together give platform teams the same caliber of financial governance over LLM spend that cloud infrastructure already enjoys. And all of it runs through a gateway that adds only 11 microseconds of overhead at 5,000 requests per second (full numbers are on the Bifrost benchmarks page), so the governance layer adds negligible latency.

To see virtual keys and budgets in Bifrost applied to your own AI cost governance and access control workflows, book a demo with the Bifrost team.

DEV Community