DEV Community

Pranay Batta
Setting Up Budgets, Rate Limits, and Guardrails in Bifrost: A Hands-On Walkthrough

TL;DR: Governance is the part of an AI gateway that decides whether you can ship to production. I set up Bifrost's four-tier budget hierarchy, configured rate limits at both virtual key and provider levels, and wired in guardrails for PII redaction and content filtering on a single instance. This post walks through the config, the gotchas, and how the budget tree behaves under real traffic.

This post assumes familiarity with running Bifrost via npx or Docker, the virtual keys concept, and how rate limits typically work in API gateways.

Why Governance Lives at the Gateway

If you let every team hit upstream LLM APIs directly, three problems show up.

You cannot answer "who spent what" without parsing invoices from each upstream vendor. You cannot stop a runaway agent mid-incident, because the kill switch is on the upstream provider dashboard, not in your control plane. And you cannot enforce a content policy without writing the same filter in every service that touches a model.

A gateway moves these problems behind one set of controls. The Bifrost governance resource covers the full model. This post is the hands-on version.

Step 1: The Four-Tier Budget Hierarchy

Bifrost models budgets as a four-tier tree: Customer > Team > Virtual Key > Provider Config. Every request walks the tree from the most specific scope outward. If any tier is over budget, the request is rejected before it leaves the gateway.

customers:
  - id: acme-corp
    budget:
      max_limit: 5000
      reset_duration: 1M

teams:
  - id: support-team
    customer_id: acme-corp
    budget:
      max_limit: 1500
      reset_duration: 1M

  - id: eng-team
    customer_id: acme-corp
    budget:
      max_limit: 2000
      reset_duration: 1M

virtual_keys:
  - key: vk-support-chatbot
    team_id: support-team
    budget:
      max_limit: 500
      reset_duration: 1w

Reset durations are 1m, 5m, 1h, 1d, 1w, 1M, 1Y. The daily, weekly, monthly, and yearly buckets are calendar-aligned in UTC, so a 1d budget resets at 00:00 UTC, not 24 hours after first use. That detail matters when you are debugging "why did my budget reset at 5:30am IST."
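To see exactly when a window rolls over, here is a small illustrative Python sketch (my own, not Bifrost code) that computes the next calendar-aligned boundary in UTC for the daily, weekly, and monthly durations. The Monday week boundary is an assumption for illustration:

```python
from datetime import datetime, timedelta, timezone

def next_reset(now: datetime, duration: str) -> datetime:
    """Next calendar-aligned reset boundary in UTC (illustrative, not Bifrost's code)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "1d":
        return midnight + timedelta(days=1)
    if duration == "1w":  # assumption: weeks roll over Monday 00:00 UTC
        return midnight + timedelta(days=7 - now.weekday())
    if duration == "1M":  # first of next month, 00:00 UTC
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return midnight.replace(year=year, month=month, day=1)
    raise ValueError(f"unsupported duration: {duration}")

# A 1d budget used at 23:50 UTC resets ten minutes later, not 24 hours later.
t = datetime(2024, 6, 15, 23, 50, tzinfo=timezone.utc)
print(next_reset(t, "1d"))  # 2024-06-16 00:00:00+00:00
```

Run this with your own timestamps when a reset surprises you; the answer is almost always "the calendar boundary, not first use plus the duration."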

The customer above caps total spend at 5,000 units per month. Inside that, the support team has 1,500 and engineering has 2,000. They cannot collectively exceed 5,000 even if each team has headroom, because the customer ceiling binds.
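The enforcement logic is easy to picture as a walk up the tree. A minimal Python sketch of the check, using the numbers from the config above (this is my mental model, not Bifrost internals):

```python
# Illustrative model of the budget walk: a request is charged against its
# virtual key, the key's team, and the team's customer. If ANY tier would
# exceed its cap, the request is rejected before reaching the provider.
budgets = {
    "acme-corp":          {"max": 5000, "spent": 4900},
    "support-team":       {"max": 1500, "spent": 1200},
    "vk-support-chatbot": {"max": 500,  "spent": 100},
}
parents = {"vk-support-chatbot": "support-team", "support-team": "acme-corp"}

def check_and_charge(scope: str, cost: float) -> bool:
    # Walk from the most specific scope outward, collecting every tier.
    chain = [scope]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    # Reject if any tier would go over budget; otherwise charge all tiers.
    if any(budgets[s]["spent"] + cost > budgets[s]["max"] for s in chain):
        return False
    for s in chain:
        budgets[s]["spent"] += cost
    return True

print(check_and_charge("vk-support-chatbot", 50))   # True: every tier has headroom
print(check_and_charge("vk-support-chatbot", 200))  # False: the customer cap (5000) binds
```

Note the second request fails even though the virtual key and team both have room; the ceiling above them is what rejects it.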

Step 2: Rate Limits at Two Scopes

Budgets cap spend; rate limits cap request frequency. Bifrost enforces rate limits at the virtual key level and the provider config level, with separate request and token counters.

virtual_keys:
  - key: vk-support-chatbot
    team_id: support-team
    rate_limits:
      requests:
        max: 60
        reset_duration: 1m
      tokens:
        max: 200000
        reset_duration: 1h

provider_configs:
  - name: anthropic-primary
    provider: anthropic
    rate_limits:
      requests:
        max: 10000
        reset_duration: 1m

The VK rate limit protects per-key abuse: 60 requests per minute, 200k tokens per hour. The provider config rate limit protects upstream: 10k requests per minute total across all keys hitting that provider, which keeps you under your vendor tier ceiling.

If a limit trips, the gateway returns a 429 with the relevant Retry-After header. Reset windows follow the same calendar-aligned rules as budgets.
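A calendar-aligned fixed-window counter is enough to model this behaviour. A sketch (the counter logic is mine, not Bifrost's implementation) showing the 60-requests-per-minute key limit and the Retry-After computation:

```python
import math

class FixedWindowLimiter:
    """Illustrative calendar-aligned fixed-window limiter (not Bifrost's code)."""
    def __init__(self, max_count: int, window_seconds: int):
        self.max = max_count
        self.window = window_seconds
        self.counts: dict[int, int] = {}  # window index -> usage

    def allow(self, now: float, cost: int = 1):
        # Align windows to the epoch, mirroring calendar-aligned resets.
        bucket = int(now // self.window)
        used = self.counts.get(bucket, 0)
        if used + cost > self.max:
            # Over the limit: report seconds until the window rolls over,
            # analogous to the gateway's 429 + Retry-After.
            retry_after = math.ceil((bucket + 1) * self.window - now)
            return False, retry_after
        self.counts[bucket] = used + cost
        return True, 0

rpm = FixedWindowLimiter(max_count=60, window_seconds=60)  # 60 requests/minute
t = 1_000_040.0  # 20 s into a minute-aligned window
for _ in range(60):
    rpm.allow(t)
print(rpm.allow(t))  # (False, 40): blocked until the minute window resets
```

The token limit works the same way with `cost` set to the request's token count instead of 1.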

Step 3: Routing and Weighted Fallbacks

Governance is also about controlling where requests land. Bifrost supports weighted load balancing across providers and automatic fallbacks. Weights are auto-normalised to sum to 1.0, so you do not have to do the math yourself.

routing:
  - model: claude-sonnet
    providers:
      - name: anthropic-primary
        weight: 3
      - name: anthropic-backup
        weight: 1

That sends 75% of traffic to the primary and 25% to the backup. On failure, Bifrost retries the next provider in weight order. Cross-provider routing (Anthropic to OpenAI) must be explicitly configured. The gateway will not silently fall back to a different vendor unless you tell it to, which is the right default for a governance system.
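Weight normalisation and fallback ordering fit in a few lines of Python (an illustration of the behaviour, not Bifrost's router):

```python
import random

providers = [("anthropic-primary", 3), ("anthropic-backup", 1)]

# Auto-normalise weights to sum to 1.0, as described above.
total = sum(w for _, w in providers)
normalised = [(name, w / total) for name, w in providers]
print(normalised)  # [('anthropic-primary', 0.75), ('anthropic-backup', 0.25)]

def pick(rng: random.Random) -> str:
    # Weighted selection: the primary wins ~75% of the time.
    r, acc = rng.random(), 0.0
    for name, w in normalised:
        acc += w
        if r < acc:
            return name
    return normalised[-1][0]

def fallback_order() -> list[str]:
    # On failure, retry providers in descending weight order.
    return [name for name, _ in sorted(normalised, key=lambda p: -p[1])]

print(fallback_order())  # ['anthropic-primary', 'anthropic-backup']
```

Because weights are relative, `3` and `1` behave identically to `0.75` and `0.25`, or `75` and `25`.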

Step 4: Guardrails for PII and Content

Guardrails are the layer that inspects request and response payloads before they leave the gateway. The Bifrost guardrails resource covers the supported policies. The pattern looks like this in config:

guardrails:
  - name: pii-redact
    type: pii
    action: redact
    fields: [email, ssn, phone]
    scope: [request, response]

  - name: prompt-injection
    type: content
    action: block
    scope: [request]

A redact policy rewrites the payload before forwarding. A block policy rejects the request with a 400 and an audit log entry. Both run on top of the budget and rate-limit checks, so a request that clears every spend and frequency gate still has to pass the policy filter before it touches the upstream LLM.
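To make the redact behaviour concrete, here is a toy regex-based redactor for the three fields in the config above. The patterns and function are mine for illustration; real PII detection (and Bifrost's) is more sophisticated than a handful of regexes:

```python
import re

# Toy patterns for the configured fields. This only illustrates the
# redact-before-forward behaviour, not production-grade PII detection.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str, fields: list[str]) -> str:
    for field in fields:
        text = PATTERNS[field].sub(f"[{field.upper()} REDACTED]", text)
    return text

payload = "Contact jane@example.com or 555-867-5309, SSN 123-45-6789."
print(redact(payload, ["email", "ssn", "phone"]))
# Contact [EMAIL REDACTED] or [PHONE REDACTED], SSN [SSN REDACTED].
```

The upstream model only ever sees the rewritten payload; the original never leaves the gateway.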

How This Stacks Against Other Gateways

| Capability | Bifrost | LiteLLM | OpenRouter | Kong AI Gateway |
| --- | --- | --- | --- | --- |
| Hierarchical budgets (4 tiers) | Yes | Limited | No | Plugin-based |
| Rate limits at key + provider | Yes | Key only | Vendor-managed | Yes |
| Weighted load balancing | Yes | Yes | Vendor routing | Yes |
| Built-in guardrails | Yes | External | No | Plugin-based |
| Self-hostable | Yes | Yes | No | Yes |
| Open-source | Yes | Yes | No | Partial |

LiteLLM has wide provider support and a large community (BerriAI/litellm), but its governance model is flatter and guardrails are not built in. OpenRouter is the easiest to start with, but data residency and hard budget caps at the customer tier are not on the menu. A deeper migration walkthrough between gateways lives in the LiteLLM alternatives guide.

Trade-offs and Limitations

Bifrost is a Go binary, so if your platform is Python-first and you want governance code you can fork in-process, LiteLLM is a more natural fit despite its higher per-request overhead (~8 ms vs Bifrost's 11 µs).

The four-tier budget tree is powerful but adds cognitive load. For a single-team setup, you only need the virtual key tier. Resist the urge to use customer and team scopes for hypothetical future structure.

Calendar-aligned resets are great for finance reporting but counter-intuitive if you expect rolling windows. If you want a true rolling 24-hour bucket, you have to model it with shorter reset durations and external aggregation.
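One way to approximate that rolling 24-hour cap externally: keep Bifrost on 1h budgets and sum the last 24 hourly buckets yourself. A sketch of the aggregation side (the class and names are hypothetical; it assumes you export per-hour spend from the gateway somehow):

```python
from collections import deque

class RollingSpend:
    """Approximate a rolling 24h cap by summing the last N hourly buckets."""
    def __init__(self, hours: int = 24):
        self.buckets: deque[float] = deque(maxlen=hours)  # oldest hour drops off

    def close_hour(self, spend: float) -> None:
        # Call once per hour with that hour's spend, e.g. pulled from gateway metrics.
        self.buckets.append(spend)

    def rolling_total(self) -> float:
        return sum(self.buckets)

window = RollingSpend(hours=24)
for hour_spend in [10.0] * 30:   # 30 hours of steady spend
    window.close_hour(hour_spend)
print(window.rolling_total())    # 240.0: only the last 24 hours count
```

When `rolling_total()` crosses your threshold, you act on it yourself (alert, disable the key); the gateway's own enforcement stays calendar-aligned.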

Guardrails inspect payloads in-process, which means the gateway sees them in plaintext. If your threat model requires no plaintext PII transiting any shared service, you need to redact upstream of the gateway.

Quick Recap

  • Bifrost's four-tier budget tree (Customer > Team > VK > Provider) gives finance-grade spend control inside the gateway.
  • Rate limits run at both the VK and provider scopes, with separate request and token counters.
  • Weighted load balancing handles distribution, automatic fallbacks handle failure, and cross-provider routing requires explicit config.
  • Guardrails for PII redaction and content filtering layer on top of budgets and rate limits, before traffic reaches the upstream model.
  • The whole setup runs in a single Go binary with 11 microsecond overhead per request, so the governance layer does not become the latency bottleneck.

GitHub: https://git.new/bifrost | Docs: https://getmax.im/bifrostdocs | Website: https://getmax.im/bifrost-home
