
Kuldeep Paul


Bifrost on Kubernetes with Helm: A Production Deployment Playbook

Run Bifrost on Kubernetes with Helm in production, with proven configuration patterns, scaling defaults, and fixes for the most common deployment failures.

Platform teams that want a declarative, reproducible way to ship AI gateway infrastructure into Kubernetes typically reach for Helm first. The official chart lets you deploy Bifrost on Kubernetes with Helm in a matter of minutes, but production installs ask for more than a one-line helm install. The configuration choices that actually hold up under load, the failure modes that show up during the first incident, and the operational habits that keep a Bifrost install healthy across release cycles are what this guide covers end to end. Bifrost itself is the open-source AI gateway from Maxim AI, built for high-throughput inference, governance, and reliability across 20+ LLM providers.

The Case for Running Bifrost on Kubernetes

Self-hosting an AI gateway inside your own Kubernetes cluster gives platform teams complete control over data residency, governance, and scaling characteristics. The Bifrost Helm chart treats the gateway as a first-class Kubernetes workload: it bundles StatefulSet support for SQLite, optional embedded PostgreSQL, vector store integrations for semantic caching, and ingress, autoscaling, and probe configuration out of the box.

Helm is well-suited to this workload for three concrete reasons:

  • Declarative configuration: each values.yaml parameter maps cleanly onto a field in the generated config.json, keeping the chart input and the deployed cluster state in lockstep.
  • Reproducible rollouts: a single values file works for installs, upgrades, and rollbacks across every environment.
  • Native Kubernetes primitives: the chart emits standard Deployments, StatefulSets, Services, Ingress, HPAs, and ServiceAccounts that slot into existing platform tooling without friction.

For teams weighing self-hosted gateways against managed alternatives, the LLM gateway buyer's guide walks through the evaluation criteria that matter most for enterprise rollouts.

What You Need Before the First Helm Install

Before running helm install, make sure the target cluster satisfies the chart's baseline requirements:

  • Kubernetes v1.19 or newer, with kubectl already pointed at the cluster you plan to deploy to.
  • Helm 3.2.0 or later installed on your local machine.
  • A persistent volume provisioner if SQLite will be your storage backend. Postgres-only deployments can skip this.
  • A UTF8-encoded PostgreSQL database if you are running on Postgres. Databases on other encodings will fail at initialization.
  • Kubernetes Secrets that hold the encryption key plus whatever provider API keys you want injected at install time.
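
The Helm version requirement in the list above can be checked before anything is installed. The following is a minimal sketch of such a preflight check; the function names version_ge and check_helm_version are illustrative, and it assumes a sort that supports -V (GNU coreutils does).

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; the function names are illustrative.

# version_ge A B: succeeds when dotted version A is >= version B.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_helm_version: compares the local Helm client with the 3.2.0 minimum.
check_helm_version() {
  local found
  found="$(helm version --short)" || return 1   # e.g. v3.14.0+g3fc9f4b
  found="${found#v}"      # drop the leading "v"
  found="${found%%+*}"    # drop the "+<commit>" build suffix
  if version_ge "$found" "3.2.0"; then
    echo "helm $found OK"
  else
    echo "helm $found is older than 3.2.0" >&2
    return 1
  fi
}
```

Calling check_helm_version at the top of a deploy script stops the run early on an outdated client instead of failing partway through an install.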

Register the chart repository before the first install:

helm repo add bifrost https://maximhq.github.io/bifrost/helm-charts
helm repo update

The complete configuration surface, with every parameter and ready-made example values files, is documented in the Helm values reference.

What Production Bifrost Helm Installs Get Right

The patterns below show up consistently in Bifrost installs that survive their first quarter in production.

Always Pin a Concrete Image Tag

image.tag is required by the chart. Leaving it blank, or setting it to latest, will either break the install outright or pin the cluster to an unnamed version that drifts between rollouts. Specify a concrete version (such as v1.4.11), and treat tag bumps as deliberate, reviewed changes.

helm install bifrost bifrost/bifrost --set image.tag=v1.4.11
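
One way to enforce this in a deploy script is a small guard in front of helm install. This is a sketch only: is_pinned_tag and install_bifrost are hypothetical wrapper names, not part of the chart or of Helm.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; the function names are illustrative.

# is_pinned_tag TAG: succeeds only for a concrete tag (not empty, not "latest").
is_pinned_tag() {
  case "$1" in
    ""|latest) return 1 ;;
    *) return 0 ;;
  esac
}

# install_bifrost TAG: refuses to run helm install without a pinned tag.
install_bifrost() {
  local tag="$1"
  if ! is_pinned_tag "$tag"; then
    echo "refusing to install: image.tag must be a concrete version" >&2
    return 1
  fi
  helm install bifrost bifrost/bifrost --set image.tag="$tag"
}
```

With this in place, install_bifrost v1.4.11 proceeds, while install_bifrost latest exits with an error before Helm is invoked.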

Move Every Secret Out of values.yaml

Storing credentials in plaintext inside values.yaml is one of the most common compliance findings on AI gateway installs. The chart exposes an existingSecret option for every sensitive field, covering the encryption key, the PostgreSQL password, vector store credentials, and the per-provider API keys. Keep these in Kubernetes Secrets, or wire them up to HashiCorp Vault and cloud key management services if you are on Bifrost Enterprise.

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption-key"
    key: "encryption-key"
  providerSecrets:
    openai:
      existingSecret: "provider-keys"
      key: "openai-api-key"
      envVar: "OPENAI_API_KEY"
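
The Secrets referenced by those values have to exist before the install. A minimal sketch of creating them with kubectl follows, using the secret and key names from the values above. The function name create_bifrost_secrets is illustrative, and the example assumes the current kubectl namespace is the one Bifrost will be installed into.

```shell
#!/usr/bin/env bash
# Hypothetical helper; secret and key names mirror the values snippet above.

# create_bifrost_secrets ENCRYPTION_KEY OPENAI_KEY
create_bifrost_secrets() {
  local encryption_key="$1" openai_key="$2"

  # Secret read by bifrost.encryptionKeySecret (key: encryption-key).
  kubectl create secret generic bifrost-encryption-key \
    --from-literal=encryption-key="$encryption_key" || return 1

  # Secret read by bifrost.providerSecrets.openai (key: openai-api-key).
  kubectl create secret generic provider-keys \
    --from-literal=openai-api-key="$openai_key"
}
```

Values passed with --from-literal can end up in shell history, so in practice the arguments would usually come from a secret manager or CI variable rather than being typed inline.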

Choose PostgreSQL Over SQLite for Production

SQLite works well for local development and single-node demos, but it cannot back a multi-replica deployment because the chart's PVC is ReadWriteOnce by default. Production setups should run on the embedded PostgreSQL bundled with the chart, an external managed Postgres (RDS, Cloud SQL, Azure Database), or a high-availability Postgres operator. Mixed-backend installs that store config in Postgres and logs in SQLite are also supported through the mixed-backend.yaml example.

Tune Autoscaling for Long-Lived Streams

Chat completion endpoints in Bifrost rely on long-lived SSE streams. Default Kubernetes HorizontalPodAutoscaler settings will happily cut active streams during scale-down, so raise scaleDown.stabilizationWindowSeconds and keep a preStop sleep in place so the load balancer drains traffic before pods exit.

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

terminationGracePeriodSeconds: 60
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 15"]

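With the values above, the 15-second preStop sleep counts against the 60-second grace period, which leaves roughly 45 seconds for in-flight responses. When workloads include streams that run longer than 45 seconds, raise terminationGracePeriodSeconds further so pods can complete in-flight responses without truncation.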
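
A sketch of the arithmetic behind that threshold, assuming the preStop sleep runs inside the termination grace period (which is how Kubernetes handles pod termination). The function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing the grace period.

# drain_budget GRACE PRESTOP: seconds left for in-flight streams once the
# preStop sleep has finished (the sleep counts against the grace period).
drain_budget() {
  echo $(( $1 - $2 ))
}

# required_grace LONGEST_STREAM PRESTOP: smallest terminationGracePeriodSeconds
# that still lets a stream of LONGEST_STREAM seconds finish after the sleep.
required_grace() {
  echo $(( $1 + $2 ))
}

drain_budget 60 15      # prints 45
required_grace 120 15   # prints 135
```

In other words, a workload whose longest stream runs for two minutes needs a grace period of at least 135 seconds when the preStop sleep stays at 15 seconds.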

Distribute Replicas Across Nodes

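In an HA install, replicas should land on different nodes by way of pod anti-affinity. This guards against single-node failures and spreads load across the cluster. The required rule below is strict: a replica with no eligible node stays Pending, so the cluster needs at least as many schedulable nodes as replicas, or the rule should be relaxed to preferredDuringSchedulingIgnoredDuringExecution.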

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: bifrost
        topologyKey: kubernetes.io/hostname
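
After a rollout, the spread can be confirmed from the command line. The sketch below lists any node that hosts more than one Bifrost pod; it assumes the pods carry the app.kubernetes.io/name=bifrost label used in the selector above, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical check; prints nothing when every replica has its own node.

nodes_with_multiple_replicas() {
  # One node name per pod, then keep only names that appear more than once.
  kubectl get pods -l app.kubernetes.io/name=bifrost \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | sort | uniq -d
}
```

Empty output means the anti-affinity rule is doing its job; any node name printed is hosting two or more replicas.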

Configure Plugins Through values.yaml, Not the UI

Bifrost's built-in plugins (telemetry, logging, governance, semantic cache, OTel, Datadog) are all configurable directly through Helm values. Keeping plugin config in values.yaml keeps every environment reproducible. For installs backed by a database, bump the plugin version field whenever you want Helm-supplied config to overwrite an older record stored in the DB.

bifrost:
  plugins:
    telemetry:
      enabled: true
      version: 1
    logging:
      enabled: true
      version: 1
    governance:
      enabled: true
      version: 1

Pair plugin configuration with Bifrost's virtual key governance for per-team budgets, rate limits, and access control. For the full enterprise governance picture, the Bifrost governance overview covers the model end to end.

Wire Observability in From the Start

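Bifrost emits native Prometheus metrics at /metrics and ships with OpenTelemetry tracing support. Enable the chart's ServiceMonitor (a Prometheus Operator resource, so the operator's CRDs must already be installed in the cluster) to hook those metrics into an existing Prometheus stack: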

serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
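
ServiceMonitor is a custom resource defined by the Prometheus Operator, so enabling it on a cluster without that CRD makes the install fail. A minimal sketch of a pre-check follows; the function name has_servicemonitor_crd is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for the Prometheus Operator's ServiceMonitor CRD.

has_servicemonitor_crd() {
  kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}

# report_servicemonitor_support: prints which way to set serviceMonitor.enabled.
report_servicemonitor_support() {
  if has_servicemonitor_crd; then
    echo "ServiceMonitor CRD found: serviceMonitor.enabled=true is safe"
  else
    echo "ServiceMonitor CRD missing: install the Prometheus Operator first"
  fi
}
```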

The published Bifrost performance benchmarks record 11 microseconds of per-request overhead at 5,000 RPS, but real production performance depends on cluster topology and provider latency. Without observability wired in, there is no way to confirm the gateway is actually performing the way the benchmarks suggest.

Frequent Failure Modes in Bifrost Helm Installs

The failures below tend to surface either at the first production install or during a routine upgrade.

Registry Authentication and Missing Image Tags

ErrImagePull and ImagePullBackOff typically appear when image.tag is unset, when the repository points to a private enterprise registry without a valid pull secret, or when an ECR token has aged out (ECR credentials are valid for only 12 hours). Use a credential helper or an operator to refresh ECR tokens automatically rather than rotating them manually.
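
For teams that script the refresh themselves, the sketch below regenerates a registry pull secret from a fresh ECR token. The secret name, registry host, and region are caller-supplied placeholders, the function name is illustrative, and how the pull secret is referenced in the chart values is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refresh_ecr_pull_secret SECRET_NAME REGISTRY REGION

refresh_ecr_pull_secret() {
  local name="$1" registry="$2" region="$3" password

  # Fetch a fresh token from ECR.
  password="$(aws ecr get-login-password --region "$region")" || return 1

  # Render the secret client-side and apply it, so the same command both
  # creates the secret on first run and updates it on later runs.
  kubectl create secret docker-registry "$name" \
    --docker-server="$registry" \
    --docker-username=AWS \
    --docker-password="$password" \
    --dry-run=client -o yaml \
    | kubectl apply -f -
}
```

Run on a schedule shorter than the 12-hour token lifetime (a CronJob every few hours, for example), this keeps the secret valid without manual rotation.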

Multiple Replicas with SQLite Storage

Bumping replicaCount to 3 while leaving storage.mode: sqlite will cause PVC binding to fail, because the SQLite PVC in the chart is ReadWriteOnce. Either switch to PostgreSQL, or hold replicaCount: 1 for SQLite-backed installs. HA in production requires Postgres.

Storage Classes That Do Not Exist

A PVC stuck in Pending after install almost always means the cluster has no default storage class, or that the storageClass you set does not exist. Confirm with kubectl get storageclass and pin to a class that is actually present.

helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.storageClass=standard
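
To find out whether the cluster has a default class at all, the output of kubectl get storageclass can be filtered for the "(default)" marker that kubectl prints next to the class name. A minimal sketch, with an illustrative function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the name of the default StorageClass, if any.

default_storage_class() {
  # kubectl renders the default class as "<name> (default)" in the NAME
  # column, so the marker shows up as the second whitespace-separated field.
  kubectl get storageclass --no-headers 2>/dev/null \
    | awk '$2 == "(default)" { print $1 }'
}
```

Empty output means no default class is set, in which case storage.persistence.storageClass has to name an existing class explicitly.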

Managed Postgres and the SSL Requirement

Most managed Postgres services (RDS, Cloud SQL, Azure Database) require SSL by default. Leaving sslMode: disable produces connection failures right away. Set sslMode: require in the external Postgres block and store the credential in a Kubernetes Secret rather than checking it into values.
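
Both database requirements (SSL and UTF8 encoding) can be checked from a workstation or a debug pod before Bifrost is pointed at the instance. The sketch below assumes the psql client is installed; the function name and the connection URI argument are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: check_postgres CONNECTION_URI

check_postgres() {
  local uri="$1" encoding

  # PGSSLMODE=require makes libpq refuse a non-SSL connection (unless the
  # URI itself overrides sslmode); -At prints the bare value only.
  encoding="$(PGSSLMODE=require psql "$uri" -At -c 'SHOW server_encoding')" \
    || return 1

  if [ "$encoding" != "UTF8" ]; then
    echo "database encoding is $encoding, expected UTF8" >&2
    return 1
  fi
  echo "postgres reachable over SSL, encoding UTF8"
}
```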

Mismatched Secret Key Names for the Encryption Key

The bifrost.encryptionKeySecret.key parameter defaults to encryption-key, but many teams create the secret using the key name key (via --from-literal=key=...). Either align the secret's key name with the default, or override the parameter so it matches whatever key name the secret actually uses.
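
A quick way to catch the mismatch is to list the key names actually stored in the Secret and compare them with what the chart expects. A minimal sketch, with an illustrative function name:

```shell
#!/usr/bin/env bash
# Hypothetical check: secret_has_key SECRET_NAME KEY_NAME

secret_has_key() {
  local secret="$1" key="$2"
  # Print one data key per line, then look for an exact match.
  kubectl get secret "$secret" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
    | grep -qxF -- "$key"
}
```

For example, secret_has_key bifrost-encryption-key encryption-key fails for a Secret that was created with --from-literal=key=..., which signals that either the Secret or bifrost.encryptionKeySecret.key needs to change.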

Active SSE Streams Killed by Scale-Down

If scale-down events end up cutting streaming chat completions mid-response, push terminationGracePeriodSeconds to at least 120 seconds, extend the preStop sleep, and raise stabilizationWindowSeconds so scale-down only happens once traffic genuinely tapers. The Bifrost troubleshooting guide documents the precise configuration knobs.

Accidental Data Loss During Cleanup

By design, helm uninstall leaves PVCs in place, but a hasty kubectl delete pvc will permanently destroy the gateway's data. Before any destructive operation, snapshot the PVC, and use storage.persistence.existingClaim to re-attach the data on reinstall.

Day-Two Operations: Upgrades, Rollbacks, and Scaling

Day-two operations stay predictable on Helm as long as the underlying values file lives in version control.

  • Upgrades: run helm upgrade bifrost bifrost/bifrost -f your-values.yaml. To change one field without touching the rest, use helm upgrade bifrost bifrost/bifrost --reuse-values --set image.tag=v1.4.12.
  • Rollbacks: helm history bifrost lists the revision history, and helm rollback bifrost 2 reverts to a known-good revision.
  • Scaling: lean on HPA-driven scaling instead of running kubectl scale by hand. For sudden bursts, raise minReplicas rather than waiting on HPA reaction lag.
  • Verification: after any change, kubectl get pods -l app.kubernetes.io/name=bifrost and curl /health against a port-forwarded pod confirm the gateway is serving traffic.
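
The verification step can be wrapped in a small function so it runs the same way after every upgrade or rollback. This is a sketch: the function name is illustrative, and the base URL is whatever local address the port-forward exposes (the service name and port in the usage comment are assumptions, not values taken from the chart).

```shell
#!/usr/bin/env bash
# Hypothetical post-change check: verify_bifrost BASE_URL
#
# Example usage, assuming a service named "bifrost" listening on port 8080:
#   kubectl port-forward svc/bifrost 8080:8080 &
#   verify_bifrost http://localhost:8080

verify_bifrost() {
  local base_url="$1"

  # Block until every Bifrost pod reports Ready, or give up after 2 minutes.
  kubectl wait --for=condition=Ready pod \
    -l app.kubernetes.io/name=bifrost --timeout=120s || return 1

  # -f turns HTTP errors into a non-zero exit status.
  curl -fsS "$base_url/health" >/dev/null || return 1
  echo "bifrost healthy"
}
```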

For multi-replica installs that need cluster mode and gossip-based peer discovery, the chart provides the headless service and the service account permissions required for Kubernetes-based discovery.

Ready-Made Values Files for Common Scenarios

The chart ships example values files for the most common patterns under helm-charts/bifrost/values-examples/:

  • sqlite-only.yaml: a stripped-down dev or local-only install.
  • external-postgres.yaml: points Bifrost at a managed Postgres database already running in your environment.
  • production-ha.yaml: 3 replicas backed by embedded Postgres, with Weaviate as the vector store, HPA enabled, and ingress configured.
  • secrets-from-k8s.yaml: pulls every credential and key from Kubernetes Secrets.
  • providers-and-virtual-keys.yaml: configurations covering every supported provider alongside virtual key examples.

These files install directly from a raw GitHub URL, which makes them an easy starting point to override per environment:

helm install bifrost bifrost/bifrost \
  -f https://raw.githubusercontent.com/maximhq/bifrost/main/helm-charts/bifrost/values-examples/production-ha.yaml \
  --set image.tag=v1.4.11

Teams running MCP gateway workloads on Kubernetes can use the same values file to add MCP server connections, tool filters, and OAuth-based federated auth, keeping the gateway and its tool catalog under unified config management.

Next Steps for Your Bifrost Kubernetes Rollout

A production-grade way to deploy Bifrost on Kubernetes with Helm goes well beyond the quickstart: pinned image tags, externalized secrets, a real database, stream-aware autoscaling, pod anti-affinity, and observability wired in from day one. The Helm chart supports each of these patterns directly through values.yaml, and the example files in the repo cover most production scenarios as ready-made starting points.

To see how Bifrost can simplify AI gateway infrastructure on your own clusters, book a demo with the Bifrost team, or browse the Bifrost GitHub repository to get started with the open-source release.
