Self-Hosted AI Gateway for Cursor: Routing Claude, Ollama, and 20+ Providers Through Bifrost

Power Cursor with Claude, Ollama, and 20+ providers behind a self-hosted AI gateway. Bifrost handles custom models, virtual keys, and unified routing.

For engineering teams that have made Cursor their primary AI IDE, the default model picker quickly starts to feel constraining. Agent-mode cost spikes, single-provider lock-in, zero visibility into per-developer token spend, and the inability to use locally hosted models for sensitive code show up over and over in the complaint pile. A self-hosted AI gateway for Cursor consolidates the fix into one layer. Teams can point Cursor at a single internal endpoint, then route every Chat, Agent, Inline Edit, and Tab Completion call to Claude, GPT-5, Gemini, or a local Ollama model, with governance, observability, and failover all enforced at the gateway. Bifrost is the open-source AI gateway that makes this setup achievable in minutes.

The Case for a Self-Hosted Gateway in Front of Cursor

Out of the box, Cursor relies on a hosted backend and a curated list of models. Enterprise and security-conscious teams almost always need more control over the data path. A self-hosted gateway tackles several recurring problems head-on:

Cost opacity: Until the monthly statement lands, individual developer spend against Anthropic, OpenAI, or Google bills stays effectively invisible without a gateway.
Provider lock-in: Cursor's hosted backend abstracts over which provider actually serves each request, which makes it hard to enforce policies like "Claude only for repos that touch payment code" or "local Ollama for proprietary algorithms."
Data residency: Some workloads cannot leave the VPC. Pointing Cursor at a self-hosted gateway with an Ollama backend keeps every inference local.
Audit and compliance: SOC 2, HIPAA, and GDPR audits require request-level logs, and the gateway is the natural place to produce them centrally.
Failover during outages: A single provider outage will stall Cursor's default backend completely. With automatic fallbacks in place, the gateway silently swaps in a backup model and developers stay productive.

Bifrost sits between Cursor and every supported provider as a self-hosted gateway, exposing all of these controls through a single dashboard.

Self-Hosted AI Gateways, Defined

A self-hosted AI gateway is a proxy you operate on your own infrastructure (a laptop, a VPC, or a Kubernetes cluster) that presents a unified, OpenAI-compatible API to clients like Cursor while dispatching requests to one or more LLM providers in the background. Authentication, model routing, rate limits, caching, and observability all live in one place, so every AI-powered client across the organization talks to a single endpoint.

For Cursor specifically, the gateway catches requests that would otherwise hit api.openai.com, translates them to whichever provider you have chosen, and hands back an OpenAI-shaped response. Cursor itself never sees the difference.

Bifrost's Connection Layer: Cursor to Claude, Ollama, and Beyond

Bifrost is a high-performance, open-source AI gateway from Maxim AI that surfaces a single OpenAI-compatible HTTP API and forwards requests to 20+ providers. Since Cursor exposes a global override for the OpenAI base URL, Bifrost drops in without any modification to Cursor itself.

Through a provider/model-name format that Cursor recognizes natively, Bifrost supports the following providers:

Anthropic: anthropic/claude-sonnet-4-5-20250929, anthropic/claude-opus-4-5
OpenAI: openai/gpt-5, openai/gpt-4.1
Google Gemini: gemini/gemini-2.5-pro, gemini/gemini-2.5-flash
AWS Bedrock: bedrock/anthropic.claude-3-5-sonnet
Ollama (local): ollama/llama-3.3-70b, ollama/qwen2.5-coder
Groq, Mistral, Cohere, xAI, Cerebras, Perplexity, Azure OpenAI, OpenRouter, vLLM, Hugging Face, and more

At 5,000 RPS in sustained performance benchmarks, Bifrost adds just 11 microseconds of overhead per request, which makes the gateway effectively invisible to Cursor users even under heavy agent workloads.

Standing Up Bifrost as Your Self-Hosted Cursor Gateway

A local installation runs end-to-end in under ten minutes; deploying behind a public hostname for a team takes a bit longer. The flow has four phases: launch Bifrost, configure the providers you need, expose Bifrost to Cursor, and point Cursor at the gateway.

Step 1: Launch Bifrost on a laptop or inside your VPC

Three distribution formats are available: a single binary, an NPX package, and a Docker image. The fastest local start uses Docker:

docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

The gateway then runs at http://localhost:8080 against a persistent data volume. You can also start it via npx @maximhq/bifrost, or deploy to Kubernetes for team-wide access. Because Bifrost acts as a drop-in replacement for the OpenAI SDK, no client code changes are needed anywhere downstream.

Step 2: Register Claude and Ollama as providers

Open the Bifrost dashboard at http://localhost:8080, navigate to Providers, and add:

Anthropic: Paste in your Anthropic API key. Any anthropic/... model will then be routed through this provider.
Ollama: Point the base URL at http://localhost:11434, or wherever your Ollama instance is listening. Local Ollama does not require an API key.

Additional providers follow the same workflow. Bifrost's provider routing rules support weights, fallbacks, and per-model overrides, so every anthropic/claude-sonnet-4-5-20250929 request can, for example, fall back to openai/gpt-5 whenever Anthropic is degraded.

Step 3: Make Bifrost reachable from Cursor

Cursor's base URL override requires a publicly accessible URL, because the desktop client routes API traffic through Cursor's own servers when paired with hosted accounts. Three approaches commonly work for exposing a self-hosted Bifrost instance:

Cloudflare Tunnel or ngrok for individual developers and pilots
Internal load balancer behind a private DNS entry like https://bifrost.internal.company.com, suitable for team deployments
Kubernetes ingress for production-grade installations (see the k8s deployment guide)

Whichever route you pick, the endpoint should accept HTTPS traffic and forward to Bifrost's port 8080.

Step 4: Point Cursor at your self-hosted gateway

Inside Cursor, press Cmd+, (macOS) or Ctrl+, (Windows/Linux), open the Models section, and configure the following:

Paste either a Bifrost virtual key or a raw provider API key into the OpenAI API Key field.
Switch Override OpenAI Base URL to ON and supply your Bifrost endpoint (for instance, https://bifrost.example.com/cursor).
Under Add or search model, enter each model you want available, using the provider/model-name format: anthropic/claude-sonnet-4-5-20250929, openai/gpt-5, ollama/qwen2.5-coder, gemini/gemini-2.5-pro.

Each of Chat, Agent, Inline Edit, and Tab Completion gets its own model assignment inside Cursor. That opens up per-feature provider mixing: Claude for Agent mode, a fast Groq-hosted Llama for Tab Completion, and a local Ollama model for anything touching sensitive code. For Agent mode and Inline Edit to work correctly, non-native models must support tool use. Bifrost's CLI agents resource page covers broader patterns across all supported coding agents.

Screenshots and the full configuration walkthrough are available in the Cursor integration guide.

Cost Control and Governance for Cursor Teams

Running Cursor behind a self-hosted gateway delivers one big operational win: centralized governance. The primary governance entity inside Bifrost is the virtual key, and each developer or team receives one that maps to specific permissions in place of a raw provider API key.

With virtual keys, platform teams can enforce:

Per-developer budgets: A hard cap on monthly Cursor spend per engineer.
Rate limits: Protection against runaway agent loops that would otherwise burn through a whole team's quota.
Model allowlists: Cursor's Agent mode restricted to specific models for cost reasons, or Claude Opus reserved for sensitive workloads only.
Provider scoping: Only Ollama models exposed to engineers working on regulated repositories.

Bifrost's governance feature set handles hierarchical budgets at the virtual key, team, and customer level, so an organization can manage Cursor spend with the same granularity it applies to AWS cloud spend.

Observing Cursor Traffic Through Bifrost

Each Cursor request that passes through Bifrost gets logged with the prompt, response, latency, token counts, and provider in use. At http://localhost:8080/logs, the Bifrost dashboard supports filtering by provider, model, or conversation content, and the observability stack emits native Prometheus metrics and OpenTelemetry traces.

Cursor teams typically use this layer for:

Pinpointing which Cursor features (Tab vs Agent vs Chat) generate the most cost
Tracking adoption per team to inform license and quota decisions
Side-by-side comparison of real-world latency for Claude versus Gemini on Inline Edit operations
Spotting prompt injection attempts inside agent runs through log analysis

Teams that want active filtering on top of detection can pair this with Bifrost's guardrails layer for input and output policy enforcement.

Datadog, Grafana, New Relic, and Honeycomb are all supported integrations, so Cursor telemetry lands in the same dashboards platform teams already use for backend services. Teams running Maxim AI's evaluation platform can additionally pipe Bifrost traces into Maxim for production-grade prompt and agent evaluation.

Keeping Cursor Reliable with Automatic Failover

When the active provider goes down, a Cursor session stalls outright. Bifrost's automatic fallbacks make it possible to define a fallback chain (Anthropic first, then OpenAI, then Ollama as a last resort, for example), and Bifrost transparently retries failed requests against the next provider in line. From the developer's seat inside Cursor, the response just keeps flowing.

For Agent mode runs that may stretch over several minutes, this kind of failover is especially valuable. A single 503 from Anthropic would otherwise force a full restart, but the fallback returns a usable result and keeps the agent's context intact.

Get Started with a Self-Hosted Gateway for Cursor

A self-hosted AI gateway for Cursor converts the IDE from a single-provider tool into a fully governed, multi-model platform that any engineering organization can scale. Behind one endpoint, Bifrost opens up Claude, Ollama, OpenAI, Gemini, and 16+ additional providers, along with virtual keys for governance, semantic caching for cost reduction, automatic failover for reliability, and built-in observability for compliance.

To see how Bifrost can serve as the self-hosted gateway behind your team's Cursor deployment, book a demo with the Bifrost team or browse the open-source repo on GitHub. The LLM gateway buyer's guide is also available for teams formally comparing gateway options.