Kuldeep Paul

Posted on Jun 11

Connect Claude Code to Groq With Bifrost

Bifrost is an open-source LLM gateway that redirects Claude Code to any inference provider, including Groq, without modifying Claude Code itself. This walkthrough covers the entire configuration.

By default, Claude Code connects exclusively to Anthropic's API and runs only Claude models. For individual developers, this constraint is fine. Teams seeking fast inference on open-source models for cost-sensitive applications, or those wanting to direct specific workloads to specialized hardware, find the default limiting. Bifrost, the open-source AI gateway written in Go by Maxim AI, stands between Claude Code and any LLM provider, translating at the protocol level. By configuring Claude Code to use Bifrost's Anthropic-compatible endpoint, you can direct all inference to Groq or any other supported provider without modifying Claude Code's binary or internals.

The Case for Groq

Groq's inference service relies on Language Processing Units (LPUs), custom silicon optimized for neural network computation. This LPU-based approach guarantees consistent latency: each token generation takes the same time, removing the variance present on traditional GPU platforms. Independent performance tests show Groq delivering 4-7x greater token throughput than leading GPU systems, with initial-token latency 3-4x lower at equivalent scale.

For Claude Code users, this speed advantage applies in specific contexts:

Experimental and exploratory work using openly-licensed models (Llama 4 Scout, llama-3.3-70b-versatile, DeepSeek R1 Distill) for faster prototyping at lower per-token cost than paid frontier models
Chained multi-step reasoning where each tool invocation requires a new LLM call; per-call speed improvements compound across the chain
Selective model usage where routine tasks run on quick, low-cost open models, while complex reasoning stays on Anthropic's paid tiers

An important limitation: Groq's API lacks support for image input, vector embeddings, voice synthesis, or transcription. It processes text-based chat and function calling exclusively. Features like Claude's computer_use tool are unsupported. Deploy Groq for text and function-call work; keep Anthropic available for multimodal or proprietary-feature requirements.

Bifrost's Role in Connecting Claude Code to Groq

Bifrost presents an Anthropic-compatible interface at the /anthropic path. When Claude Code points here instead of Anthropic's servers, Bifrost acts as the intermediary. It accepts Claude Code's request in Anthropic format, routes it to whichever provider you've configured, and returns the response in the shape Claude Code expects.

Since Groq implements OpenAI's API, Bifrost maps through its OpenAI layer with Groq-specific tuning: fields like store, service_tier, and prompt_cache_key are ignored, and server-sent events use Groq's schema. Full support for function calling is included. The Bifrost provider configuration layer and the request routing layer are orthogonal; you configure Groq independently from configuring which Claude Code requests hit Groq.

Step 1: Launch Bifrost

The simplest path is via npx with Node.js 18+.

npx -y @maximhq/bifrost -app-dir ./bifrost-data

The gateway starts at http://localhost:8080 and initializes a bifrost-data directory for configs and data. Visit http://localhost:8080 in a browser to reach the control panel.

Docker is also an option:

docker run -p 8080:8080 maximhq/bifrost:latest

Step 2: Add Groq as a Provider

The Bifrost control panel and config.json both work for provider setup. In the dashboard, go to Models > Model Providers, locate Groq, add it if missing, and enter your Groq API key (raw or as env.GROQ_API_KEY). Under Allowed Models, choose All Models or restrict to a list.

For config.json in your bifrost-data folder:

{
  "providers": {
    "groq": {
      "keys": [
        {
          "name": "groq-key-1",
          "value": "env.GROQ_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ]
    }
  }
}

Before starting Bifrost, export your Groq key:

export GROQ_API_KEY="your-groq-api-key"

Verify activation in the dashboard: Models > Model Providers > Groq will list the available Groq models.

Step 3: Generate a Virtual Key

Virtual keys let Claude Code authenticate to Bifrost without exposing your real provider keys. Each key can carry routing policies, spending caps, and rate-limit rules.

In the dashboard, go to Governance > Virtual Keys, create a new entry, name it, select Groq as an allowed provider, and copy the generated value. Pass this to Claude Code in the configuration step.

Step 4: Connect Claude Code to Bifrost

Now that Bifrost is running with Groq configured, point Claude Code at Bifrost. Two options: directly edit settings.json, or run Bifrost CLI, a guided terminal tool that sets up Claude Code without requiring you to manage environment variables or files.

Option A: Direct settings.json Edit

Claude Code loads environment settings from settings.json. Location varies by OS and scope:

macOS / Linux / WSL system-wide: ~/.claude/settings.json
Windows system-wide: %USERPROFILE%\.claude\settings.json
Project-level: .claude/settings.json in the project folder

Merge the following into your settings.json's env block (shown here as the key alone; combine with your other settings):

"env": {
  "ANTHROPIC_BASE_URL": "http://localhost:8080/anthropic",
  "ANTHROPIC_AUTH_TOKEN": "your-bifrost-virtual-key",
  "ANTHROPIC_DEFAULT_HAIKU_MODEL": "groq/llama-3.3-70b-versatile",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "groq/llama-3.3-70b-versatile"
}

ANTHROPIC_BASE_URL sends Claude Code's calls to Bifrost's local Anthropic layer. ANTHROPIC_AUTH_TOKEN provides the virtual key in the Authorization: Bearer header for Bifrost's auth and routing. Model names prefixed with groq/ pin Haiku and Sonnet slots to Groq.

If Claude Code is active, exit and relaunch. Prevent caching issues by running /logout first.

Option B: Bifrost CLI (no Manual Configuration)

Bifrost CLI is a guided terminal launcher. It connects Claude Code to an active Bifrost instance interactively, handling base URL setup, key storage, model selection, and MCP server hookup—all without editing files or setting env variables.

From a second terminal (gateway already running):

npx -y @maximhq/bifrost-cli

The interactive flow asks five questions:

Bifrost gateway location (typically http://localhost:8080)
Your virtual key (optional; press Enter to skip)
Pick Claude Code as your agent
Filter by groq/ and select your model (e.g., groq/llama-3.3-70b-versatile)
Confirm and launch

The CLI writes environment variables on the fly, auto-registers Bifrost's MCP service for your tools, and saves settings to ~/.bifrost/config.json between runs. Keys go to the system keyring—never saved as text.

On model selection: Claude Code depends on function calling for file operations, terminal execution, and code changes. Verify your chosen Groq model supports tool use. The GroqCloud documentation lists each model's capabilities.

Step 5: Test the Setup

Once Claude Code is pointed at Bifrost (via either method), run:

claude --model groq/llama-3.3-70b-versatile

Inside the session, type /model to confirm the active model. Check Bifrost's dashboard Logs tab to see the request routed to Groq; a 200 status indicates success.

To change models mid-session, use:

/model groq/llama-3.3-70b-versatile

Automatic Failover and Routing Rules

Bifrost opens up failover on demand: if Groq returns errors or hits rate limits, Bifrost can automatically reroute to Anthropic or another provider without Claude Code knowing. Build a fallback chain in the dashboard (Features > Fallbacks) or in config.json:

{
  "fallbacks": [
    {
      "from": "groq/llama-3.3-70b-versatile",
      "to": ["anthropic/claude-haiku-4-5"]
    }
  ]
}

Routing rules give you condition-based switching too. Example: send Claude Code requests (identifiable by claude-cli user-agent) to Groq by default, but route others to Anthropic. This creates per-client model behavior on a single instance.

Governance and Observability Built In

Routing Claude Code via Bifrost unlocks the full governance stack at no cost. Per-key spending caps and request limits prevent unexpected bills. Logs capture every request's provider, model, token count, and timing. For teams with many Claude Code users, the LLM Gateway Buyer's Guide details how to layer virtual keys by team, project, or owner.

Observability is ready immediately via the Bifrost dashboard. Prometheus metrics live at /metrics for your monitoring systems. OTLP export is available for Datadog, Grafana, or any standards-compatible backend.

For teams needing to centralize MCP tool management alongside model routing, Bifrost can also work as an MCP gateway: link upstream MCP servers, expose them through /mcp, and register them once with Claude Code via a single command. This cleanly separates inference routing from tool control.

Wrapping Up

Getting Claude Code to Groq through Bifrost boils down to three shared steps (start gateway, enable Groq, make a virtual key) and then connecting Claude Code, either via direct settings.json editing or Bifrost CLI for a guided, no-variables approach. From there, Claude Code defaults to Groq, with failover, governance, and dashboards ready to go.

The same flow works for any Bifrost-supported provider: flip the model name to anthropic/, openai/, bedrock/, vertex/, or any of 20+ platforms, and Claude Code adapts instantly.

For multi-team scaling with load balancing, clustering, vault integration, OIDC, or private-cloud deployment, schedule a demo with the Bifrost team.

DEV Community