Kuldeep Paul

Posted on May 24

One AI Gateway for AWS Bedrock, Google Vertex AI, Gemini, and Anthropic

#ai #api #aws #llm

Bifrost routes AWS Bedrock and Google Vertex AI alongside Gemini and Anthropic through one OpenAI-compatible API with shared auth, failover, and governance.

It is rare for an enterprise AI team to operate just one model from one provider. The production stack at most companies tends to look something like this: Claude on AWS Bedrock for one class of workload, Gemini on Google Vertex AI for another, the native Anthropic API for features such as prompt caching, and the direct Google Gemini API powering low-latency consumer paths. Every provider speaks its own protocol, requires its own authentication scheme, and ships its own SDK. Bifrost collapses all of this into a single OpenAI-compatible endpoint that fronts AWS Bedrock and Google Vertex AI alongside Google Gemini and Anthropic, with failover, load balancing, and governance built into the gateway itself.

Bifrost, the open-source AI gateway built by Maxim AI, runs at 5,000 requests per second with only 11 microseconds of added overhead per request, and connects to 20+ LLM providers through one API. This guide explains why teams put Bedrock, Vertex, Gemini, and Anthropic behind Bifrost, walks through the configuration for each provider, and shows how Bifrost's routing layer keeps multi-provider workloads reliable. The full overhead profile is captured in Bifrost's published benchmarks.

Why Claude and Gemini end up spanning multiple clouds

Anthropic ships Claude through AWS Bedrock, through Google Vertex AI, and through its own native API. Google offers Gemini on both Vertex AI and the direct Gemini API. Enterprises wind up on more than one of these surfaces for three recurring reasons, all tied to procurement, latency, and capability:

AWS Bedrock fits cleanly into teams that already hold AWS contracts, govern access through AWS Organizations, and have data residency mappings tied to AWS regions.
Google Vertex AI tends to win at organizations already running on Google Cloud, or at teams that want one control plane spanning Gemini, Claude, and third-party models together.
The native Anthropic API surfaces features such as prompt caching and the latest beta headers that may not reach Bedrock and Vertex for weeks or months after launch.
The Gemini API gives the shortest direct route to Gemini models and ships with a generous free tier that is helpful during prototyping.

Once a workload graduates from prototype to production volume, teams almost always rely on more than one of these surfaces. The real pain shows up when each one is operated through its native SDK.

What it costs to manage four providers in isolation

Without a gateway in the middle, every provider drags in its own dependencies and code paths:

SDKs that do not align: boto3 for Bedrock, the Google Cloud SDK for Vertex, google-genai for Gemini, and the Anthropic SDK for direct API access.
Different authentication models: IAM credentials and SigV4 signing for Bedrock, OAuth2 service accounts for Vertex, an API key for Gemini, and a bearer token for Anthropic.
Distinct request shapes: Bedrock's Converse API does not match Anthropic's Messages API, and neither matches Vertex's generateContent endpoint.
No common failover story: if Bedrock's Claude endpoint hits its rate limit, your code has to know how to roll over to Anthropic's direct API as the backup.
Fragmented usage data: each provider reports cost and consumption separately, which complicates cost allocation across teams or end customers.

Reliable production AI systems that fan out across OpenAI, Anthropic, Google Vertex AI, and AWS Bedrock cannot be sustained on direct API calls and hand-rolled retry logic. Solving exactly this problem is what Bifrost is built for.

Bifrost's approach to unifying Bedrock, Vertex, Gemini, and Anthropic

Bifrost positions itself between the application layer and these four providers, surfacing one OpenAI-compatible endpoint. Application code calls Bifrost, and Bifrost takes care of protocol translation, authentication, and routing into the upstream provider. This is the drop-in replacement model: switch the base URL in your existing OpenAI, Anthropic, Bedrock, or Google SDK, and the rest of your code keeps working.

What you get out of the swap:

One endpoint covering all four providers plus 16+ additional ones.
A single configuration surface for keys, regions, projects, and IAM roles.
One OpenAI server-sent-event stream format, regardless of which provider answered the call.
Built-in routing rules that target requests by model name, by virtual key, or by weight.
Shared observability, governance, and guardrails across every upstream provider.

Provider targeting uses the provider/model syntax. bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 reaches Claude on Bedrock. vertex/gemini-2.5-flash reaches Gemini on Vertex. gemini/gemini-2.5-pro calls the direct Gemini API. anthropic/claude-sonnet-4-20250514 hits the native Anthropic API.

Setting up each provider inside Bifrost

Providers can be configured through the Bifrost web UI, the API, a config.json file, or the Go SDK. The snippets below illustrate the configuration shape; the full surface is documented in the docs.

AWS Bedrock

The AWS Bedrock provider inside Bifrost accepts static IAM credentials, IRSA on EKS, EC2 instance profiles, and AWS_ACCESS_KEY_ID style environment variables. It also covers assumed IAM roles with an external ID and session name, which matches the standard pattern used for cross-account Bedrock access.

{
  "providers": {
    "bedrock": {
      "keys": [{
        "models": ["*"],
        "weight": 1.0,
        "aliases": {
          "claude-3-5-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
        },
        "bedrock_key_config": {
          "region": "us-east-1",
          "role_arn": "env.AWS_ROLE_ARN",
          "external_id": "env.AWS_EXTERNAL_ID"
        }
      }]
    }
  }
}

Leaving the access key and secret key empty tells Bifrost to fall back to the AWS default credential chain, which walks through IRSA, ECS task roles, EC2 instance profiles, environment variables, and shared credential files in order.

Google Vertex AI

Bifrost's Google Vertex AI provider reaches Gemini, Claude, and third-party models through Google Cloud. The model family (Gemini versus Anthropic) is detected automatically and the correct request conversion is applied. Three authentication paths are supported on Vertex: service account JSON, Application Default Credentials (the recommended path for GKE Workload Identity), and an API key for Gemini-only use cases.

{
  "providers": {
    "vertex": {
      "keys": [{
        "models": ["*"],
        "weight": 1.0,
        "vertex_key_config": {
          "project_id": "env.VERTEX_PROJECT_ID",
          "region": "us-central1",
          "auth_credentials": "env.VERTEX_CREDENTIALS"
        }
      }]
    }
  }
}

OAuth2 token caching and refresh happen inside Bifrost automatically. For Claude on Vertex, the anthropic_version header is set to vertex-2023-10-16 and any unsupported beta headers are stripped out of the request before it is forwarded.

Google Gemini

The Gemini provider authenticates with a simple API key from Google AI Studio. Reach for this path when the project, region, and IAM machinery of Vertex is more than the workload requires.

{
  "providers": {
    "gemini": {
      "keys": [{
        "value": "env.GEMINI_API_KEY",
        "models": ["gemini-2.5-flash", "gemini-2.5-pro"],
        "weight": 1.0
      }]
    }
  }
}

Gemini's native streaming format is converted by Bifrost into the standard OpenAI server-sent-event shape your client already expects, so the same request body that runs against bedrock/... also runs against gemini/... with zero client changes.

Anthropic

The Anthropic provider calls Anthropic's native API directly. Use this surface when the workload needs prompt caching, beta headers, or any Claude feature that has not yet propagated out to Bedrock or Vertex.

{
  "providers": {
    "anthropic": {
      "keys": [{
        "value": "env.ANTHROPIC_API_KEY",
        "models": ["claude-sonnet-4-20250514", "claude-opus-4-20250514"],
        "weight": 1.0
      }]
    }
  }
}

With all four providers configured, a single OpenAI-compatible request can target any of them by swapping the model field. Application code does not change.

Cross-provider routing, failover, and load balancing

Once Bedrock, Vertex, Gemini, and Anthropic all sit behind Bifrost, you can stitch them together into reliability and cost strategies that would otherwise demand custom code:

Automatic failover: Bifrost's retries and fallbacks let you declare a primary and a fallback chain. If Bedrock's Claude endpoint starts throwing 429s or 5xxs, Bifrost can route the call onward to Claude running on Vertex, and from there to Anthropic's own API, all without any application-side intervention.
Weighted load balancing: Bifrost's keys and load balancing split traffic by weight across providers. As one example, 70% of Claude traffic can land on Bedrock while the remaining 30% goes to Vertex during a phased migration.
Cost-aware routing: cheaper or latency-sensitive requests can be sent to Gemini, while high-stakes reasoning calls stay on Claude.
Region-aware routing: European traffic can stay pinned to Vertex in eu-west1, while traffic from the US lands on Bedrock in us-east-1, with no change to the application code.

Because routing decisions are made at the gateway, application teams never have to reason about provider availability or failure modes themselves.

Multi-provider workloads: governance and observability

Putting Bedrock, Vertex, Gemini, and Anthropic behind one gateway also folds the operational surface into a single control plane. Bifrost provides:

Virtual keys, budgets, and rate limits: per-team or per-customer virtual keys can be issued with dedicated spend caps and rate limits, no matter which upstream provider handles the request. Bifrost's governance capabilities cover virtual keys, RBAC, audit logs, and granular rate limits.
Unified observability: native Prometheus and OpenTelemetry exporters publish request-level metrics, distributed traces, and cost data across every provider.
Guardrails: content safety policies through AWS Bedrock Guardrails, Azure Content Safety, or Patronus AI apply uniformly across all upstream providers.
Audit logs: immutable trails of every request, including provider, model, latency, tokens, and cost, support SOC 2, GDPR, HIPAA, and ISO 27001 compliance reporting.

For teams running Bifrost for AWS Bedrock inside their own VPC, none of this traffic ever leaves the customer's AWS account.

Get started with Bedrock, Vertex, Gemini, and Anthropic on Bifrost

Consolidating Bedrock, Vertex, Gemini, and Anthropic onto Bifrost collapses four SDKs, four authentication schemes, and four distinct failure-handling layers into a single OpenAI-compatible endpoint. Protocol translation, OAuth2 and IAM credential handling, stream normalization, and routing all happen inside the gateway, so application teams can ship against one API while platform teams retain full control over cost and governance.

To see what Bifrost can do for your multi-provider AI stack, book a demo with the team.

DEV Community