What Is an OpenAI-Compatible API? How It Works and Why Every AI Tool Supports It

If you've used any AI coding tool in the past year, you've used an OpenAI-compatible API — whether you knew it or not.

Claude Code, Cursor, Aider, Continue, Cline, LangChain, LlamaIndex — they all speak the same protocol. Here's what that means and why it matters for your stack.

The Standard

An "OpenAI-compatible API" is any HTTP endpoint that accepts the same request format as OpenAI's Chat Completions API:

curl https://any-provider.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "any-model-name",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 1024
  }'

The response follows the same schema:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "any-model-name",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hi there!"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
}

That's it. Any server that accepts this request format and returns this response format is "OpenAI-compatible."
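
Because it's plain HTTP plus JSON, you don't even need an SDK. Here's a minimal sketch with Python's requests library (the URL and key are placeholders):

import requests

# Any OpenAI-compatible endpoint works here
resp = requests.post(
    "https://any-provider.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "any-model-name",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 1024,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])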

Why It Matters

The OpenAI wire protocol has become the HTTP of AI. Just like every web browser speaks HTTP regardless of the server behind it, every AI tool speaks the OpenAI protocol regardless of which model you're actually using.

This means:

  1. Model portability — Switch from GPT to Claude to DeepSeek without changing your code
  2. Tool interoperability — Any tool that works with OpenAI works with any compatible provider
  3. No vendor lock-in — Your integration code doesn't depend on any single provider
  4. Gateway compatibility — Route through proxies, load balancers, and gateways transparently

Who Supports It

Model providers with native OpenAI-compatible endpoints:

| Provider | Endpoint | Models |
| --- | --- | --- |
| OpenAI | api.openai.com/v1 | GPT-5.5, GPT-5 Mini, o3, etc. |
| DeepSeek | api.deepseek.com/v1 | DeepSeek V3, R1 |
| Mistral | api.mistral.ai/v1 | Mistral Large, Codestral |
| Groq | api.groq.com/openai/v1 | Llama 3, Mixtral (fast inference) |
| Together AI | api.together.xyz/v1 | 100+ open source models |
| Fireworks AI | api.fireworks.ai/inference/v1 | Llama 3, Mixtral, custom models |

Providers accessible through OpenAI-compatible gateways:

| Provider | Native API | Via Gateway |
| --- | --- | --- |
| Anthropic (Claude) | Messages API (different format) | OpenAI-compatible via gateway |
| Google (Gemini) | Vertex AI / Gemini API | OpenAI-compatible via gateway |

Tools that consume OpenAI-compatible APIs:

| Tool | How to Configure |
| --- | --- |
| Claude Code | ANTHROPIC_BASE_URL env var |
| Cursor | Settings → Models → Custom API Base |
| Aider | --openai-api-base or .aider.conf.yml |
| Continue | config.json → provider apiBase |
| Cline | Settings → API Provider |
| Roo Code | Settings → API Configuration |
| OpenAI Codex CLI | OPENAI_BASE_URL env var or config.toml |
| LangChain | ChatOpenAI(base_url=...) |
| LlamaIndex | OpenAI(api_base=...) |
| AutoGen | config_list with base_url |
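
To make the last few rows concrete, here's LangChain pointed at a compatible endpoint: a minimal sketch assuming the langchain-openai package (the URL, key, and model are placeholders):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.futurmix.ai/v1",  # any OpenAI-compatible endpoint
    api_key="your-key",
    model="claude-sonnet-4-6",
)
print(llm.invoke("Explain the OpenAI wire protocol in one sentence.").content)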

How to Use It: Practical Examples

Python (openai SDK)

from openai import OpenAI

# Point to any OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.futurmix.ai/v1",
    api_key="your-key"
)

# Works with any model the endpoint supports
response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or "deepseek-chat", "gpt-5.5", etc.
    messages=[{"role": "user", "content": "Explain async/await in Python"}],
    max_tokens=1024
)
print(response.choices[0].message.content)

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.futurmix.ai/v1',
  apiKey: 'your-key'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Write a React hook for debouncing' }]
});

cURL

curl https://api.futurmix.ai/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hello"}]}'

Streaming

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
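
Under the hood, streaming uses server-sent events: each chunk is a data: line carrying a JSON delta, and the stream ends with data: [DONE]. Roughly, with fields abbreviated:

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hi"}}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" there"}}]}

data: [DONE]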

The Gateway Pattern

The most powerful use of OpenAI-compatible APIs is the gateway pattern: one endpoint that routes to multiple providers.

Your Code (OpenAI SDK)
    │
    ▼
┌───────────────────┐
│  API Gateway      │ ← One endpoint, one key
│  (OpenAI-compat)  │
└───────────────────┘
    │
    ├── model="claude-sonnet-4-6" → Anthropic
    ├── model="gpt-5.5"           → OpenAI
    ├── model="deepseek-chat"     → DeepSeek
    └── model="gemini-2.5-pro"    → Google

Benefits:

  • One API key for all models (see the client-side sketch after this list)
  • Automatic failover — if one provider is down, route to another
  • Cost optimization — gateways negotiate volume discounts (10-30% off)
  • Usage dashboard — see all model usage in one place
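
From the client's side, the whole pattern is a few lines. A minimal sketch (the gateway URL, key, and model names are placeholders for whatever your gateway exposes):

from openai import OpenAI

# One client, one key; the gateway routes each request by model name
client = OpenAI(base_url="https://your-gateway.example/v1", api_key="your-key")

for model in ["claude-sonnet-4-6", "gpt-5.5", "deepseek-chat"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(model, "→", response.choices[0].message.content)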

Common Pitfalls

1. Not all endpoints support all features

The core chat/completions endpoint is universal, but advanced features vary:

| Feature | OpenAI | Claude (via gateway) | DeepSeek |
| --- | --- | --- | --- |
| Chat completions | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Function calling | ✅ | ✅ | ✅ |
| JSON mode | ✅ | Varies | ✅ |
| Vision (images) | ✅ | ✅ | ❌ |
| Responses API | ✅ | Varies | ❌ |
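
Where function calling is supported, the request shape is identical across providers. Here's a sketch of the standard tools parameter (the get_weather function and its schema are made up for illustration):

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Lagos?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(response.choices[0].message.tool_calls)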

2. Model names differ between providers

Always check the provider's model list; most compatible endpoints expose it at GET /v1/models (see the sketch after this list). Common mistakes:

  • claude-3.5-sonnet vs claude-sonnet-4-6 (naming conventions changed)
  • gpt-4o vs gpt-5.5 (model generations)
  • deepseek-chat vs deepseek-v3 (aliases)
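
The quickest way to avoid guessing is to ask the endpoint itself, using the same client as in the earlier examples:

# GET /v1/models: lists whatever the endpoint actually serves
for model in client.models.list():
    print(model.id)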

3. Rate limits are provider-specific

Even through a gateway, each upstream provider enforces its own rate limits, so the gateway may pass through 429s from the underlying provider.
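
A common mitigation is to catch the rate-limit error and retry on another model. A rough sketch (the fallback order is illustrative, not a recommendation):

import openai

def chat_with_fallback(client, messages,
                       models=("claude-sonnet-4-6", "deepseek-chat")):
    # Try each model in order; move on when its upstream provider throttles
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError:
            continue  # 429 passed through from this upstream; try the next
    raise RuntimeError("All fallback models are rate-limited")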

Building Your Own OpenAI-Compatible Server

If you're serving a local model, here's the minimum viable implementation:

from fastapi import FastAPI
from pydantic import BaseModel
import time, uuid

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int = 1024
    temperature: float = 0.7

@app.post("/v1/chat/completions")
async def chat(request: ChatRequest):
    # Your model inference goes here; your_model is a placeholder
    response_text = your_model.generate(
        request.messages[-1].content,
        max_tokens=request.max_tokens
    )

    # Word counts as a rough stand-in for real tokenizer counts
    prompt_tokens = sum(len(m.content.split()) for m in request.messages)
    completion_tokens = len(response_text.split())

    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": response_text},
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens
        }
    }

Now any OpenAI-compatible client can talk to your local model.
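
To verify it end to end, start the server (e.g. uvicorn server:app --port 8000, assuming the file is named server.py) and point the standard SDK at it:

from openai import OpenAI

# The sketch above ignores the key, but the SDK requires a value
local = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

reply = local.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)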

Get Started

FuturMix provides an OpenAI-compatible API with 22+ models. One endpoint, one key, 10-30% off official pricing.

from openai import OpenAI
client = OpenAI(base_url="https://api.futurmix.ai/v1", api_key="your-key")

Works with every tool listed in this article — Claude Code, Cursor, Aider, LangChain, and more.


Are you using OpenAI-compatible APIs in production? What patterns have worked best for you?
