If you've used any AI coding tool in the past year, you've used an OpenAI-compatible API — whether you knew it or not.
Claude Code, Cursor, Aider, Continue, Cline, LangChain, LlamaIndex — they all speak the same protocol. Here's what that means and why it matters for your stack.
The Standard
An "OpenAI-compatible API" is any HTTP endpoint that accepts the same request format as OpenAI's Chat Completions API:
curl https://any-provider.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "any-model-name",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 1024
  }'
The response follows the same schema:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "any-model-name",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hi there!"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
}
That's it. Any server that accepts this request format and returns this response format is "OpenAI-compatible."
Why It Matters
The OpenAI wire protocol has become the HTTP of AI. Just like every web browser speaks HTTP regardless of the server behind it, every AI tool speaks the OpenAI protocol regardless of which model you're actually using.
This means:
- Model portability — Switch from GPT to Claude to DeepSeek without changing your code (see the sketch after this list)
- Tool interoperability — Any tool that works with OpenAI works with any compatible provider
- No vendor lock-in — Your integration code doesn't depend on any single provider
- Gateway compatibility — Route through proxies, load balancers, and gateways transparently
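In practice, portability can be as small as an environment-variable change. Here is a minimal sketch, assuming your provider's endpoint and key are set in the environment (the openai Python SDK falls back to OPENAI_BASE_URL and OPENAI_API_KEY when you don't pass them explicitly); the model name is just an example:

from openai import OpenAI

# Same code, different provider: only the environment changes, e.g.
#   OPENAI_BASE_URL=https://api.deepseek.com/v1  OPENAI_API_KEY=sk-...
client = OpenAI()  # reads OPENAI_BASE_URL / OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="deepseek-chat",  # whatever the configured endpoint actually serves
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)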
Who Supports It
Model providers with native OpenAI-compatible endpoints:
| Provider | Endpoint | Models |
|---|---|---|
| OpenAI | api.openai.com/v1 | GPT-5.5, GPT-5 Mini, o3, etc. |
| DeepSeek | api.deepseek.com/v1 | DeepSeek V3, R1 |
| Mistral | api.mistral.ai/v1 | Mistral Large, Codestral |
| Groq | api.groq.com/openai/v1 | Llama 3, Mixtral (fast inference) |
| Together AI | api.together.xyz/v1 | 100+ open source models |
| Fireworks AI | api.fireworks.ai/inference/v1 | Llama 3, Mixtral, custom models |
Providers accessible through OpenAI-compatible gateways:
| Provider | Native API | Via Gateway |
|---|---|---|
| Anthropic (Claude) | Messages API (different format) | OpenAI-compatible via gateway |
| Google (Gemini) | Vertex AI / Gemini API | OpenAI-compatible via gateway |
Tools that consume OpenAI-compatible APIs:
| Tool | How to Configure |
|---|---|
| Claude Code | ANTHROPIC_BASE_URL env var |
| Cursor | Settings → Models → Custom API Base |
| Aider | --openai-api-base or .aider.conf.yml |
| Continue | config.json → provider apiBase |
| Cline | Settings → API Provider |
| Roo Code | Settings → API Configuration |
| OpenAI Codex CLI | OPENAI_BASE_URL env var or config.toml |
| LangChain | ChatOpenAI(base_url=...) |
| LlamaIndex | OpenAI(api_base=...) |
| AutoGen | config_list with base_url |
How to Use It: Practical Examples
Python (openai SDK)
from openai import OpenAI

# Point to any OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.futurmix.ai/v1",
    api_key="your-key"
)

# Works with any model the endpoint supports
response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or "deepseek-chat", "gpt-5.5", etc.
    messages=[{"role": "user", "content": "Explain async/await in Python"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.futurmix.ai/v1',
  apiKey: 'your-key'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Write a React hook for debouncing' }]
});
cURL
curl https://api.futurmix.ai/v1/chat/completions \
-H "Authorization: Bearer your-key" \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hello"}]}'
Streaming
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
The Gateway Pattern
The most powerful use of OpenAI-compatible APIs is the gateway pattern: one endpoint that routes to multiple providers.
Your Code (OpenAI SDK)
          │
          ▼
┌──────────────────┐
│   API Gateway    │  ← One endpoint, one key
│  (OpenAI-compat) │
└──────────────────┘
          │
          ├── model="claude-sonnet-4-6" → Anthropic
          ├── model="gpt-5.5"           → OpenAI
          ├── model="deepseek-chat"     → DeepSeek
          └── model="gemini-2.5-pro"    → Google
Benefits:
- One API key for all models
- Automatic failover — if one provider is down, route to another (a client-side version is sketched below)
- Cost optimization — gateways negotiate volume discounts (10-30% off)
- Usage dashboard — see all model usage in one place
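A gateway handles failover for you, but the same portability makes a rough client-side fallback easy if you call providers directly. A sketch under that assumption; the endpoints, keys, and model names are placeholders:

from openai import OpenAI, APIError, APIConnectionError, RateLimitError

# (endpoint, key, model) triples, ordered by preference -- all placeholders
PROVIDERS = [
    ("https://api.deepseek.com/v1", "deepseek-key", "deepseek-chat"),
    ("https://api.mistral.ai/v1", "mistral-key", "mistral-large-latest"),
]

def chat_with_fallback(messages):
    last_error = None
    for base_url, api_key, model in PROVIDERS:
        client = OpenAI(base_url=base_url, api_key=api_key)
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (APIConnectionError, RateLimitError, APIError) as exc:
            last_error = exc  # try the next provider
    raise last_error

resp = chat_with_fallback([{"role": "user", "content": "Hello"}])
print(resp.choices[0].message.content)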
Common Pitfalls
1. Not all endpoints support all features
The core chat/completions endpoint is universal, but advanced features vary:
| Feature | OpenAI | Claude (via gateway) | DeepSeek |
|---|---|---|---|
| Chat completions | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Function calling | ✅ | ✅ | ✅ |
| JSON mode | ✅ | ✅ | ✅ |
| Vision (images) | ✅ | ✅ | ❌ |
| Responses API | ✅ | Varies | ❌ |
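Function calling is worth a quick probe, because the request shape is identical even where support differs. A sketch reusing the client from the Python example above; get_weather is a made-up function for illustration:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative function, not a real API
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # any model your endpoint exposes
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
# If the provider supports tool use, the call shows up here instead of plain text
print(response.choices[0].message.tool_calls)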
2. Model names differ between providers
Always check the provider's model list (see the snippet after these examples). Common mistakes:
- claude-3.5-sonnet vs claude-sonnet-4-6 (naming conventions changed)
- gpt-4o vs gpt-5.5 (model generations)
- deepseek-chat vs deepseek-v3 (aliases)
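The simplest way to avoid guessing is to ask the endpoint itself: most compatible providers implement the /v1/models listing. Reusing the Python client from earlier:

# Prints every model ID the endpoint will accept in the "model" field
for m in client.models.list():
    print(m.id)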
3. Rate limits are provider-specific
Even through a gateway, each upstream provider has its own rate limits. The gateway may return 429s from the underlying provider.
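A minimal retry-with-backoff sketch for those pass-through 429s, using the SDK's RateLimitError; the attempt count and delays are arbitrary starting points:

import time
from openai import OpenAI, RateLimitError

def chat_with_retry(client: OpenAI, max_attempts: int = 5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... exponential backoff

Note that the SDK also retries some failures on its own (the max_retries client option), so keep the two layers consistent.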
Building Your Own OpenAI-Compatible Server
If you're serving a local model, here's the minimum viable implementation:
from fastapi import FastAPI
from pydantic import BaseModel
import time, uuid

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int = 1024
    temperature: float = 0.7

@app.post("/v1/chat/completions")
async def chat(request: ChatRequest):
    # Your model inference here (your_model is a stand-in for whatever you're serving)
    response_text = your_model.generate(
        request.messages[-1].content,
        max_tokens=request.max_tokens
    )
    # Rough token counts via whitespace splitting; swap in a real tokenizer in production
    prompt_tokens = sum(len(m.content.split()) for m in request.messages)
    completion_tokens = len(response_text.split())
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": response_text},
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens
        }
    }
Now any OpenAI-compatible client can talk to your local model.
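To verify it end to end, point the same SDK at your server. This assumes the FastAPI app runs on uvicorn's default port 8000, and the stub above never checks the key:

from openai import OpenAI

# Start the server first, e.g. with uvicorn, then run this client
local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-checked")
reply = local.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)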
Get Started
FuturMix provides an OpenAI-compatible API with 22+ models. One endpoint, one key, 10-30% off official pricing.
from openai import OpenAI
client = OpenAI(base_url="https://api.futurmix.ai/v1", api_key="your-key")
Works with every tool listed in this article — Claude Code, Cursor, Aider, LangChain, and more.
Are you using OpenAI-compatible APIs in production? What patterns have worked best for you?