If you've used any AI coding tool in the past year, you've used an OpenAI-compatible API — whether you knew it or not.
Claude Code, Cursor, Aider, Continue, Cline, LangChain, LlamaIndex — they all speak the same protocol. Here's what that means and why it matters for your stack.
The Standard
An "OpenAI-compatible API" is any HTTP endpoint that accepts the same request format as OpenAI's Chat Completions API:
curl https://any-provider.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "any-model-name",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 1024
  }'
The response follows the same schema:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "any-model-name",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hi there!"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
}
That's it. Any server that accepts this request format and returns this response format is "OpenAI-compatible."
Why It Matters
The OpenAI wire protocol has become the HTTP of AI. Just like every web browser speaks HTTP regardless of the server behind it, every AI tool speaks the OpenAI protocol regardless of which model you're actually using.
This means:
- Model portability — Switch from GPT to Claude to DeepSeek without changing your code (see the sketch after this list)
- Tool interoperability — Any tool that works with OpenAI works with any compatible provider
- No vendor lock-in — Your integration code doesn't depend on any single provider
- Gateway compatibility — Route through proxies, load balancers, and gateways transparently
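In practice, portability can be as small as an environment-variable change. Here is a minimal sketch, assuming your provider's endpoint and key are set in the environment (the openai Python SDK falls back to OPENAI_BASE_URL and OPENAI_API_KEY when you don't pass them explicitly); the model name is just an example:

from openai import OpenAI

# Same code, different provider: only the environment changes, e.g.
#   OPENAI_BASE_URL=https://api.deepseek.com/v1  OPENAI_API_KEY=sk-...
client = OpenAI()  # reads OPENAI_BASE_URL / OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="deepseek-chat",  # whatever the configured endpoint actually serves
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)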
Who Supports It
Model providers with native OpenAI-compatible endpoints:
| Provider | Endpoint | Models |
|---|---|---|
| OpenAI | api.openai.com/v1 | GPT-5.5, GPT-5 Mini, o3, etc. |
| DeepSeek | api.deepseek.com/v1 | DeepSeek V3, R1 |
| Mistral | api.mistral.ai/v1 | Mistral Large, Codestral |
| Groq | api.groq.com/openai/v1 | Llama 3, Mixtral (fast inference) |
| Together AI | api.together.xyz/v1 | 100+ open source models |
| Fireworks AI | api.fireworks.ai/inference/v1 | Llama 3, Mixtral, custom models |
Providers accessible through OpenAI-compatible gateways:
| Provider | Native API | Via Gateway |
|---|---|---|
| Anthropic (Claude) | Messages API (different format) | OpenAI-compatible via gateway |
| Google (Gemini) | Vertex AI / Gemini API | OpenAI-compatible via gateway |
Tools that consume OpenAI-compatible APIs:
| Tool | How to Configure |
|---|---|
| Claude Code | ANTHROPIC_BASE_URL env var |
| Cursor | Settings → Models → Custom API Base |
| Aider | --openai-api-base or .aider.conf.yml |
| Continue | config.json → provider apiBase |
| Cline | Settings → API Provider |
| Roo Code | Settings → API Configuration |
| OpenAI Codex CLI | OPENAI_BASE_URL env var or config.toml |
| LangChain | ChatOpenAI(base_url=...) |
| LlamaIndex | OpenAI(api_base=...) |
| AutoGen | config_list with base_url |
How to Use It: Practical Examples
Python (openai SDK)
from openai import OpenAI

# Point to any OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.futurmix.ai/v1",
    api_key="your-key"
)

# Works with any model the endpoint supports
response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or "deepseek-chat", "gpt-5.5", etc.
    messages=[{"role": "user", "content": "Explain async/await in Python"}],
    max_tokens=1024
)
print(response.choices[0].message.content)
Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.futurmix.ai/v1',
  apiKey: 'your-key'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Write a React hook for debouncing' }]
});
cURL
curl https://api.futurmix.ai/v1/chat/completions \
-H "Authorization: Bearer your-key" \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hello"}]}'
Streaming
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
The Gateway Pattern
The most powerful use of OpenAI-compatible APIs is the gateway pattern: one endpoint that routes to multiple providers.
Your Code (OpenAI SDK)
          │
          ▼
┌──────────────────┐
│   API Gateway    │  ← One endpoint, one key
│  (OpenAI-compat) │
└──────────────────┘
          │
          ├── model="claude-sonnet-4-6" → Anthropic
          ├── model="gpt-5.5"           → OpenAI
          ├── model="deepseek-chat"     → DeepSeek
          └── model="gemini-2.5-pro"    → Google
Benefits:
- One API key for all models
- Automatic failover — if one provider is down, route to another (a client-side version is sketched below)
- Cost optimization — gateways negotiate volume discounts (10-30% off)
- Usage dashboard — see all model usage in one place
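A gateway handles failover for you, but the same portability makes a rough client-side fallback easy if you call providers directly. A sketch under that assumption; the endpoints, keys, and model names are placeholders:

from openai import OpenAI, APIError, APIConnectionError, RateLimitError

# (endpoint, key, model) triples, ordered by preference -- all placeholders
PROVIDERS = [
    ("https://api.deepseek.com/v1", "deepseek-key", "deepseek-chat"),
    ("https://api.mistral.ai/v1", "mistral-key", "mistral-large-latest"),
]

def chat_with_fallback(messages):
    last_error = None
    for base_url, api_key, model in PROVIDERS:
        client = OpenAI(base_url=base_url, api_key=api_key)
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (APIConnectionError, RateLimitError, APIError) as exc:
            last_error = exc  # try the next provider
    raise last_error

resp = chat_with_fallback([{"role": "user", "content": "Hello"}])
print(resp.choices[0].message.content)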
Common Pitfalls
1. Not all endpoints support all features
The core chat/completions endpoint is universal, but advanced features vary:
| Feature | OpenAI | Claude (via gateway) | DeepSeek |
|---|---|---|---|
| Chat completions | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Function calling | ✅ | ✅ | ✅ |
| JSON mode | ✅ | ✅ | ✅ |
| Vision (images) | ✅ | ✅ | ❌ |
| Responses API | ✅ | Varies | ❌ |
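Function calling is worth a quick probe, because the request shape is identical even where support differs. A sketch reusing the client from the Python example above; get_weather is a made-up function for illustration:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative function, not a real API
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # any model your endpoint exposes
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
# If the provider supports tool use, the call shows up here instead of plain text
print(response.choices[0].message.tool_calls)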
2. Model names differ between providers
Always check the provider's model list (see the snippet after these examples). Common mistakes:
- claude-3.5-sonnet vs claude-sonnet-4-6 (naming conventions changed)
- gpt-4o vs gpt-5.5 (model generations)
- deepseek-chat vs deepseek-v3 (aliases)
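The simplest way to avoid guessing is to ask the endpoint itself: most compatible providers implement the /v1/models listing. Reusing the Python client from earlier:

# Prints every model ID the endpoint will accept in the "model" field
for m in client.models.list():
    print(m.id)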
3. Rate limits are provider-specific
Even through a gateway, each upstream provider has its own rate limits. The gateway may return 429s from the underlying provider.
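A minimal retry-with-backoff sketch for those pass-through 429s, using the SDK's RateLimitError; the attempt count and delays are arbitrary starting points:

import time
from openai import OpenAI, RateLimitError

def chat_with_retry(client: OpenAI, max_attempts: int = 5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... exponential backoff

Note that the SDK also retries some failures on its own (the max_retries client option), so keep the two layers consistent.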
Building Your Own OpenAI-Compatible Server
If you're serving a local model, here's the minimum viable implementation:
from fastapi import FastAPI
from pydantic import BaseModel
import time, uuid

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int = 1024
    temperature: float = 0.7

@app.post("/v1/chat/completions")
async def chat(request: ChatRequest):
    # Your model inference here (your_model is a stand-in for whatever you're serving)
    response_text = your_model.generate(
        request.messages[-1].content,
        max_tokens=request.max_tokens
    )
    # Rough token counts via whitespace splitting; swap in a real tokenizer in production
    prompt_tokens = sum(len(m.content.split()) for m in request.messages)
    completion_tokens = len(response_text.split())
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": response_text},
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens
        }
    }
Now any OpenAI-compatible client can talk to your local model.
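To verify it end to end, point the same SDK at your server. This assumes the FastAPI app runs on uvicorn's default port 8000, and the stub above never checks the key:

from openai import OpenAI

# Start the server first, e.g. with uvicorn, then run this client
local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-checked")
reply = local.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)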
Get Started
FuturMix provides an OpenAI-compatible API with 22+ models. One endpoint, one key, 10-30% off official pricing.
from openai import OpenAI
client = OpenAI(base_url="https://api.futurmix.ai/v1", api_key="your-key")
Works with every tool listed in this article — Claude Code, Cursor, Aider, LangChain, and more.
Are you using OpenAI-compatible APIs in production? What patterns have worked best for you?