江欢（JackSoul）

Posted on Jun 5

OpenAI-compatible AI API gateway migration checklist

#ai #llm #api #webdev

Audience: developers and SaaS teams moving an existing OpenAI SDK integration behind an API gateway, router, or managed model-access layer.

Goal: switch safely with minimal code churn, while catching cost, billing, observability, and reliability gaps before production traffic moves.

FerryAPI positioning note: FerryAPI is an OpenAI-compatible AI API gateway for teams that want one base URL/API-key flow plus customer API-key management, usage records, prepaid balance controls, provider pools, and lower-cost model access. This checklist is written to be useful even if you choose another gateway.

1. Inventory the current integration

Before changing a base_url, write down what the app already depends on.

Which SDKs are in use: OpenAI Node, Python, LangChain, Vercel AI SDK, custom HTTP client, or another wrapper?
Which endpoints and features are used: chat completions, responses, embeddings, images, audio, moderation, batch, streaming?
Which model names are hardcoded?
Which requests stream tokens to users?
Which requests are background jobs where latency is less sensitive?
Where are API keys stored and rotated?
Which logs, metrics, or billing jobs currently depend on OpenAI response fields?

Migration tip: start with the simplest production-like request path, not the largest or most agentic workflow.
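
One way to answer the hardcoded-model question is a quick scan of the codebase. The sketch below is an assumption-heavy starting point, not a complete audit: the regex only catches string literals assigned to a `model` key or argument, and the file suffixes are illustrative.

```python
import re
from collections import Counter
from pathlib import Path

# Matches model="...", model: '...', and "model": "..." in Python, JS, TS, or JSON-like text.
MODEL_PATTERN = re.compile(r"""["']?model["']?\s*[:=]\s*["']([^"']+)["']""")


def find_model_names(text: str) -> list[str]:
    """Return every string literal assigned to a `model` key or argument."""
    return MODEL_PATTERN.findall(text)


def inventory(root: str, suffixes=(".py", ".js", ".ts", ".json")) -> Counter:
    """Count hardcoded model names across source files under `root`."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            counts.update(find_model_names(path.read_text(errors="ignore")))
    return counts
```

Run `inventory(".")` from the repository root and review the counts by hand. Model names built dynamically or loaded from config will not show up, and dependency directories should be excluded in a real run.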

2. Confirm compatibility with a smoke test

A gateway should make the first test boring.

Minimum smoke test:

curl https://YOUR_GATEWAY_BASE_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_MODEL_ALIAS",
    "messages": [{"role": "user", "content": "Reply with exactly: gateway ok"}],
    "temperature": 0
  }'

Check:

Does the request use the same OpenAI-style authorization header?
Does the response shape match what your SDK expects?
Are errors returned in a format your retry and alerting code can parse?
Does streaming work if your product uses streaming?
Are usage fields present and plausible?
Is the model alias stable and documented?

Red flag: the gateway claims to be OpenAI-compatible but requires a proprietary SDK for common chat-completion use cases.
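
The response-shape and usage checks can be automated. This is a minimal sketch using only the standard library: it sends the same request as the curl example and reports anything missing from the reply. The function names are illustrative, and it assumes the gateway returns the standard chat-completions fields (`choices[0].message.content` and a `usage` object with token counts).

```python
import json
import urllib.request


def check_chat_response(body: dict) -> list[str]:
    """Return a list of problems found in a chat-completions response body."""
    problems = []
    choices = body.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("missing or empty 'choices'")
    else:
        message = choices[0].get("message", {})
        if not isinstance(message.get("content"), str):
            problems.append("choices[0].message.content is not a string")
    usage = body.get("usage")
    if not isinstance(usage, dict):
        problems.append("missing 'usage'")
    else:
        for field in ("prompt_tokens", "completion_tokens", "total_tokens"):
            value = usage.get(field)
            if not isinstance(value, int) or value < 0:
                problems.append(f"usage.{field} missing or not a non-negative int")
    return problems


def smoke_test(base_url: str, api_key: str, model: str) -> list[str]:
    """POST the same request as the curl example and validate the reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with exactly: gateway ok"}],
        "temperature": 0,
    }
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return check_chat_response(json.load(response))
```

An empty list from `smoke_test(...)` means the basic shape is fine. Streaming and error formats still need their own checks.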

3. Change only configuration first

Keep the first migration as small as possible.

Typical config-only change:

base_url: from OpenAI to gateway URL
api_key: from provider key to gateway key
model: from direct provider model to gateway-supported model alias

Avoid changing prompts, agents, retry policy, and product UX in the same release. If behavior changes, you want to know whether the gateway or your own code caused it.
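
A config-only switch is easiest when all three values come from the environment. The sketch below assumes hypothetical variable names (`LLM_BASE_URL`, `LLM_API_KEY`, `LLM_MODEL`); use whatever your deployment already has.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    base_url: str
    api_key: str
    model: str


def load_llm_config(env=os.environ) -> LLMConfig:
    """Read the three migration-relevant settings from environment variables.

    The variable names are hypothetical. Only the base URL has a default,
    so a missing key or model fails loudly with KeyError at startup.
    """
    return LLMConfig(
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        api_key=env["LLM_API_KEY"],
        model=env["LLM_MODEL"],
    )
```

With the OpenAI Python SDK, the values plug into `OpenAI(base_url=cfg.base_url, api_key=cfg.api_key)`, and `cfg.model` goes into each request. Rolling back is then a matter of restoring three environment variables.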

4. Run a comparison batch

Send a small, representative set of prompts through both the current provider path and the gateway path.

Compare:

latency p50 / p95
output quality for key workflows
timeout and retry behavior
token counts and billed units
streaming chunk format
refusal/error behavior
JSON/tool-call reliability if your app depends on structured output

Use real application prompts when possible, but remove secrets and customer data.
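
For the latency part of the comparison, a small helper is enough. This sketch uses the nearest-rank percentile definition, which is one of several common conventions; the function names are illustrative.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compare_latency(direct_ms: list[float], gateway_ms: list[float]) -> dict:
    """Report p50 and p95 for both paths side by side."""
    return {
        name: {"direct": percentile(direct_ms, pct), "gateway": percentile(gateway_ms, pct)}
        for name, pct in (("p50", 50), ("p95", 95))
    }
```

Feed it latencies measured from the same prompts on both paths. With only a handful of samples, p95 is effectively the slowest request, so treat small batches as a sanity check rather than a benchmark.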

5. Add usage and cost guardrails before rollout

Cost controls are easier to validate before the first production incident.

Confirm the gateway can answer:

Which API key spent money?
Which customer, workspace, or project generated usage?
Which model/provider handled the request?
Can you set per-key quotas or prepaid balances?
What happens when a key reaches its limit?
Can you export usage for internal billing or customer invoicing?
Can compromised keys be disabled quickly?

Practical test: intentionally set a low quota on a test key, hit the limit, and confirm your app shows a safe failure state.
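
What a "safe failure state" means in code depends on how the gateway reports an exhausted key. The sketch below assumes an OpenAI-style error body (`{"error": {"code": ...}}`) and assumes that quota exhaustion arrives as HTTP 402 or as an `insufficient_quota` error code. Verify both against your gateway's actual responses during the low-quota test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    kind: str          # "quota", "auth", "retryable", or "unknown"
    retry: bool        # whether the caller should retry the request
    user_message: str  # safe text to show instead of a raw error


def classify_error(status: int, body: dict) -> Failure:
    """Map an HTTP status and error body to a failure the app can handle safely."""
    error = body.get("error") if isinstance(body, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    # Assumption: the gateway signals an exhausted quota or balance this way.
    if status == 402 or code == "insufficient_quota":
        return Failure("quota", False, "AI features are paused for this account.")
    if status in (401, 403):
        return Failure("auth", False, "AI features are temporarily unavailable.")
    if status == 429 or status >= 500:
        return Failure("retryable", True, "The assistant is busy. Please try again.")
    return Failure("unknown", False, "Something went wrong. Please try again later.")
```

The important property is that a quota failure is never retried: retrying a request that failed on balance only adds load and noise.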

6. Decide routing and fallback rules explicitly

Do not let routing be mysterious in production.

Document:

default model/provider for each product feature
fallback model/provider order
when cheaper models are acceptable
when high-quality models are required
whether retries can cross providers
how model changes are communicated to users or internal teams

If the gateway offers automatic routing, test it with prompts that represent expensive, low-risk, high-risk, and latency-sensitive workloads.
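
Routing rules are easier to review when they live in one table in the repository. This is a sketch with hypothetical feature names and model aliases, not a gateway configuration format.

```python
# Hypothetical features and model aliases; replace with your own.
ROUTES = {
    "support_chat": {
        "default": "quality-alias",
        "fallbacks": ["budget-alias"],
        "cross_provider_retry": False,
    },
    "nightly_summaries": {
        "default": "budget-alias",
        "fallbacks": [],
        "cross_provider_retry": True,
    },
}


def attempt_order(feature: str) -> list[str]:
    """Return the models to try for a feature: default first, then fallbacks in order."""
    route = ROUTES[feature]  # KeyError on purpose: undocumented features get no implicit route
    return [route["default"], *route["fallbacks"]]
```

Even if the gateway handles fallback itself, keeping the intended order in code gives you something to diff against what the usage records show.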

7. Ship with a staged rollout

Recommended sequence:

local smoke test
staging environment
internal users only
low-risk background jobs
1–5% production traffic
wider rollout after latency, error rate, and cost checks pass

For each stage, define rollback:

old base URL and key still available
feature flag or environment variable ready
dashboards showing gateway traffic separately
owner on call during first production window
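
A percentage rollout works best when the same user or customer always lands on the same path. One common approach, sketched here, is to hash a stable identifier into 100 buckets. The function name and the choice of identifier are assumptions.

```python
import hashlib


def use_gateway(unit_id: str, rollout_percent: int) -> bool:
    """Decide deterministically whether this user or customer goes through the gateway.

    The same unit_id always maps to the same bucket (0-99), so raising
    rollout_percent only adds traffic and never reshuffles who is already in.
    """
    digest = hashlib.sha256(unit_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Read `rollout_percent` from an environment variable or feature flag so that rollback is a matter of setting it to 0.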

8. Monitor the right signals

At minimum, track:

request count by model/provider
error rate by endpoint
timeout rate
p50/p95 latency
streaming disconnects
spend by key/project/customer
quota-limit events
fallback events
provider outage events

A gateway migration is not complete when requests succeed. It is complete when you can explain behavior and cost under normal and failure conditions.
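
If the gateway exports usage records, a few of these signals can be computed directly from them. The record fields below (`model`, `endpoint`, `status`, `api_key_id`, `cost_usd`) are hypothetical; map them to whatever your gateway or logging pipeline actually emits.

```python
from collections import defaultdict


def summarize(records: list[dict]) -> dict:
    """Aggregate request count by model, error rate by endpoint, and spend by key."""
    by_model = defaultdict(int)
    by_endpoint = defaultdict(int)
    errors = defaultdict(int)
    spend = defaultdict(float)
    for record in records:
        by_model[record["model"]] += 1
        by_endpoint[record["endpoint"]] += 1
        if record["status"] >= 400:
            errors[record["endpoint"]] += 1
        spend[record["api_key_id"]] += record["cost_usd"]
    return {
        "requests_by_model": dict(by_model),
        "error_rate_by_endpoint": {e: errors[e] / n for e, n in by_endpoint.items()},
        "spend_by_key": dict(spend),
    }
```

In production these numbers usually come from a metrics system rather than a script, but a script like this is a quick way to cross-check the dashboards against raw records.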

9. Common rollback triggers

Roll back or pause the rollout if you see:

unexplained cost spikes
missing usage records
streaming format incompatibility
higher timeout rate on critical paths
model alias changes without notice
customer billing attribution gaps
provider fallback producing unacceptable output changes
failures the support team cannot diagnose from logs
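
Some of these triggers can be turned into an automatic check so the decision does not depend on someone watching a dashboard. The thresholds and field names in this sketch are placeholders, not recommendations; pick values that match your own baseline.

```python
def rollback_reasons(
    baseline: dict,
    gateway: dict,
    max_timeout_ratio: float = 1.5,  # placeholder threshold
    max_cost_ratio: float = 1.3,     # placeholder threshold
) -> list[str]:
    """Compare gateway metrics against the direct-provider baseline and list triggered reasons."""
    reasons = []
    if gateway["timeout_rate"] > baseline["timeout_rate"] * max_timeout_ratio:
        reasons.append("higher timeout rate")
    if gateway["cost_per_request"] > baseline["cost_per_request"] * max_cost_ratio:
        reasons.append("unexplained cost increase")
    if gateway["usage_records"] < gateway["requests"]:
        reasons.append("missing usage records")
    return reasons
```

An empty list means none of the automated triggers fired. Output-quality changes and log diagnosability still need a human review.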

Quick pre-launch checklist

[ ] Existing SDK works with gateway base_url and API key
[ ] Streaming tested if used
[ ] Error parsing tested
[ ] Usage records verified
[ ] Per-key quota or balance behavior tested
[ ] Staging comparison batch completed
[ ] Rollback config ready
[ ] Production rollout starts with a small traffic slice
[ ] Owner and alerting defined for launch window

FerryAPI fit

FerryAPI is most relevant when your app already speaks OpenAI-compatible APIs and you want gateway-level control over API keys, provider pools, usage records, prepaid balance, and model cost management without rewriting the application around a new AI stack.
