Why this matters
AI model prices dropped roughly 80% between 2023 and 2025. Three major model releases shifted the capability/cost tradeoff significantly. New providers entered the market, and existing providers deprecated model versions with as little as six months' notice.
If your application calls openai.chat.completions.create() directly in 40 places, every one of these market changes is a refactoring project.
Here's the architecture pattern we use across every production AI system we build at Ailoitte.
The 4-Layer Model-Agnostic Pattern
Layer 1: The Unified Interface
Application code never imports a provider SDK directly. Every AI call goes through a single interface:
interface AICallConfig {
  task: 'classification' | 'generation' | 'extraction' | 'embedding';
  priority: 'speed' | 'quality' | 'cost';
  maxTokens?: number;
  temperature?: number;
  stream?: boolean;
}

interface NormalisedAIResponse {
  content: string;
  tokensUsed: { input: number; output: number; total: number };
  modelUsed: string;
  latencyMs: number;
  confidence?: number;
  cached: boolean;
}

async function callAI(
  prompt: string,
  context: Record<string, unknown>,
  config: AICallConfig
): Promise<NormalisedAIResponse> {
  const provider = await routeToProvider(config);
  const rawResponse = await provider.complete(prompt, context, config);
  return normaliseResponse(rawResponse, provider.name, config);
}
No provider import. No model name. No API key. Just your interface.
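To make that concrete, here's a sketch of what a call site looks like behind this interface. The `classifyTicket` helper and the stubbed `callAI` body are illustrative (the real `callAI` routes and normalises as described in the layers below); only the shape of the call matters:

```typescript
// Hypothetical call site for the unified interface. The stub implementation
// of callAI stands in for real routing/normalisation so this sketch runs on
// its own; in production it would be the Layer 1 function.
type Task = 'classification' | 'generation' | 'extraction' | 'embedding';
type Priority = 'speed' | 'quality' | 'cost';

interface CallConfig { task: Task; priority: Priority }
interface CallResult { content: string; modelUsed: string }

// Stub: a real implementation routes via config and normalises the response.
async function callAI(
  prompt: string,
  context: Record<string, unknown>,
  config: CallConfig
): Promise<CallResult> {
  return { content: 'billing', modelUsed: `stub-for-${config.task}` };
}

// Application code: no provider SDK, no model name, no API key.
async function classifyTicket(ticketBody: string): Promise<string> {
  const res = await callAI(
    'Classify this support ticket as billing, technical, or general.',
    { ticket: ticketBody },
    { task: 'classification', priority: 'cost' }
  );
  return res.content;
}
```

The call site declares intent (task, priority), not mechanism (provider, model).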
Layer 2: Configuration-Driven Routing
The routing config lives in a JSON file — not in code:
{
  "routing": {
    "classification": {
      "cost": "gpt-4o-mini",
      "quality": "claude-3-5-sonnet-20241022",
      "speed": "gpt-4o-mini"
    },
    "generation": {
      "cost": "gpt-4o-mini",
      "quality": "claude-3-5-sonnet-20241022",
      "speed": "gpt-4o"
    },
    "extraction": {
      "cost": "gpt-4o-mini",
      "quality": "gpt-4o",
      "speed": "gpt-4o-mini"
    }
  },
  "fallback_chain": ["gpt-4o", "claude-3-5-sonnet-20241022", "gpt-4o-mini"],
  "fallback_trigger": ["rate_limit", "timeout", "server_error"]
}
Changing the model for any task type = updating this JSON. No code deployment.
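A minimal sketch of the lookup side of this config, with the JSON inlined as a typed object for illustration (the type names and `modelFor` helper are assumptions, not part of the pattern's required surface):

```typescript
// Config-driven routing lookup. In production the object below would be
// loaded from the JSON file, not hard-coded.
type Task = 'classification' | 'generation' | 'extraction';
type Priority = 'speed' | 'quality' | 'cost';

interface RoutingConfig {
  routing: Record<Task, Record<Priority, string>>;
  fallback_chain: string[];
  fallback_trigger: string[];
}

const routingConfig: RoutingConfig = {
  routing: {
    classification: { cost: 'gpt-4o-mini', quality: 'claude-3-5-sonnet-20241022', speed: 'gpt-4o-mini' },
    generation: { cost: 'gpt-4o-mini', quality: 'claude-3-5-sonnet-20241022', speed: 'gpt-4o' },
    extraction: { cost: 'gpt-4o-mini', quality: 'gpt-4o', speed: 'gpt-4o-mini' }
  },
  fallback_chain: ['gpt-4o', 'claude-3-5-sonnet-20241022', 'gpt-4o-mini'],
  fallback_trigger: ['rate_limit', 'timeout', 'server_error']
};

// Resolve a model name from (task, priority) — the whole routing decision
// reduces to one nested lookup.
function modelFor(task: Task, priority: Priority): string {
  return routingConfig.routing[task][priority];
}
```

Because routing is data, swapping models is an edit to this object (or the JSON file behind it), not to any call site.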
Layer 3: Response Normalisation
OpenAI, Anthropic, and other providers return different response shapes. The normalisation layer abstracts this:
function normaliseResponse(
  raw: OpenAIResponse | AnthropicResponse | GeminiResponse,
  providerName: string,
  config: AICallConfig
): NormalisedAIResponse {
  if (providerName === 'openai') {
    return {
      content: raw.choices[0].message.content,
      tokensUsed: {
        input: raw.usage.prompt_tokens,
        output: raw.usage.completion_tokens,
        total: raw.usage.total_tokens
      },
      modelUsed: raw.model,
      latencyMs: raw._latencyMs, // injected by provider wrapper
      cached: false
    };
  }

  if (providerName === 'anthropic') {
    return {
      content: raw.content[0].text,
      tokensUsed: {
        input: raw.usage.input_tokens,
        output: raw.usage.output_tokens,
        total: raw.usage.input_tokens + raw.usage.output_tokens
      },
      modelUsed: raw.model,
      latencyMs: raw._latencyMs,
      cached: false
    };
  }

  // Add more providers here without touching application code
  throw new Error(`No normaliser registered for provider: ${providerName}`);
}
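As a standalone illustration of the Anthropic branch, here's the same mapping run against a hand-built payload shaped like the Messages API response (the field names mirror the snippet above; the `fakeAnthropicResponse` object is fabricated test data, not real API output):

```typescript
// Normalised shape, trimmed to the fields this sketch exercises.
interface Normalised {
  content: string;
  tokensUsed: { input: number; output: number; total: number };
  modelUsed: string;
}

// Anthropic branch of the normaliser, isolated so it can run without SDKs.
function normaliseAnthropic(raw: any): Normalised {
  return {
    content: raw.content[0].text,
    tokensUsed: {
      input: raw.usage.input_tokens,
      output: raw.usage.output_tokens,
      // Anthropic reports input/output separately; total is derived.
      total: raw.usage.input_tokens + raw.usage.output_tokens
    },
    modelUsed: raw.model
  };
}

// Fabricated payload shaped like an Anthropic Messages API response.
const fakeAnthropicResponse = {
  content: [{ type: 'text', text: 'hello' }],
  usage: { input_tokens: 12, output_tokens: 3 },
  model: 'claude-3-5-sonnet-20241022'
};
```

The point: once every provider's response collapses into `Normalised`, downstream code never branches on provider again.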
Layer 4: Automatic Fallback Routing
async function routeToProvider(config: AICallConfig): Promise<AIProvider> {
  const routingConfig = await loadRoutingConfig();
  const primaryModelName = routingConfig.routing[config.task][config.priority];

  try {
    const provider = providerRegistry.get(primaryModelName);
    await provider.healthCheck(); // lightweight check
    return provider;
  } catch (e) {
    if (isRetryableError(e)) {
      // Route to the next provider in the fallback chain
      return getNextFallbackProvider(primaryModelName, routingConfig.fallback_chain);
    }
    throw e;
  }
}
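The `getNextFallbackProvider` helper isn't shown above; one plausible sketch of the chain walk, reduced to model names so it runs standalone (the `nextFallbackModel` name and exhaustion behaviour are assumptions):

```typescript
// Walk the fallback chain past the model that just failed.
function nextFallbackModel(failedModel: string, fallbackChain: string[]): string {
  // Resume after the failed model if it appears in the chain, else from the top.
  const start = fallbackChain.indexOf(failedModel) + 1;
  for (let i = start; i < fallbackChain.length; i++) {
    if (fallbackChain[i] !== failedModel) return fallbackChain[i];
  }
  throw new Error(`Fallback chain exhausted after ${failedModel}`);
}
```

In the full version, the returned model name would be looked up in `providerRegistry` and health-checked before use, just like the primary.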
Real-World Result
One client wanted to migrate 60% of their API calls from GPT-4 to Claude 3.5 Sonnet when Anthropic's pricing dropped significantly. Because they'd built with this pattern:
- Updated the routing config JSON (2 minutes)
- Ran the eval set against the new routing (2 hours)
- Confirmed quality parity on their specific tasks (passed)
- Deployed the config change (10 minutes)
Zero application code changes. Monthly savings: $9,200.
The Setup Cost
This pattern adds approximately 3–5 days of initial engineering work, depending on how many providers you want to support. In the current AI market, that investment pays for itself the first time you need to change providers.
Written by Sunil, Content Lead at Ailoitte — we build production AI systems for fintech, healthcare, SaaS, and logistics companies. We publish technical content on what actually works in production AI, not just what tutorials teach.