DEV Community: Dee.Bee

Right Model, Right Time: Why Model Routing Is Becoming Core to GenAI Platforms

Dee.Bee — Thu, 14 May 2026 09:52:18 +0000

#ai #llm #architecture #microsoft

Why a single AI model is no longer enough

If you’re building AI-powered applications today, you’ve probably faced this problem already:

Some prompts are trivial, but they still hit an expensive model
Others are complex and fail badly when routed to a cheaper one
Latency, cost, and quality constantly pull in different directions

Using one model for every prompt is increasingly inefficient, especially as new models with very different strengths are released every few weeks.

This is where model routing comes in.

The hospital triage analogy

Consider a large hospital where patients arrive all day with very different problems:

A sore throat
Chest pain
Sudden vision issues like floaters

Patients do not reliably know where to go. Some go straight to a consultant, some underplay their symptoms, and others bounce between departments, losing valuable time in the process.

To handle this situation, hospitals rely on a triage lead.

The triage lead does not treat patients; they simply make sure each patient is directed to the right department in the hospital based on:

Complexity of the case – simple symptom vs. unclear combination of issues
Urgency / latency of the case – how quickly the case needs attention
Cost and resource use – does this really require top‑tier expertise now?
Expected performance / accuracy – does this require a specialist or advanced diagnosis?
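In software terms, those four triage criteria map naturally onto routing signals. The sketch below is a minimal illustration, not how any particular product decides: the model names, prices, latencies, quality scores, and the linear "required quality" rule are all hypothetical, and the complexity score is assumed to come from some upstream classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # hypothetical price in USD
    p50_latency_ms: int        # hypothetical typical latency
    quality: float             # hypothetical benchmark score in [0, 1]


# Hypothetical catalogue of "departments".
CATALOGUE = [
    ModelProfile("small-fast", 0.0002, 300, 0.60),
    ModelProfile("mid-general", 0.002, 800, 0.78),
    ModelProfile("large-reasoning", 0.02, 4000, 0.92),
]


def triage(complexity: float, latency_budget_ms: int) -> ModelProfile:
    """Pick the cheapest model that is good enough and fast enough.

    complexity: 0.0 (trivial) to 1.0 (hard), assumed to come from an upstream classifier.
    """
    # Complexity / expected performance: harder prompts demand a higher
    # quality score (an assumed linear rule).
    required_quality = 0.5 + 0.4 * complexity
    # Urgency: discard models that cannot answer within the latency budget.
    candidates = [
        m for m in CATALOGUE
        if m.quality >= required_quality and m.p50_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        # Nothing satisfies both constraints: fall back to the highest-quality model.
        return max(CATALOGUE, key=lambda m: m.quality)
    # Cost: among acceptable models, prefer the cheapest.
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


print(triage(0.1, 1000).name)  # small-fast
print(triage(0.9, 5000).name)  # large-reasoning
```

When no model meets both constraints, this sketch deliberately trades latency for quality; that is one possible policy, and a real system might instead reject the request or relax the quality bar.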

What is a model router?

A model router is like that triage lead. It routes each prompt, in real time, to the most suitable model from a pool of available models, rather than relying on a single model for all queries.

Simple prompts can be handled by smaller, faster, and cheaper models, while more complex reasoning can be routed to more capable models automatically.
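A minimal rule-based sketch of that idea is shown below. The keyword list, length threshold, and model names are hypothetical; the cloud routers discussed later rely on trained classifiers rather than hand-written rules like these.

```python
import re

# Hypothetical deployment names for the two tiers.
SMALL_MODEL = "small-fast"
LARGE_MODEL = "large-reasoning"

# Hypothetical signals that a prompt needs multi-step reasoning.
REASONING_HINTS = ("prove", "derive", "step by step", "debug", "compare", "design")


def estimate_complexity(prompt: str) -> float:
    """Return a rough complexity score in [0, 1] from surface features."""
    text = prompt.lower()
    words = re.findall(r"\w+", text)
    length_score = min(len(words) / 200, 1.0)  # longer prompts tend to be harder
    hint_score = 1.0 if any(h in text for h in REASONING_HINTS) else 0.0
    return 0.5 * length_score + 0.5 * hint_score


def route(prompt: str, threshold: float = 0.4) -> str:
    """Send prompts at or above the threshold to the larger model."""
    return LARGE_MODEL if estimate_complexity(prompt) >= threshold else SMALL_MODEL


print(route("What is the capital of France?"))  # small-fast
print(route("Derive the gradient of softmax cross-entropy step by step."))  # large-reasoning
```

Rules like these are cheap and transparent but brittle; they are a useful baseline before moving to a learned router.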

How does a model router work?

The obvious next question that comes to mind is: how does a model router actually work?

Nowadays, models are released rapidly, and we already have benchmark datasets to compare them. It is hard to imagine that a single model will perform best across every dataset, especially when benchmarks measure very different capabilities.

As an AI developer or architect, the real interest is knowing which model performs best for a specific task or use case.

A learned model router is typically trained on benchmark datasets to learn the relationship between prompt types and model strengths, so that it can predict which model is likely to handle a new prompt well.

Large Language Model Routing with Benchmark Datasets
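One simple way to learn that relationship is a nearest-neighbour router: for a new prompt, find the most similar benchmark prompts, check which models answered them correctly, and pick the cheapest model whose estimated success rate clears a threshold. This is a simplified sketch, not the method used by any cloud provider; the tiny benchmark table, model names, relative costs, and bag-of-words similarity are all assumptions.

```python
import math
from collections import Counter

# Hypothetical benchmark records: prompt -> which models answered correctly.
BENCHMARK = [
    ("translate hello to french", {"small": True, "large": True}),
    ("summarise this short paragraph", {"small": True, "large": True}),
    ("prove that the square root of two is irrational", {"small": False, "large": True}),
    ("solve this integral step by step", {"small": False, "large": True}),
]
COST = {"small": 1.0, "large": 10.0}  # hypothetical relative cost


def _vector(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(prompt: str, k: int = 2, min_success: float = 0.75) -> str:
    query = _vector(prompt)
    # Rank benchmark prompts by similarity and keep the k closest.
    neighbours = sorted(
        BENCHMARK,
        key=lambda rec: _cosine(query, _vector(rec[0])),
        reverse=True,
    )[:k]
    # Estimate each model's success rate on those neighbours, cheapest model first.
    for model in sorted(COST, key=COST.get):
        success = sum(results[model] for _, results in neighbours) / len(neighbours)
        if success >= min_success:
            return model
    # No model is confident enough: use the most capable (here, most expensive) one.
    return max(COST, key=COST.get)


print(route("translate goodbye to french"))       # small
print(route("solve this equation step by step"))  # large
```

Real routers replace the word-count similarity with learned embeddings or a trained classifier and use far larger benchmark sets, but the principle of predicting per-model success from past results is the same.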

Model routing in cloud platforms

Cloud providers already offer out-of-the-box model routing capabilities:

Azure AI Foundry – Model Router
AWS Bedrock – Intelligent Prompt Router

According to Microsoft documentation:

We train the router on a large, diverse dataset spanning hundreds of thousands of examples across many domains. These include question answering, code generation, mathematical reasoning. Summarization, conversations, and agentic workflows are also covered. We continuously expand the training data to keep pace with new models and capabilities.

Model Router – How it works

Open ecosystem support

OpenRouter also provides similar capabilities. It lets AI engineers optimise model usage for specific needs such as cost, quality, or speed, and can automatically fall back to alternative models when the preferred one is unavailable.
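The fallback idea is easy to picture in code. The sketch below tries an ordered list of models and moves on to the next one when a call fails. The `call_model` function, the `ModelUnavailable` exception, the model names, and the simulated outage are hypothetical stand-ins, not OpenRouter's API; a platform like OpenRouter handles this for you, and the sketch does it client-side only to show the logic.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a (hypothetical) model client when a model cannot serve the request."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, answer) from the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as exc:
            errors.append(f"{model}: {exc}")  # remember why, then try the next model
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Hypothetical client that simulates an outage on the preferred model.
def fake_call(model: str, prompt: str) -> str:
    if model == "preferred-model":
        raise ModelUnavailable("rate limited")
    return f"[{model}] answer to: {prompt}"


used, answer = complete_with_fallback("Hello", ["preferred-model", "backup-model"], fake_call)
print(used)  # backup-model
```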

Benefits of using a model router

Based on the incoming prompt, a model router identifies the most suitable model and routes the prompt to it, using smaller, less expensive models whenever they are sufficient for the task.

This leads to:

Lower inference costs
Reduced latency
More efficient and sustainable compute usage
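A quick back-of-the-envelope calculation shows where the savings come from. All numbers below are hypothetical: 1,000 prompts, 70% of which are simple enough for a small model, with illustrative per-prompt prices. They are not measured results.

```python
def routing_savings(n_prompts: int, simple_share: float, large_cost: float, small_cost: float) -> float:
    """Percentage saved versus sending every prompt to the large model."""
    baseline = n_prompts * large_cost
    routed = n_prompts * (simple_share * small_cost + (1 - simple_share) * large_cost)
    return 100 * (baseline - routed) / baseline


# Hypothetical prices: $0.010 per prompt on the large model, $0.001 on the small one.
print(round(routing_savings(1000, 0.70, 0.010, 0.001), 1))  # 63.0
```

The result simplifies to simple_share × (1 − small_cost / large_cost), so it depends on the traffic mix and the price ratio rather than on volume. It also ignores router overhead and misrouted prompts, so treat it as an idealised estimate rather than a forecast.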

The following Microsoft community blog shares the cost benefits achieved using model routing in Azure AI Foundry:

Optimising AI costs with Microsoft Foundry Model Router

Measurable cost savings across all modes:

4.5% in Balanced, 4.7% in Cost, and 14.2% in Quality mode.

Quality mode saved the most by routing simple prompts to faster, cheaper models while still directing complex requests to more capable models.

What’s next?

In the next article, I’ll provide a step-by-step guide on how a model router can be created in Microsoft AI Foundry, covering model selection strategies, routing behaviour, and practical considerations for production systems.

Will the ESB gradually die in the era of microservices?

Dee.Bee — Mon, 08 Apr 2024 10:36:13 +0000

Traditional integration aimed to solve the problem of data exchange between isolated applications. An Enterprise Service Bus (ESB) was a common tool used in this approach. It served as a centralized hub, allowing different applications to access and exchange data. However, this approach had its limitations, such as complexity in managing the ESB, lack of real-time data exchange, and difficulties in integrating with modern, cloud-based applications. This led to the evolution of new integration approaches like API-led connectivity and microservices architecture. These newer methods offer more flexibility, scalability, and efficiency in integrating diverse applications.
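To make the hub idea concrete, here is a toy, in-process sketch of an ESB-style hub: applications publish messages to one central bus, and the bus decides which registered applications receive them. The class and topic names are hypothetical, and a real ESB adds message transformation, protocol mediation, and persistence that this sketch omits.

```python
from collections import defaultdict
from typing import Callable


class ServiceBus:
    """Toy central hub: every message passes through here."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # Central routing: the bus, not the sender, decides who receives the message.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


received = []
bus = ServiceBus()
bus.subscribe("order.created", lambda m: received.append(("billing", m["id"])))
bus.subscribe("order.created", lambda m: received.append(("shipping", m["id"])))
bus.publish("order.created", {"id": 42})
print(received)  # [('billing', 42), ('shipping', 42)]
```

Because every integration runs through the one bus, the bus is where routing rules, coupling, and operational complexity accumulate.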

In the modern era, applications are moving away from monolithic architectures towards more flexible and scalable microservices architectures. These architectures break down application functionality into small, independent services, eliminating the need for a centralized data transfer point like an ESB. Instead, services communicate with each other in a decentralized manner, allowing for elastic scalability: each service can be scaled up or down independently, and functionality can be added or removed as needed.
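By contrast, in a microservices style each service owns its own data and exposes a small interface that other services call directly. The in-process sketch below stands in for what would normally be HTTP or gRPC calls between separately deployed services; the service names and methods are hypothetical.

```python
class InventoryService:
    """Owns stock data; other services only reach it through reserve()."""

    def __init__(self) -> None:
        self._stock = {"widget": 3}

    def reserve(self, item: str, qty: int) -> bool:
        if self._stock.get(item, 0) >= qty:
            self._stock[item] -= qty
            return True
        return False


class OrderService:
    """Calls the inventory service directly, with no central hub in between."""

    def __init__(self, inventory: InventoryService) -> None:
        self._inventory = inventory
        self.orders: list[tuple[str, int]] = []

    def place_order(self, item: str, qty: int) -> bool:
        if not self._inventory.reserve(item, qty):
            return False
        self.orders.append((item, qty))
        return True


orders = OrderService(InventoryService())
print(orders.place_order("widget", 2))  # True
print(orders.place_order("widget", 2))  # False: only 1 left
```

In a real system, each service could then be deployed and scaled on its own, since the only contract between them is the `reserve` interface.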

This architectural shift aligns well with agile development practices. Agile development breaks down the application development process into short, iterative sprints, each focused on delivering a complete set of functionality for a specific set of tasks. This approach naturally complements a microservices architecture, as microservices are small, discrete, and have clearly defined functions and service boundaries. This combination allows for rapid, iterative development and deployment, and facilitates continuous integration and continuous delivery (CI/CD) practices.

In conclusion, the shift towards microservices and agile development is a response to the need for more flexible, scalable, and efficient application development and deployment in today’s fast-paced digital world.

Traditional Integration:

Imagine a world where different software applications are like isolated islands. Each island has its own data, rules, and way of doing things.
Traditional integration aimed to connect these islands. It was like building bridges or tunnels between them so they could share information.
One common approach was using an Enterprise Service Bus (ESB). Think of the ESB as a central hub where data from different applications could meet and chat.
However, this often led to big, complex applications with tightly woven connections. It was like gluing puzzle pieces together to make a giant picture.
Integration was seen as part of the infrastructure—the behind-the-scenes plumbing that made everything work.

Modern Applications and Microservices:

Fast forward to today. We’re moving away from those giant puzzle pieces.
Instead of building monolithic skyscrapers, we’re creating smaller, independent buildings called microservices.
Each microservice has a specific job, like a tiny superhero with a unique power. They don’t need a central hub; they can talk directly to each other.
Imagine a city where these microservices are scattered around. They communicate flexibly, like neighbours borrowing sugar from each other.
Agile development fits right in. It’s like building one room at a time, adding features step by step. Each sprint is a mini construction project.
Microservices play well with this approach because they’re small, focused, and have clear boundaries. It’s like having separate rooms for cooking, sleeping, and playing.