Harish Kotra (he/him)
How to Dynamically Switch Local LLMs with LangChain

In production RAG applications, you often need to evaluate different models against the same prompts. Does Llama 3.2 handle formatting better? Is Gemma 3 (12B) better at reasoning?

Hardcoding model swaps or reloading your application is slow. In this post, I'll show you how to use LangChain's configurable_alternatives to build a Streamlit app where you can hot-swap models per request.

The Architecture

We want a User Interface (Streamlit) that passes a configuration object to our Logic Layer (LangChain). The Logic Layer then routes the request to the appropriate Model Provider (Ollama).

```mermaid
graph TD
    User[User via Streamlit] -->|Selects Model| UI[UI Config]
    UI -->|config='gemma'| Chain[LangChain Runnable]

    subgraph "Swappable Backend"
        Chain -->|Default| Llama[Llama 3.2]
        Chain -->|Alternative| Gemma["Gemma 3 (12B)"]
    end

    Llama --> Output
    Gemma --> Output
```

The Code Implementation

The core magic lies in the `configurable_alternatives` method, available on any `Runnable` (including LLMs).

```python
from langchain_ollama import OllamaLLM
from langchain_core.runnables import ConfigurableField

# 1. Base Model
llm = OllamaLLM(model="llama3.2")

# 2. Add Alternatives
llm_swappable = llm.configurable_alternatives(
    ConfigurableField(id="model_provider"),
    default_key="llama",
    gemma=OllamaLLM(model="gemma3:12b")
)

# 3. Invoke with Config
response = llm_swappable.invoke(
    "Why is the sky blue?",
    config={"configurable": {"model_provider": "gemma"}}
)
```
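On the Streamlit side, the key selected in a widget can be passed straight into the config dict. Here is a minimal sketch of that wiring (assuming `streamlit` and `langchain-ollama` are installed and an Ollama server is running locally; the widget labels and `build_llm` helper are my own, not from the repo):

```python
# app.py -- run with: streamlit run app.py
import streamlit as st
from langchain_ollama import OllamaLLM
from langchain_core.runnables import ConfigurableField

@st.cache_resource
def build_llm():
    # Build the swappable runnable once and cache it across Streamlit reruns
    llm = OllamaLLM(model="llama3.2")
    return llm.configurable_alternatives(
        ConfigurableField(id="model_provider"),
        default_key="llama",
        gemma=OllamaLLM(model="gemma3:12b"),
    )

model_key = st.selectbox("Model", ["llama", "gemma"])
prompt = st.text_area("Prompt", "Why is the sky blue?")

if st.button("Run"):
    # The selected key is the only thing that changes between requests
    response = build_llm().invoke(
        prompt,
        config={"configurable": {"model_provider": model_key}},
    )
    st.write(response)
```

Because the runnable is cached with `st.cache_resource`, switching models in the dropdown changes only the per-request config, not the pipeline itself.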

This approach decouples your chain definition from its execution configuration: you build the pipeline once and modify its behavior at runtime. This is essential for:

  1. A/B Testing: Randomly routing 50% of traffic to a new model.
  2. User Preference: Letting power users choose their "engine".
  3. Fallback: If the primary model times out, switch to a backup.

By combining Streamlit for the frontend and LangChain's LCEL for dynamic routing, we built a robust "Model Playground" in under 50 lines of code.

Github repo is here: https://github.com/harishkotra/langchain-ollama-cookbook
