Harish Kotra (he/him)
How to Dynamically Switch Local LLMs with LangChain

In production RAG applications, you often need to evaluate different models against the same prompts. Does Llama 3.2 handle formatting better? Is Gemma 3 (12B) better at reasoning?

Hardcoding model swaps or reloading your application is slow. In this post, I'll show you how to use LangChain's configurable_alternatives to build a Streamlit app where you can hot-swap models per request.

The Architecture

We want a User Interface (Streamlit) that passes a configuration object to our Logic Layer (LangChain). The Logic Layer then routes the request to the appropriate Model Provider (Ollama).

```mermaid
graph TD
    User[User via Streamlit] -->|Selects Model| UI[UI Config]
    UI -->|config='gemma'| Chain[LangChain Runnable]

    subgraph "Swappable Backend"
        Chain -->|Default| Llama[Llama 3.2]
        Chain -->|Alternative| Gemma["Gemma 3 (12B)"]
    end

    Llama --> Output
    Gemma --> Output
```

The Code Implementation

The core magic lies in the `configurable_alternatives` method, available on any `Runnable` (including LLMs).

```python
from langchain_ollama import OllamaLLM
from langchain_core.runnables import ConfigurableField

# 1. Base Model
llm = OllamaLLM(model="llama3.2")

# 2. Add Alternatives
llm_swappable = llm.configurable_alternatives(
    ConfigurableField(id="model_provider"),
    default_key="llama",
    gemma=OllamaLLM(model="gemma3:12b")
)

# 3. Invoke with Config
response = llm_swappable.invoke(
    "Why is the sky blue?",
    config={"configurable": {"model_provider": "gemma"}}
)
```
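On the Streamlit side, the key selected in a widget can be passed straight into the config dict. Here is a minimal sketch of that wiring (assuming `streamlit` and `langchain-ollama` are installed and an Ollama server is running locally; the widget labels and `build_llm` helper are my own, not from the repo):

```python
# app.py -- run with: streamlit run app.py
import streamlit as st
from langchain_ollama import OllamaLLM
from langchain_core.runnables import ConfigurableField

@st.cache_resource
def build_llm():
    # Build the swappable runnable once and cache it across Streamlit reruns
    llm = OllamaLLM(model="llama3.2")
    return llm.configurable_alternatives(
        ConfigurableField(id="model_provider"),
        default_key="llama",
        gemma=OllamaLLM(model="gemma3:12b"),
    )

model_key = st.selectbox("Model", ["llama", "gemma"])
prompt = st.text_area("Prompt", "Why is the sky blue?")

if st.button("Run"):
    # The selected key is the only thing that changes between requests
    response = build_llm().invoke(
        prompt,
        config={"configurable": {"model_provider": model_key}},
    )
    st.write(response)
```

Because the runnable is cached with `st.cache_resource`, switching models in the dropdown changes only the per-request config, not the pipeline itself.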

This approach decouples your chain definition from its execution configuration: you build the pipeline once and modify its behavior at runtime. This is essential for:

  1. A/B Testing: Randomly routing 50% of traffic to a new model.
  2. User Preference: Letting power users choose their "engine".
  3. Fallback: If the primary model times out, switch to a backup.

By combining Streamlit for the frontend and LangChain's LCEL for dynamic routing, we built a robust "Model Playground" in under 50 lines of code.

Github repo is here: https://github.com/harishkotra/langchain-ollama-cookbook
