In production RAG applications, you often need to evaluate different models against the same prompts. Does Llama 3.2 handle formatting better? Is Gemma 3 (12B) better at reasoning?
Hardcoding model swaps or restarting your application to switch models is slow. In this post, I'll show you how to use LangChain's `configurable_alternatives` to build a Streamlit app that hot-swaps models per request.
## The Architecture
We want a User Interface (Streamlit) that passes a configuration object to our Logic Layer (LangChain). The Logic Layer then routes the request to the appropriate Model Provider (Ollama).
```mermaid
graph TD
    User[User via Streamlit] -->|Selects Model| UI[UI Config]
    UI -->|config='gemma'| Chain[LangChain Runnable]
    subgraph "Swappable Backend"
        Chain -->|Default| Llama[Llama 3.2]
        Chain -->|Alternative| Gemma["Gemma 3 (12B)"]
    end
    Llama --> Output
    Gemma --> Output
```
## The Code Implementation
The core magic lies in the `configurable_alternatives` method, available on any Runnable (including LLMs).
```python
from langchain_ollama import OllamaLLM
from langchain_core.runnables import ConfigurableField

# 1. Base model (the default)
llm = OllamaLLM(model="llama3.2")

# 2. Register alternatives under a configurable field
llm_swappable = llm.configurable_alternatives(
    ConfigurableField(id="model_provider"),
    default_key="llama",
    gemma=OllamaLLM(model="gemma3:12b"),
)

# 3. Select the alternative per request via config
response = llm_swappable.invoke(
    "Why is the sky blue?",
    config={"configurable": {"model_provider": "gemma"}},
)
```
This approach decouples your chain definition from your execution configuration. You build the pipeline once and modify its behavior at runtime. This is essential for:
- A/B Testing: Randomly routing 50% of traffic to a new model.
- User Preference: Letting power users choose their "engine".
- Fallback: If the primary model times out, switch to a backup.
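The A/B testing case reduces to generating the config dict per request. A minimal sketch (the `pick_model_config` helper and the 50% split are assumptions for illustration):

```python
import random

def pick_model_config(split: float = 0.5, rng=random) -> dict:
    # Route a request to the challenger model with probability `split`,
    # otherwise fall through to the default key.
    key = "gemma" if rng.random() < split else "llama"
    return {"configurable": {"model_provider": key}}

# Per request:
# response = llm_swappable.invoke(prompt, config=pick_model_config())
```

Because only the config changes, the same chain serves both arms of the experiment with no reloads.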
By combining Streamlit for the frontend with LangChain's LCEL for dynamic routing, we built a robust "Model Playground" in under 50 lines of code.
The GitHub repo is here: https://github.com/harishkotra/langchain-ollama-cookbook