Building AI applications often feels like writing "glue code"—endless if/else statements and loops to manage how data flows between your Prompt, LLM, and Output Parser.
LangChain Expression Language (LCEL) solves this by giving us a declarative, composable way to build chains. It's like Unix pipes (|), but for AI.
In this post, I'll walk you through a Python demo I built using LangChain, Ollama, and the Gemma model that showcases three advanced capabilities: Routing, Parallel Execution, and Streaming Middleware.
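For reference, a single LCEL chain is just Runnables composed with the | operator. Here is a minimal sketch of the shape the rest of the demo builds on (the langchain-ollama package and the Gemma model tag are assumptions about a local setup; adjust to yours):

# Minimal LCEL chain: prompt | model | parser, running against a local Ollama model
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama

llm = ChatOllama(model="gemma2")        # whichever Gemma tag you have pulled in Ollama
prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
parser = StrOutputParser()

chain = prompt | llm | parser           # | composes Runnables left to right
print(chain.invoke({"question": "What is LCEL?"}))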
1. Intelligent Routing (Branching)
The Problem
You have one chatbot, but you want it to behave differently based on what the user asks. If they ask for code, you want a "Senior Engineer" persona. If they ask about data, you want a "Data Scientist".
The LCEL Solution: RunnableBranch
Instead of writing imperative if statements, we build a Router Chain.
- Classify Intent: We ask the LLM to categorize the input (e.g., "code", "data", "general").
- Branch: We use RunnableBranch to direct the flow to the correct sub-chain.
The Code
from langchain_core.runnables import RunnableBranch, RunnablePassthrough

# A chain that outputs "code", "data", or "general"
classifier_chain = classifier_prompt | llm | parser

# Merge the classification into the input as "intent", then route on it
routing_chain = RunnablePassthrough.assign(intent=classifier_chain) | RunnableBranch(
    (lambda x: x["intent"] == "code", code_chain),
    (lambda x: x["intent"] == "data", data_chain),
    general_chain,  # default branch when no condition matches
)
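The classifier prompt and the persona sub-chains live elsewhere in the demo; here is a hypothetical sketch of the classifier prompt, just to show its shape, plus how the router is invoked:

from langchain_core.prompts import ChatPromptTemplate

# Hypothetical classifier prompt: forces the model to answer with a single label
classifier_prompt = ChatPromptTemplate.from_template(
    "Classify the request as exactly one word: code, data, or general.\n"
    "Request: {query}\nLabel:"
)

# The router is invoked like any other Runnable
routing_chain.invoke({"query": "Write a binary search in Python"})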
The Result
When you run: python main.py routing --query "Write a binary search in Python"
Output:
[Router] Detected 'code'
def binary_search(arr, target):
    # ... concise, professional code output ...
The system automatically detected the intent and switched to the coding expert persona!
2. Parallel Fan-Out (Multi-Source RAG)
The Problem
You need to answer a question using info from multiple distinct documents (e.g., your internal wiki, API docs, and general notes). Querying them one by one is slow.
The LCEL Solution: RunnableParallel
RunnableParallel runs multiple runnables at the same time. We use it to fan our query out to three different retrievers simultaneously.
The Code
from langchain_core.runnables import RunnableParallel

# Fan the same query out to three retrievers at once
parallel_retrievers = RunnableParallel({
    "lc_docs": retriever_langchain,
    "ollama_docs": retriever_ollama,
    "misc_docs": retriever_misc,
})
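Downstream, the three result lists have to be merged into a single context before prompting the model. Here is a hypothetical sketch of that "Merger" step (merge_docs and answer_prompt are illustrative names, not the demo's exact code):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Hypothetical merger: flatten the three retriever result lists into one context block
def merge_docs(results: dict) -> str:
    docs = results["lc_docs"] + results["ollama_docs"] + results["misc_docs"]
    return "\n\n".join(doc.page_content for doc in docs)

answer_prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

rag_chain = (
    {"context": parallel_retrievers | RunnableLambda(merge_docs),
     "question": RunnablePassthrough()}
    | answer_prompt | llm | parser
)

Calling rag_chain.invoke("What is LCEL?") then behaves like any other chain, while the three retrievals run concurrently under the hood.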
The Result
When you run: python main.py parallel_rag --query "What is LCEL?"
The "Merger" step received results from all three retrievers concurrently, combined them, and the LLM answered using the full context.
3. Streaming Middleware (Real-Time Transforms)
The Problem
You are streaming the LLM's response to the user token by token, but you need to catch sensitive information (like PII) before it hits the screen.
The LCEL Solution: Generator Middleware
We can wrap the standard .astream() iterator with our own Python async generator. This acts as a "middleware" layer that can buffer, sanitize, or log the tokens in real-time.
The Code
async def middleware_stream(iterable):
    buffer = ""
    async for chunk in iterable:
        buffer += chunk
        # If the buffer contains a potential email, redact it
        if "@" in buffer:
            yield "[REDACTED_EMAIL]"
        else:
            yield buffer
        buffer = ""  # reset either way so text is never emitted twice
(Note: The actual implementation uses smarter buffering to handle split tokens)
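Hooking the middleware up is just a matter of wrapping the chain's async stream. A minimal consumer sketch, reusing the prompt / llm / parser names from the intro example:

import asyncio

async def run_stream(query: str) -> None:
    # Wrap the raw token stream from .astream() with our middleware generator
    raw_stream = (prompt | llm | parser).astream({"question": query})
    async for piece in middleware_stream(raw_stream):
        print(piece, end="", flush=True)

asyncio.run(run_stream("My email is test@example.com"))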
The Result
When you run: python main.py stream_middleware --query "My email is test@example.com"
Even though the LLM generated the real email address, our middleware caught it on the fly and replaced it before the user saw it.
This demo proves that LCEL isn't just syntactic sugar—it's a powerful framework for building complex, production-ready flows. We achieved:
- Dynamic Logic (Routing)
- Performance (Parallelism)
- Safety (Middleware)
...all using standard, composable components running locally with Ollama!


