Building AI applications often feels like writing "glue code"—endless if/else statements and loops to manage how data flows between your Prompt, LLM, and Output Parser.
LangChain Expression Language (LCEL) solves this by giving us a declarative, composable way to build chains. It's like Unix pipes (|), but for AI.
In this post, I'll walk you through a Python demo I built using LangChain, Ollama, and the Gemma model that showcases three advanced capabilities: Routing, Parallel Execution, and Streaming Middleware.
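For reference, a single LCEL chain is just Runnables composed with the | operator. Here is a minimal sketch of the shape the rest of the demo builds on (the langchain-ollama package and the Gemma model tag are assumptions about a local setup; adjust to yours):

# Minimal LCEL chain: prompt | model | parser, running against a local Ollama model
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama

llm = ChatOllama(model="gemma2")        # whichever Gemma tag you have pulled in Ollama
prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
parser = StrOutputParser()

chain = prompt | llm | parser           # | composes Runnables left to right
print(chain.invoke({"question": "What is LCEL?"}))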
1. Intelligent Routing (Branching)
The Problem
You have one chatbot, but you want it to behave differently based on what the user asks. If they ask for code, you want a "Senior Engineer" persona. If they ask about data, you want a "Data Scientist".
The LCEL Solution: RunnableBranch
Instead of writing imperative if statements, we build a Router Chain.
- Classify Intent: We ask the LLM to categorize the input (e.g., "code", "data", "general").
- Branch: We use RunnableBranch to direct the flow to the correct sub-chain.
The Code
from langchain_core.runnables import RunnableBranch, RunnablePassthrough

# A chain that outputs "code", "data", or "general"
classifier_chain = classifier_prompt | llm | parser

# Merge the classification into the input as "intent", then route on it
routing_chain = RunnablePassthrough.assign(intent=classifier_chain) | RunnableBranch(
    (lambda x: x["intent"] == "code", code_chain),
    (lambda x: x["intent"] == "data", data_chain),
    general_chain,  # default branch when no condition matches
)
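The classifier prompt and the persona sub-chains live elsewhere in the demo; here is a hypothetical sketch of the classifier prompt, just to show its shape, plus how the router is invoked:

from langchain_core.prompts import ChatPromptTemplate

# Hypothetical classifier prompt: forces the model to answer with a single label
classifier_prompt = ChatPromptTemplate.from_template(
    "Classify the request as exactly one word: code, data, or general.\n"
    "Request: {query}\nLabel:"
)

# The router is invoked like any other Runnable
routing_chain.invoke({"query": "Write a binary search in Python"})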
The Result
When you run: python main.py routing --query "Write a binary search in Python"
Output:
[Router] Detected 'code'
def binary_search(arr, target):
    # ... concise, professional code output ...
The system automatically detected the intent and switched to the coding expert persona!
2. Parallel Fan-Out (Multi-Source RAG)
The Problem
You need to answer a question using info from multiple distinct documents (e.g., your internal wiki, API docs, and general notes). Querying them one by one is slow.
The LCEL Solution: RunnableParallel
RunnableParallel runs multiple runnables at the same time. We use it to fan our query out to three different retrievers simultaneously.
The Code
from langchain_core.runnables import RunnableParallel

# Fan the same query out to three retrievers at once
parallel_retrievers = RunnableParallel({
    "lc_docs": retriever_langchain,
    "ollama_docs": retriever_ollama,
    "misc_docs": retriever_misc,
})
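Downstream, the three result lists have to be merged into a single context before prompting the model. Here is a hypothetical sketch of that "Merger" step (merge_docs and answer_prompt are illustrative names, not the demo's exact code):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Hypothetical merger: flatten the three retriever result lists into one context block
def merge_docs(results: dict) -> str:
    docs = results["lc_docs"] + results["ollama_docs"] + results["misc_docs"]
    return "\n\n".join(doc.page_content for doc in docs)

answer_prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

rag_chain = (
    {"context": parallel_retrievers | RunnableLambda(merge_docs),
     "question": RunnablePassthrough()}
    | answer_prompt | llm | parser
)

Calling rag_chain.invoke("What is LCEL?") then behaves like any other chain, while the three retrievals run concurrently under the hood.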
The Result
When you run: python main.py parallel_rag --query "What is LCEL?"
The "Merger" step received results from all three retrievers concurrently, combined them, and the LLM answered using the full context.
3. Streaming Middleware (Real-Time Transforms)
The Problem
You are streaming the LLM's response to the user token by token, but you need to catch sensitive information (like PII) before it hits the screen.
The LCEL Solution: Generator Middleware
We can wrap the standard .astream() iterator with our own Python async generator. This acts as a "middleware" layer that can buffer, sanitize, or log the tokens in real-time.
The Code
async def middleware_stream(iterable):
    buffer = ""
    async for chunk in iterable:
        buffer += chunk
        # If the buffer contains a potential email, redact it
        if "@" in buffer:
            yield "[REDACTED_EMAIL]"
        else:
            yield buffer
        buffer = ""  # reset either way so text is never emitted twice
(Note: The actual implementation uses smarter buffering to handle split tokens)
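Hooking the middleware up is just a matter of wrapping the chain's async stream. A minimal consumer sketch, reusing the prompt / llm / parser names from the intro example:

import asyncio

async def run_stream(query: str) -> None:
    # Wrap the raw token stream from .astream() with our middleware generator
    raw_stream = (prompt | llm | parser).astream({"question": query})
    async for piece in middleware_stream(raw_stream):
        print(piece, end="", flush=True)

asyncio.run(run_stream("My email is test@example.com"))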
The Result
When you run: python main.py stream_middleware --query "My email is test@example.com"
Even though the LLM generated the real email address, our middleware caught it on the fly and replaced it before the user saw it.
This demo proves that LCEL isn't just syntactic sugar—it's a powerful framework for building complex, production-ready flows. We achieved:
- Dynamic Logic (Routing)
- Performance (Parallelism)
- Safety (Middleware)
...all using standard, composable components running locally with Ollama!


