
Harish Kotra (he/him)

Mastering LangChain Expression Language (LCEL): Branching, Parallelism, and Streaming

Building AI applications often feels like writing "glue code"—endless if/else statements and loops to manage how data flows between your Prompt, LLM, and Output Parser.

LangChain Expression Language (LCEL) solves this by giving us a declarative, composable way to build chains. It's like Unix pipes (|), but for AI components.

In this post, I'll walk you through a Python demo I built using LangChain, Ollama, and the Gemma model that showcases three advanced capabilities: Routing, Parallel Execution, and Streaming Middleware.
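
Before diving in, here is the smallest possible LCEL chain so the | syntax is concrete. This is a generic sketch of the setup, and the model tag is a placeholder for whichever Gemma build you have pulled into Ollama:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

# Any local model served by Ollama works; "gemma3" is just a placeholder tag
llm = ChatOllama(model="gemma3")
prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
parser = StrOutputParser()

# Prompt -> LLM -> output parser, composed with the pipe operator
chain = prompt | llm | parser
print(chain.invoke({"topic": "LCEL"}))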

1. Intelligent Routing (Branching)

The Problem

You have one chatbot, but you want it to behave differently based on what the user asks. If they ask for code, you want a "Senior Engineer" persona. If they ask about data, you want a "Data Scientist".

The LCEL Solution: RunnableBranch

Instead of writing imperative if statements, we build a Router Chain.

  1. Classify Intent: We ask the LLM to categorize the input (e.g., "code", "data", "general").
  2. Branch: We use RunnableBranch to direct the flow to the correct sub-chain.

The Code

from langchain_core.runnables import RunnableBranch

# A chain that outputs "code", "data", or "general"
classifier_chain = classifier_prompt | llm | parser

# Route based on the classified intent; the last entry is the default branch
routing_chain = RunnableBranch(
    (lambda x: x["intent"] == "code", code_chain),
    (lambda x: x["intent"] == "data", data_chain),
    general_chain,
)
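
One piece the snippet above glosses over is how the classifier's answer ends up in x["intent"]. Here is one way to wire it, with a sample classifier prompt; the prompt wording and the RunnablePassthrough.assign glue are a sketch, not necessarily the exact code in the repo:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Ask for a single-word label that the branch conditions can match on
classifier_prompt = ChatPromptTemplate.from_template(
    "Classify the following request as exactly one of: code, data, general.\n"
    "Request: {query}\n"
    "Respond with only the single word."
)

# Attach the classifier's output to the input dict under the "intent" key
full_chain = RunnablePassthrough.assign(intent=classifier_chain) | routing_chain
print(full_chain.invoke({"query": "Write a binary search in Python"}))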

The Result

When you run: python main.py routing --query "Write a binary search in Python"

Output:

[Router] Detected 'code'

def binary_search(arr, target):
    # ... concise, professional code output ...

The system automatically detected the intent and switched to the coding expert persona!


2. Parallel Fan-Out (Multi-Source RAG)

The Problem

You need to answer a question using info from multiple distinct documents (e.g., your internal wiki, API docs, and general notes). Querying them one by one is slow.

The LCEL Solution: RunnableParallel

RunnableParallel runs multiple runnables concurrently on the same input. We use it to fan out our query to three different retrievers at once.

The Code

from langchain_core.runnables import RunnableParallel

# Send the same query to all three retrievers concurrently
parallel_retrievers = RunnableParallel({
    "lc_docs": retriever_langchain,
    "ollama_docs": retriever_ollama,
    "misc_docs": retriever_misc,
})

The Result

When you run: python main.py parallel_rag --query "What is LCEL?"

Output:

(Screenshot: Parallel Fan-Out output)

The "Merger" step received results from all three retrievers as soon as the slowest one finished, combined them, and the LLM answered using the full context.
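
The merge-and-answer step can itself be expressed in LCEL. Here is a minimal sketch, assuming an answer_prompt with {context} and {question} slots; both the prompt and the merge_docs helper are illustrative rather than the repo's exact code:

from langchain_core.runnables import RunnableLambda, RunnablePassthrough

def merge_docs(results: dict) -> str:
    # Flatten the three retrievers' document lists into one context string
    all_docs = results["lc_docs"] + results["ollama_docs"] + results["misc_docs"]
    return "\n\n".join(doc.page_content for doc in all_docs)

# Fan out the raw query string, merge the hits, then answer with the full context
rag_chain = (
    {
        "context": parallel_retrievers | RunnableLambda(merge_docs),
        "question": RunnablePassthrough(),
    }
    | answer_prompt
    | llm
    | parser
)

print(rag_chain.invoke("What is LCEL?"))

Because the fan-out is a RunnableParallel, the three retrievals overlap in time instead of running back-to-back.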

3. Streaming Middleware (Real-Time Transforms)

The Problem

You are streaming the LLM's response to the user token by token, but you need to catch sensitive information (like PII) before it hits the screen.

The LCEL Solution: Generator Middleware

We can wrap the standard .astream() iterator with our own Python async generator. This acts as a "middleware" layer that can buffer, sanitize, or log the tokens in real time.

The Code

async def middleware_stream(iterable):
    buffer = ""
    async for chunk in iterable:
        buffer += chunk
        # If the buffer contains a potential email, redact it
        if "@" in buffer:
            yield "[REDACTED_EMAIL]"
        else:
            yield buffer
        # Reset after every yield so text isn't emitted twice
        buffer = ""

(Note: The actual implementation uses smarter buffering to handle split tokens)
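
You can exercise the middleware without a model at all by feeding it a hand-rolled async generator. The tokens below are made up for illustration; in the real demo the source is the chain's .astream(...) iterator:

import asyncio

async def fake_token_stream():
    # Simulated LLM chunks; the email arrives as a single chunk here
    for token in ["Sure! ", "My email ", "is ", "test@example.com", " anyway."]:
        yield token

async def main():
    async for piece in middleware_stream(fake_token_stream()):
        print(piece, end="", flush=True)
    print()

asyncio.run(main())

This prints "Sure! My email is [REDACTED_EMAIL] anyway.", which is the same behaviour the stream_middleware command demonstrates end-to-end below.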

The Result

When you run: python main.py stream_middleware --query "My email is test@example.com"

Output:

(Screenshot: Generator Middleware output)

Even though the LLM generated the real email, our middleware caught it on the fly and replaced it before the user saw it.

This demo shows that LCEL isn't just syntactic sugar: it's a powerful framework for building complex, production-ready flows. We achieved:

  1. Dynamic Logic (Routing)
  2. Performance (Parallelism)
  3. Safety (Middleware)

...all using standard, composable components running locally with Ollama!

GitHub: https://github.com/harishkotra/langchain-lcel-demo
