The problem with chains in production
Every major Python LLM framework gives you the same primitive: a chain.
LangChain's LCEL. LlamaIndex's pipeline. Haystack's components. They all model your pipeline as a linear sequence of steps — input flows through A, then B, then C, output comes out the end.
For a hello-world RAG demo, that's fine. For a production system, you hit the wall fast.
What chains can't express cleanly
Here's a real pipeline I needed to build:
- Classify the incoming query
- Based on classification: route to either semantic search, keyword search, or both in parallel
- If both: merge and re-rank results
- Generate response from ranked context
- If any retrieval stage fails: surface a clear error, don't silently continue
Try expressing that as a chain. You end up with nested chains, manual asyncio.gather() calls outside the framework, try/except blocks swallowing exceptions to keep the chain going, and no clean way to express "step D depends on both B and C."
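To make that concrete, here is the kind of hand-rolled glue code you end up writing outside the chain. The search functions are illustrative stand-ins, not any real framework's API:

```python
import asyncio

# Hypothetical stand-ins for the real retrieval steps.
async def semantic_search(query: str) -> list[str]:
    return [f"semantic:{query}"]

async def keyword_search(query: str) -> list[str]:
    return [f"keyword:{query}"]

async def retrieve(query: str, mode: str) -> list[str]:
    # The routing, fan-out, and error handling all live in hand-written
    # glue code instead of in the pipeline definition.
    if mode == "both":
        try:
            sem, kw = await asyncio.gather(
                semantic_search(query), keyword_search(query)
            )
        except Exception as exc:
            # Easy to get wrong: swallowing this instead keeps "the chain"
            # going but silently degrades the downstream context.
            raise RuntimeError(f"retrieval failed: {exc}") from exc
        return sem + kw
    if mode == "semantic":
        return await semantic_search(query)
    return await keyword_search(query)

merged = asyncio.run(retrieve("demo", "both"))
print(merged)  # ['semantic:demo', 'keyword:demo']
```

Every new branch or dependency means more of this glue, and none of it is visible to the framework.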
The abstraction is fighting you.
DAGs are the right model
A directed acyclic graph expresses all of this naturally.
Nodes are tasks. Edges are data dependencies. Execution is topologically ordered — a node fires when all its upstream dependencies have resolved.
```python
from synapsekit import Pipeline

pipeline = Pipeline()

pipeline.add_node("classify", ClassifierNode())
pipeline.add_node("semantic_search", RAGNode(store=vector_store))
pipeline.add_node("keyword_search", BM25Node(index=bm25_index))
pipeline.add_node("rerank", RerankerNode())
pipeline.add_node("generate", LLMNode(model="gpt-4o"))

pipeline.add_edge("classify", "semantic_search")
pipeline.add_edge("classify", "keyword_search")
pipeline.add_edge("semantic_search", "rerank")
pipeline.add_edge("keyword_search", "rerank")
pipeline.add_edge("rerank", "generate")

result = await pipeline.run(query="explain async-native design")
```
semantic_search and keyword_search run concurrently. rerank waits for both. generate waits for rerank. The execution engine handles ordering. You describe the dependencies.
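Under the hood, that ordering is just a topological grouping of the graph: nodes with no remaining upstream dependencies form the next level. A minimal plain-Python sketch of the idea (illustrative, not SynapseKit's actual internals):

```python
def topological_levels(edges: dict[str, list[str]]) -> list[list[str]]:
    """Group nodes into levels; each node lands one level after its deepest dependency."""
    # edges maps a node to its downstream dependents.
    nodes = set(edges) | {d for ds in edges.values() for d in ds}
    indegree = {n: 0 for n in nodes}
    for ds in edges.values():
        for d in ds:
            indegree[d] += 1
    level = [n for n in nodes if indegree[n] == 0]
    levels = []
    while level:
        levels.append(sorted(level))  # sorted only for stable demo output
        nxt = []
        for n in level:
            for d in edges.get(n, []):
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        level = nxt
    return levels

graph = {
    "classify": ["semantic_search", "keyword_search"],
    "semantic_search": ["rerank"],
    "keyword_search": ["rerank"],
    "rerank": ["generate"],
}
print(topological_levels(graph))
# [['classify'], ['keyword_search', 'semantic_search'], ['rerank'], ['generate']]
```

Everything inside one level is safe to run concurrently, which is exactly what the example pipeline above does with the two search nodes.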
Failure propagation that actually works
In a chain, a failed step either kills the whole pipeline or gets swallowed silently.
In a DAG, failure has semantics. If semantic_search fails, rerank — which depends on it — is cancelled. generate — which depends on rerank — is also cancelled. You get a clear error naming the failed node and its dependents.
No silent degradation. Failure is explicit and traceable.
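The cancellation rule itself is simple: walk the graph downstream from the failed node and skip everything you reach. A plain-Python sketch of that walk (illustrative, not SynapseKit internals):

```python
def transitive_dependents(edges: dict[str, list[str]], failed: str) -> set[str]:
    """Collect every node downstream of `failed`, directly or transitively."""
    seen: set[str] = set()
    stack = list(edges.get(failed, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen

graph = {
    "classify": ["semantic_search", "keyword_search"],
    "semantic_search": ["rerank"],
    "keyword_search": ["rerank"],
    "rerank": ["generate"],
}
cancelled = transitive_dependents(graph, "semantic_search")
print(f"semantic_search failed; cancelled: {sorted(cancelled)}")
# semantic_search failed; cancelled: ['generate', 'rerank']
```

Because the blast radius is computed from the graph, the error report can name both the failed node and everything that was skipped because of it.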
The async piece
Nodes with no dependency relationship between them run concurrently automatically. The execution engine handles asyncio.gather() at each topological level. You write individual async node functions. The graph handles orchestration.
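Conceptually, the engine awaits each topological level as one batch. A simplified sketch of that loop, assuming the levels are already computed (again, not the real execution engine):

```python
import asyncio

async def run_levels(levels, tasks):
    """Await each level's tasks together; nodes in a level share no dependencies."""
    results = {}
    for level in levels:
        outputs = await asyncio.gather(*(tasks[name]() for name in level))
        results.update(zip(level, outputs))
    return results

# Hypothetical no-op node bodies, for illustration.
async def make_task(name):
    return f"{name}:done"

levels = [["classify"], ["semantic_search", "keyword_search"], ["rerank"], ["generate"]]
tasks = {name: (lambda n=name: make_task(n)) for lvl in levels for name in lvl}
results = asyncio.run(run_levels(levels, tasks))
print(results["rerank"])  # rerank:done
```

The gather call lives in the engine once, instead of being scattered through application code.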
```python
class SemanticSearchNode(Node):
    async def execute(self, inputs):
        results = await self.vector_store.asearch(inputs["query"])
        return {"results": results}
```
No manual gather calls. The graph structure encodes the parallelism.
Is it worth the complexity?
For simple A → B → C pipelines with no branching: a chain is fine.
The moment you have parallel retrieval, conditional routing, or stages where failure isolation matters — the chain abstraction costs more than it saves.
SynapseKit is the framework I built around this model:
https://github.com/SynapseKit/SynapseKit
API docs: https://synapsekit.github.io/synapsekit-docs/
10k PyPI downloads since launch. The engineers who need this know exactly why.
What does your production RAG architecture look like? Drop it in the comments.