
Jovan Chan

Posted on • Originally published at aifoss.dev

LangChain vs LlamaIndex vs Haystack 2026: Which to Use

This article was originally published on aifoss.dev

TL;DR: LangGraph (LangChain's agent layer) handles multi-step agents with tool calls and persistent memory better than the alternatives. LlamaIndex ships a working RAG pipeline faster with lower overhead. Haystack wins when you need every pipeline step to be serializable, testable, and auditable. All three work with Ollama for fully local deployments.

| | LangGraph / LangChain | LlamaIndex | Haystack |
|---|---|---|---|
| Best for | Multi-step agents, tool calls, persistent memory | RAG-first projects, multi-modal retrieval | Production teams, auditable pipelines |
| Current version | LangChain 1.3.2 / LangGraph 1.2.2 | 0.14.22 (May 2026) | 2.29.0 (May 2026) |
| License | MIT | MIT | Apache 2.0 |
| Framework overhead | ~10 ms (LangChain) / ~14 ms (LangGraph) | ~6 ms | ~5.9 ms |
| The catch | Higher latency, largest abstraction surface | Agent support is secondary | Smaller community, verbose wiring |

Honest take: Start with LlamaIndex for a new RAG project — you'll have something working in 30 minutes. Move to LangGraph if you outgrow it, or use them together.


What changed in 2026

Three years ago, choosing between these frameworks was mostly a style preference. Today they've diverged into genuinely different tools making different primary bets.

LangChain hit its v1.0 stable milestone in October 2025 with LangGraph positioned as the primary interface for any non-trivial workflow. LangChain 1.3.2 + LangGraph 1.2.2 is the current production combo. LangGraph adds checkpointing (agents survive server restarts), fine-grained node execution control with per-node timeouts and error recovery, and a content-block-centric streaming API. The ecosystem has over 100K GitHub stars and the largest community of the three by a significant margin. The flip side: LangGraph adds another abstraction layer on top of an already-layered stack. If you're building a semantic search API without agents, that complexity is unnecessary.
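To make the checkpointing idea concrete, here is a minimal, framework-free sketch in plain Python. It is not LangGraph's API: `run_with_checkpoints`, the `plan` and `act` steps, and the JSON file layout are all hypothetical names chosen for illustration. The point is only that saving state after every step lets a restarted process resume where it stopped instead of redoing work.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoints(steps, state, checkpoint_path):
    """Run (name, fn) steps in order, saving state to disk after each one.

    If a checkpoint file already exists, resume after the last completed step
    and ignore the `state` argument in favour of the saved state.
    """
    path = Path(checkpoint_path)
    if path.exists():
        saved = json.loads(path.read_text())
        state, done = saved["state"], saved["done"]
    else:
        done = 0
    for i, (name, fn) in enumerate(steps):
        if i < done:
            continue  # this step finished before the restart
        state = fn(state)
        path.write_text(json.dumps({"state": state, "done": i + 1}))
    return state


calls = []  # records which steps actually executed


def plan(state):
    calls.append("plan")
    return {**state, "plan": "search docs"}


def act(state):
    calls.append("act")
    return {**state, "result": "ran: " + state["plan"]}


checkpoint = Path(tempfile.mkdtemp()) / "agent.json"
steps = [("plan", plan), ("act", act)]

# First process: only the plan step completes before the "crash".
run_with_checkpoints(steps[:1], {"question": "key findings?"}, checkpoint)

# Restarted process: picks up the saved state and runs only the act step.
final = run_with_checkpoints(steps, {}, checkpoint)
print(final["result"])
print(calls)
```

A real agent framework would also need to handle concurrent runs, non-JSON state, and partial failures inside a step; this sketch leaves all of that out.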

LlamaIndex is at v0.14.22 (released May 14, 2026) and has repositioned itself as an "agentic document and OCR platform" rather than just a query library. The core remains the best-in-class retrieval pipeline — VectorStoreIndex, HybridRetriever, and SubQuestionQueryEngine are production-grade with minimal setup. Multi-modal retrieval (text and images in the same query pipeline) works properly now; it was research-grade in v0.10. The framework has ~40K GitHub stars and a mature integration ecosystem. The Ollama packages (llama-index-llms-ollama, llama-index-embeddings-ollama) have been stable for over a year.

Haystack is at v2.29.0 (released May 12, 2026) and is a complete architectural rewrite from v1. Every pipeline is a typed directed acyclic graph (DAG) where each component declares its inputs and outputs explicitly, pipelines serialize to YAML for version control and deployment, and the observability stack (OpenTelemetry, Langfuse, MLflow) is first-class rather than bolted on. v2.29.0 builds on the State injection feature from v2.28.0, which lets components access and modify live agent state at invocation time. Haystack has ~15K GitHub stars — smaller community means harder debugging when you hit edge cases.
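The value of declared inputs and outputs is easiest to see in a toy version. The sketch below is plain Python, not Haystack's implementation: `Component`, `TypedPipeline`, and the `splitter` and `counter` steps are hypothetical, and it only handles a linear chain rather than a general DAG. It shows two properties described above: a mis-wired connection fails when the pipeline is built rather than when it runs, and the graph reduces to plain data that could be dumped to YAML.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    """A pipeline step that declares its input and output types up front."""
    name: str
    input_type: type
    output_type: type
    run: Callable


class TypedPipeline:
    def __init__(self):
        self.components = {}
        self.edges = []  # (sender, receiver) pairs in insertion order

    def add_component(self, component):
        self.components[component.name] = component

    def connect(self, sender, receiver):
        # Validate the connection at build time, before any data flows.
        out_t = self.components[sender].output_type
        in_t = self.components[receiver].input_type
        if out_t is not in_t:
            raise TypeError(
                f"{sender} outputs {out_t.__name__}, "
                f"but {receiver} expects {in_t.__name__}"
            )
        self.edges.append((sender, receiver))

    def to_dict(self):
        """Plain-data view of the graph, ready for a YAML or JSON dump."""
        return {
            "components": sorted(self.components),
            "connections": [list(edge) for edge in self.edges],
        }

    def run(self, first, value):
        """Run a linear chain starting at `first`, following connections."""
        current = first
        while True:
            value = self.components[current].run(value)
            following = [r for s, r in self.edges if s == current]
            if not following:
                return value
            current = following[0]


pipe = TypedPipeline()
pipe.add_component(Component("splitter", str, list, lambda text: text.split(". ")))
pipe.add_component(Component("counter", list, int, len))
pipe.connect("splitter", "counter")

result = pipe.run("splitter", "One. Two. Three")
print(result)
print(pipe.to_dict())
```

Calling `pipe.connect("counter", "splitter")` on this pipeline raises a `TypeError`, because `counter` emits an `int` and `splitter` expects a `str`.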


Minimal RAG pipeline in each

Same task for all three: index a folder of PDFs, run a semantic query, return an answer.

LlamaIndex — fewest lines to a working pipeline

```shell
pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama
```

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Set the LLM and embedding model once; downstream components pick them up.
Settings.llm = Ollama(model="llama3.2", request_timeout=60.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load every supported file in ./docs and build an in-memory vector index.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What are the key findings?")
print(response)
```

SimpleDirectoryReader handles PDFs, DOCX, TXT, and HTML out of the box. VectorStoreIndex defaults to in-memory storage; swapping in Chroma, Qdrant, or Weaviate is a small change: construct the vector store and pass it to the index through a StorageContext. The global Settings object propagates your LLM and embedding choices to every downstream component, so you set them once.
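For intuition about what an in-memory vector index does under the hood, here is a toy version in plain Python. It is a sketch under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, and `InMemoryIndex` is a hypothetical class, not LlamaIndex's VectorStoreIndex. The mechanics are the same in spirit, though: embed every document once, embed the query, and rank by cosine similarity.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real pipeline calls an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    def __init__(self, texts):
        # Embed each document once at index time.
        self.entries = [(text, embed(text)) for text in texts]

    def query(self, question, top_k=1):
        # Embed the question and return the top_k most similar documents.
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


index = InMemoryIndex([
    "revenue grew twelve percent in the third quarter",
    "the office moved to a new building",
])
top = index.query("what happened to revenue in the third quarter")
print(top)
```

Everything lives in a Python list, which is why this kind of index disappears when the process exits and why a persistent vector store becomes necessary beyond prototyping.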

For a deeper look at how chunking and retrieval choices affect output quality, see RAG Architecture Deep Dive 2026.

LangChain — more explicit, more imports

```shell
pip install langchain langchain-classic langchain-text-splitters langchain-ollama langchain-community chromadb pypdf
```

```python
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_community.document_loaders import PyPDFDirectoryLoader
# Text splitters live in their own package, langchain_text_splitters.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
# Assumption: on LangChain 1.x, legacy chains such as RetrievalQA ship in
# langchain-classic; on 0.x releases, import from langchain.chains instead.
from langchain_classic.chains import RetrievalQA

llm = OllamaLLM(model="llama3.2")
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Load every PDF in ./docs.
loader = PyPDFDirectoryLoader("./docs")
docs = loader.load()

# Chunking is an explicit step: 1000-character chunks with 200 characters of overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

vectorstore = Chroma.from_documents(chunks, embeddings)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)

result = qa_chain.invoke({"query": "What are the key findings?"})
print(result["result"])
```

More lines, more imports, same result. The explicit text-splitter step is a real difference — LangChain doesn't auto-chunk on ingest. That's explicit control, not a bug, but it's friction you don't hit with LlamaIndex until you need to tune chunk sizes for production.
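To see what `chunk_size` and `chunk_overlap` actually control, here is a simplified character-window splitter in plain Python. It is a sketch, not LangChain's algorithm: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this hypothetical `chunk_text` helper ignores. It only shows how the two numbers interact.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']

# With the settings from the example above, 2000 characters yield 3 chunks.
big = chunk_text("x" * 2000, chunk_size=1000, chunk_overlap=200)
print(len(big))  # 3
```

Larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store, which is the trade-off you end up tuning for production.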

Haystack — maximum explicitness

```shell
pip install haystack-ai ollama-haystack
```

```python
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.ollama import (
    OllamaDocumentEmbedder, OllamaTextEmbedder
)
from haystack_integrations.components.generators.ollama import OllamaGenerator

document_store = InMemoryDocumentStore()

# Step 1: indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=10))
indexing.add_component("embedder", OllamaDocumentEmbedder(model="nomic-embed-text"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")
indexing.run({"converter": {"sources": ["./docs/report.pdf"]}})

# Step 2: query pipeline
TEMPLATE = """
Answer based on the documents below.
Documents: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ question }}
"""
querying = Pipeline()
querying.add_component("embedder", OllamaTextEmbedder(model="nomic-embed-text"))
querying.add_component("retriever", InMemoryEmbeddingRetriever(document_store))
querying.add_component("prompt_builder", PromptBuilder(template=TEMPLATE))
querying.add_component("generator", OllamaGenerator(model="llama3.2"))
querying.connect("embedder.embedding", "retriever.query_embedding")
querying.connect("retriever.documents", "prompt_builder.documents")
querying.connect("prompt_builder", "generator.prompt")

# The question goes to both the embedder (for retrieval) and the prompt builder.
result = querying.run({
    "embedder": {"text": "What are the key findings?"},
    "prompt_builder": {"question": "What are the key findings?"}
})
print(result["generator"]["replies"][0])
```

Verbose? Yes. But every connection is visible and traceable. If the pipe
