Serhii Kalyna

Posted on May 19 • Originally published at kalyna.pro

LangChain vs LlamaIndex: Which Python AI Framework to Choose (2026)

#ai #langchain #python #machinelearning

Choosing between LangChain and LlamaIndex is one of the first decisions you face when building a Python AI application in 2026. Both frameworks accelerate LLM development, both are open source, and both have active communities — but they solve different problems.

What Is LangChain?

LangChain is a framework for building applications powered by language models. Its core idea is composability: prompts, models, tools, memory, and agents all share one Runnable interface, and you chain them together with LCEL (LangChain Expression Language).

LangChain's core strengths:

Agents and tool use — first-class support for giving LLMs access to tools (web search, calculators, APIs)
Memory — conversation history, entity memory, and summary memory built in
Broad integrations — 100+ LLM providers, 50+ vector stores, dozens of data tools
LangSmith — first-party tracing, evaluation, and dataset management platform
LCEL — composable, streaming-native, async-first pipeline syntax
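
To make the composability idea concrete, here is a plain-Python sketch of how a `|` operator can chain steps. It is an illustration only, not LangChain's implementation: the `Step` class and the three toy steps are hypothetical stand-ins for a prompt, a model, and an output parser, and no LLM is called.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a composable pipeline step."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Fake "prompt", "model", and "parser" steps (assumptions for illustration)
prompt = Step(lambda inputs: f"Q: {inputs['question']}")
model = Step(lambda text: {"content": text.upper()})
parser = Step(lambda message: message["content"])

chain = prompt | model | parser
result = chain.invoke({"question": "what is lcel?"})
print(result)
```

Because every step exposes the same `invoke` method, the composed chain can be used anywhere a single step can, which is the property that makes this style of pipeline easy to extend.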

What Is LlamaIndex?

LlamaIndex (formerly GPT Index) is a data framework for LLM applications. Its core idea is data ingestion and indexing: you load documents from any source, index them efficiently, and query the index with natural language.

LlamaIndex's core strengths:

Document ingestion — loaders for PDFs, Word docs, databases, APIs, cloud storage, and SaaS tools
Advanced retrieval — hybrid search, re-ranking, recursive retrieval, query decomposition
Index types — vector, keyword, tree, summary, and knowledge graph indexes in one API
Query engines — natural language interfaces over structured and unstructured data
Agents — ReAct and OpenAI-style agents that reason over indexes and call tools
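
Hybrid search, mentioned in the list above, combines a keyword ranking with a vector ranking. One common way to merge the two is reciprocal rank fusion (RRF). The sketch below is plain Python rather than LlamaIndex's API; the document ids and both rankings are made up, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list earn a larger score share
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up results from a keyword search and a vector search
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise above those found by only one retriever, which is the behaviour hybrid search is after.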

LangChain vs LlamaIndex: Key Differences

Aspect	LangChain	LlamaIndex
Primary focus	LLM orchestration & agents	Data indexing & RAG
Core abstraction	Chain / Runnable	Index / QueryEngine
Best for	Multi-step agents, tool use, memory	RAG over private docs, structured Q&A
Learning curve	Steeper (many abstractions)	Gentler for RAG, opinionated defaults
Ecosystem	Broader (agents, tools, evaluation)	Deeper on retrieval and indexing

When to Use LangChain

Choose LangChain when you need multi-step pipelines, agents, or conversation memory.

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatAnthropic(model="claude-sonnet-4-6")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer."),
    ("human", "{question}"),
])

# LCEL: the prompt output feeds the model, whose reply is parsed to a plain string
chain = prompt | llm | StrOutputParser()

answer = chain.invoke({"question": "What is LCEL in LangChain?"})
print(answer)

Adding Retrieval to a LangChain Chain

from langchain_anthropic import ChatAnthropic
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

llm = ChatAnthropic(model="claude-sonnet-4-6")
prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)

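def format_docs(docs):
    # Join page contents so the prompt receives text, not Document reprs
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)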

print(rag_chain.invoke("What topics are covered?"))

When to Use LlamaIndex

Choose LlamaIndex when you need to ingest large document collections or apply advanced retrieval.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load all documents from a folder (PDF, TXT, DOCX, HTML, ...)
documents = SimpleDirectoryReader("./docs").load_data()

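# Build a vector index: chunking, embedding, and storage are handled automatically.
# With no Settings overrides, LlamaIndex defaults to OpenAI models, so OPENAI_API_KEY must be set.
index = VectorStoreIndex.from_documents(documents)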

# Query with natural language
query_engine = index.as_query_engine()
response = query_engine.query("What are the main topics in these documents?")
print(response)
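
The "handled automatically" step includes a chunking pass that splits each document into overlapping pieces before embedding. As a rough picture of that idea, here is a plain-Python sketch of fixed-size word windows with overlap. It is an assumption-laden toy: `chunk_words` is a hypothetical helper, and LlamaIndex's own splitters are more sophisticated than this.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into word windows that share `overlap` words with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten eleven twelve"
chunks = chunk_words(sample, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.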

Using LlamaIndex with Claude

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

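# Settings is global: indexes and query engines created after this use these models.
# Requires the llama-index-llms-anthropic and llama-index-embeddings-huggingface packages.
Settings.llm = Anthropic(model="claude-sonnet-4-6")
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")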

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Summarize the key findings.")
print(response)
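
The `similarity_top_k` argument controls how many of the most similar chunks are passed to the LLM. The following plain-Python sketch shows the underlying idea with cosine similarity. It is not LlamaIndex code: word counts stand in for real embeddings, and `cosine`, `embed`, and `top_k` are hypothetical helpers.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts instead of a learned vector
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, passages: list[str], k: int) -> list[str]:
    """Return the k passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:k]


passages = [
    "the cat sat on the mat",
    "dogs chase the ball",
    "stock prices fell sharply",
]
best = top_k("where did the cat sit", passages, k=2)
print(best)
```

A larger `k` gives the model more context to work with but also more irrelevant text and a longer prompt, so it is worth tuning per corpus.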

Can You Use Both Together?

Yes — use LlamaIndex for retrieval and LangChain for orchestration:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# LlamaIndex: build the index and retrieve context
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=4)


def retrieve_context(question: str) -> str:
    nodes = retriever.retrieve(question)
    return "\n\n".join(n.get_content() for n in nodes)


# LangChain: format and generate the answer
llm = ChatAnthropic(model="claude-sonnet-4-6")
prompt = ChatPromptTemplate.from_template(
    "Answer using only the context.\n\nContext:\n{context}\n\nQuestion: {question}"
)
chain = prompt | llm | StrOutputParser()

question = "What deployment options are available?"
context = retrieve_context(question)
answer = chain.invoke({"context": context, "question": question})
print(answer)

This separation lets you upgrade each layer independently.
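
One way to keep that separation explicit is to have the orchestration code depend on a small interface rather than on either library. The sketch below assumes hypothetical names (`Retriever`, `KeywordRetriever`, `answer`, `fake_llm`); the toy retriever and the fake generator stand in for a LlamaIndex retriever and a LangChain chain.

```python
from typing import Callable, Protocol


class Retriever(Protocol):
    def retrieve(self, question: str) -> list[str]: ...


class KeywordRetriever:
    """Toy retriever; a LlamaIndex-backed class could expose the same method."""

    def __init__(self, passages: list[str]) -> None:
        self.passages = passages

    def retrieve(self, question: str) -> list[str]:
        terms = set(question.lower().split())
        return [p for p in self.passages if terms & set(p.lower().split())]


def answer(question: str, retriever: Retriever, generate: Callable[[str], str]) -> str:
    # Orchestration only knows the interface, so either side can be swapped
    context = "\n\n".join(retriever.retrieve(question))
    return generate(f"Context:\n{context}\n\nQuestion: {question}")


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; simply echoes what it was given
    return f"ANSWER based on -> {prompt}"


passages = ["Docker deployment is supported.", "Billing runs monthly."]
reply = answer("Which deployment options exist?", KeywordRetriever(passages), fake_llm)
print(reply)
```

Swapping the retrieval layer then means writing another class with a `retrieve` method; the `answer` function does not change.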

Performance and Ecosystem

Streaming — both stream tokens natively; LangChain's LCEL is explicitly streaming-first
Async — LangChain: chain.ainvoke(); LlamaIndex: query_engine.aquery()
Batching — LangChain's chain.batch() parallelises multiple inputs automatically
Community — LangChain ~90k GitHub stars; LlamaIndex ~37k stars, more enterprise-focused
API stability — both stabilised after major rewrites and are production-ready in 2026
Shared ecosystem — both integrate with Chroma, Pinecone, Weaviate, Qdrant, pgvector, and every major LLM provider
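
The async entry points listed above pay off when several queries run concurrently. Here is a plain `asyncio` sketch of that pattern with a concurrency cap. `fake_query` is a hypothetical stand-in for an awaitable call such as `chain.ainvoke()` or `query_engine.aquery()`; no framework code is involved.

```python
import asyncio


async def fake_query(question: str) -> str:
    # Stand-in for an awaitable LLM or query-engine call
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def run_all(questions: list[str], max_concurrency: int = 2) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(question: str) -> str:
        async with semaphore:  # at most max_concurrency calls in flight
            return await fake_query(question)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(q) for q in questions))


questions = ["What is RAG?", "What is LCEL?", "What is an agent?"]
answers = asyncio.run(run_all(questions))
print(answers)
```

The semaphore is the design choice worth noting: unbounded concurrency is an easy way to hit provider rate limits, so capping in-flight calls is usually safer than firing everything at once.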

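In RAG accuracy benchmarks, LlamaIndex's default retrieval pipeline often matches or beats LangChain's equivalent with less configuration, but results vary with the corpus, chunking settings, and embedding model, so measure on your own data before committing.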

Choosing the Right Tool

Need agents that call tools dynamically? → LangChain
Building a RAG pipeline over a large document collection? → LlamaIndex
Need advanced retrieval: hybrid search, re-ranking, recursive? → LlamaIndex
Need conversation memory or multi-turn dialogue? → LangChain
Q&A over SQL, CSV, or Pandas DataFrames? → LlamaIndex
Orchestrating multiple LLM calls with conditional branching? → LangChain
Want minimal boilerplate for a quick RAG prototype? → LlamaIndex
Production system needing both rich retrieval and agent behaviour? → Both
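
The checklist above can also be written down as a small lookup, which is handy when comparing several projects at once. This is only a sketch of the article's rules of thumb: the requirement labels and the `recommend` function are hypothetical names chosen for illustration.

```python
# Requirement labels (assumed names) grouped by the framework the checklist suggests
LANGCHAIN_NEEDS = {"agents", "memory", "branching"}
LLAMAINDEX_NEEDS = {"rag", "advanced_retrieval", "structured_qa", "quick_prototype"}


def recommend(needs: set[str]) -> str:
    """Map a set of requirement labels to the framework suggested by the checklist."""
    unknown = needs - LANGCHAIN_NEEDS - LLAMAINDEX_NEEDS
    if unknown:
        raise ValueError(f"unknown requirement labels: {sorted(unknown)}")
    wants_langchain = bool(needs & LANGCHAIN_NEEDS)
    wants_llamaindex = bool(needs & LLAMAINDEX_NEEDS)
    if wants_langchain and wants_llamaindex:
        return "Both"
    if wants_langchain:
        return "LangChain"
    if wants_llamaindex:
        return "LlamaIndex"
    return "Either"


print(recommend({"agents", "memory"}))
print(recommend({"rag", "agents"}))
```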

Summary

LangChain excels at orchestration — chaining LLM steps, building agents, managing memory
LlamaIndex excels at data — ingesting documents, building indexes, and powering RAG pipelines
Both support async, streaming, and the same popular vector stores and LLM providers
They are complementary — many production systems use LlamaIndex for retrieval inside a LangChain agent
For most RAG use cases in 2026, start with LlamaIndex; for multi-step agent workflows, start with LangChain
