LangChain gives you building blocks. LlamaIndex gives you the data layer. If your AI app needs to ingest, index, and query documents — PDFs, databases, APIs, Slack, Notion — LlamaIndex is purpose-built for that job.
What LlamaIndex Gives You for Free
- 160+ data connectors — load from PDF, Notion, Slack, GitHub, databases, APIs
- Advanced indexing — vector, keyword, knowledge graph, tree, and hybrid indexes
- Query engines — natural language queries over your data
- Agents — AI agents that can search, filter, and combine data sources
- Evaluation — measure RAG quality with built-in metrics
- LlamaCloud — managed service for production RAG (free tier)
Quick Start
```shell
pip install llama-index
export OPENAI_API_KEY=sk-...
```
RAG in 5 Lines
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load all documents from a directory
documents = SimpleDirectoryReader("data/").load_data()

# Create vector index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is the refund policy?")
print(response)
```
That's a complete RAG pipeline. Drop PDFs in data/, ask questions in natural language.
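One practical next step the snippet above skips: persisting the index so the embedding step runs once, not on every start. A sketch using LlamaIndex's default local storage backend (`./storage` is an arbitrary directory name):

```python
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
)

# Build once and save to disk
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# On later runs: reload without re-reading or re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```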
Data Connectors (160+ Sources)
```python
from llama_index.readers.notion import NotionPageReader
from llama_index.readers.slack import SlackReader
from llama_index.readers.github import GithubClient, GithubRepositoryReader
from llama_index.core import VectorStoreIndex

# Notion
notion_docs = NotionPageReader(integration_token=token).load_data(
    page_ids=["page_id_1", "page_id_2"]
)

# Slack
slack_docs = SlackReader(slack_token=token).load_data(
    channel_ids=["C01234567"]
)

# GitHub -- the reader needs an authenticated client and a branch to read from
github_client = GithubClient(github_token=token)
github_docs = GithubRepositoryReader(
    github_client=github_client, owner="facebook", repo="react"
).load_data(branch="main")

# Combine everything into one index
all_docs = notion_docs + slack_docs + github_docs
index = VectorStoreIndex.from_documents(all_docs)
```
Advanced: Sub-Question Query Engine
```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# financial_index and hr_index are separate indexes built earlier,
# one per data source
financial_tool = QueryEngineTool(
    query_engine=financial_index.as_query_engine(),
    metadata=ToolMetadata(name="financials", description="Company financial data"),
)
hr_tool = QueryEngineTool(
    query_engine=hr_index.as_query_engine(),
    metadata=ToolMetadata(name="hr", description="HR policies and procedures"),
)

# Engine that breaks complex questions into sub-questions
engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[financial_tool, hr_tool]
)

# This query gets broken into sub-questions automatically:
response = engine.query(
    "Compare our Q4 revenue with headcount growth from HR reports"
)
```
Evaluation (Measure RAG Quality)
```python
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator

# Both evaluators use an LLM as the judge (Settings.llm by default)
faithfulness = FaithfulnessEvaluator()
relevancy = RelevancyEvaluator()

query = "What is the vacation policy?"
response = query_engine.query(query)

# Is the response faithful to the source documents?
faith_result = faithfulness.evaluate_response(response=response)
print(f"Faithful: {faith_result.passing}")  # True/False

# Is the response relevant to the query?
rel_result = relevancy.evaluate_response(query=query, response=response)
print(f"Relevant: {rel_result.passing}")
```
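Evaluating one query at a time doesn't scale to a regression suite. LlamaIndex also ships `BatchEvalRunner`, which runs multiple evaluators over a list of queries concurrently. A sketch, assuming `query_engine` is built as in the Quick Start and the questions are a hypothetical test set:

```python
import asyncio
from llama_index.core.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)

# A hypothetical regression set of questions to score in one pass
questions = ["What is the vacation policy?", "What is the refund policy?"]

async def run_evals(query_engine):
    runner = BatchEvalRunner(
        {"faithfulness": FaithfulnessEvaluator(), "relevancy": RelevancyEvaluator()},
        workers=4,
    )
    results = await runner.aevaluate_queries(query_engine, queries=questions)
    for metric, metric_results in results.items():
        passed = sum(1 for r in metric_results if r.passing)
        print(f"{metric}: {passed}/{len(questions)} passing")

# asyncio.run(run_evals(query_engine))
```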
LlamaIndex vs LangChain
| Aspect | LlamaIndex | LangChain |
|---|---|---|
| Focus | Data indexing & retrieval | General LLM orchestration |
| RAG | Purpose-built, advanced | Basic implementation |
| Data connectors | 160+ built-in | Community-maintained |
| Index types | Vector, keyword, graph, tree | Vector only |
| Evaluation | Built-in metrics | Separate (LangSmith) |
| Best for | RAG-heavy applications | Agent-heavy applications |
The Verdict
If your AI app is primarily about querying your own data, LlamaIndex is the specialized tool for the job. Better indexing, more data connectors, built-in evaluation, and a cleaner API for RAG-specific workflows.
Need help building AI-powered data pipelines or web scrapers? I build custom solutions. Reach out: spinov001@gmail.com
Check out my awesome-web-scraping collection — 400+ tools for extracting web data.