The 47-Second Query That Shouldn't Exist
My RAG pipeline hit a wall at 800K documents. What took 200ms at 10K documents suddenly took 47 seconds. The culprit wasn't the vector database—it was how LangChain and LlamaIndex handle the retrieval-to-LLM handoff differently at scale.
I've already covered the 10K document comparison, but 1M documents is a different beast entirely. The bottlenecks shift from embedding lookup to metadata filtering, re-ranking overhead, and memory management. Here's what actually happens when you push both frameworks to their limits.
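To see where that time goes, it helps to time each phase of a query separately rather than the whole call. The sketch below is not the benchmark harness from this article; it's a minimal per-stage timer, and the `retriever`, `reranker`, and `build_prompt` names in the usage comments are placeholders for whatever your framework provides.

```python
# Minimal per-stage timer for breaking a RAG query into its phases.
# The pipeline objects referenced in the usage comments are placeholders.
import time
from collections import defaultdict
from contextlib import contextmanager

timings: dict[str, float] = defaultdict(float)

@contextmanager
def timed(stage: str):
    """Accumulate wall-clock seconds per named pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start

# Usage sketch: wrap each phase of a single query.
# with timed("vector_search"):   docs = retriever.invoke(query)
# with timed("metadata_filter"): docs = [d for d in docs if d.metadata.get("lang") == "en"]
# with timed("rerank"):          docs = reranker.rerank(query, docs)
# with timed("llm_call"):        answer = llm.invoke(build_prompt(query, docs))

for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:>16}: {seconds:6.2f}s")
```

Wrapping each stage like this is what separates "the vector database is slow" from "the framework is doing expensive work between retrieval and the LLM call."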
Test Setup: 1M Wikipedia Chunks on a 32GB Machine
Before diving in, let me be clear about the constraints. I ran this on a single machine with 32GB RAM and an RTX 4090 (24GB VRAM). Production deployments would use distributed vector stores, but I wanted to isolate the framework overhead from infrastructure scaling.
The dataset: 1,048,576 chunks from Wikipedia (roughly 500 tokens each), embedded with text-embedding-3-small (1,536 dimensions). Total embedding size is around 6 GB in float32.
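As a sanity check on that figure, here is the arithmetic, assuming text-embedding-3-small's default 1,536-dimensional output; the constants are just the numbers quoted above, not code from the benchmark itself.

```python
# Back-of-the-envelope size of the raw vector index quoted above.
NUM_CHUNKS = 1_048_576   # 2**20 Wikipedia chunks, ~500 tokens each
EMBED_DIM = 1536         # text-embedding-3-small default dimensionality
BYTES_PER_FLOAT32 = 4

index_bytes = NUM_CHUNKS * EMBED_DIM * BYTES_PER_FLOAT32
print(f"Raw float32 vectors: {index_bytes / 1024**3:.1f} GiB")  # ~6.0 GiB

# Those 6 GiB are only the vectors: the framework's docstore, the chunk
# text, and per-chunk metadata all sit on top of that in RAM.
```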
---
*Continue reading the full article on [TildAlice](https://tildalice.io/langchain-vs-llamaindex-1m-document-query-speed/)*
