TildAlice

Originally published at tildalice.io

LangChain vs LlamaIndex: 1M Document Query Speed Test

The 47-Second Query That Shouldn't Exist

My RAG pipeline hit a wall at 800K documents. What took 200ms at 10K documents suddenly took 47 seconds. The culprit wasn't the vector database—it was how LangChain and LlamaIndex handle the retrieval-to-LLM handoff differently at scale.
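Before getting into the frameworks themselves, it's worth splitting that number apart. A quick way to see whether the vector store or the handoff is eating the time is to wrap each phase in its own timer. Here's a minimal, framework-agnostic sketch; `retrieve_fn` and `generate_fn` are placeholder callables for whatever retriever and LLM call your pipeline uses, not code from either library:

```python
import time
from typing import Any, Callable

def timed_query(
    retrieve_fn: Callable[[str], list[Any]],       # wraps the framework's retriever call
    generate_fn: Callable[[str, list[Any]], str],  # wraps prompt assembly + the LLM call
    query: str,
) -> dict:
    """Split end-to-end latency into retrieval vs. handoff/generation time."""
    t0 = time.perf_counter()
    docs = retrieve_fn(query)          # vector search, metadata filtering, re-ranking
    t1 = time.perf_counter()
    answer = generate_fn(query, docs)  # context assembly and the LLM round trip
    t2 = time.perf_counter()
    return {
        "retrieval_s": round(t1 - t0, 3),
        "handoff_and_llm_s": round(t2 - t1, 3),
        "answer": answer,
    }
```

If the vector database were the problem, the first number would dominate. In my runs it was the second one that blew up as the corpus grew.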

I've already covered the 10K document comparison, but 1M documents is a different beast entirely. The bottlenecks shift from embedding lookup to metadata filtering, re-ranking overhead, and memory management. Here's what actually happens when you push both frameworks to their limits.


Test Setup: 1M Wikipedia Chunks on a 32GB Machine

Before diving in, let me be clear about the constraints. I ran this on a single machine with 32GB RAM and an RTX 4090 (24GB VRAM). Production deployments would use distributed vector stores, but I wanted to isolate the framework overhead from infrastructure scaling.

The dataset: 1,048,576 chunks from Wikipedia (roughly 500 tokens each), embedded with text-embedding-3-small. At 1536 dimensions per vector, the full embedding matrix comes out to roughly 6 GiB in float32 (1,048,576 × 1536 × 4 bytes).
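
For context, this is roughly the shape of the embedding step. It's a sketch assuming the OpenAI Python SDK; the `chunks` list, batch size, and function name are illustrative rather than the exact benchmark code:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed_corpus(chunks: list[str], batch_size: int = 256) -> np.ndarray:
    """Embed ~500-token chunks with text-embedding-3-small (1536 dims), batched."""
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i : i + batch_size]
        resp = client.embeddings.create(model="text-embedding-3-small", input=batch)
        vectors.extend(item.embedding for item in resp.data)
    # float32 keeps the full 1,048,576 x 1536 matrix at roughly 6 GiB
    return np.asarray(vectors, dtype=np.float32)
```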



---

*Continue reading the full article on [TildAlice](https://tildalice.io/langchain-vs-llamaindex-1m-document-query-speed/)*
