The 47-Second Query That Shouldn't Exist
My RAG pipeline hit a wall at 800K documents. What took 200ms at 10K documents suddenly took 47 seconds. The culprit wasn't the vector database—it was how LangChain and LlamaIndex handle the retrieval-to-LLM handoff differently at scale.
I've already covered the 10K document comparison, but 1M documents is a different beast entirely. The bottlenecks shift from embedding lookup to metadata filtering, re-ranking overhead, and memory management. Here's what actually happens when you push both frameworks to their limits.
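To see where that time goes, it helps to time each phase of a query separately rather than the whole call. The sketch below is not the benchmark harness from this article; it's a minimal per-stage timer, and the `retriever`, `reranker`, and `build_prompt` names in the usage comments are placeholders for whatever your framework provides.

```python
# Minimal per-stage timer for breaking a RAG query into its phases.
# The pipeline objects referenced in the usage comments are placeholders.
import time
from collections import defaultdict
from contextlib import contextmanager

timings: dict[str, float] = defaultdict(float)

@contextmanager
def timed(stage: str):
    """Accumulate wall-clock seconds per named pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start

# Usage sketch: wrap each phase of a single query.
# with timed("vector_search"):   docs = retriever.invoke(query)
# with timed("metadata_filter"): docs = [d for d in docs if d.metadata.get("lang") == "en"]
# with timed("rerank"):          docs = reranker.rerank(query, docs)
# with timed("llm_call"):        answer = llm.invoke(build_prompt(query, docs))

for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:>16}: {seconds:6.2f}s")
```

Wrapping each stage like this is what separates "the vector database is slow" from "the framework is doing expensive work between retrieval and the LLM call."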
Test Setup: 1M Wikipedia Chunks on a 32GB Machine
Before diving in, let me be clear about the constraints. I ran this on a single machine with 32GB RAM and an RTX 4090 (24GB VRAM). Production deployments would use distributed vector stores, but I wanted to isolate the framework overhead from infrastructure scaling.
The dataset: 1,048,576 chunks from Wikipedia (roughly 500 tokens each), embedded with text-embedding-3-small (1,536 dimensions). Total embedding size is around 6 GB in float32.
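As a sanity check on that figure, here is the arithmetic, assuming text-embedding-3-small's default 1,536-dimensional output; the constants are just the numbers quoted above, not code from the benchmark itself.

```python
# Back-of-the-envelope size of the raw vector index quoted above.
NUM_CHUNKS = 1_048_576   # 2**20 Wikipedia chunks, ~500 tokens each
EMBED_DIM = 1536         # text-embedding-3-small default dimensionality
BYTES_PER_FLOAT32 = 4

index_bytes = NUM_CHUNKS * EMBED_DIM * BYTES_PER_FLOAT32
print(f"Raw float32 vectors: {index_bytes / 1024**3:.1f} GiB")  # ~6.0 GiB

# Those 6 GiB are only the vectors: the framework's docstore, the chunk
# text, and per-chunk metadata all sit on top of that in RAM.
```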
---
*Continue reading the full article on [TildAlice](https://tildalice.io/langchain-vs-llamaindex-1m-document-query-speed/)*
