Liu et al. 2023 ("Lost in the Middle", TACL) found that multi-document QA accuracy can drop by roughly 20 percentage points when the relevant document sits mid-context instead of in the first or last position. The U-shaped degradation shows up across GPT-3.5, GPT-4, and Claude. It is not a one-model quirk. It is consistent enough that you should design around it.
One intuition: attention weights dilute over long spans. In a softmax over a 100k-token window, every position competes with every other for probability mass, so middle evidence can fade into background noise. Your prompt is not a flat file system. It behaves more like a priority queue where the head and tail get probability mass and the center gets averaged out. Primacy and recency bias in transformers are predictable behavior you can plan for, not random failures.
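To make the dilution intuition concrete, here is a toy calculation, not a model of any real transformer. It assumes a single attention row where one token's logit beats every other token's by a fixed margin and all other tokens are tied. The function name `peak_attention_weight` and the margin of 5.0 are made up for illustration. It shows only how softmax mass thins out as the window grows. It says nothing about why the middle suffers more than the edges.

```python
import math


def peak_attention_weight(window_size: int, logit_margin: float) -> float:
    """Softmax weight of one token whose logit exceeds all others by logit_margin.

    Toy assumption: the remaining (window_size - 1) tokens all share the same logit.
    """
    boosted = math.exp(logit_margin)
    # Every other token contributes exp(0) = 1 to the softmax denominator.
    return boosted / (boosted + (window_size - 1))


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, peak_attention_weight(n, 5.0))
```

With the same margin, the favored token holds about 0.6 of the mass in a 100-token window and well under 0.01 at 100k tokens.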
Picture a RAG pipeline in context_builder.py ingesting fifty chunks. The retriever ranks the answer chunk twenty-sixth, so the generator sees it buried mid-context between two irrelevant JSON blobs. Accuracy tanks. The fix is not retrieving more. It is reranking so the gold chunk lands at index zero, or moving it to the tail. Same tokens, different order, different output.
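One way to sketch the reordering step, assuming you already have a relevance score per chunk from some reranker. The function name `place_at_edges` and the sample scores are hypothetical. The idea is to sort by score, then deal chunks alternately to the front and the back so the weakest ones end up in the middle.

```python
from typing import List, Sequence


def place_at_edges(chunks: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Order chunks so the highest-scoring ones sit at the head and tail.

    `scores` is assumed to come from a reranker; higher means more relevant.
    """
    if len(chunks) != len(scores):
        raise ValueError("chunks and scores must have the same length")

    # Best first. sorted() is stable, so ties keep their original order.
    ranked = [c for _, c in sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)]

    front = ranked[0::2]  # ranks 1, 3, 5, ... fill in from the head
    back = ranked[1::2]   # ranks 2, 4, 6, ... fill in from the tail
    return front + back[::-1]


if __name__ == "__main__":
    chunks = ["a", "b", "gold", "c", "d"]
    scores = [0.2, 0.1, 0.9, 0.5, 0.3]
    print(place_at_edges(chunks, scores))  # ['gold', 'd', 'b', 'a', 'c']
```

The gold chunk lands at index zero, the runner-up closes the context, and the lowest-scoring chunk sits in the center where a miss costs the least.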
If you are shipping a founder-facing tool, never ask the model to synthesize a buried insight. Put the user query, the schema, and the critical evidence in the first 1k tokens. Summarize the middle. End with a clear instruction. Liu et al. 2023 gave us the map. Use the edges. That is the Lost in the Middle pattern.
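A minimal sketch of that layout, assuming the prompt is assembled as plain text. The function name `build_prompt`, the section labels, and the sample strings are all illustrative, not a required format. Token counting is left out because it depends on your tokenizer, so the 1k-token budget for the head is not enforced here.

```python
from typing import List


def build_prompt(
    query: str,
    schema: str,
    key_evidence: List[str],
    middle_summary: str,
    instruction: str,
) -> str:
    """Assemble a prompt with critical material at the head and the ask at the tail."""
    head = [
        f"QUESTION:\n{query}",
        f"SCHEMA:\n{schema}",
        "KEY EVIDENCE:\n" + "\n".join(f"- {item}" for item in key_evidence),
    ]
    # Lower-priority material is compressed and parked in the middle.
    middle = [f"BACKGROUND (summarized):\n{middle_summary}"]
    # The final instruction sits at the tail.
    tail = [f"INSTRUCTION:\n{instruction}"]
    return "\n\n".join(head + middle + tail)


if __name__ == "__main__":
    print(
        build_prompt(
            query="What was Q3 churn?",
            schema='{"churn_rate": "float"}',
            key_evidence=["Q3 report: 412 of 9,800 accounts cancelled."],
            middle_summary="Earlier quarters showed stable retention.",
            instruction="Answer using only the key evidence, as JSON matching the schema.",
        )
    )
```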