Originally published on AI Tech Connect.
What you need to know

For most of the short history of retrieval-augmented generation (RAG), the argument for it was simple: the context window was too small to hold your data, so you had to retrieve. As of mid-2026 that argument has collapsed. Every frontier family now ships a roughly 1M-token window as standard, and a few reach further. So the obvious question, asked in every architecture review from Bengaluru to Bristol, is whether RAG was a workaround for a limitation that no longer exists, that is, whether you can now delete the vector database, stop worrying about chunking, and simply paste the whole corpus into the prompt. The honest answer is: sometimes, but far less often than the headline windows suggest, and almost never at scale. A marketed window is not a usable window, longer prompts cost more…