Discussion on: DeepSeek V4: Million-Token Context That Actually Works

The 94% hallucination rate on Omniscience is the tell. More context window doesn't fix unstructured data — it just gives you more room to be confidently wrong.

We ran a benchmark across 45 domains (7,928 queries) comparing RAG, GraphRAG, and pre-structured knowledge graphs (CKG). CKG reached F1 = 0.471 versus RAG's 0.123 while using 11× fewer tokens. The structure is doing the work the context window can't.
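
To put those two numbers side by side, here is a quick back-of-the-envelope sketch. It assumes the standard F1 definition (harmonic mean of precision and recall) and reads "11× fewer tokens" as RAG consuming about 11 times the tokens CKG does; the variable names are mine, not the benchmark's, and the repo may compute things differently.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the comment above.
ckg_f1 = 0.471
rag_f1 = 0.123
token_ratio = 11  # assumption: RAG uses ~11x the tokens that CKG uses

# How many times higher CKG's F1 is than RAG's.
f1_gain = ckg_f1 / rag_f1

# Relative F1 per token: the accuracy gain multiplied by the token savings.
f1_per_token_gain = f1_gain * token_ratio

print(f"F1 gain: {f1_gain:.2f}x, F1 per token gain: {f1_per_token_gain:.1f}x")
```

Under those assumptions the F1 gap is a little under 4×, and the per-token gap is roughly 42×, which is the part that matters once you are paying for long-context inference.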

For agents doing legal analysis or clinical research, V4's 1M context + a domain CKG is actually the right stack — context for state, CKG for facts. Complementary, not competing.
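
A minimal sketch of what "context for state, CKG for facts" could look like in an agent loop. Everything here is hypothetical: `FactGraph` is a toy in-memory triple store standing in for a real domain graph, `build_prompt` is an illustrative helper, and the contract data is made up. It is not how the benchmark or any particular model API does it.

```python
from dataclasses import dataclass, field


@dataclass
class FactGraph:
    """Toy triple store standing in for a domain knowledge graph (hypothetical)."""
    triples: set = field(default_factory=set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.add((subject, relation, obj))

    def facts_about(self, entity: str) -> list:
        # Return every triple that mentions the entity, in a stable order.
        return sorted(t for t in self.triples if entity in (t[0], t[2]))


def build_prompt(history: list, graph: FactGraph, entities: list, question: str) -> str:
    """Conversation state goes in as-is; facts come only from the graph lookup."""
    # dict.fromkeys drops duplicate facts while keeping first-seen order.
    facts = list(dict.fromkeys(
        f"{s} {r} {o}" for e in entities for s, r, o in graph.facts_about(e)
    ))
    lines = ["Facts:"] + [f"- {fact}" for fact in facts]
    lines += ["Conversation so far:"] + history
    lines += [f"Question: {question}"]
    return "\n".join(lines)


# Made-up example data for illustration only.
graph = FactGraph()
graph.add("contract_17", "governed_by", "jurisdiction_A")
graph.add("contract_17", "signed_by", "party_X")
graph.add("contract_42", "governed_by", "jurisdiction_B")

prompt = build_prompt(
    history=["User: We are reviewing contract_17."],
    graph=graph,
    entities=["contract_17"],
    question="Which jurisdiction applies?",
)
print(prompt)
```

The point of the split is that the long context only has to carry the running conversation, while anything the model should treat as fact is pulled from the graph per query instead of being buried somewhere in a million tokens.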

Benchmark: github.com/Yarmoluk/ckg-benchmark