I mapped LangChain Core as a knowledge graph. 180 modules, 650 dependency edges. Here's what the structure reveals that the docs never tell you.
Finding 1: The messages module has a 70% blast radius.
Change it and 126 of 180 modules break — directly or transitively. Every callback, every agent, every retriever, every embedding module traces a dependency path back to messages. It is the load-bearing wall of the entire framework. Nothing in the documentation flags this.
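The blast radius is just reverse reachability over the dependency graph. A minimal sketch of the computation, using a hypothetical toy graph (module names and edges here are illustrative, not the real 180-module CKG):

```python
from collections import deque

# Toy dependency graph: module -> modules it imports. Illustrative only.
DEPS = {
    "runnables.base": {"messages", "callbacks"},
    "callbacks": {"messages"},
    "agents": {"runnables.base", "messages"},
    "utils.html": set(),
    "messages": set(),
}

def blast_radius(graph, target):
    """Modules that break, directly or transitively, if `target` changes:
    BFS over the *inverted* dependency edges."""
    # Invert edges: who imports each module?
    dependents = {m: set() for m in graph}
    for mod, deps in graph.items():
        for d in deps:
            dependents.setdefault(d, set()).add(mod)
    seen, queue = set(), deque([target])
    while queue:
        for nxt in dependents.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(blast_radius(DEPS, "messages")))
# 3 of the 4 other modules trace back to messages in this toy graph
```

On the real graph, the same traversal from messages reaches 126 of 180 modules. Similarity search never runs this inversion; that is why the risk stays invisible to it.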
Finding 2: runnables.base requires 147 other modules to function fully.
That is 82% of the codebase as a prerequisite chain. Before an agent touches runnables.base, it needs ground-truth awareness of almost everything else. Without that map, it is guessing.
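The prerequisite chain is the mirror-image query: forward transitive closure over the import edges. A sketch, again on an illustrative toy graph rather than the real CKG:

```python
# Toy dependency graph: module -> modules it imports. Illustrative only.
DEPS = {
    "runnables.base": {"messages", "callbacks"},
    "callbacks": {"messages"},
    "messages": set(),
}

def prerequisites(graph, target):
    """Everything `target` transitively imports: depth-first forward
    reachability over the dependency edges."""
    seen, stack = set(), [target]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(prerequisites(DEPS, "runnables.base")))
```

Run against the real graph, this closure from runnables.base contains 147 modules; that set is the "ground truth" an agent needs loaded before touching the file.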
Finding 3: Exactly 7 modules are completely safe to modify without any downstream risk.
cross_encoders, structured_query, sys_info, version, utils.html, utils.image, utils.mustache. Seven. Out of 180.
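Finding the safe set is the cheapest query of the three: a module with zero in-graph dependents has zero downstream blast radius. A sketch with hypothetical names (note that in a toy graph, top-level entry points also show zero dependents; the seven real modules are utility leaves):

```python
# Toy dependency graph: module -> modules it imports. Illustrative only.
DEPS = {
    "agents": {"runnables.base", "messages"},
    "runnables.base": {"messages"},
    "messages": set(),
    "utils.html": set(),
    "utils.mustache": set(),
}

def safe_to_modify(graph):
    """Modules nothing else in the graph imports: changing them
    breaks no downstream module."""
    imported = {d for deps in graph.values() for d in deps}
    return sorted(set(graph) - imported)

print(safe_to_modify(DEPS))
```

In the real 650-edge graph, this filter leaves exactly the seven modules listed above.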
Why this matters for agents:
A coding agent dispatched to modify LangChain without this map will grep for context, retrieve similar-looking docs, and make a confident, structurally wrong change. The blast radius is invisible to similarity search. It is only visible to graph traversal.
This is the difference between retrieval and spatial intelligence. RAG finds text that looks relevant. A knowledge graph tells you what actually breaks.
The dataset is live. The same query interface that works on GLP-1 pharmacology and ICD-10 classification works on a codebase. The domain doesn't matter. The structure does.
LangChain Core CKG (180 modules, 650 edges): https://huggingface.co/datasets/danyarm/ckg-benchmark
MCP server — query it directly: https://github.com/Yarmoluk/ckg-mcp
Full benchmark (RAG vs CKG across 54 domains): https://graphifymd.com/paper.html