We accidentally built a brain.
Okay, not literally. It's software, and nowhere near as capable as the real thing. But it kept reaching for the same tricks the brain uses, and we never planned that.
We set out to build a better knowledge graph for legal and drilling documents. We ended up with software analogues of spreading activation, episodic memory, recognition memory, memory consolidation, executive function, self-checking, and fast-vs-slow reasoning. Each one got added to kill one specific bug. A wrong answer. A missing clause. A number it invented. Only when we stepped back did the parallel to neuroscience get unsettling.
For context: we're an eight-month-old startup. This didn't come from a research lab. It came from not being able to sleep while our own system gave answers we couldn't trust.
We benchmarked six knowledge graph frameworks on the same data, same LLM, same embeddings: LightRAG, HippoRAG, PathRAG, OG-RAG, Graphify, PageIndex. And ours.
Honestly, every one of them is impressive. They're fast, they scale to far more documents than we tested, and we learned a lot just reading their code.
But for the kind of work we care about, one thing kept nagging at us. These systems retrieve and generate, then hand you the answer. What most of them don't do, at least not out of the box, is check that answer back against the source before showing it to you. For a chatbot, that's fine. For a contract or a compliance document, it isn't. A wrong fine, a missed exception, a mis-cited article, and nobody catches it until it matters: a lawyer leaning on the wrong clause, a compliance officer missing an exception, an engineer acting on a number that was never in the source.
So that became the problem we set out to solve. Not because anyone else got it wrong, but because our use case needed something they weren't built for.
We built something different, and stopped calling it a knowledge graph. We call it the SwarmLens Cognitive Index.
Here's the idea. Most knowledge graphs embed your query, match nodes, traverse a few edges, rank, and generate. One pathway, one shot, no second-guessing. Your brain doesn't work that way. Activation spreads across everything you know. Your hippocampus calls up similar situations you've faced before. You filter noise, break the question into parts, watch for gaps, and when a number feels off, you stop and check.
We built a software analogue of each step. Real, separate, inspectable parts. Far rougher than biology, but each playing a similar role.
Brain function on the left, our code on the right:
Spreading activation = Personalized PageRank. Ask something and the signal ripples out through the graph along real relationships, so things described differently but connected by meaning light up.
Functional specialization = Leiden communities. The graph self-sorts into topic clusters, each with its own summary, like specialized regions of the brain.
Memory consolidation = RAPTOR. A recursive tree boils many cluster summaries down to a few themes and one overview, the way sleep turns detail into gist.
Recognition memory = a fast fact filter. A quick "have I seen this before?" pass throws out noise before the deep search. (HippoRAG calls it the same thing.)
Episodic memory = an episode store with a temporal guard. It reuses answers that worked before, but drops any built on documents that have since been replaced. None of the six we tested do this.
Prefrontal planning = query decomposition. Ask three things at once and it splits them into three focused searches instead of one blurry one.
Executive function = a critics pipeline. After a draft, four checkers hunt for missing articles, skipped sub-clauses, uncited provisions, and questions the sources can't answer. Find a gap, go back and fill it.
Self-checking = a numeric guardrail + abstain gate. Every figure in the answer is matched against the source text; if it isn't there, it's flagged. Thin evidence, and it says "not sure" instead of guessing. Blunt, not self-aware, but it beats a confident lie.
Synaptic strength = consensus-weighted edges. The more independent passes that find the same link, the stronger it gets; weak ones are tagged "inferred," not "confirmed."
Fast and slow thinking = runtime modes. We index everything up front, then pick the gear at query time. A fast mode answers in seconds. A full mode runs every lane and every check, takes its time, and costs real money. None of the others let you dial that.
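To make the self-checking row concrete, here is a minimal sketch of a numeric guardrail with an abstain gate. It is an illustration under assumptions, not the SwarmLens implementation: the function names, the regex, and the `min_sources` threshold are all hypothetical, and "thin evidence" is reduced to a simple count of retrieved passages.

```python
import re

# Hypothetical pattern: digits with optional thousands or decimal separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> set[str]:
    """Return every figure in the text, with commas dropped so that
    '5,000' and '5000' compare equal. (Assumes commas are thousands
    separators; a European decimal comma would need separate handling.)"""
    return {match.replace(",", "") for match in NUMBER.findall(text)}


def unsupported_numbers(answer: str, sources: list[str]) -> set[str]:
    """Figures that appear in the answer but in none of the source passages."""
    source_numbers: set[str] = set()
    for passage in sources:
        source_numbers |= extract_numbers(passage)
    return extract_numbers(answer) - source_numbers


def guarded_answer(answer: str, sources: list[str], min_sources: int = 2) -> str:
    """Abstain on thin evidence; otherwise flag any figure not found in a source."""
    if len(sources) < min_sources:
        return "Not sure: the retrieved evidence is too thin to answer."
    flagged = unsupported_numbers(answer, sources)
    if flagged:
        return f"{answer}\n[FLAGGED: figures not found in sources: {sorted(flagged)}]"
    return answer


sources = [
    "Article 22 sets the fine at 5,000 euros.",
    "Article 22 applies from 2024.",
]
answer = "Under Article 22 the fine is 5000 euros, rising to 7500 on repeat."
print(guarded_answer(answer, sources))
```

The check is deliberately blunt: it only asks whether a figure literally appears in the evidence, which is enough to catch an invented number but says nothing about whether the figure is attached to the right clause.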
And let me be honest: almost none of these building blocks are ours. We took them from published research and stitched them together. Personalized PageRank over a graph for memory-style retrieval is the core of HippoRAG (NeurIPS 2024), built on the hippocampal-indexing theory. Organizing a system around working, episodic, and semantic memory is the premise of "cognitive architectures for language agents." Fast-vs-slow reasoning for LLMs is its own active research direction. We didn't follow that map on purpose. We kept fixing failures and looked up to find we'd redrawn it. What's ours is the combination, and the refusal to ship an answer the system hasn't checked.
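For readers who haven't met Personalized PageRank, here is a small sketch of the idea using plain power iteration. The toy graph, the node names, and the parameter defaults are made up for illustration; this is neither HippoRAG's code nor ours, just the textbook technique on a dictionary of adjacency lists.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Spread activation from the seed nodes through the graph.

    graph: dict mapping each node to a list of neighbor nodes
           (every neighbor must also be a key of the dict).
    seeds: non-empty set of nodes matched by the query.
    """
    if not seeds or not seeds <= set(graph):
        raise ValueError("seeds must be a non-empty subset of the graph's nodes")

    # The restart distribution puts all of its mass on the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    scores = dict(restart)

    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                # Dangling node: send its mass back to the seeds.
                for n in graph:
                    nxt[n] += damping * scores[node] * restart[n]
                continue
            share = damping * scores[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        scores = nxt
    return scores


# Toy graph: a legal cluster and an unrelated drilling cluster.
graph = {
    "fine": ["Article 22"],
    "Article 22": ["fine", "exception", "Article 23"],
    "exception": ["Article 22"],
    "Article 23": ["Article 22", "appeal"],
    "appeal": ["Article 23"],
    "drill bit": ["mud weight"],
    "mud weight": ["drill bit"],
}
scores = personalized_pagerank(graph, seeds={"fine"})
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node:12s} {score:.3f}")
```

A query that matches only "fine" still lights up "exception" and "Article 23" because they are connected through "Article 22", while the unrelated drilling nodes receive no activation at all.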
The part I haven't seen elsewhere: the framework rewrites itself for your data.
Every other tool we tested is fixed. Same types, prompts, and chunking, whatever you feed it. Ours reads your documents first, works out their structure, and induces its own categories. From the legal corpus, it discovered its own entity and relation types and wrote its own domain-specific prompts, none of them written by us. Then it deletes the modules your data doesn't need. A generic framework goes in; a custom-built one comes out. Hand it drilling reports tomorrow and it does the whole thing again. LightRAG uses 10 fixed types, PathRAG uses 5, and OG-RAG needs a hand-built ontology. SwarmLens is the only one we've found that discovers its own.
And it's not only for prose. Ask "what was the drilling-speed trend over the last 7 days" and it's built to return numbers you can chart, not a paragraph. Messy reports in, structured data out.
Did it work? We gave every tool one hard question spanning two laws, graded against the same answer key pulled from the source. Ours was the only one that returned the full answer with every figure traceable to its exact line, nothing invented, nothing dropped. The others did well, but each missed details that matter when the document is a law or a drilling report.
Straight talk: we tested on a small slice of documents, far fewer than these frameworks are built to handle, so raw size numbers aren't a fair fight, and I won't pretend they are. The fair fight is the question, graded the same for everyone. That's the part we won.
And we didn't do it alone. We stand on ideas these frameworks pioneered: HippoRAG's passage nodes, LightRAG's relationship channel, RAPTOR's summaries. We made them vote across eight fused retrieval lanes, including a BM25 keyword lane none of the others had, the one that reliably catches exact identifiers like "Article 22." Then we added the verification layer none of them have. Because for high-stakes work, retrieval is only half the problem. Verification is the other half. Under the hood, 40-plus techniques from the literature work together, alongside the unglamorous parts: token compression, batch embedding, per-file chunking, and a three-level config with safety rails.
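One common way to let several retrieval lanes "vote" is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; this post doesn't specify the fusion method we use. The lane names, document ids, and the constant `k=60` (the value usually quoted for RRF) are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(lanes, k=60):
    """Fuse ranked lists from several retrieval lanes into one ranking.

    lanes: dict mapping a lane name to its ranked list of document ids,
           best first. Each lane adds 1 / (k + rank) to a document's score.
    """
    scores = defaultdict(float)
    for ranking in lanes.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical lanes answering a question about "Article 22".
lanes = {
    "bm25":  ["art22", "art5", "art9"],    # keyword lane catches the exact identifier
    "dense": ["art9", "art22", "art40"],   # embedding lane
    "graph": ["art22", "art40", "art9"],   # graph-traversal lane
}
fused = reciprocal_rank_fusion(lanes)
print(fused)
```

Because the fusion works on ranks rather than raw scores, a keyword lane and an embedding lane can vote together without anyone having to calibrate their scores against each other.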
The honest trade-off: in full mode we're the slow, expensive option, minutes per answer and far more compute. The others are faster, cheaper, and easier to ship today. We chose accuracy first on purpose, for the cases where a wrong answer costs more than the compute. But we're not standing still: bringing the speed up and the token cost down is where most of our engineering goes right now. And if your questions are simple, you don't need us, and I'll tell you that to your face.
So, what we built: a system that learns your domain, rewrites itself around your data, and thinks as hard as the question demands. It pulls structured data from messy documents, checks its own answers against the source, flags contradictions, abstains when evidence is thin, and forgets what's out of date.
It's not a knowledge graph. Not a database. Not a search engine. Not a chatbot. It's a Cognitive Index.
We're eight months in, not open-sourcing it, and looking for a few design partners in legal and compliance, oil and gas, financial due diligence, and healthcare and pharma. If "close enough" isn't good enough for your work, let's talk: hello@swarmlens.com
(And if you research cognitive architectures, I'd love to compare notes. The parallels found us; we didn't design them.)
One question I keep coming back to: is a fast answer good enough for your work? Or is "mostly correct" the most dangerous phrase in AI?
Tell me where you land.
https://www.linkedin.com/pulse/swarmlens-cognitive-index-hari-menath-iwete/