Bhavesh Patil

Posted on Jul 5

RootCause: Giving Your Codebase a Memory That Doesn't Have a Hangover

#hackathon #webdev #opensource #ai

Built by Bhavesh Patil and Shreya Shelar for WeMakeDevs × Cognee's hackathon, "The Hangover Part AI: Where's My Context?"

TL;DR

The problem: your codebase has no memory. It doesn't know that the bug you fixed today is the same root cause as one from three weeks ago, just wearing a different error message.
What we built: RootCause — point it at any GitHub repo, and it ingests the commit history into a hybrid graph-vector memory layer powered by Cognee, building a causal chain for every fix: Error → Root Cause → Fix → Files Touched → Related Bugs.
Why it needed a graph, not just a vector DB: two bugs sharing a root cause but worded completely differently is a graph-traversal problem. Vector similarity alone only catches the ones that already sound alike.
The twist: we found three silent, demo-breaking bugs while building this. Each one looked fine until we pushed past the surface, and we cover all three below because they make this a more useful write-up.
🔗 GitHub: [https://github.com/bhaveshpatil093/rootcause](https://github.com/bhaveshpatil093/rootcause)
🎥 Demo video: [https://youtu.be/kj4eBlZlj9g](https://youtu.be/kj4eBlZlj9g)
🌐 Live demo: [https://rootcause-cognee.vercel.app/](https://rootcause-cognee.vercel.app/)

The Problem We All Recognize

Ever had one of those debugging sessions where you fix a bug, breathe a sigh of relief, and then three weeks later a new error pops up with a completely different message, yet it turns out to be the same root cause?

Your codebase has no memory of that first fix. It doesn't know the two bugs are related. Neither does your search bar, your issue tracker, or — critically — a plain vector search over your commit history. Vector similarity finds bugs that are worded alike. It has no concept of bugs that are caused alike.

That's exactly the problem RootCause aims to solve: a memory layer for your codebase that remembers the why behind the what, so you're not stuck chasing the same shadow twice.

What RootCause Actually Does

Point RootCause at any GitHub repository, and it ingests the commit history into a hybrid graph-vector memory layer powered by Cognee. Instead of just storing "what changed," it builds a causal chain for every fix:

Error → Root Cause → Fix → Files Touched → Related Bugs
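To make that chain concrete, here is a minimal Python sketch of how one fix's chain could be flattened into (subject, relation, object) triples for a graph store. The `CausalChain` class, its field names, the relation labels, and the sample values are all hypothetical illustrations, not RootCause's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class CausalChain:
    """Hypothetical record for one fix: Error -> Root Cause -> Fix -> Files -> Related Bugs."""
    error: str
    root_cause: str
    fix_commit: str
    files_touched: list[str]
    related_bugs: list[str] = field(default_factory=list)

    def to_edges(self) -> list[tuple[str, str, str]]:
        """Flatten the chain into (subject, relation, object) triples."""
        edges = [
            (self.error, "caused_by", self.root_cause),
            (self.root_cause, "fixed_by", self.fix_commit),
        ]
        edges += [(self.fix_commit, "touched", path) for path in self.files_touched]
        edges += [(self.error, "related_to", bug) for bug in self.related_bugs]
        return edges


chain = CausalChain(
    error="TypeError: 'NoneType' object is not subscriptable",
    root_cause="session cache returns None after expiry",
    fix_commit="a1b2c3d",
    files_touched=["auth/session.py", "auth/cache.py"],
    related_bugs=["KeyError: 'user_id' in /profile"],
)
for edge in chain.to_edges():
    print(edge)
```

Storing the chain as explicit edges, rather than as one blob of text, is what lets a later query walk from a new error to an old root cause.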

Then you can ask it things a keyword search simply can't answer:

"Has this exact failure mode happened before, under a different name?"

And when a fix doesn't actually hold — when the bug resurfaces later — RootCause's memory layer can flag that the earlier fix should no longer be trusted. The graph gets more honest over time, not just bigger.

Those two capabilities are the whole pitch for why this needed Cognee specifically. Recognizing that two bugs share a root cause even when they are worded completely differently is a graph-traversal problem, not a similarity-search problem. A plain vector DB would only ever catch the ones that already sound alike.
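A toy Python illustration of the difference, assuming a crude word-overlap score as a stand-in for "sounds alike" and a hand-built graph in which two bug reports connect only through a shared root-cause node. The node names and the `jaccard` and `connected` helpers are hypothetical; real embeddings are far better than word overlap, but the structural point is the same.

```python
from collections import deque

BUG_A = "login returns HTTP 500"
BUG_B = "profile page shows the wrong user"
CAUSE = "stale session token"

# Toy undirected graph: the two bugs are linked only via their shared root cause.
graph = {
    BUG_A: [CAUSE],
    BUG_B: [CAUSE],
    CAUSE: [BUG_A, BUG_B],
}


def jaccard(a: str, b: str) -> float:
    """Crude lexical similarity: shared words divided by all distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def connected(start: str, goal: str, max_hops: int = 2) -> bool:
    """Breadth-first search: is `goal` reachable from `start` within `max_hops` edges?"""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        node, depth = frontier.popleft()
        if node == goal:
            return True
        if depth == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


print(f"lexical similarity: {jaccard(BUG_A, BUG_B):.2f}")  # lexical similarity: 0.00
print(f"graph-connected: {connected(BUG_A, BUG_B)}")      # graph-connected: True
```

The two reports share no words, so the lexical score is zero, yet a two-hop search links them through the shared cause.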

The Architecture, at a Glance

GitHub Commit Ingestion ↓
Analysis & Metadata Extraction (files, diffs, contextualized edits) ↓
Causal Chain Building (generate the causal graph) ↓
Link Bug & Fix Commits ↓
Vector & Graph Embedding ↓
Stored in Graph-Vector Memory (Cognee) ↓
New Bug Report → Retrieve Relevant Info → Identify Root Cause → Bug Recall Report

How the Team Split the Work

The two of us built this on a hackathon's compressed timeline, so we split the work along the natural seam in the architecture:

Ingestion — cloning the repo, walking the commit log, extracting diffs, mapping them to a shared schema (Commit, File, Bug, Fix), and pushing structured entities into Cognee.
Recall + Interface — designing the query layer that turns a plain-English question into a good recall() call, building the interface to actually use it (CLI + web UI), and getting "resurfaced bug" detection working end to end.
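For the ingestion half, a sketch of walking the commit log and mapping it to `Commit` entities might look like the following. It assumes the log is read with `git log --pretty=format:%H%x09%s --name-only` (a tab between hash and subject, followed by the touched file paths); the `Commit` class, `read_git_log`, `parse_git_log`, and the keyword heuristic in `looks_like_fix` are hypothetical, since the post does not say how fix commits are identified.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    message: str
    files: list[str] = field(default_factory=list)

    @property
    def looks_like_fix(self) -> bool:
        # Hypothetical, crude substring check (it would also match words like "prefix").
        msg = self.message.lower()
        return any(word in msg for word in ("fix", "bug", "patch", "regression"))


def read_git_log(repo_path: str) -> str:
    """Run git log in a local clone: a 'sha<TAB>subject' header per commit, then file paths."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%H%x09%s", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_git_log(text: str) -> list[Commit]:
    """Header lines contain a tab; other non-blank lines are files of the latest header."""
    commits: list[Commit] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if "\t" in line:
            sha, message = line.split("\t", 1)
            commits.append(Commit(sha=sha, message=message))
        elif commits:
            commits[-1].files.append(line.strip())
    return commits


sample = (
    "a1b2c3d\tFix null session on login\n"
    "auth/session.py\n"
    "auth/cache.py\n"
    "\n"
    "e4f5a6b\tAdd profile page\n"
    "web/profile.tsx\n"
)
for commit in parse_git_log(sample):
    print(commit.sha, commit.looks_like_fix, commit.files)
```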

The plan was to work in parallel against seed data so neither side blocked the other — and that's exactly how it went. The recall layer was built and tested against Cognee's example data while ingestion was still being wired up, then the two sides were integrated once the real pipeline was ready.

The Build, and the Bugs That Almost Sank It

Here's the part most project write-ups skip, and the part that actually mattered most: three bugs, each one silent, each one perfectly capable of quietly wrecking the entire demo without so much as a warning. Classic hangover behavior: everything feels fine until it very much isn't.

Bug #1 — The Config That Wasn't There

Our ingestion code initialized Cognee with new Cognee() and no arguments. It worked on the machine where it was written, because that machine still had environment variables left over from earlier testing. On a clean checkout, it failed immediately with an authentication error.

An easy fix once found — but the kind of thing that looks fine in a demo and breaks the moment someone else clones the repo.
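One way to avoid this class of bug is to read configuration explicitly and fail with a clear message at startup. The sketch below is a hypothetical Python helper, not part of Cognee; the setting name `LLM_API_KEY` is an assumption, so check which settings your Cognee client actually reads.

```python
import os
from typing import Mapping, Optional

# Assumed setting name for illustration; confirm against your Cognee version's docs.
REQUIRED_VARS = ("LLM_API_KEY",)


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read required settings explicitly and fail fast with an actionable message.

    Passing `env` keeps the check testable; by default it reads the process environment.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required setting(s): {', '.join(missing)}. "
            "Set them in the environment before running ingestion."
        )
    return {name: source[name] for name in REQUIRED_VARS}
```

Calling a check like this at the top of the ingestion entry point turns a confusing authentication error deep in the pipeline into an immediate, readable failure on a clean checkout.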

Bug #2 — The One That Would Have Quietly Ruined Everything

This was the big one — the equivalent of thinking you remember the whole night, when really you're only remembering the first five minutes.

Cognee's remember() pipeline marks a dataset as "completed" after processing it once, and silently skips re-processing on every subsequent call to that same dataset. Our ingestion code called remember() once per commit, in a loop.

That meant: for any real repository with more than one commit, only the very first commit ever actually made it into the knowledge graph. Everything after that was written to storage but never extracted into entities, relationships, or the causal chain we needed. The bug wouldn't throw an error — it would just quietly hand you a graph that looked complete but was missing 99% of your project's history. You'd never know until you asked it a question it should have been able to answer, and got silence back instead.

The fix: batch all commits into a single remember() call per ingestion, so cognify (Cognee's graph-building step) runs once over the full batch and completes with every commit included.
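To see why the loop lost data, here is a toy Python model of the behavior described above. `ToyPipeline` is a hypothetical stand-in, not Cognee's API: once a dataset is marked completed, later calls still store the raw text but skip extraction.

```python
class ToyPipeline:
    """Toy stand-in mimicking the described behavior; NOT Cognee's real API."""

    def __init__(self) -> None:
        self.stored: dict[str, list[str]] = {}     # raw text written to storage
        self.extracted: dict[str, list[str]] = {}  # items that reached the graph
        self.completed: set[str] = set()

    def remember(self, items: list[str], dataset: str) -> None:
        self.stored.setdefault(dataset, []).extend(items)
        if dataset in self.completed:
            return  # silently skipped: no error, no extraction
        self.extracted.setdefault(dataset, []).extend(items)
        self.completed.add(dataset)


commits = [f"commit {i}" for i in range(100)]

per_commit = ToyPipeline()
for c in commits:
    per_commit.remember([c], dataset="repo")  # the buggy per-commit loop

batched = ToyPipeline()
batched.remember(commits, dataset="repo")     # the fix: one call per ingestion

print(len(per_commit.stored["repo"]), len(per_commit.extracted["repo"]))  # 100 1
print(len(batched.stored["repo"]), len(batched.extracted["repo"]))        # 100 100
```

Both runs store every commit, which is exactly why the bug was invisible: only the extracted count reveals that 99 of 100 commits never reached the graph.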

Bug #3 — Data Bleeding Across Repositories

Once multiple repos had been ingested during testing, questions about one repo started returning answers contaminated with another repo's data. Cognee's dataset-level filtering isolates document chunks reliably, but cross-dataset graph entity resolution was inconsistent in practice — so results from two datasets in the same local store could mix even when a filter was applied.

The fix: give every ingestion a uniquely timestamped dataset name, so there's never ambiguity about which repo's graph a question is scoped to.
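A minimal sketch of such a naming scheme, assuming the dataset name is derived from the GitHub URL plus a UTC timestamp. The `dataset_name` function and its output format are hypothetical illustrations, not RootCause's exact convention.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def dataset_name(repo_url: str, now: Optional[datetime] = None) -> str:
    """Build a per-ingestion name like 'owner_repo_20250705T120000Z' (UTC timestamp)."""
    now = now or datetime.now(timezone.utc)
    slug = repo_url.rstrip("/").removesuffix(".git").split("github.com/")[-1]
    slug = re.sub(r"[^A-Za-z0-9]+", "_", slug).strip("_").lower()
    return f"{slug}_{now.strftime('%Y%m%dT%H%M%SZ')}"


ts = datetime(2025, 7, 5, 12, 0, 0, tzinfo=timezone.utc)
print(dataset_name("https://github.com/bhaveshpatil093/rootcause", ts))
# bhaveshpatil093_rootcause_20250705T120000Z
```

At one-second resolution, two ingestions of the same repo started in the same second would still collide; appending a short random suffix (for example `uuid.uuid4().hex[:6]`) would remove that edge case.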

None of these bugs were exotic. They passed a first test, looked fine in isolation, and only revealed themselves when we pushed on the seams — which is exactly why testing kept going well past the point where things "seemed to work." Turns out the fastest way to give an AI amnesia is to build the amnesia yourself and not notice.

How Cognee Actually Gives an AI a Memory

The mechanism is simpler than it sounds, and that's the point.

Cognee's remember() takes raw input — in our case, a plain-text description of a commit — and runs it through a pipeline: chunking, entity extraction, relationship extraction, and embedding, all in one pass. The result isn't just a stored blob of text sitting in a vector index. It's a graph: nodes for entities (files, functions, bugs, fixes), edges for the relationships between them (this fix touched this file, this bug shares a symptom with that bug), plus vector embeddings layered on top so semantic search still works where it's useful.

Then recall() doesn't just do nearest-neighbor lookup. It traverses that graph, pulling in connected context that a pure similarity search would never surface, and hands the result to an LLM to synthesize into an actual answer with real citations back to the source commits.
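The general idea of hybrid retrieval (seed with similarity, then expand along graph edges) can be sketched in a few lines of Python. This illustrates the technique under simple assumptions and is not Cognee's internal algorithm: bag-of-words cosine stands in for real embeddings, and the four nodes and their edges are made up.

```python
import math
from collections import Counter

nodes = {
    "bug-1": "login fails with 500 after session expiry",
    "cause-1": "stale session token reused from cache",
    "bug-2": "profile page shows another user's name",
    "fix-1": "invalidate cached token on logout",
}
edges = {
    "bug-1": ["cause-1"],
    "bug-2": ["cause-1"],
    "cause-1": ["bug-1", "bug-2", "fix-1"],
    "fix-1": ["cause-1"],
}


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, a stand-in for real embeddings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def hybrid_recall(query: str, k: int = 1, hops: int = 2) -> set[str]:
    """Seed with the top-k most similar nodes, then expand `hops` steps along edges."""
    seeds = sorted(nodes, key=lambda n: cosine(query, nodes[n]), reverse=True)[:k]
    found, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {nb for n in frontier for nb in edges.get(n, [])} - found
        found |= frontier
    return found


print(sorted(hybrid_recall("login returns 500 when session expires")))
# ['bug-1', 'bug-2', 'cause-1', 'fix-1']
```

The query shares words only with `bug-1`, but two hops through `cause-1` also surface `bug-2` and `fix-1`, which a similarity-only lookup would have missed.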

The part that made this project possible rather than just cute: improve(). If a fix later turns out not to have held, the memory layer can be told so, and future recall on that topic reflects it. Memory that can be corrected is what turns "a database of things that happened" into something closer to actual understanding — a system that knows not just what was tried, but what actually worked.
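The behavior described here can be modeled as a small trust ledger. The sketch below is hypothetical and only illustrates the idea; it does not show how `improve()` is called or what it stores. A fix starts unverified, is downgraded when its bug resurfaces, and can be corrected in either direction.

```python
from dataclasses import dataclass, field
from enum import Enum


class FixStatus(Enum):
    UNVERIFIED = "unverified"
    HELD = "held"
    DID_NOT_HOLD = "did_not_hold"


@dataclass
class FixMemory:
    """Hypothetical trust ledger for fixes; illustrates the idea, not improve()'s API."""
    status: dict[str, FixStatus] = field(default_factory=dict)

    def record_fix(self, fix_id: str) -> None:
        self.status[fix_id] = FixStatus.UNVERIFIED

    def report_resurfaced(self, fix_id: str) -> None:
        # The same root cause came back, so the earlier fix should no longer be trusted.
        self.status[fix_id] = FixStatus.DID_NOT_HOLD

    def correct(self, fix_id: str, held: bool) -> None:
        # Feedback can move the verdict either way, e.g. undoing a false "resurfaced" flag.
        self.status[fix_id] = FixStatus.HELD if held else FixStatus.DID_NOT_HOLD

    def verdict(self, fix_id: str) -> str:
        return "resurfaced" if self.status.get(fix_id) is FixStatus.DID_NOT_HOLD else "resolved"


memory = FixMemory()
memory.record_fix("a1b2c3d")
memory.report_resurfaced("a1b2c3d")
print(memory.verdict("a1b2c3d"))  # resurfaced
memory.correct("a1b2c3d", held=True)
print(memory.verdict("a1b2c3d"))  # resolved
```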

That's the whole idea behind the hackathon's framing: an AI that "woke up in Vegas with no memory of last night" is an AI with no causal understanding of its own history, just a pile of disconnected facts. RootCause is a small, concrete demonstration of what it looks like to fix that — for a codebase's entire debugging history.

Why "No Hangover" Is the Right Metaphor

The "hangover" in the title isn't just a joke — it's an accurate description of what happens after most debugging sessions. You find the bug. You fix it. But there's a lingering, foggy feeling because you're not 100% sure you found the root cause. Maybe you just patched a symptom. Maybe the real issue is still lurking somewhere deeper. That's the hangover — the uncertainty that sticks around even after the immediate pain is gone.

Most debugging tools leave you with this feeling because they only show you a slice of the picture: the error, but not the ecosystem around it. So you fix what you can see and hope for the best. That's why the same bugs often come back. That's why teams end up playing whack-a-mole with production issues — fixing the same kind of problem in slightly different locations over and over again.

RootCause is designed to eliminate that hangover. By giving your codebase a proper memory — organized, searchable, full of context — it lets you actually understand what happened instead of just patching what you can see. When you fix a bug with full context, you fix it with confidence. You know what you changed. You know why it was broken. You know what else might be affected.

No foggy morning-after feeling. Just clarity.

FAQ

Q: Why not just use a vector database for this?
Because vector similarity only catches bugs that are worded alike. Two bugs with the same underlying cause but completely different error messages, stack traces, or symptoms won't cluster together in embedding space — but they will be connected in a graph if you model the relationship explicitly (same file, same function, same fix commit). That's the core reason this needed a hybrid graph-vector store rather than a plain vector DB.

Q: What happens if RootCause is wrong about a bug being "resurfaced"?
That's exactly what improve() is for. The memory isn't static — if a flagged relationship turns out to be wrong, or a fix that was marked as not holding actually did hold, the graph can be corrected, and future recall reflects the update rather than repeating the same mistake indefinitely.

Q: Does this only work on Python codebases?
No — the diff-to-function mapping uses AST parsing where available (with a regex-based fallback for languages without a convenient parser in our stack), so it isn't limited to a single language.
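A minimal sketch of that strategy in Python: use the standard-library `ast` module for Python files, and fall back to a regex that attributes each changed line to the nearest function header above it. The regex patterns (JavaScript/TypeScript `function`, Go `func`, Python `def`) and all function names here are assumptions for illustration, not RootCause's actual implementation.

```python
import ast
import re

# Rough, illustrative header patterns; real languages need far more cases.
FALLBACK_RE = re.compile(
    r"^\s*(?:export\s+)?(?:async\s+)?function\s+(\w+)"  # JS/TS function declarations
    r"|^\s*func\s+(?:\([^)]*\)\s*)?(\w+)"               # Go functions and methods
    r"|^\s*(?:async\s+)?def\s+(\w+)"                    # Python (used if ast fails)
)


def functions_touched_python(source: str, changed_lines: set[int]) -> set[str]:
    """Map changed line numbers to enclosing Python functions (nested ones included)."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            span = range(node.lineno, node.end_lineno + 1)
            if any(line in span for line in changed_lines):
                hits.add(node.name)
    return hits


def functions_touched_regex(source: str, changed_lines: set[int]) -> set[str]:
    """Fallback: attribute each changed line to the nearest function header above it."""
    lines = source.splitlines()
    hits = set()
    for line_no in changed_lines:
        for i in range(min(line_no, len(lines)) - 1, -1, -1):
            m = FALLBACK_RE.match(lines[i])
            if m:
                hits.add(next(g for g in m.groups() if g))
                break
    return hits


def functions_touched(path: str, source: str, changed_lines: set[int]) -> set[str]:
    """Prefer AST parsing for Python; use the regex fallback otherwise or on parse errors."""
    if path.endswith(".py"):
        try:
            return functions_touched_python(source, changed_lines)
        except SyntaxError:
            pass
    return functions_touched_regex(source, changed_lines)
```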

Q: How much commit history do you need before this becomes useful?
There's no hard minimum, but the value compounds with history — a repo with a handful of commits won't show much benefit over just reading the git log yourself. The "resurfaced bug" detection specifically needs at least one prior fix and one resurfacing to have something to flag.

Q: Is this production-ready?
Not yet — it's a hackathon build. The pipeline works end-to-end against real repositories, but things like automatic (rather than manually-flagged) resurfacing detection, and more robust cross-language function mapping, are the natural next steps.

What's Next

The pipeline works end-to-end against real repositories, with a working CLI and a web interface for asking questions and seeing an investigation verdict — resolved, or resurfaced and still unresolved. There's more to explore here:

Automatic detection of resurfacing bugs without needing them pre-labeled
Richer graph visualization of the causal chains
Pushing further on improve() to see how much a memory layer can correct itself over time

Conclusion

Even in its current form, RootCause answers the question that actually matters: not just what changed, but why it broke, and whether we've already been here before.

No hangover required.


Built with Cognee for WeMakeDevs' "The Hangover Part AI" hackathon.

If you have thoughts on where a causal memory layer like this could go next, drop them in the comments — we'd love to hear them.
