Five Bugs Deep in an AI Memory Layer: My Week with Cognee

#agents #ai #debugging #rag

By Abhishek Vishwakarma — final-year CS student, SOC analyst background, building toward GenAI/agentic AI engineering.

When I signed up for The Hangover Part AI: Where's My Context? — WeMakeDevs' hackathon built around Cognee — I didn't start by building a flashy demo. I started by reading code. Cognee promises AI agents a real memory: ingest anything, build a hybrid graph-vector knowledge store, and let agents remember(), recall(), improve(), and forget() across infinite sessions instead of waking up with amnesia every session. Before I trusted that promise enough to build on top of it, I wanted to know how solid the foundation actually was.

So instead of a project, I went issue-hunting on the Cognee GitHub repo — 25k+ stars, Python-first, the open-source backbone for a lot of "agent memory" products getting built right now. Five pull requests later, here's what I found and fixed.

1. Retrying an error that was never going to succeed

EmbeddingException is what Cognee's embedding engines raise when a chunk of text is too short to split further but still blows past the embedding model's context window. That's a deterministic failure — retrying it changes nothing. But the @retry decorator on embed_text in FastembedEmbeddingEngine, LiteLLMEmbeddingEngine, and OpenAICompatibleEmbeddingEngine was catching it anyway and retrying with exponential backoff for up to 128 seconds. In production this meant silent hangs on bad input; in CI it meant unit tests covering context-window fallbacks took over four minutes to run.

Fix: added EmbeddingException to the exception types excluded via retry_if_not_exception_type across all three engines, so this non-transient error fails fast instead of spending up to two minutes on retries that can never succeed.
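To make the idea concrete, here is a minimal sketch of "retry everything except the deterministic failure". It is not the Cognee patch: the name retry_if_not_exception_type matches the tenacity library's API, but this version uses a hand-rolled, stdlib-only decorator so it runs without dependencies. The retry_unless helper, the EmbeddingException stand-in class, the two embed functions, and the delay values are all hypothetical.

```python
import time
from functools import wraps


class EmbeddingException(Exception):
    """Stand-in for Cognee's exception: deterministic, so retrying cannot help."""


def retry_unless(*non_retryable, attempts=5, base_delay=0.01):
    """Retry with exponential backoff, except for the listed exception types."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except non_retryable:
                    raise  # deterministic failure: fail fast, no backoff
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


calls = {"too_long": 0, "flaky": 0}


@retry_unless(EmbeddingException)
def embed_too_long(text):
    # Simulates a chunk that can never fit the model's context window.
    calls["too_long"] += 1
    raise EmbeddingException("chunk exceeds the embedding context window")


@retry_unless(EmbeddingException)
def embed_flaky(text):
    # Simulates a transient error that clears up on the third call.
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("transient network error")
    return [0.0, 0.1]
```

Calling embed_too_long raises after a single attempt, while embed_flaky is still retried until it succeeds. That is the split the fix is after: transient errors keep their backoff, deterministic ones do not.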

2. When "skip the bad entity" quietly breaks alignment

TripletSearchContextProvider builds search context by gathering results for a list of entities. The problem: when an entity was invalid (_get_entity_text(entity) returned None), no search task was created for it, but the full, unfiltered entities list was still passed into _results_to_context(entities, results). The two lists no longer had the same length, so results were silently zipped to the wrong entities. No error, just quietly wrong context.

Fix: filter to valid_entities before generating search tasks, and use that same filtered list when zipping results back into context. Added a unit test specifically verifying alignment holds.
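The misalignment is easy to reproduce in isolation. The sketch below is a toy model of the bug and the fix, not Cognee's code: get_entity_text, fake_search, the dict-shaped entities, and both build_context functions are hypothetical stand-ins for the real provider internals.

```python
def get_entity_text(entity):
    """Hypothetical stand-in for _get_entity_text: None marks an invalid entity."""
    return entity.get("name") or None


def fake_search(text):
    """Hypothetical stand-in for the real search task."""
    return f"results for {text}"


def build_context_buggy(entities):
    # Results exist only for valid entities...
    texts = [get_entity_text(e) for e in entities]
    results = [fake_search(t) for t in texts if t is not None]
    # ...but are zipped against the unfiltered list, so pairs drift.
    return [(e["id"], r) for e, r in zip(entities, results)]


def build_context_fixed(entities):
    # Filter once, then use the same filtered list for searching and zipping.
    valid_entities = [e for e in entities if get_entity_text(e) is not None]
    results = [fake_search(get_entity_text(e)) for e in valid_entities]
    return [(e["id"], r) for e, r in zip(valid_entities, results)]


entities = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": None},  # invalid: no text to search with
    {"id": "c", "name": "Carol"},
]

buggy = build_context_buggy(entities)
fixed = build_context_fixed(entities)
```

In the buggy version, entity "b" ends up holding Carol's results and "c" is dropped, because zip stops at the shorter list without complaining. On Python 3.10+, zip(..., strict=True) would turn this kind of length mismatch into a loud ValueError.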

3. Configuration that pretended to be dynamic

DefaultCrawlerConfig and TavilyConfig referenced environment variables like WEB_SCRAPER_TIMEOUT, but the Pydantic field defaults were evaluated once, at class-definition time, so changing the env var after import did nothing. The config looked configurable. It wasn't.

Fix: wrapped the env lookups in Field(default_factory=...) so timeout, concurrency, and crawl-delay settings are actually read fresh at instantiation, with a test verifying overrides take effect.
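The difference between an eagerly evaluated default and a default factory can be shown with the standard library alone. This sketch uses dataclasses.field(default_factory=...), which follows the same pattern as Pydantic's Field(default_factory=...), so it runs without Pydantic installed. EagerConfig, LazyConfig, and the values 15 and 60 are hypothetical; only the WEB_SCRAPER_TIMEOUT name comes from the real config.

```python
import os
from dataclasses import dataclass, field

# Pin a starting value so the example is deterministic.
os.environ["WEB_SCRAPER_TIMEOUT"] = "15"


@dataclass
class EagerConfig:
    # The env lookup runs once, while the class body is being executed.
    timeout: int = int(os.getenv("WEB_SCRAPER_TIMEOUT", "15"))


@dataclass
class LazyConfig:
    # The env lookup runs again on every instantiation.
    timeout: int = field(
        default_factory=lambda: int(os.getenv("WEB_SCRAPER_TIMEOUT", "15"))
    )


# Change the variable after the classes already exist.
os.environ["WEB_SCRAPER_TIMEOUT"] = "60"

eager_timeout = EagerConfig().timeout  # still the value seen at class definition
lazy_timeout = LazyConfig().timeout    # picks up the new value
```

The eager class keeps reporting the value it saw at definition time, while the factory-based one reads the override, which is the behavior a "configurable" timeout implies.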

4. A docstring lying about its own function

Small one, but the kind of thing that costs someone an hour of debugging: is_embeddable(s: str)'s docstring claimed a string needed at least one alphanumeric character to be embeddable. The actual implementation only checked for one non-whitespace character. Different bar entirely — a string of just punctuation would pass the real check but, per the docs, shouldn't have.

Fix: corrected the docstring to match what the code actually does.
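For a sense of how far apart the two bars are, here is a small sketch comparing them. Both functions are my own reconstructions from the descriptions above, not copies of Cognee's source: is_embeddable mirrors the documented-after-the-fix behavior (at least one non-whitespace character), and has_alphanumeric is a hypothetical helper representing what the old docstring claimed.

```python
def is_embeddable(s: str) -> bool:
    """Return True if the string has at least one non-whitespace character."""
    return any(not ch.isspace() for ch in s)


def has_alphanumeric(s: str) -> bool:
    """The stricter bar the old docstring described (hypothetical helper)."""
    return any(ch.isalnum() for ch in s)


samples = ["hello", "...", "   \n", ""]
comparison = [(s, is_embeddable(s), has_alphanumeric(s)) for s in samples]
```

The punctuation-only string "..." is where the two disagree: it passes the real check but fails the one the docstring promised.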

5. Serialization that broke on its own success

SearchResultPayload had two separate problems. First, its serialization logic couldn't properly handle nested Pydantic models, UUIDs, or collections inside result_object — it needed a real recursive serializer, not ad-hoc handling. Second, and sneakier: the result-resolution logic used a truthiness check, so a legitimately empty list, empty string, or empty dict in completion/context was treated as "nothing here" and silently fell back to different behavior — even though an empty result is still a valid result.

Fix: wrote a recursive serialize_value() helper covering BaseModel, UUIDs, lists/tuples/sets, and dicts, and replaced the truthiness check with an explicit is not None check so falsy-but-valid values are returned correctly. Added tests for both the complex serialization case and the falsy-completion case.
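Both halves of this fix fit in a short sketch. Treat it as an approximation under stated assumptions rather than the merged code: Pydantic models are duck-typed through their model_dump() method so the example runs without Pydantic installed, FakeModel is a hypothetical stand-in for a real BaseModel, and resolve_result is a hypothetical name for the result-resolution step.

```python
from uuid import UUID


class FakeModel:
    """Stand-in for a Pydantic model so the sketch has no dependencies."""

    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self):
        return dict(self._fields)


def serialize_value(value):
    """Recursively convert a value into JSON-friendly primitives."""
    if callable(getattr(value, "model_dump", None)):
        # Assumption: anything exposing model_dump() behaves like a Pydantic v2 model.
        return serialize_value(value.model_dump())
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, dict):
        return {key: serialize_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [serialize_value(item) for item in value]
    return value


def resolve_buggy(completion, context):
    # Truthiness check: an empty list, string, or dict counts as "missing".
    return completion or context


def resolve_result(completion, context):
    # Explicit None check: falsy-but-valid values are returned as-is.
    return completion if completion is not None else context


uid = UUID("12345678-1234-5678-1234-567812345678")
payload = {"id": uid, "items": (FakeModel(owner=uid), [1, 2])}
serialized = serialize_value(payload)
```

With this in place, resolve_result([], "fallback") returns the empty list, while the truthiness version swaps in the fallback. Only a genuine None triggers the fallback path.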

What this actually taught me

None of these are headline bugs: no security holes, no crashes that scream at you in production. They're the quiet kind: a retry that should never retry, a zip that pairs results with the wrong entities, a config that lies about being configurable, a docstring that's just wrong, a truthy/falsy mixup that throws away valid empty results. The kind you only find by actually reading the code path end to end instead of skimming the README and writing a demo on top of it.

Coming from a SOC/security background, that's basically the instinct I brought here: don't trust the surface, trace the actual data flow. Turns out that instinct travels well into "is this open-source memory layer solid enough to build agents on."

All five PRs are open and awaiting review as of writing. I'll update this post once they're merged — but whether or not all five land, this was a better use of hackathon week than shipping a demo I'd have to explain away in the README.

PRs: #3565 · #3566 · #3567 · #3568 · #3569

All code, fixes, and pull requests in this post are my own work. I used Claude (AI assistant) to help structure and draft the writeup, as disclosed per the hackathon rules.

Built for The Hangover Part AI by @wemakedevs, powered by Cognee.
