Mayank Roy

Posted on Jul 3

Day 1: The Day I Stopped Treating Memory as an Afterthought

#ai #programming #productivity #python

I almost made the same mistake every developer makes when building an AI app.

I was ready to open my laptop, spin up a FastAPI server, connect an LLM, start streaming responses, and then, somewhere near the end and probably at 2am, bolt on some kind of memory system as an afterthought. A dictionary here. A database table there. Maybe a list of the last five messages shoved into the context window and called "memory." I have done this before. It always feels fine until it very obviously isn't.

This time I caught myself.

How We Got Here

We are building a project called Continuum for the WeMakeDevs Hangover Part AI hackathon. The premise of the hackathon is almost offensively simple: your AI woke up in Vegas with no memory of last night. Build one that doesn't forget. The tool they want you to use is called Cognee, a hybrid graph-vector memory layer for AI agents, and the judging is literally weighted on how deeply you use it.

So I sat down on Day 1 with one rule: don't touch the tutor logic, don't touch the LLM prompts, don't touch the frontend. Just get Cognee working and build the layer everything else will sit on top of.

It sounds boring. It was the best decision I made all week.

What Even Is Cognee

I'll be honest, my first reaction to Cognee was skepticism. Another memory library? Isn't that just a vector store with extra steps?

No. And the difference matters more than I expected.

A vector store gives you semantic search. You embed some text, you store the embedding, you query against it later and get back chunks that are semantically similar to your query. That's useful, but it's flat. Every piece of information has the same relationship to every other piece: none.
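To make "flat" concrete, here is a minimal sketch of what a plain vector store does at query time. It scores every stored chunk by cosine similarity to the query embedding and returns the top k. The three-dimensional vectors are invented for illustration; a real system would get them from an embedding model. Each chunk is ranked on its own, and nothing in the result says how the chunks relate to each other.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query; each chunk is scored independently."""
    ranked = sorted(store, key=lambda text: cosine(query, store[text]), reverse=True)
    return ranked[:k]


# Toy embeddings (hypothetical values, not produced by a real model).
store = {
    "student got question 4 wrong": [0.9, 0.1, 0.0],
    "student made a sign error while factoring": [0.7, 0.6, 0.1],
    "student asked about the lunch break": [0.0, 0.1, 0.9],
}

print(top_k([0.8, 0.4, 0.0], store))
# ['student made a sign error while factoring', 'student got question 4 wrong']
```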

Cognee builds a knowledge graph on top of that. When you call remember() with a piece of text, it doesn't just embed it. It runs an extraction pass that pulls out entities and relationships, and stores those as nodes and edges in a graph. So when you later call recall(), it isn't just doing vector similarity search; it's doing graph traversal too. It can follow chains of relationships. It can answer multi-hop questions. It can connect information across different things you told it at different times.

For a tutoring app, this is the difference between "the student got question 4 wrong" (a flat fact) and "the student has a sign error misconception on factoring, which is a prerequisite of completing the square, which is what they're stuck on now" (a chain of relationships). The second one is what actually helps you teach better.
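The following is a toy illustration of that difference. It assumes an extraction pass has already turned separate interactions into (subject, relation, object) triples. The entity and relation names are hypothetical. The traversal is a plain breadth-first search, not Cognee's retrieval algorithm. The point is only to show what following a chain of edges gives you that a flat lookup can't.

```python
from collections import deque

# Hypothetical triples an extraction pass might produce from several separate interactions.
TRIPLES = [
    ("student_42", "has_misconception", "sign_error"),
    ("sign_error", "occurs_in", "factoring"),
    ("factoring", "prerequisite_of", "completing_the_square"),
    ("student_42", "stuck_on", "completing_the_square"),
]


def find_chain(triples, start, goal):
    """Breadth-first search over directed edges; returns the shortest chain of triples, or None."""
    outgoing = {}
    for src, rel, dst in triples:
        outgoing.setdefault(src, []).append((rel, dst))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


def root_causes(triples, student):
    """Link each of the student's misconceptions to the topic they are stuck on, if a chain exists."""
    misconceptions = [o for s, r, o in triples if s == student and r == "has_misconception"]
    stuck_on = [o for s, r, o in triples if s == student and r == "stuck_on"]
    chains = []
    for misconception in misconceptions:
        for topic in stuck_on:
            chain = find_chain(triples, misconception, topic)
            if chain:
                chains.append(chain)
    return chains


for chain in root_causes(TRIPLES, "student_42"):
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in chain))
```

Running it prints `sign_error [occurs_in] factoring -> factoring [prerequisite_of] completing_the_square`. No single stored fact contains that explanation. It only appears once the edges are followed.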

Cognee exposes four operations, and the hackathon is explicitly built around them:

remember() ingests text, files, or URLs and structures them into the knowledge graph.

recall() queries that graph with a natural language question and returns relevant results using both semantic similarity and graph traversal.

improve() runs a post-ingestion enrichment pass that re-weights nodes and prunes stale or redundant data.

forget() surgically removes data from the graph without wiping everything else.

Four operations. Total recall. That's their tagline, and honestly it's accurate.

The First Thing I Did Was Prove It Actually Works

Before writing a single line of application code, I wrote a throwaway test file. Not a unit test. Not a pytest fixture. Just a plain Python script called test_cognee.py that did three things:

Called remember() with one sentence describing a fake student interaction. Called recall() with a question about that interaction. Printed the result.

Then I ran it.

This took about forty minutes, including the time I spent reading the Cognee docs to get the config right. Specifically, I had to figure out that Cognee needs an LLM API key to do its graph extraction during remember(). That was the one thing that tripped me up. I expected it to work like a pure vector store, where you only need an embedding model, but the graph extraction step calls an LLM internally. Once I had the key configured in my .env file, it worked.

And when recall() returned a response that actually related to what I had just stored, something clicked. This wasn't a toy. The graph traversal was surfacing the right information from a single fact I had stored ten seconds ago. I could already see how this would behave across hundreds of stored interactions.

I deleted the test file. I was unblocked.

The reason I'm emphasising this step is that it's very tempting to skip it. You want to build things. Sitting and running a five-line test script feels like procrastination. But discovering that your entire memory layer doesn't work on day three, when everything else is already built on top of it, is genuinely catastrophic for a five-day hackathon. One hour of proof-of-life on day one is worth eight hours of debugging on day three.

Why FastAPI

I'm a backend developer. I've used Flask, Django, and FastAPI. For this project FastAPI was the obvious choice and I want to explain why beyond just "it's fast."

The killer feature for this specific project is async support. Cognee's operations (remember(), recall(), improve(), and forget()) are all async functions. They do IO: they call LLMs, write to databases, and traverse graphs. If you build a synchronous server on top of async operations, you either block everything or start wrestling with event loops in ways that will make you cry at 11pm on day four.
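Here is a minimal, Cognee-free sketch of the blocking problem. `asyncio.sleep` stands in for an awaited IO call, and `time.sleep` stands in for a synchronous call made inside async code. Three concurrent simulated requests overlap in the first case and queue up behind each other in the second.

```python
import asyncio
import time


async def async_handler() -> None:
    await asyncio.sleep(0.2)  # stand-in for an awaited IO call (LLM request, graph query)


async def blocking_handler() -> None:
    time.sleep(0.2)  # a synchronous call inside async code: the whole event loop stalls


async def serve(handler, n: int = 3) -> float:
    """Run n simulated requests concurrently and return the wall-clock time taken."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


print(f"awaited IO:  {asyncio.run(serve(async_handler)):.2f}s")     # about 0.2s: the waits overlap
print(f"blocking IO: {asyncio.run(serve(blocking_handler)):.2f}s")  # about 0.6s: requests queue up
```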

FastAPI is async-native. Route handlers can be declared as async functions, so you await your Cognee calls directly inside them with no ceremony. The mental model is clean and it stays clean.

Beyond that: automatic OpenAPI docs that your frontend teammate can use to understand the API without you explaining every endpoint, type validation via Pydantic, and a startup lifespan hook where you can validate your environment config before the server starts accepting requests. That last one matters because catching a missing API key at startup with a clear error is infinitely better than catching it mid-request with a cryptic 500 during your demo.
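Below is a sketch of the kind of startup validation described above, using only the standard library. The variable names (`LLM_API_KEY`, `MEMORY_LOG_PATH`) and the `Settings` fields are assumptions for illustration. They are not necessarily what Continuum's config.py or Cognee actually read.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; mirror whatever your .env.example actually lists.
REQUIRED_VARS = ("LLM_API_KEY",)


@dataclass(frozen=True)
class Settings:
    llm_api_key: str
    memory_log_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Validate configuration once and report every missing variable in a single clear error."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        llm_api_key=env["LLM_API_KEY"],
        memory_log_path=env.get("MEMORY_LOG_PATH", "memory_log.jsonl"),  # optional, with a default
    )
```

With FastAPI, one way to wire this in is to call `load_settings()` inside a lifespan function before its `yield`. The lifespan function is an `@asynccontextmanager` passed as `FastAPI(lifespan=...)`. That way, a missing key stops the server at startup rather than surfacing as a 500 mid-demo.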

The Memory Lifecycle Service: The Most Important Thing I Built All Day

After proving Cognee works, I built one file that became the backbone of the entire backend: memory.py.

The concept is simple. Instead of calling cognee.remember() directly from five different places in the codebase, every Cognee operation goes through this single module. Every other service imports from here and only from here.

Why? Three reasons.

Debugging. When something breaks with memory, and something will break, there is exactly one place to look. Not scattered across the tutoring engine, the grading service, the strategy selector, and two routers. One file.

The demo. Every call to remember(), recall(), improve(), and forget() gets logged to a JSON file with a timestamp, the student ID, the dataset, and a plain-English description of what triggered it. During the demo, you can show judges a literal timestamped receipt of every memory operation the system performed. That's not just a nice UI detail; it's your answer to the "best use of Cognee" judging criterion, made tangible.

Discipline. Building this first sets a team norm. Nobody reaches around the service and calls Cognee directly, because the central module exists and is obviously the right place. It prevents the kind of architectural drift that happens in hackathons where everyone is moving fast and shortcuts accumulate.
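The timestamped receipt from the demo point above could be as simple as an append-only log. This sketch assumes JSON Lines (one JSON object per line) so that each operation is a cheap append. The field names follow the post's description, but the function names `log_event` and `read_events` are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def log_event(path: Path, operation: str, student_id: str, dataset: str, description: str) -> dict:
    """Append one memory operation to the log as a single JSON line and return it."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,  # e.g. "remember", "recall", "improve", "forget"
        "student_id": student_id,
        "dataset": dataset,
        "description": description,  # plain-English reason the operation ran
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


def read_events(path: Path, student_id: Optional[str] = None) -> List[dict]:
    """Read the log back in order, optionally filtered to one student."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]
    return [e for e in events if student_id is None or e["student_id"] == student_id]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "memory_log.jsonl"
    log_event(log_path, "remember", "42", "student_42", "Stored attempt on question 4")
    log_event(log_path, "recall", "7", "student_7", "Fetched history before the next question")
    print([e["operation"] for e in read_events(log_path, student_id="42")])  # ['remember']
```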

The module has four functions that map directly onto the four Cognee operations, plus a fifth utility that reads back the log:

- remember_interaction() takes a structured description of a student's attempt and stores it.
- recall_student_context() takes a student ID and a query string and returns relevant history.
- improve_student_memory() triggers the enrichment pass on a student's dataset.
- forget_resolved_misconception() prunes resolved misconceptions and re-runs improve to keep the graph clean.
- get_lifecycle_log() returns the event history, filterable by student ID.
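Here is a sketch of the single-module pattern, using the function names from the post. The backend is a hypothetical in-memory stand-in so that the example runs offline. Its method signatures are assumptions, not Cognee's API; in the real module, each call would go to Cognee instead. The log is kept in a list for brevity.

```python
import asyncio
from datetime import datetime, timezone
from typing import Dict, List, Optional


class InMemoryBackend:
    """Hypothetical offline stand-in; in the real module these calls would go to Cognee."""

    def __init__(self) -> None:
        self.datasets: Dict[str, List[str]] = {}

    async def remember(self, text: str, dataset: str) -> None:
        self.datasets.setdefault(dataset, []).append(text)

    async def recall(self, query: str, dataset: str) -> List[str]:
        # Toy matching on shared words; Cognee would combine vector search and graph traversal.
        words = set(query.lower().split())
        return [t for t in self.datasets.get(dataset, []) if words & set(t.lower().split())]

    async def improve(self, dataset: str) -> None:
        # Toy "enrichment": drop exact duplicates, keeping first-seen order.
        self.datasets[dataset] = list(dict.fromkeys(self.datasets.get(dataset, [])))

    async def forget(self, text: str, dataset: str) -> None:
        # Remove one item without touching the rest of the dataset.
        self.datasets[dataset] = [t for t in self.datasets.get(dataset, []) if t != text]


_backend = InMemoryBackend()
_log: List[dict] = []  # kept in memory for brevity; the post describes a JSON file


def _dataset(student_id: str) -> str:
    return f"student_{student_id}"  # hypothetical per-student dataset naming


def _record(operation: str, student_id: str, description: str) -> None:
    _log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "student_id": student_id,
        "dataset": _dataset(student_id),
        "description": description,
    })


async def remember_interaction(student_id: str, interaction: str) -> None:
    await _backend.remember(interaction, _dataset(student_id))
    _record("remember", student_id, "Stored a student attempt")


async def recall_student_context(student_id: str, query: str) -> List[str]:
    results = await _backend.recall(query, _dataset(student_id))
    _record("recall", student_id, f"Recalled history for: {query}")
    return results


async def improve_student_memory(student_id: str) -> None:
    await _backend.improve(_dataset(student_id))
    _record("improve", student_id, "Ran enrichment pass")


async def forget_resolved_misconception(student_id: str, misconception: str) -> None:
    await _backend.forget(misconception, _dataset(student_id))
    _record("forget", student_id, f"Pruned resolved misconception: {misconception}")
    await improve_student_memory(student_id)  # re-run improve so the graph stays clean


def get_lifecycle_log(student_id: Optional[str] = None) -> List[dict]:
    return [e for e in _log if student_id is None or e["student_id"] == student_id]


async def demo() -> None:
    fact = "sign error while factoring on question 4"
    await remember_interaction("42", fact)
    print(await recall_student_context("42", "sign error"))
    await forget_resolved_misconception("42", fact)
    print([e["operation"] for e in get_lifecycle_log("42")])


asyncio.run(demo())
```

The demo first prints the recalled fact. It then prints `['remember', 'recall', 'forget', 'improve']`, which mirrors the all-four-events check described for the Day 1 test. Because every caller goes through these functions, swapping the stand-in for real Cognee calls only touches one file.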

Writing these five functions took maybe two hours. Using them for the rest of the week saved us from an entire category of problems.

What the End of Day 1 Looked Like

By the time I pushed to the repo at the end of the day, here's what existed:

A virtual environment with four dependencies. A .env.example file, so teammates don't have to ask how to configure things. A config.py that centralises every environment variable and validates them at startup. The memory.py service with all four operations and logging. A tests/test_memory_service.py that runs the full lifecycle (remember, recall, improve, forget) and verifies that the log contains all four events. A FastAPI app in main.py with a working health check endpoint. And a README section explaining exactly how to get the server running.

Nothing about the actual tutoring logic existed yet. No question generation. No grading. No strategy selection. And that was correct.

The foundation was solid. Everything built on top of it over the next four days had a place to stand.

The Thing I Keep Coming Back To

There's a version of this project where I spent Day 1 writing prompt templates, arguing about whether to use GPT-4o or Claude, and building a chat endpoint that kind of works but has no real memory. I've seen that project. It's the one that, during the demo, has to answer "how does it remember the user?" with "well, we pass the last few messages as context."

The answer I have instead is: it builds a per-student knowledge graph. It stores every interaction as a structured memory. It recalls relevant history using graph traversal, not just keyword matching. It improves its graph after every session. It forgets resolved misconceptions instead of letting stale wrong data pollute future teaching.

That answer exists because of what we built on Day 1.

Memory isn't a feature you add at the end. For an AI agent, memory is the architecture. Get that right first and everything else is just building rooms in a house that already has a solid foundation.

This is Day 1 of our build log for Continuum, built during the WeMakeDevs Hangover Part AI hackathon (June 29 – July 5, 2026).

Top comments (1)

Dipankar Sarkar • Jul 6

The graph-vs-flat call really comes down to which facts deserve to become edges. Your prerequisite-chain example is exactly where the graph earns its keep, because those relationships are stable and worth traversing. A flat store is fine for volatile stuff like the last five messages.

The failure mode to watch is the extraction pass. Pulling entities and relationships with an LLM means a wrong edge gets baked into the store, and on recall a bad edge is worse than a missing chunk, because traversal propagates it into everything connected to it. Vector recall degrades gracefully: you just get a slightly-off chunk. Graph recall degrades sharply when the graph is wrong. The other thing that got me was traversal depth at recall time. Unbounded multi-hop traversal pulls in loosely related nodes and quietly blows the context budget, so you end up needing an edge relevance score and a hop limit anyway. The store is the easy part; the recall policy is where the real engineering lives.
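Here is a minimal sketch of the recall policy the comment describes. It is a breadth-first traversal that stops at a hop limit and only follows edges whose relevance score clears a threshold. The graph, scores, and thresholds below are invented. How edges get scored in the first place is the hard part, and it is not shown.

```python
from collections import deque


def bounded_neighbourhood(edges, start, max_hops=2, min_score=0.5):
    """Collect nodes reachable from `start` within `max_hops`, following only edges scored >= min_score.

    `edges` maps a node to a list of (neighbour, relevance_score) pairs.
    Returns a dict of node -> hop count at which it was first reached.
    """
    reached = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if reached[node] == max_hops:
            continue  # hop limit reached: don't expand this node further
        for neighbour, score in edges.get(node, []):
            if score >= min_score and neighbour not in reached:
                reached[neighbour] = reached[node] + 1
                queue.append(neighbour)
    return reached


# Hypothetical scored edges.
edges = {
    "completing_the_square": [("factoring", 0.9), ("graphing", 0.3)],
    "factoring": [("sign_error", 0.8), ("polynomials", 0.6)],
    "polynomials": [("calculus", 0.7)],
}

print(bounded_neighbourhood(edges, "completing_the_square"))
# {'completing_the_square': 0, 'factoring': 1, 'sign_error': 2, 'polynomials': 2}
```

With the defaults, `graphing` is skipped because its score is 0.3, and `calculus` is cut off by the two-hop limit. Raising `max_hops` to 3 pulls `calculus` in, which is the kind of loosely related node the comment warns about.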