Introducing Forgetful - Shared Knowledge and Memory across Agents

I, like many of us over the last year or so, have been on an interesting journey when it comes to software development.

From using ChatGPT to show me how to use scikit-learn to build a classifier in 2022, to seeing tab completion reach super saiyan level with Copilot in 2023 (or was it 2024?! It all seems like ancient history) on an engineer's machine at work, to using Cursor to remove the friction of copy/pasting out of ChatGPT in 2024, to adopting my own form of the BMAD method to build out some really nice pet projects using a variety of tools like Claude Code, Cursor and Codex here in 2025.

That's my personal experience compressed into a small paragraph, and probably a similar journey to the one many of you on here have already had. The past few years have been a hell of a ride, and I am finding I now use more AI tools than ever.

Over that time I have enjoyed reading many an article on DEV.TO about the approaches we've each developed to get the best out of this new paradigm, coupled with the 'X AI Feature/Product.. is Insane' YouTube videos (seriously guys, please stop).

There have been some real nuggets among all this. I've already mentioned the BMAD Method (which I still use to this day, albeit in a more lightweight form). Managing the context window was another, and probably my favourite of all was context7.

Using context7 also made me realise something: the models are more accurate when the information is inside their context window and they are not having to rely on training data.

For me, training data is almost a necessary evil; I want models to rely on it less and less. Please allow me to explain a bit further.

Let's imagine I am looking to implement a Model Context Protocol (MCP) server. If you were doing this a few months back, the models straight up didn't know about MCP, so you were forced to ask them to use context7 and a web search to bring back the necessary data.

Even today, with models trained on more recent data, MCP is developing so fast that what a model surfaces could be out of date and is more than likely not the optimal implementation pattern. Using context7, I could be more confident in the approaches the agent suggested, especially when it could cite the sources and I could go validate them myself.

Like any other developer, once you solve a problem you like to keep that pattern for other projects, and over time your toolbox gets bigger and better. It should be no different with AIs; in fact, if anything, it is even more important to ensure that the AIs have your taste, preferences and best practices in mind whenever they are implementing on your behalf.

Simultaneously, I was working on my own AI agents; I've been developing them for about six months now, mostly as a learning exercise, but I found it to actually be really fun.

The first thing that anyone engineering an agent will realise is that you need memory. Even if it is just reinjecting the conversation history back to the agent so you can have continuity between requests.

As you work on these you start to envisage better systems for memory. You often start to architect different types: short term (a simple version being what I described in the previous paragraph) and long term, for example some kind of automated retrieval of relevant facts and conversations from previous interactions.

There's also the breakdown into episodic (temporal and spatial context: "I had a meeting last week about the new payroll implementation"), semantic (facts and concepts: "payroll is a term to describe the business process of paying employees") and procedural (motor skills and how-to knowledge: "to put up a shelf I first.." <- I am still to learn this one).

Ultimately the answer usually ends up being some kind of persistence layer, a search mechanism and some service to pull it all together at inference time (maybe even a sub-agent specifically to manage retrieval and storage of memories).

So with this in mind I went away and vibe coded up an MCP server for agentic systems to store memories in. I did it in an evening and spent the rest of the weekend dogfooding it with systems like Claude Code, Claude Desktop, Codex and Cursor. I had built the tool for my own agents, but I found it really useful in my coding agents too. I quickly set about encoding all my projects into memories, getting coding agents to attach documents and code snippets alongside those memories, so that an agent querying the knowledge base could get an idea of a particular pattern, or the way something worked, just from what was in the knowledge base, and dig into the code if needed.

In this sense, the MCP server I had built became very much like my own little mini context7; actually, a bit more than that really. I could ask an agent to look at the code in those repos of course, but it would still lack information about design decisions, how we resolved an issue, preferred patterns and when to use them. With the memory server it felt like I was working in the same session across multiple projects and agents, and it was really refreshing.

I had hosted this remotely and managed to connect it to Claude Mobile via Dynamic Client Registration.

So now, when I was out having a walk and had an idea related to one of my projects, Claude had semantic understanding of it. It meant I could have meaningful conversations about the project without it having to scan through code.

I could even make decisions with Claude while out on that walk, get it to record a memory (and an implementation plan document) and then once I was back home just tell Claude Code to go ahead and implement. The same applied to using other agentic coding applications, such as Cursor, Copilot and Codex.

I found myself logging every design decision as a memory, with built-in commands to fetch context whenever I started a new session. This worked across Claude Code, Codex, OpenCode and Copilot, which have been my arsenal of AI coding tools for the past few months (could I get by with just one? Yes, but those YouTube videos I mentioned earlier.. they're unfortunately really effective, so please stop guys, I have kids to feed).

Anyhow, it didn't take long for me to realise that I needed to build a proper version of this, as I was depending on it more and more in my daily work. While it all worked fine and felt pretty robust, I didn't like the way Claude had implemented it. Maybe I shouldn't care, but I do. "In order to understand something, I need to build it" and all that good stuff; I believe Dr. Feynman said something along those lines. I am certainly no Richard Feynman, but I can still appreciate the sentiment.

So I set about building a new version, gave it a name (Forgetful) and decided I'd make it open source to see if it would be useful for others, whether people using coding agents (a use case I had found myself in accidentally) or AI engineers building agents themselves.

I did all this before checking whether anyone else had built one, by the way. There are several out there now, some paid and some free, and I would encourage you to check them out and see what works for you. I do see this as the next paradigm in AI: cross-agent memory solutions are going to be key, especially as the big AI labs open their ecosystems up more and more to third-party apps, and I cannot tell you how much this has helped my own use of coding agents. I am working on something right now for my day job that I plan on showcasing, with Forgetful at the heart of it, but that is for another post (as it's not finished).

It's not some 'I 10x'd my AI workflow' hack. I don't know what kind of gains this has given me; I don't really have a way to measure it.

I just feel more comfortable switching between projects and agents now. Perhaps more importantly, the agents are more familiar with the patterns I use, and I no longer need to have the same conversation with an AI to tackle the same problem elsewhere.

So What Did I Actually Build?

[Architecture diagram]

When I sat down to build Forgetful properly, I had to make some decisions about how memories should be structured. This is where I got opinionated.

Most memory systems are essentially vector dumps — throw everything in, embed it, retrieve by cosine similarity, hope for the best. It works, but it's messy. You end up with overlapping chunks, no clear boundaries between concepts, and retrieval that's good enough but never precise. Reranking helps to a degree, but I see memory as something more than just a store.

I wanted something more like how I use Obsidian. Atomic notes. One concept per note. Links between related ideas. A graph that emerges from the connections rather than being imposed from above. I had been using Obsidian to follow the Zettelkasten principle — a note-taking method where each note captures exactly one idea, is self-contained enough to understand on its own, and links explicitly to related notes.

I had stumbled across a video of someone semantically encoding their own Obsidian notes, and then things started to come together. I did my usual back and forth with Claude while out on one of my walks, and it surfaced some papers where similar concepts had already been tried: the A-Mem paper on agentic memory systems found that structured, self-organising memory significantly improves retrieval precision.

In Forgetful, every memory must have the following (the limits are configurable via environment variables):

  • A clear title (forced brevity — 200 char limit)
  • Content covering one concept (~300-400 words max)
  • Context around what the agent was doing when it created the memory
  • Keywords and tags for additional retrieval paths

This might seem restrictive, but that's the point. When an agent goes to store something, it has to think about what the atomic unit of knowledge actually is. No more dumping entire conversations or documents into memory and hoping retrieval figures it out later.
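
To make that concrete, here's roughly the shape of a stored memory. This is a sketch; the field names are illustrative and not Forgetful's exact schema.

```python
# Illustrative only: my sketch of an atomic memory record, mirroring the
# default limits described above (all configurable via environment variables).
memory = {
    "title": "Chose Stripe over PayPal for payment processing",  # <= 200 chars
    "content": "We picked Stripe because... (one concept, ~300-400 words max)",
    "context": "Agent was planning the checkout flow for the e-commerce project",
    "keywords": ["payments", "stripe"],
    "tags": ["architecture-decision"],
}
```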

Auto-Linking

Here's where it gets interesting. When you create a memory, Forgetful doesn't just store it — it finds its place in the graph. I would stipulate that this is also configurable. I think the best practice is to have an agent dedicated to memory management: one that takes in raw input and decides which memories are worth keeping, how they should fit inside the knowledge base, and whether existing memories need updating or making obsolete as a result of the new interactions. That, however, is not something I have built yet for my own development memory management, and it might not be something others want to build either. So as a starting point I added automated memory linking, and so far it has worked just fine.

The process:

  1. Generate an embedding for the new memory
  2. Search for semantically similar existing memories
  3. Any memories scoring above a 0.7 similarity threshold (after cross-encoder reranking) get automatically linked
  4. These links are bidirectional — the graph builds itself
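
A minimal Python sketch of that loop, assuming an `embed` function and a `create_link` store operation (both hypothetical). The cross-encoder step is elided for brevity, and a real deployment would do the similarity search in the database via pgvector rather than scanning in Python.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.7  # the default mentioned above; configurable

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def auto_link(new_memory: dict, existing: list[dict], embed, create_link) -> None:
    new_vec = embed(new_memory["content"])      # 1. embed the new memory
    for candidate in existing:                  # 2. scan existing memories
        score = cosine_similarity(new_vec, candidate["embedding"])
        if score >= SIMILARITY_THRESHOLD:       # 3. threshold check
            create_link(new_memory["id"], candidate["id"])  # 4. link both ways
            create_link(candidate["id"], new_memory["id"])
```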

So if I store a memory about "choosing Stripe over PayPal for payment processing" and I already have memories about "PCI compliance requirements" and "subscription billing architecture", Forgetful will automatically connect them. No manual linking required.

[Memory auto-linking diagram]

When an agent later queries "how should I handle payments?", it doesn't just get the Stripe decision — it gets the linked context about compliance and billing architecture too. One-hop graph traversal is included by default: for every memory retrieved through search, all linked memories one hop away are returned as well.

This is what I mean by "Obsidian for AI agents". The same way your Obsidian vault becomes more valuable as connections emerge between notes, your agent's memory becomes more useful as the knowledge graph densifies.

Why Not Neo4j?

I can already hear some of you asking: "If you're building a knowledge graph, why not use a graph database?"

The honest answer is that I've never worked with a graph database; I'm only familiar with relational databases such as PostgreSQL, MySQL, MSSQL etc.
So Forgetful stores memories in PostgreSQL (or SQLite for the zero-config local experience) with pgvector for embeddings. The graph relationships are just rows in a links table. No elegant graph theory; maybe it's something I'll consider in the future. I've architected Forgetful so that I can add adapters quite easily for different implementation layers, which is why I can switch between Postgres and SQLite, so adding a graph database or even a dedicated vector database later down the line wouldn't involve a total rewrite.
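
As a sketch of what "graph as rows" means in practice. The table and column names here are my guesses for illustration, not Forgetful's actual schema; SQLite is used so the snippet runs standalone, but the same shape works in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE memories (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
    CREATE TABLE memory_links (
        source_id INTEGER REFERENCES memories(id),
        target_id INTEGER REFERENCES memories(id),
        PRIMARY KEY (source_id, target_id)
    );
""")

# One-hop traversal: given memory ids returned by search, pull every
# directly linked memory with a single join over the links table.
one_hop = conn.execute("""
    SELECT m.* FROM memory_links AS l
    JOIN memories AS m ON m.id = l.target_id
    WHERE l.source_id IN (?, ?)
""", (1, 2)).fetchall()
```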

The relational database approach works fine for my scale. For the access patterns agents actually use (store a memory, find related memories, traverse one hop), a relational model with proper indexing handles it without breaking a sweat. Maybe at massive scale I'd revisit this, but I'd rather ship something useful than architect for problems I don't have yet.

Making Retrieval Actually Good

Storing memories is the easy part. Retrieval is where most systems fall down.

A naive approach: embed the query, find the top-k most similar memories by cosine distance, return them. This works but has problems. Embedding similarity is fuzzy — semantically related doesn't always mean actually relevant to what the agent needs right now.

Forgetful uses a multi-stage approach:

Stage 1: Dense retrieval
Embed the query, pull back candidate memories using vector similarity. Cast a wide net.

Stage 2: Cross-encoder reranking
Here's the trick — when an agent searches, it provides not just the query but a query_context explaining why it's searching. "I'm implementing payment integration" gives different results than "I'm debugging a checkout error" even if both search for "payments". The cross-encoder uses this full context to rerank candidates.

The cross-encoder scores each candidate against the full query context, reranks them, and the top results go back to the agent.
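
A sketch of stage 2 using the sentence-transformers library. The model name below is a common public cross-encoder chosen for illustration; I'm not claiming it's the one Forgetful ships with.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, query_context: str, candidates: list[dict],
           top_k: int = 10) -> list[dict]:
    # Score each dense-retrieval candidate against the query *and* the
    # agent's stated reason for searching, then keep the best results.
    full_query = f"{query_context}\n{query}"
    pairs = [(full_query, c["content"]) for c in candidates]
    scores = reranker.predict(pairs)  # one relevance score per pair
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_k]]
```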

Is this overkill? Maybe. But retrieval precision is everything. Returning the wrong context is worse than returning nothing — it confidently misleads the agent.

Token Budget Management

Even with great retrieval, you can still overwhelm an agent's context window. Twenty highly relevant memories might be 15,000 tokens — and that's before the agent's actual task.

Forgetful manages this with a configurable token budget (default 8K). Results are prioritised by:

  1. Importance score (9-10 rated memories first)
  2. Recency (newest within each importance tier)

If the budget fills up, lower-priority memories get truncated. The agent always gets the most critical context without the LLM choking on input length.
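
In code, the prioritisation looks roughly like this. A sketch only: it assumes each memory carries an "importance" score, a numeric "created_at" timestamp and a precomputed "token_count", and all of those names are illustrative.

```python
def fit_to_budget(memories: list[dict], budget_tokens: int = 8000) -> list[dict]:
    # Order by importance (highest first), then recency within each tier.
    ordered = sorted(memories, key=lambda m: (-m["importance"], -m["created_at"]))
    selected, used = [], 0
    for memory in ordered:
        if used + memory["token_count"] > budget_tokens:
            continue  # a real system might truncate instead of skipping
        selected.append(memory)
        used += memory["token_count"]
    return selected
```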

From Single Machine to Cloud

One thing I wanted to get right: Forgetful should scale with your needs.

Just trying it out?

```bash
uvx forgetful-ai
```

That's it. SQLite database, stored in your home directory, stdio transport for MCP. Zero configuration.
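
For example, wiring the local server into Claude Desktop's claude_desktop_config.json looks something like this (the "forgetful" key is just a label you choose):

```json
{
  "mcpServers": {
    "forgetful": {
      "command": "uvx",
      "args": ["forgetful-ai"]
    }
  }
}
```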

The default setup runs completely offline — embeddings are generated locally using FastEmbed with the BAAI/bge-small-en-v1.5 model. No OpenAI API key required, no data leaves your machine. If you want cloud embeddings (Azure OpenAI, Google), you can configure that, but it's entirely optional.
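
If you're curious what that looks like under the hood, this is roughly how local embedding with FastEmbed works. A sketch using FastEmbed's public API, not Forgetful's internal code:

```python
from fastembed import TextEmbedding

# Embeds entirely on-device with the same default model named above.
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
vectors = list(model.embed(["choosing Stripe over PayPal for payments"]))
print(len(vectors[0]))  # 384-dimensional vectors for bge-small-en-v1.5
```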

Running it for real?
Docker Compose with PostgreSQL, HTTP transport, proper authentication. Same codebase, same API, just different deployment.

Multi-device access?
Host it somewhere with an endpoint, configure your MCP clients to point at it. I run mine on a VPS and connect from Claude Desktop, Claude Mobile, Cursor, and Claude Code — all hitting the same knowledge base.

The progression should feel natural. Start local, go remote when you need to.

What Else Is In There?

Memories are the core, but Forgetful has a few other concepts that emerged from real usage:

Entities: Concrete things — people, organisations, products, infrastructure. These can have relationships to each other ("Jordan works_for TechFlow") and link to memories. Useful for building out knowledge about your team, your systems, your clients.

Projects: Scope for memories. When I'm working on the e-commerce platform, I don't need memories from the trading bot project polluting my context. Project scoping keeps retrieval focused.

Documents: Sometimes you need more than 400 words. Documents store long-form content, with the expectation that you'll extract atomic memories from them that link back to the parent.

Code Artifacts: Reusable code snippets attached to memories. The agent can retrieve not just the concept but a working example.

All of these link together. An entity can relate to memories, which belong to projects, which have associated documents and code artifacts. The graph extends beyond just memory-to-memory connections.
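
My rough sketch of how those concepts relate, as Python dataclasses. The names are illustrative, not Forgetful's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    id: int
    name: str

@dataclass
class Memory:
    id: int
    title: str
    project_id: int                     # memories are scoped to a project
    linked_memory_ids: list[int] = field(default_factory=list)

@dataclass
class Entity:
    id: int
    name: str                           # e.g. "Jordan"
    relations: dict[str, int] = field(default_factory=dict)  # "works_for" -> entity id
    memory_ids: list[int] = field(default_factory=list)

@dataclass
class Document:
    id: int
    project_id: int
    content: str                        # long-form; atomic memories link back

@dataclass
class CodeArtifact:
    id: int
    memory_id: int                      # snippets attach to a memory
    language: str
    code: str
```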

The Meta-Tools Pattern

One last implementation detail that I'm quite pleased with.

MCP clients see three tools from Forgetful:

  • discover_tools — what's available?
  • how_to_use — how does a specific tool work?
  • execute_tool — run a tool with arguments

Behind that facade sit 42 actual tools. The agent discovers what it needs, learns how to use it, then executes. This keeps the tool list in the agent's context minimal while still exposing full functionality.
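
From a client's point of view, the flow looks something like this, sketched with the Python MCP SDK's ClientSession. The underlying tool name ("create_memory") and the argument keys ("tool_name", "arguments") are my guesses for illustration.

```python
from mcp import ClientSession

async def store_decision(session: ClientSession) -> None:
    # 1. What can this server do?
    await session.call_tool("discover_tools", {})
    # 2. How does the tool I found work?
    await session.call_tool("how_to_use", {"tool_name": "create_memory"})
    # 3. Run it with arguments.
    await session.call_tool("execute_tool", {
        "tool_name": "create_memory",
        "arguments": {"title": "Chose Stripe over PayPal", "content": "..."},
    })
```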

It's a small thing, but context window discipline matters. Every token spent on tool definitions is a token not available for actual reasoning. The numbers matter here: exposing dozens of tools with full JSON schemas would consume thousands of tokens before the agent even starts working. The three meta-tools keep context overhead minimal while still providing full access to everything Forgetful can do.


That's Forgetful. An opinionated memory system built on Zettelkasten principles, with auto-linking, proper retrieval, and a deployment model that scales from "just trying it" to "running in production".

If you want to try it:

```bash
uvx forgetful-ai
```

GitHub: github.com/ScottRBK/forgetful

Discord: If you are into building AI agents, or even just like talking about coding agents and AI in general, head over to my Discord.

I'd love to hear how you use it, what breaks, and what's missing. I imagine this will be something I continually work on as part of agent development, so the input of others will be most helpful!

Next post: I'll show what I'm building on top of Forgetful for my day job — a system for semantic understanding of 250+ repositories. But that's for another time.
