DEV Community: Neonmem Dev Team

Reintroducing Neonmem, or a synopsis of what came before

Neonmem Dev Team — Thu, 25 Jun 2026 10:29:33 +0000

In the beginning there were AI agents — ChatGPT, Claude and the others. Well, actually in the beginning there was the perceptron by Rosenblatt, but let's skip those ancient times.

We all know what an agent is: a stateless function trained on a bunch of data to do things. It's smart, and at this point it can outsmart even an above-average person sometimes. It can do tricks, solve riddles for you, and write code. But we all complain about AI slop, so it's not perfect at all of this.

And here comes the other problem: the long session. Let's say we're working on a project as a developer. The project has a lot of modules and docs, and on top of that there are important transcripts from calls. Oh, and the project is ten years old and some of the technologies are completely custom. But you have an agent now, so things should be easier (and faster). At least your manager said so, and the head of AI transformation confirmed it.

For some reason, they're not. Agents keep forgetting things in long sessions. You're smart, so you work around it: you use skills, a memory.md, a resume file, you know the drill. An agent is only as smart as its prompts and the context it has been given. That's the bottleneck. Big models can hold a lot of context, but context is just a raw chunk of data (text, in other words). You notice that it helps, but not completely, so you end up doing all kinds of clever little things to carry it from session to session. We knew about this limitation, so we started using vector DBs to store part of it. But even then, it's just data. What's missing is experience.

So here we are. Agents don't have experience. You do.

And what about this Neonmem thing? How is it different from what we already use?

Let me describe it a little. Neonmem is a file: a binary file used as memory for your agent's session. But it's a bit different from what we have now. Inside there's a multi-layer environment, a self-controlled system of layers and nodes that lets the agent have something close to experience and base its proposals not just on raw data but on that experience, which serves as a grid for its conclusions. The experience grows as you work on the project, with each iteration.

So what about the data, you might ask? It's in there. The file holds a raw-data layer as a mini vector DB, so your agent knows you decided to go in a certain direction — and it has facts to hand you to help you move the way you want.

Next we'll talk about what's inside, and what the cartridge actually is. And maybe a little math as a bonus.

We're at an early stage, but we're moving forward.

Neonmem 0.9.7 is out.

Neonmem Dev Team — Wed, 24 Jun 2026 16:34:22 +0000

1. A two-level importer — two kinds of "stuff," treated differently

The big change. Your project doesn't come in one shape, so the importer no longer flattens
it into one pile:

Folders & files → a searchable knowledge pool. Your docs, code and notes are vectorised into a lossless, deduplicated facts pool — the same fact stated three ways becomes one fact, every source kept. Nothing is summarised away.
Agent chats → typed memories. Point Neonmem at a Claude (or other agent) transcript and it pulls out only what's worth keeping — the decisions, dead-ends and rules — as clean, typed memories. A decision is stored as a decision; a dead-end stays a warning. The process-narration ("I read the file…", "please check…") is dropped.
Links become knowledge. If a chat references a file on disk, that file is pulled into the pool automatically, with a memory that points back to it.

The result is labelled honestly in the UI: Facts loaded (the pool) and
Memories created (the kept decisions).
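
To make "one fact, every source kept" concrete, here is a minimal sketch of a deduplicating pool. It is not Neonmem's implementation: the `FactsPool` class and the `normalize` helper are hypothetical, and plain text normalisation stands in for whatever semantic matching the real importer does with its embedder.

```python
import re
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Crude canonical form: lowercase, drop punctuation, collapse whitespace.

    Assumption: a stand-in for semantic matching, which would catch
    paraphrases that this simple key cannot.
    """
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


@dataclass
class Fact:
    text: str                                    # first wording seen, kept verbatim
    sources: set[str] = field(default_factory=set)


class FactsPool:
    """Hypothetical pool: one entry per distinct fact, all sources retained."""

    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def add(self, text: str, source: str) -> bool:
        """Add a fact. Returns True if it was new, False if merged into an existing one."""
        key = normalize(text)
        fact = self._facts.get(key)
        is_new = fact is None
        if is_new:
            fact = self._facts[key] = Fact(text=text)
        fact.sources.add(source)                 # never drop a source, even on a duplicate
        return is_new

    def __len__(self) -> int:
        return len(self._facts)

    def facts(self) -> list[Fact]:
        return list(self._facts.values())
```

Adding the same statement from two files leaves the pool at one fact with two sources, which is also why re-running an import would not inflate it.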

2. Grounded, offline recall

0.9.7 replaces the old embedder with IBM Granite-30M, run as a fused fp16 ONNX graph
through ONNX Runtime:

Database-class retrieval quality on any CPU — no GPU, no PyTorch, no API key, no cloud.
Every prompt walks memory in order — reflexes → short-term → long-term → facts pool — and answers from what you actually imported, or honestly says it doesn't know.

This is the headline behaviour: ask "what is ARC?" and you get your definition from
your docs — not the textbook expansion the model would otherwise guess. A memory that's
occasionally wrong is worse than no memory at all, so the rule is: answer from the user's
sources, or abstain. Never invent.
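
The "walk memory in order, or abstain" rule can be sketched in a few lines. This is an illustration under assumptions, not the shipped code: `make_layer` and `recall` are hypothetical names, and each layer is reduced to an exact-match dictionary where the real system would run a vector search.

```python
from typing import Callable, Optional

# A layer is anything that maps a query to an answer, or to None if it has nothing.
Lookup = Callable[[str], Optional[str]]


def make_layer(entries: dict[str, str]) -> Lookup:
    """Toy layer: exact match on the lowercased query (stand-in for real retrieval)."""
    def lookup(query: str) -> Optional[str]:
        return entries.get(query.strip().lower())
    return lookup


def recall(query: str, layers: list[tuple[str, Lookup]]) -> tuple[Optional[str], Optional[str]]:
    """Try each layer in order and return (layer_name, answer) from the first hit.

    Returns (None, None) when no layer knows the answer: abstain, never invent.
    """
    for name, lookup in layers:
        answer = lookup(query)
        if answer is not None:
            return name, answer
    return None, None


layers = [
    ("reflexes", make_layer({})),
    ("short-term", make_layer({})),
    ("long-term", make_layer({})),
    ("facts pool", make_layer({"what is arc?": "ARC is your provisioning platform."})),
]
```

With these layers, `recall("what is ARC?", layers)` is answered from the facts pool, while a question nothing was imported about comes back as `(None, None)` so the caller can say it doesn't know.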

3. Tags that stick

Tag an import with a topic (e.g. Specific API) and Neonmem mints one clean, canonical
memory for it, linked back to the source — even when your docs never write the term
verbatim, as long as they clearly describe it. If the corpus genuinely has nothing on a
tag, it's left out rather than faked.

4. Clean by construction

Memories follow one golden rule: a single concise statement (ARC — your provisioning platform) linked to the full source, not a messy pile of raw chunks. Chat capture
deduplicates through the same facts layer, so re-importing a conversation never doubles up.

5. One durable cartridge

The importer keeps the full source corpus inside the cartridge (content-addressed + compressed) — one file replaces the scattered docs and transcripts, and the facts are always rebuildable from ground truth.
Opt-in AES-256-GCM encryption at rest — your whole corpus as a private vault.
Imported knowledge is long-term and survives reopening the project.
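
For readers unfamiliar with the term, "content-addressed + compressed" roughly means the following. This is a minimal sketch using only the Python standard library; the `CorpusStore` class is hypothetical, and the choice of SHA-256 and zlib is an assumption, since the post does not say which hash or compressor the cartridge uses.

```python
import hashlib
import zlib


class CorpusStore:
    """Hypothetical content-addressed store: each blob is keyed by the hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data (compressed) and return its address. Identical content is stored once."""
        digest = hashlib.sha256(data).hexdigest()    # assumption: SHA-256 as the address
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Return the original bytes, verifying them against their address."""
        data = zlib.decompress(self._blobs[digest])
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored blob does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)
```

Because the address is derived from the content, storing the same document twice costs nothing extra, and any fact that records a digest can always be traced back to the exact source bytes it came from.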

Built on (all open, permissively licensed)

Embeddings: IBM Granite-30M (Apache-2.0) via ONNX Runtime (MIT). Vector search:
FAISS (MIT). Agent integration: the Model Context Protocol. Full attributions ship
with every download. No third-party LLM, nothing phones home.

Get it

Windows (signed installer + portable) and Linux (AppImage); macOS on the way. Local,
private, and free for personal use.

→ neonmem.com

Import a project, then ask it the one thing your assistant always gets confidently wrong
about your codebase. That question is the whole test.

The lesson, stated plainly

Neonmem Dev Team — Tue, 16 Jun 2026 07:23:03 +0000

If you persist embeddings, you must persist the embedder. A vector index is
only meaningful next to the exact model that produced it.

It's an easy mistake because everything looks fine — the vectors are right
there on disk, the search runs, no error is thrown. It's only wrong by a
dimension you can't see.

The fix

The embedder now travels inside the single cartridge file — same one-file,
binary-only format, no sidecar. On load it's restored, and every read (and every
newly-learned fact) is embedded with the same model the stored vectors were
built with. We also added a dimension guard: if an old cartridge ever mismatches,
recall degrades to keyword search instead of returning garbage.
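
A guard of this kind might look like the sketch below. Everything here is an assumption made for illustration: the `Cartridge` and `Entry` types, the `search` signature, and the idea of comparing both the embedder name and the vector dimension before trusting the index. Plain Python lists stand in for the real vector store.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    vector: list[float]


@dataclass
class Cartridge:
    # Hypothetical layout: the index records which model produced its vectors.
    embedder_name: str
    dim: int
    entries: list[Entry]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(query: str, entries: list[Entry]) -> list[str]:
    """Fallback: rank entries by how many query words they share."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(e.text.lower().split())), e) for e in entries]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [e.text for _, e in hits]


def search(cart: Cartridge, query: str, query_vector: list[float],
           embedder_name: str) -> tuple[str, list[str]]:
    """Return (mode, results). Vector search only if the embedder matches the index."""
    if embedder_name != cart.embedder_name or len(query_vector) != cart.dim:
        # Mismatched model or dimension: similarity scores would be meaningless,
        # so degrade to keyword search instead of returning garbage.
        return "keyword", keyword_search(query, cart.entries)
    ranked = sorted(cart.entries, key=lambda e: cosine(query_vector, e.vector), reverse=True)
    return "vector", [e.text for e in ranked]
```

Note that `cosine` as written would happily score vectors of different lengths, because `zip` truncates silently. That is the "no error is thrown" failure mode in miniature, and it is why the check has to live in front of the similarity call.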

Result: "import any docs → your agent answers from them" now survives close,
reopen, and a week later.

Also in 0.9.6

Procedures and rules are now recalled without errors.
Your memory is saved before a session auto-compacts and restored after — a long session never loses its thread.
The cross-process write-lock moved outside the cartridge folder, so your memory stays one clean file while several tools share it safely.

Free for personal use, local and private: neonmem.com
(Windows + Linux AppImage).

Decisions, dead-ends & dreams

Neonmem Dev Team — Mon, 15 Jun 2026 08:32:45 +0000

Your AI assistant is sharp in the moment and blank by morning. Neonmem gives it a memory that lives with your project — and the interesting part isn't just that it remembers, it's what it keeps and how it keeps it.

It remembers by kind, not as a transcript

Neonmem doesn't dump your chat history into a search box. It keeps your project as distinct kinds of memory — the way a good teammate carries a project in their head:

Decisions — not just what you chose but why ("we went with Postgres because it scales under load"). The reasoning survives, not just the result.
Dead-ends — the approaches that failed. Your agent stops re-suggesting the thing you already tried and threw out at 2am last Tuesday.
Rules — your standing preferences and the project's invariants. "Always validate before saving." It just knows.
Plans — where you're going, not only where you've been. The next steps and the goals stay in view.
Questions, debates & observations — the loose threads and the context around them, so nothing important quietly evaporates.

Because each memory is a real, meaningful unit, your agent can reason over them — connect a decision to the dead-end that caused it, or a plan to the rule it has to respect — instead of just keyword-matching.
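
One way to picture typed, linked memories is the sketch below. The `Kind` enum, the `Memory` fields, and the `related` helper are hypothetical names chosen for illustration, not Neonmem's schema, and the dead-end and rule texts are made-up sample data.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    DECISION = "decision"
    DEAD_END = "dead-end"
    RULE = "rule"
    PLAN = "plan"
    QUESTION = "question"


@dataclass
class Memory:
    id: str
    kind: Kind
    statement: str                      # one concise statement
    reason: str = ""                    # the "why", kept alongside the "what"
    links: list[str] = field(default_factory=list)   # ids of related memories


def related(memory: Memory, store: dict[str, Memory], kind: Kind) -> list[Memory]:
    """Follow a memory's links and keep only the targets of the requested kind."""
    return [store[i] for i in memory.links if i in store and store[i].kind is kind]


# Illustrative data only.
store = {
    "x1": Memory("x1", Kind.DEAD_END, "An embedded database did not hold up under load"),
    "r1": Memory("r1", Kind.RULE, "Always validate before saving"),
    "d1": Memory("d1", Kind.DECISION, "We went with Postgres",
                 reason="it scales under load", links=["x1", "r1"]),
}
```

Asking `related(store["d1"], store, Kind.DEAD_END)` returns the dead-end behind the decision, which is the kind of question a keyword match over a transcript cannot answer directly.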

Memories live in zones — like a mind

Your memory isn't a flat list. It's organised into zones, the way thoughts settle in a brain:

Reflex — the always-on core: the handful of things your agent should never forget, available instantly.
Short-term — what you're working on right now, fresh and close at hand.
Long-term — the settled history of the project, kept but out of the way until it's needed.

You can literally watch the zones glow and shift in a live 3D view as the memory grows — a brain filling in, session after session.

And then it dreams

This is the part people love. Like a mind, Neonmem sleeps and dreams. In a quiet consolidation pass it revisits everything it has gathered — strengthening what keeps proving useful, letting the noise fade, and finding connections between ideas it didn't notice in the moment. What started as scattered notes wakes up as understanding.

So the memory doesn't just grow — it matures. The longer you work together, the more it actually gets your project.
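
The post doesn't describe how the consolidation pass works internally, so the following is only a toy model of the "strengthen what's useful, let the noise fade" idea. The `consolidate` function, the per-memory strength score, and the `decay` and `boost` parameters are all assumptions, and the connection-finding step is left out entirely.

```python
def consolidate(strengths: dict[str, float], uses: dict[str, int],
                decay: float = 0.5, boost: float = 0.25) -> dict[str, float]:
    """Return new strengths after one pass.

    Every memory fades by the decay factor; memories that were used since
    the last pass are reinforced in proportion to how often they helped.
    """
    return {
        memory_id: strength * decay + uses.get(memory_id, 0) * boost
        for memory_id, strength in strengths.items()
    }


def strongest_first(strengths: dict[str, float]) -> list[str]:
    """Memory ids ordered from strongest to weakest."""
    return sorted(strengths, key=lambda memory_id: strengths[memory_id], reverse=True)
```

In this model, a memory that keeps being recalled holds its strength across passes, while one that is never touched halves each time and drifts to the bottom of the ranking without being deleted.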

What that feels like, day to day

You open a session and your agent already knows where you stopped — no re-explaining the project every morning.
It catches "we tried that, it didn't work" before you lose the afternoon to it again.
It works in your style, remembers your decisions, and keeps the plan in front of both of you.
It's a colleague who was there yesterday — and last month.

All of it local, private, and yours: one file on your machine, no cloud, no third-party model reading your work.

source: https://neonmem.com/devlog/decisions-dead-ends-dreams

Try Neonmem · Windows & Linux · free for personal use

Neonmem: an addition to the agent that personalizes the experience.

Neonmem Dev Team — Sun, 14 Jun 2026 14:57:55 +0000

One place for everything your AI needs to know about your project

If you build anything big with an AI assistant, you've felt the gap. A real project isn't just code — it's folders of docs, months of decisions, meeting notes, the reasons behind the architecture, the timelines, the plans, and the things you tried that didn't work. To be genuinely useful on a project like that, an assistant has to understand the domain you're working in, not just autocomplete inside it.

That's the problem we've been working on with Neonmem.

Even when an agent has "memory," it's usually a flat pile of text it can search but can't put in order. It doesn't know which decision replaced an earlier one, what's a plan versus a settled fact, or where you stopped yesterday. As the project grows, so does the mess — more data, more decisions, more threads to keep straight.

We wanted one place that holds all of it, in order, so your agent works like a real code buddy: aware of the project, and aware of how you work. So we built Neonmem as a single-point memory — one local cartridge that carries your project's reasoning, structure, and direction, with an understanding layer that keeps it organized.

What's inside

A single memory cartridge. One local .neonmem file — not a vector database, but a connected graph of decisions, dead ends, rules, and the reasoning between them. Yours, on your machine.
An understanding layer. Neonmem doesn't just store facts, it organizes them — clustering the memory into subsystems and mapping how the project fits together, so your agent gets the shape of the whole thing, not just fragments.
Resume where you left off. It keeps the open thread and the exact point you stopped, so a fresh session picks up instead of starting cold.
The importer. Point it at your docs, your codebase, and your meeting notes, and it reads the lot — turning them into one connected memory: the decisions, the endpoints, the call graph, the plans. Run it again as things move and it updates the memory in place, merging what's new and skipping what it already knows. Your agent opens already knowing the project.
Plans. Most memory only looks backward. Neonmem also tracks where you're going — the goals and next steps — so the agent knows the direction, not just the history.
Local and private. Offline embeddings, no cloud calls, no third-party model reading your code. It runs as its own small, self-built brain.

Where we are

Neonmem is in public beta. It's genuinely useful today, but we're early, and a lot is on the way — team memory (sharing a project's cartridge across a whole team), more autonomous workflows where the agent acts more on its own, broader platform support (macOS and Linux), and steady improvements to everything above.

If you live in an AI assistant and you're tired of re-explaining your project every morning, give it a try and tell us where it falls short — that feedback is shaping what comes next.

Our paradigm is that AI is here to augment humans, not replace them: the human stays in control, and the AI takes the weight.

https://neonmem.com/ · Windows x64 · free for personal use