The Problem: Your AI Is a Goldfish
Here's a scene that plays out in my terminal every single day:
Me: "Hey AI, what's the architecture of this project?"
AI: *runs `ls`, runs `find`, runs `grep`, reads 15 files, spends 28,000 tokens*
AI: "Here's the architecture! It's Express with MongoDB!"
Me: "Great, now implement the login route."
AI: *runs `ls`, runs `find`, reads the same 15 files again, spends another 25,000 tokens*
Me: *quietly sobbing over my API bill*
If you've used any AI coding assistant (Cursor, Copilot, Claude Code, Devin, whatever), you've seen this. The LLM has the memory of a goldfish. Every new turn is like waking up in a strange room going "Who am I? Where am I? What project is this? Let me read everything again."
It's not the LLM's fault. These models have context windows, not persistent memory. They can't close their eyes and remember that your project uses Express with MongoDB — they have to re-discover reality from scratch every time.
But here's the thing nobody talks about: This costs you a fortune.
The Price of Being a Goldfish
Let's put real numbers on it.
In a typical coding session, an LLM agent will explore the project by running `ls`, `find`, and `grep`, and by reading files. A lot of files. Every. Single. Turn.
I benchmarked this against real repos on GitHub. Here's what I found:
| Repository | Token Cost to Explore (read everything) | Token Cost with Memoir | Savings |
|---|---|---|---|
| A small side project (53 files) | 246,000 tokens | ~10K tokens | 96% |
| A medium React app (258 files) | 826,000 tokens | ~41K tokens | 95% |
| A serious codebase (2,538 files) | 12.2 MILLION tokens | ~67K tokens | 99% |
| TypeScript compiler (40,877 files) | 36 MILLION tokens | ~19K tokens | ~100% |
Let me repeat that last one:
36 million tokens to explore the TypeScript repo with traditional methods. 19 thousand with the memoir.
That's a ~99.95% reduction.
Or, put another way: every time your AI asks "what's in this project?", it's burning through the equivalent of 72 copies of *The Great Gatsby* in tokens. When it could be burning through a single tweet.
The Insight: Project Knowledge Almost Never Changes
The core realization is so obvious it hurts:
Your project's basic structure, dependencies, entry points, and source files change maybe once a day. Your LLM queries it 20 times per session.
Why are we paying 30,000 tokens to rediscover the same package.json every 45 seconds?!
If a human dev joined your team, you wouldn't make them re-read the entire codebase every time they wanted to write a function. You'd say "the routes are in src/api/, the DB layer is in src/db/, here's the README." They'd remember.
Why can't our AI assistants do the same?
Well… now they can.
Enter pi-memoir
pi-memoir (pronounced pi-mem-wahr, like the French word for memory — because developers love obscure multilingual puns) is an extension for the pi coding agent that gives your LLM persistent project memory.
Here's what it does in one sentence:
Harvest your project's structure once. Let the LLM query it for ~100 tokens instead of ~30,000. Forever.
How it works
```
Your Project/
└── .pi/memoir/
    └── memories.jsonl   ← Everything the AI knows, in one file
```
When you run `/memo harvest` (or call `memo_harvest`), pi-memoir walks every file in your project and builds a compact knowledge base:
- 📄 README → tagged `project:readme`
- 📦 package.json → tagged `project:manifest`
- 🏗️ Directory tree → tagged `project:structure`
- 🚪 Entry points → tagged `project:entry`
- ⚙️ Config files → tagged `project:config`
- Every single source file → tagged `project:file` + `file:path/to/file.ts`
That's right — it stores a memory for every `.ts`, `.js`, `.py`, `.rs`, `.go`, `.rb`, `.vue`, `.svelte` — actually 128 file extensions in total. Up to 200 files, each under 20KB. Smart summaries per file type. It skips `node_modules`, `.git`, `dist`, and all the usual junk.
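For the curious, here's roughly what one of those memories might look like. The field names below are my guess at a plausible shape, not the documented schema:

```typescript
// Hypothetical shape of one line in .pi/memoir/memories.jsonl.
// Illustrative only; pi-memoir's actual fields may differ.
interface Memory {
  id: string;
  tags: string[];   // e.g. ["project:file", "file:src/api/login.ts"]
  content: string;  // compact summary of the file or fact
  source: "harvest" | "manual" | "auto";
  createdAt: string; // ISO timestamp
}

const example: Memory = {
  id: "mem_0042",
  tags: ["project:file", "file:src/api/login.ts"],
  content: "Express route: POST /login, validates credentials against MongoDB",
  source: "harvest",
  createdAt: "2025-01-15T09:30:00Z",
};
```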
Then the magic happens.
The "DON'T USE BASH" Rule
The extension injects a single instruction at the top of every system prompt:
```
=== PI-MEMOIRE: DON'T USE BASH — USE THE MEMOIRE ===
CRITICAL: Before running ANY bash/ls/find/grep/wc/read commands
to explore the project, you MUST call memo_search first.
Querying the memoir costs ~100 tokens. Running bash to discover
the same info costs ~2,000+ tokens.
• "what's the architecture?" → memo_search({ query: "architecture" })
• "what files?" → memo_search({ tags: "project:structure" })
• "dependencies?" → memo_search({ query: "package" })
...
If memo_search returns nothing, THEN fall back to bash/read.
```
Think of it as telling your AI: "Check your notes before you run around the office asking questions."
And because the tools (`memo_search`, `memo_store`, `memo_harvest`) are registered as first-class LLM-callable tools — same as `read`, `bash`, or `edit` — the model actually uses them. It's not a suggestion. It's a protocol.
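To make the "first-class tool" part concrete, here's a sketch of what registering `memo_search` with a typebox schema could look like. `registerTool` and `searchMemories` are stand-ins of mine; pi's actual extension API isn't shown in this post:

```typescript
import { Type, type Static } from "@sinclair/typebox";

// Parameters the LLM can pass when it calls the tool.
const MemoSearchParams = Type.Object({
  query: Type.Optional(Type.String({ description: "Keywords to match" })),
  tags: Type.Optional(Type.String({ description: 'e.g. "project:structure"' })),
});
type MemoSearchArgs = Static<typeof MemoSearchParams>;

// Stand-ins for whatever pi actually exposes to extensions.
declare function registerTool(tool: {
  name: string;
  description: string;
  parameters: typeof MemoSearchParams;
  execute: (args: MemoSearchArgs) => Promise<string>;
}): void;
declare function searchMemories(query?: string, tags?: string): Promise<string>;

registerTool({
  name: "memo_search",
  description: "Search the project memoir before falling back to bash/read.",
  parameters: MemoSearchParams,
  execute: async (args) => searchMemories(args.query, args.tags),
});
```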
The Numbers That Made My Jaw Drop
I ran pi-memoir against six real-world repositories — from a 53-file side project to the 40,877-file TypeScript compiler — and measured the token cost of two approaches:
- The Old Way: the LLM runs `bash`/`ls`/`find`/`grep` and reads files one by one
- The Memoir Way: the LLM calls `memo_search` with keywords
| Repository | Files | Size | Old Way | Memoir Way | Savings |
|---|---|---|---|---|---|
| VibeVoice | 53 | 264 MB | ~246K tok | ~10K tok | 96% |
| rtk | 258 | 4.5 MB | ~826K tok | ~41K tok | 95% |
| claude-mem | 679 | 104 MB | ~2.7M tok | ~55K tok | 98% |
| oh-my-codex | 921 | 15 MB | ~2.9M tok | ~71K tok | 98% |
| hermes-agent | 2,538 | 64 MB | ~12.2M tok | ~67K tok | 99% |
| TypeScript | 40,877 | 548 MB | ~36M tok | ~19K tok | ~100% |
The chart is basically comedy:
```
Token Cost: The Old Way    ████████████████████████████████████████ 36,000,000
            The Memoir Way ▏ 19,000
```
Every repo, every language, every size — 95-100% savings. Not "theoretically." Not "in ideal conditions." Real repos, real numbers.
What This Means for Your Wallet
Running Claude 3.5 Sonnet at $3/M input tokens:
- Without memoir: 1 session × 10 explorations × 30K tokens = 300K tokens = $0.90/session
- With memoir: 1 session × 1 harvest (30K) + 9 searches (100 each) = 30.9K tokens = $0.09/session
On a team of 10 devs doing 5 sessions a day? That's roughly $1,350/month → $135/month.
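If you want to sanity-check that arithmetic with your own traffic, it's a three-line calculation:

```typescript
// Back-of-the-envelope session cost at $3 per million input tokens.
const pricePerToken = 3 / 1_000_000;

const withoutMemoir = 10 * 30_000;   // 10 explorations x 30K tokens each
const withMemoir = 30_000 + 9 * 100; // one harvest + 9 cheap memo_search calls

console.log(`without: $${(withoutMemoir * pricePerToken).toFixed(2)}`); // without: $0.90
console.log(`with:    $${(withMemoir * pricePerToken).toFixed(2)}`);    // with:    $0.09
```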
Your AI just got 10x cheaper. And you don't have to buy it a fishbowl.
The Quirky Bits
It stores memories in a JSONL file
Yes, the entire "knowledge base" is a flat JSONL file at `.pi/memoir/memories.jsonl`. No vector database. No Postgres. No Redis. A flat file.
And you know what? It works better. The memoir stores 500-odd memories for a 40K-file project in ~200KB of text. That's smaller than a single JPEG of a cat. And JSONL means you can `grep` it, `wc -l` it, pipe it through `jq`, check it into git, read it on your phone while commuting. It's the most portable "database" in the universe.
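And because it's plain JSONL, you don't even need the extension to poke at it. A few lines of TypeScript will do (assuming the hypothetical record shape sketched earlier):

```typescript
import { readFileSync } from "node:fs";

// One JSON object per line; the whole "database" is greppable text.
const memories = readFileSync(".pi/memoir/memories.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line) as { tags?: string[]; content: string });

// e.g. pull out everything tagged as project structure
const structure = memories.filter((m) => m.tags?.includes("project:structure"));
console.log(`${structure.length} structure memories`);
```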
Zero external dependencies
Seven TypeScript files. 1,597 lines of code. Zero external npm dependencies.
It uses typebox — which is bundled with pi anyway — and that's it. No vector DBs, no embedding APIs, no LLM calls at harvest time. Just the filesystem, `JSON.parse`, and a TF (term frequency) scoring algorithm that would make a CS undergrad shrug.
But it works, for the simple reason that project knowledge is mostly keywords and categories, not semantic embeddings. When the LLM asks "what's the architecture?", it doesn't need cosine similarity between a 1536-dimensional vector and the README. It needs a string match on the word "architecture" paired with the `project:structure` tag.
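Here's a toy version of that idea, just to show how far plain term counting plus tag boosts can get you. This is my illustration, not pi-memoir's actual scoring code:

```typescript
// Minimal keyword + tag retrieval sketch, in the spirit of TF scoring.
type Memory = { content: string; tags: string[] };

const memories: Memory[] = [
  { content: "Express app, routes in src/api/, MongoDB layer in src/db/", tags: ["project:structure"] },
  { content: "POST /login route handler", tags: ["project:file", "file:src/api/login.ts"] },
];

function score(m: Memory, query: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  let s = 0;
  for (const term of terms) {
    s += m.content.toLowerCase().split(term).length - 1; // term frequency in body
    if (m.tags.some((t) => t.includes(term))) s += 5;    // tag hits weigh more
  }
  return s;
}

const top = memories
  .map((m) => ({ m, s: score(m, "routes") }))
  .filter((r) => r.s > 0)
  .sort((a, b) => b.s - a.s);
console.log(top[0]?.m.content); // "Express app, routes in src/api/, ..."
```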
It auto-captures decisions on shutdown
When you close a session, pi-memoir scans the conversation for decisions, architecture changes, and important notes — and stores them automatically with an `"auto"` source tag. So even when you forget to run `/memo store "we decided to use Drizzle ORM instead of Prisma"`, the memoir remembers.
No more "wait, why did we switch databases?" *scrolls through 3 hours of chat history* "oh right, the connection pool issue."
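I don't know the exact heuristics it uses, but conceptually the shutdown scan is something like this sketch (the cue phrases are my invention):

```typescript
// Illustrative only: one way a shutdown hook could mine a transcript for
// decisions worth remembering. pi-memoir's real heuristics may differ.
const DECISION_CUES = ["we decided", "let's use", "switching to", "instead of"];

function extractDecisions(transcript: string[]): string[] {
  return transcript.filter((message) =>
    DECISION_CUES.some((cue) => message.toLowerCase().includes(cue))
  );
}

// Each hit would then be stored as a memory with source: "auto".
console.log(extractDecisions(["We decided to use Drizzle ORM instead of Prisma."]));
```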
Okay But How Do I Actually Use This?
1. Install pi
You need pi, the LLM-native coding agent. It's `bun install -g @mariozechner/pi-coding-agent`.
2. Install pi-memoir
```
pi install git:github.com/k1lgor/pi-memoir
```
Or, if you're the type who likes living on the edge:
```
pi install ./path/to/cloned/pi-memoir
```
3. Harvest your project
```
/memo harvest
```
Walk away. Make coffee. It takes ~2 seconds on a small project, ~30 seconds on TypeScript.
4. Work normally
That's it. The LLM now queries the memoir instead of running `ls`/`grep`/`find`. You'll see it in the logs:

```
🧠 [memo_search] Found 3 results for "architecture" (tags: project:structure)
```

And in your wallet:

```
💸 Token savings: 99.9% this session
```
Bonus: Benchmark your own project
The extension ships with a standalone bench script:
```
node bench.mjs . --all
```

```
📈 Summary
  Files scanned:  40,877
  Read files:     ~36,362,143 tokens
  Query memoir:   ~19,293 tokens
  Savings:        ~36,342,850 tokens (99.9%)
```
If you can't feel the dopamine rush from a 99.9% savings number, I don't know what to tell you.
What's Next?
The current version is already saving me ~$200/month in LLM costs. But here's what's cooking:
- 🔬 Semantic search via LLM re-ranking — for when keyword matching isn't enough
- 🕸️ Knowledge graph from cross-file relationships — imagine the memoir telling the LLM "this function depends on that module which is imported by this route"
- 📓 Obsidian vault export — because what's more developer than exporting a perfectly usable knowledge base into a note-taking app you'll abandon in 3 months
The One-Liner
If you use AI coding agents, you are paying for the same `ls` command 47 times a session. pi-memoir replaces that with a cheap memory lookup.
It's one install command. It takes 2 seconds to harvest. It saves 95-99% of your project exploration tokens.
Install it. Harvest your project. Watch your token count crater.
P.S. — If you're still not convinced, run the benchmark against your own project: `node bench.mjs . --all`. I promise the number will make you smile.
P.P.S. — The name is "pi-memoir" but you can pronounce it however you want. I'm partial to "pee-mem-wah" like you're a very confused French person talking about urination. But "pie-memory" works too.
Built with ❤️ by k1lgor. Inspired by MemPalace (verbatim memory with semantic retrieval) and Graphify (knowledge graph extraction).