
Ani
I built a tool to give AI coding agents persistent memory and a way smaller token footprint

I've been building with AI coding agents for a while now: Claude Code, Cursor, Antigravity. Two things kept annoying me enough that I finally built something to fix them.


The two problems

Problem 1: Your agent reads a 1000-line file and burns 8000 tokens doing it.

That's before it's done anything useful. Large codebases eat context fast, and once the window fills up, you're either compressing (lossy) or starting over. Neither is great.

Problem 2: Every new session, your agent starts from zero.

It doesn't remember that the API rate limit is 100 req/min. It doesn't remember the weird edge case in the auth module you spent two hours debugging last week. It doesn't remember anything. You either re-explain everything, or watch it rediscover the same gotchas.

These aren't niche complaints — if you're using AI agents to work on real codebases, you've hit both of these.


What I built

agora-code — persistent memory and context reduction for AI coding agents. Works with Claude Code, Cursor, and Gemini CLI. Survives context resets, new conversations, and agent restarts.

It's early. It works. I want people to try it.


How it handles token bloat

Instead of letting the agent read raw source files, agora-code intercepts every file read and serves an AST summary instead.

Real example: summarizer.py is 885 lines. Raw read = 8,436 tokens. Summarized = 542 tokens. That's a 93.6% reduction — and the agent still gets all the signal: class names, function signatures, docstrings, line numbers.

It works across languages too:

| File type | Method | What you get |
| --- | --- | --- |
| Python | stdlib AST | Classes, functions, signatures, docstrings |
| JS, TS, Go, Rust, Java + 160 more | tree-sitter | Same: exact line numbers, parameter types |
| JSON / YAML | Structure parser | Top-level keys + shape |
| Markdown | Heading extractor | Headings + opening paragraph |

Summaries are cached in SQLite, so re-reads on the same branch are instant.
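For Python, the stdlib `ast` module already gets you most of that signal. Here's a simplified sketch of the idea, just to show why the compression is nearly lossless for navigation purposes; agora-code's real summarizer surely handles more cases (decorators, defaults, nesting) than this:

```python
import ast

def summarize_python(source: str) -> str:
    """Compact outline of a Python file: classes, functions, signatures,
    first docstring line, and line numbers. Illustrative sketch only,
    not agora-code's actual implementation."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            sig = f"class {node.name}"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Keep it simple: positional args only, no defaults/annotations
            args = ", ".join(a.arg for a in node.args.args)
            sig = f"def {node.name}({args})"
        else:
            continue
        doc = ast.get_docstring(node)
        suffix = f" - {doc.splitlines()[0]}" if doc else ""
        lines.append(f"L{node.lineno}: {sig}{suffix}")
    return "\n".join(lines)
```

A few hundred lines of source collapses into a handful of outline lines, while keeping exactly the things an agent needs to decide where to look next.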


How it handles memory loss

When a session ends, agora-code parses the transcript and extracts a structured checkpoint: what was the goal, what changed, what non-obvious things did you find, what's next.

At the start of the next session, the relevant parts are injected automatically — last checkpoint, top learnings from recent commits on the branch, git state, symbol index for dirty files.
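As a rough sketch of what such a checkpoint might hold and how it could be rendered for injection (the field names here are my guesses, not agora-code's actual schema):

```python
from dataclasses import dataclass

# Hypothetical checkpoint shape; agora-code's real schema may differ.
@dataclass
class Checkpoint:
    goal: str          # what the session was trying to do
    changes: list      # files/functions touched
    learnings: list    # non-obvious findings worth keeping
    next_steps: list   # where to pick up next session

    def to_prompt(self) -> str:
        """Render as text suitable for injecting at session start."""
        return "\n".join([
            f"Previous goal: {self.goal}",
            "Changed: " + ", ".join(self.changes),
            "Learned: " + "; ".join(self.learnings),
            "Next: " + "; ".join(self.next_steps),
        ])
```

The point is that the transcript gets distilled into a few structured fields, which is cheap to inject compared to replaying the whole conversation.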

You can also manually store findings:

```shell
agora-code learn "POST /users rejects + in emails" --tags email,validation
agora-code learn "Rate limit is 100 req/min" --confidence confirmed
```

And recall them later (keyword search by default, semantic search if you wire up embeddings):

```shell
agora-code recall "email validation"
agora-code recall "rate limit"
```

Storage is three layers: an active session file (project-local, gitignored), a global SQLite DB scoped per project via git remote URL, and search (FTS5/BM25 always on, optional vector search).
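The always-on keyword layer maps naturally onto SQLite's built-in FTS5 with BM25 ranking. A minimal sketch of how such a learnings store can work (table and column names are illustrative, not agora-code's schema):

```python
import sqlite3

# In-memory DB for the sketch; a real store would live in a project-scoped file.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE learnings USING fts5(text, tags)")
con.executemany(
    "INSERT INTO learnings VALUES (?, ?)",
    [
        ("POST /users rejects + in emails", "email,validation"),
        ("Rate limit is 100 req/min", "api,limits"),
    ],
)

def recall(query: str) -> list:
    """Return learning texts matching the query, best BM25 score first."""
    rows = con.execute(
        "SELECT text FROM learnings WHERE learnings MATCH ? "
        "ORDER BY bm25(learnings)",
        (query,),
    )
    return [row[0] for row in rows]
```

FTS5 matches across both columns, so tagging a learning with `email,validation` makes it recallable by those keywords even when the words don't appear verbatim in the text.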


What happens automatically (Claude Code)

Once hooks are installed, you don't have to think about most of this:

| When you… | agora-code automatically… |
| --- | --- |
| Start a session | Injects last checkpoint + relevant learnings |
| Submit a prompt | Recalls relevant past findings, sets session goal |
| Read a file > 100 lines | Summarizes via AST, serves summary instead |
| Edit a file | Tracks the diff, re-indexes symbols |
| Run git commit | Derives learnings from the commit |
| Context window compresses | Checkpoints before, re-injects after |
| End a session | Parses transcript → structured checkpoint in DB |
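To make the "Read a file > 100 lines" row concrete, here's a minimal sketch of what a pre-read hook decision could look like. The event fields (`tool_name`, `tool_input`, `file_path`) follow Claude Code's hook conventions, but treat the exact schema and return shape as assumptions; this is not agora-code's actual hook:

```python
def handle_pre_read(event):
    """Decide whether a file read should be replaced by a summary.

    `event` is a dict in the shape a tool-use hook might receive;
    field names here are illustrative, not the exact hook schema.
    Returns a decision dict, or None to let the read proceed untouched.
    """
    if event.get("tool_name") != "Read":
        return None
    path = event.get("tool_input", {}).get("file_path", "")
    try:
        with open(path) as f:
            n_lines = sum(1 for _ in f)
    except OSError:
        return None  # unreadable/missing file: don't interfere
    if n_lines > 100:
        return {"action": "summarize", "file": path, "lines": n_lines}
    return None
```

Small files pass through untouched; only reads that would meaningfully bloat the context get intercepted.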

Getting started

```shell
pip install git+https://github.com/thebnbrkr/agora-code.git
```

Then in your project:

```shell
cd your-project
agora-code install-hooks --claude-code
```

For Cursor and Gemini CLI, you copy a config directory into your project root — full instructions in the README.

At the start of every Claude Code session, run /agora-code to load the skill. That's the bit that tells the agent when to summarize, when to inject context, when to save progress.


It's early

APIs may change. Things might break. I'm actively working on it — semantic search is in progress, automated hook setup for Cursor and Gemini is on the roadmap.

If you try it and hit something weird, open an issue. If you want to add hook support for a different editor, the pattern is consistent across .claude/hooks/ and .cursor/hooks/ — PRs welcome.

GitHub: https://github.com/thebnbrkr/agora-code

Screenshot: https://imgur.com/a/APaiNnl


Would love to hear if this solves the same pain points for others, or if you're handling token bloat / memory loss differently. Drop a comment.
