How We Gave CLI Coding Agents Persistent Memory Using a Proxy and Cognee

#agents #ai #cli #showdev

Every morning, I open my terminal and re-explain my entire project to an AI that worked on it yesterday. The architecture, the naming conventions, the bug we were halfway through fixing — all gone. Close the terminal, lose the context. We got tired of it, so we built cliMEM.
The problem
If you've used Claude Code, OpenCode, or Codex CLI, you know how good these agents have become. They read your codebase, refactor across files, and fix bugs faster than you can describe them. But they all share one embarrassing flaw: they're completely stateless.

Every session starts from zero. The agent doesn't know what you decided last Tuesday, what conventions your team follows, or which thread you left half-finished. You either re-explain everything each time, or watch the agent confidently contradict decisions it made yesterday.

The usual workaround is stuffing everything into a giant instructions file or a bigger context window. But that doesn't scale — context windows fill up, instruction files go stale, and none of it captures what actually matters: the decisions made mid-conversation, the conventions that emerged naturally, the open threads you meant to come back to.

What we wanted was simple to say and annoying to build: an agent that remembers each project the way a teammate would — automatically, per project, without changing how we work. That's what our team AI ALCHEMISTS built at the WeMakeDevs hangover hackathon.
How it works
cliMEM is a local proxy server that sits between your CLI agent and its AI provider. The agent thinks it's talking to its provider; it's actually talking to us.

CLI Agent → cliMEM proxy (localhost:8000) → AI Provider

                │

                ├── injects remembered context into every request

                ├── logs the conversation as it happens

                └── on session end: extracts facts → stores in Cognee

Capturing. As requests flow through, the proxy logs every message of the conversation.

Remembering. When the session ends — Ctrl+C or an idle timeout — the chat log goes to filter.py, a rule-based extractor that distills the raw conversation into atomic, self-contained facts, categorized as things like decision, convention, open_thread, and architecture. We deliberately chose heuristics over an LLM summarization step: no extra API cost, no added latency, no dependency on yet another model call. Those facts are stored in Cognee, a graph + vector memory engine, scoped to the project's working directory so contexts never bleed between projects.

Recalling. In a new session, the proxy intercepts your prompt, searches Cognee for relevant facts, and injects them (plus a live file tree) into the system prompt before forwarding the request. Your agent's original instructions are preserved — we append, never replace. The AI answers with full project memory, and you never notice anything changed.

Setup is two commands: climem start runs the proxy, and climem configure claude (or opencode, or codex) points your agent at it. One command restores the original config.
What actually went wrong
Hackathon writeups usually skip this part. We won't — these problems ate most of our nights, and each one taught us something we couldn't have learned from a tutorial.

The crash that only happens when nothing happens Our server kept dying at shutdown with DatabaseNotCreatedError — and for the longest time we couldn't reproduce it consistently. That's the worst kind of bug: the one that seems random.

When we finally traced it, the cause was almost funny. Cognee's database schema doesn't exist until migrations run. Our store_memory() had an early return — if a session produced no memorable facts, it never called cognee.add(), so the schema never got created. search_memory() failed silently by design (wrapped in try/except). But improve_memory() had no such protection — it slammed straight into a database that didn't exist.

The trigger condition? A session where nothing interesting was said. All our test messages were casual questions — no decisions, no facts — so the crash fired every single time, while looking completely unrelated to anything we'd typed. We "fixed" it by manually running cognee.run_migrations() in a terminal, felt relieved for about an hour, So we moved it into the code: one await cognee.run_migrations() at server startup, idempotent, invisible, permanent,later we fixed it by adding cognee.run_migrations() to our code.

Three embedding providers, three different ways to fail This was the most demoralizing stretch of the hackathon. The memory engine — the whole point of the project — wouldn't embed anything.

NVIDIA NIM rejected our requests as malformed. Digging in, we found two API contract mismatches: NVIDIA requires an input_type field ("query" vs "passage") that Cognee never sends, and Cognee passes a dimensions parameter that NIM models don't accept. Cognee's embedding engines are built around the OpenAI spec with no way to add or remove parameters per provider — so we were stuck on both sides of the contract at once.

Jina was our plan B. It failed before a single request even left the machine — Cognee couldn't map Jina's models to the right tokenizer, so chunking broke at step zero. (An upstream PR, cognee#3762, now addresses this class of bug.)

All of this while juggling rate-limited API keys and a hackathon clock. There's a special kind of frustration in watching your architecture work perfectly on paper while the one dependency you can't control keeps saying no. We finally stopped fighting external APIs entirely and switched to fastembed — fully local embeddings, no API contract to violate, no rate limits to hit. Sometimes the winning move is refusing to play.

Turning our worst bug into an upstream contribution Here's the part we're proudest of. After the hackathon, we didn't let the NVIDIA problem go. We searched Cognee's repo: the dimensions half traced to a closed issue (cognee#1961), but the input_type gap was completely unreported. Nobody had hit it — or nobody had written it down.

So we built the fix ourselves: a new EMBEDDING_INPUT_TYPE config setting, provider detection for NVIDIA NIM (from either the provider field or the model prefix — they can disagree, which we learned the hard way), conditional omission of the unsupported dimensions parameter, and forwarding input_type via extra_body for self-hosted NIM-style servers. Four unit tests, all passing, no regressions in the existing suite. The pull request is on its way to topoteretes/cognee.

A bug that cost us hours of hackathon sleep is becoming our first open-source contribution. That trade feels right.
The honest constraint
We're students at Amrita Vishwa Vidyapeetham with 9-to-5 classes, so this was built in the gaps — late nights, between-lecture debugging, and a lot of coordination over chat. With more time we're confident we could have attempted a second problem statement — instead, we chose to ship one thing that works. We'd make that trade again.
See it in action
📺 Watch the 60-second demo

⭐ Star the repo on GitHub — and if you try it and something breaks, open an issue. We know exactly how it feels to debug someone else's undocumented gotcha.
FOLLOW OUR INSTAGRAM PAGE FOR MORE INSIGHTS AND ABOUT CLIMEM https://www.instagram.com/alchemists.ai/

What's next
Raising the idle timeout out of test mode, hardening the fact extractor, and landing our NVIDIA NIM input_type fix in Cognee — because the best ending for a hackathon bug is a merged PR.

Built by Team AIALCHEMISTS — at the WeMakeDevs hackathon.

DEV Community

How We Gave CLI Coding Agents Persistent Memory Using a Proxy and Cognee

Top comments (0)