musecl

The package.json Pattern for AI Agent Memory

Your agent learned 500 things last month. Then the disk died.

This isn't hypothetical — it happened to me. Months of accumulated corrections, preferences, and domain expertise, gone because I treated agent memory like disposable state instead of valuable data.

The Portability Gap

If you're running AI agents that learn over time, you've probably noticed a gap in the tooling:

  • Cloud memory APIs — portable, but your agent's knowledge lives on someone else's servers
  • Local-only systems — private, but one hardware failure away from zero
  • Protocol-based services — great architecture, but not portable across machines

What I wanted was simple: private, portable, auditable, and free.

A Pattern You Already Know

Every developer intuitively understands this separation:

package.json    → syncs via git
node_modules/   → rebuilds locally from package.json

Agent memory has the same structure:

MEMORY.md       → syncs via git (source of truth)
vectordb/       → rebuilds locally from MEMORY.md (derived artifact)

Source files are human-readable text — Markdown and JSON. They diff cleanly, version naturally, and merge without conflicts. Vector indexes are binary blobs that depend on your local embedding model. They should never touch git.

How It Works

The implementation is a single bash script (~200 lines) that does three things:

Push: Local → Git

  1. Compare workspace files against sync repo by content hash (skip unchanged)
  2. Copy changed .md and .json files
  3. Scan for secrets — abort if API keys found
  4. Commit and push
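The push steps above can be sketched in bash roughly like this (function names, paths, and the abbreviated secret pattern are mine, not the actual script's):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy src to dst only when the content hash differs (skip unchanged files)
copy_if_changed() {
  local src="$1" dst="$2"
  if [ ! -f "$dst" ] || [ "$(sha256sum < "$src")" != "$(sha256sum < "$dst")" ]; then
    mkdir -p "$(dirname "$dst")"
    cp "$src" "$dst"
  fi
}

# Push: hash-compare, copy .md/.json, scan for obvious key prefixes, commit
push_memory() {
  local workspace="$1" repo="$2"
  find "$workspace" -type f \( -name '*.md' -o -name '*.json' \) | while read -r src; do
    if grep -qE 'sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{36,}' "$src"; then
      echo "possible secret in $src, aborting" >&2
      exit 1
    fi
    copy_if_changed "$src" "$repo/${src#"$workspace"/}"
  done
  git -C "$repo" add -A
  git -C "$repo" commit -m "memory sync $(date -u +%F)" || true  # no-op when clean
  git -C "$repo" push
}
```

The hash comparison keeps the git history quiet: only files whose content actually changed produce a diff.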

Pull: Git → Local

  1. git pull --ff-only (fails loudly if diverged — protects local changes)
  2. Copy files to agent workspaces with 600 permissions
  3. Rebuild vector indexes for agents that received changes
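A rough bash equivalent of the pull path (the helper name and paths are hypothetical; index rebuilding is left as a stub):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy synced files into the agent workspace with owner-only permissions
deploy_files() {
  local repo="$1" workspace="$2"
  find "$repo" -type f \( -name '*.md' -o -name '*.json' \) -not -path '*/.git/*' |
  while read -r src; do
    local dst="$workspace/${src#"$repo"/}"
    mkdir -p "$(dirname "$dst")"
    install -m 600 "$src" "$dst"  # copy + chmod 600 in one step
  done
}

pull_memory() {
  local repo="$1" workspace="$2"
  git -C "$repo" pull --ff-only  # fails loudly if histories diverged
  deploy_files "$repo" "$workspace"
  # ...then rebuild vector indexes for agents whose files changed
}
```

The --ff-only flag is the safety catch: a diverged remote stops the pull instead of silently merging over local edits.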

Status

Bidirectional drift check — shows files that exist locally but aren't synced, and vice versa.
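The drift check can be as small as a comm over the two file listings (a sketch; the function name is mine):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Column 1: files only in the workspace; column 2 (indented): only in the repo.
# comm -3 suppresses the third column, i.e. files present on both sides.
status_memory() {
  local workspace="$1" repo="$2"
  comm -3 \
    <(cd "$workspace" && find . -type f \( -name '*.md' -o -name '*.json' \) | sort) \
    <(cd "$repo" && find . -type f \( -name '*.md' -o -name '*.json' \) -not -path './.git/*' | sort)
}
```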

What Gets Synced (and What Doesn't)

Synced:

  • *.md — Memory documents
  • *.json — Structured data
  • .gitignore
  • sync.sh

Not synced:

  • *.sqlite — Vector indexes
  • vectordb/ — Embedding stores
  • *.jsonl — Session transcripts
  • *.key, *.pem, *.env

The .gitignore does the heavy lifting here. Secrets and binary artifacts stay local. Only human-readable source files travel.
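A .gitignore along these lines keeps derived artifacts and secrets local (illustrative, matching the categories above rather than the repo's exact file):

```gitignore
# Derived artifacts: rebuilt locally from the markdown sources
*.sqlite
vectordb/
*.jsonl

# Secrets: never leave the machine
*.key
*.pem
*.env
```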

Multi-Agent, Multi-Machine

This pattern scales naturally. Each agent gets its own directory in the repo:

security-auditor/MEMORY.md
growth-engine/MEMORY.md
content-moderator/MEMORY.md

Security knowledge doesn't pollute architecture patterns. And because it's just git, syncing across machines is a push/pull away.

The Secret Scanning Part

The scariest thing about syncing agent memory to git is accidentally pushing API keys. The script runs a regex scan before every commit: it looks for common key patterns (sk-, ghp_, Bearer, base64 blobs) and aborts if anything looks suspicious.

Not bulletproof, but it catches the obvious mistakes that would otherwise end up in your git history forever.
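As a sketch, that pre-commit scan can boil down to a single grep over the sync repo (the patterns here are illustrative, not the script's exact list):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if the directory looks clean; prints matches and returns 1 otherwise.
# Patterns: OpenAI-style keys, GitHub tokens, Bearer headers, long base64 runs.
scan_for_secrets() {
  local pattern='sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{36,}|Bearer [A-Za-z0-9._~+/-]{16,}|[A-Za-z0-9+/]{40,}={0,2}'
  ! grep -rEn --include='*.md' --include='*.json' "$pattern" "$1"
}
```

The base64 pattern is the noisiest of the four; a long hash or minified string will trip it, which is the right failure mode for a scanner whose job is to make you look twice before pushing.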

Try It

The repo is MIT-licensed: github.com/musecl/musecl-memory

It's a bash script, not a framework. No dependencies beyond git. Works with any agent system that stores memory as files — Claude, GPT, local models, whatever.

If you're running agents that accumulate knowledge, I'm curious: how are you handling persistence today? The tooling landscape is still early, and I think there's a lot of room for different approaches.
