Oridjinnn

How I built Rewind — a local-first AI memory layer for developers that records your terminal sessions and lets you chat with your history using Ollama.

I have a bad habit.

I'll spend three hours debugging a nasty Docker networking issue, finally crack it, close the terminal, and then two weeks later hit the exact same problem. I know I solved it before. I remember the frustration. But the commands? Gone. The output that finally made it click? Gone.

I tried shell history. Too much noise. I tried keeping notes. Too much friction — I never remember to write things down while debugging. I tried asking AI assistants, but they don't know what I was actually doing on my machine.

So I built Rewind.


What Rewind does

Rewind is a CLI tool that records your terminal sessions, IDE activity, and AI conversations — then lets you recall and chat with that history using a local LLM via Ollama.

The key word is local. No cloud. No API keys. No subscriptions. Everything — embeddings, ranking, summaries, chat — runs on your machine. A single Go binary backed by SQLite.

$ rewind run docker build -t myapp .
● Recording... [exit 1] 2.3s

$ rewind chat qwen2.5:1.5b
> why did my docker build fail yesterday?
↳ Searching 47 sessions... found 3 relevant

[2h ago] docker build failed: COPY failed, file not found
The build tried to COPY ./dist but the folder didn't exist yet.
Run `npm run build` first, then retry the build.

That's the whole pitch. Your terminal finally has memory.


The constraints I set for myself

I wanted to build this with zero budget and make it run on a potato laptop (mine is a Lenovo with an i7-4765T and 8GB RAM — not exactly a powerhouse).

That shaped every technical decision:

  • Go — single static binary, fast startup, easy cross-compilation
  • SQLite — embedded, zero infrastructure, WAL mode so readers don't block the writer
  • Ollama — run small quantized models locally, no GPU required
  • No background daemons — everything is on-demand

How it works under the hood

Recording

When you run rewind run <command>, it forks a child process, captures stdout/stderr in real time, and writes events to SQLite as they stream in.
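The post doesn't show the recorder itself, so here is a minimal sketch of the idea using Go's standard `os/exec` package: start the command, read stdout and stderr line by line through pipes, and hand each line to a callback (where a real recorder would clean, redact, and insert it). The `Event` type and the `RunAndCapture` name are hypothetical, not Rewind's actual code.

```go
package main

import (
	"bufio"
	"errors"
	"io"
	"os/exec"
	"sync"
	"time"
)

// Event is one captured output line. The field names are illustrative
// assumptions, not Rewind's actual schema.
type Event struct {
	Timestamp time.Time
	Stream    string // "stdout" or "stderr"
	Content   string
}

// RunAndCapture starts name with args, calls onEvent for each output line
// as it arrives, and returns the child's exit code.
func RunAndCapture(name string, args []string, onEvent func(Event)) (int, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return -1, err
	}
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return -1, err
	}
	if err := cmd.Start(); err != nil {
		return -1, err
	}

	var mu sync.Mutex // serializes onEvent calls from the two reader goroutines
	var wg sync.WaitGroup
	read := func(r io.Reader, stream string) {
		defer wg.Done()
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			mu.Lock()
			onEvent(Event{Timestamp: time.Now(), Stream: stream, Content: sc.Text()})
			mu.Unlock()
		}
		// If the scanner stops early (e.g. a line over 64 KiB), keep draining
		// so the child never blocks on a full pipe.
		io.Copy(io.Discard, r)
	}
	wg.Add(2)
	go read(stdout, "stdout")
	go read(stderr, "stderr")

	wg.Wait() // both pipes must be fully read before Wait closes them
	err = cmd.Wait()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil // a failing command is still a valid recording
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}
```

Because the two streams are read by separate goroutines, the relative order of stdout and stderr lines is only approximate. A child writing into a pipe also sees no TTY, so some tools turn off color or progress output; the post doesn't say whether Rewind uses a pseudo-terminal instead.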

Before storing, two things happen:

Cleaning — ANSI escape sequences, spinner characters, and terminal control codes are stripped. Raw terminal output is surprisingly dirty; storing it verbatim makes recall useless.

Redaction — a pattern-based scanner checks each line for secrets before it hits the database. GitHub PATs, AWS keys, OpenAI tokens, Slack tokens, private keys — 12 patterns total. The last thing you want is your API keys ending up in a searchable local database.

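// redact.go (simplified)
package redact

import "regexp"

// patterns lists the secret formats to mask; the full list has 12 entries.
var patterns = []*regexp.Regexp{
    regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`), // GitHub classic PAT
    regexp.MustCompile(`AKIA[0-9A-Z]{16}`),    // AWS access key ID
    regexp.MustCompile(`sk-[A-Za-z0-9]{48}`),  // OpenAI key (classic 48-char format)
    // ... 9 more
}

// RedactCommand replaces every secret match in line with [REDACTED]
// before the line is written to the database.
func RedactCommand(line string) string {
    for _, p := range patterns {
        line = p.ReplaceAllString(line, "[REDACTED]")
    }
    return line
}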
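The cleaning step can be sketched in the same style. This version assumes three kinds of noise: ANSI CSI/OSC escape sequences, carriage-return redraws from progress bars, and Braille spinner glyphs. The regexes and the `CleanLine` name are illustrative, not Rewind's actual rules.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	// CSI sequences: colors (ESC[31m), cursor moves, line erases (ESC[2K).
	csiRe = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)
	// OSC sequences (e.g. window titles), terminated by BEL or ESC \.
	oscRe = regexp.MustCompile(`\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)`)
	// Braille-pattern glyphs, which many CLI spinners cycle through.
	spinnerRe = regexp.MustCompile(`[\x{2800}-\x{28FF}]`)
)

// CleanLine strips terminal noise from one captured line.
func CleanLine(line string) string {
	line = strings.TrimSuffix(line, "\r") // CRLF line ending
	// A bare \r means the terminal redrew the line (progress bars, spinners);
	// only the text after the last \r was left on screen.
	if i := strings.LastIndexByte(line, '\r'); i >= 0 {
		line = line[i+1:]
	}
	line = oscRe.ReplaceAllString(line, "")
	line = csiRe.ReplaceAllString(line, "")
	line = spinnerRe.ReplaceAllString(line, "")
	// Drop any remaining control characters except tab.
	line = strings.Map(func(r rune) rune {
		if (r < 0x20 && r != '\t') || r == 0x7f {
			return -1
		}
		return r
	}, line)
	return strings.TrimRight(line, " ")
}
```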

Storage

Everything goes into SQLite with WAL mode enabled and 11 indexes. Sessions and events live in separate tables linked by a foreign key. A single LEFT JOIN query loads all sessions together with their events, avoiding the N+1 pattern of one query for the session list plus one more query per session.

SELECT s.id, s.command, s.title, s.summary, ...,
       e.timestamp, e.type, e.content
FROM sessions s
LEFT JOIN events e ON e.session_id = s.id
ORDER BY s.started_at DESC, e.id

Semantic recall

This is the interesting part. When you run rewind recall "docker networking issue", it:

  1. Embeds your query using nomic-embed-text via Ollama
  2. Loads cached embeddings from .rewind/embeddings/ (pre-computed, not re-generated each time)
  3. Ranks sessions using cosine similarity + recency decay
  4. Returns the top matches

The recency decay matters more than it sounds. Without it, an old session with a perfect semantic match will beat a recent session that's slightly less similar. In practice, you almost always care more about what happened recently.

// ranking — simplified
score := cosineSimilarity(queryVec, sessionVec)
age := time.Since(session.StartedAt).Hours() / 24 // days
decayedScore := score * math.Exp(-0.1 * age)

Chat with context

rewind chat <model> loads your most relevant sessions and injects them as context before your conversation. The model "knows" what you've been working on without you having to explain it.

The chat engine streams responses from Ollama's HTTP API, so tokens appear as they are generated and chat stays usable even on slow hardware.

IDE integration

This was the hardest part to architect. I wanted VS Code, JetBrains, and Neovim to all feed data into the same SQLite database without building three completely different integrations.

The solution: a local JSON-RPC server (rewind ide start) that all extensions talk to. Each extension sends events — file opens, saves, git operations, AI suggestions, build/test results — using the same protocol. The server writes them to SQLite and links them to shell sessions via a Bridge layer.

VS Code  ──►┐
JetBrains──►├──► JSON-RPC server ──► SQLite ──► recall / chat
Neovim   ──►┘         (Go)
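The post doesn't document the wire format, so here is a minimal sketch of what a shared JSON-RPC 2.0 handler could look like. The method name `rewind.recordEvent` and the `IDEEvent` fields are hypothetical. The error codes -32700, -32600, -32601, and -32602 are the ones defined by the JSON-RPC 2.0 specification, and -32000 falls in its range reserved for server-defined errors.

```go
package main

import "encoding/json"

type rpcRequest struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"` // a nil ID encodes as null, as the spec requires on parse errors
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// IDEEvent is a hypothetical payload shared by every editor extension.
type IDEEvent struct {
	IDE     string `json:"ide"`     // "vscode", "jetbrains", or "neovim"
	Project string `json:"project"` // project root the event belongs to
	Kind    string `json:"kind"`    // e.g. "file_save", "git_commit", "test_result"
	Detail  string `json:"detail"`
}

// HandleRPC decodes one JSON-RPC 2.0 request, passes the event to store,
// and returns the encoded response.
func HandleRPC(raw []byte, store func(IDEEvent) error) []byte {
	resp := rpcResponse{JSONRPC: "2.0"}
	var req rpcRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		resp.Error = &rpcError{Code: -32700, Message: "parse error"}
	} else {
		resp.ID = req.ID
		switch {
		case req.JSONRPC != "2.0":
			resp.Error = &rpcError{Code: -32600, Message: "invalid request"}
		case req.Method != "rewind.recordEvent":
			resp.Error = &rpcError{Code: -32601, Message: "method not found"}
		default:
			var ev IDEEvent
			if err := json.Unmarshal(req.Params, &ev); err != nil {
				resp.Error = &rpcError{Code: -32602, Message: "invalid params"}
			} else if err := store(ev); err != nil {
				resp.Error = &rpcError{Code: -32000, Message: err.Error()}
			} else {
				resp.Result = "ok"
			}
		}
	}
	out, _ := json.Marshal(resp) // all fields are plain data; no error expected
	return out
}
```

This sketch always replies. JSON-RPC 2.0 notifications (requests without an `id`) are supposed to get no response, which a fuller server would check for.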

IDE recording is opt-in per-project. Nothing records until you explicitly enable it:

rewind ide permissions vscode on /path/to/project
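A per-project opt-in boils down to a default-deny check on paths. This sketch assumes permissions are stored as a list of enabled project roots per IDE (the hypothetical `Permissions` type) and uses `filepath.Rel`, so that `/home/me/project-secret` is not mistaken for a child of `/home/me/project`, which a naive string-prefix check would allow. It does not resolve symlinks.

```go
package main

import (
	"path/filepath"
	"strings"
)

// Permissions maps an IDE name to the project roots where recording is on.
type Permissions map[string][]string

// Enable turns recording on for one IDE under root.
func (p Permissions) Enable(ide, root string) {
	p[ide] = append(p[ide], filepath.Clean(root))
}

// Allowed reports whether an event about path from ide may be recorded.
// Nothing is allowed unless a matching root was explicitly enabled.
func (p Permissions) Allowed(ide, path string) bool {
	path = filepath.Clean(path)
	for _, root := range p[ide] {
		rel, err := filepath.Rel(root, path)
		if err != nil {
			continue
		}
		// Inside root: rel is "." or a path that does not climb out with "..".
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```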

What I learned building this

Start with the storage layer. I initially had everything in JSON files. Migrating to SQLite mid-project was painful: I had to write a migration tool and keep the old JSON reader alive. If I started over, I'd use SQLite from day one.

Embedding cache is critical for performance. The first version re-embedded every session on every recall query. On a slow machine with 47 sessions, that meant 47 HTTP calls to Ollama before returning a single result. Caching embeddings to disk made recall go from ~60 seconds to ~2 seconds.

Secret redaction is non-negotiable. I almost shipped without it. A developer's terminal output is full of tokens, keys, and credentials. If you're building anything that stores terminal history, build redaction first.

Single binary is a superpower for adoption. No Docker, no Python venv, no npm install. go build, move the binary, done. For a tool people need to trust enough to let it record their terminal, low-friction installation matters a lot.


Current state

Rewind is in active development. What's working today:

  • ✅ Terminal recording with redaction and cleaning
  • ✅ SQLite storage with WAL mode
  • ✅ Semantic recall via Ollama embeddings
  • ✅ Chat with session context
  • ✅ Shell hooks for auto-recording (bash/zsh/fish)
  • ✅ VS Code, JetBrains, and Neovim extensions
  • ✅ Web UI for browsing sessions
  • ✅ Export to HTML/Markdown
  • ✅ Shell history import

On the roadmap:

  • [ ] rewind sync — optional encrypted backup to S3/R2
  • [ ] MCP server — expose Rewind memory to Claude Code, Cursor, and other AI tools
  • [ ] GitHub Actions integration — record CI runs

Try it

git clone https://github.com/Oridjinnn/Rewind.git
cd Rewind
go build -o rewind ./cmd/rewind

# Pull models
ollama pull qwen2.5:1.5b
ollama pull nomic-embed-text

# Record something
./rewind run ls -la

# Chat with your history
./rewind chat qwen2.5:1.5b

The repo is at github.com/Oridjinnn/Rewind — MIT licensed, contributions welcome.

If you're building something on top of Rewind (a smart terminal, an agent, an IDE plugin), I'd love to hear about it. Drop a comment or open an issue.


Any command. Any session. Any question. Rewind knows.
