I've been pair-programming with Claude since day one — long before Claude Code existed, before MCP existed, back when "AI coding assistant" still meant tab-completion. The setup got unreasonably good. Then I noticed I kept re-explaining the same things.
Me, Tuesday: "We picked Postgres for this project, not MongoDB. Use JSONB for the metadata column."
Me, Friday: "We picked Postgres for this project, not MongoDB. Use JSONB for the metadata column."
Me, Monday: "Wait, why did you suggest MongoDB?"
The agent is brilliant in any single session. Across sessions it has the memory of a goldfish. Every conversation starts from scratch. Every decision I thought we'd settled has to be re-litigated.
If you've been there too, this post is for you.
The shape of the problem
Three things that should persist across agent sessions, and don't:
What are we working on? The list of tasks. Their status. Who claimed which one (you? the agent? a different agent on the other laptop?).
What did we decide? The architectural / library / pattern choices we already made — and why. Not so the agent can parrot them; so it can build on them instead of relitigating.
What did we already try? The dead-ends. The "we already tried that and the reason it didn't work was X" insights that are the most expensive to re-derive.
Tool-call memory inside a single session is great. Vector recall over your codebase is great. Neither solves project state.
The naive fixes that don't work
A README. I tried it. The agent doesn't always read it. When it does, it can't update it without overwriting your formatting. When two agents edit it concurrently you get conflicts.
A Notion / Linear / Jira board. The agent can call their API if you set up the integration. But: it's slow (every read is a network round-trip), it's noisy (sync delays), and the schema doesn't fit what agents actually care about (you don't need a Kanban with assignees and sprint planning; you need "is this task done? what's blocking it? what decisions touch this code path?").
Markdown files in .claude/ or .cursor/. Closer. Now the data is local. But it's plain text — no querying, no FTS, no relations.
What I actually wanted
Three things, none of which existed:
A MCP server the agent can call to claim/complete tasks, log decisions, log dead-ends, and recall any of them later — without leaving the conversation.
A local SQLite store that holds all of it. Querying. FTS. Cheap.
A dashboard I can open in the browser to see what state the project is in, what decisions are pending, what the agent is doing right now. Just glance at it — don't need to ask the agent.
So I built it. It's called Shinobi. It's a single CLI:
npm install -g @shinobiapps/shinobi
cd /your/project
shinobi init
shinobi dashboard
init drops an .mcp.json into the current directory that points at the local Shinobi MCP server. Restart Claude Code (or Cursor) and it auto-loads create_project, claim_task, log_decision, recall, and friends.
Now the next time I'm starting a session:
> Read the project context. What are we working on?
[claude calls recall, reads the project spine, reads the decisions log]
We're building the auth flow for $project. Decisions on record:
- 2026-06-12: Postgres + JSONB for session metadata (not MongoDB —
per the constraint that the rest of the app already uses Postgres)
- 2026-06-14: NextAuth.js over rolling our own (3-line config vs.
weeks of cookie/CSRF correctness)
Dead ends ruled out:
- 2026-06-13: Tried passing sessions as URL query strings; got
flagged by the linting rule and breaks bookmark sharing.
Next task: subtask #7, "wire up the OAuth callback". Want me to
claim it?
Yes. Yes I do. And we've already saved ten minutes of "remember when we decided…" thrash.
Why this matters more than it sounds
The bottleneck on AI-assisted development is not the model's single-turn cleverness. The model is already smarter than it needs to be for 90% of tasks. The bottleneck is getting the right context in front of it without you, the human, becoming the bottleneck.
A README is a chat log. A task spine is a database. The agent can read, write, and query a database without your typing. That's the unlock.
What's coming next in this series
Part 2: Mobile push approvals — how
request_approvallets you step away from the laptop while the agent runs, and respond from your phone when it hits a fork.Part 3: Multi-agent real-time sync — desktop and laptop, or you and a teammate, working in the same workspace with no manual git pull.
Part 4: Why the data stays local (SQLite + optional git sync) and what the hosted SaaS adds on top — the explicit "no lock-in" promise.
If you want to try Shinobi now: https://github.com/numbererikson/shinobi. MIT-licensed, self-host forever.
Top comments (0)