Why AI coding agents need a task spine

shinobi apps — Wed, 03 Jun 2026 07:10:00 +0000

I've been pair-programming with Claude since day one — long before Claude Code existed, before MCP existed, back when "AI coding assistant" still meant tab-completion. The setup got unreasonably good. Then I noticed I kept re-explaining the same things.

Me, Tuesday: "We picked Postgres for this project, not MongoDB. Use JSONB for the metadata column."

Me, Friday: "We picked Postgres for this project, not MongoDB. Use JSONB for the metadata column."

Me, Monday: "Wait, why did you suggest MongoDB?"

The agent is brilliant in any single session. Across sessions it has the memory of a goldfish. Every conversation starts from scratch. Every decision I thought we'd settled has to be re-litigated.

If you've been there too, this post is for you.

The shape of the problem

Three things that should persist across agent sessions, and don't:

What are we working on? The list of tasks. Their status. Who claimed which one (you? the agent? a different agent on the other laptop?).
What did we decide? The architectural / library / pattern choices we already made, and why. Not so the agent can parrot them, but so it can build on them instead of re-litigating them.
What did we already try? The dead-ends. The "we already tried that and the reason it didn't work was X" insights that are the most expensive to re-derive.
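To make the three buckets concrete, here is one way to model them. This is a minimal Python sketch of my own, not Shinobi's actual schema. The class and field names (Task, Decision, DeadEnd, claimed_by, logged_on) are hypothetical. The sample values come from the example session later in this post.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    DONE = "done"


@dataclass
class Task:
    # "What are we working on?": the task, its status, and who claimed it.
    id: int
    title: str
    status: Status = Status.TODO
    claimed_by: Optional[str] = None  # None means nobody has claimed it yet


@dataclass
class Decision:
    # "What did we decide?": the choice plus the reason, so it can be built on.
    logged_on: date
    choice: str
    rationale: str


@dataclass
class DeadEnd:
    # "What did we already try?": the attempt and why it failed.
    logged_on: date
    attempt: str
    why_it_failed: str


task = Task(id=7, title="wire up the OAuth callback")
decision = Decision(date(2026, 6, 12), "Postgres + JSONB for session metadata",
                    "the rest of the app already uses Postgres")
dead_end = DeadEnd(date(2026, 6, 13), "sessions as URL query strings",
                   "flagged by the linter; breaks bookmark sharing")
```

The rationale and failure reason are first-class fields rather than free-form notes. They are the parts that are expensive to re-derive.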

Tool-call memory inside a single session is great. Vector recall over your codebase is great. Neither solves project state.

The naive fixes that don't work

A README. I tried it. The agent doesn't always read it. When it does, it can't update it without overwriting your formatting. When two agents edit it concurrently you get conflicts.

A Notion / Linear / Jira board. The agent can call their API if you set up the integration. But it's slow (every read is a network round-trip), it lags (sync delays), and the schema doesn't fit what agents actually care about. You don't need a Kanban board with assignees and sprint planning. You need answers to "Is this task done? What's blocking it? Which decisions touch this code path?"
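The three questions at the end of that paragraph map naturally onto a few small relational tables. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical schema of my own (tasks, task_deps, decisions with a path column), not Shinobi's. Each question becomes one short query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                    status TEXT NOT NULL DEFAULT 'todo');
-- task_id cannot finish until blocker_id is done
CREATE TABLE task_deps (task_id INTEGER REFERENCES tasks(id),
                        blocker_id INTEGER REFERENCES tasks(id));
-- path = the code path (directory prefix) a decision applies to
CREATE TABLE decisions (id INTEGER PRIMARY KEY, summary TEXT NOT NULL,
                        path TEXT NOT NULL);
""")
db.executemany("INSERT INTO tasks (id, title, status) VALUES (?, ?, ?)", [
    (6, "configure NextAuth.js", "done"),
    (7, "wire up the OAuth callback", "todo"),
    (8, "session metadata migration", "in_progress"),
])
db.executemany("INSERT INTO task_deps VALUES (?, ?)", [(7, 6), (7, 8)])
db.execute("INSERT INTO decisions (summary, path) VALUES (?, ?)",
           ("NextAuth.js over rolling our own", "src/auth/"))


def is_done(task_id: int) -> bool:
    # Assumes the task exists; fetchone() returns None otherwise.
    (status,) = db.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return status == "done"


def open_blockers(task_id: int) -> list[str]:
    # Blockers of task_id that are not finished yet.
    rows = db.execute("""
        SELECT t.title FROM task_deps d JOIN tasks t ON t.id = d.blocker_id
        WHERE d.task_id = ? AND t.status != 'done' ORDER BY t.id""", (task_id,))
    return [title for (title,) in rows]


def decisions_touching(file_path: str) -> list[str]:
    # A decision applies when its path is a prefix of the file being edited.
    rows = db.execute("SELECT summary FROM decisions WHERE ? LIKE path || '%'",
                      (file_path,))
    return [summary for (summary,) in rows]


print(is_done(7))                                  # False
print(open_blockers(7))                            # ['session metadata migration']
print(decisions_touching("src/auth/callback.ts"))  # ['NextAuth.js over rolling our own']
```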

Markdown files in .claude/ or .cursor/. Closer: now the data is local. But it's plain text, so there's no querying, no full-text search (FTS), no relations.

What I actually wanted

Three things, none of which existed:

An MCP server the agent can call to claim/complete tasks, log decisions, log dead-ends, and recall any of them later, all without leaving the conversation.
A local SQLite store that holds all of it. Querying. FTS. Cheap.
A dashboard I can open in the browser to see what state the project is in, what decisions are pending, what the agent is doing right now. Just glance at it — don't need to ask the agent.
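For the "Querying. FTS. Cheap." part, SQLite's FTS5 extension is one way to get full-text search over a decisions log without running a separate service. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes your SQLite build includes FTS5, as most do. The table, its columns, and search_decisions are my own illustration, not Shinobi's.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 virtual table: summary and rationale are tokenized for search;
# UNINDEXED keeps the date stored but out of the full-text index.
db.execute("CREATE VIRTUAL TABLE decisions USING fts5(logged_on UNINDEXED, summary, rationale)")
db.executemany("INSERT INTO decisions VALUES (?, ?, ?)", [
    ("2026-06-12", "Postgres + JSONB for session metadata",
     "not MongoDB: the rest of the app already uses Postgres"),
    ("2026-06-14", "NextAuth.js over rolling our own",
     "3-line config vs. weeks of cookie/CSRF correctness"),
])


def search_decisions(query: str) -> list[tuple[str, str]]:
    # MATCH runs the full-text query (case-insensitive with the default
    # tokenizer); ORDER BY rank puts the best-scoring hits first.
    return db.execute(
        "SELECT logged_on, summary FROM decisions WHERE decisions MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


print(search_decisions("mongodb"))  # [('2026-06-12', 'Postgres + JSONB for session metadata')]
print(search_decisions("csrf"))     # [('2026-06-14', 'NextAuth.js over rolling our own')]
```

The whole store is a single file, or memory in this demo. A lookup is a local query rather than a network round-trip.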

So I built it. It's called Shinobi. It's a single CLI:

npm install -g @shinobiapps/shinobi
cd /your/project
shinobi init
shinobi dashboard

init drops a .mcp.json file into the current directory that points at the local Shinobi MCP server. Restart Claude Code (or Cursor) and it loads the Shinobi tools: create_project, claim_task, log_decision, recall, and friends.
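Claiming is where a database beats a shared README. Two agents can try to grab the same task, and exactly one should win. Here is a minimal sketch of how a claim_task-style operation could do that atomically in SQLite with a conditional UPDATE. It is my illustration, not Shinobi's code. The table layout and agent names are hypothetical, and a single connection stands in for the racing agents.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    claimed_by TEXT  -- NULL until someone claims the task
)""")
db.execute("INSERT INTO tasks (id, title) VALUES (7, 'wire up the OAuth callback')")
db.commit()


def claim_task(task_id: int, agent: str) -> bool:
    # Check-and-set happens in one statement. SQLite serializes writes,
    # so two concurrent claims cannot both succeed: the loser matches zero rows.
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "UPDATE tasks SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (agent, task_id),
        )
    return cur.rowcount == 1


print(claim_task(7, "claude-desktop"))  # True
print(claim_task(7, "cursor-laptop"))   # False: already claimed
```

Compare this with two agents editing the same Markdown file. Here the loser gets a clean "no" and can pick another task.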

Now, the next time I start a session:

> Read the project context. What are we working on?

[Claude calls recall, reads the project spine, reads the decisions log]

We're building the auth flow for $project. Decisions on record:
- 2026-06-12: Postgres + JSONB for session metadata (not MongoDB —
  per the constraint that the rest of the app already uses Postgres)
- 2026-06-14: NextAuth.js over rolling our own (3-line config vs.
  weeks of cookie/CSRF correctness)

Dead ends ruled out:
- 2026-06-13: Tried passing sessions as URL query strings; got
  flagged by the linting rule, and it breaks bookmark sharing.

Next task: subtask #7, "wire up the OAuth callback". Want me to
claim it?
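The summary above is essentially two date-ordered queries rendered as text. Here is a minimal sketch of how a recall-style tool could assemble it from SQLite. The schema and render_context are hypothetical, written for this post rather than taken from Shinobi.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE decisions (logged_on TEXT, summary TEXT, rationale TEXT);
CREATE TABLE dead_ends (logged_on TEXT, attempt TEXT, reason TEXT);
""")
db.executemany("INSERT INTO decisions VALUES (?, ?, ?)", [
    ("2026-06-14", "NextAuth.js over rolling our own",
     "3-line config vs. weeks of cookie/CSRF correctness"),
    ("2026-06-12", "Postgres + JSONB for session metadata",
     "the rest of the app already uses Postgres"),
])
db.execute("INSERT INTO dead_ends VALUES (?, ?, ?)",
           ("2026-06-13", "sessions as URL query strings",
            "flagged by the linter; breaks bookmark sharing"))


def render_context() -> str:
    # ISO-8601 date strings sort chronologically as plain text,
    # so ORDER BY logged_on gives oldest-first regardless of insert order.
    lines = ["Decisions on record:"]
    for day, summary, why in db.execute(
            "SELECT logged_on, summary, rationale FROM decisions ORDER BY logged_on"):
        lines.append(f"- {day}: {summary} ({why})")
    lines.append("Dead ends ruled out:")
    for day, attempt, reason in db.execute(
            "SELECT logged_on, attempt, reason FROM dead_ends ORDER BY logged_on"):
        lines.append(f"- {day}: {attempt}: {reason}")
    return "\n".join(lines)


print(render_context())
```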

Yes. Yes I do. And we've already saved ten minutes of "remember when we decided…" thrash.

Why this matters more than it sounds

The bottleneck on AI-assisted development is not the model's single-turn cleverness. The model is already smarter than it needs to be for 90% of tasks. The bottleneck is getting the right context in front of it without you, the human, becoming the bottleneck.

A README is a chat log. A task spine is a database. The agent can read, write, and query a database without you having to type anything. That's the unlock.

The rest of the series — now shipped

Mobile push approvals — request_approval pings your phone; the agent blocks until you tap yes/no. Live.
One brain, every device — desktop, laptop, and phone all hit the same remote MCP store, no manual sync. Live.
Local-first, no lock-in — SQLite + optional git sync; self-host forever, MIT. The hosted brain is opt-in.
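At its core, the request_approval flow is a blocking wait with a timeout. The agent's call parks until a human answers on another device. Here is a minimal sketch of that pattern using Python's threading module. The ApprovalBox class, the timeout values, and treating a timeout as "no" are my own assumptions, not how Shinobi implements it. A timer thread simulates the phone tap.

```python
import threading
from typing import Optional


class ApprovalBox:
    """One pending yes/no question, answered from a different thread."""

    def __init__(self) -> None:
        self._answered = threading.Event()
        self._approved: Optional[bool] = None

    def answer(self, approved: bool) -> None:
        # Called when the human taps yes or no.
        self._approved = approved
        self._answered.set()

    def wait(self, timeout_s: float) -> bool:
        # Blocks the caller until an answer arrives or the timeout expires.
        # A timeout counts as "no", a conservative default for risky actions.
        if not self._answered.wait(timeout_s):
            return False
        return bool(self._approved)


box = ApprovalBox()
# Simulate the phone tap arriving 0.1 s after the agent starts waiting.
threading.Timer(0.1, box.answer, args=(True,)).start()
print(box.wait(timeout_s=5.0))  # True

unanswered = ApprovalBox()
print(unanswered.wait(timeout_s=0.05))  # False: nobody tapped, so it timed out
```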