
shinobi apps

Posted on • Originally published at shinobi-dev.hashnode.dev

Why AI coding agents need a task spine

I've been pair-programming with Claude since day one — long before Claude Code existed, before MCP existed, back when "AI coding assistant" still meant tab-completion. The setup got unreasonably good. Then I noticed I kept re-explaining the same things.

Me, Tuesday: "We picked Postgres for this project, not MongoDB. Use JSONB for the metadata column."

Me, Friday: "We picked Postgres for this project, not MongoDB. Use JSONB for the metadata column."

Me, Monday: "Wait, why did you suggest MongoDB?"

The agent is brilliant in any single session. Across sessions it has the memory of a goldfish. Every conversation starts from scratch. Every decision I thought we'd settled has to be re-litigated.

If you've been there too, this post is for you.

The shape of the problem

Three things that should persist across agent sessions, and don't:

  1. What are we working on? The list of tasks. Their status. Who claimed which one (you? the agent? a different agent on the other laptop?).

  2. What did we decide? The architectural, library, and pattern choices we already made, and why. Not so the agent can parrot them, but so it can build on them instead of re-litigating them.

  3. What did we already try? The dead-ends. The "we already tried that and the reason it didn't work was X" insights that are the most expensive to re-derive.

Tool-call memory inside a single session is great. Vector recall over your codebase is great. Neither solves project state.
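
To make those three categories concrete, here is a minimal sketch of the records involved. The class and field names are illustrative assumptions, not Shinobi's actual schema; the point is only that tasks, decisions, and dead ends are small structured records rather than free-form notes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical record shapes for illustration; not Shinobi's actual schema.

@dataclass
class Task:
    id: int
    title: str
    status: str = "open"              # "open", "claimed", or "done"
    claimed_by: Optional[str] = None  # e.g. "me", "agent@desktop", "agent@laptop"

@dataclass
class Decision:
    made_on: date
    choice: str
    rationale: str  # the "why", so the agent can build on it

@dataclass
class DeadEnd:
    tried_on: date
    attempt: str
    why_it_failed: str

@dataclass
class ProjectState:
    tasks: list[Task] = field(default_factory=list)
    decisions: list[Decision] = field(default_factory=list)
    dead_ends: list[DeadEnd] = field(default_factory=list)

    def briefing(self) -> str:
        """Render the state as a short summary to read at session start."""
        lines = ["Open tasks:"]
        lines += [f"- #{t.id} {t.title} ({t.claimed_by or 'unclaimed'})"
                  for t in self.tasks if t.status != "done"]
        lines.append("Decisions:")
        lines += [f"- {d.made_on}: {d.choice} ({d.rationale})"
                  for d in self.decisions]
        lines.append("Dead ends:")
        lines += [f"- {e.tried_on}: {e.attempt}: {e.why_it_failed}"
                  for e in self.dead_ends]
        return "\n".join(lines)


state = ProjectState(
    tasks=[Task(7, "wire up the OAuth callback")],
    decisions=[Decision(date(2026, 6, 12),
                        "Postgres + JSONB for session metadata",
                        "the rest of the app already uses Postgres")],
    dead_ends=[DeadEnd(date(2026, 6, 13),
                       "sessions as URL query strings",
                       "breaks bookmark sharing")],
)
print(state.briefing())
```

Anything that can produce that briefing at the start of a session, and accept updates during it, covers the three needs listed above.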

The naive fixes that don't work

A README. I tried it. The agent doesn't always read it. When it does, it can't update it without overwriting your formatting. When two agents edit it concurrently you get conflicts.

A Notion / Linear / Jira board. The agent can call their API if you set up the integration. But: it's slow (every read is a network round-trip), it's noisy (sync delays), and the schema doesn't fit what agents actually care about (you don't need a Kanban with assignees and sprint planning; you need "is this task done? what's blocking it? what decisions touch this code path?").

Markdown files in .claude/ or .cursor/. Closer. Now the data is local. But it's plain text: no querying, no full-text search (FTS), no relations.
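
For a sense of what "querying" and "relations" buy you, here is a rough sketch of the questions from the previous section ("what's blocking it? what decisions touch this code path?") asked of a small relational store. The table names, columns, and file paths are assumptions made up for this example.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'open',
    blocked_by INTEGER REFERENCES tasks(id)
);
CREATE TABLE decisions (
    id        INTEGER PRIMARY KEY,
    summary   TEXT NOT NULL,
    code_path TEXT NOT NULL
);
INSERT INTO tasks (id, title, status, blocked_by) VALUES
    (6, 'add sessions table', 'done', NULL),
    (7, 'wire up the OAuth callback', 'open', 6),
    (8, 'add logout endpoint', 'open', 7);
INSERT INTO decisions (summary, code_path) VALUES
    ('Postgres + JSONB for session metadata', 'src/auth/session.ts'),
    ('NextAuth.js over rolling our own', 'src/auth/callback.ts');
""")

def blocked_tasks(conn):
    """Unfinished tasks whose blocker is also unfinished, with the blocker's title."""
    return conn.execute("""
        SELECT t.id, t.title, b.title
        FROM tasks t JOIN tasks b ON b.id = t.blocked_by
        WHERE t.status != 'done' AND b.status != 'done'
        ORDER BY t.id
    """).fetchall()

def decisions_touching(conn, path_prefix):
    """Decisions recorded against any file under path_prefix."""
    return [row[0] for row in conn.execute(
        "SELECT summary FROM decisions WHERE code_path LIKE ? || '%' ORDER BY id",
        (path_prefix,),
    )]

print(blocked_tasks(conn))
print(decisions_touching(conn, "src/auth/"))
```

Getting the same answers out of a folder of Markdown files means the agent has to read and interpret every file on every session.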

What I actually wanted

Three things, none of which existed:

  1. An MCP server the agent can call to claim/complete tasks, log decisions, log dead-ends, and recall any of them later, all without leaving the conversation.

  2. A local SQLite store that holds all of it. Querying. FTS. Cheap.

  3. A dashboard I can open in the browser to see what state the project is in, what decisions are pending, and what the agent is doing right now. I can just glance at it; no need to ask the agent.
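
As a rough illustration of the first two items, the sketch below shows how claim, log, and recall operations could sit on top of SQLite. It is not Shinobi's code: the schema and function bodies are assumptions, the function names only echo the kind of tools described, and it needs an SQLite build with the FTS5 extension (most stock Python builds have it).

```python
import sqlite3

# Hypothetical names throughout; a sketch, not Shinobi's real schema or tool code.
# Requires an SQLite build that includes the FTS5 extension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    claimed_by TEXT
);
CREATE VIRTUAL TABLE notes USING fts5(kind, body);
INSERT INTO tasks (id, title) VALUES (7, 'wire up the OAuth callback');
""")

def claim_task(conn, task_id, agent):
    """Claim a task only if nobody holds it; returns True when this caller won."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE tasks SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (agent, task_id),
        )
    return cur.rowcount == 1

def log_note(conn, kind, body):
    """Record a decision or dead end as searchable text."""
    with conn:
        conn.execute("INSERT INTO notes (kind, body) VALUES (?, ?)", (kind, body))

def recall(conn, query):
    """Full-text search over logged notes, best match first."""
    return conn.execute(
        "SELECT kind, body FROM notes WHERE notes MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()

log_note(conn, "decision", "Postgres with JSONB for session metadata, not MongoDB")
log_note(conn, "dead_end", "Sessions in URL query strings break bookmark sharing")

print(claim_task(conn, 7, "agent@desktop"))  # first claim on task 7
print(claim_task(conn, 7, "agent@laptop"))   # second claim on the same task
print(recall(conn, "mongodb"))
```

The claim is a single conditional UPDATE, so two agents asking for the same task cannot both succeed, which is exactly the conflict a shared README cannot prevent. A real implementation would also need to escape user-supplied search text, since FTS5 treats some characters as query syntax.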

So I built it. It's called Shinobi. It's a single CLI:

npm install -g @shinobiapps/shinobi
cd /your/project
shinobi init
shinobi dashboard

init drops a .mcp.json file into the current directory that points at the local Shinobi MCP server. Restart Claude Code (or Cursor) and it auto-loads create_project, claim_task, log_decision, recall, and friends.

Now, the next time I start a session:

> Read the project context. What are we working on?

[claude calls recall, reads the project spine, reads the decisions log]

We're building the auth flow for $project. Decisions on record:
- 2026-06-12: Postgres + JSONB for session metadata (not MongoDB —
  per the constraint that the rest of the app already uses Postgres)
- 2026-06-14: NextAuth.js over rolling our own (3-line config vs.
  weeks of cookie/CSRF correctness)

Dead ends ruled out:
- 2026-06-13: Tried passing sessions as URL query strings; it got
  flagged by the linting rule and breaks bookmark sharing.

Next task: subtask #7, "wire up the OAuth callback". Want me to
claim it?

Yes. Yes I do. And we've already saved ten minutes of "remember when we decided…" thrash.

Why this matters more than it sounds

The bottleneck on AI-assisted development is not the model's single-turn cleverness. The model is already smarter than it needs to be for 90% of tasks. The bottleneck is getting the right context in front of it without you, the human, becoming the bottleneck.

A README is a chat log. A task spine is a database. The agent can read, write, and query a database without your typing. That's the unlock.

What's coming next in this series

  • Part 2: Mobile push approvals — how request_approval lets you step away from the laptop while the agent runs, and respond from your phone when it hits a fork.

  • Part 3: Multi-agent real-time sync — desktop and laptop, or you and a teammate, working in the same workspace with no manual git pull.

  • Part 4: Why the data stays local (SQLite + optional git sync) and what the hosted SaaS adds on top — the explicit "no lock-in" promise.

If you want to try Shinobi now: https://github.com/numbererikson/shinobi. MIT-licensed, self-host forever.
