Karthick

Posted on Jun 25

AI Is Too Good at Coding - So I Built Something to Keep the Why (and Make It Scale)

#ai #claude #opensource #cli

Over the past few months I've been building with Cursor, Claude Code, and Codex - same tools a lot of builders are running right now. The pace is genuinely wild. I've shipped MVPs for friends where we went from a rough PRD to a working demo in a single day: auth wired up, database in place, deployed, link sent before midnight. Some of it fully vibe-coded. Some of it structured. All of it fast.

You're seeing this everywhere now. Builders spinning up entire products over a weekend, sometimes over an afternoon, because the tools finally keep up with the idea in your head.

And here's the thing people undersell: AI coding agents are actually pretty good at following engineering principles - when they have something to follow. Hand them a PRD, point them at your conventions, set a few non-negotiable rules, and they mostly stick to it. They'll reuse your components instead of inventing new ones. They'll write tests if tests are part of the standard. They won't skip the boring obvious stuff if the boring obvious stuff is written down somewhere they load every session.

The gap I kept hitting wasn't the coding. It was what happened to the decisions along the way.

When you move this fast, you make dozens of small calls in chat: which auth flow, which database, what "done" means for this feature, whether this module owns that boundary. The code commits to the repo. The reasoning usually doesn't. It stays in a thread that gets compacted, or a session that ends, or your head - which is a rough place to store architecture when you're juggling three builds at once.

I didn't want to slow down. I wanted the speed and a durable layer the agent could reload every session: accepted decisions, conventions, quality gates, evidence that work is actually finished.

So I built Persist OS - a local CLI that writes that layer into plain repo files and enforces it with a deterministic persist doctor gate. No network. No telemetry. No AI calls.

npx persist-os@latest init

The actual failure mode

Most agent setups treat chat history like it's source of truth. It isn't - it's the thing that evaporates first.

Persist OS flips the priority order. Repository memory wins over MCP context, external notes, and chat history. Accepted ADRs and engineering standards outrank whatever the model guessed in the last message. If they conflict, the agent is instructed to stop and report - not quietly pick a side.

That's the problem it solves: not "help me remember," but make the repo authoritative so the next session doesn't start from zero.

What `persist init` puts in your repo

One command scaffolds the whole memory structure under docs/ plus .persist/config.json:

docs/
  00-product/          PRD, vision, roadmap
  10-architecture/     stack, boundaries, decisions
  20-security/         threat model, security rules
  30-modules/          ownership per module
  40-features/         PRD + acceptance + test plan per feature
  50-quality/          quality gates
  60-engineering/      standards, agent rules
  adrs/                proposed → accepted decision records

It also generates the files your agent actually loads:

AGENTS.md - routing rules + Persist commands the agent runs itself
CLAUDE.md - short pointer for Claude Code ("read AGENTS.md, don't trust chat")
.cursor/rules/persist-memory.mdc - always-apply rule in Cursor
.persist/hooks/pre-commit - runs persist doctor before commits land

Pick a stack preset if you want proposed ADRs for real forks (nextjs, python-fastapi, laravel-react, etc.). Every preset decision stays Proposed until you explicitly accept it - the schema won't let a preset silently become truth.

persist init --preset nextjs --ai-tools cursor,claude
git config core.hooksPath .persist/hooks   # enable the doctor hook once per clone

How I run a one-day MVP with it

Say I'm building a checkout flow for a friend's shop. Fast build, but I still want the decisions to stick.

1. Scaffold memory before the agent touches code

persist feature create checkout

That creates docs/40-features/F-###-checkout/ with PRD.md, ACCEPTANCE.md, TEST_PLAN.md, TASKS.md, and the rest - not empty philosophy docs, but the exact checklist the project's delivery workflow expects. The agent isn't supposed to jump straight into src/. Module work follows the same pattern: feature docs first, module memory under docs/30-modules/, tasks after acceptance criteria and test plan exist.

2. Record decisions as proposals, accept what I mean

persist adr create payment-provider    # writes docs/adrs/proposed/ADR-PROPOSED-payment-provider.md
persist adr accept payment-provider    # promotes to docs/adrs/ADR-####-payment-provider.md

Nothing becomes accepted memory by accident. persist adopt on an existing repo and persist mcp add figma work the same way - inspect, propose, human accepts. The agent can capture MCP context offline into docs/ai/mcp/<server>.md, but Persist never calls the MCP server itself.

3. Let the agent load rules every session

Generated AGENTS.md tells the agent what to read and what to run:

Repository rules override model preferences. If instructions conflict, stop and report the conflict.

- `persist doctor` - validate repository memory; run before claiming any work is complete.
- `persist feature create <name>` - scaffold feature memory before non-trivial work.
- `persist adr supersede <old> <new-title>` - record a changed decision (never overwrite an accepted ADR).

In Claude Code, a SessionStart hook injects a live map of accepted ADRs and modules on top of that. Cursor gets the always-apply rule. Codex reads AGENTS.md directly. Different mechanisms, same floor: the agent reads the map, follows pointers into detail, doesn't need you to re-paste the architecture every time.

4. Gate "done" with evidence, not vibes

Before the agent can honestly say a feature is finished, the completion loop is:

pnpm test:run && pnpm typecheck && persist doctor

persist doctor is deterministic - no model judging your architecture. Exit codes: 0 healthy, 1 warnings, 2 errors. Same repo state, same result every time.

Concrete things it catches that chat never would:

Check	What breaks
Completion evidence	Feature marked complete but `REVIEW.md` still pending or no test results recorded
ADR drift	Module doc cites `ADR-0003-auth` but that ADR was never accepted - error
Superseded decisions	Memory still points at an ADR you've since superseded - warning
Staleness	`src/` changed six commits ago, memory hasn't been touched since - warning
Context budget	`AGENTS.md` + Cursor rule + `CLAUDE.md` bloated into a wall of text - warning

So when you're moving fast, "done" means persist doctor exits 0 - not "the agent said it looks good." You can add persist guard --source src to fail commits where source changed without test files staged.

What changes when a decision changes

Fast builds change their mind. The failure mode there is silent contradiction - code on MySQL, ADR still says PostgreSQL, everything "passes."

Persist handles that with persist adr supersede <old> <new-title>: old ADR marked superseded, new one links back, Doctor flags anything still citing the old decision. The generated agent rules include the supersede command inline so the model uses it instead of editing history.

What it doesn't do: automatically detect that your accepted ADR and your current code quietly disagree. That needs reading, not grep. There's a generated architecture-drift-review skill for that. The gate stays dumb; the agent does the interpretation.

Not a vector memory engine (and that's intentional)

Tools like supermemory or mem0 solve retrieval - "find me what we said about Auth0." Useful, different problem.

I kept losing accepted decisions with review trails: where it was decided, whether a human accepted it, whether it's still the decision. That's a files-and-PR-review problem. Plain markdown ADRs, explicit accept/supersede, git diffs. Vector stores can index these files later; they don't replace them.

When it's worth the overhead

Use it when someone will inherit the repo - teammate, future you, fresh agent next week. That's when losing the why costs real time.

Skip it for a throwaway spike you'll delete Friday. No second session, no payoff.

The name is the test: persist what should persist. Prototype turns into product → run persist init. Stay proportional - small fix, just ship it; real feature or architectural fork, write it down.

Try it

MIT licensed. Runs entirely on your machine.

npx persist-os@latest init

Repo: github.com/Karthick-Ramachandran/persist-os
Why I built it: PHILOSOPHY.md
Example outputs per preset: examples/generated-nextjs/, examples/generated-python-fastapi/, etc. in the repo

If you're shipping MVPs in a day and want the agent to keep following the same rules on day seven - this is what I wished existed. Tell me what breaks when you run it.

Top comments (2)

John Gregoriadis • Jun 27

This resonates — the bottleneck has quietly moved from writing code to justifying it. When an agent produces 500 lines before you've finished your coffee, the scarce artifact isn't the diff, it's the reasoning that never made it into the diff: the alternatives you rejected, the constraint that made the ugly option the correct one, the "we'll regret this in six months but ship it now" note.

Since "make it scale" is the genuinely hard part, a few things I'd love to know how you handle:

Decay. A "why" captured at write-time is accurate for exactly one commit. The moment the surrounding code shifts, rationale anchored to line ranges goes stale silently — which is worse than no docs, because it lies with confidence. Do you anchor to semantic units (a function/symbol) instead of line numbers, and can you flag rationale that's drifted from the code it explains?
Capture cost. Documentation rots because writing it is friction at the worst possible moment. The only version that scales is one where the "why" falls out as a byproduct of work you already did — e.g., harvested from the plan/prompt you handed the agent — rather than a second pass nobody volunteers for.
Retrieval beats storage. Rationale is only worth keeping if it resurfaces at the moment of the next decision (review, refactor, the 2am incident), not when someone remembers to grep a wiki. Where does it surface for you — the PR diff, inline, somewhere ambient that's always in view?

One pattern I've seen actually hold up: treat the "why" as a gate, not a record. A pre-merge check that refuses why-less changes turns the slow 18-month architecture drift into something you can watch accumulating in real time — discipline at scale rather than hoping people document.

Curious which of these you leaned into. Nice work naming the thing everyone's quietly losing.

Karthick • Jun 27

Hey, @jonnonz thank you for your valuable feedback and questions. You basically nailed the split, the whole thing is built around: dumb deterministic gate for what's actually checkable, agent for the semantic stuff

To answer your last point, why as a gate is the core. That's exactly what I do with doctor command. Deterministic check, exit codes, drop it in pre-commit or CI. It won't let you call something "done" with no test evidence, or keep memory that cites a decision nobody actually accepted. Drift shows up as it accumulates in real time, not something you discover it when the project is about to end.

So for the capture cost is the same bet as you. The "why" has to fall out of the work you already did. So the agent records it as it goes, driven by rules it reloads every session, sourced from the PRD/prompt you already gave it. Not a second pass nobody volunteers for. Honest caveat though: only as reliable as the agent actually following those rules, and that's the part I'm still hardening.

Decay, I deliberately don't anchor to line ranges. Decisions live as ADRs at the architecture level, code refs are file-level, so you sidestep the silent line-rot thing. There's a deterministic staleness nudge (flags memory pointing at code that changed way after the fact) plus an agent drift-review pass for "is this still true?" But anchoring to a function/symbol I haven't done that yet, but seriously considering it for the next release.

Retrieval, today it shows up in two places: reloaded into the agent's context every session (ambient, not a wiki you grep), and at the gate in pre-commit/CI. Inline in the PR diff, or at the 2am incident specifically - not yet. That's really a gap, thank you for your valuable feedback. I'll consider it in the next release.

Really appreciate the feedback, thank you so much.
If you want to poke at it or contribute: github.com/Karthick-Ramachandran/p.... It would genuinely appreciate it.
Thank you so much again
Karthick