I wanted to see how far an autonomous coding agent could get unattended. The constraint that makes this hard isn't code generation — it's that each session starts with zero memory of the last. So the design problem is state, not prompting.
Setup
- A cron-scheduled task fires hourly, 11pm–7am (~8 runs).
- Each run is a fresh agent session. No shared context, no carryover.
- State lives entirely on disk: the working tree, the git history, and two files — BUILD_SPEC.md (the immutable goal/architecture) and PROGRESS.md (an append-only decision + status log).
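The post doesn't give the exact schedule. Assuming a crontab line such as `0 23,0-6 * * *` (minute 0 of hours 23 and 0 through 6, every day), the "~8 runs" works out as shown below. The small helper is hypothetical and only expands the hour field and counts the firings.

```python
from typing import List


def expand_hour_field(field: str) -> List[int]:
    """Expand a cron hour field like '23,0-6' into the hours it matches.

    Handles only comma-separated values and 'a-b' ranges; '*' and step
    syntax ('*/2') are deliberately left out of this sketch.
    """
    hours: List[int] = []
    for part in field.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            hours.extend(range(start, end + 1))  # cron ranges are inclusive
        else:
            hours.append(int(part))
    return hours


# Hypothetical schedule: minute 0 of hours 23 and 0 through 6, every day.
schedule = "0 23,0-6 * * *"
runs = expand_hour_field(schedule.split()[1])
print(runs)       # [23, 0, 1, 2, 3, 4, 5, 6]
print(len(runs))  # 8
```

Whether the 7am slot counts depends on the exact expression: `23,0-7` fires nine times.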
The run loop
Every session does the same thing:
- Read BUILD_SPEC.md, then PROGRESS.md, then git log --oneline.
- Reconstruct "where are we" from the files themselves (the code is the source of truth, not the narrative).
- Do one unit of work.
- Commit. Append to PROGRESS.md: what changed, why (chosen X over Y because Z), and the exact next step.
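A minimal sketch of one session, in Python for illustration. Everything here is an assumption layered on the post's description: the `Entry` fields, the heading and bullet layout written to PROGRESS.md, and `do_one_unit`, which stands in for the agent's actual work (including step 2, reconstructing state from the files). Git calls go through an injectable `runner` so the loop can be exercised without a real repo.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, List

# (cmd, cwd) -> stdout. Injectable so the loop can be tested with a fake.
Runner = Callable[[List[str], Path], str]


def run(cmd: List[str], cwd: Path) -> str:
    """Default runner: execute a command in the repo and return its stdout."""
    result = subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)
    return result.stdout


@dataclass
class Entry:
    changed: str    # what changed
    why: str        # e.g. "chose X over Y because Z"
    next_step: str  # the exact next unit of work


def read_context(repo: Path, runner: Runner = run) -> Dict[str, str]:
    """Step 1: spec first, then the log, then the commit history."""
    progress = repo / "PROGRESS.md"
    return {
        "spec": (repo / "BUILD_SPEC.md").read_text(encoding="utf-8"),
        "progress": progress.read_text(encoding="utf-8") if progress.exists() else "",
        "history": runner(["git", "log", "--oneline"], repo),
    }


def format_entry(entry: Entry, now: datetime) -> str:
    """Hypothetical entry layout; the post only names the three fields."""
    return (
        f"\n## {now:%Y-%m-%d %H:%M} UTC\n"
        f"- Changed: {entry.changed}\n"
        f"- Why: {entry.why}\n"
        f"- Next step: {entry.next_step}\n"
    )


def checkpoint(repo: Path, entry: Entry, runner: Runner = run) -> None:
    """Append (never rewrite) the log entry, then commit code and log together."""
    with open(repo / "PROGRESS.md", "a", encoding="utf-8") as log:
        log.write(format_entry(entry, datetime.now(timezone.utc)))
    runner(["git", "add", "-A"], repo)
    runner(["git", "commit", "-m", entry.changed], repo)


def session(
    repo: Path,
    do_one_unit: Callable[[Dict[str, str]], Entry],
    runner: Runner = run,
) -> Entry:
    context = read_context(repo, runner)
    entry = do_one_unit(context)  # hypothetical: the agent does exactly one unit here
    checkpoint(repo, entry, runner)
    return entry
```

One deliberate deviation: the sketch appends the log entry before committing, so the code change and its "why" land in the same commit. With the order reversed, the entry would sit uncommitted until a later commit picked it up.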
Commit granularity = checkpoint granularity. In the worst case, an interrupted session loses one unit of work, and the next run re-derives it from what is on disk. The "why" lines matter as much as the diffs: without them, a later session re-litigates decisions that were already settled.
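To make "the next run re-derives it" concrete, here is a hypothetical helper a session could use to pick its starting point. It assumes each log entry contains a line of the form `- Next step: ...` (the post doesn't specify the layout) and that entries are written only at checkpoint time, so a dirty working tree (non-empty `git status --porcelain` output) means the last unit never reached a commit.

```python
import re
from typing import Optional

# Assumed log layout: one "- Next step: ..." line per entry.
NEXT_STEP = re.compile(r"^- Next step: (.+)$", re.MULTILINE)


def last_next_step(progress: str) -> Optional[str]:
    """Return the 'Next step' from the newest entry of an append-only log."""
    matches = NEXT_STEP.findall(progress)
    return matches[-1] if matches else None


def plan_session(progress: str, git_status_porcelain: str) -> str:
    """Decide what this session should do first.

    Uncommitted changes mean an earlier session stopped before its
    checkpoint. Because the log is only written at checkpoint time, its
    newest 'Next step' still names that unfinished unit, so it is redone.
    """
    step = last_next_step(progress)
    if step is None:
        return "start from BUILD_SPEC.md"
    if git_status_porcelain.strip():
        return f"interrupted unit found; redo: {step}"
    return step
```

What to do with the half-finished edits (discard them or build on them) is left to the caller here; the post only says the code, not the narrative, is the source of truth.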
Guardrails
The agent was allowed to build, test, and commit locally. It was explicitly not allowed to deploy, push to a remote, or touch secrets — those get written into PROGRESS.md as "needs human" items instead. This boundary is what makes unattended runs safe to leave alone.
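The post describes the fence as a policy, not a mechanism. One illustrative way to enforce part of it is a denylist checked before any shell command runs, with refused commands turned into "needs human" lines for PROGRESS.md. The tool names and patterns below are assumptions, and a string filter like this is easy to get around, so it would sit on top of, not replace, keeping remote credentials and secrets out of the session's environment.

```python
import shlex
from typing import List, Optional, Tuple

# Illustrative patterns only; the post names categories, not commands.
DEPLOY_PROGRAMS = {"vercel", "flyctl", "kubectl", "terraform"}
SECRET_MARKERS = (".env", "secrets", "id_rsa")


def needs_human(command: str) -> Optional[str]:
    """Return why a command crosses the fence, or None if it is allowed.

    Errs on the side of blocking: a false positive just becomes a
    "needs human" note, while a false negative could be irreversible.
    """
    argv = shlex.split(command)
    if not argv:
        return None
    if argv[0] == "git" and "push" in argv[1:]:
        return "push to remote"
    if argv[0] in DEPLOY_PROGRAMS or any("deploy" in arg for arg in argv):
        return "deploy"
    if any(marker in arg for arg in argv for marker in SECRET_MARKERS):
        return "touches secrets"
    return None


def review(commands: List[str]) -> Tuple[List[str], List[str]]:
    """Split commands into runnable ones and PROGRESS.md 'needs human' lines."""
    allowed: List[str] = []
    deferred: List[str] = []
    for cmd in commands:
        reason = needs_human(cmd)
        if reason is None:
            allowed.append(cmd)
        else:
            deferred.append(f"- NEEDS HUMAN: {cmd} ({reason})")
    return allowed, deferred
```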
What came out
A working full-stack monorepo: a pure TS scheduling engine (with property tests), multi-tenant auth, a Drizzle/Postgres schema, server-side re-validation, and publish/share/export flows. Across the runs it cleared its own stale git lock, and one session caught and fixed an off-by-one in a labeling layer that spanned five files.
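The post doesn't say how the stale lock was cleared. A common culprit is `.git/index.lock`, which git creates while updating the index and which a killed process leaves behind, making later git commands fail. A cautious, hypothetical version of that cleanup removes the lock only when it is old enough that no live git process could plausibly own it; the 10-minute threshold is an assumption that leans on runs being an hour apart.

```python
import time
from pathlib import Path


def clear_stale_index_lock(repo: Path, max_age_s: float = 600) -> bool:
    """Remove .git/index.lock if it is older than max_age_s seconds.

    Assumes nothing else touches the repo at session start, so a lock
    that old can only be left over from a session that died mid-write.
    Returns True if a lock was removed.
    """
    lock = repo / ".git" / "index.lock"
    if not lock.exists():
        return False
    age = time.time() - lock.stat().st_mtime
    if age < max_age_s:
        return False  # could belong to a live git process; leave it alone
    lock.unlink()
    return True
```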
The takeaway
The leverage wasn't the model writing code. It was designing a process where progress is durable across total context loss — externalize state, checkpoint constantly, log decisions not just actions, and fence off irreversible operations.
Repo/stack details in comments. Curious how others are handling agent state across sessions — file-based like this, or something more structured?