Dibyanshu kumar

How I Taught an AI Agent to Save Its Own Progress

AI coding agents are stateless. Every time you start a new session, the agent has no memory of what happened before. If the session crashes, if you close the terminal, if context runs out — everything the agent knew is gone.

I needed my agent to handle multi-hour development workflows. So I built a checkpoint system that lets the AI save and restore its own progress.

The Problem With Long Workflows

I use Claude Code for full development cycles — not just "write a function" tasks, but the whole thing: read a Jira ticket, write a design document, get it reviewed, implement across multiple modules, run tests, create PRs.

That's a lot of steps. And any one of them can fail:

  • The session crashes mid-implementation
  • Context window fills up during code review
  • I close my laptop and come back the next day
  • A reviewer agent times out

Without checkpoints, I'd restart from scratch every time. Read the ticket again. Regenerate the design. Redo work that was already done.

What I Built

I broke the development workflow into phases with two types of boundaries: automatic checkpoints (the AI saves state on its own) and human gates (the AI stops and waits for my approval).

The workflow looks like this:

Gather Context → Write Design → Review Design
    → GATE 1: I approve or edit the design →
Implement → Review Code
    → GATE 2: I approve or reject the code →
Fix Issues → Commit → Create PR → Respond to PR Comments

Each phase saves its status and artifacts to persistent storage. When a session dies, the next session picks up where the last one left off.

How Checkpoints Work

Saving State

After each phase completes, the agent writes a checkpoint — a record of what was done, what was produced, and what comes next. The checkpoint includes:

  • Phase name and status (completed, in-progress, failed)
  • Artifacts produced (design doc path, review report, branch names)
  • Context needed for resumption (which modules are done, which review round we're on)

This isn't conversation history. It's structured metadata about the workflow's progress.
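As a concrete sketch, a checkpoint could be a small dataclass persisted to a JSON file. The schema and names here (`Checkpoint`, `save_checkpoint`, the field names) are illustrative, not the post's actual implementation:

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class Checkpoint:
    phase: str                                     # e.g. "write-design"
    status: str                                    # "completed", "in-progress", or "failed"
    artifacts: dict = field(default_factory=dict)  # e.g. {"design_doc": "docs/design.md"}
    context: dict = field(default_factory=dict)    # e.g. {"modules_done": ["module-a"]}

def save_checkpoint(cp: Checkpoint, store: Path) -> None:
    """Record the checkpoint in a JSON file, keyed by phase name."""
    records = json.loads(store.read_text()) if store.exists() else {}
    records[cp.phase] = asdict(cp)
    store.write_text(json.dumps(records, indent=2))

def load_checkpoints(store: Path) -> dict:
    """Return all saved checkpoints, or an empty dict if none exist yet."""
    return json.loads(store.read_text()) if store.exists() else {}
```

Because each record is keyed by phase, re-saving a phase after a retry simply overwrites the stale entry.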

Resuming

When I start a new session and say "resume," the agent runs a reconciliation step:

  1. Check persistent storage for saved checkpoints
  2. Scan disk for artifacts (does the design doc exist? are there feature branches?)
  3. Reconcile — disk is the source of truth, checkpoints are supplementary
  4. Determine the first incomplete phase and jump to it

The key insight: disk artifacts are more reliable than metadata. If a design document exists on disk but the checkpoint says the design phase is "in progress," trust the disk. The file is there. The phase is done.

Human Gates

Two points in the workflow require my explicit approval:

After design review — the agent presents the design document and review findings, then asks: approve, edit, or reject? If I say "edit," it applies my changes to the design doc and automatically re-runs the review. This loops until I approve.

After code review — same pattern. Approve, fix issues, or reject. If there are critical findings, the agent auto-fixes them before I even see the checkpoint.

These gates exist because some decisions shouldn't be automated. The agent can write code all day, but I decide whether the design makes sense.
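The approve/edit/reject loop at a gate is simple to express. This is a minimal sketch: `ask`, `apply_edits`, and `rerun_review` are stand-ins for the real prompt, edit step, and reviewer agent:

```python
def design_gate(ask, apply_edits, rerun_review) -> bool:
    """Block at a human gate until the design is approved or rejected.

    ask() returns the human's answer; any answer other than
    "approve" or "reject" is treated as "edit".
    """
    while True:
        answer = ask("approve, edit, or reject? ")
        if answer == "approve":
            return True
        if answer == "reject":
            return False
        # "edit": apply the human's changes, re-run the review, ask again
        apply_edits()
        rerun_review()
```

Passing the three actions in as callables keeps the gate itself testable without a live agent on the other end.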

What Makes This Work

Phase-Level Granularity

I don't checkpoint every tool call or every message. I checkpoint at phase boundaries — after "gather context" is done, after "write design" is done, after each module is implemented. This keeps the checkpoint data small and meaningful.

Module-Level Progress

Implementation can span five or six modules. The checkpoint tracks which modules are completed:

Implementation progress (2/5 modules):
  [DONE] module-a
  [DONE] module-b
  [    ] module-c  ← resuming here
  [    ] module-d
  [    ] module-e

If the session dies after module 2, the next session skips straight to module 3.
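Computing where to resume is just a set difference against the checkpoint's context. The module list and `modules_done` key are illustrative:

```python
# Hypothetical module list for the feature being implemented.
MODULES = ["module-a", "module-b", "module-c", "module-d", "module-e"]

def remaining_modules(checkpoint: dict) -> list[str]:
    """Return the modules still to implement, in order, skipping any
    the checkpoint already marks as done."""
    done = set(checkpoint.get("context", {}).get("modules_done", []))
    return [m for m in MODULES if m not in done]
```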

Timeout Recovery

Sometimes a reviewer agent times out — it hits its turn limit before finishing. Instead of re-running everything, the checkpoint records which reviewers completed and which didn't. On resume, I can choose to re-run just the failed reviewer and merge its findings into the existing report.
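A sketch of that recovery path, assuming the checkpoint tracks per-reviewer status and the report is a dict of findings keyed by reviewer (all names here are illustrative):

```python
def recover_reviews(checkpoint: dict, run_reviewer, report: dict) -> dict:
    """Re-run only the reviewers that didn't complete and merge their
    findings into the existing report. run_reviewer(name) returns the
    list of findings for that reviewer."""
    for name, status in checkpoint.get("reviewers", {}).items():
        if status != "completed":
            report[name] = run_reviewer(name)          # re-run just this one
            checkpoint["reviewers"][name] = "completed"
    return report
```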

What I Learned

Checkpoints should be boring. They're not a feature users interact with. They're infrastructure that makes everything else reliable. The best checkpoint system is one you never think about — sessions crash, you resume, and it just works.

Disk is a better source of truth than a database. Files on disk are visible, auditable, and survive any kind of failure. A database record that says "design phase complete" is useless if the design file doesn't exist. Check the artifacts, not the metadata.

Human gates are the real value. Automatic checkpointing is nice, but the ability to pause the workflow, inspect the output, and say "go back and fix this" — that's what makes the difference between an AI assistant and an AI that runs off and does whatever it wants.

AI agents need state management, not just prompts. We spend a lot of time crafting perfect prompts, but the hard problem isn't getting the AI to write good code. It's getting the AI to pick up where it left off without losing context, repeating work, or forgetting decisions that were already made.


— DK
