Nova

The Timeboxed Agent Loop: Use AI Without Letting It Eat Your Day

If you’ve ever opened an editor “just to ask the model a quick question,” you know the risk: 45 minutes later you have three half‑implemented approaches, a pile of new files, and no clear sense of what actually improved.

I write with a simple constraint: AI help is allowed, but it must fit inside a fixed, pre-declared timebox. Not because speed is everything, but because time pressure forces clarity.

This post shares a repeatable workflow I call the Timeboxed Agent Loop. It’s a 25–45 minute loop you can run for:

  • adding a small feature
  • refactoring a module
  • debugging a production-ish issue
  • writing tests for something you don’t fully understand yet

It’s not “let the assistant drive.” It’s you driving with guardrails.


The core idea: budgets create better prompts

Most prompt failures aren’t about phrasing. They’re about missing constraints.

When you timebox, you naturally answer:

  • What’s the minimum acceptable outcome?
  • What’s out of scope?
  • What evidence will convince me it’s done?

Those answers turn into a better prompt and a better plan.


The loop (copy/paste template)

Create a scratchpad (issue comment, note, Notion page—anything) and paste this:

TIMEBOXED AGENT LOOP

Timebox: 35 minutes
Goal (1 sentence):
Definition of done (3 bullets):
Constraints (3 bullets):
Risks / unknowns (up to 3):

Inputs I can provide:
- Repo paths:
- Expected behavior:
- Example input/output:
- Logs / stack traces:

Plan:
1)
2)
3)

Stop conditions:
- If X happens, stop and ask.
- If Y is unclear, propose 2 options and wait.

Now you have a spec the assistant can actually follow.


Step 1 (5 minutes): write a “one-screen” spec

Before you ask for code, force the one-screen rule:

  • Goal: one sentence
  • Done: 3 bullets maximum
  • Constraints: 3 bullets maximum

Example (realistic refactor):

  • Goal: Replace ad-hoc config parsing with a typed schema.
  • Done:
    • config.ts exports Config + loadConfig()
    • missing env vars produce a single actionable error message
    • unit tests cover happy path + one failure case
  • Constraints:
    • no new runtime dependency
    • keep existing env var names
    • Node 20+

That’s enough to start.
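To make the spec concrete, here's a minimal sketch of what the finished config.ts from the example above might look like. Config and loadConfig come from the "Done" bullets; the specific env vars (PORT, DATABASE_URL) and field names are my own illustrative assumptions, and the validation is hand-rolled to honor the "no new runtime dependency" constraint:

```typescript
// Hypothetical config.ts satisfying the example spec: typed schema,
// no new runtime dependency, one actionable error for missing vars.
// PORT and DATABASE_URL are illustrative env var names.

export interface Config {
  port: number;
  databaseUrl: string;
}

export function loadConfig(
  env: Record<string, string | undefined> = process.env
): Config {
  const missing: string[] = [];

  const read = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") missing.push(key);
    return value ?? "";
  };

  const portRaw = read("PORT");
  const databaseUrl = read("DATABASE_URL");

  // Collect every missing variable first, then fail once with a single
  // actionable message instead of erroring on the first one.
  if (missing.length > 0) {
    throw new Error(
      `Missing required env vars: ${missing.join(", ")}. ` +
        `Set them before starting the app.`
    );
  }

  const port = Number(portRaw);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${portRaw}"`);
  }

  return { port, databaseUrl };
}
```

Note how each "Done" bullet maps to a testable behavior: the exports, the single error message, and an obvious happy path and failure case to assert on.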


Step 2 (10 minutes): ask for a plan + patch strategy, not a full solution

I don’t start with “write the code.” I start with:

1) a plan with checkpoints
2) the exact files the assistant wants to touch
3) a patch strategy (how to apply changes safely)

Prompt:

You are helping me inside a 35-minute timebox.

Goal: Replace ad-hoc config parsing with a typed schema.
Done:
- config.ts exports Config + loadConfig()
- missing env vars produce a single actionable error message
- tests cover happy + failure

Constraints:
- no new runtime dependency
- keep env var names

Repo context:
- current config logic: src/config/index.ts
- tests: src/config/__tests__/

Task:
1) Propose a 3-step plan with checkpoints.
2) List which files you will change.
3) For each step, describe the smallest patch that keeps tests green.
If anything is ambiguous, ask up to 3 questions.

This is the difference between “here’s 200 lines” and “here’s how we’ll not break production.”


Step 3 (15 minutes): run the “patch, verify, narrate” cycle

Inside the timebox, I repeat a mini-cycle:

  1. Patch (smallest diff)
  2. Verify (tests, lint, or a single reproduction command)
  3. Narrate (what changed + why)

A good assistant response in this phase looks like:

  • a focused diff
  • commands to run
  • a short explanation

If you can’t run commands in your environment, approximate verification by requiring the assistant to provide:

  • expected outputs
  • edge cases
  • how to roll back

Here’s a “diff-first” prompt that works well:

Make the smallest possible change that moves us toward the goal.
Return:
- a unified diff
- the command(s) I should run to verify
- what success looks like
- one rollback instruction

Why unified diff? Because it’s harder to hallucinate file boundaries, and easier for you to review.


Step 4 (5 minutes): force a stop and create a handoff

Timeboxes fail when you keep extending them.

When the timer hits zero, stop and write a handoff note. Even if the work is unfinished, you’ll avoid the “where was I?” tax.

Handoff template:

State:
- What works now:
- What’s failing / missing:

Next steps (max 3):
1)
2)
3)

Open questions:
- ?

This is also the moment to decide whether to:

  • ship as-is
  • create a follow-up timebox
  • revert and try a different approach

Two concrete examples

Example A: debugging a flaky test

Timebox: 25 minutes

  • Goal: make UserService tests deterministic
  • Done: flaky test removed or stabilized; root cause documented
  • Constraints: don’t increase test runtime by >10%

Loop in practice:

1) Ask for a hypothesis list (max 5) based on the error + test file.
2) Pick one hypothesis and request the smallest patch.
3) Verify by running that one test in a loop (e.g. 20 runs).
4) If it’s still flaky, stop and move to hypothesis #2.

The key is serializing hypotheses. AI assistants love doing five things at once. Don’t let them.
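Step 3 of that loop can be automated with a tiny helper. This is a sketch under my own assumptions: runRepeatedly is a hypothetical name, and the command string is whatever runs your single test (with Jest, something like `npx jest -t "name pattern"`):

```typescript
// Hypothetical helper: run one test command N times, stop at the first
// failure (one hypothesis at a time), and report how many runs passed.
import { execSync } from "node:child_process";

export function runRepeatedly(command: string, runs = 20): number {
  let passes = 0;
  for (let i = 1; i <= runs; i++) {
    try {
      // stdio "ignore" keeps 20 runs of test output from flooding the terminal.
      execSync(command, { stdio: "ignore" });
      passes++;
    } catch {
      console.error(`Run ${i}/${runs} failed: test is still flaky.`);
      break;
    }
  }
  return passes;
}
```

A clean 20/20 doesn't prove determinism, but a single failure disproves the current hypothesis fast, which is exactly what the timebox needs.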

Example B: adding a tiny feature

Feature: Add a --dry-run flag to a CLI command.

  • Done:
    • flag is documented in --help
    • no network calls happen in dry-run mode
    • one unit test asserts the behavior

Ask for the plan, then request a patch that only touches:

  • CLI arg parsing
  • one function boundary where side effects happen
  • tests

If the assistant wants to “restructure the whole CLI,” that’s a timebox violation.
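The shape of the patch you should get back is small. Here's a sketch of the two touch points, with hypothetical names (parseArgs, publishRelease) standing in for your real CLI; the point is that dry-run is checked at exactly one function boundary where the side effect lives:

```typescript
// Illustrative sketch: the --dry-run flag only needs to be consulted at
// the single boundary that performs the side effect.
// parseArgs and publishRelease are hypothetical names.

interface CliOptions {
  dryRun: boolean;
}

export function parseArgs(argv: string[]): CliOptions {
  return { dryRun: argv.includes("--dry-run") };
}

// The only function allowed to perform the network side effect.
// `upload` is injected so tests can assert it was never called.
export function publishRelease(opts: CliOptions, upload: () => void): string {
  if (opts.dryRun) {
    // No network call: report what would have happened instead.
    return "dry-run: would upload release";
  }
  upload();
  return "uploaded release";
}
```

Injecting the side effect as a parameter is what makes the "one unit test asserts the behavior" bullet cheap: the test passes a counter and asserts it stays at zero in dry-run mode.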


Common failure modes (and the guardrails that fix them)

1) Scope creep via “nice-to-haves”

  • Guardrail: explicitly list out-of-scope items in the prompt.

2) Big bang refactors

  • Guardrail: require “smallest patch that keeps tests green.”

3) Invisible verification

  • Guardrail: every patch must include a verification command and expected result.

4) Ambiguity masquerading as progress

  • Guardrail: allow only 3 questions; otherwise propose 2 options and wait.

Why this works

The Timeboxed Agent Loop does two things humans are bad at under uncertainty:

  • it forces explicit constraints
  • it creates a cadence of evidence (patch → verify)

You still get the leverage of AI. You just don’t pay for it with your entire afternoon.

If you try this, start with a 25-minute loop on something small. The point isn’t to go fast—it’s to stay in control.
