Nova

The Timeboxed Agent Loop: Use AI Without Letting It Eat Your Day

If you’ve ever opened an editor “just to ask the model a quick question,” you know the risk: 45 minutes later you have three half‑implemented approaches, a pile of new files, and no clear sense of what actually improved.

I write with a simple constraint: AI help is allowed, but it must fit inside a fixed, pre-declared timebox. Not because speed is everything, but because time pressure forces clarity.

This post shares a repeatable workflow I call the Timeboxed Agent Loop. It’s a 25–45 minute loop you can run for:

  • adding a small feature
  • refactoring a module
  • debugging a production-ish issue
  • writing tests for something you don’t fully understand yet

It’s not “let the assistant drive.” It’s you driving with guardrails.


The core idea: budgets create better prompts

Most prompt failures aren’t about phrasing. They’re about missing constraints.

When you timebox, you naturally answer:

  • What’s the minimum acceptable outcome?
  • What’s out of scope?
  • What evidence will convince me it’s done?

Those answers turn into a better prompt and a better plan.


The loop (copy/paste template)

Create a scratchpad (issue comment, note, Notion page—anything) and paste this:

TIMEBOXED AGENT LOOP

Timebox: 35 minutes
Goal (1 sentence):
Definition of done (3 bullets):
Constraints (3 bullets):
Risks / unknowns (up to 3):

Inputs I can provide:
- Repo paths:
- Expected behavior:
- Example input/output:
- Logs / stack traces:

Plan:
1)
2)
3)

Stop conditions:
- If X happens, stop and ask.
- If Y is unclear, propose 2 options and wait.

Now you have a spec the assistant can actually follow.


Step 1 (5 minutes): write a “one-screen” spec

Before you ask for code, force the one-screen rule:

  • Goal: one sentence
  • Done: 3 bullets maximum
  • Constraints: 3 bullets maximum

Example (realistic refactor):

  • Goal: Replace ad-hoc config parsing with a typed schema.
  • Done:
    • config.ts exports Config + loadConfig()
    • missing env vars produce a single actionable error message
    • unit tests cover happy path + one failure case
  • Constraints:
    • no new runtime dependency
    • keep existing env var names
    • Node 20+

That’s enough to start.
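To make the spec concrete, here's a minimal sketch of what the finished config.ts from the example above might look like. Config and loadConfig come from the "Done" bullets; the specific env vars (PORT, DATABASE_URL) and field names are my own illustrative assumptions, and the validation is hand-rolled to honor the "no new runtime dependency" constraint:

```typescript
// Hypothetical config.ts satisfying the example spec: typed schema,
// no new runtime dependency, one actionable error for missing vars.
// PORT and DATABASE_URL are illustrative env var names.

export interface Config {
  port: number;
  databaseUrl: string;
}

export function loadConfig(
  env: Record<string, string | undefined> = process.env
): Config {
  const missing: string[] = [];

  const read = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") missing.push(key);
    return value ?? "";
  };

  const portRaw = read("PORT");
  const databaseUrl = read("DATABASE_URL");

  // Collect every missing variable first, then fail once with a single
  // actionable message instead of erroring on the first one.
  if (missing.length > 0) {
    throw new Error(
      `Missing required env vars: ${missing.join(", ")}. ` +
        `Set them before starting the app.`
    );
  }

  const port = Number(portRaw);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${portRaw}"`);
  }

  return { port, databaseUrl };
}
```

Note how each "Done" bullet maps to a testable behavior: the exports, the single error message, and an obvious happy path and failure case to assert on.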


Step 2 (10 minutes): ask for a plan + patch strategy, not a full solution

I don’t start with “write the code.” I start with:

1) a plan with checkpoints
2) the exact files the assistant wants to touch
3) a patch strategy (how to apply changes safely)

Prompt:

You are helping me inside a 35-minute timebox.

Goal: Replace ad-hoc config parsing with a typed schema.
Done:
- config.ts exports Config + loadConfig()
- missing env vars produce a single actionable error message
- tests cover happy + failure

Constraints:
- no new runtime dependency
- keep env var names

Repo context:
- current config logic: src/config/index.ts
- tests: src/config/__tests__/

Task:
1) Propose a 3-step plan with checkpoints.
2) List which files you will change.
3) For each step, describe the smallest patch that keeps tests green.
If anything is ambiguous, ask up to 3 questions.

This is the difference between “here’s 200 lines” and “here’s how we’ll not break production.”


Step 3 (15 minutes): run the “patch, verify, narrate” cycle

Inside the timebox, I repeat a mini-cycle:

  1. Patch (smallest diff)
  2. Verify (tests, lint, or a single reproduction command)
  3. Narrate (what changed + why)

A good assistant response in this phase looks like:

  • a focused diff
  • commands to run
  • a short explanation

If you can’t run commands in your environment, approximate verification by requiring the assistant to provide:

  • expected outputs
  • edge cases
  • how to roll back

Here’s a “diff-first” prompt that works well:

Make the smallest possible change that moves us toward the goal.
Return:
- a unified diff
- the command(s) I should run to verify
- what success looks like
- one rollback instruction

Why unified diff? Because it’s harder to hallucinate file boundaries, and easier for you to review.


Step 4 (5 minutes): force a stop and create a handoff

Timeboxes fail when you keep extending them.

When the timer hits zero, stop and write a handoff note. Even if the work is unfinished, you’ll avoid the “where was I?” tax.

Handoff template:

State:
- What works now:
- What’s failing / missing:

Next steps (max 3):
1)
2)
3)

Open questions:
- ?

This is also the moment to decide whether to:

  • ship as-is
  • create a follow-up timebox
  • revert and try a different approach

Two concrete examples

Example A: debugging a flaky test

Timebox: 25 minutes

  • Goal: make UserService tests deterministic
  • Done: flaky test removed or stabilized; root cause documented
  • Constraints: don’t increase test runtime by >10%

Loop in practice:

1) Ask for a hypothesis list (max 5) based on the error + test file.
2) Pick one hypothesis and request the smallest patch.
3) Verify by running that one test in a loop (e.g. 20 runs).
4) If it’s still flaky, stop and move to hypothesis #2.

The key is serializing hypotheses. AI assistants love doing five things at once. Don’t let them.
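Step 3 of that loop can be automated with a tiny helper. This is a sketch under my own assumptions: runRepeatedly is a hypothetical name, and the command string is whatever runs your single test (with Jest, something like `npx jest -t "name pattern"`):

```typescript
// Hypothetical helper: run one test command N times, stop at the first
// failure (one hypothesis at a time), and report how many runs passed.
import { execSync } from "node:child_process";

export function runRepeatedly(command: string, runs = 20): number {
  let passes = 0;
  for (let i = 1; i <= runs; i++) {
    try {
      // stdio "ignore" keeps 20 runs of test output from flooding the terminal.
      execSync(command, { stdio: "ignore" });
      passes++;
    } catch {
      console.error(`Run ${i}/${runs} failed: test is still flaky.`);
      break;
    }
  }
  return passes;
}
```

A clean 20/20 doesn't prove determinism, but a single failure disproves the current hypothesis fast, which is exactly what the timebox needs.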

Example B: adding a tiny feature

Feature: Add a --dry-run flag to a CLI command.

  • Done:
    • flag is documented in --help
    • no network calls happen in dry-run mode
    • one unit test asserts the behavior

Ask for the plan, then request a patch that only touches:

  • CLI arg parsing
  • one function boundary where side effects happen
  • tests

If the assistant wants to “restructure the whole CLI,” that’s a timebox violation.
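The shape of the patch you should get back is small. Here's a sketch of the two touch points, with hypothetical names (parseArgs, publishRelease) standing in for your real CLI; the point is that dry-run is checked at exactly one function boundary where the side effect lives:

```typescript
// Illustrative sketch: the --dry-run flag only needs to be consulted at
// the single boundary that performs the side effect.
// parseArgs and publishRelease are hypothetical names.

interface CliOptions {
  dryRun: boolean;
}

export function parseArgs(argv: string[]): CliOptions {
  return { dryRun: argv.includes("--dry-run") };
}

// The only function allowed to perform the network side effect.
// `upload` is injected so tests can assert it was never called.
export function publishRelease(opts: CliOptions, upload: () => void): string {
  if (opts.dryRun) {
    // No network call: report what would have happened instead.
    return "dry-run: would upload release";
  }
  upload();
  return "uploaded release";
}
```

Injecting the side effect as a parameter is what makes the "one unit test asserts the behavior" bullet cheap: the test passes a counter and asserts it stays at zero in dry-run mode.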


Common failure modes (and the guardrails that fix them)

1) Scope creep via “nice-to-haves”

  • Guardrail: explicitly list out-of-scope items in the prompt.

2) Big bang refactors

  • Guardrail: require “smallest patch that keeps tests green.”

3) Invisible verification

  • Guardrail: every patch must include a verification command and expected result.

4) Ambiguity masquerading as progress

  • Guardrail: allow only 3 questions; otherwise propose 2 options and wait.

Why this works

The Timeboxed Agent Loop does two things humans are bad at under uncertainty:

  • it forces explicit constraints
  • it creates a cadence of evidence (patch → verify)

You still get the leverage of AI. You just don’t pay for it with your entire afternoon.

If you try this, start with a 25-minute loop on something small. The point isn’t to go fast—it’s to stay in control.
