I shipped an agent that wakes every few minutes to check on a long-running job.
It worked on the first try. My bill did not make sense.
Loop logic, correct. Model choice, fine. Cost was hiding somewhere I never thought to look, in the gap between two turns. In the sleep.
Here is the part nobody prints on the box. Your agent's idle timer talks to the prompt cache, and most loops are written as if it does not.
Your sleep talks to the cache
Every time an agent in a loop wakes up, it re-reads its whole context before it can do anything.
Anthropic's prompt cache holds that context for about five minutes after the last use. Inside that window, the next turn reads cached context. Cheap, fast. Past it, the cache is cold, and the next turn pays full price to read everything again from the top.
So sleep length controls your cost, plain and simple.
Nap for four minutes and you keep the cache warm and barely pay to resume. Nap for six and you throw that warm context away and rebuild it from scratch, every cycle, forever.
Most loops pick a sleep by feel. Five minutes feels tidy. Ten feels safe. Nobody checks the number against the one clock that is actually charging them.
Five minutes is the worst answer on the board
Here is the opinion I will defend. Sleeping your loop for exactly five minutes is the single worst choice available.
Walk through what you buy. Nap around four and a half minutes and you land inside the window. Resume is warm, nearly free. Nap twenty minutes and yes, you eat a cold cache, but that one miss bought you twenty real minutes of doing nothing. Penalty spread thin across a long, genuinely idle wait.
Now nap exactly five minutes. You sit right on the edge. You pay the full cold-cache penalty AND you only bought five minutes before the next turn fires. Expensive option, short reward.
Worst of both. Every cycle.
Round numbers are the trap. Five minutes is a human unit, a clean tick on a clock. Cache logic could not care less that it looks neat.
Think in cache windows
This reframe fixed my loops.
There are two honest regimes, and a dead zone between them.
- Under the window. Short naps that keep the cache warm. Right when you are watching something that moves fast and cannot tell you when it changed, a CI run, a deploy, a remote queue.
- Well past the window. Long naps that spread one cold read across a real wait. Right when there is no point checking sooner.
- Dead zone. Anything sitting near the five-minute line. You pay the miss without earning the wait.
See it this way and the question changes shape. It stops being how long should I sleep. It becomes what am I actually waiting for.
Three questions before a number
I am keeping the timer code to myself, on purpose. What mattered was the habit of asking three questions before picking any number.
One. Can the system notify me instead. If the work I am waiting on can wake the loop itself when its state changes, I should not be polling at all. A timer is what you reach for only when nothing else can tell you something moved.
Two. How fast does the thing I watch actually change. A build that finishes in eight minutes does not need a check every sixty seconds. Polling that often burns the cache eight times before it is even done. Two patient checks beat eight anxious ones, at a fraction of the cost.
Three. If I have to wait long, am I committing to it. A long sleep is a decision to be idle on purpose. Half-commit with a number near the window and you get the penalty of waiting without the savings. Worst of both, again.
So the whole habit fits in one breath. Notify if you can. Match the check rate to how fast the thing moves. If you are going to wait, wait long enough that the cold cache pays for itself.
What changed when I dropped tidy numbers
My loops got cheaper without getting slower.
That surprised me until I saw why. Slowness lived in the gap between turns, in warm context thrown away on a schedule built for a clock instead of the work.
A deeper lesson stuck harder. Most agent cost does not sit in the prompt or the model you picked. It hides in the seams between turns, the places we wave off as plumbing and never measure. Cache timing is one such seam. There are others, and they all reward the same move.
Measure what charges you. Ignore what merely looks neat.
Your turn
What number does your agent loop sleep for, and did you pick it for the work or for the clock?
If this was useful
I work through this in public, the wins and the freezes both, mostly on LinkedIn and YouTube. If the real version of building in the open is useful to you, that is where it lives. Find me on X, GitHub, and the work at next8n.com.
Top comments (0)