Originally published on my blog, blog.leeguoo.com: field notes on reverse engineering, AI agents, and building things that ship.
In the status bar cs / claude-statusbar I wrote for Claude Code, there's a line that says cache 4m23s: green, ticking down every second, then turning into a red cache COLD when it reaches the end.
Someone asked me: how exactly is this number calculated, and is it accurate?
That's a fair question. For Pro / Max subscribers, when there's a cache hit, that part of the context basically doesn't consume your 5h / 7d quota; let it go cold, and the next prompt has to feed the entire context back in at full price. So the "how many minutes left" line decides whether I should send another message now while it's still warm. Let's pull it apart and answer whether it's accurate along the way.
For people in a hurry, here's the one-line version: with the default configuration and a 5-minute cache, it is accurate. The only scenario where it systematically lies to you is when you enable a 1-hour cache but don't change the status bar's TTL setting; in that case, it reports COLD 55 minutes too early. One config line fixes it. The reasoning is below.
First, distinguish the two "caches"; don't mix them up
There are two things called cache in this repo, so before asking "is it accurate?" we need to be clear which one we mean:
- Data cache: CACHE_MAX_AGE_S = 30 in cache.py. It caches claude-monitor output for 30 seconds, purely so the status bar doesn't have to shell out to a subprocess every time it redraws once per second. It has nothing to do with whether the countdown is accurate.
- Prompt-cache countdown: today's main character. It calculates "how long until Anthropic's prompt cache expires."
The rest only discusses the second one.
Where It Anchors
The logic is very short: just one function, get_cache_age_text. It does three things:
- Reads ~/.cache/claude-statusbar/last_stdin.json to get the current session's transcript_path;
- Reads that JSONL backwards, finds the most recent record where type == "assistant", and takes its timestamp;
- Calculates remaining = ttl_seconds - elapsed seconds, then formats it as a countdown.
Step two is _last_assistant_age, and the key part is just this:
if entry.get("type") != "assistant":
    continue
...
return (datetime.now(timezone.utc) - last_ts).total_seconds()
Note the anchor point: the timestamp of the most recent assistant message, not the user message and not the file mtime. This choice is correct; the next section explains why.
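To make the anchor concrete, here is a minimal sketch of the scan, assuming the transcript is JSONL with type and timestamp fields as described above. The name last_assistant_age_sketch and the in-memory list of lines are mine for illustration; the real function reads the file backwards from disk and may differ in detail.

```python
import json
from datetime import datetime, timezone


def last_assistant_age_sketch(lines, now=None):
    """Return seconds since the newest assistant record, or None if absent.

    Hypothetical helper: `lines` is an in-memory list of JSONL strings,
    oldest first, standing in for the transcript file.
    """
    now = now or datetime.now(timezone.utc)
    for raw in reversed(lines):  # the newest record sits at the end
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue  # user and tool records do not move the anchor
        ts = entry.get("timestamp")
        if not ts:
            continue
        # Normalize a trailing "Z" so older Python versions can parse it.
        last_ts = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        if last_ts.tzinfo is None:
            last_ts = last_ts.replace(tzinfo=timezone.utc)  # treat naive as UTC
        # Clamp to 0 in case the local clock runs behind the record.
        return max(0.0, (now - last_ts).total_seconds())
    return None
```

Passing now explicitly is only there to make the function easy to test; in a status bar you would let it default to the current time.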
The formula is just as straightforward:
remaining = ttl_seconds - age_s
if remaining <= 0:
    return "COLD"
ttl_seconds defaults to 300. If remaining <= 0, or if no assistant record can be found at all (age_s is None), it returns COLD; if there isn't even a transcript_path, it returns an empty string and hides the whole segment.
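A minimal sketch of that formula plus formatting, under the assumption that the output looks like the 4m23s example from the top of the post. The function name and the exact hour format are hypothetical, not copied from the repo.

```python
def format_countdown_sketch(age_s, ttl_seconds=300):
    """Turn an age in seconds into countdown text.

    Hypothetical formatter: the real tool's exact output format may differ.
    """
    if age_s is None:
        return "COLD"  # no assistant record was found
    remaining = ttl_seconds - age_s
    if remaining <= 0:
        return "COLD"  # the TTL has fully elapsed
    total = int(remaining)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes:02d}m"
    if minutes:
        return f"{minutes}m{seconds:02d}s"
    return f"{seconds}s"  # bare seconds once under a minute
```

For example, an age of 37 seconds against the default TTL leaves 263 seconds, which this sketch renders as 4m23s.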
A bit of history while we're here: before the v3.2.2 PR, this line displayed "how much time had already elapsed" instead. It was later changed to a countdown, because what users actually want to know isn't "how many minutes has it been since the last response," but "do I still have time to send another message before the cache dies?" A countdown answers that directly; elapsed time still makes you do the subtraction in your head.
Does It Model Anthropic's Actual Behavior Correctly?
If you check the official documentation, Prompt caching, two sentences set the tone:
By default, the cache has a 5-minute lifetime.
The cache is refreshed for no additional cost each time the cached content is used.
In other words, the TTL is a sliding window: every cache hit resets it to 5 minutes.
This also explains why "anchoring to the most recent assistant turn" is correct: each additional response resets age_s to zero, the countdown automatically refills, and it lines up with the server-side behavior of "use it once, refresh it once." The comment in the code, # 5min - Anthropic's default prompt cache TTL, isn't wrong. At this layer, the model is correct.
Where it's inaccurate, with evidence
This is the real point. Three layers, ordered from most biting to least important.
1. The default TTL is hardcoded to 5 minutes, but you may be running a 1-hour cache
This is the only part that can genuinely mislead people. The evidence comes from the usage block in the most recent assistant record on my machine:
"cache_creation": {
    "ephemeral_1h_input_tokens": 1421,
    "ephemeral_5m_input_tokens": 0
}
Everything went into the 1-hour bucket. In other words, this machine is actually running a 1h cache, with a real lifetime of 60 minutes. But cs defaults to cache_ttl_seconds = 300, so after 5 minutes it will shout cache COLD, 55 minutes earlier than the truth.
The most ironic part: the "truth signal" for deciding 5m vs 1h (ephemeral_1h_input_tokens vs ephemeral_5m_input_tokens) is sitting right there in the same file and the same record it has already opened. But _last_assistant_age only reads the type and timestamp fields, skipping straight past that usage block. In theory, it could automatically infer which TTL to use from the transcript; right now, you have to manually run cs config set cache_ttl_seconds 3600. That's a TODO worth fixing.
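Here is roughly what that inference could look like. This is a sketch of the TODO, not something the tool does today: the function name is hypothetical, and it takes the usage dict directly because where that block is nested inside a transcript record is not shown here.

```python
def infer_ttl_seconds_sketch(usage, default=300):
    """Guess the prompt-cache TTL from one assistant record's usage block.

    Hypothetical helper. `usage` is the dict that contains "cache_creation";
    locating it inside a transcript record is left to the caller.
    """
    creation = (usage or {}).get("cache_creation") or {}
    if (creation.get("ephemeral_1h_input_tokens") or 0) > 0:
        return 3600  # tokens were written to the 1-hour bucket
    if (creation.get("ephemeral_5m_input_tokens") or 0) > 0:
        return 300  # tokens were written to the 5-minute bucket
    return default  # nothing was written this turn, so no signal


usage = {
    "cache_creation": {
        "ephemeral_1h_input_tokens": 1421,
        "ephemeral_5m_input_tokens": 0,
    }
}
print(infer_ttl_seconds_sketch(usage))
```

One caveat with this approach: a turn that only reads from the cache may write nothing to either bucket, so a real implementation would probably need to remember the last inferred value or keep scanning older records instead of falling back to the default.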
2. The anchor is "the turn finished," not "the cache was refreshed"
The assistant timestamp is roughly when that turn finished writing; the cache is refreshed server-side when the request is sent. There's a generation-latency gap between the two. Here are assistant timestamps from the same stretch of a real transcript:
assistant 2026-05-29T04:46:18.432Z
assistant 2026-05-29T04:46:19.653Z
assistant 2026-05-29T04:46:25.680Z
That's on the order of a few to a dozen seconds. Relative to a 300s / 3600s TTL, it's negligible. Directionally, it's probably optimistic: the displayed remaining time is slightly higher than the real server-side value. But not enough to bite.
I should be honest here: the source code cannot prove whether Anthropic's server starts counting from request start or request end. So the precise statement is: the anchor is a proxy accurate to within one turn's latency, not the exact moment the cache refreshes. Good enough, but don't treat it like a stopwatch.
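To put a number on "negligible," here is a small sketch that measures the spacing between the three timestamps above and expresses the largest gap as a share of each TTL. Treating that spacing as a stand-in for per-turn latency is a rough reading of the transcript, not a measurement of server behavior, and the helper names are mine.

```python
from datetime import datetime

# The three assistant timestamps quoted above.
STAMPS = [
    "2026-05-29T04:46:18.432Z",
    "2026-05-29T04:46:19.653Z",
    "2026-05-29T04:46:25.680Z",
]


def gaps_seconds(stamps):
    """Seconds between consecutive ISO-8601 timestamps."""
    parsed = [datetime.fromisoformat(s.replace("Z", "+00:00")) for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


def worst_error_share(stamps, ttl_seconds):
    """Largest gap as a fraction of the TTL, a rough bound on proxy error."""
    return max(gaps_seconds(stamps)) / ttl_seconds


for ttl in (300, 3600):
    print(f"ttl={ttl}s worst gap share={worst_error_share(STAMPS, ttl):.2%}")
```

The largest gap in this sample is about 6 seconds, which is roughly 2% of a 300s TTL and well under 1% of a 3600s one.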
3. The color guesses from the string, not the number
An interesting engineering tradeoff. _cache_severity doesn't receive remaining seconds; it receives the already formatted string, then checks whether it contains m / h:
if cache_text == "COLD":
    return theme.s_hot  # red
if "m" in cache_text or "h" in cache_text:
    return theme.s_ok  # green, comfort zone
return theme.s_warn  # yellow, plain "Ys", under 1 minute
When less than a minute remains, the formatter intentionally outputs bare Ys only (without m) so the colorizer can detect "time to turn yellow." The formatter and colorizer have an implicit contract between them. The repo even has a dedicated test_cache_severity.py to pin this contract down, so a future format change doesn't silently scramble the colors. It works, but it is coupling, and worth knowing about.
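To see why the contract needs pinning, here is a self-contained sketch that checks the string-based rule against a numeric rule for every remaining-seconds value. The formatter, the plain string labels used instead of theme colors, and all function names are stand-ins of mine; this is not the contents of test_cache_severity.py.

```python
def fmt(remaining):
    """Hypothetical formatter: bare seconds under a minute, else m/h units."""
    if remaining <= 0:
        return "COLD"
    if remaining < 60:
        return f"{remaining}s"
    if remaining < 3600:
        return f"{remaining // 60}m{remaining % 60:02d}s"
    return f"{remaining // 3600}h{remaining % 3600 // 60:02d}m"


def severity_from_text(cache_text):
    """The string-sniffing rule, with labels in place of theme colors."""
    if cache_text == "COLD":
        return "hot"
    if "m" in cache_text or "h" in cache_text:
        return "ok"
    return "warn"


def severity_from_seconds(remaining):
    """The same decision made from the number, for comparison."""
    if remaining <= 0:
        return "hot"
    return "ok" if remaining >= 60 else "warn"


def contract_holds(max_seconds=2 * 3600):
    """True if both rules agree for every value in the range."""
    return all(
        severity_from_text(fmt(r)) == severity_from_seconds(r)
        for r in range(-5, max_seconds + 1)
    )


print(contract_holds())
```

The fragility shows up as soon as the format drifts: a string like "45 sec remaining" contains an m, so the string-based rule would color it green even though under a minute is left.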
One more edge case: reverse-reading the transcript has a 320KB limit (10×32KB). If a huge transcript doesn't contain an assistant record in the final 320KB scanned, it is treated as COLD. That's a performance tradeoff: the status bar redraws every second, so it can't scan several MB every time. You won't hit it in everyday use.
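A bounded backwards read can be sketched like this. The 32KB chunk size and the 10-chunk budget come from the limit above; the function name and the generator design are my own assumptions, not the repo's implementation.

```python
import os


def iter_lines_backwards(path, chunk_size=32 * 1024, max_chunks=10):
    """Yield complete lines from the end of a file, newest first.

    Stops after max_chunks chunks, so at most chunk_size * max_chunks
    bytes are ever read, no matter how large the file is.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        tail = b""  # start of a line whose beginning lies in an earlier chunk
        for _ in range(max_chunks):
            if pos == 0:
                break
            step = min(chunk_size, pos)
            pos -= step
            fh.seek(pos)
            data = fh.read(step) + tail
            parts = data.split(b"\n")
            tail = parts[0]  # possibly incomplete, so hold it back
            for line in reversed(parts[1:]):
                if line.strip():
                    yield line.decode("utf-8", errors="replace")
        if pos == 0 and tail.strip():
            # The start of the file was reached, so the held line is complete.
            yield tail.decode("utf-8", errors="replace")
```

If the budget runs out before any assistant record turns up, the caller simply sees no match, which is how an oversized transcript ends up displayed as COLD.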
So, Is It Accurate?
- 5-minute cache + default config: Accurate. The anchor is right, the sliding-window model is right, and edge cases are handled too: clock rollback is clamped to 0, naive timestamps are treated as UTC, and the Z suffix is normalized.
- 1-hour cache + unchanged TTL: It will systematically report 55 minutes early. One line fixes it: cs config set cache_ttl_seconds 3600.
- Second-level precision: Don't expect it. The anchor itself has proxy error from one round-trip of latency. It's a "how many minutes are left" hint, not a timer.
One-sentence summary: it answers "Should I send one more message while the cache is still warm?" very accurately; if you use it as a stopwatch, you're using the wrong tool.
If you want to inspect it yourself, start with _last_assistant_age and get_cache_age_text. You'll finish reading them in thirty lines.
Links
- The tool: claude-statusbar on GitHub. A Claude Code status line with 5h/7d rate-limit usage and a reset countdown.
- More writing: blog.leeguoo.com. I'm Guo Li (leeguoo), a full-stack dev building small AI-agent tools and CLIs.
- Found it useful? A star on the repo or a follow here means a lot.