郭立

Posted on Jun 22 • Originally published at blog.leeguoo.com

Is the cache 4m23s Line in Claude Code's Status Bar Actually Accurate?

#claude #ai #cli #productivity


In the status bar cs / claude-statusbar I wrote for Claude Code, there’s a line that says cache 4m23s: green, ticking down every second, then turning into a red cache COLD when it reaches the end.

Someone asked me: how exactly is this number calculated, and is it accurate?

That’s a fair question. For Pro / Max subscribers, when there’s a cache hit, that part of the context basically doesn’t consume your 5h / 7d quota; let it go cold, and the next prompt has to feed the entire context back in at full price. So the “how many minutes left” line drives a real decision: should I send another message now, while the cache is still warm? Let’s pull it apart and answer whether it’s accurate along the way.

For people in a hurry, here’s the one-line version: with the default configuration and a 5-minute cache, it is accurate. The only scenario where it systematically lies to you is when you enable a 1-hour cache but leave the status bar’s TTL at its default: in that case, it reports COLD 55 minutes too early. One config line fixes it. The reasoning is below.

First, distinguish the two “caches”; don’t mix them up

There are two things called cache in this repo, so before asking “is it accurate?” we need to be clear which one we mean:

Data cache: CACHE_MAX_AGE_S = 30 in cache.py. It caches claude-monitor output for 30 seconds, purely so the status bar doesn’t have to shell out to a subprocess every time it redraws once per second. It has nothing to do with whether the countdown is accurate.
Prompt-cache countdown: today’s main character. It calculates “how long until Anthropic’s prompt cache expires.”

The rest only discusses the second one.

Where It Anchors

The logic is very short: just one function, get_cache_age_text. It does three things:

Reads ~/.cache/claude-statusbar/last_stdin.json to get the current session’s transcript_path;
Reads that JSONL backwards, finds the most recent record where type == "assistant", and takes its timestamp;
Calculates remaining = ttl_seconds - elapsed seconds, then formats it as a countdown.

Step two is _last_assistant_age, and the key part is just this:

if entry.get("type") != "assistant":
    continue
...
return (datetime.now(timezone.utc) - last_ts).total_seconds()

Note the anchor point: the timestamp of the most recent assistant message — not the user message, not the file mtime. This choice is correct; the next section explains why.
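To make the anchoring concrete, here is a minimal sketch of the "newest assistant record wins" scan. It is not the repo's _last_assistant_age: the function name last_assistant_age, the newest-first lines argument, and the explicit now parameter are all assumptions for illustration, and it assumes timestamps shaped like the ones in my transcript (ISO 8601 with a trailing Z).

```python
import json
from datetime import datetime
from typing import Iterable, Optional


def last_assistant_age(lines: Iterable[str], now: datetime) -> Optional[float]:
    """Seconds since the newest assistant record, or None if none is found.

    `lines` is assumed to be ordered newest-first (the JSONL read backwards).
    `now` must be timezone-aware.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue  # user messages and other record types are not the anchor
        raw = entry.get("timestamp")
        if not isinstance(raw, str):
            continue
        try:
            # Older Python versions reject a bare "Z", so spell out the offset.
            ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            continue
        return (now - ts).total_seconds()
    return None
```

Passing now in explicitly is a testing convenience in this sketch; the excerpt above calls datetime.now(timezone.utc) directly.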

The formula is just as straightforward:

remaining = ttl_seconds - age_s
if remaining <= 0:
    return "COLD"

ttl_seconds defaults to 300. If remaining <= 0, or if no assistant record can be found at all (age_s is None), it returns COLD; if there isn’t even a transcript_path, it returns an empty string and hides the whole segment.
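Put together, the countdown logic looks roughly like the sketch below. The name format_countdown, the zero-padded seconds, and the hour format are my assumptions for illustration; the rules it encodes (COLD at or past the TTL, COLD when no assistant record exists, bare seconds under one minute) are the ones described in this post.

```python
from typing import Optional


def format_countdown(age_s: Optional[float], ttl_seconds: int = 300) -> str:
    """Turn 'seconds since the last assistant turn' into a countdown label."""
    if age_s is None:
        return "COLD"  # no assistant record found at all
    remaining = ttl_seconds - age_s
    if remaining <= 0:
        return "COLD"
    total = int(remaining)  # truncate to whole seconds
    if total < 60:
        return f"{total}s"  # bare seconds, no "m"
    minutes, seconds = divmod(total, 60)
    if minutes < 60:
        return f"{minutes}m{seconds:02d}s"
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h{minutes:02d}m"  # assumed layout for long TTLs
```

For example, an age of 37 seconds against the default TTL leaves 263 seconds, which renders as 4m23s.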

A bit of history while we’re here: before the v3.2.2 PR, this line displayed “how much time had already elapsed” instead. It was later changed to a countdown, because what users actually want to know isn’t “how many minutes has it been since the last response,” but “do I still have time to send another message before the cache dies?” A countdown answers that directly; elapsed time still makes you do the subtraction in your head.

Does It Model Anthropic’s Actual Behavior Correctly?

If you check the official documentation, Prompt caching, two sentences set the tone:

By default, the cache has a 5-minute lifetime.

The cache is refreshed for no additional cost each time the cached content is used.

In other words, the TTL is a sliding window: every cache hit resets it to 5 minutes.

This also explains why “anchoring to the most recent assistant turn” is correct — each additional response resets age_s to zero, the countdown automatically refills, and it lines up with the server-side behavior of “use it once, refresh it once.” The comment in the code, # 5min — Anthropic's default prompt cache TTL, isn’t wrong. At this layer, the model is correct.
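A toy model makes the sliding window easier to see. This is only a sketch of the behavior as described in the two quoted sentences, not of Anthropic's implementation; classify_requests is a hypothetical helper, and it assumes every request (hit or miss) restarts the window.

```python
from typing import List, Optional, Sequence


def classify_requests(times: Sequence[float], ttl: float = 300.0) -> List[str]:
    """Label each request time (in seconds) as a cache 'hit' or 'miss'."""
    results: List[str] = []
    last: Optional[float] = None
    for t in times:
        warm = last is not None and (t - last) < ttl
        results.append("hit" if warm else "miss")
        last = t  # each use refreshes the window (assumed for misses too)
    return results


print(classify_requests([0, 240, 480, 900]))
# ['miss', 'hit', 'hit', 'miss']
```

The request at 480s still hits even though more than 300s have passed since the first one, because the request at 240s refreshed the window. The gap before 900s is 420s, so that one misses.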

Where it’s inaccurate — with evidence

This is the real point. Three layers, ordered from most biting to least important.

1. The default TTL is hardcoded to 5 minutes, but you may be running a 1-hour cache

This is the only part that can genuinely mislead people. The evidence comes from the usage block in the most recent assistant record on my machine:

"cache_creation": {
  "ephemeral_1h_input_tokens": 1421,
  "ephemeral_5m_input_tokens": 0
}

Everything went into the 1-hour bucket. In other words, this machine is actually running a 1h cache, with a real lifetime of 60 minutes. But cs defaults to cache_ttl_seconds = 300, so after 5 minutes it will shout cache COLD, 55 minutes earlier than the truth.

The most ironic part: the “truth signal” for deciding 5m vs 1h (ephemeral_1h_input_tokens vs ephemeral_5m_input_tokens) is sitting right there in the same file and the same record it has already opened. But _last_assistant_age only reads the type and timestamp fields, skipping straight past that usage block. In theory, it could automatically infer which TTL to use from the transcript; right now, you have to manually run cs config set cache_ttl_seconds 3600. That’s a TODO worth fixing.
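Here is a rough sketch of what that inference could look like. It is not in the repo today: infer_ttl_seconds is a hypothetical helper, and I am assuming the usage block sits under message.usage in the assistant record (with a top-level usage as a fallback). Check your own transcript before relying on that path.

```python
from typing import Any, Dict

TTL_5M = 300
TTL_1H = 3600


def infer_ttl_seconds(entry: Dict[str, Any], default: int = TTL_5M) -> int:
    """Guess the prompt-cache TTL from one assistant record's usage block."""
    # Assumed location of the usage block; fall back to a top-level "usage".
    message = entry.get("message")
    usage = message.get("usage") if isinstance(message, dict) else None
    if not isinstance(usage, dict):
        usage = entry.get("usage")
    if not isinstance(usage, dict):
        return default
    creation = usage.get("cache_creation")
    if not isinstance(creation, dict):
        return default
    if (creation.get("ephemeral_1h_input_tokens") or 0) > 0:
        return TTL_1H
    if (creation.get("ephemeral_5m_input_tokens") or 0) > 0:
        return TTL_5M
    return default  # no cache writes in this record, so no signal
```

A record that only reads from the cache writes nothing to either bucket, so a real implementation would probably need to keep scanning backwards until it finds a record with a non-zero bucket.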

2. The anchor is “the turn finished,” not “the cache was refreshed”

The assistant timestamp is roughly when that turn finished writing; the cache is presumably refreshed server-side when the request is processed, which happens earlier. There’s a generation-latency gap between the two. Here are assistant timestamps from the same stretch of a real transcript:

assistant  2026-05-29T04:46:18.432Z
assistant  2026-05-29T04:46:19.653Z
assistant  2026-05-29T04:46:25.680Z

That’s on the order of a few to a dozen seconds. Relative to a 300s / 3600s TTL, it’s negligible. Directionally, it’s probably optimistic: the displayed remaining time is slightly higher than the real server-side value. But not enough to bite.

I should be honest here: the source code cannot prove whether Anthropic’s server starts counting from request start or request end. So the precise statement is: the anchor is a proxy accurate to within one turn’s latency, not the exact moment the cache refreshes. Good enough, but don’t treat it like a stopwatch.

3. The color guesses from the string, not the number

An interesting engineering tradeoff. _cache_severity doesn’t receive remaining seconds; it receives the already formatted string, then checks whether it contains m / h:

if cache_text == "COLD":
    return theme.s_hot          # red
if "m" in cache_text or "h" in cache_text:
    return theme.s_ok           # green, comfort zone
return theme.s_warn             # yellow, plain "Ys", under 1 minute

When less than a minute remains, the formatter intentionally outputs bare Ys only (without m) so the colorizer can detect “time to turn yellow.” The formatter and colorizer have an implicit contract between them. The repo even has a dedicated test_cache_severity.py to pin this contract down, so a future format change doesn’t silently scramble the colors. It works, but it is coupling — worth knowing about.
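For contrast, this is roughly what the decoupled alternative would look like: pick the severity from the number and let the formatter do whatever it wants. It is a sketch, not the repo's _cache_severity; severity_from_remaining and the plain string labels standing in for theme.s_hot / theme.s_ok / theme.s_warn are my own illustration.

```python
from typing import Optional


def severity_from_remaining(remaining_s: Optional[float]) -> str:
    """Choose a severity from remaining seconds instead of the label text."""
    if remaining_s is None or remaining_s <= 0:
        return "hot"   # red: the cache is cold
    if remaining_s >= 60:
        return "ok"    # green: at least a minute left
    return "warn"      # yellow: under one minute
```

The thresholds mirror the string-based rules (anything with an m or h is at least 60 seconds), so the colors come out the same; the difference is that a format change can no longer affect them.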

One more edge case: reverse-reading the transcript has a 320KB limit (10×32KB). If a huge transcript doesn’t contain an assistant record in the final 320KB scanned, it is treated as COLD. That’s a performance tradeoff — the status bar redraws every second, so it can’t scan several MB every time. You won’t hit it in everyday use.
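A bounded reverse read can be sketched like this. The 32KB chunk size and the 10-chunk cap come from the numbers above; the function name iter_tail_lines and the details of how partial lines are carried between chunks are my assumptions, not the repo's code.

```python
import os
from typing import Iterator


def iter_tail_lines(
    path: str, chunk_size: int = 32 * 1024, max_chunks: int = 10
) -> Iterator[str]:
    """Yield lines newest-first, reading at most max_chunks chunks from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        carry = b""  # head of a line whose start lies in an earlier chunk
        for _ in range(max_chunks):
            if pos == 0:
                break
            start = max(0, pos - chunk_size)
            f.seek(start)
            data = f.read(pos - start) + carry
            pos = start
            parts = data.split(b"\n")
            carry = parts[0]  # possibly incomplete, hold it back for now
            for raw in reversed(parts[1:]):
                if raw.strip():
                    yield raw.decode("utf-8", errors="replace")
        # The held-back piece is a complete line only if we reached the file start.
        if pos == 0 and carry.strip():
            yield carry.decode("utf-8", errors="replace")
```

If the cap is reached before the start of the file, the scan simply stops, and a caller that found no assistant record by then would fall through to COLD, which matches the behavior described above.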

So, Is It Accurate?

5-minute cache + default config: Accurate. The anchor is right, the sliding-window model is right, and edge cases are handled too: clock rollback is clamped to 0, naive timestamps are treated as UTC, and the Z suffix is normalized.
1-hour cache + unchanged TTL: It will systematically report 55 minutes early. One line fixes it: cs config set cache_ttl_seconds 3600.
Second-level precision: Don’t expect it. The anchor itself has proxy error from one round-trip of latency. It’s a “how many minutes are left” hint, not a timer.
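The three edge cases in the first item fit in a few lines. This is a sketch under my own naming (age_seconds is hypothetical), showing one way to get the same behavior rather than quoting the repo:

```python
from datetime import datetime, timezone


def age_seconds(raw_ts: str, now: datetime) -> float:
    """Seconds between an ISO 8601 timestamp and `now`, never negative."""
    if raw_ts.endswith("Z"):
        raw_ts = raw_ts[:-1] + "+00:00"  # normalize the Z suffix
    ts = datetime.fromisoformat(raw_ts)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    # A timestamp "from the future" (clock rollback) clamps to 0.
    return max(0.0, (now - ts).total_seconds())
```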

One-sentence summary: it answers “Should I send one more message while the cache is still warm?” very accurately; if you use it as a stopwatch, you’re using the wrong tool.

If you want to inspect it yourself, start with _last_assistant_age and get_cache_age_text. Together they are only about thirty lines of reading.
