DEV Community: 郭立

Is the cache 4m23s Line in Claude Code's Status Bar Actually Accurate?

郭立 — Mon, 22 Jun 2026 10:19:13 +0000

💡 Originally published on my blog blog.leeguoo.com — field notes on reverse engineering, AI agents, and building things that ship.

In the status bar cs / claude-statusbar I wrote for Claude Code, there’s a line that says cache 4m23s: green, ticking down every second, then turning into a red cache COLD when it reaches the end.

Someone asked me: how exactly is this number calculated, and is it accurate?

That’s a fair question. For Pro / Max subscribers, when there’s a cache hit, that part of the context basically doesn’t consume your 5h / 7d quota; let it go cold, and the next prompt has to feed the entire context back in at full price. So the “how many minutes left” line decides whether “I should send another message now while it’s still warm.” Let’s pull it apart and answer whether it’s accurate along the way.

For people in a hurry, here’s the one-line version: with the default configuration and a 5-minute cache, it is accurate. The only scenario where it systematically lies to you is when you enable a 1-hour cache but don’t change the TTL setting: in that case, it reports COLD 55 minutes too early. One config line fixes it. The reasoning is below.

First, distinguish the two “caches”; don’t mix them up

There are two things called cache in this repo, so before asking “is it accurate?” we need to be clear which one we mean:

Data cache: CACHE_MAX_AGE_S = 30 in cache.py. It caches claude-monitor output for 30 seconds, purely so the status bar doesn’t have to shell out to a subprocess every time it redraws once per second. It has nothing to do with whether the countdown is accurate.
Prompt-cache countdown: today’s main character. It calculates “how long until Anthropic’s prompt cache expires.”

The rest only discusses the second one.

Where It Anchors

The logic is very short: just one function, get_cache_age_text. It does three things:

Reads ~/.cache/claude-statusbar/last_stdin.json to get the current session’s transcript_path;
Reads that JSONL backwards, finds the most recent record where type == "assistant", and takes its timestamp;
Calculates remaining = ttl_seconds - elapsed seconds, then formats it as a countdown.

Step two is _last_assistant_age, and the key part is just this:

if entry.get("type") != "assistant":
    continue
...
return (datetime.now(timezone.utc) - last_ts).total_seconds()

Note the anchor point: the timestamp of the most recent assistant message — not the user message, not the file mtime. This choice is correct; the next section explains why.
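To make the anchoring concrete, here is a minimal sketch of the "walk the JSONL backwards, stop at the first assistant record" idea. It is not the repo's actual code: the function names and the `now` parameter are my own assumptions, and it reads the whole file for brevity instead of using a bounded reverse scan.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def parse_ts(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' and naive values."""
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    return ts


def last_assistant_age(transcript: Path, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds since the newest assistant record, or None if there is none.

    Hypothetical helper: `now` is injectable only to make the sketch testable.
    """
    now = now or datetime.now(timezone.utc)
    lines = transcript.read_text(encoding="utf-8").splitlines()
    for line in reversed(lines):  # newest record first
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or damaged lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        raw = entry.get("timestamp")
        if not isinstance(raw, str):
            continue
        try:
            ts = parse_ts(raw)
        except ValueError:
            continue
        # Clamp to 0 so a clock rollback never yields a negative age.
        return max(0.0, (now - ts).total_seconds())
    return None
```

User records and unparseable lines are simply skipped, so only an assistant turn can move the anchor.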

The formula is just as straightforward:

remaining = ttl_seconds - age_s
if remaining <= 0:
    return "COLD"

ttl_seconds defaults to 300. If remaining <= 0, or if no assistant record can be found at all (age_s is None), it returns COLD; if there isn’t even a transcript_path, it returns an empty string and hides the whole segment.
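The countdown rule can be sketched as one small function. This is an illustration under assumptions, not the repo's formatter: the name `format_remaining` is hypothetical, and the exact layout of the hour form is my guess. The article only confirms that values under a minute print as bare seconds and that longer values contain m or h.

```python
from typing import Optional


def format_remaining(ttl_seconds: int, age_s: Optional[float]) -> str:
    """Format the time left before the cache expires (hypothetical helper)."""
    if age_s is None:
        return "COLD"  # no assistant record found
    remaining = int(ttl_seconds - age_s)
    if remaining <= 0:
        return "COLD"
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes:02d}m"  # assumed layout for the hour form
    if minutes:
        return f"{minutes}m{seconds:02d}s"
    return f"{seconds}s"  # bare seconds once under a minute
```

With the default TTL of 300 and an assistant turn 37 seconds old, `format_remaining(300, 37)` gives `4m23s`, the value from the title.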

A bit of history while we’re here: before the v3.2.2 PR, this line displayed “how much time had already elapsed” instead. It was later changed to a countdown, because what users actually want to know isn’t “how many minutes has it been since the last response,” but “do I still have time to send another message before the cache dies?” A countdown answers that directly; elapsed time still makes you do the subtraction in your head.

Does It Model Anthropic’s Actual Behavior Correctly?

If you check the official documentation, Prompt caching, two sentences set the tone:

By default, the cache has a 5-minute lifetime.

The cache is refreshed for no additional cost each time the cached content is used.

In other words, the TTL is a sliding window: every cache hit resets it to 5 minutes.

This also explains why “anchoring to the most recent assistant turn” is correct — each additional response resets age_s to zero, the countdown automatically refills, and it lines up with the server-side behavior of “use it once, refresh it once.” The comment in the code, # 5min — Anthropic's default prompt cache TTL, isn’t wrong. At this layer, the model is correct.

Where it’s inaccurate — with evidence

This is the real point. Three layers, ordered from most biting to least important.

1. The default TTL is hardcoded to 5 minutes, but you may be running a 1-hour cache

This is the only part that can genuinely mislead people. The evidence comes from the usage block in the most recent assistant record on my machine:

"cache_creation": {
  "ephemeral_1h_input_tokens": 1421,
  "ephemeral_5m_input_tokens": 0
}

Everything went into the 1-hour bucket. In other words, this machine is actually running a 1h cache, with a real lifetime of 60 minutes. But cs defaults cache_ttl_seconds = 300, so after 5 minutes it will shout cache COLD — 55 minutes earlier than the truth.

The most ironic part: the “truth signal” for deciding 5m vs 1h (ephemeral_1h_input_tokens vs ephemeral_5m_input_tokens) is sitting right there in the same file and the same record it has already opened. But _last_assistant_age only reads the type and timestamp fields, skipping straight past that usage block. In theory, it could automatically infer which TTL to use from the transcript; right now, you have to manually run cs config set cache_ttl_seconds 3600. That’s a TODO worth fixing.
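A rough sketch of what that inference could look like, assuming the usage block sits either under `message.usage` or directly under `usage` in the assistant record (the nesting is my assumption; the article only shows the `cache_creation` object). `infer_ttl_seconds` is a hypothetical name, not something that exists in cs today.

```python
from typing import Any, Dict

TTL_5M = 300
TTL_1H = 3600


def infer_ttl_seconds(entry: Dict[str, Any], default: int = TTL_5M) -> int:
    """Guess the cache TTL from an assistant record's usage block (hypothetical)."""
    message = entry.get("message")
    usage = message.get("usage") if isinstance(message, dict) else None
    if not isinstance(usage, dict):
        usage = entry.get("usage")  # assumed alternative location
    if not isinstance(usage, dict):
        return default
    creation = usage.get("cache_creation")
    if not isinstance(creation, dict):
        return default
    if (creation.get("ephemeral_1h_input_tokens") or 0) > 0:
        return TTL_1H
    if (creation.get("ephemeral_5m_input_tokens") or 0) > 0:
        return TTL_5M
    return default  # nothing was written this turn, so there is no signal
```

One caveat with this approach: a turn that only reads from the cache writes nothing to either bucket, so a real implementation would probably need to look further back until it finds a record with a non-zero bucket.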

2. The anchor is “the turn finished,” not “the cache was refreshed”

The assistant timestamp is roughly when that turn finished writing; the cache, as far as I can tell, is refreshed server-side when the request arrives. There’s a generation-latency gap between the two. Here are assistant timestamps from the same stretch of a real transcript:

assistant  2026-05-29T04:46:18.432Z
assistant  2026-05-29T04:46:19.653Z
assistant  2026-05-29T04:46:25.680Z

That’s on the order of a few to a dozen seconds. Relative to a 300s / 3600s TTL, it’s negligible. Directionally, it’s probably optimistic: the displayed remaining time is slightly higher than the real server-side value. But not enough to bite.

I should be honest here: the source code cannot prove whether Anthropic’s server starts counting from request start or request end. So the precise statement is: the anchor is a proxy accurate to within one turn’s latency, not the exact moment the cache refreshes. Good enough, but don’t treat it like a stopwatch.

3. The color guesses from the string, not the number

An interesting engineering tradeoff. _cache_severity doesn’t receive remaining seconds; it receives the already formatted string, then checks whether it contains m / h:

if cache_text == "COLD":
    return theme.s_hot          # red
if "m" in cache_text or "h" in cache_text:
    return theme.s_ok           # green, comfort zone
return theme.s_warn             # yellow, plain "Ys", under 1 minute

When less than a minute remains, the formatter intentionally outputs bare Ys only (without m) so the colorizer can detect “time to turn yellow.” The formatter and colorizer have an implicit contract between them. The repo even has a dedicated test_cache_severity.py to pin this contract down, so a future format change doesn’t silently scramble the colors. It works, but it is coupling — worth knowing about.
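For comparison, here is a sketch of the decoupled alternative, where the colorizer receives the remaining seconds instead of the formatted string. This is not how cs does it; the function name, the 60-second threshold parameter, and the plain string labels standing in for the theme colors are all assumptions for illustration.

```python
from typing import Optional


def cache_severity_from_seconds(remaining_s: Optional[float], warn_below_s: int = 60) -> str:
    """Pick a severity label from the number itself (hypothetical alternative)."""
    if remaining_s is None or remaining_s <= 0:
        return "hot"   # red, cache is COLD
    if remaining_s < warn_below_s:
        return "warn"  # yellow, under one minute
    return "ok"        # green, comfort zone
```

The behavior matches the string-based rules above, but the formatter is then free to change its output without touching the colors. The cost is threading one more value through the render path.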

One more edge case: reverse-reading the transcript has a 320KB limit (10×32KB). If a huge transcript doesn’t contain an assistant record in the final 320KB scanned, it is treated as COLD. That’s a performance tradeoff — the status bar redraws every second, so it can’t scan several MB every time. You won’t hit it in everyday use.
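The bounded reverse read can be sketched like this. The 32KB chunk size and the 10-chunk budget come from the numbers above; everything else (the generator name, the parameters, the decision to drop a partial first line when the budget runs out) is my own assumption rather than the repo's code.

```python
import os
from typing import Iterator

CHUNK_SIZE = 32 * 1024
MAX_CHUNKS = 10


def iter_lines_reversed(path: str, chunk_size: int = CHUNK_SIZE,
                        max_chunks: int = MAX_CHUNKS) -> Iterator[str]:
    """Yield complete lines newest-first, reading at most chunk_size * max_chunks bytes."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""  # possibly incomplete line carried over to the next (earlier) chunk
        for _ in range(max_chunks):
            if pos == 0:
                break
            size = min(chunk_size, pos)
            pos -= size
            f.seek(pos)
            pieces = (f.read(size) + tail).split(b"\n")
            tail = pieces[0]  # may be cut off at the chunk boundary
            for piece in reversed(pieces[1:]):
                if piece.strip():
                    yield piece.decode("utf-8", errors="replace")
        # The carried-over piece is a complete line only if we reached the file start.
        if pos == 0 and tail.strip():
            yield tail.decode("utf-8", errors="replace")
```

If the budget is exhausted before an assistant record shows up, the caller sees no match, which corresponds to the COLD result described above.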

So, Is It Accurate?

5-minute cache + default config: Accurate. The anchor is right, the sliding-window model is right, and edge cases are handled too: clock rollback is clamped to 0, naive timestamps are treated as UTC, and the Z suffix is normalized.
1-hour cache + unchanged TTL: It will systematically report 55 minutes early. One line fixes it: cs config set cache_ttl_seconds 3600.
Second-level precision: Don’t expect it. The anchor itself has proxy error from one round-trip of latency. It’s a “how many minutes are left” hint, not a timer.

One-sentence summary: it answers “Should I send one more message while the cache is still warm?” very accurately; if you use it as a stopwatch, you’re using the wrong tool.

If you want to inspect it yourself, start with _last_assistant_age and get_cache_age_text. You’ll finish reading them in thirty lines.

Letting an AI Agent Click Into Cross-Origin Iframes (How chrome-use Solves It)

郭立 — Mon, 22 Jun 2026 10:18:42 +0000

Connecting an AI agent to a browser starts out smoothly: open a page, read the content, fill in a search box. What really gets you stuck are the forms hidden inside cross-origin iframes—Google Payments payout profiles, checkout components, KYC widgets. The agent can read the text inside them and fill in values, but it just can’t click that “Save” button. It can see the task, but it can’t finish it.

This is a write-up of how we got past that hurdle. The protagonist is chrome-use—a Rust-based browser automation CLI for agents that directly drives the Chrome browser where you are actually logged in, without Playwright and without headless mode.

Why cross-origin iframes are so hard

Regular pages are easy: capture the accessibility tree, get element references, and click. But cross-origin iframes—for example, an adsense.google.com page embedding a payments.google.com iframe—hit three problems at once:

Selectors can’t get in. Under the same-origin policy, CSS selectors and eval running in the outer document can’t touch the DOM inside the iframe. document.querySelector is blind here.
Scrolling misses the target. You think you’re scrolling the page, but the thing that can actually scroll is the scroll container inside the iframe. Wheel events go to the outer document, while the inside stays still—the target row remains “off screen” forever, not even visible.
You’re left blindly clicking coordinates. The first two problems force you back to “screenshot + guess pixel coordinates,” which is the least precise approach and the easiest way to click a neighboring field by mistake. On a form that edits global payment profile information, one wrong click can be costly.

The foundation of chrome-use: agents get “references,” not HTML

Before explaining the fix, it’s worth covering the basic design—because this is also the fundamental difference between chrome-use and the camp that feeds raw HTML to models.

chrome-use does not hand page source to the agent. Instead, it captures an accessibility tree snapshot, assigning each interactive element a compact reference:

- textbox "Email" [ref=e2]
- listbox "Country/region" [ref=e60]
- button "Save" [ref=e41]

The agent acts directly on those references: fill @e2 "...", click @e41. A page costs roughly 200–400 tokens instead of a whole screen of DOM noise. This reference mechanism is exactly what makes it possible to work through iframes later—as long as the snapshot can “see” nodes inside the iframe, it can assign references to them.
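To show how little an agent-side consumer needs, here is a small sketch that turns snapshot lines of the shape shown above into a lookup table from reference to role and name. The line grammar is inferred only from the three sample lines, so treat the regex as an assumption; `parse_snapshot` and `Node` are hypothetical names, not part of chrome-use.

```python
import re
from typing import Dict, NamedTuple


class Node(NamedTuple):
    role: str
    name: str


# Assumed shape: - role "accessible name" [ref=eN]
LINE_RE = re.compile(r'^\s*-\s+(?P<role>[\w-]+)\s+"(?P<name>.*)"\s+\[ref=(?P<ref>e\d+)\]\s*$')


def parse_snapshot(text: str) -> Dict[str, Node]:
    """Map each reference (e.g. 'e41') to its role and accessible name."""
    refs: Dict[str, Node] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            refs[match.group("ref")] = Node(match.group("role"), match.group("name"))
    return refs


snapshot = '''- textbox "Email" [ref=e2]
- listbox "Country/region" [ref=e60]
- button "Save" [ref=e41]'''
refs = parse_snapshot(snapshot)
```

Looking up `refs["e41"]` then tells the agent it is about to click a button named Save, without ever seeing the DOM.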

Three hurdles, one at a time

First hurdle: make the snapshot see what’s inside the iframe.
The accessibility tree needs to pass through cross-origin iframes and include their nodes with references. After fixing that, snapshot can list them directly:

- textbox "Phone number (optional)" [ref=e59]
- listbox "Country/region code: Japan (+81)" [ref=e60]

Where selectors can’t enter, references can.

Second hurdle: make scrolling affect the iframe’s scroll container.
Instead of sending every wheel event to the outer document, scroll the container that actually needs to scroll. The lower form rows can finally move into view, and their references become available.

Third hurdle, the hardest one: the enabled submit button inside the cross-origin iframe does nothing when clicked.
This stage is the most maddening because everything looks right:

The number is entered with real keystrokes, and get value confirms it is there;
The “Save” button becomes enabled when it should: it is disabled before a valid value is entered, then becomes clickable after filling;
Then click @e41—and the form does nothing. find text "Save"? Cross-origin access blocks it. Focus and press Enter or Space? Still nothing.

It looks correct, yet everything is wrong. The root cause: Material/framework buttons inside cross-origin iframes do not accept synthetic clicks; and fill only changed the input value without dispatching the input/change events the framework expects. The form still thinks “nothing changed,” so the Save button is either disabled or clicking it is equivalent to doing nothing.

The fix has two parts: value entry switches to real keystrokes so every character triggers real events that the framework recognizes; clicking dispatches a full set of real mouse/keyboard activations against the content node inside the iframe, rather than slapping a click() onto it.
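The article does not say which transport chrome-use uses for this, so the following is only a sketch of the idea under one assumption: that input is injected through Chrome DevTools Protocol messages (`Input.dispatchKeyEvent` and `Input.dispatchMouseEvent` are real CDP methods). The helper names are hypothetical, and the sketch only builds the message payloads; it sends nothing.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def key_events(text: str) -> List[Message]:
    """One keyDown/keyUp pair per character, instead of assigning .value once."""
    events: List[Message] = []
    for ch in text:
        events.append({"method": "Input.dispatchKeyEvent",
                       "params": {"type": "keyDown", "key": ch, "text": ch}})
        events.append({"method": "Input.dispatchKeyEvent",
                       "params": {"type": "keyUp", "key": ch}})
    return events


def click_events(x: float, y: float) -> List[Message]:
    """Move, press, release: the full sequence a framework button expects."""
    press = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        {"method": "Input.dispatchMouseEvent",
         "params": {"type": "mouseMoved", "x": x, "y": y}},
        {"method": "Input.dispatchMouseEvent",
         "params": {"type": "mousePressed", **press}},
        {"method": "Input.dispatchMouseEvent",
         "params": {"type": "mouseReleased", **press}},
    ]
```

The point of the contrast is that both helpers produce a stream of discrete input events the page cannot distinguish from a person typing and clicking, which is what the framework's listeners are waiting for. The coordinates would have to be resolved from the target node inside the iframe; how chrome-use does that is not covered here.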

The finish: click in, save successfully

Once all three hurdles are cleared, the whole chain works: open → scroll to the target row → capture references from the snapshot → fill with real keystrokes → press Save. The deadlock of “can read it, can’t complete it” ends there.

A few hard-earned lessons for others building agent browser automation

Prefer accessibility references; don’t default to clicking screenshot coordinates. Once the snapshot can see the iframe, references are always more stable than guessing pixels. Save screenshots for cases that truly have no structure, like canvas or WebGL.
Cross-origin iframes are a clear boundary. Selectors and eval stop there. Either your tool can penetrate the a11y tree, or you are left blindly clicking.
Test whether you can submit, not just whether you can fill. A value being entered does not mean the framework received it. Bugs like fill not dispatching events only show up when you actually try to save.
If you can use a real logged-in browser, don’t use headless. Login state, cookies, and extensions are all already there, and there is no automation fingerprint—that is also why chrome-use takes the path of “driving your own Chrome.”

Try it

curl -fsSL https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh

The repository is at github.com/leeguooooo/chrome-use. I keep building tools like this—tools that use your own subscriptions and connect agents to real browsers/devices—and I post updates on X @leeguooooo.

Let Claude Code Generate Images with Your ChatGPT Subscription (No API Key)

郭立 — Mon, 22 Jun 2026 10:12:01 +0000

You ask Claude Code to write a README or a technical document. Can it also generate the accompanying images along the way? Yes—and without an OPENAI_API_KEY, without paying extra, and using the ChatGPT subscription you’re already paying for. The doodle illustrations in this article were generated by Claude Code itself while writing it.

There are two ways to connect image generation into an agent. The difference lies in which bucket of your subscription it uses, and whether free accounts can use it. This article focuses on the most interesting path behind the scenes: the web backend, and how it gets around chatgpt.com’s anti-scraping defenses.

First, Let’s Correct a Common Misunderstanding

Many people think there is only one cost-saving way to “generate images with a ChatGPT subscription”: reuse Codex CLI’s OAuth token and directly POST to backend-api/codex/responses. That was the approach covered in my previous article, “Turning a ChatGPT Subscription into an Image Generation API”.

There were two things it did not fully explain.

First, it consumes Codex metered quota. A subscription is not one single bucket; it has two separate rate-limit buckets: one for your chat quota on the ChatGPT web app, and one for Codex usage. Calling codex/responses spends the latter—the very bucket you least want to waste when using Codex CLI to write code.

Second, it requires you to have installed Codex CLI and run codex login. Free ChatGPT accounts do not have Codex, so this path simply does not work.

So the question becomes: can we generate images on the “web conversation” path instead? Free accounts have that path too. Free users can already generate images in the ChatGPT web app; it spends chat quota and does not touch Codex usage at all.

Yes—but the cost is that you really have to go to the “web” side to generate the image. That is the web backend of chatgpt-imagegen, and it is the main subject of this article.

Why Not POST Directly Like the Codex Backend?

The intuitive approach: image generation on the web is also just sending HTTP requests, so couldn’t you capture the traffic, grab the cookies, and replay it?

Anyone who has tried gets stuck on chatgpt.com’s anti-bot defenses. It is worth breaking down the layers here, because the layer that actually blocks you is not the one most people expect.

Defense	What It Is	Can a Bare Client Pass?
Cloudflare edge checks	Standard CF bot detection	✅ Can pass
Sentinel proof of work	`backend-api/sentinel/chat-requirements` + in-page `sentinel/sdk.js` computes a PoW token	✅ Can pass; the algorithm is in the page JS and can be replicated
Cloudflare Turnstile token	One-time token produced by interactive verification	❌ Cannot pass

The first two layers can both be simulated by a pure Python client. The real wall is the third layer: the Turnstile token can only be produced on the spot by a real, interactive browser, and it is valid for one use only. There is no shortcut where you “harvest a token in the browser first, then replay it headlessly.” The token is burned after one use; the next request needs a new one, and generating a new token requires a real browser to be present.

So the conclusion is pretty counterintuitive: you cannot bypass it; you have to be “inside.” Instead of forging something that only a browser can produce, just drive a real browser directly.

Solution: Drive Your Own Logged-In Chrome

The web backend uses chrome-use (a browser automation CLI with a Chrome extension) to connect to your real Chrome instance where you’re already logged into chatgpt.com, then generates the image inside a normal conversation. It’s the same interface, the same cookies, and the same Turnstile context as when you manually type “draw me a picture” in the app.

Full flow:

chatgpt-imagegen --backend web
   │
   ├── chrome-use connects to your logged-in Chrome and opens https://chatgpt.com/
   │     (must be a normal conversation; Temporary Chat disables image generation tools)
   │
   ├── resolves the ChatGPT project (--project, default: imagegen)
   │     fetches the project list in-page; if missing, POSTs /backend-api/projects to create one
   │     archives image-generation conversations into this project to avoid polluting main history
   │
   ├── types the prompt into the input box using real keyboard events
   │     ChatGPT’s ProseMirror/React input does not accept plain DOM .value= / fill,
   │     so key-by-key typing must be simulated, otherwise the submitted prompt is empty
   │
   ├── polls the page: waits for streaming output to finish and for a new <img> resource to stabilize
   │
   └── fetches the image bytes in-page (credentials:'include') → base64 → writes to disk
         (the signed estuary/content URL is authorized by the browser’s own cookies;
          the token never leaves the browser)

A few of these details were learned the hard way.

Directly setting value on the ProseMirror input box, or using an automation tool’s fill, does not work. React does not treat it as user input; you have to send real keyboard events.

To decide that “the image is ready,” you cannot only look at whether streaming has ended. You also have to wait until the newly appeared <img> resource inside the conversation’s main container has stabilized and its URL stops changing; otherwise you may capture a placeholder image or the image from the previous run.
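The "wait until the URL stops changing" rule can be sketched as a small polling loop. This is an illustration, not the tool's code: `read_url` stands in for whatever in-page query returns the newest image URL in the conversation's main container, and the stability threshold, interval, and timeout values are assumptions.

```python
import time
from typing import Callable, FrozenSet, Optional


def wait_until_stable(read_url: Callable[[], Optional[str]],
                      ignore: FrozenSet[str] = frozenset(),
                      stable_reads: int = 3,
                      interval_s: float = 1.0,
                      timeout_s: float = 180.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> str:
    """Return the image URL once it has been identical for `stable_reads` polls.

    `ignore` holds URLs already present before the prompt was sent, so an
    image from a previous run is never mistaken for the new one.
    """
    deadline = clock() + timeout_s
    last: Optional[str] = None
    streak = 0
    while clock() < deadline:
        url = read_url()
        if url and url not in ignore:
            streak = streak + 1 if url == last else 1
            last = url
            if streak >= stable_reads:
                return url
        else:
            last, streak = None, 0  # nothing new yet
        sleep(interval_s)
    raise TimeoutError("image URL did not stabilize before the timeout")
```

A placeholder that is later swapped for the final image resets the streak, so only a URL that has survived several consecutive polls is accepted.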

The image bytes are not downloaded externally with curl either. That image URL is signed and requires cookies to download, so the page itself calls fetch(..., {credentials:'include'}), letting the browser authorize the request with its own session. The token never leaves the browser.

A String of Unavoidable Engineering Pitfalls

Getting from “it runs” to “it runs reliably” involved real battle scars.

The web backend concurrency can only be 1. It shares the same logged-in Chrome. In early versions, concurrent image generation could cross-contaminate outputs (#7, fixed in v0.6.0 by limiting detection to the current conversation’s main container). Also, chatgpt.com applies aggressive page-side rate limits (“Too many requests… temporarily limited access to your conversations”). So web runs are serialized across processes: extra processes queue on the flock slot. It is safe, but wall-clock time is roughly the sum of serial runs. If you want real parallelism (up to 4), explicitly use --backend codex; the tradeoff is spending Codex quota. This is a quota-saving vs. faster-output tradeoff. The tool does not decide for you; it follows --backend or the default auto.
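The cross-process queueing on a flock slot can be sketched with the standard library. This is a minimal illustration of the technique on Unix-like systems, not the tool's implementation: the `web_slot` name and the lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def web_slot(lock_path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock for the duration of one web run."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until no other process holds the slot, so extra runs queue up.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Usage would look like `with web_slot("/tmp/imagegen-web.lock"): run_web_generation()`, where both the path and `run_web_generation` are placeholders. Because the lock is tied to the open file description, it is released even if the process crashes mid-run, so a dead run cannot block the queue forever.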

Rate limits must fail fast, not retry blindly. When the page shows “Too many requests,” the web backend detects the popup and errors immediately. If it happens before submission, auto mode falls back to Codex; if it happens after submission, it stops cleanly without spending money twice. That image may still appear in the conversation later, so check manually.

History is not kept by default. Image-generation conversations are deleted by default with PATCH is_visible:false. They are only temporarily moved into the project as a handoff step, and after the run they leave no trace in your ChatGPT history (--keep-conversation preserves them).

Image-to-image uses the same path. -i/--ref uploads the reference image into the ChatGPT input box and then sends an edit prompt, the same mechanism as manually dragging in an image and asking it to modify it: still subscription-based, still no key required.

Two New Things: Style Presets + Proactively Illustrating Docs

A style preset is a reusable prompt fragment. Save it under a name, then apply it with one parameter:

chatgpt-imagegen "a robot mascot" --style doodle
chatgpt-imagegen style add brand "flat vector, bold shapes, white bg"
chatgpt-imagegen style use brand        # set as default; applied automatically after that

There’s a built-in doodle style that is intentionally terrible, like something scrawled with a mouse in an old-school drawing program. The “ugly-cute” illustrations in this repo’s README, including the ones in this article, were generated by the tool itself using that style. No style is applied by default; unless you opt into one, the tool behaves exactly as before.

Proactive illustration is for AI agents. Once installed as a skill, when an agent writes a blog post, technical proposal, or design doc, it will proactively suggest illustrations and generate them in parallel in the background, instead of waiting for you to ask for images. The degree of parallelism depends on your backend configuration: web runs serially to conserve quota, while codex runs in parallel and consumes quota.

How to Choose Between the Two Backends

Scenario	Which backend to use	Why
Laptop/desktop, with Chrome open and logged in	web (default)	Doesn’t spend Codex quota; works even with a free account
Server / headless agent machine	codex	There’s no browser there, and `auto` will fall back on its own
Need truly parallel batch image generation	codex	web is serial; codex supports up to 4-way parallelism, but consumes quota

By default, auto tries web first and falls back to codex if that fails, which means it saves your Codex quota by default.

When Not to Use This, and Go Straight to the Official API

The subscription channel is not a free version of the API; it’s a different product form with its own boundaries.

If you strictly need quality=high or a transparent background, the subscription cannot provide that. You need to use OPENAI_API_KEY and call /v1/images/generations.

If you’re building an external production service, using a personal subscription to generate images for end users violates the OpenAI Terms and will also burn through your own ChatGPT quota. One more thing: this tool rides on an unpublished internal endpoint. Large-scale abuse is the fastest way to get that opening shut down, and if it’s shut down, everyone loses it.

If you need stable throughput above 10 images per minute, the subscription cannot deliver it: its rate limits are tighter than the API’s.

If you need a team-level, remotely callable HTTP gateway, use the sister project agent-cli-to-api, which exposes the same subscription as an OpenAI-compatible interface.

Getting Started

# Install chrome-use (for the web backend), and connect it to Chrome where you’re logged into chatgpt.com
curl -fsSL https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh
chrome-use extension install   # Then install the Chrome extension, restart, and log in to chatgpt.com

# Install the CLI (for the agent; easiest option)
npx skills add leeguooooo/chatgpt-imagegen -g

# Or use it standalone
git clone https://github.com/leeguooooo/chatgpt-imagegen
cd chatgpt-imagegen
./chatgpt-imagegen "a watercolor cat on a windowsill" -o cat.png

Not having chrome-use installed is fine: auto automatically falls back to codex, with only a one-line stderr note saying “installing chrome-use lets image generation avoid using Codex quota.”

A Few Common Questions

Will this get my account banned? Probably not, but there are red lines. The web backend drives your own logged-in browser and does things you could also do manually, so the traffic is no different from clicking around in the app a few times. The codex backend replays the Codex CLI protocol with a real auth token, so it looks like normal Codex usage. The real risks are two things: first, volume — sustained >10 images/minute, or generating dozens of images in a large fan-out, will hit rate limits, and forcing it long-term is also likely to draw attention; second, using a personal subscription to provide an external image-generation service, which is a clear terms-of-service red line. The tool keeps its footprint low by default: image-generation chats are deleted by default, chats are grouped into a project, rate-limit popups fail fast without retries, and concurrency is capped. Personal local use at normal volume is low-risk, but it sits on unpublished endpoints, so use at your own risk and don’t abuse it.

Is chrome-use especially wasteful with tokens? For this tool, the image-generation process does not consume any LLM tokens. The web backend does not have an AI watch the screen and operate the browser step by step; it uses a fixed Python script to call chrome-use with hard-coded steps, with no model inference in between. What really burns tokens is the screenshot-driven approach where every step is fed to a large model. chrome-use itself is the opposite: it uses accessibility-tree snapshots plus compact @eN references, around 200–400 tokens per step, which is much cheaper for an agent than dumping raw HTML or screenshots.

Final Notes

The codex backend replays the official protocol, while the web backend gets the job done in a real browser. The latter takes a more roundabout path, but gives you three things in return: no API key required, no Codex quota consumed, and free accounts can use it too. The hard part is not the Cloudflare edge; it is the Turnstile token that only a real browser can generate and that expires after a single use. Once you recognize that, the solution shifts from “forging it” to “operating directly inside it.”

GitHub: leeguooooo/chatgpt-imagegen
HTTP gateway sister project: agent-cli-to-api
skill installation: npx skills add leeguooooo/chatgpt-imagegen -g

Disclaimer: This tool calls ChatGPT’s internal backend-api endpoint (the same one used by Codex CLI), not a public API with documented guarantees. OpenAI may change or restrict it at any time. Please use it only for personal or local agent use within the scope permitted by the OpenAI Terms of Use, and do not offer commercial image-generation services to others.
