<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 郭立</title>
    <description>The latest articles on DEV Community by 郭立 (@leeguoo).</description>
    <link>https://dev.to/leeguoo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2556481%2F67758934-84c0-4708-a64b-0bab37b9c711.png</url>
      <title>DEV Community: 郭立</title>
      <link>https://dev.to/leeguoo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leeguoo"/>
    <language>en</language>
    <item>
      <title>Is the cache 4m23s Line in Claude Code's Status Bar Actually Accurate?</title>
      <dc:creator>郭立</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:19:13 +0000</pubDate>
      <link>https://dev.to/leeguoo/is-the-cache-4m23s-line-in-claude-codes-status-bar-actually-accurate-cl</link>
      <guid>https://dev.to/leeguoo/is-the-cache-4m23s-line-in-claude-codes-status-bar-actually-accurate-cl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡 Originally published on my blog &lt;strong&gt;&lt;a href="https://blog.leeguoo.com" rel="noopener noreferrer"&gt;blog.leeguoo.com&lt;/a&gt;&lt;/strong&gt; — field notes on reverse engineering, AI agents, and building things that ship.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the status bar &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar" rel="noopener noreferrer"&gt;&lt;code&gt;cs&lt;/code&gt; / claude-statusbar&lt;/a&gt; I wrote for Claude Code, there’s a line that says &lt;code&gt;cache 4m23s&lt;/code&gt;: green, ticking down every second, then turning into a red &lt;code&gt;cache COLD&lt;/code&gt; when it reaches zero.&lt;/p&gt;

&lt;p&gt;Someone asked me: how exactly is this number calculated, and is it accurate?&lt;/p&gt;

&lt;p&gt;That’s a fair question. For Pro / Max subscribers, context served from a cache hit consumes almost none of your 5h / 7d quota; let the cache go cold, and the next prompt has to feed the entire context back in at full price. So the “how many minutes left” line tells you whether to send another message now, while the cache is still warm. Let’s pull it apart and answer whether it’s accurate along the way.&lt;/p&gt;

&lt;p&gt;For people in a hurry, here’s the one-line version: &lt;strong&gt;with the default configuration and a 5-minute cache, it is accurate; the only scenario where it systematically lies to you is when you enable a 1-hour cache but don’t change the configured TTL. In that case, it reports &lt;code&gt;COLD&lt;/code&gt; 55 minutes too early.&lt;/strong&gt; One config line fixes it. The reasoning is below.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, distinguish the two “caches”; don’t mix them up
&lt;/h2&gt;

&lt;p&gt;There are two things called cache in this repo, so before asking “is it accurate?” we need to be clear which one we mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data cache&lt;/strong&gt;: &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/cache.py#L49" rel="noopener noreferrer"&gt;&lt;code&gt;CACHE_MAX_AGE_S = 30&lt;/code&gt; in &lt;code&gt;cache.py&lt;/code&gt;&lt;/a&gt;. It caches &lt;code&gt;claude-monitor&lt;/code&gt; output for 30 seconds, purely so the status bar doesn’t have to shell out to a subprocess every time it redraws once per second. It has nothing to do with whether the countdown is accurate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt-cache countdown&lt;/strong&gt;: today’s main character. It calculates “how long until Anthropic’s prompt cache expires.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest of this article covers only the second one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It Anchors
&lt;/h2&gt;

&lt;p&gt;The logic is very short: just one function, &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/core.py#L964-L1030" rel="noopener noreferrer"&gt;&lt;code&gt;get_cache_age_text&lt;/code&gt;&lt;/a&gt;. It does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads &lt;code&gt;~/.cache/claude-statusbar/last_stdin.json&lt;/code&gt; to get the current session’s &lt;code&gt;transcript_path&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Reads that JSONL backwards, finds the &lt;strong&gt;most recent&lt;/strong&gt; record where &lt;code&gt;type == "assistant"&lt;/code&gt;, and takes its &lt;code&gt;timestamp&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Calculates &lt;code&gt;remaining = ttl_seconds - elapsed seconds&lt;/code&gt;, then formats it as a countdown.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step two is &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/core.py#L901-L961" rel="noopener noreferrer"&gt;&lt;code&gt;_last_assistant_age&lt;/code&gt;&lt;/a&gt;, and the key part is just this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_ts&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the anchor point: &lt;strong&gt;the timestamp of the most recent assistant message&lt;/strong&gt; — not the user message, not the file mtime. This choice is correct; the next section explains why.&lt;/p&gt;

&lt;p&gt;The formula is just as straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;age_s&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COLD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ttl_seconds&lt;/code&gt; defaults to 300. If &lt;code&gt;remaining &amp;lt;= 0&lt;/code&gt;, or if no assistant record can be found at all (&lt;code&gt;age_s is None&lt;/code&gt;), it returns &lt;code&gt;COLD&lt;/code&gt;; if there isn’t even a &lt;code&gt;transcript_path&lt;/code&gt;, it returns an empty string and hides the whole segment.&lt;/p&gt;
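
&lt;p&gt;The rules above fit in a few lines. The following is a simplified sketch rather than the repo’s code: &lt;code&gt;format_countdown&lt;/code&gt; is a hypothetical name, and the hour format and the clamping of negative ages are assumptions. Only the &lt;code&gt;COLD&lt;/code&gt; rules, the 300-second default, and the &lt;code&gt;4m23s&lt;/code&gt; shape come from the description above.&lt;/p&gt;

```python
from typing import Optional


def format_countdown(age_s: Optional[float], ttl_seconds: int = 300) -> str:
    """Hypothetical helper: turn the age of the last assistant turn into a countdown."""
    if age_s is None:
        return "COLD"  # no assistant record was found in the transcript
    # Assumption: a negative age (clock rollback) is treated as zero.
    remaining = int(ttl_seconds - max(age_s, 0.0))
    if remaining > 0:
        hours, rest = divmod(remaining, 3600)
        minutes, seconds = divmod(rest, 60)
        if hours:
            return f"{hours}h{minutes:02d}m"  # assumed shape for the 1-hour case
        if minutes:
            return f"{minutes}m{seconds:02d}s"
        return f"{seconds}s"  # bare seconds once less than a minute remains
    return "COLD"


print(format_countdown(37.0))   # 300 - 37 = 263 seconds left
print(format_countdown(250.0))  # 50 seconds left
print(format_countdown(301.0))  # past the TTL
```

&lt;p&gt;With the default TTL, an age of 37 seconds renders as &lt;code&gt;4m23s&lt;/code&gt;, 250 seconds as &lt;code&gt;50s&lt;/code&gt;, and anything at or past 300 seconds as &lt;code&gt;COLD&lt;/code&gt;.&lt;/p&gt;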

&lt;p&gt;A bit of history while we’re here: before &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/pull/20" rel="noopener noreferrer"&gt;the v3.2.2 PR&lt;/a&gt;, this line displayed “how much time had already elapsed” instead. It was later changed to a countdown, because what users actually want to know isn’t “how many minutes has it been since the last response,” but “do I still have time to send another message before the cache dies?” A countdown answers that directly; elapsed time still makes you do the subtraction in your head.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does It Model Anthropic’s Actual Behavior Correctly?
&lt;/h2&gt;

&lt;p&gt;If you check the official documentation, &lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;Prompt caching&lt;/a&gt;, two sentences set the tone:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By default, the cache has a 5-minute lifetime.&lt;/p&gt;

&lt;p&gt;The cache is refreshed for no additional cost each time the cached content is used.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, the TTL is a &lt;strong&gt;sliding window&lt;/strong&gt;: every cache hit resets it to 5 minutes.&lt;/p&gt;

&lt;p&gt;This also explains why “anchoring to the most recent assistant turn” is correct — each additional response resets &lt;code&gt;age_s&lt;/code&gt; to zero, the countdown automatically refills, and it lines up with the server-side behavior of “use it once, refresh it once.” The comment in the code, &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/config.py#L22" rel="noopener noreferrer"&gt;&lt;code&gt;# 5min — Anthropic's default prompt cache TTL&lt;/code&gt;&lt;/a&gt;, isn’t wrong. At this layer, the model is correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it’s inaccurate — with evidence
&lt;/h2&gt;

&lt;p&gt;This is the real point. Three layers, ordered from most biting to least important.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The default TTL is hardcoded to 5 minutes, but you may be running a 1-hour cache
&lt;/h3&gt;

&lt;p&gt;This is the only part that can genuinely mislead people. The evidence comes from the &lt;code&gt;usage&lt;/code&gt; block in the most recent assistant record on my machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"cache_creation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ephemeral_1h_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1421&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ephemeral_5m_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything went into the 1-hour bucket. In other words, this machine is actually running a &lt;strong&gt;1h cache, with a real lifetime of 60 minutes&lt;/strong&gt;. But &lt;code&gt;cs&lt;/code&gt; defaults to &lt;code&gt;cache_ttl_seconds = 300&lt;/code&gt;, so after 5 minutes it will shout &lt;code&gt;cache COLD&lt;/code&gt;, 55 minutes earlier than the truth.&lt;/p&gt;

&lt;p&gt;The most ironic part: the “truth signal” for deciding 5m vs 1h (&lt;code&gt;ephemeral_1h_input_tokens&lt;/code&gt; vs &lt;code&gt;ephemeral_5m_input_tokens&lt;/code&gt;) is sitting right there in the &lt;strong&gt;same file and the same record it has already opened&lt;/strong&gt;. But &lt;code&gt;_last_assistant_age&lt;/code&gt; only reads the &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;timestamp&lt;/code&gt; fields, skipping straight past that &lt;code&gt;usage&lt;/code&gt; block. In theory, it could automatically infer which TTL to use from the transcript; right now, you have to manually run &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/pull/9" rel="noopener noreferrer"&gt;&lt;code&gt;cs config set cache_ttl_seconds 3600&lt;/code&gt;&lt;/a&gt;. That’s a TODO worth fixing.&lt;/p&gt;
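
&lt;p&gt;As a rough idea of what that TODO could look like, here is a sketch that picks the TTL from the record’s &lt;code&gt;cache_creation&lt;/code&gt; counters. It is not implemented in &lt;code&gt;cs&lt;/code&gt;: &lt;code&gt;infer_ttl_seconds&lt;/code&gt; is a hypothetical name, and the location of the &lt;code&gt;usage&lt;/code&gt; block (under &lt;code&gt;message&lt;/code&gt; or at the top level of the record) is an assumption about the transcript layout.&lt;/p&gt;

```python
def infer_ttl_seconds(entry: dict, default: int = 300) -> int:
    """Hypothetical helper: guess the prompt-cache TTL from one assistant record."""
    message = entry.get("message") or {}
    # Assumption: the usage block sits under "message", or directly on the record.
    usage = message.get("usage") or entry.get("usage") or {}
    creation = usage.get("cache_creation") or {}
    if creation.get("ephemeral_1h_input_tokens", 0) > 0:
        return 3600  # tokens were written to the 1-hour bucket
    if creation.get("ephemeral_5m_input_tokens", 0) > 0:
        return 300   # tokens were written to the 5-minute bucket
    return default   # nothing written this turn: keep the configured value


record = {
    "type": "assistant",
    "message": {
        "usage": {
            "cache_creation": {
                "ephemeral_1h_input_tokens": 1421,
                "ephemeral_5m_input_tokens": 0,
            }
        }
    },
}
print(infer_ttl_seconds(record))  # 3600
```

&lt;p&gt;A turn that writes to neither bucket says nothing about the TTL, so the sketch falls back to the configured value instead of guessing.&lt;/p&gt;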

&lt;h3&gt;
  
  
  2. The anchor is “the turn finished,” not “the cache was refreshed”
&lt;/h3&gt;

&lt;p&gt;The assistant &lt;code&gt;timestamp&lt;/code&gt; is roughly when that turn &lt;strong&gt;finished&lt;/strong&gt; writing; the cache is refreshed server-side when the request is &lt;strong&gt;sent&lt;/strong&gt;. There’s a generation-latency gap between the two. Here are assistant timestamps from the same stretch of a real transcript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;assistant  2026-05-29T04:46:18.432Z
assistant  2026-05-29T04:46:19.653Z
assistant  2026-05-29T04:46:25.680Z
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gaps between these records are about 1.2 and 6.0 seconds; more generally, the latency is on the order of a few to a dozen seconds. Relative to a 300s / 3600s TTL, it’s negligible. Directionally, the countdown is probably optimistic: the displayed remaining time is slightly higher than the real server-side value. But not enough to bite.&lt;/p&gt;

&lt;p&gt;I should be honest here: the source code cannot prove whether Anthropic’s server starts counting from request start or request end. So the precise statement is: &lt;strong&gt;the anchor is a proxy accurate to within one turn’s latency&lt;/strong&gt;, not the exact moment the cache refreshes. Good enough, but don’t treat it like a stopwatch.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The color guesses from the string, not the number
&lt;/h3&gt;

&lt;p&gt;An interesting engineering tradeoff. &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/styles.py#L40-L56" rel="noopener noreferrer"&gt;&lt;code&gt;_cache_severity&lt;/code&gt;&lt;/a&gt; doesn’t receive remaining seconds; it receives the &lt;strong&gt;already formatted string&lt;/strong&gt;, then checks whether it contains &lt;code&gt;m&lt;/code&gt; / &lt;code&gt;h&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cache_text&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COLD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s_hot&lt;/span&gt;          &lt;span class="c1"&gt;# red
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cache_text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cache_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s_ok&lt;/span&gt;           &lt;span class="c1"&gt;# green, comfort zone
&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;theme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s_warn&lt;/span&gt;             &lt;span class="c1"&gt;# yellow, plain "Ys", under 1 minute
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When less than a minute remains, the formatter &lt;strong&gt;intentionally&lt;/strong&gt; outputs bare &lt;code&gt;Ys&lt;/code&gt; only (without &lt;code&gt;m&lt;/code&gt;) so the colorizer can detect “time to turn yellow.” The formatter and colorizer have an implicit contract between them. The repo even has a dedicated &lt;code&gt;test_cache_severity.py&lt;/code&gt; to pin this contract down, so a future format change doesn’t silently scramble the colors. It works, but it is coupling — worth knowing about.&lt;/p&gt;

&lt;p&gt;One more edge case: reverse-reading the transcript has a &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/core.py#L897-L898" rel="noopener noreferrer"&gt;320KB limit (10×32KB)&lt;/a&gt;. If a huge transcript doesn’t contain an assistant record in the final 320KB scanned, it is treated as &lt;code&gt;COLD&lt;/code&gt;. That’s a performance tradeoff — the status bar redraws every second, so it can’t scan several MB every time. You won’t hit it in everyday use.&lt;/p&gt;
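
&lt;p&gt;To make the tradeoff concrete, here is a minimal sketch of a bounded reverse scan. The 10×32KB budget comes from the limit described above; the function name and the rest of the structure are illustrative assumptions, not the repo’s implementation.&lt;/p&gt;

```python
import json
import os

CHUNK_SIZE = 32 * 1024  # 32KB per read
MAX_CHUNKS = 10         # 10 reads, so at most 320KB scanned


def last_assistant_timestamp(path):
    """Hypothetical helper: newest assistant timestamp in the file's tail, else None."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        tail = b""
        for _ in range(MAX_CHUNKS):
            if pos == 0:
                break  # whole file already scanned
            step = min(CHUNK_SIZE, pos)
            pos -= step
            fh.seek(pos)
            tail = fh.read(step) + tail
            lines = tail.split(b"\n")
            # Unless we reached the start of the file, the first piece may be
            # a partial line, so leave it for the next iteration.
            complete = lines if pos == 0 else lines[1:]
            for raw in reversed(complete):
                if not raw.strip():
                    continue
                try:
                    entry = json.loads(raw)
                except ValueError:
                    continue  # skip lines that are not valid JSON
                if (isinstance(entry, dict)
                        and entry.get("type") == "assistant"
                        and entry.get("timestamp")):
                    return entry["timestamp"]
    return None  # no assistant record within the scan budget
```

&lt;p&gt;A caller would treat &lt;code&gt;None&lt;/code&gt; the same way as an expired cache, which matches the &lt;code&gt;COLD&lt;/code&gt; fallback described above.&lt;/p&gt;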

&lt;h2&gt;
  
  
  So, Is It Accurate?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5-minute cache + default config&lt;/strong&gt;: Accurate. The anchor is right, the sliding-window model is right, and edge cases are handled too: clock rollback is clamped to 0, naive timestamps are treated as UTC, and the &lt;code&gt;Z&lt;/code&gt; suffix is normalized.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1-hour cache + unchanged TTL&lt;/strong&gt;: It will systematically report 55 minutes early. One line fixes it: &lt;code&gt;cs config set cache_ttl_seconds 3600&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second-level precision&lt;/strong&gt;: Don’t expect it. The anchor itself carries a proxy error of up to one turn’s latency. It’s a “how many minutes are left” hint, not a timer.&lt;/li&gt;
&lt;/ul&gt;
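
&lt;p&gt;The edge cases named in the first bullet are easy to picture in code. This is a sketch under assumptions, not the repo’s code: the helper names are hypothetical, and I am reading “clamped to 0” as clamping a negative age.&lt;/p&gt;

```python
from datetime import datetime, timezone


def parse_transcript_timestamp(raw: str) -> datetime:
    """Hypothetical helper: parse an ISO 8601 timestamp into an aware UTC datetime."""
    if raw.endswith("Z"):
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
        raw = raw[:-1] + "+00:00"
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # naive timestamps are treated as UTC
    return ts


def age_seconds(raw: str, now: datetime = None) -> float:
    """Seconds since the timestamp, never negative even if the clock went backwards."""
    now = now or datetime.now(timezone.utc)
    return max((now - parse_transcript_timestamp(raw)).total_seconds(), 0.0)


fixed_now = datetime(2026, 5, 29, 4, 46, 25, 680000, tzinfo=timezone.utc)
print(age_seconds("2026-05-29T04:46:18.432Z", now=fixed_now))
print(age_seconds("2026-05-29T04:50:00Z", now=fixed_now))  # timestamp in the future
```

&lt;p&gt;The second call uses a timestamp later than &lt;code&gt;fixed_now&lt;/code&gt;, so the age is clamped to zero instead of going negative and inflating the countdown.&lt;/p&gt;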

&lt;p&gt;One-sentence summary: it answers “Should I send one more message while the cache is still warm?” very accurately; if you use it as a stopwatch, you’re using the wrong tool.&lt;/p&gt;

&lt;p&gt;If you want to inspect it yourself, start with &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/core.py#L901-L961" rel="noopener noreferrer"&gt;&lt;code&gt;_last_assistant_age&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar/blob/26b7475f4a0c885450f29e62b64a393562be9962/src/claude_statusbar/core.py#L964-L1030" rel="noopener noreferrer"&gt;&lt;code&gt;get_cache_age_text&lt;/code&gt;&lt;/a&gt;. You’ll finish reading them in thirty lines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🔧 &lt;strong&gt;The tool:&lt;/strong&gt; &lt;a href="https://github.com/leeguooooo/claude-code-usage-bar" rel="noopener noreferrer"&gt;claude-statusbar on GitHub&lt;/a&gt; — Claude Code status line — 5h/7d rate-limit usage + reset countdown.&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;More writing:&lt;/strong&gt; &lt;a href="https://blog.leeguoo.com" rel="noopener noreferrer"&gt;blog.leeguoo.com&lt;/a&gt; — I'm &lt;a href="https://leeguoo.com/about" rel="noopener noreferrer"&gt;Guo Li (leeguoo)&lt;/a&gt;, a full-stack dev building small AI-agent tools and CLIs.&lt;/li&gt;
&lt;li&gt;💬 Found it useful? A ⭐ on the repo or a follow here means a lot.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>cli</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Letting an AI Agent Click Into Cross-Origin Iframes (How chrome-use Solves It)</title>
      <dc:creator>郭立</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:18:42 +0000</pubDate>
      <link>https://dev.to/leeguoo/letting-an-ai-agent-click-into-cross-origin-iframes-how-chrome-use-solves-it-k3l</link>
      <guid>https://dev.to/leeguoo/letting-an-ai-agent-click-into-cross-origin-iframes-how-chrome-use-solves-it-k3l</guid>
      <description>

&lt;p&gt;Connecting an AI agent to a browser starts out smoothly: open a page, read the content, fill in a search box. What really gets you stuck are the &lt;strong&gt;forms hidden inside cross-origin iframes&lt;/strong&gt;—Google Payments payout profiles, checkout components, KYC widgets. The agent can read the text inside them and fill in values, but &lt;strong&gt;it just can’t click that “Save” button&lt;/strong&gt;. It can see the task, but it can’t finish it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fz3yjcb1s5ixxjyaoxjik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fz3yjcb1s5ixxjyaoxjik.png" alt="An agent tries to reach into a “window inside a window,” only to click nothing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a write-up of how we got past that hurdle. The protagonist is &lt;a href="https://github.com/leeguooooo/chrome-use" rel="noopener noreferrer"&gt;chrome-use&lt;/a&gt;—a Rust-based browser automation CLI for agents that directly drives the Chrome browser where you are &lt;strong&gt;actually logged in&lt;/strong&gt;, without Playwright and without headless mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why cross-origin iframes are so hard
&lt;/h2&gt;

&lt;p&gt;Regular pages are easy: capture the accessibility tree, get element references, and click. But cross-origin iframes—for example, an &lt;code&gt;adsense.google.com&lt;/code&gt; page embedding a &lt;code&gt;payments.google.com&lt;/code&gt; iframe—hit three problems at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Selectors can’t get in.&lt;/strong&gt; Under the same-origin policy, CSS selectors and &lt;code&gt;eval&lt;/code&gt; running in the outer document can’t touch the DOM inside the iframe. &lt;code&gt;document.querySelector&lt;/code&gt; is blind here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scrolling misses the target.&lt;/strong&gt; You think you’re scrolling the page, but the thing that can actually scroll is the scroll container inside the iframe. Wheel events go to the outer document, while the inside stays still—the target row remains “off screen” forever, not even visible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You’re left blindly clicking coordinates.&lt;/strong&gt; The first two problems force you back to “screenshot + guess pixel coordinates,” which is the least precise approach and the easiest way to click a neighboring field by mistake. On a form that edits &lt;strong&gt;global payment profile&lt;/strong&gt; information, one wrong click can be costly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The foundation of chrome-use: agents get “references,” not HTML
&lt;/h2&gt;

&lt;p&gt;Before explaining the fix, it’s worth covering the basic design—because this is also the fundamental difference between chrome-use and the camp that feeds raw HTML to models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6a27kz1nrumj6r342gos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6a27kz1nrumj6r342gos.png" alt="Replacing a scary blob of HTML with clean references like @e1 @e2 @e3" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;chrome-use does not hand page source to the agent. Instead, it captures an &lt;strong&gt;accessibility tree snapshot&lt;/strong&gt;, assigning each interactive element a compact reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- textbox "Email" [ref=e2]
- listbox "Country/region" [ref=e60]
- button "Save" [ref=e41]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent acts directly on those references: &lt;code&gt;fill @e2 "..."&lt;/code&gt;, &lt;code&gt;click @e41&lt;/code&gt;. A page costs roughly 200–400 tokens instead of a whole screen of DOM noise. This reference mechanism is exactly what makes it possible to work through iframes later—as long as the snapshot can “see” nodes inside the iframe, it can assign references to them.&lt;/p&gt;
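
&lt;p&gt;To show how little an agent needs from such a snapshot, here is a small sketch that indexes the lines by reference. It assumes the line shape from the excerpt above; &lt;code&gt;index_refs&lt;/code&gt; is a hypothetical helper, not part of chrome-use, and real snapshot output may carry more fields.&lt;/p&gt;

```python
import re

# Line shape taken from the snapshot excerpt above (an assumption about the format).
SNAPSHOT_LINE = re.compile(r'^\s*- (\w+) "(.*)" \[ref=(\w+)\]\s*$')


def index_refs(snapshot: str) -> dict:
    """Map each ref (for example "e41") to its (role, accessible name) pair."""
    refs = {}
    for line in snapshot.splitlines():
        match = SNAPSHOT_LINE.match(line)
        if match:
            role, name, ref = match.groups()
            refs[ref] = (role, name)
    return refs


snapshot = '''- textbox "Email" [ref=e2]
- listbox "Country/region" [ref=e60]
- button "Save" [ref=e41]'''

refs = index_refs(snapshot)
print(refs["e41"])  # ('button', 'Save')
```

&lt;p&gt;The agent only ever handles short identifiers such as &lt;code&gt;e41&lt;/code&gt;, which is where the token savings come from.&lt;/p&gt;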

&lt;h2&gt;
  
  
  Three hurdles, one at a time
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;First hurdle: make the snapshot see what’s inside the iframe.&lt;/strong&gt;&lt;br&gt;
The accessibility tree needs to pass through cross-origin iframes and include their nodes with references. After fixing that, &lt;code&gt;snapshot&lt;/code&gt; can list them directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- textbox "Phone number (optional)" [ref=e59]
- listbox "Country/region code: Japan (+81)" [ref=e60]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where selectors can’t enter, references can.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second hurdle: make scrolling affect the iframe’s scroll container.&lt;/strong&gt;&lt;br&gt;
Instead of sending every wheel event to the outer document, scroll the container that actually needs to scroll. The lower form rows can finally move into view, and their references become available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third hurdle, the hardest one: the enabled submit button inside the cross-origin iframe does nothing when clicked.&lt;/strong&gt;&lt;br&gt;
This stage is the most maddening because &lt;strong&gt;everything looks right&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The number is entered with real keystrokes, and &lt;code&gt;get value&lt;/code&gt; confirms it is there;&lt;/li&gt;
&lt;li&gt;The “Save” button becomes enabled when it should: it is disabled before a valid value is entered and enabled once one is filled in;&lt;/li&gt;
&lt;li&gt;Then &lt;code&gt;click @e41&lt;/code&gt;—and the form does nothing. &lt;code&gt;find text "Save"&lt;/code&gt;? Cross-origin access blocks it. Focus and press Enter or Space? Still nothing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It looks correct, yet nothing works. The root cause is twofold: Material/framework buttons inside cross-origin iframes do not accept &lt;strong&gt;synthetic clicks&lt;/strong&gt;, and &lt;code&gt;fill&lt;/code&gt; only changed the input value without dispatching the &lt;code&gt;input&lt;/code&gt;/&lt;code&gt;change&lt;/code&gt; events the framework expects. The form still thinks “nothing changed,” so the Save button either stays disabled or does nothing when clicked.&lt;/p&gt;

&lt;p&gt;The fix has two parts: value entry switches to &lt;strong&gt;real keystrokes&lt;/strong&gt; so every character triggers real events that the framework recognizes; clicking dispatches a full set of &lt;strong&gt;real mouse/keyboard activations&lt;/strong&gt; against the content node inside the iframe, rather than slapping a &lt;code&gt;click()&lt;/code&gt; onto it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The finish: click in, save successfully
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fakrw0li6e57kvw8404ij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fakrw0li6e57kvw8404ij.png" alt="An agent wearing a party hat reaches into an iframe, successfully presses SAVE, and gets a green saved checkmark" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once all three hurdles are cleared, the whole chain works: &lt;strong&gt;open → scroll to the target row → capture references from the snapshot → fill with real keystrokes → press Save&lt;/strong&gt;. The deadlock of “can read it, can’t complete it” ends there.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few hard-earned lessons for others building agent browser automation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefer accessibility references; don’t default to clicking screenshot coordinates.&lt;/strong&gt; Once the snapshot can see the iframe, references are always more stable than guessing pixels. Save screenshots for cases that truly have no structure, like canvas or WebGL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-origin iframes are a clear boundary.&lt;/strong&gt; Selectors and &lt;code&gt;eval&lt;/code&gt; stop there. Either your tool can penetrate the a11y tree, or you are left blindly clicking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test whether you can submit, not just whether you can fill.&lt;/strong&gt; A value being entered does not mean the framework received it. Bugs like &lt;code&gt;fill&lt;/code&gt; not dispatching events only show up when you actually try to save.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you can use a real logged-in browser, don’t use headless.&lt;/strong&gt; Login state, cookies, and extensions are all already there, and there is no automation fingerprint—that is also why chrome-use takes the path of “driving your own Chrome.”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repository is at &lt;a href="https://github.com/leeguooooo/chrome-use" rel="noopener noreferrer"&gt;github.com/leeguooooo/chrome-use&lt;/a&gt;. I keep building tools like this—tools that use your own subscriptions and connect agents to real browsers/devices—and I post updates on &lt;a href="https://x.com/leeguooooo" rel="noopener noreferrer"&gt;X @leeguooooo&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🔧 &lt;strong&gt;The tool:&lt;/strong&gt; &lt;a href="https://github.com/leeguooooo/chrome-use" rel="noopener noreferrer"&gt;chrome-use on GitHub&lt;/a&gt; — drive your real, logged-in Chrome from any AI agent.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>browser</category>
      <category>automation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Let Claude Code Generate Images with Your ChatGPT Subscription (No API Key)</title>
      <dc:creator>郭立</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:12:01 +0000</pubDate>
      <link>https://dev.to/leeguoo/let-claude-code-generate-images-with-your-chatgpt-subscription-no-api-key-1fbp</link>
      <guid>https://dev.to/leeguoo/let-claude-code-generate-images-with-your-chatgpt-subscription-no-api-key-1fbp</guid>
      <description>

&lt;p&gt;You ask Claude Code to write a README or a technical document. Can it also generate the accompanying images along the way? Yes—and without an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;, without paying extra, and using the ChatGPT subscription you’re already paying for. The doodle illustrations in this article were generated by Claude Code itself while writing it.&lt;/p&gt;

&lt;p&gt;There are two ways to connect image generation into an agent. The difference lies in which bucket of your subscription it uses, and whether free accounts can use it. This article focuses on the most interesting path behind the scenes: the web backend, and how it gets around chatgpt.com’s anti-scraping defenses.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, Let’s Correct a Common Misunderstanding
&lt;/h2&gt;

&lt;p&gt;Many people think there is only one cost-saving way to “generate images with a ChatGPT subscription”: reuse Codex CLI’s OAuth token and directly POST to &lt;code&gt;backend-api/codex/responses&lt;/code&gt;. That was the approach covered in my previous article, &lt;a href="https://blog.misonote.com/zh/posts/chatgpt-subscription-image-api/" rel="noopener noreferrer"&gt;“Turning a ChatGPT Subscription into an Image Generation API”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There were two things it did not fully explain.&lt;/p&gt;

&lt;p&gt;First, it consumes Codex metered quota. A subscription is not one single bucket; it has two separate rate-limit buckets: one for your chat quota on the ChatGPT web app, and one for Codex usage. Calling &lt;code&gt;codex/responses&lt;/code&gt; spends the latter—the very bucket you least want to waste when using Codex CLI to write code.&lt;/p&gt;

&lt;p&gt;Second, it requires you to have installed Codex CLI and run &lt;code&gt;codex login&lt;/code&gt;. Free ChatGPT accounts do not have Codex, so this path simply does not work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fw7nlkave2wfceoel589t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fw7nlkave2wfceoel589t.png" alt="Chat quota and Codex usage are two separate buckets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the question becomes: can we generate images on the “web conversation” path instead? Free accounts have that path too. Free users can already generate images in the ChatGPT web app; it spends chat quota and does not touch Codex usage at all.&lt;/p&gt;

&lt;p&gt;Yes—but the cost is that you really have to go to the “web” side to generate the image. That is the &lt;strong&gt;web backend&lt;/strong&gt; of &lt;a href="https://github.com/leeguooooo/chatgpt-imagegen" rel="noopener noreferrer"&gt;&lt;code&gt;chatgpt-imagegen&lt;/code&gt;&lt;/a&gt;, and it is the main subject of this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not POST Directly Like the Codex Backend?
&lt;/h2&gt;

&lt;p&gt;The intuitive approach: image generation on the web is also just sending HTTP requests, so couldn’t you capture the traffic, grab the cookies, and replay it?&lt;/p&gt;

&lt;p&gt;Anyone who has tried gets stuck on chatgpt.com’s anti-bot defenses. It is worth breaking down the layers here, because the layer that actually blocks you is not the one most people expect.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Defense&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;Can a Bare Client Pass?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare edge checks&lt;/td&gt;
&lt;td&gt;Standard CF bot detection&lt;/td&gt;
&lt;td&gt;✅ Can pass&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sentinel proof of work&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;backend-api/sentinel/chat-requirements&lt;/code&gt; + in-page &lt;code&gt;sentinel/sdk.js&lt;/code&gt; computes a PoW token&lt;/td&gt;
&lt;td&gt;✅ Can pass; the algorithm is in the page JS and can be replicated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloudflare Turnstile token&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One-time token produced by interactive verification&lt;/td&gt;
&lt;td&gt;❌ &lt;strong&gt;Cannot pass&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first two layers can both be simulated by a pure Python client. The real wall is the third layer: the Turnstile token can only be produced on the spot by a real, interactive browser, and it is valid for one use only. There is no shortcut where you “harvest a token in the browser first, then replay it headlessly.” The token is burned after one use; the next request needs a new one, and generating a new token requires a real browser to be present.&lt;/p&gt;
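
&lt;p&gt;The single-use property is what makes "harvest once, replay later" fail. As a toy model only (this is not Cloudflare's implementation, and &lt;code&gt;SingleUseTokenVerifier&lt;/code&gt; is a hypothetical name), a verifier that burns each token on first use behaves like this in Python:&lt;/p&gt;

```python
import secrets


class SingleUseTokenVerifier:
    """Toy server-side check: each issued token is accepted exactly once."""

    def __init__(self):
        self._outstanding = set()

    def issue(self):
        # Stands in for the interactive browser challenge producing a token.
        token = secrets.token_hex(16)
        self._outstanding.add(token)
        return token

    def redeem(self, token):
        # A token is valid only while outstanding; redeeming it burns it.
        if token in self._outstanding:
            self._outstanding.remove(token)
            return True
        return False


verifier = SingleUseTokenVerifier()
harvested = verifier.issue()
first_attempt = verifier.redeem(harvested)   # the browser's own request
replay_attempt = verifier.redeem(harvested)  # a headless replay of the same token
print(first_attempt, replay_attempt)
```

&lt;p&gt;The first redemption succeeds and the replay is rejected, which is why every request needs a live browser to mint a fresh token.&lt;/p&gt;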

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fykt7dzk932u0sjok6212.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fykt7dzk932u0sjok6212.png" alt="Three gates: the first two open, the third Turnstile gate locked shut"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the conclusion is pretty counterintuitive: you cannot bypass it; you have to be “inside.” Instead of forging something that only a browser can produce, just drive a real browser directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution: Drive Your Own Logged-In Chrome
&lt;/h2&gt;

&lt;p&gt;The web backend uses &lt;a href="https://github.com/leeguooooo/chrome-use" rel="noopener noreferrer"&gt;&lt;code&gt;chrome-use&lt;/code&gt;&lt;/a&gt; (a browser automation CLI with a Chrome extension) to connect to your real Chrome instance where you’re already logged into chatgpt.com, then generates the image inside a normal conversation. It’s the same interface, the same cookies, and the same Turnstile context as when you manually type “draw me a picture” in the app.&lt;/p&gt;

&lt;p&gt;Full flow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chatgpt-imagegen --backend web
   │
   ├── chrome-use connects to your logged-in Chrome and opens https://chatgpt.com/
   │     (must be a normal conversation; Temporary Chat disables image generation tools)
   │
   ├── resolves the ChatGPT project (--project, default: imagegen)
   │     fetches the project list in-page; if missing, POSTs /backend-api/projects to create one
   │     archives image-generation conversations into this project to avoid polluting main history
   │
   ├── types the prompt into the input box using real keyboard events
   │     ChatGPT’s ProseMirror/React input does not accept plain DOM .value= / fill,
   │     so key-by-key typing must be simulated, otherwise the submitted prompt is empty
   │
   ├── polls the page: waits for streaming output to finish and for a new &amp;lt;img&amp;gt; resource to stabilize
   │
   └── fetches the image bytes in-page (credentials:'include') → base64 → writes to disk
         (the signed estuary/content URL is authorized by the browser’s own cookies;
          the token never leaves the browser)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few of these details were learned the hard way.&lt;/p&gt;

&lt;p&gt;Directly setting &lt;code&gt;value&lt;/code&gt; on the ProseMirror input box, or using an automation tool’s &lt;code&gt;fill&lt;/code&gt;, does not work. React does not treat it as user input; you have to send real keyboard events.&lt;/p&gt;

&lt;p&gt;To decide that “the image is ready,” you cannot only look at whether streaming has ended. You also have to wait until the newly appeared &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; resource inside the conversation’s &lt;code&gt;main&lt;/code&gt; container has stabilized and its URL stops changing; otherwise you may capture a placeholder image or the image from the previous run.&lt;/p&gt;
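
&lt;p&gt;The URL-stability half of that check can be sketched roughly as follows. This is a minimal Python sketch, not the tool's actual code: &lt;code&gt;read_image_url&lt;/code&gt; is a hypothetical callable standing in for whatever reads the newest image URL from the conversation's &lt;code&gt;main&lt;/code&gt; container, and the repeat count and poll interval are arbitrary assumptions. The separate "streaming has finished" check is left out.&lt;/p&gt;

```python
import time


def wait_for_stable_image(read_image_url, known_urls, required_repeats=3,
                          poll_interval=0.5, max_polls=120):
    """Return the new image URL once it has stayed identical for
    `required_repeats` consecutive polls, or None on timeout.

    read_image_url: hypothetical callable returning the newest image URL
                    inside the conversation's main container (or None).
    known_urls:     URLs already on the page before submitting, so a
                    previous run's image is never mistaken for the new one.
    """
    last_url = None
    repeats = 0
    for _ in range(max_polls):
        url = read_image_url()
        if url is None or url in known_urls:
            # Nothing new yet: reset the stability counter.
            last_url, repeats = None, 0
        elif url == last_url:
            repeats += 1
        else:
            # First sighting of this URL counts as one observation.
            last_url, repeats = url, 1
        if repeats == required_repeats:
            return last_url
        time.sleep(poll_interval)
    return None


# Simulated page: an old image, then a placeholder, then the final image.
frames = iter(["old.png", "placeholder.png", "final.png", "final.png", "final.png"])
result = wait_for_stable_image(lambda: next(frames, None), {"old.png"}, poll_interval=0)
print(result)
```

&lt;p&gt;In the simulated run, the old image is ignored, the placeholder is seen only once and discarded, and only the URL that repeats three times is returned.&lt;/p&gt;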

&lt;p&gt;The image bytes are not downloaded externally with &lt;code&gt;curl&lt;/code&gt; either. That image URL is signed and requires cookies to download, so the page itself calls &lt;code&gt;fetch(..., {credentials:'include'})&lt;/code&gt;, letting the browser authorize the request with its own session. The token never leaves the browser.&lt;/p&gt;
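
&lt;p&gt;On the Python side, what comes back from the page is then just a base64 string. A minimal sketch of the final "base64 → disk" step, assuming the in-page script returns either raw base64 or a data URL (&lt;code&gt;save_base64_image&lt;/code&gt; is a hypothetical helper, not part of the tool's API):&lt;/p&gt;

```python
import base64
import os
import tempfile
from pathlib import Path


def save_base64_image(payload, out_path):
    """Decode a base64 string handed back by the page and write it to disk.

    Returns the number of bytes written.
    """
    # Strip an optional data-URL prefix such as "data:image/png;base64,".
    if payload.startswith("data:"):
        payload = payload.split(",", 1)[1]
    data = base64.b64decode(payload, validate=True)
    if not data:
        raise ValueError("empty image payload")
    Path(out_path).write_bytes(data)
    return len(data)


# Demo with stand-in bytes instead of a real image.
fake_image = bytes([0x89, 0x50, 0x4E, 0x47]) + b"demo-bytes"
payload = "data:image/png;base64," + base64.b64encode(fake_image).decode("ascii")
out_file = os.path.join(tempfile.mkdtemp(), "cat.png")
written = save_base64_image(payload, out_file)
print(written)
```

&lt;p&gt;Only the decoded bytes cross the boundary; the cookies and the signed URL stay inside the browser.&lt;/p&gt;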

&lt;h2&gt;
  
  
  A String of Unavoidable Engineering Pitfalls
&lt;/h2&gt;

&lt;p&gt;Getting from “it runs” to “it runs reliably” involved real battle scars.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The web backend’s concurrency is capped at 1.&lt;/strong&gt; Every run shares the same logged-in Chrome. In early versions, concurrent image generation could cross-contaminate outputs (&lt;a href="https://github.com/leeguooooo/chatgpt-imagegen/issues/7" rel="noopener noreferrer"&gt;#7&lt;/a&gt;, fixed in v0.6.0 by limiting detection to the current conversation’s &lt;code&gt;main&lt;/code&gt; container). On top of that, chatgpt.com applies aggressive page-side rate limits (“Too many requests… temporarily limited access to your conversations”). So web runs are serialized across processes: extra processes queue on a &lt;code&gt;flock&lt;/code&gt; slot. That is safe, but wall-clock time is roughly the sum of the individual runs. If you want real parallelism (up to 4), explicitly pass &lt;code&gt;--backend codex&lt;/code&gt; and accept that it spends Codex quota. Saving quota versus getting output faster is your call; the tool does not decide for you, it just follows &lt;code&gt;--backend&lt;/code&gt; or the default &lt;code&gt;auto&lt;/code&gt;.&lt;/p&gt;
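
&lt;p&gt;For readers who have not used &lt;code&gt;flock&lt;/code&gt;, here is a sketch of cross-process serialization with Python's standard &lt;code&gt;fcntl&lt;/code&gt; module on a Unix-like system. The lock path and the &lt;code&gt;exclusive_slot&lt;/code&gt; helper are assumptions for illustration; the tool's real lock file and code may look different.&lt;/p&gt;

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_slot(lock_path, blocking=True):
    """Hold an exclusive flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # In blocking mode this call waits until the slot is free.
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Demo: while one runner holds the slot, a second non-blocking attempt fails.
lock_path = os.path.join(tempfile.mkdtemp(), "web-backend.lock")
with exclusive_slot(lock_path):
    try:
        with exclusive_slot(lock_path, blocking=False):
            second_runner_got_slot = True
    except BlockingIOError:
        second_runner_got_slot = False
print(second_runner_got_slot)
```

&lt;p&gt;With the default blocking mode, a second process simply waits its turn, which is the queueing behavior described above. The kernel releases the lock if the holder crashes, so no stale-lock cleanup is needed.&lt;/p&gt;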

&lt;p&gt;&lt;strong&gt;Rate limits must fail fast, not retry blindly.&lt;/strong&gt; When the page shows “Too many requests,” the web backend detects the popup and errors out immediately. If that happens before submission, &lt;code&gt;auto&lt;/code&gt; mode falls back to Codex. If it happens after submission, the run stops cleanly instead of falling back, so the same request is never paid for twice. The image may still show up in the conversation later, so check manually.&lt;/p&gt;
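
&lt;p&gt;That decision rule fits in a few lines. The following Python sketch is an assumption about the shape of the logic, not the tool's source: &lt;code&gt;RateLimited&lt;/code&gt;, &lt;code&gt;generate&lt;/code&gt;, &lt;code&gt;run_web&lt;/code&gt;, and &lt;code&gt;run_codex&lt;/code&gt; are all hypothetical names.&lt;/p&gt;

```python
class RateLimited(Exception):
    """Raised when the page shows the "Too many requests" popup."""

    def __init__(self, submitted):
        super().__init__("rate limited")
        self.submitted = submitted  # True if the prompt was already sent


def generate(prompt, backend, run_web, run_codex):
    """Pick a backend; fall back to codex only if nothing was submitted yet."""
    if backend == "codex":
        return run_codex(prompt)
    try:
        return run_web(prompt)
    except RateLimited as err:
        if backend == "auto" and not err.submitted:
            # Nothing was sent, so trying the other backend cannot double-spend.
            return run_codex(prompt)
        # Already submitted (or web was forced): stop, never retry.
        raise


def limited_before_submit(prompt):
    raise RateLimited(submitted=False)


def limited_after_submit(prompt):
    raise RateLimited(submitted=True)


def fake_codex(prompt):
    return "codex:" + prompt


fallback_result = generate("a cat", "auto", limited_before_submit, fake_codex)
try:
    generate("a cat", "auto", limited_after_submit, fake_codex)
    stopped_cleanly = False
except RateLimited:
    stopped_cleanly = True
print(fallback_result, stopped_cleanly)
```

&lt;p&gt;Note that there is no retry loop anywhere: a rate-limit signal either triggers one fallback or ends the run.&lt;/p&gt;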

&lt;p&gt;&lt;strong&gt;History is not kept by default.&lt;/strong&gt; Image-generation conversations are deleted by default with &lt;code&gt;PATCH is_visible:false&lt;/code&gt;. They are only temporarily moved into the project as a handoff step, and after the run they leave no trace in your ChatGPT history (&lt;code&gt;--keep-conversation&lt;/code&gt; preserves them).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image-to-image uses the same path.&lt;/strong&gt; &lt;code&gt;-i/--ref&lt;/code&gt; uploads the reference image into the ChatGPT input box and then sends an edit prompt, the same mechanism as manually dragging in an image and asking it to modify it: still subscription-based, still no key required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two New Things: Style Presets + Proactively Illustrating Docs
&lt;/h2&gt;

&lt;p&gt;A style preset is a reusable prompt fragment. Save it under a name, then apply it with one parameter:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;chatgpt-imagegen &lt;span class="s2"&gt;"a robot mascot"&lt;/span&gt; &lt;span class="nt"&gt;--style&lt;/span&gt; doodle
chatgpt-imagegen style add brand &lt;span class="s2"&gt;"flat vector, bold shapes, white bg"&lt;/span&gt;
chatgpt-imagegen style use brand        &lt;span class="c"&gt;# set as default; applied automatically after that&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There’s a built-in &lt;code&gt;doodle&lt;/code&gt; style that is intentionally terrible, like something scrawled with a mouse in an old-school drawing program. The “ugly-cute” illustrations in the repo’s README, including the ones in this article, were generated by the tool itself using that style. No style is applied out of the box; unless you pass &lt;code&gt;--style&lt;/code&gt; or set a default, the tool behaves exactly as before.&lt;/p&gt;
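
&lt;p&gt;Conceptually, a preset is just text attached to your prompt. The Python sketch below shows one plausible way to resolve it; how the tool actually joins the fragment to the prompt is an assumption here, and &lt;code&gt;build_prompt&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
def build_prompt(prompt, presets, style=None, default_style=None):
    """Append the chosen style fragment to the prompt, if any.

    An explicit style wins over the saved default; with neither set,
    the prompt is returned unchanged.
    """
    name = style or default_style
    if name is None:
        return prompt
    if name not in presets:
        raise KeyError("unknown style preset: " + name)
    return prompt + ", " + presets[name]


presets = {"brand": "flat vector, bold shapes, white bg"}
plain = build_prompt("a robot mascot", presets)
styled = build_prompt("a robot mascot", presets, style="brand")
print(plain)
print(styled)
```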

&lt;p&gt;Proactive illustration is for AI agents. Once installed as a skill, when an agent writes a blog post, technical proposal, or design doc, it will proactively suggest illustrations and generate them in parallel in the background, instead of waiting for you to ask for images. The degree of parallelism depends on your backend configuration: &lt;code&gt;web&lt;/code&gt; runs serially to conserve quota, while &lt;code&gt;codex&lt;/code&gt; runs in parallel and consumes quota.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Choose Between the Two Backends
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Which backend to use&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Laptop/desktop, with Chrome open and logged in&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;web&lt;/strong&gt; (tried first by the default &lt;code&gt;auto&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Doesn’t spend Codex quota; works even with a free account&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server / headless agent machine&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;codex&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;There’s no browser there, and &lt;code&gt;auto&lt;/code&gt; will fall back on its own&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need truly parallel batch image generation&lt;/td&gt;
&lt;td&gt;codex&lt;/td&gt;
&lt;td&gt;web is serial; codex supports up to 4-way parallelism, but consumes quota&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By default, &lt;code&gt;auto&lt;/code&gt; tries web first and falls back to codex if that fails, which means it saves your Codex quota by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Not to Use This, and Go Straight to the Official API
&lt;/h2&gt;

&lt;p&gt;The subscription channel is not a free version of the API; it’s a different product form with its own boundaries.&lt;/p&gt;

&lt;p&gt;If you strictly need &lt;code&gt;quality=high&lt;/code&gt; or a transparent background, the subscription cannot provide that. You need to use &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; and call &lt;code&gt;/v1/images/generations&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you’re building an external production service, using a personal subscription to generate images for end users violates the &lt;a href="https://openai.com/policies/row-terms-of-use/" rel="noopener noreferrer"&gt;OpenAI Terms&lt;/a&gt; and will also burn through your own ChatGPT quota. One more thing: this tool rides on an unpublished internal endpoint. Large-scale abuse is the fastest way to get that opening shut down, and if it’s shut down, everyone loses it.&lt;/p&gt;

&lt;p&gt;If you need stable throughput above 10 images per minute, use the API: subscription rate limits are tighter than the API’s.&lt;/p&gt;

&lt;p&gt;If you need a team-level, remotely callable HTTP gateway, use the sister project &lt;a href="https://github.com/leeguooooo/agent-cli-to-api" rel="noopener noreferrer"&gt;agent-cli-to-api&lt;/a&gt;, which exposes the same subscription as an OpenAI-compatible interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install chrome-use (for the web backend), and connect it to Chrome where you’re logged into chatgpt.com&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh
chrome-use extension &lt;span class="nb"&gt;install&lt;/span&gt;   &lt;span class="c"&gt;# Then install the Chrome extension, restart, and log in to chatgpt.com&lt;/span&gt;

&lt;span class="c"&gt;# Install the skill (for the agent; easiest option)&lt;/span&gt;
npx skills add leeguooooo/chatgpt-imagegen &lt;span class="nt"&gt;-g&lt;/span&gt;

&lt;span class="c"&gt;# Or use it standalone&lt;/span&gt;
git clone https://github.com/leeguooooo/chatgpt-imagegen
&lt;span class="nb"&gt;cd&lt;/span&gt; chatgpt-imagegen
./chatgpt-imagegen &lt;span class="s2"&gt;"a watercolor cat on a windowsill"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; cat.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not having &lt;code&gt;chrome-use&lt;/code&gt; installed is fine: &lt;code&gt;auto&lt;/code&gt; automatically falls back to codex, with only a one-line stderr note saying “installing chrome-use lets image generation avoid using Codex quota.”&lt;/p&gt;

&lt;h2&gt;
  
  
  A Few Common Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Will this get my account banned?&lt;/strong&gt; Probably not, but there are red lines. The web backend drives your own logged-in browser and does things you could also do manually, so the traffic is no different from clicking around in the app a few times. The codex backend replays the Codex CLI protocol with a real auth token, so it looks like normal Codex usage. There are two real risks. The first is volume: sustained &amp;gt;10 images/minute, or fanning out dozens of images at once, will hit rate limits, and pushing through them over the long term is likely to draw attention. The second is using a personal subscription to provide an external image-generation service, which is a clear terms-of-service red line. The tool keeps its footprint low by default: image-generation chats are deleted, chats are grouped into a project, rate-limit popups fail fast without retries, and concurrency is capped. Personal local use at normal volume is low-risk, but it sits on unpublished endpoints, so use it at your own risk and don’t abuse it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is &lt;code&gt;chrome-use&lt;/code&gt; especially wasteful with tokens?&lt;/strong&gt; For this tool, the image-generation process does not consume any LLM tokens. The web backend does not have an AI watch the screen and operate the browser step by step; it uses a fixed Python script to call &lt;code&gt;chrome-use&lt;/code&gt; with hard-coded steps, with no model inference in between. What really burns tokens is the screenshot-driven approach where every step is fed to a large model. &lt;code&gt;chrome-use&lt;/code&gt; itself is the opposite: it uses accessibility-tree snapshots plus compact &lt;code&gt;@eN&lt;/code&gt; references, around 200–400 tokens per step, which is much cheaper for an agent than dumping raw HTML or screenshots.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Notes
&lt;/h2&gt;

&lt;p&gt;The codex backend replays the official protocol, while the web backend gets the job done in a real browser. The latter takes a more roundabout path, but gives you three things in return: no API key required, no Codex quota consumed, and free accounts can use it too. The hard part is not the Cloudflare edge; it is the Turnstile token that only a real browser can generate and that expires after a single use. Once you recognize that, the solution shifts from “forging it” to “operating directly inside it.”&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/leeguooooo/chatgpt-imagegen" rel="noopener noreferrer"&gt;leeguooooo/chatgpt-imagegen&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HTTP gateway sister project: &lt;a href="https://github.com/leeguooooo/agent-cli-to-api" rel="noopener noreferrer"&gt;agent-cli-to-api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;skill installation: &lt;code&gt;npx skills add leeguooooo/chatgpt-imagegen -g&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disclaimer: This tool calls ChatGPT’s internal &lt;code&gt;backend-api&lt;/code&gt; endpoint (the same one used by Codex CLI), not a public API with documented guarantees. OpenAI may change or restrict it at any time. Please use it only for personal or local agent use within the scope permitted by the &lt;a href="https://openai.com/policies/row-terms-of-use/" rel="noopener noreferrer"&gt;OpenAI Terms of Use&lt;/a&gt;, and do not offer commercial image-generation services to others.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🔧 &lt;strong&gt;The tool:&lt;/strong&gt; &lt;a href="https://github.com/leeguooooo/chatgpt-imagegen" rel="noopener noreferrer"&gt;chatgpt-imagegen on GitHub&lt;/a&gt; — generate images from your ChatGPT subscription, no API key.&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;More writing:&lt;/strong&gt; &lt;a href="https://blog.leeguoo.com" rel="noopener noreferrer"&gt;blog.leeguoo.com&lt;/a&gt; — I'm &lt;a href="https://leeguoo.com/about" rel="noopener noreferrer"&gt;Guo Li (leeguoo)&lt;/a&gt;, a full-stack dev building small AI-agent tools and CLIs.&lt;/li&gt;
&lt;li&gt;💬 Found it useful? A ⭐ on the repo or a follow here means a lot.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>python</category>
      <category>cli</category>
    </item>
  </channel>
</rss>
