郭立

Posted on Jun 22 • Originally published at blog.leeguoo.com

Let Claude Code Generate Images with Your ChatGPT Subscription (No API Key)

#ai #chatgpt #python #cli


You ask Claude Code to write a README or a technical document. Can it also generate the accompanying images along the way? Yes, and it needs no OPENAI_API_KEY and no extra payment: it uses the ChatGPT subscription you already have. The doodle illustrations in this article were generated by Claude Code itself while writing it.

There are two ways to wire image generation into an agent. They differ in which bucket of your subscription they draw from, and in whether free accounts can use them. This article focuses on the more interesting of the two behind the scenes: the web backend, and how it gets around chatgpt.com’s anti-scraping defenses.

First, Let’s Correct a Common Misunderstanding

Many people think there is only one cost-saving way to “generate images with a ChatGPT subscription”: reuse Codex CLI’s OAuth token and directly POST to backend-api/codex/responses. That was the approach covered in my previous article, “Turning a ChatGPT Subscription into an Image Generation API”.

There were two things it did not fully explain.

First, it consumes metered Codex quota. A subscription is not a single bucket; it has two separate rate-limit buckets: one for chat on the ChatGPT web app and one for Codex usage. Calling codex/responses spends the latter, which is the very bucket you least want to drain when you use Codex CLI to write code.

Second, it requires you to have installed Codex CLI and run codex login. Free ChatGPT accounts do not have Codex, so this path simply does not work.

So the question becomes: can we generate images on the “web conversation” path instead? Free accounts have that path too. Free users can already generate images in the ChatGPT web app; it spends chat quota and does not touch Codex usage at all.

Yes, but the price is that the image really has to be generated on the web side. That is the web backend of chatgpt-imagegen, and it is the main subject of this article.

Why Not POST Directly Like the Codex Backend?

The intuitive approach: image generation on the web is also just sending HTTP requests, so couldn’t you capture the traffic, grab the cookies, and replay it?

Anyone who has tried gets stuck on chatgpt.com’s anti-bot defenses. It is worth breaking down the layers here, because the layer that actually blocks you is not the one most people expect.

| Defense | What It Is | Can a Bare Client Pass? |
| --- | --- | --- |
| Cloudflare edge checks | Standard CF bot detection | ✅ Can pass |
| Sentinel proof of work | `backend-api/sentinel/chat-requirements` + in-page `sentinel/sdk.js` computes a PoW token | ✅ Can pass; the algorithm is in the page JS and can be replicated |
| Cloudflare Turnstile token | One-time token produced by interactive verification | ❌ Cannot pass |

The first two layers can both be simulated by a pure Python client. The real wall is the third layer: the Turnstile token can only be produced on the spot by a real, interactive browser, and it is valid for one use only. There is no shortcut where you “harvest a token in the browser first, then replay it headlessly.” The token is burned after one use; the next request needs a new one, and generating a new token requires a real browser to be present.
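To make the "valid for one use only" constraint concrete, here is a toy model of a verifier that burns each token on first use. It says nothing about how Turnstile is actually implemented; `OneTimeTokenVerifier` and its methods are hypothetical names, used only to show why a harvested token cannot be replayed.

```python
import secrets


class OneTimeTokenVerifier:
    """Toy server-side model: every issued token is accepted exactly once."""

    def __init__(self) -> None:
        self._unspent: set[str] = set()

    def issue(self) -> str:
        # Stands in for the interactive browser challenge producing a token.
        token = secrets.token_urlsafe(16)
        self._unspent.add(token)
        return token

    def redeem(self, token: str) -> bool:
        # A token is valid only if it was issued and has not been used yet.
        if token in self._unspent:
            self._unspent.remove(token)
            return True
        return False


verifier = OneTimeTokenVerifier()
harvested = verifier.issue()
first_attempt = verifier.redeem(harvested)   # the original request succeeds
replay_attempt = verifier.redeem(harvested)  # replaying the same token fails
```

Every new request therefore needs a fresh `issue()` call, and in the real system that step is the part only an interactive browser can perform.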

So the conclusion is pretty counterintuitive: you cannot bypass it; you have to be “inside.” Instead of forging something that only a browser can produce, just drive a real browser directly.

Solution: Drive Your Own Logged-In Chrome

The web backend uses chrome-use (a browser automation CLI with a Chrome extension) to connect to your real Chrome instance where you’re already logged into chatgpt.com, then generates the image inside a normal conversation. It’s the same interface, the same cookies, and the same Turnstile context as when you manually type “draw me a picture” in the app.

Full flow:

chatgpt-imagegen --backend web
   │
   ├── chrome-use connects to your logged-in Chrome and opens https://chatgpt.com/
   │     (must be a normal conversation; Temporary Chat disables image generation tools)
   │
   ├── resolves the ChatGPT project (--project, default: imagegen)
   │     fetches the project list in-page; if missing, POSTs /backend-api/projects to create one
   │     archives image-generation conversations into this project to avoid polluting main history
   │
   ├── types the prompt into the input box using real keyboard events
   │     ChatGPT’s ProseMirror/React input does not accept plain DOM .value= / fill,
   │     so key-by-key typing must be simulated, otherwise the submitted prompt is empty
   │
   ├── polls the page: waits for streaming output to finish and for a new <img> resource to stabilize
   │
   └── fetches the image bytes in-page (credentials:'include') → base64 → writes to disk
         (the signed estuary/content URL is authorized by the browser’s own cookies;
          the token never leaves the browser)

A few of these details were learned the hard way.

Directly setting value on the ProseMirror input box, or using an automation tool’s fill, does not work. React does not treat it as user input; you have to send real keyboard events.
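A minimal sketch of what "key by key" means in practice, assuming the automation layer exposes some way to press a single key. The `press_key` callback and the key names are hypothetical placeholders, not chrome-use's real interface, and the Shift+Enter handling for newlines is my assumption about how to avoid submitting a multi-line prompt early.

```python
from typing import Callable, List


def prompt_to_keystrokes(prompt: str) -> List[str]:
    """Expand a prompt into one key name per character, then a final Enter."""
    keys: List[str] = []
    for ch in prompt:
        if ch == "\n":
            # Assumption: a bare Enter would submit early, so use Shift+Enter.
            keys.append("Shift+Enter")
        else:
            keys.append(ch)
    keys.append("Enter")  # submit once the whole prompt has been typed
    return keys


def type_prompt(prompt: str, press_key: Callable[[str], None]) -> int:
    """Send each keystroke through the injected press_key callback."""
    keys = prompt_to_keystrokes(prompt)
    for key in keys:
        press_key(key)
    return len(keys)


# Record the keystrokes in a list instead of driving a real browser.
sent: List[str] = []
count = type_prompt("a cat\non a mat", sent.append)
```

Injecting `press_key` keeps the expansion logic testable without a browser attached.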

To decide that “the image is ready,” you cannot only look at whether streaming has ended. You also have to wait until the newly appeared <img> resource inside the conversation’s main container has stabilized and its URL stops changing; otherwise you may capture a placeholder image or the image from the previous run.
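The "wait until the URL stops changing" check can be sketched as a small polling loop. This is an illustration under assumptions rather than the tool's actual code: `read_url` is a hypothetical callback that returns the current `<img>` URL (or `None`), `ignore` stands for the previous run's image, and the thresholds are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_until_stable(
    read_url: Callable[[], Optional[str]],
    stable_polls: int = 3,
    max_polls: int = 60,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    ignore: Optional[str] = None,
) -> str:
    """Return a URL once it was seen unchanged `stable_polls` times in a row."""
    last: Optional[str] = None
    streak = 0
    for _ in range(max_polls):
        url = read_url()
        if url is None or url == ignore:
            # Nothing new yet (or still the previous run's image): reset.
            last, streak = None, 0
        elif url == last:
            streak += 1
        else:
            last, streak = url, 1
        if url is not None and streak >= stable_polls:
            return url
        sleep(interval)
    raise TimeoutError("image URL did not stabilize in time")


# Simulated page readings: stale image, a placeholder, then the final image.
readings = iter([None, "old.png", "placeholder", "final.png", "final.png", "final.png"])
result = wait_until_stable(lambda: next(readings), ignore="old.png", sleep=lambda s: None)
```

The placeholder is seen only once, so it never reaches the required streak, and the stale image is filtered out by `ignore`.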

The image bytes are not downloaded externally with curl either. That image URL is signed and requires cookies to download, so the page itself calls fetch(..., {credentials:'include'}), letting the browser authorize the request with its own session. The token never leaves the browser.
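On the Python side, the last step is just decoding whatever base64 string the page hands back and writing it to disk. A minimal sketch, assuming the payload arrives either as raw base64 or as a `data:` URL and that the image is a PNG; both are my assumptions, and the function names are hypothetical.

```python
import base64
import tempfile
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def decode_image_payload(payload: str) -> bytes:
    """Decode base64 image data, accepting an optional data: URL prefix."""
    if payload.startswith("data:"):
        # Drop everything up to the comma in "data:image/png;base64,<data>".
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload, validate=True)


def save_image(payload: str, out_path: Path) -> int:
    """Write the decoded bytes to disk and return the byte count."""
    data = decode_image_payload(payload)
    if not data.startswith(PNG_MAGIC):
        raise ValueError("payload does not look like a PNG")
    out_path.write_bytes(data)
    return len(data)


# Stand-in for what the in-page fetch would hand back.
fake_bytes = PNG_MAGIC + b"not a real image body"
payload = "data:image/png;base64," + base64.b64encode(fake_bytes).decode("ascii")
out_file = Path(tempfile.mkdtemp()) / "cat.png"
written = save_image(payload, out_file)
```

Checking the magic bytes before writing is a cheap way to catch the case where the page returned an error body instead of an image.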

A String of Unavoidable Engineering Pitfalls

Getting from “it runs” to “it runs reliably” involved real battle scars.

The web backend’s concurrency is fixed at 1, for two reasons. First, every run shares the same logged-in Chrome; in early versions, concurrent image generation could cross-contaminate outputs (#7, fixed in v0.6.0 by limiting detection to the current conversation’s main container). Second, chatgpt.com applies aggressive page-side rate limits (“Too many requests… temporarily limited access to your conversations”). So web runs are serialized across processes: extra processes queue on a flock slot. This is safe, but wall-clock time is roughly the sum of the serial runs. If you want real parallelism (up to 4), explicitly use --backend codex, at the price of spending Codex quota. It is a trade between saving quota and faster output, and the tool does not decide for you; it follows --backend or the default auto.
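For readers who have not used flock before, here is a minimal sketch of cross-process serialization with Python's `fcntl.flock` (Unix only). The lock file name and the `exclusive_slot` helper are hypothetical; the snippet illustrates the general technique, not the tool's actual locking code.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_slot(lock_path: str, blocking: bool = True) -> Iterator[None]:
    """Hold an exclusive flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)  # blocks here while another holder exists
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


lock_file = os.path.join(tempfile.mkdtemp(), "web-backend.lock")
second_caller_blocked = False
with exclusive_slot(lock_file):
    try:
        # A second, non-blocking attempt stands in for a competing process.
        with exclusive_slot(lock_file, blocking=False):
            pass
    except BlockingIOError:
        second_caller_blocked = True
```

With the default `blocking=True`, a second process would simply wait at `fcntl.flock` until the first one finishes, which matches the queueing behaviour described above.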

Rate limits must fail fast, not retry blindly. When the page shows “Too many requests,” the web backend detects the popup and errors out immediately. If that happens before the prompt is submitted, auto mode falls back to Codex; if it happens after submission, it stops cleanly rather than paying for the same image twice. The image may still appear in the conversation later, so check manually.
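The decision comes down to two facts: which backend mode was requested, and whether the prompt had already been submitted. A sketch of that rule as a pure function follows; the `Action` enum and `on_rate_limit` are hypothetical names, and the behaviour for an explicit `--backend web` is my reading of "errors immediately".

```python
from enum import Enum


class Action(Enum):
    FALL_BACK_TO_CODEX = "fall back to codex"
    STOP = "stop without retrying"
    RAISE = "report the error"


def on_rate_limit(backend_mode: str, prompt_submitted: bool) -> Action:
    """Decide what to do when the page shows a rate-limit popup."""
    if prompt_submitted:
        # The request may already be running; retrying could pay twice.
        return Action.STOP
    if backend_mode == "auto":
        # Nothing was sent yet, so switching backends is safe.
        return Action.FALL_BACK_TO_CODEX
    # An explicit --backend web is respected: fail instead of switching.
    return Action.RAISE
```

Note that no branch retries on the web backend, which is the "fail fast" part.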

History is not kept by default. Image-generation conversations are deleted (PATCH is_visible:false) once the run finishes. Moving them into the project is only a temporary handoff step, so after the run they leave no trace in your ChatGPT history; pass --keep-conversation to preserve them.

Image-to-image uses the same path. -i/--ref uploads the reference image into the ChatGPT input box and then sends an edit prompt, the same mechanism as manually dragging in an image and asking it to modify it: still subscription-based, still no key required.

Two New Things: Style Presets + Proactively Illustrating Docs

A style preset is a reusable prompt fragment. Save it under a name, then apply it with one parameter:

chatgpt-imagegen "a robot mascot" --style doodle
chatgpt-imagegen style add brand "flat vector, bold shapes, white bg"
chatgpt-imagegen style use brand        # set as default; applied automatically after that

There’s a built-in doodle style that is intentionally terrible, like something scraped out with a mouse in an old-school drawing program. The “ugly-cute” illustrations in this repo’s README, including the ones in this article, were generated by the tool itself using that style. There’s no default style out of the box; if you don’t actively use one, it behaves exactly as before.
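Conceptually, a style preset is a named string that gets combined with the prompt. The sketch below models that idea with a hypothetical `StyleBook` class; how the real tool stores presets and joins the fragment onto the prompt is not described here, so the comma-append rule is an assumption.

```python
from typing import Dict, Optional


class StyleBook:
    """Named prompt fragments with an optional default."""

    def __init__(self) -> None:
        self._styles: Dict[str, str] = {}
        self._default: Optional[str] = None

    def add(self, name: str, fragment: str) -> None:
        self._styles[name] = fragment

    def use(self, name: str) -> None:
        if name not in self._styles:
            raise KeyError(f"unknown style: {name}")
        self._default = name

    def apply(self, prompt: str, style: Optional[str] = None) -> str:
        # An explicit style wins; otherwise fall back to the default, if any.
        chosen = style or self._default
        if chosen is None:
            return prompt  # no style configured: prompt passes through as-is
        return f"{prompt}, {self._styles[chosen]}"


book = StyleBook()
book.add("brand", "flat vector, bold shapes, white bg")
plain = book.apply("a robot mascot")   # no default yet
book.use("brand")
styled = book.apply("a robot mascot")  # default now applies
```

The pass-through branch mirrors the "no default style out of the box" behaviour: until a style is selected, prompts are untouched.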

Proactive illustration is for AI agents. Once installed as a skill, when an agent writes a blog post, technical proposal, or design doc, it will proactively suggest illustrations and generate them in the background, instead of waiting for you to ask for images. How parallel that is depends on your backend configuration: web runs serially and saves Codex quota, while codex runs in parallel and spends it.

How to Choose Between the Two Backends

| Scenario | Which backend to use | Why |
| --- | --- | --- |
| Laptop/desktop, with Chrome open and logged in | web (default) | Doesn’t spend Codex quota; works even with a free account |
| Server / headless agent machine | codex | There’s no browser there, and `auto` will fall back on its own |
| Need truly parallel batch image generation | codex | web is serial; codex supports up to 4-way parallelism, but consumes quota |

By default, auto tries web first and falls back to codex if that fails, which means it saves your Codex quota by default.
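The resolution order can be written down in a few lines. This is a sketch under assumptions: `backend_order` is a hypothetical helper, and detecting chrome-use with a PATH lookup (`shutil.which`) is my guess at a reasonable availability check, not a description of the tool's internals.

```python
import shutil
from typing import Callable, List, Optional


def backend_order(
    requested: str = "auto",
    find_executable: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the backends to try, in order, for a given --backend value."""
    if requested in ("web", "codex"):
        return [requested]  # an explicit choice is never overridden
    if requested != "auto":
        raise ValueError(f"unknown backend: {requested}")
    if find_executable("chrome-use") is None:
        # Assumption: chrome-use availability is detected via PATH lookup.
        return ["codex"]
    return ["web", "codex"]
```

Passing `find_executable` as a parameter makes the rule easy to exercise on a machine with or without chrome-use installed.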

When Not to Use This, and Go Straight to the Official API

The subscription channel is not a free version of the API; it’s a different product form with its own boundaries.

If you strictly need quality=high or a transparent background, the subscription cannot provide that. You need to use OPENAI_API_KEY and call /v1/images/generations.

If you’re building an external production service, using a personal subscription to generate images for end users violates the OpenAI Terms and will also burn through your own ChatGPT quota. One more thing: this tool rides on an unpublished internal endpoint. Large-scale abuse is the fastest way to get that opening shut down, and if it’s shut down, everyone loses it.

If you need stable throughput above 10 images per minute, use the API: subscription rate limits are tighter than the API’s.

If you need a team-level, remotely callable HTTP gateway, use the sister project agent-cli-to-api, which exposes the same subscription as an OpenAI-compatible interface.

Getting Started

# Install chrome-use (for the web backend), and connect it to Chrome where you’re logged into chatgpt.com
curl -fsSL https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh
chrome-use extension install   # Then install the Chrome extension, restart, and log in to chatgpt.com

# Install as an agent skill (easiest option)
npx skills add leeguooooo/chatgpt-imagegen -g

# Or use it standalone
git clone https://github.com/leeguooooo/chatgpt-imagegen
cd chatgpt-imagegen
./chatgpt-imagegen "a watercolor cat on a windowsill" -o cat.png

Not having chrome-use installed is fine: auto automatically falls back to codex, with only a one-line stderr note saying “installing chrome-use lets image generation avoid using Codex quota.”
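If you drive the CLI from your own Python script rather than through an agent, building the argument list explicitly avoids shell-quoting problems with prompts. The sketch below uses only flags that appear in this article; `build_command` is a hypothetical helper, and whether the flags can be freely combined in this order is an assumption.

```python
import shlex
from typing import List, Optional


def build_command(
    prompt: str,
    output: str,
    backend: Optional[str] = None,
    style: Optional[str] = None,
    ref_image: Optional[str] = None,
    keep_conversation: bool = False,
) -> List[str]:
    """Assemble an argv list using only flags mentioned in this article."""
    argv = ["chatgpt-imagegen", prompt, "-o", output]
    if backend:
        argv += ["--backend", backend]
    if style:
        argv += ["--style", style]
    if ref_image:
        argv += ["--ref", ref_image]
    if keep_conversation:
        argv.append("--keep-conversation")
    return argv


cmd = build_command(
    "a watercolor cat on a windowsill", "cat.png", backend="web", style="doodle"
)
printable = shlex.join(cmd)  # shell-safe string, handy for logging
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` means the prompt is never re-parsed by a shell, so quotes and spaces in it are safe.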

A Few Common Questions

Will this get my account banned? Probably not, but there are red lines. The web backend drives your own logged-in browser and does things you could also do manually, so the traffic is no different from clicking around in the app a few times. The codex backend replays the Codex CLI protocol with a real auth token, so it looks like normal Codex usage. The real risks are two things: first, volume — sustained >10 images/minute, or generating dozens of images in a large fan-out, will hit rate limits, and forcing it long-term is also likely to draw attention; second, using a personal subscription to provide an external image-generation service, which is a clear terms-of-service red line. The tool keeps its footprint low by default: image-generation chats are deleted by default, chats are grouped into a project, rate-limit popups fail fast without retries, and concurrency is capped. Personal local use at normal volume is low-risk, but it sits on unpublished endpoints, so use at your own risk and don’t abuse it.

Is chrome-use especially wasteful with tokens? For this tool, the image-generation process does not consume any LLM tokens. The web backend does not have an AI watch the screen and operate the browser step by step; it uses a fixed Python script to call chrome-use with hard-coded steps, with no model inference in between. What really burns tokens is the screenshot-driven approach where every step is fed to a large model. chrome-use itself is the opposite: it uses accessibility-tree snapshots plus compact @eN references, around 200–400 tokens per step, which is much cheaper for an agent than dumping raw HTML or screenshots.

Final Notes

The codex backend replays the official protocol, while the web backend gets the job done in a real browser. The latter takes a more roundabout path, but gives you three things in return: no API key required, no Codex quota consumed, and free accounts can use it too. The hard part is not the Cloudflare edge; it is the Turnstile token that only a real browser can generate and that expires after a single use. Once you recognize that, the solution shifts from “forging it” to “operating directly inside it.”

GitHub: leeguooooo/chatgpt-imagegen
HTTP gateway sister project: agent-cli-to-api
skill installation: npx skills add leeguooooo/chatgpt-imagegen -g

Disclaimer: This tool calls ChatGPT’s internal backend-api endpoint (the same one used by Codex CLI), not a public API with documented guarantees. OpenAI may change or restrict it at any time. Please use it only for personal or local agent use within the scope permitted by the OpenAI Terms of Use, and do not offer commercial image-generation services to others.
