💡 Originally published on my blog, blog.leeguoo.com: field notes on reverse engineering, AI agents, and building things that ship.
You ask Claude Code to write a README or a technical document. Can it also generate the accompanying images along the way? Yes, and without an OPENAI_API_KEY, without paying extra, using the ChatGPT subscription you're already paying for. The doodle illustrations in this article were generated by Claude Code itself while writing it.
There are two ways to connect image generation to an agent. They differ in which bucket of your subscription they draw from and whether free accounts can use them. This article focuses on the more interesting path behind the scenes: the web backend, and how it gets around chatgpt.com's anti-scraping defenses.
First, Let's Correct a Common Misunderstanding
Many people think there is only one cost-saving way to "generate images with a ChatGPT subscription": reuse Codex CLI's OAuth token and POST directly to backend-api/codex/responses. That was the approach covered in my previous article, "Turning a ChatGPT Subscription into an Image Generation API".
There were two things it did not fully explain.
First, it consumes Codex metered quota. A subscription is not a single bucket; it has two separate rate-limit buckets: one for chat on the ChatGPT web app and one for Codex usage. Calling codex/responses spends the latter, the very bucket you least want to waste when you use Codex CLI to write code.
Second, it requires you to have installed Codex CLI and run codex login. Free ChatGPT accounts do not have Codex, so this path simply does not work.
So the question becomes: can we generate images on the "web conversation" path instead? Free accounts have that path too. Free users can already generate images in the ChatGPT web app; it spends chat quota and does not touch Codex usage at all.
Yes, but the cost is that you really have to go to the "web" side to generate the image. That is the web backend of chatgpt-imagegen, and it is the main subject of this article.
Why Not POST Directly Like the Codex Backend?
The intuitive approach: image generation on the web is also just sending HTTP requests, so couldn't you capture the traffic, grab the cookies, and replay it?
Anyone who has tried gets stuck on chatgpt.com's anti-bot defenses. It is worth breaking down the layers here, because the layer that actually blocks you is not the one most people expect.
| Defense | What It Is | Can a Bare Client Pass? |
|---|---|---|
| Cloudflare edge checks | Standard CF bot detection | ✅ Can pass |
| Sentinel proof of work | backend-api/sentinel/chat-requirements + in-page sentinel/sdk.js computes a PoW token | ✅ Can pass; the algorithm is in the page JS and can be replicated |
| Cloudflare Turnstile token | One-time token produced by interactive verification | ❌ Cannot pass |
The first two layers can both be simulated by a pure Python client. The real wall is the third layer: the Turnstile token can only be produced on the spot by a real, interactive browser, and it is valid for one use only. There is no shortcut where you "harvest a token in the browser first, then replay it headlessly." The token is burned after one use; the next request needs a new one, and generating a new token requires a real browser to be present.
So the conclusion is pretty counterintuitive: you cannot bypass it; you have to be "inside." Instead of forging something that only a browser can produce, just drive a real browser directly.
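To make the "burned after one use" property concrete, here is a toy model of a single-use token check. It is not Turnstile's actual logic, which this article does not describe; `SingleUseVerifier`, `issue`, and `redeem` are hypothetical names used only for illustration.

```python
import secrets


class SingleUseVerifier:
    """Toy server-side model of a one-time token (NOT Turnstile's real logic)."""

    def __init__(self):
        # Tokens that have been issued but not yet spent.
        self._outstanding = set()

    def issue(self):
        """Stand-in for the interactive browser challenge producing a token."""
        token = secrets.token_urlsafe(16)
        self._outstanding.add(token)
        return token

    def redeem(self, token):
        """Accept a token once; any replay of the same token is rejected."""
        if token in self._outstanding:
            self._outstanding.remove(token)
            return True
        return False
```

A harvested token works for exactly one `redeem` call, so a headless replay of the same value fails, which is why the minting side (the real browser) has to be present for every request.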
Solution: Drive Your Own Logged-In Chrome
The web backend uses chrome-use (a browser automation CLI with a Chrome extension) to connect to your real Chrome instance where you're already logged into chatgpt.com, then generates the image inside a normal conversation. It's the same interface, the same cookies, and the same Turnstile context as when you manually type "draw me a picture" in the app.
Full flow:
chatgpt-imagegen --backend web
│
├── chrome-use connects to your logged-in Chrome and opens https://chatgpt.com/
│     (must be a normal conversation; Temporary Chat disables image generation tools)
│
├── resolves the ChatGPT project (--project, default: imagegen)
│     fetches the project list in-page; if missing, POSTs /backend-api/projects to create one
│     archives image-generation conversations into this project to avoid polluting main history
│
├── types the prompt into the input box using real keyboard events
│     ChatGPT's ProseMirror/React input does not accept plain DOM .value= / fill,
│     so key-by-key typing must be simulated, otherwise the submitted prompt is empty
│
├── polls the page: waits for streaming output to finish and for a new <img> resource to stabilize
│
└── fetches the image bytes in-page (credentials:'include') → base64 → writes to disk
      (the signed estuary/content URL is authorized by the browser's own cookies;
       the token never leaves the browser)
A few of these details were learned the hard way.
Directly setting value on the ProseMirror input box, or using an automation tool's fill, does not work. React does not treat it as user input; you have to send real keyboard events.
To decide that "the image is ready," you cannot only look at whether streaming has ended. You also have to wait until the newly appeared <img> resource inside the conversation's main container has stabilized and its URL stops changing; otherwise you may capture a placeholder image or the image from the previous run.
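The "wait until the URL stops changing" idea can be sketched as a small polling loop. This is a minimal sketch and not the tool's actual code: `wait_until_stable`, `read_url`, and the default thresholds are assumptions, and `read_url` stands in for whatever in-page query returns the current `<img>` URL.

```python
import time
from typing import Callable, Optional


def wait_until_stable(
    read_url: Callable[[], Optional[str]],
    stable_polls: int = 3,
    interval: float = 1.0,
    timeout: float = 180.0,
    ignore: Optional[str] = None,
) -> str:
    """Poll read_url() until the same non-empty URL is seen stable_polls times in a row.

    `ignore` is a URL to skip, e.g. the image left over from a previous run.
    """
    deadline = time.monotonic() + timeout
    last = None
    streak = 0
    while time.monotonic() < deadline:
        url = read_url()
        if url and url != ignore:
            # Extend the streak only if the URL matches the previous poll.
            streak = streak + 1 if url == last else 1
            last = url
            if streak >= stable_polls:
                return url
        else:
            # Nothing usable yet (or only the old image): start over.
            last, streak = None, 0
        time.sleep(interval)
    raise TimeoutError("image URL did not stabilize before the timeout")
```

Requiring several identical reads in a row is what filters out a placeholder that is swapped for the final image a moment later.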
The image bytes are not downloaded externally with curl either. That image URL is signed and requires cookies to download, so the page itself calls fetch(..., {credentials:'include'}), letting the browser authorize the request with its own session. The token never leaves the browser.
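Here is a rough sketch of that hand-off, under stated assumptions. The JavaScript string shows one way a page could fetch the signed URL with its own cookies and return base64; how the script is evaluated through chrome-use is not shown because I am not reproducing its API here. `IN_PAGE_FETCH_JS` and `save_base64_image` are hypothetical names.

```python
import base64
from pathlib import Path

# JavaScript meant to be evaluated inside the logged-in chatgpt.com tab.
# The chrome-use call that would run it is deliberately omitted.
IN_PAGE_FETCH_JS = """
async (url) => {
  const resp = await fetch(url, {credentials: 'include'});
  if (!resp.ok) throw new Error('download failed: ' + resp.status);
  const bytes = new Uint8Array(await resp.arrayBuffer());
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);  // base64 text is easy to hand back to Python
}
"""


def save_base64_image(b64_text: str, out_path: str) -> int:
    """Decode the base64 string returned by the page and write it to disk."""
    data = base64.b64decode(b64_text, validate=True)
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return len(data)  # number of image bytes written
```

Only the image bytes cross the boundary as base64; the session cookies that authorize the download stay inside the browser.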
A String of Unavoidable Engineering Pitfalls
Getting from "it runs" to "it runs reliably" involved real battle scars.
Web backend concurrency is fixed at 1, because every run shares the same logged-in Chrome. In early versions, concurrent image generation could cross-contaminate outputs (#7, fixed in v0.6.0 by limiting detection to the current conversation's main container). chatgpt.com also applies aggressive page-side rate limits ("Too many requests... temporarily limited access to your conversations"). So web runs are serialized across processes: extra processes queue on the flock slot. This is safe, but wall-clock time is roughly the sum of the serial runs. If you want real parallelism (up to 4), explicitly use --backend codex; the tradeoff is spending Codex quota. It is a quota-saving vs. faster-output tradeoff, and the tool does not decide for you: it follows --backend or the default auto.
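Cross-process serialization with flock can be sketched in a few lines. This is a Unix-only illustration using the standard `fcntl` module, not the tool's actual code; `web_slot` and the lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def web_slot(lock_path: str, blocking: bool = True):
    """Hold an exclusive advisory lock so only one web run proceeds at a time.

    Unix-only (fcntl). The OS drops the lock if the holding process dies.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
        # Blocks here while another holder has the slot; with blocking=False
        # it raises BlockingIOError instead of waiting.
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Wrapping the whole web run in `with web_slot("/tmp/imagegen-web.lock"):` (path chosen arbitrarily here) makes extra processes queue behind the current one, which matches the "sum of serial runs" wall-clock behavior described above.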
Rate limits must fail fast, not retry blindly. When the page shows "Too many requests," the web backend detects the popup and errors immediately. If it happens before submission, auto mode falls back to Codex; if it happens after submission, it stops cleanly without spending money twice. That image may still appear in the conversation later, so check manually.
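The before/after-submission distinction is the whole rule, so here is a minimal sketch of it. The exception class, its `submitted` flag, and `generate_auto` are hypothetical names for illustration; the real tool's structure may differ.

```python
from typing import Callable


class RateLimited(Exception):
    """Raised when the page shows the "Too many requests" popup."""

    def __init__(self, submitted: bool) -> None:
        super().__init__("Too many requests")
        self.submitted = submitted  # True if the prompt was already sent


def generate_auto(
    prompt: str,
    web: Callable[[str], bytes],
    codex: Callable[[str], bytes],
) -> bytes:
    """Try web once; fall back to codex only if nothing was submitted yet."""
    try:
        return web(prompt)
    except RateLimited as exc:
        if exc.submitted:
            # The request may still complete in the conversation, so do not
            # generate (and pay for) the same image a second time.
            raise
        return codex(prompt)
```

There is no retry loop on purpose: hammering a rate-limited page only extends the limit.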
History is not kept by default. Image-generation conversations are deleted by default with PATCH is_visible:false. They are only temporarily moved into the project as a handoff step, and after the run they leave no trace in your ChatGPT history (--keep-conversation preserves them).
Image-to-image uses the same path. -i/--ref uploads the reference image into the ChatGPT input box and then sends an edit prompt, the same mechanism as manually dragging in an image and asking it to modify it: still subscription-based, still no key required.
Two New Things: Style Presets + Proactively Illustrating Docs
A style preset is a reusable prompt fragment. Save it under a name, then apply it with one parameter:
chatgpt-imagegen "a robot mascot" --style doodle
chatgpt-imagegen style add brand "flat vector, bold shapes, white bg"
chatgpt-imagegen style use brand # set as default; applied automatically after that
There's a built-in doodle style that is intentionally terrible, like something scraped out with a mouse in an old-school drawing program. The "ugly-cute" illustrations in this repo's README, including the ones in this article, were generated by the tool itself using that style. There's no default style out of the box; if you don't actively use one, it behaves exactly as before.
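Conceptually, a preset is just a named string merged into the prompt. The sketch below assumes the fragment is appended after a comma; how chatgpt-imagegen actually combines the two is not stated in this article, and `apply_style` is a hypothetical helper.

```python
from typing import Dict, Optional


def apply_style(
    prompt: str,
    style: Optional[str],
    presets: Dict[str, str],
    default: Optional[str] = None,
) -> str:
    """Append the chosen preset's fragment to the prompt, if any applies."""
    name = style or default
    if name is None:
        return prompt  # no style requested and no default set: unchanged
    if name not in presets:
        raise KeyError(f"unknown style preset: {name}")
    return f"{prompt}, {presets[name]}"
```

With no `--style` and no default, the prompt passes through untouched, matching the "behaves exactly as before" guarantee.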
Proactive illustration is for AI agents. Once installed as a skill, when an agent writes a blog post, technical proposal, or design doc, it will proactively suggest illustrations and generate them in parallel in the background, instead of waiting for you to ask for images. The degree of parallelism depends on your backend configuration: web runs serially to conserve quota, while codex runs in parallel and consumes quota.
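An agent-side fan-out that respects those caps might look like the following. The caps (1 for web, 4 for codex) come from this article; `generate_all` and the `generate` callable are hypothetical stand-ins for invoking the CLI once per prompt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Caps taken from the article: web is serial, codex allows up to 4 at once.
MAX_PARALLEL = {"web": 1, "codex": 4}


def generate_all(
    prompts: List[str],
    backend: str,
    generate: Callable[[str], str],
) -> List[str]:
    """Run generate() over prompts under the backend's cap, keeping input order."""
    workers = min(MAX_PARALLEL[backend], max(1, len(prompts)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the prompts.
        return list(pool.map(generate, prompts))
```

Because results come back in prompt order, the agent can drop each image into the matching place in the document without extra bookkeeping.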
How to Choose Between the Two Backends
| Scenario | Which backend to use | Why |
|---|---|---|
| Laptop/desktop, with Chrome open and logged in | web (default) | Doesn't spend Codex quota; works even with a free account |
| Server / headless agent machine | codex | There's no browser there, and auto will fall back on its own |
| Need truly parallel batch image generation | codex | web is serial; codex supports up to 4-way parallelism, but consumes quota |
By default, auto tries web first and falls back to codex only if that fails, so your Codex quota is spared whenever the web path works.
When Not to Use This, and Go Straight to the Official API
The subscription channel is not a free version of the API; it's a different product form with its own boundaries.
If you strictly need quality=high or a transparent background, the subscription cannot provide that. You need to use OPENAI_API_KEY and call /v1/images/generations.
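For comparison, a request to the official endpoint could be built like this with only the standard library. Treat it as a sketch: the model name `gpt-image-1` and the `background`, `output_format`, and `size` fields are my assumptions about the current Images API and are not taken from this article, so check the official reference before relying on them.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"


def build_request(prompt: str, api_key: str, model: str = "gpt-image-1") -> urllib.request.Request:
    """Build (but do not send) a request asking for high quality and transparency."""
    payload = {
        "model": model,               # assumed model name; verify in the API docs
        "prompt": prompt,
        "quality": "high",
        "background": "transparent",  # assumed parameter name
        "output_format": "png",       # transparency needs an alpha-capable format
        "size": "1024x1024",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_request(prompt, os.environ["OPENAI_API_KEY"]))` and decoding the base64 image from the JSON response; this path is billed per image, unlike the subscription route.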
If you're building an external production service, using a personal subscription to generate images for end users violates the OpenAI Terms and will also burn through your own ChatGPT quota. One more thing: this tool rides on an unpublished internal endpoint. Large-scale abuse is the fastest way to get that opening shut down, and if it's shut down, everyone loses it.
If you need stable throughput above 10 images per minute, use the API; subscription rate limits are tighter.
If you need a team-level, remotely callable HTTP gateway, use the sister project agent-cli-to-api, which exposes the same subscription as an OpenAI-compatible interface.
Getting Started
# Install chrome-use (for the web backend), and connect it to Chrome where you're logged into chatgpt.com
curl -fsSL https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh
chrome-use extension install # Then install the Chrome extension, restart, and log in to chatgpt.com
# Install the CLI (for the agent; easiest option)
npx skills add leeguooooo/chatgpt-imagegen -g
# Or use it standalone
git clone https://github.com/leeguooooo/chatgpt-imagegen
./chatgpt-imagegen "a watercolor cat on a windowsill" -o cat.png
Not having chrome-use installed is fine: auto automatically falls back to codex, with only a one-line stderr note saying "installing chrome-use lets image generation avoid using Codex quota."
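The selection rule can be sketched as a small function. Detecting chrome-use with a PATH lookup is my assumption about how the check could work, and `backend_order` is a hypothetical name; only the ordering and the stderr note come from the behavior described above.

```python
import shutil
import sys
from typing import Callable, List, Optional


def backend_order(
    requested: str = "auto",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the backends to try, in order, for a given --backend value."""
    if requested in ("web", "codex"):
        return [requested]  # an explicit choice is respected as-is
    if which("chrome-use") is None:
        # Assumed detection method: chrome-use is not on PATH.
        print(
            "note: installing chrome-use lets image generation avoid using Codex quota",
            file=sys.stderr,
        )
        return ["codex"]
    return ["web", "codex"]  # auto: web first, codex as the fallback
```

Passing `which` as a parameter keeps the function easy to test without touching the real PATH.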
A Few Common Questions
Will this get my account banned? Probably not, but there are red lines. The web backend drives your own logged-in browser and does things you could also do manually, so the traffic is no different from clicking around in the app a few times. The codex backend replays the Codex CLI protocol with a real auth token, so it looks like normal Codex usage. The real risks are two things. First, volume: sustained use above 10 images per minute, or generating dozens of images in a large fan-out, will hit rate limits, and forcing it long-term is also likely to draw attention. Second, using a personal subscription to provide an external image-generation service, which is a clear terms-of-service red line. The tool keeps its footprint low by default: image-generation chats are deleted by default, chats are grouped into a project, rate-limit popups fail fast without retries, and concurrency is capped. Personal local use at normal volume is low-risk, but it sits on unpublished endpoints, so use at your own risk and don't abuse it.
Is chrome-use especially wasteful with tokens? For this tool, the image-generation process does not consume any LLM tokens. The web backend does not have an AI watch the screen and operate the browser step by step; it uses a fixed Python script to call chrome-use with hard-coded steps, with no model inference in between. What really burns tokens is the screenshot-driven approach where every step is fed to a large model. chrome-use itself is the opposite: it uses accessibility-tree snapshots plus compact @eN references, around 200 to 400 tokens per step, which is much cheaper for an agent than dumping raw HTML or screenshots.
Final Notes
The codex backend replays the official protocol, while the web backend gets the job done in a real browser. The latter takes a more roundabout path, but gives you three things in return: no API key required, no Codex quota consumed, and free accounts can use it too. The hard part is not the Cloudflare edge; it is the Turnstile token that only a real browser can generate and that expires after a single use. Once you recognize that, the solution shifts from "forging it" to "operating directly inside it."
- GitHub: leeguooooo/chatgpt-imagegen
- HTTP gateway sister project: agent-cli-to-api
- skill installation:
npx skills add leeguooooo/chatgpt-imagegen -g
Disclaimer: This tool calls ChatGPT's internal backend-api endpoint (the same one used by Codex CLI), not a public API with documented guarantees. OpenAI may change or restrict it at any time. Please use it only for personal or local agent use within the scope permitted by the OpenAI Terms of Use, and do not offer commercial image-generation services to others.
Links
- 🔧 The tool: chatgpt-imagegen on GitHub (generate images from your ChatGPT subscription, no API key).
- 📝 More writing: blog.leeguoo.com. I'm Guo Li (leeguoo), a full-stack dev building small AI-agent tools and CLIs.
- 💬 Found it useful? A ⭐ on the repo or a follow here means a lot.

