Gautam Vhavle
I Reverse Engineered ChatGPT's UI Into an OpenAI Compatible API and Here's Why You Shouldn't

A weekend project that wasn't supposed to work. But it did. And now we need to talk about it.

Let me be upfront: this project exists for educational purposes only. What you're about to read shouldn't be replicated in production, shouldn't be used to skirt terms of service, and honestly shouldn't work as well as it does.

But here we are.

I took ChatGPT's entire web frontend, the same UI you and I use every day, and reverse-engineered it into a backend. Browser automation, stealth patches, Cloudflare bypass, clipboard hijacking, DOM scraping, virtual displays. The result? A fully OpenAI-compatible REST API that any SDK, any LangChain agent, any curl command can hit as a drop-in replacement. Tool calling, image generation, file uploads, vision: all of it, using your own account and subscription with the latest model.

The project is called CatGPT-Gateway (GitHub link below). I need to tell you how it works, because understanding what's possible is the first step to building better defenses against it.


⚠️ Before we go any further: This project is built purely for educational purposes and cybersecurity research. I'm not a hacker. This isn't a bug. I'm just a curious developer who wanted to understand the limits of browser automation. Stick around, we'll talk ethics at the end.


The Origin Story: A Developer's Frustration

Here's what happened.

I signed into OpenAI using my regular account while building an application that required API calls for testing. I needed to review pricing, generate API keys, and wire everything into a fairly complex LangChain workflow I was developing.

At the same time, I was working on another project that required repeated image generation. Not just one or two images, several iterations for testing and refinement.

So I paused.

ChatGPT could already generate images directly inside the chat interface. It could also produce structured responses to my prompts with context awareness.

That raised a technical question.

What if the chat interface itself could be automated? Not as a workaround, and not to avoid API usage or pricing, but as a pure engineering experiment. Could a conversational UI be programmatically driven? Could it behave like an interaction layer that mimics API semantics? What architectural differences would emerge between UI automation and formal API integration?

The curiosity wasn’t financial. It was structural.

I wanted to understand system behavior, automation mechanics, and the boundaries between user-facing interfaces and backend-accessible endpoints.

It was supposed to be a quick weekend hack, the kind you start at 11 PM on a Friday, fully expecting it to collapse within the hour.

It didn’t. And that’s where things got interesting.

Eureka Moment
Every great terrible idea starts at 11 PM with too much coffee.


How Is This Even Possible?

Let me break it down. The architecture is simple on paper, absolutely unhinged in execution.

External Clients (curl, Python, LangChain)
          │
          ▼
    ┌───────────┐
    │  FastAPI  │ ← OpenAI-compatible API (port 8000)
    │  Server   │
    └─────┬─────┘
          │
          ▼
   ┌──────────────┐
   │ ChatGPTClient│ ← Sends messages, waits for responses,
   │              │   extracts text via clipboard/DOM
   └──────┬───────┘
          │
          ▼
  ┌───────────────┐
  │BrowserManager │ ← Patchright (stealth Playwright fork)
  │+ Stealth      │   controlling a real Chrome instance
  │+ Human Sim    │   with anti-detection patches
  └───────┬───────┘
          │
          ▼
  ┌───────────────┐
  │  Xvfb + VNC   │ ← Virtual display + VNC viewer
  │  (port 6080)  │   (the magic trick — more on this later)
  └───────────────┘

Here's what's happening:

  1. You hit the API — standard OpenAI-format request, POST /v1/chat/completions
  2. FastAPI receives it and hands it to the ChatGPT client
  3. The client enters your message into ChatGPT's chat input box using a headful Chrome browser
  4. It waits for the response by watching for ChatGPT's copy button to appear (that's how it knows the response is complete — clever, right?)
  5. It clicks the copy button, grabs the text from the clipboard, and returns it as a proper OpenAI-format JSON response

Your LangChain app, your agent framework, your curl script — they all think they're talking to OpenAI. They're actually talking to a cat controlling a browser.
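Step 4 is the heart of the trick. Here's a rough sketch of the completion-detection loop in pure Python, with the button counter injected so the timing logic stands alone (in the real project this would be a DOM query against the live page; the function below is my illustration, not the repo's code):

```python
import time

def wait_for_new_copy_button(count_buttons, before, timeout=120.0,
                             interval=0.5, sleep=time.sleep):
    """A new copy button only appears once ChatGPT finishes a response,
    so 'the count went up' doubles as a completion signal."""
    waited = 0.0
    while waited < timeout:
        if count_buttons() > before:
            return True  # response finished rendering
        sleep(interval)
        waited += interval
    return False  # timed out; fall back to another detection strategy
```

In the real flow, `count_buttons` would be something like a Playwright locator count, and `before` the count taken just before sending the message.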



The Cloudflare Boss Fight

Okay, here's where it gets interesting. You can't just puppeteer.launch() and waltz into ChatGPT. OpenAI uses Cloudflare's human verification — and modern Cloudflare is really good at detecting bots.

I want to genuinely appreciate OpenAI and Cloudflare here. Their security gave me so many blockers and roadblocks that I almost gave up multiple times. We're talking:

  • navigator.webdriver detection — the first thing any anti-bot checks
  • Canvas fingerprinting — headless browsers have different rendering signatures
  • WebGL fingerprinting — GPU-level identification
  • Plugin enumeration — headless Chrome reports different plugins
  • Behavioral analysis — instant typing? Pixel-perfect clicks? That's not human

Every single one of these had to be defeated. Not by exploiting a vulnerability — but by making the browser more human.

Cloudflare vs Me
Actual footage of me vs. Cloudflare's bot detection (dramatized).

Here's what CatGPT does to pass as human:

Stealth Patches — Using Patchright (a Playwright fork built for stealth) combined with playwright-stealth, CatGPT patches the browser fingerprint at every level. navigator.webdriver returns false, canvas renders match real browsers, WebGL reports are spoofed.

Human Simulation — Messages aren't typed character-by-character (that's detectable too). Instead, CatGPT uses clipboard-paste injection with randomized delays. Mouse movements happen during "thinking" pauses — idle cursor drifts to random positions with 5-15 intermediate steps, like a bored human.

Viewport Jitter — Every launch randomizes the viewport by ±20px from the base 1280×720. No two sessions have exactly the same fingerprint.
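Neither trick needs anything exotic; both boil down to small doses of randomness. A toy sketch of the two ideas (my own illustration, assuming the ±20px and 5-15-step figures from the paragraphs above):

```python
import random

BASE_W, BASE_H = 1280, 720  # base viewport from the article
JITTER = 20                 # ±20px per launch

def jittered_viewport():
    """No two sessions get exactly the same viewport fingerprint."""
    return (BASE_W + random.randint(-JITTER, JITTER),
            BASE_H + random.randint(-JITTER, JITTER))

def drift_path(start, end, steps=None):
    """Interpolate a cursor drift with 5-15 wobbly intermediate points,
    like a bored human nudging the mouse during a 'thinking' pause."""
    steps = steps or random.randint(5, 15)
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * i / steps + random.uniform(-2, 2),
             y0 + (y1 - y0) * i / steps + random.uniform(-2, 2))
            for i in range(1, steps + 1)]
```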

The Docker DNS Bug Discovery — This one's wild. I found that calling Playwright's add_init_script() — even with just a console.log("") — completely breaks Chrome's DNS resolution inside Docker containers. Every navigation after that returns ERR_NAME_NOT_RESOLVED. The fix? Inject all stealth JavaScript via page.evaluate() at runtime and re-inject on every framenavigated event instead. Days of debugging. One-line fix. Classic.


The VNC Trick — Schrödinger's Browser

This is the part I'm most proud of. This is what makes CatGPT fundamentally different from every other browser automation project I've seen.

The problem: Cloudflare can detect headless Chrome. So you need headful Chrome (a real browser window). But servers don't have monitors. So how do you run a "headed" browser on a headless server?

The solution: You give it a fake monitor.

Chrome (headful mode)
    renders to →  Xvfb (Virtual Framebuffer — fake display :99)
                      captured by →  x11vnc (VNC server)
                                        served via →  noVNC (WebSocket, port 6080)
                                                          viewable in →  Your browser

Shocking
This was the most shocking part for me too.

Here's the beautiful paradox:

  • To Cloudflare: It's a real, headed, GPU-rendering Chrome browser with a proper display. Nothing to see here, human user browsing normally.
  • To your server: It's a headless process. No monitor needed. Runs in Docker. Fully automated.
  • To you: You can open http://localhost:6080 in your browser and watch the automation happen in real-time through VNC. Debug visually. Handle CAPTCHAs manually if needed. Sign in through the actual browser.

It's headed AND headless at the same time. Schrödinger's browser.

Headed for Cloudflare. Headless for your server
Headed for Cloudflare. Headless for your server. Both at the same time.

Four processes run in the Docker container, managed by supervisord:

| Process | What It Does | Port |
| ------- | ------------ | ---- |
| Xvfb | Virtual framebuffer — the "fake monitor" | Display :99 |
| x11vnc | VNC server — captures the virtual display | 5900 |
| noVNC | WebSocket bridge — makes VNC browser-accessible | 6080 |
| FastAPI | The actual API server | 8000 |

One docker compose up and you've got the whole stack.


OpenAI-Compatible Endpoints — Drop-In Replacement

This is where the software engineer in me got excited. The API isn't just "some endpoint that returns text." It's a full OpenAI-compatible API. Pydantic schemas matching OpenAI's spec exactly. You can point any OpenAI SDK at it and it just works.

# It's literally the same format as OpenAI's API
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "catgpt-browser",
    "messages": [{"role": "user", "content": "Explain quantum computing in 3 sentences"}]
  }'

What you get back:

{
  "id": "chatcmpl-abc123...",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing uses quantum bits (qubits)..."
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 12, "completion_tokens": 45, "total_tokens": 57 }
}

Standard OpenAI response format. id, choices, usage — everything.
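Producing that envelope from scraped browser text is mostly bookkeeping. A minimal sketch, assuming the field set shown above (the token counts would have to be estimated, since the browser UI never reports real ones):

```python
import time
import uuid

def to_openai_response(text: str, model: str = "catgpt-browser") -> dict:
    """Dress plain scraped text up as an OpenAI chat.completion object."""
    return {
        "id": "chatcmpl-" + uuid.uuid4().hex[:24],
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
        # Rough estimate only: the UI exposes no real token counts.
        "usage": {"completion_tokens": max(1, len(text) // 4)},
    }
```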

Tool / Function Calling

Yeah, it supports tool calling too. Since we're automating a browser (not hitting an API with native function calling), CatGPT uses a clever technique: it builds a system prompt with tool definitions and few-shot examples, instructing ChatGPT to output structured JSON. Then it parses that JSON with regex and returns proper tool_calls in the response.
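The parsing side might look something like this. This is my guess at the approach, not the project's actual code (the code-fence marker is built indirectly so this sample stays valid Markdown):

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker

def extract_tool_calls(reply: str):
    """Find fenced ...json blocks in a model reply and turn each one
    into an OpenAI-style tool_calls entry. Non-JSON blocks are skipped."""
    pattern = re.escape(FENCE) + r"json\s*(\{.*?\})\s*" + re.escape(FENCE)
    calls = []
    for raw in re.findall(pattern, reply, re.DOTALL):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # model emitted malformed JSON; ignore this block
        if "name" in obj:
            calls.append({
                "id": f"call_{len(calls)}",
                "type": "function",
                "function": {
                    "name": obj["name"],
                    "arguments": json.dumps(obj.get("arguments", {})),
                },
            })
    return calls
```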

# Works with LangChain / LangGraph out of the box
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-token",
    model="catgpt-browser"
)

# Bind tools, create agents — it all works
llm_with_tools = llm.bind_tools([get_weather, calculate])


Any modern agent framework — LangChain, LangGraph, CrewAI, AutoGen — just point the base_url at CatGPT and you're golden.


Wait, It Can DALL-E Too?!

This was the feature that started the whole project, remember? I needed image generation for testing, and there's no straightforward programmatic route to it when you're signed in with a regular ChatGPT account instead of an API key.

So how does CatGPT handle POST /v1/images/generations?

  1. Your request comes in with "prompt": "A sunset over mountains"
  2. CatGPT sends it to ChatGPT as a chat message, with hints about size and quality
  3. ChatGPT invokes DALL-E internally and renders the image in the chat
  4. The detector watches the DOM for img[alt="Generated image"] or images inside div[id^="image-"] containers
  5. The image is downloaded using the browser's own fetch() API — this is key because it preserves the authentication cookies. No separate auth needed.
  6. Returned to you as either base64 JSON or a URL, matching OpenAI's image response format
# Generate an image — same format as OpenAI's DALL-E API
curl -X POST http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "prompt": "An adorable orange tabby kitten astronaut floating in space",
    "size": "1024x1024",
    "quality": "hd"
  }'

The response includes a revised_prompt — the actual prompt DALL-E used internally — and the image data. Works with the OpenAI Python SDK too:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-token")
response = client.images.generate(
    model="dall-e-3",
    prompt="A cat hacking into a computer, cyberpunk style",
    size="1024x1024"
)
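The final wrapping step is the simplest part of the pipeline. A minimal sketch of the envelope, with field names following OpenAI's documented images response format (the base64 payload and revised prompt come from the browser steps above):

```python
import time

def to_image_response(b64_png: str, revised_prompt: str) -> dict:
    """Wrap a browser-downloaded image in OpenAI's images response shape."""
    return {
        "created": int(time.time()),
        "data": [{
            "b64_json": b64_png,           # image bytes, base64-encoded
            "revised_prompt": revised_prompt,  # scraped from the chat
        }],
    }
```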

Working

I never thought this would actually work. Detecting dynamically-generated images in someone else's DOM, downloading them through the browser's auth context, and returning them in a standard API format? It felt impossible until it wasn't.


The Cyberpunk TUI — Because CLIs Should Look Cool

I couldn't just ship an API. I had to build a terminal UI. You know, for vibes.

CatGPT comes with a full-screen Textual-based TUI — a cyberpunk-themed chat interface running right in your terminal:

  • GitHub Dark color scheme — deep #0d1117 backgrounds, #58a6ff accent blues, #3fb950 greens
  • ASCII cat splash screen — because every good project needs one
  • Color-coded messages — blue borders for your messages, green for the assistant, purple for images
  • Rich Markdown rendering — code blocks, tables, lists, all rendered beautifully
  • Slash commands — /new, /threads, /images, /status, /help
  • Keyboard shortcuts — Ctrl+N for new chat, Ctrl+T for threads

TUI
The CatGPT TUI: an actual, functional conversation.

Sci-fi
The TUI: because talking to AI should feel like you're in a sci-fi movie.


The Simplicity — Step by Step

Here's what amazes me about how this came together. Despite all the complexity under the hood — stealth patches, virtual displays, DOM observers, clipboard extraction — using it is dead simple:

# 1. Clone it
git clone https://github.com/GautamVhavle/CatGPT-Gateway.git && cd CatGPT-Gateway

# 2. Start everything
docker compose up --build -d

# 3. Sign in once via VNC
#    Open http://localhost:6080 → Log into ChatGPT → Done

# 4. Hit the API
curl -H "Authorization: Bearer dummy123" \
     http://localhost:8000/v1/chat/completions \
     -d '{"model":"catgpt-browser","messages":[{"role":"user","content":"Hello!"}]}'

Four steps. One Docker container. No API keys. Your browser session persists across restarts via a Docker volume.

The project is structured cleanly too — every component has its own module:

  • src/browser/ — Browser lifecycle, stealth, human simulation
  • src/chatgpt/ — ChatGPT client, response detection, image handling
  • src/api/ — FastAPI routes, OpenAI-compatible schemas
  • src/cli/ — Terminal UI
  • src/selectors.py — All DOM selectors centralized in ONE file. When ChatGPT updates their UI, you update one file. That's it.

The selector fallback system is one of my favorite design decisions. Every selector (chat input, send button, copy button, etc.) is a list of fallbacks. If ChatGPT changes a data-testid, the next selector in the list catches it. Resilient by design.
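The pattern is simple enough to sketch. The selectors below are hypothetical examples for illustration, not the repo's actual list:

```python
# Each UI element gets an ordered list of selector fallbacks.
SEND_BUTTON = [
    'button[data-testid="send-button"]',  # hypothetical current UI
    'button[aria-label="Send prompt"]',   # fallback if the testid changes
    "form button[type='submit']",         # last resort
]

def first_match(query, selectors):
    """Try each selector in order and return the first element found.

    `query` is any callable mapping a selector string to an element or
    None (e.g. a Playwright query_selector in the real project)."""
    for sel in selectors:
        node = query(sel)
        if node is not None:
            return node
    raise LookupError("no selector matched - time to update src/selectors.py")
```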


The Response Detection — Three Strategies Deep

How do you know when ChatGPT is done generating its response? This was one of the hardest problems.

CatGPT uses a three-strategy detection system:

  1. Copy Button Detection (primary) — ChatGPT only shows the copy button after the full response is generated. Count the copy buttons before sending, wait for a new one. Elegant and reliable.

  2. Stop Button Detection (fallback) — Watch the stop/generation button. When it appears, streaming started. When it disappears, response is done.

  3. Text Stability (last resort) — Poll the response text every second. If it's identical for 5 consecutive polls, we're done.

Three layers of fallback. Because when you're automating someone else's frontend, you plan for everything to break.
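Strategy 3 is the easiest to show in isolation. A sketch with the text reader and clock injected so the logic stands alone (the real project would read the response node from the DOM on each poll; this is my illustration, not the repo's code):

```python
import time

def wait_until_stable(read_text, polls=5, interval=1.0,
                      timeout=120.0, sleep=time.sleep):
    """Consider the response finished once `polls` consecutive reads
    return identical, non-empty text."""
    last, same, waited = None, 0, 0.0
    while waited < timeout:
        current = read_text()
        same = same + 1 if current == last else 1  # reset on any change
        if same >= polls and current:
            return current
        last = current
        sleep(interval)
        waited += interval
    raise TimeoutError("response text never stabilized")
```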

Schema
Three strategies. Because one is never enough when you're parsing someone else's DOM.


Let's Talk About Ethics

Okay, real talk time.

This project is for educational purposes only. Full stop.

I built CatGPT because I was genuinely curious about the limits of browser automation. Could you control a modern web app's frontend so completely that it becomes your backend? What security measures exist to prevent this? How good is Cloudflare's bot detection really?

The answers are fascinating — and that's the whole point. This is a learning exercise, not a production tool.

Here's what I want to be crystal clear about:

  • 🚫 Don't use this in production. It's a single-browser, single-session gateway. There's an asyncio.Lock() serializing every request. It's not built for scale and it's not meant to be.
  • 🚫 Don't use this to circumvent OpenAI's terms of service. Respect the platform you're using.
  • 🚫 Don't use this to build commercial products. Get a proper API key for that.
  • Do use this to learn about browser automation, stealth techniques, API design, and cybersecurity.
  • Do use this to understand how modern anti-bot systems work and why they're important.
  • Do use this for testing and prototyping when you need a quick way to interact with ChatGPT programmatically.

I'm not a hacker. I didn't find a security vulnerability. This isn't a bug — ChatGPT works exactly as designed. I'm just controlling a browser, the same way any user does. The difference is, my "user" is a Python script.

OpenAI's security is legit. Cloudflare's human verification gave me more headaches than any coding challenge I've faced. The fact that I had to build viewport jitter, human-like mouse movement, clipboard-based text injection, and a triple-layer detection system just to reliably interact with a website — that's a testament to how good their security is.


It's Open Source — Come Build With Me

CatGPT-Gateway is fully open source on GitHub:

👉 github.com/GautamVhavle/CatGPT-Gateway

The codebase is clean, well-documented, and modular. Want to contribute? Here's what I'd love help with:

  • More resilient selectors as ChatGPT's UI evolves
  • Streaming support (stream: true in the API)
  • Multi-session support — multiple browser instances
  • Better error recovery — auto-retry on Cloudflare challenges
  • Tests — there's always room for more tests

I'm actively maintaining this project. If you have feedback, ideas, bug reports, or just want to say hi — open an issue, submit a PR, or drop a comment below.

This started as a Friday night "what if?" and turned into one of the most fun projects I've ever built. If you're a developer who's curious about how far browser automation can go, or a cybersecurity enthusiast who wants to understand the cat-and-mouse game between bots and detection systems — this project is for you.

Star it. Fork it. Break it. Fix it. Let's learn together.



CatGPT-Gateway is out. Star the repo. Don't do anything I wouldn't do. 😼


If you made it this far — thanks for reading. Drop a 🦄 or a 💬 and let me know what you think. If you actually run it and your ChatGPT session starts talking to your LangChain agents... don't blame me. I warned you. Also, congratulations: you can now point any agentic framework (LangChain, AutoGen, CrewAI) at your own OpenAI-compatible endpoint.
