DEV Community

Muhammad Ali


I Built a Local AI Gateway That Talks to Claude, ChatGPT, DeepSeek and Gemini — Without a Single API Key


Every developer building with AI hits the same wall eventually.

You're prototyping something. It's working. Then the bill arrives, or worse, the rate limit. You stare at a 429 RESOURCE_EXHAUSTED error and think: there has to be another way.

There is. And it's sitting right on your desktop.


The Insight Nobody Talks About

Every major AI company gives you free access through its own interface. Claude has a desktop app. ChatGPT has a desktop app. DeepSeek and Gemini run in your browser. You log in, you type, you get a reply, and on the free tier it costs nothing.

So I asked myself: why am I paying for API access when the same model is available for free one layer above?

The answer: because there's no programmatic way to use it.

So I built one.


What AI Gateway Does

AI Gateway is a local Flask server that sits between your application and the AI desktop apps on your machine. You send it an HTTP request. It controls the desktop app using OS-level automation, types your query, waits for the reply, extracts it, and returns it as JSON.

Your App / Terminal / Browser
        ↓
POST http://localhost:5000/ask
        ↓
AI Gateway Server (Flask + Queue)
        ↓
Auto-detects OS → routes to correct handler
        ↓
Controls AI Desktop App (Claude / ChatGPT / DeepSeek / Gemini)
        ↓
Returns reply as JSON

No API key. No billing. No per-token rate limits (each app's normal free-tier usage caps still apply). Just your existing free account doing what it already does, except now your code can talk to it.
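To make the flow above concrete, here is a minimal sketch of what the /ask step might do before any automation starts: validate the JSON body, detect the OS, and look up a handler. This is not the project's actual code. `handle_ask`, `HANDLERS`, the error-response shape, and the `"incognito"` default for `mode` are all hypothetical; only the success fields mirror the response format the gateway returns.

```python
# Hypothetical sketch of /ask request handling: validate, detect OS, route.
import platform
from typing import Callable, Dict, Optional, Tuple

SUPPORTED_AIS = {"claude", "chatgpt", "deepseek", "gemini"}

# Hypothetical registry: (ai, os_name, mode) -> function that drives the app
# and returns the extracted reply text.
Handler = Callable[[str], str]
HANDLERS: Dict[Tuple[str, str, str], Handler] = {}


def detect_os() -> str:
    # platform.system() returns e.g. "Windows", "Darwin" or "Linux".
    return {"Windows": "windows", "Darwin": "mac"}.get(platform.system(), "unsupported")


def handle_ask(payload: object, os_name: Optional[str] = None) -> dict:
    if not isinstance(payload, dict):
        return {"status": "error", "error": "body must be a JSON object"}
    query = payload.get("query")
    ai = payload.get("ai")
    mode = payload.get("mode", "incognito")  # assumed default
    if not isinstance(query, str) or not query.strip():
        return {"status": "error", "error": "query must be a non-empty string"}
    if ai not in SUPPORTED_AIS:
        return {"status": "error", "error": f"unknown ai: {ai!r}"}
    os_name = os_name or detect_os()
    handler = HANDLERS.get((ai, os_name, mode))
    if handler is None:
        return {"status": "error", "error": f"no handler for {ai}/{os_name}/{mode}"}
    reply = handler(query)  # in the real gateway this is the desktop automation
    return {"status": "ok", "ai": ai, "mode": mode, "query": query,
            "reply": reply, "chars": len(reply)}
```

In a Flask app this function would sit behind a route such as `@app.route("/ask", methods=["POST"])`, called with `request.get_json()` and wrapped in `jsonify(...)`. Keeping the logic in a plain function makes it easy to test without a running server.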


How to Use It

Setup (5 minutes)

git clone https://github.com/malikasana/ai-gateway
cd ai-gateway
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
copy .env.example .env
python server.py

Server starts at http://localhost:5000.

Make sure your AI apps are open and logged in before starting.
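If you start the gateway from a script, it can help to wait until the server is actually listening before sending the first query. server.py exposes a /health endpoint, but its response body isn't described here, so this sketch assumes only that an HTTP 200 means "up". `wait_for_gateway` is a hypothetical helper name, and it uses only the standard library.

```python
# Poll the gateway's /health endpoint until it answers or a deadline passes.
# Assumption: any HTTP 200 from /health means the server is ready.
import time
import urllib.error
import urllib.request


def wait_for_gateway(base_url: str = "http://localhost:5000",
                     timeout: float = 30.0, interval: float = 0.5) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Each attempt is capped at `interval` seconds so a hung socket can't stall us.
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet (connection refused) or returned an error status
        time.sleep(interval)
    return False
```

A script could call `if not wait_for_gateway(): raise SystemExit("gateway not running")` before its first request.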

Send a query from Python

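import requests

# Automating a desktop app can take a while, so allow a generous timeout (120 s is an arbitrary choice).
response = requests.post("http://localhost:5000/ask", json={
    "query": "Explain recursion in one paragraph",
    "ai": "claude",  # or "chatgpt", "deepseek", "gemini"
    "mode": "incognito"
}, timeout=120)
response.raise_for_status()  # fail loudly on HTTP errors instead of on a missing "reply" key

print(response.json()["reply"])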

Works with claude, chatgpt, deepseek, and gemini. Switch the ai field and you're talking to a different model.
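Because switching models is just a matter of changing `ai`, a small loop can ask all four the same question. This sketch uses `urllib` from the standard library instead of `requests` so it has no dependencies; `ask` and `compare` are hypothetical helper names, and the 120-second timeout is an arbitrary guess at how long a slow app might take. Requests go out one after another, since the gateway's queue serves them serially anyway.

```python
# Ask every supported model the same question, one request at a time.
import json
import urllib.request

AIS = ["claude", "chatgpt", "deepseek", "gemini"]


def ask(query: str, ai: str, mode: str = "incognito",
        base_url: str = "http://localhost:5000", timeout: float = 120.0) -> dict:
    body = json.dumps({"query": query, "ai": ai, "mode": mode}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def compare(query: str, base_url: str = "http://localhost:5000") -> dict:
    # Returns {ai_name: reply_text}, in the order listed in AIS.
    return {ai: ask(query, ai, base_url=base_url)["reply"] for ai in AIS}
```

With the gateway running, `for ai, reply in compare("Explain recursion in one paragraph").items(): print(ai, reply)` prints each model's answer in turn.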

Response format

{
  "status": "ok",
  "ai": "claude",
  "mode": "incognito",
  "query": "Explain recursion in one paragraph",
  "reply": "Recursion is...",
  "chars": 240
}
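If you consume the gateway from typed code, the JSON body can be parsed into a small dataclass that fails loudly when something went wrong. This is a client-side sketch, not part of the project: `AskResult`, `GatewayError` and `parse_ask_response` are hypothetical names, and because the shape of error responses isn't documented, it treats any `status` other than `"ok"` as an error and just surfaces the raw payload.

```python
# Parse the gateway's JSON reply into a typed object.
from dataclasses import dataclass


@dataclass(frozen=True)
class AskResult:
    status: str
    ai: str
    mode: str
    query: str
    reply: str
    chars: int


class GatewayError(RuntimeError):
    """Raised when the gateway reports a non-"ok" status (hypothetical helper)."""


def parse_ask_response(payload: dict) -> AskResult:
    if payload.get("status") != "ok":
        raise GatewayError(f"gateway returned {payload!r}")
    return AskResult(
        status=payload["status"],
        ai=payload["ai"],
        mode=payload["mode"],
        query=payload["query"],
        reply=payload["reply"],
        chars=int(payload["chars"]),
    )
```

Pairing it with the Python example, `parse_ask_response(response.json()).reply` gives you the text with field access your editor can check.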

Browser UI

Open http://localhost:5000 in your browser. There's a built-in UI — select your AI, type your query, hit Send. Works on mobile too if you expose it via ngrok.

Public access via ngrok

ngrok http 5000

Now you can hit your local gateway from your phone, a remote server, anywhere.


The Architecture

The project is small but deliberately structured:

ai-gateway/
├── server.py              # Flask server, /ask and /health endpoints
├── queue_manager.py       # One request at a time, OS detection, routing
├── templates/
│   └── index.html         # Browser UI
└── instances/
    ├── claude/windows/incognito.py
    ├── chatgpt/windows/incognito.py
    ├── deepseek/windows/incognito.py
    └── gemini/windows/incognito.py

Each AI has its own handler. The queue manager ensures requests are processed one at a time, because two automations can't type into Claude simultaneously. OS detection routes each request to the matching handler automatically, so the same API call will work on every supported platform. Right now that means Windows; Mac support is coming.
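The serial-queue idea can be sketched with the standard library: callers submit jobs and get a `Future` back, while a single worker thread pulls jobs off a `queue.Queue`, so only one automation ever drives a desktop app at a time. This is an illustration, not the project's `queue_manager.py`; `SerialGateway`, `submit`, and the `(ai, os_name)` handler keys are hypothetical.

```python
# One worker thread + a FIFO queue = strictly serial execution.
import platform
import queue
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[str], str]


class SerialGateway:
    def __init__(self, handlers: Dict[Tuple[str, str], Handler],
                 os_name: Optional[str] = None):
        # handlers maps (ai, os_name) -> function that types the query and returns the reply
        self._handlers = handlers
        # platform.system().lower() gives "windows", "darwin" or "linux"
        self._os = (os_name or platform.system()).lower()
        self._jobs: "queue.Queue[Tuple[str, str, Future]]" = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, ai: str, query: str) -> Future:
        fut: Future = Future()
        self._jobs.put((ai, query, fut))
        return fut

    def _worker(self) -> None:
        while True:
            ai, query, fut = self._jobs.get()
            try:
                if not fut.set_running_or_notify_cancel():
                    continue  # caller cancelled while the job was still queued
                try:
                    handler = self._handlers.get((ai, self._os))
                    if handler is None:
                        raise LookupError(f"no handler for {ai} on {self._os}")
                    fut.set_result(handler(query))
                except Exception as exc:
                    # Report the failure to the caller instead of killing the worker.
                    fut.set_exception(exc)
            finally:
                self._jobs.task_done()
```

A Flask view would call `gateway.submit(ai, query).result(timeout=...)`; however many requests arrive at once, they wait their turn in FIFO order.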


What I Learned Building This

Desktop automation is fragile but powerful. Every AI app has its own quirks. DeepSeek needed a Copy button workaround for reliable reply extraction. Gemini's Chrome automation behaves differently from the desktop apps. Each handler required its own approach.
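Every handler has to decide when a streamed reply is finished before extracting it. One common approach, sketched here, is to poll the visible text and stop once it hasn't changed for a few seconds. This is an assumption for illustration, not necessarily what the project's handlers do (DeepSeek's, for instance, relies on the Copy button). `read_reply` stands for a hypothetical callable that returns the current reply text, and the default thresholds are guesses.

```python
# Wait until polled text stops changing, then return it.
import time
from typing import Callable


def wait_for_stable_text(read_reply: Callable[[], str], stable_for: float = 2.0,
                         poll_interval: float = 0.25, timeout: float = 120.0) -> str:
    deadline = time.monotonic() + timeout
    last = read_reply()
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll_interval)
        current = read_reply()
        if current != last:
            # Still streaming: remember the new text and restart the quiet period.
            last, last_change = current, time.monotonic()
        elif current and time.monotonic() - last_change >= stable_for:
            # Non-empty and unchanged long enough: treat the reply as complete.
            return current
    raise TimeoutError(f"reply did not settle within {timeout} s")
```

Requiring non-empty text keeps the blank state before the model starts answering from being mistaken for a finished reply; the trade-off is that a long pause mid-answer can end the wait early, which is one reason a UI signal such as a Copy button can be more reliable.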

Queue management matters more than you think. Early versions had race conditions where two simultaneous requests would collide mid-automation. The queue enforces serial execution cleanly.

The free tier is genuinely generous. During development and testing I sent hundreds of queries across all four models. Zero cost. The free tiers from these companies are substantial if you use them through the UI rather than the API.


Honest Limitations

This isn't a production API replacement. Be clear-eyed about what it is:

  • One request at a time — queue-based, not concurrent
  • Requires desktop apps open — it's automation, not an API call
  • Windows only right now — Mac support is in progress
  • No conversation memory yet — each query is stateless (stateful mode coming)
  • Fragile to UI changes — if Claude updates their desktop app layout, the handler may break

If you need high-throughput production AI calls, use the official APIs. This is for developers who want to prototype, experiment, build side projects, or simply can't afford API costs right now.


Current Status and Roadmap

✅ Claude — Windows incognito mode

✅ ChatGPT — Windows incognito mode

✅ DeepSeek — Windows incognito mode

✅ Gemini — Windows incognito mode

⬜ Mac support for all AIs

⬜ Stateful mode (persistent conversations)

⬜ Browser UI improvements


Get It

GitHub: github.com/malikasana/ai-gateway

