pip install gemini-flux
GitHub: https://github.com/malikasana/gemini-flux
It Started With a Dubbing App
I'm building a video dubbing application. The core of it is simple: take a video transcript, send it to an AI with a large set of instructions, get back a translated version. Do this continuously for every chunk of every video.
I turned to the Gemini API. Free tier. Seemed perfect.
Then I hit this:
429 RESOURCE_EXHAUSTED — You exceeded your current quota.
Fine. I'll just create another API key. Made a second key in the same project. Made another request. Same error.
That's when I learned something that most developers don't know — and it changes everything.
The Thing Most Developers Don't Know
Gemini rate limits are per PROJECT, not per API key.
Multiple keys inside the same project share the exact same quota. Creating 10 keys in one project gives you zero extra capacity. It's completely useless.
So what actually works?
The Trick
Google lets you create up to 10 separate Cloud projects per account, and each project gets its own completely independent quota. Create 8 projects with one API key each and you have 8 completely independent rate limits.
What if you need more than 10 projects?
Use a second Google account. Each account gets its own 10 projects independently. For example:
- Account 1 → 6 projects → 6 independent keys
- Account 2 → 2 projects → 2 independent keys
- Total → 8 keys, 8 independent quotas
With 8 keys on the free tier:
gemini-2.5-flash: 250 RPD × 8 = 2,000 requests/day
gemini-2.5-flash-lite: 1000 RPD × 8 = 8,000 requests/day
Total: 10,000+ requests/day — completely free
Now the next problem: how do you manage all these keys intelligently? Which key do you use? When did you last use it? Is it cooled down? Has it hit its daily limit?
That's what I built gemini-flux to solve.
Why Dumb Rotation Doesn't Work
Most people who figure out the multi-project trick write a simple round-robin rotator — use key 1, then key 2, then key 3, rotate every 30 seconds.
The problem? 30 seconds is completely arbitrary. It ignores the actual math behind rate limits.
Gemini's free tier has a 250,000 tokens per minute (TPM) limit per project. The actual cooldown depends entirely on how many tokens you sent:
cooldown = token_count / tokens_per_minute
1M token request: 1,000,000 / 250,000 = 4 minutes cooldown
500k token request: 500,000 / 250,000 = 2 minutes cooldown
100k token request: 100,000 / 250,000 = 24 seconds cooldown
10k token request: 10,000 / 250,000 = 2.4 seconds cooldown
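The cooldown formula is trivial to implement yourself. A minimal sketch, assuming the 250,000 TPM free-tier limit quoted above:

```python
def cooldown_seconds(token_count: int, tokens_per_minute: int = 250_000) -> float:
    """Seconds a key must cool down after sending token_count tokens."""
    return token_count / tokens_per_minute * 60

# cooldown_seconds(1_000_000) -> 240.0 (4 minutes)
# cooldown_seconds(10_000)    -> 2.4
```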
A dumb rotator with a fixed 30-second interval will:
- Make you wait unnecessarily on small requests (waste time)
- Send too early on large requests (hit rate limits anyway)
The right approach is to calculate the exact cooldown per request and schedule accordingly.
With 8 keys the worst case interval becomes:
interval = cooldown / n_keys
1M token request: 240s / 8 = 30 seconds between requests
10k token request: 2.4s / 8 = 0.3 seconds — nearly instant!
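Dividing the per-key cooldown across the pool gives that worst-case spacing. A sketch of the calculation, under the same 250,000 TPM assumption:

```python
def request_interval(token_count: int, n_keys: int, tpm: int = 250_000) -> float:
    """Worst-case seconds between requests when rotating across n_keys keys."""
    cooldown_s = token_count / tpm * 60  # per-key cooldown in seconds
    return cooldown_s / n_keys

# request_interval(1_000_000, 8) -> 30.0
# request_interval(10_000, 8)    -> 0.3
```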
This is the math gemini-flux is built on.
How gemini-flux Works
Token counting (FREE)
Before every request, gemini-flux counts tokens using Google's free count_tokens API — costs zero quota units.
Sliding window per key
Each key maintains a 60-second sliding window of token usage. The scheduler knows exactly how much capacity each key has right now, not just a vague "is it cooling down" status.
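A per-key sliding window is easy to sketch. The class and method names below are illustrative, not gemini-flux's actual internals:

```python
import time
from collections import deque

class SlidingWindow:
    """Track one key's token usage over a rolling 60-second window."""

    def __init__(self, tpm_limit: int = 250_000, window: float = 60.0):
        self.tpm_limit = tpm_limit
        self.window = window
        self.events = deque()  # (timestamp, tokens) pairs

    def record(self, tokens: int, now=None) -> None:
        self.events.append((time.monotonic() if now is None else now, tokens))

    def used(self, now=None) -> int:
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the 60-second window
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def capacity(self, now=None) -> int:
        """How many tokens this key can still send right now."""
        return self.tpm_limit - self.used(now)
```

With this, "is it cooling down" becomes a concrete number: `capacity()` tells you exactly how many tokens the key can absorb at this instant.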
Pick the best key
For each incoming request:
- Find key with enough capacity RIGHT NOW → send immediately
- No key ready → calculate exact seconds until soonest available key → wait precisely that long
No wasted time. No unnecessary delays.
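The selection step can be sketched as follows. The function name and data shapes are my illustration (per-key lists of `(timestamp, tokens)` events), not the library's internals:

```python
def pick_key(usages, tokens_needed, tpm=250_000, window=60.0, now=0.0):
    """Return (key_index, seconds_to_wait) for the best key, or None if
    tokens_needed can never fit. usages: per-key list of (timestamp, tokens)."""
    best = None
    for i, events in enumerate(usages):
        live = [(t, n) for t, n in events if now - t < window]
        used = sum(n for _, n in live)
        if tpm - used >= tokens_needed:
            return i, 0.0  # this key has capacity right now: send immediately
        # Otherwise: when do enough old events expire to free the capacity?
        freed = used
        for t, n in sorted(live):
            freed -= n
            if tpm - freed >= tokens_needed:
                wait = t + window - now
                if best is None or wait < best[1]:
                    best = (i, wait)
                break
    return best
```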
Model exhaustion chain
When a key exhausts a model's daily quota, gemini-flux automatically moves to the next model in the chain. Nothing failed; the model is simply used up for the day:
1. gemini-2.5-pro → 100 RPD per key
2. gemini-2.5-flash → 250 RPD per key ← main workhorse
3. gemini-2.5-flash-lite → 1000 RPD per key
4. gemini-3.1-pro-preview → newest pro generation
5. gemini-3-flash-preview → newest flash generation
6. gemini-3.1-flash-lite-preview → newest lite generation
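The chain walk itself is a simple lookup. A sketch covering the first three models, using the per-key RPD figures from the list above (function name is mine, not the library's):

```python
MODEL_CHAIN = [
    ("gemini-2.5-pro", 100),
    ("gemini-2.5-flash", 250),
    ("gemini-2.5-flash-lite", 1000),
]

def next_available_model(requests_today: dict) -> str:
    """Walk the chain; return the first model still under its daily cap
    for this key, or None if every model is exhausted until reset."""
    for model, rpd in MODEL_CHAIN:
        if requests_today.get(model, 0) < rpd:
            return model
    return None
```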
Smart policy fetcher
On startup, gemini-flux sends 1 request to Gemini asking about its own free tier limits. It parses the response and uses those numbers for all internal math. Cached for 7 days. If Google changes limits → gemini-flux catches it automatically on next refresh.
Key validation on startup
Every key is validated before use. Invalid keys are removed. Exhausted keys are flagged. You see a full health report before any request is sent.
Daily reset
All exhausted keys reset automatically at midnight Pacific Time.
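Computing that reset moment is straightforward with the standard library. A minimal sketch (Python 3.9+ `zoneinfo`; the function name is illustrative):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

PT = ZoneInfo("America/Los_Angeles")

def next_reset(now=None) -> datetime:
    """Next midnight Pacific Time, when Gemini's daily quotas roll over."""
    now = (now or datetime.now(PT)).astimezone(PT)
    tomorrow = (now + timedelta(days=1)).date()
    return datetime.combine(tomorrow, datetime.min.time(), tzinfo=PT)
```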
Total Free Capacity (8 keys)
| Model | RPD per key | × 8 keys | Daily total |
|---|---|---|---|
| gemini-2.5-pro | 100 | × 8 | 800/day |
| gemini-2.5-flash | 250 | × 8 | 2,000/day |
| gemini-2.5-flash-lite | 1000 | × 8 | 8,000/day |
| Preview models | varies | × 8 | bonus! |
| TOTAL | | | 10,800+/day |
All free. No credit card.
Using It
Install:
pip install gemini-flux
Basic usage:
from gemini_flux import GeminiFlux
flux = GeminiFlux(
keys=["key1", "key2", ..., "key8"],
mode="both",
log=True
)
response = flux.generate("Translate this transcript to Spanish...")
print(response["response"])
# {
# "response": "...",
# "key_used": 3,
# "model_used": "gemini-2.5-flash",
# "tokens_used": 45231,
# "wait_applied": 1.8,
# "retried": False
# }
Keys via .env (no hardcoding):
GEMINI_KEY_1=AIza...
GEMINI_KEY_2=AIza...
...
GEMINI_KEY_8=AIza...
GEMINI_MODE=both
GEMINI_LOG=true
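Collecting numbered keys like these from the environment (after loading the .env file, e.g. with python-dotenv) takes only a few lines. Whether gemini-flux does exactly this internally is an assumption; a minimal equivalent:

```python
import os

def load_keys(prefix: str = "GEMINI_KEY_", max_keys: int = 10) -> list:
    """Collect GEMINI_KEY_1..GEMINI_KEY_N from the environment, in order."""
    keys = []
    for i in range(1, max_keys + 1):
        key = os.environ.get(f"{prefix}{i}")
        if key:
            keys.append(key)
    return keys
```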
Docker microservice:
docker build -t gemini-flux .
docker run -p 8000:8000 --env-file .env gemini-flux
Kaggle:
!pip install gemini-flux
from gemini_flux import GeminiFlux
flux = GeminiFlux(keys=["key1", "key2", ...])
What the Console Looks Like
==================================================
gemini-flux 🔥 Starting up with 8 keys
==================================================
[STARTUP] Checking 8 keys...
[KEY 1] ✅ Healthy
[KEY 2] ✅ Healthy
[KEY 3] ⚠️ Exhausted — will reset at midnight PT
[KEY 4] ❌ Invalid — removed from pool
[STARTUP] Pool ready: 6 healthy, 1 exhausted, 1 invalid
[MODELS] Exhaustion chain:
1. gemini-2.5-pro
2. gemini-2.5-flash
3. gemini-2.5-flash-lite
...
[STARTUP] Dynamic interval: 240s / 6 keys = 40.0s (worst case)
[STARTUP] ✅ gemini-flux ready! Mode: BOTH
[REQUEST] Incoming — 450,000 tokens detected
[SCHEDULER] Key #2 selected — sending via gemini-2.5-flash
[RESPONSE] ✅ Success via Key #2 (gemini-2.5-flash)
[KEY 2] gemini-2.5-flash: 1/250 requests used today
Runtime Controls
flux.set_mode("flash_only") # change mode anytime
flux.disable_key(3) # disable a specific key
flux.enable_key(3) # re-enable it
flux.refresh_policy() # force re-fetch Gemini limits
flux.status() # see all key statuses + usage
Who Should Use This
- Building translation, dubbing, or transcription pipelines
- Processing large documents at scale
- Running RAG systems with high request volume
- Any AI application that needs continuous Gemini access on a budget
- Anyone who keeps hitting 429 errors and doesn't want to pay yet
What's Next
- Async support for parallel requests
- Per-key usage dashboard
- Support for other providers (OpenAI, Anthropic) with the same scheduling logic
Try It
pip install gemini-flux
GitHub: https://github.com/malikasana/gemini-flux
PyPI: https://pypi.org/project/gemini-flux
If this helped you understand the trick or saved you from rate limit hell, drop a star ⭐
Built by Muhammad Ali — malikasana2810@gmail.com