
Marcus Rowe

Posted on • Originally published at techsifted.com

Google Gemini API Not Working: Quota Errors, Auth Issues and Rate Limits Explained

The Gemini API has two variants (AI Studio and Vertex AI), overlapping model naming between generations, and error messages that aren't always clear about what's actually wrong. If you're here because your Gemini API integration broke, you're probably dealing with one of five specific issues.

This guide is for developers working against the API. If you're troubleshooting the Gemini consumer app instead, see the separate Gemini not working guide.


The Two Gemini APIs — Know Which One You're Using

Before troubleshooting: are you using the Google AI Studio API (generativelanguage.googleapis.com) or the Vertex AI Generative AI API (aiplatform.googleapis.com)?

They have different:

  • Base URLs
  • Authentication methods (API key vs. OAuth/service account)
  • Model name formats
  • Quota systems

If you're using a simple API key and the Python SDK with genai.configure(api_key=...), you're using the AI Studio API. If you're using vertexai.init(project=..., location=...), you're on Vertex AI. Troubleshooting steps differ.
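To make the split concrete, here's how the endpoint differs on the AI Studio side. The helper below is purely illustrative (the function name is mine), but the host and path match the public REST surface; Vertex AI instead goes through aiplatform.googleapis.com with OAuth credentials:

```python
# Illustrative helper (not part of any SDK): builds the REST URL for the
# AI Studio API. Vertex AI uses a different host (aiplatform.googleapis.com),
# OAuth/service-account auth, and project/location-scoped model paths.
def ai_studio_endpoint(model: str, api_key: str) -> str:
    """Return the generateContent URL for the AI Studio API."""
    base = "https://generativelanguage.googleapis.com/v1beta"
    return f"{base}/models/{model}:generateContent?key={api_key}"

print(ai_studio_endpoint("gemini-1.5-flash", "YOUR_KEY"))
```

If your code builds URLs like this one but your credentials are a service-account JSON, you're mixing the two APIs and auth will fail.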


Error: 429 RESOURCE_EXHAUSTED

The most common Gemini API error for active developers.

What the error response actually contains:

```json
{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED",
    "details": [...]
  }
}
```

The details array is where the useful information is. It typically specifies:

  • Which quota was hit (requests per minute, tokens per minute, or daily limit)
  • The current quota value
  • Sometimes how long to wait before retrying
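In practice you can pull that wait time straight out of the details array. The payload below is a hand-written sample (quota errors often carry a google.rpc.RetryInfo entry, but the exact contents vary), and the parser is a stdlib-only sketch:

```python
import json

# Hand-written sample of a 429 body; the RetryInfo entry is what
# quota errors often include in their details array.
sample = json.loads("""
{
  "error": {
    "code": 429,
    "status": "RESOURCE_EXHAUSTED",
    "message": "Resource has been exhausted (e.g. check quota).",
    "details": [
      {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "37s"}
    ]
  }
}
""")

def retry_delay_seconds(error_body: dict, default: float = 60.0) -> float:
    """Pull retryDelay out of the details array, falling back to a default."""
    for detail in error_body.get("error", {}).get("details", []):
        if detail.get("@type", "").endswith("google.rpc.RetryInfo"):
            return float(detail["retryDelay"].rstrip("s"))
    return default

print(retry_delay_seconds(sample))  # 37.0
```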

Two different quota types, same error code:

RPM (Requests Per Minute): You sent too many API calls in a 60-second window. At the time of writing, the free-tier AI Studio API allowed 15 RPM for Gemini 1.5 Pro and 60 RPM for Gemini 1.5 Flash; quotas change over time, so check the current rate-limits page for up-to-date numbers.

TPM (Tokens Per Minute): Your requests collectively included too many tokens — either in prompts, context, or outputs. A single large request (e.g., a 100K token document) can hit TPM without hitting RPM.

Fixes:

  1. Implement exponential backoff. Don't retry immediately on 429. Wait 1 second, retry. If still 429, wait 2 seconds, then 4, then 8. Most 429s from rate limiting clear within 60 seconds.
```python
import time

def call_with_retry(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            # Back off only on rate-limit errors; re-raise everything else.
            if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
            else:
                raise
    raise Exception("Max retries exceeded")
```
  2. Reduce request frequency. Add time.sleep(1) between API calls in loops. Aggressive batching against free tier quotas will always 429.

  3. Switch to Flash for lower-stakes calls. Gemini 1.5 Flash has higher free tier RPM/TPM quotas than Gemini 1.5 Pro. Use Flash for chunking/preprocessing calls and reserve Pro for the final output.

  4. Upgrade to paid tier. Free tier quotas are for development, not production workloads. Enabling billing in Google Cloud dramatically increases quotas.
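The reduce-request-frequency fix can be made systematic instead of sprinkling sleep calls by hand. A stdlib-only throttle sketch (the class name is mine):

```python
import time

class MinIntervalThrottle:
    """Enforce a minimum gap between successive calls."""
    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to keep calls min_interval apart.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

throttle = MinIntervalThrottle(min_interval=4.0)
# for prompt in prompts:
#     throttle.wait()
#     response = model.generate_content(prompt)  # your existing call
```

At 4 seconds per call you stay at or under 15 RPM; tune min_interval to whatever your tier's quota actually is.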


Error: PERMISSION_DENIED

This one has multiple causes and the error message isn't always specific.

Cause 1: API key restrictions. If you created your API key with referrer or IP restrictions in Google Cloud Console, those restrictions are enforced on every request. A key that works in AI Studio (because AI Studio is whitelisted) can fail in your application server.

  • Fix: In Google Cloud Console > Credentials, find your API key, click Edit, and check the "Application restrictions" section. For development, set to "None." For production, set specific IP addresses or referrer URLs that match your deployment.

Cause 2: API not enabled for your project. The Generative Language API (for AI Studio) or Vertex AI API (for Vertex) must be explicitly enabled for your Google Cloud project.

  • Fix: Go to Google Cloud Console > APIs & Services > Library. Search "Generative Language API" or "Vertex AI API" and enable it for your project.

Cause 3: Service account missing IAM roles (Vertex AI). For Vertex AI requests using service account auth, the service account needs the "Vertex AI User" IAM role at minimum.

  • Fix: IAM & Admin > IAM > find your service account > add "Vertex AI User" role.

Cause 4: Wrong project. Your API key is from Project A but your billing/quota is set up in Project B.

  • Fix: In Google Cloud Console, confirm the project shown in the header matches the project associated with your API key.

Error: Model Not Found — The Gemini 1.5 vs 2.0 Naming Confusion

This trips up a lot of developers who set up a Gemini integration in 2024 and haven't updated their model strings.

The model name history:

| Old name | Status | What to use instead |
|---|---|---|
| gemini-pro | Legacy (maps to old model) | gemini-1.5-pro or gemini-2.0-flash |
| gemini-pro-vision | Deprecated | gemini-1.5-pro (handles vision natively) |
| gemini-ultra | Never publicly available via API | gemini-1.5-pro or gemini-2.0-pro-exp |
| gemini-1.0-pro | Legacy | gemini-1.5-pro |

Current model names (2026):

  • gemini-1.5-pro — 1M context window, best for complex tasks
  • gemini-1.5-flash — Faster and cheaper, great for high-volume
  • gemini-2.0-flash — Latest flash model, improved reasoning
  • gemini-2.0-pro-exp — Experimental Pro, not stable for production

Fix: Update your model string to a current, explicit version name. Don't use gemini-pro — it's either deprecated or maps to legacy behavior depending on when it was last updated.
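If legacy strings are scattered through a codebase, a lookup table makes the migration mechanical. The mapping below just mirrors the suggestions in the table above; it is not an official alias list:

```python
# Mirrors the replacement suggestions above; not an official Google mapping.
LEGACY_MODEL_MAP = {
    "gemini-pro": "gemini-1.5-pro",
    "gemini-pro-vision": "gemini-1.5-pro",
    "gemini-1.0-pro": "gemini-1.5-pro",
}

def modernize_model_name(name: str) -> str:
    """Swap a legacy model string for its suggested current equivalent."""
    return LEGACY_MODEL_MAP.get(name, name)

print(modernize_model_name("gemini-pro-vision"))  # gemini-1.5-pro
print(modernize_model_name("gemini-2.0-flash"))   # unchanged, already current
```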


Error: API Key Scope Issues

Scenario: Your API key works for some models but not others, or works in one project context but not another.

API key project binding. AI Studio API keys are bound to a Google account or Google Cloud project. If you're calling gemini-2.0-flash but your key is on a project where that model isn't yet available or enabled, you'll get a permission or model-not-found error.

The experimental model access issue. gemini-2.0-pro-exp and other experimental models aren't available to all API keys. You need an API key from a project that has been granted early access.

Fix:

  1. Confirm which models are available to your specific API key by calling the List Models API:
```
GET https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY
```

The response lists every model your key can access. If a model isn't in this list, your key doesn't have access to it.

  2. For experimental model access, enroll your project in Google's early access program via AI Studio.
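Checking a model against that list is then a one-liner. The payload below is abbreviated and illustrative (real entries carry displayName, token limits, and more):

```python
# Abbreviated, illustrative List Models response; real entries have more fields.
sample_response = {
    "models": [
        {"name": "models/gemini-1.5-pro"},
        {"name": "models/gemini-1.5-flash"},
    ]
}

def key_can_access(response: dict, model: str) -> bool:
    """True if the model appears in this key's List Models response."""
    return any(m["name"] == f"models/{model}" for m in response.get("models", []))

print(key_can_access(sample_response, "gemini-1.5-flash"))   # True
print(key_can_access(sample_response, "gemini-2.0-pro-exp")) # False
```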

Error: Context Length Exceeded

What you see: INVALID_ARGUMENT with a message about token count exceeding the model's limits.

Gemini 1.5 Pro has a 1M token context window; Flash has the same. This is huge — but it's possible to exceed it with very large documents or accumulated conversation history.

Fixes:

  1. Count tokens before sending with the countTokens API endpoint.
  2. For very long documents, chunk them and process sequentially rather than sending the entire document in one request.
  3. For multi-turn conversations, trim conversation history by summarizing older turns instead of sending the full history every time.
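For the chunking fix, a rough character-based splitter is often enough as a first pass. The 4-characters-per-token ratio is a common approximation, not Gemini's actual tokenizer, so verify with countTokens before sending anything near the limit:

```python
# Rough chunker: 4 chars/token is a common approximation, not Gemini's
# real tokenizer. Use the countTokens endpoint for exact counts.
def chunk_text(text: str, max_tokens: int, chars_per_token: int = 4):
    """Split text into pieces that should each stay under max_tokens."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

doc = "x" * 10_000
chunks = chunk_text(doc, max_tokens=1_000)  # about 4000 chars per chunk
print(len(chunks))  # 3
```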

Debugging Tip: Always Check the Full Error Response

Gemini API errors contain detailed structured information that gets swallowed when you only look at the HTTP status code. Log the full response body, not just the status.

```python
import google.generativeai as genai

try:
    response = model.generate_content(prompt)
except Exception as e:
    # Print the full exception, not just the status code; the structured
    # details are where the quota/permission specifics live.
    print(f"Full error: {e}")
    # For the genai SDK, also check e.args for structured error data
```

The error details — especially for 429 and PERMISSION_DENIED — tell you exactly what's wrong and often what to do about it.


For an overview of Gemini's capabilities (not the API specifically), the Google Gemini Review covers what the model does well and where it falls short. And if you're choosing between Gemini and OpenAI APIs for a new project, the architecture and pricing differences between them are significant — that comparison is worth reading before committing to either.
