Agshin Rajabov
We spent a week debugging Gemini API in production. Here's what we found.

We've been running Google Gemini inside a multi-agent research pipeline at Emotix for a few months. It's fast, cheap, and surprisingly capable — but production exposed three reliability issues that cost us real debugging time. This is what we found and how we fixed it.


Problem 1: MALFORMED_FUNCTION_CALL

This one hurt the most.

Gemini's FunctionCallingMode.ANY is supposed to guarantee a tool call. It doesn't. When tool arguments contain large strings — typically 1000+ characters — Gemini returns finishReason: MALFORMED_FUNCTION_CALL with no output. No error message, no partial result. Just silence.

We dug into Google's issue tracker and found it's a confirmed P2 bug. It's still open, with no ETA.

Fix 1 — Prompt instruction (~90% reduction)

Adding this to the prompt before every tool call cuts the error dramatically:

"Ensure all string values in function call arguments are properly JSON-escaped (newlines as \n, quotes as \", backslashes as \\)."

We tested this over 500+ production calls. It's not perfect but it handles the vast majority of cases.
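A minimal sketch of how we apply it: prepend the instruction to every prompt that will trigger a tool call. The helper name `withEscapeHint` is illustrative, not part of any library.

```typescript
// Hypothetical helper: prepend the escaping instruction to each tool-call prompt.
const ESCAPE_HINT =
  'Ensure all string values in function call arguments are properly ' +
  'JSON-escaped (newlines as \\n, quotes as \\", backslashes as \\\\).';

function withEscapeHint(userPrompt: string): string {
  // The hint goes first so the model reads it before generating arguments.
  return `${ESCAPE_HINT}\n\n${userPrompt}`;
}
```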

Fix 2 — Structured output fallback (100% elimination)

When function calling still fails after retries, switch to responseMimeType: 'application/json' + responseSchema. This bypasses the function-calling code path entirely. No MALFORMED possible.

// Instead of function calling:
const result = await model.generateContent({
  contents: [{ role: 'user', parts: [{ text: prompt }] }],
  generationConfig: {
    responseMimeType: 'application/json',
    responseSchema: yourToolSchema,
  },
});

The tradeoff: you lose the structured function-call interface and parse the JSON yourself. Worth it for reliability.
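The retry-then-fallback flow can be sketched as the control loop below. `MalformedFunctionCallError` and the two injected callbacks are illustrative assumptions, not the gemini-heal API; in practice the callbacks would wrap the two `generateContent` variants shown above.

```typescript
// Illustrative error type for a MALFORMED_FUNCTION_CALL finish reason.
class MalformedFunctionCallError extends Error {}

async function callWithFallback<T>(
  callTool: () => Promise<T>,       // forced function-calling path
  callStructured: () => Promise<T>, // responseMimeType: 'application/json' path
  maxRetries = 2,
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await callTool();
    } catch (err) {
      // Only retry on MALFORMED; anything else is a real failure.
      if (!(err instanceof MalformedFunctionCallError)) throw err;
    }
  }
  // Every tool-calling attempt came back MALFORMED: bypass that code path.
  return callStructured();
}
```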


Problem 2: 429s with no backoff

The free tier is 10 RPM. The SDK throws on 429 but gives you nothing to work with — no retry logic, no backoff, no way to know how backed up the queue is.

We built an adaptive token-bucket rate limiter:

import { GeminiRateLimiter } from 'gemini-heal';

const limiter = new GeminiRateLimiter({ rpm: 60 });

// Before every Gemini call:
await limiter.acquire();

// When you get a 429:
limiter.reportRateLimit();

// Circuit breaker — skip Gemini when queue is too deep:
if (limiter.shouldSkip()) {
  // fall back to another model
}

It automatically halves RPM on each 429 (floor: 2 RPM) and recovers by +2 RPM every 60 seconds of clean traffic. The shouldSkip() circuit breaker lets you route to a fallback model instead of waiting.
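The halve-and-recover policy itself is simple enough to sketch in a few lines. This is a minimal illustration of the math described above, not gemini-heal's actual implementation; the class name is made up.

```typescript
// Sketch of the adaptive RPM policy: halve on 429 (floor 2), +2 per clean minute.
class AdaptiveRpm {
  private rpm: number;

  constructor(private readonly maxRpm: number, private readonly floorRpm = 2) {
    this.rpm = maxRpm;
  }

  current(): number {
    return this.rpm;
  }

  // Called when a 429 comes back: halve the budget, never below the floor.
  onRateLimit(): void {
    this.rpm = Math.max(this.floorRpm, Math.floor(this.rpm / 2));
  }

  // Called after each clean 60-second window: recover +2 RPM up to the cap.
  onCleanMinute(): void {
    this.rpm = Math.min(this.maxRpm, this.rpm + 2);
  }
}
```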


Problem 3: JSON wrapped in markdown blocks

This one is small but annoying. Even with responseMimeType: 'application/json', Gemini sometimes returns:

```json
{"key": "value"}
```

instead of plain JSON. JSON.parse throws on the wrapped string, so every consumer has to strip the fences first.

import { stripMarkdownCodeBlock } from 'gemini-heal';

const clean = stripMarkdownCodeBlock(geminiResponse);
const data = JSON.parse(clean);
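The stripping itself is a one-regex job. Here's a minimal sketch of the idea, under the assumption that the wrapper is a standard triple-backtick fence with an optional `json` tag; gemini-heal's implementation may handle more cases.

```typescript
// Sketch: remove a ```json ... ``` (or bare ```) wrapper, if present.
function stripFences(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  // Unwrapped responses pass through untouched.
  return match ? match[1] : trimmed;
}
```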

We open-sourced the fixes

After patching these issues across multiple services we extracted everything into a library called gemini-heal.

npm install gemini-heal @google/generative-ai

It includes:

  • GeminiRateLimiter — adaptive token-bucket with circuit breaker
  • GeminiClient — completion wrapper with rate limiting and cost tracking
  • ToolCaller — forced tool calling with MALFORMED retry + structured output fallback
  • Utility helpers for 429 detection and markdown stripping

Zero dependencies. TypeScript. MIT license.

→ GitHub: https://github.com/emotixco/gemini-heal

If you're running Gemini in production and hitting any of these, hope it saves you some time. And if you've found other Gemini quirks worth handling, open an issue — we'd like to keep adding to it.
