We've been running Google Gemini inside a multi-agent research pipeline at Emotix for a few months. It's fast, cheap, and surprisingly capable — but production exposed three reliability issues that cost us real debugging time. This is what we found and how we fixed it.
Problem 1: MALFORMED_FUNCTION_CALL
This one hurt the most.
Gemini's FunctionCallingMode.ANY is supposed to guarantee a tool call. It doesn't. When tool arguments contain large strings — typically 1000+ characters — Gemini returns finishReason: MALFORMED_FUNCTION_CALL with no output. No error message, no partial result. Just silence.
We dug into Google's issue tracker and found it's a confirmed P2 bug. Still open. No ETA.
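Because the failure is silent, the only signal is the finishReason field on the response. A minimal detection helper looks like the sketch below; the interfaces and the isMalformedFunctionCall name are ours, describing just the slice of the response shape we care about rather than the SDK's full types:

```typescript
// Structural type for the relevant slice of a Gemini response (not the SDK's full type).
interface GeminiCandidate {
  finishReason?: string;
  content?: { parts?: unknown[] };
}
interface GeminiResponse {
  candidates?: GeminiCandidate[];
}

// The failure mode above is only visible via finishReason: nothing is thrown
// and no content comes back, so this is the check to branch on.
function isMalformedFunctionCall(res: GeminiResponse): boolean {
  return res.candidates?.[0]?.finishReason === 'MALFORMED_FUNCTION_CALL';
}
```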
Fix 1 — Prompt instruction (~90% reduction)
Adding this to the prompt before every tool call cuts the error dramatically:
"Ensure all string values in function call arguments are properly JSON-escaped (newlines as \n, quotes as \", backslashes as \\)."
We tested this over 500+ production calls. It's not perfect but it handles the vast majority of cases.
Fix 2 — Structured output fallback (100% elimination)
When function calling still fails after retries, switch to responseMimeType: 'application/json' + responseSchema. This bypasses the function-calling code path entirely. No MALFORMED possible.
```typescript
// Instead of function calling:
const result = await model.generateContent({
  contents: [{ role: 'user', parts: [{ text: prompt }] }],
  generationConfig: {
    responseMimeType: 'application/json',
    responseSchema: yourToolSchema,
  },
});
```
The tradeoff: you lose the structured function-call interface and parse the JSON yourself. Worth it for reliability.
Problem 2: 429s with no backoff
The free tier is 10 RPM. The SDK throws on 429 but gives you nothing to work with — no retry logic, no backoff, no way to know how backed up the queue is.
We built an adaptive token-bucket rate limiter:
```typescript
import { GeminiRateLimiter } from 'gemini-heal';

const limiter = new GeminiRateLimiter({ rpm: 60 });

// Before every Gemini call:
await limiter.acquire();

// When you get a 429:
limiter.reportRateLimit();

// Circuit breaker — skip Gemini when queue is too deep:
if (limiter.shouldSkip()) {
  // fall back to another model
}
```
It automatically halves RPM on each 429 (floor: 2 RPM) and recovers by +2 RPM every 60 seconds of clean traffic. The shouldSkip() circuit breaker lets you route to a fallback model instead of waiting.
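gemini-heal's internals aren't shown here, but the adaptive RPM behavior described above can be sketched in a few lines. The AdaptiveRpm class and its method names are ours, for illustration only:

```typescript
// Minimal sketch of the adaptation policy: halve RPM on every 429 (floor 2),
// recover +2 RPM for each clean 60-second window, capped at the configured max.
class AdaptiveRpm {
  private rpm: number;
  private lastRecovery = Date.now();

  constructor(private readonly maxRpm: number) {
    this.rpm = maxRpm;
  }

  currentRpm(): number {
    return this.rpm;
  }

  // Call when a request comes back 429: halve capacity, reset the clean-traffic clock.
  reportRateLimit(): void {
    this.rpm = Math.max(2, Math.floor(this.rpm / 2));
    this.lastRecovery = Date.now();
  }

  // Call periodically (or before each acquire): +2 RPM per full clean window.
  recover(now = Date.now()): void {
    while (now - this.lastRecovery >= 60_000 && this.rpm < this.maxRpm) {
      this.rpm = Math.min(this.maxRpm, this.rpm + 2);
      this.lastRecovery += 60_000;
    }
  }
}
```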
Problem 3: JSON wrapped in markdown blocks
This one is small but annoying. Even with responseMimeType: 'application/json', Gemini sometimes returns its JSON wrapped in a markdown code fence:

````
```json
{"key": "value"}
```
````

instead of plain JSON, and JSON.parse throws on the raw response if you don't strip the fence first.
```typescript
import { stripMarkdownCodeBlock } from 'gemini-heal';

const clean = stripMarkdownCodeBlock(geminiResponse);
const data = JSON.parse(clean);
```
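The stripping itself can be as small as the sketch below; this illustrates the approach and is not necessarily gemini-heal's exact implementation:

```typescript
// Remove a leading ```json (or bare ```) fence and a trailing ``` fence,
// leaving the JSON payload untouched. Plain JSON passes through as-is.
function stripMarkdownCodeBlock(text: string): string {
  const match = text.trim().match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```$/);
  return match ? match[1].trim() : text.trim();
}
```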
We open-sourced the fixes
After patching these issues across multiple services, we extracted everything into a library called gemini-heal.
```shell
npm install gemini-heal @google/generative-ai
```
It includes:
- GeminiRateLimiter — adaptive token-bucket with circuit breaker
- GeminiClient — completion wrapper with rate limiting and cost tracking
- ToolCaller — forced tool calling with MALFORMED retry + structured output fallback
- Utility helpers for 429 detection and markdown stripping
Zero dependencies. TypeScript. MIT license.
→ GitHub: https://github.com/emotixco/gemini-heal
If you're running Gemini in production and hitting any of these, hope it saves you some time. And if you've found other Gemini quirks worth handling, open an issue — we'd like to keep adding to it.