The first production incident wasn't a bad translation. It was a Markdown code fence wrapping the JSON response.
One day, error notifications flooded in. The UI was rendering blank blocks where translations should have been. The cause? The model had quietly started being "helpful" by wrapping its JSON responses in `` ```json ... ``` `` fences. `JSON.parse()` choked immediately, and the translation feature went down, not because of bad translations but because of three backticks.
This article walks through the exact defense system I built to stabilize structured output from the OpenRouter API in production, in the order the failures surfaced. The main topic is malformed JSON responses. I also cover retry/fallback and language detection, but JSON handling is where most of the engineering hours went.
## TL;DR

- **The Core Issue:** LLM translation quality doesn't matter if `JSON.parse()` fails. Markdown code fences and truncation will break your app before bad translations do.
- **The Safe Baseline:** `json_object` + `response-healing` + fail-closed parsing + expected-key validation.
- **The Fix:** A 3-layer defense using `response_format: { type: 'json_object' }`, OpenRouter's `response-healing` plugin, and a custom defensive parser that rejects partial or malformed data instead of returning incomplete results.
- **Bonus:** Why you should only retry HTTP 429/5xx errors, and why binary language detection fails for tech content.
| Failure mode | Symptom | Defense |
|---|---|---|
| Code fences | `JSON.parse()` fails | `json_object` + `response-healing` |
| Missing keys | Blank UI blocks | Fail-closed parser + expected-key validation |
| 429 / 5xx | Intermittent request failure | Retry + model fallback double loop |
| Mixed-language text | Wasted API calls or false skips | Ratio-based detection with asymmetric thresholds |
Context: This comes from building auto-translation (Japanese to English, bidirectional) for Lovai, an AI recipe-sharing platform. The translation handles user-generated posts with titles, summaries, and multi-block body content.
This article focuses on in-app content translation; it's a separate layer from `hreflang`-based multilingual SEO.
## Fixing LLM JSON Corruption: 3-Layer Defense with OpenRouter
The first thing you need to handle when you run an LLM API in production is not translation quality — it's malformed JSON responses. Bad quality means "the translation is awkward." Parse failure means "the feature is down."
In my initial implementation, I wasn't using JSON Mode at all. The system prompt just said "return JSON," with no response_format specified. This worked fine for a while — until the model started wrapping responses in Markdown code fences without warning.
Here's what was actually coming back:
```json
{"__title": "Built Translation with OpenRouter"}
```
JSON.parse() chokes on this immediately. I added three layers of defense.
### Layer 1: Enable JSON Mode with `response_format: { type: 'json_object' }`
First, I added `response_format: { type: 'json_object' }`. This constrains the model to return valid JSON. Running LLM output through `JSON.parse()` without `response_format` is not safe for production. Prompt-only instructions break silently when models update or when the service is under load.
Note that structured outputs (json_object / json_schema) are only available on supported models. OpenRouter's model pages list compatibility.
### Layer 2: OpenRouter `response-healing` Plugin for Auto-Repair

OpenRouter has a `response-healing` plugin that automatically fixes:

- Markdown code fence removal (`` ```json ... ``` ``)
- Missing brackets and trailing commas
- JSON extraction from surrounding text
```typescript
const requestBody = {
  model,
  messages: [/* ... */],
  response_format: { type: 'json_object' },
  plugins: [{ id: 'response-healing' }], // Enable auto-repair
};
```
`response-healing` works alongside `json_object` / `json_schema`. Known constraints: it's non-streaming only, and it can't fix truncation from `max_tokens` cutoff.
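One cheap guard for the truncation gap is to check `finish_reason` before parsing. This is a sketch assuming the OpenAI-compatible response shape that OpenRouter returns; adjust field names if your client wraps them differently:

```typescript
// Sketch: reject truncated completions before parsing.
// Assumes the OpenAI-compatible chat-completions response shape.
interface ChatCompletionLike {
  choices: Array<{
    message: { content: string | null };
    finish_reason: string; // 'stop' | 'length' | ...
  }>;
}

function extractContentOrThrow(data: ChatCompletionLike): string {
  const choice = data.choices?.[0];
  if (!choice) throw new Error('Empty choices array in LLM response');
  if (choice.finish_reason === 'length') {
    // Truncated by max_tokens: the JSON is almost certainly incomplete,
    // and response-healing cannot reconstruct the missing tail.
    throw new Error('Response truncated by max_tokens; retry instead of parsing');
  }
  if (typeof choice.message.content !== 'string') {
    throw new Error('LLM response has no text content');
  }
  return choice.message.content;
}
```

Throwing here routes truncation into the same retry path as parse failures, rather than handing a half-built object to the parser.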
### Layer 3: Fail-Closed Parsing — Parse Success Is Not Enough
Even with JSON Mode and response-healing, I keep a defensive parser on the application side. It's insurance against model behavior changes or API spec updates.
The parser rejects partial or malformed data instead of returning incomplete results. If it can't produce valid, complete JSON with all expected keys, it throws rather than silently serving broken translations.
```typescript
function parseTranslationResponse(
  content: string,
  expectedKeys: string[]
): Record<string, string> {
  // Step 1: Try parsing raw content directly
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    // Step 2: Strip code fences and retry
    const stripped = content
      .replace(/^```(?:json)?\s*/gm, '')
      .replace(/^```\s*/gm, '')
      .trim();
    parsed = JSON.parse(stripped); // Let it throw if still invalid
  }

  // Step 3: Validate structure (fail closed)
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('LLM response is not a JSON object');
  }

  const result: Record<string, string> = {};
  for (const key of expectedKeys) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value !== 'string') {
      throw new Error(`Missing or non-string value for key: ${key}`);
    }
    result[key] = value;
  }

  // Log unexpected keys (model occasionally adds metadata fields)
  const extraKeys = Object.keys(parsed as Record<string, unknown>)
    .filter(k => !expectedKeys.includes(k));
  if (extraKeys.length > 0) {
    console.warn(`LLM returned unexpected keys: ${extraKeys.join(', ')}`);
  }

  return result;
}
```
Key design decisions:

- **Fail closed:** Missing keys throw an error rather than silently returning partial data. This routes failures to the retry loop instead of serving broken translations.
- **Key validation:** The caller passes `expectedKeys` (block IDs), and the parser verifies every expected key is present with a string value. This catches cases where `JSON.parse()` succeeds but the model dropped or renamed keys.
- **Extra key warning:** Unexpected keys get logged but don't fail the request; the model occasionally adds metadata fields that are harmless.
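For concreteness, here's how `expectedKeys` can be derived from a post's structure. The `Post`/`Block` shapes are illustrative, not the app's actual types:

```typescript
// Sketch: building expectedKeys from a post's structure.
// Post/Block are illustrative shapes, not the app's real types.
interface Block { id: string; text: string }
interface Post { title: string; blocks: Block[] }

function expectedKeysFor(post: Post): string[] {
  // '__title' is the reserved key for the post title; each body block
  // contributes its own ID as a JSON key.
  return ['__title', ...post.blocks.map(b => b.id)];
}

const post: Post = {
  title: 'Built Translation with OpenRouter',
  blocks: [{ id: 'block_abc_text', text: 'Body text...' }],
};
// expectedKeysFor(post) → ['__title', 'block_abc_text']
```

Because the key list comes from the post itself, validation is exact per request rather than schema-wide.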
Stabilizing JSON output from LLM APIs requires both API-side constraints (`json_object` + `response-healing`) and application-side defensive parsing with key validation. Either one alone leaves you exposed to model behavior drift or API spec changes.
When to use `json_schema` instead: OpenRouter also supports `json_schema` mode. With `json_schema` + `strict: true`, you get output that matches a predefined schema. For translation, the keys are dynamic (they depend on block IDs per post), so `json_object` is simpler. If your keys are static and predictable, like entity extraction (person names, organizations, dates as fixed fields), `json_schema` + `strict: true` is more reliable, and Layer 1 alone may be sufficient. That said, you can approximate dynamic keys with `json_schema` by generating the schema per request or using `additionalProperties`; it's just more implementation overhead.
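For the static-key case, a `json_schema` request body might look like this. The shape follows the OpenAI-compatible structured-outputs format; treat the exact field names as an assumption and check OpenRouter's documentation for your model:

```typescript
// Sketch of a json_schema request body for a static-key task
// (entity extraction). Follows the OpenAI-compatible structured-outputs
// shape; verify model support on OpenRouter's model pages.
const entityRequestBody = {
  model: 'google/gemini-2.5-flash-lite',
  messages: [/* ... */],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'entities',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          person: { type: 'string' },
          organization: { type: 'string' },
          date: { type: 'string' },
        },
        required: ['person', 'organization', 'date'],
        additionalProperties: false, // strict mode expects closed schemas
      },
    },
  },
};
```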
## LLM API Retry Design: Model Fallback and HTTP Error Strategy
After JSON parsing, the next thing I needed to stabilize was API call reliability.
### Model Fallback x Retry Double Loop
```typescript
const MODEL = 'google/gemini-3-flash-preview';
const FALLBACK_MODEL = process.env.OPENROUTER_TRANSLATE_FALLBACK_MODEL || '';

const modelCandidates = Array.from(
  new Set([MODEL, FALLBACK_MODEL].filter(Boolean))
);

for (const model of modelCandidates) {
  for (let attempt = 0; attempt <= maxRetry; attempt++) {
    // Translation attempt
  }
}
```
The outer loop switches models. The inner loop handles retries. If the primary model exhausts all retries (max 2), it falls through to the fallback model.
In my implementation, I only retry HTTP 429 and 5xx responses. Retrying a 400 (bad request) or 401 (auth error) won't change the outcome; you just burn the retry budget on a config error.
```typescript
const shouldRetryStatus = (status: number): boolean =>
  status === 429 || status >= 500;
```
Note: this covers HTTP-level errors. Transport-level failures (timeouts, connection resets, DNS failures) are handled separately by the AbortController timeout below — those always get retried since they don't indicate a permanent problem.
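Putting the status check, retries, and fallback together, the double loop can be sketched like this. `translateOnce` and the backoff constants are illustrative, not the production code:

```typescript
// Illustrative sketch of the double loop: the outer loop switches
// models, the inner loop retries 429/5xx and transport errors with a
// short backoff. translateOnce is a hypothetical helper that performs
// one API call and throws an error carrying the HTTP status on failure.
const shouldRetryStatus = (status: number): boolean =>
  status === 429 || status >= 500;

async function translateWithFallback(
  models: string[],
  maxRetry: number,
  translateOnce: (model: string) => Promise<string>
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    for (let attempt = 0; attempt <= maxRetry; attempt++) {
      try {
        return await translateOnce(model);
      } catch (err) {
        lastError = err;
        const status = (err as { status?: number }).status;
        // Non-retryable 4xx: stop hammering this model, try the next one
        if (typeof status === 'number' && !shouldRetryStatus(status)) break;
        // Transport errors (no status) and 429/5xx: back off and retry
        await new Promise(r => setTimeout(r, 500 * (attempt + 1)));
      }
    }
  }
  throw lastError;
}
```

One design choice worth noting: a non-retryable status skips to the fallback model rather than aborting outright, since a 400 can be model-specific even though a 401 usually isn't.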
### Timeout Control with AbortController
```typescript
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 15_000);

try {
  const response = await fetch(OPENROUTER_API_URL, {
    signal: controller.signal,
    // ...
  });
} finally {
  clearTimeout(timeout);
}
```
I initially set it to 8 seconds, but longer posts (5,000+ characters) were timing out mid-flight. I bumped it to 15 seconds. LLM API timeouts need to scale with input length. Too short and you kill legitimate requests. Too long and you can't detect hangs.
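One way to scale the budget with input length is a simple clamp. The constants below are illustrative assumptions for the sketch, not the values the app uses:

```typescript
// Illustrative: scale the abort timeout with input length, clamped to
// a sane range. BASE/PER_CHAR/MAX are assumptions, not production values.
function timeoutForInput(chars: number): number {
  const BASE_MS = 8_000;   // floor for short posts
  const PER_CHAR_MS = 2;   // rough allowance for longer outputs
  const MAX_MS = 30_000;   // cap so hangs are still detected
  return Math.min(MAX_MS, BASE_MS + chars * PER_CHAR_MS);
}

// timeoutForInput(500)   -> 9000
// timeoutForInput(5000)  -> 18000
// timeoutForInput(20000) -> 30000 (clamped)
```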
## Language Detection for Mixed-Script Text: Why Ratio Beats Binary
Before translating, you need to check: "Is this text already in the target language?" This prevents unnecessary API calls. I got this wrong in the first version too.
The initial approach was simple binary detection — "does the text contain characters of this language?"
```typescript
// Initial implementation (broken)
const hasJapanese = (text: string): boolean =>
  /[\u3040-\u30ff\u3400-\u9fff]/u.test(text);

const hasLatin = (text: string): boolean =>
  /[A-Za-z]/.test(text);

// Has Japanese && no Latin → Japanese text → skip translation
if (hasJapanese(text) && !hasLatin(text)) { /* skip */ }
```
A Japanese post like "Next.jsでReactアプリを作った" contains both Japanese and Latin characters, so binary detection flags it for translation.
In tech content, English terms mixed into non-Latin-script text is the norm, not the exception. This isn't limited to Japanese-English — Korean tech posts with English terms, Chinese posts with API names, Arabic text with framework names all hit the same trap. Any language pair where one script dominates but technical terms intrude from another will produce false positives with binary detection.
Binary detection classified nearly every Japanese post as "needs translation," triggering a flood of unnecessary API calls.
The fix was ratio-based detection.
```typescript
type SupportedLanguage = 'ja' | 'en';

function isAlreadyInTargetLanguage(
  text: string,
  target: SupportedLanguage
): boolean {
  const jaChars = (
    text.match(/[\u3040-\u30ff\u3400-\u9fff\uff00-\uffef]/gu) || []
  ).length;
  const latChars = (text.match(/[A-Za-z]/g) || []).length;
  const total = jaChars + latChars;
  if (total === 0) return true;

  const jaRatio = jaChars / total;
  if (target === 'ja' && jaRatio > 0.95) return true;
  if (target === 'en' && (1 - jaRatio) > 0.95) return true;
  return false;
}
```
The threshold started at 70%, but Japanese text with 30%+ English terms was being misclassified as "already English." I raised it to 95%. After this change, false skips (Japanese text that didn't get translated) dropped to near zero.
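To see why the mixed-script example fails the skip check, here's the ratio arithmetic spelled out, using the same regexes and the 95% threshold from `isAlreadyInTargetLanguage` above:

```typescript
// Ratio arithmetic for the mixed-script example, using the same
// character-class regexes as the detector above.
const ja = (t: string) =>
  (t.match(/[\u3040-\u30ff\u3400-\u9fff\uff00-\uffef]/gu) || []).length;
const lat = (t: string) => (t.match(/[A-Za-z]/g) || []).length;

const text = 'Next.jsでReactアプリを作った';
const jaRatio = ja(text) / (ja(text) + lat(text)); // 8 / 19 ≈ 0.42

const alreadyJa = jaRatio > 0.95;     // false: not "already Japanese"
const alreadyEn = 1 - jaRatio > 0.95; // false: not "already English"
```

Neither ratio clears 95%, so the post gets translated in both directions instead of being falsely skipped.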
Detection is split into two functions with different jobs:

| Function | Purpose | Threshold | Strategy |
|---|---|---|---|
| `isAlreadyInTargetLanguage()` | Filter out text that doesn't need translation | 95% | Strict (if in doubt, translate) |
| `isLikelyInTargetLanguage()` | Validate translation output quality | 25%+ or 8+ chars | Lenient (tolerate mixed terminology) |
For text where technical terms cross script boundaries, "skip detection" should be strict and "output validation" should be lenient. This asymmetry matters. Flip it and you either miss texts that need translation, or reject perfectly good translations.
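The article doesn't show `isLikelyInTargetLanguage` itself, so here is a sketch reconstructed purely from the thresholds in the table (25%+ target-script ratio, or 8+ target-script characters); treat it as an approximation, not the actual implementation:

```typescript
// Sketch of a lenient output validator, reconstructed from the
// thresholds in the table above. Not the app's actual implementation.
function isLikelyInTargetLanguage(
  text: string,
  target: 'ja' | 'en'
): boolean {
  const jaChars = (
    text.match(/[\u3040-\u30ff\u3400-\u9fff\uff00-\uffef]/gu) || []
  ).length;
  const latChars = (text.match(/[A-Za-z]/g) || []).length;
  const total = jaChars + latChars;
  if (total === 0) return false;

  const targetChars = target === 'ja' ? jaChars : latChars;
  // Lenient: plenty of mixed terminology is fine, as long as the
  // target script is clearly present.
  return targetChars / total >= 0.25 || targetChars >= 8;
}
```

With this shape, a translated sentence full of preserved English terms still passes validation for Japanese output, which is exactly the tolerance the strict skip-detector must not have.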
## Why OpenRouter + LLM over Google Translate or DeepL
I evaluated Google Cloud Translation API, DeepL API, and OpenRouter + LLM. Here's why I chose OpenRouter.
The dealbreaker was control over the response structure. Posts on the platform have titles, summaries, and multiple body blocks, each with a unique ID. Google Translate and DeepL can batch-translate, but they return an array — you have to track which translation maps to which block by index position yourself.
With LLM translation, I can use block IDs as JSON keys. The response comes back with those same keys, values translated. No mapping logic needed.
```jsonc
// Input: block IDs as keys
{ "__title": "Built Translation with OpenRouter", "block_abc_text": "Body text..." }

// Output: keys preserved, values translated
{ "__title": "OpenRouterで翻訳を作った", "block_abc_text": "本文テキスト..." }
```
This "send keyed JSON, get keyed JSON back" pattern works well specifically because of OpenRouter's json_object mode combined with the response-healing plugin.
On cost (at the time of writing): translating one post (~2,000 chars / ~1,500 tokens) costs about $0.04 on Google NMT vs. under $0.001 on gemini-2.5-flash-lite (input $0.10 / output $0.40 per 1M tokens). Character-based and token-based pricing don't compare directly, but for short-form content translation, the LLM route is significantly cheaper.
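The back-of-envelope arithmetic behind those numbers, assuming roughly equal input and output token counts:

```typescript
// Back-of-envelope cost comparison for one post, using the prices
// quoted above (assuming ~equal input and output token counts).
const GOOGLE_NMT_PER_CHAR = 20 / 1_000_000; // $20 per 1M characters
const FLASH_LITE_IN = 0.10 / 1_000_000;     // $ per input token
const FLASH_LITE_OUT = 0.40 / 1_000_000;    // $ per output token

const chars = 2_000;
const tokens = 1_500;

const googleCost = chars * GOOGLE_NMT_PER_CHAR;                    // ≈ $0.04
const llmCost = tokens * FLASH_LITE_IN + tokens * FLASH_LITE_OUT;  // ≈ $0.00075
// Roughly a 50x difference for this post size.
```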
### Detailed comparison: Google Cloud Translation / DeepL / OpenRouter

| Factor | Google Cloud Translation / DeepL | OpenRouter + LLM |
|---|---|---|
| Response structure | Array of translated strings in input order; map translations back to fields by index | Send JSON with keys, get JSON back with keys preserved and only values translated |
| Terminology control | Glossary (pre-registered term mappings). Precise, but requires manual registration | Prompt-level instructions. No pre-registration, less strict |
| Model switching | Limited. Google has NMT vs Translation LLM; DeepL has `model_type` | Change one environment variable: `gemini-2.5-flash-lite` to `gemini-3-flash-preview` without changing application code |
| Pricing | Per-character (Google NMT: $20/1M chars, 500K free/month. DeepL Free: 500K chars/month) | Per-token (varies by model) |
For terminology control, Google/DeepL Glossaries are more precise. But in AI/tech content, new terms appear constantly. Registering each one gets expensive in maintenance time. With LLM translation, I just tell the prompt "preserve technical terms as-is." Less strict, but that simplicity matters when you're a solo developer.
OpenRouter lets you call multiple LLM models through a unified API. I started with gemini-2.5-flash-lite and later switched to gemini-3-flash-preview — just an environment variable change, without changing application code.
## Timeline: How This Feature Actually Evolved
This feature didn't ship complete. It improved every time production broke.
| Version | Changes | What broke |
|---|---|---|
| v1 (Jan) | Basic implementation: `gemini-2.5-flash-lite`, 8s timeout, binary language detection, no JSON Mode | Initial release |
| v1.1 (Jan) | Explicit prompt rules (preserve technical terms, preserve keys, etc.) | Technical terms getting translated unintentionally |
| v2 (Feb) | JSON Mode added, `response-healing` enabled, defensive parser, model switch (`gemini-3-flash-preview`), ratio-based language detection (70%), timeout bumped to 15s | JSON parse errors flooding error notifications |
| v3 (Current) | Language detection threshold raised to 95%, output validation function separated, key validation added | 70% threshold causing "Japanese text not translated" misclassifications |
Getting from v1 ("it works") to v2 ("it doesn't break") took the most effort. The v2 JSON corruption fix alone took a full day.
## Three Principles for Stable Structured LLM Output

These are the three principles that stabilized LLM structured output in my production system:

- **JSON corruption is priority zero.** `response_format: { type: 'json_object' }` + the `response-healing` plugin handles most cases, but keep an application-side defensive parser with key validation for `max_tokens` truncation and model differences. If your schema is static, `json_schema` + `strict: true` is more reliable.
- **Only retry HTTP 429 and 5xx.** Retrying 4xx is pointless. Handle transport-level failures (timeouts, connection resets) separately. Separate model fallback and retry into a double loop.
- **Use ratio-based language detection.** Binary detection is useless when technical terms cross script boundaries. Make "skip detection" strict and "output validation" lenient; the asymmetry is the point.
These three principles aren't specific to translation. They apply to any case where you expect structured output from an LLM API — text classification, entity extraction, content structuring.
If you're parsing JSON from an LLM in production, treat malformed output as an uptime problem, not a quality problem.
If you've hit a different failure mode with structured LLM output, I'd like to hear about it.