Lovanaut
OpenRouter Structured Output Broke Before Translation Quality Did — 3 Layers of Defense for Production

The first production incident wasn't a bad translation. It was a Markdown code fence wrapping the JSON response.

One day, error notifications flooded in. The UI was rendering blank blocks where translations should have been. The cause? The model had quietly started being "helpful" by wrapping its JSON responses in ```json … ``` fences. JSON.parse() choked immediately, and the translation feature went down — not because of bad translations, but because of three backticks.

This article walks through the exact defense system I built to stabilize structured output from the OpenRouter API in production, in the order the failures surfaced. The main topic is malformed JSON responses. I also cover retry/fallback and language detection, but JSON handling is where most of the engineering hours went.

TL;DR

  • The Core Issue: LLM translation quality doesn't matter if JSON.parse() fails. Markdown code fences and truncation will break your app before bad translations do.
  • The Safe Baseline: json_object + response-healing + fail-closed parsing + expected-key validation.
  • The Fix: A 3-layer defense using response_format: { type: 'json_object' }, OpenRouter's response-healing plugin, and a custom defensive parser that rejects partial or malformed data instead of returning incomplete results.
  • Bonus: Why you should only retry HTTP 429/5xx errors, and why binary language detection fails for tech content.
| Failure mode | Symptom | Defense |
| --- | --- | --- |
| Code fences | `JSON.parse()` fails | `json_object` + response-healing |
| Missing keys | Blank UI blocks | Fail-closed parser + expected-key validation |
| 429 / 5xx | Intermittent request failures | Retry + model-fallback double loop |
| Mixed-language text | Wasted API calls or false skips | Ratio-based detection with asymmetric thresholds |

Context: This comes from building auto-translation (Japanese to English, bidirectional) for Lovai, an AI recipe-sharing platform. The translation handles user-generated posts with titles, summaries, and multi-block body content.

This article focuses on in-app content translation — it's a separate layer from hreflang-based multilingual SEO.


Fixing LLM JSON Corruption: 3-Layer Defense with OpenRouter

The first thing you need to handle when you run an LLM API in production is not translation quality — it's malformed JSON responses. Bad quality means "the translation is awkward." Parse failure means "the feature is down."

In my initial implementation, I wasn't using JSON Mode at all. The system prompt just said "return JSON," with no response_format specified. This worked fine for a while — until the model started wrapping responses in Markdown code fences without warning.

Here's what was actually coming back:

```json
{"__title": "Built Translation with OpenRouter"}
```

JSON.parse() chokes on this immediately. I added three layers of defense.

Layer 1: Enable JSON Mode with response_format: { type: 'json_object' }

First, I added response_format: { type: 'json_object' }. This constrains the model to return valid JSON. Running LLM output through JSON.parse() without response_format is not safe for production. Prompt-only instructions break silently when models update or when the service is under load.

Note that structured outputs (json_object / json_schema) are only available on supported models. OpenRouter's model pages list compatibility.

Layer 2: OpenRouter response-healing Plugin for Auto-Repair

OpenRouter has a response-healing plugin that automatically fixes:

  • Markdown code fence removal (stripping ```json … ``` wrappers)
  • Missing brackets and trailing commas
  • JSON extraction from surrounding text
```typescript
const requestBody = {
  model,
  messages: [/* ... */],
  response_format: { type: 'json_object' },
  plugins: [{ id: 'response-healing' }],  // Enable auto-repair
};
```

response-healing works alongside json_object / json_schema. Known constraints: it's non-streaming only, and it can't fix truncation from max_tokens cutoff.
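Since response-healing can't repair truncation, it's worth catching it explicitly before parsing. A minimal sketch, assuming the OpenAI-compatible response shape OpenRouter returns — the `ChatResponse` type is a trimmed subset and `assertNotTruncated` is an illustrative name, not code from this post:

```typescript
// Minimal subset of the OpenAI-compatible chat completion response.
type ChatResponse = {
  choices: { finish_reason: string; message: { content: string } }[];
};

function assertNotTruncated(resp: ChatResponse): string {
  const choice = resp.choices[0];
  if (choice.finish_reason === 'length') {
    // Truncated mid-JSON by max_tokens: fail closed, let the retry loop handle it.
    throw new Error('Response truncated by max_tokens; JSON is likely incomplete');
  }
  return choice.message.content;
}
```

A `finish_reason` of `'length'` means the model hit the token limit, so the JSON is almost certainly cut off mid-object — exactly the case response-healing can't fix.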

Layer 3: Fail-Closed Parsing — Parse Success Is Not Enough

Even with JSON Mode and response-healing, I keep a defensive parser on the application side. It's insurance against model behavior changes or API spec updates.

The parser rejects partial or malformed data instead of returning incomplete results. If it can't produce valid, complete JSON with all expected keys, it throws rather than silently serving broken translations.

````typescript
function parseTranslationResponse(
  content: string,
  expectedKeys: string[]
): Record<string, string> {
  // Step 1: Try parsing raw content directly
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    // Step 2: Strip code fences and retry
    const stripped = content
      .replace(/^```(?:json)?\s*/gm, '')
      .replace(/^```\s*/gm, '')
      .trim();
    parsed = JSON.parse(stripped);  // Let it throw if still invalid
  }

  // Step 3: Validate structure — fail closed
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('LLM response is not a JSON object');
  }

  const result: Record<string, string> = {};
  for (const key of expectedKeys) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value !== 'string') {
      throw new Error(`Missing or non-string value for key: ${key}`);
    }
    result[key] = value;
  }

  // Log unexpected keys (model occasionally adds metadata fields)
  const extraKeys = Object.keys(parsed as Record<string, unknown>)
    .filter(k => !expectedKeys.includes(k));
  if (extraKeys.length > 0) {
    console.warn(`LLM returned unexpected keys: ${extraKeys.join(', ')}`);
  }

  return result;
}
````

Key design decisions:

  • Fail closed: Missing keys throw an error rather than silently returning partial data. This routes failures to the retry loop instead of serving broken translations.
  • Key validation: The caller passes expectedKeys (block IDs), and the parser verifies every expected key is present with a string value. This catches cases where JSON.parse() succeeds but the model dropped or renamed keys.
  • Extra key warning: Unexpected keys get logged but don't fail the request — the model occasionally adds metadata fields that are harmless.

Stabilizing JSON output from LLM APIs requires both API-side constraints (json_object + response-healing) and application-side defensive parsing with key validation. Either one alone leaves you exposed to model behavior drift or API spec changes.

When to use json_schema instead: OpenRouter also supports json_schema mode. With json_schema + strict: true, you get output that matches a predefined schema. For translation, the keys are dynamic (they depend on block IDs per post), so json_object is simpler. If your keys are static and predictable — like entity extraction (person names, organizations, dates as fixed fields) — json_schema + strict: true is more reliable, and Layer 1 alone may be sufficient. That said, you can approximate dynamic keys with json_schema by generating the schema per request or using additionalProperties — it's just more implementation overhead.
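That per-request schema generation can be sketched in a few lines. The request shape follows OpenRouter's `json_schema` mode; `buildTranslationSchema` and the schema name `'translation'` are illustrative choices, not from this post:

```typescript
// Approximate dynamic keys with json_schema by generating the schema per
// request from the block IDs.
function buildTranslationSchema(expectedKeys: string[]) {
  return {
    type: 'json_schema',
    json_schema: {
      name: 'translation',
      strict: true,
      schema: {
        type: 'object',
        properties: Object.fromEntries(
          expectedKeys.map(k => [k, { type: 'string' }])
        ),
        required: expectedKeys,       // every block ID must come back
        additionalProperties: false,  // no surprise metadata fields
      },
    },
  };
}
```

The result goes into `response_format` in place of `{ type: 'json_object' }`. With `strict: true` and `additionalProperties: false`, the extra-key logging in Layer 3 becomes largely redundant.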


LLM API Retry Design: Model Fallback and HTTP Error Strategy

After JSON parsing, the next thing I needed to stabilize was API call reliability.

Model Fallback x Retry Double Loop

```typescript
const MODEL = 'google/gemini-3-flash-preview';
const FALLBACK_MODEL = process.env.OPENROUTER_TRANSLATE_FALLBACK_MODEL || '';

const modelCandidates = Array.from(
  new Set([MODEL, FALLBACK_MODEL].filter(Boolean))
);

for (const model of modelCandidates) {
  for (let attempt = 0; attempt <= maxRetry; attempt++) {
    // Translation attempt
  }
}
```

The outer loop switches models. The inner loop handles retries. If the primary model exhausts all retries (max 2), it falls through to the fallback model.

In my implementation, I only retry HTTP 429 and 5xx responses. Retrying a 400 (bad request) or 401 (auth error) won't change the outcome — you just burn retries and delay surfacing a config error.

```typescript
const shouldRetryStatus = (status: number): boolean =>
  status === 429 || status >= 500;
```

Note: this covers HTTP-level errors. Transport-level failures (timeouts, connection resets, DNS failures) are handled separately by the AbortController timeout below — those always get retried since they don't indicate a permanent problem.
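The double loop plus this retry policy can be sketched together. `RetryableError`, `translateWithFallback`, and the backoff constants are illustrative assumptions, not the post's production code — the caller's `attempt()` would throw `RetryableError` on 429/5xx and transport failures, and anything else propagates immediately:

```typescript
class RetryableError extends Error {}

async function translateWithFallback<T>(
  models: string[],
  maxRetry: number,
  attempt: (model: string) => Promise<T>,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {            // outer loop: model fallback
    for (let i = 0; i <= maxRetry; i++) {  // inner loop: retries per model
      try {
        return await attempt(model);
      } catch (err) {
        if (!(err instanceof RetryableError)) throw err; // 400/401 etc.: give up now
        lastError = err;
        // exponential backoff before the next attempt
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

The key property: a non-retryable error escapes both loops immediately, while retryable ones exhaust the inner loop before falling through to the next model.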

Timeout Control with AbortController

```typescript
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 15_000);

try {
  const response = await fetch(OPENROUTER_API_URL, {
    signal: controller.signal,
    // ...
  });
} finally {
  clearTimeout(timeout);
}
```

I initially set it to 8 seconds, but longer posts (5,000+ characters) were timing out mid-flight. I bumped it to 15 seconds. LLM API timeouts need to scale with input length. Too short and you kill legitimate requests. Too long and you can't detect hangs.
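One way to make the timeout scale with input length instead of hardcoding it — the constants here are illustrative, not the post's production values:

```typescript
// Scale the abort timeout with input size, within a floor and a cap.
function timeoutForInput(charCount: number): number {
  const base = 8_000;   // floor: enough for short posts
  const perChar = 2;    // ~2ms per input character (rough heuristic)
  const cap = 30_000;   // ceiling: still detect genuine hangs
  return Math.min(cap, base + charCount * perChar);
}
```

This keeps short posts on a tight leash while giving 5,000+ character posts the headroom that a fixed 8-second timeout was killing.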


Language Detection for Mixed-Script Text: Why Ratio Beats Binary

Before translating, you need to check: "Is this text already in the target language?" This prevents unnecessary API calls. I got this wrong in the first version too.

The initial approach was simple binary detection — "does the text contain characters of this language?"

```typescript
// Initial implementation (broken)
const hasJapanese = (text: string): boolean =>
  /[\u3040-\u30ff\u3400-\u9fff]/u.test(text);
const hasLatin = (text: string): boolean =>
  /[A-Za-z]/.test(text);

// Has Japanese && no Latin → Japanese text → skip translation
if (hasJapanese(text) && !hasLatin(text)) { /* skip */ }
```

A Japanese post like "Next.jsでReactアプリを作った" contains both Japanese characters and Latin characters, so binary detection flags it for translation.

In tech content, English terms mixed into non-Latin-script text are the norm, not the exception. This isn't limited to Japanese-English — Korean tech posts with English terms, Chinese posts with API names, and Arabic text with framework names all hit the same trap. Any language pair where one script dominates but technical terms intrude from another will produce false positives with binary detection.

Binary detection classified nearly every Japanese post as "needs translation," triggering a flood of unnecessary API calls.

The fix was ratio-based detection.

```typescript
function isAlreadyInTargetLanguage(
  text: string,
  target: SupportedLanguage
): boolean {
  const jaChars = (
    text.match(/[\u3040-\u30ff\u3400-\u9fff\uff00-\uffef]/gu) || []
  ).length;
  const latChars = (text.match(/[A-Za-z]/g) || []).length;
  const total = jaChars + latChars;
  if (total === 0) return true;

  const jaRatio = jaChars / total;
  if (target === 'ja' && jaRatio > 0.95) return true;
  if (target === 'en' && (1 - jaRatio) > 0.95) return true;
  return false;
}
```

The threshold started at 70%, but Japanese text with 30%+ English terms was being misclassified as "already English." I raised it to 95%. After this change, false skips (Japanese text that didn't get translated) dropped to near zero.

Detection is split into two functions with different jobs:

| Function | Purpose | Threshold | Strategy |
| --- | --- | --- | --- |
| `isAlreadyInTargetLanguage()` | Filter out text that doesn't need translation | 95% | Strict (if in doubt, translate) |
| `isLikelyInTargetLanguage()` | Validate translation output quality | 25%+ or 8+ chars | Lenient (tolerate mixed terminology) |

For text where technical terms cross script boundaries, "skip detection" should be strict and "output validation" should be lenient. This asymmetry matters. Flip it and you either miss texts that need translation, or reject perfectly good translations.
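A sketch of the lenient output-side check, assuming the "25%+ or 8+ chars" rule means: accept if target-script characters make up at least 25% of script characters, or at least 8 of them appear at all. The implementation details here are my reading of the table, not the post's exact code:

```typescript
type SupportedLanguage = 'ja' | 'en';

// Lenient check: is the translated OUTPUT plausibly in the target language?
function isLikelyInTargetLanguage(
  text: string,
  target: SupportedLanguage
): boolean {
  const jaChars = (
    text.match(/[\u3040-\u30ff\u3400-\u9fff\uff00-\uffef]/gu) || []
  ).length;
  const latChars = (text.match(/[A-Za-z]/g) || []).length;
  const targetChars = target === 'ja' ? jaChars : latChars;
  const total = jaChars + latChars;
  if (total === 0) return true;  // nothing to judge; don't reject

  // Tolerate heavy mixed terminology: a low ratio OR a modest absolute count passes.
  return targetChars / total >= 0.25 || targetChars >= 8;
}
```

Note the deliberate contrast with the strict skip check: this one passes text that is only a quarter target-script, because rejecting a good translation full of preserved technical terms is the failure mode to avoid here.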


Why OpenRouter + LLM over Google Translate or DeepL

I evaluated Google Cloud Translation API, DeepL API, and OpenRouter + LLM. Here's why I chose OpenRouter.

The dealbreaker was control over the response structure. Posts on the platform have titles, summaries, and multiple body blocks, each with a unique ID. Google Translate and DeepL can batch-translate, but they return an array — you have to track which translation maps to which block by index position yourself.

With LLM translation, I can use block IDs as JSON keys. The response comes back with those same keys, values translated. No mapping logic needed.

```js
// Input: block IDs as keys
{ "__title": "Built Translation with OpenRouter", "block_abc_text": "Body text..." }
// Output: keys preserved, values translated
{ "__title": "OpenRouterで翻訳を作った", "block_abc_text": "本文テキスト..." }
```

This "send keyed JSON, get keyed JSON back" pattern works well specifically because of OpenRouter's json_object mode combined with the response-healing plugin.
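The pattern reduces to two small helpers — the `Block` type and function names below are illustrative, not from the post:

```typescript
type Block = { id: string; text: string };

// Build the request payload: block IDs become JSON keys.
function toKeyedPayload(title: string, blocks: Block[]): Record<string, string> {
  const payload: Record<string, string> = { __title: title };
  for (const b of blocks) payload[b.id] = b.text;
  return payload;
}

// Apply the translated response: look up each block by its own ID.
function applyTranslations(
  blocks: Block[],
  translated: Record<string, string>
): Block[] {
  // Keys are block IDs, so no index bookkeeping is needed.
  return blocks.map(b => ({ ...b, text: translated[b.id] ?? b.text }));
}
```

Compare this with array-based batch translation, where a single dropped or reordered element silently shifts every translation after it onto the wrong block.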

On cost (at the time of writing): translating one post (~2,000 chars / ~1,500 tokens) costs about $0.04 on Google NMT vs. under $0.001 on gemini-2.5-flash-lite (input $0.10 / output $0.40 per 1M tokens). Character-based and token-based pricing don't compare directly, but for short-form content translation, the LLM route is significantly cheaper.

Detailed comparison: Google Cloud Translation / DeepL / OpenRouter

| Factor | Google Cloud Translation / DeepL | OpenRouter + LLM |
| --- | --- | --- |
| Response structure | Array of translated strings in input order; you map translations back to fields by index | Send JSON with keys, get JSON back with keys preserved and only values translated |
| Terminology control | Glossary (pre-registered term mappings) — precise, but requires manual registration | Prompt-level instructions — no pre-registration, less strict |
| Model switching | Limited (Google: NMT vs Translation LLM; DeepL: `model_type`) | Change one environment variable — `gemini-2.5-flash-lite` to `gemini-3-flash-preview` without changing application code |
| Pricing | Per-character (Google NMT: $20/1M chars, 500K free/month; DeepL Free: 500K chars/month) | Per-token (varies by model) |

For terminology control, Google/DeepL Glossaries are more precise. But in AI/tech content, new terms appear constantly. Registering each one gets expensive in maintenance time. With LLM translation, I just tell the prompt "preserve technical terms as-is." Less strict, but that simplicity matters when you're a solo developer.

OpenRouter lets you call multiple LLM models through a unified API. I started with gemini-2.5-flash-lite and later switched to gemini-3-flash-preview — just an environment variable change, without changing application code.


Timeline: How This Feature Actually Evolved

This feature didn't ship complete. It improved every time production broke.

| Version | Changes | What Broke |
| --- | --- | --- |
| v1 (Jan) | Basic implementation: `gemini-2.5-flash-lite`, 8s timeout, binary language detection, no JSON Mode | Initial release |
| v1.1 (Jan) | Explicit prompt rules (preserve technical terms, preserve keys, etc.) | Technical terms getting translated unintentionally |
| v2 (Feb) | JSON Mode added, response-healing enabled, defensive parser, model switch (`gemini-3-flash-preview`), ratio-based language detection (70%), timeout bumped to 15s | JSON parse errors flooding error notifications |
| v3 (Current) | Language detection threshold raised to 95%, output validation function separated, key validation added | 70% threshold causing "Japanese text not translated" misclassifications |

Getting from v1 ("it works") to v2 ("it doesn't break") took the most effort. The v2 JSON corruption fix alone took a full day.


Three Principles for Stable Structured LLM Output

These are the three principles that stabilized LLM structured output in my production system:

  1. JSON corruption is priority zero. response_format: { type: 'json_object' } + response-healing plugin handles most cases, but keep an application-side defensive parser with key validation for max_tokens truncation and model differences. If your schema is static, json_schema + strict: true is more reliable.
  2. Only retry HTTP 429 and 5xx. Retrying 4xx is pointless. Handle transport-level failures (timeouts, connection resets) separately. Separate model fallback and retry into a double loop.
  3. Use ratio-based language detection. Binary detection is useless when technical terms cross script boundaries. Make "skip detection" strict and "output validation" lenient — the asymmetry is the point.

These three principles aren't specific to translation. They apply to any case where you expect structured output from an LLM API — text classification, entity extraction, content structuring.

If you're parsing JSON from an LLM in production, treat malformed output as an uptime problem, not a quality problem.

If you've hit a different failure mode with structured LLM output, I'd like to hear about it.

