James Dai
How I fixed LLM structured output failures in a PowerPoint translator (0 errors on 1,214 translations)

I built PPTranslate — an open-source tool that translates PowerPoint files while preserving all formatting. The translation engine works. The layout preservation works. But for two weeks, I had a maddening bug I couldn't shake.

The Problem
The core idea is simple: PPTX files are ZIP archives of XML. Text lives in <a:t> nodes. I extract all the text, send it to Claude for translation, and write it back. Layout is preserved because I never touch formatting attributes.
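For context, the extraction step can be sketched with the standard library alone. This is a simplified illustration of the idea, not the project's actual code — real decks have many more element types to walk:

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used inside ppt/slides/slideN.xml
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

def extract_texts(slide_xml: str) -> list[str]:
    """Collect every <a:t> text run from one slide, in document order."""
    root = ET.fromstring(slide_xml)
    return [node.text or "" for node in root.iter(f"{{{A_NS}}}t")]

sample = """
<p:sp xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
      xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:txBody>
    <a:p><a:r><a:t>Hello World</a:t></a:r></a:p>
    <a:p><a:r><a:t>Q4 Financial Results</a:t></a:r></a:p>
  </p:txBody>
</p:sp>
"""

print(extract_texts(sample))  # ['Hello World', 'Q4 Financial Results']
```

Writing translations back is the mirror image: set each node's .text and re-zip the archive, leaving every formatting attribute untouched.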

The tricky part: a single slide can have 50+ text items. You can't send one API call per item — that's 50 API calls per slide, completely impractical. So I batch them: send all 50 items in one call, get 50 translations back.

My first approach was the obvious one. Give Claude a numbered list and ask for a JSON array back:

prompt = f"""Translate these items from English to Japanese.
Return a JSON array with exactly {len(texts)} translations.

0: Hello World
1: Click to add title  
2: Q4 Financial Results
...

Output JSON array only:"""

This worked 95% of the time. The other 5%, Claude would do something bizarre: split a single translation into individual characters.

Expected output for item 0: ["こんにちは世界"]

Actual output: ["こ","ん","に","ち","は","世","界"]

Instead of 50 translations, I'd get 289 items. The index mapping breaks completely. On a 59-page deck with 843 translation items, this caused ~42 broken slides per run.
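The failure is at least cheap to detect. A minimal guard like this (a hypothetical helper, not from the repo) turns silent slide corruption into a loud error you can retry on:

```python
import json

def parse_batch(raw: str, expected: int) -> list[str]:
    """Parse a JSON-array reply and refuse length mismatches."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    if len(items) != expected:
        raise ValueError(f"expected {expected} translations, got {len(items)}")
    return items

parse_batch('["こんにちは世界"]', 1)  # ok: one item in, one item out
# parse_batch('["こ","ん","に","ち","は","世","界"]', 1)  -> ValueError
```

Detection alone doesn't fix anything, though — retrying a batch that fails 5% of the time just resamples the same dice.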

What Didn't Work
I tried everything the obvious way first:

More explicit prompting: Added "DO NOT split characters", "each array element must be a complete translation", "array length MUST equal input count". Helped slightly. Still failed.

Temperature = 0: No effect on this type of failure.

Smaller batches: Reduced batch size from 50 to 20. Error rate dropped but didn't disappear. And now I needed 2.5x as many API calls.

JSON schema in system prompt: Showed Claude an example of correct output format. Marginally better, still not reliable.

The root problem: I was asking Claude to produce free-form text that happens to be valid JSON of a specific structure. When generating token by token, there's nothing preventing it from deciding "this translation has multiple parts" and producing an array within the array.

The Fix: Tool Use
Claude's Tool Use (function calling) API lets you define a tool with a strict JSON schema that Claude must call with valid inputs. It's not generating free-form text anymore — it's filling in structured fields.

Instead of a flat array, I defined named properties for each translation:

properties = {
    f"t{i}": {
        "type": "string",
        "description": f'Complete translation of item {i}: {texts[i][:40]}'
    }
    for i in range(len(texts))
}

tools = [{
    "name": "submit_translations",
    "description": f"Submit all {len(texts)} translations",
    "input_schema": {
        "type": "object",
        "properties": properties,
        "required": list(properties.keys()),
    }
}]


The key insight: "type": "string" on each property means Claude cannot return an array for a single translation. The schema literally doesn't allow it. You're not hoping Claude follows instructions — you're making the incorrect output structurally impossible.

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    tools=tools,
    tool_choice={"type": "tool", "name": "submit_translations"},
    messages=[{"role": "user", "content": prompt}]
)

for block in message.content:
    if block.type == "tool_use" and block.name == "submit_translations":
        result = [block.input.get(f"t{i}", texts[i]) for i in range(len(texts))]
        return result
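One detail worth making explicit: block.input.get(f"t{i}", texts[i]) silently falls back to the source text when a key is missing. A small wrapper (hypothetical, for illustration) makes those fallbacks visible so they can be logged or retried:

```python
def collect(inputs: dict, texts: list[str]) -> tuple[list[str], list[int]]:
    """Map t0..t{n-1} back to an ordered list; report items that fell back."""
    missing = [i for i in range(len(texts)) if f"t{i}" not in inputs]
    out = [inputs.get(f"t{i}", texts[i]) for i in range(len(texts))]
    return out, missing

out, missing = collect({"t0": "こんにちは世界"}, ["Hello World", "Q4 Results"])
print(out, missing)  # ['こんにちは世界', 'Q4 Results'] [1]
```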

Result: 0 errors across 1,214 translations on the same 59-page deck that previously produced 16 failures.

The Overflow Problem I Didn't Expect
Once translation was reliable, a new problem surfaced: English → Japanese on decks with tight single-line headings caused text clipping. A Japanese translation often uses fewer characters than the English source, but each CJK character renders roughly twice as wide as a Latin one, so the translated line can still overflow its text box.

My fix: after translation, for single-paragraph text boxes, calculate the visual expansion ratio and proportionally widen the cx attribute on the shape's <a:ext> extent element:

# Visual width calculation (CJK chars ≈ 2x English width)
def visual_width(text):
    width = 0.0
    for ch in text:
        cp = ord(ch)
        if (0x2E80 <= cp <= 0x9FFF or 0xAC00 <= cp <= 0xD7AF):
            width += 2.0
        else:
            width += 1.0
    return max(width, 1.0)

ratio = visual_width(translated) / visual_width(original)
if ratio > 1.15:  # Only expand if >15% wider
    # cx, x, margin, slide_width are in EMUs, read from the shape's transform
    new_cx = min(int(cx * ratio * 1.05), slide_width - x - margin)
    ext.set("cx", str(new_cx))
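Working the heuristic through the earlier example makes the threshold concrete. This self-contained rerun (same logic, condensed) shows "こんにちは世界" measuring 14 units against 11 for "Hello World":

```python
def visual_width(text):
    # CJK ideographs/kana (U+2E80-U+9FFF) and Hangul (U+AC00-U+D7AF) count double
    return max(sum(2.0 if (0x2E80 <= ord(c) <= 0x9FFF
                           or 0xAC00 <= ord(c) <= 0xD7AF)
                   else 1.0 for c in text), 1.0)

ratio = visual_width("こんにちは世界") / visual_width("Hello World")
print(round(ratio, 2))  # 1.27 — clears the 1.15 threshold, so the box widens
```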

The Result
The full project is on GitHub: cloudyview/ppt-translator

There's also a hosted version at pptranslate.com if you want to try it without setting anything up. MIT licensed, self-hostable on any Ubuntu VPS.

The Tool Use lesson applies beyond translation — any time you need an LLM to return structured data with a specific schema, forcing it through function calling is dramatically more reliable than prompt engineering alone.
