James Dai
How I fixed LLM structured output failures in a PowerPoint translator (0 errors on 1,214 translations)

I built PPTranslate — an open-source tool that translates PowerPoint files while preserving all formatting. The translation engine works. The layout preservation works. But for two weeks, I had a maddening bug I couldn't shake.

The Problem
The core idea is simple: PPTX files are ZIP archives of XML. Text lives in <a:t> nodes. I extract all the text, send it to Claude for translation, and write it back. Layout is preserved because I never touch formatting attributes.
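For context, the extraction step can be sketched with the standard library alone. This is a simplified illustration of the idea, not the project's actual code — real decks have many more element types to walk:

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used inside ppt/slides/slideN.xml
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

def extract_texts(slide_xml: str) -> list[str]:
    """Collect every <a:t> text run from one slide, in document order."""
    root = ET.fromstring(slide_xml)
    return [node.text or "" for node in root.iter(f"{{{A_NS}}}t")]

sample = """
<p:sp xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
      xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:txBody>
    <a:p><a:r><a:t>Hello World</a:t></a:r></a:p>
    <a:p><a:r><a:t>Q4 Financial Results</a:t></a:r></a:p>
  </p:txBody>
</p:sp>
"""

print(extract_texts(sample))  # ['Hello World', 'Q4 Financial Results']
```

Writing translations back is the mirror image: set each node's .text and re-zip the archive, leaving every formatting attribute untouched.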

The tricky part: a single slide can have 50+ text items. You can't send one API call per item — that's 50 API calls per slide, completely impractical. So I batch them: send all 50 items in one call, get 50 translations back.

My first approach was the obvious one. Give Claude a numbered list and ask for a JSON array back:

prompt = f"""Translate these items from English to Japanese.
Return a JSON array with exactly {len(texts)} translations.

0: Hello World
1: Click to add title  
2: Q4 Financial Results
...

Output JSON array only:"""

This worked 95% of the time. The other 5%, Claude would do something bizarre: split a single translation into individual characters.

Expected output for item 0: ["こんにちは世界"]

Actual output: ["こ","ん","に","ち","は","世","界"]

Instead of 50 translations, I'd get 289 items. The index mapping breaks completely. On a 59-page deck with 843 translation items, this caused ~42 broken slides per run.
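The failure is at least cheap to detect. A minimal guard like this (a hypothetical helper, not from the repo) turns silent slide corruption into a loud error you can retry on:

```python
import json

def parse_batch(raw: str, expected: int) -> list[str]:
    """Parse a JSON-array reply and refuse length mismatches."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    if len(items) != expected:
        raise ValueError(f"expected {expected} translations, got {len(items)}")
    return items

parse_batch('["こんにちは世界"]', 1)  # ok: one item in, one item out
# parse_batch('["こ","ん","に","ち","は","世","界"]', 1)  -> ValueError
```

Detection alone doesn't fix anything, though — retrying a batch that fails 5% of the time just resamples the same dice.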

What Didn't Work
I tried everything the obvious way first:

More explicit prompting: Added "DO NOT split characters", "each array element must be a complete translation", "array length MUST equal input count". Helped slightly. Still failed.

Temperature = 0: No effect on this type of failure.

Smaller batches: Reduced batch size from 50 to 20. Error rate dropped but didn't disappear. And now I needed 2.5x as many API calls.

JSON schema in system prompt: Showed Claude an example of correct output format. Marginally better, still not reliable.

The root problem: I was asking Claude to produce free-form text that happens to be valid JSON of a specific structure. When generating token by token, there's nothing preventing it from deciding "this translation has multiple parts" and producing an array within the array.

The Fix: Tool Use
Claude's Tool Use (function calling) API lets you define a tool with a strict JSON schema that Claude must call with valid inputs. It's not generating free-form text anymore — it's filling in structured fields.

Instead of a flat array, I defined named properties for each translation:

properties = {
    f"t{i}": {
        "type": "string",
        "description": f'Complete translation of item {i}: {texts[i][:40]}'
    }
    for i in range(len(texts))
}

tools = [{
    "name": "submit_translations",
    "description": f"Submit all {len(texts)} translations",
    "input_schema": {
        "type": "object",
        "properties": properties,
        "required": list(properties.keys()),
    }
}]


The key insight: "type": "string" on each property means Claude cannot return an array for a single translation. The schema literally doesn't allow it. You're not hoping Claude follows instructions — you're making the incorrect output structurally impossible.

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    tools=tools,
    tool_choice={"type": "tool", "name": "submit_translations"},
    messages=[{"role": "user", "content": prompt}]
)

for block in message.content:
    if block.type == "tool_use" and block.name == "submit_translations":
        result = [block.input.get(f"t{i}", texts[i]) for i in range(len(texts))]
        return result
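One detail worth making explicit: block.input.get(f"t{i}", texts[i]) silently falls back to the source text when a key is missing. A small wrapper (hypothetical, for illustration) makes those fallbacks visible so they can be logged or retried:

```python
def collect(inputs: dict, texts: list[str]) -> tuple[list[str], list[int]]:
    """Map t0..t{n-1} back to an ordered list; report items that fell back."""
    missing = [i for i in range(len(texts)) if f"t{i}" not in inputs]
    out = [inputs.get(f"t{i}", texts[i]) for i in range(len(texts))]
    return out, missing

out, missing = collect({"t0": "こんにちは世界"}, ["Hello World", "Q4 Results"])
print(out, missing)  # ['こんにちは世界', 'Q4 Results'] [1]
```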

Result: 0 errors across 1,214 translations on the same 59-page deck that previously produced 16 failures.

The Overflow Problem I Didn't Expect
Once translation was reliable, a new problem surfaced: English → Japanese on decks with tight single-line headings caused text clipping. A Japanese translation often uses fewer characters than the English source, but each CJK character renders roughly twice as wide as a Latin one, so the translated line can still overflow its text box.

My fix: after translation, for single-paragraph text boxes, calculate the visual expansion ratio and proportionally widen the cx attribute on the shape's <a:ext> extent element:

# Visual width calculation (CJK chars ≈ 2x English width)
def visual_width(text):
    width = 0.0
    for ch in text:
        cp = ord(ch)
        if (0x2E80 <= cp <= 0x9FFF or 0xAC00 <= cp <= 0xD7AF):
            width += 2.0
        else:
            width += 1.0
    return max(width, 1.0)

ratio = visual_width(translated) / visual_width(original)
if ratio > 1.15:  # Only expand if >15% wider
    # cx, x, margin, slide_width are in EMUs, read from the shape's transform
    new_cx = min(int(cx * ratio * 1.05), slide_width - x - margin)
    ext.set("cx", str(new_cx))
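Working the heuristic through the earlier example makes the threshold concrete. This self-contained rerun (same logic, condensed) shows "こんにちは世界" measuring 14 units against 11 for "Hello World":

```python
def visual_width(text):
    # CJK ideographs/kana (U+2E80-U+9FFF) and Hangul (U+AC00-U+D7AF) count double
    return max(sum(2.0 if (0x2E80 <= ord(c) <= 0x9FFF
                           or 0xAC00 <= ord(c) <= 0xD7AF)
                   else 1.0 for c in text), 1.0)

ratio = visual_width("こんにちは世界") / visual_width("Hello World")
print(round(ratio, 2))  # 1.27 — clears the 1.15 threshold, so the box widens
```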

The Result
The full project is on GitHub: cloudyview/ppt-translator

There's also a hosted version at pptranslate.com if you want to try it without setting anything up. MIT licensed, self-hostable on any Ubuntu VPS.

The Tool Use lesson applies beyond translation — any time you need an LLM to return structured data with a specific schema, forcing it through function calling is dramatically more reliable than prompt engineering alone.
