Two engines for AI slide decks: HTML output vs gpt-image-2 (and how we solved CJK rendering)

#ai #architecture #llm #softwareengineering

A few months ago, a user emailed us with a screenshot. They'd generated a Chinese-language slide deck with our tool — and every Chinese character was either missing, replaced with a square, or warped into something that wasn't quite the right glyph.

The screenshot was bad. The fix was harder than it looked.

This post is about the architectural decision we ended up making: running two different rendering engines for the same product, and why neither one alone was enough.

The problem with AI slides + CJK

Most AI slide generators do this:

1. The LLM writes the content (text + structure).
2. A template engine (HTML/CSS or PPTX) lays it out.
3. Done.

This works fine for English. The text is a string; the font is whatever the template specifies. The user sees what they expect.

CJK breaks step 2, and the obvious way around step 2 breaks too:

Font fallback. When the template's font has no Chinese, Japanese, or Korean glyphs, the browser falls back to whatever CJK-capable font it can find. The result is typographically inconsistent: half your slide is in your designed font, half in something Noto-ish the browser picked. If no available font covers a character at all, you get the empty squares from that user's screenshot.

Image-based generation. If you skip the template and ask an AI image model to "make a slide with this Chinese text", you hit the problem most generative image tools have with CJK: the model produces something that looks like Chinese but isn't actually any specific character. (Try this in DALL·E or Midjourney with any non-Latin script. You'll see what I mean.)
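To see how much of a slide will land in a fallback font, you can check each character against the set of code points the template font actually maps. Here's a minimal Python sketch, assuming you already have that coverage set; `chars_needing_fallback` is a hypothetical name, and reading the set from a real font file (for example via fontTools' `TTFont(path).getBestCmap()`) is left out.

```python
from __future__ import annotations


def chars_needing_fallback(text: str, covered: set[int]) -> list[str]:
    """Return the distinct characters of `text` whose code points are not in
    `covered`, in order of first appearance. Whitespace is skipped."""
    missing: list[str] = []
    seen: set[str] = set()
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in covered:
            # The browser will pull this glyph from some other font,
            # or draw an empty box if no installed font has it.
            missing.append(ch)
    return missing


# Example: a Latin-only template font that covers printable ASCII.
latin_only = set(range(0x20, 0x7F))
print(chars_needing_fallback("Q3 营收 Revenue", latin_only))  # ['营', '收']
```

Every character this returns is one the browser renders in a font you didn't design for, which is exactly the inconsistency described above.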

Two engines, two trade-offs

We ended up shipping both:

Engine 1: HTML path

The LLM produces a structured spec, and we render it with a reveal.js/Slidev-style template. The output is an inline-editable web slide deck.
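As a rough illustration of "structured spec in, editable HTML out", here's a minimal Python sketch that emits reveal.js-style markup (one `<section>` per slide inside `.reveal > .slides`). The `Slide` shape and `render_deck` are hypothetical; the post doesn't describe the real schema or template. Setting `lang` on the container helps the browser pick region-appropriate CJK glyphs when it falls back, and `contenteditable` is one simple way to get in-browser editing.

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field


# Hypothetical spec shape produced by the LLM step.
@dataclass
class Slide:
    title: str
    bullets: list[str] = field(default_factory=list)


def render_deck(slides: list[Slide], lang: str = "zh-Hans") -> str:
    """Render slides as reveal.js-style markup. All text is HTML-escaped so
    LLM output can't inject tags; each section is editable in the browser."""
    sections = []
    for s in slides:
        items = "".join(f"<li>{html.escape(b)}</li>" for b in s.bullets)
        body = f"<ul>{items}</ul>" if items else ""
        sections.append(
            f'<section contenteditable="true"><h2>{html.escape(s.title)}</h2>{body}</section>'
        )
    return (
        f'<div class="reveal" lang="{html.escape(lang)}"><div class="slides">'
        + "".join(sections)
        + "</div></div>"
    )


print(render_deck([Slide("季度总结", ["营收增长", "成本下降"])]))
```

Because the output is plain HTML, a post-generation edit is just a DOM change, which is where this path gets its editability.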

Pros: users can tweak content after generation (it's just HTML); fast; smaller file size for exports.
Cons: CJK looks acceptable but never great, since it's still at the mercy of font fallback; visual variety is constrained by what the template supports.

Engine 2: gpt-image-2 path

OpenAI's gpt-image-2 (released April 2026) is the first image model where text rendering is genuinely usable for CJK. We compose a "slide-as-prompt" — layout description, content, style — and the model renders the entire slide as a single image.
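A "slide-as-prompt" could be assembled along these lines. This Python sketch only builds the text prompt from the three ingredients named above (layout, content, style); the `SlidePrompt` shape, the 16:9 framing, and the exact wording are assumptions, and the call to gpt-image-2 itself is omitted because its request parameters aren't covered here.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical prompt inputs: layout description, content, style.
@dataclass
class SlidePrompt:
    layout: str
    content: dict[str, str]  # role -> exact text, e.g. {"title": "季度营收"}
    style: str


def compose_prompt(p: SlidePrompt) -> str:
    """Build one text prompt describing a whole slide. Each string to render
    is quoted verbatim so the model is asked to reproduce it exactly."""
    lines = [
        f"A 16:9 presentation slide. Layout: {p.layout}.",
        f"Visual style: {p.style}.",
        "Render the following text exactly as written, character for character:",
    ]
    for role, text in p.content.items():  # dicts keep insertion order
        lines.append(f'- {role}: "{text}"')
    return "\n".join(lines)


print(compose_prompt(SlidePrompt(
    layout="centered title above one bar chart",
    content={"title": "季度营收", "subtitle": "2025 Q3"},
    style="minimal, dark background",
)))
```

Quoting the text separately from the layout and style description keeps the exact strings easy to diff against the source content when checking a render.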

Pros: typography is sharp and consistent; CJK characters render correctly; visual variety is essentially unlimited.
Cons: the user can't tweak content post-generation without re-rendering; ~5x slower than the HTML path; PPTX export turns each slide into a single image, so nothing on it is editable in PowerPoint.

The decision: ship both

We let the user pick. Default to HTML for fast iteration; switch to gpt-image-2 when CJK accuracy matters more than editability.

User flow:
  Article / link / PDF → LLM extracts structure
                         ↓
            ┌────────────┴────────────┐
        HTML path             gpt-image-2 path
 (Slidev-style template)     (full-image render)
            ↓                         ↓
   Editable web slides      Image-per-page export
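In code, the per-deck choice in this flow reduces to one dispatch on the extracted structure. A minimal Python sketch with hypothetical names throughout (`Engine`, `Renderer`, `render`); the real renderers would be the HTML and gpt-image-2 paths, stubbed here as plain callables.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Engine(Enum):
    HTML = "html"
    IMAGE = "gpt-image-2"


# Hypothetical renderer signature: takes the extracted structure (here a
# list of slide dicts) and returns one artifact per slide.
Renderer = Callable[[list[dict]], list[str]]


def render(
    structure: list[dict],
    renderers: dict[Engine, Renderer],
    engine: Engine = Engine.HTML,
) -> list[str]:
    """Per-deck dispatch: one engine for the whole deck, HTML by default."""
    return renderers[engine](structure)


stubs = {
    Engine.HTML: lambda s: [f"html:{x['title']}" for x in s],
    Engine.IMAGE: lambda s: [f"img:{x['title']}" for x in s],
}
print(render([{"title": "Agenda"}], stubs))  # ['html:Agenda']
```

Making HTML the default argument mirrors the product decision: fast, editable output unless the user opts in to image rendering.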

Why this isn't obviously the right architecture

Two engines means more code, more bugs, more decisions for the user. It also means our "What does the tool do?" elevator pitch has two halves — which is harder to sell than a single clean story.

But for CJK users, the HTML path alone wasn't acceptable, and dropping the HTML path entirely was a regression for everyone who wanted editable output. So: both.

What I'd do differently

In hindsight, we should have made the engine choice per-slide instead of per-deck. Some slides need editing (talking points, agenda); some need typography fidelity (a single Chinese headline on a chart). Forcing the user to pick one engine for the whole deck is the wrong granularity. We're fixing this now.
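A per-slide version could route each slide on its own. The Python sketch below is one possible heuristic, not the fix in progress: the slide `kind` labels and the rule (editable kinds stay HTML, other slides containing CJK go to the image engine) are hypothetical, and the CJK check covers only the main Han, kana, and Hangul blocks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Slide:
    kind: str  # hypothetical labels, e.g. "agenda", "talking_points", "headline"
    text: str


def has_cjk(text: str) -> bool:
    """True if text contains a character from the main CJK blocks."""
    for ch in text:
        cp = ord(ch)
        if (0x4E00 <= cp <= 0x9FFF       # CJK Unified Ideographs
                or 0x3400 <= cp <= 0x4DBF    # CJK Extension A
                or 0x3040 <= cp <= 0x30FF    # Hiragana + Katakana
                or 0xAC00 <= cp <= 0xD7AF):  # Hangul Syllables
            return True
    return False


# Slides users typically want to edit after generation.
EDITABLE_KINDS = {"agenda", "talking_points"}


def choose_engine(slide: Slide) -> str:
    """Keep editable slides in HTML; send CJK display slides to the image engine."""
    if slide.kind in EDITABLE_KINDS:
        return "html"
    return "gpt-image-2" if has_cjk(slide.text) else "html"


deck = [Slide("agenda", "议程"), Slide("headline", "季度营收"), Slide("headline", "Q3 revenue")]
print([choose_engine(s) for s in deck])  # ['html', 'gpt-image-2', 'html']
```

Because each slide is routed independently, the deck can mix engines while keeping slide order intact.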

Try it

If you want to see what gpt-image-2 looks like as a slide engine — especially with CJK — you can sign up at AnySlide (60 free credits, no card). I'd genuinely love feedback on the engine switch UX; it's the part I'm least sure about.