<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 汪小春</title>
    <description>The latest articles on DEV Community by 汪小春 (@xspring1982).</description>
    <link>https://dev.to/xspring1982</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3438219%2Fd386d39d-f7f6-47a0-ba8b-dbc44cbc7b0a.jpg</url>
      <title>DEV Community: 汪小春</title>
      <link>https://dev.to/xspring1982</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xspring1982"/>
    <language>en</language>
    <item>
      <title>Two engines for AI slide decks: HTML output vs gpt-image-2 (and how we solved CJK rendering)</title>
      <dc:creator>汪小春</dc:creator>
      <pubDate>Wed, 13 May 2026 08:04:52 +0000</pubDate>
      <link>https://dev.to/xspring1982/two-engines-for-ai-slide-decks-html-output-vs-gpt-image-2-and-how-we-solved-cjk-rendering-2h85</link>
      <guid>https://dev.to/xspring1982/two-engines-for-ai-slide-decks-html-output-vs-gpt-image-2-and-how-we-solved-cjk-rendering-2h85</guid>
      <description>&lt;p&gt;A few months ago, a user emailed us with a screenshot. They'd generated a Chinese-language slide deck with our tool — and every Chinese character was either missing, replaced with a square, or warped into something that wasn't quite the right glyph.&lt;/p&gt;

&lt;p&gt;The screenshot was bad. The fix was harder than it looked.&lt;/p&gt;

&lt;p&gt;This post is about the architectural decision we ended up making: &lt;strong&gt;running two different rendering engines for the same product&lt;/strong&gt;, and why neither one alone was enough.&lt;/p&gt;

&lt;h2&gt;The problem with AI slides + CJK&lt;/h2&gt;

&lt;p&gt;Most AI slide generators do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;LLM writes the content (text + structure)&lt;/li&gt;
&lt;li&gt;A template engine (HTML/CSS or PPTX) lays it out&lt;/li&gt;
&lt;li&gt;Done&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works fine for English. The text is a string; the font is whatever the template specifies. The user sees what they expect.&lt;/p&gt;

&lt;p&gt;CJK breaks step 2 in two ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Font fallback.&lt;/strong&gt; When the template's font doesn't include Chinese / Japanese / Korean glyphs, the browser falls back, glyph by glyph, to whatever installed font covers them, and draws an empty box ("tofu") when none does. The result is typographically inconsistent: half your slide is in your designed font, half is in something Noto-ish that the browser found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image-based generation.&lt;/strong&gt; If you skip the template and ask an AI image model to "make a slide with this Chinese text", you'll get the garbled-CJK problem most generative image tools have — the model produces something that looks like Chinese but isn't actually any specific character. (Try this in DALL·E or Midjourney with any non-Latin script. You'll see what I mean.)&lt;/p&gt;
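
&lt;p&gt;Both failure modes start from the same question: does this text contain CJK at all? Here is a minimal sketch of that check, assuming a hypothetical helper named &lt;code&gt;contains_cjk&lt;/code&gt;. It is illustrative rather than our production code, and the Unicode ranges cover only the most common blocks.&lt;/p&gt;

```python
# Hypothetical helper: flags text that contains CJK characters.
# The ranges cover common Unicode blocks only and are not exhaustive.
CJK_RANGES = (
    range(0x3040, 0x3100),  # Hiragana and Katakana
    range(0x3400, 0x4DC0),  # CJK Unified Ideographs Extension A
    range(0x4E00, 0xA000),  # CJK Unified Ideographs
    range(0xAC00, 0xD7B0),  # Hangul Syllables
    range(0xF900, 0xFB00),  # CJK Compatibility Ideographs
)


def contains_cjk(text: str) -> bool:
    """Return True if any character falls inside one of CJK_RANGES."""
    return any(
        any(ord(ch) in block for block in CJK_RANGES)
        for ch in text
    )
```

&lt;p&gt;A pipeline could use a check like this to choose a CJK-capable font stack up front instead of leaving the choice to the browser.&lt;/p&gt;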

&lt;h2&gt;Two engines, two trade-offs&lt;/h2&gt;

&lt;p&gt;We ended up shipping both:&lt;/p&gt;

&lt;h3&gt;Engine 1: HTML path&lt;/h3&gt;

&lt;p&gt;The LLM produces a structured spec, we render it with a reveal.js / Slidev-style template. Output is an inline-editable web slide deck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; users can tweak content after generation (it's just HTML); fast; smaller file size for exports.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; CJK looks acceptable but never great; visual variety is constrained by what the template supports.&lt;/p&gt;
&lt;h3&gt;Engine 2: gpt-image-2 path&lt;/h3&gt;

&lt;p&gt;OpenAI's &lt;code&gt;gpt-image-2&lt;/code&gt; (released April 2026) is the first image model where text rendering is genuinely usable for CJK. We compose a "slide-as-prompt" — layout description, content, style — and the model renders the entire slide as a single image.&lt;/p&gt;
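
&lt;p&gt;To make "slide-as-prompt" concrete, here is a rough sketch of composing layout, content, and style into one prompt string. The &lt;code&gt;SlideSpec&lt;/code&gt; fields, the prompt wording, the 16:9 ratio, and the sample text are all assumptions made for illustration, not our actual prompt. The call to the image API is left out.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class SlideSpec:
    """Hypothetical slide description; field names are illustrative."""
    layout: str
    title: str
    bullets: list[str]
    style: str


def build_slide_prompt(spec: SlideSpec) -> str:
    """Compose one text prompt that describes the whole slide."""
    bullet_lines = "\n".join(f"- {b}" for b in spec.bullets)
    return (
        "Render a single 16:9 presentation slide.\n"
        f"Layout: {spec.layout}\n"
        f"Style: {spec.style}\n"
        "Render the following text exactly as written:\n"
        f"Title: {spec.title}\n"
        f"{bullet_lines}"
    )


# Made-up sample content, used only to show the shape of the output.
spec = SlideSpec(
    layout="title at top, bullet list on the left",
    title="季度报告",
    bullets=["收入增长 12%", "新增用户 3 万"],
    style="minimal, dark background",
)
prompt = build_slide_prompt(spec)
```

&lt;p&gt;Keeping the literal slide text in its own clearly marked section of the prompt makes it easy to compare against the rendered image afterwards.&lt;/p&gt;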

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; typography is sharp and consistent; CJK characters render correctly; visual variety is essentially unlimited.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; the user can't tweak content post-generation without re-rendering; ~5x slower than the HTML path; PPTX export has each slide as one image (not editable in PowerPoint).&lt;/p&gt;
&lt;h2&gt;The decision: ship both&lt;/h2&gt;

&lt;p&gt;We let the user pick. Default to HTML for fast iteration; switch to gpt-image-2 when CJK accuracy matters more than editability.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User flow:
  Article / link / PDF → LLM extracts structure
                         ↓
            ┌────────────┴────────────┐
   HTML path                      gpt-image-2 path
   (Slidev-style template)       (full-image render)
            ↓                            ↓
     Editable web slides         Image-per-page export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Why this isn't obviously the right architecture&lt;/h2&gt;

&lt;p&gt;Two engines means more code, more bugs, more decisions for the user. It also means our "What does the tool do?" elevator pitch has two halves — which is harder to sell than a single clean story.&lt;/p&gt;

&lt;p&gt;But for CJK users, the HTML path alone wasn't acceptable, and dropping the HTML path entirely was a regression for everyone who wanted editable output. So: both.&lt;/p&gt;

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;p&gt;In hindsight, we should have made the engine choice &lt;strong&gt;per-slide&lt;/strong&gt; instead of per-deck. Some slides need editing (talking points, agenda); some need typography fidelity (a single Chinese headline on a chart). Forcing the user to pick one engine for the whole deck is the wrong granularity. We're fixing this now.&lt;/p&gt;
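
&lt;p&gt;As a sketch of what per-slide selection could look like: the &lt;code&gt;Slide&lt;/code&gt; fields and the rule inside &lt;code&gt;pick_engine&lt;/code&gt; below are hypothetical, one plausible policy rather than the design we are shipping.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    HTML = "html"
    IMAGE = "gpt-image-2"


@dataclass
class Slide:
    """Hypothetical per-slide record; both flags are illustrative."""
    title: str
    needs_editing: bool  # e.g. agenda or talking points
    has_cjk: bool        # e.g. a Chinese headline


def pick_engine(slide: Slide) -> Engine:
    """Assumed rule: image path only for CJK slides that can stay fixed."""
    if slide.has_cjk and not slide.needs_editing:
        return Engine.IMAGE
    return Engine.HTML


# Made-up deck: one editable slide, one typography-sensitive slide.
deck = [
    Slide("Agenda", needs_editing=True, has_cjk=False),
    Slide("季度报告", needs_editing=False, has_cjk=True),
]
plan = [pick_engine(s) for s in deck]
```

&lt;p&gt;Under this rule, editability wins whenever a slide needs both, so a CJK agenda slide would still go through the HTML path.&lt;/p&gt;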

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;If you want to see what gpt-image-2 looks like as a slide engine — especially with CJK — you can sign up at &lt;a href="https://anyslide.app" rel="noopener noreferrer"&gt;AnySlide&lt;/a&gt; (60 free credits, no card). I'd genuinely love feedback on the engine switch UX; it's the part I'm least sure about.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
