MORINAGA

What I learned adding diagram and chart slides to a CI-rendered YouTube pipeline

The conclusion first: pre-rendering diagrams and charts to PNG before compositing them onto slides — rather than generating visual content inline or inside ffmpeg — is the right architecture for a CI video pipeline. The tooling gap between Chromium-backed Mermaid rendering, headless matplotlib, and ffmpeg's static frame expectation makes a shared PNG handoff the only approach that keeps each piece testable and replaceable.

I added three new slide types to the YouTube slide renderer last week: diagram (Mermaid flowcharts and sequence diagrams), chart (branded horizontal bar charts via matplotlib), and image (license-clear photos from Openverse). The existing slides — title, bullets, table, tool, outro — all draw directly with Pillow. These three render externally, produce a PNG, and get pasted into the same Pillow canvas. Same output contract, different render path.

Why pre-render instead of embed

The two-host pipeline assembles video by compositing a still image for each dialogue segment, synthesizing audio with edge-tts, and using ffmpeg to concatenate the clips. ffmpeg expects the still to be a file or a stream of identical frames — it does not run JavaScript, and it cannot call a browser mid-concat.

Mermaid runs through Puppeteer and Chromium. Pillow draws onto its own in-memory image buffer inside the Python process. There is no in-process way to make these talk. The only clean option is: mmdc produces a PNG, Pillow pastes the PNG.

matplotlib is different — it could theoretically produce an image buffer in the same process. But having a consistent "render to PNG file, paste PNG file" pattern for all three visual types means they share the same _visual_slide scaffold and graceful-degradation path. One code path is better than two.

Mermaid via mmdc: the CI-specific configuration

render_mermaid() in visuals.py writes two config files before calling mmdc:

cfg = os.path.join(tempfile.gettempdir(), "bs-mmd-theme.json")
with open(cfg, "w", encoding="utf-8") as f:
    json.dump(_MMD_THEME, f)

pcfg = os.path.join(tempfile.gettempdir(), "bs-puppeteer.json")
with open(pcfg, "w", encoding="utf-8") as f:
    json.dump({"args": ["--no-sandbox", "--disable-setuid-sandbox"]}, f)

The puppeteer config is the one that bit me first. Chromium refuses to start as root without --no-sandbox, and a GitHub Actions job that runs inside a container runs as root by default. Without --disable-setuid-sandbox, it will also fail on containers where setuid is restricted. Both flags are needed.

The theme config uses Mermaid's base theme, not dark or forest. The other named themes override themeVariables, so color injection does not work reliably with them. Only base respects the custom palette (confirmed in the Mermaid.js theme docs):

_MMD_THEME = {
    "theme": "base",
    "themeVariables": {
        "primaryColor": PANEL, "primaryTextColor": INK, "primaryBorderColor": ACCENT,
        "lineColor": MUTED, "secondaryColor": "#1B2B4A", "tertiaryColor": BG,
        "fontFamily": "Arial", "fontSize": "22px",
        "clusterBkg": PANEL, "clusterBorder": ACCENT,
    },
}

The mmdc binary resolution tries three paths:

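def _mmdc_cmd() -> list[str]:
    # Prefer the project-local install (what npm install creates in CI).
    local = os.path.join(_DIR, "node_modules", ".bin", "mmdc")
    if os.path.isfile(local):
        return [local]
    # Next, a globally installed mmdc on PATH.
    if shutil.which("mmdc"):
        return ["mmdc"]
    # Last resort: let npx fetch and run the CLI.
    return ["npx", "--yes", "@mermaid-js/mermaid-cli"]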

The resolution order is deliberate. The GitHub Actions workflow installs @mermaid-js/mermaid-cli as a dev dependency, so the local node_modules binary is the hot path in CI. The npx branch exists for local dev where you have not run npm install. Do not make npx the primary path: it resolves the package at call time and downloads it whenever it is not already cached, which adds 20-30 seconds per diagram slide on a cold runner.
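A minimal sketch of how the two config files and the resolved binary might come together in one mmdc call. The names mmdc_args and run_mmdc, the transparent background, and the timeout are assumptions for illustration, not the code in visuals.py. The flags themselves (-i, -o, -c, -p, -w, -b) are standard mermaid-cli options.

```python
import subprocess


def mmdc_args(src_mmd, out_png, theme_cfg, puppeteer_cfg,
              width=1600, background="transparent"):
    """Build the argument list that follows the mmdc executable."""
    return [
        "-i", src_mmd,        # Mermaid source file
        "-o", out_png,        # output path; the .png suffix selects PNG output
        "-c", theme_cfg,      # Mermaid config JSON (theme + themeVariables)
        "-p", puppeteer_cfg,  # Puppeteer launch config (sandbox flags)
        "-w", str(width),     # page width in pixels
        "-b", background,     # background colour (assumed: transparent)
    ]


def run_mmdc(base_cmd, args, timeout=120):
    """Run mmdc; base_cmd is whatever the binary resolution returned.

    Raises CalledProcessError or TimeoutExpired on failure, so the caller
    can decide how to degrade.
    """
    subprocess.run([*base_cmd, *args], check=True,
                   capture_output=True, text=True, timeout=timeout)
```

Keeping argument construction separate from the subprocess call means the command line can be checked in a unit test on a machine that has no Chromium installed.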

The -w 1600 width flag matters. At 1920x1080, the content area after chrome and heading is roughly 1700x700. Rendering at 1600px wide gives mmdc enough resolution to produce readable text without scaling artifacts when _paste_visual() thumbnails it into the slot.
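_paste_visual() itself is not shown in this post. As a rough sketch of the fitting step it performs, assuming a slot described as (left, top, width, height), a never-upscale policy, and centred placement (all three are assumptions, as are the helper names):

```python
def fit_within(src_w, src_h, max_w, max_h):
    """Largest size that fits the box, keeps aspect ratio, never upscales."""
    scale = min(max_w / src_w, max_h / src_h, 1.0)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


def centered_origin(size, left, top, slot_w, slot_h):
    """Top-left corner that centres an image of `size` inside the slot."""
    w, h = size
    return left + (slot_w - w) // 2, top + (slot_h - h) // 2


def paste_visual(canvas, png_path, slot):
    """Paste a pre-rendered PNG into `slot` on a Pillow canvas (hypothetical)."""
    from PIL import Image  # imported lazily so the geometry helpers need no Pillow

    left, top, slot_w, slot_h = slot
    with Image.open(png_path) as src:
        visual = src.convert("RGBA")
    visual = visual.resize(fit_within(*visual.size, slot_w, slot_h), Image.LANCZOS)
    # Use the image's own alpha channel as the paste mask.
    canvas.paste(visual, centered_origin(visual.size, left, top, slot_w, slot_h), visual)
```

With a 1600x900 render and a 1700x700 slot, height is the binding constraint, so the render is scaled down to 700 pixels tall and centred horizontally.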

matplotlib horizontal bars with a custom dark palette

render_chart() takes a simple spec:

# spec = {"items": [["Tool A", 41200], ["Tool B", 28900], ...], "unit": "stars", "highlight": 0}

Three things that tripped me up when trying to match the slide background:

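First: matplotlib.use("Agg") should be called before import matplotlib.pyplot as plt. Older matplotlib releases ignored a late use() call with only a warning. Current releases can switch to a non-interactive backend after pyplot is imported, but selecting Agg up front removes any dependence on which backend matplotlib would otherwise pick on a headless runner. The function imports matplotlib inside the function body, which keeps that ordering in one place and means a missing matplotlib fails at call time rather than at module load time: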

def render_chart(spec, out_png):
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

Second: setting the background requires two separate calls:

fig.patch.set_facecolor(BG)
ax.set_facecolor(BG)

fig.patch is the outer figure background; ax is the axes area. Missing either one leaves a white rectangle where the background should be dark navy.

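Third: plt.tight_layout(pad=1.2) is not enough on its own. It only repositions the axes inside the figure; the saved image still covers the full figure canvas. Adding bbox_inches="tight" to fig.savefig() crops the output to the drawn content plus a small pad. Without it, the saved PNG keeps a border around the chart that composites badly onto the dark slide background. On matplotlib older than 3.3, savefig() also defaulted to a white facecolor, so there you additionally need facecolor=fig.get_facecolor() to keep that border from turning white.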

The highlight index accentuates one bar in ACCENT2 (green) instead of ACCENT (blue). The spec author sets it to mark the bar that is the point of the slide — the tool with the most stars, or the benchmark winner. It is optional; when absent, all bars render in blue.
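Pulling the three fixes and the highlight rule together, a render_chart() along these lines would work. This is a sketch, not the code in visuals.py: the palette hex values, the figure size, and the bar_colors helper are placeholders I chose for illustration.

```python
# Placeholder palette; the real BG / INK / ACCENT / ACCENT2 values are not shown in the post.
BG, INK, ACCENT, ACCENT2 = "#0B1220", "#E6EDF7", "#4C8DF6", "#3FB950"


def bar_colors(n, highlight=None):
    """One colour per bar; the highlighted index, if any, gets ACCENT2."""
    return [ACCENT2 if i == highlight else ACCENT for i in range(n)]


def render_chart(spec, out_png):
    import matplotlib
    matplotlib.use("Agg")            # select the headless backend first
    import matplotlib.pyplot as plt

    labels = [str(name) for name, _ in spec["items"]]
    values = [value for _, value in spec["items"]]

    fig, ax = plt.subplots(figsize=(16, 7), dpi=100)
    fig.patch.set_facecolor(BG)      # outer figure background
    ax.set_facecolor(BG)             # axes area background
    ax.barh(labels, values, color=bar_colors(len(values), spec.get("highlight")))
    ax.invert_yaxis()                # first item at the top
    ax.set_xlabel(spec.get("unit", ""), color=INK)
    ax.tick_params(colors=INK)
    for spine in ax.spines.values():
        spine.set_visible(False)

    fig.tight_layout(pad=1.2)
    # Crop to the drawn content and keep the figure's own background colour.
    fig.savefig(out_png, bbox_inches="tight", facecolor=fig.get_facecolor())
    plt.close(fig)                   # release the figure between slides
```

Calling render_chart({"items": [["Tool A", 41200], ["Tool B", 28900]], "unit": "stars", "highlight": 0}, "chart.png") would draw Tool A in the highlight colour and Tool B in the default one.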

The _visual_slide scaffold and graceful degradation

All three types share the same scaffold:

def _visual_slide(spec, render_fn):
    img, d = _base()
    _chrome(img, d, spec.get("page", ""))
    heading = spec.get("heading", "")
    if heading:
        d.text((MARGIN, 150), heading, font=font(56, "bold"), fill=INK)
        d.rectangle([MARGIN, 228, MARGIN + 120, 236], fill=ACCENT)
    import tempfile
    fd, tmp = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    try:
        render_fn(tmp)
        _paste_visual(img, tmp, top=270 if heading else 150)
    except Exception as e:
        sys.stderr.write(f"WARN: visual render failed ({e})\n")
        d.text((MARGIN, 420), "[visual unavailable]", font=font(40), fill=MUTED)
    finally:
        if os.path.exists(tmp):
            os.unlink(tmp)
    return img

The tempfile.mkstemp pattern (create the file, then close the descriptor before anything else writes to it) is deliberately cross-platform: on Windows, a file that NamedTemporaryFile still holds open generally cannot be opened by a second process, so a subprocess can fail to write to the same path. Closing the mkstemp descriptor first avoids this. On Linux and macOS, where this code actually runs, it makes no practical difference.
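To make the handoff concrete, here is a small self-contained sketch of the same pattern with a stand-in child process. The child is just a Python one-liner that writes bytes to the path it is given, in place of mmdc; the function name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def render_via_subprocess(payload: str) -> bytes:
    """Reserve a temp path, let a child process write to it, read it back."""
    fd, tmp = tempfile.mkstemp(suffix=".png")
    os.close(fd)  # release our handle so the child can open the path itself
    try:
        child = "import sys; open(sys.argv[1], 'wb').write(sys.argv[2].encode())"
        subprocess.run([sys.executable, "-c", child, tmp, payload], check=True)
        with open(tmp, "rb") as f:
            return f.read()
    finally:
        # Clean up whether or not the child succeeded.
        if os.path.exists(tmp):
            os.unlink(tmp)
```

The parent only reserves the name; the child owns the write. That is the same division of labour as between the scaffold and mmdc.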

The except Exception fallback is intentional. If mmdc is not installed, if matplotlib is not present, or if Openverse returns no usable images, the slide renders as a clean heading card with a muted "[visual unavailable]" placeholder. The video build continues. A missing diagram is not a build failure.

This matches the design of the rest of the pipeline: the CI job should produce a video even when external dependencies are partially absent. A video with a placeholder slide is reviewable; a failed build is not.

Where the approach has limits

mmdc startup cost. Each mmdc call launches a Chromium process, renders the SVG, and exits. That takes 2-4 seconds per diagram on the GitHub Actions ubuntu-latest runner. A video spec with five diagram slides adds 10-20 seconds to the build. For the current video lengths (15-25 segments), this is acceptable. If the format grew to 50+ slides, pre-rendering all diagrams in parallel before the main loop would matter.
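If that point is ever reached, the parallel pre-render could look roughly like this. It is a sketch under assumptions: prerender_all and its jobs mapping are hypothetical, and render_fn stands for any callable that turns a diagram source into a rendered PNG path. Threads are enough here because each worker spends its time waiting on an external process.

```python
from concurrent.futures import ThreadPoolExecutor


def prerender_all(jobs, render_fn, max_workers=4):
    """Render every job concurrently.

    jobs maps a slide id to its diagram source. Returns a dict mapping each
    slide id to render_fn's result, or to None when that render failed, so
    one bad diagram does not stop the others.
    """
    def safe(item):
        key, source = item
        try:
            return key, render_fn(source)
        except Exception:
            return key, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe, jobs.items()))
```

The main loop would then look up each diagram slide's result and fall back to the placeholder card when it is None.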

matplotlib version pinning. The chart code relies on a specific call signature for barh() and tight_layout(). matplotlib has changed these interfaces across minor versions. The workflow pins matplotlib==3.9.* to avoid surprises in future runner image updates.

Mermaid syntax drift. mmdc's supported Mermaid syntax depends on the installed package version. Sequence diagrams work. gitGraph works. More recent additions (xychart-beta, sankey-beta) require a newer mermaid-cli release than an unpinned or stale install is guaranteed to resolve to. The solution is to install @mermaid-js/mermaid-cli@^11 explicitly in the workflow's npm install step rather than relying on the default version.

No preview in local dev. The analytics feedback loop tells me which slides hold attention; I cannot see a diagram slide without building the full video. Adding a --demo flag to slides.py that renders a single slide type from a spec file would help iterate faster, but I have not built it yet. For now the iteration loop is: edit the JSON spec, push a commit, wait for the CI video build, scrub to the diagram segment. That is slow. The single CI pipeline takes about 4 minutes end-to-end for a 20-segment video, which is reasonable for final validation but too slow for visual design iteration. A local --demo render that skips TTS and ffmpeg and just opens the PNG would cut that to under 10 seconds.
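Since the flag does not exist yet, the following is only one possible shape for it. Everything except the --demo name is invented here: the --index and --out options, the "slides" key in the spec, and both helper names are assumptions.

```python
import argparse
import json


def build_parser():
    """Hypothetical CLI for rendering a single slide without TTS or ffmpeg."""
    parser = argparse.ArgumentParser(prog="slides.py")
    parser.add_argument("--demo", metavar="SPEC_JSON",
                        help="render one slide from this spec file and exit")
    parser.add_argument("--index", type=int, default=0,
                        help="which slide in the spec to render")
    parser.add_argument("--out", default="demo.png",
                        help="where to write the rendered PNG")
    return parser


def pick_slide(spec_text, index):
    """Return one slide dict from the spec JSON (assumes a top-level 'slides' list)."""
    return json.loads(spec_text)["slides"][index]
```

The entry point would parse the arguments, load the spec, pass the chosen slide to the existing renderer, and save the resulting image to the --out path.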

FAQ

Does mmdc need Node.js pre-installed on the runner?
Yes. The ubuntu-latest runner ships with Node.js 20, so the workflow just needs npm install -D @mermaid-js/mermaid-cli in a setup step.

Can I use Mermaid's dark theme instead of base with custom variables?
No. The dark theme resets most themeVariables to its own palette. The only theme that lets you fully override colors via themeVariables is base.

What happens if Openverse returns zero results for a query?
fetch_image() raises RuntimeError("no usable Openverse image for query: ..."). The _visual_slide scaffold catches it and renders a heading-only fallback card. No build failure.

Why horizontal bar charts instead of vertical?
Longer tool or model names truncate or require rotation on vertical axes. Horizontal bars handle 20-40 character labels without layout code.

Is the PNG-to-Pillow paste lossless?
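The encode and decode steps are. Pillow decodes the PNG to full-depth pixels (RGB, or RGBA when the render has transparency), and the final save uses Pillow's default PNG compression (compress_level 6), which only affects file size because PNG is always lossless. The one lossy step is the resize: when _paste_visual() thumbnails a render down to the slot, pixels are resampled. Rendering close to the slot width, as -w 1600 does, keeps that loss small.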
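For the Openverse question above, the zero-results behaviour can be sketched as a small selection function. fetch_image() is not shown in the post, so this is an assumption about its shape: the pick_image name, the min_width threshold, and the use of the "url" and "width" fields of each result are mine.

```python
def pick_image(results, query, min_width=1200):
    """Return the first result that is wide enough to use on a slide.

    results is the list of image dicts from an Openverse search response.
    Raises RuntimeError when nothing qualifies, which the slide scaffold
    is expected to catch and turn into a placeholder card.
    """
    for item in results:
        if item.get("url") and (item.get("width") or 0) >= min_width:
            return item
    raise RuntimeError(f"no usable Openverse image for query: {query}")
```

Raising instead of returning None keeps the failure on the same path as a missing mmdc or matplotlib: one except clause in the scaffold handles all three.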

Related: YouTube slide renderer — eight kinds, no browser; Two-host video pipeline with edge-tts and ffmpeg; Free neural TTS options for CI pipelines

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.
