If you've ever asked Stable Diffusion or DALL-E to render readable text inside a comic panel, you know the pain. It almost works. The letters look like letters. Until you read them — "WHAT ARE YOU DONIG", "HEILP", "BLEAH BLAH". About 70% of my generations needed a regen just because the dialogue was garbled, and every regen burned ~$0.04 in GPU time.
For Comicory I gave up trying to make the model render text and moved typography into a deterministic post-processing step. The model now draws empty speech bubbles. Pillow draws the words. Retry rate for text-related issues: zero. Total post-processing code: ~200 lines.
Here's the pipeline.
Step 1: Bubble shape detection
The model is told (via prompt + LoRA) to draw an empty white speech bubble with a black outline somewhere in the panel. I find it with classic CV — no ML, no models, no surprises:
```python
from PIL import Image
import numpy as np
import cv2

def find_bubble(panel: Image.Image) -> tuple[int, int, int, int] | None:
    arr = np.array(panel.convert("L"))
    _, mask = cv2.threshold(arr, 245, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = sorted(contours, key=cv2.contourArea, reverse=True)
    for blob in blobs[1:5]:  # skip the largest white blob (usually background/sky)
        x, y, w, h = cv2.boundingRect(blob)
        aspect = w / h
        if 0.6 < aspect < 3.0 and w * h > 5000:
            return (x, y, w, h)
    return None
```
The aspect-ratio bound rejects long thin clouds and full-panel backgrounds. Across ~2,000 panels, this lands the right bubble 96% of the time.
Step 2: Font selection by character mood
Every Comicory character has a mood field. Each mood maps to a font + weight:
```python
FONT_MAP = {
    "calm": ("AnimeAce2.ttf", 24, None),
    "angry": ("BadaBoom-BB.ttf", 30, "bold"),
    "shouting": ("BadaBoom-BB.ttf", 36, "bold"),
    "whisper": ("AnimeAce2.ttf", 20, "italic"),
    "narrator": ("CCWildwords-Roman.ttf", 22, None),
}
```
These are properly licensed comic fonts I bought once for ~$80. The free Google Fonts alternatives (Bangers, Permanent Marker) look like Canva templates; readers spot the AI-comic vibe instantly.
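The lookup itself is trivial glue code. This sketch is my own, not from the post: the "calm" fallback for unknown moods is an assumption, and the `load_default()` fallback just keeps the pipeline alive if a TTF path is wrong.

```python
from PIL import ImageFont

def font_for_mood(font_map: dict, mood: str):
    # unknown moods fall back to "calm" (an assumption, not the post's rule)
    path, size, _style = font_map.get(mood, font_map["calm"])
    try:
        return ImageFont.truetype(path, size)
    except OSError:
        # font file missing on disk: degrade to Pillow's built-in font
        return ImageFont.load_default()
```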
Step 3: Text wrapping that fits the bubble
Python's textwrap is naive: it wraps on character count, not rendered pixel width. My version steps the font size down until the wrapped, rendered text fits inside the bubble:
```python
from PIL import ImageDraw, ImageFont

def fit_text(text, bubble, font_path, max_size):
    x, y, w, h = bubble
    inner_w, inner_h = int(w * 0.75), int(h * 0.75)
    for size in range(max_size, 10, -2):
        font = ImageFont.truetype(font_path, size)
        lines = wrap_to_width(text, font, inner_w)
        line_h = font.getbbox("Ay")[3] - font.getbbox("Ay")[1]
        total_h = line_h * len(lines) * 1.15  # 1.15 = line spacing
        if total_h <= inner_h:
            return font, lines
    return None  # nothing fit even at the minimum size; caller must handle this
```
Inside wrap_to_width, font.getlength() is the key: it returns the actual rendered width of a string, kerning-aware, instead of a character count. The 0.75 inscribed-rect factor leaves visible margin so the eye reads it as "professionally laid out."
Step 4: Kerning + outline (polish)
```python
def draw_text_with_outline(draw, lines, font, center_x, top_y, outline_w=2):
    line_h = (font.getbbox("Ay")[3] - font.getbbox("Ay")[1]) * 1.15
    for i, line in enumerate(lines):
        line_w = font.getlength(line)
        x = center_x - line_w / 2  # center each line horizontally
        y = top_y + i * line_h
        # white halo: redraw the line offset in all 8 directions
        for dx in (-outline_w, 0, outline_w):
            for dy in (-outline_w, 0, outline_w):
                if dx or dy:
                    draw.text((x + dx, y + dy), line, font=font, fill="white")
        draw.text((x, y), line, font=font, fill="black")
```
The 8-direction stroke produces a clean white halo around black text, improving readability over busy backgrounds. Modern Pillow has native stroke_width but I keep manual stroke — chunkier, reads more "comic-y."
For kerning, don't draw character-by-character in a loop. That throws away the font's kerning pairs. Use getlength() and let Pillow respect the metric table.
Before vs. after
| Metric | Pre (in-prompt text) | Post (Pillow composite) |
|---|---|---|
| Text legibility (manual review) | 31% acceptable | 100% acceptable |
| Regens triggered by text issues | 70% of panels | 0% |
| Avg latency per panel | 8.4s | 8.6s (+200ms Pillow) |
| GPU $ saved per 100 panels | — | $2.80 |
| Lines of code total | 0 | ~200 |
The ~200ms of Pillow overhead is invisible to users, and $2.80 per 100 panels compounds to roughly $120/month I no longer pay for failed text generations.
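The savings row in the table is straight arithmetic, using the regen rate and per-regen GPU cost from the post:

```python
# savings per 100 panels = regens avoided x GPU cost per regen
regen_rate = 0.70      # share of panels that previously needed a text regen
cost_per_regen = 0.04  # ~$0.04 of GPU time per regen
saved_per_100 = round(100 * regen_rate * cost_per_regen, 2)
print(saved_per_100)  # → 2.8, i.e. the $2.80 in the table
```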
The bigger win is user trust. When you see an AI comic with garbled text, your brain immediately tags it "AI slop." Clean, kerned, outlined typography reads as "someone made this on purpose." Cheapest credibility upgrade in the pipeline.
If you want to see the composite output in the wild, Comicory is the side project this lives inside — every comic generated there ships through the exact 4 steps above before it reaches the canvas.