5 LoRA training pitfalls when you're trying to lock down a comic character

TL;DR: Most "my LoRA works in test prompts but breaks the second I put it in a comic panel" problems are caused at training time, not at inference. Here are the five training-side mistakes that cost me the most weekends.


I've spent the last eight months building Comicory, an AI comic generator where the entire pitch is "your character looks the same on page 1 and page 12." That sentence is easy to say. It is grindingly hard to ship.

Almost every fix I shipped in those eight months traced back to LoRA training, not the prompt or the sampler or the seed. This post is the list I wish someone had given me on day one.

Pitfall 1: Your training set has too many of the same shot

The first character LoRA I trained had 32 images. 28 of them were 3/4 portraits with neutral lighting, looking slightly off-camera. It was the dataset I had, scraped from concept-art-style references.

The LoRA trained beautifully. Then I tried to use it in an actual comic panel — wide shot, side profile, character mid-action — and the output looked nothing like the reference. The model had memorized the pose, not the character.

Fix: aim for pose, framing, and lighting diversity before you aim for image count. My current target for a character is roughly:

  • 30% close-up faces (multiple angles)
  • 30% medium shots (waist-up, multiple angles)
  • 25% full-body shots
  • 15% "weird" shots — back of head, dramatic angle, partial occlusion

Quality of coverage matters more than count. A 25-image set with this distribution beats a 70-image set of nothing-but-portraits, every single time.
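
If you tag each image with a shot type while curating, you can sanity-check the spread before you ever start training. A minimal sketch of that audit (the labels and the ±10% tolerance are my own convention, not anything kohya-ss reads):

from collections import Counter

# Target distribution from the list above
TARGETS = {"closeup": 0.30, "medium": 0.30, "fullbody": 0.25, "weird": 0.15}

def audit_coverage(shot_labels):
    # shot_labels: one label per training image, e.g. ["closeup", "medium", ...]
    counts = Counter(shot_labels)
    total = len(shot_labels)
    for shot, target in TARGETS.items():
        actual = counts.get(shot, 0) / total
        flag = "" if abs(actual - target) <= 0.10 else "  <-- off target"
        print(f"{shot:9s} {actual:5.0%} (target {target:.0%}){flag}")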

Pitfall 2: You captioned the character into the wallpaper

This one is sneaky. In my early datasets, every caption looked like:

ck_character standing in a forest, anime style, soft lighting, high detail

The model learned ck_character as inseparable from "standing in a forest, soft lighting." When I prompted ck_character on a spaceship bridge, the LoRA pulled in foliage and warm light because those concepts had been bound to the trigger token.

Fix: caption away the things you want to vary and leave only what is invariant about the character. If your character is supposed to drop into any setting, your caption should look like:

ck_character, red jacket, short black hair, freckles

That's it. No setting, no lighting, no mood. Those are the variables you'll set at inference time.

# What I do during caption preprocessing now: allowlist the invariant
# character tags; environment/style tags like "forest", "soft_lighting",
# "high_detail", "outdoor", "indoor" never survive
INVARIANT_TAGS = {"red_jacket", "short_black_hair", "freckles"}

def clean_caption(raw_tags, trigger="ck_character"):
    keep = [t for t in raw_tags if t in INVARIANT_TAGS]
    return f"{trigger}, " + ", ".join(keep)
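
Fed raw tagger output, everything but the trigger and the invariants drops out:

raw = ["red_jacket", "forest", "soft_lighting", "freckles", "high_detail"]
print(clean_caption(raw))  # ck_character, red_jacket, freckles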

This change alone gave me the single biggest jump in cross-scene consistency.

Pitfall 3: You trained at one resolution and then panel-rendered at another

Stable Diffusion 1.5 LoRAs trained at 512×512 fall apart at 768×1152 panel aspect ratios. SDXL is more forgiving but not immune. The model has not seen the character at the panel aspect ratio you actually need.

Fix: bucketed training across the aspect ratios you'll actually render at. kohya-ss supports this out of the box. My current bucket config covers:

  • 512×768 (portrait panel)
  • 768×512 (landscape panel)
  • 768×768 (splash square)
  • 1024×1536 (full-page hero)

Image counts in each bucket should roughly match how often you'll render at that aspect. If 70% of your panels are landscape, 70% of your training images should be landscape — even if it means cropping the same source image into multiple buckets.
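
If you want to check that mix, assign each source image to its nearest-aspect bucket and count. A rough sketch (the bucket list mirrors my config above; none of this is kohya-ss API, which handles bucketing internally):

BUCKETS = [(512, 768), (768, 512), (768, 768), (1024, 1536)]

def nearest_bucket(width, height):
    # Pick the bucket whose aspect ratio is closest to the image's
    aspect = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - aspect))

# A 1920x1080 source frame lands in the landscape panel bucket
print(nearest_bucket(1920, 1080))  # (768, 512)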

Pitfall 4: Your learning rate is fighting your dataset size

There is no universal "good" LR. Tiny datasets (15-25 images) want a lower LR and more steps so the model doesn't overfit on the handful of examples. Bigger sets (60+) tolerate a higher LR and fewer epochs.

What I use as a starting point now (kohya-ss, SDXL LoRA, rank 16):

Dataset size     unet_lr   text_encoder_lr   epochs
15-25 images     1e-4      5e-5              12-15
25-50 images     2e-4      1e-4              8-10
50-100 images    3e-4      1e-4              6-8

These are starting points, not laws. But they will save you from the two failure modes I kept hitting: undertraining ("LoRA does nothing") and overcooking ("LoRA always renders the same expression").
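
If you'd rather have those starting points in code than in a table, a trivial lookup (values copied straight from the table; the function name is mine):

def suggest_hparams(n_images):
    # Starting points from the table above, not laws
    if n_images <= 25:
        return {"unet_lr": 1e-4, "text_encoder_lr": 5e-5, "epochs": (12, 15)}
    if n_images <= 50:
        return {"unet_lr": 2e-4, "text_encoder_lr": 1e-4, "epochs": (8, 10)}
    return {"unet_lr": 3e-4, "text_encoder_lr": 1e-4, "epochs": (6, 8)}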

Check loss curves. If validation loss bottoms out around epoch 4 and rises after, your LR is too high or you have too few images. If it's still falling at the last epoch, train longer.
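
The same check, mechanized, if your trainer logs one validation loss per epoch (this is my own helper, not a kohya-ss utility):

def diagnose(val_losses):
    # val_losses: one validation loss per epoch, in order
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    if best == len(val_losses) - 1:
        return "still falling at the last epoch: train longer"
    return f"bottomed out at epoch {best + 1}: later epochs are overcooking"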

Pitfall 5: You skipped regularization images and now the LoRA bleeds into everything

You ship the LoRA. You prompt a coffee shop, no characters, photorealistic. Your character shows up anyway, faintly haunting the espresso machine.

This is the LoRA "leaking" into general concepts because it has no contrast set. The model has no examples of "what a person who is NOT this character looks like" during training, so the LoRA's identity bleeds into the base model's "person" concept.

Fix: regularization images. During training, alongside your character set, include a folder of generic "person" images (200-300, captioned simply as person) generated by the base model itself. These tell the LoRA "this is what NOT-the-character looks like."
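
One way to produce that folder, assuming an SDXL base model and the diffusers library (the output path and count here are illustrative):

import os
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Generate generic "person" images with the base model itself, so the
# reg set sits exactly on the base model's "person" concept
os.makedirs("/data/reg_person", exist_ok=True)
for i in range(300):
    image = pipe("person", num_inference_steps=25).images[0]
    image.save(f"/data/reg_person/{i:04d}.png")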

In kohya-ss config:

[[datasets]]
  [[datasets.subsets]]
    image_dir = "/data/ck_character"
    class_tokens = "ck_character"
    num_repeats = 10

  [[datasets.subsets]]
    image_dir = "/data/reg_person"
    class_tokens = "person"
    num_repeats = 1
    is_reg = true

The leaking effect drops to near-zero. Your background characters look like background characters again.

Closing

Character consistency is, in practice, a checklist of these five training-time decisions plus a workflow that uses the resulting LoRA correctly. The inference side (ControlNet, IP-Adapter, reference-only) only matters once your LoRA is solid. If your LoRA is bad, no amount of inference scaffolding will save it.

I built Comicory because I wanted a comic generator that didn't make me re-prompt the character on every panel. The five fixes above are the spine of how it works under the hood.
