DEV Community

N4k3l


How I cut AI generation errors in half without burning more LLM calls

A few months ago I was deep into building Crimetube, a pipeline that turns a topic prompt into a full narrated documentary video. The whole thing runs on its own: research, script, shot planning, image generation, video generation, voiceover, the works. One video ends up with over 200 AI-generated shots, spread across roughly 38 clusters.
If you’ve ever generated more than a handful of AI images or video clips in a row, you already know the problem I ran into. The model does not always listen. A character’s outfit changes halfway through. The lighting in a location suddenly looks different from the shot before it. Sometimes the model just ignores a rule you gave it three times.
Early on, roughly 1 in 10 shots would break some rule: wrong character look, inconsistent color, continuity that didn't hold up. With 200+ shots per video, that is 20 or more broken shots to deal with every time.
The lazy fix, and my first fix
My first instinct was the obvious one. If a cluster of shots had problems, just regenerate the whole cluster. Feed it back to the LLM and let it try again.
This worked, technically. But it was slow and it was expensive. Every regeneration meant another full LLM call, and I was often regenerating shots that were actually fine just to fix the one or two that weren't. It felt wasteful because it was wasteful.
I needed something smarter than “throw it back at the AI and hope.”
What I built instead
I ended up building a 23-rule lint pass that runs after every batch of shots is generated. Think of it as a linter for code, except that instead of checking syntax, it checks questions like: does this shot match the locked character description, does the color grade match the location's style lock, and are there any obvious continuity breaks?
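To make the idea concrete, here is a minimal Python sketch of a rule-based lint pass over shot descriptions. It is not Crimetube's code. The `Shot` schema, the two rules, and the lock tables (`CHARACTER_LOCKS`, `LOCATION_LOCKS`) are hypothetical, and the sketch assumes each shot is still a text description at lint time. Each rule returns a problem description or `None`, and the pass collects every failure into a list of `Violation` records.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Shot:
    # Hypothetical schema: the post does not describe the real shot format.
    shot_id: str
    prompt: str
    character: str
    location: str

@dataclass
class Violation:
    shot_id: str
    rule: str
    detail: str

@dataclass
class Rule:
    name: str
    # Returns a description of the problem, or None when the shot passes.
    check: Callable[[Shot], Optional[str]]

# Hypothetical locks: one canonical phrase per character and per location.
CHARACTER_LOCKS = {"detective": "grey trench coat"}
LOCATION_LOCKS = {"alley": "teal and orange grade"}

def character_lock(shot: Shot) -> Optional[str]:
    locked = CHARACTER_LOCKS.get(shot.character)
    if locked and locked not in shot.prompt:
        return f"missing locked look '{locked}'"
    return None

def color_lock(shot: Shot) -> Optional[str]:
    locked = LOCATION_LOCKS.get(shot.location)
    if locked and locked not in shot.prompt:
        return f"missing style lock '{locked}'"
    return None

RULES = [Rule("character_lock", character_lock), Rule("color_lock", color_lock)]

def lint(shots: List[Shot], rules: List[Rule]) -> List[Violation]:
    """Run every rule against every shot in a batch and collect the failures."""
    violations = []
    for shot in shots:
        for rule in rules:
            problem = rule.check(shot)
            if problem is not None:
                violations.append(Violation(shot.shot_id, rule.name, problem))
    return violations

shots = [
    Shot("s1", "Detective in grey trench coat, alley, teal and orange grade", "detective", "alley"),
    Shot("s2", "Detective in a red jacket, alley", "detective", "alley"),
]
for v in lint(shots, RULES):
    print(v.shot_id, v.rule, v.detail)
```

Keeping each rule as a small named function means the list can grow toward a couple dozen checks without changing the loop itself.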
The key idea was splitting the fix into two layers.
Layer one is deterministic. A good chunk of the 23 rules can be checked and fixed with plain code, with no AI involved at all: simple regex-level corrections and straightforward pattern matching. If a rule violation falls into this bucket, it gets fixed instantly and for free.
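Here is a sketch of what such a deterministic layer could look like. The rules, the regex, and the `fix` callables are hypothetical examples, not the post's actual 23 rules. Each rule can optionally carry a code-level fix, and rules without one are left for the model. After applying fixes, the pass re-checks every rule so that anything still broken gets reported instead of assumed fixed.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Shot:
    shot_id: str
    prompt: str  # hypothetical: the shot's text description before rendering

@dataclass
class Rule:
    name: str
    check: Callable[[Shot], bool]                 # True means the shot passes
    fix: Optional[Callable[[Shot], Shot]] = None  # None means plain code cannot fix it

STYLE_LOCK = "teal and orange grade"  # hypothetical location style lock
BANNED_LIGHTING = re.compile(r"\b(neon|sepia)\s+lighting\b", re.IGNORECASE)

RULES = [
    # Regex-level correction: swap wording that clashes with the location's look.
    Rule("no_banned_lighting",
         check=lambda s: BANNED_LIGHTING.search(s.prompt) is None,
         fix=lambda s: Shot(s.shot_id, BANNED_LIGHTING.sub("practical lighting", s.prompt))),
    # Pattern-level correction: append the style lock if the shot dropped it.
    Rule("style_lock",
         check=lambda s: STYLE_LOCK in s.prompt,
         fix=lambda s: Shot(s.shot_id, f"{s.prompt.rstrip(' ,.')}, {STYLE_LOCK}")),
    # No fix: deciding how to bring the character back into the shot needs judgment.
    Rule("character_present",
         check=lambda s: "detective" in s.prompt.lower()),
]

def deterministic_pass(shots: List[Shot], rules: List[Rule]) -> Tuple[List[Shot], List[Tuple[str, str]]]:
    """Apply every code-level fix, then re-lint; return fixed shots and what is still broken."""
    fixed, remaining = [], []
    for shot in shots:
        for rule in rules:
            if not rule.check(shot) and rule.fix is not None:
                shot = rule.fix(shot)
        # Re-check everything: a later fix could break a rule that already ran.
        remaining += [(shot.shot_id, r.name) for r in rules if not r.check(shot)]
        fixed.append(shot)
    return fixed, remaining

shots = [Shot("s1", "Wide shot of the alley, neon lighting."),
         Shot("s2", "Detective at the desk, teal and orange grade")]
fixed, remaining = deterministic_pass(shots, RULES)
print(fixed[0].prompt)  # Wide shot of the alley, practical lighting, teal and orange grade
print(remaining)        # [('s1', 'character_present')]
```

Only `remaining` needs to go anywhere near a model. Everything else was repaired with string operations.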
Layer two only kicks in for what’s left. After the deterministic layer does its job, whatever shots still violate a rule get sent back for a targeted regeneration. Not the whole cluster. Just that one shot, with the specific problem called out.
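The targeted step can be sketched as grouping the leftover violations by shot and sending one focused request per broken shot. `call_llm` is a hypothetical stand-in for whatever model client you use, and the prompt wording in `build_fix_request` is an assumption. What matters is that shots without violations never reach the model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def build_fix_request(shot_id: str, prompt: str, problems: List[Tuple[str, str]]) -> str:
    """Hypothetical prompt format: one shot, with only its failing rules spelled out."""
    lines = [
        f"Rewrite shot {shot_id}. Change only what is needed to fix the problems below.",
        f"Current description: {prompt}",
        "Problems:",
    ]
    lines += [f"- {rule}: {detail}" for rule, detail in problems]
    return "\n".join(lines)

def targeted_regeneration(
    shots: Dict[str, str],
    remaining: List[Tuple[str, str, str]],  # (shot_id, rule, detail) left after the code layer
    call_llm: Callable[[str], str],         # hypothetical stand-in for a model client
) -> Dict[str, str]:
    """Send one focused request per still-broken shot; every other shot is left untouched."""
    by_shot = defaultdict(list)
    for shot_id, rule, detail in remaining:
        by_shot[shot_id].append((rule, detail))
    updated = dict(shots)
    for shot_id, problems in by_shot.items():
        updated[shot_id] = call_llm(build_fix_request(shot_id, shots[shot_id], problems))
    return updated

# Usage with a fake model that records each request it receives.
sent_prompts = []
def fake_llm(prompt: str) -> str:
    sent_prompts.append(prompt)
    return "Detective in grey trench coat walks the alley, teal and orange grade"

cluster = {
    "s1": "Wide shot of the alley, teal and orange grade",
    "s2": "Detective in a red jacket, alley",
    "s3": "Close-up of the evidence bag, teal and orange grade",
}
remaining = [("s2", "character_lock", "missing 'grey trench coat'"),
             ("s2", "color_lock", "missing 'teal and orange grade'")]
result = targeted_regeneration(cluster, remaining, fake_llm)
print(len(sent_prompts))  # 1: only s2 went back to the model
```

Bundling all of a shot's remaining problems into one request keeps the cost at a single call per broken shot, even when several rules fail on it.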
This second part mattered more than I expected. Before, “fix the problem” meant “regenerate everything and cross your fingers.” Now it means “regenerate exactly the thing that’s broken, and nothing else.”
The result
Violations dropped from around 20 per cluster down to under 8. That’s already a solid improvement on its own.
But the part I actually care more about is the cost side. Because most fixes now happen at the code layer first, I skip a full LLM call per cluster in the common case. When you’re generating hundreds of shots across dozens of clusters, skipping unnecessary LLM calls adds up fast, both in time and in money.
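One rough way to see where the savings come from is to count model calls under both strategies. This toy cost model is an assumption, not the author's accounting. In the whole-cluster approach, any cluster with a violation costs one call and resends every shot. In the layered approach, each shot that is still broken after the code layer costs one call. The three clusters below use made-up numbers for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class ClusterReport:
    size: int                  # shots in the cluster
    fixable: int               # violations plain code can repair
    unfixable_shots: Set[str]  # shots still broken after the code layer

def whole_cluster_strategy(reports: List[ClusterReport]) -> Tuple[int, int]:
    """First approach: any violation sends the whole cluster back. Returns (calls, shots sent)."""
    hit = [r for r in reports if r.fixable or r.unfixable_shots]
    return len(hit), sum(r.size for r in hit)

def layered_strategy(reports: List[ClusterReport]) -> Tuple[int, int]:
    """Code fixes first; one targeted call per shot that is still broken."""
    broken = sum(len(r.unfixable_shots) for r in reports)
    return broken, broken

reports = [
    ClusterReport(size=5, fixable=3, unfixable_shots=set()),   # code fixes all of it
    ClusterReport(size=5, fixable=2, unfixable_shots={"b4"}),  # one shot needs the model
    ClusterReport(size=5, fixable=0, unfixable_shots=set()),   # clean cluster
]
print(whole_cluster_strategy(reports))  # (2, 10)
print(layered_strategy(reports))        # (1, 1)
```

In the first cluster the code layer handles every violation, so the layered approach makes no call at all. That is the common case in which a full cluster call simply disappears.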
Why this pattern is worth remembering
I think this is a pattern worth keeping in your back pocket if you’re building anything that leans on an LLM to produce a lot of output. The instinct when something goes wrong is to throw it back at the model. But not every fix needs a model. A lot of problems are simple enough that plain code can catch and correct them, and you should let it, before you ever reach for the expensive tool.
Save the LLM for the part that actually needs judgment. Let the cheap, deterministic layer handle everything else.
That one shift, splitting fixes into “code can handle this” and “this genuinely needs the model,” is probably the single change that made the whole pipeline feel production-ready instead of like a fragile demo.
