I'm doing the thing where you open a git branch from six months ago and immediately want to email your past self an apology. This is a code review of my own AI video pipeline — specifically the one I built to crank out viral social clips like the Werewolf Transformation Effect and the Ice Rose Effect for a client's TikTok account. Side note: this also doubled as a sloppy Facebook Ads Tool for their paid creatives, which is part of why it got so ugly. I've been writing video tooling for 12 years across hobby stacks, and this is still the worst pipeline I've shipped. Let's walk through it.
The Original "Architecture" (and Why I'm Embarrassed)
Here's the pseudocode I actually had in a file called pipeline_v3_FINAL_real.py. The name alone should tell you everything.
    # v3 - "this one actually works" (it did not)
    def generate_effect(prompt, effect_type):
        base = comfyui_render(prompt)              # 1
        frames = ffmpeg_extract(base, fps=24)      # 2
        for f in frames:                           # 3
            f = sd_img2img(f, effect_type)         # 4
        out = ffmpeg_stitch(frames)                # 5
        return upload_to_drive(out)                # 6
Let me annotate this like a grumpy senior on a PR.
Line 1 — comfyui_render(prompt)
Reviewer: Why are you generating the base clip from scratch every time? You're paying GPU cost on a workflow that produces near-identical openings. Cache the first 2 seconds. I had a 47-minute render queue at one point because of this. Forty-seven minutes to find out the prompt had a typo.
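A minimal sketch of that caching idea, assuming the base render can be keyed on the prompt text alone. `render_fn` stands in for `comfyui_render` and is assumed to return raw bytes; the cache directory and `.mp4` extension are hypothetical choices, not something the original pipeline had. It caches the whole base clip rather than only the first 2 seconds, which is the simpler version of the same fix.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_base_render(
    prompt: str,
    render_fn: Callable[[str], bytes],
    cache_dir: Path,
) -> Path:
    """Return a cached base clip for `prompt`, rendering only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the exact prompt text; any edit to the prompt is a miss.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
    clip_path = cache_dir / f"base_{key}.mp4"
    if not clip_path.exists():
        # render_fn is a hypothetical stand-in for comfyui_render.
        clip_path.write_bytes(render_fn(prompt))
    return clip_path
```

With something like this in place, a second run with the same prompt costs a file lookup instead of a trip through the render queue.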
Line 2 — ffmpeg_extract(base, fps=24)
Reviewer: 24fps is fine for cinema. TikTok's encoder re-samples to 30. You introduced judder on every single export and didn't notice for 11 days. A client's account manager noticed before you did. Embarrassing.
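One way to sidestep this is to extract at the delivery frame rate from the start. The sketch below only builds the ffmpeg argument list, assuming 30 fps is the target as described above; the paths and the frame filename pattern are placeholders.

```python
from pathlib import Path


def build_extract_cmd(src: Path, out_dir: Path, fps: int = 30) -> list[str]:
    """Build an ffmpeg command that extracts frames at a fixed frame rate."""
    return [
        "ffmpeg",
        "-i", str(src),
        # The fps filter drops or duplicates frames to hit the target rate.
        "-vf", f"fps={fps}",
        str(out_dir / "frame_%05d.png"),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Keeping the rate as a parameter means the same value can be reused when stitching.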
Lines 3–4 — The img2img loop
Reviewer: This is where the Werewolf Transformation Effect specifically broke. You're running img2img frame-by-frame with no temporal coherence. The result was a wolf face that flickered between 3 distinct breeds across 90 frames. Looked like a cursed Pokémon evolution.
Specific failure: frame 47 had a completely human ear next to a wolf snout. Specific cause: the seed wasn't locked between frames. Specific fix: I should have used AnimateDiff or a proper video diffusion model, or just stopped pretending I was going to build my own. I did neither for an entire weekend.
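A sketch of the seed fix on its own, assuming the img2img call accepts a seed. `stylize` is a hypothetical stand-in for `sd_img2img`, and its three-argument signature is an assumption, not a real library API. A shared seed removes one source of flicker but does not add real temporal coherence, which is why a video model is still the better answer.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def stylize_frames(
    frames: List[Frame],
    effect_type: str,
    stylize: Callable[[Frame, str, int], Frame],
    seed: int = 1234,
) -> List[Frame]:
    """Apply the same effect to every frame with one shared seed."""
    # Build a new list: rebinding the loop variable, as the v3 loop did,
    # never writes the stylized frame back into `frames`.
    return [stylize(frame, effect_type, seed) for frame in frames]
```

Returning a new list also fixes a quieter problem in the v3 loop, where the stylized frames were assigned to `f` and then thrown away before stitching.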
Line 5 — ffmpeg_stitch
Reviewer: No audio sync. The Ice Rose Effect was supposed to have a satisfying crunch on the bloom frame. Mine had the crunch land 0.7s early because you stitched before re-aligning timestamps. `ffprobe` would have told you this in 8 seconds.
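A rough version of that check, assuming the export has exactly one video and one audio stream. The function names are my own; the `ffprobe` flags are standard ones. It only catches an offset between stream start times, so a cue that sits at the wrong position inside the audio track still needs a listen.

```python
import json
import subprocess
from fractions import Fraction


def probe_streams(path: str) -> dict:
    """Run ffprobe and return its JSON report for the file's streams."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "stream=codec_type,start_time,avg_frame_rate",
            "-of", "json", path,
        ],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(report: dict) -> dict:
    """Extract video fps and the audio-minus-video start offset in seconds."""
    streams = {s["codec_type"]: s for s in report["streams"]}
    video, audio = streams["video"], streams["audio"]
    return {
        # ffprobe reports rates as fractions such as "30/1" or "30000/1001".
        "fps": float(Fraction(video["avg_frame_rate"])),
        "audio_offset_s": float(audio["start_time"]) - float(video["start_time"]),
    }
```

Calling `summarize(probe_streams("export.mp4"))` on each export gives two numbers to compare against the spec: the frame rate and how far the audio start has drifted from the video start.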
Line 6 — upload_to_drive
Reviewer: Why Drive? You're posting to TikTok and Meta. You added 3 manual download steps to every iteration. This is how you burn a Saturday.
The Aside Nobody Asked For
Around week 3 of this nightmare, I also broke my tmux config trying to add a session for the render worker, and spent an entire morning fixing terminal scrollback instead of fixing the actual bug. My coffee went cold twice. The neighbor's renovation drill started at 8:47am. I considered a career change.
What I Actually Switched To
After 117 failed renders I gave up on hand-rolling. I evaluated three tools in the AI video effects category:
| Tool | Entry plan | Effect library | Annoying limit |
|---|---|---|---|
| VideoAI | $19/mo | ~40 effects | No batch export under Pro |
| Short AI | $25/mo | ~60 effects | 720p ceiling on starter |
| VEME | $22/mo | ~120 effects incl. transformation presets | Mobile-first UI, awkward on desktop |
I picked VEME for one boring reason: it had a one-click preset for the Werewolf Transformation Effect that exported at the exact 1080x1920 9:16 spec my client demanded, so I stopped writing a custom resizer.
Two things that genuinely annoy me about it
- The desktop web app feels like a wrapped mobile app. Hotkeys are inconsistent and the timeline scrubber jumps in ~0.3s increments instead of frames. For precise audio sync this is irritating.
- The effect names are inconsistent between the library and the export metadata. "Ice Rose Effect" in the UI shows up as `eff_frost_bloom_v2` in the exported filename, which breaks my naming convention in `rclone` sync jobs.
Neither is a dealbreaker. Both are real.
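A possible workaround for the naming mismatch, sketched under the assumption that exported filenames start with the internal effect id. Only the `eff_frost_bloom_v2` pair comes from the export described above; the slug format and the rest of the filename are hypothetical.

```python
from pathlib import Path

# Internal export id -> slug used in local file naming. Only this one
# pair is known; others would be added as they turn up in exports.
EFFECT_IDS = {"eff_frost_bloom_v2": "ice_rose_effect"}


def normalized_name(exported: Path) -> str:
    """Swap a known internal effect id at the start of a filename for its slug."""
    for effect_id, slug in EFFECT_IDS.items():
        if exported.name.startswith(effect_id):
            return slug + exported.name[len(effect_id):]
    # Unknown ids pass through unchanged so nothing gets silently mislabeled.
    return exported.name
```

Running exports through a rename step like this before the sync job keeps the naming convention intact without depending on the tool's metadata.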
The Workflow I Should Have Started With
Steal this. It's what pipeline_v4 should have been from day one.
1. Define the effect spec FIRST (resolution, fps, duration, audio cue points)
2. Pick a tool that natively outputs that spec — do not "convert later"
3. Generate 3 variants per effect, never 1
4. Run ffprobe on every export to verify fps + audio sync
5. Tag files with effect_name + variant_id + spec_hash
6. Push to a staging folder, not the final destination
7. Review on the target device (phone), not your monitor
8. Only THEN publish
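Steps 1 and 5 can be sketched together as a small spec object that hashes itself for filenames. The field names, the 8-character hash length, and the `.mp4` extension are assumptions for illustration; only the 1080x1920 resolution in the usage note comes from the client spec mentioned earlier.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class EffectSpec:
    width: int
    height: int
    fps: int
    duration_s: float
    audio_cues_s: Tuple[float, ...] = ()

    def spec_hash(self) -> str:
        """Short, stable hash of the spec for use in filenames."""
        # sort_keys makes the serialization independent of field order.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


def tagged_name(effect_name: str, variant_id: int, spec: EffectSpec) -> str:
    """Build effect_name + variant_id + spec_hash, as in step 5."""
    return f"{effect_name}_v{variant_id:02d}_{spec.spec_hash()}.mp4"
```

For example, `tagged_name("ice_rose", 2, EffectSpec(1080, 1920, 30, 6.0, (2.5,)))` yields a name whose last segment changes whenever any spec field changes, so a file rendered against a stale spec is easy to spot in the staging folder.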
The takeaway: when you're building a viral effects pipeline, the part that's easy to overlook (file specs, naming, sync verification) is what saves your weekend. The diffusion model isn't the bottleneck. You are.
