<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kumar K Jha</title>
    <description>The latest articles on DEV Community by Kumar K Jha (@kkjcodes).</description>
    <link>https://dev.to/kkjcodes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4004304%2F0a105ec7-3a78-4563-a1c8-506722b96754.png</url>
      <title>DEV Community: Kumar K Jha</title>
      <link>https://dev.to/kkjcodes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kkjcodes"/>
    <language>en</language>
    <item>
      <title>What it actually costs to generate one AI cartoon video, line by line</title>
      <dc:creator>Kumar K Jha</dc:creator>
      <pubDate>Fri, 26 Jun 2026 17:13:57 +0000</pubDate>
      <link>https://dev.to/kkjcodes/what-it-actually-costs-to-generate-one-ai-cartoon-video-line-by-line-3omh</link>
      <guid>https://dev.to/kkjcodes/what-it-actually-costs-to-generate-one-ai-cartoon-video-line-by-line-3omh</guid>
      <description>&lt;p&gt;I sat down to look at my fal.ai billing dashboard last week for the first time in a few months. Not because something was wrong — just because I wanted to write a "here's what this costs" post and I'd been winging the numbers for a while.&lt;/p&gt;

&lt;p&gt;A couple of hours later I had a spreadsheet, three tabs of receipts, and a slightly different view of my own pipeline than I started with. So I'm writing it up.&lt;/p&gt;

&lt;p&gt;This is about the cost structure of a small AI-video product I built solo. The point isn't to argue it's a great cost structure — it isn't, particularly — but to break down where the money actually goes. If you're building one of these and trying to figure out where to optimize, the numbers might be useful. And if you're not, the surprise at the end is at least kind of fun.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built, briefly and honestly
&lt;/h2&gt;

&lt;p&gt;A user uploads a photo. The pipeline glues together a few third-party models — image generator, image-to-video, TTS, a few LLM calls — and out the other end pops a personalized animated MP4 starring that person (and optionally up to four people in the same video).&lt;/p&gt;

&lt;p&gt;It's &lt;a href="https://www.atveanimation.com" rel="noopener noreferrer"&gt;atveanimation.com&lt;/a&gt; if you want to poke at it. Free tier is real. Free tier is also why this cost post exists — running a free tier means knowing where the money goes.&lt;/p&gt;

&lt;p&gt;I did not invent any of the techniques here. Character consistency, keyframe anchoring, LoRA conditioning — those are all standard patterns at this point. Frontier models (Kling 3.0 Motion Control, Seedance 2.0, Hedra, Wan 2.7 multi-ref) do the consistency part natively and arguably better. What I'm posting is just the unit economics of a particular stack: WAN + Flux Kontext + flux-lora + Kokoro.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I measured this
&lt;/h2&gt;

&lt;p&gt;Prices come from the posted API rates of fal.ai, Replicate, and Anthropic, cross-referenced with averaged usage across the last ~100 generations on my account. For models priced by output (Kontext Pro) or by character count (Kokoro), I used the mean cost per call rather than the marginal per-unit rate.&lt;/p&gt;

&lt;p&gt;A couple of definitions worth being precise about, because I tripped over them in an earlier draft:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generated seconds&lt;/strong&gt;: what you actually pay for. WAN i2v emits ~6-second clips at my frame settings, so 4 scenes = 24 generated seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finished seconds&lt;/strong&gt;: what the user watches. My pipeline trims each scene to &lt;code&gt;audio_length + 0.5s&lt;/code&gt; after merging, so a 6-second clip with a 3-second voice line becomes a 3.5-second finished scene. A 4-scene video usually finishes around 16 seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both numbers are real. Which per-second figure you get depends on which one you divide by, so I'll keep them separate throughout.&lt;/p&gt;
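&lt;p&gt;A minimal sketch of the two definitions. It assumes WAN's 100-frame clips play at 16 fps and that the trim rule is exactly &lt;code&gt;audio_length + 0.5s&lt;/code&gt;. The function names are hypothetical illustrations, not the pipeline's real code.&lt;/p&gt;

```python
FPS = 16              # assumed WAN output frame rate
TRIM_PADDING_S = 0.5  # padding kept after the voice line


def generated_seconds(num_frames: int, fps: int = FPS) -> float:
    """Seconds of footage the video model emits (what the invoice covers)."""
    return num_frames / fps


def finished_seconds(clip_s: float, audio_s: float) -> float:
    """Seconds left after trimming a clip to the voice line plus padding.

    Trimming can only shorten a clip, so the result is capped at clip_s.
    """
    return min(clip_s, audio_s + TRIM_PADDING_S)


print(generated_seconds(100))      # raw length of one 100-frame clip
print(finished_seconds(6.0, 3.0))  # the 6 s clip / 3 s voice line example
```

&lt;p&gt;The gap between the two functions is the whole reason the generated-second and finished-second figures diverge later on.&lt;/p&gt;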

&lt;p&gt;Amortization: I'm dividing per-character setup costs across &lt;strong&gt;5 videos per character&lt;/strong&gt;, which is what the repeat-use pattern looks like in my own data. At one video per character the per-second cost roughly doubles. I'll show the range.&lt;/p&gt;

&lt;p&gt;What this excludes: failed-and-rerolled generations (handled separately below), Azure blob storage and egress (rounding error), Container Apps baseline (~$30–50/month, fixed, amortized across all traffic), developer time. What it includes: the vision call that auto-picks each character's voice, the Claude Haiku call that writes the scene brief.&lt;/p&gt;

&lt;h2&gt;
  
  
  The teardown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Per scene
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Per call&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Keyframe (solo character)&lt;/td&gt;
&lt;td&gt;fal-ai/flux-lora&lt;/td&gt;
&lt;td&gt;$0.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyframe (multi-character)&lt;/td&gt;
&lt;td&gt;fal-ai/flux-pro/kont&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyframe (anchor scene)&lt;/td&gt;
&lt;td&gt;FLUX Kontext Pro&lt;/td&gt;
&lt;td&gt;$0.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Animated clip (100 frames, ~6s)&lt;/td&gt;
&lt;td&gt;fal-ai/wan-i2v&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice line&lt;/td&gt;
&lt;td&gt;Kokoro TTS&lt;/td&gt;
&lt;td&gt;$0.005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt sanitization&lt;/td&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;$0.0008&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Per multi-character scene: about &lt;strong&gt;$0.555&lt;/strong&gt;. Four scenes: &lt;strong&gt;$2.22&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Worth noting: fal's posted price for wan-i2v at 720p is $0.40 per clip, not per second. The 1.25× multiplier kicks in for clips over 81 frames. My pipeline requests 100 frames (~6.25 seconds at 16 fps) per scene to give the audio room, which puts me in the multiplier band at $0.50 per clip. Dropping to 80 frames (5 seconds) would save $0.10 per clip, and 5 seconds is plenty for most voice lines. That's a real optimization I should run, and I'll get to it below.&lt;/p&gt;
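&lt;p&gt;The pricing rule is easy to encode. This sketch assumes the posted 720p base of $0.40 per clip, the 1.25× multiplier above 81 frames, and a $0.05 multi-character keyframe (the value implied by the $0.20 for four scenes in the share-of-bill table). The constant and function names are hypothetical.&lt;/p&gt;

```python
BASE_720P_PER_CLIP = 0.40    # fal's posted 720p WAN i2v price, per clip
LONG_CLIP_MULTIPLIER = 1.25  # applied when a clip exceeds the frame threshold
FRAME_THRESHOLD = 81

# Other per-scene calls (multi-character keyframe assumed to be $0.05).
MULTI_KEYFRAME = 0.05
KOKORO_LINE = 0.005
HAIKU_SANITIZE = 0.0008


def wan_clip_cost(num_frames: int) -> float:
    """Price of one WAN i2v clip at 720p for a given frame count."""
    multiplier = LONG_CLIP_MULTIPLIER if num_frames > FRAME_THRESHOLD else 1.0
    return BASE_720P_PER_CLIP * multiplier


def scene_cost(num_frames: int) -> float:
    """Keyframe + animated clip + voice line + prompt sanitization."""
    return MULTI_KEYFRAME + wan_clip_cost(num_frames) + KOKORO_LINE + HAIKU_SANITIZE


# Compare the current 100-frame setting with the 80-frame alternative.
for frames in (80, 100):
    print(frames, round(wan_clip_cost(frames), 2), round(4 * scene_cost(frames), 2))
```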

&lt;p&gt;WAN dominates the per-scene line. No surprise there: I expected the video model to be the expensive part. What I didn't expect was where the &lt;em&gt;rest&lt;/em&gt; of the bill came from.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per character (one-time, amortized)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visual description&lt;/td&gt;
&lt;td&gt;Claude Sonnet (vision)&lt;/td&gt;
&lt;td&gt;$0.015&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Style transfer (4 cartoon options)&lt;/td&gt;
&lt;td&gt;FLUX Kontext Pro × 4&lt;/td&gt;
&lt;td&gt;$0.16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training augmentations (35 images)&lt;/td&gt;
&lt;td&gt;FLUX Kontext Pro × 35&lt;/td&gt;
&lt;td&gt;$1.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA fine-tune (1500 steps)&lt;/td&gt;
&lt;td&gt;fal-ai/flux-lora-fast-training&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total per character&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$1.975&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's the surprise that made me actually write this post: &lt;strong&gt;the augmentation step costs 3.5× more than the LoRA training itself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've never built one of these, you might assume LoRA training is the expensive part — it's the line item with "training" in the name. It's not. The expensive line item is generating the 35 cartoon variations you need to &lt;em&gt;feed&lt;/em&gt; the training, because a LoRA fine-tuned on a single source photo overfits horribly and the resulting character looks generic and same-y across scenes.&lt;/p&gt;

&lt;p&gt;So you need pose variation. Expression variation. Lighting variation. Each one costs ~$0.04 to generate via Kontext Pro. Stack 35 of them and you've spent $1.40 before you've trained a single weight.&lt;/p&gt;

&lt;p&gt;I generate 20 variations from the cartoon style image (poses, expressions) plus 15 variations from the original selfie (anchored on the real face, to counterbalance the cartoon-side darkening of skin tone that I observed when I had a more skewed mix). That 20+15 split is what makes the LoRA actually produce a recognizable person.&lt;/p&gt;

&lt;p&gt;It's the hidden cost nobody flags when they talk about "LoRA fine-tuning is cheap now."&lt;/p&gt;
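&lt;p&gt;Here is the same arithmetic as a sketch. It assumes a flat ~$0.04 per Kontext Pro image and uses the other per-character figures from the table above. The vision call is taken as $0.015, which is what remains of the $1.975 total after the other rows. All names are illustrative only.&lt;/p&gt;

```python
KONTEXT_PRO_PER_IMAGE = 0.04  # approximate per-call cost
VISION_DESCRIBE = 0.015       # Claude Sonnet vision call (remainder of $1.975)
STYLE_OPTIONS = 4             # cartoon style candidates shown to the user
LORA_TRAINING = 0.40          # 1500-step flux-lora-fast-training run

# Augmentation mix: poses/expressions from the cartoon style image,
# plus face-anchored variations from the original selfie.
AUGMENT_FROM_CARTOON = 20
AUGMENT_FROM_SELFIE = 15


def per_character_setup() -> dict:
    """One-time cost of onboarding a character, broken into its parts."""
    augmentation = (AUGMENT_FROM_CARTOON + AUGMENT_FROM_SELFIE) * KONTEXT_PRO_PER_IMAGE
    style_transfer = STYLE_OPTIONS * KONTEXT_PRO_PER_IMAGE
    total = VISION_DESCRIBE + style_transfer + augmentation + LORA_TRAINING
    return {
        "augmentation": round(augmentation, 4),
        "style_transfer": round(style_transfer, 4),
        "total": round(total, 4),
        "augmentation_vs_training": round(augmentation / LORA_TRAINING, 2),
    }


print(per_character_setup())
```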

&lt;h2&gt;
  
  
  Reconciliation
&lt;/h2&gt;

&lt;p&gt;For a typical 4-scene, 2-character video, amortized over 5 videos per character:&lt;/p&gt;

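&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Line item&lt;/th&gt;
&lt;th&gt;Calculation&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per-character setup (amortized)&lt;/td&gt;
&lt;td&gt;2 × $1.975 / 5&lt;/td&gt;
&lt;td&gt;$0.79&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-scene&lt;/td&gt;
&lt;td&gt;4 × $0.555&lt;/td&gt;
&lt;td&gt;$2.22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-project brief&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; $0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total per video&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ $3.02&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per generated second&lt;/td&gt;
&lt;td&gt;÷ 24 s (4 clips × ~6 s)&lt;/td&gt;
&lt;td&gt;≈ $0.126&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per finished second&lt;/td&gt;
&lt;td&gt;÷ 16 s (after audio-aware trim)&lt;/td&gt;
&lt;td&gt;≈ $0.189&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;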

&lt;p&gt;So: about &lt;strong&gt;$0.13 per generated second&lt;/strong&gt; (what fal/Replicate/Anthropic invoice me for), or &lt;strong&gt;$0.19 per finished second&lt;/strong&gt; (what a viewer actually experiences). Both are real; the finished-second number is the more honest headline because it's what the user gets.&lt;/p&gt;
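&lt;p&gt;A sketch of the reconciliation as a function of videos per character. It assumes the per-project brief costs its $0.01 upper bound and uses the 24 generated and 16 finished seconds defined earlier. The function name is hypothetical.&lt;/p&gt;

```python
PER_CHARACTER_SETUP = 1.975  # one-time cost per character
PER_SCENE = 0.555            # multi-character scene cost
PER_PROJECT_BRIEF = 0.01     # upper bound for the scene-brief call
GENERATED_SECONDS = 24       # 4 clips of ~6 s
FINISHED_SECONDS = 16        # after the audio-aware trim


def per_video_cost(videos_per_character: int, characters: int = 2, scenes: int = 4) -> float:
    """Amortized character setup plus per-scene and per-project costs."""
    setup = characters * PER_CHARACTER_SETUP / videos_per_character
    return setup + scenes * PER_SCENE + PER_PROJECT_BRIEF


total = per_video_cost(5)
print(f"total ${total:.2f}")
print(f"per generated second ${total / GENERATED_SECONDS:.3f}")
print(f"per finished second ${total / FINISHED_SECONDS:.3f}")
```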

&lt;h3&gt;
  
  
  Share of the bill
&lt;/h3&gt;

&lt;p&gt;This is the part I think is actually useful:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;$ / video&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WAN i2v (4 clips)&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Augmentation (amortized, 2 characters)&lt;/td&gt;
&lt;td&gt;$0.56&lt;/td&gt;
&lt;td&gt;19%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA training (amortized)&lt;/td&gt;
&lt;td&gt;$0.16&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Kontext keyframes (4 scenes)&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Style transfer + vision describe (amortized)&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kokoro voice lines (4 scenes)&lt;/td&gt;
&lt;td&gt;$0.02&lt;/td&gt;
&lt;td&gt;&amp;lt;1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everything else (LLM, moderation)&lt;/td&gt;
&lt;td&gt;&amp;lt;$0.01&lt;/td&gt;
&lt;td&gt;&amp;lt;1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The video model is the biggest line. It's not the only line, and it doesn't dominate the way I expected. Augmentation alone is almost a fifth of the bill.&lt;/p&gt;
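&lt;p&gt;The share column is just each line divided by the total. This sketch uses the per-video figures from the table. "Everything else" is approximated as four Haiku sanitization calls, and the brief is left out because only an upper bound is given for it.&lt;/p&gt;

```python
# Per-video costs for a 4-scene, 2-character video, 5 videos per character.
line_items = {
    "WAN i2v (4 clips)": 4 * 0.50,
    "Augmentation (amortized)": 2 * 1.40 / 5,
    "LoRA training (amortized)": 2 * 0.40 / 5,
    "Multi-Kontext keyframes": 4 * 0.05,
    "Style transfer + vision (amortized)": 2 * (0.16 + 0.015) / 5,
    "Kokoro voice lines": 4 * 0.005,
    "Haiku sanitization": 4 * 0.0008,
}

total = sum(line_items.values())
# Largest line first, with its percentage of the total bill.
for name, cost in sorted(line_items.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:38s} ${cost:5.2f}  {100 * cost / total:5.1f}%")
print(f"{'Total':38s} ${total:5.2f}")
```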

&lt;h3&gt;
  
  
  Amortization range
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Videos / character&lt;/th&gt;
&lt;th&gt;Per generated sec (24s)&lt;/th&gt;
&lt;th&gt;Per finished sec (16s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 (single video, new character)&lt;/td&gt;
&lt;td&gt;~$0.26&lt;/td&gt;
&lt;td&gt;~$0.39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;~$0.15&lt;/td&gt;
&lt;td&gt;~$0.22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5 (my measured average)&lt;/td&gt;
&lt;td&gt;~$0.13&lt;/td&gt;
&lt;td&gt;~$0.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;~$0.11&lt;/td&gt;
&lt;td&gt;~$0.16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;~$0.10&lt;/td&gt;
&lt;td&gt;~$0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The generated-second column is what your accountant cares about (it matches the invoice). The finished-second column is what the user experiences. They diverge because each WAN clip generates ~6 seconds but the concat step trims to audio-length + 0.5s — most voice lines come back at 3-4 seconds, so a lot of generated frames get cut.&lt;/p&gt;

&lt;p&gt;Shape worth noticing: going from 5 to 10 videos saves only about $0.02–0.03/sec. Going from 1 to 2 saves about $0.08 per generated second (about $0.12 per finished second). The biggest unit-economic win is getting a user to make their &lt;em&gt;second&lt;/em&gt; video on an existing character, not their tenth. Most of the product features I've been building (preset scenes, group videos, "make a sequel") are shaped by that math.&lt;/p&gt;
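&lt;p&gt;The diminishing-returns shape comes from the 1/N amortization term. This sketch reuses the per-character and per-scene figures and assumes the brief costs $0.01. It reproduces the table rows and the two savings quoted above. The function names are hypothetical.&lt;/p&gt;

```python
SETUP_TOTAL = 2 * 1.975             # one-time setup for two characters
FIXED_PER_VIDEO = 4 * 0.555 + 0.01  # four scenes plus the brief (assumed $0.01)
GENERATED_SECONDS = 24
FINISHED_SECONDS = 16


def cost_per_second(videos_per_character: int, seconds: int) -> float:
    """All-in cost per second when character setup is spread over N videos."""
    per_video = FIXED_PER_VIDEO + SETUP_TOTAL / videos_per_character
    return per_video / seconds


def saving(from_n: int, to_n: int, seconds: int) -> float:
    """Per-second saving from raising videos-per-character from from_n to to_n."""
    return cost_per_second(from_n, seconds) - cost_per_second(to_n, seconds)


for n in (1, 3, 5, 10, 20):
    print(n, round(cost_per_second(n, GENERATED_SECONDS), 2),
          round(cost_per_second(n, FINISHED_SECONDS), 2))

print("1 -> 2:", round(saving(1, 2, GENERATED_SECONDS), 3), "per generated s")
print("5 -> 10:", round(saving(5, 10, GENERATED_SECONDS), 3), "per generated s")
```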

&lt;h2&gt;
  
  
  Effective vs. sticker cost
&lt;/h2&gt;

&lt;p&gt;In my logs, scene image generation fails on the first try roughly 1 in 8–10 attempts. Two main causes: transient 5xx from fal, and WAN's content filter rejecting a scene description with action-y or fight-y language even after my Claude Haiku rewriter swaps the trigger words. With the rewriter, the practical reroll rate is about &lt;strong&gt;10–12%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So effective cost is ~1.10–1.12× sticker. Not nothing, but not the 1.4× you'd get with a stricter video model. If I were on Sora 2 Pro or early Veo this multiplier would be much bigger.&lt;/p&gt;
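&lt;p&gt;Here are two ways to turn a reroll rate into a cost multiplier, as a sketch. The figure above treats rerolls as a flat surcharge of 1 + r. Alternatively, if each attempt fails independently with probability p and is retried until it succeeds, the expected number of attempts follows a geometric distribution, 1 / (1 − p), which comes out slightly higher. The independence assumption is illustrative, not something the logs confirm. As in the text, the multiplier is applied to the whole per-video cost.&lt;/p&gt;

```python
STICKER_PER_VIDEO = 3.02  # reconciled per-video cost


def flat_surcharge(reroll_rate: float) -> float:
    """Multiplier if rerolls simply add reroll_rate extra attempts."""
    return 1.0 + reroll_rate


def retry_until_success(failure_rate: float) -> float:
    """Expected attempts per success with independent failures (geometric)."""
    if failure_rate >= 1.0 or 0.0 > failure_rate:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


for rate in (0.10, 0.12):
    flat = flat_surcharge(rate)
    geo = retry_until_success(rate)
    print(f"{rate:.0%}: flat {flat:.3f}x (${STICKER_PER_VIDEO * flat:.2f}), "
          f"geometric {geo:.3f}x (${STICKER_PER_VIDEO * geo:.2f})")
```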

&lt;h2&gt;
  
  
  How this stacks up against just the video model
&lt;/h2&gt;

&lt;p&gt;For raw per-second video-model pricing as of mid-20&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Posted price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Seedance 1.5 Pro&lt;/td&gt;
&lt;td&gt;~$0.025/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kling 3.0&lt;/td&gt;
&lt;td&gt;~$0.029/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runway Gen-4 Turbo&lt;/td&gt;
&lt;td&gt;~$0.05/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sora 2 base&lt;/td&gt;
&lt;td&gt;~$0.10/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Veo 3.1 Fast&lt;/td&gt;
&lt;td&gt;~$0.10–$0.15/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Veo 3.1 Standard&lt;/td&gt;
&lt;td&gt;~$0.40/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sora 2 Pro&lt;/td&gt;
&lt;td&gt;~$0.30–$0.50/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WAN 2.1 i2v (720p, ≤81 frames)&lt;/td&gt;
&lt;td&gt;$0.40 / clip&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WAN 2.1 i2v (720p, 82–100 frames)&lt;/td&gt;
&lt;td&gt;$0.50 / clip&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A note on WAN pricing: fal bills it per clip, not per second. At 720p the base is $0.40 per clip, but clips over 81 frames incur a 1.25× multiplier, bringing the price to $0.50 per clip. My pipeline requests 100 frames per scene (~6.25 seconds raw), so I'm in the multiplier band. Each clip yields about 6 seconds of footage, and I pay the same per clip regardless of how short the finished cut is. Roundup posts that quote $0.04–$0.08/sec for WAN are usually referring to the 480p variant ($0.20/clip, half the price) or dividing $0.40/clip by the maximum frame count rather than what the model actually emits at default settings.&lt;/p&gt;

&lt;p&gt;So in raw video-model terms my pipeline is mid-range — about Sora 2 base, cheaper than Veo Standard, more expensive than Kling. The all-in cost works out to roughly &lt;strong&gt;1.5–2× the video model alone&lt;/strong&gt;. That overhead is structural to a multi-model personalization stack: keyframe conditioning, per-character training, voice, vision, brief.&lt;/p&gt;
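&lt;p&gt;A quick check of both claims, as a sketch. It uses the per-video figures from the reconciliation and the $0.01 brief upper bound, and computes two things: WAN's effective per-second rate at 100 frames, and the all-in cost divided by the WAN line alone for a few videos-per-character values.&lt;/p&gt;

```python
WAN_PER_VIDEO = 4 * 0.50            # four 100-frame clips at $0.50
OTHER_PER_VIDEO = 4 * 0.055 + 0.01  # non-WAN per-scene costs plus the brief
SETUP_TOTAL = 2 * 1.975             # one-time setup for two characters


def all_in_per_video(videos_per_character: int) -> float:
    """Total per-video cost with character setup amortized over N videos."""
    return WAN_PER_VIDEO + OTHER_PER_VIDEO + SETUP_TOTAL / videos_per_character


print(f"WAN only: ${WAN_PER_VIDEO / 24:.3f}/generated s, "
      f"${WAN_PER_VIDEO / 16:.3f}/finished s")
for n in (2, 3, 5):
    ratio = all_in_per_video(n) / WAN_PER_VIDEO
    print(f"{n} videos/character: all-in is {ratio:.2f}x the video model alone")
```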

&lt;p&gt;Migrating to Seedance would save ~$0.04/sec on the video line and leave the other overhead untouched. The optimization question isn't "which cheaper video model?" — it's "how much can I cut from the wrapper around it?"&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually optimize
&lt;/h2&gt;

&lt;p&gt;Honest, in rough order of impact:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cut augmentation calls.&lt;/strong&gt; Biggest non-video line item and the most room. Replacing the 35-image Kontext-Pro augmentation set with a Flux LoRA training pass directly on the source + selected style image would save ~$1.40 per character. The trade-off is real (less expression range in the LoRA) — that's the next A/B I want to run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop WAN num_frames from 100 to 80.&lt;/strong&gt; Per fal's posted pricing, 720p clips over 81 frames pay a 1.25× multiplier — so I'm paying $0.50 when I could be paying $0.40. The audio-aware trim downstream means my finished scenes rarely exceed 5 seconds anyway. Net savings: $0.40 per video, ~$0.025/finished second. This is the easiest unit-economic win in the whole stack and I have no excuse for not having shipped it.&lt;/li&gt;
&lt;li&gt;
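&lt;strong&gt;Raise videos-per-character.&lt;/strong&gt; Every additional video on an existing character lowers the per-second cost, with diminishing returns: the second video saves about $0.08 per generated second, and the average saving per additional video across the first ten is closer to $0.02/sec. Product features that bring users back to existing characters have a direct unit-economic lever. Cheaper than optimizing models.&lt;/li&gt;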
&lt;li&gt;
&lt;strong&gt;Don't touch TTS.&lt;/strong&gt; Kokoro is $0.005/scene. Anything cheaper would be rounding error and Kokoro sounds better than the alternatives at this price.&lt;/li&gt;
&lt;/ol&gt;
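&lt;p&gt;A rough before-and-after for the first two items, as a sketch. It reuses the reconciliation figures and makes two assumptions. First, the replacement LoRA pass in item 1 costs nothing beyond the existing $0.40 training run; this is optimistic, since that A/B hasn't been run. Second, finished length stays at 16 seconds when clips drop to 80 frames.&lt;/p&gt;

```python
VIDEOS_PER_CHARACTER = 5
CHARACTERS = 2
SCENES = 4
FINISHED_SECONDS = 16
BASELINE_PER_VIDEO = 3.02

# Item 1: drop the 35-image augmentation set (~$1.40 per character), amortized.
AUGMENTATION_SAVING = CHARACTERS * 1.40 / VIDEOS_PER_CHARACTER
# Item 2: 80-frame clips avoid the 1.25x band, $0.50 -> $0.40 per clip.
FRAMES_SAVING = SCENES * (0.50 - 0.40)

scenarios = {
    "baseline": 0.0,
    "80 frames": FRAMES_SAVING,
    "no augmentation": AUGMENTATION_SAVING,
    "both": FRAMES_SAVING + AUGMENTATION_SAVING,
}
for name, saving in scenarios.items():
    cost = BASELINE_PER_VIDEO - saving
    print(f"{name:16s} ${cost:.2f}/video  ${cost / FINISHED_SECONDS:.3f}/finished s")
```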

&lt;p&gt;What I would &lt;em&gt;not&lt;/em&gt; spend time on: switching the video model. Cheaper options exist, but the headroom isn't in the model. It's in the conditioning and training around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it if you want
&lt;/h2&gt;

&lt;p&gt;If you want to see what $0.19/finished second looks like as an actual video — and figure out whether my math is right — the product is at &lt;a href="https://www.atveanimation.com" rel="noopener noreferrer"&gt;atveanimation.com&lt;/a&gt;. Upload a photo, pick a style, hit generate. Free tier gives you 10 scenes a day, which is enough for two short videos.&lt;/p&gt;

&lt;p&gt;If you find a way to crash the augmentation step or rack up a $20 bill on a single account, please tell me. I'd genuinely like to know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend this teardown is novel — anyone with a billing dashboard and a calculator can produce one. But I hadn't seen one written publicly for a WAN + Kontext + flux-lora + Kokoro stack, and the augmentation surprise (3.5× the cost of the LoRA training it feeds) was non-obvious enough to me, after a year of building this, that it probably warranted writing down.&lt;/p&gt;

&lt;p&gt;If you spot something off in the numbers, the comments are open. I'll fix the post rather than defend it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Posted from my own desk on a Friday afternoon. Numbers reconcile to the nearest cent; if they don't reconcile to yours I'd love to know why.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>sideprojects</category>
    </item>
  </channel>
</rss>
