DEV Community

Kazutaka Sugiyama

I Switched My AI Image Model and the Cost Went Up 6.7x — Here's Why I Did It Anyway

You know that feeling when you find a better tool, but the pricing page makes you close the tab?

I run RepoClip, a SaaS that generates promo videos from GitHub repos using AI. The pipeline analyzes source code with Gemini, generates scene images, adds narration, and renders a video — all automatically. Images are at the center of the output quality. Every video has 6 scene images, and they're what users see first.

I'd been using FLUX.2 [dev] via Fal.ai for image generation since launch. It worked. Then I saw that Nano Banana 2 — Google's new image model — had landed on Fal.ai. I decided to test it with the exact same prompt.

The results were not close.

Same Prompt, Different Universe

I built a comparison script that runs the same prompt through both models across 4 visual styles (Tech, Realistic, Minimal, Vibrant). Here's the prompt:

"A digital dashboard showing interconnected data nodes and flowing information streams, representing an intelligent automation platform that connects multiple services and workflows seamlessly"

Tech Style

*(Comparison image: FLUX.2 vs Nano Banana 2)*

FLUX.2 gives you a flat infographic with a UI overlay. Nano Banana 2 produces a cinematic, three-dimensional data flow with depth and lighting that looks like concept art.

Realistic Style

*(Comparison image: FLUX.2 vs Nano Banana 2)*

FLUX.2 renders a network graph that looks like a screenshot from a monitoring tool. Nano Banana 2 creates a photorealistic control room scene — you can almost feel the screen glow.

Vibrant Style

*(Comparison image: FLUX.2 vs Nano Banana 2)*

This is where the gap is widest. FLUX.2 gives a cartoon explosion of color. Nano Banana 2 produces a bold, structured composition with neon circuit aesthetics that actually looks intentional.

The difference wasn't subtle. Every style, every prompt — Nano Banana 2 was producing images that looked like they belonged in a final product, not a prototype.

But then I checked the pricing.

The 6.7x Problem

|                 | FLUX.2 [dev] | Nano Banana 2 |
| --------------- | ------------ | ------------- |
| Cost per image  | ~$0.012      | ~$0.08        |
| Generation time | ~7.7s        | ~31.3s        |

That's a 6.7x increase in per-image cost and 4x slower generation. For a bootstrapped SaaS, these numbers don't inspire confidence.

My gut said "too expensive." But I'd learned from past experience that gut feelings about costs are usually wrong — you need actual numbers.

Running the Simulation

Instead of staring at a spreadsheet, I asked Claude Code to run a scenario:

Assumptions:

  • 10 Starter users ($29/mo each) generating 50 videos total
  • 3 Pro users ($79/mo each) generating 50 videos total
  • 6 images per video
  • Only revenue: subscription fees. Only cost: image generation API.
```
                    FLUX.2          Nano Banana 2
─────────────────────────────────────────────────
Images generated    600             600
Image cost          $7.20           $48.00
Monthly revenue     $527.00         $527.00
Profit              $519.80         $479.00
Margin              98.6%           90.9%
```

The difference was $40.80/month. That's it.

Even in this conservative scenario — just 13 paying users, no free tier buffer, ignoring all other operational costs — the profit impact was under 8 percentage points. With real-world numbers where hosting, AI analysis, and TTS costs dominate the bill, image generation was a rounding error either way.
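If you want to reproduce the simulation, it's a few lines of arithmetic. Here's a minimal sketch using the post's assumptions — the variable names (`ImageModel`, `monthlyRevenue`, etc.) are mine, not from the RepoClip codebase:

```typescript
// Back-of-envelope profit simulation for the two image models.
interface ImageModel {
  name: string;
  costPerImage: number; // USD per generated image
}

const models: ImageModel[] = [
  { name: "FLUX.2 [dev]", costPerImage: 0.012 },
  { name: "Nano Banana 2", costPerImage: 0.08 },
];

const monthlyRevenue = 10 * 29 + 3 * 79; // 10 Starter + 3 Pro users = $527
const imagesPerMonth = (50 + 50) * 6;    // 100 videos x 6 scene images = 600

for (const { name, costPerImage } of models) {
  const imageCost = imagesPerMonth * costPerImage;
  const profit = monthlyRevenue - imageCost;
  const margin = (profit / monthlyRevenue) * 100;
  console.log(
    `${name}: cost $${imageCost.toFixed(2)}, profit $${profit.toFixed(2)}, margin ${margin.toFixed(1)}%`
  );
}
```

Swap in your own plan mix and usage numbers; the point is that the per-unit sticker price only matters relative to revenue.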

The quality gap was obvious. The cost gap was negligible. Decision made.

Problem #1: File Size Explosion

The first thing I noticed after switching: Nano Banana 2's PNG outputs were massive.

| Format              | File size (same prompt) |
| ------------------- | ----------------------- |
| FLUX.2 PNG          | 978 KB                  |
| Nano Banana 2 PNG   | 2,043 KB                |
| Nano Banana 2 JPEG  | 401 KB                  |

Nano Banana 2 PNGs were 2x larger than FLUX.2. For a video pipeline that downloads 6 images per generation and passes them to a Lambda renderer, this matters — both for speed and storage costs.

The fix was one line:

```typescript
const result = await fal.subscribe("fal-ai/nano-banana-2", {
  input: {
    prompt: enhancedPrompt,
    aspect_ratio: aspectRatio,
    resolution: "1K",
    output_format: "jpeg",  // <-- this
    num_images: 1,
  },
});
```

Since these images are frames in a video (which itself uses lossy H.264 compression), lossless PNG was overkill from the start. JPEG at default quality cut the size by 80% with no visible difference in the final rendered video.

Problem #2: The 4x Slower Generation

Nano Banana 2 takes ~31 seconds per image versus FLUX.2's ~8 seconds. For 6 scenes, that's 186 seconds sequential — over 3 minutes just on images.

But I was already generating all scene images in parallel:

```typescript
const images = await Promise.all(
  scenes.map(scene => generateImage(scene.imagePrompt, aspectRatio, visualStyle))
);
```

With Promise.all, the wall-clock time is the slowest single image, not the sum. In practice, that's ~36 seconds — about 28 seconds more than before. Against a pipeline timeout of 15 minutes, this was a non-issue.

If you're calling AI APIs sequentially and wondering why things are slow, this is your sign to parallelize.
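The effect is easy to demonstrate without touching a real API. In this sketch, `fakeGenerate` stands in for the fal.ai call, with delays scaled down 1000x (31s → 31ms):

```typescript
// Why Promise.all absorbs latency: wall-clock time tracks the slowest
// request, not the sum of all requests.
const fakeGenerate = (ms: number): Promise<number> =>
  new Promise((resolve) => setTimeout(() => resolve(ms), ms));

async function main() {
  const delays = [31, 28, 30, 29, 31, 27]; // six "scene images"
  const start = Date.now();
  await Promise.all(delays.map(fakeGenerate));
  const elapsed = Date.now() - start;
  // elapsed lands near the max delay (~31ms), not the sum (176ms)
  console.log(`elapsed: ${elapsed}ms`);
}
main();
```

Run the same delays sequentially in a `for...of` loop with `await` inside and you'll see the sum instead.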

Problem #3: Brand Logo Hallucination

This one surprised me. Nano Banana 2 is significantly better at rendering recognizable imagery — and that's not always a good thing.

When the scene prompt contained words like "GitHub" or "Python," FLUX.2 would generate abstract tech art. Nano Banana 2 would render the actual Octocat logo or a realistic Python logo. For a product that generates promotional videos, having trademarked imagery appear in user content is a liability.

The fix was adding explicit exclusion instructions to every prompt:

```typescript
const enhancedPrompt = `${prompt}. ${stylePrompt}. No text, no UI elements, no screenshots, no logos, no brand imagery, no mascots.`;
```

The last three exclusions (no logos, no brand imagery, no mascots) were added specifically for Nano Banana 2. FLUX.2 never needed them — it simply wasn't capable enough to render those assets recognizably in the first place.
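If the exclusion string lives inline at each call site, it's easy for the lists to drift apart as prompts evolve. One way to guard against that is a shared constant — `EXCLUSIONS` and `buildImagePrompt` here are illustrative names, not the actual RepoClip code:

```typescript
// Centralize the exclusion list so every scene prompt gets the same one.
const EXCLUSIONS = [
  "no text",
  "no UI elements",
  "no screenshots",
  "no logos",
  "no brand imagery",
  "no mascots",
] as const;

function buildImagePrompt(scenePrompt: string, stylePrompt: string): string {
  return `${scenePrompt}. ${stylePrompt}. ${EXCLUSIONS.join(", ")}.`;
}
```

Adding a new exclusion then becomes a one-line change instead of a find-and-replace across prompt builders.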

The Migration Diff

The actual code change was small. Here's the core of fal.ts before and after:

```diff
- const result = await fal.subscribe("fal-ai/flux-2", {
+ const result = await fal.subscribe("fal-ai/nano-banana-2", {
    input: {
      prompt: enhancedPrompt,
-     image_size: "landscape_16_9",
-     num_inference_steps: 28,
+     aspect_ratio: aspectRatio,
+     resolution: "1K",
+     output_format: "jpeg",
      num_images: 1,
    },
  });
```

Different model, different API shape, but the wrapper function signature stayed the same. Downstream code didn't change at all.

What I Learned

1. Always simulate before you panic. A 6.7x per-unit cost increase sounds terrifying in isolation. In the context of actual revenue and usage patterns, it was noise. Run the numbers before making decisions based on sticker shock.

2. Lossy formats are fine for intermediate assets. If your images are being consumed by a video encoder, compressed for web display, or otherwise transformed downstream, PNG is a waste of bandwidth. Match the format to the use case.

3. Better models bring new problems. FLUX.2 couldn't render brand logos, so I never had to worry about it. Nano Banana 2 can, so now I need explicit exclusion prompts. Capability upgrades aren't just free wins — they shift the problem space.

4. Parallelization absorbs latency. A 4x slower model barely matters when you're already running requests concurrently. Design for parallelism from the start and model speed becomes a secondary concern.

The Stack

  • Framework: Next.js (App Router) + TypeScript
  • Orchestration: Inngest
  • Code Analysis: Gemini 2.5 Flash
  • Image Generation: Nano Banana 2 via Fal.ai
  • Narration: OpenAI TTS
  • Video Rendering: Remotion Lambda
  • Database/Auth/Storage: Supabase
  • Deployment: Vercel

Try It

If you want to see what Nano Banana 2 produces in a real pipeline, try it on your own repo: repoclip.io

The free tier gives you 2 videos/month. Paste a GitHub URL and you'll have a narrated demo video in a few minutes.

I'd love to hear from the community:

  • Have you switched AI models in production and been surprised by the cost impact?
  • What's your approach to evaluating model upgrades — vibes, benchmarks, or simulations?

Drop a comment or find me on GitHub.


The comparison images in this post were generated with the exact same prompt, same visual style settings, same pipeline. The only variable was the model. Sometimes the upgrade really is worth it.
