<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ricardo Ghekiere (runflow)</title>
    <description>The latest articles on DEV Community by Ricardo Ghekiere (runflow) (@ricardoghekiere).</description>
    <link>https://dev.to/ricardoghekiere</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875571%2F789bb090-d1fb-403a-9a9a-eec0f7eb33dd.jpeg</url>
      <title>DEV Community: Ricardo Ghekiere (runflow)</title>
      <link>https://dev.to/ricardoghekiere</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ricardoghekiere"/>
    <language>en</language>
    <item>
      <title>I Generated 35 Million AI Images. The Model Was Never the Product.</title>
      <dc:creator>Ricardo Ghekiere (runflow)</dc:creator>
      <pubDate>Sun, 12 Apr 2026 22:23:01 +0000</pubDate>
      <link>https://dev.to/ricardoghekiere/i-generated-35-million-ai-images-the-model-was-never-the-product-2n9j</link>
      <guid>https://dev.to/ricardoghekiere/i-generated-35-million-ai-images-the-model-was-never-the-product-2n9j</guid>
      <description>&lt;p&gt;Most teams building with AI image generation APIs obsess over which model to use. FLUX or Stable Diffusion? Which checkpoint? Which LoRA?&lt;/p&gt;

&lt;p&gt;I ran an AI headshot company that generated over 35 million images in two years. Crossed $2.2M in revenue. Hit 87% gross margins. And the model we used was open source. Free.&lt;/p&gt;

&lt;p&gt;The model was never what made it work. The workflow around the model was.&lt;/p&gt;

&lt;p&gt;Here's what I learned building AI image pipelines at scale, and why most teams get the architecture completely wrong.&lt;/p&gt;

&lt;h2&gt;The "generate and pray" problem&lt;/h2&gt;

&lt;p&gt;Here's how most teams ship AI-generated images today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends a request&lt;/li&gt;
&lt;li&gt;Call an image generation API&lt;/li&gt;
&lt;li&gt;Return whatever comes back&lt;/li&gt;
&lt;li&gt;Hope it's good&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works fine at 10 images a day. It breaks completely at 10,000.&lt;/p&gt;

&lt;p&gt;At scale, defects become statistical certainties. Face distortions. Wrong backgrounds. Artifacts that look fine at thumbnail size and horrific at full resolution. Skin tone inconsistencies. Missing fingers (the classic).&lt;/p&gt;

&lt;p&gt;When you generate 100 images, you might get lucky. When you generate 100,000, you will ship garbage. Guaranteed. The only question is how much.&lt;/p&gt;
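&lt;p&gt;To make "statistical certainty" concrete, here's the back-of-envelope math, assuming a hypothetical 2% per-image defect rate (an illustrative number, not a measured one):&lt;/p&gt;

```python
# Back-of-envelope defect math. The 2% per-image defect rate is an
# assumed illustrative number, not a measured one.
defect_rate = 0.02

def p_any_defect(n):
    # Probability that a batch of n images contains at least one defect
    return 1 - (1 - defect_rate) ** n

print(f"10 images:  {p_any_defect(10):.0%} chance of at least one defect")   # 18%
print(f"100 images: {p_any_defect(100):.0%}")                                # 87%
print(f"100,000 images: ~{int(100_000 * defect_rate):,} defects expected")   # ~2,000
```

&lt;p&gt;Even a low per-image defect rate compounds into a near-certainty per batch. That's why the QA layer can't be optional.&lt;/p&gt;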

&lt;p&gt;We learned this the hard way. Our first month running AI headshots, we generated a batch of images for a customer and delivered them without any automated QA. The customer's feedback: "Why does my colleague have three ears?"&lt;/p&gt;

&lt;p&gt;That was the last time we shipped without scoring.&lt;/p&gt;

&lt;h2&gt;The assembly line, not the craftsman&lt;/h2&gt;

&lt;p&gt;In 1913, a skilled craftsman took 12 hours to build a single car chassis. Henry Ford didn't hire a better craftsman. He built the assembly line. Specialized stations. Quality inspection at every step. Rework loops when something failed. Result: 93 minutes per chassis. 8x faster. 69% cheaper.&lt;/p&gt;

&lt;p&gt;Most AI image teams today are still in the craftsman era. One model call. One output. Ship it.&lt;/p&gt;

&lt;p&gt;What we built instead was an assembly line for AI images. Three distinct layers, each solving a different problem.&lt;/p&gt;

&lt;h2&gt;Layer 1: Generate more than you need&lt;/h2&gt;

&lt;p&gt;This sounds wasteful. It's the opposite.&lt;/p&gt;

&lt;p&gt;For every customer request, we didn't generate 1 image. We generated 240 candidates. Only the best 60 made it to the customer. The other 180 went straight to the trash.&lt;/p&gt;

&lt;p&gt;The math works because GPU time is cheap compared to a bad customer experience. At our volumes, generating 4x more candidates added roughly $0.02 per delivered image. A single refund from a bad image costs 100x that.&lt;/p&gt;

&lt;p&gt;The key insight: treat image generation like a funnel, not a function call. You're not calling an API. You're running a selection process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified version of our generation loop
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_candidates&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;random_seed&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;select_cheapest_available_provider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Score all candidates
&lt;/span&gt;&lt;span class="n"&gt;scored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;quality_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Deliver only what passes threshold
&lt;/span&gt;&lt;span class="n"&gt;delivered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scored&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
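&lt;p&gt;One practical note: generating hundreds of candidates one at a time would be painfully slow. The calls are I/O-bound, so a plain thread pool is enough to fan them out. A minimal sketch, with &lt;code&gt;generate_image&lt;/code&gt; stubbed in place of the real provider call:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def generate_image(prompt, seed):
    # Stub standing in for the real provider call (network I/O in practice).
    return {"prompt": prompt, "seed": seed}

def generate_candidates(prompt, num_candidates=240, max_workers=32):
    # The provider calls are I/O-bound, so a thread pool fans them out
    # without any async machinery. Results come back in submission order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(generate_image, prompt, seed)
                   for seed in range(num_candidates)]
        return [f.result() for f in futures]

candidates = generate_candidates("studio headshot, soft lighting", num_candidates=8)
print(len(candidates))  # 8
```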



&lt;h2&gt;Layer 2: Score everything before it ships&lt;/h2&gt;

&lt;p&gt;This is where most teams have a blind spot. They generate images but have no automated way to evaluate whether the output is actually good.&lt;/p&gt;

&lt;p&gt;We built a three-tier scoring system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Generic quality.&lt;/strong&gt; Does the image have artifacts? Is it sharp? Does it match the prompt? These checks apply to every single image regardless of use case. Think of it as a basic sanity check.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: Use-case specific.&lt;/strong&gt; For headshots, this meant: face fidelity, expression naturalness, skin tone consistency, lighting quality, background coherence. A perfectly sharp image with a distorted face is still unusable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3: Custom rules.&lt;/strong&gt; Business-specific criteria. "No visible branding in the background." "Skin tone must be within 2 stops of reference." "Eyes must be open." Whatever the client cares about.&lt;/p&gt;

&lt;p&gt;Each dimension gets scored independently. The final decision isn't a single number. It's a pass/fail across all dimensions, with configurable thresholds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;quality_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# Tier 1: Generic
&lt;/span&gt;    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;artifacts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_artifacts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sharpness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;measure_sharpness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt_alignment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clip_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Tier 2: Use-case specific
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;use_case&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;headshot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;face_fidelity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;score_face&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;expression&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;score_expression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;skin_tone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;score_skin_consistency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Tier 3: Custom rules
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Pass/fail per dimension
&lt;/span&gt;    &lt;span class="n"&gt;passed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ScoredImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
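&lt;p&gt;For readers wiring this up: &lt;code&gt;ScoredImage&lt;/code&gt; and the config are ordinary value objects. Here's a runnable sketch with stubbed scorers. The field names and the &lt;code&gt;ScoringConfig&lt;/code&gt; shape are assumptions for illustration, not our exact internals:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class ScoredImage:
    # Mirrors the quality_score return value above; exact shape is an assumption.
    image: object
    scores: dict
    passed: bool

@dataclass
class ScoringConfig:
    use_case: str
    thresholds: dict
    custom_rules: list = field(default_factory=list)

def quality_score(image, config, scorers):
    # `scorers` maps a dimension name to a scoring function (stubbed below).
    scores = {dim: fn(image) for dim, fn in scorers.items()}
    passed = all(scores[dim] >= config.thresholds[dim] for dim in scores)
    return ScoredImage(image=image, scores=scores, passed=passed)

config = ScoringConfig(use_case="headshot",
                       thresholds={"sharpness": 0.7, "face_fidelity": 0.8})
stub_scorers = {
    "sharpness": lambda img: 0.9,       # passes the 0.7 bar
    "face_fidelity": lambda img: 0.6,   # fails the 0.8 bar
}

result = quality_score("img_001", config, stub_scorers)
print(result.passed)  # False: one dimension below threshold sinks the image
```

&lt;p&gt;The point of the pass/fail-per-dimension design: a single failing dimension rejects the image, no matter how good the composite looks.&lt;/p&gt;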



&lt;p&gt;The result: we eliminated manual QA entirely. No human ever looked at the rejected images. The scoring layer caught everything.&lt;/p&gt;

&lt;p&gt;When we didn't have this (early days), our customer support tickets were 40% image quality complaints. After implementing automated scoring, they dropped to under 3%.&lt;/p&gt;

&lt;h2&gt;Layer 3: Route to the cheapest GPU that can do the job&lt;/h2&gt;

&lt;p&gt;This is the one nobody talks about.&lt;/p&gt;

&lt;p&gt;When you're calling AI image generation APIs at scale, you're probably using one provider. Maybe &lt;a href="http://fal.ai" rel="noopener noreferrer"&gt;fal.ai&lt;/a&gt;, maybe Replicate, maybe &lt;a href="http://Together.ai" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;. You picked one, integrated it, and moved on.&lt;/p&gt;

&lt;p&gt;That's leaving money on the table.&lt;/p&gt;

&lt;p&gt;We built a routing layer that checked multiple providers on every single request and sent the job to the cheapest one that was currently available and fast enough.&lt;/p&gt;

&lt;p&gt;Why this matters: provider pricing varies wildly. Not just between providers, but within the same provider over time. Spot pricing changes. Capacity fluctuates. Cold start times spike during peak hours.&lt;/p&gt;

&lt;p&gt;Some real numbers from our routing data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider scenario&lt;/th&gt;
&lt;th&gt;Cost per image (1 megapixel)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single provider, no routing&lt;/td&gt;
&lt;td&gt;$0.035&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cheapest provider at any given moment&lt;/td&gt;
&lt;td&gt;$0.012&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;With fallback on timeout/error&lt;/td&gt;
&lt;td&gt;$0.014&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a 60-65% cost reduction just from routing. At 100K+ images per month, this is the difference between a viable business and burning cash.&lt;/p&gt;
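&lt;p&gt;The arithmetic behind that claim, using the table's numbers:&lt;/p&gt;

```python
# Per-image rates from the routing table above, at 100K images/month
single_provider = 0.035
routed_with_fallback = 0.014
images_per_month = 100_000

savings = images_per_month * (single_provider - routed_with_fallback)
reduction = 1 - routed_with_fallback / single_provider
print(f"${savings:,.0f}/month saved, {reduction:.0%} cheaper")  # $2,100/month saved, 60% cheaper
```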

&lt;p&gt;The routing decision is simple in concept:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;select_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_healthy_providers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Filter by capability
&lt;/span&gt;    &lt;span class="n"&gt;capable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;supports&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# Sort by current effective cost
&lt;/span&gt;    &lt;span class="n"&gt;capable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current_cost_per_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Return cheapest, with fallback chain
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;capable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;capable&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;fallback_provider&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, there's more to it. You need health checking (is this provider actually responding right now?), timeout handling (if it takes too long, abort and retry on a different provider), and cost tracking (did the actual cost match what we expected?).&lt;/p&gt;

&lt;p&gt;But the basic pattern is dead simple: check what's available, pick the cheapest, have a fallback.&lt;/p&gt;
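&lt;p&gt;The fallback part deserves its own sketch: try providers cheapest-first, and on a timeout or error fall through to the next one in the chain. The provider objects here are stubs standing in for real API clients:&lt;/p&gt;

```python
class StubProvider:
    # Stand-in for a real provider client; a real one would wrap an HTTP API.
    def __init__(self, name, fails=False):
        self.name, self.fails = name, fails

    def generate(self, job, timeout):
        if self.fails:
            raise TimeoutError(f"{self.name} timed out")
        return f"{self.name}:{job}"

def generate_with_fallback(job, providers, timeout_s=30):
    # Try providers cheapest-first; on timeout or error, fall through
    # to the next one. Raise only if the whole chain is exhausted.
    errors = {}
    for provider in providers:
        try:
            return provider.generate(job, timeout=timeout_s)
        except (TimeoutError, RuntimeError) as exc:
            errors[provider.name] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")

providers = [StubProvider("cheap-spot", fails=True), StubProvider("backup")]
print(generate_with_fallback("job-42", providers))  # backup:job-42
```

&lt;p&gt;Note the error map: when the whole chain fails, you want to know &lt;em&gt;why each provider&lt;/em&gt; failed, not just that the request did.&lt;/p&gt;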

&lt;h2&gt;The numbers that convinced me&lt;/h2&gt;

&lt;p&gt;Before the routing and scoring layers, our unit economics looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COGS: ~40% of revenue&lt;/li&gt;
&lt;li&gt;Customer complaints about quality: ~40% of support tickets&lt;/li&gt;
&lt;li&gt;Manual QA required: yes, for every batch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COGS: 11% of revenue&lt;/li&gt;
&lt;li&gt;Quality complaints: under 3% of tickets&lt;/li&gt;
&lt;li&gt;Manual QA: zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gross margins went from roughly 60% to 87%. On the same models. Same images. Same customers. The only thing that changed was the workflow around the model.&lt;/p&gt;

&lt;h2&gt;Why this pattern works for any AI image use case&lt;/h2&gt;

&lt;p&gt;We started with headshots. But the pattern applies everywhere.&lt;/p&gt;

&lt;p&gt;Background removal? Same thing. Commercial APIs charge $0.02 to $0.20 per image. Self-hosted open source models can do it for $0.0004. But only if you have the routing and quality layers to handle provider failures, cold starts, and the occasional garbage output.&lt;/p&gt;

&lt;p&gt;Product photography? Virtual try-on? Ad creative generation? The specific models change. The scoring dimensions change. But the architecture stays the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate more candidates than you need&lt;/li&gt;
&lt;li&gt;Score every candidate automatically&lt;/li&gt;
&lt;li&gt;Route to the cheapest capable provider&lt;/li&gt;
&lt;li&gt;Only deliver what passes your quality bar&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's not complicated. It's just a pattern most teams haven't adopted yet because they're still in the "call one API and hope" phase.&lt;/p&gt;

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;p&gt;If I were starting a new AI image product today, I'd build the scoring layer before I built the product. Not after. Not when quality becomes a problem. Before.&lt;/p&gt;

&lt;p&gt;Here's why: the scoring layer changes what's possible. When you can automatically evaluate quality, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use cheaper models and compensate with volume&lt;/li&gt;
&lt;li&gt;Switch providers without regression testing every image by hand&lt;/li&gt;
&lt;li&gt;Set up automated retry loops (generate, score, regenerate if failed)&lt;/li&gt;
&lt;li&gt;Give customers quality guarantees instead of quality hopes&lt;/li&gt;
&lt;/ul&gt;
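&lt;p&gt;The retry loop in particular becomes trivial once scoring is automated. A minimal sketch, with deterministic stand-ins injected for &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt;:&lt;/p&gt;

```python
def generate_until_passing(prompt, needed, generate, score, max_attempts=5):
    # Generate a batch, keep what passes scoring, regenerate the
    # shortfall. Overgenerates 2x per attempt, mirroring the funnel idea.
    delivered, attempts = [], 0
    while needed > len(delivered) and max_attempts > attempts:
        attempts += 1
        shortfall = needed - len(delivered)
        batch = [generate(prompt) for _ in range(shortfall * 2)]
        delivered += [img for img in batch if score(img)][:shortfall]
    return delivered

# Deterministic stand-ins: "images" are ints, even ones "pass" scoring.
counter = iter(range(1000))
gen = lambda p: next(counter)
passes = lambda img: img % 2 == 0

batch = generate_until_passing("headshot", needed=4, generate=gen, score=passes)
print(batch)  # [0, 2, 4, 6]
```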

&lt;p&gt;The model is a commodity. There are hundreds of them. New ones every week. The workflow is the moat.&lt;/p&gt;

&lt;h2&gt;What we're building now&lt;/h2&gt;

&lt;p&gt;We took everything we learned from generating 35 million images and turned it into &lt;a href="https://www.runflow.io" rel="noopener noreferrer"&gt;Runflow&lt;/a&gt;. It's the infrastructure layer we wish existed when we started: automated quality evaluation, multi-provider routing, one-click deployment for ComfyUI workflows. The things that took us two years to build from scratch.&lt;/p&gt;

&lt;p&gt;If you're running AI image generation at any kind of scale and want to compare notes, I'm always up for a conversation. Find me on &lt;a href="https://www.linkedin.com/in/ricardoghekiere/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop a comment.&lt;/p&gt;

&lt;p&gt;The model is never the product. The workflow is the product.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ricardo Ghekiere, CEO at &lt;a href="https://www.runflow.io" rel="noopener noreferrer"&gt;Runflow&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nanobanana</category>
      <category>infrastructure</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
