<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: lsm166</title>
    <description>The latest articles on DEV Community by lsm166 (@166_73a70b7af425e036b1).</description>
    <link>https://dev.to/166_73a70b7af425e036b1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3910331%2Fa72171af-e9ac-44fa-9213-68a0ea81f5b4.png</url>
      <title>DEV Community: lsm166</title>
      <link>https://dev.to/166_73a70b7af425e036b1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/166_73a70b7af425e036b1"/>
    <language>en</language>
    <item>
      <title>I Built a Multi-Model AI Image &amp; Video Platform — Here's What I Learned</title>
      <dc:creator>lsm166</dc:creator>
      <pubDate>Sun, 03 May 2026 13:01:49 +0000</pubDate>
      <link>https://dev.to/166_73a70b7af425e036b1/i-built-a-multi-model-ai-image-video-platform-heres-what-i-learned-5d0j</link>
      <guid>https://dev.to/166_73a70b7af425e036b1/i-built-a-multi-model-ai-image-video-platform-heres-what-i-learned-5d0j</guid>
      <description>&lt;p&gt;A few months ago I started building &lt;a href="https://bananai.io/" rel="noopener noreferrer"&gt;Bananai&lt;/a&gt; — a platform that&lt;br&gt;
lets you generate images, edit photos, and create AI videos all in one place without&lt;br&gt;
switching between five different tools and five different billing dashboards.&lt;/p&gt;

&lt;p&gt;Here's what I actually learned integrating 10+ models end-to-end as an indie dev.&lt;/p&gt;


&lt;h2&gt;Why I built it&lt;/h2&gt;

&lt;p&gt;My problem was embarrassingly mundane: I needed to test GPT Image 2 against Nano Banana&lt;br&gt;
Pro for an e-commerce client's product shots. I had &lt;strong&gt;four browser tabs open&lt;/strong&gt;, four&lt;br&gt;
different logins, four different credit top-ups, and I was copy-pasting the same prompt&lt;br&gt;
over and over.&lt;/p&gt;

&lt;p&gt;The obvious solution was to wrap them in a unified UI. What I thought would take a&lt;br&gt;
weekend ended up being a real product — because model integration is the easy part.&lt;br&gt;
Everything else is the hard part.&lt;/p&gt;


&lt;h2&gt;1. Picking the right model for the task is not obvious&lt;/h2&gt;

&lt;p&gt;The first mistake I made was assuming the "best" model produces the best output for&lt;br&gt;
every use case. It doesn't. Here's how I actually think about it now:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model choice&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fast social content iteration&lt;/td&gt;
&lt;td&gt;Nano Banana 2&lt;/td&gt;
&lt;td&gt;2–5s per image, good enough quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client deliverables / 4K output&lt;/td&gt;
&lt;td&gt;Nano Banana Pro&lt;/td&gt;
&lt;td&gt;Max quality, character consistency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text in images (logos, UI mocks)&lt;/td&gt;
&lt;td&gt;GPT Image 2&lt;/td&gt;
&lt;td&gt;Best text rendering accuracy by far&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Animate a still product photo&lt;/td&gt;
&lt;td&gt;Seedance / Veo&lt;/td&gt;
&lt;td&gt;Different motion styles, test both&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Background removal + style transfer&lt;/td&gt;
&lt;td&gt;Nano Banana (editing mode)&lt;/td&gt;
&lt;td&gt;Natural language edit instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The practical takeaway: &lt;strong&gt;expose model selection to the user early.&lt;/strong&gt; Users figure out&lt;br&gt;
their own preferences faster than any default you set. I wasted a month pre-selecting&lt;br&gt;
defaults before realizing users just want the picker.&lt;/p&gt;
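
&lt;p&gt;For what it's worth, here's a rough sketch of how that task-to-default mapping can&lt;br&gt;
look. The task keys and model IDs below are illustrative, not Bananai's actual config;&lt;br&gt;
the point is that the user's explicit pick always wins over the default.&lt;/p&gt;

&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative only: task keys and model IDs are made up for this post.
const defaultModelForTask = {
  "social-iteration":   "nano-banana-2",    // fast, cheap, good enough
  "client-deliverable": "nano-banana-pro",  // max quality, 4K, consistent characters
  "text-in-image":      "gpt-image-2",      // best text rendering
  "animate-still":      "seedance",         // or "veo", worth testing both
  "edit-background":    "nano-banana-edit"  // natural-language edits
};

// The user-facing picker always overrides the default.
function resolveModel(task, userChoice) {
  return userChoice || defaultModelForTask[task] || "nano-banana-2";
}
&lt;/code&gt;&lt;/pre&gt;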


&lt;h2&gt;2. The cost structure is wilder than you'd expect&lt;/h2&gt;

&lt;p&gt;Before building this I assumed model pricing was roughly proportional to quality. It's not.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT Image 2&lt;/strong&gt; — ~$0.006/image at standard quality. Shockingly cheap for what it outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nano Banana 2&lt;/strong&gt; — fast and economical, great for high-volume generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video models&lt;/strong&gt; — anywhere from 10x to 50x the cost of an image per second of output.
Budget for this separately. Video is a different product category economically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thing that trips up most builders: &lt;strong&gt;you need to model your credit burn rate for&lt;br&gt;
different user behavior patterns, not averages.&lt;/strong&gt; A power user generating 4K video clips&lt;br&gt;
will burn through credits 30x faster than someone doing quick image edits. If you price&lt;br&gt;
on averages, power users kill your margin.&lt;/p&gt;

&lt;p&gt;My approach: track generation cost per user session, flag sessions above the 95th&lt;br&gt;
percentile, and cap or up-sell those users. Works better than blanket rate limits.&lt;/p&gt;
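
&lt;p&gt;A minimal sketch of what that tracking can look like, assuming you log a cost per&lt;br&gt;
generation somewhere (the helpers below are hypothetical, not an existing library):&lt;/p&gt;

&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough sketch, not production code: accumulate spend per session,
// then flag anything at or above the 95th percentile for a cap or an up-sell prompt.
const sessionCosts = new Map(); // sessionId to cumulative cost in USD

function recordGeneration(sessionId, costUsd) {
  const current = sessionCosts.get(sessionId) || 0;
  sessionCosts.set(sessionId, current + costUsd);
}

function heavySessions() {
  const costs = [...sessionCosts.values()].sort(function (a, b) { return a - b; });
  const p95 = costs[Math.floor(costs.length * 0.95)] || Infinity;
  const flagged = [];
  for (const [sessionId, cost] of sessionCosts) {
    if (cost &amp;gt;= p95) flagged.push(sessionId);
  }
  return flagged;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In practice this runs over persisted usage events rather than an in-memory map, but&lt;br&gt;
the shape of the check is the same.&lt;/p&gt;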


&lt;h2&gt;3. Async generation UX is harder than sync&lt;/h2&gt;

&lt;p&gt;Most image models are fast enough to feel synchronous (~3–5 seconds). But video generation&lt;br&gt;
can take 20–60 seconds depending on model and duration. That's a different UX contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What didn't work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spinner with no feedback → users thought it crashed and refreshed&lt;/li&gt;
&lt;li&gt;Showing "estimated time" → estimates were wrong enough to frustrate people more than no
estimate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What actually worked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Progress bar that moves in two phases: "queued → processing" with distinct visual states&lt;/li&gt;
&lt;li&gt;Showing a low-res preview/thumbnail as soon as it's available while full resolution
renders&lt;/li&gt;
&lt;li&gt;Email/notification when a long video finishes (reduces page-staring behavior)&lt;/li&gt;
&lt;/ul&gt;
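
&lt;p&gt;The two-phase progress bar boils down to polling a job status and treating "queued"&lt;br&gt;
and "processing" as distinct states. A minimal sketch; the endpoint path and response&lt;br&gt;
shape here are assumptions for illustration, not Bananai's actual API:&lt;/p&gt;

&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Assumes a hypothetical GET /api/jobs/:id that returns
// { status: "queued" | "processing" | "done" | "failed", previewUrl, resultUrl }.
async function pollVideoJob(jobId, onUpdate) {
  while (true) {
    const res = await fetch("/api/jobs/" + jobId);
    const job = await res.json();

    // Two distinct visual states; previewUrl may carry a low-res thumbnail early.
    onUpdate(job.status, job.previewUrl);

    if (job.status === "done" || job.status === "failed") return job;
    await new Promise(function (resolve) { setTimeout(resolve, 2000); });
  }
}
&lt;/code&gt;&lt;/pre&gt;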

&lt;p&gt;For image generation, the UX expectation is basically instant. Anything over 8 seconds&lt;br&gt;
feels broken to users, even if technically the model is just slow. Cache aggressively,&lt;br&gt;
pre-warm where possible.&lt;/p&gt;


&lt;h2&gt;4. Prompt UX is underrated&lt;/h2&gt;

&lt;p&gt;Most AI image tools show you a blank text box. That's terrible for conversion and&lt;br&gt;
terrible for retention — new users don't know what to type and leave.&lt;/p&gt;

&lt;p&gt;What I added instead:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promptSuggestions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;🎨 Product Hero Shot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;🖼️ Remove Background&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;🎬 Cinematic Scene&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;✨ Style Transfer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't just UI sugar. Each one loads a &lt;strong&gt;pre-filled prompt template with the&lt;br&gt;
right model pre-selected and the right settings pre-configured.&lt;/strong&gt; Click "Product Hero&lt;br&gt;
Shot" and you get an image-to-image flow with the right aspect ratio for e-commerce, not&lt;br&gt;
a blank canvas.&lt;/p&gt;
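
&lt;p&gt;The templates themselves are nothing fancy. A hypothetical shape (field names and&lt;br&gt;
prompts here are illustrative, not the exact ones in production):&lt;/p&gt;

&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative template shape; clicking a suggestion pre-fills the whole form.
const templates = {
  "🎨 Product Hero Shot": {
    model: "nano-banana-pro",
    mode: "image-to-image",
    aspectRatio: "1:1", // square fits most e-commerce listings
    prompt: "Studio product shot of {subject} on a seamless background, soft shadows"
  },
  "🖼️ Remove Background": {
    model: "nano-banana-edit",
    mode: "edit",
    prompt: "Remove the background and keep only the main subject"
  }
};

function applySuggestion(label) {
  return templates[label]; // the form starts filled in instead of blank
}
&lt;/code&gt;&lt;/pre&gt;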

&lt;p&gt;Conversion from landing → first generation went up significantly after this. Users who&lt;br&gt;
generate at least one image on their first visit retain at 3x the rate of those who&lt;br&gt;
don't.&lt;/p&gt;




&lt;h2&gt;5. Multi-model output comparison is the killer feature nobody talks about&lt;/h2&gt;

&lt;p&gt;The most popular session pattern I see in analytics: user generates with one model, then&lt;br&gt;
immediately tries the same prompt with another model to compare. This is the workflow&lt;br&gt;
professional designers actually use — they're not loyal to a model, they want to see options.&lt;/p&gt;

&lt;p&gt;Building side-by-side comparison mode is on the roadmap. If you're building a similar&lt;br&gt;
tool: &lt;strong&gt;this is worth prioritizing early.&lt;/strong&gt; It's also the stickiest feature because it&lt;br&gt;
locks users into your platform (they can't compare across separate tools with one click).&lt;/p&gt;
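
&lt;p&gt;The fan-out itself is simple. Here's roughly how I expect the comparison flow to work;&lt;br&gt;
&lt;code&gt;generateImage()&lt;/code&gt; is a stand-in for a per-model client wrapper, not an existing&lt;br&gt;
Bananai API:&lt;/p&gt;

&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Run the same prompt against several models in parallel and keep failures visible,
// so users can see which model choked on a given prompt.
async function compareModels(prompt, models) {
  const results = await Promise.allSettled(
    models.map(function (model) { return generateImage({ model: model, prompt: prompt }); })
  );
  return results.map(function (result, i) {
    return { model: models[i], ...result }; // { model, status, value or reason }
  });
}

// compareModels("a ceramic mug on a marble counter", ["nano-banana-2", "gpt-image-2"]);
&lt;/code&gt;&lt;/pre&gt;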




&lt;h2&gt;What's live now&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://bananai.io/" rel="noopener noreferrer"&gt;Bananai&lt;/a&gt; currently has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image generation&lt;/strong&gt;: Nano Banana 2 &amp;amp; Pro, GPT Image 2, Grok Imagine, Midjourney,
Seedream, Wan 2.7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image editing&lt;/strong&gt;: background removal, style transfer, inpainting, upscaling — all
via natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video generation&lt;/strong&gt;: Veo 3.1, Seedance 2.0, Wan 2.7 Video, Grok Imagine Video&lt;/li&gt;
&lt;li&gt;Free credits on sign-up, daily check-in credits, no credit card required to start&lt;/li&gt;
&lt;li&gt;GPT Image 2 at &lt;strong&gt;$0.006/image&lt;/strong&gt; — cheapest I've found anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building something with AI image generation or just want to test models&lt;br&gt;
without juggling multiple accounts, &lt;a href="https://bananai.io/" rel="noopener noreferrer"&gt;give it a try&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with one model, nail the UX, then add models.&lt;/strong&gt; I added too many too fast and
spread the QA effort too thin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instrument cost tracking from day one.&lt;/strong&gt; I retrofitted it and lost two weeks of data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't underestimate video.&lt;/strong&gt; It looks like "image but animated" but it's actually
a completely different infrastructure, moderation, and pricing problem.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Happy to answer questions about any of this — model integration, credit system design,&lt;br&gt;
or the UX decisions. Drop them in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Next.js, deployed on Vercel, with Cloudflare for CDN/image resizing.&lt;br&gt;
The backend is a monolith I'm slowly becoming ashamed of.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
