In e-commerce, the difference between a sale and a bounce is often the lighting. But professional shoots are expensive. At Katalyst AI, we wanted to bridge that gap by turning raw smartphone photos into marketplace-ready 4K assets in under 60 seconds.
Here’s the technical breakdown of how we built the pipeline using Next.js and Gemini.
- The Challenge: Beyond Background Removal
Most tools just remove backgrounds. We needed to handle:
Scene Consistency: Ensuring the product lighting matches the generated background.
Resolution: Upscaling mobile shots to 4K without losing texture.
SEO Automation: Generating marketplace-specific metadata simultaneously.
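To make the SEO requirement more concrete, here is a minimal sketch of generating per-marketplace metadata from one set of product attributes. The marketplace list, the title limits, and the `buildListingMetadata` and `fitTitle` helpers are hypothetical illustrations, not Katalyst AI's actual rules; the limits are assumed values to verify against each marketplace's current listing guidelines.

```typescript
// Hypothetical marketplaces and ASSUMED title limits (verify before relying on them).
type Marketplace = "ebay" | "etsy";

const TITLE_LIMITS: Record<Marketplace, number> = {
  ebay: 80,
  etsy: 140,
};

interface ProductAttributes {
  productType: string; // e.g. "tote bag"
  material: string;    // e.g. "matte leather"
  color: string;       // e.g. "black"
}

interface ListingMetadata {
  marketplace: Marketplace;
  title: string;
  tags: string[];
}

// Add whole words until the next one would exceed the limit,
// so titles are never cut mid-word.
function fitTitle(words: string[], limit: number): string {
  let title = "";
  for (const word of words) {
    const candidate = title ? `${title} ${word}` : word;
    if (candidate.length > limit) break;
    title = candidate;
  }
  return title;
}

// Build one metadata record per marketplace from the same attributes.
function buildListingMetadata(
  attrs: ProductAttributes,
  marketplaces: Marketplace[],
): ListingMetadata[] {
  const words = `${attrs.color} ${attrs.material} ${attrs.productType}`.split(/\s+/);
  const tags = [attrs.productType, attrs.material, attrs.color].map((t) => t.toLowerCase());
  return marketplaces.map((marketplace) => ({
    marketplace,
    title: fitTitle(words, TITLE_LIMITS[marketplace]),
    tags,
  }));
}
```

In a real pipeline the attributes would come from the vision analysis rather than being typed in by hand, and a model would write richer titles; the point of the sketch is that marketplace rules are applied deterministically after generation.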
- The Tech Stack
We leaned into a modern, performance-first stack:
Framework: Next.js 14 (App Router) for high-performance SSR.
AI Engine: Gemini 3 Flash for high-speed image-to-image prompting and vision analysis.
Styling: Tailwind CSS for a clean, minimalist UI that stays out of the user's way.
Database: PostgreSQL (via Prisma) for managing user galleries and asset states.
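Tracking asset states in PostgreSQL works best when the allowed state changes are explicit. A minimal sketch, assuming a simple linear lifecycle: the state names, the transition table, and the `transition` helper are hypothetical, since the post does not describe the actual schema.

```typescript
// Hypothetical lifecycle of a generated asset.
type AssetState = "uploaded" | "analyzing" | "generating" | "ready" | "failed";

// Allowed transitions; anything not listed is rejected.
const TRANSITIONS: Record<AssetState, AssetState[]> = {
  uploaded: ["analyzing", "failed"],
  analyzing: ["generating", "failed"],
  generating: ["ready", "failed"],
  ready: [],
  failed: ["uploaded"], // a retry re-queues the original upload
};

function canTransition(from: AssetState, to: AssetState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new state or throws, so callers never persist an invalid state.
function transition(from: AssetState, to: AssetState): AssetState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid asset state transition: ${from} -> ${to}`);
  }
  return to;
}
```

The checked value would then be written to the asset row through Prisma, keeping the lifecycle rules in one place instead of scattered across route handlers.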
- Implementing the Gemini Pipeline
Integrating Gemini wasn't just about a single API call; it was about a multi-stage workflow.
Stage A: Vision Analysis
We use Gemini’s vision capabilities to "see" the product. It identifies the material (e.g., "matte leather") and the original lighting source to ensure the generated environment feels real.
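Stage A only helps if its output is structured enough to feed into Stage B. A minimal sketch, assuming the vision prompt asks Gemini to reply with JSON containing the product type, material, and lighting; the field names, the `VisionAnalysis` type, and `parseVisionAnalysis` are hypothetical, and the model call itself is omitted so the example runs without an API key. Requesting JSON output (the Gemini API supports a `responseMimeType` of `application/json`) reduces, but does not remove, the need to validate.

```typescript
// Hypothetical shape requested from the vision step.
interface VisionAnalysis {
  productType: string;       // e.g. "handbag"
  material: string;          // e.g. "matte leather"
  lightingDirection: string; // e.g. "soft window light from the left"
}

const REQUIRED_FIELDS: (keyof VisionAnalysis)[] = ["productType", "material", "lightingDirection"];

// Models sometimes wrap JSON in a Markdown code fence; strip it before parsing.
function stripCodeFence(text: string): string {
  return text.replace(/^\s*`{3}(?:json)?\s*/i, "").replace(/\s*`{3}\s*$/, "");
}

// Parse and validate the raw model reply; throw rather than guess.
function parseVisionAnalysis(raw: string): VisionAnalysis {
  const data: unknown = JSON.parse(stripCodeFence(raw));
  if (typeof data !== "object" || data === null) {
    throw new Error("Vision analysis is not a JSON object");
  }
  const record = data as Record<string, unknown>;
  for (const field of REQUIRED_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`Vision analysis is missing "${field}"`);
    }
  }
  return {
    productType: String(record.productType).trim(),
    material: String(record.material).trim(),
    lightingDirection: String(record.lightingDirection).trim(),
  };
}
```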
Stage B: The Image-to-Image Prompt
Instead of generic prompts, we programmatically wrap user inputs:
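```typescript
// Example value; in the app this comes from the user's theme selection.
const userSelectedTheme = "minimalist studio";

// Wrap the user's choice in a fixed, photographer-style template.
const systemPrompt = `Act as a professional product photographer.
Place the detected object in a ${userSelectedTheme} setting.
Maintain sharp focus on the product, 4k resolution, cinematic lighting.`;
```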
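Combining Stage A and Stage B, the wrapping step could fold the detected material and lighting into the template and only accept themes from a fixed list, a simple guardrail in the spirit of the base-template approach described under Lessons Learned. This is a sketch under assumptions: the theme list, `buildSystemPrompt`, and the `DetectedProduct` shape are hypothetical.

```typescript
// Hypothetical allow-list: users pick from presets instead of typing free text.
const ALLOWED_THEMES = ["minimalist studio", "marble countertop", "sunlit wooden table"] as const;
type Theme = (typeof ALLOWED_THEMES)[number];

function isAllowedTheme(value: string): value is Theme {
  return (ALLOWED_THEMES as readonly string[]).includes(value);
}

// Stage A output reduced to the fields this template uses.
interface DetectedProduct {
  material: string;
  lightingDirection: string;
}

function buildSystemPrompt(userSelectedTheme: string, product: DetectedProduct): string {
  if (!isAllowedTheme(userSelectedTheme)) {
    throw new Error(`Unsupported theme: ${userSelectedTheme}`);
  }
  // Keep the template short; piling on instructions tended to cause artifacts.
  return [
    "Act as a professional product photographer.",
    `Place the detected ${product.material} object in a ${userSelectedTheme} setting.`,
    `Match the original lighting: ${product.lightingDirection}.`,
    "Maintain sharp focus on the product, 4k resolution, cinematic lighting.",
  ].join("\n");
}
```

Restricting the theme to presets also means free-form user text never reaches the prompt, which keeps the template predictable.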
- Scaling Performance with Next.js
To maintain a 95%+ success rate, we utilized:
Edge Functions: The initial image processing runs close to the user, reducing latency.
Optimistic UI Updates: Users see the "Processing" state immediately, with real-time previews as the Gemini model iterates on the upscale.
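The optimistic-update pattern can be sketched as a small reducer: the client switches to a processing state the moment the user submits, swaps in previews as they arrive, and rolls back if the request fails. The state shape and action names are hypothetical; in a React client, a reducer like this could be passed to `useReducer`.

```typescript
type JobView =
  | { status: "idle" }
  | { status: "processing"; jobId: string; previewUrl?: string }
  | { status: "done"; jobId: string; resultUrl: string }
  | { status: "error"; message: string };

type JobAction =
  | { type: "submit"; jobId: string }               // dispatched before the server responds
  | { type: "preview"; jobId: string; url: string } // intermediate preview from the pipeline
  | { type: "complete"; jobId: string; url: string }
  | { type: "fail"; message: string };

function jobReducer(state: JobView, action: JobAction): JobView {
  switch (action.type) {
    case "submit":
      // Optimistic: show "Processing" immediately, without waiting for the server.
      return { status: "processing", jobId: action.jobId };
    case "preview":
      // Ignore previews for stale jobs (e.g. the user resubmitted).
      if (state.status !== "processing" || state.jobId !== action.jobId) return state;
      return { ...state, previewUrl: action.url };
    case "complete":
      if (state.status !== "processing" || state.jobId !== action.jobId) return state;
      return { status: "done", jobId: action.jobId, resultUrl: action.url };
    case "fail":
      // Roll back the optimistic state and surface the error.
      return { status: "error", message: action.message };
  }
}
```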
- Lessons Learned (The "Gotchas")
Prompt Overload: We found that "less is more." Over-prompting Gemini often led to artifacts, so we moved to a "base-template" approach where the AI has more creative freedom within set guardrails.
File Size: Handling 4K images meant optimizing our S3 upload strategy. Files go straight from the browser to S3 via pre-signed URLs, which keeps the Next.js server from becoming a bottleneck.
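With pre-signed URLs, the server's only job is to validate the request and sign a URL; the browser then uploads the file directly to S3. The sketch below covers the validation and object-key part. The signing itself would be done with the AWS SDK for JavaScript v3 (`getSignedUrl` from `@aws-sdk/s3-request-presigner` with a `PutObjectCommand`), which is left out so the example has no dependencies. The size limit, accepted types, key layout, and `planUpload` helper are assumptions.

```typescript
// ASSUMED limits for raw smartphone uploads.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // 25 MiB
const ACCEPTED_TYPES = new Map<string, string>([
  ["image/jpeg", "jpg"],
  ["image/png", "png"],
  ["image/heic", "heic"],
]);
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

interface UploadRequest {
  userId: string;
  contentType: string;
  sizeBytes: number; // declared by the client
}

// Validate the declared file and derive a unique S3 object key.
// The key (not the file) is what the server would hand to the presigner.
function planUpload(req: UploadRequest, uploadId: string): { key: string; contentType: string } {
  const ext = ACCEPTED_TYPES.get(req.contentType);
  if (!ext) {
    throw new Error(`Unsupported content type: ${req.contentType}`);
  }
  if (!Number.isInteger(req.sizeBytes) || req.sizeBytes <= 0 || req.sizeBytes > MAX_UPLOAD_BYTES) {
    throw new Error(`File size out of range: ${req.sizeBytes}`);
  }
  if (!SAFE_ID.test(req.userId) || !SAFE_ID.test(uploadId)) {
    throw new Error("Invalid identifier");
  }
  return { key: `raw/${req.userId}/${uploadId}.${ext}`, contentType: req.contentType };
}
```

Because the declared size comes from the client, this check is a first line of defence rather than a guarantee; the processing stage should still verify the object it reads back from S3.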
- What’s Next for Katalyst AI?
We are currently exploring batch processing—allowing a seller to upload 20 raw shots and receive a full catalog in one go. We’re also refining our SEO engine to automatically generate descriptions that aren't just accurate but persuasive.
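One practical concern for batch mode is not firing 20 model calls at once and running into API rate limits. A minimal sketch of a concurrency-limited batch runner, assuming each shot is handled by some async function; `runBatch` is hypothetical, and the `worker` argument stands in for the full pipeline.

```typescript
// Per-item outcome: failures are recorded, not thrown,
// so one bad photo does not sink the whole catalog.
type BatchResult<R> = { ok: true; value: R } | { ok: false; error: string };

// Run `worker` over all items with at most `limit` calls in flight.
// Results keep the input order.
async function runBatch<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<BatchResult<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results: BatchResult<R>[] = new Array(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until the list is exhausted.
  // `next++` runs synchronously between awaits, so no index is claimed twice.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (err) {
        results[index] = { ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```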
- Final Thoughts
Building with Gemini and Next.js allowed us to move from concept to a production-ready tool with incredible speed. If you're building in the AI space, the key is focusing on the user's friction, not just the AI's novelty.
Check out the live version at katalystai.co.uk