Ryan Kramer

Posted on Mar 7

How I Built an AI Product Photography Tool With FastAPI and Flux Models

#ai #python #fastapi #ecommerce

I spent $6,000 last year on product photography for my ecommerce store. 60 SKUs, $200-500 per shoot, a one-week turnaround each time, and half the shots were unusable.

I'm also a developer. So I built PixelPanda — upload a phone snap of any product, get 200 studio-quality photos in about 30 seconds.

This post breaks down the technical architecture, the AI pipeline, and the tradeoffs I made building it as a solo developer.

Architecture Overview

Client (Jinja2 + vanilla JS)
    |
FastAPI (Python)
    |
+----------------------------------+
|  Replicate API                   |
|  +- Flux Kontext Max (product)   |
|  +- Flux 1.1 Pro Ultra (avatar)  |
|  +- BRIA RMBG-1.4 (bg removal)  |
|  +- Real-ESRGAN (upscaling)      |
+----------------------------------+
    |
Cloudflare R2 (storage)
    |
MySQL (metadata)

The whole thing runs on a single Ubuntu VPS behind Nginx with Supervisor managing the process. Total infra cost: ~$50/month.

Why FastAPI Over Django or Express

Three reasons:

Async by default. Image generation calls take 5-30 seconds. FastAPI's native async support means I can handle many concurrent generation requests without blocking.
Pydantic validation. Every API request gets validated before it touches the AI pipeline. When you're burning $0.03-0.05 per Replicate API call, you don't want malformed requests wasting money.
Simple enough to stay in one file per feature. Each router handles one domain — processing.py for image transforms, avatars.py for avatar generation, catalog.py for batch product photos. No framework magic to debug.

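from fastapi import APIRouter, Depends, HTTPException, UploadFile

router = APIRouter()

# User, get_current_user, MODEL_MAP and run_replicate_model live elsewhere in the app.
# get_db is assumed here: a dependency that yields the database session.
@router.post("/api/process")
async def process_image(
    file: UploadFile,
    processing_type: str,
    user: User = Depends(get_current_user),
    db=Depends(get_db),
):
    # Reject bad requests before spending money on a Replicate call.
    if processing_type not in MODEL_MAP:
        raise HTTPException(422, "Unknown processing type")
    if user.credits < 1:
        raise HTTPException(402, "Insufficient credits")

    # The slow Replicate call is awaited, so the worker can serve other requests.
    result_url = await run_replicate_model(
        model=MODEL_MAP[processing_type],
        input_image=file
    )

    # Deduct only after a successful generation.
    user.credits -= 1
    db.commit()

    return {"result_url": result_url}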
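The handler above receives processing_type as a plain string. One way to have request validation reject unknown values before any Replicate call is an Enum-typed parameter, since FastAPI returns a 422 for values outside the enum. A minimal sketch, where the member names and model identifiers are placeholders rather than the real PixelPanda values:

```python
from enum import Enum


class ProcessingType(str, Enum):
    # Hypothetical processing types; the real set is not given in the post.
    REMOVE_BACKGROUND = "remove_background"
    UPSCALE = "upscale"


# Placeholder identifiers standing in for the real Replicate model references.
MODEL_MAP = {
    ProcessingType.REMOVE_BACKGROUND: "example-owner/background-removal",
    ProcessingType.UPSCALE: "example-owner/upscaler",
}


def resolve_model(raw_value: str) -> str:
    """Map a raw request value to a model reference, or raise ValueError."""
    return MODEL_MAP[ProcessingType(raw_value)]
```

Declaring the endpoint parameter as processing_type: ProcessingType would let FastAPI run the same check while parsing the request.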

The AI Pipeline: How Product Photos Get Generated

The core product photo generation uses Flux Kontext Max through Replicate. Here's how it works:

Step 1: Background Removal

Before compositing, I strip the background using BRIA's RMBG-1.4 model. This gives me a clean product cutout regardless of what the user uploads — kitchen counter, carpet, hand-held, doesn't matter.

Step 2: Scene Compositing

The cleaned product image gets sent to Flux Kontext Max along with a scene prompt. The model handles:

Lighting direction and intensity
Realistic shadows and reflections
Background composition
Product placement and scale

Each scene template (studio, lifestyle, outdoor, flat lay, etc.) maps to a carefully tuned prompt. This is where most of the iteration went — getting prompts that produce consistent, professional results across different product types.

SCENE_TEMPLATES = {
    "white_studio": {
        "prompt": "Professional product photograph on clean white background, "
                  "soft studio lighting from upper left, subtle shadow, "
                  "commercial ecommerce style, 4K",
        "negative": "text, watermark, blurry, low quality"
    },
    "lifestyle_kitchen": {
        "prompt": "Product placed naturally on marble kitchen counter, "
                  "warm morning light through window, shallow depth of field, "
                  "lifestyle photography style",
        "negative": "text, watermark, artificial looking"
    },
    # ... 10 more templates
}
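Steps 1 and 2 can be sketched as one background-removal call followed by a composite call per scene. This is an illustration under assumptions, not the production code: run_model stands in for whatever wraps the Replicate client, the model identifiers are placeholders, and the input field names (image, input_image, prompt) should be verified against each model's schema. How the negative field is passed depends on the model, so the sketch leaves it out.

```python
import asyncio
from typing import Awaitable, Callable

# Placeholder identifiers; look up the exact model references on Replicate.
BG_REMOVAL_MODEL = "example-owner/rmbg-1.4"
COMPOSITE_MODEL = "example-owner/flux-kontext-max"

# Hypothetical wrapper signature: (model reference, input dict) -> output URL.
RunModel = Callable[[str, dict], Awaitable[str]]


async def generate_scenes(image_url: str, templates: dict, run_model: RunModel) -> dict:
    """Remove the background once, then composite every scene concurrently."""
    cutout_url = await run_model(BG_REMOVAL_MODEL, {"image": image_url})

    names = list(templates)
    urls = await asyncio.gather(
        *(
            run_model(
                COMPOSITE_MODEL,
                {"prompt": templates[name]["prompt"], "input_image": cutout_url},
            )
            for name in names
        )
    )
    # gather preserves input order, so names and urls line up.
    return dict(zip(names, urls))
```

Passing run_model in as an argument keeps the orchestration testable with a fake, which matters when every real call costs money.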

Step 3: Quality Enhancement (Optional)

Users can upscale results using Real-ESRGAN for marketplace listings that need high-res images (Amazon recommends at least 1600px on the longest side so that zoom works well).
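Whether an image needs upscaling at all can be decided from its dimensions. A small sketch, assuming the upscaler offers 2x and 4x factors (check the model's actual options) and using the 1600px figure above as the target:

```python
import math

TARGET_LONGEST_SIDE = 1600  # pixel target quoted above for marketplace listings
SUPPORTED_SCALES = (2, 4)   # assumed upscaler factors, smallest first


def pick_upscale_factor(width: int, height: int, target: int = TARGET_LONGEST_SIDE):
    """Return None if the image is already large enough, otherwise the smallest
    supported factor that reaches the target (or the largest one available)."""
    longest = max(width, height)
    if longest >= target:
        return None
    needed = math.ceil(target / longest)
    for scale in SUPPORTED_SCALES:
        if scale >= needed:
            return scale
    return SUPPORTED_SCALES[-1]
```

Skipping the call when the result is already large enough avoids paying for an upscale that adds nothing.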

The Hardest Technical Problem: Prompt Consistency

The biggest challenge wasn't the pipeline — it was getting consistent results. Early versions would:

Change the product color or shape
Add phantom elements (extra products, random objects)
Produce lighting that didn't match the scene
Scale the product incorrectly

The fix was a combination of:

Aggressive negative prompting to prevent hallucinations
Reference image anchoring — Flux Kontext Max accepts both a reference image and a prompt, which keeps the product faithful to the original
Post-generation validation — basic checks on output dimensions, color distribution, and face detection (to catch cases where the model hallucinates people into product shots)
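The post does not show the validation code, so the following is only a sketch of what the dimension and color checks might look like. It works on plain lists of RGB tuples to stay dependency-free. A real version would more likely read pixels with Pillow or NumPy, restrict the comparison to the product region, and add a face detector. The function names and the drift threshold are assumptions.

```python
def mean_rgb(pixels):
    """Average each channel over a non-empty list of (r, g, b) tuples."""
    count = len(pixels)
    return tuple(sum(p[i] for p in pixels) / count for i in range(3))


def color_drift(original_pixels, generated_pixels):
    """Euclidean distance between the mean colors of two pixel samples."""
    a = mean_rgb(original_pixels)
    b = mean_rgb(generated_pixels)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def passes_checks(size, expected_size, original_pixels, generated_pixels,
                  max_drift=40.0):  # threshold is an arbitrary placeholder
    """Reject outputs with the wrong dimensions or a large mean-color shift."""
    if size != expected_size:
        return False
    return color_drift(original_pixels, generated_pixels) <= max_drift
```

A failed check would typically trigger a retry or a refund of the credit rather than showing the user a product in the wrong color.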

This prompt engineering was 80% of the development time. The actual API integration and web app were straightforward.

Avatar Generation: A Different Pipeline

For lifestyle marketing shots (model holding/wearing the product), I use a separate pipeline built on Flux 1.1 Pro Ultra with Raw Mode.

Raw Mode is key — it produces photorealistic, unprocessed-looking images. Without it, AI-generated people have that telltale "too perfect" look. With Raw Mode enabled, you get natural skin texture, realistic lighting falloff, and believable imperfections.

The avatar system lets users either pick from 111 pre-made AI models or build their own using a guided wizard. The wizard collects demographic preferences and generates a consistent character that can be reused across multiple product shots.
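A rough sketch of how wizard answers could become a generation request. The preference fields and prompt wording are hypothetical, and reusing one stored description string is only an assumed way to keep a character recognizable, since the post does not say how consistency is achieved. The raw flag mirrors the Raw Mode option described above, but the exact input names should be checked against the model's schema on Replicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvatarPrefs:
    # Hypothetical wizard fields; the post does not list the real ones.
    age_range: str
    gender: str
    style: str


def character_description(prefs: AvatarPrefs) -> str:
    """Stable text describing the character, stored once and reused per shot."""
    return f"a {prefs.age_range} {prefs.gender} model with a {prefs.style} look"


def build_avatar_input(prefs: AvatarPrefs, product_action: str) -> dict:
    """Assemble an input payload for the avatar model from wizard answers."""
    prompt = f"Photo of {character_description(prefs)}, {product_action}"
    return {"prompt": prompt, "raw": True}  # raw=True requests Raw Mode
```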

Payments: Why Stripe One-Time Checkout

The entire payment system is a single Stripe Checkout session:

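import stripe

session = stripe.checkout.Session.create(
    mode="payment",  # one-time charge, not "subscription"
    # Placeholder redirect URL; the original snippet omits it.
    success_url="https://example.com/checkout/success",
    line_items=[{
        "price_data": {
            "currency": "usd",
            "unit_amount": 500,  # amount in cents: $5.00
            "product_data": {"name": "PixelPanda - 200 Credits"}
        },
        "quantity": 1
    }],
    # Read back by the webhook handler to credit the right account.
    metadata={
        "user_id": str(user.id),
        "credits_amount": "200"
    }
)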

One webhook handler catches checkout.session.completed, reads the metadata, and applies credits. No subscription state machine, no recurring billing logic, no failed payment recovery flows. The simplest possible payment integration.
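A sketch of the credit-applying step, written as a pure function over the event payload so it can be tested without Stripe. It assumes the raw request has already been verified (for example with stripe.Webhook.construct_event) and parsed into a dict. The in-memory dict and set stand in for database tables, and the duplicate check on the session ID is an addition of this sketch, since Stripe may deliver the same event more than once.

```python
def apply_checkout_credits(event: dict, balances: dict, processed: set) -> bool:
    """Apply purchased credits once per Checkout session.

    Returns True if credits were added, False if the event was ignored.
    """
    if event.get("type") != "checkout.session.completed":
        return False

    session = event["data"]["object"]
    if session["id"] in processed:  # duplicate delivery, already credited
        return False

    metadata = session["metadata"]
    user_id = metadata["user_id"]
    credits = int(metadata["credits_amount"])  # metadata values are strings

    balances[user_id] = balances.get(user_id, 0) + credits
    processed.add(session["id"])
    return True
```

In a real handler the balance update and the processed-session record would go in the same database transaction, so a crash between the two cannot double-credit a user.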

The tradeoff is obvious: $5 per customer makes paid acquisition nearly impossible. My Google Ads CPA is $35. But the simplicity saved weeks of development time and eliminated an entire category of support tickets.

Infrastructure: Keeping It Simple

No Kubernetes. No microservices. No message queues.

Nginx (SSL termination, static files)
  +- Supervisor (process management)
      +- Uvicorn (FastAPI app, 4 workers)
          +- MySQL (local)

Replicate handles all the GPU compute. I don't run any ML models locally. This means:

No GPU servers to manage
No model loading/unloading
No CUDA driver headaches
Scaling = Replicate's problem

The downside is latency (network round-trip to Replicate) and cost (their margin on top of compute). But for a solo developer, not managing GPU infrastructure is worth it.

Cloudflare R2 stores all generated images. It's S3-compatible, has no egress fees, and costs nearly nothing at my scale.
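Because R2 speaks the S3 API, a standard S3 client can be pointed at it. The sketch below only builds the client settings and an object key, so it runs without credentials. The key layout and helper names are hypothetical.

```python
import uuid


def r2_client_kwargs(account_id: str, access_key: str, secret_key: str) -> dict:
    """Keyword arguments for an S3-compatible client aimed at Cloudflare R2."""
    return {
        "endpoint_url": f"https://{account_id}.r2.cloudflarestorage.com",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": "auto",
    }


def result_key(user_id: int, extension: str = "png") -> str:
    """Assumed key layout: one prefix per user, random name per image."""
    return f"generated/{user_id}/{uuid.uuid4().hex}.{extension}"
```

With boto3 installed, these would be used as boto3.client("s3", **r2_client_kwargs(...)), followed by upload_fileobj(fileobj, bucket, key) on the resulting client.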

Numbers

Being transparent because I think more developers should share real numbers:

Infra cost: ~$50/month (VPS + domain)
Variable cost: $0.03-0.05 per generation (Replicate API)
Revenue: Low three figures/month (2-3 purchases/day at $5)
Best acquisition channel: ChatGPT referrals (11% signup conversion — I didn't do anything to cause this)
Photo quality: Within 2-3% CTR of professional photography in A/B tests on real ecommerce listings
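The figures in this post can be combined into a quick unit-economics check. This assumes one credit equals one Replicate call, as in the /api/process handler, and that a buyer uses every credit. Both are simplifications, and Stripe fees are ignored. Amounts are in cents to avoid float rounding.

```python
PACK_PRICE_CENTS = 500          # $5.00 per purchase
CREDITS_PER_PACK = 200
COST_PER_CALL_CENTS = (3, 5)    # $0.03 to $0.05 per generation
ADS_CPA_CENTS = 3500            # $35 Google Ads CPA


def pack_cost_range_cents(credits_used: int = CREDITS_PER_PACK):
    """Replicate spend range (low, high) if `credits_used` credits are consumed."""
    low, high = COST_PER_CALL_CENTS
    return credits_used * low, credits_used * high


def purchases_to_cover_cpa() -> int:
    """Purchases needed for revenue alone to match one acquisition (ceiling)."""
    return -(-ADS_CPA_CENTS // PACK_PRICE_CENTS)


def break_even_credits():
    """Credits used at which Replicate spend reaches the pack price,
    for the high and low per-call cost respectively."""
    low, high = COST_PER_CALL_CENTS
    return PACK_PRICE_CENTS // high, PACK_PRICE_CENTS // low
```

Under these assumptions a fully used pack costs $6 to $10 in API calls against $5 of revenue, the pack stops covering its own API cost somewhere between 100 and 166 credits used, and seven purchases are needed just to match one $35 acquisition.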

What I'd Do Differently

Start with prompt engineering, not code. I built the entire web app before nailing down the prompts. Should have spent the first month just generating photos in a notebook and perfecting prompts.
Skip the free tools. I built 26 free image tools (background remover, resizer, etc.) for SEO. They get 5,000+ sessions/week but almost nobody converts. The traffic and the paying audience are completely different.
Charge more from day one. $5 felt right as a user but it's brutal as a business. Low enough that paid acquisition doesn't work, high enough that people still hesitate. The worst of both worlds.

Try It

If you sell physical products and want to see the output quality: pixelpanda.ai

If you're building with Replicate or Flux models and have questions about the pipeline, drop a comment — happy to go deeper on any part of this.

DEV Community