We Built a One-Call AI Product Photography API — Here's How It Works Under the Hood
If you sell products online, you know the pain: raw product photos shot on a kitchen counter look terrible. Professional product photography costs $50–$200 per shot. AI tools like PhotoRoom exist, but their API costs $0.15+ per image and requires you to chain multiple calls yourself.
We just shipped a single API endpoint that takes your raw product image and returns a studio-quality shot — background removed, AI scene generated, shadows added — in one call at $0.075 per image.
Here's exactly how it's built.
The Problem With the "Do It Yourself" Approach
Most image AI APIs are building blocks, not solutions. To get a professional product photo, you had to:

1. `POST /remove-background` → get a transparent PNG
2. `POST /generate-background` with SDXL → get a background image
3. Composite foreground + background yourself
4. `POST /add-shadow` → get the final image

That's 3 API calls, 3 billing events, your own compositing code, and significant latency. Nobody has time for that.
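The "composite it yourself" step alone is real code you have to own. A minimal sketch of that step with Pillow, assuming you already have the transparent foreground and a background image (file names are illustrative):

```python
from PIL import Image

def composite(foreground_path, background_path, out_path):
    # Foreground: the transparent PNG returned by background removal
    fg = Image.open(foreground_path).convert("RGBA")
    # Background: the generated scene, resized to match the foreground
    bg = Image.open(background_path).convert("RGBA").resize(fg.size)
    # Paste foreground over background, respecting the alpha channel
    Image.alpha_composite(bg, fg).save(out_path)
```

And that's before handling mismatched aspect ratios, color spaces, or oversized uploads.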
The One-Call Pipeline
```bash
curl -X POST https://api.pixelapi.dev/v1/image/product-photo \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@product.jpg" \
  -F "preset=white-studio" \
  -F "shadow_type=soft"
```
Response:
```json
{
  "generation_id": "abc-123",
  "status": "queued",
  "credits_used": 0.075,
  "pipeline": ["remove-background", "replace-background", "add-shadow"],
  "estimated_seconds": 15
}
```
Then poll:
```bash
curl https://api.pixelapi.dev/v1/image/abc-123 \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```json
{
  "status": "completed",
  "output_url": "https://api.pixelapi.dev/outputs/files/..."
}
```
Done. One call, one poll, professional product photo.
What Happens Inside
Step 1: Background Removal (BiRefNet)
We use BiRefNet — currently the best open-source background removal model (beats the model used by remove.bg in most benchmarks). It outputs a transparent PNG with the product isolated.
BiRefNet handles hair, glass, transparent products, and complex edges far better than the older U2Net/ISNET models.
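BiRefNet's exact inference API depends on how you package it, so here is only the generic final step: given a grayscale matte from any segmentation model, cutting the product out is a two-line Pillow operation (function name and signature are illustrative, not our internal code):

```python
from PIL import Image

def apply_matte(image, matte):
    # image: RGB product photo; matte: grayscale ("L") mask, white = keep
    cutout = image.convert("RGBA")
    # Use the (resized) matte as the alpha channel of the output
    cutout.putalpha(matte.resize(image.size))
    return cutout  # ready to save as a transparent PNG
```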
Step 2: Smart Background Generation
The preset parameter maps to a background prompt fed into our background engine:
| Preset | What gets generated |
|---|---|
| `white-studio` | Clean white background, soft diffused lighting |
| `gradient-light` | Subtle white-to-gray gradient |
| `marble` | White marble surface with studio lighting |
| `outdoor` | Natural setting with soft bokeh |
| `custom` | Anything you describe in `background_prompt` |
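Internally, this mapping can be as simple as a dictionary lookup before the prompt reaches the background engine. A sketch (the prompt wordings here are illustrative, not our exact production prompts):

```python
PRESET_PROMPTS = {
    "white-studio": "clean white background, soft diffused studio lighting",
    "gradient-light": "subtle white-to-gray gradient background",
    "marble": "white marble surface, studio lighting",
    "outdoor": "natural outdoor setting, soft bokeh",
}

def resolve_prompt(preset, background_prompt=None):
    if preset == "custom":
        # "custom" requires the caller to supply background_prompt
        if not background_prompt:
            raise ValueError("preset=custom requires background_prompt")
        return background_prompt
    return PRESET_PROMPTS[preset]
```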
For solid colors and gradients, we skip the GPU entirely and generate them instantly with PIL. For complex scenes, we use SDXL to generate a background that matches the product's dimensions.
```python
import re

# Smart detection — no GPU for simple backgrounds
def _smart_background(prompt, width, height):
    # Hex color like "#ffffff" → solid fill
    if re.fullmatch(r'#[0-9a-fA-F]{6}', prompt.strip()):
        return solid_color_image(hex_to_rgb(prompt), width, height)
    if 'white' in prompt and ('background' in prompt or 'studio' in prompt):
        return solid_color_image((255, 255, 255), width, height)
    if 'gradient' in prompt:
        return generate_gradient(prompt, width, height)
    return None  # Fall through to SDXL
```
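To show how cheap the CPU-only path is, here's a minimal vertical white-to-gray gradient in Pillow. The real `generate_gradient` parses colors out of the prompt; this sketch hard-codes the two endpoint colors:

```python
from PIL import Image

def vertical_gradient(width, height, top=(255, 255, 255), bottom=(200, 200, 200)):
    img = Image.new("RGB", (width, height))
    for y in range(height):
        t = y / max(height - 1, 1)  # 0.0 at top row, 1.0 at bottom row
        row = tuple(round(a + (b - a) * t) for a, b in zip(top, bottom))
        # Paint one full-width row in the interpolated color
        img.paste(row, (0, y, width, y + 1))
    return img
```

This runs in milliseconds, versus seconds on a GPU for an SDXL render.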
Step 3: Shadow Compositing
We composite the extracted foreground onto the generated background, then add a configurable shadow. Four shadow types:
- soft — Blurred drop shadow, offset slightly down (best for white backgrounds)
- hard — Sharp shadow, offset diagonally (dramatic/fashion look)
- natural — Medium blur, slight diagonal (closest to real studio lighting)
- floating — Large diffuse shadow directly below (product appears to float)
The shadow is generated from the foreground's alpha channel: we take the product's silhouette, apply a Gaussian blur, scale its opacity, and paint it beneath the foreground layer.
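A minimal sketch of that shadow step with Pillow (blur radius, offset, and opacity values are illustrative defaults, not our production tuning):

```python
from PIL import Image, ImageFilter

def add_soft_shadow(fg, bg, offset=(0, 12), blur=15, opacity=0.4):
    # Build the shadow from the foreground's silhouette (alpha channel)
    alpha = fg.getchannel("A")
    shadow_alpha = alpha.filter(ImageFilter.GaussianBlur(blur))
    shadow_alpha = shadow_alpha.point(lambda p: int(p * opacity))
    shadow = Image.new("RGBA", fg.size, (0, 0, 0, 0))
    shadow.putalpha(shadow_alpha)
    # Paint the shadow beneath the product, then the product on top
    out = bg.convert("RGBA")
    out.alpha_composite(shadow, dest=offset)
    out.alpha_composite(fg)
    return out
```

The four shadow types above differ only in these parameters: blur radius, offset direction, and opacity.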
Why LLM3 (RTX 6000 Ada, 48GB)
The pipeline runs BiRefNet + SDXL in sequence. On a 16GB GPU, if other jobs are warm in VRAM, you hit OOM. We route product-photo jobs to our LLM3 overflow worker which has 48GB VRAM — plenty of headroom to run both models without flushing.
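The routing decision itself is simple once you know each worker's free VRAM. A hypothetical sketch (worker names and the 20 GB requirement are illustrative numbers, not our measured footprint):

```python
def pick_worker(required_gb, workers):
    # workers: {name: free_vram_gb} — pick the worker with the most headroom
    candidates = {name: free for name, free in workers.items() if free >= required_gb}
    if not candidates:
        raise RuntimeError("no worker has enough free VRAM")
    return max(candidates, key=candidates.get)
```

With, say, 6 GB free on a 16 GB card and 41 GB free on the 48 GB card, a 20 GB job can only land on the larger worker, so it never has to flush warm models.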
Available Presets & Examples
White Studio (e-commerce standard)
```bash
-F "preset=white-studio" -F "shadow_type=soft"
```
Clean white background. Works for Amazon, Shopify, Flipkart listings.
Marble (luxury/premium products)
```bash
-F "preset=marble" -F "shadow_type=natural"
```
White marble surface. Great for cosmetics, jewelry, electronics.
Outdoor (lifestyle look)
```bash
-F "preset=outdoor" -F "shadow_type=floating"
```
Natural bokeh background. Good for apparel, outdoor gear, food.
Custom Scene
```bash
-F "preset=custom" \
-F "background_prompt=dark wood table, warm candlelight ambiance, luxury product photography"
```
Describe anything — SDXL generates it.
Pricing
| Provider | Cost per image |
|---|---|
| PhotoRoom API | $0.150+ |
| Fal.ai (chained) | ~$0.120 |
| PixelAPI | $0.075 |
Our rule: half the price of the leading mainstream competitor, or less.
Try It Now
Sign up at pixelapi.dev — free tier includes credits to test the endpoint immediately. No credit card required.
Full API docs: pixelapi.dev/docs
```python
import time
import requests

def product_photo(image_path, preset="white-studio", shadow="soft"):
    with open(image_path, "rb") as f:
        r = requests.post(
            "https://api.pixelapi.dev/v1/image/product-photo",
            headers={"Authorization": "Bearer YOUR_KEY"},
            files={"image": f},
            data={"preset": preset, "shadow_type": shadow, "output_format": "jpeg"},
        )
    job = r.json()
    # Poll every 5 seconds until the job completes or fails
    while True:
        result = requests.get(
            f"https://api.pixelapi.dev/v1/image/{job['generation_id']}",
            headers={"Authorization": "Bearer YOUR_KEY"},
        ).json()
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(5)

result = product_photo("my_product.jpg", preset="marble", shadow="natural")
print(result["output_url"])
```
PixelAPI is an AI image and video API built for developers. All processing runs on our own GPU infrastructure — no third-party API calls, no per-request markups.