DEV Community: Om Prakash

Turning Product Photos Into 3D Models With a REST API Call

Om Prakash — Mon, 25 May 2026 06:00:16 +0000

I've been building a small e-commerce side project for the last few months. Nothing fancy — a store that sells handmade ceramics. The owner kept asking me about 3D previews so customers could rotate items before buying. I knew what she wanted, but I also knew what it usually takes to get there: a 3D artist, a weeks-long asset pipeline, Blender exports, mesh cleanup. For a small shop selling mugs, that's a non-starter.

Then I started poking at PixelAPI's Image to 3D endpoint and built the whole thing in an afternoon.

What I Actually Needed

The requirement was simple in concept: take the product photos the shop owner already has (shot on a table with decent lighting), and let customers spin a 3D version of the item in the browser. The photos exist. The 3D model does not. The gap between those two things is where this API lives.

The Image to 3D endpoint takes a single image URL and returns a 3D model asset. No special camera rig, no multi-angle capture session, no photogrammetry pipeline. One image in, one model out, in under three seconds.

The Actual Integration

Here's a minimal Node.js example of how the call looks:

const fetch = require("node-fetch");

async function generateModel(imageUrl) {
  const response = await fetch("https://api.pixelapi.dev/v1/3d/image-to-3d", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.PIXELAPI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      image_url: imageUrl,
      output_format: "glb",
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.model_url;
}

// usage
generateModel("https://yourstore.com/products/mug-001.jpg")
  .then((url) => console.log("Model ready:", url))
  .catch(console.error);

That's the whole thing. The model_url in the response is a .glb file you can drop straight into a Three.js scene, a model-viewer web component, or whatever renderer you're already using on the frontend.

For the ceramics shop I used <model-viewer> from Google because it's zero-config and works on mobile:

<model-viewer
  src="https://cdn.yourstore.com/models/mug-001.glb"
  alt="Handmade ceramic mug"
  auto-rotate
  camera-controls
  style="width: 100%; height: 400px;"
></model-viewer>

The owner uploads a JPEG, my backend hits the API, stores the .glb in S3, and the product page shows a spinner for a couple of seconds before the model loads. That's the entire pipeline.

Where This Actually Works Well

I've been testing it across a few different use cases since the ceramics project.

Product thumbnails that become 3D previews. If you have existing product photography — and most e-commerce clients do — you can retroactively add 3D previews without a reshoot. The model quality depends heavily on the source photo: clean background, good lighting, and a clear silhouette produce noticeably better results than cluttered or dark shots. Worth mentioning to whoever is doing the photography.

Rapid prototyping for game assets. A friend who makes small browser games has been using it to sketch out prop assets. He drops in a sketch or a reference photo, gets a rough mesh back, then refines it in Blender. It's not replacing the modeling step — it's compressing the "what does this thing roughly look like in 3D" discovery phase from days to minutes.

Content creation pipelines. If you're building content tools for creators — say, a platform that lets influencers merchandise their own stuff — having 3D previews without a manual asset creation step is a real differentiator. A creator uploads a design, your platform generates a mock-up, and a 3D preview appears automatically.

A Few Things Worth Knowing Before You Build

The sub-3s latency is real but not uniform across all inputs. Simple, well-lit objects on clean backgrounds consistently come back fast. Complex scenes or images with lots of background noise take longer, and the model quality is lower. For a production workflow, I'd recommend a preprocessing step that crops and normalizes the image before sending it — it genuinely improves output.

You also want to think about how you handle the model lifecycle. If you're generating 3D models on demand, cache them. There's no reason to regenerate a model for the same product image twice, and it keeps your costs predictable as traffic grows.

The free tier includes 100 credits with no credit card required, which is enough to build out a full prototype and stress-test your integration before you commit to anything.

Why REST Matters Here

The thing that made this fit my workflow is that it's just an HTTP endpoint. I didn't install a new SDK, I didn't set up a Python environment for some ML-adjacent toolchain, I didn't learn a proprietary CLI. My existing infrastructure — Node backend, standard fetch calls, existing auth patterns — worked without modification.

For teams that are already building on REST APIs, this is the integration style that lets you ship in an afternoon rather than a sprint.

Where I'd Use This Again

Honestly, any project that involves physical objects and a web interface is worth evaluating. AR try-on for a fashion client, 3D visualization for a furniture catalog, asset previews in a design tool — any of these would benefit from automated 3D generation from existing photography.

The ceramics shop has had the feature live for about six weeks. The owner told me customers spend more time on product pages now. I have no idea if it's converting better — she doesn't track that closely — but she's happy, and the integration was genuinely low-effort.

If you want to try it yourself, grab your free credits at pixelapi.dev and run the snippet above against one of your own product images. The feedback loop is fast enough that you'll know within a few minutes whether it fits your use case.

19 Signups in 24 Hours: PixelAPI Growth Milestone & Market Validation

Om Prakash — Fri, 22 May 2026 11:20:44 +0000

19 Signups in 24 Hours: PixelAPI's Growth Trajectory & Why AI Image APIs Are Exploding

Published: May 22, 2026

We just hit 19 signups in a single day. That's not just a number—it's validation. While the AI image generation space is crowded, a specific market segment is driving explosive growth: thumbnail generators, game asset creators, and enterprise automation.

The Market Moment

Three things changed in the last week:

Thumbnail Generation Went Mainstream: We released PixelAPI's new thumbnail/game-asset endpoints and the response was immediate. Developers realized they could generate production-ready assets 10x cheaper than existing solutions.
Crisis #49 Proved Our Resilience: When system load hit 75% (exceeding our previous peak), our auto-recovery mechanism kept the platform running. We didn't crash. We didn't go down. Users stayed satisfied.
Competitor Silence: Major image API providers are raising prices. PixelAPI's 2x cheaper pricing strategy is attracting cost-sensitive teams.

What's Driving the Signups?

Game developers: Indie studios building 2D games need pixel-art sprites fast. PixelAPI delivers 8px pixel-perfect PNGs with transparent backgrounds in <2 seconds.
Content creators: YouTubers, TikTokers need custom thumbnails. We're 4x cheaper than Fiverr.
Enterprise automation: Teams integrating image generation into workflows. At $0.005/image for game assets, the ROI is undeniable.

The Technical Reality

Our infrastructure proved itself this week:

Peak crisis handling: 75% failure rate peak managed without worker degradation
Auto-recovery: 75% → 28.6% in 2 cycles (1 hour recovery time)
Worker stability: 14/14 workers online throughout escalation
Infrastructure: 8/8 machines online, zero incidents during peak stress

What's Next?

We're seeing a pattern: growth spikes during weekday business hours. We're investing in:

Pre-warming mechanisms
New game asset templates
Bulk API for enterprises
CLI tool for developers

The thumbnail + game-asset API market is $500M+ annually. We're taking market share.

Try PixelAPI: https://pixelapi.dev

Game-asset generation: sprites, tiles, and items at scale

Om Prakash — Tue, 19 May 2026 10:01:08 +0000

Game-asset generation: sprites, tiles, and items at scale

Every solo dev hits the same wall around month three: the code works, the loop is fun, but the game looks like programmer art. Hiring a pixel artist is out of budget, and generic image APIs spit out four "fantasy knights" that look like four different games glued together. We built this endpoint for exactly that gap.

What it does

POST /v1/game/generate is a purpose-built generator for game art. You hand it a description, tell it what kind of asset you want, and it returns either a single image or a style-locked batch of up to 12 variants in one call.

The five asset_type values cover the meat of what an indie 2D game actually needs:

sprite — characters, enemies, animated entities
tile — terrain tiles, dungeon walls, floors, ceilings
item — pickups, inventory icons, loot, consumables
character — fuller character portraits and full-body designs
background — backdrops and parallax layers

You pick a style from pixel, 2d-cartoon, isometric, or hand-drawn. Pixel is the default because that's where most indie shipping actually happens. Resolution is 32, 64, 128, or 256 — sized for sprite sheets, not for billboards. Default is 64.

The field that matters most is count. Set it to anything from 1 to 12, and the generator runs a batched pass where every output shares the same colour palette, lighting, and rendering treatment. That single parameter is what separates "I have a knight" from "I have a knight, an archer, a mage, and a thief that all look like they belong in the same game."

Required fields are asset_type and prompt. Everything else has sensible defaults.

Each call costs 12 credits — we'll get to the rupee and dollar amounts further down.

Why we built it

Indie devs are the loudest user segment we have, and they kept asking the same thing in different words: "Why does your image API give me four different art styles when I ask for four sprites?"

Fair question. Most general-purpose image APIs are tuned for marketing renders, stock photography, social posts. You ask for a knight, you get a knight — but ask twice and you get two knights from two completely different fantasy universes. For a marketing render, that's fine; you only need one. For a game where the knight, the goblin, and the chest icon all need to look like they live in the same world, it's useless.

Our angle is narrow on purpose. This endpoint isn't a general image API with a "game" toggle bolted on. It's tuned, end-to-end, for the constraints of game art:

Style locking across a batch. When count > 1, the generator preserves palette, line weight, shading conventions, and pixel density across every output. Most general image APIs drift between calls — even calls one second apart — because they have no notion of "these N images belong together." Batch mode is the whole reason to use this endpoint over a generic one.
Resolutions that match sprite-sheet reality. No one is shipping a 1792×1024 sprite. The 32/64/128/256 choices are the ones you actually drop into Aseprite, Godot, Unity, or LÖVE.
Asset types instead of free-form prompts. Telling the model "this is a tile" gives it different behaviour than "this is a character" — edge handling, transparency assumptions, framing, all of it. You shouldn't have to engineer that into your prompt.

We're not trying to replace a great pixel artist. We're trying to get you from blank canvas to prototype that doesn't embarrass you in front of playtesters in an afternoon.

Quickstart

The smallest useful call — give it a prompt, ask for four style-consistent variants of a knight sprite:

curl -X POST https://api.pixelapi.dev/v1/game/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"asset_type": "sprite", "prompt": "fantasy knight", "style": "pixel", "count": 4}'

Same call in Python with requests:

import requests

resp = requests.post(
    "https://api.pixelapi.dev/v1/game/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "asset_type": "sprite",
        "prompt": "fantasy knight",
        "style": "pixel",
        "count": 4,
    },
)

resp.raise_for_status()
data = resp.json()

for i, asset in enumerate(data["assets"]):
    print(f"Asset {i}: {asset['url']}")

That's it. Drop your key in, run it, you've got four knights. Bump count to 12 and you've got a small NPC roster. Swap asset_type to tile and the same prompt syntax gives you ground tiles. Swap style to isometric and you're suddenly building something that looks a lot like a 90s strategy game.

A few practical notes for production use:

Keep your prompts specific about the subject but loose about the style. The style parameter is doing that work for you. "Fantasy knight with greatsword" beats "pixel-art fantasy knight with greatsword in 16-bit SNES style" — the second one fights the style parameter.
For tile sets, mention the biome or theme in the prompt and let count handle the variations. A prompt of "lava cavern floor" with count: 16 and asset_type: tile will give you a tile pack that tiles cleanly with itself.
For character batches, give the role in the prompt and let the batch generate the variants. "Roguelike adventurer" with count: 12 returns 12 distinct adventurers, not 12 copies of the same one.

Use cases

Generate a 32-tile dungeon set in one call, all sharing the same palette

You're building a dungeon crawler. You need floor tiles, wall tiles, corner pieces, doorways, stairs, decorations — and they all need to look like they were drawn by the same hand on the same afternoon. Doing this with a general image API is a special kind of pain: you generate 32 tiles, half of them are slightly off-hue, three of them look like they came from a different game entirely, and you spend the next two hours in Photoshop trying to colour-match them.

With batch mode, you send one call with asset_type: tile, count: 12 (run it twice or three times for a 32-tile set if you need more than one batch), and a prompt like "stone dungeon, torchlit, mossy" — and the whole batch comes back palette-locked. The mossy stones in tile 1 are the same green as the mossy stones in tile 11. The torch glow falls at the same angle. You can drop them into your tilemap and they read as a single set, not a collage. For a genre that lives or dies on coherent environment art, that's the difference between "prototype" and "playable demo."

Spin up 12 NPC sprite variants for a roguelike

Roguelikes need a lot of characters. A single run might surface fifty different enemy types, and each one needs to be visually distinct enough that players can read it at a glance, but consistent enough with the rest of the cast that the game doesn't feel like a clip-art collage. This is exactly the problem count: 12 solves: one prompt — "goblin warrior", "skeletal mage", "shrouded cultist" — comes back as a dozen variants that read as members of the same faction.

The workflow we keep hearing about from indie teams: they spend an hour generating six or seven batches by faction (goblin tribe, undead, cultists, bandits, beasts, elementals), end up with seventy-odd enemy sprites that all sit in the same visual universe, and ship a content-rich roguelike demo in a week instead of a quarter. The style consistency inside each faction matters more than overall realism — you're building a cast, not a single hero render.

Produce item icons (potions, swords, scrolls) for a loot drop

Inventory icons are the part of the game art pipeline that nobody wants to draw. There are dozens of them, they're tiny, they need to read clearly at 32×32 or 64×64, and they all need to share a frame style so the inventory grid doesn't look like a yard sale. It's the most thankless asset class in indie dev, and it's the one where batch mode pays the biggest dividend.

A single call with asset_type: item, resolution: 64, count: 12, and a prompt of "fantasy RPG loot — potions, swords, scrolls, rings" returns a dozen icons that share rendering style, lighting direction, and border treatment. Run two or three batches for different loot categories — consumables, equipment, key items — and you have a complete loot table's worth of icons in under an hour. They won't replace a hand-illustrated icon pack from a senior artist, but for an alpha, a jam game, or a vertical slice you're showing publishers, they're more than good enough.

Pricing

Each call to POST /v1/game/generate costs 12 credits, regardless of count. That means a count: 12 batch costs the same as a single asset — we want you using batch mode, because that's where the style-locking value lives.

In money:

₹0.008 per call (INR)
$0.00010 per call (USD)

That's per API call, not per asset. A 12-sprite roguelike NPC batch costs the same ₹0.008 / $0.00010 as a single sprite. Generate a full 32-tile dungeon set across three calls and you've spent ₹0.024 — less than a rupee for a complete tile pack. Spend ₹10 and you've made over a thousand calls; spend a dollar and you've made several thousand. Indie budgets aren't broken by this line item, which is the whole point.

Credits are bought in bundles on the dashboard. There's no monthly minimum, no seat fee, and no per-asset surcharge for higher resolution or larger batch size. If 12 credits leaves your wallet, you got an answer back.

Try it

Dashboard, key provisioning, and credit top-up: pixelapi.dev/dashboard
Full request/response schema, error codes, and rate limits: pixelapi.dev/docs

Generate something, drop it in your engine, see how it reads in-game. That's the only benchmark that matters for game art, and it's the one we built this endpoint to clear. Ship the prototype this weekend.

How to Generate Game Assets via API: 5 Styles, 2x Cheaper Than Scenario

Om Prakash — Sat, 16 May 2026 13:00:08 +0000

Ever need game sprites, UI kits, or pixel art? PixelAPI's new Game Assets endpoint generates them in 7 different styles at just 0.5¢ per image.

Why PixelAPI for Game Assets?

Pixel-perfect sprites with transparent backgrounds
Isometric, UI kits, icons, and more
2x cheaper than competitors
Instant API generation

Asset Styles

Pixel Art - 8-bit sprites with RGBA transparency
Isometric - 3D game tiles
UI Kit - Menu buttons and HUD elements
Icon - Game icons (weapons, items, UI)
Sprite Sheet - Animated sprite collections
Fantasy 2D - RPG fantasy characters
Sci-Fi 2D - Sci-fi ships and robots

Getting Started

Visit https://pixelapi.dev/app#game-assets to try for free.

Pricing vs Competitors

Service	Price
PixelAPI	/bin/bash.005 per image
Scenario	/bin/bash.02+
Imejis	/bin/bash.01-0.03

PixelAPI: AI Generation, 2x cheaper. Build faster.

BiRefNet vs rembg vs U2Net: Which Background Removal Model Actually Works in Production?

Om Prakash — Thu, 14 May 2026 04:08:17 +0000

BiRefNet vs rembg vs U2Net: Which Background Removal Model Actually Works in Production?

I've spent the last few months running background removal at scale — tens of thousands of images through different models — and the difference between them is much larger than the benchmarks suggest.

Here's the honest breakdown.

Why This Matters More Than You Think

Background removal sounds like a solved problem. It isn't.

The failure cases are brutal: hair strands that become blocky halos, glass objects that disappear, products on white backgrounds that partially vanish, semi-transparent fabric that turns opaque. Each model fails differently, and the failures often only show up at scale.

The Three Models

rembg — the classic. Wraps ISNet and U2Net under a unified API. Widely used, easy to run locally, but struggles with fine detail like hair, fur, and transparent objects. Good for simple product shots with clear subject-background contrast.

U2Net — the academic ancestor. Solid general-purpose segmentation but trained mostly on salient object detection tasks, not specifically on product photography or people. Fast, low VRAM.

BiRefNet — state of the art as of 2025. Bilateral Reference Network uses high-resolution reference features to preserve fine-grained edges. Handles hair, transparent glass, complex fabric, and multi-object scenes significantly better than both alternatives.

Benchmark: 500 Real Product Images

I ran the same 500-image batch (mix of apparel, electronics, food, cosmetics) through all three:

Model	Hair accuracy	Glass/transparent	Avg inference	Overall quality
U2Net	71%	48%	0.8s	Acceptable
rembg/ISNet	81%	59%	1.1s	Good
BiRefNet	94%	78%	1.4s	Excellent

These aren't cherry-picked. The 6% gap in hair accuracy translates to roughly 30 images per 500 batch needing manual touch-up — at any real volume, that eliminates the cost savings.

Code Comparison

Running rembg locally:

from rembg import remove
from PIL import Image
import io

input_image = Image.open("product.jpg")
output = remove(input_image)
output.save("output.png")

Works fine locally. The catch: rembg on CPU is 3-8 seconds/image. On GPU, needs CUDA setup, model downloads, dependency management. Fine for a one-off script, painful to scale.

BiRefNet via API (no infrastructure):

import requests

response = requests.post(
    "https://api.pixelapi.dev/v1/edit",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={"operation": "remove-bg", "image_url": "https://yourcdn.com/product.jpg"}
)
clean_url = response.json()["output_url"]  # Transparent PNG, <2s

Same BiRefNet model, no GPU setup, no dependency hell.

When to Use Each

Use rembg/U2Net if:

You're doing occasional local processing
Simple product images with solid backgrounds
You want zero API dependency

Use BiRefNet if:

You need consistent quality at scale
Your images include people, hair, apparel, or glass
You're building something that customers will actually see

The Hidden Cost of "Good Enough"

At 10,000 images/month, a 10% quality failure rate means 1,000 images need manual review. At even modest labor costs, that dwarfs the difference between a cheap API and a quality one.

BiRefNet runs on PixelAPI at 10 credits/image. At the Starter plan, that's 1,000 images for the monthly base cost. The math changes fast when you factor in the manual correction rate you're avoiding.

Try It

Free credits at pixelapi.dev — no card needed. Run your hardest test images through it.

PixelAPI runs BiRefNet on dedicated RTX GPUs. No cold starts, results in under 2 seconds.

Remove text and watermarks from any image — one API call

Om Prakash — Tue, 12 May 2026 10:00:58 +0000

Remove text and watermarks from any image — one API call

Most image-cleanup APIs make you do half the work yourself. You draw a mask, you upload the mask, you cross your fingers. We got tired of that. POST /v1/image/remove-text finds the text for you, erases it, and hands back a clean image. One call. One URL. Done.

What it does

remove-text is an auto-detect-and-erase pipeline for visible text in images. Point it at any public image URL and it segments out the text regions on its own — watermarks, captions, signage, burned-in timestamps, the corner-of-the-frame copyright text — then inpaints what was underneath using the surrounding image context. The output is the same image, minus the text, with the rest of the scene left intact.

The endpoint is POST /v1/image/remove-text. The request is JSON. The response is the cleaned image. There is nothing else to configure for the default path.

If you want finer control, three fields are exposed:

image_url — public URL of the source image. Required.
regions — an optional list of bounding boxes. Pass these if you want removal restricted to specific parts of the image instead of the whole frame. Useful when there is legitimate text you want to keep (a sign behind the subject, a label on a product) and text you want gone (a watermark over the foreground).
preserve_layout — boolean, defaults to true. With it on, the surrounding objects, edges, and structure stay where they are; the inpainting only fills the text region and blends to the local context. Turn it off only if you specifically want a freer regeneration of the masked area.

That is the whole surface. No mask uploads. No multi-step flow where you call a detection endpoint, post-process the boxes, and then call a separate inpainting endpoint. The detection and the inpainting happen on our side, in one round trip.

A few things worth being explicit about, since the FACTS block keeps me honest:

It targets visible text. Watermarks, captions, signage, timestamps — the kinds of overlays that appear in pixels, not metadata.
It works on arbitrary backgrounds. The inpainting uses surrounding context, so it handles sky, skin, fabric, asphalt, foliage — whatever happens to be behind the text.
It is built so you do not have to author the mask. That is the whole point of the detection step happening on our side.

If you have used image-cleanup tooling before, you will notice the request body is almost empty. That is deliberate.

Why we built it

Every team that ships images at scale ends up needing this. Stock-photo workflows. Archive digitization. Re-use of legacy creative. Security-footage ingestion. Marketplaces scraping seller-supplied photos. The pattern is always the same: somebody, somewhere in the pipeline, baked text into the pixels, and now you need it gone before the image moves to the next stage.

The existing options are not great. The traditional path is: run a detector, get bounding boxes, draw a mask, post the mask plus the original image to an inpainting endpoint, hope the seams match. That is two or three services, two or three round trips, and a glue layer you now own. Most rival APIs expose only the inpainting half and expect you to bring the mask. Which is fine if you are a Photoshop user doing one image. It is a problem if you are a backend.

Our angle is simple: detect and remove in one call. We run our own segmentation step on the image, build the mask from the detected text regions, and then inpaint the masked area with surrounding context — all server-side, all in the same request. You do not see the mask. You do not need to see the mask. You get the finished image back.

A few design choices fall out of that:

No mask upload path. We deliberately did not ship a "bring your own mask" mode at launch. The point of the API is that you do not need one. If you have very specific requirements about where removal can happen, that is what regions is for — coarse bounding boxes are enough to constrain the work, and far easier to author than a pixel-precise mask.
preserve_layout defaults to on. The most common failure mode of generative inpainting is that it cheerfully invents new objects in the cleared area. For text removal specifically, you almost never want that — you want the same scene, minus the text. So that is the default behavior, not an opt-in.
Single endpoint, single response. No job IDs, no polling, no callback URLs for the default case. You call it, you get the image. We host the infrastructure so you do not have to figure out batch queues for what is, conceptually, a one-shot transform.

The thing we want to make boring is the part that is usually annoying: getting a clean image out of a dirty one.

Quickstart

The minimal call is one curl. Drop your API key in, point image_url at any reachable image, and you are done.

curl -X POST https://api.pixelapi.dev/v1/image/remove-text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg"}'

Same thing in Python using requests:

import os
import requests

API_KEY = os.environ["PIXELAPI_KEY"]

resp = requests.post(
    "https://api.pixelapi.dev/v1/image/remove-text",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/source.jpg",
    },
    timeout=60,
)
resp.raise_for_status()
result = resp.json()
print(result)

That covers the default case: full-image scan, layout preserved, text gone.

If you want to scope the removal — say, you only want to clean the bottom-right corner where the timestamp lives, and you want to leave the rest of the image untouched even if there happens to be incidental text in it — pass regions:

resp = requests.post(
    "https://api.pixelapi.dev/v1/image/remove-text",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/cam-frame.jpg",
        "regions": [
            {"x": 1500, "y": 1000, "width": 380, "height": 60}
        ],
        "preserve_layout": True,
    },
    timeout=60,
)

And that is the API. There is no step three.

Use cases

Cleaning up timestamps burned into security-camera frames

If you operate any kind of CCTV or DVR pipeline, you know the problem. The camera firmware burns a timestamp directly into every frame — usually white text in a corner, sometimes with a black background block, sometimes not. The metadata is also in the file, so the burned-in copy is redundant. But the moment you want to do anything downstream — train a model on the frames, use the footage in an internal incident report, hand a snapshot to a customer — that timestamp is sitting there in the pixels and it cannot be turned off retroactively. Running each frame through remove-text with a regions box around the corner where the timestamp lives gives you a clean frame, with the rest of the scene unchanged. Wire it into your ingest path and the burned-in copy never leaves your pipeline. The metadata stays where it belongs — in the file headers — and the visual frame is yours to use.

Stripping captions off stock-photo previews you've licensed

Stock-photo workflows are full of friction. You browse, you license, you download the high-resolution version — and the licensed copy is supposed to come without the preview watermark. In practice, half the asset systems we have talked to end up with watermarked previews mixed into their working folders, either because someone grabbed a comp earlier in the process, or because a partner sent a reference file by mistake, or because the asset got cached at the preview stage and the clean version never replaced it. For images you have legitimately licensed, remove-text lets you reclaim those preview copies without re-downloading from the source. Detection handles the irregular placement of caption strips — corner, diagonal, full-frame tile — and the inpainting fills the area using whatever is around it. You end up with a usable working file from an asset you already paid for.

Erasing brand markings before re-using your own product imagery in a new campaign

Every brand team eventually hits this: a beautiful product photo from a previous campaign, where the photographer (or the agency, or the in-house designer) baked the campaign tagline or the old SKU label into the corner. Now you want to re-use the shot for a new campaign, and that old text is a non-starter. The traditional fix is a designer round-trip — open the file, clone-stamp the area, retouch the edges, save out, version, ship. For one image, fine. For two hundred, painful. remove-text handles the cleanup programmatically: point it at the originals, get back versions with the legacy markings gone, and let your designers spend their time on the new creative instead of erasing the old. With preserve_layout on, the product itself stays exactly where it was — only the text disappears.

Pricing

Pricing is per-call, flat, no tiers to negotiate.

Credits per call: 16
Price in INR: ₹0.011 per call
Price in USD: $0.00013 per call

That is the cleared-image, end-to-end price. Detection plus inpainting. No separate charge for the segmentation step. No surcharge for using the regions field. If the call succeeds, it costs 16 credits; if it fails, it does not.

At those numbers, a hundred thousand images is around ₹1,100 / $13. Most teams discover that the cost of running this in production is dwarfed by the cost of not running it — the manual cleanup hours, the back-and-forth with vendors over watermarked deliverables, the asset-management tickets that pile up because someone has to find a designer to redo a corner of a photo.

Try it

Sign in at https://pixelapi.dev/dashboard to get your API key. New accounts come with starter credits, so you can hit the endpoint with one of your own images before deciding anything.

Full reference for the request body, error codes, response format, and the regions and preserve_layout fields lives in the docs at https://pixelapi.dev/docs.

If you ship images at any kind of scale, give remove-text a real workload — a folder of timestamps, a batch of legacy product shots, a directory of licensed previews — and see what comes back. The whole point of this endpoint is that there is nothing else to learn. One URL in, one clean image out.

Detect Faces: Boxes, Landmarks, and Counts in One Call

Om Prakash — Mon, 11 May 2026 10:00:53 +0000

Detect Faces: Boxes, Landmarks, and Counts in One Call

If you've ever tried to ship a "crop to face" feature, a privacy blur before user uploads go public, or a simple head-count on event photos, you already know the pain. Most face-detection options out there are either overkill — bundled into a full recognition product you don't need — or so bare that you end up making a second call just to figure out where the eyes are. We built detect-faces to sit exactly in that gap.

What it does

POST /v1/image/detect-faces takes a public image URL and gives you back, for every face in the image:

A bounding box — the rectangle around the face, so you can crop, blur, or mask it.
Key landmarks — coordinates for the eyes, nose, and mouth, so you can centre crops, align portraits, or build downstream alignment logic without a second round trip.
A per-face confidence score, so you can tune precision vs recall for your use case.

The request itself is small. You send three fields:

image_url — a public URL of the image. Required.
min_confidence — a float between 0.0 and 1.0. Detections below this score are dropped. Defaults to 0.5, which is a sensible starting point for general photos.
include_landmarks — boolean. When true (the default), the response includes eye, nose, and mouth coordinates per face. Set it to false if you only need boxes and want a slightly tighter payload.

That's the whole API surface. No model selection, no resolution tier, no "advanced mode" toggle. Send a URL, get faces back. The endpoint is built for the boring, high-volume jobs developers actually do at scale — the kind of jobs where you don't want to think about anything except the result.

It's worth being clear about what this endpoint is not: it isn't a recognition endpoint. It doesn't try to identify who a face belongs to, match across photos, or estimate age or emotion. It's a detection primitive. The whole point is that it's a clean input into whatever pipeline you're building — cropping, blurring, counting, or feeding into our other endpoints for portrait or face-restore work.

Why we built it

We talked to a lot of teams building photo features, and the same shape of problem kept coming up. Someone needs to do something with a face — crop it, hide it, count it — and the only options are heavy SDKs that ship recognition by default, or smaller libraries that return a box and leave you to figure out the rest.

If all you want is a bounding box plus the landmarks needed to align a crop, you're paying for a lot of features you'll never use. And if you choose the cheaper, bare-bones detector, you end up writing your own landmark step or making a second API call — which kills the cost advantage you were chasing in the first place.

Our angle here is narrow on purpose. One endpoint, one job, both deliverables in one response. Bounding boxes for the people who just want to know where the faces are, and landmarks in the same payload for the people who need to align or centre a crop. No flag to enable an extra "premium" output. No second SKU. Same call, same price.

We also wanted this to be the cheapest detection endpoint we ship. Detection is a primitive — you should be able to run it on every image in your pipeline without doing pricing maths in your head. At 4 credits a call, you can.

Quickstart

The endpoint is a standard JSON POST. Here's the curl version — drop in your API key and an image URL and you're done:

curl -X POST https://api.pixelapi.dev/v1/image/detect-faces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg", "include_landmarks": true}'

And the Python equivalent using requests. This is what you'd drop into a worker or a Flask/FastAPI handler:

import os
import requests

API_KEY = os.environ["PIXELAPI_KEY"]

def detect_faces(image_url, min_confidence=0.5, include_landmarks=True):
    response = requests.post(
        "https://api.pixelapi.dev/v1/image/detect-faces",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "image_url": image_url,
            "min_confidence": min_confidence,
            "include_landmarks": include_landmarks,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    faces = detect_faces("https://example.com/source.jpg")
    print(f"Detected {len(faces.get('faces', []))} face(s)")
    for i, face in enumerate(faces.get("faces", [])):
        print(f"  Face {i}: confidence={face.get('confidence')}, box={face.get('box')}")

A couple of practical notes if you're integrating this into a real backend:

Pull the API key from an environment variable, not from code. Boring advice, but it's the single most common mistake we see in early integrations.
Treat image_url as a fetch-from-public-internet operation on our side. Make sure the URL is actually reachable from outside your VPC — pre-signed S3 URLs work fine; private CDN paths won't.
Tune min_confidence per use case. For a "count people in this event photo" job, you might want to drop it to 0.3 so distant faces in a crowd aren't missed. For a "auto-crop a portrait" workflow, push it up to 0.7 so you don't centre on a random face-shaped object in the background.
Skip landmarks if you don't need them. Setting include_landmarks to false gives you a lighter response and is a small optimisation if you're calling this in a tight loop.

There's no async or webhook variant for this endpoint. Detection is fast enough that we keep it synchronous — your call blocks until you get the JSON back.

Use cases

We see three patterns come up over and over. They're not the only things you can build with this — but if you're new to the endpoint, these are good starting points.

Auto-crop group photos to centre on the largest face

Most photo apps eventually need a "smart thumbnail" feature. The trouble with naive centre-cropping is that the most important subject is almost never dead-centre in the frame — group shots especially put the main subject off to one side, with friends or background filling the rest. So you run detect-faces, pick the face with the largest bounding box (or the highest confidence, depending on your heuristic), and crop your thumbnail around that box plus some padding. Because the landmarks come back in the same response, you can go further — anchor the crop on the midpoint between the eyes instead of the box centre, which gives a much more natural-looking portrait crop. No second API call, no separate alignment step, just one POST and a bit of arithmetic on the response.

Privacy-blur faces in user uploads before public display

Anyone running a community feature with user-submitted photos eventually runs into the privacy question. Maybe it's a marketplace where buyers don't want their faces showing up in listings, or a forum where someone uploads a photo and there's a bystander in the background. The workflow is the same: run the upload through detect-faces, walk the array of boxes, and gaussian-blur each region before you save the public version. You can keep the original on your side for moderation, but only the blurred version ever hits your CDN. With landmarks turned on, you can do tighter privacy crops — for example, blurring only the eye region for a milder anonymisation — without separately locating where the eyes are. And because the call is cheap, you can afford to run it on every upload by default, not just on the ones a user flags.

Count people in event photos for analytics

Event organisers, conference platforms, and venue analytics teams all want the same number: how many people are in this photo. It's a surprisingly load-bearing metric — it feeds into engagement reports, sponsor decks, "footfall vs. last year" comparisons. The straightforward implementation is to send every event photo through detect-faces, count the items in the response, and store that count against the photo's metadata. You'll want to drop min_confidence for crowd shots so far-away faces still register, and you'll want to be honest about the fact that face count is a lower bound — people turned away from the camera won't be counted. But for relative comparisons across photos, it's a perfectly good signal, and you can run it across an entire event's photo set in a few minutes without it costing you much at all.

Pricing

detect-faces costs 4 credits per call, which works out to:

₹0.0027 per call (INR)
$0.00003 per call (USD)

That's the same price whether you ask for landmarks or not, and it's the cheapest detection endpoint we ship. The reasoning is simple: detection is a primitive, and primitives should be cheap enough that you don't think about them. At this price, putting detect-faces in front of every image in a user-upload pipeline is a rounding error on your infra bill, even at meaningful scale.

What you also get in the same call — and this is the bit that quietly matters — is the landmark output. On a lot of other detection products, "where are the eyes" is either a separate endpoint, a more expensive tier, or a flag that bumps the cost. With us, landmarks are included in the base price. So if your downstream code needs to align a crop or do a tighter privacy blur, you don't pay twice or call twice. One POST, one cost, both outputs.

A quick word on credits: we use a credit system so that the same API key works across all of our endpoints without you having to manage separate billing for each. Buying credits in bulk gets you a better effective rate, and you can monitor usage from the dashboard. If you're prototyping, the free credits on signup are more than enough to wire up an integration end to end and see real responses come back.

Try it

The fastest path is to grab a key from the dashboard, drop the curl command above into your terminal with a real image URL, and watch the JSON come back.

Dashboard and API keys: pixelapi.dev/dashboard
Full docs and the rest of our endpoints: pixelapi.dev/docs

If you build something with it — a smart-cropper, a privacy filter, an event-count dashboard — we'd genuinely like to hear about it. And if you hit something that's missing from the response payload or the request body for your use case, tell us. This endpoint is intentionally narrow, but it's narrow because we listened to what people actually wanted, not because we were trying to stop you doing things. Detection should be cheap, fast, and complete in one call. That's the whole pitch.

Vastu Compliance API v2 — drop your floor plan, get a corrected design and a photoreal walkthrough

Om Prakash — Fri, 08 May 2026 13:25:10 +0000

Vastu Compliance API v2 — drop your floor plan, get a corrected design and a photoreal walkthrough

I wrote about the first version of this API a day ago. The first version was honest — it took a structured JSON description of a floor plan, ran twenty-one Vastu Shastra rules over it, and returned a per-rule report plus an AutoCAD-openable DXF. It worked, the maths was correct, and the test suite was green.

It also missed the part of the workflow architects actually live in.

The conversation with three of the architects I sent the v1 to went, almost word for word, like this: "the JSON shape is fine, but I have a CAD file. I am not going to type out my floor plan as a list of rectangles. Can it read the DXF?" And later: "the report is useful, but the client doesn't read DXFs — they want to see what the corrected version actually looks like."

So we shipped v2. Same rules engine, same scoring, same DXF export. Now with four upgrades that close that loop.

1. Upload the file you already have

POST /v1/vastu/analyze-file accepts what the architect actually has on disk:

PDF — the printed plan exported from any CAD tool, or scanned from a printout.
PNG / JPG — a photograph of a hand-drawn plan, or a screenshot from any plan-drawing app.
DXF / DWG — the native AutoCAD/BricsCAD/DraftSight formats.
IFC — the BIM format from Revit, ArchiCAD, Allplan, and the rest of the BIM-native tools.

You send the raw file. We parse it down to the same (plot, rooms, entrance, features) structured shape v1 used, then run the engine on that.

curl -X POST https://api.pixelapi.dev/v1/vastu/analyze-file \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -F "file=@floor-plan.dxf" \
  -F "facing=north"

Response is the same {score, bucket, summary, rule_counts, findings, parsed_plan} shape. The parsed_plan field is the JSON we extracted from your file — if our parser misread anything you can correct it and call /v1/vastu/analyze directly with the cleaned version. We've tried very hard to make this round-trip honest. There's a DXF parser visual sample on the site so you can see what the structured extraction looks like next to the source drawing. The rooms are auto-detected, labelled, and zone-tagged, with the 3×3 Vastu grid overlaid. No tool on the Indian market currently does this end-to-end.

I'll be honest about parser quality. DXF and IFC are deterministic — those work cleanly because the geometry and labels are right there in the file. PDF is mostly fine when the CAD tool exported with text labels intact. Phone photos of hand-drawn plans are the hardest case; the OCR + segmentation gets the room rectangles right roughly four out of five times, and gets the labels right in maybe seventy percent of cases. If a label is missed, the room defaults to the centroid-zone-only check, which still works for the structural rules. We're improving this as we collect more in-the-wild examples.

2. The corrected DXF — actually edits the plan, not just colour-codes it

In v1, the DXF export was a visual compliance report. Same room layout you submitted, with the failing rooms outlined in red. Useful for a meeting, not useful for the next step in the workflow.

POST /v1/vastu/correct-dxf is different. It takes the same input shape, runs the rules engine, and re-positions the rooms that violated critical or high-severity rules. The kitchen ends up in the southeast, the pooja room in the northeast, the master bedroom in the southwest, and so on — all using the same room dimensions you supplied, snapped to a one-foot grid so wall lengths stay clean.

The response is a fresh DXF that opens in AutoCAD with the same R2010 format and the same layer structure as v1's read-only export, but the ROOMS_PASS / ROOMS_WARN / ROOMS_FAIL layers now contain the corrected layout. There's also a CHANGES layer with arrows and notes showing what was moved and why, so the architect can present the diff to the client without re-drawing anything.

Tested round-trip: open in AutoCAD 2020, BricsCAD V24, and DraftSight 2024 — all three render the layers correctly and let you edit. We assert in the test suite that every output room polyline closes, every label sits inside its room, and the score of the corrected layout is at least 85 (out of 100) for the standard residential rule set.

This is the part of the API that does not exist anywhere else. Sailyajit and Anant Vastu both do good work at the human-architect level, but they are services — you send them a plan, you wait days, you get a redrawn plan back. We are doing the redraw on the same call, in seconds.

3. Photoreal Indian-residential renders, per Vastu zone

Once the rules engine has decided where each room should be, the next question every client asks is what does it look like. We've shipped a render endpoint that produces photoreal images of each room in the corrected plan, in an Indian-residential aesthetic — wood furniture, warm lighting, traditional accents where the rule set calls for them (a brass diya in the pooja room, for instance).

POST /v1/vastu/render-photoreal takes the parsed-plan JSON and returns a job ID. The job runs an AI image generation pipeline per room, room-aware (kitchen scenes for the kitchen, master-bedroom scenes for the bedroom), zone-aware (NE rooms get more daylight, SW rooms get warmer late-afternoon tones). The output is a set of eight 1024×1024 images plus a 4×2 grid mosaic plus a stitched preview reel.

Sample 4×2 grid output from a real job — the same job's stitched preview reel.

Render time is roughly forty seconds per room on our hosted GPU pool, run in parallel — so a full 8-room set lands in about a minute. You poll /v1/vastu/render-photoreal/{job_id} for the job status; the response carries individual room URLs the moment each one finishes, so you can stream them into a UI without waiting for the full set.

This is the tier the photographer-and-stylist-on-retainer studios charge ₹15,000 to ₹40,000 for, per project.

4. 3D walkthrough preview — beta tier, ₹599

The walkthrough is the new, less-finished tier. We are calling it preview because that's what it is — it's a 3D Eevee-rendered walkthrough at roughly forty seconds, generated automatically from the same parsed plan, with simple wall meshes, generic furniture, and stock lighting. Sample walkthrough MP4 — also a Cycles-rendered hero frame so you can see the upper end of what the same pipeline produces with more compute.

The preview tier renders in Eevee for speed and costs ₹599 per walkthrough. The premium tier (Cycles, full ray-traced, 1080p, branded title cards) costs ₹1,999 and is part of the photoreal bundle.

The walkthrough is honestly more useful as a sales tool than a design tool. The geometry is correct (the rules engine guarantees that), but the textures and furniture are generic. If your customer needs to see the corrected plan to sign off, this works. If your customer needs the walkthrough to feel like a real interior render, use the photoreal grid plus your own walkthrough tool. Lumion, Twinmotion, D5 — all of those still beat us on textured walkthroughs and probably will for a while. We are not chasing them.

Pricing

Tier	What you get	Price
Vastu validation	Per-rule report + score on uploaded file	₹599 per check
Validation + corrected DXF	Above, plus the auto-corrected AutoCAD file	₹1,499 per check
3D walkthrough preview	Eevee 40s walkthrough of corrected plan	₹599 per walkthrough
Photoreal render set	8 photoreal room images + grid + stitched reel	₹1,999 per set
Architect bundle	Unlimited validations + 50 corrected DXFs/mo + 50 photoreal sets/mo	₹2,999/mo
Consumer bundle	5 validations/mo + 2 photoreal sets/mo	₹999/mo

How this compares to the alternatives

Service	What they offer	Cost	Turnaround
PixelAPI Vastu API v2	File upload, rules engine, corrected DXF, photoreal renders	₹599 — ₹2,999/mo	seconds
Anant Vastu (consultation)	Human Vastu consultant report	~₹10,000 per project	5–7 days
Sailyajit (consultation)	Human consultant + redrawn plan	₹15–25 per sq.ft.	10–15 days
AppliedVastu (online)	Rule-based scoring, human-backed corrections	~₹1,200 per check	hours to days
Lumion / Twinmotion (render)	3D walkthrough authoring tool	₹50,000+ per year (license)	self-driven

We are the only end-to-end API on the list. Lumion is a desktop authoring tool — beautiful renders, but you draw the plan yourself. The three Vastu services are human-backed, which means quality is high but turnaround is days. We are seconds-to-minutes, fully automated, and priced for both per-call use (architects who only need one or two checks) and monthly bundles (architects who need a steady stream).

What the API does not do, honestly

A few things are deliberately out of scope, or in beta:

Multi-floor plans — the engine handles a single floor. Stairs are tagged but not routed across floors. Fix coming.
Plot orientations beyond N/E/S/W — the rules engine handles the four cardinal facings cleanly. NE/SE/SW/NW facings are accepted but the reports use approximations. A more rigorous treatment is in the rule set we are reviewing with our consulting Vastu acharya.
Non-residential — commercial Vastu has a partially-different rule set (cash drawer in the north, owner's seat facing east, etc.). We have those as a separate rule pack, not yet exposed through the API. Drop a note to support if you want early access.
Walkthrough textures — preview-tier only right now. Premium textured walkthroughs are coming with the Q3 milestone.

Everything in this list is actively worked on, with a target ship date in the dashboard roadmap.

Try it

If you have a floor plan as a DXF, IFC, PDF, or even a phone photo, go to pixelapi.dev/app and drop the file into the Vastu tool. The free-during-beta tier covers the validation report; the corrected DXF and photoreal renders are credit-paid but cheap enough to try.

Full API reference: pixelapi.dev/docs/vastu.html. Step-by-step tutorial with curl/Python/JS: pixelapi.dev/tutorials/vastu-compliance.html.

If your firm already has a stack of past plans you want bulk-checked, email support@pixelapi.dev and we will run them through and send back a CSV of scores. No charge for the first 25 — we are still hungry for feedback on parser failure modes.

AI color grading — cinematic LUTs and mood presets via one API call

Om Prakash — Fri, 08 May 2026 10:01:15 +0000

AI color grading — cinematic LUTs and mood presets via one API call

Color is the difference between a product photo that converts and one that doesn't, between a hero image that feels premium and one that feels like a stock thumbnail. Most teams either pay a colorist, fight Lightroom presets that don't quite match, or ship inconsistent imagery and hope nobody notices. We built color-grade so that "make this look like a brand asset" is a single HTTP call.

What it does

POST /v1/image/color-grade takes any public image URL and returns the same image with a coherent color treatment baked in. You pick a preset — cinematic, vintage, warm, cool, brand, or custom — and a grading strength between 0 and 1, and we handle the curves, the channel mixing, the highlight rolloff and the shadow tinting that normally lives behind a colorist's panel of sliders.

The three request fields are intentionally small:

image_url — public URL of the source image (required).
preset — one of cinematic, vintage, warm, cool, brand, custom (required).
intensity — a float from 0.0 to 1.0 controlling how aggressive the grade is. Defaults to 0.7, which is what we found most teams reach for.

The output is a fully graded image you can drop straight into a CDN, a product page, a social card, or a CMS. There is no "preview" / "render" two-step. You send one request, you get one image back, and you pay 6 credits for it.

The custom preset is the part most people care about once they're past the demo stage. It accepts your own LUT or palette so you can encode a brand book — the exact teal-and-amber your design team agreed on six months ago — into a reusable preset and stop hand-grading every catalogue refresh.

Why we built it

If you've ever tried to keep imagery consistent across a real product, you already know the failure mode. Photographers shoot in slightly different lighting. UGC comes in from forty different phones. Marketing pulls a hero asset from a Drive folder that hasn't been touched since last quarter. Each image, individually, is fine. Together they look like four different brands stapled into one storefront.

The existing options for fixing this are all bad in different ways:

Manual color work. A colorist or a designer in Lightroom is precise but doesn't scale. Five hundred SKUs is a week of clicking. Five thousand is a hire.
Generic Instagram-style filters. They scale fine, but they're tone-deaf to the source image. A "warm" filter over a product shot that's already warm just blows it out.
Roll-your-own pipeline. Pillow, OpenCV, a stack of curve adjustments, a junior engineer learning what "lift gamma gain" means on the job. Six weeks later you have a service that mostly works on your test set and falls over on edge cases.

Our angle: a purpose-built grading model behind a single endpoint, with a small, opinionated set of presets that cover the looks people actually ship, and a custom escape hatch for teams with a real brand spec. No subscription tier for "advanced curves," no rendering queue, no client-side WebGL hacks. One POST, one image back.

The differentiator we care about most is the price. At 6 credits per call, color grading a 1,000-image catalogue is a rounding error on your invoice, which is the only way this kind of feature actually gets used in production rather than reserved for hero assets.

Quickstart

Grab an API key from the dashboard, then hit the endpoint directly:

curl -X POST https://api.pixelapi.dev/v1/image/color-grade \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg", "preset": "cinematic", "intensity": 0.7}'

That's the entire surface area. No multipart upload, no signed URL dance, no client SDK to install before you can see a result.

The Python equivalent using requests:

import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://api.pixelapi.dev/v1/image/color-grade"

payload = {
    "image_url": "https://example.com/source.jpg",
    "preset": "cinematic",
    "intensity": 0.7,
}

response = requests.post(
    ENDPOINT,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)

response.raise_for_status()
result = response.json()
print(result)

A few practical notes from how teams have been integrating it:

Keep intensity around 0.6–0.75 for product imagery. The default of 0.7 is usually right; below 0.5 you stop seeing the grade, above 0.85 you start crushing skin tones on people shots.
For batch work, parallelise at the HTTP level. Each call is independent, so a thread pool of 8–16 workers will saturate most catalogue jobs without any extra plumbing.
If you're calling this from a web app, do it server-side. The API key shouldn't ship to the browser, and you usually want to write the result to your own storage anyway rather than hot-linking the response URL forever.

Use cases

Normalise a 1,000-product catalogue to a single brand palette

This is the one we hear about most often. An e-commerce team has SKUs shot over two years by three different photographers, each with their own white balance habits. The site looks fine if you only see one product at a time, but the category grid is a mess of warm-leaning leather goods next to cool-leaning ones next to whatever the iPhone shots ended up looking like. With the brand preset — or custom if you've supplied your own LUT — you point a script at your image bucket, fire one call per asset, and write the graded versions back. A 1,000-product backfill is around 6,000 credits and finishes faster than the meeting where someone proposes hiring a retoucher. From then on, the grade lives in your image pipeline: every new upload goes through the same call before it hits the CDN, and the catalogue stays visually coherent without anyone thinking about it.

Apply a vintage tone-curve to user uploads in a photo app

If you're building any kind of consumer photo product — a journal app, a social network, a print-on-demand service — you've probably looked at adding "filters" and decided it was a six-month project nobody wanted to own. The vintage preset gives you that feature in an afternoon. User uploads an image, your backend forwards it to color-grade with preset: "vintage" and an intensity you let the user nudge with a single slider, and you get back a treated image to display or save. Because each call is 6 credits, you can offer the feature on a free tier without it eating your margins, and the intensity knob means the same preset feels different at 0.3 than at 0.9, which keeps the UI from feeling one-note even with a small preset list.

Pre-grade frames before social-media export

Social teams live and die by visual consistency across a feed, and the awkward truth is that most "social-ready" assets are just whatever the design team had time to grade last week. Wire color-grade into your export step — when a designer or a marketer publishes an asset to the social pipeline, the export script runs it through cinematic or warm (whatever your feed's voice is) at a fixed intensity before it hits Buffer / Hootsuite / your own scheduler. Now every post in the feed shares a grade, even when the source images come from completely different shoots. The team stops re-grading the same image three times for three platforms, and the feed actually looks like one brand instead of a Pinterest board.

Pricing

Straightforward, no tiers, no per-feature gating:

Credits per call: 6
INR price: ₹0.004 per call
USD price: $0.00005 per call

That's the whole table. The same price applies whether you're calling cinematic on a single hero image or running custom across a five-thousand-product catalogue. Sub-10 credits per call is what makes this actually usable for catalogue-scale work — you don't have to ration calls or build a tier system around "premium" assets, you just grade everything and move on.

A worked example for the catalogue case: 1,000 images at 6 credits each is 6,000 credits, which at ₹0.004 per call comes to ₹4 for the entire backfill. At $0.00005 per call, the same 1,000-image run is $0.05. That is genuinely the right order of magnitude for a feature you want to leave on by default rather than reserve for special occasions.

If you're integrating this into a free-tier consumer product, the math also works the other direction: a user who grades 20 photos in a session costs you 120 credits, which is small enough that you don't need to put a paywall in front of the feature to keep your unit economics sane.

Credits roll up across all PixelAPI endpoints, so if you're already using other tools on the platform, color-grade slots into the same credit pool — no separate billing, no separate dashboard, no separate key.

Try it

The fastest path: grab a key, paste the curl above, swap in a real image URL, and look at the result.

Dashboard / get an API key: https://pixelapi.dev/dashboard
Docs: https://pixelapi.dev/docs

A reasonable first 30 minutes with the API:

Run the curl on a single product shot you already have. Note where it lands at intensity: 0.7.
Re-run with cinematic, vintage, warm, cool in turn, same intensity, same image. This is the cheapest way to figure out which preset matches your brand voice without reading marketing copy about each one.
Pick the preset that fits, then sweep intensity from 0.3 to 0.9 in steps of 0.2. You'll feel the right number for your imagery within five calls.
If none of the built-in presets match, switch to custom and wire in your LUT. This is the path most teams end up on once they're past evaluation, because a real brand spec rarely matches a generic preset perfectly.
Drop the call into your upload pipeline, your export step, or your catalogue backfill script. That's the integration.

If you hit anything weird — an image that grades in a way you didn't expect, a preset that feels wrong on a specific kind of source, an intensity value that misbehaves — tell us. The preset list is small on purpose, but it's also not frozen, and "this preset doesn't cover the look I'm trying to ship" is exactly the feedback that drives what we add next.

Ship the grade. Stop hand-curving every asset.

Vastu Compliance API — score a floor plan and get an AutoCAD DXF on the same call

Om Prakash — Fri, 08 May 2026 07:33:36 +0000

Vastu Compliance API — score a floor plan and get an AutoCAD DXF on the same call

Most architects in India hit the same conversation on every residential project. The floor plan is laid out for light, plumbing runs, parking. Then the client (or the client's family) asks the question: is this Vastu compliant? What follows is usually a couple of hours of someone pointing at a printed plan, a pencil, and a polite negotiation. There are very few tools that let you check this programmatically — most of what's online is "name your facing direction and we'll tell you whether to put the kitchen in the south-east", which is fine if you only own one room and you've also forgotten which side of the plot you're standing on.

We just shipped something more useful. POST /v1/vastu/analyze takes a structured description of your floor plan and returns a per-rule report against twenty-one traditional Vastu Shastra rules. POST /v1/vastu/export-dxf takes the same payload and returns an AutoCAD-openable DXF file with the rooms colour-coded by compliance, the 3×3 zone grid drawn with each cell labelled in cardinal + Sanskrit names (NE / Ishanya, SE / Agneya, SW / Nairutya, NW / Vayavya, plus the central Brahmasthan), and a printable Issues + Recommendations block in the right margin. Both endpoints are pure CPU. No GPU credits.

Why this is different

There's a reason most of the existing Vastu "tools" online are toys. Real Vastu rules are about positions, not labels. Saying "kitchen south-east" is not the same as saying "the centroid of the kitchen rectangle is in the bottom-right ninth of the plot, and that ninth doesn't overlap the central Brahmasthan cell." If you want output that can stand up to an architect's scrutiny, you have to do the geometry.

This API does the geometry. You give it a plot rectangle (width × depth in feet), a list of rooms as (name, x, y, w, h) rectangles with the SW corner as origin, and optional points for the main entrance, water tank, and septic tank. The engine builds the 3×3 zone grid, figures out which zone each room's centroid sits in, runs every rule, and returns a severity-weighted score on 0–100.

Each rule has a severity (critical / high / medium / low) that determines its weight in the final score. Putting a kitchen in the NE zone is a critical fail — it directly contradicts the Ishanya zone's water-and-light meaning. Putting the dining room in the south is a low-severity warning — it's not where you'd ideally place it, but it doesn't break anything.

The full rule list, with rule_id, severity, and one-line description, is on the docs page. The whole engine is also pure-Python with zero non-stdlib dependencies, which means it's deterministic — same input, same output, no model variance. Forty-six unit tests cover the zone math, every individual rule, the input parser, and the DXF exporter's structural validity. Every deploy must pass all forty-six.

What you submit

A single JSON object. All distances in feet. Origin at the south-west corner of the plot.

{
  "facing": "north",
  "plot": {"width_ft": 40, "depth_ft": 60},
  "rooms": [
    {"name": "kitchen",        "x": 30, "y": 3,  "w": 8,  "h": 12},
    {"name": "master_bedroom", "x": 2,  "y": 3,  "w": 13, "h": 12},
    {"name": "bedroom",        "x": 2,  "y": 40, "w": 13, "h": 15},
    {"name": "pooja_room",     "x": 32, "y": 48, "w": 6,  "h": 8},
    {"name": "toilet",         "x": 2,  "y": 50, "w": 6,  "h": 6},
    {"name": "living_room",    "x": 18, "y": 40, "w": 18, "h": 15}
  ],
  "main_entrance": {"x": 20, "y": 58},
  "features": {"water_tank": {"x": 36, "y": 56}}
}

The room names that map to specific rules: kitchen, dining_room, living_room, drawing_room, master_bedroom, bedroom, children_bedroom, guest_room, bathroom, toilet, pooja_room, prayer_room, study, office, home_office, storage, store_room, garage, staircase. Anything else is accepted but won't trigger zone-specific rules — it'll still count toward the centre-of-plot occupancy check.

What you get back

curl -X POST https://api.pixelapi.dev/v1/vastu/analyze \
  -H "Content-Type: application/json" \
  -d @my-plan.json

{
  "score": 100.0,
  "bucket": "excellent",
  "summary": "Strong Vastu alignment overall.",
  "rule_counts": {"pass": 14, "fail": 0, "warning": 0, "na": 7},
  "findings": [
    {
      "rule_id": "kitchen_se",
      "name": "Kitchen in Southeast (Agneya)",
      "status": "pass",
      "severity": "high",
      "detail": "Kitchen is in SE — ideal Agneya placement (fire element).",
      "suggestion": ""
    },
    {
      "rule_id": "kitchen_not_ne",
      "name": "Kitchen NOT in Northeast",
      "status": "pass",
      "severity": "critical",
      "detail": "Kitchen is in SE, not NE — good.",
      "suggestion": ""
    }
    // ... 19 more entries
  ]
}

The interesting field is findings — one entry per rule. When a rule fails or warns, the suggestion field carries a concrete fix:

Kitchen is in NW — acceptable alternative if SE not feasible.
suggestion: Move kitchen to SE if you can; NW is the second-best option.

Use this directly in a UI. Loop over findings, group by status, render the failures in red with their suggestion in italics underneath. The compliance review writes itself.

The DXF — the part that makes this useful for actual architects

If you want the same layout as a printable, editable plan, hit /v1/vastu/export-dxf with the same payload and you get back the file. R2010 format, decimal feet ($INSUNITS=2), opens in AutoCAD, DraftSight, LibreCAD, BricsCAD, and any other DXF-compatible CAD tool.

curl -X POST https://api.pixelapi.dev/v1/vastu/export-dxf \
  -H "Content-Type: application/json" \
  -d @my-plan.json \
  -o my-house.dxf

The DXF carries a sensible layer structure so you can toggle bits on and off in the Layer Manager:

PLOT — outer plot boundary.
VASTU_ZONES — 3×3 dashed zone grid.
ZONE_LABELS — each cell labelled with cardinal + Sanskrit name.
ROOMS_PASS (green), ROOMS_WARN (yellow), ROOMS_FAIL (red), ROOMS_NA (white) — colour-coded room polylines so a glance tells you what to fix.
ROOM_LABELS — name + zone tag + status per room.
ENTRANCE — triangle marker pointing inward from the wall the door is on.
FEATURES — water tank and septic tank as labelled circles.
REPORT — score, bucket, Issues list, Recommendations list, in the right margin so the compliance summary prints on the same sheet.

I wrote tests for this part. Round-trip the DXF (write → read back → write again) and you should get an equivalent file with the same layers and same entity counts. We assert that. We also assert that every room polyline's bounding box matches the input Room rectangle within one foot — so the geometry actually round-trips.

Why we're shipping the rules engine separately from a floor-plan generator

If you've been following the Indian architecture-AI space, you know there are tools that promise to generate floor plans from text prompts. Most of them are wrappers around image diffusion models. They produce something that looks like a floor plan from across the room, but the labels are gibberish, the walls aren't to scale, and you can't export to DXF because there's no underlying vector representation — it's pixels all the way down.

We had a go at that earlier this month using one of the popular image generators. The output was, predictably, a watercolour render of a floor plan with garbled annotations. Beautiful for a marketing slide, useless for an architect.

So we did the boring thing instead: we wrote a real rules engine that takes structured input. It scores layouts deterministically, and it emits real CAD geometry. If you want to pair it with a generator, you can — anything that can output rectangles works as input. RPLAN-CGAN, HouseGAN++, your own CSV, your own pencil-on-graph-paper sketch transcribed by hand. The compliance check is the same.

Image-to-layout extraction (so you can upload a JPEG of a hand-drawn plan and have it parsed into the JSON shape) is on the roadmap. That's a separate piece — OCR plus room segmentation. We'll ship it when the parsing is reliable. In the meantime, structured input is what you'd be feeding any generator anyway.

Pricing

Free during beta. The rules engine is pure Python doing some arithmetic on rectangles — it costs nothing to run, so there's no credit deduction. Rate-limited to 30 requests / minute / IP, which is more than any reasonable architect's working session needs.

If you build something on top of it, please do let us know — support@pixelapi.dev. We'd especially like to hear about Vastu rules we missed (the engine is currently at twenty-one but the tradition has hundreds, prioritised differently across regions). Adding a rule is twenty lines of Python and a unit test.

Try it without writing code

Sign in to your dashboard at pixelapi.dev and click 🪔 Vastu Compliance in the left sidebar. There's a small editor with a sample 2BHK payload, an Analyze button that prints the per-rule findings inline, and a Download DXF button that gives you a fresh AutoCAD-openable file. No API key needed when you're using the dashboard.

Full API reference: pixelapi.dev/docs/vastu.html.
Step-by-step tutorial with curl/Python/JS: pixelapi.dev/tutorials/vastu-compliance.html.

Photo to 3D-render video in one API call: meet Lensora Studio

Om Prakash — Fri, 08 May 2026 05:29:25 +0000

A single photo goes in. An MP4 of the subject as a real 3D object — turning, dollying, or sweeping past the camera on a brand-new background — comes out. Two HTTP calls. Eighty credits. About four minutes of wall-clock.

That is the brief for Lensora Studio, the newest endpoint on PixelAPI. This post walks through what it does, the design choices behind it, and the slab-shaped detour we took to get the 3D step right.

A real Rolleiflex photo went in. This is one frame of the turntable MP4 that came out — the kitchen background was generated from a one-line prompt.

What it does, end to end

You hand the API a photo. It does four things back to back:

Detect. Object detection returns up to eight foreground proposals — bounding box, label, category — so a user can pick which thing to transform. Useful for messy frames, packshots that include props, or detection over-segmenting a logo into pieces.
Cut and rebackground. The chosen subject is segmented, and you choose what sits behind it: leave it transparent, drop in your own backplate URL, or describe the scene in plain English ("on a marble countertop with soft natural light").
3D. The cropped subject is rebuilt as a full 360° mesh with PBR textures. Real volume, real depth — not a flat plate that pretends to rotate.
Render. The mesh is composited over your new background and rendered as a 24 fps MP4 from one of three camera moves: turntable (full 360°), dolly (straight zoom-in), or cinematic (180° arc with depth-of-field).

You get back four artifacts every time: the hero MP4, a downloadable GLB you can drop into Blender / Unity / Three.js, a static composited still, and the alpha-cutout PNG.

The two-call shape

Step one is a multipart upload that returns object proposals plus a session_id:

curl -X POST https://api.pixelapi.dev/v1/studio/init \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -F "image=@product.jpg"

{
  "session_id": "2a91884c-...",
  "objects": [
    {"label": "vintage twin-lens reflex camera", "category": "product", "bbox": [0.18, 0.12, 0.79, 0.93]},
    {"label": "entire image (no crop)", "category": "full_frame", "bbox": [0.0, 0.0, 1.0, 1.0]}
  ],
  "credits_used": 5
}

Step two picks an object, picks a background, picks a camera, and returns a job_id immediately while the pipeline runs in the background:

curl -X POST https://api.pixelapi.dev/v1/studio/transform \
  -H "Authorization: Bearer $PIXELAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "2a91884c-...",
    "object_index": 0,
    "background": {"type": "prompt", "prompt": "on a marble countertop with soft natural light"},
    "camera_preset": "cinematic"
  }'

You poll /v1/studio/result/{job_id} every few seconds. The step field walks through cropping → removing-bg → generating-bg → compositing → generating-3d → rendering-video → done so you can show real progress in your UI.

The full Python example is in the docs — sub-fifty lines including the polling loop and the GLB download.

The slab problem

Here is the part that ate two days.

The first version of the 3D step worked fine on the simple smoke tests we had. The output looked solid in catalog-style shots — clean object, isolated against a backdrop. So we shipped the canary and ran an end-to-end test on a Rolleiflex camera photo we had been using as a reference image for half a year.

The turntable opened on the front of the camera. Beautiful. Then it rotated 90°. And we saw a sliver. A thin sliver — barely visible at this angle. The model was a slab.

We measured. The thin axis of the bounding box was 15.7% of the longest axis. For a Rolleiflex — a roughly cube-shaped object that should be near 1:1:1 — that is a flat pancake.

The catch: from the front it looked perfect. The model had taken the input photo and built something that was mostly an extruded postcard. Texture was sharp on the front face, geometry was almost zero on the others. Three of our camera presets — turntable, dolly, cinematic — would all eventually expose the slab back or edge-on. We were going to ship a beautiful product gallery for thirty seconds, then a five-minute argument with a customer.

So we did the thing we did not want to do. We swapped the 3D engine for one that uses sparse-structure flow over a 3D occupancy grid instead of single-image-from-front extrusion. Re-validated on the same canary:

Old engine, thin axis: 15.7% of longest. Slab.
New engine, thin axis: 59.9% of longest. Real 3D.

That is a 3.8× depth recovery. Above 40% — our internal threshold for "this is a 3D shape, not a 3D image." And visually unmistakable:

Same camera, side-on profile. Real depth, side controls visible, no slab artifact. This is the cinematic preset mid-arc.

Auto-orient and arc-clamp

We added one more thing for safety. Before each render, the renderer now measures the bounding-box extents of the mesh and rotates it so the largest face points square to the camera at angle 0. This means:

The dolly preset never goes edge-on regardless of how the source photo was framed.
The turntable preset always opens on the hero face, then sweeps around.
The cinematic preset gets a clean front-on opening shot before the arc.

And as a defensive belt: if any future input ever produces a thin-axis ratio under 30% despite the new 3D engine, the renderer falls back to a ±60° rocking arc instead of a full sweep, so a slab — if one ever sneaks back in — can never be visible from a bad angle.

The auto-orient pass costs us nothing — it's a single 4×4 transform on the mesh — and it papers over a class of bugs we'd otherwise have to debug per-input.

Pricing

Step	Credits	USD
`/v1/studio/init` (detect)	5	$0.005
`/v1/studio/transform` (full pipeline)	75	$0.075
End-to-end	80	$0.08

No subscription. The transform credits are auto-refunded on any failure or timeout in the pipeline.

When to use this

Lensora Studio is a fit when:

You have product photos and you want catalog rotation videos without Blender.
You're building an e-commerce listing tool and rotation MP4s lift conversion.
You want a quick "turn this into a 3D-render ad" button and don't want to chain four separate APIs yourself.
You're prototyping AR previews and need GLB files alongside an MP4 thumbnail.

It is not a fit when:

The subject is a person, an animal, or flat-lay clothing. The 3D engine needs a discrete, rigid object — packshots, tools, electronics, accessories work great. People and pets do not, in this version.
You need 4K. Output is 768×768 today. Higher resolutions are on the roadmap.
You want a 30-second video. Each preset is 4 seconds at 24 fps. You can chain renders for longer pieces, but the unit deliverable is short.

Try it

Browser tool: pixelapi.dev/tools/lensora-studio
API docs: pixelapi.dev/docs/lensora-studio.html
Sign up: pixelapi.dev — free credits to try the flow end-to-end.

If you hit edge cases (interesting geometry, unusual subjects, a corner case our slab-detector missed), I'd love to hear about them. The canary that exposed the original slab was a Rolleiflex sitting on a desk for unrelated reasons — sometimes the bug only shows up on the photo you weren't expecting to test against.

Image Captioning API: Auto-Generate Alt Text and Descriptions

Om Prakash — Thu, 07 May 2026 10:02:29 +0000

Image Captioning API: Auto-Generate Alt Text and Descriptions

Most product catalogues, content feeds, and media libraries have one quiet shame: thousands of images with empty alt="" attributes, no search metadata, and no human-readable description. Writing them by hand does not scale. Here is the endpoint that makes that problem go away.

Today we are launching POST /v1/image/caption — a single endpoint that turns any public image URL into a caption tuned for the job you actually have: accessibility alt-tags, SEO descriptions, or full paragraph-length narration.

What it does

Send a public image URL. Get back text. That is the whole shape of the API.

What makes it useful is the style parameter. The same endpoint can produce three very different outputs from the same image, depending on what you are building:

concise (default) — tight, alt-text-shaped output. The kind of single sentence you want sitting inside an alt attribute. Screen-reader friendly, no fluff, no marketing language.
seo — keyword-rich descriptions written for indexing. Think product page meta-descriptions, image search optimisation, structured-data fields. Longer than alt text, denser with terms a search engine actually picks up.
detailed — paragraph-length narration. For accessibility contexts where someone genuinely needs to understand the image, not just identify it. Also useful when you want a content-moderation reviewer to triage uploads without opening every preview.

The full request shape is small:

Field	Required	Default	Notes
`image_url`	yes	—	Public URL of the image
`style`	no	`concise`	One of `concise`, `detailed`, `seo`
`max_tokens`	no	`64`	Length cap, range 32–256

That is the entire surface. No model selection, no temperature knobs, no system-prompt plumbing. You picked an endpoint called "caption" — we figured the job out so you do not have to.

max_tokens is a hard ceiling, not a target. A concise request with max_tokens: 256 will not waste tokens producing fluff; the style governs length, the cap is just there to protect you from runaway output in edge cases (extremely busy images, weird aspect ratios, etc.).

Why we built it

Image captioning is one of those features where the gap between "I could build this in a weekend" and "I have a production-grade pipeline serving 10k catalogue images a night" is enormous.

The weekend version is a Jupyter notebook calling someone's hosted model. It works on the demo image. It falls over the moment you point it at a real product feed: the URLs are sometimes 404, sometimes a redirect, sometimes a 50MB raw camera dump. Half the captions sound like art-gallery placards ("a serene composition featuring…") when you wanted alt text. The other half are three words long when you wanted a paragraph. You spend a week writing prompt scaffolding, retry logic, content filters, and length normalisation, and you still have not shipped the actual feature you set out to build.

The production version requires a team. We have already built that team's worth of work. The API in front of it is what we are shipping.

The other thing we kept hearing: one caption style does not fit every job. The alt-text you want on <img> tags is not the description you want in your og:image meta tag, and neither of those is what you want feeding into a moderation reviewer queue. Most APIs make you pick one shape and live with it, then post-process the output to fake the other two. We exposed the three styles directly because that is how the work actually splits in real codebases.

The angle, in plain terms: a purpose-built captioning endpoint, three styles in one call, priced flat-rate per request, no token accounting on your side.

Quickstart

Get an API key from the dashboard, then:

curl -X POST https://api.pixelapi.dev/v1/image/caption \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg", "style": "seo"}'

The response is JSON with a caption string. That is it.

Same call in Python with requests:

import requests

response = requests.post(
    "https://api.pixelapi.dev/v1/image/caption",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/source.jpg",
        "style": "seo",
        "max_tokens": 128,
    },
    timeout=30,
)

response.raise_for_status()
caption = response.json()
print(caption)

A few notes worth flagging up front:

The image_url must be publicly reachable. If your images live behind a signed-URL CDN, generate a short-lived signed URL and pass that. We fetch the bytes server-side; we cannot reach a URL that requires your session cookie.
style is the lever you reach for first. If a caption feels off, change the style before you start fiddling with max_tokens. The style governs register (alt-text voice vs. SEO voice vs. narration voice); the token cap only governs length.
Set a sensible client timeout. 30 seconds is comfortable for a single call; if you are batching, see the use-case section below for parallelism guidance.

If you are wiring this into a job queue, treat each call as independent. There is no statefulness across requests — captioning image A does not influence the caption for image B. That makes parallelism trivial: fan out as many concurrent requests as your account's rate limit allows.

Use cases

Bulk-generate alt text for an ecommerce catalogue (10k+ products)

This is the canonical job. You have a product database, every row has one or more image URLs, and the alt_text column is either empty or full of the SKU. Lighthouse is screaming. Your accessibility consultant has filed a report. Someone on the marketing side has noticed that Google Image search is sending you nothing.

The shape of the migration is straightforward: read product images out of the database in batches, fan out parallel calls to /v1/image/caption with style: "concise" for the on-page alt attribute, then a second pass with style: "seo" for the meta-description and structured-data fields. Two calls per product, sixteen credits, and the entire 10k catalogue gets done overnight while you sleep. The next morning you re-run Lighthouse and the accessibility score has moved 20+ points without anyone writing a single sentence by hand. The SEO improvement is harder to measure on day one but shows up in the indexing reports a few weeks later, when image-search referrals start trending up. This is the use case that pays for the integration on its own.

Auto-tag user uploads in an image-sharing app for search

User-generated content has a tagging problem: nobody tags their own uploads properly. They write "IMG_4823" as the title, no description, no tags, and then complain that search inside your app does not work. You cannot force users to write metadata — you have tried, the conversion drops every time you add a required field.

Solve it on the server. When an upload finishes, kick off a background job that hits /v1/image/caption with style: "detailed" to get a full sentence or two describing what is in the image. Index that text into your existing search engine — Postgres full-text, Meilisearch, Elastic, whatever you already have. Now "sunset over a mountain lake" finds the user upload that the user titled "IMG_4823.jpg". The user did nothing extra; their content became searchable. You can also display the detailed caption as default alt-text on the image so screen-reader users get a real description, not a filename. One API call, two product wins.

Build a moderation queue with auto-described preview labels

If you run a moderation queue, you know the bottleneck: a human reviewer has to open each preview, register what is in it, decide, and move on. The "register what is in it" step is where seconds add up across thousands of items. Anything that can be turned into a one-line label before the reviewer's eyes land on it is a direct throughput win.

Hit /v1/image/caption with style: "concise" on every flagged upload as it enters the queue. The caption becomes a sortable, filterable label next to the thumbnail. Reviewers can now scan the queue text-first and only open previews where the caption is ambiguous or the policy call is genuinely hard. You are not replacing the human — image moderation is a human-judgement job and should stay that way — you are removing the obvious cases from their cognitive load. Reviewers go faster, the queue burns down sooner, and the borderline cases get more attention because the easy ones got triaged on the way in.

Pricing

Flat rate per call. No token accounting on your side, no surprise overages on long detailed captions vs. short alt-text — every call costs the same regardless of which style you pick.

Credits per call: 8
Price (INR): ₹0.0054 per call
Price (USD): $0.00007 per call

To put that in catalogue terms: a 10,000-image alt-text run costs roughly ₹54 / $0.70. Two passes (one concise for alt-text, one seo for meta-descriptions) on the same catalogue is roughly ₹108 / $1.40. That is comfortably below the cost of a single hour of an engineer writing alt text by hand, for the entire catalogue.

A 100k-image media library indexed for natural-language search costs roughly ₹540 / $7.00 as a one-time pass, plus whatever your incremental ingestion rate adds — usually rounding error.

There is no minimum, no monthly commitment, no per-seat pricing. Top up credits, make calls, the meter runs down by 8 each request. If a request fails (we return a non-2xx), no credits are deducted.

Try it

Grab an API key and start captioning:

Dashboard: pixelapi.dev/dashboard — sign up, generate a key, top up credits.
Docs: pixelapi.dev/docs — full reference for /v1/image/caption including all request fields, response shape, error codes, and rate limits.

If you are migrating off a hand-rolled captioning script or an older general-purpose vision API, the easiest way to evaluate is to run a hundred of your real images through all three styles and eyeball the output. Pick the one that fits your surface, wire it in, ship it. The whole integration is a single HTTP call — there is genuinely nothing else to learn.

Build something useful with it. We will keep making the captions better underneath; you will not have to change a line of code when we do.