Om Prakash

Posted on • Originally published at pixelapi.dev

ControlNet API: pose, depth, and canny-locked image generation

If you've ever shipped a creative tool that needed "same pose, different outfit" or "same building, different style", you know the gap. Plain text-to-image gives you a coin flip on composition. Today we're closing that gap with a single endpoint that locks the bits you care about and regenerates the rest.

What it does

POST /v1/image/controlnet is guided image generation. You hand us a reference image and tell us which signal to extract from it — canny for edges, depth for 3D structure, openpose for human pose, or scribble for rough line drawings. We pull that signal, freeze it, and then generate a new image around it from your text prompt.

The practical upshot: the silhouette, stance, perspective, or line structure of your reference survives the round-trip. Everything else — colour, style, subject, background, lighting — is yours to direct with the prompt.

Five request fields you'll actually use:

  • image_url — the reference image we extract the control signal from. Required.
  • control_type — one of canny, depth, openpose, scribble. Required. Pick based on what you need preserved: edges, 3D layout, human pose, or rough lines respectively.
  • prompt — what you want the output to look like. Required.
  • negative_prompt — things to keep out of the frame. Defaults to empty.
  • strength — how strictly the output adheres to the control signal, on a 0.0–1.0 scale. Defaults to 0.8. Drop it to 0.5 if the control is fighting your prompt; push it to 0.95 if the output is drifting off-pose.

That's the whole API. No model selection, no scheduler tuning, no separate endpoints per control type. One call, one response.
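
For reference, a request body using every field might look like this (the URL and prompt text here are placeholders, not values from our docs; only image_url, control_type, and prompt are required):

{
  "image_url": "https://example.com/pose-reference.jpg",
  "control_type": "openpose",
  "prompt": "model in a red silk saree, Mumbai rooftop at golden hour",
  "negative_prompt": "blurry, watermark, extra limbs, low quality",
  "strength": 0.8
}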

Why we built it

Teams building on top of generative APIs kept telling us about the same two problems in the market.

Problem 1 — controls were unbundled and overpriced. Most rivals charge separately per control type, or they only ship canny edge control and call it a day. If your product needs both pose-locked photoshoots and depth-locked architectural renders, you end up integrating two APIs, juggling two sets of auth, and paying twice. That math gets ugly fast when you're shipping a feature, not running a research lab.

Problem 2 — text-to-image alone is a casino. When a designer hands a junior dev a brief that says "same pose as this reference, but put her in a saree, standing on a Mumbai rooftop", text-to-image is the wrong tool. You'll spend hours re-prompting and never quite land the pose. That's a workflow problem, not a creativity problem, and it deserves a workflow primitive.

Our angle: four control types under one endpoint, one auth header, one billing line. Pick the signal you need at request time. The plumbing — extracting the control map from your reference, conditioning generation on it, returning the final image — runs on our self-hosted infrastructure so you don't have to think about cold starts or queue depth.

We're shipping this because guided generation is the boring, load-bearing primitive that most production creative tools quietly need. You shouldn't have to stitch it together yourself.

Quickstart

Grab an API key from the dashboard. Then this curl works as-is — replace YOUR_API_KEY and the image_url and you're live:

curl -X POST https://api.pixelapi.dev/v1/image/controlnet \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg", "control_type": "canny", "prompt": "anime character"}'

The same call in Python, using requests:

import requests

response = requests.post(
    "https://api.pixelapi.dev/v1/image/controlnet",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/source.jpg",
        "control_type": "canny",
        "prompt": "anime character",
    },
    timeout=60,
)

response.raise_for_status()
data = response.json()
print(data)
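
The exact response schema and error codes live in the docs. Assuming the generated image comes back as a URL (the output_url field below is a guess for illustration, not the documented name), saving it to disk from the snippet above looks roughly like this:

# Continuing from the snippet above, so `requests` and `data` are in scope.
# NOTE: "output_url" is an assumed field name; check the ControlNet docs
# for the real response schema before shipping this.
output_url = data.get("output_url")
if output_url:
    image = requests.get(output_url, timeout=60)
    image.raise_for_status()
    with open("result.png", "wb") as f:
        f.write(image.content)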

A few things worth knowing before you wire this into a feature:

  • image_url must be publicly fetchable. If your reference images live in a private bucket, generate a short-lived signed URL before the call (there's a sketch of that right after this list). We pull the image server-side; localhost won't work.
  • Pick control_type based on what should survive. If the silhouette matters most, use canny. If the 3D layout matters (architecture, interiors, product staging), use depth. If you're locking a human stance, use openpose. If you're working from a rough sketch or storyboard panel, use scribble.
  • strength is your steering wheel. When prompt and reference are pulling in opposite directions, the output looks confused. Lower strength gives the prompt more room. Higher strength keeps the structure tight even when the prompt strays.
  • Use negative_prompt to clean up the long tail. "blurry, watermark, extra limbs, low quality" is a reasonable starter. Add domain-specific exclusions as you learn what your customers complain about.
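
For the private-bucket case, here's a minimal sketch of the signed-URL step, assuming the references live in S3 (the bucket and key are placeholders; any object store with pre-signed URLs works the same way):

import boto3
import requests

# Short-lived signed URL so the API can fetch the private object server-side.
s3 = boto3.client("s3")
signed_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-reference-bucket", "Key": "refs/source.jpg"},
    ExpiresIn=300,  # five minutes is plenty for a single generation call
)

response = requests.post(
    "https://api.pixelapi.dev/v1/image/controlnet",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"image_url": signed_url, "control_type": "canny", "prompt": "anime character"},
    timeout=60,
)
response.raise_for_status()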

A small tip from internal testing: when an output isn't landing, change strength before you re-roll the prompt. Most "this looks weird" results are strength being too high or too low for the prompt-reference combination, not a prompt-engineering problem.
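
To make that tip concrete, the cheapest debugging loop is a sweep over strength with the prompt held fixed. A sketch reusing the request shape from the Quickstart, with the reference URL and prompt as placeholders:

import requests

def generate(strength):
    # Same call as the Quickstart, with only strength varying.
    r = requests.post(
        "https://api.pixelapi.dev/v1/image/controlnet",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "image_url": "https://example.com/pose-ref.jpg",
            "control_type": "openpose",
            "prompt": "model in a red saree on a Mumbai rooftop",
            "strength": strength,
        },
        timeout=60,
    )
    r.raise_for_status()
    return r.json()

# Too low drifts off-pose, too high fights the prompt; eyeball the middle.
for strength in (0.5, 0.65, 0.8, 0.95):
    print(strength, generate(strength))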

Use cases

Pose-locked product photoshoots

A direct-to-consumer apparel team has 200 SKUs to shoot for the spring drop, one human model booked for a half-day, and a backlog of seasonal campaigns. With openpose, they shoot the model once in each of a dozen hero poses, then drive the catalogue from there. For every SKU they pass the reference image, set control_type to openpose, and prompt the new outfit and the new backdrop — Goa beach, Bandra rooftop, neutral studio grey. The model's stance and proportions stay locked. The dress, the lighting, and the location regenerate cleanly around it. What used to be a multi-day shoot turns into one shoot day plus an afternoon of API calls. The model gets paid once. The catalogue ships in a week.
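
In code, that catalogue run is a loop over SKUs against one pose reference. A sketch, with the SKU list and prompt template obviously being yours to define:

import requests

POSE_REF = "https://example.com/hero-pose-01.jpg"  # one shot of the model per hero pose
SKUS = [
    {"id": "SKU-041", "prompt": "model wearing a teal linen kurta, Goa beach at sunrise"},
    {"id": "SKU-042", "prompt": "model wearing a mustard silk saree, Bandra rooftop at dusk"},
]

for sku in SKUS:
    r = requests.post(
        "https://api.pixelapi.dev/v1/image/controlnet",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "image_url": POSE_REF,
            "control_type": "openpose",  # lock the stance, regenerate everything else
            "prompt": sku["prompt"],
            "negative_prompt": "blurry, watermark, extra limbs, low quality",
            "strength": 0.9,
        },
        timeout=60,
    )
    r.raise_for_status()
    print(sku["id"], "generated")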

Convert a sketch to a finished illustration while preserving line structure

A studio doing children's-book illustration works the way most illustration studios do: an artist roughs a panel by hand, the team agrees on composition, and then someone has to take that rough into a finished, coloured, styled piece. With scribble, the rough itself becomes the control signal. Pass the photographed sketch as image_url, set control_type to scribble, and prompt the finished style — "warm watercolour, soft afternoon light, in the style of a 1970s Indian children's book". The line structure the artist agreed on survives. Colour, texture, and mood are generated. The artist stays in the loop on composition, which is the part of the job they actually care about, and they stop manually colouring fifty panels per book.

Architectural renders with depth locked from a wireframe

A small architectural visualisation studio used to spend the last forty-eight hours of every project re-rendering scenes for client review — same building, swap the morning light for evening, swap the monsoon sky for a clear one, swap the weathered concrete for fresh paint. With depth, they export a depth pass from their 3D wireframe, host it, and pass it as image_url with control_type set to depth. The building's geometry — the perspective, the volumes, the way the building sits on the plot — is locked from the wireframe. The prompt drives the material, time of day, weather, and styling. They can ship six client variations in the time it used to take to render two, and the geometry never drifts between them.
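
The client-review loop is the same pattern with a depth pass instead of a pose: one hosted depth image, a handful of prompt variations, one call each. A sketch, with the URL and prompts as placeholders:

import requests

DEPTH_PASS = "https://example.com/tower-block-depth.png"  # exported and hosted from the 3D scene

VARIATIONS = [
    "fresh white plaster, clear morning light, crisp post-monsoon sky",
    "weathered concrete, warm evening light, heavy monsoon clouds",
    "terracotta cladding, overcast noon, wet streets",
]

for prompt in VARIATIONS:
    r = requests.post(
        "https://api.pixelapi.dev/v1/image/controlnet",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"image_url": DEPTH_PASS, "control_type": "depth", "prompt": prompt, "strength": 0.9},
        timeout=60,
    )
    r.raise_for_status()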

Pricing

Each successful call to /v1/image/controlnet costs 14 credits, which works out to ₹0.0095 per call in INR or $0.00011 per call in USD.

That's a flat rate regardless of which control_type you pick. Canny, depth, openpose, scribble — all 14 credits. We deliberately did not split the price by control type because we don't want you doing capacity planning around which signal your product needs this quarter. Pick the right control for the job.

For back-of-the-envelope math: a million-call month lands at ₹9,500 / $110. A startup running 10,000 generations a day for a creative tool sits at roughly ₹2,850 / $33 per month on this endpoint. If you're building a consumer product where each user might trigger five to ten generations per session, the unit economics work out cleanly even on a free tier.
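
If you want to sanity-check your own volumes, the arithmetic is just per-call price times calls. A throwaway sketch using the prices quoted above:

# Per-call prices from the pricing section above.
INR_PER_CALL = 0.0095
USD_PER_CALL = 0.00011

def monthly_cost(calls_per_day, days=30):
    calls = calls_per_day * days
    return calls * INR_PER_CALL, calls * USD_PER_CALL

inr, usd = monthly_cost(10_000)
print(f"10k calls/day ≈ ₹{inr:,.0f} / ${usd:,.0f} per month")  # ≈ ₹2,850 / $33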

Failed calls don't bill. If the reference image fails to fetch, the control type is invalid, or the request is malformed, you get a 4xx and zero credits are consumed. You only pay for completed generations.

Try it

You can be live in five minutes:

  1. Get an API key at the PixelAPI dashboard — sign-up takes about a minute, and new accounts ship with enough free credits to try every control_type a few dozen times.
  2. Read the full request and response schema, error codes, and tuning guidance in the ControlNet docs.
  3. Paste the curl from the Quickstart above. Swap image_url for one of your own references. Iterate on prompt and strength until the output lands.

If you're integrating this into an existing creative product, the migration path is straightforward — most teams replace a stack of separate "edge control", "pose control", and "depth control" calls with a single control_type field, drop one round-trip from their pipeline, and cut their integration code roughly in half.

If you build something fun on top of this, we'd love to see it. Drop us a line through the dashboard, or just ship and tag us — we read everything that comes in. Now go lock some poses.
