DEV Community

Om Prakash

Posted on • Originally published at pixelapi.dev

Instruction-based image editing: just say what to change

Most image-editing APIs make you do too much work — draw a mask, write a prompt that re-describes the entire scene, then pray the rest of the picture survives. We got tired of it. So did our customers. This launch is for everyone who just wants to say "make the shirt navy" and have the rest of the image stay exactly where it was.

What it does

POST /v1/image/edit takes a source image and a single English instruction, and returns an edited image. That's it. No mask uploads. No re-describing the scene. No "negative prompts" gymnastics. You point at an image, you tell it what to change, and you get the change back.

Three fields go in the body:

  • image_url — your source image. We accept a public URL, a data: URI, or a bare base64 string. Whatever your pipeline already produces, we'll take it.
  • instruction — plain English. "Make the sky sunset." "Replace the green wall with exposed brick." "Add subtle motion blur to the background." Write it the way you'd brief a junior designer over Slack.
  • strength — a float from 0.0 to 1.0 controlling how aggressively the edit is applied. Default is 0.75, which is the sweet spot for most edits. Push it up for dramatic re-imaginings, push it down when you want a whisper of the change rather than a shout.
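Putting the three fields together, a minimal request body looks like this (strength shown at its default of 0.75):

```json
{
  "image_url": "https://example.com/source.jpg",
  "instruction": "make the shirt navy",
  "strength": 0.75
}
```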

It's a single, synchronous, JSON-in / JSON-out call: no webhooks to wire up for the common case, no job IDs to poll. You hit it, you get a result.

The big thing to internalise: you do not have to describe the parts of the image you want to keep. That's the whole point. If your photo has a model in a white shirt standing in front of a green wall, and you ask us to "replace the green wall with exposed brick," the model, the shirt, the lighting, and the framing all stay. Only the wall changes. This is the workflow most teams actually want, and it's surprisingly hard to find off the shelf.

Why we built it

The instruction-based image-editing space has, for the last year or so, been dominated by one or two closed offerings from large labs. They work well, but they come with the usual problems for anyone trying to build a real product on top: opaque pricing, surprise rate limits, surprise content policies, and the sense that the rug could be pulled at any sprint.

On the open side, the typical workflow has been a pain. You either:

  1. Use a generic image-to-image model and describe the entire scene in your prompt (fragile: change a comma and the subject's face mutates), or
  2. Hand-author a binary mask telling the model where to edit, then write a prompt for just that region (which is fine for tooling, awful for end-user features).

Neither of those is what most product teams want. Product teams want: user uploads a picture, user types "make it golden hour," product returns a new picture. That's the user-facing primitive. Everything else is plumbing.

So we built the plumbing. /v1/image/edit is instruction-based — no mask, no need to describe the whole image. It's the closest open analogue to GPT-4o image edit, served from our own infrastructure with predictable per-call pricing. We own the stack end to end, so when we say "28 credits per call, every call, every region," we mean it. No tier surprises, no metered overage that you discover at month-end.

The angle, in plain words: buyer-language editing. Your users don't think in masks and latents. They think "make the shirt navy." We let your code stay that close to your user's intent.

Quickstart

Here's the smallest possible call. Copy, paste, swap your key, run it.

curl -X POST https://api.pixelapi.dev/v1/image/edit \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg", "instruction": "make the sky sunset"}'

That's the whole thing. You'll get a JSON response with the edited image back — ready to pipe into S3, a CDN, or straight back to your user.

If you live in Python, here's the same call using requests:

import requests

resp = requests.post(
    "https://api.pixelapi.dev/v1/image/edit",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/source.jpg",
        "instruction": "make the sky sunset",
        "strength": 0.75,
    },
    timeout=60,
)

resp.raise_for_status()
edited = resp.json()
print(edited)
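Since image_url also accepts a data: URI or a bare base64 string, local files work too. Here's a small helper that builds the request body from a file on disk, assuming a JPEG source (the function name is ours, not part of any SDK):

```python
import base64

def payload_from_file(path: str, instruction: str, strength: float = 0.75) -> dict:
    """Build a /v1/image/edit request body from a local image file.

    The endpoint accepts a data: URI for image_url, so we base64-encode
    the file's bytes and wrap them in a data:image/jpeg URI.
    """
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "image_url": f"data:image/jpeg;base64,{b64}",
        "instruction": instruction,
        "strength": strength,
    }
```

Pass the returned dict as `json=` to `requests.post` exactly as in the snippet above.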

A few practical notes from people who have already wired this in:

  • Strength is a dial, not a switch. If your first call looks too aggressive — face features drifting, brand colours shifting — drop strength to 0.5 and try again. If it looks too timid, push to 0.85. You'll find a number that fits your specific use case within three or four calls.
  • Instructions should be specific but short. "Make the wall brick" beats "Please replace the green wall on the left side of the image with a tasteful red brick texture in the style of an industrial loft." The model parses intent better when the intent is small.
  • Send the largest image you reasonably can. Edits done on a 2048×2048 source look noticeably better than the same edit on a 512×512 source, even if you downscale the result afterwards. Source quality matters.
  • Cache aggressively. The same (image_url, instruction, strength) triple is a cache key on your side. If your users are tweaking copy on a marketing image, don't re-bill yourself every keystroke — debounce.
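The caching note above is cheap to implement: hash the (image_url, instruction, strength) triple into a stable key before you ever hit the network. A sketch, with the hash choice and helper name ours rather than anything the API prescribes:

```python
import hashlib
import json

def edit_cache_key(image_url: str, instruction: str, strength: float = 0.75) -> str:
    """Deterministic cache key for one edit call: same inputs, same key.

    sort_keys keeps the serialisation stable regardless of dict order,
    so the key is reproducible across processes and restarts.
    """
    payload = json.dumps(
        {"image_url": image_url, "instruction": instruction, "strength": strength},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Check your cache (in-memory dict, Redis, whatever you already run) with this key before calling the endpoint, and you'll never bill yourself twice for the same edit.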

Use cases

Three concrete scenarios, drawn from the kind of product work this endpoint is built for. None of these need a designer in the loop.

Make this product photo look like it was shot at golden hour. You're running a Shopify-style storefront with thousands of catalogue images shot under flat studio light. Your marketing team wants the homepage hero to feel warm and aspirational — golden-hour, low sun, long shadows — but reshooting the catalogue is out of the question. With /v1/image/edit, you point at the existing studio shot, send the instruction "make this product photo look like it was shot at golden hour", and get back a version where the lighting wraps around the product, the shadows lengthen, and the colour temperature shifts toward amber. The product itself — the actual SKU your customer is buying — stays faithful. The photons change; the merchandise doesn't.

Replace the green wall with exposed brick. You're building a real-estate or interior-design app where users upload room photos and want to preview redesigns before committing to a renovation. A user uploads a photo of their living room — green-painted feature wall, sofa, plants, the lot — and types "replace the green wall with exposed brick." You forward that straight to the endpoint. The wall comes back as red-brown brick, mortar lines and all, with the sofa, the cushions, the floor lamp, and the houseplant untouched. No mask, no segmentation step on your side, no prompting your user to "click the wall." The instruction is the entire UI.

Make the shirt navy instead of white. You're running an e-commerce platform where each product is shot in one base colour but sold in eight. Today you either re-shoot every variant (expensive, slow) or apply a hue shift in post (looks fake, breaks on textured fabrics). With instruction-based editing, your image pipeline takes the white-shirt master and emits the seven other variants by sending instructions like "make the shirt navy instead of white", "make the shirt forest green instead of white", and so on. Folds, shadows, and cloth texture stay coherent because the model understands "shirt" as a semantic region, not a colour range. You ship eight variants from one shoot.
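The colour-variant pipeline in that last scenario boils down to templating one instruction per target colour. A minimal sketch (the helper name and phrasing template are ours; each returned instruction becomes its own /v1/image/edit call against the same master image):

```python
def variant_instructions(garment: str, base_colour: str, colours: list) -> list:
    """One edit instruction per target colour, phrased the way the
    use case above does: 'make the shirt navy instead of white'.

    The base colour is skipped, since that variant is the master shot.
    """
    return [
        f"make the {garment} {colour} instead of {base_colour}"
        for colour in colours
        if colour != base_colour
    ]
```

Feed each instruction into the endpoint with the white-shirt master as image_url and you get the other variants from one shoot.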

Pricing

We've kept this stupidly simple, because that's what people ask us for in every other API call.

  • 28 credits per call.
  • ₹0.019 per call (Indian rupees).
  • $0.00022 per call (US dollars).

That's it. Same price for the first call, the millionth call, and every call in between. No tiered surprises, no "compute units" you have to mentally translate, no separate charge for the input image. You're billed per successful edit. Failed calls — bad image URLs, server errors on our end — don't cost credits.

A few quick gut-checks on what that means in practice:

  • A million edits costs roughly $220. That's two hundred and twenty US dollars to re-light, re-colour, or re-style a million product photos. Compare that against either the per-image cost of a designer or the surprise bills from closed-source competitors and you'll see why we picked this number.
  • Per ₹1, you get about 52 edits. Useful framing if you're building for the Indian market and want to show a fair-use plan to your users.
  • Credits are pooled across all PixelAPI endpoints. If you're already using us for other image work, the same credit balance covers edits — no separate top-ups, no separate billing.
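If you'd rather have the gut-check arithmetic in code, here's a tiny calculator using the published per-call rates (the helper name is ours; remember only successful edits are billed):

```python
# Published flat rates for POST /v1/image/edit.
CREDITS_PER_CALL = 28
USD_PER_CALL = 0.00022
INR_PER_CALL = 0.019

def edit_cost(n_calls: int) -> dict:
    """Straight-line cost of n successful edits; failed calls are free."""
    return {
        "credits": n_calls * CREDITS_PER_CALL,
        "usd": n_calls * USD_PER_CALL,
        "inr": n_calls * INR_PER_CALL,
    }
```

A million edits comes out to 28 million credits and roughly $220 or ₹19,000, matching the gut-checks above.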

If you have an unusual volume profile — millions per day, or bursty enterprise use — talk to us. The base rate is what's published; we're reasonable people about volume.

Try it

Two links and you're off to the races:

  • Get an API key: https://pixelapi.dev/dashboard — sign in, generate a key, you'll have free starter credits in your account by the time the page finishes loading. Plenty for an evening of prototyping.
  • Read the docs: https://pixelapi.dev/docs — full request/response schema, error codes, rate-limit behaviour, and a handful of worked examples beyond the three above.

The fastest way to know whether this fits your product is to take one of your own real images, point the curl at it with an instruction your users would actually type, and look at what comes back. Five minutes, one API key, twenty-eight credits. If it's the right fit, you'll know immediately. If it's not, you'll know that too — and we'd genuinely like to hear what was missing. The endpoint is new, the roadmap is open, and the team that built it reads its own inbox.

Go break something. We'll fix it.
