Why I built Inktag — a `<img>`-shaped tag that locks AI images to your brand

The moment I gave up on stock photos

In January I pulled my image invoices for the prior year. Shutterstock, Unsplash+, the occasional Getty one-off.

$387. For pictures of "diverse team smiling at laptop" and "hands typing on MacBook keyboard."

The thing that bugged me wasn't the money. It was that my blog looked like every other blog on the internet using the same three stock sites. I'd written 40-something posts that year and not one of them looked like mine.

So I switched to AI image gen for everything. Midjourney first, then gpt-image-1 when it launched, then a flux model on fal. And I hit a different wall.

The actual problem with AI images for content sites

It's not quality. Quality is genuinely fine now. The problem is consistency across generations.

I'd write Monday's post and prompt for "a moody wine cellar at golden hour." I'd get back something painterly, warm, beautiful.

I'd write Thursday's post and prompt for "a busy coworking space." Same model, same week. I'd get back something that looked like a different photographer, different palette, different century.

I'd open my blog homepage and the cards looked like four different people had designed the site.

The model has no idea what my brand is. It can't. Each prompt starts from zero. So the surface area for drift is enormous — palette drifts, lighting drifts, that weird AI gloss appears and disappears, sometimes there are people, sometimes there aren't.

I tried the usual fixes:

  • Reference image + prompt every time. Tedious. And it still drifts after three or four generations because the reference signal is weak relative to the prompt.
  • A LoRA fine-tuned on my brand. Works, kind of, but now I'm retraining every time I want to tweak the vibe. And it's $$$.
  • Style presets in Midjourney. Closer, but it lives in a Discord channel, not in my codebase. There's still a human prompting step. I want this in the build pipeline.

None of them put the brand somewhere the model couldn't override.

What I actually wanted

A tag. A React component that takes a prompt as a prop, and a config file — set once — that locks the parts the model isn't allowed to choose.

```ts
// brand.config.ts — set ONCE per site
export default {
  palette: ["#4a1a14", "#8a3a2a", "#d8a878", "#f0e0c0"],
  style:   "editorial photography, warm low contrast, 35mm",
  aspect:  "16:9",
  format:  "webp",
  neverInclude: ["text overlays", "watermarks", "human faces"],
};
```
```tsx
// any blog post — varies per post
<Inktag prompt="a wine cellar at golden hour" />
<Inktag prompt="a busy coworking space at 6pm" />
```

The shape of the API is the point. The model isn't asked "what should this image look like?" It's only asked "what's the subject?" Everything else is locked at the config layer, before the prompt assembly.

If you imagine the prompt the model actually sees, it's roughly:

```text
[brand style block — fixed]
[palette constraint — fixed]
[aspect — fixed]
[negative prompts: things never to include — fixed]
[subject from <Inktag prompt="..."> — varies]
```

The brand block is 80% of the prompt by token count. The subject is the last 20%. Drift on the things I care about (palette, style, banned elements) is structurally impossible because they're never up for negotiation.
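
To make that concrete, here's a minimal sketch of what that assembly step could look like. The function and the template wording are my illustration, not necessarily Inktag's actual internals; only the config fields come from brand.config.ts above.

```ts
import brand from "./brand.config";

// Hypothetical sketch: fixed brand constraints first, variable subject last.
function assemblePrompt(subject: string): string {
  return [
    brand.style,                                       // fixed style block
    `color palette: ${brand.palette.join(", ")}`,      // fixed palette
    `aspect ratio ${brand.aspect}`,                    // fixed aspect
    `never include: ${brand.neverInclude.join(", ")}`, // fixed negatives
    subject,                                           // the only part that varies
  ].join(". ");
}
```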

The part I didn't expect to need

I shipped a v0 that just assembled the prompt and called one provider. It mostly worked. But "mostly" is doing a lot of work there.

Sometimes the model would slip a face into the image even with "no human faces" in the negative prompt. Sometimes the palette would drift on a particular subject (cellars came out in the right warm tones; coworking spaces came out cold blue every time).

So I added a second pass: after generation, run a vision check against the same constraints. Palette histogram has to be within X of the brand palette. No banned elements detected. If it fails, regenerate.

```ts
let image = await router.generate(assembledPrompt);
let check = await vision.audit(image, brand);

if (!check.passes) {
  // Regenerate, steering away from the specific failures,
  // then re-audit so a bad retry never reaches the caller.
  image = await router.generate(assembledPrompt, { avoid: check.failures });
  check = await vision.audit(image, brand);
  if (!check.passes) throw new Error(`brand check failed: ${check.failures.join(", ")}`);
}
```

About 8% of generations regenerate. The user never sees the failed one. From the React tag's perspective, you always get back an image that passes the brand check, or it errors loudly.
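
For a sense of what the audit contract might look like, here's a toy version of the palette half of the check. The type and function names are hypothetical; a real implementation would extract dominant colors with a histogram pass and use a vision model for banned elements.

```ts
// Hypothetical sketch of the audit contract and a toy palette check.
type AuditResult = { passes: boolean; failures: string[] };

function paletteWithinTolerance(
  dominantColors: string[], // e.g. "#8a3a2a", extracted from the image
  brandPalette: string[],   // from brand.config.ts
  maxDistance: number,      // RGB-space tolerance (the "within X")
): boolean {
  const toRgb = (hex: string) =>
    [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16));
  const dist = (a: string, b: string) =>
    Math.hypot(...toRgb(a).map((v, i) => v - toRgb(b)[i]));
  // Every dominant color must sit near at least one brand color.
  return dominantColors.every((c) =>
    brandPalette.some((b) => dist(c, b) <= maxDistance),
  );
}
```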

The part I really didn't expect to need

Cost.

A naive implementation calls the image API on every page render, which is insane. So there's a cache, keyed on (brand config hash, prompt, aspect, format). First render is slow (~2s), every subsequent render of the same <Inktag> instance is ~100ms from R2.
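
As a sketch of what that cache key could look like (assuming Node's built-in crypto; the key layout is my guess, not the shipped one):

```ts
import { createHash } from "node:crypto";

const sha = (s: string) => createHash("sha256").update(s).digest("hex").slice(0, 16);

// Hypothetical: the same (brand config, prompt, aspect, format) tuple
// always maps to the same object key in R2.
function cacheKey(brand: object, prompt: string, aspect: string, format: string): string {
  return `inktag/${sha(JSON.stringify(brand))}/${sha(`${prompt}|${aspect}`)}.${format}`;
}
```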

In the last week of beta:

  • 2,341 image requests
  • 184 unique renders (the rest were cache hits)
  • ~92% cache hit rate
  • Average cost per unique render: $0.015, down from ~$0.042 when I was on a single provider

The 64% cost drop came from routing between providers per-render based on the aspect, the brand style affinity, and current p95 latency. None of those numbers came out of theory — they came out of generating the same 30 prompts across every provider on a Saturday and ranking the results by hand.
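
If you want the shape of that routing logic, here's a toy version. The Provider fields and the weights are illustrative, not the production scorer.

```ts
// Toy sketch of per-render routing. The affinity scores in practice came
// from ranking the same 30 prompts across providers by hand.
type Provider = {
  name: string;
  costPerImage: number;  // USD per generation
  p95LatencyMs: number;  // rolling p95 over recent calls
  styleAffinity: number; // 0..1, how well it matches the brand style
  aspects: string[];     // supported aspect ratios
};

function pickProvider(providers: Provider[], aspect: string): Provider {
  const eligible = providers.filter((p) => p.aspects.includes(aspect));
  if (eligible.length === 0) throw new Error(`no provider supports ${aspect}`);
  // Lower is better: cheap first, fast and on-brand as tiebreakers.
  const score = (p: Provider) =>
    p.costPerImage + p.p95LatencyMs / 100_000 - p.styleAffinity * 0.01;
  return eligible.reduce((best, p) => (score(p) < score(best) ? p : best));
}
```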

Who this is for, and who it isn't

It's for content sites with a brand. Blogs, docs sites, marketing sites, newsletters that publish on a schedule and want their hero images to look like they came from the same place.

It is not for:

  • One-off images (Canva is faster)
  • Memes (the constraints fight you)
  • Sites without a brand yet (figure out the brand first, then come back)

That's a real filter. The thing I had to internalize building this is that constraints are the product. If you don't have constraints to lock, there's nothing to lock.

What's in beta and what's next

Working today:

  • React SDK (<Inktag prompt="..." />)
  • Brand config + dashboard
  • Multi-provider routing (OpenAI / fal / Replicate, with Nano Banana on deck)
  • Post-generation vision check
  • R2-backed cache

On deck:

  • Vue and Svelte SDKs
  • Self-hosted cache option (point it at your own S3/R2)
  • Public pricing

The beta is at inktag.io — no card, ~200 seats, I'm hand-picking by use case so I can actually reply to people. Reach out if the "every post looks like a different blog" thing resonated.

If you're building in this space or have thoughts on the API shape, I'd love to compare notes in the comments. Especially curious whether anyone else has landed on a different way to keep AI images on-brand across generations — I'd genuinely like to know what I missed.

— Gautam
