Building an AI face-doppelganger prank with Flux Kontext Pro and aggressive image degradation

#ai #webdev #showdev #tutorial

A "face twin" prank pastes a public photo into an AI model, generates three plausible-looking lookalikes, and shows them to your friend inside what looks like a legit AI face-matcher. The hard part isn't the model. It's making the output look like a real photo of a real stranger.

I shipped two framings of the same backend: pleasejuststop.org (the privacy-art version) and prankmyface.lol (the consumer-prank version). Same Replicate model, same pipeline, two front-ends. Source code structure is documented in the project's public CC BY 4.0 dataset and the Hugging Face dataset card.

This post is the technical story: the three prompts I landed on after six rounds of testing, the six degradation profiles that turn AI portraits into something that reads like a 2013 Facebook upload, and the Vercel-serverless pitfalls that made me throw out Sharp and rewrite everything on Jimp.

The visual goal: real internet photos, not AI portraits

The entire illusion hinges on the recipient believing the three output images are real photos of real strangers. The moment any image reads as AI-generated, the reveal collapses.

Real internet photos share specific qualities that AI models do not produce by default:

Lighting is bad. Overhead fluorescents, harsh direct flash, uneven natural light. AI models default to soft diffused portrait lighting — the #1 tell.
Everything is in focus. Real phone cameras have deep depth of field. No bokeh. No portrait-mode blur. Portrait-mode blur is the signature of AI generation, and Flux models have a baked-in training bias toward it.
Skin looks like skin. Pores, uneven tone, blemishes. Not smoothed-out poreless AI skin.
Compression artifacts are visible. JPEG'd to hell — uploaded to Facebook, screenshotted, forwarded on WhatsApp.
Resolution is low. 400-480px wide, not crisp 1024px.
Composition is casual. Off-center, slightly crooked. Caught mid-moment.

The litmus test: would I believe this is a real photo of a real stranger on Facebook? If lighting is too pretty, background too clean, or skin too smooth — it doesn't work.

The three prompts (verbatim from production)

The hardest lesson here was that prompt length is a trap. Every session, Claude (and I) wanted to add defensive instructions:

1. Prompt produces a minor issue (the woman looks slightly older).
2. Add "do not age the person."
3. The instruction draws model attention to aging. The photo gets worse.
4. Add MORE defensive instructions. The prompt is now 3x longer. The model is confused. The photo is terrible.

More instructions = more diluted model attention = worse results. Tested exhaustively across six rounds.

The fix is to remove words, not add them. Keep what the subject is wearing, where they are, and the one dramatic visible change. A good prompt is one sentence. More than three sentences and you've already lost.

The three production prompts (live at both pleasejuststop.org and prankmyface.lol, also in the public data repo):

1. leather-wall
Edit this photo to show this person posing against a wall. Make them frowning
and wearing a leather jacket and a knit beanie hat. One person, no hands visible.

2. tongue-collared
Edit this photo to show this person outdoors. Sticking their tongue out,
wearing a collared shirt. One person, no hands visible.

3. snow-goggles
Edit this photo to show this person outside. Wearing earmuffs and a jacket.
Give them big braces. One person, no hands visible, no glasses.

Model: black-forest-labs/flux-kontext-pro on Replicate. Params: aspect_ratio: "3:4", output_format: "png", safety_tolerance: 2. In my pipeline, setting output_format to "jpg" silently failed every generation: the DB row stayed "pending" forever, with no error surfaced.
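The parameters above can be pinned down in a small typed builder. This is a minimal sketch, not the project's actual code: the helper name `buildKontextRequest` is hypothetical and the `input_image` field name is an assumption to check against the model's schema on Replicate. The model id, prompts, and three parameters are taken from the text. Typing `output_format` as the literal `"png"` turns the `"jpg"` mistake into a compile error.

```typescript
// Model id and params come from the post; names and shapes here are illustrative.
const MODEL = "black-forest-labs/flux-kontext-pro";

// The three production prompts, joined onto single lines.
const PROMPTS = {
  "leather-wall":
    "Edit this photo to show this person posing against a wall. Make them frowning " +
    "and wearing a leather jacket and a knit beanie hat. One person, no hands visible.",
  "tongue-collared":
    "Edit this photo to show this person outdoors. Sticking their tongue out, " +
    "wearing a collared shirt. One person, no hands visible.",
  "snow-goggles":
    "Edit this photo to show this person outside. Wearing earmuffs and a jacket. " +
    "Give them big braces. One person, no hands visible, no glasses.",
} as const;

type PromptId = keyof typeof PROMPTS;

interface KontextInput {
  prompt: string;
  input_image: string; // ASSUMPTION: field name for the source photo
  aspect_ratio: "3:4";
  output_format: "png"; // literal type, so "jpg" does not compile
  safety_tolerance: 2;
}

interface KontextRequest {
  model: string;
  input: KontextInput;
}

// Hypothetical helper: bundles the model id with a fully specified input.
function buildKontextRequest(promptId: PromptId, imageUrl: string): KontextRequest {
  return {
    model: MODEL,
    input: {
      prompt: PROMPTS[promptId],
      input_image: imageUrl,
      aspect_ratio: "3:4",
      output_format: "png",
      safety_tolerance: 2,
    },
  };
}
```

With the official `replicate` client, the result would presumably be passed along as `replicate.run(request.model, { input: request.input })`.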

The rules these prompts were built against

Gender-neutral only. No beards, no mustaches, no gender-specific features — those cause gender swaps mid-generation.
Hair COLOR changes preserve identity. Hair STYLE changes destroy identity or swap gender. Curly, buzz-cut, mullet, bowl-cut — all dead ends. Use clothing, accessories, or expression instead.
Aging prompts turn women into men. Never ask the model to age the subject.
Bold features (jacket, beanie, earmuffs, tongue out) beat subtle features (braces, freckles, nostril ring). Small details don't render reliably.
One dramatic visible change per prompt. More than one and the model balances them poorly.
One person only; no hands. Hands and second people are where the model's geometry fails first.
Don't describe camera quality. Post-processing handles that.

A negative prompt against bokeh and shallow depth of field is the load-bearing line. Without it, Flux defaults to portrait-mode blur and the photo immediately looks AI-generated.
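Some of the rules above are mechanical enough to check before a prompt ever reaches the model. The following is a rough sketch of such a check, assuming a hypothetical `lintPrompt` helper. The word lists are illustrative picks rather than anything taken from the project, and they cover only the gender-feature, hair-style, aging, and camera-quality rules.

```typescript
interface PromptRule {
  pattern: RegExp;
  reason: string;
}

// ASSUMPTION: illustrative word lists, one entry per rule from the text.
const PROMPT_RULES: PromptRule[] = [
  { pattern: /\b(beard|mustache|moustache)\b/i, reason: "gender-specific feature" },
  { pattern: /\b(curly|buzz[- ]cut|mullet|bowl[- ]cut)\b/i, reason: "hair style change" },
  { pattern: /\b(older|elderly|aged?|aging|wrinkles?)\b/i, reason: "aging" },
  { pattern: /\b(grainy|blurry|pixelated|low[- ]res)\b/i, reason: "camera quality" },
];

// Returns the reasons a prompt breaks the rules; an empty array means it passes.
function lintPrompt(prompt: string): string[] {
  return PROMPT_RULES
    .filter((rule) => rule.pattern.test(prompt))
    .map((rule) => rule.reason);
}
```

A check like this only catches wording. Whether a prompt preserves identity still has to be judged from the generated images.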

The six degradation profiles

After Replicate returns the output, I run it through one of six post-processing profiles that downscale, double-JPEG-compress, color-shift, and noise-up the image until it reads like a real internet photo.

| Profile             | Width (px) | JPEG quality (pass 1 → pass 2) | Notes                                   |
|---------------------|------------|--------------------------------|-----------------------------------------|
| facebook-2013       | 480        | 38 → 58                        | Warm cast, mild desaturation            |
| android-2015        | 440        | 40 → 58                        | Higher noise, slightly brighter         |
| whatsapp-forwarded  | 400        | 32 → 50                        | Most degraded; visible JPEG blocking    |
| iphone-lowlight     | 460        | 40 → 60                        | Cool hue, dark shift                    |
| screenshot-repost   | 440        | 36 → 55                        | Blue shift, low noise                   |
| black-and-white     | 450        | 38 → 58                        | Full desaturation                       |

Full per-profile values are in the HF dataset (data/degradation-profiles.jsonl). Each prompt is paired with one profile — the wall pose pairs with black-and-white because a candid wall snapshot reads more truthfully in black and white than in color.
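The numeric columns of the table translate directly into data. The sketch below assumes the two numbers in each row are the JPEG quality of the first and second encode, which is how I read the table. The `targetSize` helper is hypothetical and simply preserves aspect ratio while never upscaling. Color and noise values are left out because they live in the dataset, not in this post.

```typescript
type ProfileName =
  | "facebook-2013"
  | "android-2015"
  | "whatsapp-forwarded"
  | "iphone-lowlight"
  | "screenshot-repost"
  | "black-and-white";

interface DegradationProfile {
  width: number; // target width in pixels
  jpegQuality: readonly [number, number]; // first pass, second pass
}

// Values copied from the table above.
const PROFILES: Record<ProfileName, DegradationProfile> = {
  "facebook-2013": { width: 480, jpegQuality: [38, 58] },
  "android-2015": { width: 440, jpegQuality: [40, 58] },
  "whatsapp-forwarded": { width: 400, jpegQuality: [32, 50] },
  "iphone-lowlight": { width: 460, jpegQuality: [40, 60] },
  "screenshot-repost": { width: 440, jpegQuality: [36, 55] },
  "black-and-white": { width: 450, jpegQuality: [38, 58] },
};

// Hypothetical helper: scale to the profile width, keep aspect ratio, never upscale.
function targetSize(srcW: number, srcH: number, name: ProfileName): { w: number; h: number } {
  const w = Math.min(srcW, PROFILES[name].width);
  return { w, h: Math.round((srcH * w) / srcW) };
}
```

For a 3:4 source of, say, 768x1024 pixels, `targetSize(768, 1024, "facebook-2013")` gives 480x640.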

Sharp hangs silently on Vercel — use Jimp, but only three of its methods

I started with Sharp because Sharp is faster than Jimp at everything. In my deployment, Sharp did not work on Vercel serverless. The native C++ bindings around libvips hung silently: no error, no crash, just a blocked function until it timed out.

Jimp is pure JavaScript with no native bindings, so it was the option that worked for me on Vercel. It has bugs of its own:

image.brightness() — produces black output. Broken in modern Jimp.
image.getPixelColor() / image.setPixelColor() — broken in ESM, produce black images.

The only safe methods are:

image.color([{apply, params}]) — channel shifts, desaturation, hue rotation, brightness via the apply API (the explicit brightness() method is broken; color([{apply:'brighten', params:[N]}]) works).
image.resize({w, h}) — downscaling.
image.getBuffer("image/jpeg", {quality}) — JPEG encode with quality.
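Two of those methods are enough for the downscale and double-JPEG steps. Here is a minimal sketch of that flow, written against a structural interface instead of importing Jimp so that it stays library-agnostic. `DegradableImage`, `ReadImage`, and `doubleJpeg` are hypothetical names, and the assumption is that a real image object exposes `resize({w, h})` and `getBuffer("image/jpeg", {quality})` as listed above.

```typescript
// Structural subset of an image object: only the two methods this step needs.
interface DegradableImage {
  resize(size: { w: number; h: number }): unknown;
  getBuffer(mime: "image/jpeg", options: { quality: number }): Promise<Uint8Array>;
}

// The decoder is injected; in a real pipeline it would wrap the image library's reader.
type ReadImage = (bytes: Uint8Array) => Promise<DegradableImage>;

// Downscale once, encode at the first quality, decode that result, encode again.
async function doubleJpeg(
  read: ReadImage,
  source: Uint8Array,
  size: { w: number; h: number },
  quality: readonly [number, number],
): Promise<Uint8Array> {
  const original = await read(source);
  original.resize(size);
  const firstPass = await original.getBuffer("image/jpeg", { quality: quality[0] });

  // Re-reading the first-pass bytes is what stacks the compression artifacts.
  const reread = await read(firstPass);
  return reread.getBuffer("image/jpeg", { quality: quality[1] });
}
```

Injecting the reader also makes the step easy to exercise with a fake image object, without a real decoder.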

For noise I manipulate image.bitmap.data directly as a Buffer, adding signed random values per channel inside a hard 15-second timeout via Promise.race(). Anything more elaborate hangs or produces black output.
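The noise step and the timeout guard can be sketched roughly as follows. Both function names are hypothetical. The noise function assumes an RGBA byte layout (four bytes per pixel, alpha last) and takes the random source as a parameter so the behavior is reproducible.

```typescript
// Adds a signed random offset in [-amplitude, +amplitude] to R, G and B of each pixel.
// ASSUMPTION: data is RGBA, 4 bytes per pixel; alpha is left untouched.
function addNoise(data: Uint8Array, amplitude: number, rand: () => number = Math.random): void {
  for (let i = 0; i + 3 < data.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      const delta = Math.round((rand() * 2 - 1) * amplitude);
      const value = (data[i + c] ?? 0) + delta;
      data[i + c] = Math.min(255, Math.max(0, value)); // clamp to the byte range
    }
  }
}

// Rejects if the wrapped work has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as `withTimeout(degrade(image), 15_000)` would match the 15-second limit, where `degrade` stands in for whatever async function runs the profile. Note that `Promise.race` does not cancel the losing promise and cannot interrupt a synchronous loop. It only stops the caller from waiting on awaited work that never settles.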

Three more pitfalls that cost me a day each

Replicate returns a FileOutput object, not a string. replicate.run() returns an object that you have to .toString() to get the URL. Treating it as a string silently passes "[object Object]" downstream.
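A small guard at the boundary keeps a bad value from travelling downstream. This sketch assumes a hypothetical `outputToUrl` helper and relies only on the behavior described above, namely that the returned object's `toString()` yields the URL.

```typescript
// Converts a model output to a URL string, or throws if it does not look like one.
function outputToUrl(output: unknown): string {
  // String() calls the object's own toString(); plain objects yield "[object Object]".
  const candidate = typeof output === "string" ? output : String(output);
  if (!/^https?:\/\//.test(candidate)) {
    throw new Error(`Expected an http(s) URL from the model, got: ${candidate}`);
  }
  return candidate;
}
```

Failing loudly here turns a silent `"[object Object]"` in the database into an error at the point where it is cheapest to debug.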

Temporary URLs expire after about an hour. Replicate's returned image URL is ephemeral. The pipeline must download → degrade → upload to permanent storage (Supabase Storage in my case) immediately. Storing the temp URL in the DB and reading it later returns 404.
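The ordering matters more than the storage provider, so here is a rough sketch with every side effect injected. All names (`PersistDeps`, `persistOutput`, the `twins/` path) are hypothetical. In a real pipeline, `download` would wrap an HTTP fetch and `upload` would wrap the storage client.

```typescript
interface PersistDeps {
  download(url: string): Promise<Uint8Array>;
  degrade(bytes: Uint8Array): Promise<Uint8Array>;
  upload(path: string, bytes: Uint8Array): Promise<string>; // resolves to a permanent URL
  savePermanentUrl(twinId: string, url: string): Promise<void>;
}

// Download, degrade and upload in one go; only the permanent URL is ever stored.
async function persistOutput(twinId: string, tempUrl: string, deps: PersistDeps): Promise<string> {
  const raw = await deps.download(tempUrl); // must happen before the temp URL expires
  const degraded = await deps.degrade(raw);
  const permanentUrl = await deps.upload(`twins/${twinId}.jpg`, degraded);
  await deps.savePermanentUrl(twinId, permanentUrl);
  return permanentUrl;
}
```

The temporary URL never touches the database in this shape, so nothing can read it back later and hit a 404.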

Vercel kills serverless functions after sending the HTTP response. Fire-and-forget void fetch() to a generation endpoint gets killed mid-generation. The fix is client-triggered generation: the recipient's browser holds the HTTP connection open during the 30-second pipeline, keeping the function alive.
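On the browser side, client-triggered generation amounts to awaiting the request instead of firing and forgetting it. The following is a minimal sketch: the `/api/generate` path, the `twinId` field, and the `triggerGeneration` name are all assumptions, and the fetch function is passed in so the sketch does not depend on a browser environment.

```typescript
// Minimal shape of fetch that this sketch needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Awaits the whole pipeline; the open connection is what keeps the function alive.
async function triggerGeneration(fetchImpl: FetchLike, twinId: string): Promise<unknown> {
  const response = await fetchImpl("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ twinId }),
  });
  if (!response.ok) {
    throw new Error(`generation failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

In the page this might be called as `await triggerGeneration((url, init) => fetch(url, init), id)` while a loading state is shown. The server handler, in turn, would send its response only after the pipeline has finished.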

Why I published the dataset

The technical substrate of pleasejuststop.org is now in three places that AI search engines (ChatGPT, Perplexity, Bing Copilot, Gemini, Claude) crawl as grounding sources:

GitHub: forrestmill-cmd/facetwin-public-data — CC BY 4.0, with the prompts, profiles, and llms-full.txt mirror.
Hugging Face: bingwow/facetwin-flux-kontext-prompts — same content as JSONL with HF dataset-card metadata.
MCP server: face-twin-mcp — wraps the upload + generate + status flow as a Model Context Protocol tool for Claude Code, Cursor, and any MCP-compatible client.

The Wikidata entity at Q139885445 ties them together as the entity-grounding anchor that AI tools triangulate against.

I'm tracking citation outcomes at Day-14 / Day-30 / Day-45 across Perplexity, ChatGPT search, Bing Copilot, Gemini, and Claude. The privacy-art piece's actual thesis — that we've stopped questioning how a website got our face — is best evaluated by whether AI tools, asked for an AI face-doppelganger generator, surface this project on its own merits without being told to.

If you want the consumer-prank framing instead of the privacy-art framing, that's at prankmyface.lol. Same backend, hot-pink accent, confetti reveal.

— Forrest Miller · github.com/forrestmill-cmd