TL;DR. I built pinock.io — an endless feed of AI-generated animals in 1960s Soviet matchbox poster style. Free, no signup, no watermark. Under the hood: FLUX.2-klein + a custom LoRA + a two-pass "sandwich" pipeline. I posted it on r/StableDiffusion, got a long technical critique with three specific complaints, and ran a 75-image ablation (5 pipeline variants × 5 categories × 3 seeds) to verify. The critic was right — and the ablation surfaced one finding I did not expect: my LoRA literally renders Cyrillic gibberish into the output at the "textbook-correct" inference settings. This is a postmortem.
What pinock.io does
Open the site → see a feed of AI-generated animals in vintage Soviet/Eastern-European matchbox label illustration style. New image every 30 seconds. ~6,700 images so far. You can like, download, share, search ("cat", "owl"), or queue your own one-word prompt. No accounts, no watermarks, no paywalls.
Stack (deliberately tiny so one person can maintain it):
- Frontend: vanilla JS, Caddy, static
- Backend: FastAPI + SQLite (WAL mode) on a cheap Ubuntu box
- FLUX worker: one RTX 3090 on vast.ai (~$0.20/hr), tunneled in via SSH
- Caption worker: Qwen2.5-VL-7B INT4 on a secondary box
- Real-ESRGAN x2 for upscaling Hall-of-Fame images
- Stripe for paid edit-tokens (Gemini 3.1 Flash Image)
Cost per generated image: ~$0.01.
The "two-pass sandwich" — and why it's a hack
Each generation runs two passes:
prompt = "cat"
│
├─ Pass 1: FLUX.2-klein + matchbox LoRA (rank=32, alpha=64, scale=2.0)
│ text2image, 28 steps
│ → output_b1 (stylized but with broken anatomy)
│
└─ Pass 2: FLUX.2-klein, no LoRA
img2img from output_b1, strength=0.9, 28 steps
→ output_b (final)
Why? I trained the LoRA on ~300 matchbox samples. At lora_scale=1.0 the style was barely visible. At lora_scale=2.0 the style appeared but anatomy broke (extra limbs, fused heads). I patched it: pass-2 takes the broken pass-1 as init and at strength=0.9 essentially redraws the image from scratch, leaving only a low-frequency "style fingerprint." It works empirically.
It also sounds like a trick.
The Reddit critique that made me sit down
Posted on r/StableDiffusion. Got a long, technically-precise comment from u/DelinquentTuna. Three points:
- `lora_scale=2.0` over-cooks the LoRA, and pass-2 at strength=0.9 then nukes it: you're discarding ~90% of the LoRA's output.
- FLUX.2-klein has native edit/style-transfer features. The critic ran my images through it on a 4080 16GB and got 4× larger output (1024×1024) in 9 seconds with more cohesive style. His advice: use the edit feature, not my hand-rolled i2i.
- ~300 examples is too few for the matchbox aesthetic (halftone, limited palette, lithographic textures). You need 5× the dataset and proper captions.
All three were technically correct. I sat down to ablate.
The ablation — 5 variants × 5 animals × 3 seeds = 75 images
Tested on the prod rig (RTX 3090 + FLUX.2-klein + matchbox LoRA, same stack as production). Two tmux scripts, ~30 minutes total, results gridded with PIL.
| Code | Description | Params |
|---|---|---|
| A | Pure FLUX, no LoRA, bare prompt | baseline |
| B | LoRA t2i pass-1 snapshot (raw LoRA before "sandwich" pass-2 nukes it) | lora_scale=2.0, prompt="cat" |
| C | Current production sandwich | lora=2.0, pass2_strength=0.9 |
| D | Single-pass with style prompt (critic's suggestion #1) | lora=1.0, prompt="cat, matchbox poster style, 1960s Soviet, woodcut, halftone, limited red-black palette" |
| E | Edit-style: pure FLUX → img2img with style prompt (critic's suggestion #2) | init=A, lora=1.0, strength=0.5 |
Categories: cat, fox, owl, lion, wolf. Seeds: 42, 1337, 80085 (chosen before runs; three repeats to catch seed-dependence).
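The full run matrix can be enumerated up front so every (variant, category, seed) triple gets identical treatment and logging. A minimal sketch; the per-variant parameter dicts are illustrative condensations of the table above, not the actual worker config:

```python
from itertools import product

# Illustrative parameter summaries, one per table row
VARIANTS = {
    "A": {"lora_scale": 0.0},                         # pure FLUX baseline
    "B": {"lora_scale": 2.0},                         # raw LoRA snapshot
    "C": {"lora_scale": 2.0, "pass2_strength": 0.9},  # production sandwich
    "D": {"lora_scale": 1.0, "style_prompt": True},   # single-pass styled
    "E": {"lora_scale": 1.0, "strength": 0.5},        # edit-style refine
}
CATEGORIES = ["cat", "fox", "owl", "lion", "wolf"]
SEEDS = [42, 1337, 80085]

# Cartesian product: 5 variants x 5 categories x 3 seeds = 75 runs
runs = [
    {"variant": v, "category": c, "seed": s, **VARIANTS[v]}
    for v, c, s in product(VARIANTS, CATEGORIES, SEEDS)
]
print(len(runs))  # 75
```

Enumerating the matrix first also makes it trivial to resume a crashed run or grid the outputs later, since filenames can be derived deterministically from each run dict.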
Findings, in order of how much they hurt
Variant B — LoRA at scale=2.0, bare prompt (snapshot)
Total collapse. On every seed, all 5 categories look almost identical — colored texture noise:
- seed=42: red-orange wavy stripes
- seed=1337: green "forest noise"
- seed=80085: gold smear
No anatomy. The LoRA at scale=2.0 does not generate animals. It generates poster-texture, because I overcooked the inference weight. Which is exactly why I invented the sandwich — I was watching this catastrophe and trying to hide it behind pass-2.
The critic saw it instantly. I did not.
Variant D — single-pass with style prompt at scale=1.0 (suggestion #1)
A different kind of catastrophe. On seed=42, several output images contain literal Cyrillic gibberish text: "СТАДИНАМ" or similar, baked into the image. On seed=1337, all 5 categories collapse into nearly-identical "red silhouette on dark" compositions. On seed=80085, again all 5 collapse to "red silhouette on white."
What happened: the training set (~300 examples) included Soviet posters with Cyrillic text and red dominant backgrounds. At lora_scale=1.0 plus a long, "correct" style-prompt, the LoRA starts recalling whole posters from training rather than transferring style. Textbook training-set leakage.
This is the most interesting observation in the series. The critic's advice — "use scale=1.0 with a proper style-prompt" — is theoretically right, but on this LoRA it just exposes how badly it's overfit to specific training examples.
Variant E — edit-style refinement (suggestion #2)
Style barely visible. At strength=0.5 + lora=1.0 the LoRA can't punch through the FLUX prior. Output looks like A with a faint illustrative tint. Not matchbox.
To get the style to come through I'd need strength≥0.7 — which lands us back in i2i sandwich territory, where the same Cyrillic / collapse will reappear via img2img.
Variant C — current sandwich
Works adequately. Recognizable animals with visible matchbox aesthetic: woodcut linework, halftone backgrounds, limited palette, sometimes Morris-style floral patterns. Stable across all 3 seeds.
Mechanism: pass-2 at strength=0.9 takes the broken pass-1 (B), adds 90% noise, redraws. From pass-1 only a low-frequency signal survives — overall composition and color profile. That injects style without leaving room for anatomy to break.
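Why so little survives can be seen from what strength means at the latent level. This is a simplified sketch assuming the idealized rectified-flow interpolant (FLUX's actual scheduler differs in details): an img2img pass re-noises the init latent to time t = strength, so the denoiser starts from a blend that is mostly noise.

```python
import random

def renoise(init_latent, strength, rng):
    """Idealized flow-matching re-noising for img2img.

    At time t = strength the latent is the linear blend
        x_t = (1 - t) * x0 + t * noise
    so at strength=0.9 only ~10% of the pass-1 latent is present
    at the start of pass-2: mostly coarse composition and color.
    """
    t = strength
    return [(1 - t) * x + t * rng.gauss(0.0, 1.0) for x in init_latent]

rng = random.Random(0)
x0 = [1.0] * 4                      # toy stand-in for a pass-1 latent
x_t = renoise(x0, 0.9, rng)
print(round(1 - 0.9, 2))            # fraction of the init signal kept
```

Under this picture, the "style fingerprint" is exactly the low-amplitude residual of pass-1 that the remaining 10% of signal carries through the blend.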
The headline conclusion
The current sandwich (C) wins this matchup — but it's a patch on top of a poorly-trained LoRA, not the right architecture.
All three "alternative" approaches (B raw, D single-pass-styled, E edit-style) revealed the same underlying problem: the LoRA at scale=1.0 tries to reproduce training set examples wholesale instead of transferring style. The sandwich works precisely because pass-2 at strength=0.9 burns that memorized content down to a low-frequency residual.
So:
- Critic's suggestion #1 (single-pass + scale=1.0 + style-prompt) is theoretically right but on this LoRA produces worse results than the sandwich, because it triggers leakage.
- Critic's suggestion #2 (edit features) doesn't bite at moderate strength and reverts to leakage at high strength.
- Critic's suggestion #3 (5× the dataset, cleaner captions) is the only real fix. And it's exactly what I didn't do.
What's next
Rebuild the dataset to 1500+ images. No Cyrillic at all (or behind a separate "soviet-text" token if it ever has to come back). Hard filters: halftone present, limited palette (≤5 colors), flat geometry. Captions via Qwen2.5-VL using a template like
matchbox poster of a {category}, {dominant colors}, {composition}, woodcut linework

Retrain at rank 32 with attention + MLP modules, not attention-only. The current LoRA only touches attention blocks, which is too narrow for compositional features (woodcut, halftone); MLP modules give the style more room.
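Filling that template from a VLM response could look like the sketch below. The field names and the structured-dict shape of the Qwen2.5-VL output are assumptions (the real captioner's schema may differ), and the placeholders are renamed to valid Python format fields:

```python
# Valid-identifier version of the caption template
TEMPLATE = ("matchbox poster of a {category}, {colors}, "
            "{composition}, woodcut linework")

def make_caption(vlm_fields):
    # vlm_fields is assumed to be a parsed, structured VLM response
    return TEMPLATE.format(
        category=vlm_fields["category"],
        colors=vlm_fields["colors"],
        composition=vlm_fields["composition"],
    )

caption = make_caption({
    "category": "fox",
    "colors": "red and black on cream",
    "composition": "centered profile",
})
print(caption)
# matchbox poster of a fox, red and black on cream, centered profile, woodcut linework
```

Keeping the trigger phrase and the style suffix fixed while only the middle fields vary is what lets the LoRA bind style to the template rather than to individual training images.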
After v2 — re-run the same ablation. If single-pass at scale=1.0 + style-prompt produces clean recognizable animals on v2, the sandwich gets deleted. Generation time drops from ~30s to ~10-15s. I can crank resolution from 512 to 1024 (the 3090 has the headroom). The VAE round-trip between passes (currently saving pass-1 to JPEG and reading back) goes away too.
Side findings worth a paragraph each
FastAPI + SQLite + cursor pagination in search. The search endpoint originally hard-capped output at 60 results — 581 cats in the database, but the frontend only ever saw 60. Added ?cursor=<id> (filter id < cursor, ORDER BY id DESC), and disabled auto-generation on paginated requests so the queue isn't flooded by pagination.
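The cursor scheme from that paragraph, sketched against an in-memory SQLite table (the schema is illustrative; the real table has more columns):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, prompt TEXT)")
db.executemany("INSERT INTO images (id, prompt) VALUES (?, ?)",
               [(i, "cat") for i in range(1, 151)])  # 150 toy rows

def search(prompt, cursor=None, limit=60):
    """Keyset pagination: newest first, resume strictly below the cursor id."""
    if cursor is None:
        rows = db.execute(
            "SELECT id FROM images WHERE prompt = ? ORDER BY id DESC LIMIT ?",
            (prompt, limit)).fetchall()
    else:
        rows = db.execute(
            "SELECT id FROM images WHERE prompt = ? AND id < ? "
            "ORDER BY id DESC LIMIT ?",
            (prompt, cursor, limit)).fetchall()
    ids = [r[0] for r in rows]
    # A full page implies more rows may exist; a short page ends pagination
    next_cursor = ids[-1] if len(ids) == limit else None
    return ids, next_cursor

page1, cur = search("cat")              # ids 150..91
page2, cur = search("cat", cursor=cur)  # ids 90..31
print(len(page1), page1[0], page2[0])   # 60 150 90
```

Unlike OFFSET pagination, `id < cursor` stays correct even while new images are being inserted at the head of the feed, which matters when the worker is generating continuously.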
Auto-prompt variety. For automated generation (when the queue is empty), I added three pools — adjectives (proud, fierce, sleepy…), actions (running, perched, watching…), scenes (in winter forest, at sunset…) — with a 55/20/15/10 distribution: 55% bare category name, 20% adj+animal, 15% animal+action, 10% animal+scene. Before this, all "cat" auto-generations looked the same.
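The 55/20/15/10 split maps directly onto `random.choices` weights. A sketch; the pool contents are just the examples from the paragraph:

```python
import random

ADJECTIVES = ["proud", "fierce", "sleepy"]
ACTIONS = ["running", "perched", "watching"]
SCENES = ["in winter forest", "at sunset"]

def auto_prompt(category, rng=random):
    # 55% bare, 20% adjective, 15% action, 10% scene
    form = rng.choices(["bare", "adj", "action", "scene"],
                       weights=[55, 20, 15, 10])[0]
    if form == "adj":
        return f"{rng.choice(ADJECTIVES)} {category}"
    if form == "action":
        return f"{category} {rng.choice(ACTIONS)}"
    if form == "scene":
        return f"{category} {rng.choice(SCENES)}"
    return category

rng = random.Random(7)
prompts = [auto_prompt("cat", rng) for _ in range(10)]
print(prompts)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible in tests while the production path can use the module-level default.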
Real cost. vast.ai 3090 ~$0.20/hr → ~$5/day → at ~1500 images/day = $0.003/image GPU cost. Plus backend/storage ~$2/day. Total <$0.01 per image at current scale.
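The per-image figure follows from two lines of arithmetic (daily volume is the ~1500/day from the paragraph; the backend estimate is the stated ~$2/day):

```python
gpu_per_hour = 0.20               # vast.ai 3090 rate, USD
gpu_per_day = gpu_per_hour * 24   # $4.80/day, the "~$5/day" in the text
backend_per_day = 2.00            # backend + storage estimate
images_per_day = 1500

gpu_per_image = gpu_per_day / images_per_day
total_per_image = (gpu_per_day + backend_per_day) / images_per_day
print(round(gpu_per_image, 4), round(total_per_image, 4))  # 0.0032 0.0045
```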
What I take from this
- "Empirically works" is not the same as "optimal." I picked the sandwich by trial and error and stopped questioning it. I never asked "why did I have to crank scale to 2.0 in the first place?" The Reddit critic asked.
- Ablation should be day-one. 5 variants × 5 categories × 3 seeds took ~30 minutes on a rented GPU. I would not have shipped the sandwich as "the solution" if I'd done this.
- External criticism is the cheapest source of truth. A month ago I would have second-guessed posting. One Reddit post and one long comment from a stranger who ran his own parallel work on a 4080 changed the entire architecture plan.
- Training-set leakage is not theoretical. In my case it manifested as literal Cyrillic letters in the output. If I'd only ever inspected the sandwich result (where the leakage is hidden), I would never have seen it.
Links
- pinock.io — https://pinock.io
- LoRA on HuggingFace — yukakst/pinock-matchbox-flux2-klein
- HuggingFace Space (live demo) — yukakst/pinock-matchbox-demo
- LoRA on Civitai — civitai.com/models/2598394
- Original Russian writeup on Habr (with full Cyrillic example screenshots) — habr.com/ru/articles/1031338/
- Reddit thread with the original critique — r/StableDiffusion
If you train v2 LoRAs on small datasets and have advice on how to avoid the training-set-leakage trap I fell into, I'm all ears in comments. Especially curious whether anyone has seen text-leakage manifest this literally before.
