<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yuka Kust</title>
    <description>The latest articles on DEV Community by Yuka Kust (@kustyuka).</description>
    <link>https://dev.to/kustyuka</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3914709%2F697702ec-7a67-4824-a1a2-412199e187c6.jpg</url>
      <title>DEV Community: Yuka Kust</title>
      <link>https://dev.to/kustyuka</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kustyuka"/>
    <language>en</language>
    <item>
      <title>I shipped a free AI-art site with a flawed LoRA and ran a 75-image ablation to prove it</title>
      <dc:creator>Yuka Kust</dc:creator>
      <pubDate>Tue, 05 May 2026 21:15:35 +0000</pubDate>
      <link>https://dev.to/kustyuka/i-shipped-a-free-ai-art-site-with-a-flawed-lora-and-ran-a-75-image-ablation-to-prove-it-2o3o</link>
      <guid>https://dev.to/kustyuka/i-shipped-a-free-ai-art-site-with-a-flawed-lora-and-ran-a-75-image-ablation-to-prove-it-2o3o</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; I built &lt;a href="https://pinock.io" rel="noopener noreferrer"&gt;pinock.io&lt;/a&gt; — an endless feed of AI-generated animals in 1960s Soviet matchbox poster style. Free, no signup, no watermark. Under the hood: FLUX.2-klein + a custom LoRA + a two-pass "sandwich" pipeline. I posted it on r/StableDiffusion, got a long technical critique with three specific complaints, and ran a 75-image ablation (5 pipeline variants × 5 categories × 3 seeds) to verify. &lt;strong&gt;The critic was right&lt;/strong&gt; — and the ablation surfaced one finding I did not expect: my LoRA literally renders Cyrillic gibberish into the output at the "textbook-correct" inference settings. This is a postmortem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k0nsxnhsbto3f6844ce.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k0nsxnhsbto3f6844ce.jpg" alt="Master comparison grid, seed=42 — 5 variants × 5 animals" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What pinock.io does&lt;/h2&gt;

&lt;p&gt;Open the site → see a feed of AI-generated animals in vintage Soviet/Eastern-European matchbox label illustration style. New image every 30 seconds. ~6,700 images so far. You can like, download, share, search ("cat", "owl"), or queue your own one-word prompt. No accounts, no watermarks, no paywalls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack&lt;/strong&gt; (deliberately tiny so one person can maintain it):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: vanilla JS, Caddy, static&lt;/li&gt;
&lt;li&gt;Backend: FastAPI + SQLite (WAL mode) on a cheap Ubuntu box&lt;/li&gt;
&lt;li&gt;FLUX worker: one RTX 3090 on vast.ai (~$0.20/hr), tunneled in via SSH&lt;/li&gt;
&lt;li&gt;Caption worker: Qwen2.5-VL-7B INT4 on a secondary box&lt;/li&gt;
&lt;li&gt;Real-ESRGAN x2 for upscaling Hall-of-Fame images&lt;/li&gt;
&lt;li&gt;Stripe for paid edit-tokens (Gemini 3.1 Flash Image)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cost per generated image: ~&lt;strong&gt;$0.01&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The "two-pass sandwich" — and why it's a hack&lt;/h2&gt;

&lt;p&gt;Each generation runs two passes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt = "cat"
   │
   ├─ Pass 1: FLUX.2-klein + matchbox LoRA (rank=32, alpha=64, scale=2.0)
   │             text2image, 28 steps
   │             → output_b1 (stylized but with broken anatomy)
   │
   └─ Pass 2: FLUX.2-klein, no LoRA
                 img2img from output_b1, strength=0.9, 28 steps
                 → output_b (final)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; I trained the LoRA on ~300 matchbox samples. At &lt;code&gt;lora_scale=1.0&lt;/code&gt; the style was barely visible. At &lt;code&gt;lora_scale=2.0&lt;/code&gt; the style appeared but anatomy broke (extra limbs, fused heads). I patched it: pass-2 takes the broken pass-1 as init and at strength=0.9 essentially redraws the image from scratch, leaving only a low-frequency "style fingerprint." It works empirically.&lt;/p&gt;

&lt;p&gt;It also sounds like a trick.&lt;/p&gt;
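&lt;p&gt;In code, the whole sandwich is roughly the following. A minimal sketch: the pipeline classes, the hub id, and the &lt;code&gt;joint_attention_kwargs&lt;/code&gt; route for &lt;code&gt;lora_scale&lt;/code&gt; are assumptions borrowed from the FLUX.1 diffusers API, not a copy of the production worker.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the two-pass sandwich. Class names and model id are assumptions.
import torch
from diffusers import FluxPipeline, FluxImg2ImgPipeline

MODEL_ID = "black-forest-labs/FLUX.2-klein"  # assumed hub id

# Pass 1: t2i with the matchbox LoRA cranked to scale=2.0
t2i = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to("cuda")
t2i.load_lora_weights("yukakst/pinock-matchbox-flux2-klein")
pass1 = t2i(
    prompt="cat",
    num_inference_steps=28,
    joint_attention_kwargs={"scale": 2.0},  # lora_scale
).images[0]

# Pass 2: same base model, no LoRA. strength=0.9 redraws almost everything,
# keeping only a low-frequency "style fingerprint" from pass 1.
i2i = FluxImg2ImgPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to("cuda")
final = i2i(prompt="cat", image=pass1, strength=0.9, num_inference_steps=28).images[0]
final.save("output_b.png")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;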

&lt;h2&gt;The Reddit critique that made me sit down&lt;/h2&gt;

&lt;p&gt;Posted on r/StableDiffusion. Got a long, technically precise comment from u/DelinquentTuna. Three points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;lora_scale=2.0&lt;/code&gt; over-cooks the LoRA, and you then nuke it with strength=0.9 in pass-2 — you're discarding ~90% of the LoRA's output.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLUX.2-klein has native edit/style-transfer features.&lt;/strong&gt; The critic ran my images through them on a 4080 16GB and got 4× larger output (1024×1024) in 9 seconds with more cohesive style: use the edit feature, not a handrolled i2i.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~300 examples is too few for the matchbox aesthetic&lt;/strong&gt; (halftone, limited palette, lithographic textures). You need 5× the dataset and proper captions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All three were technically correct. I sat down to ablate.&lt;/p&gt;

&lt;h2&gt;The ablation — 5 variants × 5 animals × 3 seeds = 75 images&lt;/h2&gt;

&lt;p&gt;Tested on the production rig (RTX 3090 + FLUX.2-klein + matchbox LoRA, exactly the stack that serves the site). Two tmux scripts, ~30 minutes total, results gridded with PIL.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;A&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pure FLUX, no LoRA, bare prompt&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LoRA t2i pass-1 snapshot (raw LoRA before "sandwich" pass-2 nukes it)&lt;/td&gt;
&lt;td&gt;lora_scale=2.0, prompt="cat"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current production sandwich&lt;/td&gt;
&lt;td&gt;lora=2.0, pass2_strength=0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;D&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single-pass with style prompt (critic's suggestion #1)&lt;/td&gt;
&lt;td&gt;lora=1.0, prompt="cat, matchbox poster style, 1960s Soviet, woodcut, halftone, limited red-black palette"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;E&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Edit-style: pure FLUX → img2img with style prompt (critic's suggestion #2)&lt;/td&gt;
&lt;td&gt;init=A, lora=1.0, strength=0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Categories: cat, fox, owl, lion, wolf. Seeds: 42, 1337, 80085 (chosen before runs; three repeats to catch seed-dependence).&lt;/p&gt;
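&lt;p&gt;The driver is trivial; a sketch below, with &lt;code&gt;run_variant()&lt;/code&gt; standing in for the five pipeline configurations (the helper and the grid layout are illustrative, not the actual tmux scripts):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical ablation driver: 5 variants x 5 categories x 3 seeds = 75 images,
# one PIL comparison grid per seed. run_variant() is an assumed wrapper.
from PIL import Image

VARIANTS = ["A", "B", "C", "D", "E"]
CATEGORIES = ["cat", "fox", "owl", "lion", "wolf"]
SEEDS = [42, 1337, 80085]
SIZE = 512  # production resolution

def run_variant(code, prompt, seed):
    raise NotImplementedError  # dispatch to pipeline config A..E on the 3090

for seed in SEEDS:
    grid = Image.new("RGB", (SIZE * len(CATEGORIES), SIZE * len(VARIANTS)))
    for row, variant in enumerate(VARIANTS):
        for col, category in enumerate(CATEGORIES):
            img = run_variant(variant, category, seed)  # 512x512 PIL image
            grid.paste(img, (col * SIZE, row * SIZE))
    grid.save(f"grid_seed{seed}.jpg", quality=90)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;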

&lt;h2&gt;Findings, in order of how much they hurt&lt;/h2&gt;

&lt;h3&gt;Variant B — LoRA at scale=2.0, bare prompt (snapshot)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Total collapse.&lt;/strong&gt; On every seed, all 5 categories look almost identical — colored texture noise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;seed=42: red-orange wavy stripes&lt;/li&gt;
&lt;li&gt;seed=1337: green "forest noise"&lt;/li&gt;
&lt;li&gt;seed=80085: gold smear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No anatomy. The LoRA at scale=2.0 &lt;strong&gt;does not generate animals.&lt;/strong&gt; It generates poster texture, because I overcooked the inference weight. Which is exactly why I invented the sandwich — I was watching this catastrophe and trying to hide it behind pass-2.&lt;/p&gt;

&lt;p&gt;The critic saw it instantly. I did not.&lt;/p&gt;

&lt;h3&gt;Variant D — single-pass with style prompt at scale=1.0 (suggestion #1)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A different kind of catastrophe.&lt;/strong&gt; On seed=42, several output images contain literal &lt;strong&gt;Cyrillic gibberish text&lt;/strong&gt;: "СТАДИНАМ" or similar, baked into the image. On seed=1337, all 5 categories collapse into nearly-identical "red silhouette on dark" compositions. On seed=80085, again all 5 collapse to "red silhouette on white."&lt;/p&gt;

&lt;p&gt;What happened: the training set (~300 examples) included Soviet posters with Cyrillic text and red dominant backgrounds. At &lt;code&gt;lora_scale=1.0&lt;/code&gt; plus a long, "correct" style-prompt, the LoRA starts &lt;strong&gt;recalling whole posters&lt;/strong&gt; from training rather than transferring style. &lt;strong&gt;Textbook training-set leakage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most interesting observation in the series. The critic's advice — "use scale=1.0 with a proper style-prompt" — is theoretically right, but &lt;strong&gt;on this LoRA it just exposes how badly it's overfit to specific training examples.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;Variant E — edit-style refinement (suggestion #2)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Style barely visible.&lt;/strong&gt; At &lt;code&gt;strength=0.5 + lora=1.0&lt;/code&gt; the LoRA can't punch through the FLUX prior. Output looks like A with a faint illustrative tint. Not matchbox.&lt;/p&gt;

&lt;p&gt;To get the style to come through I'd need &lt;code&gt;strength≥0.7&lt;/code&gt; — which lands us back in i2i sandwich territory, where the same Cyrillic / collapse will reappear via img2img.&lt;/p&gt;

&lt;h3&gt;Variant C — current sandwich&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Works adequately.&lt;/strong&gt; Recognizable animals with visible matchbox aesthetic: woodcut linework, halftone backgrounds, limited palette, sometimes Morris-style floral patterns. Stable across all 3 seeds.&lt;/p&gt;

&lt;p&gt;Mechanism: pass-2 at strength=0.9 takes the broken pass-1 (B), adds 90% noise, redraws. From pass-1 only a &lt;strong&gt;low-frequency signal&lt;/strong&gt; survives — overall composition and color profile. That injects style without leaving room for anatomy to break.&lt;/p&gt;
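&lt;p&gt;To make "adds 90% noise" concrete: in diffusers-style img2img, &lt;code&gt;strength&lt;/code&gt; decides how far back on the noise schedule the init image gets pushed. Simplified (illustrative, not the library source):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Simplified strength-to-timesteps mapping used by diffusers img2img pipelines.
num_inference_steps = 28
strength = 0.9

init_timestep = min(int(num_inference_steps * strength), num_inference_steps)  # 25
t_start = num_inference_steps - init_timestep                                  # 3

# pass-1 is noised to the timestep at index t_start and then denoised for
# 25 of 28 steps: almost a full redraw, so only coarse composition and the
# color profile survive from the LoRA pass.
print(f"denoising steps actually run: {init_timestep} of {num_inference_steps}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;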

&lt;h2&gt;The headline conclusion&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The current sandwich (C) wins this matchup — but it's a patch on top of a poorly-trained LoRA, not the right architecture.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All three "alternative" approaches (B raw, D single-pass-styled, E edit-style) revealed the same underlying problem: the LoRA at scale=1.0 tries to &lt;strong&gt;reproduce training set examples wholesale&lt;/strong&gt; instead of transferring style. The sandwich works precisely because pass-2 at strength=0.9 burns that memorized content down to a low-frequency residual.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Critic's suggestion #1 (single-pass + scale=1.0 + style-prompt) is theoretically right but on this LoRA produces worse results than the sandwich, because it triggers leakage.&lt;/li&gt;
&lt;li&gt;Critic's suggestion #2 (edit features) doesn't bite at moderate strength and reverts to leakage at high strength.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic's suggestion #3 (5× the dataset, cleaner captions) is the only real fix.&lt;/strong&gt; And it's exactly what I didn't do.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rebuild the dataset to 1500+ images.&lt;/strong&gt; No Cyrillic at all (or behind a separate "soviet-text" token if it ever has to come back). Hard filters: halftone present, limited palette (≤5 colors), flat geometry. Captions via Qwen2.5-VL using a template like &lt;code&gt;matchbox poster of a {category}, {dominant colors}, {composition}, woodcut linework&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrain at rank 32 with attention+MLP modules&lt;/strong&gt;, not attention-only. The current LoRA only touches attention blocks, which is too narrow for compositional features (woodcut, halftone); MLP gives more "room" for style. A config sketch follows this list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;After v2 — re-run the same ablation.&lt;/strong&gt; If single-pass at scale=1.0 + style-prompt produces clean recognizable animals on v2, the sandwich gets deleted. Generation time drops from ~30s to ~10-15s. I can crank resolution from 512 to 1024 (the 3090 has the headroom). The VAE round-trip between passes (currently saving pass-1 to JPEG and reading back) goes away too.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
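&lt;p&gt;For point 2, a sketch of what the training config might look like with peft's &lt;code&gt;LoraConfig&lt;/code&gt;. The module names assume diffusers-style Flux transformer blocks; check them against &lt;code&gt;model.named_modules()&lt;/code&gt; for the actual checkpoint before training.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: rank-32 LoRA targeting attention *and* MLP projections.
# Module names are assumptions for a diffusers-style Flux transformer.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    init_lora_weights="gaussian",
    target_modules=[
        "to_q", "to_k", "to_v", "to_out.0",  # attention (v1 stopped here)
        "ff.net.0.proj", "ff.net.2",         # MLP: the extra "room" for style
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;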

&lt;h2&gt;Side findings worth a paragraph each&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;FastAPI + SQLite + cursor pagination in search.&lt;/strong&gt; The search endpoint originally hard-capped output at 60 results — 581 cats in the database, but the frontend only ever saw 60. Added &lt;code&gt;?cursor=&amp;lt;id&amp;gt;&lt;/code&gt; (filter &lt;code&gt;id &amp;lt; cursor&lt;/code&gt;, ORDER BY id DESC), and disabled auto-generation on paginated requests so the queue isn't flooded by pagination.&lt;/p&gt;
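&lt;p&gt;A compressed sketch of the endpoint (table and column names are made up for illustration; the real schema differs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Keyset ("cursor") pagination sketch. Schema and field names are hypothetical.
import sqlite3
from fastapi import FastAPI

app = FastAPI()
PAGE_SIZE = 60

@app.get("/search")
def search(q: str, cursor: int | None = None):
    con = sqlite3.connect("pinock.db")
    con.row_factory = sqlite3.Row
    sql = "SELECT id, url, caption FROM images WHERE caption LIKE ?"
    params = [f"%{q}%"]
    if cursor is not None:
        sql += " AND id &amp;lt; ?"  # strictly older than the last item the client saw
        params.append(cursor)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(PAGE_SIZE)
    rows = [dict(r) for r in con.execute(sql, params)]
    con.close()
    next_cursor = rows[-1]["id"] if len(rows) == PAGE_SIZE else None
    return {"results": rows, "next_cursor": next_cursor}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;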

&lt;p&gt;&lt;strong&gt;Auto-prompt variety.&lt;/strong&gt; For automated generation (when the queue is empty), I added three pools — adjectives (proud, fierce, sleepy…), actions (running, perched, watching…), scenes (in winter forest, at sunset…) — with a 55/20/15/10 distribution: 55% bare category name, 20% adj+animal, 15% animal+action, 10% animal+scene. Before this, all "cat" auto-generations looked the same.&lt;/p&gt;
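&lt;p&gt;The sampler is a few lines (pool contents abbreviated to the examples above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Weighted auto-prompt sampler; pools truncated to the examples in the post.
import random

ADJECTIVES = ["proud", "fierce", "sleepy"]       # ...
ACTIONS = ["running", "perched", "watching"]     # ...
SCENES = ["in winter forest", "at sunset"]       # ...

def auto_prompt(category):
    bucket = random.choices(
        ["bare", "adjective", "action", "scene"],
        weights=[55, 20, 15, 10],
    )[0]
    if bucket == "adjective":
        return f"{random.choice(ADJECTIVES)} {category}"
    if bucket == "action":
        return f"{category} {random.choice(ACTIONS)}"
    if bucket == "scene":
        return f"{category} {random.choice(SCENES)}"
    return category
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;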

&lt;p&gt;&lt;strong&gt;Real cost.&lt;/strong&gt; vast.ai 3090 ~$0.20/hr → ~$5/day → at ~1500 images/day = $0.003/image GPU cost. Plus backend/storage ~$2/day. &lt;strong&gt;Total &amp;lt;$0.01 per image at current scale.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;What I take from this&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;"Empirically works" is not the same as "optimal."&lt;/strong&gt; I picked the sandwich by trial and error and stopped questioning it. I never asked "why did I have to crank scale to 2.0 in the first place?" The Reddit critic asked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ablation should be day-one.&lt;/strong&gt; 5 variants × 3 seeds = 15 minutes on a borrowed GPU. I would not have shipped the sandwich as "the solution" if I'd done this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External criticism is the cheapest source of truth.&lt;/strong&gt; A month ago I would have second-guessed posting. One Reddit post and one long comment from a stranger who ran his own parallel work on a 4080 changed the entire architecture plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training-set leakage is not theoretical.&lt;/strong&gt; In my case it manifested as literal Cyrillic letters in the output. If I'd only ever inspected the sandwich result (where the leakage is hidden), I would never have seen it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;pinock.io — &lt;a href="https://pinock.io" rel="noopener noreferrer"&gt;https://pinock.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LoRA on HuggingFace — &lt;a href="https://huggingface.co/yukakst/pinock-matchbox-flux2-klein" rel="noopener noreferrer"&gt;yukakst/pinock-matchbox-flux2-klein&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HuggingFace Space (live demo) — &lt;a href="https://huggingface.co/spaces/yukakst/pinock-matchbox-demo" rel="noopener noreferrer"&gt;yukakst/pinock-matchbox-demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LoRA on Civitai — &lt;a href="https://civitai.com/models/2598394" rel="noopener noreferrer"&gt;civitai.com/models/2598394&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Original Russian writeup on Habr (with full Cyrillic example screenshots) — &lt;a href="https://habr.com/ru/articles/1031338/" rel="noopener noreferrer"&gt;habr.com/ru/articles/1031338/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Reddit thread with the original critique — &lt;a href="https://www.reddit.com/r/StableDiffusion/comments/1t0pcac/" rel="noopener noreferrer"&gt;r/StableDiffusion&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you train style LoRAs on small datasets and have advice on how to avoid the training-set-leakage trap I fell into, I'm all ears in the comments. I'm especially curious whether anyone has seen text leakage manifest this literally before.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
