<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yuka Kust</title>
    <description>The latest articles on DEV Community by Yuka Kust (@kustyuka).</description>
    <link>https://dev.to/kustyuka</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3914709%2F697702ec-7a67-4824-a1a2-412199e187c6.jpg</url>
      <title>DEV Community: Yuka Kust</title>
      <link>https://dev.to/kustyuka</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kustyuka"/>
    <language>en</language>
    <item>
      <title>We trained a personal voice DoRA on Qwen3-8B for $1.50 — beat stock model 100% in blind A/B</title>
      <dc:creator>Yuka Kust</dc:creator>
      <pubDate>Mon, 25 May 2026 11:59:45 +0000</pubDate>
      <link>https://dev.to/kustyuka/we-trained-a-personal-voice-dora-on-qwen3-8b-for-150-beat-stock-model-100-in-blind-ab-3b70</link>
      <guid>https://dev.to/kustyuka/we-trained-a-personal-voice-dora-on-qwen3-8b-for-150-beat-stock-model-100-in-blind-ab-3b70</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;TL;DR. Trained a DoRA adapter on Qwen3-8B using 6128 personal Telegram messages. Cost: $1.50 on a single Vast.ai RTX 3090. In blind head-to-head A/B, the DoRA-tuned model beat stock Qwen3-8B 100% of the time. Zero catastrophic forgetting on 50 general-knowledge tasks. One prompt where the model actually beat the real human at sounding like themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full long-form write-up lives on the canonical URL:&lt;/strong&gt; &lt;a href="https://aiconic.company/en/journal/dora-personal-voice" rel="noopener noreferrer"&gt;aiconic.company/en/journal/dora-personal-voice&lt;/a&gt;. This post is the dev.to-flavored version with the practical bits.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What we did
&lt;/h2&gt;

&lt;p&gt;Took one person's Telegram export (DataExport JSON, 1047 personal chats), wrote a custom extractor that emits (&lt;code&gt;other_person_message&lt;/code&gt;, &lt;code&gt;author_reply&lt;/code&gt;) pairs, capped it at 12 pairs per chat so a few active chats don't dominate, and deduplicated. Final dataset: &lt;strong&gt;6128 train + 322 valid pairs&lt;/strong&gt;.&lt;/p&gt;
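
&lt;p&gt;A minimal sketch of the pairing, capping and dedup steps, for illustration only; it is not the extractor used for this run. It assumes each chat has already been flattened into a list of &lt;code&gt;(sender, text)&lt;/code&gt; tuples. The names &lt;code&gt;extract_pairs&lt;/code&gt; and &lt;code&gt;build_dataset&lt;/code&gt; and the &lt;code&gt;"me"&lt;/code&gt; sender label are hypothetical, and taking the first 12 pairs (rather than sampling) is an assumption.&lt;/p&gt;

```python
def extract_pairs(messages, author="me", cap=12):
    """Return up to `cap` (other_person_message, author_reply) pairs from one chat."""
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        prev_sender, prev_text = prev
        cur_sender, cur_text = cur
        # Keep only "someone else wrote, then the author replied" transitions.
        if prev_sender != author and cur_sender == author and prev_text and cur_text:
            pairs.append((prev_text, cur_text))
            if len(pairs) == cap:
                break
    return pairs


def build_dataset(chats, author="me", cap=12):
    """Collect capped pairs across chats and drop exact duplicates, keeping order."""
    seen = set()
    dataset = []
    for messages in chats:
        for pair in extract_pairs(messages, author, cap):
            if pair not in seen:
                seen.add(pair)
                dataset.append(pair)
    return dataset


chats = [
    [("ann", "lunch?"), ("me", "yes, 1pm"), ("ann", "ok"), ("me", "see you")],
    [("bob", "lunch?"), ("me", "yes, 1pm")],  # exact duplicate of a pair above
]
print(build_dataset(chats))  # [('lunch?', 'yes, 1pm'), ('ok', 'see you')]
```

&lt;p&gt;The per-chat cap is applied before deduplication, so one very active chat contributes at most 12 pairs no matter how long it is.&lt;/p&gt;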

&lt;p&gt;Trained a DoRA adapter on top of &lt;code&gt;Qwen/Qwen3-8B&lt;/code&gt;. DoRA (Weight-Decomposed Low-Rank Adaptation, &lt;a href="https://arxiv.org/abs/2402.09353" rel="noopener noreferrer"&gt;Liu et al. 2024&lt;/a&gt;) decomposes pretrained weights into magnitude and direction, then applies LoRA-style updates only to the direction component while learning magnitude as a separate trainable vector. The paper reports that it tracks full fine-tuning more closely than LoRA at the same rank.&lt;/p&gt;
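
&lt;p&gt;For reference, the decomposition can be written as below, following the notation of the DoRA paper: W_0 is the frozen pretrained weight, B and A are the low-rank LoRA factors, m is the trainable magnitude vector, and the norm is taken column-wise.&lt;/p&gt;

```latex
W' = m \cdot \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c},
\qquad m \text{ initialized to } \lVert W_0 \rVert_c
```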

&lt;h2&gt;
  
  
  The training config
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TrainingArguments&lt;/span&gt;

&lt;span class="n"&gt;peft_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;use_dora&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# the only line that turns LoRA into DoRA
&lt;/span&gt;    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;training_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lr_scheduler_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;warmup_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_accumulation_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# effective batch = 16
&lt;/span&gt;    &lt;span class="c1"&gt;# 1024-token cap: TrainingArguments has no max_seq_length field, so set it at tokenization or in the trainer config (e.g. TRL's SFTConfig)
&lt;/span&gt;    &lt;span class="n"&gt;bf16&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_checkpointing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adamw_torch_fused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trainable params: ~30M / 8B = 0.4%. Adapter file on disk: 63 MB. Total wall time: 3.5h on a single Vast.ai RTX 3090 spot (~$0.30/h, ~$1.50 total).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical detail:&lt;/strong&gt; apply loss only on the author's reply tokens (the assistant turn), not on the prompt. Without this mask, much of the training signal goes to modeling what &lt;em&gt;other people&lt;/em&gt; say to you, which noticeably dilutes the voice signal. Non-optional for personal voice work.&lt;/p&gt;
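
&lt;p&gt;A minimal sketch of that mask, assuming the prompt and the reply have already been tokenized separately. &lt;code&gt;build_labels&lt;/code&gt; is a hypothetical helper; the &lt;code&gt;-100&lt;/code&gt; value is the label index that PyTorch cross-entropy ignores by default, which is also what Hugging Face causal-LM losses rely on.&lt;/p&gt;

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def build_labels(prompt_ids, reply_ids):
    """Concatenate prompt and reply tokens; supervise only the reply tokens."""
    input_ids = list(prompt_ids) + list(reply_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(reply_ids)
    return input_ids, labels


# Toy token ids: three prompt tokens (the other person), two reply tokens (the author).
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]
```

&lt;p&gt;Only positions whose label is not &lt;code&gt;-100&lt;/code&gt; contribute to the loss, so gradients come from the author's side of the conversation alone.&lt;/p&gt;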

&lt;h2&gt;
  
  
  The evaluation (blind 3-way A/B)
&lt;/h2&gt;

&lt;p&gt;Loss numbers are useless for personal voice. The relevant question is &lt;em&gt;does a human who knows you think it sounds like you&lt;/em&gt;. So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;30 hold-out prompts&lt;/strong&gt; — real recent messages from real people, where we knew what the author actually replied. Held out of train.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three responses per prompt:&lt;/strong&gt; stock Qwen3-8B reply, DoRA reply, real human reply.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Randomized A/B/C labels&lt;/strong&gt; per prompt. &lt;code&gt;secret.json&lt;/code&gt; mapped labels back to sources and stayed hidden from the rater.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML rating UI&lt;/strong&gt; asking "which one sounds most like you?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catastrophic forgetting check:&lt;/strong&gt; separate 50-task suite (capitals, math, code, translations).&lt;/li&gt;
&lt;/ol&gt;
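
&lt;p&gt;The label randomization in step 3 can be sketched roughly as follows. This is an illustration under assumptions, not the script used here: &lt;code&gt;blind_labels&lt;/code&gt;, the source keys and the sample replies are all made up, and the fixed seed exists only to make the sketch repeatable.&lt;/p&gt;

```python
import json
import random


def blind_labels(responses, rng):
    """Shuffle sources under labels A/B/C; return (shown, secret)."""
    sources = list(responses)
    rng.shuffle(sources)
    labels = ["A", "B", "C"]
    shown = {label: responses[src] for label, src in zip(labels, sources)}
    secret = dict(zip(labels, sources))
    return shown, secret


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
responses = {"stock": "Hello! How can I help?", "dora": "yo, sup", "human": "hey hey"}
shown, secret = blind_labels(responses, rng)
print(json.dumps(shown, indent=2))   # what the rater sees
print(json.dumps(secret, indent=2))  # what goes into the hidden mapping file
```

&lt;p&gt;Shuffling per prompt matters: if the DoRA reply always sat under the same letter, the rater could learn the position instead of judging the voice.&lt;/p&gt;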

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Comparison&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DoRA vs stock (head-to-head)&lt;/td&gt;
&lt;td&gt;DoRA &lt;strong&gt;100%&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full 3-way (real / DoRA / stock)&lt;/td&gt;
&lt;td&gt;Real 71% / DoRA 29% / Stock 0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One specific prompt (p07)&lt;/td&gt;
&lt;td&gt;DoRA beat the real human&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Catastrophic forgetting&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0 pp&lt;/strong&gt; (stock 49/50, DoRA 49/50)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The p07 case is the one that gets me. Author looked at her own real reply, looked at DoRA, picked DoRA over herself. Her comment: &lt;em&gt;"Honestly the DoRA one sounds more like a representative thing I'd say than what I actually wrote that day."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Reading it as: DoRA samples from a smoothed manifold of typical replies and can produce a closer-to-mean instance than the human did on a specific Tuesday afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke (so you don't waste an evening)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;enable_thinking=False&lt;/code&gt; is mandatory
&lt;/h3&gt;

&lt;p&gt;Qwen3 is a reasoning model by default: it emits &lt;code&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;&lt;/code&gt; traces before its final answer. The chat training data has none. At inference the base prior pulls toward reasoning prefixes while the DoRA pulls toward chat style, so the output ends up as a Frankenstein of reasoning trace plus short colloquial reply.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;enable_thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# MANDATORY for chat-style adapters
&lt;/span&gt;    &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're &lt;em&gt;training&lt;/em&gt; a chat-style adapter on Qwen3, set this in your training data tokenization too — aligns training prefix with inference prefix and probably helps eval loss further.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. transformers version dance
&lt;/h3&gt;

&lt;p&gt;Qwen3 support landed in &lt;code&gt;transformers&lt;/code&gt; &lt;code&gt;4.51&lt;/code&gt;. &lt;code&gt;4.55+&lt;/code&gt; wants torch &lt;code&gt;≥2.5&lt;/code&gt;. The pin that worked on the Vast 3090 image: &lt;code&gt;transformers==4.53.0&lt;/code&gt;. Boring, but it cost two hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Cerebras can't load adapters
&lt;/h3&gt;

&lt;p&gt;Cerebras hosted inference (where we run prod) does not support runtime LoRA/DoRA loading. So this adapter is a research artifact for us, not a prod swap. For prod personalization, either self-host on vLLM (~$300/mo for a single 3090 running 24/7) or stay on a hosted backbone + system prompt + RAG. We ship the latter today; the DoRA result convinces us self-hosting is worth building once user demand justifies it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;

&lt;p&gt;Adapter on HuggingFace: &lt;a href="https://huggingface.co/aiconiccompany/yuka-dora-v1" rel="noopener noreferrer"&gt;aiconiccompany/yuka-dora-v1&lt;/a&gt; (gated CC BY-NC 4.0 because training data is one person's private chats).&lt;/p&gt;

&lt;p&gt;Hardware to reproduce on your own messages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single RTX 3090 (24 GB VRAM) — about $0.30/h on Vast.ai&lt;/li&gt;
&lt;li&gt;3.5 hours of GPU time&lt;/li&gt;
&lt;li&gt;Your own Telegram export (Settings → Advanced → Export Telegram data → JSON)&lt;/li&gt;
&lt;li&gt;~6000 message pairs for solid voice capture, 1000 minimum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total cost on your own messaging history: &lt;strong&gt;$1–$3&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

&lt;p&gt;The thesis we keep restating: &lt;strong&gt;the right granularity of personalization is the individual, not the segment.&lt;/strong&gt; Companies have been trying personalized AI by clustering users into 50 personas and routing to slightly-tuned base models. That's segment-level. The destination is one small adapter per user, trained on their own continuous data stream, owned by the user.&lt;/p&gt;

&lt;p&gt;yuka-dora-v1 is the first concrete piece of evidence we have that the unit economics work: $1.50 of GPU time turns a frontier model into your specific voice with no measurable capability loss. Multiply by users-who-would-pay for personalized AI and the cost structure starts looking very different from "rent OpenAI by the token."&lt;/p&gt;

&lt;h2&gt;
  
  
  Full write-up
&lt;/h2&gt;

&lt;p&gt;The long version with the full code, the loss curve, the complete p07 sample, the v2 backlog, and the bigger personal-AI thesis lives on the canonical:&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://aiconic.company/en/journal/dora-personal-voice" rel="noopener noreferrer"&gt;aiconic.company/en/journal/dora-personal-voice&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want a custom DoRA trained for your product (voice-of-the-brand, customer-support style, founder-voice): &lt;a href="mailto:hi@aiconic.company"&gt;hi@aiconic.company&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Otherwise — train one for yourself. The README is there. The GPU is cheap. The result is worth it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aiconic is a research-grade AI engineering shop. Three engineers, AI tooling. Custom adapters, personal AI engines, production ML systems. &lt;a href="https://aiconic.company" rel="noopener noreferrer"&gt;aiconic.company&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I shipped a free AI-art site with a flawed LoRA and ran a 75-image ablation to prove it</title>
      <dc:creator>Yuka Kust</dc:creator>
      <pubDate>Tue, 05 May 2026 21:15:35 +0000</pubDate>
      <link>https://dev.to/kustyuka/i-shipped-a-free-ai-art-site-with-a-flawed-lora-and-ran-a-75-image-ablation-to-prove-it-2o3o</link>
      <guid>https://dev.to/kustyuka/i-shipped-a-free-ai-art-site-with-a-flawed-lora-and-ran-a-75-image-ablation-to-prove-it-2o3o</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; I built &lt;a href="https://pinock.io" rel="noopener noreferrer"&gt;pinock.io&lt;/a&gt; — an endless feed of AI-generated animals in 1960s Soviet matchbox poster style. Free, no signup, no watermark. Under the hood: FLUX.2-klein + a custom LoRA + a two-pass "sandwich" pipeline. I posted it on r/StableDiffusion, got a long technical critique with three specific complaints, and ran a 75-image ablation (5 pipeline variants × 5 categories × 3 seeds) to verify. &lt;strong&gt;The critic was right&lt;/strong&gt; — and the ablation surfaced one finding I did not expect: my LoRA literally renders Cyrillic gibberish into the output at the "textbook-correct" inference settings. This is a postmortem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k0nsxnhsbto3f6844ce.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k0nsxnhsbto3f6844ce.jpg" alt="Master comparison grid, seed=42 — 5 variants × 5 animals" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What pinock.io does
&lt;/h2&gt;

&lt;p&gt;Open the site → see a feed of AI-generated animals in vintage Soviet/Eastern-European matchbox label illustration style. New image every 30 seconds. ~6,700 images so far. You can like, download, share, search ("cat", "owl"), or queue your own one-word prompt. No accounts, no watermarks, no paywalls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack&lt;/strong&gt; (deliberately tiny so one person can maintain it):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: vanilla JS, Caddy, static&lt;/li&gt;
&lt;li&gt;Backend: FastAPI + SQLite (WAL mode) on a cheap Ubuntu box&lt;/li&gt;
&lt;li&gt;FLUX worker: one RTX 3090 on vast.ai (~$0.20/hr), tunneled in via SSH&lt;/li&gt;
&lt;li&gt;Caption worker: Qwen2.5-VL-7B INT4 on a secondary box&lt;/li&gt;
&lt;li&gt;Real-ESRGAN x2 for upscaling Hall-of-Fame images&lt;/li&gt;
&lt;li&gt;Stripe for paid edit-tokens (Gemini 3.1 Flash Image)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cost per generated image: ~&lt;strong&gt;$0.01&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "two-pass sandwich" — and why it's a hack
&lt;/h2&gt;

&lt;p&gt;Each generation runs two passes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt = "cat"
   │
   ├─ Pass 1: FLUX.2-klein + matchbox LoRA (rank=32, alpha=64, scale=2.0)
   │             text2image, 28 steps
   │             → output_b1 (stylized but with broken anatomy)
   │
   └─ Pass 2: FLUX.2-klein, no LoRA
                 img2img from output_b1, strength=0.9, 28 steps
                 → output_b (final)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; I trained the LoRA on ~300 matchbox samples. At &lt;code&gt;lora_scale=1.0&lt;/code&gt; the style was barely visible. At &lt;code&gt;lora_scale=2.0&lt;/code&gt; the style appeared but anatomy broke (extra limbs, fused heads). I patched it: pass-2 takes the broken pass-1 as init and at strength=0.9 essentially redraws the image from scratch, leaving only a low-frequency "style fingerprint." It works empirically.&lt;/p&gt;

&lt;p&gt;It also sounds like a trick.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reddit critique that made me sit down
&lt;/h2&gt;

&lt;p&gt;Posted on r/StableDiffusion. Got a long, technically-precise comment from u/DelinquentTuna. Three points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;lora_scale=2.0&lt;/code&gt; over-cooks the LoRA, and you then nuke it with strength=0.9 in pass-2 — you're discarding ~90% of the LoRA's output.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLUX.2-klein has native edit/style-transfer features.&lt;/strong&gt; I (the critic) ran your images through it on a 4080 16GB and got 4× larger output (1024×1024) in 9 seconds with more cohesive style. Use the edit feature, not your hand-rolled i2i.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~300 examples is too few for matchbox aesthetic&lt;/strong&gt; (halftone, limited palette, lithographic textures). You need 5× the dataset and proper captions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All three were technically correct. I sat down to ablate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ablation — 5 variants × 5 animals × 3 seeds = 75 images
&lt;/h2&gt;

&lt;p&gt;Tested on the prod rig (RTX 3090 + FLUX.2-klein + matchbox LoRA, same stack as production). Two tmux scripts, ~30 minutes total, results gridded with PIL.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;A&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pure FLUX, no LoRA, bare prompt&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LoRA t2i pass-1 snapshot (raw LoRA before "sandwich" pass-2 nukes it)&lt;/td&gt;
&lt;td&gt;lora_scale=2.0, prompt="cat"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current production sandwich&lt;/td&gt;
&lt;td&gt;lora=2.0, pass2_strength=0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;D&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single-pass with style prompt (critic's suggestion #1)&lt;/td&gt;
&lt;td&gt;lora=1.0, prompt="cat, matchbox poster style, 1960s Soviet, woodcut, halftone, limited red-black palette"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;E&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Edit-style: pure FLUX → img2img with style prompt (critic's suggestion #2)&lt;/td&gt;
&lt;td&gt;init=A, lora=1.0, strength=0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Categories: cat, fox, owl, lion, wolf. Seeds: 42, 1337, 80085 (chosen before runs; three repeats to catch seed-dependence).&lt;/p&gt;
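
&lt;p&gt;The run matrix itself is just a Cartesian product. A small sketch of how the 75 jobs could be enumerated; the dictionary layout and the filename pattern are assumptions for illustration, not the actual tmux scripts.&lt;/p&gt;

```python
from itertools import product

variants = ["A", "B", "C", "D", "E"]
categories = ["cat", "fox", "owl", "lion", "wolf"]
seeds = [42, 1337, 80085]

# One job per (variant, category, seed) combination.
jobs = [
    {"variant": v, "category": c, "seed": s, "filename": f"{v}_{c}_{s}.png"}
    for v, c, s in product(variants, categories, seeds)
]
print(len(jobs))            # 75
print(jobs[0]["filename"])  # A_cat_42.png
```

&lt;p&gt;Keeping variant, category and seed in the filename makes it easy to assemble per-seed comparison grids afterwards.&lt;/p&gt;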

&lt;h2&gt;
  
  
  Findings, in order of how much they hurt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Variant B — LoRA at scale=2.0, bare prompt (snapshot)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Total collapse.&lt;/strong&gt; On every seed, all 5 categories look almost identical — colored texture noise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;seed=42: red-orange wavy stripes&lt;/li&gt;
&lt;li&gt;seed=1337: green "forest noise"&lt;/li&gt;
&lt;li&gt;seed=80085: gold smear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No anatomy. The LoRA at scale=2.0 &lt;strong&gt;does not generate animals.&lt;/strong&gt; It generates poster-texture, because I overcooked the inference weight. Which is exactly why I invented the sandwich — I was watching this catastrophe and trying to hide it behind pass-2.&lt;/p&gt;

&lt;p&gt;The critic saw it instantly. I did not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Variant D — single-pass with style prompt at scale=1.0 (suggestion #1)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A different kind of catastrophe.&lt;/strong&gt; On seed=42, several output images contain literal &lt;strong&gt;Cyrillic gibberish text&lt;/strong&gt;: "СТАДИНАМ" or similar, baked into the image. On seed=1337, all 5 categories collapse into nearly-identical "red silhouette on dark" compositions. On seed=80085, again all 5 collapse to "red silhouette on white."&lt;/p&gt;

&lt;p&gt;What happened: the training set (~300 examples) included Soviet posters with Cyrillic text and red dominant backgrounds. At &lt;code&gt;lora_scale=1.0&lt;/code&gt; plus a long, "correct" style-prompt, the LoRA starts &lt;strong&gt;recalling whole posters&lt;/strong&gt; from training rather than transferring style. &lt;strong&gt;Textbook training-set leakage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most interesting observation in the series. The critic's advice — "use scale=1.0 with a proper style-prompt" — is theoretically right, but &lt;strong&gt;on this LoRA it just exposes how badly it's overfit to specific training examples.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Variant E — edit-style refinement (suggestion #2)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Style barely visible.&lt;/strong&gt; At &lt;code&gt;strength=0.5 + lora=1.0&lt;/code&gt; the LoRA can't punch through the FLUX prior. Output looks like A with a faint illustrative tint. Not matchbox.&lt;/p&gt;

&lt;p&gt;To get the style to come through I'd need &lt;code&gt;strength≥0.7&lt;/code&gt; — which lands us back in i2i sandwich territory, where the same Cyrillic / collapse will reappear via img2img.&lt;/p&gt;

&lt;h3&gt;
  
  
  Variant C — current sandwich
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Works adequately.&lt;/strong&gt; Recognizable animals with visible matchbox aesthetic: woodcut linework, halftone backgrounds, limited palette, sometimes Morris-style floral patterns. Stable across all 3 seeds.&lt;/p&gt;

&lt;p&gt;Mechanism: pass-2 at strength=0.9 takes the broken pass-1 (B), adds 90% noise, redraws. From pass-1 only a &lt;strong&gt;low-frequency signal&lt;/strong&gt; survives — overall composition and color profile. That injects style without leaving room for anatomy to break.&lt;/p&gt;

&lt;h2&gt;
  
  
  The headline conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The current sandwich (C) wins this matchup — but it's a patch on top of a poorly-trained LoRA, not the right architecture.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All three "alternative" approaches (B raw, D single-pass-styled, E edit-style) point to the same underlying problem: the LoRA is overfit, and at scale=1.0 with a style prompt it tries to &lt;strong&gt;reproduce training set examples wholesale&lt;/strong&gt; instead of transferring style. The sandwich works precisely because pass-2 at strength=0.9 burns that memorized content down to a low-frequency residual.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Critic's suggestion #1 (single-pass + scale=1.0 + style-prompt) is theoretically right but on this LoRA produces worse results than the sandwich, because it triggers leakage.&lt;/li&gt;
&lt;li&gt;Critic's suggestion #2 (edit features) doesn't bite at moderate strength and reverts to leakage at high strength.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic's suggestion #3 (5× the dataset, cleaner captions) is the only real fix.&lt;/strong&gt; And it's exactly what I didn't do.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rebuild the dataset to 1500+ images.&lt;/strong&gt; No Cyrillic at all (or behind a separate "soviet-text" token if it ever has to come back). Hard filters: halftone present, limited palette (≤5 colors), flat geometry. Captions via Qwen2.5-VL using a template like &lt;code&gt;matchbox poster of a {category}, {dominant colors}, {composition}, woodcut linework&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrain on rank 32 + attention+MLP modules&lt;/strong&gt;, not attention-only. The current LoRA only touches attention blocks, which is too narrow for compositional features (woodcut, halftone). MLP gives more "room" for style.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;After v2 — re-run the same ablation.&lt;/strong&gt; If single-pass at scale=1.0 + style-prompt produces clean recognizable animals on v2, the sandwich gets deleted. Generation time drops from ~30s to ~10-15s. I can crank resolution from 512 to 1024 (the 3090 has the headroom). The VAE round-trip between passes (currently saving pass-1 to JPEG and reading back) goes away too.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Side findings worth a paragraph each
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;FastAPI + SQLite + cursor pagination in search.&lt;/strong&gt; The search endpoint originally hard-capped output at 60 results — 581 cats in the database, but the frontend only ever saw 60. Added &lt;code&gt;?cursor=&amp;lt;id&amp;gt;&lt;/code&gt; (filter &lt;code&gt;id &amp;lt; cursor&lt;/code&gt;, ORDER BY id DESC), and disabled auto-generation on paginated requests so the queue isn't flooded by pagination.&lt;/p&gt;
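
&lt;p&gt;A self-contained sketch of that cursor scheme using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;images&lt;/code&gt; table, its columns and the &lt;code&gt;search&lt;/code&gt; helper are assumptions for illustration, not the production schema or endpoint.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO images (category) VALUES (?)", [("cat",)] * 150)


def search(conn, category, cursor=None, limit=60):
    """Return one page of ids, newest first, plus the cursor for the next page."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id FROM images WHERE category = ? ORDER BY id DESC LIMIT ?",
            (category, limit),
        ).fetchall()
    else:
        # Only rows older than the cursor, i.e. with a smaller id.
        rows = conn.execute(
            "SELECT id FROM images WHERE category = ? AND ? > id "
            "ORDER BY id DESC LIMIT ?",
            (category, cursor, limit),
        ).fetchall()
    ids = [row[0] for row in rows]
    # A full page means there may be more; a short page ends the walk.
    next_cursor = ids[-1] if len(ids) == limit else None
    return ids, next_cursor


page, cursor = search(conn, "cat")
total = len(page)
while cursor is not None:
    page, cursor = search(conn, "cat", cursor)
    total += len(page)
print(total)  # 150
```

&lt;p&gt;Because the cursor is the last id seen rather than an offset, new rows arriving at the top of the feed do not shift later pages.&lt;/p&gt;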

&lt;p&gt;&lt;strong&gt;Auto-prompt variety.&lt;/strong&gt; For automated generation (when the queue is empty), I added three pools — adjectives (proud, fierce, sleepy…), actions (running, perched, watching…), scenes (in winter forest, at sunset…) — with a 55/20/15/10 distribution: 55% bare category name, 20% adj+animal, 15% animal+action, 10% animal+scene. Before this, all "cat" auto-generations looked the same.&lt;/p&gt;
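
&lt;p&gt;One way to express that 55/20/15/10 split is a weighted choice over prompt shapes. A minimal sketch: the pool contents are the examples from the paragraph above, while &lt;code&gt;auto_prompt&lt;/code&gt; and the shape names are hypothetical.&lt;/p&gt;

```python
import random

ADJECTIVES = ["proud", "fierce", "sleepy"]
ACTIONS = ["running", "perched", "watching"]
SCENES = ["in winter forest", "at sunset"]

TEMPLATES = ["bare", "adjective", "action", "scene"]
WEIGHTS = [55, 20, 15, 10]


def auto_prompt(animal, rng):
    """Pick a prompt shape by the 55/20/15/10 weights, then fill it in."""
    kind = rng.choices(TEMPLATES, weights=WEIGHTS, k=1)[0]
    if kind == "adjective":
        return f"{rng.choice(ADJECTIVES)} {animal}"
    if kind == "action":
        return f"{animal} {rng.choice(ACTIONS)}"
    if kind == "scene":
        return f"{animal} {rng.choice(SCENES)}"
    return animal  # bare category name


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
print([auto_prompt("cat", rng) for _ in range(5)])
```

&lt;p&gt;Keeping the bare name as the majority case preserves the plain look of the feed, while the remaining 45% adds enough variation that repeated categories stop looking identical.&lt;/p&gt;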

&lt;p&gt;&lt;strong&gt;Real cost.&lt;/strong&gt; vast.ai 3090 ~$0.20/hr → ~$5/day → at ~1500 images/day = $0.003/image GPU cost. Plus backend/storage ~$2/day. &lt;strong&gt;Total &amp;lt;$0.01 per image at current scale.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I take from this
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;"Empirically works" is not the same as "optimal."&lt;/strong&gt; I picked the sandwich by trial and error and stopped questioning it. I never asked "why did I have to crank scale to 2.0 in the first place?" The Reddit critic asked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ablation should be day-one.&lt;/strong&gt; A 5-variant × 3-seed grid takes minutes on a borrowed GPU; the full 75-image run here took about 30. I would not have shipped the sandwich as "the solution" if I'd done this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External criticism is the cheapest source of truth.&lt;/strong&gt; A month ago I would have second-guessed posting. One Reddit post and one long comment from a stranger who ran his own parallel work on a 4080 changed the entire architecture plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training-set leakage is not theoretical.&lt;/strong&gt; In my case it manifested as literal Cyrillic letters in the output. If I'd only ever inspected the sandwich result (where the leakage is hidden), I would never have seen it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;pinock.io — &lt;a href="https://pinock.io" rel="noopener noreferrer"&gt;https://pinock.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LoRA on HuggingFace — &lt;a href="https://huggingface.co/yukakst/pinock-matchbox-flux2-klein" rel="noopener noreferrer"&gt;yukakst/pinock-matchbox-flux2-klein&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HuggingFace Space (live demo) — &lt;a href="https://huggingface.co/spaces/yukakst/pinock-matchbox-demo" rel="noopener noreferrer"&gt;yukakst/pinock-matchbox-demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LoRA on Civitai — &lt;a href="https://civitai.com/models/2598394" rel="noopener noreferrer"&gt;civitai.com/models/2598394&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Original Russian writeup on Habr (with full Cyrillic example screenshots) — &lt;a href="https://habr.com/ru/articles/1031338/" rel="noopener noreferrer"&gt;habr.com/ru/articles/1031338/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Reddit thread with the original critique — &lt;a href="https://www.reddit.com/r/StableDiffusion/comments/1t0pcac/" rel="noopener noreferrer"&gt;r/StableDiffusion&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you train v2 LoRAs on small datasets and have advice on how to avoid the training-set-leakage trap I fell into, I'm all ears in comments. Especially curious whether anyone has seen text-leakage manifest this literally before.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
