We trained a personal voice DoRA on Qwen3-8B for $1.50 — beat stock model 100% in blind A/B

#ai #llm #machinelearning #showdev

TL;DR. Trained a DoRA adapter on Qwen3-8B using 6128 personal Telegram messages. Cost: $1.50 on a single Vast.ai RTX 3090. In blind head-to-head A/B, the DoRA-tuned model beat stock Qwen3-8B 100% of the time. Zero catastrophic forgetting on 50 general-knowledge tasks. One prompt where the model actually beat the real human at sounding like themselves.

Full long-form write-up lives on the canonical URL: aiconic.company/en/journal/dora-personal-voice. This post is the dev.to-flavored version with the practical bits.

What we did

Took one person's Telegram export (DataExport JSON, 1047 personal chats), wrote a custom extractor for (other_person_message, author_reply) pairs, capped it at 12 pairs per chat so a few active chats don't dominate, and deduplicated. Final dataset: 6128 train + 322 valid pairs.
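A minimal sketch of that extraction step, not the actual extractor. It assumes the usual Telegram Desktop export layout (`chats.list[].messages[]`, with `from_id` and a `text` field that is either a string or a list of strings and entity dicts). `extract_pairs`, `flatten_text` and `author_id` are hypothetical names, and using the exact pair as the dedup key is also an assumption.

```python
MAX_PAIRS_PER_CHAT = 12  # cap from the post


def flatten_text(text):
    """Telegram stores text as a str, or as a list mixing str and entity dicts."""
    if isinstance(text, str):
        return text.strip()
    parts = [p if isinstance(p, str) else p.get("text", "") for p in text]
    return "".join(parts).strip()


def extract_pairs(export, author_id, max_per_chat=MAX_PAIRS_PER_CHAT):
    """Return deduplicated (other_person_message, author_reply) pairs."""
    seen = set()   # assumption: dedup on the exact (prompt, reply) pair
    pairs = []
    for chat in export.get("chats", {}).get("list", []):
        if chat.get("type") != "personal_chat":
            continue
        kept = 0
        prev = None  # (sender, text) of the last non-empty message in this chat
        for msg in chat.get("messages", []):
            if msg.get("type") != "message":
                continue  # skip service events (calls, joins, ...)
            text = flatten_text(msg.get("text", ""))
            if not text:
                continue  # skip media-only messages
            sender = msg.get("from_id")
            # A pair is: someone else's message immediately followed by the author's reply.
            if prev and sender == author_id and prev[0] != author_id:
                pair = (prev[1], text)
                if pair not in seen and kept < max_per_chat:
                    seen.add(pair)
                    pairs.append(pair)
                    kept += 1
            prev = (sender, text)
    return pairs


demo = {"chats": {"list": [{"type": "personal_chat", "messages": [
    {"type": "message", "from_id": "user2", "text": "you coming tonight?"},
    {"type": "message", "from_id": "user1", "text": "yep, around 8"},
]}]}}
print(extract_pairs(demo, author_id="user1"))
```

On a real export you would load `result.json` with `json.load` and pass your own `from_id`. This sketch keeps the first 12 pairs per chat. The post does not say whether the original keeps the first, the last, or a random 12.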

Trained a DoRA adapter on top of Qwen/Qwen3-8B. DoRA (Weight-Decomposed Low-Rank Adaptation, Liu et al. 2024) decomposes each pretrained weight matrix into a magnitude and a direction, applies LoRA-style low-rank updates only to the direction, and learns the magnitude as a separate trainable vector. The paper reports that this tracks full fine-tuning more closely than LoRA at the same rank.
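For reference, the decomposition can be written as follows. The symbols follow the DoRA paper's notation, not anything defined in this post:

```latex
W' = m \cdot \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c},
\qquad m \in \mathbb{R}^{1 \times k},\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k}
```

Here $W_0$ is the frozen pretrained weight, $BA$ is the usual low-rank LoRA update, $\lVert \cdot \rVert_c$ is the column-wise vector norm, and $m$ is the trainable magnitude vector, initialized to $\lVert W_0 \rVert_c$. Only $m$, $A$ and $B$ receive gradients.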

The training config

from peft import LoraConfig
from transformers import TrainingArguments

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,           # the only line that turns LoRA into DoRA
    task_type="CAUSAL_LM",
)

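training_args = TrainingArguments(
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch = 2 * 8 = 16
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
)
# max_seq_length=1024 is not a TrainingArguments field (passing it raises TypeError).
# Apply the 1024-token cap at tokenization time, or on TRL's SFTConfig if you train with SFTTrainer.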

Trainable params: ~30M / 8B = 0.4%. Adapter file on disk: 63 MB. Total wall time: 3.5h on a single Vast.ai RTX 3090 spot (~$0.30/h, ~$1.50 total).
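A rough sketch of the step arithmetic implied by the config. It assumes a single GPU, all 6128 train pairs seen once per epoch, and no sequence packing. The post states none of these, so treat the numbers as a sanity check only.

```python
import math

train_pairs = 6128          # from the dataset description
per_device_batch = 2
grad_accum = 8
epochs = 3
wall_time_hours = 3.5       # reported total wall time

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_pairs / effective_batch)
total_steps = steps_per_epoch * epochs
seconds_per_step = wall_time_hours * 3600 / total_steps

print(effective_batch, steps_per_epoch, total_steps, round(seconds_per_step, 1))
```

Under those assumptions, the 50 warmup steps cover a bit over 4% of the optimizer steps.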

Critical detail: apply loss only on the author's assistant tokens, not on the prompt. Without this mask the model spends half its capacity learning what other people say to you, which dilutes voice signal noticeably. Non-optional for personal voice work.
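A minimal sketch of that mask, assuming the prompt and the reply are tokenized separately. `build_labels` is a hypothetical helper, not code from the training script. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled -100 contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, reply_ids, max_len=1024):
    """Concatenate prompt + reply and mask the prompt out of the loss."""
    input_ids = (prompt_ids + reply_ids)[:max_len]
    # Prompt positions get IGNORE_INDEX; only the author's reply tokens are trained on.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + reply_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}


example = build_labels(prompt_ids=[11, 12, 13], reply_ids=[21, 22])
print(example["labels"])  # [-100, -100, -100, 21, 22]
```

Truncating from the right keeps the sketch short. If a prompt alone exceeds `max_len`, the example carries no trainable tokens and is better dropped.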

The evaluation (blind 3-way A/B)

Loss numbers are useless for personal voice. The relevant question is: does a human who knows you think it sounds like you? So:

30 hold-out prompts — real recent messages from real people, where we knew what the author actually replied. Held out of train.
Three responses per prompt: stock Qwen3-8B reply, DoRA reply, real human reply.
Randomized A/B/C labels per prompt. secret.json mapped labels back to sources and was kept hidden from the rater.
HTML rating UI asking "which one sounds most like you?"
Catastrophic forgetting check: separate 50-task suite (capitals, math, code, translations).
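The blinding step above can be sketched roughly like this. `blind_trial` and `win_rates` are hypothetical helpers and the data layout is an assumption. The idea from the post is that the rater only ever sees the shuffled options, while the label-to-source mapping lives in a separate secret file.

```python
import json
import random

SOURCES = ("stock", "dora", "human")
LABELS = ("A", "B", "C")


def blind_trial(prompt_id, replies, rng):
    """Shuffle the three replies under labels A/B/C; return (public, secret)."""
    order = list(SOURCES)
    rng.shuffle(order)
    secret = dict(zip(LABELS, order))  # label -> source, goes into secret.json
    public = {
        "prompt_id": prompt_id,
        "options": {label: replies[src] for label, src in secret.items()},
    }
    return public, secret


def win_rates(choices, secrets):
    """choices: prompt_id -> chosen label; secrets: prompt_id -> {label: source}."""
    counts = {src: 0 for src in SOURCES}
    for prompt_id, label in choices.items():
        counts[secrets[prompt_id][label]] += 1
    return {src: counts[src] / len(choices) for src in SOURCES}


rng = random.Random(0)  # fixed seed so the shuffle is reproducible
replies = {"stock": "stock reply", "dora": "dora reply", "human": "real reply"}
public, secret = blind_trial("p01", replies, rng)
print(json.dumps(public, indent=2))  # safe to show the rater
```

Writing the collected `secret` dicts to `secret.json` and only joining them with the rater's choices after rating is finished keeps the evaluation blind.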

Results

Comparison	Result
DoRA vs stock (head-to-head)	DoRA 100%
Full 3-way (real / DoRA / stock)	Real 71% / DoRA 29% / Stock 0%
One specific prompt (p07)	DoRA beat the real human
Catastrophic forgetting	0 pp (stock 49/50, DoRA 49/50)

The p07 case is the one that gets me. Author looked at her own real reply, looked at DoRA, picked DoRA over herself. Her comment: "Honestly the DoRA one sounds more like a representative thing I'd say than what I actually wrote that day."

Reading it as: DoRA samples from a smoothed manifold of typical replies and can produce a closer-to-mean instance than the human did on a specific Tuesday afternoon.

What broke (so you don't waste an evening)

1. `enable_thinking=False` is mandatory

Qwen3 is a reasoning model by default: it emits <think>...</think> traces before its final answer. Chat training data has none. At inference the base prior pulls toward a reasoning prefix while the DoRA pulls toward chat style, so the output ends up as Frankenstein reasoning followed by a short colloquial reply.

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,   # MANDATORY for chat-style adapters
    return_tensors="pt",
)

If you're training a chat-style adapter on Qwen3, set this in your training data tokenization too — aligns training prefix with inference prefix and probably helps eval loss further.

2. transformers version dance

Qwen3 support landed in transformers 4.51. 4.55+ wants torch ≥2.5. Working pin for the Vast 3090 image: transformers==4.53.0. Boring, but it cost two hours.

3. Cerebras can't load adapters

Cerebras hosted inference (where we run prod) does not support runtime LoRA/DoRA loading. So this adapter is a research artifact for us, not a prod swap. For prod personalization, either self-host on vLLM (~$300/mo for a single 3090 running 24/7) or stay on a hosted backbone + system prompt + RAG. We ship the latter today; the DoRA result convinces us that self-hosting is worth building once user demand justifies it.

Reproducibility

Adapter on HuggingFace: aiconiccompany/yuka-dora-v1 (gated CC BY-NC 4.0 because training data is one person's private chats).

Hardware to reproduce on your own messages:

Single RTX 3090 (24 GB VRAM) — about $0.30/h on Vast.ai
3.5 hours of GPU time
Your own Telegram export (Settings → Advanced → Export Telegram data → JSON)
~6000 message pairs for solid voice capture, 1000 minimum

Total cost on your own messaging history: $1–$3.

Why it matters

The thesis we keep restating: the right granularity of personalization is the individual, not the segment. Companies have been trying personalized AI by clustering users into 50 personas and routing to slightly-tuned base models. That's segment-level. The destination is one small adapter per user, trained on their own continuous data stream, owned by the user.

yuka-dora-v1 is the first concrete piece of evidence we have that the unit economics work: $1.50 of GPU time turns a frontier model into your specific voice with no measurable capability loss. Multiply by users-who-would-pay for personalized AI and the cost structure starts looking very different from "rent OpenAI by the token."

Full write-up

The long version with the full code, the loss curve, the complete p07 sample, the v2 backlog, and the bigger personal-AI thesis lives on the canonical:

→ aiconic.company/en/journal/dora-personal-voice

If you want a custom DoRA trained for your product (voice-of-the-brand, customer-support style, founder-voice): hi@aiconic.company.

Otherwise — train one for yourself. The README is there. The GPU is cheap. The result is worth it.

Aiconic is a research-grade AI engineering shop. Three engineers, AI tooling. Custom adapters, personal AI engines, production ML systems. aiconic.company
