DoRA Promises Better Quality — But Costs 40% More Iterations
DoRA (Weight-Decomposed Low-Rank Adaptation) showed up in February 2024 claiming to beat LoRA's accuracy ceiling without full fine-tuning's memory cost. The paper reported consistent wins across vision and language tasks. Naturally, I wanted to see if that held up for LLaMA 2 7B instruction tuning — and whether the training overhead would kill the gains in production.
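To make the "weight-decomposed" part concrete, here is a minimal numpy sketch of the DoRA idea: the adapted weight is split into a learned per-column magnitude and a unit-norm direction, where the direction carries the usual LoRA-style low-rank update. All names and the `alpha`/`r` defaults below are mine for illustration, not from the paper's code.

```python
import numpy as np

def dora_forward_weight(W0, B, A, m, alpha=16, r=4):
    """Illustrative DoRA merge (a sketch, not the reference implementation).

    The directional component is the frozen base weight plus a scaled
    low-rank update, W0 + (alpha/r) * B @ A. Each column is normalized
    to unit length, then rescaled by a learned magnitude vector m:

        W' = m * V / ||V||_c   where V = W0 + (alpha/r) * B @ A
    """
    V = W0 + (alpha / r) * (B @ A)                       # direction component
    col_norms = np.linalg.norm(V, axis=0, keepdims=True) # column-wise norms
    return m * (V / col_norms)                           # rescale by magnitude
```

In this formulation, initializing `m` to the column norms of `W0` (and `B` to zeros, as in LoRA) makes the adapted weight equal `W0` at step zero, so training starts from the pretrained model. The extra normalization is also part of why each DoRA step costs more than a plain LoRA step.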
Spoiler: DoRA converges slower. A lot slower.
I ran both methods on the same Alpaca-style instruction dataset (10K samples, 4×A100 40GB setup) and tracked wall-clock time, GPU memory, and downstream task accuracy. DoRA needed roughly 40% more training steps than LoRA to reach the target validation loss. That translates to real money if you're renting cloud GPUs. But the final model? Noticeably sharper on multi-turn reasoning tasks.
Here's what the trade-off actually looks like in practice.
LoRA Recap: Low-Rank Updates to Weight Matrices
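As a quick refresher, LoRA freezes the pretrained weight and learns a low-rank additive update. A minimal numpy sketch (variable names and the `alpha`/`r` defaults are illustrative, not tied to any particular library):

```python
import numpy as np

def lora_update(W0, B, A, alpha=16, r=4):
    # LoRA keeps W0 frozen and learns only B (d_out x r) and A (r x d_in).
    # The effective weight is W0 + (alpha / r) * B @ A, where r << min(d_out, d_in).
    return W0 + (alpha / r) * (B @ A)

# Standard init: B starts at zero, A random, so training begins exactly at W0.
d_out, d_in, r = 8, 6, 4
W0 = np.random.randn(d_out, d_in)
B = np.zeros((d_out, r))
A = np.random.randn(r, d_in)
```

Because only `B` and `A` are trained, the number of trainable parameters drops from `d_out * d_in` to `r * (d_out + d_in)`, which is where the memory savings over full fine-tuning come from.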