DoRA Promises Better Quality — But Costs 40% More Iterations
DoRA (Weight-Decomposed Low-Rank Adaptation) showed up in February 2024 claiming to beat LoRA's accuracy ceiling without full fine-tuning's memory cost. The paper reported consistent wins across vision and language tasks. Naturally, I wanted to see if that held up for LLaMA 2 7B instruction tuning — and whether the training overhead would kill the gains in production.
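To make the "weight-decomposed" part concrete, here is a minimal numpy sketch of the DoRA idea: the adapted weight is split into a learned per-column magnitude and a unit-norm direction, where the direction carries the usual LoRA-style low-rank update. All names and the `alpha`/`r` defaults below are mine for illustration, not from the paper's code.

```python
import numpy as np

def dora_forward_weight(W0, B, A, m, alpha=16, r=4):
    """Illustrative DoRA merge (a sketch, not the reference implementation).

    The directional component is the frozen base weight plus a scaled
    low-rank update, W0 + (alpha/r) * B @ A. Each column is normalized
    to unit length, then rescaled by a learned magnitude vector m:

        W' = m * V / ||V||_c   where V = W0 + (alpha/r) * B @ A
    """
    V = W0 + (alpha / r) * (B @ A)                       # direction component
    col_norms = np.linalg.norm(V, axis=0, keepdims=True) # column-wise norms
    return m * (V / col_norms)                           # rescale by magnitude
```

In this formulation, initializing `m` to the column norms of `W0` (and `B` to zeros, as in LoRA) makes the adapted weight equal `W0` at step zero, so training starts from the pretrained model. The extra normalization is also part of why each DoRA step costs more than a plain LoRA step.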
Spoiler: DoRA converges slower. A lot slower.
I ran both methods on the same Alpaca-style instruction dataset (10K samples, 4×A100 40GB setup) and tracked wall-clock time, GPU memory, and downstream task accuracy. DoRA needed roughly 40% more training steps than LoRA to reach the target validation loss. That translates to real money if you're renting cloud GPUs. But the final model? Noticeably sharper on multi-turn reasoning tasks.
Here's what the trade-off actually looks like in practice.
LoRA Recap: Low-Rank Updates to Weight Matrices
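As a quick refresher, LoRA freezes the pretrained weight and learns a low-rank additive update. A minimal numpy sketch (variable names and the `alpha`/`r` defaults are illustrative, not tied to any particular library):

```python
import numpy as np

def lora_update(W0, B, A, alpha=16, r=4):
    # LoRA keeps W0 frozen and learns only B (d_out x r) and A (r x d_in).
    # The effective weight is W0 + (alpha / r) * B @ A, where r << min(d_out, d_in).
    return W0 + (alpha / r) * (B @ A)

# Standard init: B starts at zero, A random, so training begins exactly at W0.
d_out, d_in, r = 8, 6, 4
W0 = np.random.randn(d_out, d_in)
B = np.zeros((d_out, r))
A = np.random.randn(r, d_in)
```

Because only `B` and `A` are trained, the number of trainable parameters drops from `d_out * d_in` to `r * (d_out + d_in)`, which is where the memory savings over full fine-tuning come from.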