Why Full Fine-Tuning Became Unaffordable
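Fine-tuning GPT-3 175B means updating all 175 billion parameters. Adam keeps two extra values per parameter (the first and second moment estimates), so in fp32 the optimizer state alone comes to roughly 1.4 TB (2 × 700 GB), before counting weights, gradients, or activations. Most teams can't afford that.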
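As a rough check on the optimizer-state figure, here is a back-of-the-envelope sketch. It assumes the Adam moments are stored in fp32 (4 bytes per value) and uses decimal gigabytes; the helper name `adam_state_bytes` is made up for illustration, and real training setups (mixed precision, sharded optimizers) will shift these numbers.

```python
def adam_state_bytes(num_params: int, bytes_per_value: int = 4, copies: int = 2) -> int:
    """Bytes needed for Adam's per-parameter state.

    Assumption: each copy (first moment, second moment) is stored in fp32,
    i.e. 4 bytes per value.
    """
    return num_params * bytes_per_value * copies


GPT3_PARAMS = 175_000_000_000  # 175B parameters
GB = 10**9                     # decimal gigabytes

per_copy_gb = adam_state_bytes(GPT3_PARAMS, copies=1) / GB
both_copies_gb = adam_state_bytes(GPT3_PARAMS) / GB

print(f"one fp32 copy per parameter: {per_copy_gb:.0f} GB")
print(f"Adam state (two copies):     {both_copies_gb:.0f} GB")
```

A single fp32 copy of 175B values is 700 GB, so Adam's two moment buffers together need about twice that, on top of the weights and gradients themselves.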
Parameter-Efficient Fine-Tuning (PEFT) methods solve this by freezing the base model and training only a small number of parameters, usually newly added ones. LoRA, Adapter layers, and Prefix Tuning are three of the most cited approaches. They all claim "competitive performance with <1% trainable parameters," but they achieve it in completely different ways.
This post compares the three methods mechanically: where the new parameters live, what the forward pass looks like, and which one actually saves you money on your next fine-tuning job. The original papers are LoRA from Hu et al. (2021), Adapters from Houlsby et al. (2019), and Prefix Tuning from Li and Liang (2021).
LoRA: Low-Rank Decomposition of Weight Updates
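To make the "<1% trainable parameters" claim concrete, here is a small counting sketch based on the standard LoRA formulation from the paper, where a frozen weight W (d_out × d_in) gets a trainable update B·A with B of shape d_out × r and A of shape r × d_in. The rank of 8 is an illustrative choice, not a recommendation, and the helper name `lora_trainable_params` is hypothetical; 12288 is the hidden size of GPT-3 175B.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one weight matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in),
    so the total is rank * (d_out + d_in).
    """
    return rank * (d_out + d_in)


d_model = 12288  # GPT-3 175B hidden size
rank = 8         # illustrative assumption

full = d_model * d_model                              # one square projection matrix
lora = lora_trainable_params(d_model, d_model, rank)  # LoRA factors for that matrix
fraction = lora / full

print(f"full matrix:  {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters")
print(f"fraction trained: {fraction:.4%}")
```

For a single 12288 × 12288 projection, the two rank-8 factors hold 196,608 values against roughly 151 million in the frozen matrix, which is well under 1% for that layer.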
