Originally published on AI Tech Connect.
What LoRA does and why it works

Low-Rank Adaptation (LoRA) is the workhorse of parameter-efficient fine-tuning. The core idea is elegantly simple. A pre-trained weight matrix W in a transformer has dimensions d × k. During fine-tuning you want to update W, but doing so for every parameter in a 7B or 70B model requires enormous GPU memory and compute. LoRA instead freezes the original W and learns a low-rank decomposition of the update: it adds two small matrices B (d × r) and A (r × k), where r is the chosen rank (typically 8, 16, or 32). The effective weight during a forward pass is W + BA, and because r is much smaller than d or k, the total number of trainable parameters is a tiny fraction of the original model size. For a 7B Llama-class model, applying LoRA at rank 16 to all attention…
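To make the shapes concrete, here is a minimal sketch of a LoRA-wrapped linear layer in PyTorch. It is illustrative, not a reference implementation: the class name LoRALinear, the init scheme, and the alpha scaling value are assumptions, but the structure follows the description above, with W frozen and only B (d × r) and A (r × k) trained.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update BA."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weight W (and bias); only A and B train.
        for p in self.base.parameters():
            p.requires_grad = False
        d, k = base.out_features, base.in_features
        # B starts at zero, so BA = 0 at initialization and the wrapped
        # layer initially behaves exactly like the frozen base layer.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # (r x k)
        self.B = nn.Parameter(torch.zeros(d, r))         # (d x r)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + scaling * BA, computed without ever
        # materializing the full d x k update matrix.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling


# Usage: wrap a single attention-sized projection and count trainable params.
layer = LoRALinear(nn.Linear(4096, 4096), r=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 2 * 4096 * 16 = 131,072 vs ~16.8M frozen
```

The usage line at the bottom shows the parameter math for one 4096 × 4096 projection: the frozen W holds about 16.8M weights, while A and B together add only 131,072 trainable ones.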