Cross-posted from Best GPU for AI — visit the original for our VRAM calculator, GPU comparison table, and current Amazon pricing.
Which GPU do you actually need for LoRA training? It depends on the model size and whether you use LoRA or QLoRA. A 16GB card handles QLoRA on 7B models comfortably, but FP16 LoRA on 13B+ models needs roughly 30GB or more. Here is the full breakdown.
See the recommended pick on the original guide
Who this is for
This guide is for anyone fine-tuning language models or image generation checkpoints with LoRA adapters. Whether you are customizing a 7B LLM for a specific domain or training a Stable Diffusion LoRA for a character style, VRAM and training speed are your two constraints.
LoRA vs QLoRA VRAM requirements
| Method | 7B Model | 13B Model | 34B Model | 70B Model |
|---|---|---|---|---|
| LoRA (FP16 base) | ~18GB | ~30GB | ~72GB | ~140GB |
| QLoRA (4-bit base) | ~6GB | ~10GB | ~22GB | ~40GB |
| LoRA (SDXL) | ~10GB | — | — | — |
| LoRA (Flux) | ~14GB | — | — | — |
QLoRA cuts memory usage by roughly two-thirds compared to standard LoRA (for example, ~18GB down to ~6GB on a 7B model) by quantizing the frozen base model to 4-bit while keeping the LoRA adapters in FP16. The quality tradeoff is minimal for most use cases.
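To see where most of that saving comes from, here is a rough sketch of the memory taken by the frozen base weights alone. The function name and the 1 GB = 1e9 bytes convention are assumptions for illustration. The result excludes adapters, optimizer state, activations, and framework overhead, which is why the table figures are generally higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Memory for the frozen base weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9


for size in (7, 13, 34, 70):
    fp16 = weight_memory_gb(size, "fp16")
    int4 = weight_memory_gb(size, "int4")
    print(f"{size}B: FP16 weights ~{fp16:.1f} GB, 4-bit weights ~{int4:.1f} GB")
```

For a 7B model this gives 14 GB of FP16 weights versus 3.5 GB at 4-bit. Whatever the training run needs on top of that has to fit in the remaining VRAM.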
VRAM chart available at the original article
Best GPUs for LoRA training ranked
| Rank | GPU | VRAM | Price | Best For |
|---|---|---|---|---|
| 1 | RTX 5090 | 32GB GDDR7 | ~$2,000+ | LoRA 7B-13B, QLoRA 34B |
| 2 | RTX 4090 | 24GB GDDR6X | ~$1,600 | LoRA 7B, QLoRA 34B |
| 3 | RTX 5080 | 16GB GDDR7 | ~$1,000 | QLoRA 13B, SDXL LoRA |
| 4 | RTX 5070 Ti | 16GB GDDR7 | ~$750 | QLoRA 7B-13B, SDXL LoRA |
| 5 | RTX 4060 Ti 16GB | 16GB GDDR6 | ~$400 | QLoRA 7B, budget entry |
Training speed comparison
| Task | RTX 4060 Ti 16GB | RTX 5070 Ti | RTX 4090 | RTX 5090 |
|---|---|---|---|---|
| QLoRA 7B (1 epoch, 10k samples) | ~45 min | ~25 min | ~12 min | ~8 min |
| LoRA 7B (1 epoch, 10k samples) | OOM | OOM | ~18 min | ~11 min |
| LoRA SDXL (1500 steps) | ~18 min | ~10 min | ~5 min | ~3.5 min |
| LoRA Flux (1500 steps) | OOM | ~14 min | ~7 min | ~5 min |
The RTX 4090 hits the sweet spot: it handles LoRA on 7B models in FP16 and QLoRA on models up to 34B. The 5090 adds headroom for larger models and cuts training time by roughly 30-40% in the table above.
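If you want to check the 4090-to-5090 gap yourself, the arithmetic is simple. This sketch copies the minutes from the speed table; the dictionary and function names are illustrative only.

```python
# Minutes per task, copied from the training speed table: (RTX 4090, RTX 5090).
TIMES_MIN = {
    "QLoRA 7B": (12.0, 8.0),
    "LoRA 7B": (18.0, 11.0),
    "LoRA SDXL": (5.0, 3.5),
    "LoRA Flux": (7.0, 5.0),
}


def time_reduction_pct(baseline_min: float, faster_min: float) -> float:
    """Percentage of training time saved relative to the baseline card."""
    return (baseline_min - faster_min) / baseline_min * 100.0


for task, (t4090, t5090) in TIMES_MIN.items():
    saved = time_reduction_pct(t4090, t5090)
    print(f"{task}: {saved:.0f}% less time on the RTX 5090")
```

The four rows work out to roughly 29% to 39% less training time, depending on the task.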
Budget picks for LoRA training
If $1,600 is too steep, two 16GB options get the job done:
- RTX 5070 Ti (~$750): QLoRA on 7B-13B models with comfortable headroom. Its GDDR7 memory bandwidth helps training throughput. It handles SDXL LoRA easily and Flux LoRA (~14GB) with little room to spare.
- RTX 4060 Ti 16GB (~$400): The cheapest meaningful entry point. QLoRA on 7B models works at batch size 1 with gradient accumulation. SDXL LoRA training is slower but functional.
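Gradient accumulation is what makes batch size 1 workable: you sum scaled gradients over several micro-batches and only then take one optimizer step. The toy sketch below uses a one-parameter model and plain Python rather than any training framework (all names are illustrative). It shows that the accumulated gradient matches the full-batch gradient when the micro-batches are equal in size.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size=1):
    """Sum micro-batch gradients, each scaled by 1 / accumulation steps.

    Assumes len(xs) is divisible by micro_batch_size.
    """
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)                                # one batch of 4
accum = accumulated_grad(w, xs, ys, micro_batch_size=1)   # 4 micro-batches of 1
print(full, accum)
```

Only one micro-batch of activations is in memory at a time, so you trade wall-clock time for VRAM.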
Which GPU should you buy?
QLoRA on 7B models only: The RTX 4060 Ti 16GB at $400 is sufficient. You save $1,200 compared to the 4090 and still get usable training speeds.
QLoRA on 13B, or SDXL and Flux LoRA: The RTX 5070 Ti at $750 gives you faster GDDR7 memory and better compute. Worth the step up from the 4060 Ti. FP16 LoRA on a 7B model (~18GB) does not fit in 16GB.
LoRA on 7B or QLoRA up to 34B: The RTX 4090 at 24GB is the standard recommendation. Its VRAM covers most single-card training scenarios.
LoRA on 13B: The RTX 5090 at 32GB is the only consumer card that fits this workload (~30GB) without a multi-GPU setup. QLoRA on 70B (~40GB in the requirements table) still exceeds 32GB, so plan on multiple GPUs or offloading for that.
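The same decision can be written as a small lookup. This sketch copies the approximate VRAM needs and prices from the tables in this guide. The function name is made up for illustration, and it checks capacity only, not training speed.

```python
# Approximate VRAM needs in GB, copied from the LoRA vs QLoRA requirements table.
VRAM_NEEDED_GB = {
    ("lora", "7B"): 18, ("lora", "13B"): 30,
    ("lora", "34B"): 72, ("lora", "70B"): 140,
    ("qlora", "7B"): 6, ("qlora", "13B"): 10,
    ("qlora", "34B"): 22, ("qlora", "70B"): 40,
}

# (name, VRAM in GB, approximate price in USD) from the ranking table, cheapest first.
CARDS = [
    ("RTX 4060 Ti 16GB", 16, 400),
    ("RTX 5070 Ti", 16, 750),
    ("RTX 5080", 16, 1000),
    ("RTX 4090", 24, 1600),
    ("RTX 5090", 32, 2000),
]


def cheapest_card(method, model_size):
    """Return the cheapest listed card whose VRAM covers the workload, or None."""
    need = VRAM_NEEDED_GB[(method, model_size)]
    for name, vram, _price in CARDS:
        if vram >= need:
            return name
    return None


print(cheapest_card("qlora", "7B"))
print(cheapest_card("lora", "7B"))
print(cheapest_card("qlora", "70B"))
```

A result of None means no single card in the ranking has enough VRAM for that workload. If training speed matters to you, treat the returned card as the minimum rather than the recommendation.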
Common mistakes to avoid
- Running LoRA when QLoRA would produce equivalent results. Start with QLoRA and compare output quality before committing to the higher VRAM requirement of full LoRA.
- Setting LoRA rank too high. Rank 16-32 is sufficient for most tasks. Higher ranks waste VRAM without meaningful quality gains.
- Forgetting gradient checkpointing. Enabling it reduces peak VRAM by ~30% at the cost of ~20% slower training. Always turn it on for tight-VRAM scenarios.
- Training without Flash Attention 2. It reduces attention memory from O(n^2) to O(n) in the sequence length n (compute stays quadratic). This single setting can prevent OOM errors on borderline configurations.
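Put together, the four settings above might look like the sketch below. It assumes the Hugging Face `transformers`, `peft`, and `bitsandbytes` stack with `accelerate` and `flash-attn` installed, none of which this guide prescribes. The `q_proj`/`v_proj` target modules assume a Llama-style model, and `lora_alpha = 2 * rank` and the BF16 compute dtype are common conventions rather than requirements. `lora_param_count` and `build_qlora_model` are illustrative names, not library functions.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def build_qlora_model(model_name: str, rank: int = 16):
    """Load a 4-bit base model and attach LoRA adapters (hedged sketch)."""
    # Imported lazily so the helper above works without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # QLoRA: quantize the frozen base weights to 4-bit NF4.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: BF16-capable GPU
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",  # requires the flash-attn package
        device_map="auto",                        # requires accelerate
    )
    # Prepares the quantized model for training and enables gradient checkpointing.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    lora_config = LoraConfig(
        r=rank,                               # rank 16-32 covers most tasks
        lora_alpha=2 * rank,                  # assumption: common convention
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # assumption: Llama-style layer names
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)


# Adapter size grows linearly with rank: compare rank 16 and rank 64 on a 4096 x 4096 layer.
print(lora_param_count(4096, 4096, 16), lora_param_count(4096, 4096, 64))
```

On a 4096 x 4096 projection, rank 16 adds 131,072 trainable parameters and rank 64 adds four times that. Adapter and optimizer memory scale with rank the same way.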
Final verdict
| Budget | GPU | Why |
|---|---|---|
| $400 | RTX 4060 Ti 16GB | Cheapest QLoRA entry |
| $750 | RTX 5070 Ti | Fast QLoRA, SDXL/Flux LoRA |
| $1,600 | RTX 4090 | Best all-around LoRA card |
| $2,000+ | RTX 5090 | Maximum model size coverage |
The RTX 4090 remains the top recommendation for LoRA training. Its 24GB VRAM handles both LLM and image model fine-tuning without compromise. For deeper coverage, see our guides on fine-tuning GPUs and deep learning hardware. For Stable Diffusion LoRA training specifically using Kohya_ss, see our best GPU for Kohya_ss guide for script-specific settings and VRAM tuning.
LoRA training is a VRAM game. Buy the most VRAM you can afford, then optimize everything else around it.
Related guides on Best GPU for AI
- Best GPU for Fine-Tuning AI Models in 2026 (Ranked)
- Best GPU for Kohya_ss LoRA Training in 2026 (Ranked)
- Best GPU for AI Research in 2026 (Picks From $400)