Thurmon Demich

Posted on Jun 5 • Originally published at bestgpuforllm.com

RTX 5070 Ti vs RTX 3090 for LLM: New $750 vs Used $600

#rtx5070ti #rtx3090 #comparison #llm

This article was originally published on Best GPU for LLM. The full version with interactive tools, FAQ, and live pricing is on the original site.

Two GPUs, about $150 apart, with completely different strengths. The RTX 5070 Ti brings 16GB of fast GDDR7 and 5th-gen tensor cores for $750. A used RTX 3090 gives you 24GB of GDDR6X, 50% more VRAM, for around $600. I've tested both for local LLM inference, and the right choice depends entirely on what models you plan to run.

Raw specs comparison

Spec	RTX 5070 Ti	RTX 3090 (used)
VRAM	16GB GDDR7	24GB GDDR6X
Memory bandwidth	~896 GB/s	~936 GB/s
Tensor cores	5th gen	3rd gen
TDP	300W	350W
Price	~$750 new	~$600 used
Warranty	Full manufacturer	None (used)
7B Q4 tok/s	~45	~55
13B Q4 tok/s	~27	~35

The 3090 is actually faster in raw tok/s on these models. Token generation is mostly limited by how fast the weights can be read from VRAM, and the 3090's wider 384-bit memory bus gives it slightly higher bandwidth (~936 GB/s vs ~896 GB/s). The 5070 Ti's newer architecture offsets part of that gap: its 5th-gen tensor cores handle quantized inference more efficiently per watt.
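To make the bandwidth point concrete, here is a rough sketch of the usual memory-bound ceiling for token generation: each new token needs roughly one full read of the weights, so tok/s cannot exceed bandwidth divided by model size. The function name is hypothetical and the 4.0 GB weight size for a 7B Q4 model is my assumption for illustration. Treat the result as a ceiling, not a prediction; real throughput lands well below it.

```python
def bandwidth_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumption: a 7B model at Q4 occupies about 4.0 GB of weights.
WEIGHTS_7B_Q4_GB = 4.0

# Bandwidth figures are the approximate values from the spec table.
for name, bandwidth in [("RTX 5070 Ti", 896.0), ("RTX 3090", 936.0)]:
    ceiling = bandwidth_ceiling_tok_s(bandwidth, WEIGHTS_7B_Q4_GB)
    print(f"{name}: ceiling of about {ceiling:.0f} tok/s on 7B Q4")
```

The ceiling only tells you which resource runs out first. Kernel efficiency, context length, and the inference runtime decide how close a card actually gets.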

VRAM chart available at the original article

Where the 5070 Ti wins

For 7B and 13B-class models, which covers Llama 3 8B, Mistral 7B, Phi-4, Qwen 2.5 14B (at Q4), and most coding assistants, 16GB is plenty. You won't bump into VRAM limits, and the 5070 Ti runs cool, draws less power, and comes with a warranty.

The 5070 Ti is the better choice if you:

Run 7B-13B models as your daily driver
Want a new card with manufacturer warranty
Plan to use the GPU for gaming or creative work too
Don't want to deal with used market risks

At 45 tok/s on 7B Q4, the 5070 Ti delivers fast, interactive responses. That's well above the ~30 tok/s threshold where output feels instantaneous for chat use.

Where the 3090 wins

The 3090's 24GB advantage becomes decisive the moment you try to load a 34B model. CodeLlama 34B at Q4_K_M needs ~20GB of VRAM. Qwen 2.5 32B at Q4 needs ~19GB. Neither fits in the 5070 Ti's 16GB. The 3090 loads both with roughly 4-5GB to spare.
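If you want to sanity-check a model before downloading it, a back-of-the-envelope estimate gets you close. The sketch below is a minimal version under my own assumptions: about 4.5 bits per weight for a Q4-style quant and a flat 1.5 GB allowance for KV cache and runtime buffers. Both defaults and the function names are hypothetical, so adjust them for your quant format and context length.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,   # assumption for Q4-style quants
                     overhead_gb: float = 1.5) -> float:  # assumption: KV cache + buffers
    """Rough VRAM need: quantized weights plus a flat overhead allowance."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, vram_gb: float) -> bool:
    """True if the estimated footprint fits in the given VRAM."""
    return estimate_vram_gb(params_billion) <= vram_gb


for size in (13, 32, 34):
    need = estimate_vram_gb(size)
    print(f"{size}B: ~{need:.1f} GB | 16GB: {fits(size, 16)} | 24GB: {fits(size, 24)}")
```

With these defaults the 32B and 34B estimates land near the ~19GB and ~20GB figures above, which is why they fit on 24GB but not on 16GB.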

The 3090 is the better choice if you:

Want to run 30B-34B parameter models locally
Plan to add a second GPU later for 70B inference
Need headroom for larger context windows
Are comfortable buying used hardware

At ~35 tok/s on 13B and ~12-18 tok/s on 34B models, the 3090 handles heavier workloads that will not fit in the 5070 Ti's 16GB. For a full guide on buying one safely, see Used RTX 3090 Buying Guide.

The model size decision tree

This is how I frame it:

Only running 7B models? Either card works. Save money with an RTX 3060 12GB at $150 used.
Running 7B-13B regularly? 5070 Ti. Newer, faster per watt, and 16GB is sufficient.
Running 34B models? 3090. No alternative at this price. The next 24GB+ option is the RTX 4090 at $1,600. Wondering whether the cheaper non-Ti RTX 5070 might squeeze 34B in at all? See can the RTX 5070 run 34B? for the bad news at 12GB.
Planning multi-GPU later? 3090. Two 3090s give you 48GB combined for ~$1,200, enough for 70B models.
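The same decision tree can be sketched as a small function. The size cutoffs (8B, 14B, 34B, 70B) are my assumptions for where one bucket ends and the next begins, since the list above only names typical sizes, and `pick_gpu` is a hypothetical name.

```python
def pick_gpu(max_model_b: float, multi_gpu_later: bool = False) -> str:
    """Map the largest model size you plan to run (billions of params) to a card."""
    if max_model_b <= 0:
        raise ValueError("max_model_b must be positive")
    if max_model_b > 70:
        raise ValueError("sizes above 70B are outside what this guide covers")
    if max_model_b > 34:
        return "2x RTX 3090"      # 48GB combined, enough for 70B models
    if multi_gpu_later or max_model_b > 14:
        return "RTX 3090"         # 24GB for 34B-class models or a future pair
    if max_model_b > 8:
        return "RTX 5070 Ti"      # 16GB is sufficient for 7B-13B
    return "RTX 3060 12GB"        # the money-saving option for 7B only


print(pick_gpu(7))                         # RTX 3060 12GB
print(pick_gpu(13))                        # RTX 5070 Ti
print(pick_gpu(34))                        # RTX 3090
print(pick_gpu(13, multi_gpu_later=True))  # RTX 3090
```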

Value per dollar

Metric	RTX 5070 Ti	RTX 3090
Price	$750	$600
VRAM per dollar	21.3 MB/$	40.0 MB/$
7B tok/s per $100	6.0	9.2
13B tok/s per $100	3.6	5.8
Max model size (Q4)	~13B comfortably	~34B comfortably

The 3090 wins on pure value metrics. But value isn't everything — warranty, power efficiency, and noise matter for a daily-use workstation.
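If prices have moved since I wrote this, the table is easy to recompute. Here is a short sketch that reproduces the per-dollar figures from the prices and tok/s numbers in this article. The dictionary layout and function name are hypothetical, and it treats 1 GB as 1,000 MB, which is what the table's VRAM-per-dollar row implies.

```python
CARDS = {
    "RTX 5070 Ti": {"price": 750, "vram_gb": 16, "tok_7b": 45, "tok_13b": 27},
    "RTX 3090":    {"price": 600, "vram_gb": 24, "tok_7b": 55, "tok_13b": 35},
}


def value_metrics(card: dict) -> dict:
    """Per-dollar metrics, rounded to one decimal like the table."""
    price = card["price"]
    return {
        "vram_mb_per_dollar": round(card["vram_gb"] * 1000 / price, 1),
        "tok_7b_per_100": round(card["tok_7b"] * 100 / price, 1),
        "tok_13b_per_100": round(card["tok_13b"] * 100 / price, 1),
    }


for name, card in CARDS.items():
    print(name, value_metrics(card))
```

Swap in the current street price for each card and the ranking updates itself.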

My recommendation

If your budget is under $1,000 and you want maximum model flexibility, buy the used RTX 3090. The 24GB VRAM ceiling is simply more future-proof for LLM work. Models keep getting bigger, and VRAM is the one spec you can't work around.

If you want a clean, new-card experience and only run 7B-13B models, the RTX 5070 Ti is the smarter pick. You get warranty coverage, lower power draw, and enough VRAM for the most popular open-weight models.

For more options in this price range, see the full best GPU for LLM under $1,000 roundup.
