DEV Community

Discussion on: Running Local LLMs, CPU vs. GPU - a Quick Speed Test

Orlando Arroyo

Adding some info here:

Running on a Razer Blade 2021 with a Ryzen 5900HX, a GeForce 3070 Ti and 16 GB RAM, I got 41.75 tok/s. I used the same test as you, asking about Mars on the same model.

Hope that adds information to this very interesting topic.

Maxim Saplin

Thanks for the contribution! I assume you used 100% GPU off-loading, right? Just checking :)

Orlando Arroyo

Indeed, 100% GPU off-loading.

I also tested a Ryzen 7950X with 0% off-loading, but there's something odd. I set 32 threads, but CPU use doesn't go beyond 60% and I only get 7 tok/s. Any thoughts about a possible cause?

Just for fun, I’ll check with an Asus ROG Ally later (Z1 Extreme version).

Maxim Saplin

It seems the threads param is ignored; I saw the same behaviour when testing CPU inference.
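
One rough way to check whether a threads setting has any effect is to repeat the same prompt at several thread counts and compare tok/s. Below is a minimal sketch of that idea, not tied to any particular tool. The `generate` callable is hypothetical: it stands in for whatever runs one generation with a given thread count and returns the number of tokens produced. The 10% tolerance is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Iterable


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput as generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def sweep_threads(
    generate: Callable[[int], int], thread_counts: Iterable[int]
) -> Dict[int, float]:
    """Run `generate` once per thread count and record tok/s for each.

    `generate` is a hypothetical callable: it takes a thread count, runs one
    generation with that setting, and returns the number of tokens produced.
    """
    results: Dict[int, float] = {}
    for n in thread_counts:
        start = time.perf_counter()
        n_tokens = generate(n)
        elapsed = time.perf_counter() - start
        results[n] = tokens_per_second(n_tokens, elapsed)
    return results


def looks_ignored(results: Dict[int, float], tolerance: float = 0.10) -> bool:
    """True if throughput varies by less than `tolerance` across all runs.

    The 10% default is an assumed threshold, not a measured one.
    """
    lo, hi = min(results.values()), max(results.values())
    if hi == 0:
        return True  # nothing was generated in any run
    return (hi - lo) / hi < tolerance
```

If throughput stays flat from, say, 4 to 32 threads, that would be consistent with the setting being ignored, though it could also point to a different bottleneck such as memory bandwidth.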