Discussion on: Running Local LLMs, CPU vs. GPU - a Quick Speed Test


Adding some info here:

Running on a Razer Blade 2021 with a Ryzen 5900HX, a GeForce 3070 Ti and 16 GB RAM, I got 41.75 tok/s. I used the same test as you, asking about Mars on the same model.

Hope that adds information to this very interesting topic.

Maxim Saplin • Mar 15 '24

Thanks for the contribution! I assume you used 100% GPU off-loading, right? Just checking :)

Orlando Arroyo • Mar 16 '24

Indeed, 100% GPU off-loading.

I also tested a Ryzen 7950X with 0% off-loading, but there’s something odd. I set 32 threads, but CPU usage doesn't go beyond 60% and I only get 7 tok/s. Any thoughts about a possible cause?

Just for fun, I’ll check with an Asus ROG Ally later (Z1 Extreme version).

Maxim Saplin • Mar 16 '24

It seems the threads param is ignored. I saw the same behaviour when testing CPU inference.
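One rough way to check whether a threads setting has any effect is to run the same prompt at several thread counts and compare the tok/s. The sketch below is only an illustration: `generate` is a hypothetical callable you would write yourself to wrap whatever tool you use (it takes a thread count, runs the prompt once, and returns the number of tokens produced), and the 10% tolerance is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Iterable


def tokens_per_second(generate: Callable[[int], int], n_threads: int) -> float:
    """Time a single generation run and return throughput in tokens/second.

    `generate` is a hypothetical wrapper around your inference tool: it takes
    a thread count, runs the prompt once, and returns the token count.
    """
    start = time.perf_counter()
    n_tokens = generate(n_threads)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval from a stubbed-out generate().
    return n_tokens / elapsed if elapsed > 0 else float("inf")


def sweep_threads(generate: Callable[[int], int],
                  thread_counts: Iterable[int]) -> Dict[int, float]:
    """Measure throughput once for each requested thread count."""
    return {n: tokens_per_second(generate, n) for n in thread_counts}


def looks_ignored(results: Dict[int, float], tolerance: float = 0.10) -> bool:
    """Return True if throughput barely changes across thread settings.

    `tolerance` is the maximum relative spread (assumed 10% here) that still
    counts as "no change".
    """
    rates = list(results.values())
    lo, hi = min(rates), max(rates)
    return hi == 0 or (hi - lo) / hi < tolerance
```

For example, `looks_ignored(sweep_threads(my_generate, [4, 8, 16, 32]))` would flag a flat result, with `my_generate` being your own wrapper. A flat result does not prove the parameter is ignored; it could also mean the run is limited by something other than core count, such as memory bandwidth.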