DEV Community

Discussion on: Running Local LLMs, CPU vs. GPU - a Quick Speed Test

Orlando Arroyo • Edited

Just for fun, here are some additional results:

iPad Pro M1 256GB, using LLM Farm to load the model: 12.05 tok/s
Asus ROG Ally Z1 Extreme (CPU): 5.25 tok/s using the 25W preset, 5.05 tok/s using the 15W preset

Update:
Asked a friend with an M3 Pro (12-core CPU, 18GB). Running on CPU: 17.93 tok/s; GPU: 21.1 tok/s

Maxim Saplin

The CPU result for the ROG is close to the one from the 7840U; after all, they are almost identical CPUs.

clegger

The ROG Ally has a Ryzen Z1 Extreme, which appears to be nearly identical to the 7840U, but from what I can discern, the NPU is disabled. So if/when LM Studio gets around to implementing support for that AI accelerator, the 7840U should be faster at inference workloads.

Maxim Saplin

AMD GPUs seem to be the underdog in the ML world compared to Nvidia... I doubt that AMD's NPU will see better compatibility with the ML stack than its GPUs.

Ricardo Meleschi • Edited

If you let me know what settings/template you used for this test, I'll run a similar test on my M4 iPad with 16GB RAM. Right now I get wildly different tok/s depending on which LLM and which template I'm using.

As of right now, with the fine-tuned LLM and the "TinyLLaMa 1B" template, I get the following:

M4 iPad with 16GB RAM / 2TB storage: 15.52 tok/s
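
For reference, here is a minimal sketch of how one could measure tok/s with llama-cpp-python instead of relying on the number LLM Farm or LM Studio reports. The model path and prompt are placeholders, and the elapsed time lumps prompt evaluation in with generation, so it slightly understates pure generation speed on long prompts:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical GGUF path -- substitute whatever quantized model you are testing.
llm = Llama(model_path="./tinyllama-1.1b.Q4_K_M.gguf", n_ctx=2048, verbose=False)

prompt = "Explain in one paragraph why the sky is blue."

start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.0)
elapsed = time.perf_counter() - start

# Note: elapsed includes prompt evaluation, not just token generation.
generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.2f} tok/s")
```

Comparing numbers across devices only really works if the model file, quantization, and generation length are kept the same.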