Just for fun, here are some additional results:
iPad Pro M1 256GB, using LLM Farm to load the model: 12.05 tok/s
Asus ROG Ally Z1 Extreme (CPU): 5.25 tok/s using the 25W preset, 5.05 tok/s using the 15W preset
Update:
Asked a friend with an M3 Pro (12-core CPU, 18 GB). Running on CPU: 17.93 tok/s; on GPU: 21.1 tok/s.
The ROG's CPU result is close to the one from the 7840U; after all, they're almost identical CPUs.
The ROG Ally has a Ryzen Z1 Extreme, which appears to be nearly identical to the 7840U, except that, from what I can discern, the NPU is disabled. So if/when LM Studio gets around to implementing support for that AI accelerator, the 7840U should be faster at inference workloads.
AMD GPUs seem to be the underdog in the ML world compared to Nvidia... I doubt that AMD's NPU will see better ML-stack compatibility than its GPUs do.
If you let me know what settings/template you used for this test, I'll run a similar test on my M4 iPad with 16 GB RAM. I get wildly different tok/s depending on which LLM and which template I use.
As of right now, with the fine-tuned LLM and the "TinyLLaMa 1B" template, I get the following:
M4 iPad w/ 16 GB RAM / 2 TB storage: 15.52 tok/s
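For anyone comparing numbers across apps: tok/s is just generated tokens divided by elapsed time, but apps differ in whether prompt processing is included in that time, which is one reason the same model can report very different figures. Here's a minimal Python sketch of the math, assuming a hypothetical `generate` callable standing in for whatever your runtime exposes (LLM Farm and LM Studio compute this for you):

```python
import time

def tokens_per_second(generate, prompt: str) -> float:
    """Time one generation call and return generated tokens per second.

    `generate` is a hypothetical stand-in for your runtime's generation
    call; it's assumed to return the number of tokens it produced.
    Note: this timing includes prompt processing, which some apps
    exclude -- another source of discrepancy between reported tok/s.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```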