Just for fun, here are some additional results:
iPad Pro M1 256GB, using LLM Farm to load the model: 12.05 tok/s
Asus ROG Ally Z1 Extreme (CPU): 5.25 tok/s using the 25W preset, 5.05 tok/s using the 15W preset
Update:
Asked a friend with an M3 Pro (12-core CPU, 18 GB). Running on CPU: 17.93 tok/s; on GPU: 21.1 tok/s.
The ROG's CPU result is close to the one from the 7840U; after all, they're almost identical CPUs.
The ROG Ally has a Ryzen Z1 Extreme, which appears to be nearly identical to the 7840U, except that, from what I can discern, the NPU is disabled. So if/when LM Studio gets around to implementing support for that AI accelerator, the 7840U should be faster at inference workloads.
AMD GPUs seem to be the underdog in the ML world compared to Nvidia... I doubt that AMD's NPU will see better ML-stack compatibility than its GPUs do.
If you let me know what settings/template you used for this test, I'll run a similar test on my M4 iPad with 16 GB RAM. I get wildly different tok/s depending on which LLM and which template I use.
As of right now, with the fine-tuned LLM and the "TinyLLaMa 1B" template, I get the following:
M4 iPad w/ 16 GB RAM / 2 TB storage: 15.52 tok/s
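For anyone comparing numbers across apps: tok/s is just generated tokens divided by elapsed time, but apps differ in whether prompt processing is included in that time, which is one reason the same model can report very different figures. Here's a minimal Python sketch of the math, assuming a hypothetical `generate` callable standing in for whatever your runtime exposes (LLM Farm and LM Studio compute this for you):

```python
import time

def tokens_per_second(generate, prompt: str) -> float:
    """Time one generation call and return generated tokens per second.

    `generate` is a hypothetical stand-in for your runtime's generation
    call; it's assumed to return the number of tokens it produced.
    Note: this timing includes prompt processing, which some apps
    exclude -- another source of discrepancy between reported tok/s.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```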