DEV Community

SomeOddCodeGuy

Posted on • Originally published at someoddcodeguy.dev

A New Toy...

The M5 Max MacBook Pro just arrived. The first thing I did was fling llama.cpp, Wilmer, and Open WebUI onto it.

Honestly, the speeds are really impressive, especially considering that llama.cpp hasn't fully integrated the hardware changes yet (at least, that's my understanding). Here's a comparison of Qwen3.5 35b a3b on the M5 Max MacBook Pro vs. the M3 Ultra Mac Studio.

M5 Max MacBook Pro:

1450 t/s processing, 68 t/s generation

prompt eval time =  3202.80 ms /  4654 tokens (0.69 ms per token, 1453.10 tokens per second)
       eval time =  7098.19 ms /   483 tokens (14.70 ms per token, 68.05 tokens per second)
      total time = 10300.99 ms /  5137 tokens

M3 Ultra Mac Studio:

1647 t/s processing, 48 t/s generation

prompt eval time =  3810.74 ms /  6280 tokens (0.61 ms per token, 1647.97 tokens per second)
       eval time = 14695.00 ms /   704 tokens (20.87 ms per token, 47.91 tokens per second)
      total time = 18505.75 ms /  6984 tokens

So yeah: the Studio processes prompts faster (at this model size and prompt length, though I think the M5 Max actually saturates better at larger prompts), but it generates tokens slower than the M5 Max.
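If you want to double-check where those tokens-per-second figures come from, they fall straight out of the ms/token pairs llama.cpp prints. Here's a minimal sketch (the helper name and the hardcoded pairs are mine, copied from the output above):

```python
# Sanity-check the throughput numbers from the llama.cpp timing lines above.
def tokens_per_second(ms: float, tokens: int) -> float:
    """Convert a (total ms, token count) pair into tokens per second."""
    return tokens / (ms / 1000.0)

# M5 Max MacBook Pro: prompt eval and generation (eval) phases
m5_prompt = tokens_per_second(3202.80, 4654)    # ~1453 t/s
m5_gen    = tokens_per_second(7098.19, 483)     # ~68 t/s

# M3 Ultra Mac Studio
ultra_prompt = tokens_per_second(3810.74, 6280)  # ~1648 t/s
ultra_gen    = tokens_per_second(14695.00, 704)  # ~48 t/s

print(round(m5_prompt), round(m5_gen), round(ultra_prompt), round(ultra_gen))
```

Same numbers as the raw output, just rounded.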

Super excited to play with this. I got rid of the M2 Max MacBook, so this is my main travel machine now.
