DEV Community

SomeOddCodeGuy

Posted on • Originally published at someoddcodeguy.dev

A New Toy...

The M5 Max MacBook Pro just arrived. The first thing I did was fling llama.cpp, Wilmer, and Open WebUI onto it.

Honestly, the speeds are really impressive, especially considering that llama.cpp hasn't fully integrated the hardware changes yet (at least, that's my understanding). Here's a comparison of Qwen3.5 35b a3b on the M5 Max MacBook Pro vs. the M3 Ultra Mac Studio.

M5 Max MacBook Pro:

1450 t/s processing, 68 t/s generation

prompt eval time =  3202.80 ms /  4654 tokens (0.69 ms per token, 1453.10 tokens per second)
       eval time =  7098.19 ms /   483 tokens (14.70 ms per token, 68.05 tokens per second)
      total time = 10300.99 ms /  5137 tokens

M3 Ultra Mac Studio:

1647 t/s processing, 48 t/s generation

prompt eval time =  3810.74 ms /  6280 tokens (0.61 ms per token, 1647.97 tokens per second)
       eval time = 14695.00 ms /   704 tokens (20.87 ms per token, 47.91 tokens per second)
      total time = 18505.75 ms /  6984 tokens

So yeah: the Studio processes prompts faster (at this model size and prompt length, though I think the M5 Max actually saturates better at larger prompts), but it generates tokens slower than the M5 Max.
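If you want to double-check where those tokens-per-second figures come from, they fall straight out of the ms/token pairs llama.cpp prints. Here's a minimal sketch (the helper name and the hardcoded pairs are mine, copied from the output above):

```python
# Sanity-check the throughput numbers from the llama.cpp timing lines above.
def tokens_per_second(ms: float, tokens: int) -> float:
    """Convert a (total ms, token count) pair into tokens per second."""
    return tokens / (ms / 1000.0)

# M5 Max MacBook Pro: prompt eval and generation (eval) phases
m5_prompt = tokens_per_second(3202.80, 4654)    # ~1453 t/s
m5_gen    = tokens_per_second(7098.19, 483)     # ~68 t/s

# M3 Ultra Mac Studio
ultra_prompt = tokens_per_second(3810.74, 6280)  # ~1648 t/s
ultra_gen    = tokens_per_second(14695.00, 704)  # ~48 t/s

print(round(m5_prompt), round(m5_gen), round(ultra_prompt), round(ultra_gen))
```

Same numbers as the raw output, just rounded.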

Super excited to play with this. I got rid of the M2 Max MacBook, so this is my main travel machine now.
