DEV Community

SomeOddCodeGuy

Posted on • Originally published at someoddcodeguy.dev

Mac Studio M3 Ultra Speeds for Qwen3 235b, GPT-OSS-120b, GLM 4.5, and Deepseek V3.1

M3 Ultra Mac Studio 512GB Speeds

Qwen3 235b a22b Instruct Q8 in Llama.cpp server

(~15k tokens)

prompt eval time   
    4.60 ms per token, 
    217.29 tokens per second

eval time   
    67.59 ms per token, 
    14.80 tokens per second

total time 
    146863.82 ms / 15763 tokens

(~5k tokens)

prompt eval time   
    4.90 ms per token, 
    204.24 tokens per second

eval time   
    57.18 ms per token, 
    17.49 tokens per second

total time 
    65510.45 ms / 5649 tokens
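Each pair of numbers above is the same measurement written two ways: milliseconds per token is just 1000 divided by tokens per second. A minimal Python sketch that checks this against the Qwen3 runs; the function name `ms_per_token` is made up for illustration and is not part of llama.cpp:

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert a throughput in tokens/second into a latency in ms/token."""
    return 1000.0 / tokens_per_second


# (reported ms per token, reported tokens per second), copied from the Qwen3 runs above
qwen_runs = {
    "~15k prompt eval": (4.60, 217.29),
    "~15k eval": (67.59, 14.80),
    "~5k prompt eval": (4.90, 204.24),
    "~5k eval": (57.18, 17.49),
}

for label, (reported_ms, tps) in qwen_runs.items():
    derived = ms_per_token(tps)
    # Small mismatches come from the log rounding each figure to two decimals.
    print(f"{label}: reported {reported_ms:.2f} ms, derived {derived:.2f} ms")
```

The only visible mismatch is on the ~15k eval line (67.57 derived vs 67.59 reported), which is rounding in the logged tokens-per-second figure.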

GPT-OSS-120b Unsloth fp16 gguf in Llama.cpp server

(~5k tokens)

prompt eval time   
    1.37 ms per token, 
    732.57 tokens per second

eval time   
    15.90 ms per token, 
    62.90 tokens per second

total time 
    8526.55 ms / 4447 tokens
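The total time line only gives a combined figure. Assuming it is simply prompt eval time plus eval time (my assumption about how the llama.cpp server sums it), the two rates and the two totals form a pair of linear equations, so you can roughly back out how many tokens were generated. A rough Python sketch using the GPT-OSS-120b run above; `estimate_split` is a hypothetical helper:

```python
def estimate_split(total_ms, total_tokens, prompt_tps, gen_tps):
    """Estimate (prompt_tokens, generated_tokens) from a timing summary.

    Assumes:
      total_tokens = prompt_tokens + generated_tokens
      total_ms     = prompt_tokens * a + generated_tokens * b
    where a and b are the ms-per-token costs of prompt eval and generation.
    """
    a = 1000.0 / prompt_tps  # ms per prompt token
    b = 1000.0 / gen_tps     # ms per generated token
    generated = (total_ms - a * total_tokens) / (b - a)
    prompt = total_tokens - generated
    return prompt, generated


# GPT-OSS-120b (~5k tokens): 8526.55 ms / 4447 tokens, 732.57 pp t/s, 62.90 tg t/s
prompt, generated = estimate_split(8526.55, 4447, 732.57, 62.90)
print(f"~{prompt:.0f} prompt tokens, ~{generated:.0f} generated tokens")
```

Because the logged rates are rounded, treat the result as an estimate good to within a few tokens, not an exact count.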

GLM 4.5 Q8 in Llama.cpp server

(~20k tokens)

prompt eval time   
    7.26 ms per token, 
    137.82 tokens per second

eval time   
    103.33 ms per token, 
    9.68 tokens per second

total time 
    202089.84 ms / 21377 tokens

(~15k tokens)

prompt eval time   
    7.16 ms per token, 
    139.64 tokens per second

eval time   
    96.64 ms per token, 
    10.35 tokens per second

total time 
    200516.47 ms / 16505 tokens

(~10k tokens)

prompt eval time   
    6.64 ms per token, 
    150.55 tokens per second

eval time   
    88.75 ms per token, 
    11.27 tokens per second

total time 
    108213.31 ms / 10927 tokens

(~5k tokens)

prompt eval time   
    6.86 ms per token, 
    145.70 tokens per second

eval time   
    81.31 ms per token, 
    12.30 tokens per second

total time 
    64483.49 ms / 6000 tokens
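Across the four GLM 4.5 runs, generation speed drops steadily as the prompt grows. A small Python sketch that expresses each run's eval speed as a percentage slowdown relative to the ~5k run; the context sizes are the approximate labels used in this post, not exact token counts, and `slowdown_pct` is just an illustrative name:

```python
def slowdown_pct(tps: float, baseline_tps: float) -> float:
    """Percent slower than the baseline throughput (0 means same speed)."""
    return (1 - tps / baseline_tps) * 100


# GLM 4.5 Q8 eval (generation) tokens per second, keyed by approximate context size
glm_eval_tps = {
    5_000: 12.30,
    10_000: 11.27,
    15_000: 10.35,
    20_000: 9.68,
}

baseline = glm_eval_tps[5_000]
for context, tps in sorted(glm_eval_tps.items()):
    print(f"~{context // 1000}k context: {tps:.2f} t/s "
          f"({slowdown_pct(tps, baseline):.1f}% slower than ~5k)")
```

By this measure the ~20k run generates a bit over 21% slower than the ~5k run.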

Deepseek V3.1 Q5_K_M in Llama.cpp server

(~13k tokens)

prompt eval time   
    14.22 ms per token, 
    70.30 tokens per second

eval time   
    264.86 ms per token, 
    3.78 tokens per second

total time 
    253415.56 ms / 13217 tokens

(~5k tokens)

prompt eval time   
    9.68 ms per token, 
    103.30 tokens per second

eval time   
    144.04 ms per token, 
    6.94 tokens per second

total time 
    119343.67 ms / 5763 tokens

(~3k tokens)

prompt eval time   
    11.92 ms per token, 
    83.86 tokens per second

eval time   
    107.64 ms per token, 
    9.29 tokens per second

total time 
    65396.57 ms / 3269 tokens
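Every model above has a run at roughly 5k tokens, so those are the closest thing to an apples-to-apples comparison here (the exact totals still differ: 5649, 4447, 6000 and 5763 tokens). A short Python sketch that collects those four runs and ranks them by generation speed; the `Run` type is just for illustration:

```python
from typing import NamedTuple


class Run(NamedTuple):
    model: str
    prompt_tps: float  # prompt eval tokens per second
    gen_tps: float     # eval (generation) tokens per second


# The ~5k-token run for each model, copied from the results above.
runs_5k = [
    Run("Qwen3 235b a22b Instruct Q8", 204.24, 17.49),
    Run("GPT-OSS-120b Unsloth fp16", 732.57, 62.90),
    Run("GLM 4.5 Q8", 145.70, 12.30),
    Run("Deepseek V3.1 Q5_K_M", 103.30, 6.94),
]

# Fastest generation first; report each model relative to the fastest one.
ranked = sorted(runs_5k, key=lambda r: r.gen_tps, reverse=True)
fastest = ranked[0].gen_tps
for run in ranked:
    print(f"{run.model:<30} {run.prompt_tps:>8.2f} pp t/s {run.gen_tps:>6.2f} tg t/s "
          f"({run.gen_tps / fastest:.0%} of fastest)")
```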
