One of the reasons I went for a Mac was the unified memory setup. With RAM shared between the CPU and GPU, Apple Silicon has a (theoretical) cost and power efficiency advantage over an NVIDIA setup. I'm not talking about the raw power of a 5900 setup, but compared to my earlier lab, a virtualized setup on a Ryzen xxx, I do expect big improvements.
Base case: old lab
[placeholder to set numbers]
Setting up LM Studio
First things first: my old setup was Ubuntu running on AMD with a laptop GPU. As a bit of a geek I prefer the terminal over a GUI for hobby projects, so Ollama was a logical starting point. But on Mac there is of course a more user-friendly (less geeky) option in LM Studio. Since my projects are hobbies, I value tinkering over ease of use, so terminal and Ollama it is (I may add mystify as a GUI later on for fun).
Unfortunately, at the time of writing Ollama's support for MLX (Apple's answer to CUDA, to oversimplify) was not 100% stable. As I've read that MLX can give a ~30% speed bump, I decided not to geek out and went for LM Studio :-)
20 tokens/second on DeepSeek Qwen3-8B, that's not bad.
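If you want to measure that number yourself, here's a minimal sketch. It assumes LM Studio's built-in local server is running (it exposes an OpenAI-compatible API, by default on http://localhost:1234), and the model id string is a placeholder; use whatever id LM Studio shows for your loaded model.

```python
import json
import time
import urllib.request

# LM Studio's OpenAI-compatible endpoint (default port; adjust if you changed it)
API_URL = "http://localhost:1234/v1/chat/completions"


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: generated tokens divided by wall-clock time."""
    return completion_tokens / elapsed_seconds


def benchmark(prompt: str, model: str = "deepseek-qwen3-8b") -> float:
    # "deepseek-qwen3-8b" is a placeholder model id, not necessarily
    # the exact string LM Studio uses on your machine.
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.monotonic() - start
    # The response's usage block reports how many tokens were generated
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    print(f"{benchmark('Explain unified memory in one paragraph.'):.1f} tok/s")
```

Note this measures end-to-end time (prompt processing plus generation), so it will read a bit lower than the pure generation speed LM Studio reports in its UI.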