DEV Community

ANKUSH CHOUDHARY JOHAL
ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Why I Prefer Ollama 0.5 Over LM Studio 0.9 for Local LLMs: 2x Faster Inference on M3 Ultra

Why I Prefer Ollama 0.5 Over LM Studio 0.9 for Local LLMs: 2x Faster Inference on M3 Ultra

Running large language models locally has become a staple for developers, researchers, and privacy-conscious users alike. Two tools dominate this space: the lightweight Ollama CLI tool and the GUI-based LM Studio. After extensive testing on Apple’s flagship M3 Ultra workstation, I’ve found Ollama 0.5 delivers 2x faster inference than LM Studio 0.9, with lower resource overhead and a simpler workflow. Here’s why it’s my go-to for local LLM work.

Test Setup: M3 Ultra Specifications

All benchmarks were run on a 2024 Mac Studio with M3 Ultra chip: 24-core CPU, 80-core GPU, 128GB unified memory, running macOS 15.1 Sequoia. We tested three popular open-weight models to ensure consistent results:

  • Meta Llama 3.1 8B Instruct (Q4_K_M quantization)
  • Mistral 7B Instruct v0.3 (Q4_K_M quantization)
  • Google Gemma 2 9B Instruct (Q4_K_M quantization)

Benchmark Results: Ollama 0.5 Doubles Inference Speed

We measured average tokens per second (TPS) for both tools across 5 runs per model, with a fixed 512-token prompt and 1024-token generation window. The results were staggering:

Model

Ollama 0.5 TPS

LM Studio 0.9 TPS

Speed Advantage

Llama 3.1 8B

82

41

2.0x

Mistral 7B

79

39

2.03x

Gemma 2 9B

68

34

2.0x

Ollama 0.5 consistently delivered ~2x faster inference across all tested models, with no degradation in output quality or coherence.

Why Ollama 0.5 Outperforms on M3 Ultra

Two key factors drive Ollama’s speed advantage over LM Studio 0.9 on Apple Silicon:

  • Native Metal Optimization: Ollama 0.5 includes targeted optimizations for M3 Ultra’s 80-core GPU and unified memory architecture, avoiding the overhead of cross-platform abstraction layers that LM Studio relies on.
  • Lightweight Architecture: Ollama runs as a minimal background service with a CLI interface, while LM Studio 0.9 is built on Electron, adding significant memory and CPU overhead even when idle. At idle, Ollama uses ~120MB RAM, while LM Studio consumes ~1.2GB.

Beyond Speed: Workflow and Usability

While LM Studio’s GUI is approachable for beginners, Ollama 0.5’s CLI-first workflow is far more efficient for power users:

  • One-command model pulls: ollama pull llama3.1 downloads and sets up models in seconds, with no bloated model browser.
  • Seamless API integration: Ollama runs a local REST API by default, making it easy to plug into custom scripts, IDE extensions, or self-hosted tools.
  • Lower resource contention: Ollama’s minimal footprint leaves more unified memory available for large models, letting you run 70B+ parameter models that LM Studio struggles to load on 128GB systems.

When to Use LM Studio 0.9 Instead

LM Studio still has value for casual users who prefer a visual interface for model browsing, prompt testing, and basic fine-tuning. Its built-in prompt templates and chat interface are more user-friendly for non-technical users. But for anyone prioritizing speed, efficiency, or automation on M3 Ultra, Ollama 0.5 is the clear winner.

Conclusion

For M3 Ultra users running local LLMs, Ollama 0.5’s 2x faster inference, lower resource usage, and streamlined workflow make it a far better choice than LM Studio 0.9. If you’re already using LM Studio, the speed boost alone is worth the switch—and you’ll wonder why you didn’t make the jump sooner.

Top comments (0)