Run DeepSeek-R1 on Your Laptop with Ollama

Shayan on January 21, 2025

Yesterday, DeepSeek released a series of very powerful language models including DeepSeek R1 and a number of distilled (smaller) models that are ba...
 
Jonas Scholz

damn, it's time to build a Mac mini AI cluster 🫡

Shayan

Honestly you don't even need to go that far. I'm running the 32b Qwen distill with 32 GB of memory on my M1 MacBook Pro and it's fast enough.
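
For what it's worth, Ollama also has a Python client (pip install ollama), so once the model is pulled, something like this should work. The deepseek-r1:32b tag below is the distill I'm running; adjust it to whatever ollama list shows on your machine:

    # minimal sketch using the ollama Python client;
    # the model tag assumes the 32b Qwen distill is already pulled
    import ollama

    response = ollama.chat(
        model="deepseek-r1:32b",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response["message"]["content"])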

Jonas Scholz

just trying to find a reason to build that ok :(

Shayan

:(

brandon

Is there a good place to find out the limits of what my Mac can run? Almost like a caniuse, or a "canirun", for LLMs?

Shayan

I don't know of any off the top of my head, but here's my rule of thumb:

If you're running on Apple Silicon, your memory is shared between the CPU and the GPU, so if you subtract about 8 GB from the total memory to keep the OS happy, you get a sense of how large a model you can run.

The other issue aside from memory is the GPU itself. Even if you manage to fit a large model in memory, the GPU may become a bottleneck in terms of how many tokens per second you can generate.

So it mostly comes down to weighing these two factors and deciding what's a usable tokens-per-second rate for your use case.
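
To make that concrete, here's the back-of-the-envelope version. The ~8 GB OS reserve and the size-per-parameter figure are rough assumptions, not exact numbers:

    # rough sizing heuristic for Apple Silicon unified memory
    total_gb = 32                 # e.g. my M1 MacBook Pro
    usable_gb = total_gb - 8      # leave ~8 GB so the OS stays happy

    gb_per_b_params = 0.6         # ~4-bit quant + KV cache overhead (assumption)
    max_params_b = usable_gb / gb_per_b_params
    print(f"roughly a {max_params_b:.0f}B-parameter model fits in {usable_gb} GB")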

mugdad

Tried it today, but only the 8b. It's a bit slow, and the way it thinks is both annoying and amazing. I did it in Ollama; I suggest you give it a shot.
I'm still trying to find an LLM that's good and not heavy for coding C++, Flutter, Java, and R. If anyone knows one, that would be cool, thanks.

Yair Even Or

R1's knowledge cutoff is December 2023, which is months before Sonnet 3.5 (April 1, 2024) and Gemini 2.0 (August 1, 2024).

In the world of Frontend this is a huge difference, because things move very fast and you need knowledge of recent versions of things.

AIs with older knowledge cutoffs are very limiting for rapidly-changing fields (such as Frontend).

abamakbar07

Bump, bro.

dewi-ny-je

Would it be possible, with any of these models and Ollama, to add knowledge by crawling, for example, my local server with emails and documents?

Martin Frasch

Yes - a memory mechanism (retrieval-augmented generation) is probably the easiest way to do it. Use a local vector database for that.

Eduard Alexandru

Is there a guide for it that you know of?

Martin Frasch

GPT is your friend ;-)

Once I get my solution fully up and running, I intend to open-source it, from hardware to software. I hope that will help.
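
In the meantime, here's a toy sketch of the retrieval idea, assuming the ollama Python client and an embedding model like nomic-embed-text pulled locally. A real setup would chunk the documents and replace the list with a proper vector database:

    # toy local RAG: embed documents, retrieve the closest one, stuff it into the prompt
    import ollama

    docs = ["email: meeting moved to Friday", "doc: server backup runs nightly"]

    def embed(text):
        # nomic-embed-text is an assumption; any local embedding model works
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    doc_vecs = [embed(d) for d in docs]

    question = "When is the meeting?"
    q_vec = embed(question)
    best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))

    answer = ollama.chat(
        model="deepseek-r1:8b",
        messages=[{"role": "user", "content": f"Context: {docs[best]}\n\n{question}"}],
    )
    print(answer["message"]["content"])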

Ömer Berat Sezer

nice article, thanks!

Martin Frasch

Any insights on trade-offs going from their top model to 70b and all the way to 7b?

Are there benchmarks on the quality of the embedding?

Kourosh Eidivandi

Thanks, Shayan. It's valuable.

Kimberly

"Running DeepSeek-R1 on my laptop with Ollama is a game changer! Super easy and smooth—perfect for boosting productivity. 💻🚀"

Makhan

Thanks. Great guide.