Run DeepSeek-R1 on Your Laptop with Ollama

Shayan on January 21, 2025

Yesterday, DeepSeek released a series of very powerful language models including DeepSeek R1 and a number of distilled (smaller) models that are ba...
 
Jonas Scholz

damn, it's time to build a Mac mini AI cluster 🫡

Shayan

Honestly you don't even need to go that far. I'm running the 32b Qwen distill with 32 GB of memory on my M1 MacBook Pro and it's fast enough.
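
For what it's worth, Ollama also has a Python client (pip install ollama), so once the model is pulled, something like this should work. The deepseek-r1:32b tag below is the distill I'm running; adjust it to whatever ollama list shows on your machine:

    # minimal sketch using the ollama Python client;
    # the model tag assumes the 32b Qwen distill is already pulled
    import ollama

    response = ollama.chat(
        model="deepseek-r1:32b",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response["message"]["content"])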

Jonas Scholz

just trying to find a reason to build that ok :(

Shayan

:(

brandon

Is there a good place to find out the limits of what my Mac can run? Almost like a caniuse, or a "canirun", for LLMs?

Shayan

I don't know of any off the top of my head, but here's my rule of thumb:

If you're running on Apple Silicon, your memory is shared between the CPU and the GPU, so if you subtract about 8 GB from the total memory to keep the OS happy, you get a sense of how large a model you can run.

The other issue aside from memory is the GPU itself. Even if you manage to fit a large model in memory, the GPU may become a bottleneck in terms of how many tokens per second you can generate.

So it mostly comes down to weighing these two factors and deciding what's a usable tokens-per-second rate for your use case.
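
To make that concrete, here's the back-of-the-envelope version. The ~8 GB OS reserve and the size-per-parameter figure are rough assumptions, not exact numbers:

    # rough sizing heuristic for Apple Silicon unified memory
    total_gb = 32                 # e.g. my M1 MacBook Pro
    usable_gb = total_gb - 8      # leave ~8 GB so the OS stays happy

    gb_per_b_params = 0.6         # ~4-bit quant + KV cache overhead (assumption)
    max_params_b = usable_gb / gb_per_b_params
    print(f"roughly a {max_params_b:.0f}B-parameter model fits in {usable_gb} GB")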

mugdad

Tried it today, but only the 8b. It's a bit slow, and the way it thinks is both annoying and amazing. I did it in Ollama; I suggest you give it a shot.
I'm still trying to find an LLM that's good and not heavy for coding C++, Flutter, Java, and R. If anyone knows one, that would be cool, thanks.

Yair Even Or

R1's knowledge cutoff is December 2023, which is months before Sonnet 3.5 (April 1, 2024) and Gemini 2.0 (August 1, 2024).

In the world of Frontend this is a huge difference, because things move very fast and you need knowledge of recent versions of things.

AIs with older knowledge cutoffs are very limiting for rapidly-changing fields (such as Frontend).

abamakbar07

Bump, bro.

dewi-ny-je

Would it be possible, with any of these models and Ollama, to add knowledge by crawling, for example, my local server with emails and documents?

Martin Frasch

Yes - a memory mechanism (retrieval-augmented generation) is probably the easiest way to do it. Use a local vector database for that.

Eduard Alexandru

Is there a guide for it that you know of?

Martin Frasch

GPT is your friend ;-)

Once I get my solution fully up and running, I intend to open-source it, from hardware to software. I hope that will help.
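
In the meantime, here's a toy sketch of the retrieval idea, assuming the ollama Python client and an embedding model like nomic-embed-text pulled locally. A real setup would chunk the documents and replace the list with a proper vector database:

    # toy local RAG: embed documents, retrieve the closest one, stuff it into the prompt
    import ollama

    docs = ["email: meeting moved to Friday", "doc: server backup runs nightly"]

    def embed(text):
        # nomic-embed-text is an assumption; any local embedding model works
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    doc_vecs = [embed(d) for d in docs]

    question = "When is the meeting?"
    q_vec = embed(question)
    best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))

    answer = ollama.chat(
        model="deepseek-r1:8b",
        messages=[{"role": "user", "content": f"Context: {docs[best]}\n\n{question}"}],
    )
    print(answer["message"]["content"])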

Ömer Berat Sezer

nice article, thanks!

Martin Frasch

Any insights on trade-offs going from their top model to 70b and all the way to 7b?

Are there benchmarks on the quality of the embedding?

Kourosh Eidivandi

Thanks, Shayan. It's valuable.

Kimberly

"Running DeepSeek-R1 on my laptop with Ollama is a game changer! Super easy and smooth—perfect for boosting productivity. 💻🚀"

Makhan

Thanks. Great guide.