![Cover image for Run DeepSeek-R1 on Your Laptop with Ollama](https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0riixk5v2iptp8tqpqif.png)
Yesterday, DeepSeek released a series of very powerful language models, including DeepSeek-R1 and a number of distilled (smaller) models based on Qwen and Llama.
damn, it's time to build a Mac mini AI cluster 🫡
Honestly, you don't even need to go that far. I'm running the 32B Qwen distill with 32 GB of memory on my M1 MacBook Pro and it's fast enough.
just trying to find a reason to build that, okay :(
:(
Is there a good place to learn the limits of what my Mac can run? Almost like a caniuse or a canirun for large language models (LLMs)?
I don't know any off the top of my head, but here's my rule of thumb:
If you're running on Apple Silicon, your memory is shared between the CPU and the GPU, so if you subtract about 8 GB from the total memory to keep the OS happy, you get a sense of how large a model you can run.
The other issue aside from memory is the GPU. So even if you manage to fit a large model in memory, the GPU may become a bottleneck in terms of how many tokens per second you can generate.
So it mostly comes down to these two factors and deciding what's a usable tokens-per-second rate for your use case.
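To make the rule of thumb concrete, here's a rough fit-check sketch in Python. The 8 GB OS reserve and the bytes-per-parameter figures are ballpark assumptions, and it only counts the weights (the KV cache and context window add more on top):

```python
# Rough "can I run it?" check for Apple Silicon unified memory.
# Assumptions: ~8 GB reserved for the OS, and weights-only sizing
# (the KV cache and context add extra memory on top of this).

QUANT_BYTES = {"q4": 0.5, "q8": 1.0, "f16": 2.0}  # bytes per parameter

def fits_in_memory(total_ram_gb: float, params_billions: float,
                   quant: str = "q4", os_reserve_gb: float = 8.0) -> bool:
    """Estimate whether a model's weights fit in the memory left over."""
    usable_gb = total_ram_gb - os_reserve_gb
    weights_gb = params_billions * QUANT_BYTES[quant]
    return weights_gb <= usable_gb

if __name__ == "__main__":
    for size in (7, 8, 14, 32, 70):
        verdict = "fits" if fits_in_memory(32, size) else "too big"
        print(f"{size}B at q4 on a 32 GB machine: {verdict}")
```

By this estimate a 32B model at 4-bit quantization needs about 16 GB for weights, which is why it runs comfortably on a 32 GB MacBook, while a 70B model (about 35 GB) does not.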
Tried it today, but only the 8B.
It's a bit slow, and the way it thinks is annoying and amazing at the same time. I did it in Ollama; I suggest you give it a shot (quick sketch below).
I'm still trying to find an LLM that's good and not heavy for coding C++, Flutter, Java, and R. If anyone knows one, that would be cool, ty.
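If anyone wants to reproduce the 8B run, here's a minimal sketch using the `ollama` Python package; it assumes the Ollama server is running and you've already pulled the model with `ollama pull deepseek-r1:8b`:

```python
# Minimal chat call against a locally pulled DeepSeek-R1 8B distill.
# Assumes `pip install ollama` and a running Ollama server.
import ollama

response = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Explain RAII in C++ in two sentences."}],
)

# R1-style models print their chain of thought inside <think> tags
# before the final answer, which is the "annoying and amazing" part.
print(response["message"]["content"])
```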
R1's knowledge cutoff is December 2023, which is months before Sonnet 3.5 (April 1, 2024) and Gemini 2.0 (August 1, 2024).
In the world of frontend this is a huge difference, because things move very fast and you need knowledge of recent versions of things.
AIs with early cutoff dates are very limiting for rapidly changing fields (such as frontend).
Bump, bro
Would it be possible for any of these models with Ollama to add knowledge by crawling, for example, my local server with emails and documents?
Yes, a memory mechanism is probably the easiest way to do it. Use a local vector database for that.
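For a concrete starting point, here's a minimal sketch of that approach. It assumes the `ollama` and `chromadb` Python packages, an embedding model pulled locally (`ollama pull nomic-embed-text`), and a hypothetical `docs` list standing in for whatever crawler feeds in your emails and documents:

```python
# Toy retrieval setup: embed local documents into a vector store,
# then answer a question grounded in the closest match.
import ollama
import chromadb

# Hypothetical stand-in for your crawled emails/documents.
docs = [
    "Invoice #1234 from ACME is due on March 3.",
    "Meeting notes: we migrate the mail server next quarter.",
]

client = chromadb.Client()  # in-memory; use PersistentClient for disk
collection = client.create_collection("local_docs")

for i, doc in enumerate(docs):
    emb = ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"]
    collection.add(ids=[str(i)], embeddings=[emb], documents=[doc])

question = "When is the ACME invoice due?"
q_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
hits = collection.query(query_embeddings=[q_emb], n_results=1)
context = hits["documents"][0][0]

answer = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{
        "role": "user",
        "content": f"Context: {context}\n\nQuestion: {question}",
    }],
)
print(answer["message"]["content"])
```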
Is there a guide for it that you know of?
GPT is your friend ;-)
Once I get my solution fully up and running, I intend to open-source it, from hardware to software. I hope that will help.
nice article, thanks!
Any insights on trade-offs going from their top model to 70B and all the way down to 7B?
Are there benchmarks on the quality of the embedding?
Thanks, Shayan. It's valuable.
"Running DeepSeek-R1 on my laptop with Ollama is a game changer! Super easy and smooth—perfect for boosting productivity. 💻🚀"
Thanks. Great guide.