
🚀 Setting Up Ollama & Running DeepSeek R1 Locally for a Powerful RAG System

Ajmal Hasan on January 28, 2025

🤖 Ollama
Ollama is a framework for running large language models (LLMs) locally on your machine. It lets you download, run, and interact...
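As a quick taste of what that looks like in practice, here is a minimal sketch of querying a locally running Ollama server from Python over its HTTP API. It assumes `ollama serve` is listening on the default port 11434 and that a model such as deepseek-r1:1.5b has already been pulled:

```python
# Minimal sketch: ask a locally running Ollama server a question over its HTTP API.
# Assumes `ollama serve` is listening on the default http://localhost:11434
# and that `ollama pull deepseek-r1:1.5b` has already been run.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Explain retrieval-augmented generation in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```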
Frulow •

Would have been better if you mentioned system requirements too

thunderduck eu •

It's a 1 GB file. LLMs like to sit in your GPU, so a 2 GB graphics card should run it. Obviously it will not be as fast as a 4060 8 GB with lots of CUDA cores, but if you read other articles about this LLM, it's designed to work with fewer resources.

Terry W • Edited

Also, the actual R1 model is the biggest one; anything smaller than the 400+ GB model is a distilled version of it, but they are of course near enough the same thing anyway.

Rohan Srivastava •

Yes, even on Linux, training and vectorization for a 12 MB file took almost 20 minutes, even though the machine is an r5a.4xlarge EC2 instance.

Veerakumar •

Bro, really, thanks for making this tutorial.

Leo Calle •

Thank you @ajmal_hasan for sharing, will give it a try 😀

John O'Donahue •

Great article. Surprised how well it works locally. Thank you.

maneamarius •

What are the hardware requirements?
Why not start with this at the beginning of your guide?

Ajmal Hasan • Edited

Any decent system will suffice (for example, I use a MacBook M1 base model). Choose the lightest model available if you don't have a high-end device.

However, keep in mind that processing time and response quality will vary based on your system's specifications and the complexity of the model parameters. 🚀

squidbe •

@maneamarius, asking what the system requirements should be for an LLM is like asking what the horsepower should be for a car: it depends. We're talking about tools with a wide range of applications, so the minimum requirements depend on an individual's desired outcomes.

As they say, you attract more flies with honey than vinegar. Instead of criticizing a guy who's educating you and others for free, try asking him something like, "What are your system specs, and how many tokens per second are you getting?"
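If you want to measure that yourself, a rough sketch using the Ollama HTTP API looks like this (it assumes the server is running locally with deepseek-r1:1.5b pulled, and uses the eval_count and eval_duration fields Ollama reports for non-streamed generations):

```python
# Rough sketch: estimate tokens per second from a non-streamed Ollama generation.
# Assumes `ollama serve` is running locally and deepseek-r1:1.5b has been pulled.
import requests

result = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:1.5b", "prompt": "Summarise RAG in two sentences.", "stream": False},
    timeout=300,
).json()

# Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
tokens_per_second = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"~{tokens_per_second:.1f} tokens/s")
```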

maneamarius •

Not a good answer.
You should put the recommended system requirements in your post for each model, e.g. graphics cards needed, etc.
Otherwise your post is incomplete.

Marcus Franke •

What about giving it a try before criticizing the author?

As mentioned, the 1.5b model is rather small. The download is "just" 1.1 gigabytes. I was able to run it on a MacBook Pro M2 with only 16 GB of RAM, and it was answering at decent speed while using about 4 GB of RAM.

The real limitation is the 1.5b model. I asked it to generate Rust code, and it admitted to not knowing it very well.

I then switched to the deepseek-coder-v2 model with 16b parameters, which is a download of 8.9 gigabytes. RAM usage spiked to 8 GB; the model runs at a lower speed and uses less reasoning, but instead started emitting code directly in response to my question.

So, Ajmal's answer is that a decent system will be enough to generate your answers. I agree with this, as I would consider my Mac, due to RAM limitations, not as good, but decent. And, of course, it depends on what you are running besides the LLM. If your RAM is already filled up, you'll get into trouble.

However, you do not need a 4090 and many Tensor Cores to run these models locally. Your mileage may vary, true. But overall, and to get a first impression, it will definitely work.

Just give it a try, the text shows all the necessary steps to do this. Except for ollama serve, which you will figure out by looking at the messages and the help.

shardul_vikramsingh_d7cc •

I found this rule of thumb in a YouTube video by bycloud:
If your GPU's VRAM is greater than (model_size * 1.2), then you can run that model.
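A quick back-of-the-envelope version of that check (the model sizes below are just illustrative download sizes, not exact figures):

```python
# Back-of-the-envelope check of the "VRAM > model size * 1.2" rule of thumb.
# The model sizes below are illustrative download sizes, not exact figures.
def fits_in_vram(model_size_gb: float, vram_gb: float, overhead: float = 1.2) -> bool:
    return vram_gb > model_size_gb * overhead

print(fits_in_vram(model_size_gb=1.1, vram_gb=2.0))   # ~1 GB 1.5b model on a 2 GB card -> True
print(fits_in_vram(model_size_gb=9.0, vram_gb=10.0))  # ~9 GB download on a 10 GB card  -> False
```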

OaKiToKi • Edited

Just wanted to confirm what specs it can run on:
Ollama DeepSeek-R1:14B runs smoothly and quickly on a Ryzen 7 5700X, 64 GB RAM, RTX 3080 10 GB. The 32B and 70B run, but the 70B thinks at about one word a second, while the 32B is slightly faster.

I've used the 70B but had to let it run overnight to provide info the next day. Just FYI, if time is of no issue it will run Ollama and even the chat app. I have not tried RAG, but it shouldn't be an issue.

Ataliba Miguel •

Hi @ajmal_hasan, how do I get around this error:
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /sentence-transformers/all-mpnet-base-v2/resolve/main/adapter_config.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (ssl.c:997)')))"), '(Request ID: edeffbec-e8a2-472e-9722-2c40df75aa94)')
2025-01-29 21:55:58.668 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
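(For anyone hitting the same thing: a common workaround for CERTIFICATE_VERIFY_FAILED errors is pointing Python's HTTPS clients at a valid CA bundle such as certifi's, sketched below, though whether it applies depends on your network setup.)

```python
# Possible workaround (may not apply to every network setup): point Python's
# HTTPS clients at certifi's CA bundle before anything downloads from huggingface.co.
import os
import certifi

os.environ["SSL_CERT_FILE"] = certifi.where()
os.environ["REQUESTS_CA_BUNDLE"] = certifi.where()

# ...import and run the rest of the app (e.g. the sentence-transformers download) after this.
```

If you are behind a corporate proxy that re-signs TLS traffic, you may instead need to append your organisation's root certificate to that bundle.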

Paul Levitt • Edited

I'd double-check your claim of DeepSeek R1 local deployments being "✅ 100% Local & Secure" - it wouldn't be the first to reach out to the wider net.

I'll caveat this with: you are, however, 100% in control of a local model's resource access.

My apologies if this is what you meant; it's not explicitly called out, so I wasn't aware.

Futuritous •

My laptop has 4 CPU cores and 16 GB RAM with Intel integrated graphics (Ubuntu) - will it work on my laptop?

Abraham •

Yes, but not as fast as if you had a GPU. You will also need to use a 7B or smaller model.

thunderduck eu •

Try it. It's a lightweight model.

SAMIR HEMBROM •

I tried running it, dunno why, but it gave me garbage text back.

Ajmal Hasan • Edited

Use a higher-parameter version if your system supports it.

SAMIR HEMBROM •

Sadly I don't think I can, I have 8 GB RAM.

thunderduck eu •

It's a small model and will rely on your GPU. 2 GB of GPU power will be enough to get started. Obviously it won't be as fast as if you have a more modern card. I use a 4060 with 8 GB of RAM, mainly because it has a lot of CUDA cores and uses way less electricity.

Sv College •

I installed deepseek-r1:1.5b on my local machine. When I run ollama run deepseek-r1:1.5b it starts successfully, but it doesn't respond to my question; it just gives a blank line and asks for another question on the next line. What could be the cause, and could you suggest a way to solve this issue?
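(One way to narrow this down is to query the model over the HTTP API and look at the raw output; R1-style models wrap their reasoning in <think> tags, so a rough sketch like the one below can show whether an answer is there but hidden. It assumes ollama serve is running locally.)

```python
# Rough check: query the model over the HTTP API and inspect the raw output.
# R1-style models wrap their reasoning in <think>...</think>, so an "empty" looking
# reply may still contain text. Assumes `ollama serve` is running locally.
import re
import requests

result = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:1.5b", "prompt": "What is 2 + 2?", "stream": False},
    timeout=300,
).json()

raw = result.get("response", "")
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
print("raw output:", repr(raw))
print("answer only:", answer)
```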

R. Félix Bengolea Lalor •

Hi, I have a problem deploying it: "Unable to deploy
The app's code is not connected to a remote GitHub repository. To deploy on Streamlit Community Cloud, please put your code in a GitHub repository and publish the current branch. Read more in our documentation."
Is this necessary, or is it something I'm doing wrong?
Thanks!

SHRIRAMPRABHU J •

Hi, I have followed the above process; however, I get this error: An error occurred: Ollama call failed with status code 500. Details: {"error":"llama runner process has terminated: exit status 2"}

Can someone please assist me?

Futuritous •

Would love to try it.

frankDev96 •

Does this work on Windows and Mac machines?

Naseer Ahmad •

What if I want to use UTF8 txt files?
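For example, something like this is what I have in mind (assuming the guide's pipeline uses LangChain-style document loaders; the file name is just an example):

```python
# Hypothetical sketch: loading a UTF-8 encoded .txt file instead of a PDF,
# assuming the RAG pipeline uses LangChain-style document loaders.
# "notes.txt" is just an example file name.
from langchain_community.document_loaders import TextLoader

loader = TextLoader("notes.txt", encoding="utf-8")
documents = loader.load()
print(documents[0].page_content[:200])
```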

__0db1809de3e79 •

Can you share a TypeScript version of that?

Abel C Dixon •

How can I enable support for image inference?

Mohamed Wajeeth •

Thank you!

Suman Tandukar •

I am struggling to download even the 1.5b model. I tried all of them except 671b; the download always resets at some point and restarts until it gives me too many retries. Is anyone facing the same issue?