Run Deepseek locally using Docker!

Savvas Stephanides

About

If you follow AI news even a little bit, you've probably heard of Deepseek, the new AI app built in China that's said to rival anything else out there. You've probably also heard horror stories about apps from China collecting information on their users.

If this sounds like you, you're probably looking for a better alternative. And there is one: you can build a Deepseek client on your local system and run it without any access to the Internet. What's even better is that you can do this in Docker, so you don't need anything else installed. Sound good? If yes, read on!

What we're building

We're going to use Docker to create a web interface where you can ask questions to Deepseek.

📖 Docker, explained

To do this, we're going to create two apps:

  1. A Deepseek endpoint built with Ollama
  2. A simple static website which calls the endpoint and gets answers to questions.

When we're building multiple apps that communicate with each other, we can do this easily in Docker using Docker Compose.

Steps

🏃‍♂️ In a hurry? Just clone this repository and follow the instructions in the README file!

Step 1: Create docker-compose file

To work with Docker Compose, we first need to create a new file in our project's directory. Create a new directory for your project if you haven't done so already, and in it, create a new file called docker-compose.yml.

💡 A docker-compose.yml file is the place where we will be describing our two apps so we can run them.

Step 2: Add first app

Now that we've created our Docker Compose file, let's define our first app: the Deepseek Ollama app. In Docker Compose, apps are called "services", but it's essentially the same thing.

The Deepseek Ollama app is the endpoint we'll call when we want to ask Deepseek questions and get answers.

💡 What is Ollama? Ollama is a lightweight framework that lets you easily run open source models, like Deepseek or Llama, locally on your computer.

To add the Deepseek endpoint to your docker-compose.yml file, add this to the file:

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./ollama-models:/root/.ollama
    ports:
      - 11434:11434

🤔 What is going on here?

  • This part tells Docker Compose to create a service (a container) called "ollama".
  • The container is based on the image (basically a blueprint) called ollama/ollama. This image comes with Ollama preinstalled, so you don't have to install anything else!
  • The volumes part saves the data from the models we're going to install later onto your local hard drive. Within the container, all this data is stored in /root/.ollama, but it would disappear once the container is removed. Not exactly what we want. With this volume, whatever is stored in that directory is kept permanently in the ollama-models directory in your project's root.
  • When the Ollama app is up and running, it listens on port 11434 inside the container, but not on your machine, which means you wouldn't be able to access it from your browser. To fix this, the ports part maps container port 11434 to port 11434 on your host.

Now that we've added the Ollama service, let's run it so we can install Deepseek. To do this, just run this Docker Compose command in your terminal:

docker compose up -d ollama

Now check that the Ollama app is running by pointing your browser to this address:

http://localhost:11434

Your browser should show the "Ollama is running" text like so:

A browser window with the text "Ollama is running"
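
If you prefer the terminal, you can run the same check with curl (assuming curl is installed on your machine):

curl http://localhost:11434

This should print "Ollama is running".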

Step 3: Install Deepseek

Now that Ollama is up and running, we can go ahead and install our Deepseek model. To do this, just run this Docker Compose command:

docker compose exec ollama ollama pull deepseek-r1:7b

🤔 What is going on here?

  • The exec command from Docker Compose essentially runs any command within a given container, in this case the ollama container. The above line runs (executes) the command ollama pull deepseek-r1:7b which installs the Deepseek model. The basic structure of the exec command is as follows: docker compose exec <containerName> <command>.

This command will take a few minutes to run (depending on the size of the model and your connection speed), but once it's done, it should populate the new ollama-models directory with the files needed for the Deepseek model.
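
You can also confirm the model is installed by listing Ollama's local models (ollama list is a standard Ollama command):

docker compose exec ollama ollama list

You should see deepseek-r1:7b in the output.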

💡 The Deepseek model comes in lots of different sizes. For this example, I've chosen 7b (which means 7 billion parameters), but you can choose a larger or smaller one depending on the capabilities of your system. You can see the full list here.
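
For example, if 7b is too heavy for your machine, a smaller variant can be pulled the same way. The tag below is taken from Ollama's model library; double-check it against the list linked above:

docker compose exec ollama ollama pull deepseek-r1:1.5b

If you pull a different size, remember to use that tag in the Javascript we'll write later.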

Step 4: Create a website

Now that we have our Deepseek app up and running, we can create a web interface to ask questions. We're going to create a simple site with HTML, CSS and Javascript. This is what we're creating:

Screenshot of the website

And here's how:

HTML

The HTML is going to define a simple page with a text box, a button to send the question and a space for the answer.

📖 HTML, explained

Create a new directory called web and inside that create a new file called index.html. Paste this HTML inside the file:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My AI</title>
    <script src="showdown.min.js"></script>
    <script src="ollama.js"></script>
    <link rel="stylesheet" href="style.css" />
</head>
<body>
    <h1>🤖 Local Deepseek</h1>

    <textarea id="question"></textarea>
    <button onclick="run()">Ask!</button>

    <div id="answer">

    </div>
</body>
</html>

🤔 What is going on here?

  • In the <head> part of the HTML, you'll notice that we're linking to a style.css file. We'll create this file next in order to style our website.
  • You'll also notice two Javascript files: ollama.js, which is where we'll talk to Deepseek, and showdown.min.js, which is Showdown, a Javascript library for converting Markdown (what we'll be getting back from Deepseek) to HTML.

CSS

📖 CSS, explained

To style our page, create a new file called style.css and paste the CSS below:

body{
    width: 600px;
    margin: auto;
}

#question{
    display: block;
    width: 100%;
    padding: 9px;
    font-size: 15px;
}

#answer{
    font-family: Arial, Helvetica, sans-serif;
    font-size: 15px;
    margin-top: 30px;
    line-height: 1.5;
}

#answer #think{
    border-left: 3px solid #eee;
    padding-left: 9px;
    color: #aaa;
    font-style: italic;
}

Javascript

📖 Javascript, explained

Now we're going to create the Javascript to talk to Deepseek and give us answers to our questions. Create a new file called ollama.js and paste this:

const converter = new showdown.Converter()

async function run(){
    let prompt = document.querySelector("#question").value

    // Send the prompt to the local Ollama endpoint
    const response = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            model: "deepseek-r1:7b",
            prompt: prompt,
            stream: true
        })
    })

    const reader = response.body.getReader()
    const decoder = new TextDecoder()

    let compiledResponse = ""
    let buffer = ""
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // Ollama streams newline-delimited JSON. A single read may
        // contain several JSON objects, or cut one in half, so we
        // buffer the text and only parse complete lines.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n")
        buffer = lines.pop()

        for (const line of lines) {
            if (line.trim() === "") continue
            const chunkJson = JSON.parse(line)
            compiledResponse += chunkJson.response
        }

        // Turn the <think>...</think> block into a styleable <div>,
        // then render the Markdown answer as HTML
        compiledResponse = compiledResponse.replace("<think>", `<div id="think">`)
        compiledResponse = compiledResponse.replace("</think>", `</div>`)
        document.querySelector("#answer").innerHTML = converter.makeHtml(compiledResponse)
    }
}

🤔 What is going on here?

  • We're creating a Javascript function called run().
  • Within the function, we get the text from the text box in our HTML using document.querySelector("#question").value and store it in a variable called prompt.
  • Then we use the built-in fetch() function to send a POST request to http://localhost:11434/api/generate which includes our prompt. The response is stored in a variable called response. Since we've set stream: true, the answer comes back in small chunks of newline-delimited JSON (see the curl sketch after this list).
  • To read each chunk individually, we run response.body.getReader() to get the reader and initialise a new TextDecoder with new TextDecoder().
  • We introduce an empty string, compiledResponse, to which we append the text from each chunk. A second string, buffer, holds any partial JSON line until the rest of it arrives with the next chunk.
  • Finally, the while loop runs until the stream runs out of chunks. For each chunk, we parse the complete JSON lines, append their response fields to compiledResponse, process it, and show it on the page using document.querySelector("#answer").innerHTML = converter.makeHtml(compiledResponse).
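
Not from the original post, but if you want to see the raw stream the Javascript is parsing, you can call the same endpoint from your terminal (assuming curl is installed):

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "stream": true
}'

Each printed line is one JSON object with a response field: exactly the chunks the while loop parses.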

🤔 Why are we processing the response?

The response comes back from Deepseek looking like this:

<think>
The user says hello so I should say hello back
</think>

**Hello! How are you doing?**

When we process the response, we replace <think> with <div id="think"> and </think> with </div>. This way we can style the thinking section however we like.

compiledResponse = compiledResponse.replace("<think>", `<div id="think">`)
compiledResponse = compiledResponse.replace("</think>", `</div>`)

We then convert the entire response from Markdown into HTML using the ShowdownJS library and render it:

document.querySelector("#answer").innerHTML = converter.makeHtml(compiledResponse)

Import showdown

Finally, we need to add ShowdownJS to our project. To do this, simply download this file and add it to the web directory of your project.
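
If you'd rather grab Showdown from the terminal, one option (assuming you're comfortable pulling it from the jsDelivr CDN; the download link above works just as well) is:

curl -o web/showdown.min.js https://cdn.jsdelivr.net/npm/showdown/dist/showdown.min.js

Run this from your project's root so the file lands in the web directory.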

At the end of all this, the web directory should look like this:

📝 index.html
📝 ollama.js
📝 showdown.min.js
📝 style.css

Step 5: Add web page to Docker Compose

Once you're done creating your website, add it to the Docker Compose file like so. This service uses the nginx image to serve static files: whatever we mount into /usr/share/nginx/html gets served, and port 80 inside the container is mapped to port 3001 on your machine.

  web:
    image: nginx:1.27.3-alpine
    volumes:
      - ./web:/usr/share/nginx/html
    ports:
      - "3001:80"

Your entire docker-compose.yml file should look like this:

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./ollama-models:/root/.ollama
    ports:
      - 11434:11434

  web:
    image: nginx:1.27.3-alpine
    volumes:
      - ./web:/usr/share/nginx/html
    ports:
      - "3001:80"
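
Not part of the original steps, but as a quick sanity check you can ask Docker Compose to validate and print the merged configuration; it will complain if the YAML is malformed:

docker compose config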

Run the website

Now that we've created our website and added it to our docker-compose.yml file, we can run it with this command:

docker compose up -d web

Give it one or two seconds and then point your browser to this URL:

http://localhost:3001

If you see this, we're good to go:

Website screenshot

Let's test it!

Let's give our AI app a go! Write a question in the text box and click the Ask button.

Deepseek should soon start responding, first with some "thinking"...

The thinking box

...and then the answer itself:

The answer box


And that's it! You are now running Deepseek locally using just Docker!

Now every time you want to run your AI app, just run this command from your project directory in your terminal:

docker compose up -d


Any questions? Let me know here!


Top comments (13)

giuseppebaldi

Thank you for this, it was a fun, fairly easy thing to follow along with. I am, however, having one problem. I am getting a 403 Forbidden error when I try to access the web page. I followed the instructions exactly, with the exception of using port 3003 instead of 3001, since another container was already using 3001. I've installed this on a NAS and am trying to access the page from my laptop; I do not have issues accessing other containers. What might be causing this? Is there something I need to add/change in the docker-compose.yaml?

Savvas Stephanides

Hello Giuseppe! Thank you for reading! This might be because of Nginx not getting the contents of your webpage. Are you sure your docker-compose had the correct directory as a volume in the nginx container?

giuseppebaldi

I used the docker-compose exactly as you've shown it with the only change being port 3003 instead of 3001. What wasn't clear for me was what the file structure should be exactly. Currently, I have a folder called deepseek, inside that is the docker-compose.yaml, the ollama-models folder (which does have the blobs, manifests, etc. inside it after starting the container) and the web folder (with the html, js & css files inside). Is this correct or does the web directory go elsewhere?

For reference, I am able to see "ollama is running" at port 11434.

Savvas Stephanides

Check this repository to see how the project structure should look:

github.com/SavvasStephanides/local...

陳建仲(黑修斯)

Since you are encountering a 403 Forbidden error when trying to access the web page, here are a few possible causes and solutions:

  1. Check the web server configuration. If the container is running a web server (like Nginx or Apache), check the configuration files to ensure that access is allowed from external sources. If you are using Nginx, check the nginx.conf or site configuration file to confirm that it is not restricting access based on IP.

  2. Verify port binding in docker-compose.yaml. Ensure that the port is correctly mapped. If you are running the service on port 3003, make sure your YAML file includes:

     ports:
       - "3003:3003"

     If your service is listening on 127.0.0.1:3003, it may only be accessible from within the NAS. Try changing the binding to 0.0.0.0:3003 instead.

  3. Check firewall and NAS security settings. Some NAS devices have built-in firewall rules that restrict access to certain ports. Ensure that your NAS allows traffic on port 3003. You may need to create a rule to allow external access.

  4. Verify file and folder permissions. A 403 error can also be caused by permission issues. Ensure that the web server has the necessary permissions to access the files and directories it is serving.

  5. Examine logs for more details. Run docker logs to check if there are any error messages related to access permissions.

Dilli K

Thanks for the great tuts & resources. The steps provided are working fine.
And I'm curious about how to input docs in the prompt & ask.

Savvas Stephanides

I'm sure the Ollama docs will have what you need.

John Bradshaw

Everything seemed to install OK, and clicking on the links in between steps all showed what was expected. The only issue is nothing happens when I type something in and click "Ask!".

Savvas Stephanides

Hello John! Your browser's console might give you a hint as to why this happens.

Christopher Meeker

I have followed the instructions but the "web" container is unable to access the "ollama" container.

They are both running on a remote server from my web browser.

Christopher Meeker

Also, if I try to curl the localhost:11434 from inside the "web" container it is unable to connect but if I curl ollama:11434, I get the "Ollama is running" response.

Savvas Stephanides

Yep, that's how networking works in Docker. If you need to curl a container from within another container, you use the name of the service you wish to reach: ollama:11434 in this case.
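
For example, a quick check from inside the web container (assuming the busybox wget that ships with Alpine-based images like nginx:alpine):

docker compose exec web wget -qO- http://ollama:11434

This should print "Ollama is running".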

Christopher Meeker

If I send a POST directly from the "web" container with the correct data, it simply times out.
Tried moving the containers to an x86_64 server with the same outcome.
