Run Large and Small Language Models locally with ollama

#llm #ai #slm #ollama

Since ChatGPT, we all know at least roughly what Large Language Models (LLMs) are. You might have heard that they require an immense amount of GPU power to run. But did you know that there are also smaller, less powerful versions of some models, called Small Language Models (SLMs), that you can run locally on your computer?

Prerequisites

Download and install ollama
Download and install Docker

For this tutorial, we use ollama to download a model onto your machine and run it there. Unless you have an absolute powerhouse with lots of GPU power in front of you, you might want to start with smaller models, called Small Language Models (SLMs), such as Llama 3 8B from Meta or Phi-3 Mini 3.8B from Microsoft.

Run a model locally with ollama

Before we can communicate with a model via ollama, we need to start ollama. The ollama app is basically just a small web server that runs locally on your machine and lets you communicate with the models (by default on http://localhost:11434).

Run the following command in the Terminal to start the ollama server.

ollama serve
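Since the server speaks plain HTTP, a script can also check that it is up. The following is a minimal Python sketch, not part of the original setup: it assumes the default address mentioned above and uses ollama's /api/tags endpoint, which lists the models installed locally. The helper names (tags_url, parse_model_names, list_local_models) are made up for this illustration.

```python
import json
import urllib.request
from typing import List

# Assumption: ollama serve is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def tags_url(base_url: str = OLLAMA_URL) -> str:
    """Build the URL of the endpoint that lists locally installed models."""
    return base_url.rstrip("/") + "/api/tags"


def parse_model_names(response_body: str) -> List[str]:
    """Extract the model names from a /api/tags JSON response body."""
    data = json.loads(response_body)
    # Treat a missing or null "models" field as "no models installed".
    return [model["name"] for model in (data.get("models") or [])]


def list_local_models(base_url: str = OLLAMA_URL) -> List[str]:
    """Ask the running ollama server which models are installed locally.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

With ollama serve running, calling list_local_models() returns the names of the installed models, which may be an empty list on a fresh install. If the call raises a connection error, the server is most likely not running.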

Now we can open a separate Terminal window and run a model for testing. If the model you want to play with is not yet installed on your machine, ollama will download it for you automatically. You can find a full list of available models and their requirements at the ollama Library.

Use the following command to run the small Phi-3 Mini 3.8B model from Microsoft.

ollama run phi3

Now you can interact with the model and write some prompts right at the command line.
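The same prompts can be sent programmatically over the local HTTP API instead of the interactive prompt. The sketch below assumes that ollama serve is running on the default address and that phi3 has already been downloaded. It posts to ollama's /api/generate endpoint with streaming turned off, so the reply arrives as a single JSON object. The function names are hypothetical helpers for this example.

```python
import json
import urllib.request


def build_generate_request(prompt, model="phi3", base_url="http://localhost:11434"):
    """Build a POST request for ollama's /api/generate endpoint."""
    payload = {
        "model": model,    # must already be available locally
        "prompt": prompt,
        "stream": False,   # ask for one complete JSON reply instead of chunks
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="phi3", base_url="http://localhost:11434"):
    """Send the prompt to the local model and return the generated text."""
    request = build_generate_request(prompt, model, base_url)
    # Small models can still take a while on CPU, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    return reply["response"]
```

For example, print(generate("Why is the sky blue?")) prints the model's answer once generation has finished. Building the request separately from sending it keeps the payload easy to inspect without a running server.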

Chat with your own data locally

For the full ChatGPT-like experience, where we can also chat with our own data or web sources, we don’t want to prompt the model only through the terminal.

Luckily, there are open-source projects like Open WebUI that provide a web-based experience similar to ChatGPT, which you can also run locally and point to any model. To start the Open WebUI Docker container locally, run the command below in your Terminal (make sure that ollama serve is still running).

docker run --rm -p 8080:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Now you can open your browser at http://localhost:8080 and create an account. No worries, it all stays local, and you don’t even have to use a real email address.

Once logged in, you can choose a model in the top bar of Open WebUI and start chatting.

[Figure: Open WebUI running locally, connected to ollama]

I also encourage you to play with the Documents section to try some Retrieval-Augmented Generation (RAG) with your local models!

Now enjoy playing with Small Language Models on your computer! 🎉

DEV Community
