Njeri Kimaru

Does RamaLama make AI boring? Running AI models with RamaLama

What is RamaLama?

RamaLama is an open source command-line tool that makes running AI models locally simple by treating them like containers.
It runs models with Podman or Docker, with no configuration needed.
It detects GPUs and uses them to accelerate inference.
It is compatible with runtimes such as llama.cpp, OpenVINO, vLLM, whisper.cpp, and many more.

Installing RamaLama

RamaLama is easy to install. On Fedora it ships as a distribution package; after installing, check the version you are using.

sudo dnf install python3-ramalama

ramalama version
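Once installed, a couple of other subcommands are handy for sanity-checking the setup (a quick sketch; both are standard RamaLama subcommands):

```shell
# List the models currently pulled into local storage
ramalama ls

# Print environment details (container engine, accelerator detection, etc.)
ramalama info
```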

RamaLama supports multiple model registries (transports):

1. Ollama

Ollama is the quickest and easiest registry to get started with.
Here are a few AI models I ran using Ollama:

ramalama run granite3-moe

ramalama run ollama://llama4:scout
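Beyond `run`, the same models can be served as a local REST endpoint with `ramalama serve`, which is useful for pointing other tools at the model (a sketch; the model name matches the one above):

```shell
# Download the model once so later invocations start faster
ramalama pull ollama://llama4:scout

# Serve it as a local, OpenAI-compatible REST endpoint
ramalama serve ollama://llama4:scout
```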

2. Hugging Face

Some Hugging Face models require you to log in.
Here are some that don't require logging in:

ramalama run huggingface://instructlab/granite-7b-lab-Q4_K_M.gguf

ramalama run huggingface://microsoft/Phi-3-mini-4k-instruct-q4.gguf
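For the gated models that do require a login, RamaLama has a `login` subcommand; a sketch, assuming you have generated an access token in your Hugging Face account settings:

```shell
# Authenticate against Hugging Face before pulling gated models
ramalama login --token <YOUR_HF_TOKEN> huggingface
```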

3. ModelScope

ModelScope worked quite well too, but I had to upgrade RamaLama's version first.

sudo dnf upgrade ramalama

Here is one of the ModelScope models I used:

ramalama run modelscope://Qwen/Qwen2.5-7B-Instruct-GGUF/qwen2.5-7b-instruct-q4_k_m.gguf

4. OCI registries

Let's start with: what is OCI?
OCI (Open Container Initiative) is a standard that defines how container images should be packaged and distributed.
There are several OCI registries:

  • quay.io
  • docker.io
  • GitHub Container Registry (ghcr.io)

For GitHub, I had to log in first and get an authentication token. Afterwards, I pushed a model and then accessed it through ghcr.io:

ramalama convert ollama://mistral oci://ghcr.io/njeri-kimaru/mistral:gguf

ramalama run oci://ghcr.io/njeri-kimaru/mistral:gguf

  • Google Container Registry (gcr.io)
  • Amazon Elastic Container Registry (ECR)
  • RamaLama Container Registry (rlcr.io)
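To publish to one of these registries yourself, the general pattern is to authenticate with your container engine, then convert and push, as I did with ghcr.io above. A sketch using quay.io (the namespace and model name are illustrative):

```shell
# Log in to the registry with podman (docker login works too)
podman login quay.io

# Package a pulled model as an OCI image and push it to the registry
ramalama convert ollama://tinyllama oci://quay.io/<your-namespace>/tinyllama:latest
ramalama push oci://quay.io/<your-namespace>/tinyllama:latest
```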

5. URL-based sources

RamaLama also supports loading models directly from URLs instead of registries.

They include:

  • https:// → download from the internet
  • file:// → load from your local machine
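Here is a quick sketch of the file:// transport, assuming you already have a GGUF file on disk (the path is illustrative):

```shell
# Run a model straight from a local GGUF file, no registry involved
ramalama run file:///home/user/models/granite-7b-lab-Q4_K_M.gguf
```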

6. Hosted APIs

For a hosted model like OpenAI's, you need a secret key from the OpenAI API keys page, and you'll have to pay for usage for the model to run successfully.
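Hosted providers typically expect the secret key via an environment variable rather than on the command line; a sketch using OpenAI's convention (the key value is a placeholder):

```shell
# Export the secret key obtained from the OpenAI API keys page
export OPENAI_API_KEY="sk-..."
```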
