DEV Community

Cover image for Deploying LocalAI Self-Hosted AI Model Management Platform on Ubuntu 24.04
Sanskriti Harmukh for Vultr

Posted on with Aashish Chaurasiya • Originally published at docs.vultr.com

Deploying LocalAI Self-Hosted AI Model Management Platform on Ubuntu 24.04

LocalAI is an open-source platform for running Large Language Models locally with an OpenAI-compatible API, so you can swap it in behind existing OpenAI client code without paying per-token or sending data off-server. This guide deploys LocalAI using Docker Compose with Traefik handling automatic HTTPS, persistent model and cache directories, and a working chat-completion test. By the end, you'll have LocalAI serving an OpenAI-compatible API securely at your domain.


Set Up the Directory Structure

1. Create the project directories:

$ mkdir -p ~/localai/{models,cache}
$ cd ~/localai
Enter fullscreen mode Exit fullscreen mode

models/ holds downloaded model files; cache/ persists between restarts.

2. Create the environment file:

$ nano .env
Enter fullscreen mode Exit fullscreen mode
DOMAIN=localai.example.com
LETSENCRYPT_EMAIL=admin@example.com
Enter fullscreen mode Exit fullscreen mode

Deploy with Docker Compose

1. Add your user to the Docker group:

$ sudo usermod -aG docker $USER
$ newgrp docker
Enter fullscreen mode Exit fullscreen mode

2. Create the Compose manifest:

$ nano docker-compose.yaml
Enter fullscreen mode Exit fullscreen mode
services:
  traefik:
    image: traefik:v3.6
    container_name: traefik
    restart: unless-stopped
    environment:
      DOCKER_API_VERSION: "1.44"
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--certificatesresolvers.le.acme.httpchallenge=true"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.le.acme.email=${LETSENCRYPT_EMAIL}"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./letsencrypt:/letsencrypt

  localai:
    image: localai/localai:latest-aio-cpu
    container_name: localai
    restart: unless-stopped
    volumes:
      - ./models:/models:cached
      - ./cache:/cache:cached
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20s
      retries: 5
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.localai.rule=Host(`${DOMAIN}`)"
      - "traefik.http.routers.localai.entrypoints=websecure"
      - "traefik.http.routers.localai.tls=true"
      - "traefik.http.routers.localai.tls.certresolver=le"
      - "traefik.http.services.localai.loadbalancer.server.port=8080"
Enter fullscreen mode Exit fullscreen mode

Swap localai/localai:latest-aio-cpu for a GPU variant (latest-aio-gpu-nvidia-cuda-12) if the host has an NVIDIA GPU.

3. Set the models directory permissions and start the stack:

$ sudo chmod -R 755 ~/localai/models
$ docker compose up -d
$ docker compose ps
Enter fullscreen mode Exit fullscreen mode

Verify the API

1. Check readiness:

$ curl -i https://localai.example.com/readyz
Enter fullscreen mode Exit fullscreen mode

A 200 OK confirms Traefik is routing to LocalAI.

2. List the available models:

$ curl https://localai.example.com/v1/models
Enter fullscreen mode Exit fullscreen mode

3. Run a chat completion:

$ curl -X POST https://localai.example.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4",
      "messages": [
        {"role": "user", "content": "Explain what LocalAI does in one sentence."}
      ],
      "max_tokens": 60
    }'
Enter fullscreen mode Exit fullscreen mode

LocalAI returns a response in the OpenAI completion shape.


Access the Dashboard

Open https://localai.example.com in a browser to browse the model gallery, install new models, and run inference from the UI.


Next Steps

LocalAI is running and served securely over HTTPS. From here you can:

  • Install additional models from the gallery for domain-specific tasks
  • Point any OpenAI SDK at the LocalAI base URL by changing OPENAI_API_BASE
  • Run a GPU variant for image generation, embeddings, and faster LLM inference

For the full guide with additional tips, visit the original article on Vultr Docs.

Top comments (0)