shailendra khade

Ollama + FastAPI: Building My Own AI API on a Linux VM

Introduction

Large Language Models (LLMs) like ChatGPT are usually accessed via cloud APIs.
But what if we could run our own AI model locally and expose it as an API?

In this project, I built a custom AI API using Ollama + FastAPI on a Linux virtual machine.
This API exposes LLM capabilities via REST endpoints, similar to how real-world AI microservices work.

This post covers the architecture, implementation, challenges, and learnings.

What is Ollama?

Ollama is a tool that lets us run LLMs such as Mistral, Llama, and Gemma locally.

It provides a local API endpoint:

http://localhost:11434

We can wrap this with FastAPI to build our own AI service.

Architecture

The flow is straightforward: a client sends a request to FastAPI (port 9000), which forwards the prompt to the local Ollama server (port 11434); Ollama runs the Mistral model and returns the generated text.

Step 1: Install Ollama on Linux VM
curl -fsSL https://ollama.com/install.sh | sh

Verify installation:
ollama --version
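
On most Linux setups the installer also starts the Ollama server, which listens on port 11434 by default; its root endpoint answers with a plain-text status:

curl http://localhost:11434

If everything is in place, this should reply with "Ollama is running".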

Step 2: Pull an LLM Model
ollama pull mistral

Check available models:
ollama list
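
With the model pulled, you can already query Ollama's native generate endpoint directly; this is the raw API our FastAPI service will wrap (with "stream": false, the full reply arrives as a single JSON object):

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Hello", "stream": false}'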

Step 3: Setup Python Environment
python3 -m venv ai-env
source ai-env/bin/activate
pip install fastapi uvicorn requests

Step 4: Build the Ollama API Using FastAPI
Create a file named ollama_api.py:

from fastapi import FastAPI
import requests

app = FastAPI()

# Ollama's local generate endpoint (non-streaming in this setup)
OLLAMA_URL = "http://localhost:11434/api/generate"

@app.get("/")
def home():
    return {"message": "Ollama AI API is running"}

@app.get("/health")
def health():
    return {"status": "UP", "model": "mistral"}

@app.post("/chat")
def chat(prompt: str):
    # "prompt" arrives as a plain query parameter; "stream": False makes
    # Ollama return the full completion in a single JSON response
    payload = {
        "model": "mistral",
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()

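The /chat endpoint above returns the whole completion at once. Ollama can also stream tokens as newline-delimited JSON; here is a minimal sketch of a streaming variant built on the same OLLAMA_URL (the /chat/stream route and relay logic are additions of mine, not part of the original service):

import json

from fastapi.responses import StreamingResponse

@app.post("/chat/stream")
def chat_stream(prompt: str):
    payload = {"model": "mistral", "prompt": prompt, "stream": True}

    def token_stream():
        # stream=True keeps the connection open while Ollama generates;
        # each NDJSON line carries one text fragment in its "response" field
        with requests.post(OLLAMA_URL, json=payload, stream=True) as r:
            for line in r.iter_lines():
                if line:
                    yield json.loads(line).get("response", "")

    return StreamingResponse(token_stream(), media_type="text/plain")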

Step 5: Run the API Server
uvicorn ollama_api:app --host 0.0.0.0 --port 9000

Binding to 0.0.0.0 makes the server listen on all interfaces, so it is reachable from outside the VM and not just from localhost.

Step 6: Test the AI API
Test with curl
curl -X POST "http://localhost:9000/chat?prompt=Explain%20DevOps"

Test from Host Machine
curl -X POST "http://<VM-IP>:9000/chat?prompt=explain AI"
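
The same call works from Python as well; a small client sketch (the URL is a placeholder, swap in the VM IP when calling from the host):

import requests

API_URL = "http://localhost:9000/chat"  # or http://<VM-IP>:9000/chat from the host

resp = requests.post(API_URL, params={"prompt": "Explain DevOps"}, timeout=120)
resp.raise_for_status()

# Ollama's non-streaming reply carries the generated text in "response"
print(resp.json()["response"])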

Challenges Faced
1 Networking Issues in VM

  • 0.0.0.0 is a bind address, not a destination, so it cannot be used as a browser address.
  • The API had to be accessed via the VM's IP address instead (see the command below).
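
To find that address, list the VM's network interfaces with standard Linux tooling (interface names and addresses will vary per setup):

ip addr show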

2 HTTPS vs HTTP

  • The browser attempted HTTPS while the API was serving plain HTTP.
  • Solved by explicitly using http:// in the URL.

3 Python PEP 668 Error

  • The system Python was marked as externally managed (PEP 668), so pip refused to install packages globally.
  • Solved by using a Python virtual environment (venv), as in Step 3.

Key Learnings

  • Ollama can be used to run LLMs locally.
  • FastAPI is a great framework to expose AI models as microservices.
  • Virtual environments are essential in modern Linux systems.
  • Building APIs on VMs helps understand real DevOps workflows.
  • This architecture is similar to production AI services.
