shailendra khade

Ollama + FastAPI: Building My Own AI API on a Linux VM

Introduction

Large Language Models (LLMs) like ChatGPT are usually accessed via cloud APIs.
But what if we could run our own AI model locally and expose it as an API?

In this project, I built a custom AI API using Ollama + FastAPI on a Linux virtual machine.
This API exposes LLM capabilities via REST endpoints, similar to how real-world AI microservices work.

This post covers the architecture, implementation, challenges, and learnings.

What is Ollama?

Ollama is a tool that lets us run LLMs such as Mistral, Llama, and Gemma locally.

It provides a local API endpoint:

http://localhost:11434

We can wrap this with FastAPI to build our own AI service.

Architecture

The flow is straightforward: a client sends a request to FastAPI (port 9000), which forwards the prompt to the local Ollama server (port 11434); Ollama runs the Mistral model and returns the generated text.

Step 1: Install Ollama on Linux VM
curl -fsSL https://ollama.com/install.sh | sh

Verify installation:
ollama --version
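
On most Linux setups the installer also starts the Ollama server, which listens on port 11434 by default; its root endpoint answers with a plain-text status:

curl http://localhost:11434

If everything is in place, this should reply with "Ollama is running".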

Step 2: Pull an LLM Model
ollama pull mistral

Check available models:
ollama list
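
With the model pulled, you can already query Ollama's native generate endpoint directly; this is the raw API our FastAPI service will wrap (with "stream": false, the full reply arrives as a single JSON object):

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Hello", "stream": false}'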

Step 3: Setup Python Environment
python3 -m venv ai-env
source ai-env/bin/activate
pip install fastapi uvicorn requests

Step 4: Build the Ollama API Using FastAPI
Create a file named ollama_api.py:

from fastapi import FastAPI
import requests

app = FastAPI()

# Ollama's local generate endpoint (non-streaming in this setup)
OLLAMA_URL = "http://localhost:11434/api/generate"

@app.get("/")
def home():
    return {"message": "Ollama AI API is running"}

@app.get("/health")
def health():
    return {"status": "UP", "model": "mistral"}

@app.post("/chat")
def chat(prompt: str):
    # "prompt" arrives as a plain query parameter; "stream": False makes
    # Ollama return the full completion in a single JSON response
    payload = {
        "model": "mistral",
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()

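The /chat endpoint above returns the whole completion at once. Ollama can also stream tokens as newline-delimited JSON; here is a minimal sketch of a streaming variant built on the same OLLAMA_URL (the /chat/stream route and relay logic are additions of mine, not part of the original service):

import json

from fastapi.responses import StreamingResponse

@app.post("/chat/stream")
def chat_stream(prompt: str):
    payload = {"model": "mistral", "prompt": prompt, "stream": True}

    def token_stream():
        # stream=True keeps the connection open while Ollama generates;
        # each NDJSON line carries one text fragment in its "response" field
        with requests.post(OLLAMA_URL, json=payload, stream=True) as r:
            for line in r.iter_lines():
                if line:
                    yield json.loads(line).get("response", "")

    return StreamingResponse(token_stream(), media_type="text/plain")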

Step 5: Run the API Server
uvicorn ollama_api:app --host 0.0.0.0 --port 9000

Binding to 0.0.0.0 makes the server listen on all interfaces, so it is reachable from outside the VM and not just from localhost.

Step 6: Test the AI API
Test with curl
curl -X POST "http://localhost:9000/chat?prompt=Explain%20DevOps"

Test from Host Machine
curl -X POST "http://<VM-IP>:9000/chat?prompt=explain AI"
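
The same call works from Python as well; a small client sketch (the URL is a placeholder, swap in the VM IP when calling from the host):

import requests

API_URL = "http://localhost:9000/chat"  # or http://<VM-IP>:9000/chat from the host

resp = requests.post(API_URL, params={"prompt": "Explain DevOps"}, timeout=120)
resp.raise_for_status()

# Ollama's non-streaming reply carries the generated text in "response"
print(resp.json()["response"])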

Challenges Faced
1 Networking Issues in VM

  • 0.0.0.0 is a bind address, not a destination, so it cannot be used as a browser address.
  • The API had to be accessed via the VM's IP address instead (see the command below).
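
To find that address, list the VM's network interfaces with standard Linux tooling (interface names and addresses will vary per setup):

ip addr show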

2 HTTPS vs HTTP

  • The browser attempted HTTPS while the API was serving plain HTTP.
  • Solved by explicitly using http:// in the URL.

3 Python PEP 668 Error

  • The system Python was marked as externally managed (PEP 668), so pip refused to install packages globally.
  • Solved by using a Python virtual environment (venv), as in Step 3.

Key Learnings

  • Ollama can be used to run LLMs locally.
  • FastAPI is a great framework to expose AI models as microservices.
  • Virtual environments are essential in modern Linux systems.
  • Building APIs on VMs helps understand real DevOps workflows.
  • This architecture is similar to production AI services.
