
EvolveDev

Set Up a REST API Service for AI Using Local LLMs with Ollama

Setting up a REST API service for AI using local LLMs with Ollama is a practical approach. Here's a simple workflow.

1. Install Ollama and LLMs

Begin by installing Ollama and the LLMs you want to serve on your local machine. Ollama handles local deployment of LLMs, making them easier to manage and use for various tasks.

Install Ollama
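On Linux, Ollama provides an official install script; on macOS and Windows, download the installer from ollama.com (see the installation guide in the references):

curl -fsSL https://ollama.com/install.sh | sh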

Install LLMs for Ollama

# download the llama3 model weights
ollama pull llama3
# start an interactive session (also a quick check that the install works)
ollama run llama3

Ollama Commands (shown by /? inside the ollama run session)

Available Commands:
  /set         Set session variables
  /show        Show model information
  /bye         Exit
  /?, /help    Help for a command

Use """ to begin a multi-line message

Test Ollama

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": true
}'
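With stream set to true (the default), the API returns newline-delimited JSON objects, each carrying a fragment of the answer; roughly like this (values abridged here, not actual output):

{"model": "llama3", "created_at": "...", "response": "The", "done": false}
{"model": "llama3", "created_at": "...", "response": " sky", "done": false}
...
{"model": "llama3", "created_at": "...", "response": "", "done": true}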

If stream is set to false, the response will instead be a single JSON object:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
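In that case the full answer arrives in one object, roughly (abridged; real responses also include timing and token-count fields):

{"model": "llama3", "created_at": "...", "response": "The sky appears blue because ...", "done": true}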

2. Set Up FastAPI

Set up a Python FastAPI application. FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. It’s a great choice for building robust and efficient APIs.

Develop the FastAPI routes and endpoints that interact with the Ollama server: send requests to Ollama for tasks such as text generation, language understanding, or any other AI task supported by your LLMs. The following code is a simple example. (You can also use the Ollama Python library instead of raw HTTP calls; a sketch follows the test section below.)

from typing import Union

from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI(debug=True)

class Itemexample(BaseModel):
    # example schema, not used by the endpoint below
    name: str
    prompt: str
    instruction: str
    is_offer: Union[bool, None] = None

class Item(BaseModel):
    model: str
    prompt: str

urls = ["http://localhost:11434/api/generate"]

headers = {
    "Content-Type": "application/json"
}

@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.post("/chat/{llms_name}")
def update_item(llms_name: str, item: Item):
    if llms_name == "llama3":
        url = urls[0]
        # forward the caller's model and prompt to Ollama;
        # stream=False makes Ollama return a single JSON object
        payload = {
            "model": item.model,
            "prompt": item.prompt,
            "stream": False
        }
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 200:
            return {"data": response.json(), "llms_name": llms_name}
        print("error:", response.status_code, response.text)
        return {"item_name": item.model, "error": response.status_code, "data": response.text}
    return {"item_name": item.model, "llms_name": llms_name}
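Run the service with uvicorn, assuming the code above is saved as main.py (after pip install fastapi uvicorn requests):

uvicorn main:app --reload

By default it listens on http://127.0.0.1:8000, which the test below assumes.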

Test the REST API service

curl --location 'http://127.0.0.1:8000/chat/llama3' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
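Given the code above, a successful call returns a JSON object of the form {"data": <Ollama response object>, "llms_name": "llama3"}.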
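As noted earlier, the Ollama Python library (pip install ollama) can replace the raw requests call inside the endpoint. A minimal sketch using the library's generate function:

import ollama

# equivalent to POSTing to /api/generate with stream=False
result = ollama.generate(model="llama3", prompt="Why is the sky blue?")
print(result["response"])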

3. Deploy

Once you're satisfied with the functionality and performance of the REST API, the service can be deployed to a production environment as needed: to a cloud platform, in a Docker container (a minimal sketch follows), or on a dedicated server.
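A minimal Dockerfile sketch, assuming the FastAPI code is saved as main.py and Ollama keeps running outside the container:

# Hypothetical Dockerfile; adjust versions and dependencies to your setup
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn requests
COPY main.py .
EXPOSE 8000
# Inside the container, localhost:11434 is the container itself, so the
# Ollama URL in main.py must point at the host (e.g. host.docker.internal)
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]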

In this simple example, by leveraging Ollama for local LLM deployment and integrating it with FastAPI to build the REST API server, you get a free, self-hosted solution for AI services. The model can also be fine-tuned on your own training data for customized purposes (we will discuss this in a future post).

Happy reading, happy coding.

References:

  1. Ollama

  2. Ollama GitHub

  3. Ollama Python Lib

  4. FastAPI

  5. Ollama Installation Guide

Please leave your appreciation by commenting on this post!
