Alex Spinov

LocalAI Has a Free API You've Never Heard Of

LocalAI is a self-hosted, OpenAI-compatible API that runs LLMs, image generation, and audio transcription entirely on your hardware. No GPU required — it runs on CPU too.

What Makes LocalAI Special?

  • OpenAI API compatible — drop-in replacement
  • No GPU needed — runs on CPU (GPU optional)
  • Multi-modal — text, images, audio, embeddings
  • Model gallery — one-click model downloads
  • Free — open source, self-hosted

The Hidden API: Full OpenAI Compatibility

from openai import OpenAI

client = OpenAI(base_url='http://localhost:8080/v1', api_key='not-needed')

# Chat completions
response = client.chat.completions.create(
    model='llama3',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Explain Docker in simple terms.'}
    ],
    temperature=0.7
)
print(response.choices[0].message.content)

# Image generation (Stable Diffusion)
image = client.images.generate(
    model='stablediffusion',
    prompt='A futuristic city at sunset, cyberpunk style',
    size='512x512'
)

# Audio transcription (Whisper)
transcription = client.audio.transcriptions.create(
    model='whisper-1',
    file=open('meeting.mp3', 'rb')
)
print(transcription.text)

# Embeddings
embedding = client.embeddings.create(
    model='text-embedding-ada-002',
    input='What is machine learning?'
)
print(len(embedding.data[0].embedding))  # dimension depends on the mapped local model (e.g. 384 for MiniLM-class models)
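Because LocalAI mirrors the OpenAI API, streaming works through the same SDK by passing `stream=True`. A minimal sketch — the `collect_stream` and `stream_reply` helpers are my own names, and it assumes a `llama3` model is installed and LocalAI is listening on `localhost:8080`:

```python
def collect_stream(chunks):
    """Concatenate the content deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta.content is typically None
            parts.append(delta)
    return ''.join(parts)

def stream_reply(prompt, base_url='http://localhost:8080/v1'):
    """Stream a chat completion from LocalAI and return the full text."""
    from openai import OpenAI  # lazy import: the helper above stays SDK-free
    client = OpenAI(base_url=base_url, api_key='not-needed')
    stream = client.chat.completions.create(
        model='llama3',
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,  # yields incremental chunks instead of one response
    )
    return collect_stream(stream)

# print(stream_reply('Explain Docker in one sentence.'))  # needs a running server
```

Streaming matters more for local models than for hosted ones: CPU inference is slower, so showing tokens as they arrive keeps the UI responsive.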

Text-to-Speech API

response = client.audio.speech.create(
    model='tts-1',
    voice='alloy',
    input='Hello! This is generated speech running locally.'
)
response.stream_to_file('output.mp3')

Model Gallery API

# Browse and install models
curl http://localhost:8080/models/available | jq '.[] | .name'

# Install a model
curl http://localhost:8080/models/apply -d '{
  "url": "github:go-skynet/model-gallery/llama3.yaml"
}'
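The same gallery endpoint can be driven from Python with nothing but the standard library. A stdlib-only sketch, assuming the `/models/apply` endpoint shown in the curl example above (helper names are mine):

```python
import json
from urllib import request

def build_apply_payload(model_url: str) -> bytes:
    """JSON body for LocalAI's /models/apply endpoint."""
    return json.dumps({'url': model_url}).encode('utf-8')

def apply_model(base='http://localhost:8080',
                model_url='github:go-skynet/model-gallery/llama3.yaml'):
    """POST an install request; LocalAI downloads the model in the background."""
    req = request.Request(
        base + '/models/apply',
        data=build_apply_payload(model_url),
        headers={'Content-Type': 'application/json'},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# apply_model()  # uncomment with a running LocalAI instance
```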

Quick Start

docker run -p 8080:8080 localai/localai:latest-cpu
# GPU version:
# docker run --gpus all -p 8080:8080 localai/localai:latest-gpu-nvidia-cuda-12
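Right after `docker run`, the first request can race model loading. A small polling helper bridges the gap — a sketch assuming LocalAI exposes a `/readyz` health endpoint (verify against your version's docs; `http_ok` and `wait_for_ready` are my own helpers):

```python
import time
from urllib import request, error

def http_ok(url: str) -> bool:
    """True if the URL answers with HTTP 200."""
    try:
        with request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (error.URLError, OSError):
        return False

def wait_for_ready(probe, timeout=60.0, interval=2.0) -> bool:
    """Poll `probe` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# wait_for_ready(lambda: http_ok('http://localhost:8080/readyz'))
```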

Why Teams Self-Host LocalAI

A CTO shared: "Compliance says no data to external APIs. LocalAI runs on our servers with the same OpenAI SDK our developers already know. We switched the base URL in our config and everything works — chat, embeddings, image generation, all local."


Building AI-powered tools? Email spinov001@gmail.com or check my AI solutions.

Self-hosting AI? LocalAI vs Ollama vs vLLM — what's your choice?
