
Alex Spinov

Jan.ai Has a Free API — Run AI Models Locally with OpenAI-Compatible Endpoints

Jan is a free, open-source desktop app that lets you run AI models locally on your machine with an OpenAI-compatible API. That means any code using the OpenAI SDK works with Jan — just change the base URL.

No API keys. No usage fees. No data leaving your machine.

Why Use Jan?

  • 100% local — your data never leaves your computer
  • OpenAI-compatible — drop-in replacement for the OpenAI API
  • GPU accelerated — uses your NVIDIA/AMD/Apple Silicon GPU
  • Model hub — download Llama, Mistral, Phi, Gemma with one click
  • Free forever — no subscription, no token limits

Quick Setup

1. Install Jan

Download from jan.ai for macOS, Windows, or Linux, or install from the command line:

# macOS
brew install --cask jan

# Or download AppImage for Linux
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-x86_64.AppImage
chmod +x jan-linux-x86_64.AppImage

2. Download a Model

In Jan UI: Hub → Search for model → Download

Popular choices:

  • Llama 3.1 8B — a strong general-purpose model (needs ~8GB RAM)
  • Mistral 7B — fast and capable
  • Phi-3 Mini — Microsoft's small but powerful model (4GB RAM)

3. Start the API Server

In Jan: Settings → Advanced → Enable API Server

Default endpoint: http://localhost:1337/v1
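Before wiring up any clients, it helps to confirm the server is actually listening. Here's a minimal health-check sketch using only the Python standard library; the URL and the shape of the /models response (JSON with a "data" list) assume the default settings and OpenAI-compatible behavior described above:

```python
import json
import urllib.error
import urllib.request

def jan_server_up(base_url="http://localhost:1337/v1", timeout=2.0):
    """Return True if an OpenAI-compatible server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            # An OpenAI-compatible /models response is JSON with a "data" list.
            payload = json.loads(resp.read().decode())
            return isinstance(payload.get("data"), list)
    except (urllib.error.URLError, OSError, ValueError):
        return False

if __name__ == "__main__":
    print("Jan API reachable:", jan_server_up())
```

If this prints False, check that the API server toggle is enabled and nothing else is bound to port 1337.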

4. Use with curl

# Chat completion (OpenAI-compatible)
curl -s http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Explain eBPF in 3 sentences"}],
    "temperature": 0.7,
    "max_tokens": 200
  }' | jq '.choices[0].message.content'

# List available models
curl -s http://localhost:1337/v1/models | jq '.data[] | .id'

# Embeddings
curl -s http://localhost:1337/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "web scraping best practices"}' | jq '.data[0].embedding[:5]'
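If you want the same chat call from Python without installing any packages, the curl request above can be assembled with the standard library. This sketch only builds the request object (the model name and endpoint are the same assumptions as in the curl example):

```python
import json
import urllib.request

JAN_URL = "http://localhost:1337/v1/chat/completions"  # default Jan endpoint

def build_chat_request(model, prompt, temperature=0.7, max_tokens=200):
    """Build a urllib Request mirroring the curl chat-completion call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        JAN_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the Jan server to be running):
# with urllib.request.urlopen(build_chat_request(
#         "llama3.1-8b-instruct", "Explain eBPF in 3 sentences")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```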

5. Use with OpenAI Python SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a Python function to validate email addresses"}],
    temperature=0.3
)

print(response.choices[0].message.content)

6. Streaming

stream = client.chat.completions.create(
    model="llama3.1-8b-instruct",
    messages=[{"role": "user", "content": "List 5 web scraping best practices"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
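Under the hood, streamed responses arrive as server-sent events: lines of the form data: {...}, ending with a data: [DONE] sentinel. Here's a rough sketch of the parsing the SDK does for you, assuming the standard OpenAI streaming line format:

```python
import json

def parse_sse_chunk(line):
    """Extract the delta text from one 'data: ...' SSE line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None                      # comments / blank keep-alive lines
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None                      # end-of-stream sentinel
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")          # role-only first chunk has no content

sample = 'data: {"choices":[{"delta":{"content":"Hello"}}]}'
print(parse_sse_chunk(sample))           # → Hello
```

This is why the SDK loop checks chunk.choices[0].delta.content before printing: some chunks carry only role metadata, not text.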

Key Endpoints

Endpoint               Description
/v1/chat/completions   Chat with an AI model
/v1/completions        Text completion
/v1/models             List available models
/v1/embeddings         Generate text embeddings
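The embeddings endpoint is most useful when paired with a similarity measure, e.g. for semantic search over your own documents. Here's a minimal cosine-similarity helper you could run over two embedding vectors returned by Jan; the vectors below are toy values for illustration, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions)
v1 = [0.1, 0.8, 0.3]
v2 = [0.1, 0.7, 0.4]
print(round(cosine_similarity(v1, v2), 3))
```

Scores close to 1.0 mean the texts are semantically similar; close to 0.0 means unrelated.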

Jan vs Alternatives

Feature       Jan               Ollama            LM Studio
Desktop UI    Yes               No (CLI)          Yes
OpenAI API    Yes               Yes               Yes
Extensions    Yes               No                No
Open source   Yes               Yes               No
GPU support   NVIDIA/AMD/Apple  NVIDIA/AMD/Apple  NVIDIA/Apple

Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors
