
Alex Spinov

Jan.ai Has a Free API — Run AI Models Locally with OpenAI-Compatible Endpoints

Jan is a free, open-source desktop app that lets you run AI models locally on your machine with an OpenAI-compatible API. That means any code using the OpenAI SDK works with Jan — just change the base URL.

No API keys. No usage fees. No data leaving your machine.

Why Use Jan?

  • 100% local — your data never leaves your computer
  • OpenAI-compatible — drop-in replacement for the OpenAI API
  • GPU accelerated — uses your NVIDIA/AMD/Apple Silicon GPU
  • Model hub — download Llama, Mistral, Phi, Gemma with one click
  • Free forever — no subscription, no token limits

Quick Setup

1. Install Jan

Download from jan.ai for macOS, Windows, or Linux, or install from the command line:

# macOS
brew install --cask jan

# Or download AppImage for Linux
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-x86_64.AppImage
chmod +x jan-linux-x86_64.AppImage

2. Download a Model

In Jan UI: Hub → Search for model → Download

Popular choices:

  • Llama 3.1 8B — a strong general-purpose model (needs ~8GB RAM)
  • Mistral 7B — fast and capable
  • Phi-3 Mini — Microsoft's small but powerful model (4GB RAM)

3. Start the API Server

In Jan: Settings → Advanced → Enable API Server

Default endpoint: http://localhost:1337/v1
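Before wiring up any clients, it helps to confirm the server is actually listening. Here's a minimal health-check sketch using only the Python standard library; the URL and the shape of the /models response (JSON with a "data" list) assume the default settings and OpenAI-compatible behavior described above:

```python
import json
import urllib.error
import urllib.request

def jan_server_up(base_url="http://localhost:1337/v1", timeout=2.0):
    """Return True if an OpenAI-compatible server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            # An OpenAI-compatible /models response is JSON with a "data" list.
            payload = json.loads(resp.read().decode())
            return isinstance(payload.get("data"), list)
    except (urllib.error.URLError, OSError, ValueError):
        return False

if __name__ == "__main__":
    print("Jan API reachable:", jan_server_up())
```

If this prints False, check that the API server toggle is enabled and nothing else is bound to port 1337.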

4. Use with curl

# Chat completion (OpenAI-compatible)
curl -s http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Explain eBPF in 3 sentences"}],
    "temperature": 0.7,
    "max_tokens": 200
  }' | jq '.choices[0].message.content'

# List available models
curl -s http://localhost:1337/v1/models | jq '.data[] | .id'

# Embeddings
curl -s http://localhost:1337/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "web scraping best practices"}' | jq '.data[0].embedding[:5]'
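If you want the same chat call from Python without installing any packages, the curl request above can be assembled with the standard library. This sketch only builds the request object (the model name and endpoint are the same assumptions as in the curl example):

```python
import json
import urllib.request

JAN_URL = "http://localhost:1337/v1/chat/completions"  # default Jan endpoint

def build_chat_request(model, prompt, temperature=0.7, max_tokens=200):
    """Build a urllib Request mirroring the curl chat-completion call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        JAN_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the Jan server to be running):
# with urllib.request.urlopen(build_chat_request(
#         "llama3.1-8b-instruct", "Explain eBPF in 3 sentences")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```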

5. Use with OpenAI Python SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a Python function to validate email addresses"}],
    temperature=0.3
)

print(response.choices[0].message.content)

6. Streaming

stream = client.chat.completions.create(
    model="llama3.1-8b-instruct",
    messages=[{"role": "user", "content": "List 5 web scraping best practices"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
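Under the hood, streamed responses arrive as server-sent events: lines of the form data: {...}, ending with a data: [DONE] sentinel. Here's a rough sketch of the parsing the SDK does for you, assuming the standard OpenAI streaming line format:

```python
import json

def parse_sse_chunk(line):
    """Extract the delta text from one 'data: ...' SSE line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None                      # comments / blank keep-alive lines
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None                      # end-of-stream sentinel
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")          # role-only first chunk has no content

sample = 'data: {"choices":[{"delta":{"content":"Hello"}}]}'
print(parse_sse_chunk(sample))           # → Hello
```

This is why the SDK loop checks chunk.choices[0].delta.content before printing: some chunks carry only role metadata, not text.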

Key Endpoints

Endpoint               Description
/v1/chat/completions   Chat with an AI model
/v1/completions        Text completion
/v1/models             List available models
/v1/embeddings         Generate text embeddings
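The embeddings endpoint is most useful when paired with a similarity measure, e.g. for semantic search over your own documents. Here's a minimal cosine-similarity helper you could run over two embedding vectors returned by Jan; the vectors below are toy values for illustration, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions)
v1 = [0.1, 0.8, 0.3]
v2 = [0.1, 0.7, 0.4]
print(round(cosine_similarity(v1, v2), 3))
```

Scores close to 1.0 mean the texts are semantically similar; close to 0.0 means unrelated.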

Jan vs Alternatives

Feature       Jan               Ollama            LM Studio
Desktop UI    Yes               No (CLI)          Yes
OpenAI API    Yes               Yes               Yes
Extensions    Yes               No                No
Open source   Yes               Yes               No
GPU support   NVIDIA/AMD/Apple  NVIDIA/AMD/Apple  NVIDIA/Apple

Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors
