What Is GitHub Models?
GitHub Models gives every developer with a GitHub account free access to top AI models — including GPT-4o, GPT-4o mini, Llama 3.3, Phi-4, Mistral, and more — through a standard OpenAI-compatible API. No credit card, no new account signup: you just use your existing GitHub personal access token.
Launched in 2024 and now generally available, GitHub Models is built into the platform 100 million developers already use every day. Whether you’re testing a new idea, building a coding assistant, or running experiments, you’re a single API call away from production-grade AI models.
Available Free Models
GitHub Models hosts a curated list of frontier models from multiple providers:
| Model | Provider | Context Window | Best For |
|---|---|---|---|
| gpt-4o | OpenAI | 128K tokens | Complex reasoning, general use |
| gpt-4o-mini | OpenAI | 128K tokens | Fast, low-cost tasks |
| o1-mini | OpenAI | 128K tokens | Math, coding, reasoning chains |
| Llama-3.3-70B-Instruct | Meta | 128K tokens | Open-source, high quality |
| Phi-4 | Microsoft | 16K tokens | Lightweight, on-device use cases |
| Mistral-small | Mistral AI | 128K tokens | Multilingual, EU data residency |
| Cohere Command R+ | Cohere | 128K tokens | RAG, enterprise search |
| AI21 Jamba 1.5 | AI21 Labs | 256K tokens | Long documents, summarization |
The model list grows as GitHub adds new providers. You can see the full current catalog in the GitHub Marketplace Models section.
Free Tier Rate Limits
GitHub Models uses a tiered rate limit system based on your GitHub plan:
| GitHub plan | Req/min (low-tier models) | Req/day (low-tier models) | Req/min (high-tier models) | Req/day (high-tier models) |
|---|---|---|---|---|
| Free account | 15 | 150 | 5 | 50 |
| Copilot Free | 15 | 150 | 5 | 50 |
| Copilot Pro | 30 | 1,000 | 10 | 180 |
| Copilot Business/Enterprise | 50 | 5,000 | 16 | 600 |
Low-tier models (like gpt-4o-mini and Llama-3.3-70B) have higher rate limits than high-tier models (gpt-4o, o1-mini). For prototyping and personal projects, the free tier is more than adequate.
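Because the free tier enforces per-minute caps, it helps to handle rate-limit errors gracefully instead of letting scripts crash. Here is a minimal retry sketch using the OpenAI Python SDK; the backoff values are arbitrary and only illustrate the pattern.

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

def chat_with_retry(messages, model="gpt-4o-mini", max_attempts=5):
    """Call the chat endpoint, backing off when the per-minute limit is hit."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            # Illustrative backoff: wait longer after each 429 before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")
```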
How to Get Started in 2 Minutes
- Go to github.com/settings/tokens
- Click “Generate new token (classic)”
- Give it a name, set an expiration, and leave every scope unchecked; GitHub Models only needs a valid token, not any special permissions
- Copy your token
That’s it. No API dashboard, no payment method, no waitlist. Use your GitHub token directly as your API key.
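To keep the token out of your source code, a common pattern is to export it as an environment variable and read it at runtime. The variable name GITHUB_TOKEN below is just a convention, not a requirement.

```python
import os

# Assumes you ran something like: export GITHUB_TOKEN="ghp_..."
token = os.environ.get("GITHUB_TOKEN")
if not token:
    raise SystemExit("Set the GITHUB_TOKEN environment variable first")
```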
Making Your First API Call
Python (using the OpenAI SDK)
```bash
pip install openai
```
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain the difference between async and threading in Python"}
    ]
)

print(response.choices[0].message.content)
```
The base_url points to Azure’s inference endpoint, which GitHub Models uses under the hood. Your GitHub token authenticates the request transparently.
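Because the endpoint is plain OpenAI-style HTTP, you don't strictly need the SDK at all. A minimal sketch with the requests library, assuming the token is in the GITHUB_TOKEN environment variable and the standard /chat/completions route the OpenAI SDK uses:

```python
import os
import requests

resp = requests.post(
    "https://models.inference.ai.azure.com/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```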
Switching Models
Changing models is as simple as swapping the model string:
model="gpt-4o" # GPT-4o (OpenAI)
model="gpt-4o-mini" # GPT-4o Mini (faster, cheaper limits)
model="Meta-Llama-3.3-70B-Instruct" # Llama 3.3 70B (Meta)
model="Phi-4" # Phi-4 (Microsoft)
model="Mistral-small" # Mistral Small
Streaming Responses
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a Python script to parse JSON from a REST API"}],
    stream=True
)

for chunk in stream:
    # Some chunks can arrive without choices, so guard before indexing.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
Multimodal: Analyze Images with GPT-4o
GitHub Models includes GPT-4o’s vision capabilities. Analyze screenshots, diagrams, or any image file:
```python
import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_data}"}
                },
                {"type": "text", "text": "What does this architecture diagram show?"}
            ]
        }
    ]
)

print(response.choices[0].message.content)
```
JavaScript / Node.js
```bash
npm install openai
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GITHUB_TOKEN,
  baseURL: "https://models.inference.ai.azure.com"
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Review this code for security issues" }]
});

console.log(response.choices[0].message.content);
```
Using the Azure AI Inference SDK (Optional)
GitHub's official samples also show the Azure AI Inference SDK, which ships full TypeScript types:
```bash
npm install @azure-rest/ai-inference @azure/core-auth
```
```javascript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

// ModelClient is a factory function, not a class, so it is called without `new`.
const client = ModelClient(
  "https://models.inference.ai.azure.com",
  new AzureKeyCredential(process.env.GITHUB_TOKEN)
);

const response = await client.path("/chat/completions").post({
  body: {
    model: "Meta-Llama-3.3-70B-Instruct",
    messages: [{ role: "user", content: "Summarize the key points of the CAP theorem" }]
  }
});

console.log(response.body.choices[0].message.content);
```
Connect GitHub Models to OpenClaw
You can use GitHub Models as the backend for a free AI agent via OpenClaw. Since the endpoint is fully OpenAI-compatible, the setup takes about a minute.
Quick Setup
```bash
npm install -g openclaw@latest
openclaw onboard
```
When prompted, select Custom / OpenAI-compatible provider and enter:
- Base URL: https://models.inference.ai.azure.com
- API Key: your GitHub personal access token
- Model: gpt-4o or Meta-Llama-3.3-70B-Instruct
Manual Configuration
Edit ~/.openclaw/openclaw.json:
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "github-models": {
        "baseUrl": "https://models.inference.ai.azure.com",
        "apiKey": "YOUR_GITHUB_TOKEN",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-4o",
            "name": "GPT-4o (GitHub Models)",
            "reasoning": false,
            "input": ["text", "image"],
            "contextWindow": 128000,
            "maxTokens": 4096
          },
          {
            "id": "Meta-Llama-3.3-70B-Instruct",
            "name": "Llama 3.3 70B (GitHub Models)",
            "reasoning": false,
            "input": ["text"],
            "contextWindow": 128000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "github-models/gpt-4o"
      },
      "models": {
        "github-models/gpt-4o": {}
      }
    }
  }
}
```
With this, you get a free AI agent powered by GPT-4o — the same model behind ChatGPT Plus — using nothing more than your existing GitHub account.
GitHub Models vs Other Free AI APIs
| Feature | GitHub Models | Google Gemini | Groq | OpenRouter |
|---|---|---|---|---|
| GPT-4o Access | Yes (free) | No | No | Limited (paid) |
| Signup Required | No (uses GitHub) | Google account | New account | New account |
| Speed | ~100 tokens/s | ~100 tokens/s | 300–800 tokens/s | Varies by model |
| Free Daily Requests | 150–5,000 | 100–1,500 | 14,400 | ~200 (free models) |
| Vision Support | Yes (GPT-4o) | Yes | Limited | Yes (select models) |
| Model Variety | 15+ curated | Gemini family | 16+ Llama/Mistral | 300+ models |
| OpenAI Compatible | Yes | Yes | Yes | Yes |
| Best For | Access to GPT-4o free | Long context tasks | Real-time speed | Model variety |
Practical Use Cases
- GitHub Actions automation: Use your existing GITHUB_TOKEN in CI/CD pipelines to add AI-powered code review, changelog generation, or PR labeling with no additional credentials (see the review-script sketch after this list)
- VS Code extensions: Build Copilot-like coding assistants that use GPT-4o via GitHub Models without paying for the OpenAI API
- Code review bots: Self-hosted bots that analyze pull requests using GPT-4o and leave detailed comments automatically
- Documentation generators: Parse your codebase and generate README files, API docs, or changelogs
- RAG prototypes: Combine Cohere Command R+ (available in GitHub Models) with a vector database to test retrieval-augmented generation at zero cost
- LLM benchmarking: Compare GPT-4o vs Llama 3.3 70B vs Phi-4 on your specific tasks without setting up multiple API accounts
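As a sketch of the first use case, here is a minimal review script that could run inside a CI job. It assumes a previous step has written the pull request diff to pr.diff and that the workflow grants GITHUB_TOKEN access to models; both of those are assumptions for illustration, not part of any standard action.

```python
import os
import pathlib

from openai import OpenAI

# Hypothetical setup: a previous CI step saved the pull request diff to pr.diff.
diff = pathlib.Path("pr.diff").read_text()

client = OpenAI(
    api_key=os.environ["GITHUB_TOKEN"],  # provided to workflows by GitHub Actions
    base_url="https://models.inference.ai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a strict but concise code reviewer."},
        {"role": "user", "content": f"Review this diff and list potential issues:\n\n{diff}"},
    ],
)

# Print the review; a later step could post it as a PR comment.
print(response.choices[0].message.content)
```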
Limitations to Keep in Mind
- Rate limits are lower than dedicated providers: At 150 requests/day on the free tier, GitHub Models is better for development than high-volume production workloads
- No fine-tuning: You can’t train or customize models — inference only
- Powered by Azure: Requests go through Azure’s infrastructure, which may matter for data residency in certain jurisdictions
- Model availability changes: The catalog is curated by GitHub and may change — check the Marketplace for the current list
- Token limits per request: Output is typically capped at 4,096 tokens per completion even on models with larger context windows
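Because of that output cap, it is worth checking whether a completion was cut off. A minimal sketch follows; the 4,096 figure is the cap mentioned above, not a verified per-model constant.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write detailed API docs for a payments service."}],
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The model stopped because it hit the output cap, not because it was finished.
    print("Warning: output was truncated at the token limit")
print(choice.message.content)
```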
Related Reads
- Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026
- Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?
- Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of
- Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK
- Cloudflare Workers AI: Free Edge AI Inference with 47+ Models
Final Thoughts
GitHub Models is the most developer-friendly free AI API available today. There’s no simpler path to GPT-4o access: if you have a GitHub account, you already have everything you need. The OpenAI-compatible endpoint means any existing code or tool that works with ChatGPT’s API works here with a one-line change.
It’s not the fastest (that’s Groq) or the most generous in daily volume (that’s Groq or Gemini), but for developers who want GPT-4o without a credit card, or who want to mix and match models like Llama, Phi, and Mistral from a single endpoint, GitHub Models is unmatched.
Start with github.com/marketplace/models, grab a token at github.com/settings/tokens, and you’re making GPT-4o calls in under 2 minutes.
Originally published at toolfreebie.com.