DEV Community

toolfreebie

Posted on • Originally published at toolfreebie.com
GitHub Models: Free GPT-4o and Llama API for Every Developer

What Is GitHub Models?

GitHub Models gives every developer with a GitHub account free access to top AI models — including GPT-4o, GPT-4o mini, Llama 3.3, Phi-4, Mistral, and more — through a standard OpenAI-compatible API. No credit card, no new account signup: you just use your existing GitHub personal access token.

Launched in 2024 and now generally available, GitHub Models is built into the platform 100 million developers already use every day. Whether you’re testing a new idea, building a coding assistant, or running experiments, you’re a single API call away from production-grade AI models.

Available Free Models

GitHub Models hosts a curated list of frontier models from multiple providers:

| Model | Provider | Context Window | Best For |
| --- | --- | --- | --- |
| gpt-4o | OpenAI | 128K tokens | Complex reasoning, general use |
| gpt-4o-mini | OpenAI | 128K tokens | Fast, low-cost tasks |
| o1-mini | OpenAI | 128K tokens | Math, coding, reasoning chains |
| Llama-3.3-70B-Instruct | Meta | 128K tokens | Open-source, high quality |
| Phi-4 | Microsoft | 16K tokens | Lightweight, on-device use cases |
| Mistral-small | Mistral AI | 128K tokens | Multilingual, EU data residency |
| Cohere Command R+ | Cohere | 128K tokens | RAG, enterprise search |
| AI21 Jamba 1.5 | AI21 Labs | 256K tokens | Long documents, summarization |

The model list grows as GitHub adds new providers. You can see the full current catalog in the GitHub Marketplace Models section.

Free Tier Rate Limits

GitHub Models uses a tiered rate limit system based on your GitHub plan:

| Tier | Requests/Min (Low) | Requests/Day (Low) | Requests/Min (High) | Requests/Day (High) |
| --- | --- | --- | --- | --- |
| Free account | 15 | 150 | 5 | 50 |
| Copilot Free | 15 | 150 | 5 | 50 |
| Copilot Pro | 30 | 1,000 | 10 | 180 |
| Copilot Business/Enterprise | 50 | 5,000 | 16 | 600 |

Low-tier models (like gpt-4o-mini and Llama-3.3-70B) have higher rate limits than high-tier models (gpt-4o, o1-mini). For prototyping and personal projects, the free tier is more than adequate.
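Free-tier quotas are easy to hit during development, so it pays to wrap calls in retry logic with exponential backoff. A minimal sketch of the pattern — the `RateLimited` exception and `flaky` function below are illustrative stand-ins; a real client would raise `openai.RateLimitError` on a 429 response:

```python
import time

class RateLimited(Exception):
    """Stand-in for the 429 error a real client raises (openai.RateLimitError)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff whenever it is rate limited."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimited:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return fn()  # last attempt: let any error propagate to the caller

# Demo: a fake API call that is rate limited twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # prints: ok
```

With per-minute and per-day limits stacked, backoff handles short bursts; if you exhaust the daily quota, the only fix is waiting or switching to a lower-tier model.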

How to Get Started in 2 Minutes

  1. Go to github.com/settings/tokens
  2. Click “Generate new token (classic)”
  3. Give it a name, set an expiration, and select no scopes — GitHub Models only requires a valid token, no special permissions
  4. Copy your token

That’s it. No API dashboard, no payment method, no waitlist. Use your GitHub token directly as your API key.
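Avoid hardcoding the token in scripts — it's easy to leak. A common pattern is to export it as an environment variable (the value below is a placeholder for your own token), which the Node example later in this post already reads via `process.env.GITHUB_TOKEN`:

```shell
# Add to ~/.bashrc or ~/.zshrc so your scripts can read the token from the environment
export GITHUB_TOKEN="ghp_your_token_here"
```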

Making Your First API Call

Python (using the OpenAI SDK)

pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain the difference between async and threading in Python"}
    ]
)

print(response.choices[0].message.content)

The base_url points to Azure’s inference endpoint, which GitHub Models uses under the hood. Your GitHub token authenticates the request transparently.

Switching Models

Changing models is as simple as swapping the model string:

model="gpt-4o"            # GPT-4o (OpenAI)
model="gpt-4o-mini"       # GPT-4o mini (faster, higher rate limits)
model="Meta-Llama-3.3-70B-Instruct"  # Llama 3.3 70B (Meta)
model="Phi-4"             # Phi-4 (Microsoft)
model="Mistral-small"     # Mistral Small

Streaming Responses

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a Python script to parse JSON from a REST API"}],
    stream=True
)

for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. the final one) carry no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Multimodal: Analyze Images with GPT-4o

GitHub Models includes GPT-4o’s vision capabilities. Analyze screenshots, diagrams, or any image file:

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_data}"}
                },
                {"type": "text", "text": "What does this architecture diagram show?"}
            ]
        }
    ]
)

print(response.choices[0].message.content)

JavaScript / Node.js

npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GITHUB_TOKEN,
  baseURL: "https://models.inference.ai.azure.com"
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Review this code for security issues" }]
});

console.log(response.choices[0].message.content);

Using the Azure AI Inference SDK (Optional)

GitHub’s official samples also use the Azure AI Inference REST client, which ships full TypeScript types:

npm install @azure-rest/ai-inference @azure/core-auth

import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = ModelClient(
  "https://models.inference.ai.azure.com",
  new AzureKeyCredential(process.env.GITHUB_TOKEN)
);

const response = await client.path("/chat/completions").post({
  body: {
    model: "Meta-Llama-3.3-70B-Instruct",
    messages: [{ role: "user", content: "Summarize the key points of the CAP theorem" }]
  }
});

console.log(response.body.choices[0].message.content);

Connect GitHub Models to OpenClaw

You can use GitHub Models as the backend for a free AI agent via OpenClaw. Since the endpoint is fully OpenAI-compatible, the setup takes about a minute.

Quick Setup

npm install -g openclaw@latest
openclaw onboard

When prompted, select Custom / OpenAI-compatible provider and enter:

  • Base URL: https://models.inference.ai.azure.com
  • API Key: your GitHub personal access token
  • Model: gpt-4o or Meta-Llama-3.3-70B-Instruct

Manual Configuration

Edit ~/.openclaw/openclaw.json:

{
  "models": {
    "mode": "merge",
    "providers": {
      "github-models": {
        "baseUrl": "https://models.inference.ai.azure.com",
        "apiKey": "YOUR_GITHUB_TOKEN",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-4o",
            "name": "GPT-4o (GitHub Models)",
            "reasoning": false,
            "input": ["text", "image"],
            "contextWindow": 128000,
            "maxTokens": 4096
          },
          {
            "id": "Meta-Llama-3.3-70B-Instruct",
            "name": "Llama 3.3 70B (GitHub Models)",
            "reasoning": false,
            "input": ["text"],
            "contextWindow": 128000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "github-models/gpt-4o"
      },
      "models": {
        "github-models/gpt-4o": {}
      }
    }
  }
}

With this, you get a free AI agent powered by GPT-4o — the same model behind ChatGPT Plus — using nothing more than your existing GitHub account.

GitHub Models vs Other Free AI APIs

| Feature | GitHub Models | Google Gemini | Groq | OpenRouter |
| --- | --- | --- | --- | --- |
| GPT-4o Access | Yes (free) | No | No | Limited (paid) |
| Signup Required | No (uses GitHub) | Google account | New account | New account |
| Speed | ~100 tokens/s | ~100 tokens/s | 300–800 tokens/s | Varies by model |
| Free Daily Requests | 150–5,000 | 100–1,500 | 14,400 | ~200 (free models) |
| Vision Support | Yes (GPT-4o) | Yes | Limited | Yes (select models) |
| Model Variety | 15+ curated | Gemini family | 16+ Llama/Mistral | 300+ models |
| OpenAI Compatible | Yes | Yes | Yes | Yes |
| Best For | Access to GPT-4o free | Long context tasks | Real-time speed | Model variety |

Practical Use Cases

  • GitHub Actions automation: Use your existing GITHUB_TOKEN in CI/CD pipelines to add AI-powered code review, changelog generation, or PR labeling — no additional credentials needed
  • VS Code extensions: Build Copilot-like coding assistants that use GPT-4o via GitHub Models without paying for the OpenAI API
  • Code review bots: Self-hosted bots that analyze pull requests using GPT-4o and leave detailed comments automatically
  • Documentation generators: Parse your codebase and generate README files, API docs, or changelogs
  • RAG prototypes: Combine Cohere Command R+ (available in GitHub Models) with a vector database to test retrieval-augmented generation at zero cost
  • LLM benchmarking: Compare GPT-4o vs Llama 3.3 70B vs Phi-4 on your specific tasks without setting up multiple API accounts
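The benchmarking idea above can be sketched as a small harness. Here `ask` is an injected function — stubbed below so the loop runs offline; in practice it would wrap `client.chat.completions.create` exactly as in the earlier examples. The harness itself and its names are illustrative:

```python
import time

# Model IDs as used throughout this post
MODELS = ["gpt-4o", "gpt-4o-mini", "Meta-Llama-3.3-70B-Instruct", "Phi-4"]

def benchmark(ask, models, prompt):
    """Run the same prompt against each model and record wall-clock latency."""
    timings = {}
    for model in models:
        start = time.perf_counter()
        ask(model, prompt)                       # one chat completion per model
        timings[model] = time.perf_counter() - start
    return timings

# Stubbed call so the harness runs without a network; with a real client:
# ask = lambda m, p: client.chat.completions.create(
#     model=m, messages=[{"role": "user", "content": p}])
timings = benchmark(lambda model, prompt: None, MODELS, "Summarize the CAP theorem")
print(min(timings, key=timings.get))  # fastest model for this prompt
```

Latency is only one axis — for quality comparisons you would also capture each model's answer and score it against your own task-specific rubric.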

Limitations to Keep in Mind

  • Rate limits are lower than dedicated providers: At 150 requests/day on the free tier, GitHub Models is better for development than high-volume production workloads
  • No fine-tuning: You can’t train or customize models — inference only
  • Powered by Azure: Requests go through Azure’s infrastructure, which may matter for data residency in certain jurisdictions
  • Model availability changes: The catalog is curated by GitHub and may change — check the Marketplace for the current list
  • Token limits per request: Output is typically capped at 4,096 tokens per completion even on models with larger context windows
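A common workaround for the output cap is to keep asking the model to continue while `finish_reason == "length"`. A sketch of that control flow — the `generate(messages) -> (text, finish_reason)` callable is a hypothetical helper standing in for one `chat.completions.create` call, so the loop is testable without a network:

```python
def complete_long(generate, prompt, max_rounds=5):
    """Stitch together a long answer across several capped completions."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = generate(messages)
        parts.append(text)
        if finish_reason != "length":  # model finished before hitting the cap
            break
        # Feed the partial answer back and ask for the rest.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)

# Demo with a fake backend that truncates once, then finishes.
chunks = iter([("The first half...", "length"), (" and the rest.", "stop")])
print(complete_long(lambda messages: next(chunks), "Write a long essay"))
# prints: The first half... and the rest.
```

Note that each continuation round consumes its own request against your daily quota, so this pattern is best reserved for genuinely long outputs.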

Final Thoughts

GitHub Models is the most developer-friendly free AI API available today. There’s no simpler path to GPT-4o access: if you have a GitHub account, you already have everything you need. The OpenAI-compatible endpoint means any existing code or tool that works with ChatGPT’s API works here with a one-line change.

It’s not the fastest (that’s Groq) or the most generous in daily volume (that’s Groq or Gemini), but for developers who want GPT-4o without a credit card, or who want to mix and match models like Llama, Phi, and Mistral from a single endpoint, GitHub Models is unmatched.

Start with github.com/marketplace/models, grab a token at github.com/settings/tokens, and you’re making GPT-4o calls in under 2 minutes.

