DEV Community

toolfreebie

Posted on • Originally published at toolfreebie.com
GitHub Models: Free GPT-4o and Llama API for Every Developer

What Is GitHub Models?

GitHub Models gives every developer with a GitHub account free access to top AI models — including GPT-4o, GPT-4o mini, Llama 3.3, Phi-4, Mistral, and more — through a standard OpenAI-compatible API. No credit card, no new account signup: you just use your existing GitHub personal access token.

Launched in 2024 and now generally available, GitHub Models is built into the platform 100 million developers already use every day. Whether you’re testing a new idea, building a coding assistant, or running experiments, you’re a single API call away from production-grade AI models.

Available Free Models

GitHub Models hosts a curated list of frontier models from multiple providers:

| Model | Provider | Context Window | Best For |
| --- | --- | --- | --- |
| gpt-4o | OpenAI | 128K tokens | Complex reasoning, general use |
| gpt-4o-mini | OpenAI | 128K tokens | Fast, low-cost tasks |
| o1-mini | OpenAI | 128K tokens | Math, coding, reasoning chains |
| Llama-3.3-70B-Instruct | Meta | 128K tokens | Open-source, high quality |
| Phi-4 | Microsoft | 16K tokens | Lightweight, on-device use cases |
| Mistral-small | Mistral AI | 128K tokens | Multilingual, EU data residency |
| Cohere Command R+ | Cohere | 128K tokens | RAG, enterprise search |
| AI21 Jamba 1.5 | AI21 Labs | 256K tokens | Long documents, summarization |

The model list grows as GitHub adds new providers. You can see the full current catalog in the GitHub Marketplace Models section.

Free Tier Rate Limits

GitHub Models uses a tiered rate limit system based on your GitHub plan:

| Tier | Requests/Min (Low) | Requests/Day (Low) | Requests/Min (High) | Requests/Day (High) |
| --- | --- | --- | --- | --- |
| Free account | 15 | 150 | 5 | 50 |
| Copilot Free | 15 | 150 | 5 | 50 |
| Copilot Pro | 30 | 1,000 | 10 | 180 |
| Copilot Business/Enterprise | 50 | 5,000 | 16 | 600 |

Low-tier models (like gpt-4o-mini and Llama-3.3-70B) have higher rate limits than high-tier models (gpt-4o, o1-mini). For prototyping and personal projects, the free tier is more than adequate.
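Free-tier quotas are easy to hit during development, so it pays to wrap calls in retry logic with exponential backoff. A minimal sketch of the pattern — the `RateLimited` exception and `flaky` function below are illustrative stand-ins; a real client would raise `openai.RateLimitError` on a 429 response:

```python
import time

class RateLimited(Exception):
    """Stand-in for the 429 error a real client raises (openai.RateLimitError)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff whenever it is rate limited."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimited:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return fn()  # last attempt: let any error propagate to the caller

# Demo: a fake API call that is rate limited twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # prints: ok
```

With per-minute and per-day limits stacked, backoff handles short bursts; if you exhaust the daily quota, the only fix is waiting or switching to a lower-tier model.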

How to Get Started in 2 Minutes

  1. Go to github.com/settings/tokens
  2. Click “Generate new token (classic)”
  3. Give it a name, set an expiration, and select no scopes — GitHub Models only requires a valid token, no special permissions
  4. Copy your token

That’s it. No API dashboard, no payment method, no waitlist. Use your GitHub token directly as your API key.
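Avoid hardcoding the token in scripts — it's easy to leak. A common pattern is to export it as an environment variable (the value below is a placeholder for your own token), which the Node example later in this post already reads via `process.env.GITHUB_TOKEN`:

```shell
# Add to ~/.bashrc or ~/.zshrc so your scripts can read the token from the environment
export GITHUB_TOKEN="ghp_your_token_here"
```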

Making Your First API Call

Python (using the OpenAI SDK)

pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain the difference between async and threading in Python"}
    ]
)

print(response.choices[0].message.content)

The base_url points to Azure’s inference endpoint, which GitHub Models uses under the hood. Your GitHub token authenticates the request transparently.

Switching Models

Changing models is as simple as swapping the model string:

model="gpt-4o"            # GPT-4o (OpenAI)
model="gpt-4o-mini"       # GPT-4o mini (faster, higher rate limits)
model="Meta-Llama-3.3-70B-Instruct"  # Llama 3.3 70B (Meta)
model="Phi-4"             # Phi-4 (Microsoft)
model="Mistral-small"     # Mistral Small

Streaming Responses

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a Python script to parse JSON from a REST API"}],
    stream=True
)

for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. the final one) carry no choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Multimodal: Analyze Images with GPT-4o

GitHub Models includes GPT-4o’s vision capabilities. Analyze screenshots, diagrams, or any image file:

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GITHUB_TOKEN",
    base_url="https://models.inference.ai.azure.com"
)

with open("diagram.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_data}"}
                },
                {"type": "text", "text": "What does this architecture diagram show?"}
            ]
        }
    ]
)

print(response.choices[0].message.content)

JavaScript / Node.js

npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GITHUB_TOKEN,
  baseURL: "https://models.inference.ai.azure.com"
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Review this code for security issues" }]
});

console.log(response.choices[0].message.content);

Using the Azure AI Inference SDK (Optional)

GitHub’s official samples also use the Azure AI Inference REST client, which ships full TypeScript types:

npm install @azure-rest/ai-inference @azure/core-auth

import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = ModelClient(
  "https://models.inference.ai.azure.com",
  new AzureKeyCredential(process.env.GITHUB_TOKEN)
);

const response = await client.path("/chat/completions").post({
  body: {
    model: "Meta-Llama-3.3-70B-Instruct",
    messages: [{ role: "user", content: "Summarize the key points of the CAP theorem" }]
  }
});

console.log(response.body.choices[0].message.content);

Connect GitHub Models to OpenClaw

You can use GitHub Models as the backend for a free AI agent via OpenClaw. Since the endpoint is fully OpenAI-compatible, the setup takes about a minute.

Quick Setup

npm install -g openclaw@latest
openclaw onboard

When prompted, select Custom / OpenAI-compatible provider and enter:

  • Base URL: https://models.inference.ai.azure.com
  • API Key: your GitHub personal access token
  • Model: gpt-4o or Meta-Llama-3.3-70B-Instruct

Manual Configuration

Edit ~/.openclaw/openclaw.json:

{
  "models": {
    "mode": "merge",
    "providers": {
      "github-models": {
        "baseUrl": "https://models.inference.ai.azure.com",
        "apiKey": "YOUR_GITHUB_TOKEN",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-4o",
            "name": "GPT-4o (GitHub Models)",
            "reasoning": false,
            "input": ["text", "image"],
            "contextWindow": 128000,
            "maxTokens": 4096
          },
          {
            "id": "Meta-Llama-3.3-70B-Instruct",
            "name": "Llama 3.3 70B (GitHub Models)",
            "reasoning": false,
            "input": ["text"],
            "contextWindow": 128000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "github-models/gpt-4o"
      },
      "models": {
        "github-models/gpt-4o": {}
      }
    }
  }
}

With this, you get a free AI agent powered by GPT-4o — the same model behind ChatGPT Plus — using nothing more than your existing GitHub account.

GitHub Models vs Other Free AI APIs

| Feature | GitHub Models | Google Gemini | Groq | OpenRouter |
| --- | --- | --- | --- | --- |
| GPT-4o Access | Yes (free) | No | No | Limited (paid) |
| Signup Required | No (uses GitHub) | Google account | New account | New account |
| Speed | ~100 tokens/s | ~100 tokens/s | 300–800 tokens/s | Varies by model |
| Free Daily Requests | 150–5,000 | 100–1,500 | 14,400 | ~200 (free models) |
| Vision Support | Yes (GPT-4o) | Yes | Limited | Yes (select models) |
| Model Variety | 15+ curated | Gemini family | 16+ Llama/Mistral | 300+ models |
| OpenAI Compatible | Yes | Yes | Yes | Yes |
| Best For | Access to GPT-4o free | Long context tasks | Real-time speed | Model variety |

Practical Use Cases

  • GitHub Actions automation: Use your existing GITHUB_TOKEN in CI/CD pipelines to add AI-powered code review, changelog generation, or PR labeling — no additional credentials needed
  • VS Code extensions: Build Copilot-like coding assistants that use GPT-4o via GitHub Models without paying for the OpenAI API
  • Code review bots: Self-hosted bots that analyze pull requests using GPT-4o and leave detailed comments automatically
  • Documentation generators: Parse your codebase and generate README files, API docs, or changelogs
  • RAG prototypes: Combine Cohere Command R+ (available in GitHub Models) with a vector database to test retrieval-augmented generation at zero cost
  • LLM benchmarking: Compare GPT-4o vs Llama 3.3 70B vs Phi-4 on your specific tasks without setting up multiple API accounts
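The benchmarking idea above can be sketched as a small harness. Here `ask` is an injected function — stubbed below so the loop runs offline; in practice it would wrap `client.chat.completions.create` exactly as in the earlier examples. The harness itself and its names are illustrative:

```python
import time

# Model IDs as used throughout this post
MODELS = ["gpt-4o", "gpt-4o-mini", "Meta-Llama-3.3-70B-Instruct", "Phi-4"]

def benchmark(ask, models, prompt):
    """Run the same prompt against each model and record wall-clock latency."""
    timings = {}
    for model in models:
        start = time.perf_counter()
        ask(model, prompt)                       # one chat completion per model
        timings[model] = time.perf_counter() - start
    return timings

# Stubbed call so the harness runs without a network; with a real client:
# ask = lambda m, p: client.chat.completions.create(
#     model=m, messages=[{"role": "user", "content": p}])
timings = benchmark(lambda model, prompt: None, MODELS, "Summarize the CAP theorem")
print(min(timings, key=timings.get))  # fastest model for this prompt
```

Latency is only one axis — for quality comparisons you would also capture each model's answer and score it against your own task-specific rubric.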

Limitations to Keep in Mind

  • Rate limits are lower than dedicated providers: At 150 requests/day on the free tier, GitHub Models is better for development than high-volume production workloads
  • No fine-tuning: You can’t train or customize models — inference only
  • Powered by Azure: Requests go through Azure’s infrastructure, which may matter for data residency in certain jurisdictions
  • Model availability changes: The catalog is curated by GitHub and may change — check the Marketplace for the current list
  • Token limits per request: Output is typically capped at 4,096 tokens per completion even on models with larger context windows
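A common workaround for the output cap is to keep asking the model to continue while `finish_reason == "length"`. A sketch of that control flow — the `generate(messages) -> (text, finish_reason)` callable is a hypothetical helper standing in for one `chat.completions.create` call, so the loop is testable without a network:

```python
def complete_long(generate, prompt, max_rounds=5):
    """Stitch together a long answer across several capped completions."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = generate(messages)
        parts.append(text)
        if finish_reason != "length":  # model finished before hitting the cap
            break
        # Feed the partial answer back and ask for the rest.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)

# Demo with a fake backend that truncates once, then finishes.
chunks = iter([("The first half...", "length"), (" and the rest.", "stop")])
print(complete_long(lambda messages: next(chunks), "Write a long essay"))
# prints: The first half... and the rest.
```

Note that each continuation round consumes its own request against your daily quota, so this pattern is best reserved for genuinely long outputs.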

Final Thoughts

GitHub Models is the most developer-friendly free AI API available today. There’s no simpler path to GPT-4o access: if you have a GitHub account, you already have everything you need. The OpenAI-compatible endpoint means any existing code or tool that works with ChatGPT’s API works here with a one-line change.

It’s not the fastest (that’s Groq) or the most generous in daily volume (that’s Groq or Gemini), but for developers who want GPT-4o without a credit card, or who want to mix and match models like Llama, Phi, and Mistral from a single endpoint, GitHub Models is unmatched.

Start with github.com/marketplace/models, grab a token at github.com/settings/tokens, and you’re making GPT-4o calls in under 2 minutes.

