How to Use the NanoGPT API with Python: A Developer's Guide

#python #ai #api #privacy

If you've been looking for a private, OpenAI-compatible API that doesn't hoard your prompts for training data, NanoGPT is worth checking out. It speaks the same language as OpenAI's API — meaning most of your existing code works with minimal changes — but your data stays yours.

In this guide, I'll walk you through everything: installation, auth, making basic requests, streaming responses, and handling errors properly. By the end, you'll have a working Python client you can drop into any project.

Why NanoGPT?

Before we write any code, let's talk about why you'd pick this over just hitting OpenAI directly:

Privacy-first: Your prompts and completions aren't used for model training
OpenAI-compatible: Drop-in replacement for most tools and libraries
Model variety: Access to models like MiniMax M2.7 and others without managing infrastructure
Simple pricing: Pay-per-token, no enterprise contracts required

If you're building something where user data privacy matters — and honestly, when doesn't it? — this is a solid choice. For more context on private AI tools, check out ai-privacy-tools.vercel.app.

Installation

Start by installing the official nanogpt package:

pip install nanogpt

Or, if you prefer working with raw HTTP (no judgment — sometimes you want to see exactly what's going on), you can use requests or httpx instead. The API is standard REST + SSE, so any HTTP client works.

If you want the full OpenAI SDK experience, install the OpenAI package and point it at NanoGPT's base URL:

pip install openai

Raw HTTP and the OpenAI SDK both work. I'll show each of them below.

Authentication

Grab your API key from nano-gpt.com and set it as an environment variable:

export NANOGPT_API_KEY="your-api-key-here"

Never hardcode your API key in source code. Seriously. I've seen production repos with API keys in them and it's always a bad day.
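
Here's a minimal sketch of reading the key at startup and failing fast when it's missing, so a forgotten export doesn't surface later as a confusing auth error. The helper name load_api_key is my own for this guide, not part of any NanoGPT library:

```python
import os


def load_api_key(env=None, var_name="NANOGPT_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank.

    load_api_key is a hypothetical helper for this guide, not a NanoGPT API.
    """
    # Allow a plain dict to be passed in so the function is easy to test
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set. Export it before running.")
    return key
```

Call load_api_key() once when your program starts and pass the result to whichever client you use below.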

Basic Chat Completion

Let's start with a simple request using the requests library:

import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}

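# Send the request with a timeout, and fail loudly on HTTP errors
response = requests.post(
    f"{BASE_URL}/chat/completions", headers=headers, json=payload, timeout=30
)
response.raise_for_status()
data = response.json()

# The reply text lives in the first choice's message
print(data["choices"][0]["message"]["content"])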

That's it. If you've worked with OpenAI's API before, this should look completely familiar. Same endpoint structure, same request/response format.
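
Since the response follows the OpenAI format, it can help to pull the reply out in one defensive step rather than indexing blindly. This is a rough sketch: extract_content is a hypothetical helper, the sample dict is hand-written in the OpenAI shape rather than captured from NanoGPT, and the error layout is an assumption based on OpenAI's convention:

```python
def extract_content(data):
    """Pull the assistant's reply out of an OpenAI-style chat completion dict."""
    if "error" in data:
        # Assumption: errors look like {"error": {"message": ...}}, as in OpenAI's format
        err = data["error"]
        message = err.get("message", err) if isinstance(err, dict) else err
        raise RuntimeError(f"API error: {message}")
    choices = data.get("choices") or []
    if not choices:
        raise RuntimeError("Response contained no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the OpenAI response shape (not a captured NanoGPT response)
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_content(sample))
```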

Using the OpenAI SDK

Here's the same thing using the official OpenAI Python SDK pointed at NanoGPT:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)

This approach is great because any existing code that uses the OpenAI SDK can be migrated to NanoGPT by just changing two lines — the API key and the base URL.
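
To make that two-line difference explicit, here's a small sketch that keeps the provider-specific values in one table. PROVIDERS and client_kwargs are names I made up for illustration. The NanoGPT URL is the one used throughout this guide, and the OpenAI entry is that SDK's usual default endpoint:

```python
import os

# Hypothetical provider table for illustration only
PROVIDERS = {
    "nanogpt": {"base_url": "https://nano-gpt.com/api/v1", "key_var": "NANOGPT_API_KEY"},
    "openai": {"base_url": "https://api.openai.com/v1", "key_var": "OPENAI_API_KEY"},
}


def client_kwargs(provider, env=None):
    """Build the two constructor arguments that differ between providers."""
    env = os.environ if env is None else env
    cfg = PROVIDERS[provider]
    return {"api_key": env.get(cfg["key_var"]), "base_url": cfg["base_url"]}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("nanogpt"))
```

Everything after the constructor call stays the same, although model names will differ per provider.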

Streaming Responses

For longer responses, streaming makes a huge difference in perceived performance. Here's how to do it:

import json
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream"
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
    "stream": True,
    "max_tokens": 1024
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            chunk = line[6:]
            if chunk.strip() == "[DONE]":
                break
            data = json.loads(chunk)
            delta = data["choices"][0]["delta"]
            if "content" in delta:
                print(delta["content"], end="", flush=True)

print()  # Newline after streaming

The key things here: set "stream": True in the payload, add "Accept": "text/event-stream" to headers, and use requests.post(..., stream=True) so it doesn't buffer the entire response. Then iterate over lines and parse the SSE chunks.
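
If you want to reuse the parsing step, it can be pulled into a generator. A minimal sketch, assuming the chunks follow the OpenAI streaming shape shown above. iter_sse_content is a hypothetical name, and the sample lines are hand-written rather than captured from the API:

```python
import json


def iter_sse_content(lines):
    """Yield content fragments from an iterable of raw SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # Skip blank keep-alives, comments, and other SSE fields
        chunk = line[len("data:"):].strip()
        if chunk == "[DONE]":
            return
        choices = json.loads(chunk).get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample lines in the OpenAI streaming shape
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

With a live request you would pass response.iter_lines() in place of sample and print each fragment as it arrives.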

With the OpenAI SDK, streaming is even simpler:

stream = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Explain TCP vs UDP."}],
    stream=True
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
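
You'll often want the full text once streaming finishes, not just the printed output. Here's a sketch of a collector that works on any iterable of SDK-style chunks. collect_stream and fake_chunk are hypothetical helpers, and the fake stream below only imitates the attribute path the SDK exposes so the example runs without a network call:

```python
from types import SimpleNamespace


def collect_stream(stream, on_text=None):
    """Join the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # Defensive: some OpenAI-compatible servers send such chunks
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)  # e.g. print the fragment as it arrives
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as an SDK chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [
    fake_chunk("TCP "),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk("vs UDP"),
]
full_text = collect_stream(fake_stream)
```

With a real stream, something like collect_stream(stream, on_text=lambda t: print(t, end="", flush=True)) prints live and still returns the complete reply.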

Error Handling

Don't skip this part. Here's a practical error handling wrapper you can actually use in production:

import requests
from requests.exceptions import RequestException
import time

class NanoGPTClient:
    def __init__(self, api_key, base_url="https://nano-gpt.com/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def chat(self, messages, model="minimax/minimax-m2.7", max_retries=3, **kwargs):
        # Defaults first, then any caller overrides (max_tokens, temperature, ...)
        payload = {
            "model": model,
            "messages": messages,
            "max_tokens": 1024,
            "temperature": 0.7,
            **kwargs,
        }

        for attempt in range(max_retries):
            try:
                response = requests.post(
                    f"{self.base_url}/chat/completions",
                    headers=self.headers,
                    json=payload,
                    timeout=30
                )

                if response.status_code == 200:
                    return response.json()["choices"][0]["message"]["content"]

                if response.status_code == 429:
                    # Retry-After may be missing or an HTTP date; fall back to backoff
                    try:
                        retry_after = int(response.headers.get("Retry-After", ""))
                    except ValueError:
                        retry_after = 2 ** attempt
                    print(f"Rate limited. Retrying in {retry_after}s...")
                    time.sleep(retry_after)
                    continue

                if response.status_code == 401:
                    raise ValueError("Invalid API key. Check your NANOGPT_API_KEY.")

                if 400 <= response.status_code < 500:
                    # Other client errors won't succeed on retry, so fail right away
                    raise RuntimeError(
                        f"Request rejected ({response.status_code}): {response.text}"
                    )

                response.raise_for_status()  # 5xx raises HTTPError, retried below

            except RequestException as e:
                if attempt == max_retries - 1:
                    raise
                print(f"Request failed: {e}. Retrying ({attempt + 1}/{max_retries})...")
                time.sleep(2 ** attempt)

        raise RuntimeError("Max retries exceeded.")

This handles the common pain points:

Rate limiting (429): Waits for the interval in the Retry-After header, falling back to exponential backoff when the header is missing
Auth errors (401): Gives you a clear message instead of a cryptic stack trace
Network failures: Retries with exponential backoff
Timeouts: Caps each request at 30 seconds via timeout=30, so a hung connection can't stall your app
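
One detail worth sketching separately: per the HTTP spec, Retry-After can be either a number of seconds or an HTTP date, and adding jitter to the backoff keeps many clients from retrying in lockstep. The helper below is a hypothetical illustration, and the 60-second cap is an arbitrary choice of mine, not something NanoGPT documents:

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, attempt, cap=60.0, now=None):
    """Return how many seconds to wait before the next attempt."""
    if retry_after:
        # Form 1: delay in seconds, e.g. "5"
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        try:
            when = parsedate_to_datetime(retry_after)
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return min(cap, max(0.0, (when - now).total_seconds()))
        except (TypeError, ValueError):
            pass  # Unparseable header: fall through to backoff
    # No usable header: exponential backoff with full jitter
    return random.uniform(0, min(cap, 2 ** attempt))
```

Inside a retry loop you would call time.sleep(retry_delay(response.headers.get("Retry-After"), attempt)).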

Putting It All Together

Here's a complete example that ties everything together — a simple CLI chatbot:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1"
)

def chat():
    messages = [{"role": "system", "content": "You are a helpful assistant. Be concise."}]

    print("NanoGPT Chat (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break

        messages.append({"role": "user", "content": user_input})

        stream = client.chat.completions.create(
            model="minimax/minimax-m2.7",
            messages=messages,
            stream=True
        )

        print("AI: ", end="")
        assistant_response = ""
        for chunk in stream:
            # Some chunks may carry no choices or an empty delta; skip those
            if chunk.choices and chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                print(text, end="", flush=True)
                assistant_response += text
        print("\n")

        messages.append({"role": "assistant", "content": assistant_response})

if __name__ == "__main__":
    chat()
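
One thing to keep in mind with this chatbot: messages grows with every turn, so long sessions send more and more tokens per request. Here's a rough sketch of capping the history before each call. trim_history and its default of 20 messages are my own illustrative choices, not anything NanoGPT prescribes:

```python
def trim_history(messages, max_messages=20):
    """Keep system messages plus the most recent max_messages other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant messages so the window starts on a user turn
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

In the loop above you would pass messages=trim_history(messages) to client.chat.completions.create while still appending to the full list.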

Quick Reference

| What | Value |
| --- | --- |
| Base URL | `https://nano-gpt.com/api/v1` |
| Auth header | `Authorization: Bearer YOUR_KEY` |
| Chat endpoint | `/chat/completions` |
| Default model | `minimax/minimax-m2.7` |
| Streaming | Set `"stream": true` in payload |

Wrapping Up

If you were already using OpenAI's Python SDK, migrating to NanoGPT is genuinely a two-line change. If you're starting fresh, you get a clean API that respects your privacy out of the box.

The real win here is that you're not locked into a single provider. Since NanoGPT is OpenAI-compatible, you can swap between providers without rewriting your application code. That's the kind of flexibility worth building on.

Got questions or hit a snag? Drop a comment below. Happy coding.

Originally published at ai-privacy-tools.vercel.app