noxlie

Posted on • Originally published at ai-privacy-tools.vercel.app

How to Use the NanoGPT API with Python.

Before you dive in: this is my first post here, and I'd like to make a good first impression. I'd be more than happy to get feedback on what could be changed or improved to make better tutorials! :)
Happy reading!

If you've been looking for a private, OpenAI-compatible API that doesn't hoard your prompts for training data, NanoGPT is worth checking out. It speaks the same language as OpenAI's API, meaning most of your existing code works with minimal changes, but your data stays yours.

In this guide, I'll walk you through everything: installation, auth, making basic requests, streaming responses, and handling errors properly. By the end, you'll have a working Python client you can drop into any project.

Why NanoGPT?

Before we write any code, let's talk about why you'd pick this over just hitting OpenAI directly:

  • Privacy-first: Your prompts and completions aren't used for model training
  • OpenAI-compatible: Drop-in replacement for most tools and libraries
  • Model variety: Access to models like MiniMax M2.7 and others without managing infrastructure
  • Simple pricing: Pay-per-token, no enterprise contracts required

If you're building something where user data privacy matters (and honestly, when doesn't it?), this is a solid choice. For more context on private AI tools, check out ai-privacy-tools.vercel.app.

Installation

Start by installing the official nanogpt package:

pip install nanogpt

Or, if you prefer working with raw HTTP (no judgment — sometimes you want to see exactly what's going on), you can use requests or httpx instead. The API is standard REST + SSE, so any HTTP client works.

If you want the full OpenAI SDK experience, install the OpenAI package and point it at NanoGPT's base URL:

pip install openai

Both approaches work. I'll show you both below.

Authentication

Grab your API key from nano-gpt.com and set it as an environment variable:

export NANOGPT_API_KEY="your-api-key-here"

Never hardcode your API key in source code. Seriously. I've seen production repos with API keys in them and it's always a bad day.
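One way to make that rule stick is to read the key once at startup and fail fast when it's missing, rather than sending an empty Bearer token and debugging a 401 later. Here's a minimal sketch; load_api_key is a hypothetical helper I made up for illustration, not something shipped by NanoGPT or the OpenAI SDK:

```python
import os


def load_api_key(env=None, var_name="NANOGPT_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ and can
    be swapped for a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set. Export it before running.")
    return key
```

You could then write API_KEY = load_api_key() at the top of a script in place of the bare os.environ.get(...) call.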

Basic Chat Completion

Let's start with a simple request using the requests library:

import os
import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}

response = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload, timeout=30)
response.raise_for_status()  # Fail loudly on HTTP errors instead of a KeyError below
data = response.json()

print(data["choices"][0]["message"]["content"])

That's it. If you've worked with OpenAI's API before, this should look completely familiar. Same endpoint structure, same request/response format.
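One thing to watch: the body only contains a choices list when the request succeeds, so indexing straight into it can blow up with a confusing KeyError on a failed call. The sketch below shows one way to guard against that. extract_content is a hypothetical helper, and the error shape it reads ({"error": {"message": ...}}) is an assumption based on the OpenAI-style format, so check what NanoGPT actually returns before relying on it:

```python
def extract_content(status_code, data):
    """Return the assistant text from a parsed chat completion response.

    Hypothetical helper: raises a readable RuntimeError instead of a KeyError
    when the API returns an error payload or an empty choices list.
    """
    if status_code != 200:
        # Assumption: error bodies look like {"error": {"message": "..."}}.
        error = data.get("error") if isinstance(data, dict) else None
        message = error.get("message") if isinstance(error, dict) else str(data)
        raise RuntimeError(f"API request failed ({status_code}): {message}")
    choices = data.get("choices") or []
    if not choices:
        raise RuntimeError("Response contained no choices.")
    return choices[0]["message"]["content"]
```

With the requests example above, that would be print(extract_content(response.status_code, response.json())).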

Using the OpenAI SDK

Here's the same thing using the official OpenAI Python SDK pointed at NanoGPT:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)

This approach is great because any existing code that uses the OpenAI SDK can be migrated to NanoGPT by changing just two lines: the API key and the base URL.
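If you expect to switch back and forth, you can keep those two values in one place. This is a rough sketch, not an official pattern: PROVIDERS and client_kwargs are names I invented, the NanoGPT URL is the one used in this post, and the OpenAI entry assumes the SDK's standard endpoint and the usual OPENAI_API_KEY variable:

```python
import os

# Illustrative table (an assumption, not part of either SDK): one entry per
# OpenAI-compatible provider, holding the two values that differ between them.
PROVIDERS = {
    "nanogpt": {"base_url": "https://nano-gpt.com/api/v1", "key_var": "NANOGPT_API_KEY"},
    "openai": {"base_url": "https://api.openai.com/v1", "key_var": "OPENAI_API_KEY"},
}


def client_kwargs(provider, env=None):
    """Build the api_key/base_url keyword arguments for the chosen provider."""
    env = os.environ if env is None else env
    cfg = PROVIDERS[provider]
    return {"api_key": env.get(cfg["key_var"]), "base_url": cfg["base_url"]}
```

The client is then created with OpenAI(**client_kwargs("nanogpt")), and the rest of the code stays untouched when the provider name changes.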

Streaming Responses

For longer responses, streaming makes a huge difference in perceived performance. Here's how to do it:

import json
import os
import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream"
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
    "stream": True,
    "max_tokens": 1024
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            chunk = line[6:]
            if chunk.strip() == "[DONE]":
                break
            data = json.loads(chunk)
            delta = data["choices"][0]["delta"]
            if "content" in delta:
                print(delta["content"], end="", flush=True)

print()  # Newline after streaming

The key things here: set "stream": True in the payload, add "Accept": "text/event-stream" to headers, and use requests.post(..., stream=True) so it doesn't buffer the entire response. Then iterate over lines and parse the SSE chunks.
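The per-line parsing is the fiddly part, so it can help to pull it into a small function you can test without touching the network. Here's a sketch under the same assumptions as the loop above (OpenAI-style chunks with a choices[0].delta object and a [DONE] terminator); parse_sse_line and the DONE sentinel are hypothetical names:

```python
import json

# Sentinel returned when the server signals the end of the stream.
DONE = object()


def parse_sse_line(raw_line):
    """Return the text delta in one SSE line, DONE at end of stream, else None."""
    line = raw_line.decode("utf-8") if isinstance(raw_line, bytes) else raw_line
    if not line.startswith("data:"):
        return None  # Blank keep-alive lines, comments, or other SSE fields
    chunk = line[len("data:"):].strip()
    if chunk == "[DONE]":
        return DONE
    choices = json.loads(chunk).get("choices") or []
    if not choices:
        return None
    # Some chunks carry only a role or an empty delta, so content may be absent.
    return choices[0].get("delta", {}).get("content")
```

Inside the loop you would call parse_sse_line(line), break when it returns DONE, and print whatever is not None.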

With the OpenAI SDK, streaming is even simpler:

stream = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Explain TCP vs UDP."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Error Handling

Don't skip this part. Here's a practical error handling wrapper you can actually use in production:

import requests
from requests.exceptions import RequestException
import time

class NanoGPTClient:
    def __init__(self, api_key, base_url="https://nano-gpt.com/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def chat(self, messages, model="minimax/minimax-m2.7", max_retries=3, **kwargs):
        payload = {
            "model": model,
            "messages": messages,
            "max_tokens": kwargs.get("max_tokens", 1024),
            "temperature": kwargs.get("temperature", 0.7)
        }

        for attempt in range(max_retries):
            try:
                response = requests.post(
                    f"{self.base_url}/chat/completions",
                    headers=self.headers,
                    json=payload,
                    timeout=30
                )

                if response.status_code == 200:
                    return response.json()["choices"][0]["message"]["content"]

                if response.status_code == 429:
                    retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
                    print(f"Rate limited. Retrying in {retry_after}s...")
                    time.sleep(retry_after)
                    continue

                if response.status_code == 401:
                    raise ValueError("Invalid API key. Check your NANOGPT_API_KEY.")

                response.raise_for_status()

            except RequestException as e:
                if attempt == max_retries - 1:
                    raise
                print(f"Request failed: {e}. Retrying ({attempt + 1}/{max_retries})...")
                time.sleep(2 ** attempt)

        raise RuntimeError("Max retries exceeded.")

This handles the common pain points:

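  • Rate limiting (429): Waits for the Retry-After interval when the server sends one, otherwise falls back to exponential backoff
  • Auth errors (401): Gives you a clear message instead of a cryptic stack trace
  • Network failures and other HTTP errors: Retries with exponential backoff, then re-raises on the last attempt
  • Timeouts: Each request uses a 30-second timeout (the timeout argument to requests.post), and a timed-out request is retried like any other network failure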
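One edge case in the 429 branch: Retry-After is allowed to be an HTTP date instead of a number of seconds, and int() would raise a ValueError on that. Here's a hedged sketch of a more forgiving delay calculation. retry_delay and its max_delay cap are my own additions, not part of the client above, and I don't know which form NanoGPT actually sends:

```python
def retry_delay(headers, attempt, max_delay=60.0):
    """Return how many seconds to wait before the next retry.

    Prefers a numeric Retry-After header; if it is missing or not a number
    (for example an HTTP date), falls back to 2 ** attempt. The result is
    clamped to the range [0, max_delay].
    """
    raw = headers.get("Retry-After")
    try:
        delay = float(raw)
    except (TypeError, ValueError):
        # Header absent or in date form: use exponential backoff instead.
        delay = float(2 ** attempt)
    return max(0.0, min(delay, max_delay))
```

In the client, the 429 branch could call time.sleep(retry_delay(response.headers, attempt)) instead of parsing the header inline.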

Putting It All Together

Here's a complete example that ties everything together — a simple CLI chatbot:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1"
)

def chat():
    messages = [{"role": "system", "content": "You are a helpful assistant. Be concise."}]

    print("NanoGPT Chat (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break

        messages.append({"role": "user", "content": user_input})

        stream = client.chat.completions.create(
            model="minimax/minimax-m2.7",
            messages=messages,
            stream=True
        )

        print("AI: ", end="")
        assistant_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                print(text, end="", flush=True)
                assistant_response += text
        print("\n")

        messages.append({"role": "assistant", "content": assistant_response})

if __name__ == "__main__":
    chat()

Quick Reference

What           Value
Base URL       https://nano-gpt.com/api/v1
Auth header    Authorization: Bearer YOUR_KEY
Chat endpoint  /chat/completions
Default model  minimax/minimax-m2.7
Streaming      Set "stream": true in payload

Wrapping Up

If you were already using OpenAI's Python SDK, migrating to NanoGPT is genuinely a two-line change. If you're starting fresh, you get a clean API that respects your privacy out of the box.

The real win here is that you're not locked into a single provider. Since NanoGPT is OpenAI-compatible, you can swap between providers without rewriting your application code. That's the kind of flexibility worth building on.

Got questions or hit a snag? Drop a comment below. Happy coding.
