How to Use the NanoGPT API with Python — A Developer's Guide
If you've been looking for a private, OpenAI-compatible API that doesn't hoard your prompts for training data, NanoGPT is worth checking out. It speaks the same language as OpenAI's API — meaning most of your existing code works with minimal changes — but your data stays yours.
In this guide, I'll walk you through everything: installation, auth, making basic requests, streaming responses, and handling errors properly. By the end, you'll have a working Python client you can drop into any project.
Why NanoGPT?
Before we write any code, let's talk about why you'd pick this over just hitting OpenAI directly:
- Privacy-first: Your prompts and completions aren't used for model training
- OpenAI-compatible: Drop-in replacement for most tools and libraries
- Model variety: Access to models like MiniMax M2.7 and others without managing infrastructure
- Simple pricing: Pay-per-token, no enterprise contracts required
If you're building something where user data privacy matters — and honestly, when doesn't it? — this is a solid choice. For more context on private AI tools, check out ai-privacy-tools.vercel.app.
Installation
Start by installing `requests`, which the raw-HTTP examples below use. Working with raw HTTP (no judgment; sometimes you want to see exactly what's going on) keeps every request visible. Because the API is standard REST + SSE, `httpx` or any other HTTP client works too:

```bash
pip install requests
```
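Here is a concrete version of the "any HTTP client works" point. This sketch uses only Python's standard library (`urllib.request`), so there's nothing to install. It assumes the endpoint, headers, and model used throughout this guide. `build_chat_request` and `send_chat` are hypothetical helper names, not part of any NanoGPT package.

```python
import json
import urllib.request

BASE_URL = "https://nano-gpt.com/api/v1"


def build_chat_request(api_key, messages, model="minimax/minimax-m2.7"):
    """Build (but don't send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(api_key, messages, model="minimax/minimax-m2.7", timeout=30):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(api_key, messages, model)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Calling `send_chat(os.environ["NANOGPT_API_KEY"], [{"role": "user", "content": "Hi"}])` makes the actual network call. Keeping the build step separate lets you inspect the exact request before anything is sent.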
If you want the full OpenAI SDK experience, install the OpenAI package and point it at NanoGPT's base URL:
```bash
pip install openai
```
Both approaches work. I'll show you both below.
Authentication
Grab your API key from nano-gpt.com and set it as an environment variable:
```bash
export NANOGPT_API_KEY="your-api-key-here"
```
Never hardcode your API key in source code. Seriously. I've seen production repos with API keys in them and it's always a bad day.
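A small guard makes that rule easier to follow. It reads the key from the environment and fails fast with a clear message if the key is missing. Without it, `os.environ.get` quietly returns `None` and the request goes out with `Bearer None`. This is a minimal sketch, and `get_api_key` is a hypothetical helper, not part of any SDK.

```python
import os


def get_api_key(var_name="NANOGPT_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()  # treat blank values as missing
    if not key:
        raise RuntimeError(
            f'{var_name} is not set. Run: export {var_name}="your-api-key-here"'
        )
    return key
```

Use `API_KEY = get_api_key()` in place of `os.environ.get("NANOGPT_API_KEY")` in the examples that follow.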
Basic Chat Completion
Let's start with a simple request using the requests library:
```python
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
}

response = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload, timeout=30)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of a confusing KeyError below
data = response.json()
print(data["choices"][0]["message"]["content"])
```
That's it. If you've worked with OpenAI's API before, this should look completely familiar. Same endpoint structure, same request/response format.
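Indexing straight into `data["choices"][0]` works on the happy path. When the API returns an error body, though, there is no `choices` key and you get a bare `KeyError`. Here's a sketch of a more defensive parser. It assumes error responses follow OpenAI's `{"error": {"message": ...}}` shape. That is an assumption about NanoGPT, so the parser falls back to reporting the whole body.

```python
def extract_reply(data):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = data.get("choices")
    if not choices:
        # Assumed OpenAI-style error body: {"error": {"message": "..."}}
        error = data.get("error")
        if isinstance(error, dict):
            message = error.get("message", error)
        else:
            message = error or data
        raise RuntimeError(f"Unexpected API response: {message}")
    return choices[0]["message"]["content"]
```

To use it, replace the last line of the example with `print(extract_reply(response.json()))`.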
Using the OpenAI SDK
Here's the same thing using the official OpenAI Python SDK pointed at NanoGPT:
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1",
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    temperature=0.7,
    max_tokens=1024,
)

print(response.choices[0].message.content)
```
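This approach is great because existing code built on the OpenAI SDK can usually be migrated to NanoGPT by changing just two settings: the API key and the base URL. You'll also need to change the model name if the model you were using goes by a different identifier on NanoGPT.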
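Because only the client settings differ, you can go one step further and select the provider from configuration. This is a minimal sketch. `PROVIDERS`, `client_settings`, and the `LLM_PROVIDER` environment variable are hypothetical names, while `https://api.openai.com/v1` is OpenAI's standard endpoint. Model identifiers differ between providers, so the model name lives in the config too.

```python
import os

# Hypothetical provider table: each entry holds everything that differs per provider.
PROVIDERS = {
    "nanogpt": {
        "base_url": "https://nano-gpt.com/api/v1",
        "key_env": "NANOGPT_API_KEY",
        "model": "minimax/minimax-m2.7",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "key_env": "OPENAI_API_KEY",
        "model": "gpt-4o-mini",
    },
}


def client_settings(provider=None):
    """Return (OpenAI client kwargs, model name) for the chosen provider."""
    name = provider or os.environ.get("LLM_PROVIDER", "nanogpt")
    cfg = PROVIDERS[name]
    kwargs = {"base_url": cfg["base_url"], "api_key": os.environ.get(cfg["key_env"])}
    return kwargs, cfg["model"]
```

Usage: `kwargs, model = client_settings()`, then `client = OpenAI(**kwargs)` and pass `model=model` to `client.chat.completions.create(...)`.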
Streaming Responses
For longer responses, streaming makes a huge difference in perceived performance. Here's how to do it:
```python
import json
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
    "stream": True,
    "max_tokens": 1024,
}

# stream=True tells requests not to download the whole body up front
with requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # blank lines separate SSE events
        line = line.decode("utf-8")
        if not line.startswith("data: "):
            continue  # ignore SSE comments and other fields
        chunk = line[len("data: "):]
        if chunk.strip() == "[DONE]":
            break  # end-of-stream marker
        data = json.loads(chunk)
        if not data.get("choices"):
            continue  # some chunks carry no choices
        content = data["choices"][0].get("delta", {}).get("content")
        if content:  # skip role-only or empty deltas
            print(content, end="", flush=True)

print()  # newline after streaming
```
The key things here: set "stream": True in the payload, add "Accept": "text/event-stream" to headers, and use requests.post(..., stream=True) so it doesn't buffer the entire response. Then iterate over lines and parse the SSE chunks.
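To keep the SSE handling testable, you can move the line parsing out of the loop into a function. This sketch assumes the OpenAI-style stream format shown above: `data:`-prefixed JSON events, terminated by `data: [DONE]`. `parse_sse_line` and the `DONE` sentinel are hypothetical names.

```python
import json

DONE = object()  # sentinel returned for the terminating "[DONE]" event


def parse_sse_line(raw):
    """Parse one SSE line into its text delta.

    Returns the content string, DONE at end of stream, or None for lines
    that carry no text (blank lines, comments, role-only deltas).
    """
    line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    if not line.startswith("data:"):
        return None
    chunk = line[len("data:"):].strip()  # the space after "data:" is optional in SSE
    if chunk == "[DONE]":
        return DONE
    event = json.loads(chunk)
    choices = event.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content") or None
```

Inside the streaming loop, call `text = parse_sse_line(line)`. Break when `text is DONE`, and print when `text` is a string.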
With the OpenAI SDK, streaming is even simpler:
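```python
# Reuses the `client` created in the OpenAI SDK section above
stream = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Explain TCP vs UDP."}],
    stream=True,
)

for chunk in stream:
    # Guard against chunks with no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```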
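Often you want to show tokens as they arrive and also keep the full reply, for example to store it or to append it to the conversation history. A small generator handles both. This is a sketch: `stream_text` and `print_and_collect` are hypothetical helpers, and they rely only on the `chunk.choices[0].delta.content` attributes used above.

```python
def stream_text(stream):
    """Yield each non-empty text delta from an OpenAI SDK chat stream."""
    for chunk in stream:
        # Some chunks carry no choices or no content (role-only or final chunks)
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def print_and_collect(stream):
    """Print deltas as they arrive and return the full reply text."""
    parts = []
    for text in stream_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Usage: `reply = print_and_collect(stream)`. Collecting into a list and joining once avoids repeated string concatenation.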
Error Handling
Don't skip this part. Here's a practical error-handling wrapper you can actually use in production:
```python
import time

import requests
from requests.exceptions import RequestException


class NanoGPTClient:
    def __init__(self, api_key, base_url="https://nano-gpt.com/api/v1", timeout=30):
        self.base_url = base_url
        self.timeout = timeout  # seconds, applied to every request
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def chat(self, messages, model="minimax/minimax-m2.7", max_retries=3, **kwargs):
        payload = {"model": model, "messages": messages, "max_tokens": 1024, "temperature": 0.7}
        payload.update(kwargs)  # caller overrides and extra params, e.g. max_tokens=256

        for attempt in range(max_retries):
            last_attempt = attempt == max_retries - 1
            try:
                response = requests.post(
                    f"{self.base_url}/chat/completions",
                    headers=self.headers,
                    json=payload,
                    timeout=self.timeout,
                )
            except RequestException as e:
                # Connection errors and timeouts: retry with exponential backoff
                if last_attempt:
                    raise
                print(f"Request failed: {e}. Retrying ({attempt + 1}/{max_retries})...")
                time.sleep(2 ** attempt)
                continue

            if response.status_code == 200:
                return response.json()["choices"][0]["message"]["content"]

            if response.status_code == 401:
                raise ValueError("Invalid API key. Check your NANOGPT_API_KEY.")

            if response.status_code == 429 or response.status_code >= 500:
                # Rate limited or server error: honor Retry-After (in seconds) if present
                try:
                    delay = int(response.headers.get("Retry-After", 2 ** attempt))
                except ValueError:  # Retry-After can also be an HTTP date
                    delay = 2 ** attempt
                if not last_attempt:
                    print(f"HTTP {response.status_code}. Retrying in {delay}s...")
                    time.sleep(delay)
                continue

            # Other 4xx errors (bad request, unknown model, ...) won't fix themselves
            response.raise_for_status()
            raise RuntimeError(f"Unexpected HTTP status {response.status_code}")

        raise RuntimeError("Max retries exceeded.")
```
This handles the common pain points:
- Rate limiting (429): Respects the `Retry-After` header when the server sends one, otherwise backs off exponentially
- Auth errors (401): Gives you a clear message instead of a cryptic stack trace
- Network failures and server errors: Retries with exponential backoff
- Timeouts: Every request is bounded by a `timeout` (30 seconds here), so a hung connection can't block forever
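`Retry-After` can be sent either as a number of seconds or as an HTTP date (RFC 9110), and a bare `int()` fails on the date form. Here's a sketch of a delay helper that handles both forms, falls back to exponential backoff, and caps the wait. `retry_delay` and its `cap` parameter are hypothetical. Whether NanoGPT sends the header at all, and in which form, is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, attempt, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    backoff = min(cap, 2.0 ** attempt)  # exponential fallback: 1, 2, 4, ...
    if not retry_after:
        return backoff
    try:
        seconds = float(retry_after)  # delta-seconds form, e.g. "5"
    except ValueError:
        try:
            when = parsedate_to_datetime(retry_after)  # HTTP-date form
        except (TypeError, ValueError):  # older Pythons raise TypeError here
            return backoff  # unparseable header
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        seconds = (when - datetime.now(timezone.utc)).total_seconds()
    return max(0.0, min(cap, seconds))  # never negative, never above the cap
```

In the client's 429 branch, this would become `time.sleep(retry_delay(response.headers.get("Retry-After"), attempt))`.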
Putting It All Together
Here's a complete example that ties everything together — a simple CLI chatbot:
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1",
)


def chat():
    # The full history is resent each turn so the model keeps context
    messages = [{"role": "system", "content": "You are a helpful assistant. Be concise."}]
    print("NanoGPT Chat (type 'quit' to exit)\n")

    while True:
        try:
            user_input = input("You: ").strip()
        except (EOFError, KeyboardInterrupt):  # Ctrl-D / Ctrl-C
            print()
            break
        if user_input.lower() in ("quit", "exit"):
            break
        if not user_input:
            continue  # don't send empty messages

        messages.append({"role": "user", "content": user_input})
        stream = client.chat.completions.create(
            model="minimax/minimax-m2.7",
            messages=messages,
            stream=True,
        )

        print("AI: ", end="")
        assistant_response = ""
        for chunk in stream:
            # Skip chunks without content (role-only or empty final chunks)
            if chunk.choices and chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                print(text, end="", flush=True)
                assistant_response += text
        print("\n")

        # Store the reply so the next turn includes it
        messages.append({"role": "assistant", "content": assistant_response})


if __name__ == "__main__":
    chat()
```
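One thing to watch in this chatbot is that `messages` grows every turn and the whole history is resent each time. Long sessions therefore cost more tokens per request and can eventually exceed the model's context window. A simple mitigation is to keep the system prompt plus only the most recent messages. This is a sketch: `trim_history` and its `max_messages` limit are hypothetical, and a message count is only a rough stand-in for real token counting.

```python
def trim_history(messages, max_messages=20):
    """Return all system messages plus the last `max_messages` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]
```

To use it, pass `messages=trim_history(messages)` to `client.chat.completions.create(...)` while keeping the full list locally.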
Quick Reference
| What | Value |
|---|---|
| Base URL | https://nano-gpt.com/api/v1 |
| Auth header | Authorization: Bearer YOUR_KEY |
| Chat endpoint | /chat/completions |
| Model used in this guide | minimax/minimax-m2.7 |
| Streaming | Set "stream": true in payload |
Wrapping Up
If you were already using OpenAI's Python SDK, migrating to NanoGPT mostly comes down to changing the API key and base URL. If you're starting fresh, you get a clean API that respects your privacy out of the box.
The real win here is that you're not locked into a single provider. Since NanoGPT is OpenAI-compatible, you can swap between providers without rewriting your application code. That's the kind of flexibility worth building on.
Got questions or hit a snag? Drop a comment below. Happy coding.
Originally published at ai-privacy-tools.vercel.app