Before you dive in: this is my first post here and I'd like to make a good first impression. I'd be more than happy to get feedback on it, including what could be changed or improved to make better tutorials! :)
Happy reading!
How to Use the NanoGPT API with Python
If you've been looking for a private, OpenAI-compatible API that doesn't hoard your prompts for training data, NanoGPT is worth checking out. It speaks the same language as OpenAI's API, meaning most of your existing code works with minimal changes, but your data stays yours.
In this guide, I'll walk you through everything: installation, auth, making basic requests, streaming responses, and handling errors properly. By the end, you'll have a working Python client you can drop into any project.
Why NanoGPT?
Before we write any code, let's talk about why you'd pick this over just hitting OpenAI directly:
- Privacy-first: Your prompts and completions aren't used for model training
- OpenAI-compatible: Drop-in replacement for most tools and libraries
- Model variety: Access to models like MiniMax M2.7 and others without managing infrastructure
- Simple pricing: Pay-per-token, no enterprise contracts required
If you're building something where user data privacy matters (and honestly, when doesn't it?), this is a solid choice. For more context on private AI tools, check out ai-privacy-tools.vercel.app.
Installation
Start by installing the official nanogpt package:
pip install nanogpt
Or, if you prefer working with raw HTTP (no judgment — sometimes you want to see exactly what's going on), you can use requests or httpx instead. The API is standard REST + SSE, so any HTTP client works.
If you want the full OpenAI SDK experience, install the OpenAI package and point it at NanoGPT's base URL:
pip install openai
Both approaches work. I'll show you both below.
Authentication
Grab your API key from nano-gpt.com and set it as an environment variable:
export NANOGPT_API_KEY="your-api-key-here"
Never hardcode your API key in source code. Seriously. I've seen production repos with API keys in them and it's always a bad day.
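If you want a missing key to fail loudly at startup rather than as a confusing 401 later, a small loader like the sketch below is one option. The name `load_api_key` is my own (it is not part of any NanoGPT tooling); the only thing it assumes is the `NANOGPT_API_KEY` variable from the export line above.

```python
import os


def load_api_key(var_name: str = "NANOGPT_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    `load_api_key` is a hypothetical helper written for this tutorial.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run: export {var_name}='your-api-key-here'"
        )
    return key
```

Calling `load_api_key()` once at the top of your script means every later request can assume the key exists.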
Basic Chat Completion
Let's start with a simple request using the requests library:
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")  # set with `export` above

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    timeout=30,  # don't hang forever on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body
data = response.json()
print(data["choices"][0]["message"]["content"])
That's it. If you've worked with OpenAI's API before, this should look completely familiar. Same endpoint structure, same request/response format.
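One thing to watch: indexing straight into `data["choices"][0]` raises a bare `KeyError` or `IndexError` if the body doesn't have the expected shape. Here's a minimal sketch of a more defensive reader. The helper name `extract_content` is hypothetical, and the error shape `{"error": {"message": ...}}` is an assumption borrowed from OpenAI's format; I haven't confirmed that NanoGPT uses exactly that layout.

```python
from typing import Any


def extract_content(data: dict[str, Any]) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    # Assumption: error bodies look like {"error": {"message": "..."}}
    error = data.get("error")
    if error:
        message = error.get("message", error) if isinstance(error, dict) else error
        raise RuntimeError(f"API returned an error: {message}")

    choices = data.get("choices") or []
    if not choices:
        raise RuntimeError("Response contained no choices")

    return choices[0]["message"]["content"]
```

You would then write `print(extract_content(response.json()))` in place of the direct indexing.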
Using the OpenAI SDK
Here's the same thing using the official OpenAI Python SDK pointed at NanoGPT:
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1",
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    temperature=0.7,
    max_tokens=1024,
)

print(response.choices[0].message.content)
This approach is great because any existing code that uses the OpenAI SDK can be migrated to NanoGPT by changing just two lines: the API key and the base URL.
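Since the key and the base URL are the only provider-specific parts, one option is to keep them in a small lookup table and pick a provider at runtime. This is only a sketch: `PROVIDERS` and `client_settings` are names I made up, and `OPENAI_API_KEY` is simply the conventional variable name for OpenAI keys, so adjust both to your setup.

```python
import os

# NanoGPT's base URL is the one used throughout this post;
# OpenAI's is the SDK's documented default.
PROVIDERS = {
    "nanogpt": {"base_url": "https://nano-gpt.com/api/v1", "key_env": "NANOGPT_API_KEY"},
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
}


def client_settings(provider: str) -> dict[str, str]:
    """Return keyword arguments suitable for OpenAI(**settings)."""
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    cfg = PROVIDERS[provider]
    return {
        "api_key": os.environ.get(cfg["key_env"], ""),
        "base_url": cfg["base_url"],
    }


# Usage (requires `pip install openai`):
#   from openai import OpenAI
#   client = OpenAI(**client_settings("nanogpt"))
```

Model names still differ between providers, so the `model` argument has to change along with the client.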
Streaming Responses
For longer responses, streaming makes a huge difference in perceived performance. Here's how to do it:
import json
import os

import requests

BASE_URL = "https://nano-gpt.com/api/v1"
API_KEY = os.environ.get("NANOGPT_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}

payload = {
    "model": "minimax/minimax-m2.7",
    "messages": [
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
    "stream": True,
    "max_tokens": 1024,
}

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json=payload,
    stream=True,  # read the body incrementally instead of buffering it
)

for line in response.iter_lines():
    if line:  # skip the blank lines that separate SSE events
        line = line.decode("utf-8")
        if line.startswith("data: "):
            chunk = line[6:]  # drop the "data: " prefix
            if chunk.strip() == "[DONE]":
                break
            data = json.loads(chunk)
            delta = data["choices"][0]["delta"]
            if delta.get("content"):  # the first chunk may carry only the role
                print(delta["content"], end="", flush=True)

print()  # Newline after streaming
The key things here: set "stream": True in the payload, add "Accept": "text/event-stream" to headers, and use requests.post(..., stream=True) so it doesn't buffer the entire response. Then iterate over lines and parse the SSE chunks.
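If you'd rather keep the parsing separate from the printing, the loop can be pulled into a generator. This is a minimal sketch assuming the same OpenAI-style chunk shape as above; `iter_content` is a name I picked, not something from a library.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines."""
    for raw in lines:
        if not raw:
            continue  # blank lines separate SSE events
        line = raw.decode("utf-8")
        if not line.startswith("data:"):
            continue  # skip SSE comments and other fields
        chunk = line[len("data:"):].strip()
        if chunk == "[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(chunk).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

With this in place, the printing loop becomes `for text in iter_content(response.iter_lines()): print(text, end="", flush=True)`, and the parser can be tested with a plain list of byte strings instead of a live connection.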
With the OpenAI SDK, streaming is even simpler:
# Reuses the `client` created in the OpenAI SDK section
stream = client.chat.completions.create(
    model="minimax/minimax-m2.7",
    messages=[{"role": "user", "content": "Explain TCP vs UDP."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Error Handling
Don't skip this part. Here's a practical error handling wrapper you can actually use in production:
import time

import requests
from requests.exceptions import RequestException


class NanoGPTClient:
    def __init__(self, api_key, base_url="https://nano-gpt.com/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def chat(self, messages, model="minimax/minimax-m2.7", max_retries=3, **kwargs):
        payload = {
            "model": model,
            "messages": messages,
            "max_tokens": kwargs.get("max_tokens", 1024),
            "temperature": kwargs.get("temperature", 0.7),
        }

        for attempt in range(max_retries):
            try:
                response = requests.post(
                    f"{self.base_url}/chat/completions",
                    headers=self.headers,
                    json=payload,
                    timeout=30,
                )

                if response.status_code == 200:
                    return response.json()["choices"][0]["message"]["content"]

                if response.status_code == 429:
                    # Use the server's hint if present, else back off exponentially
                    retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
                    print(f"Rate limited. Retrying in {retry_after}s...")
                    time.sleep(retry_after)
                    continue

                if response.status_code == 401:
                    # Not a RequestException, so this is raised immediately
                    raise ValueError("Invalid API key. Check your NANOGPT_API_KEY.")

                # Other 4xx/5xx: raises HTTPError, which is caught below and retried
                response.raise_for_status()

            except RequestException as e:
                if attempt == max_retries - 1:
                    raise
                print(f"Request failed: {e}. Retrying ({attempt + 1}/{max_retries})...")
                time.sleep(2 ** attempt)

        raise RuntimeError("Max retries exceeded.")
This handles the common pain points:
- Rate limiting (429): Waits for the number of seconds given in the Retry-After header, falling back to exponential backoff when the header is missing
- Auth errors (401): Gives you a clear message instead of a cryptic stack trace
- Network failures: Retries with exponential backoff
- Timeouts: Each request gives up after 30 seconds, set via the timeout argument to requests.post
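One gap worth knowing about: HTTP allows `Retry-After` to be either a number of seconds or an HTTP date, and `int(...)` on a date string raises `ValueError`. Below is a sketch of a delay calculator that accepts both forms. The function name `retry_delay`, the 60-second cap, and the random jitter are my own additions rather than anything NanoGPT prescribes, and I don't know which form NanoGPT actually sends.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                cap: float = 60.0, jitter: bool = True) -> float:
    """Seconds to wait before the next attempt (attempt starts at 0)."""
    if retry_after:
        try:
            # Delay-seconds form, e.g. "5"
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(retry_after)
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            remaining = (when - datetime.now(timezone.utc)).total_seconds()
            return min(cap, max(0.0, remaining))
        except (TypeError, ValueError):
            pass  # unparseable header: fall through to backoff

    # Exponential backoff: 1s, 2s, 4s, ... up to the cap
    delay = min(cap, float(2 ** attempt))
    if jitter:
        # Pick a random point in [0, delay] so clients don't retry in lockstep
        delay = random.uniform(0, delay)
    return delay
```

In the wrapper above, this could replace both sleep calculations, for example `time.sleep(retry_delay(attempt, response.headers.get("Retry-After")))` in the 429 branch.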
Putting It All Together
Here's a complete example that ties everything together — a simple CLI chatbot:
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("NANOGPT_API_KEY"),
    base_url="https://nano-gpt.com/api/v1",
)


def chat():
    messages = [{"role": "system", "content": "You are a helpful assistant. Be concise."}]
    print("NanoGPT Chat (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break

        messages.append({"role": "user", "content": user_input})

        stream = client.chat.completions.create(
            model="minimax/minimax-m2.7",
            messages=messages,
            stream=True,
        )

        print("AI: ", end="")
        assistant_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                print(text, end="", flush=True)
                assistant_response += text

        print("\n")
        # Keep the reply in history so the next turn has context
        messages.append({"role": "assistant", "content": assistant_response})


if __name__ == "__main__":
    chat()
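Because the chatbot appends every turn to `messages`, a long session sends an ever-growing history with each request. If that becomes a problem, one simple approach is to keep the system prompt plus only the most recent turns. The helper below is a sketch of that idea; `trim_history` and its default of 10 turns are arbitrary choices of mine, and it counts messages rather than tokens.

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep system messages plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Guard max_turns <= 0 explicitly: rest[-0:] would return everything
    recent = rest[-2 * max_turns:] if max_turns > 0 else []
    return system + recent
```

Passing `messages=trim_history(messages)` to `client.chat.completions.create` would apply it without touching the full history kept in memory.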
Quick Reference
| What | Value |
|---|---|
| Base URL | https://nano-gpt.com/api/v1 |
| Auth header | Authorization: Bearer YOUR_KEY |
| Chat endpoint | /chat/completions |
| Default model | minimax/minimax-m2.7 |
| Streaming | Set "stream": true in payload |
Wrapping Up
If you were already using OpenAI's Python SDK, migrating to NanoGPT is genuinely a two-line change. If you're starting fresh, you get a clean API that respects your privacy out of the box.
The real win here is that you're not locked into a single provider. Since NanoGPT is OpenAI-compatible, you can swap between providers without rewriting your application code. That's the kind of flexibility worth building on.
Got questions or hit a snag? Drop a comment below. Happy coding.