You're building with the OpenAI API. Maybe directly, maybe through LangChain, maybe through an MCP tool. Something isn't working — the response is wrong, the model is ignoring your system prompt, or you're getting rate limited and don't know why.
Your options:
- Add print(response) everywhere
- Set up LangSmith or Helicone
- Stare at the OpenAI dashboard usage page
All of these suck for quick debugging. Here's what I actually do.
The env var trick
Most OpenAI client libraries let you override the base URL. The official Python client uses OPENAI_BASE_URL. LangChain respects it. So does nearly every wrapper.
The trick: point that env var at an inspector that forwards to the real API while showing you everything.
# 1. Go to toran.sh/try
# 2. Enter: https://api.openai.com
# 3. Get your unique URL, e.g.: https://abc123.toran.sh
# 4. Set it:
export OPENAI_BASE_URL=https://abc123.toran.sh
Now run your code normally. Every request to OpenAI flows through your inspector URL, which forwards to the real API. You see the full request and response in your browser — live, as it happens.
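Why a single env var is enough: most clients resolve their base URL with a fallback chain. Here's a minimal sketch of that lookup order — an explicit base_url argument wins, then the env var, then the built-in default. This mirrors the official Python client's behavior; other wrappers may differ in detail, and the URLs are placeholders.

```python
import os

# Lookup order most OpenAI-style clients use (a sketch, not the real
# library code): explicit argument > env var > hardcoded default.
DEFAULT_BASE_URL = "https://api.openai.com/v1"

def effective_base_url(explicit=None):
    return explicit or os.environ.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL

os.environ["OPENAI_BASE_URL"] = "https://abc123.toran.sh"  # your inspector URL
print(effective_base_url())  # env var beats the default
print(effective_base_url("http://localhost:11434/v1"))  # explicit arg beats both
```

This is why the trick is so low-friction: you never touch application code, you just change what the fallback chain resolves to.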
What you can see
Once you're watching the traffic, some things become immediately obvious:
Token usage per request. Not the aggregate dashboard number — the actual usage object in each response. You can see exactly which call is burning through your quota.
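Concretely, each chat completions response carries a usage object you can read per call. A hedged sketch — the field names follow the OpenAI API, but the values here are made up:

```python
import json

# Example response body as you'd see it in the inspector (values invented).
raw_response = '''{
  "id": "chatcmpl-example",
  "model": "gpt-4o",
  "usage": {"prompt_tokens": 2148, "completion_tokens": 57, "total_tokens": 2205}
}'''

usage = json.loads(raw_response)["usage"]
print(f"prompt={usage['prompt_tokens']} "
      f"completion={usage['completion_tokens']} "
      f"total={usage['total_tokens']}")
```

A 2,148-token prompt for a 57-token answer is exactly the kind of ratio that's invisible in the aggregate dashboard but jumps out per request.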
System prompts hitting the API. If you're using a framework, you might not realize what system prompt it's actually sending. I've caught frameworks injecting 2,000-token system prompts I didn't write.
Retry behavior. Is your client retrying on 429s? How many times? With what backoff? You can see every retry as a separate request.
Streaming chunks. If you're using streaming, you can see the actual SSE chunks as they arrive. Useful when debugging why your streaming UI is stuttering.
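What those chunks look like on the wire: Server-Sent Events, one data: line per chunk, terminated by data: [DONE]. A sketch of assembling them — the payloads are invented but follow the chat completions streaming shape:

```python
import json

# Raw SSE traffic as the inspector would show it (payloads made up).
raw_sse = """data: {"choices": [{"delta": {"content": "Hel"}}]}

data: {"choices": [{"delta": {"content": "lo"}}]}

data: [DONE]
"""

text = ""
for line in raw_sse.splitlines():
    if not line.startswith("data: "):
        continue  # skip blank separator lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")

print(text)  # -> Hello
```

If your UI stutters, comparing chunk arrival times here against your render loop usually tells you whose fault it is.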
Headers you didn't expect. Some wrappers add custom headers. Some send your API key in unexpected ways. Now you can see exactly what's going over the wire.
A real example
I was debugging why an agent kept giving wrong answers for a specific type of query. The logs showed the right tool was being called, the right function was executing, but the final answer was wrong.
I pointed the base URL at toran and watched the requests. Turned out the framework was sending the conversation history in the wrong order — tool results were appearing before the tool call in the messages array. The model was confused because it was seeing an answer before the question.
I would never have caught this from application logs. The logs showed "tool called, result returned, completion generated." Everything looked fine. But the actual HTTP request body told the real story.
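To make the bug concrete, here's a hedged sketch of what a malformed messages array like that looks like, plus a tiny checker. The role and field names follow the OpenAI chat format; the tool, IDs, and values are invented:

```python
# The bug: the tool result appears *before* the assistant's tool call.
wrong_order = [
    {"role": "user", "content": "What's the weather in Oslo?"},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 4}'},
    {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}]},
]

# What should have been sent: the call first, then its result.
right_order = [wrong_order[0], wrong_order[2], wrong_order[1]]

def tool_results_follow_calls(messages):
    """Return True if every tool message comes after its matching tool call."""
    seen_calls = set()
    for m in messages:
        for call in m.get("tool_calls", []):
            seen_calls.add(call["id"])
        if m.get("role") == "tool" and m["tool_call_id"] not in seen_calls:
            return False
    return True

print(tool_results_follow_calls(wrong_order))  # -> False
print(tool_results_follow_calls(right_order))  # -> True
```

A check like this makes a good assertion in tests once you've found the ordering bug, but you only find it in the first place by looking at the raw request body.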
Works with any OpenAI-compatible API
The same trick works with:
- Anthropic — set ANTHROPIC_BASE_URL
- Azure OpenAI — override the endpoint URL
- Local models (Ollama, vLLM) — point at the local server through toran
- Any OpenAI-compatible API — if it takes a base URL, it works
# Anthropic
import anthropic
client = anthropic.Anthropic(base_url="https://abc123.toran.sh")
# OpenAI
from openai import OpenAI
client = OpenAI(base_url="https://abc123.toran.sh/v1")
When to use this vs. proper observability
This isn't a replacement for LangSmith, Helicone, or OpenTelemetry. Those are production monitoring tools. This is for when you're sitting at your desk going "why the hell isn't this working" and you need to see the raw request right now.
Think of it as curl -v for your LLM calls. You don't leave curl -v in production, but you reach for it constantly during development.
Use toran when:
- Something is broken and you need to see the actual request
- You want to verify what a framework is sending on your behalf
- You're debugging streaming behavior
- You need to check retry/error handling logic
- You want to see real token counts per request
Use proper observability when:
- You need historical data and dashboards
- You're monitoring production traffic
- You need alerts on cost/latency
- You want traces across multiple services
Try it
Go to toran.sh/try. Enter https://api.openai.com. Get your URL. Set OPENAI_BASE_URL. Run your code.
Takes 30 seconds. You'll see things you didn't know your code was doing.