You're building with the OpenAI API. Maybe directly, maybe through LangChain, maybe through an MCP tool. Something isn't working — the response is wrong, the model is ignoring your system prompt, or you're getting rate limited and don't know why.
Your options:
- Add print(response) everywhere
- Set up LangSmith or Helicone
- Stare at the OpenAI dashboard usage page
All of these suck for quick debugging. Here's what I actually do.
The env var trick
Most OpenAI client libraries let you override the base URL. The official Python client uses OPENAI_BASE_URL. LangChain respects it. So does nearly every wrapper.
The trick: point that env var at an inspector that forwards to the real API while showing you everything.
# 1. Go to toran.sh/try
# 2. Enter: https://api.openai.com
# 3. Get your unique URL, e.g.: https://abc123.toran.sh
# 4. Set it:
export OPENAI_BASE_URL=https://abc123.toran.sh
Now run your code normally. Every request to OpenAI flows through your inspector URL, which forwards to the real API. You see the full request and response in your browser — live, as it happens.
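Why a single env var is enough: most clients resolve their base URL with a fallback chain. Here's a minimal sketch of that lookup order — an explicit base_url argument wins, then the env var, then the built-in default. This mirrors the official Python client's behavior; other wrappers may differ in detail, and the URLs are placeholders.

```python
import os

# Lookup order most OpenAI-style clients use (a sketch, not the real
# library code): explicit argument > env var > hardcoded default.
DEFAULT_BASE_URL = "https://api.openai.com/v1"

def effective_base_url(explicit=None):
    return explicit or os.environ.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL

os.environ["OPENAI_BASE_URL"] = "https://abc123.toran.sh"  # your inspector URL
print(effective_base_url())  # env var beats the default
print(effective_base_url("http://localhost:11434/v1"))  # explicit arg beats both
```

This is why the trick is so low-friction: you never touch application code, you just change what the fallback chain resolves to.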
What you can see
Once you're watching the traffic, some things become immediately obvious:
Token usage per request. Not the aggregate dashboard number — the actual usage object in each response. You can see exactly which call is burning through your quota.
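Concretely, each chat completions response carries a usage object you can read per call. A hedged sketch — the field names follow the OpenAI API, but the values here are made up:

```python
import json

# Example response body as you'd see it in the inspector (values invented).
raw_response = '''{
  "id": "chatcmpl-example",
  "model": "gpt-4o",
  "usage": {"prompt_tokens": 2148, "completion_tokens": 57, "total_tokens": 2205}
}'''

usage = json.loads(raw_response)["usage"]
print(f"prompt={usage['prompt_tokens']} "
      f"completion={usage['completion_tokens']} "
      f"total={usage['total_tokens']}")
```

A 2,148-token prompt for a 57-token answer is exactly the kind of ratio that's invisible in the aggregate dashboard but jumps out per request.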
System prompts hitting the API. If you're using a framework, you might not realize what system prompt it's actually sending. I've caught frameworks injecting 2,000-token system prompts I didn't write.
Retry behavior. Is your client retrying on 429s? How many times? With what backoff? You can see every retry as a separate request.
Streaming chunks. If you're using streaming, you can see the actual SSE chunks as they arrive. Useful when debugging why your streaming UI is stuttering.
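What those chunks look like on the wire: Server-Sent Events, one data: line per chunk, terminated by data: [DONE]. A sketch of assembling them — the payloads are invented but follow the chat completions streaming shape:

```python
import json

# Raw SSE traffic as the inspector would show it (payloads made up).
raw_sse = """data: {"choices": [{"delta": {"content": "Hel"}}]}

data: {"choices": [{"delta": {"content": "lo"}}]}

data: [DONE]
"""

text = ""
for line in raw_sse.splitlines():
    if not line.startswith("data: "):
        continue  # skip blank separator lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")

print(text)  # -> Hello
```

If your UI stutters, comparing chunk arrival times here against your render loop usually tells you whose fault it is.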
Headers you didn't expect. Some wrappers add custom headers. Some send your API key in unexpected ways. Now you can see exactly what's going over the wire.
A real example
I was debugging why an agent kept giving wrong answers for a specific type of query. The logs showed the right tool was being called, the right function was executing, but the final answer was wrong.
I pointed the base URL at toran and watched the requests. Turned out the framework was sending the conversation history in the wrong order — tool results were appearing before the tool call in the messages array. The model was confused because it was seeing an answer before the question.
I would never have caught this from application logs. The logs showed "tool called, result returned, completion generated." Everything looked fine. But the actual HTTP request body told the real story.
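To make the bug concrete, here's a hedged sketch of what a malformed messages array like that looks like, plus a tiny checker. The role and field names follow the OpenAI chat format; the tool, IDs, and values are invented:

```python
# The bug: the tool result appears *before* the assistant's tool call.
wrong_order = [
    {"role": "user", "content": "What's the weather in Oslo?"},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 4}'},
    {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}]},
]

# What should have been sent: the call first, then its result.
right_order = [wrong_order[0], wrong_order[2], wrong_order[1]]

def tool_results_follow_calls(messages):
    """Return True if every tool message comes after its matching tool call."""
    seen_calls = set()
    for m in messages:
        for call in m.get("tool_calls", []):
            seen_calls.add(call["id"])
        if m.get("role") == "tool" and m["tool_call_id"] not in seen_calls:
            return False
    return True

print(tool_results_follow_calls(wrong_order))  # -> False
print(tool_results_follow_calls(right_order))  # -> True
```

A check like this makes a good assertion in tests once you've found the ordering bug, but you only find it in the first place by looking at the raw request body.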
Works with any OpenAI-compatible API
The same trick works with:
- Anthropic — set ANTHROPIC_BASE_URL
- Azure OpenAI — override the endpoint URL
- Local models (Ollama, vLLM) — point at the local server through toran
- Any OpenAI-compatible API — if it takes a base URL, it works
# Anthropic
import anthropic
client = anthropic.Anthropic(base_url="https://abc123.toran.sh")
# OpenAI
from openai import OpenAI
client = OpenAI(base_url="https://abc123.toran.sh/v1")
When to use this vs. proper observability
This isn't a replacement for LangSmith, Helicone, or OpenTelemetry. Those are production monitoring tools. This is for when you're sitting at your desk going "why the hell isn't this working" and you need to see the raw request right now.
Think of it as curl -v for your LLM calls. You don't leave curl -v in production, but you reach for it constantly during development.
Use toran when:
- Something is broken and you need to see the actual request
- You want to verify what a framework is sending on your behalf
- You're debugging streaming behavior
- You need to check retry/error handling logic
- You want to see real token counts per request
Use proper observability when:
- You need historical data and dashboards
- You're monitoring production traffic
- You need alerts on cost/latency
- You want traces across multiple services
Try it
Go to toran.sh/try. Enter https://api.openai.com. Get your URL. Set OPENAI_BASE_URL. Run your code.
Takes 30 seconds. You'll see things you didn't know your code was doing.