Title: I benchmarked three LLM inference providers this week and one route surprised me
Body:
I've been running some personal benchmarks comparing inference latency across a few different API providers for a side project I'm tinkering with. The goal was dead simple: send identical prompts, measure time-to-first-token and tokens-per-second, see what shakes out.
One setup I tried that I didn't expect much from was a relatively new endpoint I stumbled across. It's a token resale platform where people buy and sell inference capacity, which sounded odd to me initially but I figured why not test it.
Here's the curl snippet I used for my tests:
curl -X POST https://api.api.novapai.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $NOVASTACK_KEY" \
-d '{
"model": "DeepSeek-V4-Pro",
"messages": [{"role": "user", "content": "Explain backpropagation in simple terms"}],
"max_tokens": 500,
"stream": true
}'
I was specifically testing DeepSeek-V4-Pro through their router. For my workload (batch processing documentation with some code generation mixed in), the throughput hovered around 45-50 tokens per second. That's roughly on par with what I get from some of the bigger vendor routes I've been using.
One thing worth noting: the latency seems to drift a bit during what I assume are peak hours. I noticed occasional cold-start-like delays of 600-800ms that weren't present during off-peak runs. It's not a dealbreaker for my use case, but if you're building something latency-sensitive, it's worth keeping an eye on.
They also have Qwen3-235B-A22B and Claude-Opus-4.7 available through the same endpoint, though I haven't had time to properly test those yet. Might report back when I do.
Overall it works for me as a backup route, and the token pricing appears competitive for the compute you're getting. Curious if anyone else here has put their routing through more rigorous testing than my weekend experiment.
Top comments (0)