I'm a solo founder from Argentina. I'm not an engineer — I built the backend of NeuralRouting.io almost entirely with Claude.
## The problem
Most teams send every LLM request to GPT-4 even when a smaller model would return an answer of the same quality. The difference between a model that costs $30 per million tokens and one that costs $0.50 is massive at scale.
## What NeuralRouting does
It sits between your app and LLM providers. Each request gets a complexity score, and the router picks the cheapest model that can handle it.
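To make the routing idea concrete, here is a minimal sketch of what "score the request, pick the cheapest capable model" could look like. The model names, prices, thresholds, and the scoring heuristic are all illustrative assumptions, not NeuralRouting's actual implementation:

```python
# Hypothetical routing sketch. Model names, prices, thresholds, and the
# complexity heuristic are illustrative, not NeuralRouting's real logic.

# Models ordered cheapest-first, each with the highest complexity score
# it is trusted to handle (illustrative prices per 1M tokens).
MODELS = [
    {"name": "llama-3.1-8b", "cost_per_m": 0.05, "max_complexity": 0.3},
    {"name": "gpt-4o-mini",  "cost_per_m": 0.15, "max_complexity": 0.6},
    {"name": "gpt-4o",       "cost_per_m": 2.50, "max_complexity": 1.0},
]

def complexity_score(prompt: str) -> float:
    """Toy heuristic: longer prompts and reasoning keywords raise the score."""
    score = min(len(prompt) / 2000, 0.5)
    for kw in ("prove", "step by step", "analyze", "refactor"):
        if kw in prompt.lower():
            score += 0.2
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Return the cheapest model whose trust ceiling covers the score."""
    score = complexity_score(prompt)
    for model in MODELS:  # cheapest first
        if score <= model["max_complexity"]:
            return model["name"]
    return MODELS[-1]["name"]  # fall back to the strongest model
```

The key design choice is that the model list is sorted by cost, so the first model whose ceiling covers the score is automatically the cheapest acceptable one.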
It also has:
- Dual-layer semantic cache — similar queries get served from cache instead of hitting the API again
- Shadow Engine — runs cheaper models in parallel to benchmark quality over time
- PII filtering and rate limiting
- Agent loop detection
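The dual-layer cache can be sketched roughly like this: an exact layer keyed by a prompt hash, plus a semantic layer matched by embedding similarity. The bag-of-words "embedding" below stands in for a real embedding model, and the class name and threshold are assumptions for illustration:

```python
import hashlib
import math
from collections import Counter

# Hypothetical two-layer cache sketch: layer 1 is an exact match on the
# hashed prompt; layer 2 matches semantically similar prompts by cosine
# similarity. A real system would use a proper embedding model here.

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class DualLayerCache:
    def __init__(self, threshold: float = 0.9):
        self.exact = {}      # sha256(prompt) -> response
        self.semantic = []   # (embedding, response) pairs
        self.threshold = threshold

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        # Layer 1: exact match on the hashed prompt.
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return hit
        # Layer 2: nearest stored embedding above the threshold.
        query = embed(prompt)
        best, best_sim = None, 0.0
        for vec, response in self.semantic:
            sim = cosine(query, vec)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt: str, response: str):
        self.exact[self._key(prompt)] = response
        self.semantic.append((embed(prompt), response))
```

The exact layer makes repeated identical prompts free to check, while the semantic layer catches rephrasings; the similarity threshold controls how aggressively near-misses are served from cache.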
## Where it's at
Early. I only support OpenAI and Groq right now. Zero users. I built too much before talking to anyone and I'm fixing that now.
If you work with LLMs and want to try it, I'm looking for honest feedback: neuralrouting.io