I was paying for multiple AI subscriptions (Anthropic, OpenAI, Google) and manually deciding which model to use for every task. Simple questions went to expensive models. Complex code got sent to cheap models that couldn't handle it. And I had no visibility into what anything cost.
So I built Nadiru.
What it does
Nadiru is an AI orchestration engine with a sovereign Conductor. The Conductor classifies every incoming request, decides which provider and model should handle it, and routes accordingly. Simple math goes to a free Gemini model. Complex code goes to GPT-4o. Creative writing goes to Claude. The Conductor learns from your usage over time. If you keep re-prompting after responses from a particular model, it stops routing that task type there.
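The routing-plus-learning idea above can be sketched in a few lines. This is my illustration, not Nadiru's code: the `TASK_ROUTES` table, the rejection-count keys, and the fallback choice are all assumptions.

```python
# Hypothetical sketch of a learning router (names and thresholds are assumed,
# not taken from the Nadiru source).
TASK_ROUTES = {
    "simple_math": ("google", "gemini-2.0-flash"),
    "complex_code": ("openai", "gpt-4o"),
    "creative_writing": ("anthropic", "claude-sonnet"),
}

def route(task_type: str, rejection_counts: dict, threshold: int = 3):
    """Pick a provider/model for a classified task, skipping routes the
    user has implicitly rejected (by re-prompting) too many times."""
    provider, model = TASK_ROUTES.get(task_type, ("google", "gemini-2.0-flash"))
    if rejection_counts.get(f"{task_type}:{model}", 0) >= threshold:
        # This model keeps getting re-prompted for this task type: fall back.
        provider, model = "openai", "gpt-4o"
    return provider, model
```

The key design point is that the rejection counts come from observed usage, not from user ratings.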
What makes it different
LiteLLM and OpenRouter do routing with static rules. Nadiru's Conductor actually learns your patterns:
- Delegate-first cold start: doesn't trust the local model until it proves capable
- Implicit feedback: detects re-prompts as rejected responses without asking you to rate anything
- Provider-agnostic Conductor: run it on a local Ollama model, or use Gemini Flash as a cheap cloud Conductor
- Dynamic model discovery: queries each provider's API on startup; it found 130+ models across my 8 providers
- Refusal detection: if one model refuses your request on content-policy grounds, the engine automatically retries with a different provider
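To make the implicit-feedback point concrete, here is a plausible heuristic for detecting a re-prompt. The time window and similarity threshold are my guesses; the post only says re-prompts are treated as rejections.

```python
import difflib

def is_reprompt(prev_prompt: str, new_prompt: str,
                prev_ts: float, new_ts: float,
                window: float = 120.0, threshold: float = 0.6) -> bool:
    """Heuristic sketch: a similar prompt arriving shortly after the last
    one suggests the previous response was rejected. Window and threshold
    values are illustrative assumptions."""
    if new_ts - prev_ts > window:
        return False  # too much time has passed to infer rejection
    ratio = difflib.SequenceMatcher(
        None, prev_prompt.lower(), new_prompt.lower()).ratio()
    return ratio >= threshold
```

A rephrased question within a couple of minutes trips the detector; an unrelated prompt does not.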
The architecture
Four modules: Service (FastAPI), Conductor (routing brain), Providers (API adapters), Memory (SQLite interaction log). Three endpoints: /connect, /generate, /query. That's the entire API.
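The post says Memory is a SQLite interaction log; here is one way that could look. The schema and column names are my assumptions for illustration.

```python
import sqlite3

def init_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Hypothetical sketch of the Memory module's interaction log
    (schema is assumed; only 'SQLite interaction log' is from the post)."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS interactions (
        id INTEGER PRIMARY KEY,
        nadi_id TEXT,
        task_type TEXT,
        provider TEXT,
        model TEXT,
        cost_usd REAL,
        rejected INTEGER DEFAULT 0)""")
    return db

def log_interaction(db, nadi_id, task_type, provider, model, cost_usd):
    # Each routed request gets one row; the Conductor can later aggregate
    # rejection counts per (task_type, model) from this log.
    db.execute(
        "INSERT INTO interactions (nadi_id, task_type, provider, model, cost_usd) "
        "VALUES (?, ?, ?, ?, ?)",
        (nadi_id, task_type, provider, model, cost_usd))
    db.commit()
```
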
Applications built on Nadiru are called "Nadis." A Nadi connects to the engine, sends prompts, gets responses. The engine handles everything: routing, cost optimization, streaming, failover. Building a Nadi is about 10 lines of Python:
```python
import httpx

ENGINE = "http://localhost:8765"

# Register this app with the engine and get an ID.
resp = httpx.post(f"{ENGINE}/connect", json={"name": "my-app"})
nadi_id = resp.json()["nadi_id"]

# Send a prompt; the engine picks the provider and model.
resp = httpx.post(f"{ENGINE}/generate", json={
    "nadi_id": nadi_id,
    "prompt": "Explain quantum entanglement simply",
    "priority": "balanced",
})
print(resp.json()["content"])
```
Numbers
- 15+ provider adapters (Ollama, Anthropic, OpenAI, Google, Groq, DeepSeek, Together, Perplexity, Cerebras, OpenRouter, Mistral, Fireworks, AI21, Cohere, Azure OpenAI)
- 130+ models discovered automatically
- Response streaming via SSE
- 34 unit tests, GitHub Actions CI
- MIT licensed
- 5 responses for $0.007 in the dashboard demo
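For anyone unfamiliar with the SSE wire format the streaming endpoint uses: payloads arrive as `data: ...` lines separated by blank lines, so a Nadi can peel tokens out of a response body with something like this sketch (the payload contents are assumed).

```python
def parse_sse(raw: str) -> list:
    """Split a raw SSE response body into its data payloads.
    Generic SSE parsing; Nadiru's actual event shape is not specified here."""
    events = []
    for block in raw.strip().split("\n\n"):  # events are blank-line separated
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events
```
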
Links
- Engine: github.com/hlk-devs/nadiru-engine
- Example Nadis: github.com/hlk-devs/nadiru-nadis
I'm looking for feedback on the architecture, and for anyone who wants to test it. The engine is at v0.1.1. Configurable routing prompts and a Nadi SDK are on the roadmap.