I've noticed a pattern in my workflow: I spend way too much time switching between Claude for code, Gemini for long-context research, and GPT for quick logic.
In 2026, "Model Loyalty" feels like a tax on productivity. So, I built Alotaibi AI—a lightweight, intent-based router that picks the best API for your prompt so you don't have to.
The Tech Stack
Frontend: Ultra-lightweight "Thin-client" HTML/JS, hosted on Vercel for global edge delivery.
Backend: A decoupled, private orchestration layer. I opted for a dedicated backend instead of serverless functions to avoid execution timeouts and to handle more complex intent-classification logic.
Intelligence: I'm using a zero-shot classification layer to weigh prompts across four axes: Logic, Creativity, Speed, and Retrieval.
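Since the post doesn't share the actual classifier, here's a minimal sketch of what four-axis intent scoring could look like. The real system uses a zero-shot model; this keyword heuristic is only a hypothetical stand-in (all signal words are made up), and Python for the backend is an assumption:

```python
# Minimal sketch of four-axis intent scoring.
# ASSUMPTION: the real router uses a zero-shot classifier; this keyword
# heuristic is a hypothetical stand-in to show the shape of the output.

AXES = ("logic", "creativity", "speed", "retrieval")

# Hypothetical signal words per axis -- not from the actual system.
SIGNALS = {
    "logic":      {"prove", "debug", "step", "why", "algorithm"},
    "creativity": {"story", "poem", "brainstorm", "imagine", "rewrite"},
    "speed":      {"quick", "tldr", "summarize", "short", "translate"},
    "retrieval":  {"search", "document", "context", "cite", "source"},
}

def score_intent(prompt: str) -> dict[str, float]:
    """Return a normalized score per axis for the given prompt."""
    words = set(prompt.lower().split())
    raw = {axis: len(words & SIGNALS[axis]) for axis in AXES}
    total = sum(raw.values()) or 1  # avoid division by zero
    return {axis: raw[axis] / total for axis in AXES}
```

The point of normalizing is that downstream thresholds can compare axes against each other rather than against absolute counts.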
The "Secret Sauce" (and why I need your help)
The core logic evaluates your prompt in real-time. Depending on the "intent" detected, it routes the request to the specific model currently performing best for that niche (Claude 4.6, GPT-5, or Gemini 3 Pro).
The Problem: Tuning these "routing thresholds" is a nightmare. Sometimes the system favors a faster model when I actually need deep reasoning.
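For concreteness, here's a sketch of the kind of threshold logic in question. The model names come from the post, but the axis-to-model mapping, the thresholds, and the reasoning-bias margin are all hypothetical:

```python
# Sketch of threshold-based routing with a bias toward deep reasoning.
# ASSUMPTION: the axis->model mapping and the margin value are invented
# for illustration; only the model names come from the post.

MODEL_FOR_AXIS = {
    "logic": "Claude 4.6",
    "speed": "GPT-5",
    "retrieval": "Gemini 3 Pro",
    "creativity": "GPT-5",
}

REASONING_MARGIN = 0.15  # hypothetical: how close "logic" must be to win a near-tie

def route(scores: dict[str, float]) -> str:
    """Pick a model, biasing toward the reasoning model on near-ties."""
    best_axis = max(scores, key=scores.get)
    # The failure mode described above: a slightly higher "speed" score
    # steals prompts that need deep reasoning. A margin counteracts it.
    if best_axis != "logic" and scores["logic"] >= scores[best_axis] - REASONING_MARGIN:
        best_axis = "logic"
    return MODEL_FOR_AXIS[best_axis]
```

One tuning approach is to treat the margin itself as the knob: widen it until "speed steals a reasoning prompt" errors disappear from a labeled test set, then back off before latency-sensitive prompts start getting routed to the slow model.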
I Need a Roast

I'm looking for feedback on three things:
Routing Accuracy: Try to "trick" the system. Send it a prompt that should go to a high-reasoning model but gets routed to a faster, "dumber" one.
The UI: Is the "thin-client" approach too minimal, or is the speed of a raw HTML frontend worth it?
Architecture: If you’ve built decoupled AI apps before, how are you handling the handshake latency between your frontend and private API?
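On the handshake-latency question, the usual first lever is connection reuse, so each routed prompt doesn't pay a fresh TCP (and, in production, TLS) handshake. A generic sketch, not the post's actual setup: the local server below is a stand-in for the private orchestration API, and a single persistent connection serves every call.

```python
# Sketch: reuse one HTTP/1.1 keep-alive connection across calls so only
# the first request pays the handshake. The local server is a stand-in
# for the private orchestration API (ASSUMPTION: backend speaks HTTP).
import http.client
import http.server
import threading

class Echo(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # required for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Echo)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# One connection, reused for every call.
conn = http.client.HTTPConnection(host, port)
bodies = []
for _ in range(3):
    conn.request("GET", "/")
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
```

In a real deployment the same idea shows up as HTTP keep-alive or a pooled client on the frontend-facing edge, plus keeping the orchestration layer warm so there's no cold-start on top of the handshake.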
Check it out here:
Thanks for checking out Alotaibi AI! I'm particularly interested in seeing how the router handles 'edge-case' logic prompts. If you find a weird routing result, let me know here!