Here's a mistake most AI developers make: they pick one model and use it for everything.
It's expensive. It's slow. And for most queries, it's overkill.
I helped build SupportMind AI at a hackathon and we did it differently. Here's the routing strategy we used.
The Problem With One-Size-Fits-All
"Where is my order?" needs a fast answer, not a deep reasoner.
"My laptop is broken and I need a warranty replacement" needs careful reasoning and empathy.
Running both through a 70B model is wasteful. Running both through an 8B model means the complex case gets a bad answer.
The solution is routing.
How cascadeflow Works
cascadeflow is a model routing library. It lets you define rules for which model handles which type of query, and apply them at runtime.
Our starting point was a simple keyword-based router, written as a plain Python function that returns the name of the model to call:
```python
def get_model(message):
    complex_keywords = ["broken", "refund", "urgent", "damaged", "fraud", "cancel", "not working", "replace"]
    if any(k in message.lower() for k in complex_keywords):
        print("[cascadeflow] Complex query → llama-3.3-70b-versatile")
        return "llama-3.3-70b-versatile"
    else:
        print("[cascadeflow] Simple query → llama-3.1-8b-instant")
        return "llama-3.1-8b-instant"
```
Simple queries hit the 8B model — faster response, lower cost.
Complex queries escalate to 70B — better reasoning, worth the extra latency.
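One caveat with plain substring checks is that they also fire inside longer words: "unbroken" contains "broken", so a message about an unbroken seal would escalate. Here is a minimal sketch of a stricter variant that anchors each keyword at the start of a word. The names `route_query`, `COMPLEX_PATTERN`, `FAST_MODEL` and `STRONG_MODEL` are hypothetical, chosen for illustration; they are not part of cascadeflow or of the SupportMind AI code.

```python
import re

# Model names and keywords are the ones from the post; everything else
# here is an illustrative assumption.
FAST_MODEL = "llama-3.1-8b-instant"
STRONG_MODEL = "llama-3.3-70b-versatile"

COMPLEX_KEYWORDS = [
    "broken", "refund", "urgent", "damaged",
    "fraud", "cancel", "not working", "replace",
]

# \b anchors only the start of each keyword, so "replace" still matches
# "replacement" while "broken" no longer matches inside "unbroken".
COMPLEX_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in COMPLEX_KEYWORDS) + r")",
    re.IGNORECASE,
)


def route_query(message: str) -> str:
    """Return the model name for a message; no logging, no side effects."""
    if COMPLEX_PATTERN.search(message):
        return STRONG_MODEL
    return FAST_MODEL


for text in (
    "Where is my order?",
    "My laptop is broken and I need a warranty replacement",
    "The seal was unbroken on arrival",
):
    print(f"{text!r} -> {route_query(text)}")
```

Keeping the function free of print calls makes it easy to unit test; logging can wrap it at the call site.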
The Terminal Output
Watching the logs during our demo was satisfying:
[cascadeflow] ⚡ Simple query → llama-3.1-8b-instant (faster and cheaper)
[cascadeflow] 🔀 Complex query → llama-3.3-70b-versatile
[Hindsight] ✅ Memory saved for customer_001
Every query was routed in real time, and the logs showed which model handled it.
What This Means for Production
In a real support system handling thousands of queries a day, smart routing can cut model costs by an estimated 40-60% while maintaining quality where it matters. The exact figure depends on how many queries are simple and how large the price gap between the two models is. That's not a minor optimization. It's the difference between a profitable AI product and one that burns money.
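To see where a savings figure in that range could come from, here is a back-of-the-envelope sketch. It assumes every query uses the same number of tokens, so cost depends only on the per-token price of the model that handles it. The function name `routing_savings` and the prices passed to it are made up for illustration; substitute your provider's real rates and your own traffic mix.

```python
def routing_savings(simple_share: float, small_price: float, large_price: float) -> float:
    """Fraction of spend saved versus sending every query to the large model.

    simple_share: fraction of queries routed to the small model (0.0 to 1.0).
    small_price, large_price: per-token prices in any common unit.
    Assumes equal token counts per query (a simplifying assumption).
    """
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    if large_price <= 0 or small_price < 0:
        raise ValueError("prices must be positive")
    # Average price per query once traffic is split between the two models.
    blended = simple_share * small_price + (1.0 - simple_share) * large_price
    return 1.0 - blended / large_price


# Illustrative, made-up prices: the small model is 10x cheaper.
for share in (0.5, 0.6, 0.7):
    saved = routing_savings(share, small_price=1.0, large_price=10.0)
    print(f"{share:.0%} simple queries: {saved:.0%} saved")
```

Under these assumed numbers, the saving simplifies to simple_share × (1 − small_price / large_price), so the share of simple traffic matters as much as the price gap.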
Links
- Live Demo: https://web-production-ad285.up.railway.app
- GitHub: https://github.com/bodigetejasree/supportmind-ai
- cascadeflow: https://github.com/lemony-ai/cascadeflow