
Samarth Bhamare

Posted on • Originally published at clskills.hashnode.dev

I picked a 5ms keyword router over an LLM meta-router for my AI app. Here's the math.

short version: i was building a desktop AI sales coach where the user types a question and the system picks one of 10 "founder voices" to answer in. i prototyped two routers — a deterministic keyword one and a meta-LLM one. the deterministic one was 600x faster, free, and 85% accurate. i shipped the deterministic one. here's why and what the code looks like.

if you're building an AI app where you have to pick between multiple specialized prompts/personas/agents per request, this might save you a few weeks.

the setup

i shipped a product called the Sales Agent Pack last night (clskills.in/sales-agent-saas). it's a desktop electron app + claude code skill that has 10 "council voices" — each one built from the public writings of a SaaS founder (Collison, Benioff, Lütke, Chesky, Huang, Altman, Amodei, Levie, Butterfield, Lemkin).

the user types a sales question. the system picks ONE voice to answer in. that "pick" is the routing decision.

example questions:

  • "should i raise prices to $79?" → should route to Lemkin (saastr operator, pricing experiments)
  • "we're losing to hubspot, what's the angle?" → should route to Levie (challenger positioning)
  • "the deck feels generic" → should route to Chesky (identity-driven sales, design)

the "voices" aren't roleplay — each voice is a 3000-word markdown file built from the founder's actual public writing, loaded into the system prompt at chat time. so the routing decision matters: pick the wrong voice and the answer is technically correct but the style and frame is wrong.

option A: meta-LLM router

the obvious approach. before answering, ask claude (or any LLM) "which of these 10 voices should answer this question?"

// assumes `npm install @anthropic-ai/sdk` and ANTHROPIC_API_KEY in the env
const Anthropic = require('@anthropic-ai/sdk');
const anthropic = new Anthropic();

async function pickVoiceLLM(message) {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 50,
    messages: [{
      role: 'user',
      content: `Question: "${message}"\n\nWhich voice should answer? Reply with just one word from: Collison, Benioff, Lutke, Chesky, Huang, Altman, Amodei, Levie, Butterfield, Lemkin.`
    }],
  });
  // first content block is the text reply; trim to get the bare voice name
  return response.content[0].text.trim();
}

measured cost (over 100 sample questions):

  • latency p50: 1,800 ms
  • latency p95: 3,100 ms
  • cost per call: ~$0.005 (50 tokens out, ~120 tokens in)
  • accuracy (vs. my hand-labeled "correct" answer): 89%
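the p50/p95 numbers throughout this post are plain nearest-rank percentiles over the 100 timed calls. a minimal sketch of how you'd compute them (the timing harness itself is omitted):

```javascript
// nearest-rank percentile over an array of latency samples (ms):
// smallest value with at least p% of the samples at or below it
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// usage: const times = await timeAllCalls(questions);
//        percentile(times, 50); percentile(times, 95);
```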

option B: deterministic keyword router

less sexy. just a function with a bunch of ifs and a buyer-archetype heuristic.

function pickVoice(message, conversationType) {
  const m = message.toLowerCase();

  // Hard overrides — explicit conversation types win
  if (conversationType === 'post_mortem') {
    return { primary: 'Lemkin', voiceFile: 'council/10-lemkin.md' };
  }
  if (conversationType === 'competitive_positioning') {
    return { primary: 'Levie', voiceFile: 'council/08-levie.md' };
  }

  // Pricing keywords → Lemkin (the saastr operator)
  if (/price|pricing|raise.*price|tier|discount|annual.*contract/.test(m)) {
    return { primary: 'Lemkin', voiceFile: 'council/10-lemkin.md' };
  }

  // Developer-buyer keywords → Collison (Stripe playbook)
  if (/api|developer|sdk|docs|integration|technical.*buyer/.test(m)) {
    return { primary: 'Collison', voiceFile: 'council/01-collison.md' };
  }

  // Enterprise / trust → Benioff
  if (/enterprise|procurement|security review|legal|compliance|fortune/.test(m)) {
    return { primary: 'Benioff', voiceFile: 'council/02-benioff.md' };
  }

  // Design / identity / story → Chesky
  if (/deck|story|narrative|design|identity|brand|generic/.test(m)) {
    return { primary: 'Chesky', voiceFile: 'council/04-chesky.md' };
  }

  // Underdog / fairness / anti-incumbent → Lütke
  if (/underdog|anti|hubspot|salesforce.*alternative|david.*goliath/.test(m)) {
    return { primary: 'Lutke', voiceFile: 'council/03-lutke.md' };
  }

  // Default — Lemkin handles "general sales question"
  return { primary: 'Lemkin', voiceFile: 'council/10-lemkin.md' };
}

measured (over the same 100 questions):

  • latency p50: 3 ms
  • latency p95: 5 ms
  • cost per call: $0
  • accuracy (vs. my hand-labeled "correct"): 85%

the math

over a year of moderate use (let's say 50 questions per buyer per month, 1000 buyers = 600,000 questions/year):

|                     | Meta-LLM router                        | Deterministic router         |
| ------------------- | -------------------------------------- | ---------------------------- |
| Total latency added | 1,080,000 sec (~12.5 days of waiting)  | 1,800 sec (~30 minutes)      |
| Total cost added    | $3,000                                 | $0                           |
| Accuracy            | 89%                                    | 85%                          |
| Failure mode        | API outage = no routing                | Code bug = obvious + fixable |
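those totals fall straight out of the per-call measurements; a quick sanity check:

```javascript
// yearly totals from the per-call numbers above
const questionsPerYear = 50 * 1000 * 12;              // 50 q/buyer/mo * 1000 buyers = 600,000

const llmLatencySec = questionsPerYear * 1800 / 1000; // p50 1,800 ms/call -> 1,080,000 s
const detLatencySec = questionsPerYear * 3 / 1000;    // p50 3 ms/call     -> 1,800 s
const llmCostUsd = questionsPerYear * 0.005;          // ~$0.005/call      -> ~$3,000
```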

the 4% accuracy delta costs $3,000 and 12.5 buyer-days of waiting. that's not worth it. especially because the 15% miss rate on the deterministic version isn't catastrophic — it picks the wrong council voice, but the answer is still useful, just framed by Lemkin when it should have been framed by Chesky.

plus there's a manual escape hatch: if the user wants a specific voice, they can say "answer like Chesky would" in their question, and the keyword chesky triggers an explicit override. zero ML required, and it's trivially overridable.
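the escape hatch is one more deterministic check that runs before the keyword rules. a sketch of how it could look (the real shipped code may differ; the function name is made up):

```javascript
const VOICES = ['Collison', 'Benioff', 'Lutke', 'Chesky', 'Huang',
                'Altman', 'Amodei', 'Levie', 'Butterfield', 'Lemkin'];

// if the user names a voice explicitly ("answer like Chesky would"),
// that always wins; returns null when no voice is mentioned
function pickExplicitOverride(message) {
  const m = message.toLowerCase();
  return VOICES.find(v => m.includes(v.toLowerCase())) || null;
}
```

inside pickVoice you'd call this first and short-circuit before any keyword regex runs.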

when meta-LLM routing IS worth it

i'm not saying "always use deterministic." here's when i'd flip the decision:

  1. when the routing space is large and fluid. if i had 100 voices instead of 10, hand-coding keyword rules becomes unmaintainable; at that scale, paying per LLM call beats paying in my maintenance time.

  2. when the cost of wrong is high. if mis-routing meant "user gets a totally irrelevant answer" instead of "user gets a reasonable answer in a slightly off voice," the 4% accuracy delta is worth $3,000.

  3. when you have reliable structured outputs. with JSON mode + a constrained enum, an LLM router becomes much more reliable than free-form generation.

  4. when latency budget is generous. for an async batch system, +2 seconds doesn't matter. for an interactive chat, it's perceptible and annoying.
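on point 3: one way to constrain an LLM router is to force a tool call whose schema only allows the 10 voice names. a sketch of the request shape, following the Anthropic Messages API tool-use format (the tool name route_question is made up):

```javascript
const VOICES = ['Collison', 'Benioff', 'Lutke', 'Chesky', 'Huang',
                'Altman', 'Amodei', 'Levie', 'Butterfield', 'Lemkin'];

const routerTool = {
  name: 'route_question',
  description: 'Pick the council voice best suited to answer.',
  input_schema: {
    type: 'object',
    properties: {
      // the enum means the model can only ever emit one of these 10 strings
      voice: { type: 'string', enum: VOICES },
    },
    required: ['voice'],
  },
};

// pass to anthropic.messages.create with
//   tools: [routerTool],
//   tool_choice: { type: 'tool', name: 'route_question' }
// then read the pick from the tool_use block's input.voice
```

that eliminates the "model replies with a sentence instead of a name" failure mode entirely.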

the v0.3.0 plan

i'm not hard-committed to deterministic forever. the actual plan is:

  1. v0.1.0 — deterministic router (shipped)
  2. v0.1.x → v0.2.x — collect routing data. for every chat, log (question, deterministic_pick, user_override_if_any). let it run for ~3 months.
  3. v0.3.0 — train a tiny classifier on the logged data. probably 100 lines of scikit-learn. inference cost: also ~5ms. accuracy estimate: ~92%.
  4. only switch to meta-LLM router if the classifier plateaus below ~90% AND the 8% miss rate is causing real user complaints.

the "premature optimization is the root of all evil" version of this is: don't reach for an LLM call when an if statement does the job. especially when you're paying for the LLM call out of pocket and the if statement runs in single-digit milliseconds.

try it

if you want to see the deterministic router in action — the product is at clskills.in/sales-agent-saas. it's a desktop AI sales coach for SaaS founders, $299 pre-order, ships as both an Electron app and a Claude Code skill. 7-day refund.

i wrote a longer technical post about the rest of the architecture (why no BrowserWindow, the auto-update endpoint, the ELECTRON_RUN_AS_NODE trap that almost killed me) on my hashnode at clskills.hashnode.dev — go read that if this one was useful.

questions / objections / "you should have done X" — drop them in the comments. i read everything.

— samarth
