Discussion on: "I Pointed Claude Code at My Local Ollama Models — Here's the 3-Minute Setup"

View post

The per-app routing config is a smart middle ground. One thing I'd push further: within the same app, the gap between "rename this variable" and "redesign this module" is huge. Have you thought about request-level classification, like estimating complexity from prompt length or keyword patterns, to auto-switch between local and cloud on a per-request basis?

CodeKing • Apr 13

You're right that per-app routing is a coarse approximation. Per-request classification is more powerful but harder to
get right — prompt length is a decent proxy but misses context (a short "refactor this function" can be surprisingly
hard if it touches 10 files).

One heuristic I've been thinking about: context token count as the routing signal. If the request includes more than N
context tokens, route to cloud. Simpler than NLP classification and correlates reasonably well with actual task
complexity.

Curious what signals you'd trust most in practice — keyword patterns or something else?