DEV Community

Dor Amir

NadirClaw 0.8: Vision Routing and the Silent Failure It Fixed

Here's a bug that's annoying to diagnose: you send a screenshot to Cursor and get back a response that clearly didn't look at the image. You try again. Same thing. You figure it's a model issue and move on.

If you're running NadirClaw in front of Cursor, the bug was in the router.


How NadirClaw routes requests

Before 0.8, here's what happened when you sent an image:

  1. NadirClaw's classifier embeds your prompt using sentence embeddings and compares it to two pre-computed centroid vectors (one for "simple", one for "complex"). This takes ~10ms. No extra API call.
  2. Your screenshot is probably attached to a short message like "what's wrong here?", which classifies as simple.
  3. Simple routes to your cheap model. If that's DeepSeek or an Ollama model, neither supports vision.
  4. The multimodal content array (the image_url part) gets flattened to text before hitting LiteLLM. The image disappears.
  5. DeepSeek answers based on the text alone. Looks wrong. Is wrong.

No error. No log warning. Just a bad answer.
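The classification in step 1 is just a nearest-centroid check. Here's a minimal sketch of the idea; the names and the toy 2-dimensional vectors are hypothetical (the real classifier compares full sentence embeddings against pre-computed centroids), but the decision logic is the same shape:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the pre-computed centroids; in practice these come
# from averaging sentence embeddings of labeled "simple"/"complex" prompts.
SIMPLE_CENTROID = [0.9, 0.1]
COMPLEX_CENTROID = [0.1, 0.9]

def classify(prompt_embedding):
    # Nearest centroid wins: two dot products, no extra API call.
    if cosine(prompt_embedding, SIMPLE_CENTROID) >= cosine(prompt_embedding, COMPLEX_CENTROID):
        return "simple"
    return "complex"
```

A short prompt like "what's wrong here?" embeds close to the simple centroid, so `classify` returns "simple" regardless of what's attached to the message. That's exactly why the image never factored into the routing decision.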


What 0.8 changes

The model registry now has a has_vision field on every model:

```python
"gemini-2.5-flash":       {"has_vision": True,  "cost_per_m_input": 0.15}
"deepseek/deepseek-chat": {"has_vision": False, "cost_per_m_input": 0.28}
"ollama/llama3.1:8b":     {"has_vision": False, "cost_per_m_input": 0}
```

When NadirClaw detects image_url or base64 image content in a request, it checks the selected model's has_vision flag. If it's False, it swaps to the cheapest vision-capable model in your configured tiers.

That's usually Gemini Flash ($0.15/M input) rather than Sonnet ($3.00/M) or GPT-5.2 ($1.75/M). You're not paying premium rates for vision; you're paying the cheapest rate that actually works.
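The detect-and-swap step can be sketched like this. The function names (`has_image_content`, `ensure_vision`) are hypothetical, not NadirClaw's actual API; the sketch assumes OpenAI-style messages, where multimodal content is a list of typed parts:

```python
REGISTRY = {
    "gemini-2.5-flash":       {"has_vision": True,  "cost_per_m_input": 0.15},
    "deepseek/deepseek-chat": {"has_vision": False, "cost_per_m_input": 0.28},
    "ollama/llama3.1:8b":     {"has_vision": False, "cost_per_m_input": 0.0},
}

def has_image_content(messages):
    # Multimodal messages carry a list of content parts instead of a plain string.
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            if any(part.get("type") == "image_url" for part in content):
                return True
    return False

def ensure_vision(selected_model, messages, registry=REGISTRY):
    # No images, or the chosen model can see them: keep the original choice.
    if not has_image_content(messages) or registry[selected_model]["has_vision"]:
        return selected_model
    # Otherwise swap to the cheapest vision-capable model in the registry.
    vision_models = [m for m, meta in registry.items() if meta["has_vision"]]
    return min(vision_models, key=lambda m: registry[m]["cost_per_m_input"])
```

With only one vision model configured, any image request that classified as simple gets bumped from DeepSeek to Gemini Flash; text-only requests still route wherever the classifier sent them.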


The fix that mattered as much as the routing

Separately from the routing logic, there was a second bug: even if you'd manually pointed your image request at a vision-capable model, the content array was still being flattened to text-only before reaching LiteLLM, on both the streaming and non-streaming paths.

That's fixed in 0.8. Image content parts now pass through unchanged.
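To make the flattening bug concrete, here's a hypothetical before/after sketch (these are not NadirClaw's actual function names, just an illustration of the behavior change on an OpenAI-style content array):

```python
def flatten_to_text(content):
    # Pre-0.8 behavior (the bug): keep only the text parts.
    # The image_url part is silently dropped before the request
    # ever reaches LiteLLM.
    if isinstance(content, list):
        return " ".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    return content

def pass_through(content):
    # 0.8 behavior: multimodal content lists are forwarded unchanged,
    # so the downstream model actually receives the image part.
    return content
```

Run the buggy version on a two-part content array (one text part, one image_url part) and you get back only the text string; the fixed path forwards both parts intact.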


Upgrade

```shell
pip install --upgrade nadirclaw
```

If you've been getting inconsistent answers on image-heavy requests, this is probably why. Run nadirclaw report after upgrading and look at the has_images field in your request logs to see how often this was silently misfiring.

Full changelog: v0.7.0...v0.8.0

(Full disclosure: I work on this project.)
