Jovan Chan

Originally published at aicoderscope.com

Cline + Ollama stuck in a loop in 2026: the qwen2.5-coder JSON tool-call bug and the 90-second workaround

This article was originally published on aicoderscope.com

TL;DR: Cline + Ollama breaks in two distinct ways — models without native tool-calling get HTTP 400/500 errors immediately, and models with native tool-calling (qwen2.5-coder) get trapped in a JSON infinite loop because Cline's parser expects Anthropic-style XML but the model outputs JSON. Both bugs are unresolved in Cline v3.88.1 as of June 7, 2026. The JSON-loop fix takes 90 seconds.

What you'll be able to do after this guide:

  • Identify which of the two failure modes is breaking your Cline + Ollama setup
  • Apply the .clinerules XML injection that fixes the qwen2.5-coder infinite loop
  • Pick a model configuration that avoids the issue for your hardware tier

Honest take: If you're already on qwen2.5-coder, the .clinerules fix is the fastest path today. If you're on a non-tool-calling model hitting HTTP 400 errors, switch to qwen2.5-coder:14b as your floor and apply the same fix — it's the only working combination until PRs #11272 and #11301 land upstream.


What you actually see when it breaks

You followed the Cline + Ollama setup guide, pulled qwen2.5-coder:32b, set the context window, and wired it to Cline v3.88.1. You give Cline a task: "Add a config loader that reads .env and returns typed settings." The spinner appears. Nothing happens.

After 30–60 seconds, one of two things occurs:

Scenario A — The request fails immediately:

Error: Request failed with status code 400

Or status 500. Cline shows the error, pauses, and gives up. No tool calls executed.

Scenario B — Cline shows the model thinking for a long time. The context usage counter climbs. Then you see this pattern accumulate in the output pane:

{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}
{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}
{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}

A MODEL_NO_TOOLS_USED error eventually kills the loop. No files were created. Nothing got done.

These are not the same bug. They have different causes and different fixes.
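A quick way to tell the two scenarios apart is to look at the text Cline printed. The sketch below is a hypothetical helper, not part of Cline or Ollama: it assumes you have copied the output pane into a string, and it only recognizes the two patterns shown above (a 400/500 status line, or the same JSON tool-call object repeated).

```python
import json
import re


def classify_failure(output: str) -> str:
    """Return 'A', 'B', or 'unknown' for a chunk of copied Cline output."""
    # Mode A: the request was rejected at the HTTP layer.
    if re.search(r"status code (400|500)", output):
        return "A"

    # Mode B: the same JSON tool-call object appears more than once.
    seen = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and "name" in payload and "arguments" in payload:
            seen[line] = seen.get(line, 0) + 1
    if any(count >= 2 for count in seen.values()):
        return "B"
    return "unknown"
```

Anything that returns "unknown" is probably a different problem from the two covered in this article.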


Two failure modes with one shared root

Cline's Ollama integration was updated to use the structured tools API parameter — the standard mechanism for OpenAI-compatible APIs to pass tool definitions to models with native function-calling support. The change broke compatibility in two directions simultaneously.

Mode A: HTTP 400/500 — models without native tool-calling

Standard GGUF models loaded in Ollama — older llama3 variants, codellama, most general-purpose models — don't advertise a tool-calling capability. When Cline sends a request with a populated tools parameter to these models, Ollama rejects the request at the HTTP level before the model sees the prompt. You get 400 or 500. The task never starts.

Cline currently sends the tools parameter unconditionally to all Ollama models regardless of capability. PR #11301 and PR #11319 fix this by querying /api/show for capability flags before deciding whether to include tools. Neither is merged as of June 7, 2026.
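The idea behind those PRs can be illustrated with a minimal sketch. This is not the code from either PR: the function name and the shape of the request body are assumptions for illustration, and the capability list is assumed to have been read from an /api/show response beforehand.

```python
def build_chat_payload(model, messages, tools, capabilities):
    """Build a chat request body, attaching tools only when supported.

    `capabilities` is the list reported for the model (may be None or empty).
    """
    payload = {"model": model, "messages": messages}
    # Only include the tools parameter when the model advertises support.
    if tools and "tools" in (capabilities or []):
        payload["tools"] = tools
    return payload
```

With a gate like this, a model that does not list "tools" would receive a plain chat request instead of one that Ollama rejects.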

Mode B: Infinite JSON loop — models with native tool-calling

This is the qwen2.5-coder failure, documented in GitHub issue #10843. Qwen2.5-Coder models do support native function-calling — they were trained to output standard JSON tool-invocation payloads:

{"name": "write_to_file", "arguments": {"path": "src/app.ts", "content": "import dotenv..."}}

Cline's streaming parser was built around Anthropic's XML tool tag format:

<write_to_file>
<path>src/app.ts</path>
<content>import dotenv...</content>
</write_to_file>

When the model outputs JSON, Cline's parser treats it as plain conversational text — not a tool invocation. The agent loop fires MODEL_NO_TOOLS_USED, informs the model that no tool was called, and asks it to try again. The model retries with the same JSON. The loop continues until context fills or the session times out.
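The retry cycle can be reproduced with a toy simulation. Everything here is a stand-in: the regex parser, the loop, and the two fake models are hypothetical and far simpler than Cline's real streaming parser. The point is only that an XML-only parser never sees a tool call in JSON output, so every turn ends in a retry.

```python
import re

# Matches an XML-style tool tag such as <write_to_file>...</write_to_file>.
XML_TOOL_RE = re.compile(r"<(\w+)>.*?</\1>", re.DOTALL)


def parse_tool_call(text):
    """Return the tool name if the text contains an XML tool tag, else None."""
    match = XML_TOOL_RE.search(text)
    return match.group(1) if match else None


def run_agent_loop(model, max_turns=5):
    """Ask the model until a tool call parses or max_turns is reached.

    Returns (tool_name_or_None, turns_used).
    """
    prompt = "Add a config loader"
    for turn in range(1, max_turns + 1):
        reply = model(prompt)
        tool = parse_tool_call(reply)
        if tool is not None:
            return tool, turn
        # Stand-in for the MODEL_NO_TOOLS_USED retry message.
        prompt = "No tool was used. Please try again."
    return None, max_turns


def json_model(prompt):
    # Always answers with a JSON payload, like the looping model.
    return '{"name": "write_to_file", "arguments": {"path": "src/app.ts", "content": "..."}}'


def xml_model(prompt):
    # Answers with the XML tag format the parser understands.
    return "<write_to_file>\n<path>src/app.ts</path>\n<content>...</content>\n</write_to_file>"
```

Calling `run_agent_loop(json_model)` uses up every turn without finding a tool, while `run_agent_loop(xml_model)` succeeds on the first turn.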

PR #11272 implements a JSON fallback parser that intercepts these JSON payloads and converts them to the same internal structures the XML path produces. Code review is passing, but the PR is not merged yet.
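A rough sketch of what such a fallback might look like follows. It is not the code from PR #11272: the function name is hypothetical, and for simplicity it converts the JSON payload to XML tag text rather than to Cline's internal structures. It also does no escaping, so treat it as an illustration of the mapping only.

```python
import json


def json_tool_call_to_xml(text):
    """Convert one JSON tool-call payload to XML tag form, or return None."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    arguments = payload.get("arguments")
    # Require the {"name": ..., "arguments": {...}} shape shown above.
    if not isinstance(name, str) or not isinstance(arguments, dict):
        return None
    lines = [f"<{name}>"]
    for key, value in arguments.items():
        lines.append(f"<{key}>{value}</{key}>")
    lines.append(f"</{name}>")
    return "\n".join(lines)
```

Text that is not a tool-call payload returns None, so ordinary conversational output would pass through untouched.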


Which models hit which mode

Check a model's capability before you start:

curl http://localhost:11434/api/show \
  -d '{"name": "qwen2.5-coder:32b"}' | python3 -m json.tool | grep -A 10 capabilities

If the output includes "tools" in the capabilities list, the model has native tool-calling, so expect Mode B without the fix below. If there's no capabilities field, or the list doesn't include "tools", expect Mode A.
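The same check can be scripted. The sketch below assumes Ollama is listening on the default localhost:11434 and that the /api/show response carries a "capabilities" list as described above; the function names are made up for this example.

```python
import json
import urllib.request


def fetch_show(model, host="http://localhost:11434"):
    """POST to Ollama's /api/show and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"name": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def expected_failure_mode(show_response):
    """Map a parsed /api/show response to the failure mode to expect."""
    capabilities = show_response.get("capabilities") or []
    return "B" if "tools" in capabilities else "A"
```

With Ollama running, `expected_failure_mode(fetch_show("qwen2.5-coder:32b"))` reports which of the two modes to expect for that tag.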

| Model (Ollama tag) | Tool support | Failure mode |
| --- | --- | --- |
| qwen2.5-coder:7b | Native JSON | Mode B: JSON loop |
| qwen2.5-coder:14b | Native JSON | Mode B: JSON loop |
| qwen2.5-coder:32b | Native JSON | Mode B: documented in GitHub #10843 |
| codellama:34b | None | Mode A: HTTP 400/500 |
| llama3:8b / llama3.1:8b | Varies by quantization | Mode A on most GGUF builds |
| deepseek-coder:6.7b | None | Mode A: HTTP 400/500 |

Fix for Mode B: .clinerules XML injection

This workaround is confirmed in GitHub issue #10843. Forcing the model to output Anthropic-style XML makes Cline's existing parser execute tool calls correctly. One file, 90 seconds.

In your project root, create the .clinerules directory if it doesn't exist:

mkdir -p .clinerules

Create .clinerules/tool-format.md:

# Tool invocation format

CRITICAL: Never output tool calls as JSON objects. Do not output patterns like:
{"name": "write_to_file", "arguments": {...}}
{"name": "read_file", "arguments": {...}}
{"name": "execute_command", "arguments": {...}}

You MUST use only Anthropic-style XML tags for all tool invocations:

<write_to_file>
<path>path/to/file</path>
<content>
file content here
</content>
</write_to_file>

<read_file>
<path>path/to/file</path>
</read_file>

<execute_command>
<command>ls -la</command>
</execute_command>

JSON output is silently ignored. XML tags are the only format that executes.

Save the file, then reload VS Code:

Ctrl+Shift+P → Developer: Reload Window

Cline loads .clinerules content at startup. Open a new Cline conversation — existing ones don't pick up rule changes mid-session.

What success looks like: The model's output shifts from a wall of repeating JSON blocks to a structured Cline tool-call card showing a file path, a content preview, and an Approve/Reject button. That button means the XML parse succeeded and Cline is waiting on your confirmation before writing. If the JSON accumulation continues with no button appearing, the .clinerules content hasn't loaded — check the directory location and reload again.
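If the rules don't seem to load, a small script can rule out the most common mistakes: the directory in the wrong place, or the file saved under a different name. This is a hypothetical checker based only on the paths used in this guide. It cannot tell whether Cline actually read the file.

```python
from pathlib import Path


def check_rules_file(project_root):
    """Return a list of problems with the .clinerules setup; empty means OK."""
    problems = []
    rules_dir = Path(project_root) / ".clinerules"
    rules_file = rules_dir / "tool-format.md"
    if not rules_dir.is_dir():
        problems.append(".clinerules is missing or is not a directory")
    elif not rules_file.is_file():
        problems.append(".clinerules/tool-format.md is missing")
    elif "<write_to_file>" not in rules_file.read_text(encoding="utf-8"):
        problems.append("tool-format.md has no <write_to_file> example")
    return problems
```

Run it against the folder you opened in VS Code, for example `check_rules_file(".")`. An empty list means the file is where this guide expects it.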


Fix for Mode A: switch model, then apply the Mode B fix

There's no clean user-side workaround for Mode A today. The HTTP 400/500 happens at the API layer before the model sees anything. Until PR #11301 or #11319 merges, models without native tool-calling cannot be used with Cline's Ollama provider as currently shipped.

The practical path: switch to qwen2.5-coder and apply the .clinerules fix above. Hardware requirements:

| Model | VRAM needed | Recommended GPU |
| --- | --- | --- |
| qwen2.5-coder:7b | ~5 GB | RTX 4060 8 GB (demo tier only: 7B loses multi-file agent tasks) |
| qwen2.5-coder:14b | ~9 GB | RTX 3060 12 GB or RTX 4060 Ti 16 GB (minimum for real agentic work) |
| qwen2.5-coder:32b | ~20 GB | RTX 4090 or RTX 3090 (best practical local tier) |
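The table can be turned into a simple lookup. The sketch below uses only the approximate VRAM figures listed above. The `headroom_gb` parameter is an assumption added for illustration, in case you want to reserve memory for the context window or other applications.

```python
# (tag, approximate VRAM in GB) from the hardware table, largest first.
MODEL_TIERS = [
    ("qwen2.5-coder:32b", 20),
    ("qwen2.5-coder:14b", 9),
    ("qwen2.5-coder:7b", 5),
]


def pick_model(vram_gb, headroom_gb=0.0):
    """Return the largest tag whose approximate VRAM need fits, else None."""
    for tag, needed in MODEL_TIERS:
        if vram_gb - headroom_gb >= needed:
            return tag
    return None
```

A 12 GB card maps to qwen2.5-coder:14b, and anything under about 5 GB returns None, meaning none of the three tags is expected to fit.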

Pull the model:


ollama pull qwen2.5-coder:14b   # 9 GB download, runs on 12–16 GB VRAM
# or
ollama pull qwen2.5
