Jovan Chan

Posted on Jun 13 • Originally published at aicoderscope.com

Cline + Ollama stuck in a loop in 2026: the qwen2.5-coder JSON tool-call bug and the 90-second workaround

#cline #ollama #localllm #fix

TL;DR: Cline + Ollama breaks in two distinct ways — models without native tool-calling get HTTP 400/500 errors immediately, and models with native tool-calling (qwen2.5-coder) get trapped in a JSON infinite loop because Cline's parser expects Anthropic-style XML but the model outputs JSON. Both bugs are unresolved in Cline v3.88.1 as of June 7, 2026. The JSON-loop fix takes 90 seconds.

What you'll be able to do after this guide:

Identify which of the two failure modes is breaking your Cline + Ollama setup
Apply the .clinerules XML injection that fixes the qwen2.5-coder infinite loop
Pick a model configuration that avoids the issue for your hardware tier

Honest take: If you're already on qwen2.5-coder, the .clinerules fix is the fastest path today. If you're on a non-tool-calling model hitting HTTP 400 errors, switch to qwen2.5-coder:14b as your floor and apply the same fix — it's the only working combination until PRs #11272 and #11301 land upstream.

What you actually see when it breaks

You followed the Cline + Ollama setup guide, pulled qwen2.5-coder:32b, set the context window, wired it to Cline v3.88.1. You give Cline a task: "Add a config loader that reads .env and returns typed settings." The spinner appears. Nothing happens.

Within 30–60 seconds, one of two things occurs:

Scenario A — The request fails immediately:

Error: Request failed with status code 400

Or status 500. Cline shows the error, pauses, and gives up. No tool calls executed.

Scenario B — Cline shows the model thinking for a long time. The context usage counter climbs. Then you see this pattern accumulate in the output pane:

{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}
{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}
{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}

A MODEL_NO_TOOLS_USED error eventually kills the loop. No files were created. Nothing got done.

These are not the same bug. They have different causes and different fixes.

Two failure modes with one shared root

Cline's Ollama integration was updated to use the structured tools API parameter — the standard mechanism for OpenAI-compatible APIs to pass tool definitions to models with native function-calling support. The change broke compatibility in two directions simultaneously.

Mode A: HTTP 400/500 — models without native tool-calling

Standard GGUF models loaded in Ollama — older llama3 variants, codellama, most general-purpose models — don't advertise a tool-calling capability. When Cline sends a request with a populated tools parameter to these models, Ollama rejects the request at the HTTP level before the model sees the prompt. You get 400 or 500. The task never starts.

Cline currently sends the tools parameter unconditionally to all Ollama models regardless of capability. PR #11301 and PR #11319 fix this by querying /api/show for capability flags before deciding whether to include tools. Neither is merged as of June 7, 2026.
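The capability gate those PRs describe can be sketched in a few lines. This is a minimal illustration, not the code from either PR: `supports_tools` and `build_chat_payload` are hypothetical names, and the payload shape assumes an Ollama-style chat request with an optional `tools` list.

```python
from typing import Any


def supports_tools(show_response: dict[str, Any]) -> bool:
    """True when an /api/show response lists the "tools" capability."""
    capabilities = show_response.get("capabilities") or []
    return "tools" in capabilities


def build_chat_payload(
    model: str,
    messages: list[dict[str, str]],
    tools: list[dict[str, Any]],
    show_response: dict[str, Any],
) -> dict[str, Any]:
    """Build a chat request body, attaching `tools` only for capable models."""
    payload: dict[str, Any] = {"model": model, "messages": messages, "stream": True}
    if tools and supports_tools(show_response):
        payload["tools"] = tools  # omitted entirely for models without tool support
    return payload
```

With this gate in place, a model whose capabilities lack "tools" would receive a plain chat request instead of one Ollama rejects.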

Mode B: Infinite JSON loop — models with native tool-calling

This is the qwen2.5-coder failure, documented in GitHub issue #10843. Qwen2.5-Coder models do support native function-calling — they were trained to output standard JSON tool-invocation payloads:

{"name": "write_to_file", "arguments": {"path": "src/app.ts", "content": "import dotenv..."}}

Cline's streaming parser was built around Anthropic's XML tool tag format:

<write_to_file>
<path>src/app.ts</path>
<content>import dotenv...</content>
</write_to_file>

When the model outputs JSON, Cline's parser treats it as plain conversational text — not a tool invocation. The agent loop fires MODEL_NO_TOOLS_USED, informs the model that no tool was called, and asks it to try again. The model retries with the same JSON. The loop continues until context fills or the session times out.
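A toy version of that loop makes the failure easier to see. The sketch below is a simplified stand-in for Cline's streaming parser, not its actual code: `find_xml_tool_call`, `count_missed_turns`, and the three-name tool list are invented for illustration.

```python
import re

# Hypothetical subset of tool names, for illustration only.
TOOL_NAMES = ("write_to_file", "read_file", "execute_command")


def find_xml_tool_call(text):
    """Return {"tool", "params"} for the first XML tool block found, else None."""
    for name in TOOL_NAMES:
        block = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        if block:
            # Collect <param>value</param> pairs nested inside the tool block.
            params = dict(re.findall(r"<(\w+)>(.*?)</\1>", block.group(1), re.DOTALL))
            return {"tool": name, "params": params}
    return None


def count_missed_turns(model_reply, max_turns=3):
    """Ask the model up to max_turns times; count turns with no tool detected."""
    missed = 0
    for _ in range(max_turns):
        if find_xml_tool_call(model_reply()) is not None:
            break
        missed += 1  # stands in for MODEL_NO_TOOLS_USED followed by a retry
    return missed


json_reply = '{"name": "write_to_file", "arguments": {"path": "src/app.ts", "content": "import dotenv..."}}'
xml_reply = "<write_to_file>\n<path>src/app.ts</path>\n<content>import dotenv...</content>\n</write_to_file>"

print(count_missed_turns(lambda: json_reply))  # every turn is a miss
print(count_missed_turns(lambda: xml_reply))   # the first turn is detected
```

Because the model's reply never changes, the JSON case misses on every turn until the turn limit is reached, which mirrors the retry loop described above.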

PR #11272 implements a JSON fallback parser that intercepts these JSON payloads and converts them to the same internal structures the XML path produces. Code review is passing, but the PR is not merged yet.
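The idea behind such a fallback can be sketched as follows. This is not the code from PR #11272: `parse_json_tool_call` and the `{"tool", "params"}` result shape are hypothetical, chosen only to show a JSON payload being normalized into one structure that an XML path could also produce.

```python
import json


def parse_json_tool_call(text):
    """Read model output as a JSON tool call; return a normalized dict or None."""
    try:
        payload = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # not JSON at all, so treat it as ordinary conversation
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    arguments = payload.get("arguments")
    if not isinstance(name, str) or not isinstance(arguments, dict):
        return None  # valid JSON, but not shaped like a tool invocation
    return {"tool": name, "params": arguments}


reply = '{"name": "write_to_file", "arguments": {"path": "src/app.ts", "content": "import dotenv..."}}'
call = parse_json_tool_call(reply)
print(call["tool"], call["params"]["path"])
```

Returning None for anything that is not a well-formed tool payload keeps ordinary conversational text on its existing path.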

Which models hit which mode

Check a model's capability before you start:

curl http://localhost:11434/api/show \
  -d '{"name": "qwen2.5-coder:32b"}' | python3 -m json.tool | grep -A 10 capabilities

If the output includes "tools" in the capabilities list, the model has native tool-calling, so expect Mode B without the fix below. If there is no capabilities field, or the list does not include "tools", expect Mode A.
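The same check can be scripted. The sketch below assumes the `/api/show` response carries a `capabilities` list as described above; `fetch_show` and `expected_failure_mode` are hypothetical helper names, and the host defaults to the local address used in the curl command.

```python
import json
import urllib.request


def fetch_show(model, host="http://localhost:11434"):
    """POST to /api/show and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"name": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def expected_failure_mode(show_response):
    """Map a /api/show response to the failure mode this guide predicts."""
    capabilities = show_response.get("capabilities") or []
    return "B" if "tools" in capabilities else "A"
```

Calling `expected_failure_mode(fetch_show("qwen2.5-coder:32b"))` requires a running Ollama server; the classifier itself works on any already-fetched response.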

Model (Ollama tag)	Tool support	Failure mode
`qwen2.5-coder:7b`	Native JSON	Mode B — JSON loop
`qwen2.5-coder:14b`	Native JSON	Mode B — JSON loop
`qwen2.5-coder:32b`	Native JSON	Mode B — documented in GitHub #10843
`codellama:34b`	None	Mode A — HTTP 400/500
`llama3:8b` / `llama3.1:8b`	Varies by quantization	Mode A on most GGUF builds
`deepseek-coder:6.7b`	None	Mode A — HTTP 400/500

Fix for Mode B: `.clinerules` XML injection

This workaround is confirmed in GitHub issue #10843. Forcing the model to output Anthropic-style XML makes Cline's existing parser execute tool calls correctly. One file, 90 seconds.

In your project root, create the .clinerules directory if it doesn't exist:

mkdir -p .clinerules

Create .clinerules/tool-format.md:

# Tool invocation format

CRITICAL: Never output tool calls as JSON objects. Do not output patterns like:
{"name": "write_to_file", "arguments": {...}}
{"name": "read_file", "arguments": {...}}
{"name": "execute_command", "arguments": {...}}

You MUST use only Anthropic-style XML tags for all tool invocations:

<write_to_file>
<path>path/to/file</path>
<content>
file content here
</content>
</write_to_file>

<read_file>
<path>path/to/file</path>
</read_file>

<execute_command>
<command>ls -la</command>
</execute_command>

JSON output is silently ignored. XML tags are the only format that executes.

Save the file, then reload VS Code:

Ctrl+Shift+P → Developer: Reload Window

Cline loads .clinerules content at startup. Open a new Cline conversation — existing ones don't pick up rule changes mid-session.

What success looks like: The model's output shifts from a wall of repeating JSON blocks to a structured Cline tool-call card showing a file path, a content preview, and an Approve/Reject button. That button means the XML parse succeeded and Cline is waiting on your confirmation before writing. If the JSON accumulation continues with no button appearing, the .clinerules content hasn't loaded — check the directory location and reload again.
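If the button never appears, a quick file check can rule out a misplaced rules file. This is a small sketch under the assumption that the file lives at `.clinerules/tool-format.md` as created above; `missing_tool_tags` and `check_rules_file` are hypothetical helpers, and they only confirm the file is in place, not that Cline has loaded it.

```python
from pathlib import Path

# Tags the rules file in this guide is expected to demonstrate.
REQUIRED_TAGS = ("<write_to_file>", "<read_file>", "<execute_command>")


def missing_tool_tags(rules_text):
    """Return the required XML tags that do not appear in the rules text."""
    return [tag for tag in REQUIRED_TAGS if tag not in rules_text]


def check_rules_file(project_root):
    """Return a list of problems with .clinerules/tool-format.md (empty if none)."""
    rules_path = Path(project_root) / ".clinerules" / "tool-format.md"
    if not rules_path.is_file():
        return [f"not found: {rules_path}"]
    missing = missing_tool_tags(rules_path.read_text(encoding="utf-8"))
    return [f"tag not mentioned: {tag}" for tag in missing]
```

Running `check_rules_file(".")` from the project root returns an empty list when the file exists and mentions all three tags.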

Fix for Mode A: switch model, then apply the Mode B fix

There's no clean user-side workaround for Mode A today. The HTTP 400/500 happens at the API layer before the model sees anything. Until PR #11301 or #11319 merges, models without native tool-calling cannot be used with Cline's Ollama provider as currently shipped.

The practical path: switch to qwen2.5-coder and apply the .clinerules fix above. Hardware requirements:

Model	VRAM needed	Recommended GPU
`qwen2.5-coder:7b`	~5 GB	RTX 4060 8 GB (demo tier only; 7B struggles with multi-file agent tasks)
`qwen2.5-coder:14b`	~9 GB	RTX 3060 12 GB or RTX 4060 Ti 16 GB — minimum for real agentic work
`qwen2.5-coder:32b`	~20 GB	RTX 4090 or RTX 3090 — best practical local tier
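The table can be turned into a simple lookup. The sketch below uses the approximate VRAM figures from the table as thresholds; `pick_model` is a hypothetical helper, and it ignores the extra memory a large context window needs, so treat the result as an upper bound rather than a guarantee.

```python
# (Ollama tag, approximate VRAM needed in GB), largest model first.
TIERS = [
    ("qwen2.5-coder:32b", 20),
    ("qwen2.5-coder:14b", 9),
    ("qwen2.5-coder:7b", 5),
]


def pick_model(vram_gb):
    """Return the largest qwen2.5-coder tag whose VRAM estimate fits, else None."""
    for tag, needed_gb in TIERS:
        if vram_gb >= needed_gb:
            return tag
    return None


for vram in (8, 12, 24):
    print(vram, pick_model(vram))
```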

Pull the model:


ollama pull qwen2.5-coder:14b   # 9 GB download, runs on 12–16 GB VRAM
# or
ollama pull qwen2.5
