DEV Community

Jovan Chan
Jovan Chan

Posted on • Originally published at aicoderscope.com

Goose AI Agent Review 2026: Apache 2.0, Any LLM, and the Best Free Local Coding Agent?

This article was originally published on aicoderscope.com

TL;DR: Goose is a free, Apache 2.0 coding agent from Block that runs against any LLM — including fully local Ollama models — and orchestrates real work through MCP extensions. It is the most capable open-source agent you can run with zero API cost. The catch: small local models still choke on its heavy tool-calling, so the "$0 forever" pitch only holds if you have the VRAM for a 14B+ model.

Goose Cline Claude Code
Best for Full-cycle automation, local-first, CLI + Desktop In-editor agent inside VS Code Deep agentic runs on Claude
Price / Cost Free (Apache 2.0); pay tokens or $0 local Free; bring your own API key Bundled with Claude Pro $20/Max $100–$200, or API
The catch Small local models fail its tool calls Lives only in your editor Locked to Anthropic models

Honest take: If you want one agent that works in the terminal and a desktop app, runs on whatever model you point it at, and never sends a line of code off your machine — Goose is the one to install first. Reach for Claude Code only when you want maximum reasoning on Anthropic's models and don't mind the bill.

What Goose actually is

Goose is an on-machine AI agent built by Block (the company behind Square, Cash App, and TIDAL). It shipped as open source in January 2025 and has grown to roughly 49,000 GitHub stars. As of June 2026 it is no longer a Block-only project: in December 2025 the Linux Foundation launched the Agentic AI Foundation (AAIF), anchored by three donated projects — Anthropic's Model Context Protocol (MCP), OpenAI's AGENTS.md, and Block's goose. Goose formally moved to the AAIF on April 7, 2026, which matters if you care about a tool outliving the company that wrote it: governance is now vendor-neutral.

The thing that separates Goose from a code-completion plugin is scope. It does not just suggest code in a side panel. It runs shell commands, edits files across your repo, executes code, runs your tests, and chains those steps into multi-step tasks. It ships as both a CLI and a Desktop app (macOS, Linux, Windows). The agent's tool access comes through MCP — Goose was one of the earliest MCP adopters and exposes 70+ documented extensions plus anything in the broader MCP server registry, which crossed 3,000 entries in early 2026.

The current stable release as of this writing is v1.37.0 (June 3, 2026). That release alone tells you the project is moving fast: it added a hooks system for custom agent behavior, a /goal self-evaluation command, a /review local code-analysis command, subagent instructions, PreToolUse denial hooks, and a TUI diff viewer. The two releases before it (v1.36.0 on May 27, v1.35.0 on May 22) were similarly dense.

The part that matters: it runs on local models

This is why Goose belongs in any "local LLM + coding tool" shortlist. It is genuinely model-agnostic. The June 2026 build talks to Anthropic, OpenAI, Google, OpenRouter, Azure, AWS Bedrock, and — the one developers searching for privacy actually want — Ollama for fully local inference. v1.37.0 added even more: xAI SuperGrok, Alibaba Qwen via DashScope, Databricks AI Gateway, and a generic declarative path for any OpenAI-compatible endpoint. Point it at a model, and your code never leaves the box.

Here is the actual install and local setup, tested on June 15, 2026 with Goose CLI v1.37.0 and Ollama 0.22:

# 1. Install the Goose CLI
curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | bash

# 2. Pull a model that can handle tool calling
ollama pull qwen3-coder:14b

# 3. Configure Goose to use it
goose configure
#   ┌  goose-configure
#   │
#   ◇  What would you like to configure?
#   │  Configure Providers
#   │
#   ◇  Which model provider should we use?
#   │  Ollama
#   │
#   ◇  Provider Ollama requires OLLAMA_HOST, please enter a value
#   │  http://localhost:11434
#   │
#   ◇  Model fetch complete
#   │
#   └  Configuration saved. You can now run `goose`.
Enter fullscreen mode Exit fullscreen mode

If you skip the host prompt, Goose defaults to localhost:11434, so a standard local Ollama install just works. Start a session with goose and you are in an agent loop that can read your repo, write files, and run commands against your local model — no API key, no metered tokens, no network egress.

For a deeper dive on which quantized models fit which GPU, our Gemma 4 QAT local coding guide maps VRAM tiers to real coding performance, and runaihome.com's best local AI models by VRAM covers the hardware side.

The problem nobody mentions in the demos

Goose is a heavy tool-calling agent. Every step — read this file, run that command, apply this diff — is a structured function call the model has to emit correctly. Frontier models do this in their sleep. Small local models often do not.

On the first real test — "add input validation to the three handlers in api/routes.py and run the tests" — qwen3-coder:14b handled it cleanly: it read the files, edited all three, and ran pytest. Dropping to a 7B model on the same task, the agent stalled. It returned malformed tool-call JSON, Goose retried, and it looped. This is the same class of failure documented across local-agent setups, and it is the single biggest gap between the marketing ("runs on any model") and reality ("runs well on models that are actually good at tool use").

Two fixes that worked:

  1. Use a model trained for agentic tool use, not just code completion. A 14B coder-tuned model (Qwen3-Coder, Devstral) succeeds where a general 7B model fails. If you are on 8GB of VRAM, this is the constraint, not Goose itself.
  2. Lean on v1.37's hooks and /review. When a full autonomous run is too much for a local model, use Goose more surgically — /review for local analysis, smaller scoped goals — instead of one giant "build the feature" prompt.

If your hardware can't run a 14B model at usable speed, the honest move is to use Goose with a cheap cloud model. DeepSeek or a Qwen API runs Goose's tool calls reliably for a fraction of a cent per task — see our DeepSeek V4-Flash as a coding backend breakdown.

Goose vs Cline vs Aider vs Claude Code

All four are agents. They overlap less than the marketing suggests.

| | Goose | Cline | Aider | Claude Code |
|---|---|---|---|
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Proprietary |
| Interface | CLI + Desktop app | VS Code / JetBrains extension | Terminal | Terminal |
| Cost | Free; tokens or local | Free; tokens or local | Free; tokens or local | Claude Pro $20 / Max $100–$200 / API |
| Local models | Yes (Ollama, OpenAI-compatible) | Yes (Ollama, LM Studio) | Yes (Ollama, OpenAI-compatible) | No |
| MCP support | Deep — 70+ extensions | Yes | Limited | Yes |
| Best fit | Full-cycle automation outside the editor | Devs who live in VS Code | Git-native, surgical edits | Best reasoning on Claude |

The decisions are clear, not "it depends":

  • You live in VS Code all day. Use Cline. It is the same idea as Goose but rendered inside your editor with inline diffs, and it talks to local models too.
  • You want tight, git-aware edits and clean commits. Use Aider. It is the most disciplined of the four about diffs and commit messages, and it stays out of your way.
  • You want the strongest possible reasoning and will pay for it. Use Claude Code. Nothing here matches a top Claude model on a gnarly multi-file refactor — but you are locked to Anthropic and you pay per run.
  • You want one agent that works everywhere, on any model, including fully offline, for $0. Use Goose. It is the only one of the four that ships a polished desktop app and a CLI, and its MCP integration is the

Top comments (0)