I pay for Claude Code. I use it every day, I built skills for it, and I think it's worth the money. I'm saying this upfront because this post is going to show you how to get a coding agent for free, and I don't want you wondering whether I actually believe in the paid version. I do. But not everybody needs it, not everybody can afford it, and not everybody wants their codebase sent to a server they don't control.
If that's you, this is the setup.
Three pieces of software, all free, all open source. You install them, connect them, and ten minutes later you have a coding agent running in your terminal that can read your files, write code, run commands, and help you build things. Your files stay on your machine. No API key. No credit card. No trial that expires in fourteen days.
What you're actually installing
Ollama runs AI models locally. Think of it as a model server sitting on your laptop. You pull a model the same way you'd pull a Docker image, and it handles all the inference. Free, open source, one command to install.
Gemma 4 is the model. Google DeepMind released it on April 2, 2026 under Apache 2.0, which means you can use it for anything, commercially or personally, no restrictions. The 26B parameter variant uses a Mixture of Experts architecture that activates only 3.8 billion parameters per token. In practice that means a 26 billion parameter model generates tokens at the speed of a much smaller one (you still need enough RAM to hold the full weights, which is why the hardware guidance below matters). It scores 77.1% on LiveCodeBench v6 (competitive coding) and 82.3% on GPQA Diamond (graduate-level science questions). For a free local model, those numbers are absurd.
OpenCode is the agent. Open source, 140,000+ stars on GitHub, built by the anomaly.co team. It's a terminal-based coding agent that connects to whatever AI backend you point it at. Claude, GPT, Gemini, or in our case, a local Ollama server running Gemma 4. It reads your project files, suggests edits, runs commands. The full agent experience, just powered by a model running on your own hardware.
Step 1: Install Ollama
Go to ollama.com/download and grab the installer for your OS. Mac, Windows, Linux, all supported.
On Mac or Linux, you can also run:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, Ollama runs as a background service. You can verify it's working with:
ollama --version
Step 2: Pull Gemma 4
This is where you choose your model size. Two realistic options:
If you have 24GB+ RAM (most desktops, some high-end laptops):
ollama pull gemma4:26b
This is the 26B MoE variant, and the best balance of capability and hardware requirements. Only 3.8B parameters are active per token, so it generates output faster than you'd expect from a 26 billion parameter model.
If you have 8GB RAM or less (older laptops, budget machines):
ollama pull gemma4:e4b
This is the E4B variant, 4 billion parameters. It won't be as capable, but it'll run smoothly on almost anything and still handle basic coding tasks, file operations, and simple refactors.
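Not sure which bucket your machine falls into? On Linux you can check total RAM from the shell before committing to a download (macOS users can run `sysctl -n hw.memsize` instead):

```shell
# Print total system RAM in GB (Linux; reads /proc/meminfo).
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "Total RAM: $(( mem_kb / 1024 / 1024 )) GB"
```

If the number is well under 24, go with the E4B.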
The download will take a few minutes depending on your connection. The 26B model is around 18GB; the E4B is around 10GB.
Step 3: Fix the context window (do not skip this)
This is the gotcha that wastes people's time. Ollama defaults every model to a 4,096 token context window. Gemma 4 supports 128K tokens on the E2B and E4B variants, 256K on the 26B and 31B, but Ollama doesn't care. It gives you 4K unless you explicitly tell it otherwise.
4K tokens is roughly one medium-sized file. For a coding agent that needs to read your project structure, understand multiple files, and maintain conversation context, 4K is useless. You'll get responses that cut off mid-thought, forget what you asked three messages ago, or just fail silently because the model ran out of room.
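To get a feel for how small 4K really is, here's a back-of-the-envelope check using the common heuristic of roughly four characters per token (real tokenizers vary, so treat this as an estimate, not a measurement):

```shell
# Rough token estimate for a file: bytes divided by 4.
# This is a heuristic, not the model's actual tokenizer.
estimate_tokens() {
  echo $(( $(wc -c < "$1") / 4 ))
}
```

A 16KB source file already comes out around 4,000 tokens, the entire default window, before the model has said a word.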
Create a file called Modelfile (no extension) in any directory:
FROM gemma4:26b
PARAMETER num_ctx 32768
If you pulled the E4B instead, use FROM gemma4:e4b.
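If you'd rather script it, the same two-line Modelfile can be written straight from the shell (swap in gemma4:e4b if that's the variant you pulled):

```shell
# Write the Modelfile in one shot; the quoted 'EOF' prevents shell expansion.
cat > Modelfile <<'EOF'
FROM gemma4:26b
PARAMETER num_ctx 32768
EOF
```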
Then create the custom model:
ollama create gemma4-agent -f Modelfile
Now you have a model called gemma4-agent with a 32K context window. On machines with 24GB+ RAM you can push this to 65536 or even 131072, but 32K is the sweet spot where you get enough context for real agent work without crushing your memory.
Confirm the parameter took effect:
ollama show gemma4-agent
Look for num_ctx 32768 in the Parameters section. Don't bother asking the model itself what its context window is; models can't reliably report that.
Step 4: Install OpenCode
Check opencode.ai for the latest install method. As of April 2026:
Mac/Linux (recommended):
curl -fsSL https://opencode.ai/install | bash
npm (any platform):
npm i -g opencode-ai@latest
Windows (via Scoop):
scoop install opencode
Mac (via Homebrew):
brew install anomalyco/tap/opencode
Verify the install:
opencode --version
Step 5: Point OpenCode at your local model
OpenCode needs to know where your model lives. Create or edit the config file at ~/.config/opencode/opencode.json:
{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4-agent": {
          "name": "Gemma 4 Agent (local)"
        }
      }
    }
  }
}
The baseURL points to Ollama's local API. OpenCode talks to it using the OpenAI-compatible protocol, which Ollama supports out of the box.
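If you want to script this step, something like the following writes the same config and sanity-checks that it parses as JSON before you launch anything. It assumes python3 is on your PATH (`jq .` works just as well), and note that it overwrites any existing config:

```shell
# Write the OpenCode provider config, then validate it parses as JSON.
# WARNING: overwrites any existing ~/.config/opencode/opencode.json.
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/opencode.json <<'EOF'
{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "gemma4-agent": { "name": "Gemma 4 Agent (local)" }
      }
    }
  }
}
EOF
python3 -m json.tool ~/.config/opencode/opencode.json > /dev/null && echo "config OK"
```

A malformed config is the most common reason OpenCode silently fails to list the provider, so the validation line is worth keeping.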
Step 6: Use it
Navigate to any project directory and launch OpenCode:
cd your-project
opencode
On first run, select the Ollama provider and the gemma4-agent model. Then just talk to it.
Try something simple first:
"Read the files in this directory and tell me what this project does."
Then try something practical:
"Find all PNG images in this project and list their file sizes."
"Write a bash script that converts all PNG files to WebP format."
"Look at my package.json and tell me which dependencies are outdated."
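As a benchmark for judging the model's answers, the PNG-listing prompt should boil down to a one-liner along these lines:

```shell
# List every PNG under the current directory with a human-readable size,
# smallest first. sort -h understands du's K/M/G suffixes (GNU coreutils).
find . -name '*.png' -exec du -h {} + | sort -h
```

If Gemma 4 hands you something wildly more convoluted than this, ask it to simplify.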
If you're getting coherent, useful responses, it's working. The model is running entirely on your hardware, your files never leave your machine, and you paid nothing.
What you'll notice (honest take)
I'm not going to pretend this is equivalent to Claude Code or Cursor with Claude 4 behind it. It isn't. A local model with 3.8 billion active parameters is not going to match a frontier model with orders of magnitude more compute. You will notice the difference on complex multi-file refactors, on subtle architectural decisions, on tasks that require holding a lot of context at once.
But for a huge amount of daily coding work, it's genuinely good. File operations, simple scripts, refactoring single files, generating boilerplate, explaining code, converting formats. The stuff that takes you five minutes of tedious typing but doesn't require deep reasoning. Gemma 4 handles that well.
And for anyone who cares about privacy, there is no alternative that matches this. Your code, your files, your conversations, all of it stays on your machine. No server. No logs. No terms of service that might change next quarter.
The skills angle
I built a set of utility skills that work with OpenCode, Claude Code, Codex, all of them. They're on my Patreon. But honestly, with the setup above and the commands I share in my other posts, you can do most of it for free. The skills save you time. The setup in this post saves you money. Pick whichever matters more to you right now.