<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Vellé Abel</title>
    <description>The latest articles on DEV Community by David Vellé Abel (@david_velle_abel).</description>
    <link>https://dev.to/david_velle_abel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3919889%2F43bd56bb-213e-47a7-93e9-93f418c140fb.png</url>
      <title>DEV Community: David Vellé Abel</title>
      <link>https://dev.to/david_velle_abel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/david_velle_abel"/>
    <language>en</language>
    <item>
      <title>Local Agentic Development with Ollama and OpenCode</title>
      <dc:creator>David Vellé Abel</dc:creator>
      <pubDate>Mon, 11 May 2026 08:34:36 +0000</pubDate>
      <link>https://dev.to/david_velle_abel/local-agentic-development-with-ollama-and-opencode-5942</link>
      <guid>https://dev.to/david_velle_abel/local-agentic-development-with-ollama-and-opencode-5942</guid>
      <description>&lt;h1&gt;
  
  
  Why Go Local?
&lt;/h1&gt;

&lt;p&gt;Every time I use a cloud-based AI coding assistant, I feel a little trapped in a vendor's ecosystem, and that lock-in triggers alarms in my brain. Where exactly is my data going? What happens when they inevitably hike up the subscription price, just like every other service (looking at you, Netflix)?&lt;/p&gt;

&lt;p&gt;It always brings me back to one question: Is it feasible to just build this locally?&lt;/p&gt;

&lt;p&gt;Recently, I saw &lt;a href="https://xcancel.com/julien_c/status/2047647522173104145" rel="noopener noreferrer"&gt;this post&lt;/a&gt; from Julien Chaumond, the CTO of Hugging Face, and it inspired me to finally try it.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Benefits
&lt;/h1&gt;

&lt;p&gt;Going local comes with some massive benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Budget:&lt;/strong&gt; No API tokens to refill and no $20/month subscription fees.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero Dependencies:&lt;/strong&gt; You are immune to API outages, rate limits, and slow internet connections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Absolute Privacy &amp;amp; Security:&lt;/strong&gt; This is the biggest draw. Zero code leaves your machine.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Setup: Ollama + OpenCode
&lt;/h1&gt;

&lt;p&gt;I am building this setup using Ollama and OpenCode. They are both open-source, and they serve two distinct purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; runs the LLM locally and serves it on our machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenCode&lt;/strong&gt; acts as the agent, connecting to Ollama to execute our tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both tools have deep configuration options, but for this article, we will keep it strictly to the essentials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ollama
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;Ollama offers different scripts to ease the installation process&lt;/a&gt;&lt;br&gt;
Follow the steps, and once you are done, install a local model.&lt;/p&gt;

&lt;p&gt;You can check available models on &lt;a href="https://ollama.com/search" rel="noopener noreferrer"&gt;Ollama's website&lt;/a&gt;. Which one to choose depends mainly on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What you are trying to do. In our case, that's coding.&lt;/li&gt;
&lt;li&gt;How much RAM you have available. Different models require different amounts of memory depending on their architecture and parameter count. As a simplified rule: the higher the parameter count, the "smarter" the model, but the larger the memory footprint.&lt;/li&gt;
&lt;/ul&gt;
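
&lt;p&gt;Before committing to a large download, it helps to see what you already have and what it costs in memory. Assuming a standard Ollama install, &lt;code&gt;ollama list&lt;/code&gt; shows the on-disk size of each pulled model, and &lt;code&gt;ollama ps&lt;/code&gt; shows the memory footprint of models currently loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama list   # pulled models and their on-disk size
ollama ps     # loaded models and their current memory use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;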

&lt;p&gt;I found that Qwen3.6 looks excellent for coding tasks, and it fits my hardware (~22GB footprint), so let's install it.&lt;/p&gt;

&lt;p&gt;First, start Ollama. This can be done as a background service or directly in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
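
&lt;p&gt;To confirm the server is actually up, you can query Ollama's local HTTP API, which listens on port 11434 by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/version   # returns a small JSON payload with the server version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;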



&lt;p&gt;Now let's download the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can run it as a terminal chat-bot to verify everything is working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen3.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the backend running, let's jump to our Agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenCode
&lt;/h2&gt;

&lt;p&gt;OpenCode is an open-source agent that can connect to any LLM, even paid ones like Claude, and it works really well with Ollama.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;Installation is simple enough through brew, bun, npm, etc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbskvb9w48358gsdyld7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbskvb9w48358gsdyld7k.png" alt=" " width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need to configure a model. OpenCode already integrates with many providers, but for our local use case, Ollama can launch OpenCode pre-configured. Let's do that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama launch opencode &lt;span class="nt"&gt;--model&lt;/span&gt; qwen3.6 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single command starts and configures everything necessary. Easy enough.&lt;/p&gt;
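
&lt;p&gt;If you would rather configure OpenCode by hand, it also reads an &lt;code&gt;opencode.json&lt;/code&gt; file. The sketch below points OpenCode at Ollama's OpenAI-compatible endpoint; treat the exact keys as an assumption and check OpenCode's provider documentation for the current schema (the model ID is the one we pulled earlier):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen3.6": {
          "name": "Qwen 3.6 (local)"
        }
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;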

&lt;h1&gt;
  
  
  Use and experience
&lt;/h1&gt;

&lt;p&gt;OpenCode works much like other terminal-based AI agents (such as the Claude CLI). With our local setup complete, we can jump straight in. Let's try a simple "Hello World" task:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z1pvqqgb0gpdl5jdl0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z1pvqqgb0gpdl5jdl0t.png" alt=" " width="800" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent executes the task perfectly, and as promised, there is zero token consumption. However, a quick glance at my system monitor confirms the trade-off we discussed earlier: my machine is definitely feeling the heat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbwjjgd88w9h5p3a5lrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbwjjgd88w9h5p3a5lrh.png" alt=" " width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;So, is local AI development feasible? Absolutely.&lt;/p&gt;

&lt;p&gt;While you are not going to get the same performance and reasoning as a massive cloud-hosted model like Claude Opus or GPT-4, local models are starting to close the gap and are at least "good enough" for daily tasks.&lt;/p&gt;

&lt;p&gt;There's a shift from financial cost to hardware constraint. Your machine's RAM and GPU are now the bottlenecks.&lt;/p&gt;

&lt;p&gt;Because of this, practicing good "context hygiene" and optimizing how you interact with the agent become critical. An AGENTS.md file, focused system prompts, and clear task boundaries will make you work significantly faster.&lt;/p&gt;
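
&lt;p&gt;As a concrete (hypothetical) example, a minimal &lt;code&gt;AGENTS.md&lt;/code&gt; for a TypeScript project might state the project facts and boundaries up front, so a smaller local model wastes fewer tokens rediscovering them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;# AGENTS.md

## Project
- TypeScript monorepo; build with `npm run build`, test with `npm test`.

## Boundaries
- Only modify files under `src/`.
- Never touch `.env` or CI configuration.

## Style
- Keep diffs small and explain changes before applying them.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;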

&lt;p&gt;Even with its limitations, the trade for absolute privacy, zero network dependency, and autonomy might be worth it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
