DEV Community

James Jeremy Foong


I Was About to Quit Ollama — Then I Deleted One File and Never Looked Back


Last month I almost quit using Ollama entirely. Not because it was slow. Not because the models were bad. Because I was tired of editing the same JSON file for the 47th time.

The Breaking Point

Here's what my evenings looked like:

ollama pull qwen2.5-coder:1.5b — cool, new model.

Then: open ~/.pi/agent/models.json. Scroll to the ollama.models array. Add { "id": "qwen2.5-coder:1.5b" }. Save. Restart pi. Test if it works. It works. Great.

ollama pull gemma3:12b — another one.

Same dance. Same file. Same restart.

ollama pull deepseek-v4-pro — bigger, more interesting.

Same. Fucking. Dance.

By model #12 in a single week, I started dreading the process. Not the models — the paperwork. I was spending more time filling out JSON IDs than actually coding with them. And I started noticing something worse: every time I had to restart pi, I lost my context. My train of thought. The thing I was actually trying to do.

That's when I realized: this isn't a model problem. This is a friction problem.

And friction at the discovery layer kills experimentation dead.

The "Why Doesn't This Exist?" Moment

I assumed someone had already solved this. pi had a concept called "extensions" — dynamic provider registration. I searched the docs, the Discord, npm. Nothing auto-discovered from Ollama's API.

Maybe I missed it. Maybe it's buried somewhere. I asked in the pi Discord. Crickets.

Quick context: this whole thing happened because I was trying something new.

For the past few months I'd been using OpenCode daily. It's solid — inline edits, web search, browser automation, the whole IDE-integrated workflow. I even had ohmyopenagent set up for custom agent behaviors. It felt like having a senior dev sitting next to me who could also Google things and operate a browser. I got used to it. It became my default.

But I kept hearing about pi — this weird minimal terminal harness that looked nothing like what I was used to. No GUI panels. No file tree sidebar. Just a TTY and a blinking cursor. People on Discord kept mentioning it like it was a secret weapon. I was skeptical. Honestly, I thought it sounded like a step backward. Why would I trade an IDE integration for a terminal?

Curiosity won. I installed it on a whim one weekend. Just to confirm my suspicion that it was overhyped.

The first thing that struck me was speed. pi opens in under a second. No Electron window, no plugin loading, no "indexing your project." Just — instant. Compared to OpenCode's feature-rich setup — web search, browser automation, ohmyopenagent plugins, the whole IDE integration — pi felt almost comically lightweight. But that lightness turned out to be the point.

The second thing was the session tree. I could fork my conversation at any point, try a different approach, and jump back without losing anything. OpenCode had threads, but this felt different. More like Git branches for your thinking.

Then I found the extension system. Drop a .ts file in a directory and the agent loads it immediately. No npm init. No webpack. No build step. The code runs directly. I tested it by writing a 5-line extension that added a custom command. It worked on the first try. I stared at the terminal and realized this was a completely different category of tool.

That Sunday afternoon — it was raining, I remember because I was annoyed about that too — I was literally just exploring pi's features. Poking around. Not planning to build anything. I discovered you could register providers dynamically at runtime. The docs said pi.registerProvider("ollama", {...}) and I thought: what if I could make ollama pull and pi open into one contiguous action?

Like they were actually connected.

I hadn't come to pi to build anything. But its extension model made experimenting so frictionless that I couldn't resist. I pulled up extensions/index.ts. No plan. No research. No coffee left. Just pure spite-driven development.

The 2-Hour Spite Build (That Actually Took 5 Hours)

I started with the dumbest thing that could work. Five minutes. Maybe ten.

// fetch /api/tags, map names, register provider
const res = await fetch("http://localhost:11434/api/tags");
const { models } = await res.json();
pi.registerProvider("ollama", {
  models: models.map((m) => ({ id: m.name })),
});

It worked. First try. I stared at pi's model list and there they were. All of them. Without touching JSON.

I felt this weird mix of satisfaction and anger. Satisfaction because it worked. Anger because I'd been doing it the hard way for months.

Then reality hit.

Hour 1: Some setups don't expose /api/tags. They only have the OpenAI-compatible /v1/models. So I added a fallback chain. Try one, if it fails, try the other. If both fail, panic (gracefully).
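The fallback chain is roughly this shape. A hedged sketch, assuming illustrative names: discoverModels, fromTags, and fromOpenAI are mine, not the extension's actual identifiers, and the response shapes are the documented Ollama and OpenAI-compatible formats.

```typescript
type ModelInfo = { id: string };

// Ollama's native endpoint: GET /api/tags returns { models: [{ name }, ...] }
function fromTags(body: { models: { name: string }[] }): ModelInfo[] {
  return body.models.map((m) => ({ id: m.name }));
}

// OpenAI-compatible endpoint: GET /v1/models returns { data: [{ id }, ...] }
function fromOpenAI(body: { data: { id: string }[] }): ModelInfo[] {
  return body.data.map((m) => ({ id: m.id }));
}

async function discoverModels(baseUrl: string): Promise<ModelInfo[]> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) throw new Error(`tags failed: ${res.status}`);
    return fromTags(await res.json());
  } catch {
    // Some proxies only expose the OpenAI-compatible surface, so try that next.
    const res = await fetch(`${baseUrl}/v1/models`);
    if (!res.ok) throw new Error("both discovery endpoints failed");
    return fromOpenAI(await res.json());
  }
}
```

Keeping the two response normalizers as separate pure functions also makes them trivial to unit-test without a running Ollama.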

Hour 2: pi startup froze when Ollama wasn't running. Because the extension was waiting for a fetch that would never complete. I had to flip the whole architecture: register from cache immediately (synchronous, never blocks), then kick off live discovery in the background. This meant I was dealing with race conditions between cache registration and live registration. The provider registration API replaces the old one, which is fine, but I had to make sure the user didn't see a flicker or an empty list. Took three attempts to get it smooth.
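The cache-first flow can be sketched like this. The function name bootstrap and the callback shapes are my assumptions, not the extension's actual internals; the point is only the ordering.

```typescript
type Model = { id: string };

function bootstrap(
  cached: Model[],                     // read synchronously from disk
  discover: () => Promise<Model[]>,    // live discovery; may fail or hang
  register: (models: Model[]) => void  // provider registration (replaces prior)
): void {
  // 1. Register from cache immediately, so startup never blocks on the network.
  if (cached.length > 0) register(cached);

  // 2. Run live discovery in the background. Re-registering replaces the
  //    cached list, so the user only ever sees the list get fresher.
  discover()
    .then((live) => register(live))
    .catch(() => {
      // Ollama is offline: keep serving the cached registration.
    });
}
```

This ordering is what turns "Ollama isn't running" into stale-but-usable models instead of a frozen startup.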

Hour 3: /api/show returned capabilities: ["vision"] for one of my models. Great, it supports images. I tried sending an image. It failed. The model choked. Ollama's metadata was lying. So I built /ollama-fix — a guided override where you pick a model, pick what's wrong, and it saves a persistent fix. Then /ollama-info to inspect what the final config actually looks like after all overrides.
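The override logic reduces to a merge where user-saved fixes beat reported metadata. A minimal sketch; the real extension's override format is surely richer than this.

```typescript
type Caps = { input: string[] };
type OverrideMap = Record<string, Partial<Caps>>;

// A persisted user fix wins over whatever /api/show reported for that model.
function applyOverrides(id: string, reported: Caps, fixes: OverrideMap): Caps {
  return { ...reported, ...fixes[id] };
}

// Ollama claims the model accepts images, but it choked on them,
// so the saved /ollama-fix override strips "image" back out:
const final = applyOverrides(
  "some-vision-model",
  { input: ["text", "image"] },
  { "some-vision-model": { input: ["text"] } }
);
```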

Hour 4: A friend DMed me. He runs Ollama behind a proxy with three API keys. When one hits rate limits, he manually switches to the next. I added key rotation. Automatic failover on 401/403. I didn't even have this problem myself, but I could feel his pain through the screen.
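Key rotation with failover is a small loop. Sketch only: fetchWithRotation is a made-up name, and doFetch is injected here purely so the logic is testable without a live proxy.

```typescript
async function fetchWithRotation<T extends { status: number }>(
  keys: string[],
  doFetch: (key: string) => Promise<T>
): Promise<T> {
  let last: T | undefined;
  for (const key of keys) {
    last = await doFetch(key);
    // 401/403 means this key is rejected or rate-limited: rotate to the next.
    if (last.status !== 401 && last.status !== 403) return last;
  }
  if (!last) throw new Error("no API keys configured");
  return last; // every key was rejected; surface the final response
}
```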

By dinner I had 800 lines of TypeScript that made my original problem feel ridiculous. I kept thinking: why did I wait this long?

The Architecture Nobody Asked For (But I Needed)

extensions/
├── index.ts         # Cache-first bootstrap — pi opens instantly
├── discovery.ts     # Two APIs, one fallback chain
├── provider.ts      # Mutable state, clean registration
├── config.ts        # env → file → models.json → defaults
├── cache.ts         # Offline survival
├── overrides.ts     # When Ollama lies about capabilities
├── commands.ts      # /ollama-setup /ollama-refresh /ollama-status
├── setup-wizard.ts  # Arrow-key TUI, zero typing required
└── types.ts         # Types I wish I'd written first
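The `env → file → models.json → defaults` chain in config.ts is a layered merge where earlier sources win and later ones only fill gaps. A sketch under assumed names; resolveConfig and the fields are illustrative, not the real config surface.

```typescript
type OllamaConfig = { baseUrl?: string; apiKey?: string };

// Pass layers in priority order, e.g. resolveConfig(envLayer, fileLayer, modelsJsonLayer).
function resolveConfig(...layers: OllamaConfig[]): Required<OllamaConfig> {
  const merged: OllamaConfig = {};
  for (const layer of layers) {
    // ??= only assigns when the field is still unset, so earlier layers win.
    merged.baseUrl ??= layer.baseUrl;
    merged.apiKey ??= layer.apiKey;
  }
  return {
    baseUrl: merged.baseUrl ?? "http://localhost:11434",
    apiKey: merged.apiKey ?? "",
  };
}
```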

The part that still shocks me: zero runtime dependencies.

No Axios. No Lodash. No framework. No utility belt. Just Node.js built-ins and fetch.

Why? Because pi extensions run in the user's Node process. Every dependency is a supply-chain risk that you can't control. I don't want my extension to break because someone upstream published a bad semver minor. I want this thing to still work in 3 years even if half of npm goes down.

My devDependencies are tiny: tsx to run tests, c8 for coverage, prettier so I don't have to think about formatting, and @types/node. That's it. Nothing ships to users.

The Test of "Will I Actually Use This?"

I uninstalled my old manual config. Deleted the models array from my models.json. Forced myself to rely on the extension. Cold turkey.

Day 1: Pulled qwen3-coder:480b, glm-4.7, and nemotron-3-super. All appeared in /model the second pi opened. I smiled. Like, genuinely smiled. I felt like I'd hacked the system.

Day 2: Tried to pull a vision model and test it. It failed on images. /ollama-fix → override input to ["text", "image"] → save → /ollama-refresh. Worked. I didn't have to touch a config file. I just answered questions.

Day 3: Ollama wasn't running. I'd forgotten to start it after a reboot. The extension didn't crash. Didn't hang. It fell back to cache, registered the stale models, and put a little warning in the status. I kept working for 2 hours before I noticed Ollama was still off. When I started it and ran /ollama-refresh, the live models replaced the cached ones instantly. Seamless.

Day 7: Switched from local to a remote endpoint. Usually this means editing a JSON file, copying a URL, praying I didn't typo it. Instead: /ollama-setup → arrow keys to "Base URL" → type the URL → "Test connection" → it verified → "Save & discover" → done. Felt like cheating.

Day 14: A friend asked how I managed my models. I sent him the install command. He ran it. It worked. He didn't ask me anything. That's when I knew it was real — not because I built it, but because it worked without me explaining it.

The Numbers

  • 800 lines of TypeScript
  • 0 runtime dependencies
  • 0 build steps (pi runs .ts directly via tsx)
  • 62 unit tests using only node:test + node:assert
  • 39 models auto-discovered from my local Ollama right now
  • 1 JSON file I haven't touched in 2 weeks

What I Learned (The Hard Way)

1. Cache is not a luxury — it's survival.

Without cache, every network hiccup means zero models and a broken workflow. You can't code. With cache, you get slightly stale data and full productivity. I made cache the primary registration path and live discovery the background upgrade. It feels backwards if you think about it, but it's the only way to never block the user.

2. Interactive UI beats config files by an order of magnitude.

I started with env vars and JSON editing because that's what "serious" tools do. Then I built /ollama-setup — an arrow-key TUI. I thought it was just a convenience. It's not. It's a different category of experience. I use it constantly. Manual JSON editing? Zero times since I built this. I don't even know where my models.json is anymore.

3. Zero dependencies isn't minimalism — it's durability.

The fewer packages you rely on, the longer your code works without maintenance. This extension will work in 2028 even if nobody touches it. That's not aesthetic. That's practical.

4. The best tools don't add features — they remove steps.

This extension does one thing: it deletes the "edit models.json" step from your life. That's it. Everything else is just making that deletion reliable.

Try It (If You Also Hate JSON Maintenance)

# One command
pi install npm:@jamesjfoong/pi-ollama

# Or test drive without installing
pi -e npm:@jamesjfoong/pi-ollama

Then in pi:

  • /ollama-status — see your models
  • /ollama-setup — configure endpoint without touching files
  • /ollama-refresh — update without restart
  • /ollama-fix — correct model capabilities when Ollama metadata is wrong

What's Next

I'm exploring a few things, but I'm being careful not to bloat it:

  • Team model sync: Share model overrides across a team via a URL you can drop in a config
  • Local usage stats: Track which models you actually use (everything stays local, zero telemetry)
  • Model recommendations: Suggest models based on what you're trying to do

If this saves you from config hell like it saved me, sponsoring the project helps me justify more time on spite-driven tooling. Or just star the repo — that's free and also genuinely helpful.


Repo: https://github.com/jamesjfoong/pi-ollama

Package: https://pi.dev/packages/@jamesjfoong/pi-ollama


What config file are you tired of editing? I'm collecting pain points for the next zero-maintenance extension.
