Stop chasing parameter counts. Build the toolbelt instead. — What I learned building Tlamatini (Open Source Destktop App).

#opensource

For the last few months I've been building Tlamatini, an open-source local-first AI developer assistant. Along the way I kept bumping into the same assumption — both in articles and in my own head — that to build something useful, you need the biggest model you can afford. GPT-4. Claude Opus. Llama 70B at minimum.

Then I started actually shipping with smaller local models, and I learned something that flipped my thinking.

The real lesson

A 20B-parameter LLM, given the right tools, the right agents, and skills fine-tuned to your operating procedures, is good enough to power most of your company's real workflows.

Parameter count is not the bottleneck. The bottleneck is whether the model can act — and that's a tools problem, not a parameters problem.

What "the right tools" actually means

In Tlamatini, we wired the LLM into 75 concrete capabilities:

Shell and Python execution
File operations
Browser automation (Playwright)
Screenshots and keyboard/mouse control
Email, Telegram, WhatsApp bridges
A hybrid RAG pipeline (FAISS + BM25) so the model sees the right code, not random chunks
Multi-agent orchestration via ACPX — the assistant can delegate sub-tasks to Claude Code, Cursor, Codex, or Gemini CLI and relay output between them

With this toolbelt, a 20B model running locally on Ollama can:

Read your codebase and answer accurate questions about it
Refactor a module, run the tests, and report back
Open a browser, fill a form, screenshot the result
Build and flash firmware to an STM32 microcontroller (yes, really)
Chain all of the above into a single conversation

A 200B cloud model with no tools cannot do any of those things.

Why this matters for companies

Most internal AI projects fail because teams reach for the biggest model and the smallest scope. They get an expensive chatbot that drafts emails.

Flip it: give a modest model a real toolbox and skills fine-tuned to your actual operating procedures (your CRM, your ticketing system, your build pipeline), and you get an operator — something that participates in the workflow instead of describing it.

Local 20B + tools > cloud 200B + chat box. Almost every time.

The practical takeaway

If you're thinking about adopting AI in your company and the budget conversation is stuck on which API to pay for, consider stepping back:

What are your repeatable operating procedures?
What tools would an agent need to actually execute them?
Can you wrap those tools cleanly enough that a local 20B model can call them reliably?

If yes, you don't need to send anything to the cloud. You don't need to pay per token. You don't need permission from a vendor. You just need to build the toolbelt.

That's what Tlamatini is — an open-source toolbelt and orchestration layer for local LLMs. Built in Django, runs on Ollama, GPL-3.0.

GitHub: github.com/XAIHT/Tlamatini
One-minute demo: youtu.be/4MyRXBahHuU

I'd love to hear from other people who've shipped agent systems on smaller local models — what's working for you? What's still painful? What tools made the biggest difference?