Ángela López Mendoza

Posted on May 27

I built a local-first AI dev assistant with 68 agents in Django — here's what I learned

#ai #webdev #programming #productivity

I've spent months building Tlamatini (Nahuatl for "one who knows") — a locally-deployed AI developer assistant that goes way beyond a chatbox. It runs on your machine with Ollama, your code never leaves your box, and it's fully open source (GPL-3.0).

I want to share what I built and what I learned, because building a local-first AI tool as a solo developer taught me things I didn't expect.

What Tlamatini does

Most AI coding assistants are cloud-first chatboxes. Tlamatini is different:

Hybrid RAG over your codebase — FAISS + BM25 retrieval with Reciprocal Rank Fusion and context budgeting. The model doesn't just see random code chunks — it sees the right code, ranked and budgeted so it fits in context.

Multi-Turn mode with 75 tools — The LLM becomes an operator. Shell commands, Python execution, file operations, browser automation with Playwright, screenshots, keyboard/mouse control, email, Telegram, WhatsApp — all chained in one conversation. You tell it what you want done, and it figures out the steps.

ACPX (Agent Communication Protocol eXtension) — This is the part I'm most proud of. Tlamatini can spawn external coding-agent CLIs — Claude Code, Cursor, Codex, Gemini CLI, Qwen — as child processes, send them tasks, and relay output between them. One orchestrator, multiple coding agents, working on different parts of a problem simultaneously.

Visual Workflow Designer — A drag-and-drop canvas with 68 agent types. Wire them together, validate the flow, run it unattended. Save flows as .flw files, schedule them, monitor them with FlowHypervisor.

Self-aware architecture — Tlamatini carries a first-person knowledge map of her own architecture (Tlamatini.md) that's injected into every LLM prompt. She can answer questions about herself accurately. Builds packaged with --self-modify ship her own source tree so she can read, inspect, and modify herself.
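To make the ACPX item above more concrete, here is a minimal sketch of fanning tasks out to child processes and relaying one result onward. It is not Tlamatini's actual code: FAKE_AGENT, run_agent, and fan_out are hypothetical names, and the child is a tiny Python stand-in, because each real coding-agent CLI has its own flags and I/O conventions that are not described in this post.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real coding-agent CLI (assumption): a tiny Python child that
# reads a task from stdin and prints a reply.
FAKE_AGENT = [
    sys.executable,
    "-c",
    "import sys; print('done: ' + sys.stdin.read().strip())",
]


def run_agent(command, task, timeout=30):
    """Send one task to a child process over stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=task,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout.strip()


def fan_out(assignments):
    """Run {name: (command, task)} concurrently and return {name: output}.

    Assumes at least one assignment.
    """
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = {
            name: pool.submit(run_agent, command, task)
            for name, (command, task) in assignments.items()
        }
        return {name: future.result() for name, future in futures.items()}


results = fan_out({
    "agent_a": (FAKE_AGENT, "write tests for auth"),
    "agent_b": (FAKE_AGENT, "refactor the router"),
})

# Relay: feed one agent's output to another agent as its next task.
review = run_agent(FAKE_AGENT, "review -> " + results["agent_a"])
```

Threads are enough here because each worker just blocks on a child process; the real work happens outside the Python interpreter.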

The tech stack

Backend: Python 3.12, Django 5.2, Django Channels (Daphne ASGI)
AI/ML: LangChain 0.3, LangGraph 0.2, FAISS, rank-bm25
LLM backends: Ollama (local default), Anthropic Claude (cloud opt-in), Qwen (vision)
Communication: WebSockets for real-time streaming, gRPC for MCP services
Database: SQLite
Packaging: PyInstaller → one-click Windows .exe installer

Lessons learned building this solo

1. RAG is harder than it looks

Everyone shows RAG demos with 5 documents. Try it with a real codebase — thousands of files, mixed languages, config files, migrations, tests. The naive approach (chunk everything, embed, retrieve top-k) falls apart immediately.

What worked: hybrid retrieval (dense vectors from FAISS + sparse matching from BM25), Reciprocal Rank Fusion to combine rankings, code-aware metadata extraction so the retriever knows which file, class, and function each chunk belongs to, and context budgeting so I never blow the model's context window.
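As a rough illustration of the last two pieces, here is a minimal sketch of Reciprocal Rank Fusion followed by a greedy context budget. The function names, the chunk ids, and the word-count tokenizer are assumptions made for the example; a real setup would take the two rankings from FAISS and BM25 and count tokens with the model's own tokenizer.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranked list.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def word_count(text):
    """Crude stand-in for a tokenizer: whitespace-separated words."""
    return len(text.split())


def budget_context(ranked_ids, chunks, max_tokens, count_tokens=word_count):
    """Greedily keep the highest-ranked chunks that fit in max_tokens."""
    selected, used = [], 0
    for chunk_id in ranked_ids:
        cost = count_tokens(chunks[chunk_id])
        if used + cost > max_tokens:
            continue  # too big for what is left; a smaller chunk may still fit
        selected.append(chunk_id)
        used += cost
    return selected, used


# Hypothetical retriever outputs, best match first.
dense_hits = ["views.py::login", "models.py::User", "urls.py::routes"]
sparse_hits = ["models.py::User", "settings.py::AUTH", "views.py::login"]
chunks = {
    "views.py::login": "def login(request): return authenticate(request)",
    "models.py::User": "class User(models.Model): name = models.CharField(max_length=64)",
    "urls.py::routes": "urlpatterns = [path('login/', login)]",
    "settings.py::AUTH": "AUTH_USER_MODEL = 'app.User'",
}

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
selected, used = budget_context(fused, chunks, max_tokens=12)
```

Chunks that both retrievers return rise to the top, and the budget step drops whatever does not fit instead of truncating it mid-chunk.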

2. Multi-agent orchestration needs contracts

When you have 68 agent types that can be wired together in any combination, you need a formal system for "what can connect to what." I built an Agent Contract registry — each agent declares its connection fields, parameter sources, secret paths, and validation rules. The Flow Compiler validates every connection before execution.

Without this, users would wire agents together in invalid ways and get cryptic errors at runtime. With contracts, validation happens at design time on the canvas.
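A minimal sketch of the idea, under heavy assumptions: the AgentContract fields, the three agent names, and validate_flow below are invented for illustration and are much simpler than a real registry with secret paths and per-field validation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    """Hypothetical contract: what an agent type emits and what it requires."""
    name: str
    outputs: frozenset = frozenset()
    required_inputs: frozenset = frozenset()


# Illustrative registry; these agent names and fields are made up.
REGISTRY = {
    contract.name: contract
    for contract in (
        AgentContract("FileReader", outputs=frozenset({"text"})),
        AgentContract("Summarizer", outputs=frozenset({"summary"}),
                      required_inputs=frozenset({"text"})),
        AgentContract("Emailer",
                      required_inputs=frozenset({"summary", "recipient"})),
    )
}


def validate_flow(edges, params=None):
    """Return a list of readable errors; an empty list means the flow is valid.

    edges: (source_agent, target_agent) pairs drawn on the canvas.
    params: {agent_name: set of fields} supplied statically by the user.
    """
    params = params or {}
    errors = []
    provided = {}  # target agent -> fields it receives from upstream agents
    for source, target in edges:
        for name in (source, target):
            if name not in REGISTRY:
                errors.append(f"unknown agent type: {name}")
        if source in REGISTRY and target in REGISTRY:
            provided.setdefault(target, set()).update(REGISTRY[source].outputs)
    for target, fields in provided.items():
        available = fields | set(params.get(target, ()))
        missing = REGISTRY[target].required_inputs - available
        if missing:
            errors.append(f"{target} is missing inputs: {sorted(missing)}")
    return errors


flow = [("FileReader", "Summarizer"), ("Summarizer", "Emailer")]
errors_without_param = validate_flow(flow)
errors_with_param = validate_flow(flow, params={"Emailer": {"recipient"}})
```

The first call reports that Emailer has no recipient before anything runs; supplying it as a static parameter clears the error.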

3. Process management on Windows is brutal

Tlamatini spawns child processes for agents, ACPX CLIs, and tool execution. On Windows, every subprocess gets a conhost.exe companion. These pile up and orphan when the parent dies. Users saw dozens of Tlamatini-icon processes in Task Manager.

I built a three-tier orphan reaper: Tier 1 runs after every tool call, Tier 2 runs after the LLM response, Tier 3 runs at shutdown. On top of that, a monkey-patch on subprocess.Popen defaults creationflags to CREATE_NO_WINDOW, so future tools get the fix for free.
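Here is a minimal sketch of both ideas, not the project's actual implementation. The names install_popen_patch and ChildTracker are hypothetical, the patch assumes callers pass creationflags by keyword, and the tracker only knows about children it spawned itself, so it does not hunt down stray conhost.exe processes the way a full reaper would.

```python
import subprocess
import sys

# CREATE_NO_WINDOW exists only on Windows; fall back to 0 (no flag) elsewhere.
NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)

_original_init = subprocess.Popen.__init__


def _quiet_init(self, *args, **kwargs):
    """Default creationflags to CREATE_NO_WINDOW unless the caller set it.

    Assumes creationflags is passed by keyword, which is the usual style.
    """
    if NO_WINDOW and "creationflags" not in kwargs:
        kwargs["creationflags"] = NO_WINDOW
    _original_init(self, *args, **kwargs)


def install_popen_patch():
    """Apply the patch once; calling it again is a no-op."""
    if subprocess.Popen.__init__ is not _quiet_init:
        subprocess.Popen.__init__ = _quiet_init


class ChildTracker:
    """Remember spawned children so they can be reaped at chosen points."""

    def __init__(self):
        self._children = []

    def spawn(self, *args, **kwargs):
        proc = subprocess.Popen(*args, **kwargs)
        self._children.append(proc)
        return proc

    def reap(self, kill=False):
        """Forget finished children; with kill=True also stop the rest.

        Returns how many tracked children are still running.
        """
        alive = []
        for proc in self._children:
            if proc.poll() is None and kill:
                proc.terminate()
                try:
                    proc.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    proc.kill()
                    proc.wait()
            if proc.poll() is None:
                alive.append(proc)
        self._children = alive
        return len(alive)


install_popen_patch()
tracker = ChildTracker()
sleeper = tracker.spawn([sys.executable, "-c", "import time; time.sleep(30)"])
still_running = tracker.reap()            # light pass, e.g. after a tool call
left_at_shutdown = tracker.reap(kill=True)  # final pass at shutdown
```

Because subprocess.run builds on Popen, the patch also covers code that never touches Popen directly.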

4. Local-first is a feature, not a limitation

Making Ollama the default (not the Claude API, not OpenAI) was a decision I went back and forth on. Cloud models are smarter. But local-first means your code never leaves your machine, there are no API costs for basic usage, it works offline, and there is no vendor lock-in.

Users who want cloud quality can opt in per-request. But the default is private. In 2026, that matters.

Try it

GitHub: github.com/XAIHT/Tlamatini
One-minute demo: youtube.com/watch?v=4MyRXBahHuU
Stack: Django 5 + Channels, LangChain, FAISS, Ollama. GPL-3.0.

Five-minute setup: clone, pip install, migrate, runserver. That's it.

I'd love feedback — especially on the RAG architecture and the ACPX multi-agent orchestration. What would you add? What would you do differently?

Top comments (4)

Harjot Singh • May 31

68 agents is a wild number and I'm genuinely curious what you learned about the right granularity, because that's the central tension: too few agents and each is an over-stuffed do-everything prompt; too many and you're drowning in coordination, handoffs, and "which of the 68 should handle this." My instinct is that past a certain count the bottleneck stops being the agents and becomes routing + the handoff between them - making sure the right agent fires and that context survives the pass. 68 specialized agents is only better than 6 generalists if the orchestration layer reliably gets the work to the right one without losing the thread.

Local-first on top of that is a smart constraint (privacy, no per-token bill), and the combination is exactly the space I work in - Moonshift is a multi-agent pipeline that takes a prompt to a deployed SaaS, and the lesson I keep hitting is the same: the agents are the easy part, the orchestration and verification between them is where it's won or lost. Multi-model routing keeps a build ~$3 flat, first run free no card. Really want to know your takeaways. At 68 agents, what bit hardest - routing to the right one, the coordination overhead, or keeping context coherent across that many handoffs? And did you find a granularity sweet spot, or is more-specialized always better?

Ángela López Mendoza • May 31

All the agents have different capabilities that are orthogonal and complementary. They can run sequentially within a flow, or concurrently in the Agentic Control Panel (agentic_control_panel.html). They are self-orchestrated, because every running agent is a complete instance of a small program (Python source code that the user can modify if they want), not a fork of the whole of Tlamatini.
So when you submit a prompt with Multi-Turn activated, a single session starts with its own set of agent instances. The agents run one after another, following a flow that in this case Tlamatini designs (Multi-Turn). But you can also create your own flows by dragging and dropping agents in Tlamatini's Agentic Control Panel.
On the other hand, with Ollama's cloud-based models Tlamatini's responses can be significantly faster than running locally, and you only pay for the very cheap "Pro" plan on Ollama.com. So I recommend Ollama's cloud-based models if you want faster responses, and don't worry: the code as-is is never sent. Only the embeddings are sent (matrices of numbers with no meaning to a man-in-the-middle).
Thanks for your comments, I'm really glad!

Harjot Singh • May 31

Thanks for the detail. The self-orchestrated design, where each agent is a complete small program the user can edit, is a genuinely nice property: it makes the system inspectable and forkable at the unit level instead of one monolith. The embeddings-only-leave-the-machine point is the right privacy instinct too. The thing I'd be curious about as you scale the drag-and-drop flows: when one agent in a chain returns something the next agent can't use, does the panel catch it at the seam, or does the bad output flow downstream? That validation-between-agents is usually where multi-agent flows get fragile, and it's the piece that turns a pile of agents into a reliable team. The harness-matters-more-than-any-single-agent lesson is the core of how I think about Moonshift. Across your set, did the reliable flows share a handoff pattern you'd now bake in by default?

Ángela López Mendoza • May 31 • Edited

There is a special agent named Parametrizer: you configure it to map each pair of fields that one agent should pass to another, and of course you can specify all of the parameters yourself! And if you are new to flows, there is another agent named FlowCreator: based on a prompt you give it, Tlamatini creates a flow to start from. Check it out at: Agentic Flow sample. Thanks for your comments! =*