I've spent months building Tlamatini (Nahuatl for "one who knows") — a locally-deployed AI developer assistant that goes way beyond a chatbox. It runs on your machine with Ollama, your code never leaves your box, and it's fully open source (GPL-3.0).
I want to share what I built and what I learned, because building a local-first AI tool as a solo developer taught me things I didn't expect.
What Tlamatini does
Most AI coding assistants are cloud-first chatboxes. Tlamatini is different:
Hybrid RAG over your codebase — FAISS + BM25 retrieval with Reciprocal Rank Fusion and context budgeting. The model doesn't just see random code chunks — it sees the right code, ranked and budgeted so it fits in context.
Multi-Turn mode with 75 tools — The LLM becomes an operator. Shell commands, Python execution, file operations, browser automation with Playwright, screenshots, keyboard/mouse control, email, Telegram, WhatsApp — all chained in one conversation. You tell it what you want done, and it figures out the steps.
ACPX (Agent Communication Protocol eXtension) — This is the part I'm most proud of. Tlamatini can spawn external coding-agent CLIs — Claude Code, Cursor, Codex, Gemini CLI, Qwen — as child processes, send them tasks, and relay output between them. One orchestrator, multiple coding agents, working on different parts of a problem simultaneously.
Visual Workflow Designer — A drag-and-drop canvas with 68 agent types. Wire them together, validate the flow, run it unattended. Save flows as .flw files, schedule them, monitor them with FlowHypervisor.
Self-aware architecture — Tlamatini carries a first-person knowledge map of her own architecture (Tlamatini.md) that's injected into every LLM prompt. She can answer questions about herself accurately. Builds packaged with --self-modify ship her own source tree so she can read, inspect, and modify herself.
The tech stack
- Backend: Python 3.12, Django 5.2, Django Channels (Daphne ASGI)
- AI/ML: LangChain 0.3, LangGraph 0.2, FAISS, rank-bm25
- LLM backends: Ollama (local default), Anthropic Claude (cloud opt-in), Qwen (vision)
- Communication: WebSockets for real-time streaming, gRPC for MCP services
- Database: SQLite
- Packaging: PyInstaller → one-click Windows .exe installer
Lessons learned building this solo
1. RAG is harder than it looks
Everyone shows RAG demos with 5 documents. Try it with a real codebase — thousands of files, mixed languages, config files, migrations, tests. The naive approach (chunk everything, embed, retrieve top-k) falls apart immediately.
What worked: hybrid retrieval (dense vectors from FAISS + sparse matching from BM25), Reciprocal Rank Fusion to combine rankings, code-aware metadata extraction so the retriever knows which file, class, and function each chunk belongs to, and context budgeting so I never blow the model's context window.
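To make the fusion and budgeting steps concrete, here is a minimal sketch. It is not Tlamatini's actual code: the function names, the chunk ids, the token counts, and the constant k=60 (a commonly used default for Reciprocal Rank Fusion) are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; tie-break on the id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def fit_to_budget(chunk_ids, token_counts, budget):
    """Greedily keep the best-ranked chunks that still fit the token budget."""
    kept, used = [], 0
    for cid in chunk_ids:
        cost = token_counts[cid]
        if used + cost <= budget:
            kept.append(cid)
            used += cost
    return kept


# Hypothetical results from a dense (FAISS-style) and a sparse (BM25-style) retriever.
dense = ["views.py::index", "models.py::User", "urls.py::routes"]
sparse = ["models.py::User", "settings.py::DB", "views.py::index"]
fused = reciprocal_rank_fusion([dense, sparse])

# Hypothetical token counts per chunk.
token_counts = {
    "models.py::User": 300,
    "views.py::index": 500,
    "settings.py::DB": 400,
    "urls.py::routes": 100,
}
context = fit_to_budget(fused, token_counts, budget=900)
print(fused)
print(context)
```

Chunks that both retrievers agree on rise to the top, and the budget pass skips a chunk that would overflow the window while still admitting smaller ones ranked below it.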
2. Multi-agent orchestration needs contracts
When you have 68 agent types that can be wired together in any combination, you need a formal system for "what can connect to what." I built an Agent Contract registry — each agent declares its connection fields, parameter sources, secret paths, and validation rules. The Flow Compiler validates every connection before execution.
Without this, users would wire agents together in invalid ways and get cryptic errors at runtime. With contracts, validation happens at design time on the canvas.
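A stripped-down sketch of the idea, assuming a much simpler contract than the real one: each agent only declares which payload kinds it accepts and emits. The names AgentContract, REGISTRY, and validate_flow, and the three example agents, are hypothetical and not taken from Tlamatini's source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentContract:
    name: str
    # Payload kinds this agent can consume and produce (illustrative only).
    accepts: frozenset = field(default_factory=frozenset)
    emits: frozenset = field(default_factory=frozenset)


# Hypothetical registry with three made-up agent types.
REGISTRY = {
    "FileReader": AgentContract("FileReader", emits=frozenset({"text"})),
    "Summarizer": AgentContract(
        "Summarizer", accepts=frozenset({"text"}), emits=frozenset({"text"})
    ),
    "Screenshot": AgentContract("Screenshot", emits=frozenset({"image"})),
}


def validate_flow(edges, registry=REGISTRY):
    """Return a list of human-readable errors; an empty list means the flow is valid."""
    errors = []
    for src, dst in edges:
        if src not in registry or dst not in registry:
            errors.append(f"unknown agent in edge {src} -> {dst}")
            continue
        # A connection is valid only if the source emits something the target accepts.
        if not (registry[src].emits & registry[dst].accepts):
            errors.append(f"{src} cannot connect to {dst}")
    return errors


print(validate_flow([("FileReader", "Summarizer")]))
print(validate_flow([("Screenshot", "Summarizer")]))
```

Because the check only needs the declared contracts, it can run on the canvas the moment an edge is drawn, long before any agent process starts.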
3. Process management on Windows is brutal
Tlamatini spawns child processes for agents, ACPX CLIs, and tool execution. On Windows, every subprocess gets a conhost.exe companion. These pile up and orphan when the parent dies. Users saw dozens of Tlamatini-icon processes in Task Manager.
I built a three-tier orphan reaper: Tier 1 runs after every tool call, Tier 2 runs after the LLM response, Tier 3 runs at shutdown. On top of that, a monkey-patch on subprocess.Popen sets CREATE_NO_WINDOW by default, so future tools get the fix for free.
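The Popen patch can be sketched roughly like this. It is one possible shape under stated assumptions, not the code Tlamatini ships: the class name QuietPopen is hypothetical, and the flag is only applied when the caller has not passed creationflags explicitly.

```python
import subprocess
import sys

# subprocess.CREATE_NO_WINDOW only exists on Windows; fall back to 0 elsewhere.
_NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)
_OriginalPopen = subprocess.Popen


class QuietPopen(_OriginalPopen):
    """Popen subclass that defaults creationflags to CREATE_NO_WINDOW on Windows."""

    def __init__(self, *args, **kwargs):
        if sys.platform == "win32":
            # setdefault keeps any creationflags the caller chose explicitly.
            kwargs.setdefault("creationflags", _NO_WINDOW)
        super().__init__(*args, **kwargs)


# Install the patch; subprocess.run and friends look up Popen at call time.
subprocess.Popen = QuietPopen

result = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())
```

Patching at the Popen level means helpers such as subprocess.run pick up the default too, though code that imported Popen by name before the patch was installed keeps the original class.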
4. Local-first is a feature, not a limitation
The decision to make Ollama the default (not Claude API, not OpenAI) was one I went back and forth on. Cloud models are smarter. But local-first means your code never leaves your machine, there are no API costs for basic usage, it works offline, and there is no vendor lock-in.
Users who want cloud quality can opt in per-request. But the default is private. In 2026, that matters.
Try it
- GitHub: github.com/XAIHT/Tlamatini
- One-minute demo: youtube.com/watch?v=4MyRXBahHuU
- Stack: Django 5 + Channels, LangChain, FAISS, Ollama. GPL-3.0.
Five-minute setup: clone, pip install, migrate, runserver. That's it.
I'd love feedback — especially on the RAG architecture and the ACPX multi-agent orchestration. What would you add? What would you do differently?