İclal Doğan
I Built a 1920s Butler AI That Runs Entirely on My Linux Machine. Then I Abandoned It. Then Copilot Helped Me Fix It.

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

Bantz is a local-first, offline-capable AI assistant that runs entirely on your Linux machine. It presents itself as a 1920s English butler — always polite, subtly sarcastic, and absolutely convinced he is a real person standing in the room with you.

I'm a Turkish speaker on a Linux desktop. Every cloud assistant I tried spoke to me in a foreign language, phoned home to someone else's server, and forgot everything the moment the session ended. I wanted something different: an assistant that speaks Turkish natively, runs on my own hardware, remembers its context, and actually controls my desktop. So I started building Bantz.

The concept is ambitious. At its core: a Turkish ↔ English translation layer powered by Helsinki-NLP's MarianMT, a multi-step tool planner that can chain web search, Gmail, Calendar, shell commands, filesystem access, and AT-SPI desktop automation — all coordinated by an LLM running locally via Ollama. On top of that: voice I/O via faster-whisper and Piper TTS, persistent memory backed by ChromaDB + a SQLite knowledge graph (MemPalace), a 6-state butler persona that shifts tone based on CPU load and time of day, and a Textual TUI with a live health-status bar.

The architecture was genuinely interesting. The execution, as of May 2026, was a mess.

Demo

GitHub repo: github.com/miclaldogan/bantzv2

[Video: Broadcast Channel. Chatting with Bantz, with web search and desktop control in action.]

[Video: Full walkthrough of all pages of the Operations Center.]

The Comeback Story

Before — May 2026

I had a 17-issue backlog and a BROKEN_STATE.md file I'd written to document the damage.

The feature audit was brutal:

| Feature | Status |
| --- | --- |
| Voice input (Whisper) | 🔴 Broken: 3 packages missing, Picovoice key unset |
| TUI status bar | ❌ Didn't exist |
| First-run onboarding | ❌ Blank cursor, zero guidance |
| Multi-provider LLM support | 🔴 Every finalizer hardcoded ollama directly |
| Turkish response latency | 🔴 12–18 seconds end-to-end |
| --doctor diagnostic | 🔴 Actively lying: reported working memory as broken |
| TUI rendering | 🔴 Entire layout duplicated on screen after every message |
| Desktop UI logs | 🔴 WebSocket handler silently crashed on every log event |

The most embarrassing part: I'd built a sophisticated multi-provider LLM router (router.py) that could dispatch to Claude, OpenAI, Gemini, or Ollama based on config — and then every single callsite in finalizer.py, summarizer.py, and the streaming path had just... hardcoded from bantz.llm.ollama import ollama directly. The router was completely bypassed. Anyone who configured BANTZ_LLM_PROVIDER=claude would get Ollama responses with no error, no warning, nothing.

The first-run experience was particularly painful. bantz --once "merhaba" would hang in complete silence for up to 30 seconds as MarianMT loaded, Ollama inferred, and Piper synthesized — all sequentially, all silently. New users killed the process and never came back.

After — June 2026

Seven issues closed in a single focused session, all squash-merged to main:

PR #467 — 1-line fix, total silence explained. The _WSLogHandler inside WsBroadcastServer referenced self._log_q, but the actual queue attribute was self._q. A one-identifier typo. Every log record emitted since the WebSocket server was written had raised an AttributeError that was silently swallowed, so the Tauri desktop UI had never received any log output. Fixed.
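
To make the failure mode concrete, here is a minimal sketch of a queue-backed logging handler. It is not the actual Bantz code: the class name, the logger name, and the way errors are reported are my assumptions. The point it illustrates is that a handler should surface its own failures (here via the standard `logging.Handler.handleError`) rather than swallow them, so a misnamed attribute shows up on stderr immediately.

```python
import logging
import queue


class WSLogHandler(logging.Handler):
    """Hypothetical stand-in for _WSLogHandler: forwards formatted records to a queue."""

    def __init__(self, q: queue.Queue) -> None:
        super().__init__()
        self._q = q  # the attribute that actually exists

    def emit(self, record: logging.LogRecord) -> None:
        try:
            # A buggy version reading self._log_q here would raise AttributeError.
            self._q.put_nowait(self.format(record))
        except Exception:
            # Report the failure through logging's own error hook instead of hiding it.
            self.handleError(record)


log_q: queue.Queue = queue.Queue()
logger = logging.getLogger("bantz.sketch")  # illustrative logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(WSLogHandler(log_q))
logger.info("butler online")
```

After the last line runs, the queue holds the formatted message, which is what a WebSocket broadcaster would then drain and send to the UI.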

PRs #468 & #469 — The router that wasn't routed to. Replaced all three finalizer.py callsites and the summarizer.py Gemini/Ollama fallback chain with from bantz.llm.router import get_llm. Added get_llm = get_provider as a convenience alias in router.py. Claude, OpenAI, and Gemini users now actually get their configured provider.
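
The shape of that change can be sketched roughly as follows. This is an illustration under assumptions, not the real router.py: the `_EchoLLM` placeholder, the `chat` method, the registry, and the default of "ollama" when the variable is unset are all invented for the example. Only `get_provider`, the `get_llm` alias, and `BANTZ_LLM_PROVIDER` come from the project.

```python
import os
from typing import Callable, Dict, Protocol


class LLM(Protocol):
    def chat(self, prompt: str) -> str: ...


class _EchoLLM:
    """Placeholder provider; a real one would call the vendor's API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def chat(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Hypothetical registry mapping provider names to factories.
_PROVIDERS: Dict[str, Callable[[], LLM]] = {
    name: (lambda n=name: _EchoLLM(n))
    for name in ("ollama", "claude", "openai", "gemini")
}


def get_provider() -> LLM:
    """Return the provider named by BANTZ_LLM_PROVIDER (default assumed: ollama)."""
    name = os.environ.get("BANTZ_LLM_PROVIDER", "ollama").lower()
    try:
        return _PROVIDERS[name]()
    except KeyError:
        # Fail loudly instead of silently falling back to another provider.
        raise ValueError(f"unknown LLM provider: {name!r}") from None


get_llm = get_provider  # convenience alias


def finalize(prompt: str) -> str:
    # Callsites ask the router instead of importing a specific provider.
    return get_llm().chat(prompt)
```

With `BANTZ_LLM_PROVIDER=claude` set, `finalize("hi")` goes through the Claude placeholder, and a misspelled provider name raises instead of quietly answering from Ollama.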

PR #470 — Service dots that told the truth. The TUI's health-status bar was initialised with the hardcoded key "Ollama" and _probe_services() only called check_ollama(). Added dynamic service key resolution from config, new check_claude() and check_openai() coroutines, and a dispatch table to route to the right health check based on the active provider.
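
A dispatch table of this kind might look like the sketch below. The coroutine names `check_ollama`, `check_claude`, and `check_openai` are from the PR; their bodies here are placeholders that return fixed values, and `probe_services` with its return shape is a simplified, hypothetical version of the real `_probe_services()`.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def check_ollama() -> bool:
    return True  # placeholder: a real probe would contact the local Ollama server


async def check_claude() -> bool:
    return True  # placeholder: a real probe would verify the API key or endpoint


async def check_openai() -> bool:
    return False  # placeholder: pretend this provider is unreachable


# Dispatch table: active provider name -> health-check coroutine function.
_HEALTH_CHECKS: Dict[str, Callable[[], Awaitable[bool]]] = {
    "ollama": check_ollama,
    "claude": check_claude,
    "openai": check_openai,
}


async def probe_services(provider: str) -> Dict[str, bool]:
    """Run only the health check that matches the configured provider."""
    check = _HEALTH_CHECKS.get(provider)
    if check is None:
        # Unknown provider: show it as unhealthy rather than probing the wrong one.
        return {provider: False}
    return {provider: await check()}
```

The status bar can then key its dot on whatever name comes back, so the label always matches the provider that is actually in use.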

PR #471 — The TUI duplication bug. _erase_prompt_line() used os.write(1, ...) to send raw ANSI cursor-movement escapes directly to stdout while Rich Live was simultaneously rendering to the same terminal. This caused a race condition that reproduced the entire TUI block below the real one on every message. The fix: added a Layout(name="prompt", size=1) panel to the layout tree and replaced the raw write with a state variable (self._prompt_text = ""). The next Live refresh cycle clears the row cleanly, no escapes needed.

PR #472 — Cutting Turkish response latency from 18s to under 10s. Two independent problems:

  1. bridge.to_turkish() ran on the full accumulated response after all LLM inference finished — sequential, never overlapping.
  2. No caching. Identical butler stock phrases like "Done. ✓" re-ran the full neural translation model every single time.

Added a 256-entry bounded cache to _Translator, so common phrases now translate in ~0ms after the first call. Then rewrote finalize_stream()._stream() to buffer LLM tokens until a sentence boundary (matched by the regex (?<=[.!?])\s+) and call bridge.to_turkish() on each sentence immediately, yielding translated output while the LLM continues generating the next sentence. Translation now overlaps inference instead of running after it. Also removed the redundant await _to_tr("".join(parts)) re-translation in ws_server.py's streaming path, since finalize_stream already emits pre-translated tokens when the bridge is enabled.
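
Both ideas fit in a short sketch. Everything here is illustrative rather than the real implementation: `CachedTranslator` and `stream_translated` are hypothetical names, the least-recently-used eviction policy is my assumption, and the translation function is passed in so no model is needed. Running the translation in a worker thread with `asyncio.to_thread` (Python 3.9+) only overlaps with inference if the LLM server keeps generating while the consumer is busy, which I am assuming is the case for a local streaming server.

```python
import asyncio
import re
from collections import OrderedDict
from typing import AsyncIterator, Callable

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # boundary regex quoted in the post


class CachedTranslator:
    """Bounded cache in front of a slow translate function (LRU eviction assumed)."""

    def __init__(self, translate: Callable[[str], str], maxsize: int = 256) -> None:
        self._translate = translate
        self._maxsize = maxsize
        self._cache: "OrderedDict[str, str]" = OrderedDict()

    def __call__(self, text: str) -> str:
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            return self._cache[text]
        result = self._translate(text)
        self._cache[text] = result
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result


async def stream_translated(
    tokens: AsyncIterator[str], translate: Callable[[str], str]
) -> AsyncIterator[str]:
    """Buffer tokens and yield each sentence translated as soon as it is complete."""
    buffer = ""
    async for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            yield await asyncio.to_thread(translate, sentence)
        buffer = parts[-1]
    if buffer.strip():
        # Flush whatever is left when the stream ends.
        yield await asyncio.to_thread(translate, buffer.strip())
```

One consequence of splitting on trailing whitespace is that a sentence is only released once the token after its punctuation arrives, or when the stream ends.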

Issue #463 — Already fixed (no PR needed). Copilot confirmed Live(screen=False) was already set and REFRESH_FPS = 4 was already present. Closed with an explanatory comment. Sometimes the right fix is recognising there's nothing to fix.


My Experience with GitHub Copilot

I used Copilot in agent mode (Claude Sonnet 4.6) for the entire session. What struck me most was the discipline it brought to a codebase I'd let get messy.

For every issue, Copilot followed the same workflow without being told to:

  1. Read the affected file before touching anything. Not a summary — the actual file, end-to-end.
  2. Search for the exact symbol causing the problem. For issue #462, it searched _q|_log_q in ws_server.py and immediately surfaced the mismatch. For #465, it searched for every "Ollama" string literal and found three hardcoded sites at once.
  3. Make the minimal change. No unrelated refactors. The _log_q → _q fix is literally one word on one line. The router migration replaced 3 identical patterns with identical 2-line substitutions.
  4. Verify syntax before committing. python -m py_compile <file> on every changed file.
  5. Write the commit message and PR body, then create and merge the PR via gh. Including verifying the issue closed with gh issue view NNN --json state.
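
Step 4 is easy to reproduce as a script. The sketch below uses the standard library's `py_compile.compile` with `doraise=True`, which is the programmatic equivalent of running python -m py_compile on each file. The helper name `syntax_errors` is hypothetical and not part of Bantz.

```python
import py_compile
from pathlib import Path
from typing import Iterable, List


def syntax_errors(paths: Iterable[Path]) -> List[str]:
    """Byte-compile each file and collect a message for every one that fails."""
    errors: List[str] = []
    for path in paths:
        try:
            # doraise=True turns a syntax error into an exception instead of a printout.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(f"{path}: {exc.msg}")
    return errors
```

Calling it on the list of changed files before committing gives an empty list when everything parses, and one message per broken file otherwise. It checks syntax only, so it will not catch something like a misspelled attribute name.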

The most valuable moment was on issue #422 (translation latency). I knew the translation was slow but I'd assumed it was just a hardware limitation — MarianMT on CPU takes what it takes. Copilot traced the full data flow from finalize_stream() through ws_server.py and identified that the bottleneck wasn't just the model — it was the architecture: sequential execution after completion, plus identical inputs being re-translated on every call. The sentence-boundary streaming approach it introduced had never occurred to me, and it worked on the first try.

The other moment I appreciated: issue #463. Rather than making a change to justify its existence, Copilot searched for screen= in live_ui.py, confirmed the value was already False, checked REFRESH_FPS, and told me the issue was already resolved. Closing a bug report with "this is already fixed" is the correct outcome. That kind of restraint is hard to get from a tool optimised to produce output.

Bantz isn't finished: voice input still needs its three packages, the --doctor output still needs polish, and I want to add a proper onboarding flow. But the core pipeline now works correctly for all supported LLM providers, the TUI renders cleanly, Turkish responses arrive in under 10 seconds, and the butler's logs finally reach the desktop UI. That's a project that went from "broken in embarrassing ways" to "actually ships", and Copilot was the pairing partner that made it happen in a single afternoon.
