Matt

I scoped a multi-agent TUI system in January. It sat dead for 4 months. Here is the comeback.

This is a submission for the GitHub Finish-Up-A-Thon Challenge.

What I built

TUI Master Agent — point it at a real open-source terminal-UI repo, and it studies the code, figures out the framework on its own, and generates a small original TUI in that same framework. Then it proves the generated app actually runs.

python tui_master.py https://github.com/Textualize/textual
# Cloning https://github.com/Textualize/textual
# Framework detected -> textual (signatures=3212, files=918)
# Generating with claude-opus-4-7 ...
# Wrote 1 file(s) to output/textual - "Pixel Pond"
# Verifying the generated TUI runs ...
# run_test headless -> exit 0
# OK - python main.py from output/textual

It works across two languages today:

  • Textual (Python) — studied Textualize/textual, generated an interactive "Pixel Pond" (drop pebbles, feed fish, toggle day/night).
  • Bubble Tea (Go) — studied charmbracelet/bubbletea, generated a Pomodoro timer.

Both generated apps are committed verbatim in examples/ — nothing hand-edited.

The comeback arc (the honest part)

On January 23, 2026 I scoped this as something much bigger: a multi-agent system with three specialized sub-agents (Pattern Learner, Validator, Termux Converter), a compounding "learning database," and six target frameworks. I wrote a 21KB architecture spec, opened an editor the next day, got two paragraphs into the orchestrator stub... and stopped.

The thing that killed it: I tried to design "patterns as a data structure" before I had anything that worked. The scope was a cliff. It sat dormant for four and a half months — an architecture document and zero implementation.

That whole spec is still in the repo, frozen and untouched at the v0.0.1-before tag. It's the literal "before" exhibit. You can diff it against main.

The Finish-Up-A-Thon deadline was the forcing function. In the final 24 hours I did the one thing January-me wouldn't: I cut the scope to the spine. No sub-agents. No learning DB. No clever pattern abstraction. Just the pipeline that makes the idea real — clone, detect, generate, verify — and got it working end-to-end on two frameworks.

This is not the finished vision. It is the start of finishing.

How the spine works

A single file, tui_master.py, runs the whole pipeline inline:

  1. Clone the repo (shallow).
  2. Detect the framework with heuristics — deliberately not AI. File-extension counts plus import-signature grep. import textual ranks above import rich, so a Rich-only repo is never misread as Textual; a .go file importing charmbracelet/bubbletea is recognized as Bubble Tea. On the real Textual repo this scores 3,212 framework signatures across 918 files. It's boring, fast, and correct — which is exactly why it's not a model call.
  3. Gather the README plus the few most framework-dense source files.
  4. Generate a small original TUI in one claude-opus-4-7 call.
  5. Write it to output/<framework>/.
  6. Verify it runs, headless, with no TTY. This is the part I'm proud of. Instead of trusting the model to cooperate, the orchestrator owns the check per framework: Textual apps are driven through Textual's own run_test() pilot; Bubble Tea apps are checked with go build, which confirms they compile. If the check fails, the run fails. No green checkmark theater.
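
As an illustration of step 2, here is a minimal sketch of a heuristic detector, not the code from tui_master.py. The SIGNATURES table, the detect_framework name, and the way hits and files are counted are all assumptions made for this example. The only behavior taken from the description above is that Textual outranks Rich and that a Go import of charmbracelet/bubbletea maps to Bubble Tea.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical signature table. List order is the priority order, so a repo
# that imports both Textual and Rich resolves to "textual".
SIGNATURES = [
    ("textual", ".py", re.compile(r"^[ \t]*(?:from|import)[ \t]+textual\b", re.M)),
    ("rich", ".py", re.compile(r"^[ \t]*(?:from|import)[ \t]+rich\b", re.M)),
    ("bubbletea", ".go", re.compile(r'"github\.com/charmbracelet/bubbletea"')),
]


def detect_framework(repo: Path) -> tuple[Optional[str], int, int]:
    """Return (framework, signature_count, matching_file_count) for a checkout."""
    signatures: Counter = Counter()
    files: Counter = Counter()
    for path in repo.rglob("*"):
        if not path.is_file():
            continue
        # Only read files whose extension some signature cares about.
        relevant = [sig for sig in SIGNATURES if sig[1] == path.suffix]
        if not relevant:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, _ext, pattern in relevant:
            found = len(pattern.findall(text))
            if found:
                signatures[name] += found
                files[name] += 1
    # The first framework in priority order with at least one hit wins.
    for name, _ext, _pattern in SIGNATURES:
        if signatures[name]:
            return name, signatures[name], files[name]
    return None, 0, 0
```

Encoding priority as list order keeps the tie-break explicit: a Textual app almost always imports Rich as well, so comparing raw hit counts could pick the wrong framework.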

The whole thing is ~330 lines, mypy + ruff clean, with a 17-test offline suite (detection, JSON parsing, path-traversal guards).
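
The path-traversal guard matters because the file names come from a model reply. Here is a rough sketch of how such a guard could work, under stated assumptions: the JSON shape ({"files": [{"path": ..., "content": ...}]}) and the safe_target and write_generated names are hypothetical and are not taken from the repo.

```python
import json
from pathlib import Path


def safe_target(output_dir: Path, relative: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside output_dir."""
    base = output_dir.resolve()
    target = (base / relative).resolve()
    # Rejects "../" escapes, absolute paths, and the output directory itself.
    if base not in target.parents:
        raise ValueError(f"unsafe path in model reply: {relative!r}")
    return target


def write_generated(reply: str, output_dir: Path) -> list[Path]:
    """Parse the (assumed) JSON reply and write each file under output_dir."""
    payload = json.loads(reply)
    # Validate every path first so one bad entry means nothing is written.
    targets = [
        (safe_target(output_dir, entry["path"]), entry["content"])
        for entry in payload["files"]
    ]
    written = []
    for target, content in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Checking all paths before the first write is a design choice in this sketch: a reply that mixes valid and malicious paths leaves the output directory untouched.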

What I deliberately did NOT build — and why that's the win

Most hackathon submissions overclaim. I'd rather own the constraint. Everything from the original spec that isn't built is filed as a roadmap issue, not hidden.

The detector already recognizes all six frameworks. Generation works for any of them. Only the automated run-verification is wired for two — and I'd rather ship two that genuinely run than six I can't prove.
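
To make "wired for two" concrete, this is one way a per-framework check could be run without a TTY. It is a sketch, not the harness in the repo: the runs_cleanly name, the 60-second default timeout, and the VERIFY_COMMANDS table are assumptions. A Textual entry would point at a small script that opens the app under run_test(); it is left out here because the generated app's class name is not known in advance.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical table: only frameworks with an entry here can be verified.
VERIFY_COMMANDS = {
    "bubbletea": ["go", "build"],
}


def runs_cleanly(command: list[str], cwd: Path, timeout: float = 60.0) -> bool:
    """Run a verifier command with no TTY; True only on exit code 0."""
    try:
        result = subprocess.run(
            command,
            cwd=cwd,
            stdin=subprocess.DEVNULL,  # nothing interactive to read from
            capture_output=True,       # keep the orchestrator's own output clean
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a missing toolchain counts as a failed verification.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in command so the sketch runs without Go installed.
    print(runs_cleanly([sys.executable, "-c", "pass"], Path(".")))
```

Treating a timeout or a missing binary as a failure keeps the rule simple: anything short of a clean exit fails the run.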

Demo

A TUI the agent generated from the Textual repo — "Pixel Pond" — captured headless straight from the verification harness, no hand-editing:

[Screenshot: Generated Textual TUI - Pixel Pond]

Both generated apps (this and a Bubble Tea Pomodoro timer) are committed verbatim in examples/ — clone and run them yourself.

My experience with GitHub Copilot

I'm going to be straight here, because honesty is the whole point of this post. This revival was built AI-assisted, and the heavy lifting — the heuristic detector, the generation contract, and the framework-native headless verification harness — was designed and written in an agentic Claude Code session (claude-opus-4-7), not hand-typed with Copilot riding shotgun.

Where Copilot-style assistance genuinely fits this codebase is worth naming, because it shaped how I structured it. The framework registry is exactly the repetitive, pattern-dense code autocomplete is best at — one declarative entry per framework (extensions, import signatures, run command, verifier). And the offline test suite is written so a function signature plus a one-line docstring is enough context to complete the body. Those are the two places a contributor with Copilot on will feel it immediately, and the repo is laid out to make that smooth.
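
For readers who want to picture that registry, a declarative entry might look something like the sketch below. The Framework dataclass, its field names, and the concrete values are assumptions based on the four fields listed above (extensions, import signatures, run command, verifier). The Bubble Tea run command in particular is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Framework:
    """One declarative registry entry (hypothetical field names)."""
    name: str
    extensions: tuple[str, ...]
    import_signatures: tuple[str, ...]
    run_command: tuple[str, ...]
    verifier: Optional[str]  # None means detection works but no automated check yet


REGISTRY = {
    fw.name: fw
    for fw in (
        Framework(
            name="textual",
            extensions=(".py",),
            import_signatures=("import textual", "from textual"),
            run_command=("python", "main.py"),
            verifier="run_test",
        ),
        Framework(
            name="bubbletea",
            extensions=(".go",),
            import_signatures=("github.com/charmbracelet/bubbletea",),
            run_command=("go", "run", "."),  # assumed, not stated in the post
            verifier="go build",
        ),
    )
}


def verified_frameworks() -> list[str]:
    """Names of frameworks that have an automated run check wired up."""
    return sorted(name for name, fw in REGISTRY.items() if fw.verifier)
```

Adding a framework is then a matter of appending one more entry, which is the kind of repetition autocomplete handles well.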

What I won't do is stage screenshots of suggestions I didn't accept to tick a box. This submission's entire thesis is finishing an abandoned project honestly — owning the cut scope, filing the unbuilt parts as real issues. Padding the AI-tooling section would contradict the one thing the post is about. How every line got written is in the commit history, open to inspect.

Try it

git clone https://github.com/in5devilinspace/tui-master-agent
cd tui-master-agent
uv pip install -e ".[dev]"        # or pip
export ANTHROPIC_API_KEY=sk-...
python tui_master.py https://github.com/charmbracelet/bubbletea

Repo: https://github.com/in5devilinspace/tui-master-agent

What's next

The spine is in. Next is the first sub-agent (Pattern Learner) and the cross-run memory that started this whole thing — but this time built on top of something that works, instead of in front of it. Turns out that's the only order that ever ships.
