This is a submission for the GitHub Finish-Up-A-Thon Challenge
What I Built
Yumii is an open-source, locally-run AI companion with a Live2D avatar, real-time voice, and six distinct personalities — and she now actually remembers you.
She listens to your voice. She responds through an animated avatar. Her face reacts to what you say. And as of this sprint, she builds a long-term memory of who you are — across every conversation, every session, every restart.
Beyond the core AI experience, Yumii focuses heavily on accessibility and ease of use. The project features a dedicated website, comprehensive documentation, a one-line installation command, and a streamlined CLI onboarding process that helps users get started quickly.
GitHub: https://github.com/CodeNeuron58/Yumi
License: MIT
Demo
Before vs After — at a glance:
| Before | After |
|---|---|
| Forgot you on every restart | Persistent memory across sessions |
| One shared conversation, no history | Full session management — create, resume, rename |
| No user facts | Auto-extracts facts from every conversation turn |
| No CLI beyond launch | /memory, /sessions, /resume, /forget commands |
| Wiped on exit | Survives restarts |
| No dedicated website | Full documentation website |
| Basic audio capture | Active VAD — only speaks when you actually do |
| Manual setup | One-line installation |
The Comeback Story
I saw Project Ava and a few other AI companion demos and thought — I could build that.
I enjoy anime. The idea of an anime-style AI that actually talks and reacts to you genuinely excited me. So I built a scrappy version that barely held together. It worked, I was happy, and I dropped it. I had only built it for fun, so I didn't care that much.
Months later I saw the same concept being sold as a subscription product. That bothered me. I had already built this. A scrappy version, yes, but I had built it. And I thought to myself that I could build it properly and make it free and open source.
So I restarted. Proper CLI. Web interface. Clean architecture. VAD, STT, LLM, TTS — the full real-time voice loop. And one hard rule: it has to run on a CPU. I don't have a GPU. Most people don't. If it needs one, it's not really for everyone.
That's what you're looking at now. Still alpha, still rough in places, not yet user-tested beyond myself. But it works, it's free, and it runs locally on your machine.
Note on the demo: I'm skipping a video walkthrough. The Live2D model I use locally has redistribution restrictions, so I can't share footage of it publicly. Voice, memory, and everything else work fine as far as I can tell, and Yumii runs without an avatar if you want to try it yourself.
What's next
Making her actually do things.
Right now Yumii talks. What I want is for her to act — search the web, read your emails, help you shop, browse on your behalf. A real assistant, not just a companion.
The plan is to give her tool access via MCP (Model Context Protocol).
Contribute
Yumii is fully open source under the MIT license. If you're a developer who builds with LangGraph, vector databases, or voice pipelines, or you just find this kind of project interesting, contributions are genuinely welcome.
I'd love to hear what other developers think of the architecture, the approach, or what they'd build on top of it. Open an issue, start a discussion, or just drop a comment here. Every perspective helps.
Small Note
Yumii is still in a very early stage and actively being developed, so expect rough edges here and there.
This is my first time properly sharing the project outside LinkedIn, and I'd genuinely love feedback, ideas, or contributions from fellow developers as I keep building it.
My Experience with GitHub Copilot
Copilot was running throughout this sprint mostly through tab completion — and that's where it genuinely helped.
The repetitive async patterns across the session and memory modules are where it earned its place. Once it saw the structure of the first method, it completed the rest almost entirely — correct SQL, correct await placement, correct error handling. What would've taken an hour took fifteen minutes.
The SQLite schema was another moment. I typed CREATE TABLE sessions ( and it suggested the full schema, including columns I hadn't thought of yet but ended up needing.
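To make the session side concrete, here is a minimal sketch of what a sessions table and a few helpers could look like using Python's built-in sqlite3 module. The column names (id, title, created_at, updated_at) and the helper functions are hypothetical, chosen for illustration; they are not copied from the Yumii repository, and the real schema may differ.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema: column names are illustrative, not taken from the repo.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
"""


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the sessions table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allow access by column name
    conn.executescript(SCHEMA)
    return conn


def create_session(conn: sqlite3.Connection, title: str = "New chat") -> str:
    """Insert a new session row and return its id."""
    session_id = uuid.uuid4().hex
    now = _now()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sessions (id, title, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (session_id, title, now, now),
        )
    return session_id


def rename_session(conn: sqlite3.Connection, session_id: str, title: str) -> bool:
    """Rename a session; returns False if the id does not exist."""
    with conn:
        cur = conn.execute(
            "UPDATE sessions SET title = ?, updated_at = ? WHERE id = ?",
            (title, _now(), session_id),
        )
    return cur.rowcount == 1


def get_session(conn: sqlite3.Connection, session_id: str):
    """Fetch one session row (used when resuming), or None if missing."""
    return conn.execute(
        "SELECT * FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
```

Pointing open_db at a file path instead of ":memory:" is what would let sessions survive a restart in a setup like this.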
It didn't make any decisions. The architecture, the three-database design, the fire-and-forget fact extraction, the async queue pipeline — all of that was thought through manually. But for translating decisions into correct boilerplate fast, it was genuinely useful.
I also used Claude for cross-checking integration points — particularly verifying the LangChain tool interface when I added web search. Both tools, different jobs.

