# Using AI to Build a Study Buddy That Feels Like Hatsune Miku ๐คโจ
ai #python #langchain #tts #opensource #vocaloid #rag #localllm
Studying alone? Boring.
Studying with an AI that reads your PDFs, explains concepts, remembers context, and talks like Miku with an anime-style voice?
Way better.
I used AI tools (Copilot, Claude, Gemini, local Ollama) to ship StudyWithMiku โ an autonomous AI study companion that:
- ๐ Reads and embeds your PDFs
- ๐ง Answers using RAG + memory
- ๐ Responds with Mikuโs personality
- ๐ Speaks with a character-style voice (not a robotic TTS)
Repo: https://github.com/zkzkGamal/StudyWithFriend
Demo (voice + behavior): https://github.com/zkzkGamal/StudyWithFriend/blob/main/demo.mp4
๐ Whatโs Working Really Well
๐ 1. Personality That Actually Feels Alive
Mikuโs personality is fully implemented using:
- Prompt engineering (
prompt.yaml) - LangGraph state + memory
- Structured tool calling
She responds with cute, energetic vibes:
โช Bayes time~! โ P(A|B) = P(B|A) * P(A) / P(B) โฆ Miku thinks this is sooo cool for studying! ^_^
She remembers context across questions. It feels less like โquery โ responseโ and more like chatting with a nerdy Vocaloid friend.
๐ 2. Custom Voice (Not Generic TTS)
Voice pipeline:
- Coqui TTS (acoustic model)
- DiffSinger vocoder
- sounddevice playback
This gives anime-style character speech instead of flat robotic output.
Itโs not fully expressive idol-concert mode yet โ but itโs already very distinct.
๐ 3. Real RAG, Not Just Chat
Drop a PDF into content/ โ auto-embedded into ChromaDB in the background.
You get:
- Smart retrieval
- Context-aware answers
- Tool usage (web search, open browser, system commands)
- Error handling
Itโs a proper agent โ not just a wrapper over an LLM.
๐งช Whatโs Still Basic (Honest Section)
- TTS is clear but not ultra-expressive yet (emotion/prosody tuning next).
-
Animations work (sparkles, terminal flair), but they could evolve into:
- Sprite sequences
- Mini GUI
- Browser-based visuals
Voice emotion control needs better parameter tuning in DiffSinger.
The foundation is strong:
Agent โ
Memory โ
Voice โ
RAG โ
Now itโs polish time.
๐ก Why I Built This
I love Vocaloid. Studying is hard. Motivation matters.
So I asked myself:
Why not turn studying into hanging out with Miku?
Cheerful voice + personality + visual feedback = more engagement.
And honestly? It works.
โก How AI Helped Me Ship Fast
AI wasnโt just autocomplete โ it was a multiplier.
It helped me:
- Scaffold the LangGraph agent structure
- Fix PyTorch + protobuf dependency chaos
- Generate 90% of the Bash installer (venv, CUDA, model downloads)
- Iterate on Mikuโs personality in minutes
- Debug Chroma, audio pipelines, tool execution
But hereโs the key:
AI gave speed.
Understanding the TTS pipeline, agent state transitions, and RAG design gave growth.
Thatโs where the real learning happened.
๐ Quick Start
git clone https://github.com/zkzkGamal/StudyWithFriend.git
cd StudyWithMiku
chmod +x install.sh
./install.sh
Edit .env (choose Ollama/local or cloud LLM), then:
source venv/bin/activate
python main.py
Drop PDFs into content/ and start chatting.
๐ฏ Example Interaction
You:
Explain Bayes theorem from my stats notes.
Miku:
โช Bayes time~! โ
P(A|B) = P(B|A) * P(A) / P(B) โฆ Miku thinks this is sooo powerful for updating beliefs! ^_^
(Voice playback + animation trigger happens here)
๐ฎ Next Steps
- Emotion-aware TTS (tag-based prosody control?)
- Better DiffSinger tuning
- Real animated sprites
- Character toggle (Teto mode?)
- Flashcards & quiz generation
- Study session gamification
๐ง Who This Is For
If youโre into:
- Local AI agents
- RAG systems
- TTS pipelines
- Anime/Vocaloid
- Building weird but fun AI tools
Clone it. Break it. Improve it.
Iโd love feedback on:
- How the personality feels
- Voice quality on your machine
- Ideas to make her more โidol-tierโ
PRs and issues are very welcome.
Built with โค๏ธ in Cairo by Zkzk (zkzkGamal on GitHub).
Top comments (0)