DEV Community

zkaria gamal
zkaria gamal

Posted on

Meet StudyWithMiku ๐ŸŽค๐Ÿ“š โ€“ Your AI Anime Study Buddy That Actually Speaks & Animates!

 # Using AI to Build a Study Buddy That Feels Like Hatsune Miku ๐ŸŽคโœจ

ai #python #langchain #tts #opensource #vocaloid #rag #localllm

Studying alone? Boring.

Studying with an AI that reads your PDFs, explains concepts, remembers context, and talks like Miku with an anime-style voice?

Way better.

I used AI tools (Copilot, Claude, Gemini, local Ollama) to ship StudyWithMiku โ€” an autonomous AI study companion that:

  • ๐Ÿ“š Reads and embeds your PDFs
  • ๐Ÿง  Answers using RAG + memory
  • ๐ŸŽ€ Responds with Mikuโ€™s personality
  • ๐ŸŽ™ Speaks with a character-style voice (not a robotic TTS)

Repo: https://github.com/zkzkGamal/StudyWithFriend
Demo (voice + behavior): https://github.com/zkzkGamal/StudyWithFriend/blob/main/demo.mp4


๐Ÿš€ Whatโ€™s Working Really Well

๐ŸŽ€ 1. Personality That Actually Feels Alive

Mikuโ€™s personality is fully implemented using:

  • Prompt engineering (prompt.yaml)
  • LangGraph state + memory
  • Structured tool calling

She responds with cute, energetic vibes:

โ™ช Bayes time~! โ˜… P(A|B) = P(B|A) * P(A) / P(B) โ€ฆ Miku thinks this is sooo cool for studying! ^_^

She remembers context across questions. It feels less like โ€œquery โ†’ responseโ€ and more like chatting with a nerdy Vocaloid friend.


๐ŸŽ™ 2. Custom Voice (Not Generic TTS)

Voice pipeline:

  • Coqui TTS (acoustic model)
  • DiffSinger vocoder
  • sounddevice playback

This gives anime-style character speech instead of flat robotic output.

Itโ€™s not fully expressive idol-concert mode yet โ€” but itโ€™s already very distinct.


๐Ÿ“š 3. Real RAG, Not Just Chat

Drop a PDF into content/ โ†’ auto-embedded into ChromaDB in the background.

You get:

  • Smart retrieval
  • Context-aware answers
  • Tool usage (web search, open browser, system commands)
  • Error handling

Itโ€™s a proper agent โ€” not just a wrapper over an LLM.


๐Ÿงช Whatโ€™s Still Basic (Honest Section)

  • TTS is clear but not ultra-expressive yet (emotion/prosody tuning next).
  • Animations work (sparkles, terminal flair), but they could evolve into:

    • Sprite sequences
    • Mini GUI
    • Browser-based visuals
  • Voice emotion control needs better parameter tuning in DiffSinger.

The foundation is strong:
Agent โœ”
Memory โœ”
Voice โœ”
RAG โœ”

Now itโ€™s polish time.


๐Ÿ’ก Why I Built This

I love Vocaloid. Studying is hard. Motivation matters.

So I asked myself:

Why not turn studying into hanging out with Miku?

Cheerful voice + personality + visual feedback = more engagement.

And honestly? It works.


โšก How AI Helped Me Ship Fast

AI wasnโ€™t just autocomplete โ€” it was a multiplier.

It helped me:

  • Scaffold the LangGraph agent structure
  • Fix PyTorch + protobuf dependency chaos
  • Generate 90% of the Bash installer (venv, CUDA, model downloads)
  • Iterate on Mikuโ€™s personality in minutes
  • Debug Chroma, audio pipelines, tool execution

But hereโ€™s the key:

AI gave speed.
Understanding the TTS pipeline, agent state transitions, and RAG design gave growth.

Thatโ€™s where the real learning happened.


๐Ÿ›  Quick Start

git clone https://github.com/zkzkGamal/StudyWithFriend.git
cd StudyWithMiku
chmod +x install.sh
./install.sh
Enter fullscreen mode Exit fullscreen mode

Edit .env (choose Ollama/local or cloud LLM), then:

source venv/bin/activate
python main.py
Enter fullscreen mode Exit fullscreen mode

Drop PDFs into content/ and start chatting.


๐ŸŽฏ Example Interaction

You:
Explain Bayes theorem from my stats notes.

Miku:
โ™ช Bayes time~! โ˜… P(A|B) = P(B|A) * P(A) / P(B) โ€ฆ Miku thinks this is sooo powerful for updating beliefs! ^_^

(Voice playback + animation trigger happens here)


๐Ÿ”ฎ Next Steps

  • Emotion-aware TTS (tag-based prosody control?)
  • Better DiffSinger tuning
  • Real animated sprites
  • Character toggle (Teto mode?)
  • Flashcards & quiz generation
  • Study session gamification

๐Ÿง  Who This Is For

If youโ€™re into:

  • Local AI agents
  • RAG systems
  • TTS pipelines
  • Anime/Vocaloid
  • Building weird but fun AI tools

Clone it. Break it. Improve it.

Iโ€™d love feedback on:

  • How the personality feels
  • Voice quality on your machine
  • Ideas to make her more โ€œidol-tierโ€

PRs and issues are very welcome.

Built with โค๏ธ in Cairo by Zkzk (zkzkGamal on GitHub).

Top comments (0)