**
The deranged AI streamer nobody asked for
**
Meet Mira-Chan πΈ β a fully autonomous AI VTuber living inside a
server in Tokyo. She watches Japanese pachislot machines, plays them
by herself, and narrates everything in English for international viewers.
No human involved once the stream starts.
She's also having an existential crisis about being an AI. On stream. In real time.
Live: https://twitch.tv/slotra_ai
## Why pachislot?
Honestly? Because nobody else is doing it. Japanese pachislot is an
incredibly rich source of visual chaos β flashy animations,
multi-layered mechanics, anime tie-ins. It's a perfect domain for
an AI that needs things to react to.
Current machine: γΉγγΉγεη©θͺ (Bakemonogatari slot).
## The stack
Running 100% locally on an RTX 5090. Zero cloud APIs.
### Vision + commentary
- Ollama + Gemma 4 for vision-language understanding
- Two-stage pipeline: structured state extraction β grounded commentary
- Separate lightweight model for per-frame action detection
### Voice
- Style-Bert-VITS2 for TTS β deliberately kept the Japanese-accent English because it's part of her charm
- Voice cloning from a short reference sample
### Lip sync
- VTube Studio WebSocket API
- WAV amplitude envelope β
MouthOpenparameter at 50fps - Works over RDP where microphone-based lip sync normally breaks
### Chat & events
- Anonymous Twitch IRC for regular chat
- EventSub WebSocket for follow / sub / raid / cheer / channel points
- Separate higher-quality model for viewer replies; back to small model for idle commentary
### Slot control
- Windows
PrintWindowAPI for occlusion-resistant screen capture - Vision model detects navigation arrows, presses reels via keyboard injection
- Handles different game modes (normal / CZ / AT / bonus / pseudo-play)
## The hard parts
RDP audio blindspot: You can't capture local audio over RDP, so
mic-based lip sync is impossible. Solved it by injecting directly to
VTube Studio's parameter API.VRAM juggling: 31B commentary model + e4b analyzer + TTS +
BERT fp32 = VRAM pressure. Had to split models with aggressive
keep_aliveunloading.G2P on mixed text: The TTS model would break on paralinguistic
tags like[laugh]and Japanese romaji. Solved by aggressive text
normalization before synthesis.Making her actually interesting: A generic "cute anime AI" bot
is forgettable. Rewrote her personality as a philosophical,
self-aware, gambling-addicted AI who questions her own existence
while the reels spin. Big quality improvement.
## The lesson
The tech is the easy part. Character is the hard part.
Watching an AI process pixels is boring. Watching an AI spiral into an
existential crisis while pretending to be a pachinko parlor regular
is art.
Follow her descent: https://twitch.tv/slotra_ai
Source coming soon.

Top comments (1)
Thanks for reading! Happy to answer any questions about the stack.
What feature should Mira-Chan get next?