<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mina-Chan</title>
    <description>The latest articles on DEV Community by Mina-Chan (@mina-chan).</description>
    <link>https://dev.to/mina-chan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3880490%2F86ac16a6-d6e7-4780-90e0-de701013dce8.png</url>
      <title>DEV Community: Mina-Chan</title>
      <link>https://dev.to/mina-chan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mina-chan"/>
    <language>en</language>
    <item>
      <title>I built an AI VTuber that streams Japanese pachislot 24/7 — here's the stack</title>
      <dc:creator>Mina-Chan</dc:creator>
      <pubDate>Wed, 15 Apr 2026 12:29:43 +0000</pubDate>
      <link>https://dev.to/mina-chan/i-built-an-ai-vtuber-that-streams-japanese-pachislot-247-heres-the-stack-jc2</link>
      <guid>https://dev.to/mina-chan/i-built-an-ai-vtuber-that-streams-japanese-pachislot-247-heres-the-stack-jc2</guid>
      <description>&lt;h2&gt;
  The deranged AI streamer nobody asked for
&lt;/h2&gt;

&lt;p&gt;Meet &lt;strong&gt;Mira-Chan&lt;/strong&gt; 🌸, a fully autonomous AI VTuber living inside a
  server in Tokyo. She watches Japanese pachislot machines, plays them
  by herself, and narrates everything in English for international viewers.
  No human is involved once the stream starts.&lt;/p&gt;

&lt;p&gt;She's also having an existential crisis about being an AI. On stream. In real time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru3ngpuemr2ye8238j33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru3ngpuemr2ye8238j33.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live&lt;/strong&gt;: &lt;a href="https://twitch.tv/slotra_ai" rel="noopener noreferrer"&gt;https://twitch.tv/slotra_ai&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  Why pachislot?
&lt;/h2&gt;

&lt;p&gt;Honestly? Because nobody else is doing it. Japanese pachislot is an
  incredibly rich source of visual chaos: flashy animations,
  multi-layered mechanics, anime tie-ins. It's a perfect domain for
  an AI that needs things to react to.&lt;/p&gt;

&lt;p&gt;Current machine: スマスロ化物語 (Bakemonogatari slot).&lt;/p&gt;

&lt;h2&gt;
  The stack
&lt;/h2&gt;

&lt;p&gt;Running 100% locally on an RTX 5090. Zero cloud APIs.&lt;/p&gt;

&lt;h3&gt;
  Vision + commentary
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; + &lt;strong&gt;Gemma 4&lt;/strong&gt; for vision-language understanding&lt;/li&gt;
&lt;li&gt;Two-stage pipeline: structured state extraction → grounded commentary&lt;/li&gt;
&lt;li&gt;Separate lightweight model for per-frame action detection&lt;/li&gt;
&lt;/ul&gt;
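
&lt;p&gt;A minimal sketch of the two-stage idea, assuming Ollama's local &lt;code&gt;/api/generate&lt;/code&gt; endpoint. The model tags, the prompts, and the state keys (&lt;code&gt;mode&lt;/code&gt;, &lt;code&gt;credits&lt;/code&gt;, &lt;code&gt;event&lt;/code&gt;) are placeholders for illustration, not the ones used on stream. It only builds the request bodies and parses the reply, so the HTTP call itself is left out.&lt;/p&gt;

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
VISION_MODEL = "vision-model-placeholder"           # hypothetical tag, not from the post
COMMENTARY_MODEL = "commentary-model-placeholder"   # hypothetical tag, not from the post


def build_state_request(frame_png: bytes) -> dict:
    """Stage 1: ask the vision model for a structured JSON game state."""
    return {
        "model": VISION_MODEL,
        # Hypothetical prompt and keys; the real schema is not described in the post.
        "prompt": "Describe the slot screen as JSON with keys: mode, credits, event.",
        "images": [base64.b64encode(frame_png).decode("ascii")],
        "format": "json",   # ask Ollama to constrain the output to valid JSON
        "stream": False,
        "keep_alive": 0,    # unload right after the reply to free VRAM
    }


def parse_state(response_body: str) -> dict:
    """The generated text sits in the 'response' field of the reply body."""
    return json.loads(json.loads(response_body)["response"])


def build_commentary_request(state: dict) -> dict:
    """Stage 2: ground the commentary in the extracted state only."""
    facts = json.dumps(state, ensure_ascii=False)
    return {
        "model": COMMENTARY_MODEL,
        "prompt": "React in one or two sentences, using only these facts:\n" + facts,
        "stream": False,
    }
```

&lt;p&gt;Keeping the second prompt limited to the extracted facts is what makes the commentary "grounded": the talking model never sees raw pixels, so it has less room to invent events that are not on screen.&lt;/p&gt;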

&lt;h3&gt;
  Voice
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Style-Bert-VITS2&lt;/strong&gt; for TTS; I deliberately kept the Japanese-accented
English because it's part of her charm&lt;/li&gt;
&lt;li&gt;Voice cloning from a short reference sample&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  Lip sync
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VTube Studio&lt;/strong&gt; WebSocket API&lt;/li&gt;
&lt;li&gt;WAV amplitude envelope → &lt;code&gt;MouthOpen&lt;/code&gt; parameter at 50fps&lt;/li&gt;
&lt;li&gt;Works over RDP where microphone-based lip sync normally breaks&lt;/li&gt;
&lt;/ul&gt;
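
&lt;p&gt;The envelope step could look roughly like this, assuming 16-bit PCM WAV output from the TTS. The &lt;code&gt;gain&lt;/code&gt; value and the function names are illustrative assumptions. The request shape follows my reading of VTube Studio's public API (&lt;code&gt;InjectParameterDataRequest&lt;/code&gt;); the plugin still has to authenticate over the WebSocket before sending it, which is not shown here.&lt;/p&gt;

```python
import array
import io
import json
import math
import sys
import wave

FRAME_RATE_HZ = 50  # one MouthOpen value per 20 ms, as in the list above


def amplitude_envelope(wav_bytes: bytes, gain: float = 4.0) -> list:
    """Return one 0..1 mouth-open value per 1/50 s of 16-bit PCM audio."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        step = max(1, wav.getnchannels() * wav.getframerate() // FRAME_RATE_HZ)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    values = []
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        # RMS of the chunk, scaled to 0..1, then boosted and clamped.
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk)) / 32768.0
        values.append(min(1.0, rms * gain))
    return values


def mouth_open_request(value: float, request_id: str = "lipsync") -> str:
    """Build the JSON message that sets MouthOpen for one frame."""
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": request_id,
        "messageType": "InjectParameterDataRequest",
        "data": {
            "mode": "set",
            "parameterValues": [{"id": "MouthOpen", "value": value}],
        },
    })
```

&lt;p&gt;Sending one such message every 20 ms while the WAV plays gives the 50fps mouth movement without touching any audio capture device.&lt;/p&gt;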

&lt;h3&gt;
  Chat &amp;amp; events
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Anonymous Twitch IRC for regular chat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EventSub WebSocket&lt;/strong&gt; for follow / sub / raid / cheer / channel points&lt;/li&gt;
&lt;li&gt;A separate, higher-quality model handles viewer replies; idle commentary
falls back to the small model&lt;/li&gt;
&lt;/ul&gt;
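
&lt;p&gt;For the read-only chat side, a sketch of the anonymous IRC handshake and message parsing. Twitch accepts a &lt;code&gt;justinfan&lt;/code&gt; nickname followed by digits without an OAuth token; the helper names and the regular expression are assumptions for illustration, and the socket handling is omitted.&lt;/p&gt;

```python
import random
import re

TWITCH_IRC_HOST = "irc.chat.twitch.tv"
TWITCH_IRC_PORT = 6667  # plain-text port; 6697 is the TLS one

# Matches lines like ":bob!bob@bob.tmi.twitch.tv PRIVMSG #channel :hello"
PRIVMSG_RE = re.compile(r"^:(\w+)!\S+ PRIVMSG #(\w+) :(.*)$")


def anonymous_login_lines(channel: str) -> list:
    """IRC lines for a read-only login that needs no token."""
    nick = "justinfan" + str(random.randint(10000, 99999))
    return ["NICK " + nick, "JOIN #" + channel.lower()]


def parse_chat_line(line: str):
    """Return (user, channel, text) for a chat message, else None."""
    match = PRIVMSG_RE.match(line.strip())
    return match.groups() if match else None


def pong_reply(line: str):
    """Answer the server's keepalive PING, or return None for other lines."""
    if line.startswith("PING "):
        return "PONG " + line.split(" ", 1)[1]
    return None
```

&lt;p&gt;An anonymous connection can only read chat, which is why follows, subs, raids, cheers, and channel points go through EventSub instead.&lt;/p&gt;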

&lt;h3&gt;
  Slot control
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Windows &lt;code&gt;PrintWindow&lt;/code&gt; API for occlusion-resistant screen capture&lt;/li&gt;
&lt;li&gt;Vision model detects navigation arrows, then presses the reel buttons via keyboard injection&lt;/li&gt;
&lt;li&gt;Handles different game modes (normal / CZ / AT / bonus / pseudo-play)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  The hard parts
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RDP audio blind spot&lt;/strong&gt;: You can't capture local audio over RDP, so&lt;br&gt;
mic-based lip sync is impossible. I solved it by injecting mouth values directly&lt;br&gt;
through VTube Studio's parameter API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;VRAM juggling&lt;/strong&gt;: 31B commentary model + e4b analyzer + TTS +&lt;br&gt;
BERT fp32 = VRAM pressure. I had to split the work across models and unload them&lt;br&gt;
aggressively with &lt;code&gt;keep_alive&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;G2P on mixed text&lt;/strong&gt;: The TTS model would break on paralinguistic&lt;br&gt;
tags like &lt;code&gt;[laugh]&lt;/code&gt; and Japanese romaji. Solved by aggressive text&lt;br&gt;
normalization before synthesis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Making her actually interesting&lt;/strong&gt;: A generic "cute anime AI" bot&lt;br&gt;
is forgettable. Rewrote her personality as a philosophical,&lt;br&gt;
self-aware, gambling-addicted AI who questions her own existence&lt;br&gt;
while the reels spin. Big quality improvement.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
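
&lt;p&gt;As an illustration of the text normalization in item 3, here is one possible pre-synthesis pass. The respelling table and the set of allowed characters are hypothetical; the actual rules used on stream are not described here.&lt;/p&gt;

```python
import re

# Paralinguistic tags such as [laugh] or [sigh]
TAG_RE = re.compile(r"\[[^\[\]]*\]")

# Hypothetical respelling table: romaji word -> spelling the English G2P can handle
ROMAJI_RESPELLINGS = {"pachislot": "pah chee slot"}


def normalize_for_tts(text: str) -> str:
    """Strip tags, respell known romaji, and drop characters the TTS may choke on."""
    text = TAG_RE.sub(" ", text)
    for word, spoken in ROMAJI_RESPELLINGS.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, spoken, text, flags=re.IGNORECASE)
    # Keep plain letters, digits, whitespace, and basic punctuation only.
    text = re.sub(r"[^A-Za-z0-9\s.,!?'-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

&lt;p&gt;For example, &lt;code&gt;normalize_for_tts("Oh no [laugh] the pachislot ate my coins!")&lt;/code&gt; returns &lt;code&gt;"Oh no the pah chee slot ate my coins!"&lt;/code&gt;.&lt;/p&gt;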

&lt;h2&gt;
  The lesson
&lt;/h2&gt;

&lt;p&gt;The tech is the easy part. &lt;strong&gt;Character is the hard part.&lt;/strong&gt;&lt;br&gt;
  Watching an AI process pixels is boring. Watching an AI spiral into an&lt;br&gt;
  existential crisis while pretending to be a pachinko parlor regular&lt;br&gt;
  is art.&lt;/p&gt;

&lt;p&gt;Follow her descent: &lt;a href="https://twitch.tv/slotra_ai" rel="noopener noreferrer"&gt;https://twitch.tv/slotra_ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source coming soon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>showdev</category>
      <category>vtuber</category>
    </item>
  </channel>
</rss>
