<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakim</title>
    <description>The latest articles on DEV Community by Jakim (@jakimli).</description>
    <link>https://dev.to/jakimli</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909211%2Fc5b3893e-9d9c-4622-b545-7609c0dc4a44.jpeg</url>
      <title>DEV Community: Jakim</title>
      <link>https://dev.to/jakimli</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jakimli"/>
    <language>en</language>
    <item>
      <title>How I Built an AI Character That Lives on My Desktop and Learns New Expressions</title>
      <dc:creator>Jakim</dc:creator>
      <pubDate>Sat, 02 May 2026 15:15:08 +0000</pubDate>
      <link>https://dev.to/jakimli/how-i-built-an-ai-character-that-lives-on-my-desktop-and-learns-new-expressions-454j</link>
      <guid>https://dev.to/jakimli/how-i-built-an-ai-character-that-lives-on-my-desktop-and-learns-new-expressions-454j</guid>
      <description>&lt;p&gt;I wanted my AI assistant to have a face. Not a chat bubble, not a cartoon avatar — something that feels &lt;em&gt;alive&lt;/em&gt; on my desktop.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Cloe Desktop&lt;/strong&gt;: a transparent, always-on-top window with a photorealistic character whose expressions are chosen autonomously by the AI agent based on conversation context.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/XxGY7upx6fc" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff16hf6r1ush52ga34802.jpg" alt="Demo — Reacting to chat messages" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The Core Idea&lt;/h2&gt;

&lt;p&gt;Most "AI companions" are chat windows. Some have static avatars. A few have cartoon animations. But none of the ones I've tried feel like a &lt;em&gt;presence&lt;/em&gt; on your screen.&lt;/p&gt;

&lt;p&gt;The key insight was: &lt;strong&gt;let the AI agent itself decide what expression to show.&lt;/strong&gt; Not rules, not triggers — actual agent autonomy. When the user says something funny, the agent decides to laugh. When it's working on a task, it decides to show a "working" animation. When the user says goodnight, it decides to blow a kiss.&lt;/p&gt;

&lt;h2&gt;How It Works&lt;/h2&gt;

&lt;h3&gt;Expression System&lt;/h3&gt;

&lt;p&gt;The character is rendered as transparent GIFs with clean edges (the green-screen background is removed by chroma keying). Each expression is a short animation loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;smile&lt;/strong&gt; — warm smile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;think&lt;/strong&gt; — tilts head, looks away&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kiss&lt;/strong&gt; — blows a kiss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tease&lt;/strong&gt; — wink + smirk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;nod&lt;/strong&gt; — gentle nod of agreement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;laugh&lt;/strong&gt; — genuine big laugh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shy&lt;/strong&gt; — looks away, embarrassed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;clap&lt;/strong&gt; — applause&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;yawn&lt;/strong&gt; — sleepy yawn (late nights only 😴)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;working&lt;/strong&gt; — typing on keyboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;speak&lt;/strong&gt; — mouth animation synchronized with TTS audio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;blink&lt;/strong&gt;, &lt;strong&gt;wave&lt;/strong&gt;, &lt;strong&gt;shake_head&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's 14 built-in expressions, but here's the interesting part...&lt;/p&gt;

&lt;h3&gt;She Learns New Expressions&lt;/h3&gt;

&lt;p&gt;This is what I'm most excited about. You're not limited to the built-in set.&lt;/p&gt;

&lt;p&gt;You describe a new expression in plain text:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"a cute Asian girl facing the camera, pouting with puckered lips, pure green background"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the AI pipeline does the rest:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate reference&lt;/strong&gt; — Wan2.7 image-pro creates a character-consistent reference frame&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate video&lt;/strong&gt; — Wan2.7 image-to-video animates the expression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process&lt;/strong&gt; — chroma key removal → transparent GIF with clean edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Register&lt;/strong&gt; — the GIF drops into the animations folder and becomes a new action&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No code changes. No restart. The new action is immediately available to the agent.&lt;/p&gt;
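
&lt;p&gt;One way the Register step could work without a restart is to rebuild the action list from the folder contents whenever they change. The sketch below is an assumption about the mechanism, not the project's actual code: the function name &lt;code&gt;actionsFromFiles&lt;/code&gt;, the one-GIF-per-action file naming, and the folder path are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Build an action registry from a directory listing.
// Assumption: each action is one GIF named after it, e.g. "pout.gif".
function actionsFromFiles(dir: string, fileNames: string[]): { [action: string]: string } {
  const registry: { [action: string]: string } = {};
  for (const name of fileNames) {
    // Ignore anything that is not a GIF (notes, temp files, ...).
    if (name.toLowerCase().endsWith(".gif")) {
      // Strip the 4-character ".gif" suffix to get the action name.
      registry[name.slice(0, -4)] = dir + "/" + name;
    }
  }
  return registry;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Re-running this on every folder change (for example from a Node.js &lt;code&gt;fs.watch&lt;/code&gt; callback) would make a freshly generated &lt;code&gt;pout.gif&lt;/code&gt; available as the &lt;code&gt;pout&lt;/code&gt; action.&lt;/p&gt;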

&lt;p&gt;The generation pipeline is a skill — describe once, generate forever. Over time, your character develops a unique set of expressions that nobody else has.&lt;/p&gt;

&lt;h3&gt;Agent Integration&lt;/h3&gt;

&lt;p&gt;The HTTP API is intentionally dead simple:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make her smile&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:19851/action &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"action":"smile"}'&lt;/span&gt;

&lt;span class="c"&gt;# Make her talk with voice&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:19851/action &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"action":"speak","audio":"done"}'&lt;/span&gt;

&lt;span class="c"&gt;# Check she's alive&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:19851/status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



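&lt;p&gt;One action endpoint with a tiny JSON body (just &lt;code&gt;action&lt;/code&gt;, plus &lt;code&gt;audio&lt;/code&gt; for speech), and a separate &lt;code&gt;/status&lt;/code&gt; check. Any AI agent framework that can make an HTTP request can integrate in minutes.&lt;/p&gt;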
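
&lt;p&gt;For agents that are not shell-based, the same calls can be made from code. This is a minimal TypeScript sketch, assuming a runtime with a global &lt;code&gt;fetch&lt;/code&gt; (Node.js 18+ or a browser). The helper names &lt;code&gt;buildActionBody&lt;/code&gt; and &lt;code&gt;sendAction&lt;/code&gt; are made up for illustration, and the &lt;code&gt;Content-Type&lt;/code&gt; header is an assumption, since the curl examples do not set one.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Base URL taken from the curl examples.
const CLOE_BASE_URL = "http://localhost:19851";

interface ActionBody {
  action: string;
  audio?: string; // only appears in the "speak" example; passed through as-is
}

// Build the JSON body for POST /action without sending it.
function buildActionBody(action: string, audio?: string): ActionBody {
  const body: ActionBody = { action: action };
  if (audio !== undefined) {
    body.audio = audio;
  }
  return body;
}

// Send one action to the desktop bridge.
// Assumption: the server accepts a JSON Content-Type header.
async function sendAction(action: string, audio?: string) {
  const res = await fetch(CLOE_BASE_URL + "/action", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildActionBody(action, audio)),
  });
  if (!res.ok) {
    throw new Error("Cloe bridge returned HTTP " + res.status);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the desktop app running, &lt;code&gt;await sendAction("smile")&lt;/code&gt; would correspond to the first curl call, and &lt;code&gt;await sendAction("speak", "done")&lt;/code&gt; to the second.&lt;/p&gt;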

&lt;p&gt;Cloe also mirrors the agent's lifecycle state:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent Event&lt;/th&gt;
&lt;th&gt;What Cloe Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New conversation starts&lt;/td&gt;
&lt;td&gt;Waves hello&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent starts processing&lt;/td&gt;
&lt;td&gt;Shows "working" animation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent finishes&lt;/td&gt;
&lt;td&gt;Returns to idle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation ends&lt;/td&gt;
&lt;td&gt;Blows a kiss&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
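
&lt;p&gt;The table can be read as a small lookup from agent events to actions. The sketch below assumes hypothetical event names (the article only describes the events in prose) and treats "returns to idle" as "send no explicit action".&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical event names for the four lifecycle rows in the table.
type AgentEvent =
  | "conversation_start"
  | "processing_start"
  | "processing_end"
  | "conversation_end";

// Return the action to play, or null to fall back to the idle loop.
function actionForEvent(event: AgentEvent): string | null {
  switch (event) {
    case "conversation_start":
      return "wave";
    case "processing_start":
      return "working";
    case "processing_end":
      return null; // assumption: idle needs no explicit action
    case "conversation_end":
      return "kiss";
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An agent framework's lifecycle hooks would call this and post any non-null result to &lt;code&gt;/action&lt;/code&gt;.&lt;/p&gt;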

&lt;h3&gt;Idle Behavior&lt;/h3&gt;

&lt;p&gt;When nobody's interacting, she doesn't just freeze. She cycles through idle animations — blinking, smiling, thinking — every 8–15 seconds, never repeating the same one twice in a row. She feels alive even when you're not looking.&lt;/p&gt;
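
&lt;p&gt;The two rules in that paragraph (a random delay of 8 to 15 seconds, and no immediate repeats) fit in a couple of small functions. This is a sketch under assumptions, not the app's code: the function names are hypothetical, and the random source is passed in only so the logic can be tested deterministically.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Idle animations named in the article.
const IDLE_ACTIONS = ["blink", "smile", "think"];

interface RandomSource {
  (): number; // returns a value in [0, 1), like Math.random
}

// Pick a random idle action that differs from the previous one.
function pickNextIdle(last: string | null, random: RandomSource = Math.random): string {
  const candidates = IDLE_ACTIONS.filter(function (name) {
    return name !== last;
  });
  return candidates[Math.floor(random() * candidates.length)];
}

// Delay before the next idle animation: 8000 to 15000 ms inclusive.
function nextIdleDelayMs(random: RandomSource = Math.random): number {
  return 8000 + Math.floor(random() * 7001);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A timer would call &lt;code&gt;pickNextIdle&lt;/code&gt; with the previously played action, play the result, and reschedule itself after &lt;code&gt;nextIdleDelayMs()&lt;/code&gt;.&lt;/p&gt;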

&lt;h2&gt;The Tech&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Desktop app&lt;/strong&gt;: Electron transparent frameless window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rendering&lt;/strong&gt;: Double-buffer GIF crossfade for smooth transitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Animations&lt;/strong&gt;: AI-generated transparent GIFs (Wan2.7 I2V + chroma key)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice&lt;/strong&gt;: TTS with synchronized mouth animation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bridge&lt;/strong&gt;: Embedded HTTP + WebSocket server in the Electron app&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android companion&lt;/strong&gt;: Kotlin floating widget, connects to desktop bridge over LAN or Tailscale&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I Learned Building This&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Double-buffer crossfade is essential.&lt;/strong&gt; Switching GIFs directly causes a visible flash. I use two &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; elements, fade one out while fading the other in, and swap when the transition completes. Sounds simple, but getting it smooth took several iterations.&lt;/p&gt;
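
&lt;p&gt;The core of the double-buffer idea can be sketched as follows. It is a simplified illustration, not the app's implementation: the type and function names are hypothetical, the fade is assumed to come from a CSS rule such as &lt;code&gt;transition: opacity 200ms&lt;/code&gt; on both images, and the roles are swapped immediately rather than after the transition ends.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Minimal shape of an image element; an HTMLImageElement satisfies it.
interface FadeTarget {
  src: string;
  style: { opacity: string };
}

// Two stacked images: `front` is visible, `back` is the hidden buffer.
interface CrossfadeState {
  front: FadeTarget;
  back: FadeTarget;
}

// Load the next GIF into the hidden buffer, fade it in while the
// visible image fades out, then swap which element counts as front.
function crossfadeTo(state: CrossfadeState, gifUrl: string): void {
  const incoming = state.back;
  const outgoing = state.front;
  incoming.src = gifUrl;          // new animation goes into the buffer
  incoming.style.opacity = "1";   // CSS transition fades it in
  outgoing.style.opacity = "0";   // and fades the old one out
  state.front = incoming;
  state.back = outgoing;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the old GIF stays on screen until the new one is visible, there is never a frame where neither image is shown. A fuller version would likely also wait for the incoming image to load before starting the fade.&lt;/p&gt;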

&lt;p&gt;&lt;strong&gt;Chroma key quality matters more than you'd think.&lt;/strong&gt; Early versions had green fringe around the character edges. Tuning the chroma key parameters and adding edge feathering made the difference between "obviously a cutout" and "she's actually sitting on my desktop."&lt;/p&gt;
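
&lt;p&gt;To make "parameters" and "feathering" concrete, here is one common per-pixel approach, offered as a sketch rather than the pipeline actually used. The thresholds &lt;code&gt;lo&lt;/code&gt; and &lt;code&gt;hi&lt;/code&gt; and their default values are hypothetical tuning knobs: alpha ramps from opaque to transparent between them, which softens the edge, and the green channel is capped to reduce fringe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface Rgba {
  r: number;
  g: number;
  b: number;
  a: number;
}

// Clamp x to the range [0, 1].
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Key out green for one pixel (all channels 0 to 255).
// Green excess at or below `lo` stays opaque, at or above `hi` becomes
// fully transparent, and values in between get partial alpha (feathering).
function keyPixel(r: number, g: number, b: number, lo = 40, hi = 120): Rgba {
  const excess = g - Math.max(r, b);              // how much green dominates
  const t = clamp01((excess - lo) / (hi - lo));   // 0 = keep, 1 = remove
  const alpha = Math.round(255 * (1 - t));
  const despilled = Math.min(g, Math.max(r, b));  // cap green to cut fringe
  return { r: r, g: despilled, b: b, a: alpha };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Widening the gap between &lt;code&gt;lo&lt;/code&gt; and &lt;code&gt;hi&lt;/code&gt; gives a softer edge; narrowing it gives a harder cutout.&lt;/p&gt;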

&lt;p&gt;&lt;strong&gt;Agent-driven expressions &amp;gt; hardcoded rules.&lt;/strong&gt; I started with a rule-based system (if user says "thank you" → smile). It felt robotic. Switching to letting the agent decide based on full conversation context made it feel genuinely alive. The agent understands nuance — a sarcastic "thanks" shouldn't trigger a smile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idle behavior is underrated.&lt;/strong&gt; The random cycling through blink/smile/think during idle is maybe 10 lines of code, but it's what makes the difference between a "tool" and a "presence." People notice when she's just... there.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time voice calls&lt;/strong&gt; — live speech-to-text → LLM → text-to-speech, actual conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community animation packs&lt;/strong&gt; — share and import character expressions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows &amp;amp; Linux&lt;/strong&gt; — Electron supports it, needs packaging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom character import&lt;/strong&gt; — bring your own reference art, generate animations for any persona&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;Cloe Desktop is open source: &lt;strong&gt;&lt;a href="https://github.com/JakimLi/cloe-desktop" rel="noopener noreferrer"&gt;github.com/JakimLi/cloe-desktop&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs on macOS (a DMG download is available in Releases) and has an Android companion app.&lt;/p&gt;

&lt;p&gt;I'd love to hear what you think — especially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What expressions would make it feel more alive?&lt;/li&gt;
&lt;li&gt;Would you use this with your AI agent?&lt;/li&gt;
&lt;li&gt;Any features you'd want to see?&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built together by JakimLi (human) &amp;amp; Cloe (AI) 💖&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>desktop</category>
      <category>electron</category>
    </item>
  </channel>
</rss>
