<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Georgy Dev</title>
    <description>The latest articles on DEV Community by Georgy Dev (@georgydev).</description>
    <link>https://dev.to/georgydev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3773917%2Fa8f24f18-18f6-415a-877d-c54fb8995b3a.png</url>
      <title>DEV Community: Georgy Dev</title>
      <link>https://dev.to/georgydev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/georgydev"/>
    <language>en</language>
    <item>
      <title>Building Voice AI NPCs in Unreal Engine: Speech Recognition to Lip Sync Pipeline</title>
      <dc:creator>Georgy Dev</dc:creator>
      <pubDate>Sun, 15 Feb 2026 13:42:49 +0000</pubDate>
      <link>https://dev.to/georgydev/how-to-create-ai-npcs-in-unreal-engine-with-speech-recognition-tts-and-metahuman-lip-sync-with-58ih</link>
      <guid>https://dev.to/georgydev/how-to-create-ai-npcs-in-unreal-engine-with-speech-recognition-tts-and-metahuman-lip-sync-with-58ih</guid>
      <description>&lt;p&gt;I recently put together a demo project that shows how to create fully interactive AI NPCs in Unreal Engine using speech recognition, AI chatbots, text-to-speech, and realistic lip synchronization with facial animations. The entire system is built with Blueprints and works across Windows, Linux, Mac, iOS, and Android.&lt;/p&gt;

&lt;p&gt;If you’ve been exploring AI NPC solutions like ConvAI or Charisma.ai, you’ve probably noticed the tradeoffs: metered API costs that scale with your player count, latency from network round trips, and dependency on cloud infrastructure. This modular approach gives you more control: run components locally or pick your own cloud providers, avoid per-conversation billing, and keep your players’ interactions private if needed. You own the pipeline, so you can optimize for what actually matters to your game. With local inference and direct audio-based lip sync, you can also achieve lower latency and more realistic facial animation; the demo video below shows the difference.&lt;/p&gt;

&lt;p&gt;Here’s an example of the real-time lip sync quality achievable with this system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo33j1y0lvviz04l65iy8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo33j1y0lvviz04l65iy8.gif" alt="Real-time lip sync demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;
  &lt;iframe src="https://www.youtube.com/embed/PVWx67wSRgI"&gt;&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;What This System Does&lt;/h2&gt;

&lt;p&gt;The workflow creates a natural conversation loop with an AI character:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Player speaks into the microphone → speech recognition converts it to text&lt;/li&gt;
&lt;li&gt; Text goes to an AI chatbot (OpenAI, Claude, DeepSeek, etc.) → AI generates a response&lt;/li&gt;
&lt;li&gt; Response is converted to speech via text-to-speech&lt;/li&gt;
&lt;li&gt; Character’s lips sync perfectly with the spoken audio&lt;/li&gt;
&lt;/ol&gt;
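&lt;p&gt;The four steps above can be sketched in plain Python (conceptual only; the actual system is wired with Blueprint nodes, and every function name here is an illustrative placeholder, not plugin API):&lt;/p&gt;

```python
# Conceptual sketch of one conversation turn. The real system is built
# with Blueprints; these names are illustrative placeholders.

def run_turn(player_input, chatbot, text_to_speech, speech_to_text=None):
    """One turn: player input in, synthesized reply audio out.

    If speech_to_text is provided, player_input is microphone audio;
    otherwise it is typed text (the recognition stage is optional).
    """
    text = speech_to_text(player_input) if speech_to_text else player_input
    reply_text = chatbot(text)
    # The returned audio is what gets fed to the lip sync generator
    # while it plays back, so the mouth tracks the synthesized speech.
    return text_to_speech(reply_text)
```

&lt;p&gt;Each stage is pluggable, which is the point of the modular approach: any of them can be a local model or a cloud provider.&lt;/p&gt;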

&lt;p&gt;The speech recognition part is optional — you can also just type text directly to the chatbot if that works better for your use case.&lt;/p&gt;

&lt;h2&gt;The Plugin Stack&lt;/h2&gt;

&lt;p&gt;This implementation uses several plugins that work together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://www.fab.com/listings/b514294e-e78b-4b8b-ad21-78ce51dc7e8c" rel="noopener noreferrer"&gt;Runtime MetaHuman Lip Sync&lt;/a&gt;&lt;/strong&gt; — Generates facial animation from audio (&lt;strong&gt;&lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/overview/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://www.fab.com/listings/00ffc308-d7f9-4142-ac4c-4aeaa75ab54b" rel="noopener noreferrer"&gt;Runtime Speech Recognizer&lt;/a&gt;&lt;/strong&gt; — Converts speech to text (optional — you can also enter text manually) (&lt;strong&gt;&lt;a href="https://docs.georgy.dev/runtime-speech-recognizer/overview" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://www.fab.com/listings/d099709c-b984-4b79-8e17-a363fdbe68db" rel="noopener noreferrer"&gt;Runtime AI Chatbot Integrator&lt;/a&gt;&lt;/strong&gt; — Connects to AI providers and TTS services (&lt;strong&gt;&lt;a href="https://docs.georgy.dev/runtime-ai-chatbot-integrator/overview" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://www.fab.com/listings/66e0d72e-982f-4d9e-aaaf-13a1d22efad1" rel="noopener noreferrer"&gt;Runtime Audio Importer&lt;/a&gt;&lt;/strong&gt; — Processes audio at runtime (&lt;strong&gt;&lt;a href="https://docs.georgy.dev/runtime-audio-importer/overview" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://www.fab.com/listings/0dab646f-73b4-46d9-bb7e-8e6c12bdd808" rel="noopener noreferrer"&gt;Runtime Text To Speech&lt;/a&gt;&lt;/strong&gt; — Optional local TTS synthesis (&lt;strong&gt;&lt;a href="https://docs.georgy.dev/runtime-text-to-speech/overview" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/strong&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All plugins are designed to work together with Blueprint nodes; no C++ is required.&lt;/p&gt;

&lt;p&gt;The plugin also supports &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/how-to-use-with-custom-characters" rel="noopener noreferrer"&gt;custom characters&lt;/a&gt; beyond MetaHumans — Daz Genesis, Character Creator, Mixamo, ReadyPlayerMe, and any character with blend shapes.&lt;/p&gt;

&lt;h2&gt;Why CPU Inference?&lt;/h2&gt;

&lt;p&gt;The lip sync runs on CPU, not GPU. This might seem counterintuitive, but for small, frequent operations like lip sync (processing every 10ms by default), CPU is actually faster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  GPU has overhead from PCIe transfers and kernel launches&lt;/li&gt;
&lt;li&gt;  At batch size 1 with rapid inference, this overhead exceeds compute time&lt;/li&gt;
&lt;li&gt;  Game engines already saturate the GPU with rendering and physics&lt;/li&gt;
&lt;li&gt;  CPU avoids resource contention and unpredictable latency spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The transformer-based model is lightweight enough that most mid-tier CPUs handle it fine in real time. For weaker hardware, you can &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/plugin-configuration" rel="noopener noreferrer"&gt;adjust settings&lt;/a&gt; like processing chunk size or switch to a more optimized model variant.&lt;/p&gt;
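&lt;p&gt;Some back-of-the-envelope numbers behind the batch-size-1 argument (the 16 kHz mono sample rate is an assumption for illustration; the 10 ms chunk is the default mentioned above):&lt;/p&gt;

```python
# Rough per-chunk arithmetic. The 16 kHz sample rate is assumed for
# illustration; the 10 ms chunk size is the stated default.

def samples_per_chunk(sample_rate_hz, chunk_ms):
    # Number of audio samples each inference call sees.
    return sample_rate_hz * chunk_ms // 1000

def inferences_per_second(chunk_ms):
    # How often the model runs for continuous audio.
    return 1000 // chunk_ms

# At 16 kHz with 10 ms chunks: 160 samples per call, 100 calls a second.
# Each call does very little compute, so fixed per-call GPU costs (PCIe
# transfer, kernel launch) dominate, while a CPU call has almost none.
```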

&lt;h2&gt;Animation Blueprint Setup&lt;/h2&gt;

&lt;p&gt;Setting up the lip sync in your Animation Blueprint is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  In the Event Graph, create your lip sync generator on Begin Play&lt;/li&gt;
&lt;li&gt;  In the Anim Graph, add the blend node and connect your character’s pose&lt;/li&gt;
&lt;li&gt;  Connect the generator to the blend node&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9gncgntj5nkh9f2zu6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9gncgntj5nkh9f2zu6k.png" alt="Blend Realistic MetaHuman Lip Sync"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/setup-guide" rel="noopener noreferrer"&gt;setup guide&lt;/a&gt; walks through this step-by-step, with different tabs for Standard vs Realistic models.&lt;/p&gt;

&lt;h2&gt;Audio Processing&lt;/h2&gt;

&lt;p&gt;The system connects audio through delegates. For example, with microphone input (&lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/blueprints/realistic-lip-sync-during-audio-capture" rel="noopener noreferrer"&gt;copyable nodes&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Create a &lt;a href="https://docs.georgy.dev/runtime-audio-importer/sound-waves/capturable-sound-wave" rel="noopener noreferrer"&gt;Capturable Sound Wave&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Bind to its audio data delegate&lt;/li&gt;
&lt;li&gt;  Pass audio chunks to your lip sync generator&lt;/li&gt;
&lt;li&gt;  Start capturing&lt;/li&gt;
&lt;/ul&gt;
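&lt;p&gt;The delegate wiring can be pictured with plain callbacks (a conceptual Python sketch; the actual plugin exposes Blueprint delegates, and the class and method names here are invented for illustration):&lt;/p&gt;

```python
# Sketch of delegate-style audio forwarding. The real plugin uses
# Blueprint delegates; these names are invented for illustration.

class AudioCaptureSource:
    def __init__(self):
        self._handlers = []

    def bind(self, handler):
        # Equivalent of binding to the audio data delegate.
        self._handlers.append(handler)

    def deliver(self, chunk):
        # Called once per captured buffer; fans the chunk out to every
        # bound listener, e.g. a function feeding the lip sync generator.
        for handler in self._handlers:
            handler(chunk)
```

&lt;p&gt;Once a listener that feeds the lip sync generator is bound, starting capture is all that remains: every subsequent chunk flows through automatically.&lt;/p&gt;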

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F093g1n3m1s8aoqfjxk9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F093g1n3m1s8aoqfjxk9l.png" alt="Realistic Lip Sync During Audio Capture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/audio-processing" rel="noopener noreferrer"&gt;audio processing guide&lt;/a&gt; covers different audio sources: microphone, TTS, audio files, and streaming buffers.&lt;/p&gt;

&lt;p&gt;You can also &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/plugin-configuration#combining-with-facial-and-body-animations" rel="noopener noreferrer"&gt;combine lip sync with custom animations&lt;/a&gt; for idle gestures or emotional expressions.&lt;/p&gt;

&lt;h2&gt;Multilingual Support&lt;/h2&gt;

&lt;p&gt;Since the lip sync analyzes audio phonemes directly, it works with any spoken language without language-specific configuration. Just feed it the audio and it generates the appropriate mouth movements — whether that’s English, Mandarin, Arabic, or anything else.&lt;/p&gt;

&lt;h2&gt;Testing the Demo&lt;/h2&gt;

&lt;p&gt;You can try the complete system yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://georgy.dev/runtime-metahuman-lip-sync-sts-demo-windows" rel="noopener noreferrer"&gt;Download Windows demo&lt;/a&gt;&lt;/strong&gt; (packaged, ready to run)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://georgy.dev/runtime-metahuman-lip-sync-sts-demo-source" rel="noopener noreferrer"&gt;Download source files&lt;/a&gt;&lt;/strong&gt; (UE 5.6+ project)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The demo includes several MetaHuman characters and shows all the features I’ve covered. It’s a good reference if you’re building something similar.&lt;/p&gt;

&lt;h2&gt;Performance Considerations&lt;/h2&gt;

&lt;p&gt;A few tips for optimization:&lt;/p&gt;

&lt;p&gt;For mobile/VR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Use the Standard Model for better frame rates&lt;/li&gt;
&lt;li&gt;  Increase &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/plugin-configuration#processing-chunk-size" rel="noopener noreferrer"&gt;processing chunk size&lt;/a&gt; (trades slight latency for CPU savings)&lt;/li&gt;
&lt;li&gt;  Adjust &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/plugin-configuration#performance-settings" rel="noopener noreferrer"&gt;thread counts&lt;/a&gt; based on your target hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For desktop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Realistic or Mood-Enabled models for maximum quality&lt;/li&gt;
&lt;li&gt;  Keep default 10ms chunk size for responsive lip sync&lt;/li&gt;
&lt;li&gt;  Use Original &lt;a href="https://docs.georgy.dev/runtime-metahuman-lip-sync/plugin-configuration#model-type" rel="noopener noreferrer"&gt;model type&lt;/a&gt; for best accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;General:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Enable streaming for both AI responses and TTS to minimize latency&lt;/li&gt;
&lt;li&gt;  Use VAD to avoid processing empty audio&lt;/li&gt;
&lt;li&gt;  For the Realistic model with TTS, external services (ElevenLabs, OpenAI) work better than local TTS due to ONNX runtime conflicts (though the Mood-Enabled model supports local TTS fine)&lt;/li&gt;
&lt;/ul&gt;
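&lt;p&gt;To illustrate the VAD tip, here is a minimal energy-based gate (a Python sketch with a hand-picked threshold; dedicated VAD implementations are far more robust):&lt;/p&gt;

```python
import math

# Minimal energy-based voice activity gate, assuming float samples in
# [-1, 1] and a hand-picked RMS threshold. Real VAD is more
# sophisticated; this only shows the idea of skipping lip sync
# inference on silent chunks.

def is_speech(chunk, rms_threshold=0.01):
    if not chunk:
        return False
    rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
    return rms >= rms_threshold
```

&lt;p&gt;Gating chunks this way means silent stretches cost nothing, which matters most on the mobile and VR targets discussed above.&lt;/p&gt;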

&lt;h2&gt;Use Cases&lt;/h2&gt;

&lt;p&gt;This system enables quite a few applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  AI NPCs in games with natural conversations&lt;/li&gt;
&lt;li&gt;  Virtual assistants in VR/AR&lt;/li&gt;
&lt;li&gt;  Training simulations with interactive characters&lt;/li&gt;
&lt;li&gt;  Digital humans for customer service&lt;/li&gt;
&lt;li&gt;  Virtual production and real-time cinematics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Blueprint-based setup makes it accessible even if you’re not comfortable with C++.&lt;/p&gt;

&lt;h2&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;The combination of offline speech recognition, flexible AI integration, quality TTS, and realistic lip sync creates some genuinely immersive interactions. All the plugins are on &lt;a href="https://www.fab.com/sellers/Georgy%20Dev" rel="noopener noreferrer"&gt;Fab&lt;/a&gt;, and there’s extensive &lt;a href="https://docs.georgy.dev/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; if you want to dig into specific features.&lt;/p&gt;

&lt;p&gt;For more examples and tutorials, check out the &lt;a href="https://www.youtube.com/@GeorgyDev/videos" rel="noopener noreferrer"&gt;lip sync video tutorials&lt;/a&gt; or join the &lt;a href="https://georgy.dev/discord" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you need custom development or have questions about enterprise solutions: &lt;a href="mailto:solutions@georgy.dev"&gt;solutions@georgy.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>gamedev</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
