<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Frank Fu</title>
    <description>The latest articles on DEV Community by Frank Fu (@frankfu).</description>
    <link>https://dev.to/frankfu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3828188%2F8359f502-fbc2-439e-9d20-74ad63793d82.png</url>
      <title>DEV Community: Frank Fu</title>
      <link>https://dev.to/frankfu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/frankfu"/>
    <language>en</language>
    <item>
      <title>OpenAvatarChat: A Detailed Explanation of System Architecture and Handler Collaboration Mechanism</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:55:10 +0000</pubDate>
      <link>https://dev.to/frankfu/openavatarchat-a-detailed-explanation-of-system-architecture-and-handler-collaboration-mechanism-1gch</link>
      <guid>https://dev.to/frankfu/openavatarchat-a-detailed-explanation-of-system-architecture-and-handler-collaboration-mechanism-1gch</guid>
      <description>&lt;h2&gt;1. Overall Architecture&lt;/h2&gt;
&lt;h3&gt;1.1 System Hierarchical Structure&lt;/h3&gt;
&lt;p&gt;OpenAvatarChat adopts a layered architecture, divided into three levels from top to bottom:&lt;/p&gt;
&lt;img width="800" height="284" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-1024x364.png" alt=""&gt;&lt;p&gt;&lt;strong&gt;Architecture Description&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;ChatEngine (Top Layer)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The core of the system, managing the entire chat engine&lt;/li&gt;
&lt;li&gt;Responsible for initialization, configuration loading, and Handler management&lt;/li&gt;
&lt;li&gt;Supports concurrent multi-session operation, with each session running independently&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;2. &lt;strong&gt;ChatSession (Middle Layer)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Corresponds to a user session (one WebRTC connection)&lt;/li&gt;
&lt;li&gt;Manages all Handler instances within the session&lt;/li&gt;
&lt;li&gt;Manages data flow, threads, and queues&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3. &lt;strong&gt;Handler (Bottom Layer)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Functional modules responsible for specific task processing&lt;/li&gt;
&lt;li&gt;Includes: RTC client, VAD, ASR, LLM, TTS, Avatar, etc.&lt;/li&gt;
&lt;li&gt;Each Handler creates an independent instance when the session starts&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;1.2 Core Component Description&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;ChatEngine (src/chat_engine/chat_engine.py)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Responsibilities:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;System initialization and management&lt;/li&gt;
&lt;li&gt;Creation and initialization of the HandlerManager&lt;/li&gt;
&lt;li&gt;Creation and destruction of sessions&lt;/li&gt;
&lt;li&gt;Management of concurrent multi-session operation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Key Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def initialize(engine_config, app=None, ui=None):
    # Initialize the HandlerManager
    # Load all Handlers
    # Set up the client Handler

def create_client_session(session_info, client_handler):
    # Create a new ChatSession
    # Prepare the Handler environment
    # Return the session and Handler environment

def stop_session(session_id):
    # Stop and destroy the session&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;HandlerManager (src/chat_engine/core/handler_manager.py)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Responsibilities&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dynamically load Handler modules from configuration files&lt;/li&gt;
&lt;li&gt;Register Handler instances&lt;/li&gt;
&lt;li&gt;Manage the Handler lifecycle&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Key Data Structure&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;handler_registries = {
    "RtcClient": HandlerRegistry(
        base_info=HandlerBaseInfo(...),
        handler=rtc_client_instance,      # the RtcClient handler instance
        handler_config=rtc_client_config  # its configuration object
    ),
    "SileroVad": HandlerRegistry(...),
    ...
}&lt;/code&gt;&lt;/pre&gt;
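&lt;p&gt;The registry structure above can be sketched as plain dataclasses. This is a minimal illustration with assumed field names (&lt;code&gt;load_priority&lt;/code&gt; is hypothetical), not the project's actual definitions:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class HandlerBaseInfo:
    name: str
    load_priority: int = 0  # hypothetical field, for illustration only

@dataclass
class HandlerRegistry:
    base_info: HandlerBaseInfo
    handler: Optional[Any] = None          # the loaded handler instance
    handler_config: Optional[Dict] = None  # parsed configuration object

# The manager keys registries by Handler name, as in the structure above.
handler_registries: Dict[str, HandlerRegistry] = {
    "RtcClient": HandlerRegistry(base_info=HandlerBaseInfo(name="RtcClient")),
    "SileroVad": HandlerRegistry(base_info=HandlerBaseInfo(name="SileroVad")),
}
```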
&lt;p&gt;&lt;strong&gt;ChatSession (src/chat_engine/core/chat_session.py)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Responsibilities&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Manage data flow for a single session&lt;/li&gt;
&lt;li&gt;Create and manage Handler instances&lt;/li&gt;
&lt;li&gt;Data routing and distribution&lt;/li&gt;
&lt;li&gt;Thread management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Key Data Structure&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Data routing table: data type → Handler input queue
data_sinks = {
    ChatDataType.MIC_AUDIO: [
        DataSink(owner="SileroVad", sink_queue=vad_queue),
    ],
    ChatDataType.HUMAN_TEXT: [
        DataSink(owner="LLM_Bailian", sink_queue=llm_queue),
    ],
}

# Handler records: Handler name → Handler environment
handlers = {
    "SileroVad": HandlerRecord(env=HandlerEnv(...)),
    ...
}&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;2. Data Flow Process&lt;/h2&gt;
&lt;h3&gt;2.1 Complete Data Flow Architecture Diagram&lt;/h3&gt;
&lt;p&gt;The complete data flow is as follows:&lt;/p&gt;
&lt;img width="800" height="410" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-1-1024x526.png" alt=""&gt;&lt;h3&gt;2.2 Detailed Data Flow Process&lt;/h3&gt;
&lt;h4&gt;Step 1: Client Input&lt;/h4&gt;
&lt;img width="800" height="492" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-2.png" alt=""&gt;&lt;h4&gt;Step 2: Data Distribution (Subscription Distribution)&lt;/h4&gt;
&lt;img width="800" height="445" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-3.png" alt=""&gt;&lt;p&gt;&lt;strong&gt;Key Mechanisms&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;data_sinks&lt;/code&gt; is a mapping table from data types to Handler input queues.&lt;/li&gt;
&lt;li&gt;The system automatically finds all subscribers based on the data type.&lt;/li&gt;
&lt;li&gt;Data is simultaneously distributed to all Handlers that have subscribed to that data type.&lt;/li&gt;
&lt;/ul&gt;
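&lt;p&gt;The subscription mechanism can be sketched in a few lines. The names below are toy stand-ins for the real &lt;code&gt;ChatDataType&lt;/code&gt; enum and queue wiring, not the project's actual code:&lt;/p&gt;

```python
import queue

# Toy stand-ins for ChatDataType members; the real project uses an enum.
MIC_AUDIO, HUMAN_TEXT = "MIC_AUDIO", "HUMAN_TEXT"

class DataSink:
    def __init__(self, owner, sink_queue):
        self.owner = owner            # name of the subscribing Handler
        self.sink_queue = sink_queue  # that Handler's input queue

vad_queue, llm_queue = queue.Queue(), queue.Queue()

data_sinks = {
    MIC_AUDIO: [DataSink("SileroVad", vad_queue)],
    HUMAN_TEXT: [DataSink("LLM_Bailian", llm_queue)],
}

def distribute_data(data_type, payload):
    """Fan the payload out to every Handler subscribed to this data type."""
    for sink in data_sinks.get(data_type, []):
        sink.sink_queue.put(payload)

distribute_data(MIC_AUDIO, b"\x00\x01")  # only SileroVad's queue receives it
```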
&lt;h4&gt;Step 3: Handler Processing&lt;/h4&gt;
&lt;p&gt;Each Handler has an independent processing thread that reads data from its own input queue:&lt;/p&gt;
&lt;img width="800" height="319" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-4.png" alt=""&gt;&lt;h4&gt;Step 4: Chained Data Flow&lt;/h4&gt;
&lt;p&gt;Data automatically forms a processing chain based on the input and output definitions of the Handlers:&lt;/p&gt;
&lt;img width="800" height="277" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-5-1024x355.png" alt=""&gt;&lt;h4&gt;Step 5: Client Output&lt;/h4&gt;
&lt;img width="693" height="322" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-6.png" alt=""&gt;&lt;h3&gt;2.3 Key Data Structures: Queues and Routing&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Input Queues&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Client input queues (created by the RTC Client Handler)
input_queues = {
    EngineChannelType.AUDIO: asyncio.Queue(),
    EngineChannelType.VIDEO: asyncio.Queue(),
    EngineChannelType.TEXT: asyncio.Queue(),
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Handler Input Queues&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Each Handler has its own input queue
vad_input_queue = queue.Queue()      # Input queue for SileroVad
asr_input_queue = queue.Queue()      # Input queue for SenseVoice
llm_input_queue = queue.Queue()      # Input queue for LLM_Bailian
tts_input_queue = queue.Queue()      # Input queue for Edge_TTS
avatar_input_queue = queue.Queue()   # Input queue for AvatarMusetalk&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Data Routing Table (data_sinks)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Data type → list of Handlers that subscribe to this type
data_sinks = {
    ChatDataType.MIC_AUDIO: [
        DataSink(owner="SileroVad", sink_queue=vad_input_queue),
    ],
    ChatDataType.HUMAN_AUDIO: [
        DataSink(owner="SenseVoice", sink_queue=asr_input_queue),
    ],
    ChatDataType.HUMAN_TEXT: [
        DataSink(owner="LLM_Bailian", sink_queue=llm_input_queue),
    ],
    ChatDataType.AVATAR_TEXT: [
        DataSink(owner="Edge_TTS", sink_queue=tts_input_queue),
    ],
    ChatDataType.AVATAR_AUDIO: [
        DataSink(owner="AvatarMusetalk", sink_queue=avatar_input_queue),
    ],
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Output Queue Mapping&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# (Handler name, data type) → output queue
outputs = {
    ("AvatarMusetalk", ChatDataType.AVATAR_VIDEO): DataSink(
        sink_queue=output_queues[EngineChannelType.VIDEO]
    ),
}&lt;/code&gt;&lt;/pre&gt;
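&lt;p&gt;A minimal sketch of how such a mapping might be consulted when a Handler emits a result. The &lt;code&gt;submit_output&lt;/code&gt; helper and the string keys are illustrative assumptions, not the project's actual API:&lt;/p&gt;

```python
import queue

# Hypothetical names mirroring the mapping above; the real code keys on
# enum members (ChatDataType, EngineChannelType) rather than strings.
VIDEO = "VIDEO"
output_queues = {VIDEO: queue.Queue()}

# (handler name, data type) → client-bound output queue
outputs = {
    ("AvatarMusetalk", "AVATAR_VIDEO"): output_queues[VIDEO],
}

def submit_output(handler_name, data_type, item):
    """Route a Handler result to the client output queue, if one is registered."""
    sink_queue = outputs.get((handler_name, data_type))
    if sink_queue is None:
        return False  # intermediate data: it stays inside the pipeline
    sink_queue.put(item)
    return True
```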
&lt;h2&gt;3. The Essence of Handler&lt;/h2&gt;
&lt;h3&gt;3.1 What is a Handler?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;A Handler is an independent functional module&lt;/strong&gt;, and each Handler is responsible for a specific task:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RTC Client Handler&lt;/strong&gt;: Manages WebRTC connections, receives user input, and sends output&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SileroVad Handler&lt;/strong&gt;: Voice Activity Detection (VAD), detects whether the user is speaking&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SenseVoice Handler&lt;/strong&gt;: Speech Recognition (ASR), converts speech into text&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM Handler&lt;/strong&gt;: Large Language Model, generates response text&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TTS Handler&lt;/strong&gt;: Text-to-Speech (TTS), converts text into audio&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avatar Handler&lt;/strong&gt;: Avatar driving, generates video from audio&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3.2 The Nature of a Handler: Independent Threads&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Key Understanding&lt;/strong&gt;: Each Handler creates an &lt;strong&gt;independent thread&lt;/strong&gt; when the session starts.&lt;/p&gt;
&lt;img width="800" height="260" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-7.png" alt=""&gt;&lt;p&gt;&lt;strong&gt;Thread Operation Mode&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Core loop of the handler_pumper thread
def handler_pumper(session_context, handler_env, sinks, outputs):
    shared_states = session_context.shared_states
    input_queue = handler_env.input_queue  # this Handler's input queue

    while shared_states.active:  # keep running while the session is active
        try:
            # 1. Read data from the input queue
            input_data = input_queue.get_nowait()
        except queue.Empty:
            time.sleep(0.03)  # sleep for 30 ms when the queue is empty
            continue

        # 2. Call the Handler to process the data
        handler_result = handler_env.handler.handle(
            handler_env.context,
            input_data,
            handler_env.output_info
        )

        # 3. Submit the processed result, which distribute_data()
        #    then routes to the next Handler
        ChatDataSubmitter.submit(handler_result)&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;3.3 The Lifecycle of a Handler&lt;/h3&gt;
&lt;h4&gt;Stage 1: Load (load)&lt;/h4&gt;
&lt;p&gt;When the system starts, each Handler executes a load:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;handler.load(engine_config, handler_config)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Load model files&lt;/li&gt;
&lt;li&gt;Initialize global resources&lt;/li&gt;
&lt;li&gt;Prepare the Handler runtime environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SileroVad: Load the VAD model&lt;/li&gt;
&lt;li&gt;SenseVoice: Load the ASR model&lt;/li&gt;
&lt;li&gt;LLM: Initialize the API client&lt;/li&gt;
&lt;li&gt;Avatar: Load the avatar model&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Stage 2: Create Context (create_context)&lt;/h4&gt;
&lt;p&gt;When each session is created, an independent context is created for each Handler:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;handler_context = handler.create_context(session_context, handler_config)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create session-related states&lt;/li&gt;
&lt;li&gt;For example: LLM creates conversation history, ASR creates an audio buffer&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Stage 3: Handle (handle)&lt;/h4&gt;
&lt;p&gt;During the session, the Handler continuously processes data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;handler_result = handler.handle(context, inputs, output_definitions)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each Handler runs in its own thread&lt;/li&gt;
&lt;li&gt;It reads data from its own input queue&lt;/li&gt;
&lt;li&gt;After processing, it outputs the result&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Stage 4: Destroy Context (destroy_context)&lt;/h4&gt;
&lt;p&gt;When the session ends, the Handler context is cleaned up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;handler.destroy_context(handler_context)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Release session-related resources&lt;/li&gt;
&lt;li&gt;Clean up state data&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3.4 Interface Definition of Handlers&lt;/h3&gt;
&lt;p&gt;All Handlers inherit from &lt;code&gt;HandlerBase&lt;/code&gt; and must implement the following interfaces:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class HandlerBase(ABC):
    @abstractmethod
    def load(self, engine_config, handler_config):
        """Load the Handler (e.g., load models)"""
        pass

    @abstractmethod
    def create_context(self, session_context, handler_config):
        """Create the Handler context"""
        pass

    @abstractmethod
    def handle(self, context, inputs, output_definitions):
        """Process input data"""
        pass

    @abstractmethod
    def get_handler_detail(self, session_context, context):
        """Declare input and output data types"""
        return HandlerDetail(
            inputs={...},   # Input type definitions
            outputs={...}   # Output type definitions
        )

    @abstractmethod
    def destroy_context(self, context):
        """Destroy the Handler context"""
        pass
&lt;/code&gt;&lt;/pre&gt;
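&lt;p&gt;To make the contract concrete, here is a minimal, self-contained sketch of a Handler implementing this interface. &lt;code&gt;EchoHandler&lt;/code&gt; and the stub &lt;code&gt;HandlerDetail&lt;/code&gt; are illustrative stand-ins, not part of the OpenAvatarChat codebase:&lt;/p&gt;

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

# Minimal stand-in for the framework's HandlerDetail type.
@dataclass
class HandlerDetail:
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)

class HandlerBase(ABC):
    @abstractmethod
    def load(self, engine_config, handler_config): ...
    @abstractmethod
    def create_context(self, session_context, handler_config): ...
    @abstractmethod
    def handle(self, context, inputs, output_definitions): ...
    @abstractmethod
    def get_handler_detail(self, session_context, context): ...
    @abstractmethod
    def destroy_context(self, context): ...

# A hypothetical Handler that echoes text input back as output.
class EchoHandler(HandlerBase):
    def load(self, engine_config, handler_config):
        # One-time setup; real Handlers would load models here.
        self.prefix = handler_config.get("prefix", "echo: ")

    def create_context(self, session_context, handler_config):
        return {"count": 0}  # per-session state

    def handle(self, context, inputs, output_definitions):
        # handle() is a generator: it yields zero or more outputs per input.
        context["count"] += 1
        yield self.prefix + inputs

    def get_handler_detail(self, session_context, context):
        return HandlerDetail(inputs={"HUMAN_TEXT": None},
                             outputs={"AVATAR_TEXT": None})

    def destroy_context(self, context):
        context.clear()

handler = EchoHandler()
handler.load({}, {})
ctx = handler.create_context(None, {})
outputs = list(handler.handle(ctx, "hello", None))  # → ["echo: hello"]
handler.destroy_context(ctx)
```

The lifecycle calls mirror the four stages above: one &lt;code&gt;load&lt;/code&gt; per process, one &lt;code&gt;create_context&lt;/code&gt;/&lt;code&gt;destroy_context&lt;/code&gt; pair per session, and many &lt;code&gt;handle&lt;/code&gt; calls in between.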
&lt;h3&gt;3.5 Key Method of Handler: get_handler_detail&lt;/h3&gt;
&lt;p&gt;This is the key method for interaction between the Handler and the system. The Handler declares its inputs and outputs through this method:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def get_handler_detail(self, session_context, context) -&amp;gt; HandlerDetail:
    return HandlerDetail(
        inputs={
            ChatDataType.MIC_AUDIO: HandlerDataInfo(
                type=ChatDataType.MIC_AUDIO,
                # Other configurations...
            )
        },
        outputs={
            ChatDataType.HUMAN_AUDIO: HandlerDataInfo(
                type=ChatDataType.HUMAN_AUDIO,
                definition=output_definition,
            )
        }
    )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;How the System Uses It&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. During the &lt;code&gt;prepare_handler()&lt;/code&gt; stage, the system calls &lt;code&gt;get_handler_detail()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;2. Based on the returned &lt;code&gt;inputs&lt;/code&gt;, the system creates a data routing table:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for input_type, input_info in io_detail.inputs.items():
    sink_list = data_sinks.setdefault(input_type, [])
    data_sink = DataSink(
        owner=handler_name,
        sink_queue=handler_input_queue
    )
    sink_list.append(data_sink)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;3. When data of that type arrives, the system automatically distributes it to the Handler’s input queue.&lt;/p&gt;
&lt;h2&gt;4. Handler Collaborative Mechanism&lt;/h2&gt;
&lt;h3&gt;4.1 Data Subscription Mechanism&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Core Idea&lt;/strong&gt;: Handlers “subscribe” to data by declaring input types, and the system automatically establishes data routing.&lt;/p&gt;
&lt;h4&gt;Establishing Subscription Relationships&lt;/h4&gt;
&lt;img width="749" height="396" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-8.png" alt=""&gt;&lt;h4&gt;Subscription Example&lt;/h4&gt;
&lt;p&gt;For example, in the &lt;code&gt;glut3.yaml&lt;/code&gt; configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# SileroVad subscribes to MIC_AUDIO
data_sinks[ChatDataType.MIC_AUDIO] = [
    DataSink(owner="SileroVad", sink_queue=vad_queue),
]

# SenseVoice subscribes to HUMAN_AUDIO (SileroVad's output)
data_sinks[ChatDataType.HUMAN_AUDIO] = [
    DataSink(owner="SenseVoice", sink_queue=asr_queue),
]

# LLM_Bailian subscribes to HUMAN_TEXT (SenseVoice's output)
data_sinks[ChatDataType.HUMAN_TEXT] = [
    DataSink(owner="LLM_Bailian", sink_queue=llm_queue),
]

# Edge_TTS subscribes to AVATAR_TEXT (LLM's output)
data_sinks[ChatDataType.AVATAR_TEXT] = [
    DataSink(owner="Edge_TTS", sink_queue=tts_queue),
]

# AvatarMusetalk subscribes to AVATAR_AUDIO (TTS's output)
data_sinks[ChatDataType.AVATAR_AUDIO] = [
    DataSink(owner="AvatarMusetalk", sink_queue=avatar_queue),
]
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.2 Data Distribution Mechanism (Subscription Distribution)&lt;/h3&gt;
&lt;p&gt;When data arrives, the system automatically distributes it through &lt;code&gt;distribute_data()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def distribute_data(data: ChatData, sinks, outputs):
    # 1. Check if it is the final output (sent directly to the client)
    source_key = (data.source, data.type)
    if source_key in outputs:
        outputs[source_key].sink_queue.put_nowait(data)

    # 2. Find all Handlers subscribed to this data type
    sink_list = sinks.get(data.type, [])

    # 3. Distribute to all subscribers
    for sink in sink_list:
        if sink.owner == data.source:
            continue  # Skip the data source itself

        sink.sink_queue.put_nowait(data)  # Put into the Handler's input queue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Key Points&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data is routed automatically based on its type.&lt;/li&gt;
&lt;li&gt;A single piece of data can be distributed to multiple subscribers simultaneously.&lt;/li&gt;
&lt;li&gt;Handlers are completely decoupled and unaware of each other’s existence.&lt;/li&gt;
&lt;/ul&gt;
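&lt;p&gt;These three points can be demonstrated with a small, runnable sketch of the fan-out behavior. The &lt;code&gt;ChatData&lt;/code&gt; and &lt;code&gt;DataSink&lt;/code&gt; classes here are simplified stand-ins for the framework's types:&lt;/p&gt;

```python
import queue
from dataclasses import dataclass

@dataclass
class ChatData:
    source: str   # name of the Handler that produced the data
    type: str     # data type used for routing, e.g. "HUMAN_AUDIO"
    payload: str

@dataclass
class DataSink:
    owner: str
    sink_queue: queue.Queue

def distribute(data: ChatData, sinks: dict) -> None:
    # Fan the data out to every subscriber of this type,
    # skipping the Handler that produced it.
    for sink in sinks.get(data.type, []):
        if sink.owner == data.source:
            continue
        sink.sink_queue.put_nowait(data)

# One producer (SileroVad) and two independent subscribers.
vad_q, asr_q, logger_q = queue.Queue(), queue.Queue(), queue.Queue()
sinks = {"HUMAN_AUDIO": [
    DataSink("SileroVad", vad_q),    # the producer itself: skipped
    DataSink("SenseVoice", asr_q),
    DataSink("Logger", logger_q),    # a second subscriber gets a copy too
]}

distribute(ChatData(source="SileroVad", type="HUMAN_AUDIO", payload="chunk"), sinks)
```

After the call, both &lt;code&gt;asr_q&lt;/code&gt; and &lt;code&gt;logger_q&lt;/code&gt; hold the item while &lt;code&gt;vad_q&lt;/code&gt; stays empty: neither subscriber knows the other exists.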
&lt;h3&gt;4.3 Handler Parallel Processing Mechanism&lt;/h3&gt;
&lt;h4&gt;Parallel Execution&lt;/h4&gt;
&lt;p&gt;All Handler threads run simultaneously without blocking each other:&lt;/p&gt;
&lt;img width="800" height="418" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-9-1024x536.png" alt=""&gt;&lt;h4&gt;Data Flow Sequence Guarantee&lt;/h4&gt;
&lt;p&gt;Although Handlers run in parallel, the data flow is sequential:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;MIC_AUDIO → HUMAN_AUDIO → HUMAN_TEXT → AVATAR_TEXT → AVATAR_AUDIO → AVATAR_VIDEO
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Why the Sequence is Guaranteed&lt;/h4&gt;
&lt;p&gt;1. &lt;strong&gt;Data Type Driven&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SileroVad outputs &lt;code&gt;HUMAN_AUDIO&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;SenseVoice subscribes to &lt;code&gt;HUMAN_AUDIO&lt;/code&gt; (not &lt;code&gt;MIC_AUDIO&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;SenseVoice therefore only receives data once &lt;code&gt;HUMAN_AUDIO&lt;/code&gt; has been produced&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;2. &lt;strong&gt;Queue Buffering&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each Handler has its own input queue.&lt;/li&gt;
&lt;li&gt;The queue buffers data automatically, preserving arrival order.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3. &lt;strong&gt;VAD’s Speech End Marker&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VAD emits a &lt;code&gt;human_speech_end&lt;/code&gt; marker when speech stops.&lt;/li&gt;
&lt;li&gt;ASR waits for this marker before running inference.&lt;/li&gt;
&lt;li&gt;This ensures that a complete speech segment is processed as a whole.&lt;/li&gt;
&lt;/ul&gt;
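&lt;p&gt;The marker-driven pattern can be sketched in a few lines: a consumer drains its FIFO queue, accumulates chunks, and only "runs inference" once the end marker arrives. The &lt;code&gt;(chunk, is_end)&lt;/code&gt; tuple is a simplification of the real metadata:&lt;/p&gt;

```python
import queue

# A FIFO queue preserves the order the producer emitted the chunks in.
audio_q: queue.Queue = queue.Queue()
for item in [("he", False), ("llo", False), ("", True)]:  # last item: end marker
    audio_q.put(item)

buffer: list[str] = []
segments: list[str] = []
while not audio_q.empty():
    chunk, is_end = audio_q.get()
    if chunk:
        buffer.append(chunk)          # accumulate until the segment is complete
    if is_end:
        segments.append("".join(buffer))  # stand-in for ASR inference
        buffer.clear()

# segments == ["hello"]: the segment is processed exactly once, as a whole
```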
&lt;h3&gt;4.4 Decoupling of Handlers&lt;/h3&gt;
&lt;h4&gt;Complete Decoupling&lt;/h4&gt;
&lt;p&gt;Handlers do not communicate directly with each other, only interact through data types:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❌ Incorrect (tightly coupled):
    SileroVad → direct call → SenseVoice.handle()

✅ Correct (decoupled):
    SileroVad → outputs HUMAN_AUDIO → system distributes → SenseVoice input queue
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Benefits of Decoupling&lt;/h4&gt;
&lt;p&gt;1. &lt;strong&gt;Easy to Extend&lt;/strong&gt;: Adding a new Handler only requires declaring input/output without modifying existing Handlers.&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;Flexible Combination&lt;/strong&gt;: Handlers can be flexibly combined through configuration files.&lt;/p&gt;
&lt;p&gt;3. &lt;strong&gt;Easy to Test&lt;/strong&gt;: Each Handler can be tested independently.&lt;/p&gt;
&lt;p&gt;4. &lt;strong&gt;Easy to Maintain&lt;/strong&gt;: Handlers have clear responsibilities and do not interfere with each other.&lt;/p&gt;
&lt;h3&gt;4.5 Session End Mechanism&lt;/h3&gt;
&lt;h4&gt;Shared Flag Control&lt;/h4&gt;
&lt;p&gt;All threads share a flag: &lt;code&gt;shared_states.active&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# While the session is running
shared_states.active = True

# All threads loop and check the flag
while shared_states.active:
    # Process data
    ...

# When the session ends
shared_states.active = False

# All threads then exit their loops
&lt;/code&gt;&lt;/pre&gt;
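&lt;p&gt;A runnable sketch of this cooperative shutdown, assuming a simple &lt;code&gt;SharedStates&lt;/code&gt; object with an &lt;code&gt;active&lt;/code&gt; flag (the class here is an illustrative stand-in):&lt;/p&gt;

```python
import threading
import time

class SharedStates:
    def __init__(self):
        self.active = True

shared_states = SharedStates()
iterations = []

def worker(name: str):
    # Each Handler thread loops while the shared flag is set.
    while shared_states.active:
        iterations.append(name)   # stand-in for "process one queue item"
        time.sleep(0.01)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("vad", "asr")]
for t in threads:
    t.start()

time.sleep(0.05)
shared_states.active = False      # session ends: every loop condition goes False
for t in threads:
    t.join(timeout=1.0)           # all threads exit on their own, no forced kill
```

In CPython a plain boolean attribute works for this pattern; &lt;code&gt;threading.Event&lt;/code&gt; is the more explicit alternative when stricter cross-thread signaling is wanted.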
&lt;h4&gt;End Process&lt;/h4&gt;
&lt;img width="800" height="447" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-10.png" alt=""&gt;&lt;h2&gt;5. Detailed Explanation of Handlers&lt;/h2&gt;
&lt;h3&gt;5.1 RTC Client Handler&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Function&lt;/strong&gt;: Manages WebRTC connections and handles bidirectional communication with the client.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: Client audio/video/text (received via WebRTC)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: Avatar video/audio (sent via WebRTC)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Code Locations&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;src/handlers/client/rtc_client/client_handler_rtc.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;src/service/rtc_service/rtc_stream.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;:&lt;/p&gt;
&lt;img width="746" height="547" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-11.png" alt=""&gt;&lt;h3&gt;5.2 SileroVad Handler (Voice Activity Detection)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Function&lt;/strong&gt;: Detects whether the user is speaking and filters out silence.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;ChatDataType.MIC_AUDIO&lt;/code&gt; (raw audio)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;ChatDataType.HUMAN_AUDIO&lt;/code&gt; (human speech audio, with speech activity markers)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Code Location&lt;/strong&gt;: &lt;code&gt;src/handlers/vad/silerovad/vad_handler_silero.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def get_handler_detail(self, ...):
    return HandlerDetail(
        inputs={
            ChatDataType.MIC_AUDIO: HandlerDataInfo(...)
        },
        outputs={
            ChatDataType.HUMAN_AUDIO: HandlerDataInfo(...)
        }
    )

def handle(self, context, inputs, output_definitions):
    # 1. Extract audio from the input
    audio_data = inputs.data.get_main_data()

    # 2. Run VAD model inference
    is_speech = self.model(audio_data)

    # 3. If speech is detected, output HUMAN_AUDIO
    if is_speech:
        yield ChatData(type=HUMAN_AUDIO, data=audio_data)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time processing with streaming output&lt;/li&gt;
&lt;li&gt;Output carries &lt;code&gt;human_speech_start&lt;/code&gt; and &lt;code&gt;human_speech_end&lt;/code&gt; markers&lt;/li&gt;
&lt;li&gt;ASR relies on these markers to decide when to run recognition&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;5.3 SenseVoice Handler (Speech Recognition)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Function&lt;/strong&gt;: Converts speech to text.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;ChatDataType.HUMAN_AUDIO&lt;/code&gt; (human speech audio)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;ChatDataType.HUMAN_TEXT&lt;/code&gt; (recognized text)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Code Location&lt;/strong&gt;: &lt;code&gt;src/handlers/asr/sensevoice/asr_handler_sensevoice.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def get_handler_detail(self, ...):
    return HandlerDetail(
        inputs={
            ChatDataType.HUMAN_AUDIO: HandlerDataInfo(...)
        },
        outputs={
            ChatDataType.HUMAN_TEXT: HandlerDataInfo(...)
        }
    )

def handle(self, context, inputs, output_definitions):
    # 1. Accumulate audio data
    context.audio_buffer.append(inputs.data.get_main_data())

    # 2. Check for the human_speech_end marker
    if inputs.data.has_meta('human_speech_end'):
        # 3. Run ASR inference
        text = self.model(context.audio_buffer)

        # 4. Output the recognized text
        yield ChatData(type=HUMAN_TEXT, data=text)

        # 5. Clear the buffer
        context.audio_buffer.clear()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accumulates audio while waiting for the speech end marker&lt;/li&gt;
&lt;li&gt;Runs ASR on complete speech segments&lt;/li&gt;
&lt;li&gt;Output text format: &lt;code&gt;&amp;lt;|zh|&amp;gt;&amp;lt;|NEUTRAL|&amp;gt;&amp;lt;|Speech|&amp;gt;&amp;lt;|woitn|&amp;gt;你好&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
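&lt;p&gt;Because the transcript is prefixed with tags of the form &lt;code&gt;&amp;lt;|...|&amp;gt;&lt;/code&gt;, a downstream consumer typically strips them before use. A small sketch of one way to do that (the exact tag set depends on the model, so the regex here is an assumption):&lt;/p&gt;

```python
import re

# Matches SenseVoice-style metadata tags such as <|zh|> or <|NEUTRAL|>.
TAG_PATTERN = re.compile(r"<\|[^|]*\|>")

def strip_sensevoice_tags(raw: str) -> str:
    """Remove <|...|> tags, leaving only the transcript text."""
    return TAG_PATTERN.sub("", raw).strip()

print(strip_sensevoice_tags("<|zh|><|NEUTRAL|><|Speech|><|woitn|>你好"))  # → 你好
```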
&lt;h3&gt;5.4 LLM Handler (Large Language Model)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Function&lt;/strong&gt;: Understands user input and generates response text.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;ChatDataType.HUMAN_TEXT&lt;/code&gt; (user text)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;ChatDataType.AVATAR_TEXT&lt;/code&gt; (AI response text)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Code Location&lt;/strong&gt;: &lt;code&gt;src/handlers/llm/openai_compatible/llm_handler_openai_compatible.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def get_handler_detail(self, ...):
    return HandlerDetail(
        inputs={
            ChatDataType.HUMAN_TEXT: HandlerDataInfo(...)
        },
        outputs={
            ChatDataType.AVATAR_TEXT: HandlerDataInfo(...)
        }
    )

def handle(self, context, inputs, output_definitions):
    # 1. Update the conversation history
    context.history.add_user_message(inputs.data.get_main_data())

    # 2. Call the LLM API (streaming)
    response = self.client.chat.completions.create(
        model=self.model_name,
        messages=context.history.get_messages(),
        stream=True
    )

    # 3. Stream the output text
    for chunk in response:
        text = chunk.choices[0].delta.content
        if text:
            yield ChatData(type=AVATAR_TEXT, data=text)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintains conversation history&lt;/li&gt;
&lt;li&gt;Supports streaming output&lt;/li&gt;
&lt;li&gt;Configurable for different LLM backends (Bailian, OpenAI-compatible APIs, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;5.5 Edge_TTS Handler (Text-to-Speech)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Function&lt;/strong&gt;: Converts text to speech.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;ChatDataType.AVATAR_TEXT&lt;/code&gt; (AI response text)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;ChatDataType.AVATAR_AUDIO&lt;/code&gt; (generated audio)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Code Location&lt;/strong&gt;: &lt;code&gt;src/handlers/tts/edgetts/tts_handler_edgetts.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def get_handler_detail(self, ...):
    return HandlerDetail(
        inputs={
            ChatDataType.AVATAR_TEXT: HandlerDataInfo(...)
        },
        outputs={
            ChatDataType.AVATAR_AUDIO: HandlerDataInfo(...)
        }
    )

def handle(self, context, inputs, output_definitions):
    # 1. Accumulate text
    context.text_buffer += inputs.data.get_main_data()

    # 2. Check for the text end marker
    if inputs.data.has_meta('text_end'):
        # 3. Call the TTS API to generate audio
        audio = edge_tts.generate(
            text=context.text_buffer,
            voice=self.voice
        )

        # 4. Output the audio stream
        for audio_chunk in audio:
            yield ChatData(type=AVATAR_AUDIO, data=audio_chunk)

        # 5. Clear the buffer
        context.text_buffer = ""
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Accumulates text and waits for a complete sentence&lt;/p&gt;
&lt;p&gt;▪ Supports multiple voices (selectable via configuration)&lt;/p&gt;
&lt;p&gt;▪ Outputs 24kHz audio&lt;/p&gt;
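&lt;p&gt;The accumulate-then-flush pattern used by the TTS handler can be sketched independently of any TTS backend. The following is a minimal illustration (all names here are hypothetical, not the project's actual classes): streamed text chunks are buffered until an end marker arrives, then the complete sentence is released and the buffer is cleared.&lt;/p&gt;

```python
# Minimal sketch of the text-accumulation pattern described above.
# All names are illustrative, not OpenAvatarChat's actual API.

class TextBuffer:
    """Accumulates streamed text chunks until an end marker is seen."""

    def __init__(self):
        self.buffer = ""

    def feed(self, chunk, is_end=False):
        """Append a chunk; on the end marker, yield the full sentence and reset."""
        self.buffer += chunk
        if is_end:
            sentence = self.buffer
            self.buffer = ""       # clear the buffer, as in step 5 above
            yield sentence

buf = TextBuffer()
out = []
for chunk, end in [("Hello, ", False), ("world.", True)]:
    out.extend(buf.feed(chunk, is_end=end))
# out now holds the single complete sentence "Hello, world."
```

&lt;p&gt;Flushing only on the end marker is what lets the real handler hand the TTS engine a whole sentence at once, rather than synthesizing audio for each token fragment.&lt;/p&gt;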
&lt;h3&gt;5.6 AvatarMusetalk Handler (Avatar Driving)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Function&lt;/strong&gt;: Generates avatar video (lip-sync) from audio.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;ChatDataType.AVATAR_AUDIO&lt;/code&gt; (TTS-generated audio)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;ChatDataType.AVATAR_VIDEO&lt;/code&gt; (avatar video frames)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Code Location&lt;/strong&gt;: &lt;code&gt;src/handlers/avatar/musetalk/avatar_handler_musetalk.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def get_handler_detail(self, ...):
    return HandlerDetail(
        inputs={
            ChatDataType.AVATAR_AUDIO: HandlerDataInfo(...)
        },
        outputs={
            ChatDataType.AVATAR_VIDEO: HandlerDataInfo(...)
        }
    )

def handle(self, context, inputs, output_definitions):
    # 1. Accumulate audio data
    context.audio_buffer.append(inputs.data.get_main_data())

    # 2. Check if there is an audio end marker
    if inputs.data.has_meta('audio_end'):
        # 3. MuseTalk model processing
        video_frames = self.model(
            audio=context.audio_buffer,
            avatar_image=context.avatar_image
        )

        # 4. Output video frame stream
        for frame in video_frames:
            yield ChatData(type=AVATAR_VIDEO, data=frame)

        # 5. Clear the buffer
        context.audio_buffer.clear()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Precise lip-syncing&lt;/p&gt;
&lt;p&gt;▪ Supports 16fps video output&lt;/p&gt;
&lt;p&gt;▪ Uses the MuseTalk model for inference&lt;/p&gt;
&lt;h3&gt;5.7 Summary of the Handler Processing Chain&lt;/h3&gt;
&lt;img width="479" height="710" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-12.png" alt=""&gt;&lt;h2&gt;6. Quick Reference&lt;/h2&gt;
&lt;h3&gt;6.1 Key Code Locations&lt;/h3&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;File Path&lt;/th&gt;
&lt;th&gt;Key Method&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main Entry&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/glut.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;main()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engine Initialization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/chat_engine/chat_engine.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatEngine.initialize()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handler Loading&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/chat_engine/core/handler_manager.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;HandlerManager.initialize()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session Creation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/chat_engine/chat_engine.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatEngine.create_client_session()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Distribution&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/chat_engine/core/chat_session.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatSession.distribute_data()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input Processing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/chat_engine/core/chat_session.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatSession.inputs_pumper()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handler Processing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;src/chat_engine/core/chat_session.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatSession.handler_pumper()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;6.2 Key Data Structures&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Data Types (ChatDataType)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;MIC_AUDIO        # Microphone audio
HUMAN_AUDIO      # Human speech audio
HUMAN_TEXT       # User text
AVATAR_TEXT      # AI response text
AVATAR_AUDIO     # TTS audio
AVATAR_VIDEO     # Avatar video
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Data Routing Table (data_sinks)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;data_sinks: Dict[ChatDataType, List[DataSink]]
# Data type → List of Handlers subscribed to this type
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Handler Registry&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;handler_registries: Dict[str, HandlerRegistry]
# Handler name → Handler registration info
&lt;/code&gt;&lt;/pre&gt;
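&lt;p&gt;The two structures above can be combined into a small routing sketch. The data-type names below come from section 6.2; everything else is illustrative, not OpenAvatarChat's actual implementation. Distribution is a dictionary lookup on the data type followed by a fan-out to every subscribed sink:&lt;/p&gt;

```python
from enum import Enum, auto
from collections import defaultdict

# Data types from section 6.2; the routing logic itself is an
# illustrative sketch, not the project's actual code.
class ChatDataType(Enum):
    HUMAN_TEXT = auto()
    AVATAR_TEXT = auto()
    AVATAR_AUDIO = auto()

# data_sinks: data type -> list of handler callbacks subscribed to that type
data_sinks = defaultdict(list)

def subscribe(data_type, sink):
    """Register a handler callback for one input data type."""
    data_sinks[data_type].append(sink)

def distribute_data(data_type, payload):
    """Fan the payload out to every handler that declared this input type."""
    for sink in data_sinks[data_type]:
        sink(payload)

received = []
subscribe(ChatDataType.AVATAR_TEXT, received.append)
distribute_data(ChatDataType.AVATAR_TEXT, "Hello")
distribute_data(ChatDataType.HUMAN_TEXT, "no subscriber, silently dropped")
# received == ["Hello"]
```

&lt;p&gt;Because producers only name a data type and never a concrete handler, new handlers can subscribe without touching any existing code, which is exactly the loose coupling described in section 6.4.&lt;/p&gt;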
&lt;h3&gt;6.3 Core Execution Flow&lt;/h3&gt;
&lt;img width="800" height="357" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-13-1024x458.png" alt=""&gt;&lt;h3&gt;6.4 Key Features of Modularity&lt;/h3&gt;
&lt;p&gt;1. &lt;strong&gt;Configuration-driven&lt;/strong&gt;: Define Handler combinations through YAML configuration files.&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;Dynamic Loading&lt;/strong&gt;: Import and instantiate Handlers dynamically at runtime based on the configuration.&lt;/p&gt;
&lt;p&gt;3. &lt;strong&gt;Data-driven Routing&lt;/strong&gt;: Automatically distribute data based on data types, with Handlers unaware of each other.&lt;/p&gt;
&lt;p&gt;4. &lt;strong&gt;Asynchronous Processing&lt;/strong&gt;: Each Handler runs in its own thread, communicating via queues.&lt;/p&gt;
&lt;p&gt;5. &lt;strong&gt;Loose Coupling&lt;/strong&gt;: Handlers do not depend on each other directly, only on data types.&lt;/p&gt;
&lt;p&gt;6. &lt;strong&gt;Easy to Extend&lt;/strong&gt;: To add a new Handler, simply implement the HandlerBase interface.&lt;/p&gt;
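&lt;p&gt;Points 1, 2, and 6 can be illustrated with a small registry sketch. The names below are hypothetical; the real project imports handler modules dynamically from a YAML file, which is stood in for here by a plain dict:&lt;/p&gt;

```python
# Illustrative sketch of configuration-driven handler assembly.
# A plain dict stands in for both the YAML config and the module registry.

class HandlerBase:
    """Minimal stand-in for the HandlerBase interface."""
    def handle(self, data):
        raise NotImplementedError

class UpperHandler(HandlerBase):
    def handle(self, data):
        return data.upper()

class ExclaimHandler(HandlerBase):
    def handle(self, data):
        return data + "!"

HANDLER_REGISTRY = {
    "upper": UpperHandler,
    "exclaim": ExclaimHandler,
}

def build_pipeline(config):
    """Instantiate handlers by name, as a config-driven loader would."""
    return [HANDLER_REGISTRY[name]() for name in config["handlers"]]

config = {"handlers": ["upper", "exclaim"]}   # stands in for the YAML file
pipeline = build_pipeline(config)

result = "hello"
for handler in pipeline:
    result = handler.handle(result)
# result == "HELLO!"
```

&lt;p&gt;Adding a new handler then only requires implementing the interface and registering it; the configuration file decides whether and where it runs.&lt;/p&gt;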
&lt;h2&gt;7. Summary&lt;/h2&gt;
&lt;p&gt;OpenAvatarChat adopts a layered, modular architecture design:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Top Layer (ChatEngine)&lt;/strong&gt;: Manages the entire system and supports concurrent multi-session operation.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Middle Layer (ChatSession)&lt;/strong&gt;: Manages a single session and coordinates the collaborative work of Handlers.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Bottom Layer (Handler)&lt;/strong&gt;: Independent functional modules that communicate via data types.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core Mechanisms&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Data Subscription&lt;/strong&gt;: Handlers subscribe to data by declaring input types.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Automatic Routing&lt;/strong&gt;: The system automatically distributes data based on data types.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Parallel Processing&lt;/strong&gt;: Handlers run concurrently in independent threads.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Queue Communication&lt;/strong&gt;: Communication between Handlers is asynchronous and decoupled via queues.&lt;/p&gt;
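&lt;p&gt;The last two mechanisms, parallel handlers and queue communication, can be sketched with the standard library alone. In this illustrative sketch (not the project's code), each handler runs in its own thread and passes results downstream through a queue, with a sentinel value marking the end of the stream:&lt;/p&gt;

```python
import queue
import threading

# Illustrative sketch of queue-decoupled handler threads.
SENTINEL = None

def handler_thread(inbox, outbox, work):
    """Consume from inbox, apply work, push to outbox until the sentinel."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)   # propagate shutdown downstream
            break
        outbox.put(work(item))

q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
t1 = threading.Thread(target=handler_thread, args=(q_in, q_mid, str.upper))
t2 = threading.Thread(target=handler_thread, args=(q_mid, q_out, lambda s: s + "!"))
t1.start()
t2.start()

for word in ["hi", "there"]:
    q_in.put(word)
q_in.put(SENTINEL)

results = []
while True:
    item = q_out.get()
    if item is SENTINEL:
        break
    results.append(item)
t1.join()
t2.join()
# results == ["HI!", "THERE!"]
```

&lt;p&gt;Neither worker knows the other exists; each only sees its own inbox and outbox, which mirrors how Handlers stay decoupled while still forming a processing chain.&lt;/p&gt;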
&lt;p&gt;This design achieves a highly cohesive, loosely coupled architecture that makes the system easy to extend, maintain, and test.&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/openavatarchat-a-detailed-explanation-of-system-architecture-and-handler-collaboration-mechanism/" rel="noopener noreferrer"&gt;OpenAvatarChat: A Detailed Explanation of System Architecture and Handler Collaboration Mechanism&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>Deployment tests of IMTalker and LatentSync</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:54:34 +0000</pubDate>
      <link>https://dev.to/frankfu/deployment-tests-of-imtalker-and-latentsync-1lma</link>
      <guid>https://dev.to/frankfu/deployment-tests-of-imtalker-and-latentsync-1lma</guid>
      <description>&lt;h2&gt;&lt;strong&gt;LatentSync Deployment Test&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;During the LatentSync test on Lambda, I rented A6000 and A100 GPUs. Test results show:&lt;/p&gt;
&lt;p&gt;▪ On the A6000, generating video for 20 seconds of audio took over 100 seconds.&lt;/p&gt;
&lt;p&gt;▪ On the A100, generation time was similar to the A6000.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generated material:&lt;/strong&gt;&lt;br&gt;I uploaded a video — the same one used with MuseTalk — and combined it with audio, looping for playback.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generation results:&lt;/strong&gt;&lt;br&gt;Aside from some loss of clarity in the teeth, mouth detail was preserved very well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real‑time performance:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;br&gt;From testing LatentSync under these different hardware setups, we conclude:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Performance gap:&lt;/strong&gt; Although both the A6000 and A100 are high-performance GPUs, video generation still falls short of real-time or near-real-time speed: generating video for 20 seconds of audio takes over 100 seconds.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Not suitable for real-time applications:&lt;/strong&gt; Based on current hardware results, LatentSync is better suited to offline or batch rendering than to applications requiring quick or real-time video generation.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Hardware requirements:&lt;/strong&gt; Higher-quality or higher-resolution output calls for stronger GPUs with more VRAM to reduce generation time.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;IMTalker Deployment Test&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Currently, IMTalker has been tested remotely, but there are some bugs. After clicking “Generate,” a manual page refresh is required to trigger backend processing. This issue is still being fixed, but partial results are now viewable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generated material:&lt;/strong&gt;&lt;br&gt;Only a single image needs to be uploaded here.&lt;/p&gt;
&lt;img width="800" height="800" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fnavtalk.Luke_.png" alt=""&gt;&lt;p&gt;&lt;strong&gt;Generation results:&lt;/strong&gt;&lt;br&gt;The output video is cropped to a 512×512 region, can blink automatically, and shows very fast real‑time performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real‑time performance:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;br&gt;Based on IMTalker testing, we conclude:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Image cropping:&lt;/strong&gt; The input image is cropped to a 512×512 area.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Real-time performance:&lt;/strong&gt; Real-time performance meets expectations: the video is generated quickly with synchronized mouth movements.&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/deployment-tests-of-imtalker-and-latentsync/" rel="noopener noreferrer"&gt;Deployment tests of IMTalker and LatentSync&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>NVIDIA Jetson Orin Nano Super Developer Kits – Build MIT Mini Cheetah Robot</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:53:58 +0000</pubDate>
      <link>https://dev.to/frankfu/nvidia-jetson-orin-nano-super-developer-kits-build-mit-mini-cheetah-robot-1478</link>
      <guid>https://dev.to/frankfu/nvidia-jetson-orin-nano-super-developer-kits-build-mit-mini-cheetah-robot-1478</guid>
      <description>&lt;p&gt;This article aims to systematically analyze the technical architecture and implementation details of the MIT Cheetah robot. The content is compiled from publicly available materials and combined with personal practical understanding, intended to provide reference for relevant technical developers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MIT Cheetah System Architecture Diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;img width="768" height="581" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2F9972917e-2fc0-4d7e-b206-9400bc7ffd0f-1-768x581.png" alt=""&gt;&lt;p&gt;&lt;strong&gt;Data Communication Protocol Architecture Diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F04%2FiJRlwo14krTQ2GP.png" alt="image.png" width="800" height="644"&gt;&lt;h2&gt;1. Introduction to mbedOS&lt;/h2&gt;
&lt;p&gt;Developers who first encounter the MIT Cheetah project may notice that the codebase on GitHub is relatively small, and the compilation method differs from conventional projects. This is primarily because the project uses &lt;strong&gt;mbedOS&lt;/strong&gt; as its underlying development framework.&lt;/p&gt;
&lt;p&gt;The hardware modules of MIT Cheetah have relatively small code volumes. For example, the SPIne module primarily focuses on data interaction processing, while underlying hardware drivers and other basic functions are provided by mbedOS.&lt;/p&gt;
&lt;p&gt;mbedOS is a complete software solution developed by ARM for IoT applications, and it is an embedded open-source ecosystem targeting ARM Cortex-M series processors. For more information, please visit the &lt;a href="https://os.mbed.com/mbed-os/" rel="noopener noreferrer"&gt;mbedOS official website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following example demonstrates how to initialize the SPI interface in the SPIne module:&lt;/p&gt;
&lt;pre&gt;void init_spi(void){&lt;br&gt;    SPISlave *spi = new SPISlave(PA_7, PA_6, PA_5, PA_4);&lt;br&gt;    spi-&amp;gt;format(16, 0);        // 16-bit frames&lt;br&gt;    spi-&amp;gt;frequency(12000000);   // 12 MHz&lt;br&gt;    spi-&amp;gt;reply(0x0);&lt;br&gt;    cs.fall(&amp;amp;spi_isr);&lt;br&gt;    printf("done\n\r");&lt;br&gt;}&lt;/pre&gt;
&lt;p&gt;The following is a typical application example of CAN bus communication:&lt;/p&gt;
&lt;pre&gt;#include "mbed.h"&lt;br&gt; &lt;br&gt;DigitalOut myled(D8);&lt;br&gt;CAN can1(PD_0, PD_1,500000);&lt;br&gt;int main() {&lt;br&gt;     CANMessage msg;&lt;br&gt;    while(1) {&lt;br&gt;   if(can1.read(msg)) {&lt;br&gt;            printf("Message received:id=%d,type=%d,%d\n", msg.id,msg.type,msg.data[0]);&lt;br&gt;            myled = !myled;&lt;br&gt;    }&lt;br&gt;    }&lt;br&gt;}&lt;/pre&gt;
&lt;h2&gt;2. MIT Cheetah Open Source Resources&lt;/h2&gt;
&lt;p&gt;The following are open source resource links related to the MIT Cheetah project:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hardware Related:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ Motor Controller Hardware: &lt;a href="https://github.com/bgkatz/3phase_integrated" rel="noopener noreferrer"&gt;https://github.com/bgkatz/3phase_integrated&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;▪ SPIne Hardware: &lt;a href="https://github.com/bgkatz/SPIne" rel="noopener noreferrer"&gt;https://github.com/bgkatz/SPIne&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Software Related:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ Motor Controller Software: &lt;a href="https://os.mbed.com/users/benkatz/code/Hobbyking_Cheetah_Compact_DRV8323/" rel="noopener noreferrer"&gt;https://os.mbed.com/users/benkatz/code/Hobbyking_Cheetah_Compact_DRV8323/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;▪ SPIne Software: &lt;a href="https://os.mbed.com/users/benkatz/code/SPIne/" rel="noopener noreferrer"&gt;https://os.mbed.com/users/benkatz/code/SPIne/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;▪ Linux Control Code (Cheetah Mini): &lt;a href="https://github.com/mit-biomimetics/Cheetah-Software" rel="noopener noreferrer"&gt;https://github.com/mit-biomimetics/Cheetah-Software&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;3. MIT Mini Cheetah Robot System&lt;/h2&gt;
&lt;h3&gt;3.1 Simulation Environment Configuration and Usage&lt;/h3&gt;
&lt;p&gt;After compilation is complete, you need to configure the simulation environment parameters. Navigate to the &lt;code&gt;config&lt;/code&gt; directory under the MIT main folder, open the &lt;code&gt;mini-cheetah-defaults.yaml&lt;/code&gt; file, set &lt;code&gt;control_mode&lt;/code&gt; and &lt;code&gt;cheater_mode&lt;/code&gt; to 1, and set &lt;code&gt;use_rc&lt;/code&gt; to 0. After configuration, save and exit, as shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F04%2FjYLKOTtg6QCSA4a.png" alt="image.png" width="800" height="992"&gt;&lt;p&gt;Next, start the robot simulation environment. It is recommended to connect a game controller before starting (optional, for subsequent control). Navigate to the &lt;code&gt;build&lt;/code&gt; directory under the MIT main folder (Note: directly entering the &lt;code&gt;sim&lt;/code&gt; subdirectory may prevent the simulation from starting, so execute in the &lt;code&gt;build&lt;/code&gt; directory), right-click in a blank area and select “Open in Terminal”, then execute the following command:&lt;/p&gt;
&lt;pre&gt;./sim/sim&lt;/pre&gt;
&lt;p&gt;After execution, the robot simulation control interface will be displayed, as shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2F7Fgd1fjGp9ti5ME.png" alt="image.png" width="800" height="637"&gt;&lt;p&gt;In the control interface, click “Mini Cheetah” and “Simulator” in sequence, then click the “Start” button to launch the robot simulation interface, as shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2F9hY3U72opOu16X5.png" alt="image.png" width="800" height="471"&gt;&lt;p&gt;Next, start the robot controller. Navigate to the &lt;code&gt;build/user/MIT_Controller&lt;/code&gt; directory under the MIT main folder, right-click in a blank area and select “Open in Terminal”, then execute the following command:&lt;/p&gt;
&lt;pre&gt;./mit_ctrl m s&lt;/pre&gt;
&lt;p&gt;Here, &lt;code&gt;mit_ctrl&lt;/code&gt; is the compiled executable file, parameter &lt;code&gt;m&lt;/code&gt; represents the mini cheetah model, and parameter &lt;code&gt;s&lt;/code&gt; represents simulate (simulation mode). After execution, the robot in the simulation should be able to stand up. At this point, switch to the simulation control interface and change the &lt;code&gt;control_mode&lt;/code&gt; value to 4. You can observe that the robot in the simulation switches to trot (trotting gait), as shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FcZtNHSjzqlsXT92.png" alt="image.png" width="800" height="635"&gt;&lt;p&gt;At this point, you can control the robot’s movement speed through the game controller’s joystick. Readers can explore different control modes on their own. The following describes the implementation method for backflip operation:&lt;/p&gt;
&lt;p&gt;1. Change the &lt;code&gt;control_mode&lt;/code&gt; value in the simulation control interface to 3; the robot will enter a standing state.&lt;/p&gt;
&lt;p&gt;2. Change the &lt;code&gt;control_mode&lt;/code&gt; value to 9; the robot will perform a backflip.&lt;/p&gt;
&lt;p&gt;3. After the backflip is complete, change the &lt;code&gt;control_mode&lt;/code&gt; value back to 3, then to 9 to repeat the backflip.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If the robot falls during operation, you can click the “Go Home” button in the simulation control interface to restore the robot to its initial position. If it cannot be restored, you need to restart the simulation and controller.&lt;/p&gt;
&lt;h3&gt;3.2 Real Robot and Simulation Combined Usage&lt;/h3&gt;
&lt;p&gt;When running the real robot, you need to start both the simulation interface and the controller program:&lt;/p&gt;
&lt;pre&gt;# Terminal 1: Start simulation interface&lt;br&gt;./sim/sim&lt;br&gt;​&lt;br&gt;# Terminal 2: Start controller (real robot mode)&lt;br&gt;./mit_ctrl m r f&lt;/pre&gt;
&lt;p&gt;Here, parameter &lt;code&gt;r&lt;/code&gt; represents robot (real robot mode), and parameter &lt;code&gt;f&lt;/code&gt; represents other configuration options.&lt;/p&gt;
&lt;h2&gt;4. Computer Board Selection&lt;/h2&gt;
&lt;p&gt;The original MIT Mini Cheetah system runs on UP Board, which uses a 4-core Intel Atom x5-Z8350 processor, equipped with 4GB RAM, peak power consumption of approximately 5W, based on x86 architecture.&lt;/p&gt;
&lt;p&gt;UP Board has relatively few applications in the Chinese market. More common choices include Raspberry Pi and NVIDIA Jetson Nano. Among them, Raspberry Pi is more oriented towards general embedded applications, while Jetson Nano is more suitable for image processing and AI model deployment.&lt;/p&gt;
&lt;p&gt;The solution described in this article uses Jetson Nano as the computing platform, running Ubuntu 22 system, equipped with a 6-core ARM Cortex-A78AE v8.2 64-bit processor (ARM architecture).&lt;/p&gt;
&lt;p&gt;It should be noted that for the SPIne board, the GPIO interfaces of UP Board and Jetson Nano are compatible, which provides convenience for platform migration.&lt;/p&gt;
&lt;img width="800" height="669" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2F71tZcfXMFQL._AC_UF8941000_QL80_.jpg" alt=""&gt;&lt;h3&gt;4.1 Jetson Nano Software Environment Configuration&lt;/h3&gt;
&lt;p&gt;The development environment used in this article is Ubuntu 20.04.&lt;/p&gt;
&lt;h4&gt;4.1.1 Download Cheetah-Software Source Code&lt;/h4&gt;
&lt;pre&gt;git clone https://github.com/fuwei007/NavBot-EG02&lt;/pre&gt;
&lt;h4&gt;4.1.2 Install Third-Party Dependency Libraries&lt;/h4&gt;
&lt;pre&gt;sudo apt-get update&lt;br&gt;sudo apt -y install cmake gcc build-essential&lt;br&gt;sudo apt-get -y install openjdk-11-jdk&lt;br&gt;sudo apt -y install liblcm-dev&lt;br&gt;sudo apt-get -y install libeigen3-dev&lt;br&gt;sudo apt-get -y install mesa-common-dev&lt;br&gt;sudo apt -y install libgl1-mesa-dev&lt;br&gt;sudo apt -y install libglu1-mesa-dev&lt;br&gt;sudo apt-get -y install freeglut3-dev&lt;br&gt;sudo apt-get -y install libblas-dev liblapack-dev&lt;br&gt;sudo apt-get -y install libopenblas-dev&lt;br&gt;&lt;br&gt;sudo apt install -y coinor-libipopt-dev gfortran libglib2.0-dev&lt;br&gt;sudo apt install -y openjdk-8-jdk&lt;/pre&gt;
&lt;h4&gt;4.1.3 Install Qt&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Method 1: Source Code Compilation Installation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Download Qt 5.14.2 version: &lt;a href="https://link.zhihu.com/?target=https%3A//download.qt.io/archive/qt/5.14/5.14.2/qt-opensource-linux-x64-5.14.2.run" rel="noopener noreferrer"&gt;Qt 5.14.2 Download Link&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After downloading, navigate to the directory where the file is located, right-click the Qt installation file, select “Properties” → “Permissions”, and check “Allow executing file as program”. Then open a terminal in that directory and execute the following command to start the Qt installation program (Note: the filename in the command should match the actual downloaded filename):&lt;/p&gt;
&lt;pre&gt;./qt-opensource-linux-x64-5.14.2.run&lt;/pre&gt;
&lt;p&gt;The installation process is similar to installing programs on Windows, just follow the wizard to complete the installation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Method 2: Install Using apt Package Manager&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;sudo apt install -y qtbase5-dev libqt5gamepad5 libqt5gamepad5-dev&lt;/pre&gt;
&lt;h4&gt;4.1.4 Install LCM&lt;/h4&gt;
&lt;p&gt;LCM (Lightweight Communications and Marshalling) is a library for message passing and marshalling.&lt;/p&gt;
&lt;p&gt;Download LCM 1.4.0 installation package: &lt;a href="https://link.zhihu.com/?target=https%3A//github.com/lcm-proj/lcm/releases/download/v1.4.0/lcm-1.4.0.zip" rel="noopener noreferrer"&gt;LCM v1.4.0 Download Link&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After downloading, extract the archive, navigate to the extracted folder, right-click in a blank area and select “Open in Terminal”, then execute the following commands in sequence (it is recommended to execute them one by one):&lt;/p&gt;
&lt;pre&gt;mkdir build &lt;br&gt;cd build &lt;br&gt;cmake .. &lt;br&gt;make &lt;br&gt;sudo make install &lt;br&gt;sudo ldconfig&lt;/pre&gt;
&lt;h4&gt;4.1.5 Install Eigen 3.3.6&lt;/h4&gt;
&lt;p&gt;Eigen is a C++ template library for linear algebra, matrices, and vectors. In practice, Eigen 3.3.6 is known to work well with the MIT Cheetah project, while other versions may cause compatibility issues, so version 3.3.6 is recommended.&lt;/p&gt;
&lt;p&gt;Download Eigen 3.3.6: &lt;a href="https://link.zhihu.com/?target=https%3A//gitlab.com/libeigen/eigen/-/archive/3.3.6/eigen-3.3.6.zip" rel="noopener noreferrer"&gt;Eigen 3.3.6 Download Link&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After downloading, extract the archive, navigate to the extracted folder, right-click in a blank area and select “Open in Terminal”, then execute the following commands in sequence (it is recommended to execute them one by one):&lt;/p&gt;
&lt;pre&gt;mkdir build &lt;br&gt;cd build &lt;br&gt;cmake .. &lt;br&gt;make &lt;br&gt;sudo make install &lt;br&gt;sudo ldconfig&lt;/pre&gt;
&lt;h4&gt;4.1.6 Modify Source Code Configuration&lt;/h4&gt;
&lt;p&gt;Navigate to the Cheetah-Software main folder (hereinafter referred to as MIT main folder). The folder structure is shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FfT6vP2tcIyMYBOu.png" alt="image.png" width="800" height="451"&gt;&lt;p&gt;The following modifications need to be made:&lt;/p&gt;
&lt;h5&gt;&lt;strong&gt;Step 1: Modify Branch Name in CMakeLists.txt&lt;/strong&gt;&lt;/h5&gt;
&lt;p&gt;Open the &lt;code&gt;common/CMakeLists.txt&lt;/code&gt; file under the MIT main folder, and change &lt;code&gt;master&lt;/code&gt; at the position marked in the figure below to &lt;code&gt;main&lt;/code&gt;. At the same time, since pulling the googletest library from GitHub is slow in China, it is recommended to switch to the Gitee mirror source.&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FP4LaXVvb82DqQ5U.png" alt="image.png" width="800" height="454"&gt;&lt;p&gt;After modification, save and exit.&lt;/p&gt;
&lt;h5&gt;&lt;strong&gt;Step 2: Modify Eigen3 and LCM Header File Paths&lt;/strong&gt;&lt;/h5&gt;
&lt;p&gt;Modify the header file include paths according to the actual installation path. If Eigen3 and LCM are installed in the &lt;code&gt;/usr/include&lt;/code&gt; directory (rather than the default &lt;code&gt;/usr/local/include&lt;/code&gt; in the source code), you need to modify the include paths in all related files.&lt;/p&gt;
&lt;p&gt;Search and replace the following content:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Original Path:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;include_directories("/usr/local/include/lcm/")&lt;br&gt;include_directories("/usr/local/include/eigen3")&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Replace with:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;include_directories("/usr/include/lcm/")&lt;br&gt;include_directories("/usr/include/eigen3")&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;List of Files to Modify:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;Cheetah-Software-master/common/CMakeLists.txt&lt;br&gt;Cheetah-Software-master/rc_test/CMakeLists.txt&lt;br&gt;Cheetah-Software-master/robot/CMakeLists.txt&lt;br&gt;Cheetah-Software-master/sim/CMakeLists.txt&lt;br&gt;Cheetah-Software-master/user/MIT_Controller/CMakeLists.txt&lt;/pre&gt;
&lt;h5&gt;&lt;strong&gt;Step 3: Modify Qt Path Configuration&lt;/strong&gt;&lt;/h5&gt;
&lt;p&gt;Modify the file &lt;code&gt;Cheetah-Software/scripts/find_qt_path.sh&lt;/code&gt;, comment out the default Qt path:&lt;/p&gt;
&lt;pre&gt;#printf "${HOME}/Qt/${QT_VER}/gcc_64/"&lt;/pre&gt;
&lt;p&gt;Then add your own Qt installation path, as shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FdjzZOGoauXgI3nW.png" alt="image.png" width="464" height="218"&gt;&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The path after &lt;code&gt;printf&lt;/code&gt; should include the &lt;code&gt;bin&lt;/code&gt; directory.&lt;/p&gt;
&lt;h5&gt;&lt;strong&gt;Step 4: Fix Missing Serial Port Header Issue&lt;/strong&gt;&lt;/h5&gt;
&lt;p&gt;Modify the file &lt;code&gt;Cheetah-Software/robot/src/rt/rt_serial.cpp&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;▪ Comment out &lt;code&gt;#include &amp;lt;stropts.h&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ Add &lt;code&gt;#include &amp;lt;sys/ioctl.h&amp;gt;&lt;/code&gt; before &lt;code&gt;#include &amp;lt;asm/termios.h&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Then fix the resulting struct redefinition conflict: open &lt;code&gt;/usr/include/asm-generic/termios.h&lt;/code&gt; (e.g. with &lt;code&gt;sudo vim /usr/include/asm-generic/termios.h&lt;/code&gt;) and wrap the conflicting definitions with &lt;code&gt;#ifndef _SYS_IOCTL_H&lt;/code&gt; and &lt;code&gt;#endif&lt;/code&gt; at the following position:&lt;/p&gt;
&lt;pre&gt;#ifndef _SYS_IOCTL_H&lt;br&gt;struct winsize {&lt;br&gt;        unsigned short ws_row;&lt;br&gt;        unsigned short ws_col;&lt;br&gt;        unsigned short ws_xpixel;&lt;br&gt;        unsigned short ws_ypixel;&lt;br&gt;};&lt;br&gt;&lt;br&gt;#define NCC 8&lt;br&gt;struct termio {&lt;br&gt;        unsigned short c_iflag;         /* input mode flags */&lt;br&gt;        unsigned short c_oflag;         /* output mode flags */&lt;br&gt;        unsigned short c_cflag;         /* control mode flags */&lt;br&gt;        unsigned short c_lflag;         /* local mode flags */&lt;br&gt;        unsigned char c_line;           /* line discipline */&lt;br&gt;        unsigned char c_cc[NCC];        /* control characters */&lt;br&gt;};&lt;br&gt;#endif&lt;/pre&gt;
&lt;h4&gt;4.1.7 Compile Program&lt;/h4&gt;
&lt;p&gt;It is recommended to build inside the &lt;code&gt;mc-build&lt;/code&gt; folder at the project root directory:&lt;/p&gt;
&lt;pre&gt;cd mc-build&lt;br&gt;rm CMakeCache.txt  # Clean up old configuration (if present)&lt;br&gt;&lt;br&gt;# Configure the project&lt;br&gt;# -DMINI_CHEETAH_BUILD=TRUE: build the Mini Cheetah version  &lt;br&gt;# -DJCQP_USE_AVX2=OFF: disable x86 AVX2 optimizations, so it suits ARM architectures (e.g., Jetson Nano / NX)  &lt;br&gt;cmake -DMINI_CHEETAH_BUILD=TRUE -DJCQP_USE_AVX2=OFF ..&lt;br&gt;&lt;br&gt;# Build (adjust the -j parameter according to the number of CPU cores)&lt;br&gt;make -j4&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ File-deletion errors reported while &lt;code&gt;./make_types.sh&lt;/code&gt; runs can be ignored&lt;/p&gt;
&lt;p&gt;▪ The &lt;code&gt;cmake&lt;/code&gt; step may appear to hang (usually while fetching resources from Google servers); this is a network issue and simply requires patience&lt;/p&gt;
&lt;p&gt;▪ The example uses &lt;code&gt;make -j4&lt;/code&gt; (four parallel jobs); &lt;code&gt;make -j$(nproc)&lt;/code&gt; instead uses all available CPU cores, while plain &lt;code&gt;make&lt;/code&gt; builds serially and is slower&lt;/p&gt;
&lt;h2&gt;5. SPIne Data Communication Conversion Board&lt;/h2&gt;
&lt;p&gt;SPIne is a key communication conversion module in the MIT Cheetah system, responsible for data conversion and transmission between the computer board and motor controllers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Open Source Code Download Address:&lt;/strong&gt; &lt;a href="https://gitee.com/lookc4/spine" rel="noopener noreferrer"&gt;SPIne Firmware&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;5.1 Communication Rate Configuration&lt;/h3&gt;
&lt;p&gt;1. &lt;strong&gt;CAN Bus Communication:&lt;/strong&gt; Each CAN bus runs at 1Mbps. SPIne uses two STM32 microcontrollers because a single CAN bus does not have enough bandwidth for all twelve motors. Each STM32 provides two CAN buses, and each bus serves three motors, which allows the required 1000Hz communication frequency; a single bus serving two legs (six motors) could not sustain that rate.&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;SPI Communication:&lt;/strong&gt; The SPI communication clock frequency between SPIne and the computer board is 12MHz, with a communication frequency of 1000Hz.&lt;/p&gt;
&lt;h3&gt;5.2 Communication Data Format&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;CAN Format:&lt;/strong&gt; A CAN frame carries up to 8 data bytes; command frames use all 8 bytes, while feedback frames use 5.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SPIne → Joint Motor Controller (Command, 8 bytes):&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ Position command: 16 bit&lt;/p&gt;
&lt;p&gt;▪ Velocity command: 12 bit&lt;/p&gt;
&lt;p&gt;▪ Kp (position gain): 12 bit&lt;/p&gt;
&lt;p&gt;▪ Kd (velocity gain): 12 bit&lt;/p&gt;
&lt;p&gt;▪ Feedforward torque: 12 bit&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Joint Motor Controller → SPIne (Feedback, 5 bytes):&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ Position information: 16 bit&lt;/p&gt;
&lt;p&gt;▪ Velocity information: 12 bit&lt;/p&gt;
&lt;p&gt;▪ Current (torque): 12 bit&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PC → SPIne (Command, 132 bytes):&lt;/strong&gt; Contains 33 data items: position commands, velocity commands, Kp, Kd, and feedforward torque for 6 joints, plus two flags and one checksum.&lt;/p&gt;
&lt;h3&gt;5.3 Code Architecture&lt;/h3&gt;
&lt;p&gt;The code structure of the SPIne firmware is shown in the following figure:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F10%2F30%2FurBwg3GDTP1mpcQ.png" alt="image.png" width="235" height="167"&gt;&lt;p&gt;&lt;strong&gt;Main Module Descriptions:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;leg_message:&lt;/strong&gt; Responsible for downlink and uplink data encapsulation between the UPboard and SPIne, as well as data encapsulation between SPIne and the motor controllers&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;math_ops:&lt;/strong&gt; Provides mathematical operation functions&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;main:&lt;/strong&gt; Program main entry point&lt;/p&gt;
&lt;h3&gt;5.4 Communication Protocol Details&lt;/h3&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F10%2F30%2FHFvSJYiINnOE983.png" alt="image.png" width="656" height="450"&gt;&lt;p&gt;UPboard Computer Board &amp;lt;----&amp;gt; SPIne Firmware&lt;/p&gt;
&lt;p&gt;SPI Communication Protocol:&lt;/p&gt;
&lt;pre&gt;SPIne Firmware----&amp;gt;UPboard Computer Board&lt;br&gt;&lt;br&gt;// 60 bytes&lt;br&gt;// 30 16-bit words&lt;br&gt;struct spi_data_t&lt;br&gt;{&lt;br&gt;    float q_abad[2];&lt;br&gt;    float q_hip[2];&lt;br&gt;    float q_knee[2];&lt;br&gt;    float qd_abad[2];&lt;br&gt;    float qd_hip[2];&lt;br&gt;    float qd_knee[2];&lt;br&gt;    int32_t flags[2];&lt;br&gt;    int32_t checksum;&lt;br&gt;};&lt;br&gt;&lt;br&gt;UPboard Computer Board----&amp;gt;SPIne Firmware&lt;br&gt;&lt;br&gt;// 132 bytes&lt;br&gt;// 66 16-bit words&lt;br&gt;struct spi_command_t&lt;br&gt;{&lt;br&gt;    float q_des_abad[2];&lt;br&gt;    float q_des_hip[2];&lt;br&gt;    float q_des_knee[2];&lt;br&gt;    float qd_des_abad[2];&lt;br&gt;    float qd_des_hip[2];&lt;br&gt;    float qd_des_knee[2];&lt;br&gt;    float kp_abad[2];&lt;br&gt;    float kp_hip[2];&lt;br&gt;    float kp_knee[2];&lt;br&gt;    float kd_abad[2];&lt;br&gt;    float kd_hip[2];&lt;br&gt;    float kd_knee[2];&lt;br&gt;    float tau_abad_ff[2];&lt;br&gt;    float tau_hip_ff[2];&lt;br&gt;    float tau_knee_ff[2];&lt;br&gt;    int32_t flags[2];&lt;br&gt;    int32_t checksum;&lt;br&gt;};&lt;/pre&gt;
&lt;p&gt;SPIne Firmware &amp;lt;----&amp;gt; Motor Controller&lt;/p&gt;
&lt;p&gt;CAN Communication Protocol:&lt;/p&gt;
&lt;pre&gt;SPIne Firmware----&amp;gt;Motor Controller&lt;br&gt;&lt;br&gt;/// CAN Command Packet Structure ///&lt;br&gt;/// 16 bit position command, between -4*pi and 4*pi&lt;br&gt;/// 12 bit velocity command, between -30 and + 30 rad/s&lt;br&gt;/// 12 bit kp, between 0 and 500 N-m/rad&lt;br&gt;/// 12 bit kd, between 0 and 100 N-m*s/rad&lt;br&gt;/// 12 bit feed forward torque, between -18 and 18 N-m&lt;br&gt;/// CAN Packet is 8 8-bit words&lt;br&gt;/// Formatted as follows.  For each quantity, bit 0 is LSB&lt;br&gt;/// 0: [position[15-8]]&lt;br&gt;/// 1: [position[7-0]] &lt;br&gt;/// 2: [velocity[11-4]]&lt;br&gt;/// 3: [velocity[3-0], kp[11-8]]&lt;br&gt;/// 4: [kp[7-0]]&lt;br&gt;/// 5: [kd[11-4]]&lt;br&gt;/// 6: [kd[3-0], torque[11-8]]&lt;br&gt;/// 7: [torque[7-0]]&lt;br&gt;&lt;br&gt;&lt;br&gt;SPIne Firmware&amp;lt;----Motor Controller&lt;br&gt;&lt;br&gt;/// CAN Reply Packet Structure &lt;br&gt;/// 16 bit position, between -4*pi and 4*pi&lt;br&gt;/// 12 bit velocity, between -30 and + 30 rad/s&lt;br&gt;/// 12 bit current, between -40 and 40;&lt;br&gt;/// CAN Packet is 5 8-bit words&lt;br&gt;/// Formatted as follows.  For each quantity, bit 0 is LSB&lt;br&gt;/// 0: [position[15-8]]&lt;br&gt;/// 1: [position[7-0]] &lt;br&gt;/// 2: [velocity[11-4]]&lt;br&gt;/// 3: [velocity[3-0], current[11-8]]&lt;br&gt;/// 4: [current[7-0]]&lt;/pre&gt;
&lt;h3&gt;5.5 SPIne Firmware SPI Communication Implementation Analysis&lt;/h3&gt;
&lt;h4&gt;5.5.1 Data Buffer Definition&lt;/h4&gt;
&lt;pre&gt;// Receive and transmit buffer lengths, in 16-bit words&lt;br&gt;#define RX_LEN 66&lt;br&gt;#define TX_LEN 66&lt;br&gt;// SPI data buffers&lt;br&gt;uint16_t rx_buff[RX_LEN];&lt;br&gt;uint16_t tx_buff[TX_LEN];&lt;/pre&gt;
&lt;h4&gt;5.5.2 SPIne and UPboard Data Encapsulation Structure&lt;/h4&gt;
&lt;pre&gt;spi_data_t spi_data; // data from spine to up&lt;br&gt;spi_command_t spi_command; // data from up to spine&lt;br&gt;&lt;br&gt;// 60 bytes&lt;br&gt;// 30 16-bit words&lt;br&gt;struct spi_data_t&lt;br&gt;{    //position&lt;br&gt;    float q_abad[2];&lt;br&gt;    float q_hip[2];&lt;br&gt;    float q_knee[2];&lt;br&gt;     //velocity&lt;br&gt;    float qd_abad[2];&lt;br&gt;    float qd_hip[2];&lt;br&gt;    float qd_knee[2];&lt;br&gt;    //flags and checksum&lt;br&gt;    int32_t flags[2];&lt;br&gt;    int32_t checksum;&lt;br&gt;};&lt;br&gt;&lt;br&gt;// 132 bytes&lt;br&gt;// 66 16-bit words&lt;br&gt;struct spi_command_t&lt;br&gt;{   &lt;br&gt;    //position&lt;br&gt;    float q_des_abad[2];&lt;br&gt;    float q_des_hip[2];&lt;br&gt;    float q_des_knee[2];&lt;br&gt;   //velocity&lt;br&gt;    float qd_des_abad[2];&lt;br&gt;    float qd_des_hip[2];&lt;br&gt;    float qd_des_knee[2];&lt;br&gt;    //gain KP&lt;br&gt;    float kp_abad[2];&lt;br&gt;    float kp_hip[2];&lt;br&gt;    float kp_knee[2];&lt;br&gt;    //gain KD&lt;br&gt;    float kd_abad[2];&lt;br&gt;    float kd_hip[2];&lt;br&gt;    float kd_knee[2];&lt;br&gt;     //feedforward torque&lt;br&gt;    float tau_abad_ff[2];&lt;br&gt;    float tau_hip_ff[2];&lt;br&gt;    float tau_knee_ff[2];&lt;br&gt;    //flags and checksum&lt;br&gt;    int32_t flags[2];&lt;br&gt;    int32_t checksum;&lt;br&gt;};&lt;/pre&gt;
&lt;h4&gt;5.5.3 SPI Interrupt Service Routine&lt;/h4&gt;
&lt;p&gt;The SPI interrupt service routine is responsible for handling SPI communication data transmission and reception:&lt;/p&gt;
&lt;pre&gt;void spi_isr(void)&lt;br&gt;{&lt;br&gt;    // Pulse GPIOC pin 8 (likely a scope/debug marker)&lt;br&gt;    GPIOC-&amp;gt;ODR |= (1 &amp;lt;&amp;lt; 8);&lt;br&gt;    GPIOC-&amp;gt;ODR &amp;amp;= ~(1 &amp;lt;&amp;lt; 8);&lt;br&gt;    int bytecount = 0;&lt;br&gt;    SPI1-&amp;gt;DR = tx_buff[0];&lt;br&gt;    while(cs == 0) {&lt;br&gt;        if(SPI1-&amp;gt;SR &amp;amp; 0x1) {&lt;br&gt;            rx_buff[bytecount] = SPI1-&amp;gt;DR; // data reception&lt;br&gt;            bytecount++;&lt;br&gt;            if(bytecount &amp;lt; TX_LEN) {&lt;br&gt;                SPI1-&amp;gt;DR = tx_buff[bytecount]; // data transmission&lt;br&gt;            }&lt;br&gt;        }&lt;br&gt;    }&lt;br&gt;    // Checksum over the received command words, excluding the checksum field itself&lt;br&gt;    uint32_t calc_checksum = xor_checksum((uint32_t*)rx_buff, 32);&lt;br&gt;    // Copy the received words into the spi_command structure&lt;br&gt;    for(int i = 0; i &amp;lt; CMD_LEN; i++)&lt;br&gt;    {&lt;br&gt;        ((uint16_t*)(&amp;amp;spi_command))[i] = rx_buff[i];&lt;br&gt;    }&lt;br&gt;&lt;br&gt;    // Flag a checksum mismatch so the UPboard can detect a corrupted command&lt;br&gt;    if(calc_checksum != spi_command.checksum) {&lt;br&gt;        spi_data.flags[1] = 0xdead;&lt;br&gt;    }&lt;br&gt;&lt;br&gt;    control();  // Fill spi_data (and thus tx_buff) with motor state for the next SPI transfer to the UPboard&lt;br&gt;    PackAll();  // Pack the control data received from the UPboard into the CAN buffers&lt;br&gt;    WriteAll(); // Send the CAN buffers to the leg motor controllers via CAN&lt;br&gt;}&lt;/pre&gt;
&lt;h4&gt;5.5.4 Control Function Implementation&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;control()&lt;/code&gt; function is responsible for assigning the state information received from the motor controller to &lt;code&gt;tx_buff&lt;/code&gt; and sending it to UPboard via SPI:&lt;/p&gt;
&lt;pre&gt;void control()&lt;br&gt;{&lt;br&gt;    //Enter motor mode&lt;br&gt;    if(((spi_command.flags[0]&amp;amp;0x1)==1)  &amp;amp;&amp;amp; (enabled==0)){&lt;br&gt;        enabled = 1;&lt;br&gt;        EnterMotorMode(&amp;amp;a1_can);&lt;br&gt;        can1.write(a1_can);&lt;br&gt;        EnterMotorMode(&amp;amp;a2_can);&lt;br&gt;        can2.write(a2_can);&lt;br&gt;        EnterMotorMode(&amp;amp;k1_can);&lt;br&gt;        can1.write(k1_can);&lt;br&gt;        EnterMotorMode(&amp;amp;k2_can);&lt;br&gt;        can2.write(k2_can);&lt;br&gt;        EnterMotorMode(&amp;amp;h1_can);&lt;br&gt;        can1.write(h1_can);&lt;br&gt;        EnterMotorMode(&amp;amp;h2_can);&lt;br&gt;        can2.write(h2_can);&lt;br&gt;        printf("e\n\r");&lt;br&gt;        return;&lt;br&gt;    }&lt;br&gt;      //Exit motor mode&lt;br&gt;    else if((((spi_command.flags[0]&amp;amp;0x1))==0)  &amp;amp;&amp;amp; (enabled==1)){&lt;br&gt;         enabled = 0;&lt;br&gt;        ExitMotorMode(&amp;amp;a1_can);&lt;br&gt;        can1.write(a1_can);&lt;br&gt;        ExitMotorMode(&amp;amp;a2_can);&lt;br&gt;        can2.write(a2_can);&lt;br&gt;        ExitMotorMode(&amp;amp;h1_can);&lt;br&gt;        can1.write(h1_can);&lt;br&gt;        ExitMotorMode(&amp;amp;h2_can);&lt;br&gt;        can2.write(h2_can);&lt;br&gt;        ExitMotorMode(&amp;amp;k1_can);&lt;br&gt;        can1.write(k1_can);&lt;br&gt;        ExitMotorMode(&amp;amp;k2_can);&lt;br&gt;        can2.write(k2_can);&lt;br&gt;        printf("x\n\r");&lt;br&gt;        return;&lt;br&gt;        }&lt;br&gt;    //Assign the state information received from the motor controller to spi_data (send to UPboard)&lt;br&gt;    spi_data.q_abad[0] = l1_state.a.p;&lt;br&gt;    spi_data.q_hip[0] = l1_state.h.p;&lt;br&gt;    spi_data.q_knee[0] = l1_state.k.p;&lt;br&gt;    spi_data.qd_abad[0] = l1_state.a.v;&lt;br&gt;    spi_data.qd_hip[0] = l1_state.h.v;&lt;br&gt;    spi_data.qd_knee[0] = l1_state.k.v;&lt;br&gt;    &lt;br&gt;    spi_data.q_abad[1] = 
l2_state.a.p;&lt;br&gt;    spi_data.q_hip[1] = l2_state.h.p;&lt;br&gt;    spi_data.q_knee[1] = l2_state.k.p;&lt;br&gt;    spi_data.qd_abad[1] = l2_state.a.v;&lt;br&gt;    spi_data.qd_hip[1] = l2_state.h.v;&lt;br&gt;    spi_data.qd_knee[1] = l2_state.k.v;&lt;br&gt;       &lt;br&gt;    if(estop==0){//Emergency stop&lt;br&gt;        //printf("estopped!!!!\n\r");&lt;br&gt;        memset(&amp;amp;l1_control, 0, sizeof(l1_control));&lt;br&gt;        memset(&amp;amp;l2_control, 0, sizeof(l2_control));&lt;br&gt;        spi_data.flags[0] = 0xdead;&lt;br&gt;        spi_data.flags[1] = 0xdead;&lt;br&gt;        led = 1;&lt;br&gt;        }&lt;br&gt;    &lt;br&gt;    else{//Running state, assign the spi_command data received from UPboard to l1_control (send to motor controller)&lt;br&gt;        led = 0;&lt;br&gt;        &lt;br&gt;        memset(&amp;amp;l1_control, 0, sizeof(l1_control));&lt;br&gt;        memset(&amp;amp;l2_control, 0, sizeof(l2_control));&lt;br&gt;        &lt;br&gt;        l1_control.a.p_des = spi_command.q_des_abad[0];&lt;br&gt;        l1_control.a.v_des  = spi_command.qd_des_abad[0];&lt;br&gt;        l1_control.a.kp = spi_command.kp_abad[0];&lt;br&gt;        l1_control.a.kd = spi_command.kd_abad[0];&lt;br&gt;        l1_control.a.t_ff = spi_command.tau_abad_ff[0];&lt;br&gt;        &lt;br&gt;        l1_control.h.p_des = spi_command.q_des_hip[0];&lt;br&gt;        l1_control.h.v_des  = spi_command.qd_des_hip[0];&lt;br&gt;        l1_control.h.kp = spi_command.kp_hip[0];&lt;br&gt;        l1_control.h.kd = spi_command.kd_hip[0];&lt;br&gt;        l1_control.h.t_ff = spi_command.tau_hip_ff[0];&lt;br&gt;        &lt;br&gt;        l1_control.k.p_des = spi_command.q_des_knee[0];&lt;br&gt;        l1_control.k.v_des  = spi_command.qd_des_knee[0];&lt;br&gt;        l1_control.k.kp = spi_command.kp_knee[0];&lt;br&gt;        l1_control.k.kd = spi_command.kd_knee[0];&lt;br&gt;        l1_control.k.t_ff = spi_command.tau_knee_ff[0];&lt;br&gt;        &lt;br&gt;        
l2_control.a.p_des = spi_command.q_des_abad[1];&lt;br&gt;        l2_control.a.v_des  = spi_command.qd_des_abad[1];&lt;br&gt;        l2_control.a.kp = spi_command.kp_abad[1];&lt;br&gt;        l2_control.a.kd = spi_command.kd_abad[1];&lt;br&gt;        l2_control.a.t_ff = spi_command.tau_abad_ff[1];&lt;br&gt;        &lt;br&gt;        l2_control.h.p_des = spi_command.q_des_hip[1];&lt;br&gt;        l2_control.h.v_des  = spi_command.qd_des_hip[1];&lt;br&gt;        l2_control.h.kp = spi_command.kp_hip[1];&lt;br&gt;        l2_control.h.kd = spi_command.kd_hip[1];&lt;br&gt;        l2_control.h.t_ff = spi_command.tau_hip_ff[1];&lt;br&gt;        &lt;br&gt;        l2_control.k.p_des = spi_command.q_des_knee[1];&lt;br&gt;        l2_control.k.v_des  = spi_command.qd_des_knee[1];&lt;br&gt;        l2_control.k.kp = spi_command.kp_knee[1];&lt;br&gt;        l2_control.k.kd = spi_command.kd_knee[1];&lt;br&gt;        l2_control.k.t_ff = spi_command.tau_knee_ff[1];&lt;br&gt;        &lt;br&gt;        //Soft stop program to prevent stopping too abruptly&lt;br&gt;        spi_data.flags[0] = 0;&lt;br&gt;        spi_data.flags[1] = 0;&lt;br&gt;        spi_data.flags[0] |= softstop_joint(l1_state.a, &amp;amp;l1_control.a, A_LIM_P, A_LIM_N);&lt;br&gt;        spi_data.flags[0] |= (softstop_joint(l1_state.h, &amp;amp;l1_control.h, H_LIM_P, H_LIM_N))&amp;lt;&amp;lt;1;&lt;br&gt;        //spi_data.flags[0] |= (softstop_joint(l1_state.k, &amp;amp;l1_control.k, K_LIM_P, K_LIM_N))&amp;lt;&amp;lt;2;&lt;br&gt;        spi_data.flags[1] |= softstop_joint(l2_state.a, &amp;amp;l2_control.a, A_LIM_P, A_LIM_N);&lt;br&gt;        spi_data.flags[1] |= (softstop_joint(l2_state.h, &amp;amp;l2_control.h, H_LIM_P, H_LIM_N))&amp;lt;&amp;lt;1;&lt;br&gt;        //spi_data.flags[1] |= (softstop_joint(l2_state.k, &amp;amp;l2_control.k, K_LIM_P, K_LIM_N))&amp;lt;&amp;lt;2;&lt;br&gt;    }&lt;br&gt;    spi_data.checksum = xor_checksum((uint32_t*)&amp;amp;spi_data,14);&lt;br&gt;    for(int i = 0; i &amp;lt; DATA_LEN; 
i++){&lt;br&gt;        tx_buff[i] = ((uint16_t*)(&amp;amp;spi_data))[i];}&lt;br&gt;    &lt;br&gt;}&lt;/pre&gt;
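&lt;p&gt;The &lt;code&gt;xor_checksum()&lt;/code&gt; helper called above is not shown in this excerpt. A minimal host-side sketch consistent with the call &lt;code&gt;xor_checksum((uint32_t*)&amp;amp;spi_data, 14)&lt;/code&gt; (a word-wise XOR over the first 14 32-bit words, which the UPboard recomputes to detect corrupted SPI transfers) could look like this; the signature is an assumption inferred from the call site:&lt;/p&gt;

```c
#include <stdint.h>
#include <stddef.h>

/* Word-wise XOR over a buffer of 32-bit words. The UPboard recomputes
 * the same value over the received spi_data to detect corrupted SPI
 * transfers. Sketch only: the signature is inferred from the call
 * xor_checksum((uint32_t*)&spi_data, 14) above. */
uint32_t xor_checksum(uint32_t *data, size_t len)
{
    uint32_t t = 0;
    for (size_t i = 0; i < len; i++)
        t ^= data[i];
    return t;
}
```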
&lt;h4&gt;5.5.5 Soft Stop Program Implementation&lt;/h4&gt;
&lt;p&gt;The soft-stop routine prevents abrupt motion when a joint exceeds its position limits: it zeroes the position gain, applies damping via &lt;code&gt;KD_SOFTSTOP&lt;/code&gt;, and adds a restoring feed-forward torque proportional to the overshoot:&lt;/p&gt;
&lt;pre&gt;int softstop_joint(joint_state state, joint_control * control, float limit_p, float limit_n){&lt;br&gt;    if((state.p)&amp;gt;=limit_p){&lt;br&gt;        //control-&amp;gt;p_des = limit_p;&lt;br&gt;        control-&amp;gt;v_des = 0.0f;&lt;br&gt;        control-&amp;gt;kp = 0;&lt;br&gt;        control-&amp;gt;kd = KD_SOFTSTOP;&lt;br&gt;        control-&amp;gt;t_ff += KP_SOFTSTOP*(limit_p - state.p);&lt;br&gt;        return 1;&lt;br&gt;    }&lt;br&gt;    else if((state.p)&amp;lt;=limit_n){&lt;br&gt;        //control-&amp;gt;p_des = limit_n;&lt;br&gt;        control-&amp;gt;v_des = 0.0f;&lt;br&gt;        control-&amp;gt;kp = 0;&lt;br&gt;        control-&amp;gt;kd = KD_SOFTSTOP;&lt;br&gt;        control-&amp;gt;t_ff += KP_SOFTSTOP*(limit_n - state.p);&lt;br&gt;        return 1;&lt;br&gt;    }&lt;br&gt;    return 0;&lt;br&gt;    &lt;br&gt;    }&lt;/pre&gt;
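&lt;p&gt;The behavior is easy to verify in isolation. The harness below mirrors the routine on a host machine; the struct layouts and the &lt;code&gt;KP_SOFTSTOP&lt;/code&gt;/&lt;code&gt;KD_SOFTSTOP&lt;/code&gt; values are assumptions for illustration, since the real definitions live in the firmware headers:&lt;/p&gt;

```c
/* Host-side mirror of softstop_joint(). The struct layouts and the
 * KP_SOFTSTOP/KD_SOFTSTOP gains are assumptions for illustration;
 * the firmware defines the real values in its headers. */
typedef struct { float p, v, t; } joint_state;
typedef struct { float p_des, v_des, kp, kd, t_ff; } joint_control;

#define KP_SOFTSTOP 100.0f  /* assumed restoring-torque gain */
#define KD_SOFTSTOP 0.4f    /* assumed damping gain */

int softstop_joint(joint_state state, joint_control *control,
                   float limit_p, float limit_n)
{
    if (state.p >= limit_p) {
        /* Past the positive limit: drop position control, damp,
         * and push back proportionally to the overshoot. */
        control->v_des = 0.0f;
        control->kp = 0;
        control->kd = KD_SOFTSTOP;
        control->t_ff += KP_SOFTSTOP * (limit_p - state.p);
        return 1;
    } else if (state.p <= limit_n) {
        control->v_des = 0.0f;
        control->kp = 0;
        control->kd = KD_SOFTSTOP;
        control->t_ff += KP_SOFTSTOP * (limit_n - state.p);
        return 1;
    }
    return 0;  /* within limits: command left untouched */
}
```

&lt;p&gt;With the joint 0.1 rad past &lt;code&gt;limit_p&lt;/code&gt;, the routine returns 1, zeroes &lt;code&gt;kp&lt;/code&gt;, and accumulates a negative restoring torque of roughly &lt;code&gt;KP_SOFTSTOP&lt;/code&gt; × 0.1 into &lt;code&gt;t_ff&lt;/code&gt;.&lt;/p&gt;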
&lt;h4&gt;5.5.6 Data Packing Function&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;PackAll()&lt;/code&gt; function is responsible for packing the control data received from UPboard into the CAN buffer:&lt;/p&gt;
&lt;pre&gt;//l1_control encapsulates the data received from UPboard, i.e., encapsulates spi_command&lt;br&gt;struct joint_control{&lt;br&gt;    float p_des, v_des, kp, kd, t_ff;//desired position, desired velocity, KP, KD, feed-forward torque&lt;br&gt;    };&lt;br&gt;struct leg_control{&lt;br&gt;    joint_control a, h, k;&lt;br&gt;    };&lt;br&gt;void PackAll(){&lt;br&gt;    pack_cmd(&amp;amp;a1_can, l1_control.a); //Leg 1 abad motor&lt;br&gt;    pack_cmd(&amp;amp;a2_can, l2_control.a); //Leg 2 abad motor&lt;br&gt;    pack_cmd(&amp;amp;h1_can, l1_control.h); //Leg 1 hip motor&lt;br&gt;    pack_cmd(&amp;amp;h2_can, l2_control.h); //Leg 2 hip motor&lt;br&gt;    pack_cmd(&amp;amp;k1_can, l1_control.k); //Leg 1 knee motor&lt;br&gt;    pack_cmd(&amp;amp;k2_can, l2_control.k); //Leg 2 knee motor&lt;br&gt;    }&lt;/pre&gt;
&lt;h4&gt;5.5.7 CAN Data Packing Function&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;pack_cmd()&lt;/code&gt; function parses the control information sent from UPboard and packs it into the CAN buffer, ready to be sent to the motor controller:&lt;/p&gt;
&lt;pre&gt;/// CAN Command Packet Structure ///&lt;br&gt;/// 16 bit position command, between -4*pi and 4*pi&lt;br&gt;/// 12 bit velocity command, between -30 and + 30 rad/s&lt;br&gt;/// 12 bit kp, between 0 and 500 N-m/rad&lt;br&gt;/// 12 bit kd, between 0 and 100 N-m*s/rad&lt;br&gt;/// 12 bit feed forward torque, between -18 and 18 N-m&lt;br&gt;/// CAN Packet is 8 8-bit words&lt;br&gt;/// Formatted as follows.  For each quantity, bit 0 is LSB&lt;br&gt;/// 0: [position[15-8]]&lt;br&gt;/// 1: [position[7-0]] &lt;br&gt;/// 2: [velocity[11-4]]&lt;br&gt;/// 3: [velocity[3-0], kp[11-8]]&lt;br&gt;/// 4: [kp[7-0]]&lt;br&gt;/// 5: [kd[11-4]]&lt;br&gt;/// 6: [kd[3-0], torque[11-8]]&lt;br&gt;/// 7: [torque[7-0]]&lt;br&gt;&lt;br&gt;void pack_cmd(CANMessage * msg, joint_control joint){&lt;br&gt;     &lt;br&gt;     /// limit data to be within bounds ///&lt;br&gt;     float p_des = fminf(fmaxf(P_MIN, joint.p_des), P_MAX);                    &lt;br&gt;     float v_des = fminf(fmaxf(V_MIN, joint.v_des), V_MAX);&lt;br&gt;     float kp = fminf(fmaxf(KP_MIN, joint.kp), KP_MAX);&lt;br&gt;     float kd = fminf(fmaxf(KD_MIN, joint.kd), KD_MAX);&lt;br&gt;     float t_ff = fminf(fmaxf(T_MIN, joint.t_ff), T_MAX);&lt;br&gt;     /// convert floats to unsigned ints ///&lt;br&gt;     uint16_t p_int = float_to_uint(p_des, P_MIN, P_MAX, 16);            &lt;br&gt;     uint16_t v_int = float_to_uint(v_des, V_MIN, V_MAX, 12);&lt;br&gt;     uint16_t kp_int = float_to_uint(kp, KP_MIN, KP_MAX, 12);&lt;br&gt;     uint16_t kd_int = float_to_uint(kd, KD_MIN, KD_MAX, 12);&lt;br&gt;     uint16_t t_int = float_to_uint(t_ff, T_MIN, T_MAX, 12);&lt;br&gt;     /// pack ints into the can buffer ///&lt;br&gt;     msg-&amp;gt;data[0] = p_int&amp;gt;&amp;gt;8;                                       &lt;br&gt;     msg-&amp;gt;data[1] = p_int&amp;amp;0xFF;&lt;br&gt;     msg-&amp;gt;data[2] = v_int&amp;gt;&amp;gt;4;&lt;br&gt;     msg-&amp;gt;data[3] = 
((v_int&amp;amp;0xF)&amp;lt;&amp;lt;4)|(kp_int&amp;gt;&amp;gt;8);&lt;br&gt;     msg-&amp;gt;data[4] = kp_int&amp;amp;0xFF;&lt;br&gt;     msg-&amp;gt;data[5] = kd_int&amp;gt;&amp;gt;4;&lt;br&gt;     msg-&amp;gt;data[6] = ((kd_int&amp;amp;0xF)&amp;lt;&amp;lt;4)|(t_int&amp;gt;&amp;gt;8);&lt;br&gt;     msg-&amp;gt;data[7] = t_int&amp;amp;0xff;&lt;br&gt;     }&lt;/pre&gt;
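&lt;p&gt;&lt;code&gt;pack_cmd()&lt;/code&gt; relies on &lt;code&gt;float_to_uint()&lt;/code&gt; to map each clamped float onto a fixed-width integer range. A sketch consistent with the calls above (the firmware's exact implementation may differ):&lt;/p&gt;

```c
#include <stdint.h>

/* Linearly map x in [x_min, x_max] onto an unsigned integer with the
 * given bit width: x_min -> 0, x_max -> 2^bits - 1. Sketch inferred
 * from the calls above; the firmware's implementation may differ. */
uint16_t float_to_uint(float x, float x_min, float x_max, int bits)
{
    float span = x_max - x_min;
    return (uint16_t)((x - x_min) * ((float)((1 << bits) - 1)) / span);
}
```

&lt;p&gt;Because &lt;code&gt;pack_cmd()&lt;/code&gt; clamps every value to its [min, max] range first, the mapping never overflows the target bit width.&lt;/p&gt;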
&lt;h4&gt;5.5.8 CAN Data Transmission Function&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;WriteAll()&lt;/code&gt; function transmits the command frames for all six motors over the two CAN buses, inserting a short delay (20 microseconds, via &lt;code&gt;wait(.00002)&lt;/code&gt;) between writes:&lt;/p&gt;
&lt;pre&gt;void WriteAll(){&lt;br&gt;    //toggle = 1;&lt;br&gt;    can1.write(a1_can);&lt;br&gt;    wait(.00002);&lt;br&gt;    can2.write(a2_can);&lt;br&gt;    wait(.00002);&lt;br&gt;    can1.write(h1_can);&lt;br&gt;    wait(.00002);&lt;br&gt;    can2.write(h2_can);&lt;br&gt;    wait(.00002);&lt;br&gt;    can1.write(k1_can);&lt;br&gt;    wait(.00002);&lt;br&gt;    can2.write(k2_can);&lt;br&gt;    wait(.00002);&lt;br&gt;    //toggle = 0;&lt;br&gt;    }&lt;/pre&gt;
&lt;h3&gt;5.6 SPIne Firmware and PC Serial Communication&lt;/h3&gt;
&lt;p&gt;SPIne communicates with the PC over a serial port for debugging and manual control:&lt;/p&gt;
&lt;pre&gt;void serial_isr(){&lt;br&gt;     /// handle keyboard commands from the serial terminal ///&lt;br&gt;     while(pc.readable()){&lt;br&gt;        char c = pc.getc();&lt;br&gt;        //led = !led;&lt;br&gt;        switch(c){&lt;br&gt;            case(27):&lt;br&gt;                //loop.detach();&lt;br&gt;                printf("\n\r exiting motor mode \n\r");&lt;br&gt;                ExitMotorMode(&amp;amp;a1_can);&lt;br&gt;                ExitMotorMode(&amp;amp;a2_can);&lt;br&gt;                ExitMotorMode(&amp;amp;h1_can);&lt;br&gt;                ExitMotorMode(&amp;amp;h2_can);&lt;br&gt;                ExitMotorMode(&amp;amp;k1_can);&lt;br&gt;                ExitMotorMode(&amp;amp;k2_can);&lt;br&gt;                enabled = 0;&lt;br&gt;                break;&lt;br&gt;            case('m'):&lt;br&gt;                printf("\n\r entering motor mode \n\r");&lt;br&gt;                EnterMotorMode(&amp;amp;a1_can);&lt;br&gt;                EnterMotorMode(&amp;amp;a2_can);&lt;br&gt;                EnterMotorMode(&amp;amp;h1_can);&lt;br&gt;                EnterMotorMode(&amp;amp;h2_can);&lt;br&gt;                EnterMotorMode(&amp;amp;k1_can);&lt;br&gt;                EnterMotorMode(&amp;amp;k2_can);&lt;br&gt;                wait(.5);&lt;br&gt;                enabled = 1;&lt;br&gt;                //loop.attach(&amp;amp;sendCMD, .001);&lt;br&gt;                break;&lt;br&gt;            case('s'):&lt;br&gt;                printf("\n\r standing \n\r");&lt;br&gt;                counter2 = 0;&lt;br&gt;                is_standing = 1;&lt;br&gt;                //stand();&lt;br&gt;                break;&lt;br&gt;            case('z'):&lt;br&gt;                printf("\n\r zeroing \n\r");&lt;br&gt;                Zero(&amp;amp;a1_can);&lt;br&gt;                Zero(&amp;amp;a2_can);&lt;br&gt;                Zero(&amp;amp;h1_can);&lt;br&gt;                Zero(&amp;amp;h2_can);&lt;br&gt;                Zero(&amp;amp;k1_can);&lt;br&gt;                
Zero(&amp;amp;k2_can);&lt;br&gt;                break;&lt;br&gt;            }&lt;br&gt;        }&lt;br&gt;        WriteAll();&lt;br&gt;        &lt;br&gt;    }&lt;/pre&gt;
&lt;h4&gt;5.6.1 Enter Motor Mode Function&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;EnterMotorMode()&lt;/code&gt; function puts a motor controller into operating mode by sending a special command frame (seven 0xFF bytes followed by 0xFC):&lt;/p&gt;
&lt;pre&gt;void EnterMotorMode(CANMessage * msg){&lt;br&gt;    msg-&amp;gt;data[0] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[1] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[2] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[3] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[4] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[5] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[6] = 0xFF;&lt;br&gt;    msg-&amp;gt;data[7] = 0xFC;&lt;br&gt;    //WriteAll();&lt;br&gt;    }&lt;/pre&gt;
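&lt;p&gt;&lt;code&gt;ExitMotorMode()&lt;/code&gt; and &lt;code&gt;Zero()&lt;/code&gt;, used by the serial handler above, follow the same special-frame pattern: seven 0xFF bytes with the final byte selecting the action. The sketch below mocks &lt;code&gt;CANMessage&lt;/code&gt; as a plain struct so it can run off-target; the 0xFD (exit) and 0xFE (zero) tail bytes are assumed to follow the same motor-driver convention as the 0xFC frame above:&lt;/p&gt;

```c
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for mbed's CANMessage so the sketch runs off-target;
 * illustration only. */
typedef struct { uint8_t data[8]; int len; int id; } CANMessage;

/* Special command frames: seven 0xFF bytes plus one tail byte.
 * 0xFC enters motor mode (as in EnterMotorMode above), 0xFD exits it,
 * 0xFE zeroes the encoder position -- assumed per the same
 * motor-driver convention. */
static void special_cmd(CANMessage *msg, uint8_t tail)
{
    memset(msg->data, 0xFF, 7);
    msg->data[7] = tail;
}

void ExitMotorMode(CANMessage *msg) { special_cmd(msg, 0xFD); }
void Zero(CANMessage *msg)          { special_cmd(msg, 0xFE); }
```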
&lt;h3&gt;5.7 SPIne Firmware Main Function Analysis&lt;/h3&gt;
&lt;h4&gt;5.7.1 Main Program Flow&lt;/h4&gt;
&lt;p&gt;The main function is responsible for initializing the system and starting the main loop, continuously processing CAN messages and SPI communication:&lt;/p&gt;
&lt;pre&gt;int main() {&lt;br&gt;    wait(1);&lt;br&gt;    //led = 1;&lt;br&gt;    pc.baud(921600);//Set baud rate&lt;br&gt;    pc.attach(&amp;amp;serial_isr);//Communicate with PC&lt;br&gt;    estop.mode(PullUp);//Emergency stop setup&lt;br&gt;    //spi.format(16, 0);&lt;br&gt;    //spi.frequency(1000000);&lt;br&gt;    //spi.reply(0x0);&lt;br&gt;    //cs.fall(&amp;amp;spi_isr);&lt;br&gt;&lt;br&gt;    //can1.frequency(1000000);                     // set bit rate to 1Mbps&lt;br&gt;    //can1.attach(&amp;amp;rxISR1);                 // attach 'CAN receive-complete' interrupt handler&lt;br&gt;    can1.filter(CAN_ID&amp;lt;&amp;lt;21, 0xFFE00004, CANStandard, 0); //CAN1 filter setup&lt;br&gt;    //can2.frequency(1000000);                     // set bit rate to 1Mbps&lt;br&gt;    //can2.attach(&amp;amp;rxISR2);                 // attach 'CAN receive-complete' interrupt handler&lt;br&gt;    can2.filter(CAN_ID&amp;lt;&amp;lt;21, 0xFFE00004, CANStandard, 0); //CAN2 filter setup&lt;br&gt;    //Zero the buffers&lt;br&gt;    memset(&amp;amp;tx_buff, 0, TX_LEN * sizeof(uint16_t));&lt;br&gt;    memset(&amp;amp;spi_data, 0, sizeof(spi_data_t));&lt;br&gt;    memset(&amp;amp;spi_command,0,sizeof(spi_command_t));&lt;br&gt;    &lt;br&gt;    //Set priority&lt;br&gt;    NVIC_SetPriority(TIM5_IRQn, 1);&lt;br&gt;    //NVIC_SetPriority(CAN1_RX0_IRQn, 3);&lt;br&gt;    //NVIC_SetPriority(CAN2_RX0_IRQn, 3);&lt;br&gt;    &lt;br&gt;    printf("\n\r SPIne\n\r");&lt;br&gt;    //printf("%d\n\r", RX_ID &amp;lt;&amp;lt; 18);&lt;br&gt;    //Transmit data parameters&lt;br&gt;    a1_can.len = 8;                         //transmit 8 bytes&lt;br&gt;    a2_can.len = 8;                         //transmit 8 bytes&lt;br&gt;    h1_can.len = 8;&lt;br&gt;    h2_can.len = 8;&lt;br&gt;    k1_can.len = 8;&lt;br&gt;    k2_can.len = 8;&lt;br&gt;   //Receive data parameters&lt;br&gt;    rxMsg1.len = 6;                          //receive 6 bytes&lt;br&gt;    rxMsg2.len = 6;                     
     //receive 6 bytes&lt;br&gt;   //CAN ID setup&lt;br&gt;    a1_can.id = 0x1;                        &lt;br&gt;    a2_can.id = 0x1;                 &lt;br&gt;    h1_can.id = 0x2;&lt;br&gt;    h2_can.id = 0x2;&lt;br&gt;    k1_can.id = 0x3;&lt;br&gt;    k2_can.id = 0x3;     &lt;br&gt;    //Data buffer assignment&lt;br&gt;    pack_cmd(&amp;amp;a1_can, l1_control.a); &lt;br&gt;    pack_cmd(&amp;amp;a2_can, l2_control.a); &lt;br&gt;    pack_cmd(&amp;amp;h1_can, l1_control.h); &lt;br&gt;    pack_cmd(&amp;amp;h2_can, l2_control.h); &lt;br&gt;    pack_cmd(&amp;amp;k1_can, l1_control.k); &lt;br&gt;    pack_cmd(&amp;amp;k2_can, l2_control.k); &lt;br&gt;   //Transmit&lt;br&gt;    WriteAll();&lt;br&gt;&lt;br&gt;&lt;br&gt;    // SPI doesn't work if enabled while the CS pin is pulled low&lt;br&gt;    // Wait for CS to not be low, then enable SPI&lt;br&gt;    if(!spi_enabled){   //Wait for SPI enable&lt;br&gt;        while((spi_enabled==0) &amp;amp;&amp;amp; (cs.read() ==0)){wait_us(10);}&lt;br&gt;        init_spi();&lt;br&gt;        spi_enabled = 1;&lt;br&gt;        }&lt;br&gt;            &lt;br&gt;    while(1) {//while main loop&lt;br&gt;        counter++;&lt;br&gt;        can2.read(rxMsg2);//Read data sent by motor controller&lt;br&gt;        unpack_reply(rxMsg2, &amp;amp;l2_state);//Data parsing, assign to l2_state&lt;br&gt;        can1.read(rxMsg1);                    // read message into Rx message storage&lt;br&gt;        unpack_reply(rxMsg1, &amp;amp;l1_state);&lt;br&gt;        wait_us(10);&lt;br&gt;&lt;br&gt;        }     &lt;br&gt;    }   &lt;/pre&gt;
&lt;h4&gt;5.7.2 CAN Data Parsing Function&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;unpack_reply()&lt;/code&gt; function is used to parse CAN messages received from the motor controller and assign the data to the corresponding state structure:&lt;/p&gt;
&lt;pre&gt;/// CAN Reply Packet Structure ///&lt;br&gt;/// 16 bit position, between -4*pi and 4*pi&lt;br&gt;/// 12 bit velocity, between -30 and + 30 rad/s&lt;br&gt;/// 12 bit current, between -40 and 40&lt;br&gt;/// CAN Packet is 6 8-bit words (byte 0 carries the motor ID)&lt;br&gt;/// Formatted as follows.  For each quantity, bit 0 is LSB&lt;br&gt;/// 0: [motor id]&lt;br&gt;/// 1: [position[15-8]]&lt;br&gt;/// 2: [position[7-0]]&lt;br&gt;/// 3: [velocity[11-4]]&lt;br&gt;/// 4: [velocity[3-0], current[11-8]]&lt;br&gt;/// 5: [current[7-0]]&lt;br&gt;&lt;br&gt;void unpack_reply(CANMessage msg, leg_state * leg){&lt;br&gt;    /// unpack ints from can buffer ///&lt;br&gt;    uint16_t id = msg.data[0];&lt;br&gt;    uint16_t p_int = (msg.data[1]&amp;lt;&amp;lt;8)|msg.data[2];&lt;br&gt;    uint16_t v_int = (msg.data[3]&amp;lt;&amp;lt;4)|(msg.data[4]&amp;gt;&amp;gt;4);&lt;br&gt;    uint16_t i_int = ((msg.data[4]&amp;amp;0xF)&amp;lt;&amp;lt;8)|msg.data[5];&lt;br&gt;    /// convert uints to floats ///&lt;br&gt;    float p = uint_to_float(p_int, P_MIN, P_MAX, 16);&lt;br&gt;    float v = uint_to_float(v_int, V_MIN, V_MAX, 12);&lt;br&gt;    float t = uint_to_float(i_int, -T_MAX, T_MAX, 12);&lt;br&gt;    &lt;br&gt;    if(id==1){&lt;br&gt;        leg-&amp;gt;a.p = p;&lt;br&gt;        leg-&amp;gt;a.v = v;&lt;br&gt;        leg-&amp;gt;a.t = t;&lt;br&gt;        }&lt;br&gt;    else if(id==2){&lt;br&gt;        leg-&amp;gt;h.p = p;&lt;br&gt;        leg-&amp;gt;h.v = v;&lt;br&gt;        leg-&amp;gt;h.t = t;&lt;br&gt;        }&lt;br&gt;    else if(id==3){&lt;br&gt;        leg-&amp;gt;k.p = p;&lt;br&gt;        leg-&amp;gt;k.v = v;&lt;br&gt;        leg-&amp;gt;k.t = t;&lt;br&gt;        }&lt;br&gt;    }&lt;/pre&gt;
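&lt;p&gt;&lt;code&gt;uint_to_float()&lt;/code&gt; is the inverse of the &lt;code&gt;float_to_uint()&lt;/code&gt; mapping used on the transmit side. A sketch consistent with the calls above (the firmware's exact implementation may differ):&lt;/p&gt;

```c
#include <stdint.h>

/* Inverse of the transmit-side float_to_uint(): recover a float in
 * [x_min, x_max] from an unsigned integer of the given bit width.
 * Sketch inferred from the calls above. */
float uint_to_float(uint16_t x_int, float x_min, float x_max, int bits)
{
    float span = x_max - x_min;
    return ((float)x_int) * span / ((float)((1 << bits) - 1)) + x_min;
}
```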
&lt;h2&gt;6. Summary&lt;/h2&gt;
&lt;p&gt;This article systematically introduces the technical architecture and implementation details of the MIT Cheetah robot, covering system architecture, simulation environment configuration, hardware platform selection, and software environment setup. It focuses on the working principles and code implementation of the SPIne data communication conversion board, including SPI communication, CAN bus communication, and the associated data packing and parsing mechanisms.&lt;/p&gt;
&lt;p&gt;Through the detailed explanations in this article, developers can:&lt;/p&gt;
&lt;p&gt;▪ Understand the overall architecture and communication mechanisms of the MIT Cheetah system&lt;/p&gt;
&lt;p&gt;▪ Complete the full configuration process from simulation environment to real robot deployment&lt;/p&gt;
&lt;p&gt;▪ Master the core functions and implementation principles of the SPIne firmware&lt;/p&gt;
&lt;p&gt;▪ Perform secondary development and customization based on existing solutions&lt;/p&gt;
&lt;p&gt;We hope this article provides a valuable reference for developers in this field and contributes to the further development and application of quadruped robot technology.&lt;/p&gt;
&lt;p&gt;—&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some content in this article references &lt;a href="https://zhuanlan.zhihu.com/p/645386248" rel="noopener noreferrer"&gt;Chen Bu Chen’s Zhihu article&lt;/a&gt;, which provides an in-depth and excellent analysis of the technical details of the MIT Cheetah system. Special thanks are extended here.&lt;/p&gt;



&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/nvidia-jetson-orin-nano-super-developer-kits-mit-cheetah-robot-technical-deep-analysis/" rel="noopener noreferrer"&gt;NVIDIA Jetson Orin Nano Super Developer Kits – Build MIT Mini Cheetah Robot&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>Building Real-time Voice Conversations with ElevenLabs WebSocket API: A Complete Development Guide</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:53:22 +0000</pubDate>
      <link>https://dev.to/frankfu/building-real-time-voice-conversations-with-elevenlabs-websocket-api-a-complete-development-guide-52aj</link>
      <guid>https://dev.to/frankfu/building-real-time-voice-conversations-with-elevenlabs-websocket-api-a-complete-development-guide-52aj</guid>
      <description>&lt;p&gt;Recently, I’ve been researching real-time voice conversation implementations and discovered that ElevenLabs Agents Platform provides a very powerful WebSocket API. After some exploration, I completed a real-time voice conversation demo that can run directly in the browser. Today, I’ll share the implementation details and usage experience of this project.&lt;/p&gt;
&lt;h2&gt;1. &lt;strong&gt;Why Choose ElevenLabs?&lt;/strong&gt;
&lt;/h2&gt;
&lt;p&gt;Before we begin, you might be wondering why I chose ElevenLabs over other solutions. I compared ElevenLabs with the OpenAI Realtime API and found that ElevenLabs has unique advantages in voice selection, model flexibility, and other areas; I’ll elaborate on this comparison later in the article.&lt;/p&gt;
&lt;h2&gt;2. &lt;strong&gt;Project Overview&lt;/strong&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Demo link:&lt;/strong&gt; &lt;a href="https://demo.navtalk.ai/11labs/en/index.html" rel="noreferrer noopener"&gt;&lt;strong&gt;https://demo.navtalk.ai/11labs/en/index.html&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This demo is implemented based on the ElevenLabs Agents Platform WebSocket API and supports:&lt;/p&gt;
&lt;p&gt;✅ Complete WebSocket connection management&lt;/p&gt;
&lt;p&gt;✅ Real-time voice input and output&lt;/p&gt;
&lt;p&gt;✅ Text message support&lt;/p&gt;
&lt;p&gt;✅ Rich custom configuration options&lt;/p&gt;
&lt;p&gt;✅ Complete message handling mechanism&lt;/p&gt;
&lt;p&gt;The entire project can run directly in the browser without a backend server, making it perfect for rapid prototyping and learning.&lt;/p&gt;
&lt;h2&gt;3. &lt;strong&gt;Core Features&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;3.1 Complete WebSocket Connection&lt;/h3&gt;
&lt;p&gt;The project implements complete WebSocket connection management, including:&lt;/p&gt;
&lt;p&gt;▪ Automatic signed-URL retrieval&lt;/p&gt;
&lt;p&gt;▪ Secure WSS connection establishment&lt;/p&gt;
&lt;p&gt;▪ Comprehensive connection status and error handling&lt;/p&gt;
&lt;h3&gt;3.2 Real-time Voice Conversation&lt;/h3&gt;
&lt;p&gt;Voice processing is the core functionality, including:&lt;/p&gt;
&lt;p&gt;▪ Microphone audio capture&lt;/p&gt;
&lt;p&gt;▪ 16kHz PCM audio encoding&lt;/p&gt;
&lt;p&gt;▪ Real-time audio stream transmission&lt;/p&gt;
&lt;p&gt;▪ Agent audio playback&lt;/p&gt;
&lt;h3&gt;3.3 Complete Message Handling&lt;/h3&gt;
&lt;p&gt;Supports all message types provided by ElevenLabs:&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;conversation_initiation_metadata&lt;/code&gt; – Session initialization&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;user_transcript&lt;/code&gt; – User speech-to-text&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;agent_response&lt;/code&gt; – Agent text response&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;agent_response_correction&lt;/code&gt; – Agent response correction&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;audio&lt;/code&gt; – Agent audio response&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;interruption&lt;/code&gt; – Interruption detection&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;ping&lt;/code&gt;/&lt;code&gt;pong&lt;/code&gt; – Heartbeat detection&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;client_tool_call&lt;/code&gt; – Tool call support&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;contextual_update&lt;/code&gt; – Context update&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;vad_score&lt;/code&gt; – Voice activity detection score&lt;/p&gt;
&lt;h3&gt;3.4 Text Message Support&lt;/h3&gt;
&lt;p&gt;In addition to voice input, it also supports sending text messages to the Agent, with a very practical feature: &lt;strong&gt;text messages can interrupt the Agent’s ongoing voice response&lt;/strong&gt;, making conversations more natural.&lt;/p&gt;
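&lt;p&gt;A minimal sketch of sending such a text message: the &lt;code&gt;user_message&lt;/code&gt; event type is taken from the ElevenLabs Agents WebSocket reference and should be confirmed against the current docs.&lt;/p&gt;

```javascript
// Build a text message to send mid-conversation. Sending it while the
// Agent is speaking interrupts the ongoing voice response, as described
// above. The "user_message" shape is an assumption from the ElevenLabs
// Agents WebSocket reference.
function buildUserMessage(text) {
  return { type: "user_message", text };
}

// Usage (sketch): ws.send(JSON.stringify(buildUserMessage("Tell me more")));
```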
&lt;p&gt;&lt;strong&gt;3.5 Custom Configuration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Provides rich configuration options:&lt;/p&gt;
&lt;p&gt;▪ Custom Agent prompt&lt;/p&gt;
&lt;p&gt;▪ Custom first message&lt;/p&gt;
&lt;p&gt;▪ Language override&lt;/p&gt;
&lt;p&gt;▪ TTS voice ID override&lt;/p&gt;
&lt;p&gt;▪ Dynamic variable support&lt;/p&gt;
&lt;p&gt;▪ Custom LLM parameters (temperature / max_tokens)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Detailed Usage Instructions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.1 Prepare Configuration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.1.1 Open File&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Simply open the link &lt;a href="https://demo.navtalk.ai/11labs/en/index.html" rel="noreferrer noopener"&gt;&lt;strong&gt;https://demo.navtalk.ai/11labs/en/index.html&lt;/strong&gt;&lt;/a&gt; in your browser to get started.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.1.2 Required Configuration Items&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;API Key (xi-api-key)&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ ElevenLabs API Key&lt;/p&gt;
&lt;p&gt;▪ Format: &lt;code&gt;sk-…&lt;/code&gt; or &lt;code&gt;xi-api-key&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ How to obtain: log in to the ElevenLabs Console (&lt;a href="https://elevenlabs.io/app/settings/api-keys" rel="noopener noreferrer"&gt;https://elevenlabs.io/app/settings/api-keys&lt;/a&gt;) and create or view an API Key&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent ID&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ ElevenLabs Agent ID&lt;/p&gt;
&lt;p&gt;▪ Format: &lt;code&gt;agent_…&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ How to obtain: create or view an Agent on the ElevenLabs Agents page (&lt;a href="https://elevenlabs.io/app/agents" rel="noopener noreferrer"&gt;https://elevenlabs.io/app/agents&lt;/a&gt;), then copy the Agent ID&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.1.3 Optional Configuration Items (in interface order)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Custom Prompt&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Overrides the Agent’s default prompt&lt;/p&gt;
&lt;p&gt;▪ Leave empty to use the default prompt from the Agent configuration&lt;/p&gt;
&lt;p&gt;▪ Useful for temporarily modifying the Agent’s behavior and conversation style&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First Message&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ The first sentence the Agent says after connecting&lt;/p&gt;
&lt;p&gt;▪ Leave empty to use the default first message from the Agent configuration&lt;/p&gt;
&lt;p&gt;▪ Example: “Hello, I’m your AI assistant. How can I help you?”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Language&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Overrides the Agent’s default language setting&lt;/p&gt;
&lt;p&gt;▪ Supported language codes: &lt;code&gt;en&lt;/code&gt; (English), &lt;code&gt;zh&lt;/code&gt; (Chinese), &lt;code&gt;es&lt;/code&gt; (Spanish), &lt;code&gt;fr&lt;/code&gt; (French), &lt;code&gt;de&lt;/code&gt; (German), &lt;code&gt;ja&lt;/code&gt; (Japanese), etc.&lt;/p&gt;
&lt;p&gt;▪ Leave empty to use the default language from the Agent configuration&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TTS Voice&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Overrides the Agent’s default voice setting&lt;/p&gt;
&lt;p&gt;▪ Select a different voice ID from the dropdown menu&lt;/p&gt;
&lt;p&gt;▪ Leave empty to use the default voice from the Agent configuration&lt;/p&gt;
&lt;p&gt;▪ Note: fill in the API Key first so the voice list can be loaded&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Variables&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Used to dynamically replace variable placeholders in the prompt during the conversation&lt;/p&gt;
&lt;p&gt;▪ Format: a JSON object, for example &lt;code&gt;{"user_name": "John", "greeting": "Hello"}&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ Use case: when the Agent’s prompt contains variables (such as &lt;code&gt;{{user_name}}&lt;/code&gt;, &lt;code&gt;{{greeting}}&lt;/code&gt;), you can pass in the actual values through dynamic variables&lt;/p&gt;
&lt;p&gt;▪ Example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "user_name": "John",
  "company": "ABC Company",
  "product": "Smart Assistant"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;▪ If the Agent’s prompt contains &lt;code&gt;Hello, {{user_name}}, welcome to use {{product}}&lt;/code&gt;, the dynamic variables will automatically expand it to &lt;code&gt;Hello, John, welcome to use Smart Assistant&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ Leave empty to skip dynamic variables&lt;/p&gt;
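&lt;p&gt;The substitution semantics can be illustrated in a few lines. The actual replacement happens on the ElevenLabs side; this is only a local illustration of how &lt;code&gt;{{variable}}&lt;/code&gt; placeholders expand.&lt;/p&gt;

```javascript
// Local illustration of {{variable}} substitution in a prompt.
// Unknown placeholders are left untouched.
function substitute(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? String(vars[name]) : match
  );
}
```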
&lt;p&gt;&lt;strong&gt;LLM Temperature&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Controls the randomness and creativity of LLM text generation&lt;/p&gt;
&lt;p&gt;▪ Value range: 0.0 – 2.0&lt;/p&gt;
&lt;p&gt;▪ Lower values produce more deterministic, consistent output (more conservative); higher values produce more random, creative output (more flexible)&lt;/p&gt;
&lt;p&gt;▪ Recommended value: 0.7 – 1.0 (balances creativity and consistency)&lt;/p&gt;
&lt;p&gt;▪ Leave empty to use the default value from the Agent configuration&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLM Max Tokens&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Limits the maximum number of tokens in a single LLM response&lt;/p&gt;
&lt;p&gt;▪ Value range: positive integers&lt;/p&gt;
&lt;p&gt;▪ Used to control response length and avoid overly long replies&lt;/p&gt;
&lt;p&gt;▪ Leave empty to use the default value from the Agent configuration&lt;/p&gt;
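&lt;p&gt;Pulling the optional items above together, the initialization payload might be assembled as below. The field names (&lt;code&gt;conversation_config_override&lt;/code&gt;, &lt;code&gt;dynamic_variables&lt;/code&gt;, &lt;code&gt;custom_llm_extra_body&lt;/code&gt;) follow the ElevenLabs Agents WebSocket reference and should be confirmed against the current docs; this is a sketch, not a definitive implementation.&lt;/p&gt;

```javascript
// Sketch of conversation_initiation_client_data built from the optional
// configuration items: prompt, first message, language, TTS voice,
// dynamic variables, and LLM temperature / max_tokens. Empty fields are
// simply omitted so the Agent's defaults apply.
function buildInitData({ prompt, firstMessage, language, voiceId,
                         dynamicVariables, temperature, maxTokens } = {}) {
  const data = { type: "conversation_initiation_client_data" };
  const agent = {};
  if (prompt) agent.prompt = { prompt };
  if (firstMessage) agent.first_message = firstMessage;
  if (language) agent.language = language;
  const override = {};
  if (Object.keys(agent).length) override.agent = agent;
  if (voiceId) override.tts = { voice_id: voiceId };
  if (Object.keys(override).length) data.conversation_config_override = override;
  if (dynamicVariables) data.dynamic_variables = dynamicVariables;
  const llm = {};
  if (temperature !== undefined) llm.temperature = temperature;
  if (maxTokens !== undefined) llm.max_tokens = maxTokens;
  if (Object.keys(llm).length) data.custom_llm_extra_body = llm;
  return data;
}
```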
&lt;p&gt;&lt;strong&gt;4.2 Start Conversation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;1. Click the &lt;strong&gt;“Connect and Start Conversation”&lt;/strong&gt; button&lt;/p&gt;
&lt;p&gt;2. The browser will request microphone permission; allow it&lt;/p&gt;
&lt;p&gt;3. Recording starts automatically after a successful connection&lt;/p&gt;
&lt;p&gt;4. Start speaking, and the Agent will respond in real time&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.3 Function Operations&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Stop Recording&lt;/strong&gt;: stop sending audio but keep the connection open&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Disconnect&lt;/strong&gt;: close the WebSocket connection completely&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Text Message&lt;/strong&gt;: type a message in the text input box and send it&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. API Documentation Reference&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The demo implementation is based on the ElevenLabs Agents Platform WebSocket API (&lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket" rel="noopener noreferrer"&gt;https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5.1 WebSocket Endpoint&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;5.2 Complete Call Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5.2.1 Connection Establishment Phase&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1: Establish WebSocket Connection&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Client → Server: Establish WebSocket connection
wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Send Initialization Data&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ Immediately after a successful connection, send the &lt;code&gt;conversation_initiation_client_data&lt;/code&gt; message&lt;/p&gt;
&lt;p&gt;▪ It contains Agent configuration overrides (optional), dynamic variables (optional), and custom LLM parameters (optional)&lt;/p&gt;
&lt;p&gt;▪ Wait for the server to return the &lt;code&gt;conversation_initiation_metadata&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 3: Receive Session Metadata&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ The server returns the &lt;code&gt;conversation_initiation_metadata&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;▪ Content to handle:&lt;/p&gt;
&lt;p&gt;      – Save the &lt;code&gt;conversation_id&lt;/code&gt; (for subsequent session management)&lt;/p&gt;
&lt;p&gt;      – Record the audio format information (&lt;code&gt;agent_output_audio_format&lt;/code&gt;, &lt;code&gt;user_input_audio_format&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;      – Start audio capture (call &lt;code&gt;getUserMedia&lt;/code&gt; to request microphone permission)&lt;/p&gt;
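&lt;p&gt;A small helper for the metadata step: the audio format fields arrive as strings such as &lt;code&gt;pcm_16000&lt;/code&gt; (codec plus sample rate). That string shape is an assumption based on the ElevenLabs docs; verify it before relying on it.&lt;/p&gt;

```javascript
// Parse a format string like "pcm_16000" into { codec, sampleRate },
// as delivered in agent_output_audio_format / user_input_audio_format.
// The "codec_sampleRate" layout is an assumption to verify.
function parseAudioFormat(fmt) {
  const idx = fmt.lastIndexOf("_");
  return { codec: fmt.slice(0, idx), sampleRate: Number(fmt.slice(idx + 1)) };
}
```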
&lt;p&gt;&lt;strong&gt;5.2.2 Conversation Phase&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audio Input Flow&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;User speaks → Microphone capture → Audio processing (downsample to 16kHz) → Convert to 16-bit PCM → Base64 encode → Send user_audio_chunk&lt;/p&gt;
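&lt;p&gt;The pipeline above can be sketched as a single encoding function: decimate to 16 kHz, scale to 16-bit PCM, then base64-encode. Node’s &lt;code&gt;Buffer&lt;/code&gt; is used for the base64 step; in the browser you would base64-encode the bytes with &lt;code&gt;btoa&lt;/code&gt; instead. Naive decimation stands in for a proper low-pass resampler here.&lt;/p&gt;

```javascript
// Downsample Float32 microphone samples to 16 kHz by decimation,
// convert to 16-bit signed PCM, and base64-encode the result.
function floatTo16kPcmBase64(samples, inputRate) {
  const ratio = inputRate / 16000;
  const out = new Int16Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < out.length; i++) {
    // Clamp to [-1, 1], then scale to the 16-bit signed range.
    const s = Math.max(-1, Math.min(1, samples[Math.floor(i * ratio)]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return Buffer.from(out.buffer, out.byteOffset, out.byteLength).toString("base64");
}

// The chunk message then looks like (sketch):
// ws.send(JSON.stringify({ user_audio_chunk: floatTo16kPcmBase64(chunk, 48000) }));
```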

&lt;p&gt;&lt;strong&gt;Server Response Flow&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Server receives audio → Speech recognition (ASR) → Send user_transcript → LLM processing → Generate response → Send agent_response → TTS synthesis → Send audio chunks&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Event Handling Sequence&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;When user speaks&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;   ▪ Continuously send &lt;code&gt;user_audio_chunk&lt;/code&gt; messages (one every 4096 samples)&lt;/p&gt;
&lt;p&gt;   ▪ The server processes the audio stream and may return &lt;code&gt;vad_score&lt;/code&gt; (voice activity detection score)&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;Server recognizes user speech&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;  ▪ Receive the &lt;code&gt;user_transcript&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;  ▪ The user’s words can be displayed in the UI (useful for debugging)&lt;/p&gt;
&lt;p&gt;3. &lt;strong&gt;Server generates response&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;  ▪ Receive the &lt;code&gt;agent_response&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;  ▪ The Agent’s text response can be shown in the UI&lt;/p&gt;
&lt;p&gt;  ▪ An &lt;code&gt;agent_response_correction&lt;/code&gt; event may follow if the Agent corrects its response&lt;/p&gt;
&lt;p&gt;4. &lt;strong&gt;Server sends audio&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;   ▪ Receive &lt;code&gt;audio&lt;/code&gt; events (typically many, streamed)&lt;/p&gt;
&lt;p&gt;   ▪ Processing method:&lt;/p&gt;
&lt;p&gt;         – Decode Base64 audio data&lt;/p&gt;
&lt;p&gt;         – Add to audio playback queue&lt;/p&gt;
&lt;p&gt;         – Play audio chunks in order&lt;/p&gt;
&lt;p&gt;5. &lt;strong&gt;Interruption handling&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;   ▪ If the user sends a new message while the Agent is speaking, the client may receive an &lt;code&gt;interruption&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;   ▪ The client must immediately stop the current audio playback and clear the audio queue&lt;/p&gt;
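&lt;p&gt;The event-handling sequence above can be sketched as a small dispatch table. This is an illustrative sketch, not SDK code: the event names come from the protocol described here, while the action labels are invented for the example.&lt;/p&gt;

```javascript
// Map each server event type (from the protocol above) to the client-side
// action it requires. The action names are illustrative only.
const SERVER_EVENT_ACTIONS = {
  user_transcript: 'display_user_text',          // optional UI update
  agent_response: 'display_agent_text',          // optional UI update
  agent_response_correction: 'display_correction',
  audio: 'decode_and_enqueue_audio',             // Base64-decode, queue, play in order
  interruption: 'stop_playback_and_clear_queue', // must happen immediately
  ping: 'send_pong',                             // must echo the same event_id
  client_tool_call: 'execute_tool_and_reply',
  vad_score: 'update_vad_meter',                 // optional visualization
};

// Parse a raw WebSocket message and look up the required action.
function routeServerEvent(rawMessage) {
  const event = JSON.parse(rawMessage);
  return SERVER_EVENT_ACTIONS[event.type] ?? 'ignore';
}
```

In a real client, each action label would map to a handler function; keeping the table in one place makes it easy to see which events are mandatory to handle and which are optional.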
&lt;p&gt;&lt;strong&gt;5.2.3 Heartbeat Maintenance Phase&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Heartbeat Mechanism&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ The server periodically sends a &lt;code&gt;ping&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;▪ The client must immediately reply with a &lt;code&gt;pong&lt;/code&gt; message containing the same &lt;code&gt;event_id&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ This keeps the connection alive and lets both sides detect connection status&lt;/p&gt;
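&lt;p&gt;A minimal pong reply might look like the following. Only the requirement to echo the same &lt;code&gt;event_id&lt;/code&gt; comes from the flow above; the exact JSON nesting of the incoming ping may differ, so verify it against the official event schema.&lt;/p&gt;

```javascript
// Build the pong reply for a received ping. Per the protocol above, the
// pong must carry the same event_id as the ping it answers.
function makePong(eventId) {
  return JSON.stringify({ type: 'pong', event_id: eventId });
}

// Usage sketch (the ping payload shape here is an assumption):
// ws.addEventListener('message', (m) => {
//   const e = JSON.parse(m.data);
//   if (e.type === 'ping') ws.send(makePong(e.ping_event?.event_id ?? e.event_id));
// });
```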
&lt;p&gt;&lt;strong&gt;5.2.4 Tool Call Flow (if enabled)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool Call Steps&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. The server sends a &lt;code&gt;client_tool_call&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;2. Processing flow:&lt;/p&gt;
&lt;p&gt;   ▪ Parse the tool call information (&lt;code&gt;tool_name&lt;/code&gt;, &lt;code&gt;parameters&lt;/code&gt;, &lt;code&gt;tool_call_id&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;   ▪ Execute the corresponding tool/function&lt;/p&gt;
&lt;p&gt;   ▪ Send &lt;code&gt;client_tool_result&lt;/code&gt; to return the result&lt;/p&gt;
&lt;p&gt;3. The server continues processing and may send new &lt;code&gt;agent_response&lt;/code&gt; and &lt;code&gt;audio&lt;/code&gt; events&lt;/p&gt;
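&lt;p&gt;The three steps above can be sketched as one handler. The &lt;code&gt;tool_name&lt;/code&gt;, &lt;code&gt;parameters&lt;/code&gt;, and &lt;code&gt;tool_call_id&lt;/code&gt; fields are from the flow described here; the &lt;code&gt;result&lt;/code&gt; and &lt;code&gt;is_error&lt;/code&gt; fields in the reply are assumptions to illustrate the shape and should be checked against the official schema.&lt;/p&gt;

```javascript
// Sketch: parse the tool call, run the matching local function, and send
// back a client_tool_result tied to the same tool_call_id.
function handleClientToolCall(event, tools, send) {
  const { tool_name, parameters, tool_call_id } = event;
  const tool = tools[tool_name];
  let result;
  let isError = false;
  try {
    if (!tool) throw new Error(`unknown tool: ${tool_name}`);
    result = tool(parameters);
  } catch (err) {
    result = String(err); // report the failure instead of crashing the session
    isError = true;
  }
  send(JSON.stringify({
    type: 'client_tool_result',
    tool_call_id,            // lets the server match the result to its call
    result,                  // assumed field name
    is_error: isError,       // assumed field name
  }));
}
```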
&lt;p&gt;&lt;strong&gt;5.2.5 Context Update Flow (if enabled)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context Update&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ The client can proactively send &lt;code&gt;contextual_update&lt;/code&gt; to update the conversation context&lt;/p&gt;
&lt;p&gt;▪ The server may also send a &lt;code&gt;contextual_update&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;▪ Handle context updates according to business requirements&lt;/p&gt;
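&lt;p&gt;A hypothetical helper for the client-initiated side of this flow. Only the message type comes from the flow above; the &lt;code&gt;text&lt;/code&gt; field name is an assumption and should be checked against the official schema.&lt;/p&gt;

```javascript
// Build a contextual_update message to push extra context to the agent
// mid-conversation. The `text` field name is an assumption.
function makeContextualUpdate(text) {
  return JSON.stringify({ type: 'contextual_update', text });
}
```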
&lt;p&gt;&lt;strong&gt;5.2.6 Text Message Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Send Text Message&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ The client sends a &lt;code&gt;user_message&lt;/code&gt; event&lt;/p&gt;
&lt;p&gt;▪ Feature: this can interrupt the Agent’s ongoing audio response (a feature unique to ElevenLabs)&lt;/p&gt;
&lt;p&gt;▪ Processing method:&lt;/p&gt;
&lt;p&gt;      – If the Agent is playing audio, immediately stop playback (receive `interruption` event)&lt;/p&gt;
&lt;p&gt;      – Wait for server to process text message and return new response&lt;/p&gt;
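&lt;p&gt;A small sketch of this flow: sending a text message may trigger an &lt;code&gt;interruption&lt;/code&gt; event, at which point the local playback queue must be cleared immediately. The &lt;code&gt;text&lt;/code&gt; field name is an assumption; the queue is a toy model of the playback queue described earlier.&lt;/p&gt;

```javascript
// Build a user_message (the `text` field name is an assumption).
function makeUserMessage(text) {
  return JSON.stringify({ type: 'user_message', text });
}

// Toy model of the client playback queue: audio chunks accumulate until
// an interruption event forces the queue to be dropped.
class PlaybackQueue {
  constructor() { this.chunks = []; }
  enqueue(chunk) { this.chunks.push(chunk); }
  onInterruption() { this.chunks.length = 0; } // stop playback, drop pending audio
  get pending() { return this.chunks.length; }
}
```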
&lt;p&gt;&lt;strong&gt;5.2.7 Connection Close Phase&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Normal Close&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Stop sending audio (call &lt;code&gt;stopRecording&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;▪ Close the WebSocket connection&lt;/p&gt;
&lt;p&gt;▪ Release audio resources (close the AudioContext, stop the MediaStream)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Exception Handling&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Listen for WebSocket &lt;code&gt;error&lt;/code&gt; and &lt;code&gt;close&lt;/code&gt; events&lt;/p&gt;
&lt;p&gt;▪ Implement reconnection logic (optional)&lt;/p&gt;
&lt;p&gt;▪ Clean up all resources&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5.3 Detailed Event Handling&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5.3.1 Events Client Needs to Handle&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Event Type&lt;/th&gt;
&lt;th&gt;When Received&lt;/th&gt;
&lt;th&gt;Required Handling&lt;/th&gt;
&lt;th&gt;Optional Operations&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;conversation_initiation_metadata&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After connection established&lt;/td&gt;
&lt;td&gt;Save conversation_id, start recording&lt;/td&gt;
&lt;td&gt;Display session information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user_transcript&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After user speaks&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;Display what user said&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent_response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After Agent generates response&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;Display Agent text response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent_response_correction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When Agent corrects response&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;Display correction information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;audio&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After Agent audio synthesis&lt;/td&gt;
&lt;td&gt;Decode and play audio&lt;/td&gt;
&lt;td&gt;Display playback status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;interruption&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When interruption detected&lt;/td&gt;
&lt;td&gt;Stop playback, clear queue&lt;/td&gt;
&lt;td&gt;Display interruption prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ping&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server heartbeat detection&lt;/td&gt;
&lt;td&gt;Immediately send &lt;code&gt;pong&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;client_tool_call&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When Agent needs to call tool&lt;/td&gt;
&lt;td&gt;Execute tool and return result&lt;/td&gt;
&lt;td&gt;Display tool call information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vad_score&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;During voice activity detection&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;Visualize voice activity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;5.3.2 When Client Sends Messages&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Message Type&lt;/th&gt;
&lt;th&gt;Send Timing&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;conversation_initiation_client_data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Immediately after connection established&lt;/td&gt;
&lt;td&gt;Once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user_audio_chunk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Continuously during recording&lt;/td&gt;
&lt;td&gt;High frequency (approximately every 250ms)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user_message&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When user inputs text&lt;/td&gt;
&lt;td&gt;On demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user_activity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When need to notify user activity&lt;/td&gt;
&lt;td&gt;On demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pong&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Immediately respond when receive &lt;code&gt;ping&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;On demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;client_tool_result&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After tool execution completed&lt;/td&gt;
&lt;td&gt;On demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contextual_update&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When need to update context&lt;/td&gt;
&lt;td&gt;On demand&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;6. Audio Format Requirements&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;ElevenLabs has clear requirements for audio format:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Sample Rate&lt;/strong&gt;: 16kHz&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Channels&lt;/strong&gt;: Mono&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Encoding&lt;/strong&gt;: 16-bit PCM&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Format&lt;/strong&gt;: Base64-encoded binary data&lt;/p&gt;
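&lt;p&gt;Producing this format from Web Audio samples can be sketched as follows. Node’s &lt;code&gt;Buffer&lt;/code&gt; is used for the Base64 step; in a browser you would build a binary string and call &lt;code&gt;btoa&lt;/code&gt; instead.&lt;/p&gt;

```javascript
// Convert Float32 samples (range [-1, 1]) to 16-bit little-endian mono
// PCM and Base64-encode the result, matching the format listed above.
function floatTo16BitPcmBase64(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp to [-1, 1]
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;               // scale to int16 range
  }
  return Buffer.from(pcm.buffer).toString('base64');        // browser: btoa instead
}
```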
&lt;p&gt;&lt;strong&gt;7. Technical Implementation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7.1 Audio Processing Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;Capture&lt;/strong&gt;: Use the &lt;code&gt;getUserMedia&lt;/code&gt; API to obtain the microphone audio stream&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;Process&lt;/strong&gt;: Use &lt;code&gt;AudioContext&lt;/code&gt; and &lt;code&gt;ScriptProcessorNode&lt;/code&gt; (deprecated in favor of &lt;code&gt;AudioWorklet&lt;/code&gt;) to process the audio&lt;/p&gt;
&lt;p&gt;3. &lt;strong&gt;Downsample&lt;/strong&gt;: If the capture sample rate is not 16kHz, downsample automatically&lt;/p&gt;
&lt;p&gt;4. &lt;strong&gt;Encode&lt;/strong&gt;: Convert the Float32 audio data to 16-bit PCM&lt;/p&gt;
&lt;p&gt;5. &lt;strong&gt;Send&lt;/strong&gt;: Base64-encode the PCM data and send it over the WebSocket&lt;/p&gt;
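&lt;p&gt;Step 3 (downsampling) can be sketched as naive decimation: pick every ratio-th sample. This is illustrative only; a real implementation should low-pass filter before decimating to avoid aliasing.&lt;/p&gt;

```javascript
// Naively downsample Float32 audio from the capture rate (e.g. 48000 Hz,
// a common browser default) to the required 16000 Hz.
function downsampleTo16k(input, inputSampleRate) {
  const target = 16000;
  if (inputSampleRate === target) return input;  // already at the target rate
  const ratio = inputSampleRate / target;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    out[i] = input[Math.floor(i * ratio)];       // nearest-sample decimation
  }
  return out;
}
```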
&lt;p&gt;&lt;strong&gt;7.2 Audio Playback Flow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;Receive&lt;/strong&gt;: Receive Base64 encoded audio from WebSocket&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;Decode&lt;/strong&gt;: Base64 decode to binary data&lt;/p&gt;
&lt;p&gt;3. &lt;strong&gt;Play&lt;/strong&gt;: Attempt MP3 playback first; if that fails, fall back to raw PCM&lt;/p&gt;
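&lt;p&gt;The raw-PCM fallback of steps 2–3 can be sketched as follows: Base64 → 16-bit little-endian PCM → Float32 samples, which can then be copied into an &lt;code&gt;AudioBuffer&lt;/code&gt; for playback. Node’s &lt;code&gt;Buffer&lt;/code&gt; is used for decoding here; in a browser you would use &lt;code&gt;atob&lt;/code&gt; and a &lt;code&gt;Uint8Array&lt;/code&gt;.&lt;/p&gt;

```javascript
// Decode a Base64 chunk of 16-bit little-endian PCM into Float32 samples
// in [-1, 1), the format Web Audio playback expects.
function base64PcmToFloat32(b64) {
  const bytes = Uint8Array.from(Buffer.from(b64, 'base64')); // browser: atob
  const pcm = new Int16Array(bytes.buffer);
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 0x8000;
  return out;
}
```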
&lt;p&gt;&lt;strong&gt;8. ElevenLabs vs OpenAI Realtime API Detailed Comparison&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;During development, I also researched OpenAI Realtime API and found that both platforms have their own characteristics. Below is my detailed comparison:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8.1 Quick Comparison Overview&lt;/strong&gt;&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Comparison Item&lt;/th&gt;
&lt;th&gt;ElevenLabs Agents Platform&lt;/th&gt;
&lt;th&gt;OpenAI Realtime API&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multimodal Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Not supported (no camera or image input)&lt;/td&gt;
&lt;td&gt;✅ Supported (GPT-4o)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voice Selection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ 100+ preset voices, supports voice cloning&lt;/td&gt;
&lt;td&gt;⚠ 10 preset voices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Models&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Multi-model support (ElevenLabs, OpenAI, Google, Anthropic)&lt;/td&gt;
&lt;td&gt;✅ GPT-4o, GPT-4o-mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge Base&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Supported&lt;/td&gt;
&lt;td&gt;✅ Supported (via Assistants API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Function Call&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Supported&lt;/td&gt;
&lt;td&gt;✅ Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Text Interrupt AI Response&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Supported (sending a text message can interrupt the AI’s ongoing response)&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Depends on model (163ms-3.87s)&lt;/td&gt;
&lt;td&gt;✅ Low (300-800ms)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;💰 Per-minute billing ($0.0033-$0.1956/minute depending on model)&lt;/td&gt;
&lt;td&gt;💰 Per-token billing (GPT-4o-mini is more economical)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;For more information, see the detailed explanation of each feature below.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8.2 Detailed Comparison of Key Points&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8.2.1 Multimodal Support (Camera Recognition)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Support Status&lt;/th&gt;
&lt;th&gt;Detailed Information&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Currently not supported&lt;/td&gt;
&lt;td&gt;Focuses on voice conversation, does not support visual input (camera/image recognition)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket" rel="noopener noreferrer"&gt;ElevenLabs Agents Platform WebSocket API Documentation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Supported (via GPT-4o)&lt;/td&gt;
&lt;td&gt;Supports visual input, can process images and video frames, supports real-time camera recognition. GPT-4o model natively supports multimodal input&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;OpenAI Realtime API Documentation&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/guides/vision" rel="noopener noreferrer"&gt;OpenAI GPT-4o Vision Capabilities&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: The OpenAI Realtime API is based on the GPT-4o model and supports multimodal input, so it can process image and video content. ElevenLabs currently focuses on voice conversation scenarios and does not support visual input.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket" rel="noopener noreferrer"&gt;Official WebSocket API Documentation&lt;/a&gt; – does not mention visual input support&lt;/p&gt;
&lt;p&gt;▪ OpenAI: &lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;Realtime API Official Documentation&lt;/a&gt; – supports GPT-4o multimodal capabilities&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8.2.2 Voice Selection Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Voice Count&lt;/th&gt;
&lt;th&gt;Voice Characteristics&lt;/th&gt;
&lt;th&gt;Customization Capability&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;100+ preset voices&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;High quality, multilingual, supports emotional expression, voice cloning&lt;/td&gt;
&lt;td&gt;Supports custom voice ID, emotion control, tone adjustment, voice cloning&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://elevenlabs.io/voice-library" rel="noopener noreferrer"&gt;ElevenLabs Voice Library&lt;/a&gt;&lt;br&gt;&lt;a href="https://elevenlabs.io/docs/voice-cloning" rel="noopener noreferrer"&gt;ElevenLabs Voice Cloning&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
⚠ &lt;strong&gt;Limited selection (10 voices)&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Mainly relies on TTS API, provides 10 preset voices (alloy, echo, fable, onyx, nova, shimmer…)&lt;/td&gt;
&lt;td&gt;Limited voice control capability, does not support voice cloning&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/text-to-speech" rel="noopener noreferrer"&gt;OpenAI TTS Documentation&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/api-reference/audio" rel="noopener noreferrer"&gt;OpenAI TTS Voice List&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Detailed Comparison&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ElevenLabs&lt;/strong&gt;: Provides over 100 preset voices covering multiple languages, ages, genders, and styles. Supports voice cloning, so custom voices can be created from a small number of audio samples, and offers emotion and tone control to adjust vocal expression. Voice quality is high, making it suitable for professional applications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;: The TTS API provides 10 preset voices (alloy, echo, fable, onyx, nova, shimmer…), a relatively limited selection. It does not support voice cloning and offers only limited voice control.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/guides/text-to-speech" rel="noopener noreferrer"&gt;TTS API Documentation&lt;/a&gt; – Lists 10 available voices&lt;/li&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/voice-library" rel="noopener noreferrer"&gt;Official Voice Library&lt;/a&gt; – Shows a large number of preset voices&lt;/li&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/voice-cloning" rel="noopener noreferrer"&gt;Voice Cloning Documentation&lt;/a&gt; – Supports custom voice cloning&lt;/li&gt;
&lt;/ul&gt;
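&lt;p&gt;As a concrete illustration of picking one of OpenAI’s preset voices, here is a minimal sketch of a TTS request body following the TTS API documentation cited above. Only the six voice names actually spelled out in the text are included; the elided ones are omitted, and an API key would be needed to actually synthesize audio:&lt;/p&gt;

```python
# Sketch: choosing one of OpenAI's preset TTS voices.
# Only the six names listed in the article are included here.

PRESET_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

def build_tts_request(text: str, voice: str = "alloy") -> dict:
    """Build a TTS request body; reject voices outside the preset list."""
    if voice not in PRESET_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {"model": "tts-1", "voice": voice, "input": text}

req = build_tts_request("Hello from the avatar.", voice="nova")
```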
&lt;p&gt;&lt;strong&gt;8.2.3 Supported LLM Models&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Supported Models&lt;/th&gt;
&lt;th&gt;Model Characteristics&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;Multi-model support&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Supports ElevenLabs proprietary models and multiple third-party models (OpenAI, Google, Anthropic, etc.), users can choose according to needs, supports custom LLM parameters&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;ElevenLabs Agents Documentation&lt;/a&gt;&lt;br&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket#custom-llm-extra-body" rel="noopener noreferrer"&gt;ElevenLabs LLM Configuration&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;GPT-4o, GPT-4o-mini&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Supports GPT-4o (multimodal, stronger capabilities) and GPT-4o-mini (lightweight, faster, lower cost), can switch models&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;OpenAI Realtime API Models&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/models" rel="noopener noreferrer"&gt;OpenAI Model Comparison&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;List of Models Supported by ElevenLabs Agents Platform&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ElevenLabs Proprietary Models&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GLM-4.5-Air&lt;/strong&gt;: Suitable for agentic use cases, latency ~631ms, cost ~$0.0600/minute&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qwen3-30B-A3B&lt;/strong&gt;: Ultra-low latency, latency ~163ms, cost ~$0.0168/minute&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPT-OSS-120B&lt;/strong&gt;: Experimental model (OpenAI open-source model), latency ~314ms, cost ~$0.0126/minute&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Other Provider Models&lt;/strong&gt; (available on ElevenLabs platform):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenAI Models&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT-5 series: GPT-5 (latency ~1.14s, cost ~$0.0826/minute), GPT-5.1, GPT-5 Mini (latency ~855ms, cost ~$0.0165/minute), GPT-5 Nano (latency ~788ms, cost ~$0.0033/minute)&lt;/li&gt;
&lt;li&gt;GPT-4.1 series: GPT-4.1 (latency ~803ms, cost ~$0.1298/minute), GPT-4.1 Mini, GPT-4.1 Nano (latency ~478ms, cost ~$0.0065/minute)&lt;/li&gt;
&lt;li&gt;GPT-4o (latency ~771ms, cost ~$0.1623/minute), GPT-4o Mini (latency ~738ms, cost ~$0.0097/minute)&lt;/li&gt;
&lt;li&gt;GPT-4 Turbo (latency ~1.28s, cost ~$0.6461/minute), GPT-3.5 Turbo (latency ~494ms, cost ~$0.0323/minute)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Google Models&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gemini 3 Pro Preview (latency ~3.87s, cost ~$0.1310/minute)&lt;/li&gt;
&lt;li&gt;Gemini 2.5 Flash (latency ~752ms, cost ~$0.0097/minute), Gemini 2.5 Flash Lite (latency ~505ms, cost ~$0.0065/minute)&lt;/li&gt;
&lt;li&gt;Gemini 2.0 Flash (latency ~564ms, cost ~$0.0065/minute), Gemini 2.0 Flash Lite (latency ~547ms, cost ~$0.0049/minute)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Anthropic Models&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Claude Sonnet 4.5 (latency ~1.5s, cost ~$0.1956/minute), Claude Sonnet 4 (latency ~1.31s, cost ~$0.1956/minute)&lt;/li&gt;
&lt;li&gt;Claude Haiku 4.5 (latency ~703ms, cost ~$0.0652/minute)&lt;/li&gt;
&lt;li&gt;Claude 3.7 Sonnet (latency ~1.12s, cost ~$0.1956/minute), Claude 3.5 Sonnet (latency ~1.14s, cost ~$0.1956/minute)&lt;/li&gt;
&lt;li&gt;Claude 3 Haiku (latency ~608ms, cost ~$0.0163/minute)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Custom Models&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Supports adding custom LLMs&lt;/li&gt;
&lt;/ul&gt;

&lt;img width="305" height="1024" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fllm-selection-305x1024.png" alt=""&gt;&lt;p&gt;&lt;em&gt;The above image shows the list of selectable LLM models in ElevenLabs Agents Platform, including latency and pricing information&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Detailed Explanation&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Provides a rich model selection, including proprietary models and models from multiple third-party providers. Users can choose the most suitable model based on latency, cost, and functional requirements. LLM parameters (such as temperature and max_tokens) can be customized through &lt;code&gt;custom_llm_extra_body&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: Supports GPT-4o (multimodal, stronger reasoning) and GPT-4o-mini (faster, lower cost), so users can choose according to their needs. Both models support real-time conversation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;Agents Platform Documentation&lt;/a&gt; – Model selection interface&lt;/li&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket#custom-llm-extra-body" rel="noopener noreferrer"&gt;WebSocket API – Custom LLM Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;Realtime API Documentation&lt;/a&gt; – Supports GPT-4o and GPT-4o-mini&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/models" rel="noopener noreferrer"&gt;Model Comparison Documentation&lt;/a&gt; – Detailed model information&lt;/li&gt;
&lt;/ul&gt;
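&lt;p&gt;As a minimal sketch of customizing LLM parameters, here is how extra parameters such as temperature and max_tokens might be attached to an ElevenLabs conversation initiation message. The message shape is assumed from the WebSocket reference cited above and should be verified against the current docs:&lt;/p&gt;

```python
import json

# Sketch (assumed message shape, per the ElevenLabs WebSocket docs):
# attach custom LLM parameters when initiating a conversation.

def build_initiation_message(temperature: float, max_tokens: int) -> str:
    """Serialize a conversation_initiation_client_data message with LLM overrides."""
    msg = {
        "type": "conversation_initiation_client_data",
        "custom_llm_extra_body": {
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
    }
    return json.dumps(msg)

payload = build_initiation_message(temperature=0.7, max_tokens=256)
```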

&lt;p&gt;&lt;strong&gt;8.2.4 Knowledge Base Support&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Knowledge Base Support&lt;/th&gt;
&lt;th&gt;Implementation Method&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;Supported&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Supports knowledge base integration through Agent configuration, can upload documents and set up knowledge base, Agent can reference knowledge base content in conversations&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;ElevenLabs Agents Documentation&lt;/a&gt;&lt;br&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform/agent-configuration" rel="noopener noreferrer"&gt;ElevenLabs Agent Configuration&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;Supported (via Assistants API or Function Calling)&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Can integrate knowledge base through Assistants API (file upload, vector storage), or access external data sources and APIs through function calling&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/assistants" rel="noopener noreferrer"&gt;OpenAI Assistants API&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;OpenAI Function Calling&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Detailed Explanation&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Supports a knowledge base in the Agent configuration; documents can be uploaded for the Agent to reference, and their content is automatically drawn on during conversations.&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: An assistant with a knowledge base can be created through the Assistants API (supports file upload and vector storage), or external data sources and APIs can be reached through function calling, enabling more flexible knowledge retrieval.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;Agents Platform Documentation&lt;/a&gt; – Mentions knowledge base support&lt;/li&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform/agent-configuration" rel="noopener noreferrer"&gt;Agent Configuration Documentation&lt;/a&gt; – Knowledge base configuration instructions&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/assistants" rel="noopener noreferrer"&gt;Assistants API Documentation&lt;/a&gt; – Knowledge base and file upload functionality&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;Function Calling Documentation&lt;/a&gt; – External data access&lt;/li&gt;
&lt;/ul&gt;
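&lt;p&gt;To illustrate the function-calling route to external knowledge, here is a minimal sketch of an OpenAI tool definition for a knowledge lookup. The schema follows the Function Calling guide cited above; &lt;code&gt;search_knowledge_base&lt;/code&gt; is a hypothetical tool name, and the application would still have to implement the actual retrieval:&lt;/p&gt;

```python
# Sketch: exposing an external knowledge lookup to the model via
# function calling. "search_knowledge_base" is a hypothetical tool;
# the schema shape follows OpenAI's function-calling tool format.

search_tool = {
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Look up documents relevant to the user's question.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text"},
            },
            "required": ["query"],
        },
    },
}
```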

&lt;p&gt;&lt;strong&gt;8.2.5 Function Call Support&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Support Status&lt;/th&gt;
&lt;th&gt;Implementation Method&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;Supported&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Implements tool calling through &lt;code&gt;client_tool_call&lt;/code&gt; and &lt;code&gt;client_tool_result&lt;/code&gt; message types, supports defining tools in Agent&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket#client-tool-call" rel="noopener noreferrer"&gt;ElevenLabs WebSocket API – Tool Calling&lt;/a&gt;&lt;br&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform/agent-configuration" rel="noopener noreferrer"&gt;ElevenLabs Agent Tool Configuration&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
✅ &lt;strong&gt;Supported&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Implements function calling through &lt;code&gt;tool_calls&lt;/code&gt; and &lt;code&gt;tool_results&lt;/code&gt; events, supports defining tools in sessions&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/realtime/function-calling" rel="noopener noreferrer"&gt;OpenAI Realtime API – Function Calling&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;OpenAI Function Calling Guide&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Detailed Comparison&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Uses the &lt;code&gt;client_tool_call&lt;/code&gt; event to ask the client to execute a tool, and the client returns results through &lt;code&gt;client_tool_result&lt;/code&gt;. Tools are defined in the Agent configuration.&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: Uses the standard function calling mechanism, triggered through &lt;code&gt;tool_calls&lt;/code&gt; events, with results returned through &lt;code&gt;tool_results&lt;/code&gt;. Tools can be defined dynamically in sessions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket#client-tool-call" rel="noopener noreferrer"&gt;WebSocket API – client_tool_call&lt;/a&gt; – Tool calling implementation&lt;/li&gt;
&lt;li&gt;ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform/agent-configuration" rel="noopener noreferrer"&gt;Agent Configuration&lt;/a&gt; – Tool definition&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/guides/realtime/function-calling" rel="noopener noreferrer"&gt;Realtime API Function Calling&lt;/a&gt; – Real-time API tool calling&lt;/li&gt;
&lt;li&gt;OpenAI: &lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;Function Calling Guide&lt;/a&gt; – Detailed implementation instructions&lt;/li&gt;
&lt;/ul&gt;
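&lt;p&gt;The ElevenLabs round trip above can be sketched as a small client-side dispatcher: receive a &lt;code&gt;client_tool_call&lt;/code&gt; message, run the named tool, and reply with a &lt;code&gt;client_tool_result&lt;/code&gt;. Field names are assumed from the WebSocket reference cited above, and &lt;code&gt;get_time&lt;/code&gt; is a hypothetical tool:&lt;/p&gt;

```python
import json

# Sketch of a client-side tool dispatcher for ElevenLabs'
# client_tool_call / client_tool_result messages (field names assumed
# from the WebSocket docs; "get_time" is a hypothetical tool).

TOOLS = {"get_time": lambda params: "12:00"}

def handle_client_tool_call(raw: str) -> str:
    """Run the requested tool and build the client_tool_result reply."""
    msg = json.loads(raw)
    call = msg["client_tool_call"]
    result = TOOLS[call["tool_name"]](call.get("parameters", {}))
    return json.dumps({
        "type": "client_tool_result",
        "tool_call_id": call["tool_call_id"],
        "result": result,
        "is_error": False,
    })
```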

&lt;p&gt;&lt;strong&gt;8.2.6 Text Interrupt AI Response&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Support Status&lt;/th&gt;
&lt;th&gt;Detailed Information&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt; &lt;strong&gt;Supported&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Sending a text message (&lt;code&gt;user_message&lt;/code&gt;) can interrupt the AI’s ongoing voice response, enabling more natural conversational interaction&lt;/td&gt;
&lt;td&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket" rel="noopener noreferrer"&gt;ElevenLabs WebSocket API – User Message&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F274c.png" alt="❌" width="72" height="72"&gt; &lt;strong&gt;Not supported&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Sending a text message cannot interrupt the AI’s ongoing response; the client must wait for the current response to complete&lt;/td&gt;
&lt;td&gt;&lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;OpenAI Realtime API Documentation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Detailed Comparison&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Supports interrupting the AI’s ongoing response with a text message. If the user sends a text message while the AI is speaking, the AI immediately stops its current response and processes the new input, making conversations feel natural and fluid, much like interruptions in real human conversation.&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: Does not support text-message interruption. If the AI is responding, a text message sent by the user is processed only after the current response completes, which can hurt conversational fluency and real-time feel.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Suited to scenarios requiring fast interaction and interruption, such as real-time customer service and quick Q&amp;amp;A&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: Suited to scenarios where complete responses matter more than interaction flexibility&lt;/p&gt;
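&lt;p&gt;On the ElevenLabs side, the interrupting text event is just a small JSON payload sent over the open WebSocket. A minimal sketch, assuming the &lt;code&gt;user_message&lt;/code&gt; event shape from the WebSocket API reference (verify field names against the current docs before relying on them):&lt;/p&gt;

```python
import json

def build_user_message(text: str) -> str:
    """Serialize a user_message event for the ElevenLabs Agents
    Platform WebSocket. Sending this while the agent is speaking is
    documented to interrupt the in-progress voice response. The field
    names here are an assumption based on the API docs."""
    return json.dumps({"type": "user_message", "text": text})

# A client would send this over the open WebSocket, e.g.:
#   await ws.send(build_user_message("Wait, change of plans"))
payload = json.loads(build_user_message("Wait, change of plans"))
```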

&lt;p&gt;&lt;strong&gt;8.2.7 Latency Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Latency Performance&lt;/th&gt;
&lt;th&gt;Optimization Features&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt; &lt;strong&gt;Depends on model selection&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Latency ranges from &lt;strong&gt;163 ms to 3.87 s&lt;/strong&gt;, depending on the selected LLM. Low-latency models such as Qwen3-30B-A3B (~163 ms) suit real-time interaction; higher-capability models such as GPT-5 (~1.14 s) or Claude Sonnet (~1.5 s) have higher latency but stronger reasoning. Supports streaming responses&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;ElevenLabs Agents Platform Documentation&lt;/a&gt;&lt;br&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform/api-reference/agents-platform/websocket" rel="noopener noreferrer"&gt;ElevenLabs WebSocket API&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt; &lt;strong&gt;Low latency&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Real-time streaming responses with latency typically &lt;strong&gt;300–800 ms&lt;/strong&gt; (depending on model and network); GPT-4o-mini is usually faster&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;OpenAI Realtime API Documentation&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/guides/realtime/optimizing-latency" rel="noopener noreferrer"&gt;OpenAI Performance Optimization&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Detailed Explanation&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Latency depends on the selected LLM. Low-latency models (such as Qwen3-30B-A3B at ~163 ms or GPT-3.5 Turbo at ~494 ms) keep latency low enough for real-time interaction; higher-capability models (such as GPT-5 at ~1.14 s or Claude Sonnet at ~1.5 s) add latency but reason better. Streaming audio responses reduce first-byte latency.&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: Latency is relatively stable; GPT-4o-mini usually responds faster than GPT-4o. Streaming responses are supported for latency optimization.&lt;/p&gt;
&lt;p&gt;Actual latency is affected by the following factors:&lt;/p&gt;
&lt;p&gt;– Network conditions and geographic location&lt;/p&gt;
&lt;p&gt;– Model selection (the ElevenLabs platform offers many models; OpenAI mainly GPT-4o vs. GPT-4o-mini)&lt;/p&gt;
&lt;p&gt;– Request complexity&lt;/p&gt;
&lt;p&gt;– Server load&lt;/p&gt;
&lt;p&gt;The figures above are typical values; actual performance varies with the usage scenario.&lt;/p&gt;
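&lt;p&gt;When comparing platforms yourself, the most useful single number is time-to-first-chunk rather than total response time. A minimal, platform-agnostic sketch, where the fake stream stands in for a real streaming API client:&lt;/p&gt;

```python
import time

def time_to_first_chunk(stream):
    """Return the first chunk of a streaming response and the elapsed
    time until it arrived (first-byte latency)."""
    start = time.perf_counter()
    first = next(iter(stream))
    return first, time.perf_counter() - start

def fake_stream(delay_s=0.05):
    """Simulated streaming response; a real client would yield audio
    or text chunks from the network."""
    for chunk in ("Hel", "lo"):
        time.sleep(delay_s)
        yield chunk

chunk, latency = time_to_first_chunk(fake_stream())
```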
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;Agents Platform Documentation&lt;/a&gt; – Emphasizes low-latency optimization&lt;/p&gt;
&lt;p&gt;▪ OpenAI: &lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;Realtime API Documentation&lt;/a&gt; – Real-time performance description&lt;/p&gt;
&lt;p&gt;▪ OpenAI: &lt;a href="https://platform.openai.com/docs/guides/realtime/optimizing-latency" rel="noopener noreferrer"&gt;Latency Optimization Guide&lt;/a&gt; – Performance optimization recommendations&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8.2.8 Pricing Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Billing Method&lt;/th&gt;
&lt;th&gt;Price Details&lt;/th&gt;
&lt;th&gt;Reference Links&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs Agents Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F1f4b0.png" alt="💰" width="72" height="72"&gt; &lt;strong&gt;Per-conversation minute billing (based on selected model)&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;The price depends on the selected LLM and usually bundles voice synthesis, speech recognition, and LLM call fees into one rate. For specific model prices, see the “Supported LLM Models” section above&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://elevenlabs.io/pricing" rel="noopener noreferrer"&gt;ElevenLabs Pricing Page&lt;/a&gt;&lt;br&gt;&lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;ElevenLabs Billing Instructions&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Realtime API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F1f4b0.png" alt="💰" width="72" height="72"&gt; &lt;strong&gt;Per-token and audio duration billing&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Input $2.50/1M tokens, Output $10/1M tokens&lt;br&gt;&lt;strong&gt;GPT-4o-mini&lt;/strong&gt;: Input $0.15/1M tokens, Output $0.60/1M tokens&lt;br&gt;Audio input/output: $0.015/minute&lt;br&gt;(Prices may change over time)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://platform.openai.com/pricing" rel="noopener noreferrer"&gt;OpenAI Pricing Page&lt;/a&gt;&lt;br&gt;&lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;OpenAI Realtime API Pricing&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Detailed Comparison&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;ElevenLabs&lt;/strong&gt;: Bills per conversation minute, with the rate depending on the selected LLM. The fee usually bundles voice synthesis, speech recognition, and LLM calls, making billing simple and predictable. For specific model prices, see the “Supported LLM Models” section above.&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;OpenAI&lt;/strong&gt;: Bills per token, with prices varying significantly between models:&lt;/p&gt;
&lt;p&gt;  – GPT-4o-mini: More economical, suitable for high-frequency usage&lt;/p&gt;
&lt;p&gt;  – GPT-4o: More capable but pricier, suitable for workloads needing multimodal or stronger reasoning capabilities&lt;/p&gt;
&lt;p&gt;  – Audio processing is billed separately per minute&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cost Estimation Examples&lt;/strong&gt; (for reference only):&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;Short conversation&lt;/strong&gt; (5 minutes, approximately 1,000 tokens): OpenAI GPT-4o-mini, roughly $0.0015 in token fees + $0.075 in audio fees = &lt;strong&gt;$0.0765&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;– &lt;strong&gt;Long conversation&lt;/strong&gt; (30 minutes, approximately 5,000 tokens): OpenAI GPT-4o-mini, roughly $0.0075 in token fees + $0.45 in audio fees = &lt;strong&gt;$0.4575&lt;/strong&gt;&lt;/p&gt;
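&lt;p&gt;These estimates can be reproduced with a small helper. This is a sketch using the rates from the table above; the token figure in the short-conversation example implies a particular input/output split, so the 500/500 split below is an illustrative assumption, and actual prices may change over time.&lt;/p&gt;

```python
# Rates from the comparison table (USD); prices may change over time.
GPT4O_MINI = {"input_per_1m": 0.15, "output_per_1m": 0.60}
AUDIO_PER_MINUTE = 0.015

def estimate_cost(minutes, input_tokens, output_tokens, rates=GPT4O_MINI):
    """Rough Realtime API cost: per-token fees plus per-minute audio fees."""
    token_cost = (input_tokens * rates["input_per_1m"]
                  + output_tokens * rates["output_per_1m"]) / 1_000_000
    return token_cost + minutes * AUDIO_PER_MINUTE

# 5-minute call: the audio component alone is 5 * $0.015 = $0.075
short_call = estimate_cost(minutes=5, input_tokens=500, output_tokens=500)
```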
&lt;p&gt;&lt;strong&gt;Recommendations&lt;/strong&gt;: Choose a platform based on your actual usage scenario and budget:&lt;/p&gt;
&lt;p&gt;– If you mainly run voice conversations at high volume, ElevenLabs’ per-minute billing may be simpler, and you can pick models to balance cost and performance&lt;/p&gt;
&lt;p&gt;– If you need multimodal capabilities or a stronger LLM, OpenAI may be a better fit&lt;/p&gt;
&lt;p&gt;– For high-frequency usage, GPT-4o-mini is usually the more economical choice&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Sources&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ ElevenLabs: &lt;a href="https://elevenlabs.io/pricing" rel="noopener noreferrer"&gt;Official Pricing Page&lt;/a&gt; – Latest pricing information&lt;/p&gt;
&lt;p&gt;▪ ElevenLabs: &lt;a href="https://elevenlabs.io/docs/agents-platform" rel="noopener noreferrer"&gt;Agents Platform Documentation&lt;/a&gt; – Billing instructions&lt;/p&gt;
&lt;p&gt;▪ OpenAI: &lt;a href="https://platform.openai.com/pricing" rel="noopener noreferrer"&gt;Official Pricing Page&lt;/a&gt; – Latest pricing information (2024–2025)&lt;/p&gt;
&lt;p&gt;▪ OpenAI: &lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;Realtime API Documentation&lt;/a&gt; – Billing details&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. Conclusion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The ElevenLabs Agents Platform WebSocket API provides strong support for real-time voice conversations. Through this demo, I implemented a complete real-time voice conversation pipeline: audio capture, processing, transmission, and playback.&lt;/p&gt;
&lt;p&gt;Compared to the OpenAI Realtime API, ElevenLabs has clear advantages in voice selection and model flexibility, making it especially suitable when you need a specific voice or voice cloning. If you need multimodal capabilities, however, OpenAI may be the better choice.&lt;/p&gt;
&lt;p&gt;If you also want to try building real-time voice conversations, this demo should give you a good starting point. The project code is open source, so you can use it directly or extend it.&lt;/p&gt;
&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/building-real-time-voice-conversations-with-elevenlabs-websocket-api-a-complete-development-guide/" rel="noopener noreferrer"&gt;Building Real-time Voice Conversations with ElevenLabs WebSocket API: A Complete Development Guide&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>NavTalk Update: Revolutionary 200ms Response Time for Real-Time Digital Human Experience!</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:52:45 +0000</pubDate>
      <link>https://dev.to/frankfu/navtalk-update-revolutionary-200ms-response-time-for-real-time-digital-human-experience-3089</link>
      <guid>https://dev.to/frankfu/navtalk-update-revolutionary-200ms-response-time-for-real-time-digital-human-experience-3089</guid>
      <description>&lt;h2&gt;1. Response Speed Performance&lt;/h2&gt;
&lt;p&gt;Let’s get straight to the point by looking at the actual response speed performance:&lt;/p&gt;
&lt;p&gt;In the live demo, we achieved an end-to-end latency of &lt;strong&gt;under 200 ms &lt;/strong&gt;for the initial audio processing — from the user finishing their speech to the AI processing, generating the video, and displaying it on the front end, all within approximately 200 ms. Currently, this response speed is highly advanced compared to other real-time digital human systems.&lt;/p&gt;
&lt;h2&gt;2. Overall Latency Before Optimization&lt;/h2&gt;
&lt;p&gt;We conducted detailed tests on MuseTalk’s real-time performance in the &lt;strong&gt;A100 GPU&lt;/strong&gt; environment:&lt;/p&gt;
&lt;p&gt;    1. When testing with 0.5-second real-time audio input, the processing time exceeded 0.5 seconds, failing to meet real-time requirements. As shown in the video below:&lt;/p&gt;
&lt;p&gt;    2. Lowering the FPS to 18 cut the processing time for real-time audio input by about 0.2 seconds; however, the FPS would have to drop below 15 to meet real-time expectations.&lt;/p&gt;
&lt;p&gt;    3. Increasing the batch size actually increased the processing time, indicating the chip had hit its processing limit.&lt;/p&gt;
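&lt;p&gt;The pass/fail criterion in these tests is the real-time factor: processing time divided by audio duration, which must stay below 1.0 to keep up with live input. A minimal check (the timing numbers here are illustrative, not measurements):&lt;/p&gt;

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; the pipeline keeps up
    with live input only while RTF < 1.0."""
    return processing_s / audio_s

# Before optimization: 0.5 s of audio took more than 0.5 s to process
before = real_time_factor(0.55, 0.5)   # illustrative timing
after = real_time_factor(0.35, 0.5)    # illustrative timing
```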
&lt;p&gt;The root cause lies not in the A100 GPU itself but in the host CPU paired with it: most GPU cloud providers ship AMD host processors, which lag Intel processors in some computer vision tasks such as image processing. Specifically, the system uses an AMD EPYC 7J13 64-core processor (with 30 cores allocated), which is well suited to virtualization and high-concurrency workloads but underperforms comparable Intel processors in some image processing tasks.&lt;/p&gt;
&lt;img width="800" height="217" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-MTKE.png" alt=""&gt;&lt;p&gt;I initially encountered this problem, which limited performance optimization. Later, I had an idea: &lt;strong&gt;Could we leverage the GPU for image processing tasks&lt;/strong&gt;, thereby breaking through the current performance bottleneck? This thought led to a series of optimization steps.&lt;/p&gt;
&lt;h2&gt;3. GPU-Accelerated Image Processing Optimization&lt;/h2&gt;
&lt;h3&gt;3.1 Optimization Approach&lt;/h3&gt;
&lt;p&gt;To address the performance bottleneck of AMD chips in image processing, the core idea was to move the image processing operations, originally executed on the CPU, to the GPU, taking full advantage of the GPU’s parallel computing capabilities. In MuseTalk’s inference process, the following image processing steps were executed on the CPU:&lt;/p&gt;
&lt;p&gt;    1. &lt;strong&gt;Data Conversion After VAE Decoding&lt;/strong&gt;: The decoded result from the GPU tensor is converted to a numpy array, incurring GPU → CPU data transfer overhead.&lt;/p&gt;
&lt;p&gt;    2. &lt;strong&gt;Image Resize&lt;/strong&gt;: Image resizing is performed on the CPU using OpenCV’s &lt;code&gt;cv2.resize()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;    3. &lt;strong&gt;Image Sharpening&lt;/strong&gt;: Image sharpening is done on the CPU using OpenCV and NumPy with an Unsharp Mask operation.&lt;/p&gt;
&lt;p&gt;    4. &lt;strong&gt;Image Blending&lt;/strong&gt;: Image composition and blending are handled on the CPU using PIL.&lt;/p&gt;
&lt;p&gt;Although each operation individually takes a small amount of time, the cumulative latency becomes significant in a real-time processing scenario. More importantly, these operations can be accelerated using the parallel computing power of the GPU.&lt;/p&gt;
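&lt;p&gt;To make the CPU-side cost concrete, here is a minimal NumPy sketch of the unsharp-mask step (sharpened = image + amount * (image - blurred)). The project uses OpenCV’s Gaussian blur; a box blur stands in here to keep the example self-contained.&lt;/p&gt;

```python
import numpy as np

def box_blur(img, k=3):
    """Crude k x k box blur (stand-in for cv2.GaussianBlur)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def unsharp_mask(img, amount=1.2):
    """Classic unsharp mask: amplify the difference from a blurred copy."""
    img = img.astype(np.float64)
    sharpened = img + amount * (img - box_blur(img))
    return np.clip(sharpened, 0, 255).astype(np.uint8)

frame = np.full((8, 8), 100, dtype=np.uint8)
frame[4:, :] = 200                      # a hard edge to sharpen
sharp = unsharp_mask(frame)
```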
&lt;h3&gt;3.2 Technical Implementation&lt;/h3&gt;
&lt;h4&gt;3.2.1 Creating a GPU Image Processing Tool Library&lt;/h4&gt;
&lt;p&gt;First, I created a dedicated GPU image processing tool library &lt;code&gt;musetalk/utils/gpu_image_processing.py&lt;/code&gt;, implementing the following core functions:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;gpu_resize()&lt;/strong&gt;: Uses PyTorch’s &lt;code&gt;F.interpolate()&lt;/code&gt; for GPU-based image resizing.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;gpu_gaussian_blur()&lt;/strong&gt;: Implements GPU-based Gaussian blur using PyTorch’s &lt;code&gt;F.conv2d()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;gpu_unsharp_mask()&lt;/strong&gt;: Performs image sharpening on the GPU using GPU-based Gaussian blur.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;gpu_image_blending()&lt;/strong&gt;: GPU-based image blending using tensor operations.&lt;/p&gt;
&lt;p&gt;These functions support multiple input formats ([H, W, C], [B, H, W, C], [B, C, H, W]) and handle data format conversions automatically, keeping them easy to use. With matching modifications in the &lt;code&gt;processing.py&lt;/code&gt; file, all image processing tasks were migrated to the GPU.&lt;/p&gt;
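&lt;p&gt;The multi-format support mentioned above boils down to normalizing everything to the [B, C, H, W] layout that PyTorch operators such as &lt;code&gt;F.interpolate()&lt;/code&gt; expect. A NumPy sketch of that normalization (the function name and the small-channel-count heuristic are illustrative, not the project’s actual code):&lt;/p&gt;

```python
import numpy as np

def to_bchw(arr):
    """Normalize [H, W, C], [B, H, W, C], or [B, C, H, W] arrays to
    [B, C, H, W]. Assumes the channel count is small (<= 4) to tell
    channels-last from channels-first layouts apart."""
    if arr.ndim == 3:                            # [H, W, C] -> add batch
        arr = arr[None, ...]
    if arr.ndim != 4:
        raise ValueError(f"expected 3 or 4 dims, got {arr.ndim}")
    if arr.shape[-1] <= 4 and arr.shape[1] > 4:  # [B, H, W, C] -> BCHW
        arr = arr.transpose(0, 3, 1, 2)
    return arr

bchw = to_bchw(np.zeros((32, 32, 3)))            # single HWC frame
```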
&lt;h4&gt;3.2.2 Optimizing VAE Decoding Process&lt;/h4&gt;
&lt;p&gt;I modified the &lt;code&gt;decode_latents()&lt;/code&gt; method in &lt;code&gt;musetalk/models/vae.py&lt;/code&gt;, adding a &lt;code&gt;return_tensor&lt;/code&gt; parameter:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def decode_latents(self, latents, return_tensor=False):&lt;br&gt;
    # ... decoding logic ...&lt;br&gt;
    if return_tensor:&lt;br&gt;
        # Return a GPU tensor to avoid GPU → CPU transfer&lt;br&gt;
        image = image.permute(0, 2, 3, 1)  # [B, H, W, C]&lt;br&gt;
        image = image * 255.0&lt;br&gt;
        image = image[..., [2, 1, 0]]  # Convert RGB to BGR&lt;br&gt;
        return image&lt;br&gt;
    else:&lt;br&gt;
        # Original behavior: return a NumPy array&lt;br&gt;
        image = (&lt;br&gt;
            image.detach()&lt;br&gt;
                 .cpu()&lt;br&gt;
                 .permute(0, 2, 3, 1)&lt;br&gt;
                 .float()&lt;br&gt;
                 .numpy()&lt;br&gt;
        )&lt;br&gt;
        # ...&lt;br&gt;
        return image&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With &lt;code&gt;return_tensor=True&lt;/code&gt;, the data stays on the GPU, avoiding unnecessary data transfer.&lt;/p&gt;
&lt;h4&gt;3.2.3 Refactoring the Real-Time Inference Process&lt;/h4&gt;
&lt;p&gt;In &lt;code&gt;scripts/realtime_inference.py&lt;/code&gt;, I refactored the &lt;code&gt;process_frames()&lt;/code&gt; method to add a GPU processing path:&lt;/p&gt;
&lt;p&gt;Key changes:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Image Resize Optimization&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Original: CPU-based processing&lt;br&gt;
res_frame = cv2.resize(&lt;br&gt;
    res_frame.astype(np.uint8),&lt;br&gt;
    (x2 - x1, y2 - y1)&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
# Optimized: GPU-based processing&lt;br&gt;
res_frame_gpu = gpu_resize(&lt;br&gt;
    res_frame,&lt;br&gt;
    (y2 - y1, x2 - x1),&lt;br&gt;
    mode='bilinear'&lt;br&gt;
)&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ &lt;strong&gt;Image Sharpening Optimization&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Original: CPU-based processing (OpenCV + NumPy)&lt;br&gt;
res_frame = apply_unsharp_mask(&lt;br&gt;
    res_frame,&lt;br&gt;
    amount=1.2,&lt;br&gt;
    sigma=1.0,&lt;br&gt;
    threshold=5.0&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
# Optimized: GPU-based processing&lt;br&gt;
res_frame_gpu = gpu_unsharp_mask(&lt;br&gt;
    res_frame_gpu,&lt;br&gt;
    amount=1.2,&lt;br&gt;
    sigma=1.0,&lt;br&gt;
    threshold=5.0&lt;br&gt;
)&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ &lt;strong&gt;Image Blending Optimization&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Original: CPU-based processing (PIL)&lt;br&gt;
combine_frame = get_image_blending(&lt;br&gt;
    ori_frame,&lt;br&gt;
    res_frame,&lt;br&gt;
    bbox,&lt;br&gt;
    mask,&lt;br&gt;
    mask_crop_box&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
# Optimized: GPU-based processing&lt;br&gt;
body_tensor = numpy_to_tensor_gpu(ori_frame, device)&lt;br&gt;
face_tensor = res_frame_gpu  # Already on GPU&lt;br&gt;
mask_tensor = numpy_to_tensor_gpu(mask, device)&lt;br&gt;
&lt;br&gt;
combine_frame_tensor = gpu_image_blending(&lt;br&gt;
    body_tensor,&lt;br&gt;
    face_tensor,&lt;br&gt;
    bbox,&lt;br&gt;
    mask_tensor,&lt;br&gt;
    mask_crop_box,&lt;br&gt;
    device&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
combine_frame = tensor_to_numpy_cpu(combine_frame_tensor)&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The entire process uses an automatic fallback mechanism: if GPU processing fails, it falls back to CPU processing to ensure system stability.&lt;/p&gt;
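&lt;p&gt;That fallback mechanism can be sketched as a small wrapper: try the GPU routine, and on any failure run the CPU implementation instead. The simulated functions below are illustrative stand-ins, not the project’s actual code:&lt;/p&gt;

```python
def with_cpu_fallback(gpu_fn, cpu_fn):
    """Wrap a GPU routine so any failure (e.g. CUDA out-of-memory)
    transparently falls back to the CPU implementation."""
    def wrapped(*args, **kwargs):
        try:
            return gpu_fn(*args, **kwargs)
        except Exception:
            return cpu_fn(*args, **kwargs)
    return wrapped

# Simulated pair: the "GPU" path fails, the CPU path succeeds
def gpu_resize_sim(frame):
    raise RuntimeError("CUDA out of memory")

def cpu_resize_sim(frame):
    return ("cpu", frame)

resize = with_cpu_fallback(gpu_resize_sim, cpu_resize_sim)
result = resize("frame-0")
```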
&lt;h3&gt;3.3 Performance Improvement Results&lt;/h3&gt;
&lt;p&gt;After optimization, we tested the system in an AMD EPYC 7J13 processor + A100 GPU environment:&lt;/p&gt;
&lt;h4&gt;3.3.1 Performance Improvement Data&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;CPU Time&lt;/th&gt;
&lt;th&gt;GPU Time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image Resize&lt;/td&gt;
&lt;td&gt;5–10 ms&lt;/td&gt;
&lt;td&gt;1–2 ms&lt;/td&gt;
&lt;td&gt;5–10x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image Sharpening&lt;/td&gt;
&lt;td&gt;8–15 ms&lt;/td&gt;
&lt;td&gt;2–4 ms&lt;/td&gt;
&lt;td&gt;3–5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image Blending&lt;/td&gt;
&lt;td&gt;10–20 ms&lt;/td&gt;
&lt;td&gt;3–5 ms&lt;/td&gt;
&lt;td&gt;3–5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VAE Decoding (No Transfer)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Saves transfer time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;3.3.2 Overall Effect&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Before Optimization:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;▪ 0.5-second audio input required more than 0.5 seconds of processing time.&lt;/p&gt;
&lt;p&gt;▪ Did not meet real-time requirements.&lt;/p&gt;
&lt;p&gt;▪ FPS needed to be reduced to below 15 to barely achieve real-time performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;After Optimization:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Image processing speed improved by 3–5 times.&lt;/li&gt;
&lt;li&gt;End-to-end latency was kept under 200 ms.&lt;/li&gt;
&lt;li&gt;Real-time response was achieved, significantly improving the user experience.&lt;/li&gt;
&lt;/ul&gt;
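&lt;p&gt;The real-time condition behind these numbers can be sanity-checked with a quick calculation. In the sketch below, the 25 fps target and the 120 ms pre-optimization frame time are illustrative assumptions; only the 3–5× speedup figure comes from the measurements above.&lt;/p&gt;

```javascript
// Per-frame budget in milliseconds for a given frame rate:
// real time requires the per-frame processing time to stay within this budget.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Frame time after applying a speedup factor.
function optimizedFrameMs(frameMs, speedup) {
  return frameMs / speedup;
}

// At 25 fps the budget is 40 ms per frame. A hypothetical 120 ms frame misses
// it, but a 3x-5x speedup brings it down to 24-40 ms, which fits the budget.
const budget = frameBudgetMs(25);        // 40 ms
const worst = optimizedFrameMs(120, 3);  // 40 ms
const best = optimizedFrameMs(120, 5);   // 24 ms
const meetsRealTime = !(worst > budget); // even the worst case fits
```

&lt;p&gt;The same arithmetic explains the earlier workaround: lowering FPS enlarges the per-frame budget, which is why reducing FPS below 15 could "barely" reach real time before the optimization.&lt;/p&gt;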
&lt;h3&gt;3.4 Why Is GPU Acceleration Effective?&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Flexible Computing Precision&lt;/strong&gt;: GPUs support float32 and half precision (float16), allowing a flexible balance between precision and speed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parallel Computing Advantage&lt;/strong&gt;: Image processing tasks (such as resizing, convolution, and blending) are inherently parallel, and GPUs, with their thousands of cores, are well suited to them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Bandwidth&lt;/strong&gt;: GPU memory bandwidth is far higher than the bandwidth between the CPU and main memory, so once data resides on the GPU, data movement stops being the bottleneck.&lt;/li&gt;
&lt;/ol&gt;
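&lt;p&gt;Point 1 can be illustrated in plain JavaScript: &lt;code&gt;Math.fround&lt;/code&gt; rounds a float64 value to the nearest float32, showing the kind of representation error that trading precision for speed introduces. GPUs push the same tradeoff one step further with float16; this snippet is an analogy, not GPU code.&lt;/p&gt;

```javascript
// Precision/speed tradeoff in miniature: float64 vs float32.
const full = 0.1;                 // float64 (JavaScript's native number)
const single = Math.fround(0.1);  // nearest representable float32 value

// The rounded value is close to, but not exactly, 0.1 -- lower precision
// buys smaller and faster arithmetic at the cost of a tiny rounding error.
const error = Math.abs(single - full);
```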
&lt;h2&gt;4. MuseTalk Docker Deployment Record&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;4.1 Build and Push Image&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;4.1.1 Rebuild Image&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;docker build -t xxx/musetalk:latest .&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;4.1.2 Push New Image to Docker Hub&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;docker push xxx/musetalk:latest&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note: You need to log in to Docker Hub before pushing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker login&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;strong&gt;4.2 Remove and Pull Image&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;4.2.1 Stop and Remove Old Container&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo docker rm -f musetalk&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;4.2.2 Pull Latest Image&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo docker pull xxx/musetalk:latest&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.3 Run Container&lt;/h3&gt;
&lt;h4&gt;4.3.1 Start New Container&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo docker run -d \&lt;br&gt;
  --name musetalk \&lt;br&gt;
  --gpus all \&lt;br&gt;
  --restart unless-stopped \&lt;br&gt;
  -p 2160:2160 \&lt;br&gt;
  xxx/musetalk:latest&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Explanation of Parameters:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;-d&lt;/strong&gt;: Run in detached mode (background).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;--name musetalk&lt;/strong&gt;: The container name.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;--gpus all&lt;/strong&gt;: Use all available GPUs (requires &lt;code&gt;nvidia-container-toolkit&lt;/code&gt; on the host).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;--restart unless-stopped&lt;/strong&gt;: Automatically restart the container unless it is manually stopped.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;-p 2160:2160&lt;/strong&gt;: Port mapping (host port:container port).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; On the first run, it will automatically download models from HuggingFace to the &lt;code&gt;/workspace/models&lt;/code&gt; directory inside the container.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;4.4 View Logs and Debug&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;4.4.1 Real-Time Logs&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo docker logs -f musetalk&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;4.4.2 Check Container Status&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo docker ps&lt;br&gt;
sudo docker ps -a&lt;br&gt;
sudo docker stats musetalk&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.5 Container Operations&lt;/h3&gt;
&lt;h4&gt;4.5.1 Enter Container&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo docker exec -it musetalk /bin/bash&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Explanation: The &lt;strong&gt;-it&lt;/strong&gt; parameter specifies interactive mode, and &lt;strong&gt;/bin/bash&lt;/strong&gt; is the command executed to enter the container.&lt;/p&gt;
&lt;h4&gt;4.5.2 Fix CRLF Issue in Filenames&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Enter the container&lt;br&gt;
sudo docker exec -it musetalk /bin/bash&lt;br&gt;
&lt;br&gt;
# Navigate to the target directory&lt;br&gt;
cd /workspace&lt;br&gt;
&lt;br&gt;
# One-time fix for all filenames ending in a carriage return&lt;br&gt;
for f in *$'\r'; do mv "$f" "${f%$'\r'}"; done&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;4.5.3 Create Directories and Copy Files&lt;/h4&gt;
&lt;p&gt;To stage the avatar assets, create the target directory and copy the generated files inside the container:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create the target directory&lt;br&gt;
mkdir -p /workspace/silent/sk_navtalk_xxx/girl&lt;br&gt;
&lt;br&gt;
# Copy the avatars directory&lt;br&gt;
cp -r /workspace/results/sk_navtalk_xxx/v15/avatars \&lt;br&gt;
      /workspace/silent/sk_navtalk_xxx/&lt;br&gt;
&lt;br&gt;
# Copy all files from the full_imgs folder&lt;br&gt;
cp -r /workspace/results/sk_navtalk_xxx/v15/avatars/girl/full_imgs/* \&lt;br&gt;
      /workspace/silent/sk_navtalk_xxx/girl/&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.6 Analyze GPU Usage&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;nvidia-smi -l 1   # refresh GPU statistics every second&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/navtalk-update-revolutionary-200ms-response-time-for-real-time-digital-human-experience/" rel="noopener noreferrer"&gt;NavTalk Update: Revolutionary 200ms Response Time for Real-Time Digital Human Experience!&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>NavTalk Product Update: Five Core Features Comprehensive Upgrade</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:52:09 +0000</pubDate>
      <link>https://dev.to/frankfu/navtalk-product-update-five-core-features-comprehensive-upgrade-3el2</link>
      <guid>https://dev.to/frankfu/navtalk-product-update-five-core-features-comprehensive-upgrade-3el2</guid>
      <description>&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Major Update&lt;/strong&gt;: This update covers five functional modules: real-time communication, Avatar management, data reporting, API integration, and account security, while also announcing the next development plan. Notably, we have optimized the digital human response latency to approximately &lt;strong&gt;200ms&lt;/strong&gt;, achieving industry-leading levels and providing users with a smooth experience close to real human conversation.&lt;/p&gt;&lt;/blockquote&gt;

&lt;h2&gt;1. Module One: Real-Time Communication Feature Optimization&lt;/h2&gt;
&lt;p&gt;In this update, we have comprehensively optimized the real-time communication features, focusing on &lt;strong&gt;response speed improvement&lt;/strong&gt;, &lt;strong&gt;simplified integration process&lt;/strong&gt;, and &lt;strong&gt;enhanced connection stability&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;1.1 Digital Human Response Speed Optimization&lt;/h3&gt;
&lt;p&gt;Through deep optimization of the model and full-link performance tuning, we have elevated the real-time digital human response speed to &lt;strong&gt;industry-leading levels&lt;/strong&gt;. This breakthrough performance improvement has brought NavTalk to new heights in real-time interaction experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response Latency Breakthrough&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In real-time conversation scenarios, response latency is a key indicator affecting user experience. Through continuous technical optimization, we have successfully controlled the end-to-end response latency to approximately &lt;strong&gt;200ms&lt;/strong&gt;. This means that the complete process from when users finish speaking to hearing the AI digital human’s reply has almost reached the fluency of natural human conversation.&lt;/p&gt;
&lt;p&gt;This performance level is leading among all real-time digital human systems. Traditional real-time conversation systems typically require 500ms to 1000ms or even longer response times, while NavTalk’s 200ms response latency is already close to the fluency of real human conversation, significantly improving user interaction experience. In practical applications, users can hardly feel obvious delays, making the conversation process more natural and smooth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full-Link Technical Optimization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To achieve this performance breakthrough, we conducted deep optimization across multiple technical aspects, achieving end-to-end, full-link performance improvement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model Inference Optimization&lt;/strong&gt;: We optimized the inference process at multiple levels. By refining the model architecture, removing unnecessary computational steps, and accelerating image processing on the GPU, we significantly reduced inference latency while preserving response quality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Network Transmission Optimization&lt;/strong&gt;: Network transmission is a critical link in a real-time conversation system. We optimized the data transmission pipeline so that data is delivered quickly and stably.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;System Architecture Optimization&lt;/strong&gt;: We also optimized the overall system architecture, improving inter-service communication and resource scheduling, which raised end-to-end performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The combined effect of these technical optimizations has enabled NavTalk to achieve extremely low response latency while maintaining high-quality conversation experience, bringing users a smooth interaction experience close to real human conversation. This performance improvement not only enhances user experience but also provides technical guarantees for more real-time interaction scenarios.&lt;/p&gt;

&lt;h3&gt;1.2 WebRTC Connection Consolidation&lt;/h3&gt;
&lt;h4&gt;1.2.1 &lt;strong&gt;Previous Architecture Issues&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Before optimization, developers needed to connect to two independent WebSocket services to complete real-time communication functionality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Communication WebSocket&lt;/strong&gt;: &lt;code&gt;wss://transfer.navtalk.ai/api/realtime-api&lt;/code&gt;, used for processing real-time conversation messages and business logic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Video Stream Interface&lt;/strong&gt;: &lt;code&gt;wss://transfer.navtalk.ai/api/webrtc&lt;/code&gt;, used for establishing WebRTC connections to obtain video streams.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although this dual-connection architecture was functionally complete, it brought many inconveniences. Developers needed to maintain the state of two connections simultaneously, handle connection establishment, reconnection, error handling, and other logic for both connections, increasing code complexity and maintenance costs. Additionally, state synchronization between the two connections was also a challenge, prone to connection state inconsistency issues.&lt;/p&gt;
&lt;h4&gt;1.2.2 &lt;strong&gt;Unified Connection Architecture&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Now, we have merged these two services into a unified connection address: &lt;code&gt;wss://transfer.navtalk.ai/wss/v2/realtime-chat&lt;/code&gt;. Through this single connection, developers can complete all real-time communication-related operations, including message transmission and video stream acquisition.&lt;/p&gt;
&lt;p&gt;This architecture optimization brings significant advantages in multiple aspects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Simplified Connection Management&lt;/strong&gt;: Developers maintain only one WebSocket connection, so there is no state synchronization between two connections to worry about. Connection establishment, reconnection, and error handling are unified in one place, reducing code volume and the risk of bugs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Development Efficiency&lt;/strong&gt;: From the unified connection, developers can directly obtain a &lt;code&gt;sessionId&lt;/code&gt; and use it to establish the WebRTC connection for the video stream, without extra requests or coordination logic. The whole flow is more intuitive, so integration work finishes faster.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduced Maintenance Costs&lt;/strong&gt;: The simplified architecture lowers both development and maintenance costs: code is more concise, troubleshooting is easier, and upgrades are more convenient, which matters for long-term iteration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This architecture optimization not only simplifies developers’ work but also improves system stability and performance, laying a solid foundation for NavTalk’s further development.&lt;/p&gt;
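&lt;p&gt;As a rough sketch of the unified flow, the snippet below builds the single v2 connection URL and pulls a &lt;code&gt;sessionId&lt;/code&gt; out of an incoming message. The message shape carrying the &lt;code&gt;sessionId&lt;/code&gt; is an assumption for illustration only; consult the API documentation for the actual event format.&lt;/p&gt;

```javascript
// Build the single v2 connection URL from its query parameters.
function buildChatUrl(license, avatarName) {
  const params = new URLSearchParams({ license: license, name: avatarName });
  return 'wss://transfer.navtalk.ai/wss/v2/realtime-chat?' + params.toString();
}

// Extract a sessionId from an incoming JSON message, if present.
// NOTE: the field name `sessionId` here is an assumed shape for illustration.
function extractSessionId(message) {
  const event = JSON.parse(message);
  return event.sessionId || null;
}

// Usage (browser): one WebSocket carries both conversation messages and the
// information needed to set up the WebRTC video stream.
// const ws = new WebSocket(buildChatUrl('YOUR_LICENSE', 'avatar_name'));
// ws.onmessage = (e) => {
//   const sessionId = extractSessionId(e.data);
//   if (sessionId) { /* create the RTCPeerConnection using this sessionId */ }
// };
```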

&lt;h3&gt;1.3 Intelligent Parameter Configuration&lt;/h3&gt;
&lt;p&gt;To simplify the developer experience, we have designed intelligent optimization for connection parameters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Required Parameters&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;license&lt;/code&gt;: Authorization code, used for identity verification and authorization management.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;name&lt;/code&gt;: Avatar name, specifying the digital human character to use. This is the core parameter of the connection; the system loads the corresponding configuration and resources based on it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Optional Parameters&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;model&lt;/code&gt;: Specifies the language model to use. If not specified, the system uses the default &lt;code&gt;gpt-realtime-mini&lt;/code&gt;. Developers can choose a more capable model for demanding scenarios, or a lightweight model to reduce cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Default Value Mechanism&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We have introduced a default value mechanism to make the connection process more convenient and flexible. When you only specify the &lt;code&gt;name&lt;/code&gt; parameter (Avatar name) without other optional parameters, the system will automatically use the default &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;voice&lt;/code&gt; configured for that Avatar.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Usage Examples&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The following code examples demonstrate two connection methods:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Method 1: Full parameter connection&lt;br&gt;
// Suitable when all parameters must be specified explicitly, such as temporarily using a different model&lt;br&gt;
const wsFull = new WebSocket('wss://transfer.navtalk.ai/wss/v2/realtime-chat?license=YOUR_LICENSE&amp;amp;name=avatar_name&amp;amp;model=gpt-realtime-mini');&lt;br&gt;
&lt;br&gt;
// Method 2: Only required parameters, using the Avatar's default configuration (recommended)&lt;br&gt;
// Suitable for most scenarios&lt;br&gt;
const wsDefault = new WebSocket('wss://transfer.navtalk.ai/wss/v2/realtime-chat?license=YOUR_LICENSE&amp;amp;name=avatar_name');&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Through this intelligent parameter configuration mechanism, we ensure both functional completeness and flexibility while greatly simplifying the developer experience, making NavTalk integration simpler and more efficient.&lt;/p&gt;
&lt;h3&gt;1.4 Message Format Optimization&lt;/h3&gt;
&lt;p&gt;To provide a clearer and more unified message interaction experience, we have unified the encapsulation of all message return formats. This optimization prepares for the integration of ElevenLabs while integrating OpenAI Realtime API.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;For more detailed information, please refer to the &lt;a href="https://docs.navtalk.ai" rel="noopener noreferrer"&gt;API Documentation&lt;/a&gt; to learn about message format specifications and usage examples.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;2. Module Two: Avatar Management Features&lt;/h2&gt;
&lt;p&gt;The introduction of Avatar sharing and import features makes collaboration between users more convenient. Now, you can easily share your carefully configured Avatar with others, or quickly import Avatars shared by others.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;2.1 Sharing Feature&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The sharing feature supports one-click generation of sharing links or sharing codes, allowing you to quickly share your carefully configured Avatar. The shared Avatar contains complete configuration information (model, voice, appearance, and all other settings), ensuring that recipients can get an experience completely consistent with the original Avatar.&lt;/p&gt;
&lt;h3&gt;2.2 Import Feature&lt;/h3&gt;
&lt;p&gt;The import feature supports quickly importing Avatars shared by others through sharing links or sharing codes. Imported Avatars can be used directly without reconfiguration, and the system will automatically apply all configuration information. The system will automatically synchronize Avatar configuration information to ensure that the imported Avatar configuration remains consistent with the original Avatar.&lt;/p&gt;
&lt;p&gt;These features not only promote communication and cooperation between users but also enhance the scalability and shareability of Avatars.&lt;/p&gt;
&lt;h2&gt;3. Module Three: Data Reporting Features&lt;/h2&gt;
&lt;p&gt;To help users better manage and analyze business data, we have added powerful report export features. These features allow you to easily export and analyze business data, meeting the data analysis needs of different scenarios.&lt;/p&gt;
&lt;h3&gt;3.1 Conversation Record Report&lt;/h3&gt;
&lt;p&gt;The conversation record report feature allows you to export the complete conversation history between users and Avatars, providing strong support for data analysis and business decision-making.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Export the complete conversation history between users and Avatars, including all conversation content.&lt;/li&gt;
&lt;li&gt;Filter by time range to flexibly select the period to export.&lt;/li&gt;
&lt;li&gt;Include conversation content, timestamps, and other fields, ensuring data integrity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3.2 Recharge Record Report&lt;/h3&gt;
&lt;p&gt;The recharge record report focuses on exporting account recharge details, providing support for financial management and data analysis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Export account recharge details, including recharge amount, time, and other information.&lt;/li&gt;
&lt;li&gt;Filter by user, time range, and other conditions to query exactly the data you need.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;4. Module Four: API Integration Features&lt;/h2&gt;
&lt;p&gt;To meet the needs of enterprise-level applications and third-party system integration, we have added conversation record query API and Webhook message notification features. These two features provide different data acquisition methods to meet integration needs in different scenarios.&lt;/p&gt;
&lt;h3&gt;4.1 Conversation Record Query API&lt;/h3&gt;
&lt;p&gt;The conversation record query API allows you to actively query conversation records through the API, supporting flexible query conditions and data formats.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Usage&lt;/strong&gt;: Call the API through HTTP requests, pass in query parameters, and the system returns conversation records that meet the conditions.&lt;/p&gt;
&lt;h3&gt;4.2 Webhook Message Notification&lt;/h3&gt;
&lt;p&gt;The Webhook message notification feature automatically sends callback events of conversation records to your configured Webhook address after each call is completed, achieving passive data reception.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Usage&lt;/strong&gt;: After configuring the Webhook address and trigger conditions, the system will automatically send callback requests to your server after each call is completed, containing complete conversation record data.&lt;/p&gt;
&lt;h2&gt;5. Module Five: Account Security Features&lt;/h2&gt;
&lt;p&gt;Account security has always been our focus. In this update, we have optimized the login logic to improve account security and user experience.&lt;/p&gt;
&lt;p&gt;We have optimized login-related security mechanisms, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Optimized Verification Code Mechanism&lt;/strong&gt;: Improved the generation and verification process of verification codes to enhance security.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secure Email Verification&lt;/strong&gt;: Verification codes are delivered to the registered email address to protect the account.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Through these optimizations, we have further improved account security while maintaining a good user experience. We are committed to providing you with the most secure account protection, ensuring the security of your data and privacy.&lt;/p&gt;
&lt;h2&gt;6. Next Development Plan&lt;/h2&gt;
&lt;h3&gt;6.1 ElevenLabs Integration&lt;/h3&gt;
&lt;p&gt;We will integrate ElevenLabs to bring you more powerful voice and model capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Voice Support&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integrate ElevenLabs’ rich voice library&lt;/li&gt;
&lt;li&gt;Support uploading and training your own exclusive voices&lt;/li&gt;
&lt;li&gt;Provide more flexible voice configuration and management features&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Support&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support multiple large language models such as OpenAI, Gemini, Claude&lt;/li&gt;
&lt;li&gt;Support connecting to your own model services&lt;/li&gt;
&lt;li&gt;Flexibly switch between different models to meet different scenario needs&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;For detailed model support list, please refer to &lt;a href="https://demo.navtalk.ai/11labs/en/readme.html#3-llm" rel="noopener noreferrer"&gt;ElevenLabs WebSocket Real-Time Conversation Demo&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Intelligent Knowledge Base Management&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Implemented through RAG (Retrieval-Augmented Generation) technology:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support retrieving your enterprise or personal knowledge base&lt;/li&gt;
&lt;li&gt;Upload, manage, and update knowledge base content&lt;/li&gt;
&lt;li&gt;Automatically retrieve relevant knowledge to improve answer accuracy&lt;/li&gt;
&lt;li&gt;Provide personalized answers based on your knowledge base&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Configuration and Pricing&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More flexible and controllable model and voice combination configuration&lt;/li&gt;
&lt;li&gt;Transparent pricing strategy&lt;/li&gt;
&lt;li&gt;Choose services on demand, select optimal configuration based on usage scenarios&lt;/li&gt;
&lt;li&gt;Achieve cost optimization&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;6.2 Multi-Avatar Generation Model Integration&lt;/h3&gt;
&lt;p&gt;We are researching the possibility of integrating multiple Avatar generation models to provide richer digital human images and expressiveness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Feature Planning&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support integrating different digital human generation models&lt;/li&gt;
&lt;li&gt;Support switching between different models&lt;/li&gt;
&lt;li&gt;Optimize multi-model operation efficiency&lt;/li&gt;
&lt;li&gt;Provide higher quality digital human generation effects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Expected Results&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Richer Avatar choices&lt;/li&gt;
&lt;li&gt;Higher quality image generation&lt;/li&gt;
&lt;li&gt;More flexible technical solutions&lt;/li&gt;
&lt;li&gt;Meet different scenario needs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;6.3 Localized Deployment Support&lt;/h3&gt;
&lt;p&gt;We are developing a localized deployment solution that allows you to run the entire NavTalk project on your own GPU server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core Features&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Complete deployment with fully localized data&lt;/li&gt;
&lt;li&gt;Meet data security requirements&lt;/li&gt;
&lt;li&gt;Support enterprise private deployment needs&lt;/li&gt;
&lt;li&gt;Optimize based on your hardware configuration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Applicable Scenarios&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise private deployment&lt;/li&gt;
&lt;li&gt;Scenarios with high data security requirements&lt;/li&gt;
&lt;li&gt;Large-scale deployment cost optimization&lt;/li&gt;
&lt;li&gt;Customization needs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Service Support&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Complete deployment documentation and tools&lt;/li&gt;
&lt;li&gt;Automated deployment scripts&lt;/li&gt;
&lt;li&gt;Technical support and services&lt;/li&gt;
&lt;li&gt;Continuous updates and maintenance&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;7. Update Summary&lt;/h2&gt;
&lt;p&gt;This NavTalk product update has comprehensively optimized according to functional modules, covering five core modules: &lt;strong&gt;real-time communication, Avatar management, data reporting, API integration, and account security&lt;/strong&gt;. Among them, the real-time communication feature has achieved a major breakthrough in response speed optimization, optimizing digital human response latency to approximately &lt;strong&gt;200ms&lt;/strong&gt;, reaching industry-leading levels. These updates will further improve NavTalk’s user experience and functional completeness, providing individual users and enterprise customers with a more powerful and easier-to-use AI virtual human interaction platform.&lt;/p&gt;
&lt;p&gt;At the same time, we are actively promoting development plans such as &lt;strong&gt;ElevenLabs integration, performance optimization, multi-model support, and localized deployment&lt;/strong&gt; to bring more powerful capabilities to NavTalk. These plans will enable NavTalk to reach new heights in voice selection, model support, knowledge base management, performance, and deployment flexibility.&lt;/p&gt;
&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/navtalk-product-update-five-core-features-comprehensive-upgrade/" rel="noopener noreferrer"&gt;NavTalk Product Update: Five Core Features Comprehensive Upgrade&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>Complete Guide to Deploying MIT Mini Cheetah on D-Robotics RDK S100</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:51:33 +0000</pubDate>
      <link>https://dev.to/frankfu/complete-guide-to-deploying-mit-mini-cheetah-on-d-robotics-rdk-s100-4aa7</link>
      <guid>https://dev.to/frankfu/complete-guide-to-deploying-mit-mini-cheetah-on-d-robotics-rdk-s100-4aa7</guid>
      <description>&lt;p&gt;This document aims to systematically analyze the technical architecture and implementation details of the MIT Mini Cheetah robot control system, and provide detailed instructions on how to complete deployment on the D-Robotics RDK S100 development board. The content is based on publicly available materials combined with actual deployment experience, and is intended to provide complete deployment references and technical guidance for relevant technical developers.&lt;/p&gt;
&lt;h2&gt;1. Introduction to mbedOS&lt;/h2&gt;
&lt;p&gt;Developers who first encounter the MIT Cheetah project may notice that the code repository on GitHub is relatively small, and the compilation method differs from conventional projects. This is mainly because the project uses &lt;strong&gt;mbedOS&lt;/strong&gt; as the underlying development framework.&lt;/p&gt;
&lt;p&gt;As a result, the MIT Cheetah hardware modules contain relatively little code of their own. The SPIne module, for example, focuses mainly on data-exchange logic, while the low-level hardware drivers and other basic functions are supplied by mbedOS.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;mbedOS&lt;/strong&gt; is a complete software solution developed by ARM for IoT applications, and is an embedded open-source ecosystem for ARM Cortex-M series processors. For more information, please visit the &lt;a href="https://www.mbed.com/en/platform/mbed-os/" rel="noopener noreferrer"&gt;mbedOS official website&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;1.1 SPI Interface Initialization Example&lt;/h3&gt;
&lt;p&gt;The following example shows how to initialize the SPI interface in the SPIne module:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void init_spi(void){
    SPISlave *spi = new SPISlave(PA_7, PA_6, PA_5, PA_4);
    spi-&amp;gt;format(16, 0);         // 16-bit frames, SPI mode 0
    spi-&amp;gt;frequency(12000000);   // 12 MHz
    spi-&amp;gt;reply(0x0);            // preload the first reply word
    cs.fall(&amp;amp;spi_isr);           // cs: InterruptIn on the chip-select pin, declared elsewhere
    printf("done\r\n");
}&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;1.2 CAN Bus Communication Example&lt;/h3&gt;
&lt;p&gt;The following is a typical application example of CAN bus communication:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#include "mbed.h"

DigitalOut myled(D8);
CAN can1(PD_0, PD_1, 500000);   // RX pin, TX pin, 500 kbit/s

int main() {
    CANMessage msg;
    while(1) {
        if(can1.read(msg)) {
            printf("Message received: id=%d, type=%d, data[0]=%d\n", msg.id, msg.type, msg.data[0]);
            myled = !myled;     // toggle the LED on each received frame
        }
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;2. MIT Cheetah Open Source Resources&lt;/h2&gt;
&lt;p&gt;The following are open source resource links related to the MIT Cheetah project:&lt;/p&gt;
&lt;h3&gt;2.1 Hardware Related&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Motor Controller Hardware&lt;/strong&gt;: &lt;a href="https://github.com/bgkatz/3phase_integrated" rel="noopener noreferrer"&gt;https://github.com/bgkatz/3phase_integrated&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SPIne Hardware&lt;/strong&gt;: &lt;a href="https://github.com/bgkatz/SPIne" rel="noopener noreferrer"&gt;https://github.com/bgkatz/SPIne&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2.2 Software Related&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Motor Controller Software&lt;/strong&gt;: &lt;a href="https://os.mbed.com/users/benkatz/code/Hobbyking_Cheetah_Compact_DRV8323/" rel="noopener noreferrer"&gt;https://os.mbed.com/users/benkatz/code/Hobbyking_Cheetah_Compact_DRV8323/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SPIne Software&lt;/strong&gt;: &lt;a href="https://os.mbed.com/users/benkatz/code/SPIne/" rel="noopener noreferrer"&gt;https://os.mbed.com/users/benkatz/code/SPIne/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Linux Control Code (Cheetah Mini)&lt;/strong&gt;: &lt;a href="https://github.com/mit-biomimetics/Cheetah-Software" rel="noopener noreferrer"&gt;https://github.com/mit-biomimetics/Cheetah-Software&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;3. MIT Mini Cheetah Robot System&lt;/h2&gt;
&lt;h3&gt;3.1 Simulation Environment Configuration and Usage&lt;/h3&gt;
&lt;p&gt;After compilation is complete, you need to configure simulation environment parameters. Navigate to the &lt;code&gt;config&lt;/code&gt; directory under the MIT main folder, open the &lt;code&gt;mini-cheetah-defaults.yaml&lt;/code&gt; file, set &lt;code&gt;control_mode&lt;/code&gt; and &lt;code&gt;cheater_mode&lt;/code&gt; to 1, and set &lt;code&gt;use_rc&lt;/code&gt; to 0. Save and exit after configuration.&lt;/p&gt;
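&lt;p&gt;For reference, after editing, the relevant entries in &lt;code&gt;mini-cheetah-defaults.yaml&lt;/code&gt; should read as follows (key names follow the text above; all other keys in the file are left unchanged):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;control_mode: 1
cheater_mode: 1
use_rc: 0&lt;/code&gt;&lt;/pre&gt;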
&lt;p&gt;Next, start the robot simulation environment. It is recommended to connect a gamepad before starting (optional, for subsequent control). Navigate to the &lt;code&gt;build&lt;/code&gt; directory under the MIT main folder (&lt;strong&gt;Note&lt;/strong&gt;: Directly entering the &lt;code&gt;sim&lt;/code&gt; subdirectory may prevent the simulation from starting, so you need to execute from the &lt;code&gt;build&lt;/code&gt; directory), right-click on a blank area and select “Open in Terminal”, then execute the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./sim/sim&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After execution, the robot simulation control interface will be displayed.&lt;/p&gt;
&lt;p&gt;In the control interface, click “Mini Cheetah” and “Simulator” in sequence, then click the “Start” button to launch the robot simulation interface.&lt;/p&gt;
&lt;p&gt;Next, start the robot controller. Navigate to the &lt;code&gt;build/user/MIT_Controller&lt;/code&gt; directory under the MIT main folder, right-click on a blank area and select “Open in Terminal”, then execute the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./mit_ctrl m s&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, &lt;code&gt;mit_ctrl&lt;/code&gt; is the compiled executable file, parameter &lt;code&gt;m&lt;/code&gt; represents the mini cheetah model, and parameter &lt;code&gt;s&lt;/code&gt; represents simulate (simulation mode). After execution, the robot in the simulation should be able to stand up. At this point, switch to the simulation control interface and change the &lt;code&gt;control_mode&lt;/code&gt; value to 4. You can observe the robot in the simulation switching to trot (trotting gait).&lt;/p&gt;
&lt;p&gt;At this point, you can control the robot’s movement speed using the gamepad joystick. Readers can explore different control modes on their own. The following is the implementation method for backflip operation:&lt;/p&gt;
&lt;p&gt;1. Change the &lt;code&gt;control_mode&lt;/code&gt; value in the simulation control interface to 3, and the robot will enter a standing state&lt;/p&gt;
&lt;p&gt;2. Change the &lt;code&gt;control_mode&lt;/code&gt; value to 9, and the robot will perform a backflip action&lt;/p&gt;
&lt;p&gt;3. After the backflip is complete, change the &lt;code&gt;control_mode&lt;/code&gt; value to 3 again, then to 9 to repeat the backflip&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If the robot falls during operation, you can click the “Go Home” button in the simulation control interface to restore the robot to its initial position. If it cannot be restored, you need to restart the simulation and controller.&lt;/p&gt;
&lt;h3&gt;3.2 Combined Use of Real Robot and Simulation&lt;/h3&gt;
&lt;p&gt;When running the real robot, you need to start both the simulation interface and the controller program:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Terminal 1: Start simulation interface
./sim/sim

# Terminal 2: Start controller (real robot mode)
./mit_ctrl m r f&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, parameter &lt;code&gt;r&lt;/code&gt; stands for robot (real-robot mode), and &lt;code&gt;f&lt;/code&gt; is an additional configuration flag.&lt;/p&gt;
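&lt;p&gt;Since both programs must run at the same time, a small launcher script can save opening two terminals. This is a hypothetical convenience sketch, not part of the original project; the binary paths are assumptions and must be adjusted to your build layout:&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical launcher (sketch): starts the simulator and the controller in
# real-robot mode as two background processes. SIM_BIN and CTRL_BIN are
# assumed paths; adjust them to match your build directory.
SIM_BIN=./sim/sim
CTRL_BIN=./user/MIT_Controller/mit_ctrl

missing=0
for bin in "$SIM_BIN" "$CTRL_BIN"; do
  if [ ! -x "$bin" ]; then
    echo "missing executable: $bin"
    missing=1
  fi
done

if [ "$missing" -eq 0 ]; then
  "$SIM_BIN" &                 # simulation interface (Terminal 1)
  "$CTRL_BIN" m r f &          # controller, real-robot mode (Terminal 2)
  echo "launched sim and controller"
  wait                         # stay attached to both processes
fi
```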
&lt;h2&gt;4. RDK S100 Development Board Selection and System Deployment&lt;/h2&gt;
&lt;h3&gt;4.1 Development Board Selection Introduction&lt;/h3&gt;
&lt;p&gt;D-Robotics provides multiple series of development boards, optimized for different application scenarios:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F19%2F7s2H8B9xhmLwQA6.png" alt="image.png" width="800" height="506"&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;RDK X3 (Entry-level Edge AI/Vision)&lt;/strong&gt;: Features 5 TOPS computing power, suitable for running common CV models and small robot prototype development.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;RDK X5 (Mid-range Robot/Multi-sensor)&lt;/strong&gt;: 10 TOPS computing power + richer high-speed interfaces (4×USB3, dual MIPI CSI, CAN FD, Wi-Fi 6, PoE), suitable for more complete robot integration and sensor expansion.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;RDK S100 / S100P (High-end “Computing-Control Integration”/Humanoid &amp;amp; Multi-joint Control Scenarios)&lt;/strong&gt;: 80/128 TOPS computing power + stronger CPU (A78AE) + &lt;strong&gt;On-board MCU (Cortex-R52+)&lt;/strong&gt;，emphasizing “perception inference + real-time motion control” collaboration, very suitable for quadruped robots and other applications requiring high real-time performance.&lt;/p&gt;
&lt;p&gt;This document mainly uses the &lt;strong&gt;RDK S100&lt;/strong&gt; development board to deploy the MIT Mini Cheetah program. This development board has powerful computing capabilities and rich interfaces, capable of meeting the real-time control requirements of quadruped robots.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RDK S100 Development Board Interface Description&lt;/strong&gt;:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F17%2FUw6gv4La1qmQNHo.png" alt="image.png" width="800" height="340"&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;J1&lt;/td&gt;
&lt;td&gt;Main board power supply interface&lt;/td&gt;
&lt;td&gt;J22&lt;/td&gt;
&lt;td&gt;MCU domain 16-Pin interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J2&lt;/td&gt;
&lt;td&gt;Main board function connector&lt;/td&gt;
&lt;td&gt;J23&lt;/td&gt;
&lt;td&gt;MCU expansion board 100-Pin interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J3&lt;/td&gt;
&lt;td&gt;RTC battery interface&lt;/td&gt;
&lt;td&gt;J24&lt;/td&gt;
&lt;td&gt;40-Pin interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J8&lt;/td&gt;
&lt;td&gt;Fan control interface&lt;/td&gt;
&lt;td&gt;J25&lt;/td&gt;
&lt;td&gt;Camera expansion board 100-Pin interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J15&lt;/td&gt;
&lt;td&gt;Main domain and MCU domain JTAG interface&lt;/td&gt;
&lt;td&gt;K1&lt;/td&gt;
&lt;td&gt;Reset button&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J16&lt;/td&gt;
&lt;td&gt;Type-C interface, for flashing, Main domain and MCU domain debugging&lt;/td&gt;
&lt;td&gt;K2&lt;/td&gt;
&lt;td&gt;Sleep button&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J17&lt;/td&gt;
&lt;td&gt;M.2 Key E interface&lt;/td&gt;
&lt;td&gt;SW1&lt;/td&gt;
&lt;td&gt;Power switch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J18&lt;/td&gt;
&lt;td&gt;M.2 Key M interface&lt;/td&gt;
&lt;td&gt;SW2&lt;/td&gt;
&lt;td&gt;Flashing mode switch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J19&amp;amp;J20&lt;/td&gt;
&lt;td&gt;4x USB3.0 Type-A interface&lt;/td&gt;
&lt;td&gt;SW3&amp;amp;SW6&lt;/td&gt;
&lt;td&gt;Pin function switching DIP switches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J21&lt;/td&gt;
&lt;td&gt;HDMI interface&lt;/td&gt;
&lt;td&gt;U43&amp;amp;U45&lt;/td&gt;
&lt;td&gt;2x Gigabit RJ45 network ports&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;5. RDK S100 System Flashing&lt;/h2&gt;
&lt;p&gt;The RDK S100 kit currently provides an Ubuntu 22.04 system image with a Desktop environment, which is convenient for development and debugging.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important Note&lt;/strong&gt;: The RDK S100 ships with a test-version system image pre-installed. To ensure you are running the latest system and get optimal performance, it is strongly recommended to flash the latest system image by following this document.&lt;/p&gt;
&lt;p&gt;D-Robotics official website provides detailed system flashing documentation. This document only provides an overview of key steps. For more detailed instructions, please refer to: &lt;a href="https://developer.d-robotics.cc/rdk_doc/rdk_s/Quick_start/install_os/rdk_s100" rel="noopener noreferrer"&gt;D-Robotics Official Documentation&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;5.1 USB Driver Installation&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Download Address&lt;/strong&gt;: &lt;a href="https://archive.d-robotics.cc/downloads/software_tools/winusb_drivers/" rel="noopener noreferrer"&gt;USB Driver Download&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For Windows operating systems, you need to install the corresponding drivers before using ADB and Fastboot functions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ADB and Fastboot Description&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;As embedded development boards have grown in performance and functionality, modern boards (such as the RDK S100) mainly use ADB and Fastboot for system flashing and debugging, which offer more functionality than traditional serial-port methods:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;ADB (Android Debug Bridge)&lt;/strong&gt;: Used after the system &lt;strong&gt;has booted&lt;/strong&gt; as a “command channel” between the computer and the development board, supporting file transfer, command execution, and other functions.&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Fastboot&lt;/strong&gt;: A low-level tool used while the system &lt;strong&gt;has not yet booted&lt;/strong&gt;, for &lt;strong&gt;flashing, unlocking, and system recovery&lt;/strong&gt;; it is the key tool for system flashing.&lt;/p&gt;
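&lt;p&gt;Once the driver is installed, you can verify the toolchain from the host. A sketch (the &lt;code&gt;check_flash_tools&lt;/code&gt; function name is illustrative, and device serials in the output depend on your board):&lt;/p&gt;

```shell
# Host-side sanity check for the ADB/Fastboot toolchain (sketch; install the
# Android platform-tools on the host first). Degrades gracefully when a tool
# is absent.
check_flash_tools() {
  for tool in adb fastboot; do
    if command -v "$tool" >/dev/null 2>&1; then
      "$tool" devices || true   # lists boards visible to this tool
    else
      echo "$tool not installed on this host"
    fi
  done
}
check_flash_tools
```

With the board booted, `adb devices` should list it; with the board in flashing mode, `fastboot devices` should.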
&lt;h3&gt;5.2 Complete System Flashing&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Important Configuration&lt;/strong&gt;: Currently, you need to set the SW3 DIP switch to ↑ position to use the onboard eMMC to boot the system. The current version temporarily does not support booting from M.2 NVMe SSD.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;5.2.1 Download Flashing Tools and System Image&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Image Flashing Tool D-Navigation&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Download Address&lt;/strong&gt;: &lt;a href="https://archive.d-robotics.cc/downloads/software_tools/download_tools/" rel="noopener noreferrer"&gt;D-Navigation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Windows Version&lt;/strong&gt;: Use the &lt;code&gt;D-navigation-win32-x64_v2.4.zip&lt;/code&gt; package&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;System Image Download&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Download Address&lt;/strong&gt;: &lt;a href="https://archive.d-robotics.cc/downloads/os_images/rdk_s100/RDKS100-V4.0.4-Beta/RDK_LNX_SDK/firmwares/" rel="noopener noreferrer"&gt;System Image&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After extracting the system image, you will get a &lt;code&gt;product&lt;/code&gt; folder. Ensure that this folder contains the &lt;code&gt;img_packages&lt;/code&gt; folder and &lt;code&gt;xmodem_tools&lt;/code&gt; file, with the structure as shown below:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F17%2Fbo36LEC1ni5M4cy.png" alt="image.png" width="354" height="194"&gt;
&lt;h4&gt;&lt;strong&gt;5.2.2 U-Boot Flashing Steps&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;This document uses U-Boot mode for system flashing. The specific steps are as follows:&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;Development Board Power Preparation&lt;/strong&gt;: Ensure the development board is powered off&lt;/p&gt;
&lt;p&gt;2. &lt;strong&gt;Enter U-Boot Mode&lt;/strong&gt;: Set the SW2 DIP switch to ▽ position to enter U-Boot mode&lt;/p&gt;
&lt;p&gt;3. &lt;strong&gt;Turn on Power&lt;/strong&gt;: Set the SW1 DIP switch to ▽ position to turn on power&lt;/p&gt;
&lt;p&gt;4. &lt;strong&gt;Open D-Navigation Tool&lt;/strong&gt; and complete the following configuration:&lt;/p&gt;
&lt;p&gt;  ▪ Select product model: &lt;strong&gt;S100&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;  ▪ Download mode: &lt;strong&gt;uboot&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;  ▪ Storage medium: &lt;strong&gt;emmc&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;  ▪ Type: &lt;strong&gt;secure&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;  ▪ Click “Browse” and select the &lt;code&gt;product&lt;/code&gt; folder containing the firmware&lt;/p&gt;
&lt;p&gt;  ▪ Select the serial port connected to the RDK S100 and set the baud rate to &lt;strong&gt;921600&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;  ▪ Click “Start Upgrade”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: During the upgrade process, if you see a ‘Need manual reset’ prompt, please power cycle the development board.&lt;/p&gt;
&lt;img width="800" height="569" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ffrankfu.blog%2Fwp-content%2Fuploads%2F2025%2F12%2Fimage-14-1024x729.png" alt=""&gt;
&lt;h2&gt;6. RDK S100 System Startup and Network Configuration&lt;/h2&gt;
&lt;h3&gt;6.1 System Startup&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Hardware Connection&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Connect the development board to a display via an HDMI cable&lt;/p&gt;
&lt;p&gt;▪ Connect to the network via an RJ45 port (if the board has no Wi-Fi card)&lt;/p&gt;
&lt;p&gt;▪ Keep the board powered off and complete all connections before powering on&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First Boot&lt;/strong&gt;: On first boot the system performs its default environment configuration, which takes about 45 seconds. Once configuration finishes, the Ubuntu desktop appears on the display.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Troubleshooting&lt;/strong&gt;: If the board produces no display output for a long time after power-on (more than 2 minutes), the board has failed to start normally. In that case, connect a serial cable and inspect the boot log to diagnose the problem.&lt;/p&gt;
&lt;p&gt;After the Ubuntu Desktop version system starts, it will output the system desktop on the display through the Display interface, as shown in the figure below:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F17%2FJmM7nNKtA4Eqv2Z.png" alt="image.png" width="800" height="457"&gt;
&lt;h3&gt;6.2 Network Configuration&lt;/h3&gt;
&lt;p&gt;Log in to the system through the serial port for network configuration. Serial port login operation is as follows:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F12%2F17%2FzOtX7p6ELNj2fsc.gif" alt="image-Uart-Login.gif" width="1073" height="621"&gt;
&lt;p&gt;Follow the GIF animation shown above, click OK, enter username: &lt;strong&gt;root&lt;/strong&gt;, password: &lt;strong&gt;root&lt;/strong&gt; to log in to the device.&lt;br&gt;
After logging in, you can use the &lt;code&gt;ifconfig -a&lt;/code&gt; command to query the development board IP address. Among them, &lt;code&gt;eth0&lt;/code&gt;/&lt;code&gt;eth1&lt;/code&gt; represent wired network interfaces, and &lt;code&gt;wlan0&lt;/code&gt; represents wireless network interface:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@ubuntu:~# ifconfig -a
eth0: flags=4163&amp;lt;UP,BROADCAST,RUNNING,MULTICAST&amp;gt;  mtu 1500
        inet 192.168.1.93  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 240e:39d:4d4:e2f0:283c:b3ff:fe97:bb72  prefixlen 64  scopeid 0x0&amp;lt;global&amp;gt;
        inet6 fe80::283c:b3ff:fe97:bb72  prefixlen 64  scopeid 0x20&amp;lt;link&amp;gt;
        ether 2a:3c:b3:97:bb:72  txqueuelen 1000  (Ethernet)
        RX packets 38261  bytes 55422230 (55.4 MB)
        RX errors 0  dropped 98  overruns 0  frame 0
        TX packets 21241  bytes 1485148 (1.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 95

eth1: flags=4099&amp;lt;UP,BROADCAST,MULTICAST&amp;gt;  mtu 1500
        ether 92:b0:69:58:4e:df  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 96

lo: flags=73&amp;lt;UP,LOOPBACK,RUNNING&amp;gt;  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10&amp;lt;host&amp;gt;
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 145  bytes 13618 (13.6 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 145  bytes 13618 (13.6 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can obtain the router DHCP-assigned IP address through the &lt;code&gt;eth0&lt;/code&gt; interface for subsequent SSH remote connection.&lt;/p&gt;
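&lt;p&gt;If you only need the address itself (for example to script the SSH step), a small helper can extract it. This is a sketch; the interface name &lt;code&gt;eth0&lt;/code&gt; and the &lt;code&gt;board_ip&lt;/code&gt; function name are assumptions:&lt;/p&gt;

```shell
# Sketch: pull just the IPv4 address of an interface using iproute2.
# Substitute wlan0 when using Wi-Fi; prints a notice when the interface
# has no address (e.g. when run on a different machine).
board_ip() {
  iface="${1:-eth0}"
  ip -4 -o addr show "$iface" 2>/dev/null | awk '{print $4}' | cut -d/ -f1
}

addr=$(board_ip eth0)
if [ -n "$addr" ]; then
  echo "board IP: $addr"
else
  echo "no IPv4 address found on eth0"
fi
```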
&lt;p&gt;&lt;strong&gt;SSH Login&lt;/strong&gt;: For security reasons, it is recommended to use a regular user for SSH login instead of the root account.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regular User&lt;/strong&gt;: Username &lt;code&gt;sunrise&lt;/code&gt;, password &lt;code&gt;sunrise&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;6.3 System Version Confirmation&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: System version alignment is crucial, as different versions may encounter different compatibility issues. Please confirm the system version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sunrise@ubuntu:~$ cat /etc/version
4.0.4-Beta&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Confirm that the current version is &lt;strong&gt;4.0.4-Beta&lt;/strong&gt;, consistent with the flashed image version.&lt;/p&gt;
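&lt;p&gt;The version check above can be scripted for repeatability. A minimal sketch, assuming the &lt;code&gt;/etc/version&lt;/code&gt; path and expected value from this section:&lt;/p&gt;

```shell
# Sketch: compare the flashed image version against the one this document
# targets. On a machine that is not an RDK S100, /etc/version is normally
# absent and the script just reports that.
EXPECTED="4.0.4-Beta"
if [ -r /etc/version ]; then
  ACTUAL=$(cat /etc/version)
  if [ "$ACTUAL" = "$EXPECTED" ]; then
    echo "system version OK: $ACTUAL"
  else
    echo "system version mismatch: got $ACTUAL, expected $EXPECTED"
  fi
else
  echo "/etc/version not found (not an RDK S100 image?)"
fi
```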
&lt;h2&gt;7. RDK S100 Software Environment Configuration&lt;/h2&gt;
&lt;h3&gt;7.1 Computer Board Selection Description&lt;/h3&gt;
&lt;p&gt;The original system of MIT Mini Cheetah runs on the UP Board, which uses a 4-core Intel Atom x5-Z8350 processor, equipped with 4GB RAM, peak power consumption of about 5W, based on x86 architecture.&lt;/p&gt;
&lt;p&gt;UP Board has relatively few applications in the Chinese market. More common choices include Raspberry Pi and NVIDIA Jetson series. Among them, Raspberry Pi is more oriented towards general embedded applications, while the Jetson series is more suitable for image processing and AI model deployment.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;RDK S100&lt;/strong&gt; used in this document as a computing platform runs Ubuntu 22.04 system, equipped with a 6-core ARM Cortex-A78AE v8.2 64-bit processor (ARM architecture), with 80 TOPS AI computing power and onboard MCU (Cortex-R52+), very suitable for quadruped robots and other applications requiring “perception inference + real-time motion control” collaboration.&lt;/p&gt;
&lt;h3&gt;7.2 Download MIT Mini Cheetah Source Code&lt;/h3&gt;
&lt;p&gt;First, download the MIT Mini Cheetah source code. This document uses an adapted version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/fuwei007/NavBot-EG02.git&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After downloading, enter the source code directory, which we refer to as the MIT main folder.&lt;/p&gt;
&lt;h3&gt;7.3 Install Third-Party Dependency Libraries&lt;/h3&gt;
&lt;p&gt;Install the basic dependency libraries required for compilation and running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo apt-get update
sudo apt -y install cmake gcc build-essential
sudo apt-get -y install openjdk-11-jdk
sudo apt -y install liblcm-dev
sudo apt-get -y install libeigen3-dev
sudo apt-get -y install mesa-common-dev
sudo apt -y install libgl1-mesa-dev
sudo apt -y install libglu1-mesa-dev
sudo apt-get -y install freeglut3-dev
sudo apt-get -y install libblas-dev liblapack-dev
sudo apt-get -y install libopenblas-dev

sudo apt install -y coinor-libipopt-dev gfortran libglib2.0-dev
sudo apt install -y openjdk-8-jdk&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;7.4 Install Qt&lt;/h3&gt;
&lt;p&gt;Qt is the graphics library required for the MIT Mini Cheetah simulation interface. There are two installation methods:&lt;/p&gt;
&lt;h4&gt;Method 1: Source Code Compilation Installation (Suitable for cases requiring a complete Qt development environment)&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Download Qt 5.14.2 Version&lt;/strong&gt;: &lt;a href="https://download.qt.io/archive/qt/5.14/5.14.2/" rel="noopener noreferrer"&gt;Qt 5.14.2 Download Link&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After downloading, execute the following steps in the directory where the file is located:&lt;/p&gt;
&lt;p&gt;1. Select the downloaded Qt installation file, right-click and select “Properties”&lt;/p&gt;
&lt;p&gt;2. In the “Permissions” tab, check “Allow executing file as program”&lt;/p&gt;
&lt;p&gt;3. Right-click in this folder to open a terminal, and execute the following command (Note: &lt;code&gt;qt-opensource-linux-x64-5.14.2.run&lt;/code&gt; should be replaced with your actual downloaded filename):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./qt-opensource-linux-x64-5.14.2.run&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;4. Complete the installation according to the graphical interface prompts (similar to Windows installation program)&lt;/p&gt;
&lt;h4&gt;Method 2: Install Using apt (Recommended, Simpler)&lt;/h4&gt;
&lt;p&gt;You can also use apt to directly install Qt-related libraries:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo apt install -y qtbase5-dev libqt5gamepad5-dev&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;RDK S100 Special Note&lt;/strong&gt;: In fact, the RDK S100 system already has Qt-related environment pre-installed, so you can skip the source code compilation steps and only need to install the gamepad support library:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo apt install -y libqt5gamepad5-dev&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;7.5 Install LCM&lt;/h3&gt;
&lt;p&gt;LCM (Lightweight Communications and Marshalling) is a library used for inter-process communication in the MIT Mini Cheetah system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Download LCM 1.4.0 Version&lt;/strong&gt;: &lt;a href="https://github.com/lcm-proj/lcm/releases/download/v1.4.0/lcm-1.4.0.zip" rel="noopener noreferrer"&gt;LCM v1.4.0 Download Link&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After downloading, extract the compressed package, enter the extracted folder, right-click on a blank area and select “Open in Terminal”.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Due to system version compatibility requirements, you need to switch the Java environment to JDK 8:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo update-alternatives --config javac
# Select the option pointing to /usr/lib/jvm/java-8-openjdk-arm64/bin/javac

sudo update-alternatives --config java
# Select the option pointing to /usr/lib/jvm/java-8-openjdk-arm64/jre/bin/java&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After completing the Java environment switch, execute the following commands to compile and install LCM (it is recommended to execute them one by one):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir build 
cd build 
cmake .. 
make
sudo make install 
sudo ldconfig&lt;/code&gt;&lt;/pre&gt;
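&lt;p&gt;To confirm the installation succeeded, LCM should now be discoverable. A minimal sketch, assuming LCM's build installs its &lt;code&gt;lcm.pc&lt;/code&gt; pkg-config file (the &lt;code&gt;check_lcm&lt;/code&gt; function name is illustrative):&lt;/p&gt;

```shell
# Sketch: verify LCM is visible to pkg-config after `sudo make install` and
# `sudo ldconfig`. On a machine without LCM the check simply reports it.
check_lcm() {
  if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists lcm; then
    echo "lcm version: $(pkg-config --modversion lcm)"
  else
    echo "lcm not found via pkg-config (was 'sudo ldconfig' run?)"
  fi
}
check_lcm
```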
&lt;h3&gt;7.6 Install Eigen 3.3.6&lt;/h3&gt;
&lt;p&gt;Eigen is a C++ template library for linear algebra, matrix and vector operations. &lt;strong&gt;Important&lt;/strong&gt;: After actual testing, other versions of Eigen may have compatibility issues, so you must use &lt;strong&gt;Eigen 3.3.6&lt;/strong&gt; version.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Download Eigen 3.3.6&lt;/strong&gt;: &lt;a href="https://gitlab.com/libeigen/eigen/-/archive/3.3.6/eigen-3.3.6.zip" rel="noopener noreferrer"&gt;Eigen 3.3.6 Download Link&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After downloading, extract the compressed package, enter the extracted folder, right-click on a blank area and select “Open in Terminal”, then execute the following commands (it is recommended to execute them one by one):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir build 
cd build 
cmake .. 
sudo make install 
sudo ldconfig&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;7.7 Modify MIT Mini Cheetah Program Source Code&lt;/h3&gt;
&lt;p&gt;Since the MIT Mini Cheetah original code is mainly designed for UP Board (x86 architecture), some adaptive modifications are needed on RDK S100 (ARM architecture). The downloaded source code directory structure is shown in the figure below:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FfT6vP2tcIyMYBOu.png" alt="image.png" width="800" height="451"&gt;
&lt;p&gt;The following will detail the modifications that need to be made:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;7.7.1 Modify Git Branch and Repository Address in CMakeLists.txt&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Open the &lt;code&gt;common/CMakeLists.txt&lt;/code&gt; file under the MIT main folder, and you need to modify the following content:&lt;/p&gt;
&lt;p&gt;1. Change the Git branch from &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt; (GitHub now uses &lt;code&gt;main&lt;/code&gt; as the default branch name)&lt;/p&gt;
&lt;p&gt;2. Switch the googletest library’s Git repository address to Gitee mirror (faster access in China)&lt;/p&gt;
&lt;p&gt;The modification location is shown in the figure below:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FP4LaXVvb82DqQ5U.png" alt="image.png" width="800" height="454"&gt;
&lt;p&gt;Save and exit after modification.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;7.7.2 Modify Eigen3 and LCM Header File Paths&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Since Eigen3 and LCM header files are installed in the &lt;code&gt;/usr/include&lt;/code&gt; directory in the RDK S100 system, while the default path in the source code is &lt;code&gt;/usr/local/include&lt;/code&gt;, path modification is needed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Search and Replace&lt;/strong&gt;: Search for the following two lines in all related files:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;include_directories("/usr/local/include/lcm/")
include_directories("/usr/local/include/eigen3")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replace with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;include_directories("/usr/include/lcm/")
include_directories("/usr/include/eigen3")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;List of Files That Need to Be Modified&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Cheetah-Software-master/common/CMakeLists.txt
Cheetah-Software-master/rc_test/CMakeLists.txt
Cheetah-Software-master/robot/CMakeLists.txt
Cheetah-Software-master/sim/CMakeLists.txt
Cheetah-Software-master/user/MIT_Controller/CMakeLists.txt&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;RDK S100 Special Note&lt;/strong&gt;: If you are using already adapted source code (such as the version provided in this document), you may not need to make this modification, or you may need to perform the opposite operation (change &lt;code&gt;/usr/include&lt;/code&gt; to &lt;code&gt;/usr/local/include&lt;/code&gt;). Please adjust according to the actual header file installation location.&lt;/p&gt;
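&lt;p&gt;If you prefer not to edit each file by hand, the replacement can be scripted. The following is a sketch (it assumes you run it from the &lt;code&gt;Cheetah-Software-master&lt;/code&gt; source root, and it keeps &lt;code&gt;.bak&lt;/code&gt; backups beside the originals):&lt;/p&gt;

```shell
# Batch version of the search-and-replace above (a sketch; back up first).
# Run from the Cheetah-Software-master source root.
for f in common/CMakeLists.txt rc_test/CMakeLists.txt \
         robot/CMakeLists.txt sim/CMakeLists.txt \
         user/MIT_Controller/CMakeLists.txt; do
  [ -f "$f" ] || continue                  # skip files that do not exist
  cp "$f" "$f.bak"                         # keep a backup beside the original
  sed -i -e 's#/usr/local/include/lcm/#/usr/include/lcm/#g' \
         -e 's#/usr/local/include/eigen3#/usr/include/eigen3#g' "$f"
done
```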
&lt;h4&gt;&lt;strong&gt;7.7.3 Modify Qt Path&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Modify the file &lt;code&gt;Cheetah-Software-master/scripts/find_qt_path.sh&lt;/code&gt;, comment out the original Qt path setting:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#printf "${HOME}/Qt/${QT_VER}/gcc_64/"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The path after &lt;code&gt;printf&lt;/code&gt; should include the &lt;code&gt;bin&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RDK S100 Adaptation&lt;/strong&gt;: Since RDK S100 uses system-installed Qt, you should use the following method to automatically obtain the Qt path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;printf "$(qmake -query QT_INSTALL_PREFIX)/"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This can automatically obtain the installation path of the system Qt without manual specification.&lt;/p&gt;
&lt;p&gt;The modification location is shown in the figure below:&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2025%2F11%2F13%2FdjzZOGoauXgI3nW.png" alt="image.png" width="464" height="218"&gt;
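&lt;p&gt;If you want &lt;code&gt;find_qt_path.sh&lt;/code&gt; to keep working when only a versioned binary is installed, a more defensive script body could look like the sketch below (the candidate binary names are assumptions; adjust them for your distribution):&lt;/p&gt;

```shell
# Sketch of a defensive find_qt_path.sh body: try qmake first, then common
# versioned binary names. Prints the Qt prefix with a trailing slash, as the
# build scripts expect; prints nothing if no qmake is found.
QT_PREFIX=""
for cand in qmake qmake6 qmake-qt5; do
  if [ -z "$QT_PREFIX" ]; then
    if command -v "$cand" 1>/dev/null 2>/dev/null; then
      QT_PREFIX="$("$cand" -query QT_INSTALL_PREFIX)"
    fi
  fi
done
if [ -n "$QT_PREFIX" ]; then printf "%s/" "$QT_PREFIX"; fi
```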
&lt;h4&gt;&lt;strong&gt;7.7.4 Fix Serial Port Header File Missing Issue&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;In ARM architecture Linux systems, the inclusion method of certain header files differs from x86 architecture and needs to be adapted.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Modify Source Code File&lt;/strong&gt;: Edit the &lt;code&gt;Cheetah-Software-master/robot/src/rt/rt_serial.cpp&lt;/code&gt; file:&lt;/p&gt;
&lt;p&gt;1. Comment out &lt;code&gt;#include &amp;lt;stropts.h&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;2. Add &lt;code&gt;#include &amp;lt;sys/ioctl.h&amp;gt;&lt;/code&gt; before &lt;code&gt;#include &amp;lt;asm/termios.h&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix System Header File Redefinition Issue&lt;/strong&gt;: Edit the system header file &lt;code&gt;/usr/include/asm-generic/termios.h&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo nano /usr/include/asm-generic/termios.h&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Add &lt;code&gt;#ifndef _SYS_IOCTL_H&lt;/code&gt; at the beginning of the file, and add &lt;code&gt;#endif&lt;/code&gt; after the related structure definition to avoid redefinition errors:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#ifndef _SYS_IOCTL_H
struct winsize {
        unsigned short ws_row;
        unsigned short ws_col;
        unsigned short ws_xpixel;
        unsigned short ws_ypixel;
};

#define NCC 8
struct termio {
        unsigned short c_iflag;         /* input mode flags */
        unsigned short c_oflag;         /* output mode flags */
        unsigned short c_cflag;         /* control mode flags */
        unsigned short c_lflag;         /* local mode flags */
        unsigned char c_line;           /* line discipline */
        unsigned char c_cc[NCC];        /* control characters */
};
#endif&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;&lt;strong&gt;7.7.5 Adapt spdlog Logging Library&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;spdlog is a fast C++ logging library. On RDK S100, you need to use the system-installed spdlog package instead of compiling from source.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Modify &lt;code&gt;third-party/CMakeLists.txt&lt;/code&gt;&lt;/strong&gt;: Replace all file content with the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;add_subdirectory(Goldfarb_Optimizer)
add_subdirectory(ParamHandler)
add_subdirectory(inih)
add_subdirectory(osqp)
add_subdirectory(JCQP)
add_subdirectory(qpOASES)
add_subdirectory(lord_imu)
add_subdirectory(wheeltec_imu)
add_subdirectory(SOEM)

if(CMAKE_SYSTEM_NAME MATCHES Linux)
  add_subdirectory(vectornav)
endif()

# Build all 3rd-party libs with PIC (useful for shared libs)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# ------------------------------------------------------------
# spdlog: use system package (libspdlog-dev) instead of source
# ------------------------------------------------------------
find_package(spdlog CONFIG REQUIRED)

# Provide a target named "spdlog" for compatibility with existing link lines.
add_library(spdlog INTERFACE)

if(TARGET spdlog::spdlog)
  target_link_libraries(spdlog INTERFACE spdlog::spdlog)
elseif(TARGET spdlog::spdlog_header_only)
  target_link_libraries(spdlog INTERFACE spdlog::spdlog_header_only)
else()
  message(FATAL_ERROR "spdlog CMake target not found (spdlog::spdlog / spdlog::spdlog_header_only). Install libspdlog-dev or set spdlog_DIR.")
endif()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, replace the entire contents of the top-level &lt;code&gt;CMakeLists.txt&lt;/code&gt; with the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cmake_minimum_required(VERSION 3.5)

# Add project() to avoid CMake warning and make PROJECT_SOURCE_DIR valid
project(MiniCheetah LANGUAGES C CXX)

set(CMAKE_DISABLE_IN_SOURCE_BUILD ON)
set(CMAKE_DISABLE_SOURCE_CHANGES  ON)

if ("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_BINARY_DIR}")
  message(SEND_ERROR "In-source builds are not allowed.")
endif ()

set(CMAKE_COLOR_MAKEFILE ON)
#execute_process(COMMAND ../scripts/make_types.sh)

set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)

#set(CMAKE_VERBOSE_MAKEFILE ON)

option(MINI_CHEETAH_BUILD "use compiler flags for mini cheetah computer" OFF)
set(BUILD_TYPE_RELEASE TRUE)

option(NO_SIM "Do not build simulator" OFF)

# -------------------------------
# spdlog: use system libspdlog-dev
# Must be before any add_subdirectory() that links spdlog::spdlog
# -------------------------------
find_package(spdlog CONFIG REQUIRED)

# Some distros provide only header-only target; alias it to spdlog::spdlog
if(NOT TARGET spdlog::spdlog AND TARGET spdlog::spdlog_header_only)
  add_library(spdlog::spdlog ALIAS spdlog::spdlog_header_only)
endif()

if(MINI_CHEETAH_BUILD)
  SET (THIS_COM "../" )
  CONFIGURE_FILE(${CMAKE_CURRENT_SOURCE_DIR}/config.h.cmake
    ${CMAKE_BINARY_DIR}/Configuration.h)
  set(CMAKE_CXX_FLAGS "-O3 -no-pie -ggdb -Wall 
  -Wextra -Wcast-align -Wdisabled-optimization -Wformat=2 
  -Winit-self -Wmissing-include-dirs -Woverloaded-virtual 
  -Wshadow -Wsign-promo -Werror")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=overloaded-virtual -Wno-error=unused-parameter")
  set(CMAKE_C_FLAGS "-O3  -ggdb -std=gnu99 -I.")
  message("**** Mini-Cheetah build enabled ****")
else(MINI_CHEETAH_BUILD)
  SET (THIS_COM "${PROJECT_SOURCE_DIR}/" )
  CONFIGURE_FILE(${CMAKE_CURRENT_SOURCE_DIR}/config.h.cmake
    ${CMAKE_BINARY_DIR}/Configuration.h)

  if(CMAKE_SYSTEM_NAME MATCHES Linux)
    set(CMAKE_CXX_FLAGS "-O3 -no-pie -march=native -ggdb -Wall 
    -Wextra -Wcast-align -Wdisabled-optimization -Wformat=2 
    -Winit-self -Wmissing-include-dirs -Woverloaded-virtual 
    -Wshadow -Wsign-promo -Werror")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=overloaded-virtual -Wno-error=unused-parameter")
  elseif(APPLE)
    set(CMAKE_CXX_FLAGS "-O3 -march=native -ggdb -Wall 
    -Wextra -Wcast-align -Wdisabled-optimization -Wformat=2 
    -Winit-self -Wmissing-include-dirs -Woverloaded-virtual 
    -Wshadow -Wsign-promo")
    include_directories("/usr/local/include/")   # lcm includes
  endif()

  set(CMAKE_C_FLAGS "-O3  -ggdb  -march=native -std=gnu99 -I.")
  message("**** Mini-Cheetah build disabled ****")
endif(MINI_CHEETAH_BUILD)

set(CMAKE_CXX_STANDARD 14)

#find_package(lcm)

add_subdirectory(robot)
add_subdirectory(third-party)
add_subdirectory(common)

if(NO_SIM)

else(NO_SIM)
  add_subdirectory(sim)
endif()

add_subdirectory(user)
add_subdirectory(rc_test)&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;7.8 Compile MIT Mini Cheetah Program&lt;/h3&gt;
&lt;p&gt;After completing all source code modifications, you can build the project. The Mini Cheetah build is used as the example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd Cheetah-Software
cd scripts
chmod +x make_types.sh
./make_types.sh  # You may see error messages like `rm: cannot remove...`, this is normal and can be ignored

cd .. &amp;amp;&amp;amp; mkdir mc-build &amp;amp;&amp;amp; cd mc-build
rm CMakeCache.txt  # Clean old configuration (if necessary)

# Configure project
# -DMINI_CHEETAH_BUILD=TRUE: Build Mini Cheetah version
# -DJCQP_USE_AVX2=OFF: Turn off x86 AVX2 optimization, adapt to ARM architecture (RDK S100)
cmake -DMINI_CHEETAH_BUILD=TRUE -DJCQP_USE_AVX2=OFF ..

# Compile (adjust -j parameter according to CPU core count, $(nproc) will automatically detect core count)
make -j$(nproc)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Compilation Notes&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;&lt;code&gt;./make_types.sh&lt;/code&gt; execution&lt;/strong&gt;: The script may print errors such as “cannot remove, no such file or directory”; these can be safely ignored and do not affect compilation.&lt;/p&gt;
&lt;p&gt;2. CMake Configuration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-DMINI_CHEETAH_BUILD=TRUE&lt;/code&gt;: build the Mini Cheetah version&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-DJCQP_USE_AVX2=OFF&lt;/code&gt;: disable the x86-only AVX2 optimization so the code builds on the ARM-based RDK S100&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3. &lt;strong&gt;Network Issues&lt;/strong&gt;: The &lt;code&gt;cmake&lt;/code&gt; step may stall while downloading Google-hosted dependencies. This is a network issue; wait patiently or retry.&lt;/p&gt;
&lt;p&gt;4. &lt;strong&gt;Compilation Parallelism&lt;/strong&gt;: &lt;code&gt;make -j$(nproc)&lt;/code&gt; will automatically use all CPU cores for parallel compilation. If you encounter problems, you can use &lt;code&gt;make&lt;/code&gt; for single-threaded compilation, but it will be slower.&lt;/p&gt;
&lt;h2&gt;8. RDK S100 Program Execution&lt;/h2&gt;
&lt;p&gt;After compilation is complete, the generated controller executable file is located in the &lt;code&gt;mc-build/user/MIT_Controller/&lt;/code&gt; directory. Running the program requires &lt;code&gt;sudo&lt;/code&gt; privileges to access hardware ports.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: You need to execute the following commands in the &lt;code&gt;mc-build&lt;/code&gt; directory.&lt;/p&gt;
&lt;h3&gt;8.1 Simulation Mode Execution&lt;/h3&gt;
&lt;p&gt;First, test whether the program runs normally in simulation mode:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd mc-build
sudo ./user/MIT_Controller/mit_ctrl m s&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Parameter description:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;m&lt;/code&gt;: Mini Cheetah model&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s&lt;/code&gt;: simulate (simulation mode)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;8.2 Real Robot Mode Execution&lt;/h3&gt;
&lt;p&gt;After confirming that simulation mode runs normally, you can switch to real robot mode:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd mc-build
sudo ./user/MIT_Controller/mit_ctrl m r f&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Parameter description:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;m&lt;/code&gt;: Mini Cheetah model&lt;/li&gt;
&lt;li&gt;&lt;code&gt;r&lt;/code&gt;: robot (real robot mode)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;f&lt;/code&gt;: load control parameters from file (rather than receiving them over the network)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In real robot mode, ensure all hardware is connected correctly, including the SPIne board and motor controllers&lt;/li&gt;
&lt;li&gt;Test thoroughly in simulation mode and confirm the control algorithm behaves correctly before switching to real robot mode&lt;/li&gt;
&lt;li&gt;When running the real robot, ensure there is sufficient clear space to avoid injury if the robot loses control&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;9. Advanced: Solving RDK S100 Rear Leg (SPI 0.1) Drive Issue&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;When migrating code from Jetson Nano to RDK S100, you may encounter a typical phenomenon: &lt;strong&gt;the front legs can move, but the rear legs have no response at all&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;9.1 Problem Diagnosis&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Enter the following command in the terminal to check devices:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ls /dev/spi*&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Normally, Mini Cheetah requires two SPI devices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/dev/spidev0.0&lt;/code&gt; (controls the front legs)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/dev/spidev0.1&lt;/code&gt; (controls the rear legs)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: You may only see &lt;code&gt;/dev/spidev0.0&lt;/code&gt;, or there may be an unused &lt;code&gt;/dev/spidev1.0&lt;/code&gt;, but &lt;code&gt;/dev/spidev0.1&lt;/code&gt; is missing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Root Cause Analysis&lt;/strong&gt;: The RDK S100 device tree may not enable the second chip select (CS1) of SPI0 by default, so the kernel never creates the corresponding device node.&lt;/p&gt;
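&lt;p&gt;A small shell loop (a convenience sketch) makes the diagnosis explicit; on a correctly configured board both nodes should report as present:&lt;/p&gt;

```shell
# Report which SPI character devices the kernel exposes.
# /dev/spidev0.1 missing usually means CS1 is absent from the device tree.
for dev in /dev/spidev0.0 /dev/spidev0.1; do
  if [ -e "$dev" ]; then
    echo "$dev: present"
  else
    echo "$dev: MISSING"
  fi
done
```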
&lt;h3&gt;&lt;strong&gt;9.2 Solution (Ultimate Hardware Modification Version)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The most reliable method is to &lt;strong&gt;directly modify the kernel device tree blob (DTB)&lt;/strong&gt;. For convenience, a Python script was written that automatically decompiles the system DTB files, inserts the missing driver node, and recompiles them back in place.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;9.2.1 Step 1: Install Required Tools&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;We need to install the &lt;code&gt;device-tree-compiler&lt;/code&gt; (&lt;code&gt;dtc&lt;/code&gt;) package to compile the device tree. Ensure the development board is connected to the network, then execute:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo apt-get update

sudo apt-get install -y device-tree-compiler&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;&lt;strong&gt;9.2.2 Step 2: Create Fix Script&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Create a script file in the terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nano force_spi_patch.py&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;&lt;strong&gt;9.2.3 Step 3: Copy Script Code&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Copy the following code completely (comments are in English to prevent Chinese encoding issues):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
import glob
import subprocess
import sys

# Path to RDK S100 Device Tree files
DTB_DIR = "/boot/hobot"

# The CS1 node to insert (using spidev@1)
# reg = &amp;lt;0x1&amp;gt; corresponds to Chip Select 1
NEW_NODE = """
        spidev@1 {
            compatible = "rohm,dh2228fv";
            reg = &amp;lt;0x1&amp;gt;;
            spi-max-frequency = &amp;lt;0x2faf080&amp;gt;;
        };
"""

def patch_dts_content(content):
    # Check if spidev@1 already exists
    if "spidev@1" in content:
        return None, "Already patched"

    # Find the position of spidev@0
    # We insert spidev@1 immediately before spidev@0 for safety
    target_str = "spidev@0 {"
    if target_str not in content:
        return None, "spidev@0 not found"

    # Replace target string with NEW_NODE + target string
    new_content = content.replace(target_str, NEW_NODE + "\n\t" + target_str)
    return new_content, "Patched"

def main():
    print("=== Starting: Kernel Device Tree Patch ===")

    # 1. Check for dtc tool
    if subprocess.call(["which", "dtc"], stdout=subprocess.DEVNULL) != 0:
        print("Error: dtc tool not found. Please run: sudo apt-get install device-tree-compiler")
        sys.exit(1)

    # 2. Find all dtb files
    dtb_files = glob.glob(os.path.join(DTB_DIR, "rdk-s100*.dtb"))
    if not dtb_files:
        print(f"Error: No .dtb files found in {DTB_DIR}")
        sys.exit(1)

    count = 0
    for dtb_path in dtb_files:
        # Skip files we might have created manually before
        if "-cs1.dtb" in dtb_path:
            continue

        print(f"Processing: {os.path.basename(dtb_path)}")

        # Backup original file
        if not os.path.exists(dtb_path + ".original"):
            os.system(f"sudo cp {dtb_path} {dtb_path}.original")

        # Decompile DTB -&amp;gt; DTS
        dts_path = dtb_path + ".temp.dts"
        cmd_decompile = f"dtc -I dtb -O dts -o {dts_path} {dtb_path}"

        # Run decompile (suppress warnings)
        os.system(f"{cmd_decompile} &amp;gt; /dev/null 2&amp;gt;&amp;amp;1")
        if not os.path.exists(dts_path):
            print("  -&amp;gt; Decompilation failed, skipping")
            continue

        # Read and modify DTS content
        with open(dts_path, 'r') as f:
            content = f.read()

        new_content, status = patch_dts_content(content)
        if new_content:
            with open(dts_path, 'w') as f:
                f.write(new_content)

            # Recompile DTS -&amp;gt; DTB
            cmd_compile = f"dtc -I dts -O dtb -o {dtb_path} {dts_path}"
            if os.system(f"{cmd_compile} &amp;gt; /dev/null 2&amp;gt;&amp;amp;1") == 0:
                print(f"  -&amp;gt; Patch applied successfully!")
                count += 1
            else:
                print(f"  -&amp;gt; Compilation error, file not modified")
        else:
            print(f"  -&amp;gt; {status} (No changes needed)")

        # Clean up temporary file
        if os.path.exists(dts_path):
            os.remove(dts_path)

    print("-" * 30)
    if count &amp;gt; 0:
        print(f"Patch Complete! Modified {count} kernel files.")
        print("Please reboot immediately: sudo reboot")
    else:
        print("No files were modified. Please check if spidev@1 already exists.")

if __name__ == "__main__":
    main()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Press &lt;code&gt;Ctrl + O&lt;/code&gt; to save, &lt;code&gt;Enter&lt;/code&gt; to confirm, and &lt;code&gt;Ctrl + X&lt;/code&gt; to exit.&lt;/p&gt;
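&lt;p&gt;Before touching real DTB files, the script’s string-patching step can be sanity-checked in isolation. The snippet below replicates &lt;code&gt;patch_dts_content()&lt;/code&gt; with a simplified node body (the property list is omitted for brevity) and runs it on a toy DTS fragment:&lt;/p&gt;

```python
# Minimal, self-contained replica of the script's patch_dts_content() logic.
# The node body is simplified for the demo; the real script inserts the full
# spidev@1 node with compatible/reg/spi-max-frequency properties.
NEW_NODE = "\n        spidev@1 {\n            /* properties omitted in this sketch */\n        };\n"

def patch_dts_content(content):
    # Idempotence guard: never insert the node twice
    if "spidev@1" in content:
        return None, "Already patched"
    target_str = "spidev@0 {"
    if target_str not in content:
        return None, "spidev@0 not found"
    # Insert spidev@1 immediately before the existing spidev@0 node
    return content.replace(target_str, NEW_NODE + "\n\t" + target_str), "Patched"

sample = 'spi@0 {\n\tspidev@0 {\n\t\tstatus = "okay";\n\t};\n};\n'
patched, status = patch_dts_content(sample)
print(status)                          # Patched
print(patch_dts_content(patched)[1])   # Already patched
```

A second run on the already patched text is a no-op, which is what makes the real script safe to re-execute.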
&lt;h4&gt;&lt;strong&gt;9.2.4 Step 4: Execute Fix&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Run the script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo python3 force_spi_patch.py&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you see the message &lt;code&gt;Patch applied successfully!&lt;/code&gt;, the patch has been applied to the kernel device tree files.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;9.2.5 Step 5: Reboot and Verify&lt;/strong&gt;&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;sudo reboot&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After rebooting, check again:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ls /dev/spi*&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point, both &lt;code&gt;/dev/spidev0.0&lt;/code&gt; and &lt;code&gt;/dev/spidev0.1&lt;/code&gt; should exist, and the robot’s rear legs can be controlled normally.&lt;/p&gt;
&lt;h2&gt;10. Summary&lt;/h2&gt;
&lt;p&gt;This document systematically introduces the complete process of deploying the MIT Mini Cheetah robot control system on the D-Robotics RDK S100 development board, covering all aspects from system flashing, network configuration, software environment setup, source code adaptation to program compilation and execution.&lt;/p&gt;
&lt;p&gt;Through the detailed instructions in this document, developers can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understand the basic characteristics of the RDK S100 development board and the rationale for choosing it&lt;/li&gt;
&lt;li&gt;Complete the full environment setup, from system flashing to network configuration&lt;/li&gt;
&lt;li&gt;Master the methods for adapting MIT Mini Cheetah to ARM-based platforms&lt;/li&gt;
&lt;li&gt;Successfully compile and run the MIT Mini Cheetah control system&lt;/li&gt;
&lt;li&gt;Perform robot control testing in both simulation and real robot modes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We hope this document serves as a useful reference for developers and helps promote the application of quadruped robot technology on more platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related Resources&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;D-Robotics Official Documentation: &lt;a href="https://developer.d-robotics.cc" rel="noopener noreferrer"&gt;https://developer.d-robotics.cc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;MIT Mini Cheetah Open Source Code: &lt;a href="https://github.com/mit-biomimetics/Cheetah-Software" rel="noopener noreferrer"&gt;https://github.com/mit-biomimetics/Cheetah-Software&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/complete-guide-to-deploying-mit-mini-cheetah-on-d-robotics-rdk-s100/" rel="noopener noreferrer"&gt;Complete Guide to Deploying MIT Mini Cheetah on D-Robotics RDK S100&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>NavTalk Digital Human Loop Video Generation Technical Implementation</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:50:56 +0000</pubDate>
      <link>https://dev.to/frankfu/navtalk-digital-human-loop-video-generation-technical-implementation-2dl3</link>
      <guid>https://dev.to/frankfu/navtalk-digital-human-loop-video-generation-technical-implementation-2dl3</guid>
      <description>&lt;h2&gt;I. Background and Objectives&lt;/h2&gt;
&lt;p&gt;In the NavTalk real-time conversation system, digital humans need to display natural and smooth animation effects. To provide a better user experience, we need to generate a &lt;strong&gt;4-second seamlessly looping video&lt;/strong&gt; that allows the digital human to continuously play while waiting for user input or system responses, creating a seamless looping visual effect.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core Challenges&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Seamless Loop&lt;/strong&gt;: The last frame of the video must connect perfectly with the first frame&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Natural Movement&lt;/strong&gt;: The digital human’s movements must look natural and professional, suitable for conversation scenarios&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Precise Control&lt;/strong&gt;: The video duration and loop points must be controlled precisely to guarantee a perfect 4-second loop&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;II. Technical Solution Overview&lt;/h2&gt;
&lt;p&gt;We adopt a complete technical solution of &lt;strong&gt;AI Video Generation + Intelligent Blink Detection + Video Post-Processing&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Image Upload → Kling AI Generates 5s Video → Auto-detect Blink Time Point → Extract 2s Clip → Reverse and Concatenate → Generate 4s Loop Video&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Technology Stack&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Video Generation&lt;/strong&gt;: Kling AI (formerly ClingAI) Image-to-Video API&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Blink Detection&lt;/strong&gt;: MediaPipe + OpenCV (Python script)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Video Processing&lt;/strong&gt;: FFmpeg (clipping, reversing, concatenating)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Backend Framework&lt;/strong&gt;: Spring Boot + Apache HttpClient&lt;/p&gt;
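&lt;p&gt;The post-processing stage can be sketched as plain FFmpeg invocations assembled in Python. The file names, the &lt;code&gt;reverse&lt;/code&gt; video filter, and the &lt;code&gt;concat&lt;/code&gt; filter graph below are illustrative assumptions for the clip → reverse → concatenate idea, not the project's actual commands:&lt;/p&gt;

```python
def build_loop_commands(src="clip.mp4", start=1.0, length=2.0,
                        fwd="fwd.mp4", rev="rev.mp4", out="loop.mp4"):
    """Build three FFmpeg invocations: extract a 2s clip around the blink,
    reverse it, then concatenate forward + reversed into a 4s loop."""
    # 1. Extract a `length`-second clip starting at `start` (audio dropped)
    extract = ["ffmpeg", "-y", "-ss", str(start), "-t", str(length),
               "-i", src, "-an", fwd]
    # 2. Reverse the extracted clip frame by frame
    reverse = ["ffmpeg", "-y", "-i", fwd, "-vf", "reverse", "-an", rev]
    # 3. Concatenate forward + reversed clips into one seamless loop
    concat = ["ffmpeg", "-y", "-i", fwd, "-i", rev,
              "-filter_complex", "[0:v][1:v]concat=n=2:v=1:a=0[v]",
              "-map", "[v]", out]
    return [extract, reverse, concat]
```

&lt;p&gt;Each command list can then be executed in order with &lt;code&gt;subprocess.run&lt;/code&gt;.&lt;/p&gt;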
&lt;h2&gt;III. Complete Implementation Flow&lt;/h2&gt;
&lt;h3&gt;Step 1: Image to Video Generation (Kling AI API)&lt;/h3&gt;
&lt;p&gt;First, we call Kling AI’s image-to-video API to generate an initial 5-second video.&lt;/p&gt;
&lt;h4&gt;1.1 API Call Implementation&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;@PostMapping("/generateVideo")
public Result generateVideoFromImage(
        @RequestPart("image") MultipartFile image,
        @RequestPart(value = "prompt", required = false) String prompt) {

    // If no prompt is provided, use the default NavTalk loop animation prompt
    if (prompt == null || prompt.trim().isEmpty()) {
        prompt = clingAiService.getDefaultNavTalkLoopPrompt();
    }

    // Call Service layer to generate 5-second video
    return clingAiService.generateVideo(image, prompt, 5);
}&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;1.2 Prompt Design&lt;/h4&gt;
&lt;p&gt;To generate a loopable video, we carefully designed the prompt to ensure the digital human faces the screen, remains still, and naturally blinks after 1 second:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;public String getDefaultNavTalkLoopPrompt() {&lt;br&gt;
    return "A digital human avatar faces the screen directly, completely still and motionless " +&lt;br&gt;
           "throughout the entire video. The character maintains a calm, professional expression " +&lt;br&gt;
           "with eyes open and fixed on the camera. After 1 second, the avatar performs a single " +&lt;br&gt;
           "natural blink - eyelids close gently and then reopen smoothly. After the blink completes, " +&lt;br&gt;
           "the character remains perfectly still again. The camera remains static with neutral lighting, " +&lt;br&gt;
           "maintaining focus on the avatar's calm facial expression and professional demeanor. " +&lt;br&gt;
           "The entire sequence creates a seamless loop where the end frame matches the start frame exactly, " +&lt;br&gt;
           "with the blink occurring after 1 second in each cycle.";&lt;br&gt;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Prompt Design Points&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Emphasize the digital human &lt;strong&gt;facing the screen&lt;/strong&gt; (faces the screen directly)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Emphasize &lt;strong&gt;complete stillness&lt;/strong&gt; (completely still and motionless), with no movement except blinking&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Clear blink timing: &lt;strong&gt;blink starts after 1 second&lt;/strong&gt; (After 1 second, the avatar performs a single natural blink)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Natural blink action: eyelids close gently and then reopen smoothly&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Emphasize seamless connection: the end frame matches the start frame exactly&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Maintain static camera and neutral lighting to ensure visual consistency&lt;/p&gt;
&lt;h4&gt;1.3 JWT Authentication&lt;/h4&gt;
&lt;p&gt;The Kling AI API uses a JWT token for authentication. We implemented the complete JWT generation logic:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;public static String generateJwtToken(String accessKey, String secretKey) {
    // If the Access Key is already a JWT (3 dot-separated parts), use it directly
    String[] tokenParts = accessKey.split("\\.");
    if (tokenParts.length == 3) {
        return accessKey;
    }

    // Otherwise, generate a new JWT Token
    long now = System.currentTimeMillis() / 1000;
    String headerJson = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
    String payloadJson = "{\"iss\":\"" + accessKey + "\",\"iat\":" + now +
                         ",\"nbf\":" + now + ",\"exp\":" + (now + 3600) + "}";

    String header = base64UrlEncode(headerJson.getBytes(StandardCharsets.UTF_8));
    String payload = base64UrlEncode(payloadJson.getBytes(StandardCharsets.UTF_8));
    String signingInput = header + "." + payload;
    String signature = hmacSha256Base64Url(signingInput, secretKey);

    return signingInput + "." + signature;
}&lt;/code&gt;&lt;/pre&gt;
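&lt;p&gt;The token layout above (HS256 header; &lt;code&gt;iss&lt;/code&gt;, &lt;code&gt;iat&lt;/code&gt;, &lt;code&gt;nbf&lt;/code&gt;, &lt;code&gt;exp&lt;/code&gt; claims; one-hour lifetime) can be reproduced in a few lines of Python, which is handy for testing credentials outside the Spring service:&lt;/p&gt;

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    # Base64url without padding, as JWT requires
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def generate_jwt(access_key: str, secret_key: str, ttl: int = 3600) -> str:
    # Same claims as the Java version: iss, iat, nbf, exp (now + 1 hour)
    now = int(time.time())
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"},
                               separators=(",", ":")).encode())
    payload = b64url(json.dumps({"iss": access_key, "iat": now,
                                 "nbf": now, "exp": now + ttl},
                                separators=(",", ":")).encode())
    signing_input = header + "." + payload
    sig = b64url(hmac.new(secret_key.encode(), signing_input.encode(),
                          hashlib.sha256).digest())
    return signing_input + "." + sig
```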
&lt;h4&gt;1.4 Configuration&lt;/h4&gt;
&lt;p&gt;First, we need to set up Kling AI API information in the configuration file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# application.properties or application-dev.properties&lt;br&gt;
clingai.api.url=https://api-singapore.klingai.com&lt;br&gt;
clingai.api.access.key=your-access-key&lt;br&gt;
clingai.api.secret.key=your-secret-key&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Inject configuration in the Service class using the &lt;code&gt;@Value&lt;/code&gt; annotation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@Service&lt;br&gt;
public class ClingAiServiceImpl implements ClingAiService {
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Value("${clingai.api.url:}")
private String clingaiApiUrl;

@Value("${clingai.api.access.key:}")
private String clingaiAccessKey;

@Value("${clingai.api.secret.key:}")
private String clingaiSecretKey;

private final ObjectMapper objectMapper = new ObjectMapper();
private CloseableHttpClient httpClient;

// HttpClient initialization (with SSL support)
@PostConstruct
public void init() {
    try {
        SSLContext sslContext = SSLContext.getDefault();
        SSLConnectionSocketFactory sslSocketFactory = new SSLConnectionSocketFactory(
                sslContext,
                new String[]{"TLSv1.2", "TLSv1.3"},
                null,
                NoopHostnameVerifier.INSTANCE
        );

        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(100);
        cm.setDefaultMaxPerRoute(20);

        this.httpClient = HttpClients.custom()
                .setConnectionManager(cm)
                .setSSLSocketFactory(sslSocketFactory)
                .setSSLHostnameVerifier(NoopHostnameVerifier.INSTANCE)
                .build();
    } catch (Exception e) {
        throw new RuntimeException("Failed to initialize HttpClient", e);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;1.5 API Request Construction and Response Processing&lt;/h4&gt;
&lt;p&gt;Complete &lt;code&gt;generateVideo&lt;/code&gt; method implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@Override&lt;br&gt;
public Result generateVideo(MultipartFile image, String prompt, int duration) {&lt;br&gt;
    try {&lt;br&gt;
        // 1. Check configuration&lt;br&gt;
        if (clingaiApiUrl == null || clingaiApiUrl.isEmpty()) {&lt;br&gt;
            return ResultGenerator.genFailResult("Kling AI API configuration not set");&lt;br&gt;
        }
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // 2. Build API endpoint
    String url = clingaiApiUrl + "/v1/videos/image2video";
    HttpPost httpPost = new HttpPost(url);

    // 3. Set request headers
    httpPost.setHeader("Content-Type", "application/json");

    // 4. Generate JWT Token and set Authorization header
    String authToken = ClingAiUtils.generateJwtToken(clingaiAccessKey, clingaiSecretKey);
    if (authToken == null || authToken.isEmpty()) {
        return ResultGenerator.genFailResult("Kling AI authentication information not configured or generation failed");
    }
    httpPost.setHeader("Authorization", "Bearer " + authToken);

    // 5. Build request body: Base64-encoded image + prompt + duration
    String imageBase64 = Base64.getEncoder().encodeToString(image.getBytes());
    Map&amp;lt;String, Object&amp;gt; requestBody = new HashMap&amp;lt;&amp;gt;();
    requestBody.put("model_name", "kling-v1-5");
    requestBody.put("image", imageBase64);
    requestBody.put("duration", String.valueOf(duration));
    requestBody.put("mode", "pro");
    if (prompt != null &amp;amp;&amp;amp; !prompt.isEmpty()) {
        requestBody.put("prompt", prompt);
    }

    // 6. Send request
    String jsonBody = objectMapper.writeValueAsString(requestBody);
    httpPost.setEntity(new StringEntity(jsonBody, StandardCharsets.UTF_8));

    try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
        String responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
        int statusCode = response.getStatusLine().getStatusCode();

        // 7. Process response
        if (statusCode &amp;gt;= 200 &amp;amp;&amp;amp; statusCode &amp;lt; 300) {
            try {
                JsonNode jsonNode = objectMapper.readTree(responseBody);
                // Response format: {code, message, request_id, data: {task_id, task_status, ...}}
                int code = jsonNode.has("code") ? jsonNode.get("code").asInt() : -1;
                if (code == 0 &amp;amp;&amp;amp; jsonNode.has("data")) {
                    JsonNode dataNode = jsonNode.get("data");
                    Map&amp;lt;String, Object&amp;gt; resultMap = new HashMap&amp;lt;&amp;gt;();
                    resultMap.put("taskId", dataNode.has("task_id") ? dataNode.get("task_id").asText() : null);
                    resultMap.put("taskStatus", dataNode.has("task_status") ? dataNode.get("task_status").asText() : null);
                    resultMap.put("duration", duration);
                    resultMap.put("requestId", jsonNode.has("request_id") ? jsonNode.get("request_id").asText() : null);
                    return ResultGenerator.genSuccessResult(resultMap);
                } else {
                    String message = jsonNode.has("message") ? jsonNode.get("message").asText() : "Unknown error";
                    return ResultGenerator.genFailResult("API returned error: " + message);
                }
            } catch (Exception e) {
                log.error("Failed to parse response", e);
                Map&amp;lt;String, Object&amp;gt; resultMap = new HashMap&amp;lt;&amp;gt;();
                resultMap.put("response", responseBody);
                return ResultGenerator.genSuccessResult(resultMap);
            }
        } else {
            return ResultGenerator.genFailResult("API returned error: " + statusCode + " - " + responseBody);
        }
    }
} catch (Exception e) {
    log.error("Exception occurred while generating {} second video", duration, e);
    return ResultGenerator.genFailResult("Exception occurred while generating video: " + e.getMessage());
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Step 2: Polling Video Generation Status&lt;/h3&gt;
&lt;p&gt;Kling AI’s video generation is asynchronous. We need to poll the task status until the video generation is complete.&lt;/p&gt;
&lt;h4&gt;2.1 Status Query API Implementation&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;@Override&lt;br&gt;
public Result getVideoStatus(String taskId) {&lt;br&gt;
    try {&lt;br&gt;
        if (clingaiApiUrl == null || clingaiApiUrl.isEmpty()) {&lt;br&gt;
            return ResultGenerator.genFailResult("Kling AI API configuration not set");&lt;br&gt;
        }
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // API endpoint: GET /v1/videos/image2video/{task_id}
    String url = clingaiApiUrl + "/v1/videos/image2video/" + taskId;
    HttpGet httpGet = new HttpGet(url);

    httpGet.setHeader("Content-Type", "application/json");

    // Get authentication token
    String authToken = ClingAiUtils.generateJwtToken(clingaiAccessKey, clingaiSecretKey);
    if (authToken == null || authToken.isEmpty()) {
        return ResultGenerator.genFailResult("Kling AI authentication information not configured or generation failed");
    }
    httpGet.setHeader("Authorization", "Bearer " + authToken);

    try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
        String responseBody = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);
        int statusCode = response.getStatusLine().getStatusCode();

        if (statusCode == 200) {
            try {
                JsonNode jsonNode = objectMapper.readTree(responseBody);
                int code = jsonNode.has("code") ? jsonNode.get("code").asInt() : -1;
                if (code == 0 &amp;amp;&amp;amp; jsonNode.has("data")) {
                    JsonNode dataNode = jsonNode.get("data");
                    Map&amp;lt;String, Object&amp;gt; resultMap = new HashMap&amp;lt;&amp;gt;();
                    resultMap.put("taskId", dataNode.has("task_id") ? dataNode.get("task_id").asText() : null);
                    resultMap.put("taskStatus", dataNode.has("task_status") ? dataNode.get("task_status").asText() : null);
                    resultMap.put("taskStatusMsg", dataNode.has("task_status_msg") ? dataNode.get("task_status_msg").asText() : null);

                    // Parse video result (if task is completed)
                    if (dataNode.has("task_result") &amp;amp;&amp;amp; dataNode.get("task_result").has("videos")) {
                        JsonNode videosNode = dataNode.get("task_result").get("videos");
                        if (videosNode.isArray() &amp;amp;&amp;amp; videosNode.size() &amp;gt; 0) {
                            JsonNode videoNode = videosNode.get(0);
                            resultMap.put("videoUrl", videoNode.has("url") ? videoNode.get("url").asText() : null);
                            resultMap.put("videoId", videoNode.has("id") ? videoNode.get("id").asText() : null);
                            resultMap.put("videoDuration", videoNode.has("duration") ? videoNode.get("duration").asText() : null);
                        }
                    }

                    return ResultGenerator.genSuccessResult(resultMap);
                } else {
                    String message = jsonNode.has("message") ? jsonNode.get("message").asText() : "Unknown error";
                    return ResultGenerator.genFailResult("Query failed: " + message);
                }
            } catch (Exception e) {
                log.error("Failed to parse response", e);
                Map&amp;lt;String, Object&amp;gt; resultMap = new HashMap&amp;lt;&amp;gt;();
                resultMap.put("response", responseBody);
                return ResultGenerator.genSuccessResult(resultMap);
            }
        } else {
            return ResultGenerator.genFailResult("Status query failed: " + statusCode + " - " + responseBody);
        }
    }
} catch (Exception e) {
    log.error("Exception occurred while querying video status", e);
    return ResultGenerator.genFailResult("Exception occurred while querying status: " + e.getMessage());
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;2.2 Polling Logic&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;// Step 2: Poll video generation status (wait up to maxPollingTime seconds)&lt;br&gt;
log.info("Step 2: Start polling video generation status (wait up to {} seconds)", maxPollingTime);&lt;br&gt;
String videoUrl = null;&lt;br&gt;
long startTime = System.currentTimeMillis();&lt;br&gt;
int pollCount = 0;&lt;br&gt;
int maxPolls = maxPollingTime / 3; // Query every 3 seconds

&lt;p&gt;while (pollCount &amp;lt; maxPolls) {&lt;br&gt;
    Thread.sleep(3000); // Wait 3 seconds&lt;br&gt;
    pollCount++;&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Result statusResult = getVideoStatus(taskId);
if (statusResult.getCode() != 200) {
    log.warn("Failed to query video status: {}", statusResult.getMessage());
    continue;
}

Map&amp;lt;String, Object&amp;gt; statusData = (Map&amp;lt;String, Object&amp;gt;) statusResult.getData();
String taskStatus = (String) statusData.get("taskStatus");
videoUrl = (String) statusData.get("videoUrl");

log.info("Poll #{}: status: {}, videoUrl: {}", pollCount, taskStatus, 
         videoUrl != null ? "generated" : "not generated");

if (videoUrl != null &amp;amp;&amp;amp; !videoUrl.isEmpty()) {
    log.info("Video generation completed, URL: {}", videoUrl);
    break;
}

if ("failed".equals(taskStatus) || "error".equals(taskStatus)) {
    return ResultGenerator.genFailResult("Video generation failed, status: " + taskStatus);
}

// Check timeout
if (System.currentTimeMillis() - startTime &amp;gt; maxPollingTime * 1000L) {
    return ResultGenerator.genFailResult("Video generation timeout, please query status manually later");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;if (videoUrl == null || videoUrl.isEmpty()) {&lt;br&gt;
    return ResultGenerator.genFailResult("Video generation timeout or failed, please query status manually later, taskId: " + taskId);&lt;br&gt;
}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
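&lt;p&gt;The polling loop boils down to: query every few seconds, stop on a URL, a terminal status, or a deadline. A minimal sketch of the same control flow (the &lt;code&gt;get_status&lt;/code&gt; callable and the status strings are assumptions mirroring the Java code above):&lt;/p&gt;

```python
import time

def poll_for_video(get_status, max_wait=60.0, interval=3.0, sleep=time.sleep):
    """Poll get_status() until it yields a video URL, a terminal failure,
    or the deadline passes. get_status returns (task_status, video_url)."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        status, url = get_status()
        if url:
            return url  # video is ready
        if status in ("failed", "error"):
            raise RuntimeError("video generation failed: " + status)
        sleep(interval)  # wait before the next status query
    raise TimeoutError("video generation timed out, query status manually later")
```

&lt;p&gt;Injecting &lt;code&gt;sleep&lt;/code&gt; keeps the loop unit-testable without real waiting.&lt;/p&gt;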
&lt;h3&gt;Step 3: Download Generated Video File&lt;/h3&gt;
&lt;p&gt;After obtaining the video URL, we need to download the video file locally for subsequent processing.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@Override&lt;br&gt;
public MultipartFile downloadVideoFromUrl(String videoUrl) {&lt;br&gt;
    try {&lt;br&gt;
        log.info("Start downloading video: {}", videoUrl);
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    HttpGet httpGet = new HttpGet(videoUrl);
    httpGet.setHeader("User-Agent", "Mozilla/5.0");

    try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
        int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode != 200) {
            log.error("Failed to download video, HTTP status code: {}", statusCode);
            return null;
        }

        byte[] videoBytes = EntityUtils.toByteArray(response.getEntity());
        log.info("Video download completed, size: {} bytes", videoBytes.length);

        // Wrap as MultipartFile and return
        return new MultipartFile() {
            @Override
            public String getName() {
                return "video";
            }

            @Override
            public String getOriginalFilename() {
                return "generated_video.mp4";
            }

            @Override
            public String getContentType() {
                return "video/mp4";
            }

            @Override
            public boolean isEmpty() {
                return videoBytes.length == 0;
            }

            @Override
            public long getSize() {
                return videoBytes.length;
            }

            @Override
            public byte[] getBytes() throws IOException {
                return videoBytes;
            }

            @Override
            public InputStream getInputStream() throws IOException {
                return new ByteArrayInputStream(videoBytes);
            }

            @Override
            public void transferTo(java.io.File dest) throws IOException, IllegalStateException {
                java.nio.file.Files.write(dest.toPath(), videoBytes);
            }
        };
    }
} catch (Exception e) {
    log.error("Failed to download video file", e);
    return null;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Step 4: Automatic Blink Time Point Detection&lt;/h3&gt;
&lt;p&gt;This is a &lt;strong&gt;critical step&lt;/strong&gt; in the entire process. We need to find the blink time point in the video as the &lt;strong&gt;keyframe&lt;/strong&gt; for looping. Blinking is a natural action node, and choosing the blink moment as the loop point ensures a more natural loop.&lt;/p&gt;
&lt;h4&gt;4.1 Why Choose Blinking as the Loop Point?&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Natural Transition&lt;/strong&gt;: Blinking is a brief action, and the facial state before and after blinking is similar, making it suitable as a loop point&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Visual Concealment&lt;/strong&gt;: The visual change during the blink moment can mask the loop transition&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Temporal Precision&lt;/strong&gt;: The blink action has a clear start and end, facilitating precise positioning&lt;/p&gt;
&lt;h4&gt;4.2 Blink Detection Implementation&lt;/h4&gt;
&lt;p&gt;We use a Python script to call MediaPipe or OpenCV for blink detection. Complete &lt;code&gt;detectBlink&lt;/code&gt; method implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@Override&lt;br&gt;
public Result detectBlink(MultipartFile video) {&lt;br&gt;
    try {&lt;br&gt;
        if (video == null || video.isEmpty()) {&lt;br&gt;
            return ResultGenerator.genFailResult("Video file cannot be empty");&lt;br&gt;
        }
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // Create temporary working directory
    Path workDir = Files.createTempDirectory("clingai-detect-");
    Path inputPath = workDir.resolve("input.mp4");
    Path scriptPath = null;

    try {
        // 1. Save video file to temporary directory
        Files.copy(video.getInputStream(), inputPath, StandardCopyOption.REPLACE_EXISTING);

        // 2. Get Python script path (from resources or file system)
        try {
            java.net.URL scriptUrl = getClass().getClassLoader().getResource("scripts/detect_blink.py");
            if (scriptUrl != null) {
                scriptPath = Paths.get(scriptUrl.toURI());
            } else {
                // If resource file doesn't exist, try reading from file system
                String scriptResourcePath = "src/main/resources/scripts/detect_blink.py";
                Path projectRoot = Paths.get(System.getProperty("user.dir"));
                scriptPath = projectRoot.resolve(scriptResourcePath);
                if (!Files.exists(scriptPath)) {
                    return ResultGenerator.genFailResult("Blink detection script not found, please manually mark the blink time point");
                }
            }
        } catch (Exception e) {
            log.warn("Unable to load script from resources, trying to read from file system", e);
            String scriptResourcePath = "src/main/resources/scripts/detect_blink.py";
            Path projectRoot = Paths.get(System.getProperty("user.dir"));
            scriptPath = projectRoot.resolve(scriptResourcePath);
            if (!Files.exists(scriptPath)) {
                return ResultGenerator.genFailResult("Blink detection script not found, please manually mark the blink time point");
            }
        }

        // 3. Call Python script
        String pythonCmd = "python3";
        if (System.getProperty("os.name").toLowerCase().contains("windows")) {
            pythonCmd = "python";
        }

        ProcessBuilder pb = new ProcessBuilder(
                pythonCmd,
                scriptPath.toString(),
                inputPath.toString()
        );
        // Don't redirect stderr, read stdout and stderr separately
        pb.redirectErrorStream(false);
        Process p = pb.start();

        // 4. Read stdout (JSON output)
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append("\n");
            }
        }

        // 5. Read stderr (error messages, for logging only)
        StringBuilder errorOutput = new StringBuilder();
        Thread stderrReader = new Thread(() -&amp;gt; {
            try (BufferedReader errorReader = new BufferedReader(
                    new InputStreamReader(p.getErrorStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = errorReader.readLine()) != null) {
                    synchronized (errorOutput) {
                        errorOutput.append(line).append("\n");
                    }
                }
            } catch (IOException e) {
                log.warn("Failed to read Python stderr", e);
            }
        });
        stderrReader.start();

        // Wait for stderr reading thread to complete (wait up to 5 seconds)
        try {
            stderrReader.join(5000);
        } catch (InterruptedException e) {
            log.warn("Stderr reading thread was interrupted", e);
        }

        if (errorOutput.length() &amp;gt; 0) {
            log.info("Python script stderr output: {}", errorOutput.toString());
        }

        // 6. Wait for process to complete and check exit code
        int exitCode = p.waitFor();
        if (exitCode != 0) {
            log.error("Python script execution failed, exit code: {}, stdout: {}, stderr: {}",
                    exitCode, output.toString(), errorOutput.toString());
            return ResultGenerator.genFailResult("Blink detection failed, please manually mark the blink time point");
        }

        // 7. Extract JSON from output (may contain other text, need to find JSON part)
        String fullOutput = output.toString().trim();
        String jsonOutput = ClingAiUtils.extractJsonFromOutput(fullOutput);

        if (jsonOutput == null || jsonOutput.isEmpty()) {
            log.error("Unable to extract JSON from Python output, full output: {}", fullOutput);
            log.error("stderr output: {}", errorOutput.toString());
            return ResultGenerator.genFailResult("Blink detection failed: unable to parse result, please manually mark the blink time point");
        }

        // 8. Parse JSON result
        log.info("JSON returned by Python script: {}", jsonOutput);
        JsonNode resultNode = objectMapper.readTree(jsonOutput);

        if (resultNode.has("success") &amp;amp;&amp;amp; resultNode.get("success").asBoolean()) {
            double blinkTime = resultNode.get("blinkTime").asDouble();
            Map&amp;lt;String, Object&amp;gt; resultMap = new HashMap&amp;lt;&amp;gt;();
            resultMap.put("blinkTime", blinkTime);
            return ResultGenerator.genSuccessResult(resultMap);
        } else {
            String errorMsg = resultNode.has("error")
                    ? resultNode.get("error").asText()
                    : "No blink detected";
            return ResultGenerator.genFailResult(errorMsg + ", please manually mark the blink time point");
        }

    } finally {
        // Clean up temporary files
        try {
            if (Files.exists(inputPath)) {
                Files.delete(inputPath);
            }
            if (Files.exists(workDir)) {
                Files.delete(workDir);
            }
        } catch (Exception e) {
            log.warn("Failed to clean up temporary files", e);
        }
    }

} catch (Exception e) {
    log.error("Exception occurred while detecting blink", e);
    return ResultGenerator.genFailResult("Exception occurred while detecting blink: " + e.getMessage() + 
                                         ", please manually mark the blink time point");
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
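&lt;p&gt;The &lt;code&gt;ClingAiUtils.extractJsonFromOutput&lt;/code&gt; helper is referenced above but not listed in the article. A plausible brace-matching implementation (sketched here in Python for brevity; the real utility is Java) scans for the first balanced JSON object while ignoring braces inside quoted strings:&lt;/p&gt;

```python
def extract_json_from_output(text):
    """Return the first balanced {...} object in mixed stdout text, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_str = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_str:
                # Inside a quoted string: only track escapes and the closing quote
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_str = False
            elif ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # Unbalanced from this "{"; try the next candidate
        start = text.find("{", start + 1)
    return None
```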
&lt;h4&gt;4.3 Calling Blink Detection&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;// Step 4: Automatically detect blink time point in video&lt;br&gt;
log.info("Step 4: Automatically detect blink time point in video");&lt;br&gt;
Result detectResult = videoProcessService.detectBlink(videoFile);&lt;br&gt;
Double blinkTime;&lt;br&gt;
if (detectResult.getCode() != 200) {&lt;br&gt;
    log.warn("Automatic blink detection failed: {}, using default value 2.5 seconds", detectResult.getMessage());&lt;br&gt;
    // If detection fails, use default value&lt;br&gt;
    blinkTime = 2.5;&lt;br&gt;
    log.info("Using default blink time: {} seconds", blinkTime);&lt;br&gt;
} else {&lt;br&gt;
    Map&amp;lt;String, Object&amp;gt; detectData = (Map&amp;lt;String, Object&amp;gt;) detectResult.getData();&lt;br&gt;
    blinkTime = ((Number) detectData.get("blinkTime")).doubleValue();&lt;br&gt;
    log.info("Detected blink time: {} seconds", blinkTime);&lt;br&gt;
}&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;4.4 Python Blink Detection Script&lt;/h4&gt;
&lt;p&gt;Our blink detection script supports two detection methods: it prioritizes MediaPipe (high precision) and falls back to OpenCV (broader compatibility). Here is the complete implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/env python3
&lt;p&gt;# -*- coding: utf-8 -*-&lt;/p&gt;

&lt;p&gt;"""&lt;br&gt;
Video Blink Detection Script&lt;br&gt;
Uses mature libraries for accurate blink detection:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prioritize MediaPipe Face Mesh (Google open-source, high accuracy)&lt;/li&gt;
&lt;li&gt;Fallback to OpenCV Haar Cascades (simple but lower accuracy)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Dependencies installation:&lt;br&gt;
pip install opencv-python numpy mediapipe==0.10.9&lt;br&gt;
"""&lt;/p&gt;

&lt;p&gt;import sys&lt;br&gt;
import cv2&lt;br&gt;
import json&lt;br&gt;
import os&lt;br&gt;
import numpy as np&lt;/p&gt;

&lt;p&gt;# Set standard output encoding to UTF-8 (avoid Windows console garbled text)&lt;/p&gt;

&lt;p&gt;if sys.platform == 'win32':&lt;br&gt;
    try:&lt;br&gt;
        import io&lt;br&gt;
        if hasattr(sys.stdout, 'buffer'):&lt;br&gt;
            sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', &lt;br&gt;
                                         errors='replace', line_buffering=True)&lt;br&gt;
        if hasattr(sys.stderr, 'buffer'):&lt;br&gt;
            sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8', &lt;br&gt;
                                         errors='replace', line_buffering=True)&lt;br&gt;
    except Exception:&lt;br&gt;
        pass&lt;/p&gt;

&lt;p&gt;def detect_blink_with_mediapipe(video_path):&lt;br&gt;
    """&lt;br&gt;
    Use MediaPipe for more accurate blink detection&lt;br&gt;
    Requires installation: pip install mediapipe==0.10.9&lt;br&gt;
    """&lt;br&gt;
    try:&lt;br&gt;
        import mediapipe as mp&lt;br&gt;
    except ImportError as e:&lt;br&gt;
        print(f"MediaPipe not installed: {e}", file=sys.stderr)&lt;br&gt;
        return None&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Check MediaPipe version and API availability
mp_version = getattr(mp, '__version__', 'unknown')
print(f"MediaPipe version: {mp_version}", file=sys.stderr)

# Check if solutions module exists (old API)
if not hasattr(mp, 'solutions'):
    print(f"MediaPipe {mp_version} uses new tasks API, does not support old solutions API", 
          file=sys.stderr)
    print("Please downgrade to a version that supports solutions: pip install mediapipe==0.10.9", 
          file=sys.stderr)
    return None

mp_face_mesh = mp.solutions.face_mesh
face_mesh = mp_face_mesh.FaceMesh(
    static_image_mode=False,
    max_num_faces=1,
    refine_landmarks=True,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    return None

fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = 0

# Eye keypoint indices (MediaPipe 468-point model)
LEFT_EYE_INDICES = [33, 7, 163, 144, 145, 153, 154, 155, 133, 173,
                    157, 158, 159, 160, 161, 246]
RIGHT_EYE_INDICES = [362, 382, 381, 380, 374, 373, 390, 249, 263, 466,
                     388, 387, 386, 385, 384, 398]

def calculate_eye_aspect_ratio(landmarks, eye_indices):
    """Calculate Eye Aspect Ratio (EAR)"""
    eye_points = [landmarks[i] for i in eye_indices]
    if len(eye_points) &amp;lt; 6:
        return 1.0

    # Calculate vertical distances
    vertical_1 = abs(eye_points[1].y - eye_points[5].y)
    vertical_2 = abs(eye_points[2].y - eye_points[4].y)
    # Calculate horizontal distance
    horizontal = abs(eye_points[0].x - eye_points[3].x)

    if horizontal == 0:
        return 1.0

    ear = (vertical_1 + vertical_2) / (2.0 * horizontal)
    return ear

blink_times = []
ear_threshold = 0.25  # EAR threshold, values below this are considered blinks
consecutive_frames = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break

    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = face_mesh.process(rgb_frame)

    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark

        # Calculate EAR for left and right eyes
        left_ear = calculate_eye_aspect_ratio(landmarks, LEFT_EYE_INDICES)
        right_ear = calculate_eye_aspect_ratio(landmarks, RIGHT_EYE_INDICES)
        avg_ear = (left_ear + right_ear) / 2.0

        # Detect blink
        if avg_ear &amp;lt; ear_threshold:
            consecutive_frames += 1
            if consecutive_frames == 1:  # Blink starts
                time_sec = frame_count / fps
                blink_times.append(time_sec)
        else:
            consecutive_frames = 0

    frame_count += 1
    # Limit processing frames (improve performance)
    if frame_count &amp;gt; 300:
        break

cap.release()
face_mesh.close()

if blink_times:
    return blink_times[0]
return None
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;def detect_blink_simple(video_path):&lt;br&gt;
    """&lt;br&gt;
    Improved OpenCV blink detection method: based on eye region changes and eye count&lt;br&gt;
    Use this improved version if MediaPipe is unavailable&lt;br&gt;
    """&lt;br&gt;
    cap = cv2.VideoCapture(video_path)&lt;br&gt;
    if not cap.isOpened():&lt;br&gt;
        return None&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fps = cap.get(cv2.CAP_PROP_FPS)
if fps &amp;lt;= 0:
    fps = 30.0

# Use OpenCV face detector
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml')

blink_times = []
frame_count = 0
prev_eye_count = None
prev_eye_area = None
blink_threshold = 0.7  # Eye region change threshold
min_eye_area = 50

# Eye area history for smoothing
eye_area_history = []
history_size = 3

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=3, minSize=(50, 50))

    current_eye_count = 0
    current_eye_area = 0

    if len(faces) &amp;gt; 0:
        # Select the largest face
        largest_face = max(faces, key=lambda f: f[2] * f[3])
        x, y, w, h = largest_face

        # Only detect upper half of face (eye region)
        roi_gray = gray[y:y+int(h*0.6), x:x+w]
        eyes = eye_cascade.detectMultiScale(
            roi_gray, scaleFactor=1.1, minNeighbors=2, minSize=(15, 15))
        current_eye_count = len(eyes)

        for (ex, ey, ew, eh) in eyes:
            eye_area = ew * eh
            if eye_area &amp;gt;= min_eye_area:
                current_eye_area += eye_area

    # Smoothing: use historical average
    eye_area_history.append(current_eye_area)
    if len(eye_area_history) &amp;gt; history_size:
        eye_area_history.pop(0)
    avg_eye_area = sum(eye_area_history) / len(eye_area_history) if eye_area_history else 0

    # Blink detection logic
    if prev_eye_count is not None and prev_eye_area is not None:
        # Method 1: Eye count change (from 2 to 0 or 1)
        if prev_eye_count &amp;gt;= 2 and current_eye_count &amp;lt; 2:
            time_sec = (frame_count - 1) / fps
            blink_times.append(time_sec)
        # Method 2: Eye area suddenly decreases
        elif prev_eye_area &amp;gt; min_eye_area and avg_eye_area &amp;gt; 0:
            area_ratio = avg_eye_area / prev_eye_area if prev_eye_area &amp;gt; 0 else 1.0
            area_drop = (prev_eye_area - avg_eye_area) / prev_eye_area if prev_eye_area &amp;gt; 0 else 0
            if area_ratio &amp;lt; blink_threshold or area_drop &amp;gt; 0.15:
                time_sec = (frame_count - 1) / fps
                if not blink_times or abs(blink_times[-1] - time_sec) &amp;gt; 0.3:
                    blink_times.append(time_sec)

    prev_eye_count = current_eye_count
    prev_eye_area = avg_eye_area if avg_eye_area &amp;gt; 0 else (prev_eye_area if prev_eye_area else 0)
    frame_count += 1

    # Limit processing time (process first 15 seconds or first 450 frames)
    max_frames = min(450, int(fps * 15))
    if frame_count &amp;gt;= max_frames:
        break

cap.release()

if blink_times:
    return blink_times[0]
return None
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;def main():&lt;br&gt;
    if len(sys.argv) &amp;lt; 2:&lt;br&gt;
        result = {&lt;br&gt;
            "error": "Video path must be provided as argument",&lt;br&gt;
            "success": False&lt;br&gt;
        }&lt;br&gt;
        print(json.dumps(result, ensure_ascii=False))&lt;br&gt;
        sys.exit(1)&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;video_path = sys.argv[1]

if not os.path.exists(video_path):
    result = {
        "error": f"Video file does not exist: {video_path}",
        "success": False
    }
    print(json.dumps(result, ensure_ascii=False))
    sys.exit(1)

# Prioritize MediaPipe (most accurate)
blink_time = None
detection_method = None

try:
    blink_time = detect_blink_with_mediapipe(video_path)
    if blink_time is not None:
        detection_method = "mediapipe"
except Exception as e:
    print(f"MediaPipe detection exception: {e}", file=sys.stderr)

# If MediaPipe fails, use OpenCV simple method (as fallback)
if blink_time is None:
    try:
        blink_time = detect_blink_simple(video_path)
        if blink_time is not None:
            detection_method = "opencv"
    except Exception as e:
        print(f"OpenCV detection exception: {e}", file=sys.stderr)

if blink_time is not None:
    result = {
        "blinkTime": round(blink_time, 2),
        "success": True,
        "method": detection_method or "unknown"
    }
else:
    result = {
        "error": "No blink detected. Possible reasons: 1) No face in video 2) Poor face angle 3) Low video quality 4) MediaPipe not properly installed. Please manually mark the blink time point.",
        "success": False
    }

# Output JSON result to stdout (error messages already output to stderr)
json_output = json.dumps(result, ensure_ascii=False)
print(json_output, flush=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;if __name__ == "__main__":&lt;br&gt;
    main()&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Script Features&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Dual Algorithm Support&lt;/strong&gt;: Prioritizes MediaPipe (high-precision EAR algorithm), falls back to OpenCV (compatibility)&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;EAR Algorithm&lt;/strong&gt;: MediaPipe uses the Eye Aspect Ratio (EAR) for precise blink detection&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Multiple Detection Methods&lt;/strong&gt;: OpenCV combines eye-count changes, eye-area changes, and other cues&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Smoothing&lt;/strong&gt;: Uses historical frame averages to reduce noise&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Performance Optimization&lt;/strong&gt;: Limits the number of processed frames to speed up detection&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Error Handling&lt;/strong&gt;: Comprehensive exception handling and log output&lt;/p&gt;
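&lt;p&gt;As a standalone illustration of the EAR idea described above (hypothetical landmark points, not the article's MediaPipe code), the ratio can be computed and thresholded like this:&lt;/p&gt;

```python
# Minimal EAR sketch: six (x, y) eye landmarks ordered p1..p6,
# with p1/p4 the horizontal corners and p2/p6, p3/p5 the vertical pairs.
def eye_aspect_ratio(pts):
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    vertical = dist(pts[1], pts[5]) + dist(pts[2], pts[4])
    horizontal = dist(pts[0], pts[3])
    return vertical / (2.0 * horizontal) if horizontal else 1.0

# Synthetic examples: an open eye is tall relative to its width,
# while a closing eye collapses vertically as the width stays the same.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.2), (2, 0.2), (3, 0), (2, -0.2), (1, -0.2)]
print(round(eye_aspect_ratio(open_eye), 3))    # 0.667, above the 0.25 threshold
print(round(eye_aspect_ratio(closed_eye), 3))  # 0.133, below it
```

&lt;p&gt;In the real script the same comparison runs per frame, with &lt;code&gt;consecutive_frames&lt;/code&gt; marking the frame where a closure starts.&lt;/p&gt;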
&lt;h3&gt;Step 5: Generate Loop Video (FFmpeg Processing)&lt;/h3&gt;
&lt;p&gt;This is the final and most critical step. We need to:&lt;/p&gt;
&lt;p&gt;▪ Extract 1 second before and after the blink time point (2 seconds total)&lt;/p&gt;
&lt;p&gt;▪ Reverse the 2-second clip&lt;/p&gt;
&lt;p&gt;▪ Concatenate the original clip and the reversed clip to form a 4-second loop video&lt;/p&gt;
&lt;h4&gt;5.1 Complete Loop Video Generation Implementation&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;@Override&lt;br&gt;
public Result loopVideo(MultipartFile video, Double blinkTime, &lt;br&gt;
                       Double beforeSeconds, Double afterSeconds, String userId) {&lt;br&gt;
    try {&lt;br&gt;
        if (video == null || video.isEmpty()) {&lt;br&gt;
            return ResultGenerator.genFailResult("Video file cannot be empty");&lt;br&gt;
        }
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    double before = beforeSeconds == null ? 1.0 : beforeSeconds;
    double after = afterSeconds == null ? 1.0 : afterSeconds;

    // Create temporary working directory
    Path workDir = Files.createTempDirectory("clingai-loop-");
    Path inputPath = workDir.resolve("input.mp4");
    Path clipPath = workDir.resolve("clip.mp4");
    Path revPath = workDir.resolve("reversed.mp4");
    Path outPath = workDir.resolve("loop.mp4");

    try {
        // 1. Save input video
        Files.copy(video.getInputStream(), inputPath, StandardCopyOption.REPLACE_EXISTING);

        // 2. Calculate clipping parameters
        double t = blinkTime == null ? 2.5 : blinkTime;
        double start = Math.max(0.0, t - before);
        double duration = before + after;

        // 3. Extract video clip (2 seconds)
        int clipExit = runFfmpeg(new String[]{
                "ffmpeg", "-y", "-ss", String.valueOf(start),
                "-t", String.valueOf(duration), "-i", inputPath.toString(),
                "-an", "-c:v", "libx264", "-pix_fmt", "yuv420p",
                clipPath.toString()
        });
        if (clipExit != 0) {
            return ResultGenerator.genFailResult("ffmpeg clipping failed");
        }

        // 4. Reverse video clip
        int revExit = runFfmpeg(new String[]{
                "ffmpeg", "-y", "-i", clipPath.toString(),
                "-vf", "reverse", "-an", "-c:v", "libx264",
                "-pix_fmt", "yuv420p", revPath.toString()
        });
        if (revExit != 0) {
            return ResultGenerator.genFailResult("ffmpeg reverse failed");
        }

        // 5. Concatenate original clip and reversed clip (4-second loop video)
        int concatExit = runFfmpeg(new String[]{
                "ffmpeg", "-y", "-i", clipPath.toString(),
                "-i", revPath.toString(),
                "-filter_complex", "[0:v][1:v]concat=n=2:v=1:a=0[v]",
                "-map", "[v]", "-an", "-c:v", "libx264",
                "-pix_fmt", "yuv420p", outPath.toString()
        });
        if (concatExit != 0) {
            return ResultGenerator.genFailResult("ffmpeg concatenation failed");
        }

        // 6. Read generated video
        byte[] outBytes = Files.readAllBytes(outPath);

        // 7. Create MultipartFile object
        MultipartFile outFile = new MultipartFile() {
            @Override
            public String getName() {
                return "file";
            }

            @Override
            public String getOriginalFilename() {
                return "loop.mp4";
            }

            @Override
            public String getContentType() {
                return "video/mp4";
            }

            @Override
            public boolean isEmpty() {
                return outBytes.length == 0;
            }

            @Override
            public long getSize() {
                return outBytes.length;
            }

            @Override
            public byte[] getBytes() throws IOException {
                return outBytes;
            }

            @Override
            public InputStream getInputStream() throws IOException {
                return new ByteArrayInputStream(outBytes);
            }

            @Override
            public void transferTo(java.io.File dest) throws IOException, IllegalStateException {
                Files.write(dest.toPath(), outBytes);
            }
        };

        // 8. Save file
        try {
            AppFile appFile = saveFile(outFile, userId);
            return ResultGenerator.genSuccessResult(appFile);
        } catch (Exception saveException) {
            log.error("Failed to save file", saveException);
            // If file save fails, try returning temporary file path
            Map&amp;lt;String, Object&amp;gt; resultMap = new HashMap&amp;lt;&amp;gt;();
            resultMap.put("fileUrl", "/temp/" + outPath.getFileName().toString());
            resultMap.put("fileName", "loop.mp4");
            resultMap.put("fileType", "video/mp4");
            resultMap.put("message", "File generated but failed to save to database: " + saveException.getMessage());
            return ResultGenerator.genSuccessResult(resultMap);
        }

    } finally {
        // Clean up temporary files
        try {
            if (Files.exists(inputPath)) {
                Files.delete(inputPath);
            }
            if (Files.exists(clipPath)) {
                Files.delete(clipPath);
            }
            if (Files.exists(revPath)) {
                Files.delete(revPath);
            }
            if (Files.exists(outPath)) {
                Files.delete(outPath);
            }
            if (Files.exists(workDir)) {
                Files.delete(workDir);
            }
        } catch (Exception e) {
            log.warn("Failed to clean up temporary files", e);
        }
    }
} catch (Exception e) {
    log.error("Failed to process loop video", e);
    String errorMsg = e.getMessage();
    if (errorMsg == null || errorMsg.isEmpty()) {
        errorMsg = e.getClass().getSimpleName();
    }
    return ResultGenerator.genFailResult("Failed to process loop video: " + errorMsg);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;/**&lt;br&gt;
 * Execute FFmpeg command&lt;br&gt;
 */&lt;br&gt;
private int runFfmpeg(String[] command) throws IOException, InterruptedException {&lt;br&gt;
    ProcessBuilder pb = new ProcessBuilder(command);&lt;br&gt;
    pb.redirectErrorStream(true);&lt;br&gt;
    Process p = pb.start();&lt;br&gt;
    try (InputStream is = p.getInputStream()) {&lt;br&gt;
        byte[] buf = new byte[1024];&lt;br&gt;
        while (is.read(buf) != -1) {&lt;br&gt;
            // Read output to avoid buffer blocking&lt;br&gt;
        }&lt;br&gt;
    }&lt;br&gt;
    return p.waitFor();&lt;br&gt;
}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;5.2 FFmpeg Command Details&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Extract Video Clip&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ffmpeg -y -ss 1.5 -t 2.0 -i input.mp4 -an -c:v libx264 -pix_fmt yuv420p clip.mp4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ &lt;code&gt;-ss 1.5&lt;/code&gt;: Start from the 1.5-second mark&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;-t 2.0&lt;/code&gt;: Extract 2 seconds&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;-an&lt;/code&gt;: Remove audio&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;-c:v libx264&lt;/code&gt;: Encode with H.264&lt;/p&gt;
&lt;p&gt;▪ &lt;code&gt;-pix_fmt yuv420p&lt;/code&gt;: Pixel format for broad player compatibility&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reverse Video&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ffmpeg -y -i clip.mp4 -vf reverse -an -c:v libx264 -pix_fmt yuv420p reversed.mp4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ &lt;code&gt;-vf reverse&lt;/code&gt;: Video filter that reverses playback&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Concatenate Video&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ffmpeg -y -i clip.mp4 -i reversed.mp4 -filter_complex "[0:v][1:v]concat=n=2:v=1:a=0[v]" -map "[v]" -an -c:v libx264 -pix_fmt yuv420p loop.mp4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ &lt;code&gt;concat=n=2:v=1:a=0&lt;/code&gt;: Concatenate 2 inputs, one video stream, no audio stream&lt;/p&gt;
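&lt;p&gt;Putting the three commands together, the clipping parameters and command lines can be sketched in Python (file names and the helper functions are illustrative; actually running them assumes &lt;code&gt;ffmpeg&lt;/code&gt; is on the PATH):&lt;/p&gt;

```python
import subprocess

def build_loop_commands(input_mp4, blink_time, before=1.0, after=1.0,
                        clip="clip.mp4", rev="reversed.mp4", out="loop.mp4"):
    """Return the three FFmpeg command lines used above: extract, reverse, concat."""
    start = max(0.0, blink_time - before)
    duration = before + after
    extract = ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
               "-i", input_mp4, "-an", "-c:v", "libx264",
               "-pix_fmt", "yuv420p", clip]
    reverse = ["ffmpeg", "-y", "-i", clip, "-vf", "reverse", "-an",
               "-c:v", "libx264", "-pix_fmt", "yuv420p", rev]
    concat = ["ffmpeg", "-y", "-i", clip, "-i", rev,
              "-filter_complex", "[0:v][1:v]concat=n=2:v=1:a=0[v]",
              "-map", "[v]", "-an", "-c:v", "libx264",
              "-pix_fmt", "yuv420p", out]
    return [extract, reverse, concat]

def make_loop(input_mp4, blink_time):
    # Run the three steps in order; check=True raises on a non-zero exit code,
    # mirroring the exit-code checks in the Java service.
    for cmd in build_loop_commands(input_mp4, blink_time):
        subprocess.run(cmd, check=True)
```

&lt;p&gt;With &lt;code&gt;blink_time = 2.5&lt;/code&gt; and the 1.0-second defaults, the extract step becomes exactly the &lt;code&gt;-ss 1.5 -t 2.0&lt;/code&gt; command shown above.&lt;/p&gt;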
&lt;pre&gt;&lt;code&gt;ProcessBuilder pb = new ProcessBuilder(&lt;br&gt;
"python",&lt;br&gt;
scriptPath.toString(),&lt;br&gt;
inputPath.toString()&lt;br&gt;
);&lt;br&gt;
pb.redirectErrorStream(false); // Read stdout and stderr separately&lt;br&gt;
Process p = pb.start();

&lt;p&gt;// Read stdout (JSON output)&lt;br&gt;
StringBuilder output = new StringBuilder();&lt;br&gt;
try (BufferedReader reader = new BufferedReader(&lt;br&gt;
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {&lt;br&gt;
    String line;&lt;br&gt;
    while ((line = reader.readLine()) != null) {&lt;br&gt;
        output.append(line).append("\n");&lt;br&gt;
    }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Extract JSON from output&lt;br&gt;
String jsonOutput = ClingAiUtils.extractJsonFromOutput(output.toString());&lt;br&gt;
JsonNode resultNode = objectMapper.readTree(jsonOutput);&lt;br&gt;
double blinkTime = resultNode.get("blinkTime").asDouble();&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
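&lt;p&gt;The &lt;code&gt;extractJsonFromOutput&lt;/code&gt; helper is not shown in this excerpt; the idea — picking the JSON result line out of mixed process output — can be sketched as follows (an assumption about its behavior, not the actual implementation):&lt;/p&gt;

```python
import json

def extract_json_from_output(output):
    # Scan from the last line backwards and return the first line that
    # parses as a JSON object; diagnostic lines the script printed
    # (e.g. "MediaPipe version: ...") are skipped. Hypothetical sketch.
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                continue
    return None

mixed = 'MediaPipe version: 0.10.9\n{"blinkTime": 2.31, "success": true}'
print(extract_json_from_output(mixed))  # {'blinkTime': 2.31, 'success': True}
```

&lt;p&gt;This works because the Python script writes diagnostics to stderr and the final JSON result to stdout, so a well-formed run leaves exactly one JSON line to find.&lt;/p&gt;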
&lt;h2&gt;IV. Core Interface Implementation&lt;/h2&gt;
&lt;h3&gt;4.1 Complete Process Interface&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;@PostMapping("/generateLoopVideo")&lt;br&gt;
@ApiOperation(value = "Complete process: Upload image to generate loop video (auto-detect blink)")&lt;br&gt;
public Result generateLoopVideo(&lt;br&gt;
        @RequestPart("image") MultipartFile image,&lt;br&gt;
        @RequestParam(value = "prompt", required = false) String prompt,&lt;br&gt;
        @RequestParam(value = "beforeSeconds", required = false, defaultValue = "1.0") Double beforeSeconds,&lt;br&gt;
        @RequestParam(value = "afterSeconds", required = false, defaultValue = "1.0") Double afterSeconds,&lt;br&gt;
        @RequestParam(value = "maxPollingTime", required = false, defaultValue = "300") Integer maxPollingTime) {
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (image == null || image.isEmpty()) {
    return ResultGenerator.genFailResult("Image file cannot be empty");
}

// Get current user ID
String userId = getCurrentTokenUserId();

// Call Service layer to complete the full process
return clingAiService.generateLoopVideo(
    image, prompt, beforeSeconds, afterSeconds, maxPollingTime, userId
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.2 Interface Parameters&lt;/h3&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Required&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;image&lt;/td&gt;
&lt;td&gt;MultipartFile&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;Digital human image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;prompt&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Default prompt&lt;/td&gt;
&lt;td&gt;Video generation prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;beforeSeconds&lt;/td&gt;
&lt;td&gt;Double&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Duration to extract before blink time point (seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;afterSeconds&lt;/td&gt;
&lt;td&gt;Double&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Duration to extract after blink time point (seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;maxPollingTime&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;Maximum waiting time for video generation (seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;4.3 Response Result&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;{&lt;br&gt;
  "code": 200,&lt;br&gt;
  "message": "success",&lt;br&gt;
  "data": {&lt;br&gt;
    "id": "File ID",&lt;br&gt;
    "fileName": "loop.mp4",&lt;br&gt;
    "fileUrl": "/uploadFiles/2026/02/02/xxx.mp4",&lt;br&gt;
    "detectedBlinkTime": 2.5,&lt;br&gt;
    "originalTaskId": "Kling AI Task ID",&lt;br&gt;
    "originalVideoUrl": "Original Video URL"&lt;br&gt;
  }&lt;br&gt;
}&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;V. Technical Highlights&lt;/h2&gt;
&lt;h3&gt;5.1 Intelligent Blink Detection&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Multiple Algorithm Support&lt;/strong&gt;: Prioritizes MediaPipe (high precision) and falls back to OpenCV (compatibility)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EAR Algorithm&lt;/strong&gt;: Uses the Eye Aspect Ratio (EAR) for precise blink detection&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fault Tolerance&lt;/strong&gt;: Falls back to a default value (2.5 seconds, the video midpoint) when detection fails&lt;/p&gt;
&lt;h3&gt;5.2 Seamless Loop Design&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Still + Blink&lt;/strong&gt;: The prompt design keeps the digital human facing the screen and completely still, with only a natural blink after 1 second&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Precise Extraction&lt;/strong&gt;: Extracts 1 second on each side of the blink, centered on the blink time point&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reverse and Concatenate&lt;/strong&gt;: Original clip + reversed clip = a perfect loop (the blink connects naturally at the loop point)&lt;/p&gt;
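&lt;p&gt;The loop timeline works out as a quick arithmetic check, using the article's defaults:&lt;/p&gt;

```python
# With blinkTime = 2.5 s and before = after = 1.0 s:
blink_time, before, after = 2.5, 1.0, 1.0

start = max(0.0, blink_time - before)   # clip starts at 1.5 s
duration = before + after               # forward clip lasts 2.0 s
loop_length = 2 * duration              # forward + reversed = 4.0 s

print(start, duration, loop_length)  # 1.5 2.0 4.0
```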
&lt;h3&gt;5.3 Architecture Design&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Layered Architecture&lt;/strong&gt;: Controller → Service → Utils, clear responsibilities&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Asynchronous Processing&lt;/strong&gt;: Video generation is asynchronous, polling query status&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Error Handling&lt;/strong&gt;: Comprehensive exception handling and logging&lt;/p&gt;
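&lt;p&gt;The asynchronous flow above amounts to a bounded polling loop. A minimal sketch, assuming a caller-supplied &lt;code&gt;query_status&lt;/code&gt; callable as a stand-in for the real task-status API; the 300-second default mirrors the &lt;code&gt;maxPollingTime&lt;/code&gt; parameter from the request table.&lt;/p&gt;

```python
import time

def poll_until_done(query_status, max_polling_time=300, interval=5):
    """Poll query_status() until it reports completion or the time
    budget (maxPollingTime, in seconds) runs out."""
    deadline = time.monotonic() + max_polling_time
    while time.monotonic() < deadline:
        status = query_status()          # e.g. {"done": bool, "result": ...}
        if status.get("done"):
            return status.get("result")
        time.sleep(interval)
    raise TimeoutError("video generation did not finish within maxPollingTime")
```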
&lt;h2&gt;VI. Summary&lt;/h2&gt;
&lt;p&gt;Through the complete technical solution of &lt;strong&gt;AI Video Generation + Intelligent Blink Detection + FFmpeg Video Processing&lt;/strong&gt;, we have successfully achieved:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Perfect 4-Second Loop&lt;/strong&gt;: Original 2-second clip + Reversed 2-second clip = 4-second seamless loop&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Natural Movement&lt;/strong&gt;: Intelligent extraction based on blink time point ensures natural loop&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Automated Process&lt;/strong&gt;: Fully automated from image upload to loop video generation&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F2705.png" alt="✅" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;High-Quality Output&lt;/strong&gt;: Use Kling AI Pro mode to generate high-quality videos&lt;/p&gt;
&lt;p&gt;This solution not only addresses NavTalk’s digital human loop video requirements but also provides a solid foundation for future extensions (such as different loop durations, custom loop points, etc.).&lt;/p&gt;
&lt;h3&gt;Feature Release Plan&lt;/h3&gt;
&lt;p&gt;We will officially release this feature in the near future, allowing users to &lt;strong&gt;directly upload a custom character image&lt;/strong&gt;, and the system will automatically generate a vivid 4-second loop video. The generated videos can be directly applied to digital human displays in NavTalk, providing users with a more personalized and vivid conversation experience. This feature will significantly lower the barrier to digital human video production, enabling every user to easily create their own exclusive digital human avatar.&lt;/p&gt;
&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/navtalk-digital-human-loop-video-generation-technical-implementation/" rel="noopener noreferrer"&gt;NavTalk Digital Human Loop Video Generation Technical Implementation&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>Understanding Reinforcement Learning through OpenDuck</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:50:12 +0000</pubDate>
      <link>https://dev.to/frankfu/understanding-reinforcement-learning-through-openduck-1if0</link>
      <guid>https://dev.to/frankfu/understanding-reinforcement-learning-through-openduck-1if0</guid>
      <description>&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Objective&lt;/strong&gt;: Replicate the OpenDuck Mini project and control it using the RDK X5 development board.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;OpenDuck Mini is an open-source robotics project aimed at creating a miniature, low-cost replica of Disney’s BDX Droid. The project was initiated and is maintained by developer Antoine Pirrone (apirrone).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table of Contents&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;▪ Project Research&lt;/p&gt;
&lt;p&gt;      ▪ International Projects&lt;/p&gt;
&lt;p&gt;      ▪ Domestic Projects&lt;/p&gt;
&lt;p&gt;▪ OpenDuck Development Workflow&lt;/p&gt;
&lt;p&gt;▪ OpenDuck Repository Overview&lt;/p&gt;
&lt;p&gt;▪ Raspberry Pi Zero 2W Deployment Process&lt;/p&gt;
&lt;p&gt;▪ RDK X5 Deployment Process&lt;/p&gt;
&lt;p&gt;▪ Frequently Asked Questions (FAQ)&lt;/p&gt;
&lt;p&gt;▪ Reinforcement Learning&lt;/p&gt;
&lt;h2&gt;I. Project Research&lt;/h2&gt;
&lt;h3&gt;1.1 International Projects&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;Focus on algorithm implementation and community ecosystem.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4&gt;1.1.1 🇺🇸 OpenDuck Mini&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/apirrone/Open_Duck_Mini" rel="noopener noreferrer"&gt;Open_Duck_Mini&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Raspberry Pi Zero 2W&lt;/code&gt; + &lt;code&gt;Feetech ST3215 Servo&lt;/code&gt; + &lt;code&gt;IMU&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Ultra-low cost (&amp;lt;$400)&lt;/strong&gt;, fully 3D-printed structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tech Stack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Sim2Real (MuJoCo)&lt;/strong&gt;, successfully implemented reinforcement learning control on low-cost servos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
⭐ Best for beginners, suitable as a low-cost educational tool or desktop display project&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;1.1.2 🇺🇸 K-Scale Labs (Stompy)&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/kscalelabs" rel="noopener noreferrer"&gt;github.com/kscalelabs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Committed to full-stack open source, including self-developed driver boards and host computers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large community scale, dedicated to establishing a universal humanoid robot standard (K-Lang)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Adopts an “ecosystem” development strategy, aiming to become the Android platform of the robotics field&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;1.1.3 🇺🇸 Berkeley Humanoid Lite&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://berkeley-humanoid-lite.gitbook.io/docs/releases" rel="noopener noreferrer"&gt;berkeley-humanoid-lite&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High-performance brushless motors&lt;/strong&gt; + &lt;strong&gt;3D-printed gearboxes&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Academic “low-cost” research platform benchmark (&amp;lt;$5000), designed specifically for reinforcement learning research&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hardcore research-oriented, suitable for studying high-dynamic motion control (such as jumping, backflips, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;1.1.4 🇫🇷 Poppy Project &amp;amp; 🇰🇷 Robotis OP3&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.poppy-project.org/" rel="noopener noreferrer"&gt;Poppy&lt;/a&gt; | &lt;a href="https://emanual.robotis.com/docs/en/platform/op3/introduction/" rel="noopener noreferrer"&gt;Robotis&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Dynamixel&lt;/code&gt; high-end servos + &lt;code&gt;x86/SBC&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
⚠ &lt;strong&gt;Previous generation technology route&lt;/strong&gt;, relies on expensive Dynamixel servos, not suitable for end-to-end reinforcement learning applications&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;1.2 Domestic Projects&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;Domestic projects are generally more aggressive in &lt;strong&gt;brushless motor (BLDC/FOC)&lt;/strong&gt; applications with stronger hardware performance.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4&gt;1.2.1 🇨🇳 Kit-Miao (Damiao Technology)&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://gitee.com/kit-miao/bipedal-robot" rel="noopener noreferrer"&gt;Gitee Repo&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Damiao joint motors&lt;/strong&gt; (integrated FOC driver) + &lt;code&gt;STM32/ESP32&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mature technical solution, provides complete source code for both MPC and reinforcement learning algorithms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
⭐ &lt;strong&gt;Highly suitable for secondary development&lt;/strong&gt;, motor performance is in the first tier of domestic products&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;1.2.2 🇨🇳 Unitree Qmini (Yushu)&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/unitreerobotics/Qmini" rel="noopener noreferrer"&gt;Unitree GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Unitree 8010 hub motors&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only includes leg structure, official Isaac Gym training environment provided&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large company technology downscaling, high motor reliability and excellent algorithm performance ceiling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;1.2.3 🇨🇳 AlexBot (Alexhuge1)&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Alexhuge1/Alexbot" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Self-made/modified brushless motors + &lt;code&gt;ODrive&lt;/code&gt; or similar FOC drivers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personal geek project, adapted to &lt;code&gt;Humanoid-Gym&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hardcore DIY representative, suitable for in-depth research on motor control and mechanical design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;1.2.4 🇨🇳 HighTorque &amp;amp; FFTAI&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.hightorque.cn/pi/" rel="noopener noreferrer"&gt;HighTorque&lt;/a&gt; | &lt;a href="https://www.fftai.cn/grx" rel="noopener noreferrer"&gt;FFTAI&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Leaning towards &lt;strong&gt;commercial products&lt;/strong&gt;. HighTorque is suitable as a teaching tool; FFTAI is suitable for university laboratory procurement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;II. OpenDuck Development Workflow&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR&lt;br&gt;
    A[🛠 Modeling &amp;amp; Simulation] --&amp;gt; B[🏃 Motion Generation]&lt;br&gt;
    B --&amp;gt; C[🧠 Reinforcement Learning]&lt;br&gt;
    C --&amp;gt; D[🖨 Hardware Construction]&lt;br&gt;
    D --&amp;gt; E[🚀 Runtime Deployment]&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;2.1 Phase 1: Model and Simulation Preparation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Reference: &lt;code&gt;prepare_robot.md&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Tool/Operation&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Modeling &amp;amp; Export&lt;/td&gt;
&lt;td&gt;SolidWorks / Onshape + &lt;code&gt;onshape2robot&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;URDF file&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. MuJoCo Configuration&lt;/td&gt;
&lt;td&gt;Execute &lt;code&gt;MUJOCO compile&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MuJoCo XML&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Model Correction&lt;/td&gt;
&lt;td&gt;Modify XML (add actuator, free joint)&lt;/td&gt;
&lt;td&gt;Complete XML&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Simulation Verification&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;simulate&lt;/code&gt; to confirm scene&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;2.2 Phase 2: Motion Generation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Repository: &lt;code&gt;reference_motion_generator&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Input&lt;/strong&gt;: Motion generator (polynomial fitting)&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Output&lt;/strong&gt;: &lt;strong&gt;Reference motion pkl&lt;/strong&gt;&lt;/p&gt;
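&lt;p&gt;To illustrate the polynomial-fitting idea for one joint (the function name, rates, and degree are assumptions, not the generator's actual code): sparse joint-angle keyframes are fitted with a polynomial and then resampled at the controller rate.&lt;/p&gt;

```python
import numpy as np

def fit_reference_motion(times, angles, degree=3, control_hz=50, duration=1.0):
    """Fit a polynomial to joint-angle keyframes and resample it at
    control_hz -- a toy version of a reference-motion trajectory."""
    coeffs = np.polyfit(times, angles, degree)
    t = np.arange(0.0, duration, 1.0 / control_hz)
    return t, np.polyval(coeffs, t)
```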
&lt;h3&gt;2.3 Phase 3: Reinforcement Learning&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Repository: &lt;code&gt;playground&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Input&lt;/strong&gt;: Reference motion pkl file + verified XML scene file&lt;/p&gt;
&lt;p&gt;▪ &lt;strong&gt;Core Task&lt;/strong&gt;: Sim2Real training (train and verify the robot control policy in a virtual environment)&lt;/p&gt;
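&lt;p&gt;The heart of such motion-imitation training is a tracking reward. A common form (not taken from the OpenDuck playground code; &lt;code&gt;sigma&lt;/code&gt; is an illustrative scale) rewards the policy for staying close to the reference joint angles:&lt;/p&gt;

```python
import math

def imitation_reward(q, q_ref, sigma=0.25):
    """Exponential tracking reward: 1.0 when joint angles q exactly
    match the reference q_ref, decaying with squared error."""
    err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-err / (2 * sigma ** 2))
```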
&lt;h3&gt;2.4 Phase 4: Hardware Construction&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Repository: Main repository&lt;/em&gt;&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Reference Document&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3D Print Parts&lt;/td&gt;
&lt;td&gt;&lt;code&gt;print_guide.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assemble Robot&lt;/td&gt;
&lt;td&gt;&lt;code&gt;assembly_guide.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connect Circuit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;open_duck_mini_v2_wiring_diagram.png&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;2.5 Phase 5: Runtime Deployment&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Repository: &lt;code&gt;Runtime&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;1. System environment installation&lt;/p&gt;
&lt;p&gt;2. Servo + IMU initialization&lt;/p&gt;
&lt;p&gt;3. Controller Bluetooth connection&lt;/p&gt;
&lt;p&gt;4. Foot sensor debugging&lt;/p&gt;
&lt;p&gt;5. &lt;strong&gt;Sim2Real deployment&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;III. OpenDuck Repository Overview&lt;/h2&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Open Duck Mini&lt;/td&gt;
&lt;td&gt;Documentation + 3D print models&lt;/td&gt;
&lt;td&gt;Parts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Duck Mini Runtime&lt;/td&gt;
&lt;td&gt;Real robot inference + Sim2Real&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Duck Playground&lt;/td&gt;
&lt;td&gt;GPU parallel training strategy&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.onnx&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Duck reference motion generator&lt;/td&gt;
&lt;td&gt;Gait generator&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.pkl&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;IV. Raspberry Pi Zero 2W Deployment Process&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;Steps such as flashing the image, setting the WiFi password, and enabling I2C are covered by many online tutorials. However, because we hit WiFi connection issues during actual deployment and found some differences from the official documentation, this article records the complete deployment process for reference.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;4.1 Flash Image&lt;/h3&gt;
&lt;p&gt;Follow the standard image-flashing process; be sure to select the &lt;strong&gt;headless version&lt;/strong&gt; (Lite), and configure the WiFi account and password in advance.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;⚠ &lt;strong&gt;Recommended to use the same image version as the tutorial&lt;/strong&gt;: &lt;code&gt;2025-12-04-raspios-trixie-arm64-lite.img.xz&lt;/code&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;4.2 SD Card Expansion&lt;/h3&gt;
&lt;p&gt;After image flashing is complete, the actual available space is usually only a small portion of the SD card’s total capacity, requiring filesystem expansion.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# 32GB SD card may only show 7GB after flashing&lt;br&gt;
sudo raspi-config -&amp;gt; Advanced options -&amp;gt; Expand Filesystem&lt;br&gt;
# Verify&lt;br&gt;
df -h&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.3 APT Source Configuration&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Backup&lt;br&gt;
sudo cp /etc/apt/sources.list.d/debian.sources /etc/apt/sources.list.d/debian.sources.bak&lt;br&gt;
sudo cp /etc/apt/sources.list.d/raspi.sources /etc/apt/sources.list.d/raspi.sources.bak&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Modify Debian main source&lt;/strong&gt; (&lt;code&gt;/etc/apt/sources.list.d/debian.sources&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Types: deb&lt;br&gt;
URIs: &lt;a href="https://mirrors.tuna.tsinghua.edu.cn/debian/" rel="noopener noreferrer"&gt;https://mirrors.tuna.tsinghua.edu.cn/debian/&lt;/a&gt;&lt;br&gt;
Suites: trixie trixie-updates trixie-backports&lt;br&gt;
Components: main contrib non-free non-free-firmware&lt;br&gt;
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

&lt;p&gt;Types: deb&lt;br&gt;
URIs: &lt;a href="https://mirrors.tuna.tsinghua.edu.cn/debian-security/" rel="noopener noreferrer"&gt;https://mirrors.tuna.tsinghua.edu.cn/debian-security/&lt;/a&gt;&lt;br&gt;
Suites: trixie-security&lt;br&gt;
Components: main contrib non-free non-free-firmware&lt;br&gt;
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Modify Raspberry Pi source&lt;/strong&gt; (&lt;code&gt;/etc/apt/sources.list.d/raspi.sources&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Types: deb&lt;br&gt;
URIs: &lt;a href="https://mirrors.tuna.tsinghua.edu.cn/raspberrypi/" rel="noopener noreferrer"&gt;https://mirrors.tuna.tsinghua.edu.cn/raspberrypi/&lt;/a&gt;&lt;br&gt;
Suites: trixie&lt;br&gt;
Components: main&lt;br&gt;
Signed-By: /usr/share/keyrings/raspberrypi-archive-keyring.gpg&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# Update&lt;br&gt;
sudo apt update&lt;br&gt;
sudo apt upgrade -y&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.4 Reduce FTDI USB Serial Latency&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Create rule file&lt;br&gt;
sudo tee /etc/udev/rules.d/99-usb-serial.rules &amp;gt;/dev/null &amp;lt;&amp;lt;'EOF'&lt;br&gt;
SUBSYSTEM=="usb-serial", DRIVER=="ftdi_sio", ATTR{latency_timer}="1"&lt;br&gt;
EOF&lt;br&gt;
# Apply&lt;br&gt;
sudo udevadm control --reload-rules&lt;br&gt;
sudo udevadm trigger&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;&lt;p&gt;💡 This rule only applies to FTDI drivers and does not affect CH340/CP210x.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;4.5 Enable I2C&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;sudo raspi-config -&amp;gt; Interface Options -&amp;gt; I2C&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.6 Install System Packages&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;sudo apt install -y git unzip i2c-tools joystick python3-pip python3-venv&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.7 Configure pip Source&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;pip config set global.index-url &lt;a href="https://mirrors.aliyun.com/pypi/simple" rel="noopener noreferrer"&gt;https://mirrors.aliyun.com/pypi/simple&lt;/a&gt;&lt;br&gt;
pip config set global.trusted-host mirrors.aliyun.com&lt;br&gt;
&lt;br&gt;
# Verify&lt;br&gt;
pip config list&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.8 Install Miniconda&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;# Create directory&lt;br&gt;
mkdir download &amp;amp;&amp;amp; cd download&lt;br&gt;
&lt;br&gt;
# Download Miniconda (aarch64)&lt;br&gt;
# &lt;a href="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh" rel="noopener noreferrer"&gt;https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh&lt;/a&gt;&lt;br&gt;
chmod +x Miniconda3-latest-Linux-aarch64.sh&lt;br&gt;
./Miniconda3-latest-Linux-aarch64.sh&lt;br&gt;
&lt;br&gt;
# Follow prompts: Enter -&amp;gt; yes -&amp;gt; Enter -&amp;gt; yes&lt;br&gt;
source ~/.bashrc&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Configure Conda Mirror&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Clean old configuration&lt;br&gt;
conda config --remove-key channels 2&amp;gt;/dev/null || true&lt;br&gt;
conda config --remove-key default_channels 2&amp;gt;/dev/null || true&lt;br&gt;
&lt;br&gt;
# Set Tsinghua source&lt;br&gt;
conda config --append default_channels &lt;a href="https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main" rel="noopener noreferrer"&gt;https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main&lt;/a&gt;&lt;br&gt;
conda config --append default_channels &lt;a href="https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r" rel="noopener noreferrer"&gt;https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r&lt;/a&gt;&lt;br&gt;
conda config --set custom_channels.conda-forge &lt;a href="https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud" rel="noopener noreferrer"&gt;https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
# Set channels&lt;br&gt;
conda config --add channels conda-forge&lt;br&gt;
conda config --add channels defaults&lt;br&gt;
conda config --set show_channel_urls yes&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Create Environment&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;conda create -n duck310 python=3.10 -y --repodata-fn current_repodata.json -v&lt;br&gt;
conda activate duck310&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.9 Configure pip Acceleration and Install uv&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;⚠ Must be executed in the &lt;code&gt;(duck310)&lt;/code&gt; environment&lt;/p&gt;&lt;/blockquote&gt;
&lt;pre&gt;&lt;code&gt;pip config set global.index-url &lt;a href="https://mirrors.aliyun.com/pypi/simple" rel="noopener noreferrer"&gt;https://mirrors.aliyun.com/pypi/simple&lt;/a&gt;&lt;br&gt;
pip config set global.trusted-host mirrors.aliyun.com&lt;br&gt;
&lt;br&gt;
pip install -U uv&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.10 Install OpenDuckMini Dependencies&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;uv pip install -U pip setuptools wheel&lt;br&gt;
&lt;br&gt;
uv pip install rustypot==0.1.0 onnxruntime==1.18.1 numpy \&lt;br&gt;
    adafruit-circuitpython-bno055==5.4.13 scipy==1.15.1 \&lt;br&gt;
    pygame==2.6.0 openai==1.70.0 RPi.GPIO&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.11 Configure Proxy (Optional)&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;git config --global http.proxy &lt;a href="http://your_proxy_address:your_proxy_port" rel="noopener noreferrer"&gt;http://your_proxy_address:your_proxy_port&lt;/a&gt;&lt;br&gt;
git config --global https.proxy &lt;a href="https://your_proxy_address:your_proxy_port" rel="noopener noreferrer"&gt;https://your_proxy_address:your_proxy_port&lt;/a&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Example configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git config --global http.proxy &lt;a href="http://192.168.1.196:6551" rel="noopener noreferrer"&gt;http://192.168.1.196:6551&lt;/a&gt;&lt;br&gt;
git config --global https.proxy &lt;a href="https://192.168.1.196:6551" rel="noopener noreferrer"&gt;https://192.168.1.196:6551&lt;/a&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.12 Install pypot and Open_Duck_Mini_Runtime&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;mkdir ~/project &amp;amp;&amp;amp; cd ~/project&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Install Open_Duck_Mini_Runtime&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Download: &lt;a href="https://github.com/apirrone/Open_Duck_Mini_Runtime/tree/v2" rel="noopener noreferrer"&gt;https://github.com/apirrone/Open_Duck_Mini_Runtime/tree/v2&lt;/a&gt;&lt;br&gt;
unzip Open_Duck_Mini_Runtime-2.zip&lt;br&gt;
cd Open_Duck_Mini_Runtime-2&lt;br&gt;
uv pip install -e .&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Install pypot&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Download: &lt;a href="https://github.com/apirrone/pypot/tree/support-feetech-sts3215" rel="noopener noreferrer"&gt;https://github.com/apirrone/pypot/tree/support-feetech-sts3215&lt;/a&gt;&lt;br&gt;
unzip pypot-support-feetech-sts3215.zip&lt;br&gt;
cd pypot-support-feetech-sts3215&lt;br&gt;
uv pip install .&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4.13 Calibrate IMU&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;sudo usermod -aG i2c $USER&lt;br&gt;
i2cdetect -y 1&lt;br&gt;
&lt;br&gt;
cd ~/project/Open_Duck_Mini_Runtime-2/scripts/&lt;br&gt;
python calibrate_imu.py&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ Rotate and move the robot in different &lt;strong&gt;directions&lt;/strong&gt; until the terminal outputs &lt;code&gt;[3,3,3,3]&lt;/code&gt; and displays &lt;code&gt;Calibrated = True&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;▪ Calibration results are saved to the &lt;code&gt;imu_calib_data.pkl&lt;/code&gt; file&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cp imu_calib_data.pkl ~/project/Open_Duck_Mini_Runtime-2/mini_bdx_runtime/mini_bdx_runtime/&lt;/code&gt;&lt;/pre&gt;
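&lt;p&gt;If you want to sanity-check the saved calibration before copying it, the pickle can be loaded directly. This is an illustrative sketch; the helper name is ours and the internal structure of the pickled object is not documented here:&lt;/p&gt;

```python
import pickle
from pathlib import Path


def load_imu_calibration(path="imu_calib_data.pkl"):
    """Unpickle the IMU calibration file and return the stored object as-is."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Example: print whatever the calibration script stored
# print(load_imu_calibration())
```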
&lt;h3&gt;4.14 Adjust Servo Offsets&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/Open_Duck_Mini_Runtime-2/scripts&lt;br&gt;
python find_soft_offsets.py&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Operation Steps&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. Use a cardboard box or stand to elevate the robot from the bottom, ensuring both feet are suspended&lt;/p&gt;
&lt;p&gt;2. Refer to the servo position diagram for calibration:&lt;br&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2026%2F01%2F29%2FcK92DsEOULGekrx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs2.loli.net%2F2026%2F01%2F29%2FcK92DsEOULGekrx.jpg" alt="openduckmini-motor-position.jpg" width="514" height="709"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;3. Put the robot in an upright position with all motors in torque-locked state&lt;/p&gt;
&lt;p&gt;4. Unlock motors one by one, manually adjust to the correct position, then re-lock&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Final State Check&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;✅ Chassis (abdomen) remains horizontal or pitched slightly upward&lt;/p&gt;
&lt;p&gt;✅ Left and right legs and feet are symmetrical and should overlap completely when viewed from the side&lt;/p&gt;
&lt;p&gt;✅ When placed on a table, both feet's micro switches trigger simultaneously&lt;/p&gt;
&lt;p&gt;✅ Head remains horizontal or pitched slightly upward&lt;/p&gt;
&lt;h3&gt;4.15 Modify Configuration File&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/Open_Duck_Mini_Runtime-2/&lt;br&gt;
cp example_config.json ~/duck_config.json&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fill in the &lt;strong&gt;servo offsets&lt;/strong&gt; in the &lt;code&gt;~/duck_config.json&lt;/code&gt; configuration file and add the following settings:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{&lt;br&gt;
  "imu_upside_down": true&lt;br&gt;
}&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;&lt;p&gt;⚠ &lt;strong&gt;Important&lt;/strong&gt;: If &lt;code&gt;imu_upside_down&lt;/code&gt; is not set, the robot will oscillate abnormally while walking and cannot keep its balance.&lt;/p&gt;&lt;/blockquote&gt;
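&lt;p&gt;Editing the JSON by hand works fine; if you prefer to script the change, a small sketch like the following can set the flag (the helper name is ours, not part of the runtime):&lt;/p&gt;

```python
import json
from pathlib import Path


def set_config_flag(config_path, key, value):
    """Load a JSON config file, set one key, and write the file back."""
    p = Path(config_path)
    cfg = json.loads(p.read_text())
    cfg[key] = value
    p.write_text(json.dumps(cfg, indent=2))
    return cfg


# Example:
# set_config_flag("~/duck_config.json", "imu_upside_down", True)
```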
&lt;h3&gt;4.16 Initial Bent Leg Posture&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/Open_Duck_Mini_Runtime-2/scripts&lt;br&gt;
python turn_on.py&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;▪ With normal assembly, servo positions should read 0 when the robot is fully upright&lt;/p&gt;
&lt;p&gt;▪ After startup, the robot should hold a bent-leg posture with servo torque locked&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;If you encounter problems, please refer to Frequently Asked Questions (FAQ)&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;4.17 Test Walking&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/Open_Duck_Mini_Runtime-2/scripts&lt;br&gt;
&lt;br&gt;
python v2_rl_walk_mujoco.py \&lt;br&gt;
    --duck_config_path ~/duck_config.json \&lt;br&gt;
    --onnx_model_path ~/BEST_WALK_ONNX_2.onnx&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;&lt;p&gt;💡 The &lt;code&gt;BEST_WALK_ONNX_2.onnx&lt;/code&gt; model file needs to be downloaded from the official repository and placed in the home directory.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The robot first moves into its initial posture and then begins walking. Actual operation requires a game controller; if you don't have a Bluetooth controller, you can modify the code to default to forward movement.&lt;/p&gt;
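&lt;p&gt;One way to default to forward movement is to swap the joystick reader for a stub that always returns a small forward command. This is a hypothetical sketch: the class name and the &lt;code&gt;[x, y, yaw]&lt;/code&gt; command layout are assumptions for illustration, not the runtime's actual API:&lt;/p&gt;

```python
class ConstantForwardController:
    """Hypothetical stand-in for a gamepad: always commands slow forward walking."""

    def __init__(self, x_vel=0.1, y_vel=0.0, yaw_vel=0.0):
        # Assumed [x, y, yaw] velocity command layout
        self.command = [x_vel, y_vel, yaw_vel]

    def get_last_command(self):
        # Return a copy so callers cannot mutate the stored command
        return list(self.command)
```

&lt;p&gt;In the walking script, wherever the joystick command is read, substituting such a stub makes the robot walk forward without any controller attached.&lt;/p&gt;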
&lt;h2&gt;V. RDK X5 Deployment Process&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;The RDK kit provides Ubuntu 22.04 system images (desktop/server versions).&lt;br&gt;The following only lists &lt;strong&gt;steps different from Raspberry Pi&lt;/strong&gt;, please refer to the above for identical steps.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;5.1 System Flashing&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Download Image&lt;/strong&gt;: &lt;a href="https://archive.d-robotics.cc/downloads/os_images/rdk_x5" rel="noopener noreferrer"&gt;RDK X5 Image Download&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Recommended version: &lt;code&gt;rdk-x5-ubuntu22-preinstalled-desktop-3.4.1-arm64.img.xz&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NAND Firmware Flashing&lt;/strong&gt; (optional, for version consistency):&lt;/p&gt;
&lt;p&gt;Download: &lt;a href="https://archive.d-robotics.cc/downloads/miniboot/rdk_x5/" rel="noopener noreferrer"&gt;NAND Firmware Download&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Recommended version: &lt;code&gt;product_20251111.zip&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;5.2 Install System Packages&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.6&lt;/p&gt;
&lt;h3&gt;5.3 Configure pip Source&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.7&lt;/p&gt;
&lt;h3&gt;5.4 Create venv (⚠ Different from Raspberry Pi)&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;The official &lt;code&gt;hobot.GPIO&lt;/code&gt;, &lt;code&gt;hobot_dnn&lt;/code&gt; and other packages from Digua Robotics are precompiled for the RDK system Python environment.&lt;br&gt;Compatibility issues may occur in Conda environments, &lt;strong&gt;recommended to use system Python + venv virtual environment&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;pre&gt;&lt;code&gt;python3 -m venv --system-site-packages ~/duck_env&lt;br&gt;
source ~/duck_env/bin/activate&lt;br&gt;
&lt;br&gt;
# Verify GPIO module&lt;br&gt;
python3 -c "import Hobot.GPIO; print('OK')"&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;5.5 Configure pip Acceleration and Install uv&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;⚠ Must be executed in the &lt;code&gt;(duck_env)&lt;/code&gt; environment&lt;/p&gt;&lt;/blockquote&gt;
&lt;pre&gt;&lt;code&gt;pip config set global.index-url &lt;a href="https://mirrors.aliyun.com/pypi/simple" rel="noopener noreferrer"&gt;https://mirrors.aliyun.com/pypi/simple&lt;/a&gt;&lt;br&gt;
pip config set global.trusted-host mirrors.aliyun.com&lt;br&gt;
&lt;br&gt;
python3 -m pip install -U uv&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;5.6 Install Dependencies (⚠ Different from Raspberry Pi)&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;python3 -m uv pip install -U pip setuptools wheel&lt;br&gt;
&lt;br&gt;
# Note: RDK X5 uses smbus2 instead of RPi.GPIO&lt;br&gt;
python3 -m uv pip install rustypot==0.1.0 onnxruntime==1.18.1 numpy \&lt;br&gt;
    adafruit-circuitpython-bno055==5.4.13 scipy==1.15.1 \&lt;br&gt;
    pygame==2.6.0 openai==1.70.0 smbus2&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;5.7 Configure Proxy (Optional)&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.11&lt;/p&gt;
&lt;h3&gt;5.8 Install pypot and Runtime&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;mkdir ~/project &amp;amp;&amp;amp; cd ~/project&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Install Open_Duck_Mini_Runtime&lt;/strong&gt; (RDK X5 version):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;unzip Open_Duck_Mini_Runtime-2_RDK_X5.zip&lt;br&gt;
cd Open_Duck_Mini_Runtime-2_RDK_X5&lt;br&gt;
uv pip install -e .&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Install pypot&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Download: &lt;a href="https://github.com/apirrone/pypot/tree/support-feetech-sts3215" rel="noopener noreferrer"&gt;https://github.com/apirrone/pypot/tree/support-feetech-sts3215&lt;/a&gt;&lt;br&gt;
unzip pypot-support-feetech-sts3215.zip&lt;br&gt;
cd pypot-support-feetech-sts3215&lt;br&gt;
uv pip install .&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;5.9 Calibrate IMU&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.13 (change the path to &lt;code&gt;Open_Duck_Mini_Runtime-2_RDK_X5&lt;/code&gt;)&lt;/p&gt;
&lt;h3&gt;5.10 Adjust Servo Offsets&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.14 (change the path to &lt;code&gt;Open_Duck_Mini_Runtime-2_RDK_X5&lt;/code&gt;)&lt;/p&gt;
&lt;h3&gt;5.11 Modify Configuration File&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.15 (change the path to &lt;code&gt;Open_Duck_Mini_Runtime-2_RDK_X5&lt;/code&gt;)&lt;/p&gt;
&lt;h3&gt;5.12 Initial Bent Leg Posture&lt;/h3&gt;
&lt;p&gt;👉 Same as Raspberry Pi Step 4.16 (change the path to &lt;code&gt;Open_Duck_Mini_Runtime-2_RDK_X5&lt;/code&gt;)&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;If you encounter problems, please refer to Frequently Asked Questions (FAQ)&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;5.13 Test Walking&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/Open_Duck_Mini_Runtime-2_RDK_X5/scripts&lt;br&gt;
&lt;br&gt;
python v2_rl_walk_mujoco.py \&lt;br&gt;
    --duck_config_path ~/duck_config.json \&lt;br&gt;
    --onnx_model_path ~/BEST_WALK_ONNX_2.onnx&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This build also adds support for the Logitech F710 controller.&lt;/p&gt;
&lt;h2&gt;VI. Frequently Asked Questions (FAQ)&lt;/h2&gt;
&lt;h3&gt;6.1 Q1: When running &lt;code&gt;find_soft_offsets.py&lt;/code&gt;, gravity shows horizontal posture&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Problem Cause&lt;/strong&gt;: Servo 22 or 12 was not installed in a horizontal orientation, leaving its position at approximately -1.57 rad&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;1. Loosen the 4 fixing screws on the servo main disk to allow the entire leg to be freely adjustable&lt;/p&gt;
&lt;p&gt;2. Create the following script to return the servo to center position:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/Open_Duck_Mini_Runtime-2/scripts  # or corresponding RDK X5 path&lt;br&gt;
nano set_servo_mid.py&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;from mini_bdx_runtime.rustypot_position_hwi import HWI&lt;br&gt;
from mini_bdx_runtime.duck_config import DuckConfig&lt;br&gt;
import argparse&lt;br&gt;
import time&lt;br&gt;
import traceback&lt;br&gt;
&lt;br&gt;
def zero_motor(hwi, joint_id, tol=0.02, timeout=5.0):&lt;br&gt;
    """Move motor to 0 rad and wait until reached."""&lt;br&gt;
    print(f"Zeroing motor ID {joint_id} to 0 rad")&lt;br&gt;
&lt;br&gt;
    try:&lt;br&gt;
        current_pos = hwi.io.read_present_position([joint_id])[0]&lt;br&gt;
        print(f"Current position: {current_pos:.3f} rad")&lt;br&gt;
&lt;br&gt;
        hwi.io.write_goal_position([joint_id], [0.0])&lt;br&gt;
&lt;br&gt;
        start_time = time.time()&lt;br&gt;
        while True:&lt;br&gt;
            pos = hwi.io.read_present_position([joint_id])[0]&lt;br&gt;
            err = abs(pos)&lt;br&gt;
&lt;br&gt;
            print(f"  pos={pos:.3f} rad, err={err:.3f}")&lt;br&gt;
&lt;br&gt;
            if err &amp;lt; tol:&lt;br&gt;
                print("✓ Zero position reached")&lt;br&gt;
                return True&lt;br&gt;
&lt;br&gt;
            if time.time() - start_time &amp;gt; timeout:&lt;br&gt;
                print("✗ Timeout while zeroing motor")&lt;br&gt;
                return False&lt;br&gt;
&lt;br&gt;
            time.sleep(0.05)&lt;br&gt;
&lt;br&gt;
    except Exception as e:&lt;br&gt;
        print(f"✗ Error zeroing motor ID {joint_id}: {e}")&lt;br&gt;
        print(traceback.format_exc())&lt;br&gt;
        return False&lt;br&gt;
&lt;br&gt;
def main():&lt;br&gt;
    parser = argparse.ArgumentParser()&lt;br&gt;
    parser.add_argument("--id", type=int, required=True, help="Motor ID to zero")&lt;br&gt;
    args = parser.parse_args()&lt;br&gt;
&lt;br&gt;
    print("Initializing hardware interface...")&lt;br&gt;
    try:&lt;br&gt;
        duck_config = DuckConfig()&lt;br&gt;
        hwi = HWI(duck_config=duck_config)&lt;br&gt;
        print("Successfully connected to hardware")&lt;br&gt;
    except Exception as e:&lt;br&gt;
        print(f"Error initializing HWI: {e}")&lt;br&gt;
        print(traceback.format_exc())&lt;br&gt;
        return&lt;br&gt;
&lt;br&gt;
    zero_motor(hwi, args.id)&lt;br&gt;
&lt;br&gt;
    try:&lt;br&gt;
        hwi.io.disable_torque([args.id])&lt;br&gt;
        print(f"Torque disabled for motor ID {args.id}")&lt;br&gt;
    except Exception:&lt;br&gt;
        pass&lt;br&gt;
&lt;br&gt;
if __name__ == "__main__":&lt;br&gt;
    main()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;3. Run the script, specify the servo ID to calibrate and return to center position:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python set_servo_mid.py --id 12&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Expected Output&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Initializing hardware interface...&lt;br&gt;
Successfully connected to hardware&lt;br&gt;
Zeroing motor ID 12 to 0 rad&lt;br&gt;
Current position: -3.086 rad&lt;br&gt;
  pos=-3.086 rad, err=3.086&lt;br&gt;
  ...&lt;br&gt;
✗ Timeout while zeroing motor&lt;br&gt;
Torque disabled for motor ID 12&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;4. The servo disk will automatically rotate. After rotation is complete, fix the four screws in the upright posture.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📝 &lt;strong&gt;Document Update Log&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;▪ As of this writing, multiple OpenDuck Mini tutorials contain Python environment configuration issues&lt;/p&gt;

&lt;p&gt;▪ This tutorial, used with the image versions specified above, has been verified in practice and avoids the common environment issues&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;VII. Reinforcement Learning&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;This section introduces how to use the OpenDuck project for reinforcement learning training, including reference motion generation, data processing, and model training.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;7.1 Generate Reference Motions&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt;: &lt;code&gt;Open_Duck_reference_motion_generator&lt;/code&gt;&lt;br&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Generate reference motion data for imitation learning&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4&gt;7.1.1 Clone Repository and Install Dependencies&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/open_duck_mini_ws&lt;br&gt;
git clone &lt;a href="https://github.com/apirrone/Open_Duck_reference_motion_generator.git" rel="noopener noreferrer"&gt;https://github.com/apirrone/Open_Duck_reference_motion_generator.git&lt;/a&gt;&lt;br&gt;
cd Open_Duck_reference_motion_generator&lt;br&gt;
&lt;br&gt;
# Install dependencies using uv&lt;br&gt;
uv sync&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;7.1.2 Batch Generate Motions&lt;/h4&gt;
&lt;blockquote&gt;&lt;p&gt;Use the &lt;code&gt;auto_waddle.py&lt;/code&gt; script to batch generate motion files with different gait parameters&lt;/p&gt;&lt;/blockquote&gt;
&lt;pre&gt;&lt;code&gt;uv run scripts/auto_waddle.py \&lt;br&gt;
    --duck open_duck_mini_v2 \&lt;br&gt;
    --sweep \&lt;br&gt;
    -j8&lt;/code&gt;&lt;/pre&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;br&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;br&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;--duck&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Robot model (&lt;code&gt;open_duck_mini_v2&lt;/code&gt;)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;--sweep&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Traverse all parameter combinations&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;-j8&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Use 8 threads for parallel generation&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Generation Result&lt;/strong&gt;: Approximately 240 &lt;code&gt;.json&lt;/code&gt; motion files will be generated in the &lt;code&gt;recordings/&lt;/code&gt; directory&lt;/p&gt;
&lt;p&gt;File naming format: &lt;code&gt;{number}_{x_velocity}_{y_velocity}_{turn_velocity}.json&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Example: &lt;code&gt;99_0.074_-0.111_-0.074.json&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; X-direction velocity: 0.074 m/s (forward)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Y-direction velocity: -0.111 m/s (right)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Turn angular velocity: -0.074 rad/s (clockwise)&lt;/p&gt;
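A tiny illustrative parser for these filenames (a hypothetical helper, assuming each of the four fields is underscore-separated as in the naming format above):

```python
def parse_motion_filename(name: str) -> dict:
    """Parse a generated motion filename into its velocity components.

    Assumes the scheme {number}_{x_velocity}_{y_velocity}_{turn_velocity}.json.
    """
    stem = name.removesuffix(".json")
    number, vx, vy, vturn = stem.split("_")
    return {
        "index": int(number),
        "x_velocity": float(vx),       # m/s, forward positive
        "y_velocity": float(vy),       # m/s, left positive
        "turn_velocity": float(vturn), # rad/s, counter-clockwise positive
    }

print(parse_motion_filename("99_0.074_-0.111_-0.074.json"))
```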
&lt;h4&gt;7.1.3 Verify Generated Motions (Optional)&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;# Use Meshcat for visualization
uv run open_duck_reference_motion_generator/gait_playground.py --duck open_duck_mini_v2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then open &lt;code&gt;&lt;a href="http://127.0.0.1:7000/static/" rel="noopener noreferrer"&gt;http://127.0.0.1:7000/static/&lt;/a&gt;&lt;/code&gt; in your browser to view the 3D model animation&lt;/p&gt;
&lt;h3&gt;7.2 Process Motion Data&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Perform polynomial fitting on motion data to compress data and smooth noise&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4&gt;7.2.1 Polynomial Fitting&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/open_duck_mini_ws/Open_Duck_reference_motion_generator

uv run scripts/fit_poly.py --ref_motion recordings/&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;: The &lt;code&gt;polynomial_coefficients.pkl&lt;/code&gt; file will be generated in the current directory, containing polynomial coefficients for all motions&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Purpose of Polynomial Fitting&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Significantly compress data volume (each joint only needs 5-10 coefficients to represent the complete motion trajectory)&lt;/li&gt;
&lt;li&gt;Effectively smooth noise and jitter in raw data&lt;/li&gt;
&lt;li&gt;Facilitate fast sampling and interpolation during reinforcement learning training&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
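The idea can be sketched in a few lines of NumPy (a minimal illustration of polynomial compression and smoothing, not the actual `fit_poly.py` implementation):

```python
import numpy as np

# Minimal sketch: fit one joint's noisy sampled trajectory with a low-order
# polynomial, so a handful of coefficients replaces hundreds of samples.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)                              # normalized gait phase
joint_angle = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(200)

coeffs = np.polyfit(t, joint_angle, deg=8)   # 9 coefficients replace 200 samples
smooth = np.polyval(coeffs, t)               # cheap to resample at any phase

print(coeffs.shape)  # (9,)
```

During training, evaluating `np.polyval` at an arbitrary phase gives fast interpolation without storing the raw recordings.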
&lt;h4&gt;7.2.2 View Fitting Results (Optional)&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;uv run scripts/plot_poly_fit.py --coefficients polynomial_coefficients.pkl&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The script will display fitting curve graphs for each motion one by one to verify fitting effectiveness&lt;/p&gt;
&lt;h4&gt;7.2.3 Copy to Training Directory&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;cp polynomial_coefficients.pkl \
   ~/project/open_duck_mini_ws/Open_Duck_Playground/playground/open_duck_mini_v2/data/&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;7.3 Reinforcement Learning Training&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt;: &lt;code&gt;Open_Duck_Playground&lt;/code&gt;&lt;br&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Train walking strategy using PPO algorithm&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4&gt;7.3.1 Clone Repository and Install Dependencies&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/open_duck_mini_ws
git clone https://github.com/apirrone/Open_Duck_Playground.git
cd Open_Duck_Playground

uv sync&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;7.3.2 Start Training&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;python3 playground/open_duck_mini_v2/runner.py \
    --task flat_terrain_backlash \
    --num_timesteps 300000000&lt;/code&gt;&lt;/pre&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;br&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;br&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;--task&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Training task type (&lt;code&gt;flat_terrain_backlash&lt;/code&gt; means flat terrain + backlash compensation)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;--num_timesteps&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Total training steps (300 million steps, usually takes several hours to complete)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Training Output&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;code&gt;checkpoints/&lt;/code&gt; directory – Saves model checkpoints during training&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;code&gt;ONNX.onnx&lt;/code&gt; file – Final exported ONNX format inference model&lt;/p&gt;
&lt;h4&gt;7.3.3 Monitor Training Progress&lt;/h4&gt;
&lt;p&gt;Run the following command in a new terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd ~/project/open_duck_mini_ws/Open_Duck_Playground
tensorboard --logdir=checkpoints/&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Open &lt;code&gt;&lt;a href="http://localhost:6006" rel="noopener noreferrer"&gt;http://localhost:6006&lt;/a&gt;&lt;/code&gt; in your browser to view training curves and metrics&lt;/p&gt;
&lt;h4&gt;7.3.4 Training Parameters&lt;/h4&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;br&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Default Value&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;br&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;num_envs&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;8192&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Number of parallel simulation environments&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;batch_size&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Training batch size&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;learning_rate&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;0.0003&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Learning rate&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;discounting&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;0.97&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Discount factor (for calculating present value of future rewards)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;code&gt;episode_length&lt;/code&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Maximum steps per episode&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;
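As a quick intuition for the discount factor: a unit reward k steps ahead is weighted by γ^k, so the policy's effective planning horizon is roughly 1/(1−γ) control steps, which the geometric series confirms:

```python
# Intuition for discounting = 0.97: the effective planning horizon is about
# 1 / (1 - gamma) steps, matching the discounted sum of unit rewards.
gamma = 0.97
horizon = 1.0 / (1.0 - gamma)                 # ≈ 33.3 steps
total = sum(gamma**k for k in range(10_000))  # geometric series of unit rewards

print(round(horizon, 1), round(total, 1))     # 33.3 33.3
```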
&lt;h4&gt;7.3.5 Deploy to Real Robot&lt;/h4&gt;
&lt;p&gt;After training is complete, copy the generated &lt;code&gt;ONNX.onnx&lt;/code&gt; model file to the robot device:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;scp ONNX.onnx user@raspberry-pi:~/BEST_WALK_ONNX_2.onnx&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then follow the steps in the Test Walking section to complete deployment&lt;/p&gt;
&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/understanding-reinforcement-learning-through-openduck/" rel="noopener noreferrer"&gt;Understanding Reinforcement Learning through OpenDuck&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>NavTalk Official Support for NVIDIA RTX 5090 on Linux</title>
      <dc:creator>Frank Fu</dc:creator>
      <pubDate>Mon, 30 Mar 2026 08:50:11 +0000</pubDate>
      <link>https://dev.to/frankfu/navtalk-official-support-for-nvidia-rtx-5090-on-linux-248m</link>
      <guid>https://dev.to/frankfu/navtalk-official-support-for-nvidia-rtx-5090-on-linux-248m</guid>
      <description>&lt;p&gt;NavTalk’s digital human lip-sync and real-time audio/video capabilities are &lt;strong&gt;fully supported for deployment and operation on Linux servers equipped with NVIDIA RTX 5090&lt;/strong&gt;. End-to-end adaptation and validation—from drivers and frameworks to the inference engine—have been completed for the latest generation (Blackwell architecture and corresponding NVIDIA drivers and libraries), ensuring a stable, high-performance real-time digital human experience on current hardware.&lt;/p&gt;
&lt;p&gt;This document describes NavTalk’s &lt;strong&gt;official support for RTX 5090 on Linux&lt;/strong&gt; in terms of technology stack, adaptation work, and product value, and provides &lt;strong&gt;recommended concurrent real-time chat Session counts for RTX 5090 / 4090 / 3090&lt;/strong&gt; based on measured results, for evaluation and sizing reference.&lt;/p&gt;
&lt;h2&gt;1. Why RTX 5090 and Linux Matter&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Compute upgrade&lt;/strong&gt;: RTX 5090 is based on the Blackwell architecture, with significantly higher memory and compute, suited for real-time high-resolution lip-sync and multi-session concurrency.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Linux first&lt;/strong&gt;: Most production and cloud environments run Linux; NavTalk offers a full set of services on Linux (including real-time lip-sync, video lip-sync, and other APIs), making integration and scaling straightforward.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Long-term compatibility&lt;/strong&gt;: Adaptation has been completed for the latest NVIDIA drivers and AI runtime (e.g. CUDA 12.8, PyTorch 2.7), keeping NavTalk aligned with the official software stack for the foreseeable future and reducing upgrade cost.&lt;/p&gt;
&lt;p&gt;Thus, &lt;strong&gt;“deployable, operable, and scalable” on RTX 5090 Linux is a clear commitment from NavTalk for production and high-end compute scenarios&lt;/strong&gt;. We recommend using NVIDIA drivers that support RTX 5090 (e.g. 5xx series) and a common Linux distribution (e.g. Ubuntu 22.04 LTS or newer).&lt;/p&gt;
&lt;h2&gt;2. Technology Stack and Adaptation&lt;/h2&gt;
&lt;p&gt;NavTalk’s runtime on RTX 5090 Linux is selected and validated separately from environments used for older GPUs (e.g. CUDA 11.8), and is &lt;strong&gt;maintained independently&lt;/strong&gt; to avoid wrong or mixed installations and to simplify environment isolation and issue reproduction.&lt;/p&gt;
&lt;h3&gt;2.1 Core Runtime (5090-specific)&lt;/h3&gt;
&lt;p&gt;The table below lists &lt;strong&gt;officially verified software versions&lt;/strong&gt; for NavTalk on RTX 5090, for operations and integration reference. &lt;strong&gt;Python&lt;/strong&gt; is the runtime; &lt;strong&gt;CUDA&lt;/strong&gt; is the NVIDIA compute platform; &lt;strong&gt;PyTorch&lt;/strong&gt; is the main framework for AI models; &lt;strong&gt;mmcv / mmdet / mmpose&lt;/strong&gt; are the vision libraries used for face and pose, etc.&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;5090 Linux recommended version&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.10.11&lt;/td&gt;
&lt;td&gt;Runtime version&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CUDA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12.8&lt;/td&gt;
&lt;td&gt;NVIDIA compute platform for RTX 5090&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PyTorch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.7.0+cu128&lt;/td&gt;
&lt;td&gt;AI model framework (vision, audio, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TensorFlow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;≥2.16.0&lt;/td&gt;
&lt;td&gt;Required when enabling related features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NumPy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.26.0&lt;/td&gt;
&lt;td&gt;Numerical library, compatible with image processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mmcv&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.1.0&lt;/td&gt;
&lt;td&gt;Computer vision base (face, image processing, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mmdet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.2.0&lt;/td&gt;
&lt;td&gt;Detection library paired with mmcv&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mmpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.2.0&lt;/td&gt;
&lt;td&gt;Pose library paired with mmcv&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;NavTalk &lt;strong&gt;maintains a dedicated dependency list&lt;/strong&gt; for the 5090 environment, including the above components and versions, with notes on TensorFlow, CUDA 12.8, NumPy, etc., separate from older GPU environments, reflecting 5090-specific adaptation and maintainability.&lt;/p&gt;
&lt;h3&gt;2.2 5090 Architecture Compatibility&lt;/h3&gt;
&lt;p&gt;RTX 5090 uses the new Blackwell architecture (compute capability 12.0). Some vision libraries do not yet ship prebuilt packages for the 5090. Compatibility has been verified and adapted for this architecture so that face, pose, and related capabilities run correctly on the 5090.&lt;/p&gt;
&lt;h3&gt;2.3 Inference and Model Management&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; NavTalk’s lip-sync core is based on &lt;strong&gt;MuseTalk 1.5&lt;/strong&gt; (a widely used high-quality lip-sync model) and runs on 5090 with the PyTorch 2.7 + CUDA 12.8 stack above.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; NavTalk provides &lt;strong&gt;unified GPU and model management&lt;/strong&gt;: models are loaded on demand, and multi-task contention for the GPU is avoided, improving stability in multi-service or multi-GPU setups and long-term operation on 5090.&lt;/p&gt;
&lt;p&gt;All versions and adaptation work above have been verified, representing &lt;strong&gt;reproducible, deliverable engineering support&lt;/strong&gt;, not just “theoretical” compatibility.&lt;/p&gt;
&lt;h2&gt;3. Product Value and Use Cases&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Latency and quality&lt;/strong&gt;: On 5090, NavTalk can leverage the new generation’s compute for &lt;strong&gt;real-time lip-sync at 30+ fps&lt;/strong&gt; and higher resolution with multi-session concurrency, suitable for digital humans, virtual hosts, and live interaction where latency and quality matter.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Service forms&lt;/strong&gt;: On 5090 Linux, NavTalk offers &lt;strong&gt;real-time lip API, video lip API, digital human avatar API&lt;/strong&gt;, and other interfaces for live, recorded, and interactive use; the real-time lip API is optimized for low latency and streaming.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Production-ready&lt;/strong&gt;: Concurrency, quality enhancements (e.g. face enhancement, mouth sharpening), GPU options, and output directories are configurable, easing integration with your existing business systems, storage, and monitoring.&lt;/p&gt;
&lt;p&gt;Thus, &lt;strong&gt;NavTalk on 5090 Linux does not merely “run”: it is full production support for the latest compute&lt;/strong&gt;, ready for evaluation and rollout.&lt;/p&gt;
&lt;h2&gt;4. RTX 5090 / 4090 / 3090 Concurrency and Responsiveness&lt;/h2&gt;
&lt;p&gt;Conclusions in this section are based on &lt;strong&gt;single-node, single-GPU&lt;/strong&gt; measured memory usage (service port 8800, real-time chat WebSocket call scenario). The following gives &lt;strong&gt;RTX 5090, 4090, and 3090&lt;/strong&gt; concurrent Session recommendations from a memory perspective; if the GPU is shared with other processes (e.g. LLM services), recalculate using available memory.&lt;/p&gt;
&lt;h3&gt;4.1 Memory and Single-Session Peak (Measured)&lt;/h3&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTX 5090 total memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;32,607 MiB&lt;/strong&gt; (~31.8 GiB)&lt;/td&gt;
&lt;td&gt;Single-GPU physical memory; after small desktop usage, still ~32 GiB for planning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Single-session real-time chat peak&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;10,410 MiB&lt;/strong&gt; (~10.2 GiB)&lt;/td&gt;
&lt;td&gt;NavTalk process group usage when one real-time chat Session is inferring.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;Composition of the single-session peak (measured): main process during inference ~&lt;strong&gt;8,746 MiB&lt;/strong&gt;, plus two worker processes at &lt;strong&gt;832 MiB&lt;/strong&gt; each, total &lt;strong&gt;8,746 + 832×2 = 10,410 MiB&lt;/strong&gt;. In the current deployment, each real-time chat Session corresponds to a separately started service process set (not multi-threaded sharing), so each additional Session adds ~10.2 GiB memory; this peak is used for sizing.&lt;/p&gt;
&lt;p&gt;Share of total capacity: 10,410 MiB ÷ 32,607 MiB ≈ &lt;strong&gt;31.9%&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;4.2 Concurrent Session Count (Memory-Based)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;When NavTalk has exclusive use of RTX 5090:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Usable memory for NavTalk is &lt;strong&gt;32,607 MiB&lt;/strong&gt; (still close to 32 GiB after desktop, etc.).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Floor&lt;/strong&gt; by single-session peak 10,410 MiB: 32,607 ÷ 10,410 ≈ 3.13 → &lt;strong&gt;3 concurrent real-time chat Sessions&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Check: 3 × 10,410 = 31,230 MiB &amp;lt; 32,607 MiB; ~1,377 MiB headroom for fragmentation and short-term spikes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When other processes use GPU memory&lt;/strong&gt; (e.g. LLM inference, other services):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Available memory = 32,607 MiB − other process usage;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Concurrent Sessions = ⌊ available memory ÷ 10,410 ⌋ (floor).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Actual concurrency limits also depend on system RAM, CPU, and network; &lt;strong&gt;we recommend load testing in the target environment (including whether the GPU is shared)&lt;/strong&gt;.&lt;/p&gt;
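The memory-floor rule above can be written as a one-line helper (the 10,410 MiB per-session peak and the 32,607 MiB total are the measured figures quoted in this post; the shared-GPU figure in the second call is a hypothetical example):

```python
# Memory-floor sizing rule: concurrent Sessions = available MiB // per-session peak.
SESSION_PEAK_MIB = 10_410  # measured single-session real-time chat peak

def max_sessions(total_mib: int, other_usage_mib: int = 0) -> int:
    """Concurrent real-time chat Sessions that fit in GPU memory."""
    available = total_mib - other_usage_mib
    return max(available // SESSION_PEAK_MIB, 0)

print(max_sessions(32_607))          # RTX 5090, exclusive use → 3
print(max_sessions(32_607, 12_000))  # sharing with a ~12 GiB LLM service → 1
```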
&lt;h3&gt;4.3 Three-GPU Concurrent Session Recommendations (Measured and Inferred)&lt;/h3&gt;
&lt;p&gt;Single-session real-time chat peak is taken from 5090 measurements: &lt;strong&gt;10,410 MiB&lt;/strong&gt; (~10.2 GiB). 5090 was tested on Linux; 4090 and 3090 on Windows. Using memory floor and measured results, &lt;strong&gt;recommended planning&lt;/strong&gt; is:&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Total memory&lt;/th&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Recommended concurrent real-time chat Sessions&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTX 5090&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;32,607 MiB (~31.8 GiB)&lt;/td&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTX 4090&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24,564 MiB (~24.0 GiB)&lt;/td&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTX 3090&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24,576 MiB (~24.0 GiB)&lt;/td&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;The above are memory-based recommendations; actual limits also depend on system memory, CPU, and network. We recommend load testing in the target environment.&lt;/p&gt;
&lt;h2&gt;5. Summary&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; NavTalk &lt;strong&gt;is officially supported and runs fully on NVIDIA RTX 5090 + Linux&lt;/strong&gt;. For 5090, NavTalk specifies runtime versions (e.g. Python 3.10, CUDA 12.8, PyTorch 2.7), a 5090-specific dependency list, and recommended versions for face/pose and related libraries, with end-to-end adaptation and validation from drivers through the inference engine.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; Compatibility has been addressed for the 5090 architecture; where prebuilt packages are unavailable, building from source and similar approaches are supported to run correctly on 5090.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fs.w.org%2Fimages%2Fcore%2Femoji%2F15.0.3%2F72x72%2F25aa.png" alt="▪" width="72" height="72"&gt;&lt;/a&gt; &lt;strong&gt;Concurrency and responsiveness&lt;/strong&gt;: Based on measurements and memory sizing, &lt;strong&gt;RTX 5090&lt;/strong&gt; (Linux, exclusive) supports &lt;strong&gt;3&lt;/strong&gt; concurrent real-time chat Sessions, &lt;strong&gt;RTX 4090&lt;/strong&gt; (Windows) is recommended at &lt;strong&gt;2&lt;/strong&gt;, and &lt;strong&gt;RTX 3090&lt;/strong&gt; (Windows, with system/desktop usage) at &lt;strong&gt;1&lt;/strong&gt;; if the GPU is shared with other processes, recalculate from available memory. Higher compute improves low-latency and real-time lip experience.&lt;/p&gt;
&lt;p&gt;This document describes &lt;strong&gt;product-level support for RTX 5090 on Linux&lt;/strong&gt;, for external communication and technical evaluation.&lt;/p&gt;
&lt;p&gt;The post &lt;a href="https://frankfu.blog/openai/navtalk-official-support-for-nvidia-rtx-5090-on-linux-2/" rel="noopener noreferrer"&gt;NavTalk Official Support for NVIDIA RTX 5090 on Linux&lt;/a&gt; appeared first on &lt;a href="https://frankfu.blog" rel="noopener noreferrer"&gt;Frank Fu's Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
  </channel>
</rss>
