<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yuki Imamura</title>
    <description>The latest articles on DEV Community by Yuki Imamura (@yuk6ra).</description>
    <link>https://dev.to/yuk6ra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1041550%2Fc2f7888e-902c-4b49-a655-72e5f093e592.jpg</url>
      <title>DEV Community: Yuki Imamura</title>
      <link>https://dev.to/yuk6ra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yuk6ra"/>
    <language>en</language>
    <item>
      <title>A Practical Guide to Fine-Tuning and Inference with GR00T N1 &amp; LeRobot on a Custom Dataset</title>
      <dc:creator>Yuki Imamura</dc:creator>
      <pubDate>Sun, 20 Jul 2025 10:23:32 +0000</pubDate>
      <link>https://dev.to/yuk6ra/a-practical-guide-to-fine-tuning-and-inference-with-gr00t-n1-lerobot-on-a-custom-dataset-52ec</link>
      <guid>https://dev.to/yuk6ra/a-practical-guide-to-fine-tuning-and-inference-with-gr00t-n1-lerobot-on-a-custom-dataset-52ec</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;NVIDIA's "Project GR00T," a foundation model for humanoid robots, has the potential to revolutionize the world of AI and robotics. The GR00T N1 model can be fine-tuned not just with simulation data but also with real-world data, enabling it to generate behaviors specialized for specific tasks.&lt;/p&gt;

&lt;p&gt;This article provides a comprehensive, step-by-step guide to the entire process: from fine-tuning the GR00T N1 model using a custom dataset collected with a "SO-ARM101" single arm, to running inference on the physical robot, and finally, evaluating the trained model. We'll include concrete steps and detailed troubleshooting advice along the way.&lt;/p&gt;

&lt;p&gt;By the end of this guide, you'll be ready to take your first steps toward building a robot model that can perform your own custom tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Article Covers
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Data Selection:&lt;/strong&gt; An overview of the training dataset and the settings used during data collection.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Fine-Tuning:&lt;/strong&gt; Fine-tuning the GR00T N1 model with our custom data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Inference:&lt;/strong&gt; Controlling the physical robot using the fine-tuned model.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Evaluation:&lt;/strong&gt; Quantitatively evaluating the performance of the trained model.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;This article is based on the following official documentation and blogs. All procedures have been verified with the specific versions listed below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Official Blog&lt;/strong&gt;: &lt;a href="https://huggingface.co/blog/nvidia/gr00t-n1-5-so101-tuning" rel="noopener noreferrer"&gt;Fine-tuning GR00T N1 with LeRobot on a custom dataset&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/NVIDIA/Isaac-GR00T" rel="noopener noreferrer"&gt;NVIDIA/Isaac-GR00T&lt;/a&gt; / &lt;a href="https://github.com/huggingface/lerobot" rel="noopener noreferrer"&gt;huggingface/lerobot&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isaac-GR00T&lt;/strong&gt;: &lt;a href="https://github.com/NVIDIA/Isaac-GR00T/tree/d5984002e24d418872adc5822a5bbb1d6a9b4ddc" rel="noopener noreferrer"&gt;&lt;code&gt;d598400&lt;/code&gt;&lt;/a&gt; (used in this guide)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;LeRobot&lt;/strong&gt;: &lt;a href="https://github.com/huggingface/lerobot/tree/519b76110efeea55a4f919895d0029dc0df41e8b" rel="noopener noreferrer"&gt;&lt;code&gt;519b761&lt;/code&gt;&lt;/a&gt; (used in this guide)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: These repositories are updated frequently. Check out the exact commits listed above if you want to replicate this environment faithfully.&lt;/p&gt;

&lt;h1&gt;
  
  
  Part 1: Data Selection
&lt;/h1&gt;

&lt;p&gt;High-quality training data is essential for fine-tuning. This article assumes the data collection process is already complete, but we'll explain what kind of dataset we used and how it was collected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task Selection
&lt;/h2&gt;

&lt;p&gt;We chose the easiest of the three tasks below: picking up a single piece of tape and placing it in a box. Complex tasks have a lower success rate, so we recommend starting with something simple.&lt;/p&gt;

&lt;p&gt;You can visualize the datasets we created using the &lt;a href="https://huggingface.co/spaces/lerobot/visualize_dataset" rel="noopener noreferrer"&gt;LeRobot Dataset Visualizer&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Complex Task&lt;/strong&gt;: &lt;a href="https://huggingface.co/spaces/lerobot/visualize_dataset?path=%2Fyuk6ra%2Fso101-pen-cleanup%2Fepisode_0" rel="noopener noreferrer"&gt;&lt;code&gt;pen-cleanup&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Simple Task&lt;/strong&gt;: &lt;a href="https://huggingface.co/spaces/lerobot/visualize_dataset?path=%2Fyuk6ra%2Fso101-tapes-cleanup%2Fepisode_0" rel="noopener noreferrer"&gt;&lt;code&gt;tapes-cleanup&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Easiest Task&lt;/strong&gt;: &lt;a href="https://huggingface.co/spaces/lerobot/visualize_dataset?path=%2Fyuk6ra%2Fso101-onetape-cleanup%2Fepisode_0" rel="noopener noreferrer"&gt;&lt;code&gt;onetape-cleanup&lt;/code&gt;&lt;/a&gt; (Used in this guide)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Collection Settings
&lt;/h2&gt;

&lt;p&gt;Below are the collection settings for the dataset we used (50 episodes).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;cameras&lt;/code&gt; configuration is particularly important. The camera names defined here (e.g., &lt;code&gt;tip&lt;/code&gt;, &lt;code&gt;front&lt;/code&gt;) will be referenced during the fine-tuning process, so you must remember them exactly. &lt;strong&gt;We recommend using the name &lt;code&gt;wrist&lt;/code&gt; to align with GR00T's default settings, which will save you some configuration changes later.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;repo_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yuk6ra/so101-onetape-cleanup"&lt;/span&gt;  &lt;span class="c1"&gt;# Hugging Face repository ID&lt;/span&gt;
  &lt;span class="na"&gt;single_task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Grab&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tape&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;place&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;it&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;box."&lt;/span&gt;  &lt;span class="c1"&gt;# Task description&lt;/span&gt;
  &lt;span class="na"&gt;num_episodes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# Number of episodes to record&lt;/span&gt;
  &lt;span class="na"&gt;fps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# Frame rate&lt;/span&gt;
  &lt;span class="na"&gt;episode_time_s&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;  &lt;span class="c1"&gt;# Max time per episode (seconds)&lt;/span&gt;
  &lt;span class="na"&gt;reset_time_s&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;  &lt;span class="c1"&gt;# Reset time after episode recording (seconds)&lt;/span&gt;

&lt;span class="c1"&gt;# Follower Arm&lt;/span&gt;
&lt;span class="na"&gt;robot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;so101_follower"&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dev/ttyACM0"&lt;/span&gt;  &lt;span class="c1"&gt;# Serial port&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;white"&lt;/span&gt;  &lt;span class="c1"&gt;# Follower ID&lt;/span&gt;

  &lt;span class="c1"&gt;# Camera settings (the names defined here are used later)&lt;/span&gt;
  &lt;span class="na"&gt;cameras&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Using 'wrist' here will streamline later steps&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;opencv"&lt;/span&gt;
      &lt;span class="na"&gt;index_or_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;fps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;640&lt;/span&gt;
      &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;480&lt;/span&gt;
    &lt;span class="na"&gt;front&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;opencv"&lt;/span&gt;
      &lt;span class="na"&gt;index_or_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
      &lt;span class="na"&gt;fps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;640&lt;/span&gt;
      &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;480&lt;/span&gt;

&lt;span class="c1"&gt;# Leader Arm&lt;/span&gt;
&lt;span class="na"&gt;teleop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;so101_leader"&lt;/span&gt; 
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dev/ttyACM1"&lt;/span&gt;  &lt;span class="c1"&gt;# Serial port&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black"&lt;/span&gt;  &lt;span class="c1"&gt;# Leader ID&lt;/span&gt;

&lt;span class="c1"&gt;# Additional options&lt;/span&gt;
&lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;display_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;  &lt;span class="c1"&gt;# Whether to display camera feed&lt;/span&gt;
  &lt;span class="na"&gt;push_to_hub&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="c1"&gt;# Whether to automatically upload to Hugging Face Hub&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
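&lt;p&gt;As a quick sanity check on these settings, the numbers above imply a fixed recording budget. A minimal sketch, with the values copied from the YAML config:&lt;/p&gt;

```python
# Back-of-the-envelope budget for the recording session configured above.
num_episodes = 50     # dataset.num_episodes
fps = 30              # dataset.fps
episode_time_s = 15   # dataset.episode_time_s
reset_time_s = 15     # dataset.reset_time_s

frames_per_episode = fps * episode_time_s          # 450 frames
total_frames = num_episodes * frames_per_episode   # frames per camera
session_minutes = num_episodes * (episode_time_s + reset_time_s) / 60

print(f"frames per camera: {total_frames}")          # 22500
print(f"session length: {session_minutes:.0f} min")  # 25 min
```

&lt;p&gt;So each of the two cameras contributes up to 22,500 frames at 640x480, and a full 50-episode session takes about 25 minutes including resets.&lt;/p&gt;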



&lt;h1&gt;
  
  
  Part 2: Fine-Tuning
&lt;/h1&gt;

&lt;p&gt;Now, let's use the collected data to fine-tune the GR00T N1 model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing the Execution Environment
&lt;/h2&gt;

&lt;p&gt;Fine-tuning requires a high-spec machine. We used the following cloud environment for this process.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Specs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NVIDIA H100 SXM (80GB VRAM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Disk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;300GB+ (5000 steps consumed ~100GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128GB+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ubuntu 24.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4Gbps (Upload/Download)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: A slow network connection will significantly increase the time it takes to upload the model.&lt;/p&gt;

&lt;p&gt;After connecting to the remote server via SSH, let's verify the specs with a few commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &lt;span class="nt"&gt;-p&lt;/span&gt; 30454 root@xxx.xxx.xxx.xx &lt;span class="nt"&gt;-L&lt;/span&gt; 8080:localhost:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;nvidia-smi
Sun Jul 13 06:57:05 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------|
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|&lt;span class="o"&gt;=========================================&lt;/span&gt;+&lt;span class="o"&gt;========================&lt;/span&gt;+&lt;span class="o"&gt;======================&lt;/span&gt;|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:E4:00.0 Off |                    0 |
| N/A   47C    P0             73W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|&lt;span class="o"&gt;=========================================================================================&lt;/span&gt;|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;df&lt;/span&gt; /home &lt;span class="nt"&gt;-h&lt;/span&gt;
Filesystem      Size  Used Avail Use% Mounted on
overlay         300G   90M  300G   1% /
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;free &lt;span class="nt"&gt;-h&lt;/span&gt;
               total        used        free      shared  buff/cache   available
Mem:           503Gi        34Gi       372Gi        47Mi       101Gi       469Gi
Swap:          8.0Gi       186Mi       7.8Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;lsb_release &lt;span class="nt"&gt;-d&lt;/span&gt;
No LSB modules are available.
Description:    Ubuntu 24.04.2 LTS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, following the official instructions, we'll set up a Conda virtual environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/NVIDIA/Isaac-GR00T
&lt;span class="nb"&gt;cd &lt;/span&gt;Isaac-GR00T

&lt;span class="c"&gt;# Create and activate the Conda environment&lt;/span&gt;
conda create &lt;span class="nt"&gt;-n&lt;/span&gt; gr00t &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.10
conda activate gr00t

&lt;span class="c"&gt;# Install required packages&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; setuptools
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; .[base]
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-build-isolation&lt;/span&gt; flash-attn&lt;span class="o"&gt;==&lt;/span&gt;2.7.1.post4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, log in to Hugging Face and Weights &amp;amp; Biases (Wandb).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Hugging Face Token&lt;/strong&gt;: &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;huggingface.co/settings/tokens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Wandb API Key&lt;/strong&gt;: &lt;a href="https://wandb.ai/authorize" rel="noopener noreferrer"&gt;wandb.ai/authorize&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;huggingface-cli login
wandb login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Preparing the Training Data
&lt;/h2&gt;

&lt;p&gt;Download your chosen dataset from the Hugging Face Hub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;huggingface-cli download &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--repo-type&lt;/span&gt; dataset yuk6ra/so101-onetape-cleanup &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./demo_data/so101-onetape-cleanup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To ensure GR00T correctly recognizes the data format, a configuration file named &lt;code&gt;modality.json&lt;/code&gt; is required. Copy the sample and modify its contents to match your environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Copy the sample file&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;getting_started/examples/so100_dualcam__modality.json ./demo_data/so101-onetape-cleanup/meta/modality.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the &lt;code&gt;wrist&lt;/code&gt; entry to &lt;code&gt;tip&lt;/code&gt; to match the camera name you set during data collection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Edit the configuration file&lt;/span&gt;
vim ./demo_data/so101-onetape-cleanup/meta/modality.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; {
    ...
        "video": {
            "front": {
                "original_key": "observation.images.front"
            },
&lt;span class="gd"&gt;-           "wrist": {
-               "original_key": "observation.images.wrist"
&lt;/span&gt;&lt;span class="gi"&gt;+           "tip": { 
+               "original_key": "observation.images.tip" 
&lt;/span&gt;            }
        },
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
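&lt;p&gt;If you prefer not to hand-edit the file, the same rename can be scripted. A minimal sketch (the &lt;code&gt;rename_camera&lt;/code&gt; helper is hypothetical; the JSON structure mirrors the diff above):&lt;/p&gt;

```python
def rename_camera(modality, old="wrist", new="tip"):
    """Rename a video entry in a modality.json-style dict."""
    video = modality["video"]
    if old in video:
        video[new] = {"original_key": f"observation.images.{new}"}
        del video[old]
    return modality

# Demo on an in-memory fragment of modality.json:
sample = {"video": {
    "front": {"original_key": "observation.images.front"},
    "wrist": {"original_key": "observation.images.wrist"},
}}
print(sorted(rename_camera(sample)["video"]))  # ['front', 'tip']
```

&lt;p&gt;To apply it to the real file, &lt;code&gt;json.load&lt;/code&gt; the path shown above, pass the result through the helper, and &lt;code&gt;json.dump&lt;/code&gt; it back.&lt;/p&gt;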



&lt;p&gt;Verify that the data can be loaded correctly with the following script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/load_dataset.py &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--dataset-path&lt;/span&gt; ./demo_data/so101-onetape-cleanup &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--plot-state-action&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--video-backend&lt;/span&gt; torchvision_av
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If successful, it will output the dataset structure and frame information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;====================================================================================================&lt;/span&gt;
&lt;span class="o"&gt;=========================================&lt;/span&gt; Humanoid Dataset &lt;span class="o"&gt;=========================================&lt;/span&gt;
&lt;span class="o"&gt;====================================================================================================&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'action.gripper'&lt;/span&gt;: &lt;span class="s1"&gt;'np scalar: 1.1111111640930176 [1, 1] float64'&lt;/span&gt;,
 &lt;span class="s1"&gt;'action.single_arm'&lt;/span&gt;: &lt;span class="s1"&gt;'np: [1, 5] float64'&lt;/span&gt;,
 &lt;span class="s1"&gt;'annotation.human.task_description'&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Grab the tape and place it in the '&lt;/span&gt;
                                       &lt;span class="s1"&gt;'box.'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;,
 &lt;span class="s1"&gt;'state.gripper'&lt;/span&gt;: &lt;span class="s1"&gt;'np scalar: 2.410423517227173 [1, 1] float64'&lt;/span&gt;,
 &lt;span class="s1"&gt;'state.single_arm'&lt;/span&gt;: &lt;span class="s1"&gt;'np: [1, 5] float64'&lt;/span&gt;,
 &lt;span class="s1"&gt;'video.front'&lt;/span&gt;: &lt;span class="s1"&gt;'np: [1, 480, 640, 3] uint8'&lt;/span&gt;,
 &lt;span class="s1"&gt;'video.tip'&lt;/span&gt;: &lt;span class="s1"&gt;'np: [1, 480, 640, 3] uint8'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
dict_keys&lt;span class="o"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'video.front'&lt;/span&gt;, &lt;span class="s1"&gt;'video.tip'&lt;/span&gt;, &lt;span class="s1"&gt;'state.single_arm'&lt;/span&gt;, &lt;span class="s1"&gt;'state.gripper'&lt;/span&gt;, &lt;span class="s1"&gt;'action.single_arm'&lt;/span&gt;, &lt;span class="s1"&gt;'action.gripper'&lt;/span&gt;, &lt;span class="s1"&gt;'annotation.human.task_description'&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt;
&lt;span class="o"&gt;==================================================&lt;/span&gt;
video.front: &lt;span class="o"&gt;(&lt;/span&gt;1, 480, 640, 3&lt;span class="o"&gt;)&lt;/span&gt;
video.tip: &lt;span class="o"&gt;(&lt;/span&gt;1, 480, 640, 3&lt;span class="o"&gt;)&lt;/span&gt;
state.single_arm: &lt;span class="o"&gt;(&lt;/span&gt;1, 5&lt;span class="o"&gt;)&lt;/span&gt;
state.gripper: &lt;span class="o"&gt;(&lt;/span&gt;1, 1&lt;span class="o"&gt;)&lt;/span&gt;
action.single_arm: &lt;span class="o"&gt;(&lt;/span&gt;1, 5&lt;span class="o"&gt;)&lt;/span&gt;
action.gripper: &lt;span class="o"&gt;(&lt;/span&gt;1, 1&lt;span class="o"&gt;)&lt;/span&gt;
annotation.human.task_description: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Grab the tape and place it in the box.'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
...
Warning: Skipping left_arm as it&lt;span class="s1"&gt;'s not found in both state and action dictionaries
Warning: Skipping right_arm as it'&lt;/span&gt;s not found &lt;span class="k"&gt;in &lt;/span&gt;both state and action dictionaries
Warning: Skipping left_hand as it&lt;span class="s1"&gt;'s not found in both state and action dictionaries
Warning: Skipping right_hand as it'&lt;/span&gt;s not found &lt;span class="k"&gt;in &lt;/span&gt;both state and action dictionaries
Plotted state and action space
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
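&lt;p&gt;The shapes in that printout are worth verifying explicitly: five arm joints plus a one-dimensional gripper, and a 480x640 RGB frame from each camera. A small sketch of such a check (the &lt;code&gt;check_step&lt;/code&gt; helper is hypothetical; the expected shapes are taken from the output above):&lt;/p&gt;

```python
import numpy as np

# Shapes reported by scripts/load_dataset.py for this dataset.
EXPECTED_SHAPES = {
    "video.front": (1, 480, 640, 3),
    "video.tip": (1, 480, 640, 3),
    "state.single_arm": (1, 5),
    "state.gripper": (1, 1),
    "action.single_arm": (1, 5),
    "action.gripper": (1, 1),
}

def check_step(step):
    """Return the keys whose array shape deviates from expectation."""
    return [k for k, shape in EXPECTED_SHAPES.items()
            if step[k].shape != shape]

# Demo with dummy arrays shaped like one dataset step:
step = {k: np.zeros(s, dtype=np.uint8) for k, s in EXPECTED_SHAPES.items()}
print(check_step(step))  # []
```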



&lt;h2&gt;
  
  
  Running the Training
&lt;/h2&gt;

&lt;p&gt;Once everything is set up, start the fine-tuning process. On an H100 GPU, this took about 30 minutes and consumed around 100GB of disk space for 5000 steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/gr00t_finetune.py &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--dataset-path&lt;/span&gt; ./demo_data/so101-onetape-cleanup/ &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--num-gpus&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--output-dir&lt;/span&gt; ./so101-checkpoints  &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--max-steps&lt;/span&gt; 5000 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--data-config&lt;/span&gt; so100_dualcam &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--video-backend&lt;/span&gt; torchvision_av
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run into memory issues or have lower specs, try reducing &lt;code&gt;--dataloader-num-workers&lt;/code&gt; or &lt;code&gt;--batch-size&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For lower-spec machines:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/gr00t_finetune.py &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--dataset-path&lt;/span&gt; ./demo_data/so101-onetape-cleanup/ &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--num-gpus&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--output-dir&lt;/span&gt; ./so101-checkpoints  &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--max-steps&lt;/span&gt; 5000 &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--data-config&lt;/span&gt; so100_dualcam &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--batch-size&lt;/span&gt; 8 &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--video-backend&lt;/span&gt; torchvision_av &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--dataloader-num-workers&lt;/span&gt; 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Error 1: &lt;code&gt;ValueError: Video key wrist not found&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This error occurs because the training script is looking for the default camera name &lt;code&gt;wrist&lt;/code&gt; but finds &lt;code&gt;tip&lt;/code&gt; in your dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Directly edit &lt;code&gt;gr00t/experiment/data_config.py&lt;/code&gt; to fix the camera name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;# Around line 225
&lt;span class="p"&gt;class So100DualCamDataConfig(So100DataConfig):
&lt;/span&gt;&lt;span class="gd"&gt;-   video_keys = ["video.front", "video.wrist"]
&lt;/span&gt;&lt;span class="gi"&gt;+   video_keys = ["video.front", "video.tip"]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
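&lt;p&gt;The same one-line change can be applied without opening an editor. A hypothetical helper sketch (assumes you run it from the Isaac-GR00T repo root; consider backing up the file first):&lt;/p&gt;

```python
from pathlib import Path

def patch_video_key(path="gr00t/experiment/data_config.py",
                    old='"video.wrist"', new='"video.tip"'):
    """Replace the default wrist video key with the tip camera name."""
    p = Path(path)
    text = p.read_text()
    p.write_text(text.replace(old, new))
    return old in text  # True if something was actually replaced

# patch_video_key()  # uncomment to apply from the repo root
```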



&lt;h3&gt;
  
  
  Error 2: &lt;code&gt;av.error.MemoryError: [Errno 12] Cannot allocate memory&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This error happens if you run out of memory while decoding video data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;...
RuntimeError: Caught MemoryError &lt;span class="k"&gt;in &lt;/span&gt;DataLoader worker process 0.
Original Traceback &lt;span class="o"&gt;(&lt;/span&gt;most recent call last&lt;span class="o"&gt;)&lt;/span&gt;:
...
  File &lt;span class="s2"&gt;"av/error.pyx"&lt;/span&gt;, line 326, &lt;span class="k"&gt;in &lt;/span&gt;av.error.err_check
av.error.MemoryError: &lt;span class="o"&gt;[&lt;/span&gt;Errno 12] Cannot allocate memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Updating the PyAV library to the latest version might solve this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; av
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Uploading the Trained Model
&lt;/h2&gt;

&lt;p&gt;Once training is complete, upload the generated checkpoint to the Hugging Face Hub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;so101-checkpoints/checkpoint-5000/

&lt;span class="c"&gt;# Remove unnecessary files (optional)&lt;/span&gt;
&lt;span class="c"&gt;# rm -rf scheduler.pt optimizer.pt&lt;/span&gt;

&lt;span class="c"&gt;# Upload to Hugging Face Hub&lt;/span&gt;
huggingface-cli upload &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--repo-type&lt;/span&gt; model yuk6ra/so101-onetape-cleanup &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--commit-message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Finetuned model with 5000 steps"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: If you're using a cloud server, don't forget to shut down the instance after the upload is complete.&lt;/p&gt;

&lt;h1&gt;
  
  
  Part 3: Inference
&lt;/h1&gt;

&lt;p&gt;Let's use our fine-tuned model to control a physical robot. The inference setup consists of two main components: an &lt;strong&gt;inference server&lt;/strong&gt; to host the model and a &lt;strong&gt;client node&lt;/strong&gt; to control the robot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inference Server Setup and Execution
&lt;/h2&gt;

&lt;p&gt;The inference server can run on a local or cloud GPU machine. We used the following local environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;GPU&lt;/strong&gt;: NVIDIA GeForce RTX 4070 Ti (12GB)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;RAM&lt;/strong&gt;: 128GB&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OS&lt;/strong&gt;: Ubuntu 22.04&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running it on the cloud, make sure to open the necessary port for the inference server (e.g., &lt;code&gt;5555&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;First, set up the GR00T environment just as you did for fine-tuning. Then, download the model from the Hugging Face Hub and start the server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up GR00T environment (see Fine-Tuning section)&lt;/span&gt;
git clone https://github.com/NVIDIA/Isaac-GR00T
&lt;span class="nb"&gt;cd &lt;/span&gt;Isaac-GR00T
conda create &lt;span class="nt"&gt;-n&lt;/span&gt; gr00t &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.10
conda activate gr00t
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; setuptools
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; .[base]
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-build-isolation&lt;/span&gt; flash-attn&lt;span class="o"&gt;==&lt;/span&gt;2.7.1.post4

&lt;span class="c"&gt;# Download the model&lt;/span&gt;
huggingface-cli download &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--repo-type&lt;/span&gt; model yuk6ra/so101-onetape-cleanup &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./model/so101-onetape-cleanup

&lt;span class="c"&gt;# Start the inference server&lt;/span&gt;
python scripts/inference_service.py &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model_path&lt;/span&gt; ./model/so101-onetape-cleanup &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--embodiment_tag&lt;/span&gt; new_embodiment &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--data_config&lt;/span&gt; so100_dualcam &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--server&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 5555
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see &lt;code&gt;Server is ready and listening on tcp://0.0.0.0:5555&lt;/code&gt;, the server has started successfully.&lt;/p&gt;
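&lt;p&gt;Before launching the client, you can confirm the server port is actually reachable from the client machine. The following is a minimal standard-library sketch (the host and port are placeholders for your own setup):&lt;br&gt;
&lt;/p&gt;

```python
import socket

def server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and DNS failures.
        return False

# Example: replace "127.0.0.1" with your inference server's address
# when the server runs on a cloud machine.
print(server_reachable("127.0.0.1", 5555))
```

&lt;p&gt;If this returns &lt;code&gt;False&lt;/code&gt; for a cloud server, check the instance's firewall or security-group rules for the port you opened.&lt;/p&gt;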

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; You will need to modify &lt;code&gt;data_config.py&lt;/code&gt; to change the camera name to &lt;code&gt;tip&lt;/code&gt; here as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Error 1: &lt;code&gt;OSError: CUDA_HOME environment variable is not set&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This happens during the &lt;code&gt;flash-attn&lt;/code&gt; installation if the CUDA path isn't found.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;...
OSError: CUDA_HOME environment variable is not set. Please &lt;span class="nb"&gt;set &lt;/span&gt;it to your CUDA &lt;span class="nb"&gt;install &lt;/span&gt;root.
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Install the CUDA Toolkit via &lt;code&gt;conda&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; nvidia cuda-toolkit&lt;span class="o"&gt;=&lt;/span&gt;12.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Error 2: &lt;code&gt;ModuleNotFoundError: No module named 'flash_attn'&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Make sure you have correctly activated the &lt;code&gt;gr00t&lt;/code&gt; conda environment with &lt;code&gt;conda activate gr00t&lt;/code&gt;. You might have accidentally ended up in the &lt;code&gt;base&lt;/code&gt; or &lt;code&gt;lerobot&lt;/code&gt; environment.&lt;/p&gt;
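&lt;p&gt;Since this class of error comes up repeatedly when juggling the &lt;code&gt;gr00t&lt;/code&gt; and &lt;code&gt;lerobot&lt;/code&gt; environments, a fail-fast guard at the top of your own scripts can help. &lt;code&gt;CONDA_DEFAULT_ENV&lt;/code&gt; is the variable conda sets on activation; the guard function itself is a hypothetical helper, not part of GR00T or LeRobot:&lt;br&gt;
&lt;/p&gt;

```python
import os

def active_conda_env() -> str:
    """Return the name of the currently activated conda environment ('' if none)."""
    return os.environ.get("CONDA_DEFAULT_ENV", "")

def assert_env(expected: str) -> None:
    """Raise immediately if the wrong conda environment is active."""
    env = active_conda_env()
    if env != expected:
        raise RuntimeError(
            f"Expected conda env '{expected}', but '{env or 'none'}' is active"
        )

# Example: put assert_env("gr00t") at the top of a server-side script.
```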

&lt;h2&gt;
  
  
  Client Node Setup and Execution
&lt;/h2&gt;

&lt;p&gt;The client runs in the &lt;code&gt;lerobot&lt;/code&gt; environment used during data collection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you don't have a lerobot virtual environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the LeRobot repository&lt;/span&gt;
git clone https://github.com/huggingface/lerobot.git
&lt;span class="nb"&gt;cd &lt;/span&gt;lerobot

&lt;span class="c"&gt;# Create and activate the Conda environment&lt;/span&gt;
conda create &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; lerobot &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.10
conda activate lerobot

&lt;span class="c"&gt;# Install required packages&lt;/span&gt;
conda &lt;span class="nb"&gt;install &lt;/span&gt;ffmpeg &lt;span class="nt"&gt;-c&lt;/span&gt; conda-forge
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[feetech]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;&lt;strong&gt;If you already have a lerobot virtual environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda activate lerobot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After activating the &lt;code&gt;lerobot&lt;/code&gt; environment, install any other necessary packages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;matplotlib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to your &lt;code&gt;Isaac-GR00T&lt;/code&gt; directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/Documents/Isaac-GR00T/ &lt;span class="c"&gt;# or your path to Isaac-GR00T&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, use &lt;code&gt;lerobot.find_cameras&lt;/code&gt; to identify the IDs of the cameras connected to your system. You'll pass these IDs as arguments when launching the client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; lerobot.find_cameras opencv
&lt;span class="c"&gt;# ... (From the output, find the camera IDs for 'tip' and 'front')&lt;/span&gt;
&lt;span class="c"&gt;# Example: tip is 2, front is 0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, modify &lt;code&gt;getting_started/examples/eval_lerobot.py&lt;/code&gt; so its imports match the current LeRobot package layout and so it can locate GR00T's inference client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- from lerobot.common.cameras.opencv.configuration_opencv import (
&lt;/span&gt;&lt;span class="gi"&gt;+ from lerobot.cameras.opencv.configuration_opencv import (
&lt;/span&gt;    OpenCVCameraConfig,
)
&lt;span class="gd"&gt;- from lerobot.common.robots import (
&lt;/span&gt;&lt;span class="gi"&gt;+ from lerobot.robots import (
&lt;/span&gt;    Robot,
    RobotConfig,
    koch_follower,
    make_robot_from_config,
    so100_follower,
    so101_follower,
)
&lt;span class="gd"&gt;- from lerobot.common.utils.utils import (
&lt;/span&gt;&lt;span class="gi"&gt;+ from lerobot.utils.utils import (
&lt;/span&gt;    init_logging,
    log_say,
)
&lt;span class="err"&gt;
&lt;/span&gt;# NOTE:
# Sometimes we would like to abstract different env, or run this on a separate machine
# User can just move this single python class method gr00t/eval/service.py
# to their code or do the following line below
&lt;span class="gd"&gt;- # sys.path.append(os.path.expanduser("~/Isaac-GR00T/gr00t/eval/"))
&lt;/span&gt;&lt;span class="gi"&gt;+ import os 
+ import sys 
+ sys.path.append(os.path.expanduser("./gr00t/eval/")) # Fix the path
&lt;/span&gt;&lt;span class="p"&gt;from service import ExternalRobotInferenceClient
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Launch the client with the following command to send instructions to the robot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a local environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python getting_started/examples/eval_lerobot.py &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.type&lt;span class="o"&gt;=&lt;/span&gt;so101_follower &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.port&lt;span class="o"&gt;=&lt;/span&gt;/dev/ttyACM1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.id&lt;span class="o"&gt;=&lt;/span&gt;white &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.cameras&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"{
        tip: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30},
        front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}
    }"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--lang_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Grab the tape and place it in the box."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are using a cloud server for inference, set &lt;code&gt;--policy_host&lt;/code&gt; and &lt;code&gt;--policy_port&lt;/code&gt; accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a cloud environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python getting_started/examples/eval_lerobot.py &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.type&lt;span class="o"&gt;=&lt;/span&gt;so101_follower &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.port&lt;span class="o"&gt;=&lt;/span&gt;/dev/ttyACM0 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.id&lt;span class="o"&gt;=&lt;/span&gt;white &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--robot&lt;/span&gt;.cameras&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"{
        tip: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30},
        front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}
    }"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--policy_host&lt;/span&gt; xxx.xx.xx.xx &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--policy_port&lt;/span&gt; xxxxx &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--lang_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Grab tapes and place into pen holder."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Execution Results
&lt;/h2&gt;

&lt;p&gt;Here are the results for GR00T N1. For comparison, we've also included success and failure examples from an ACT model.&lt;/p&gt;

&lt;h3&gt;
  
  
  GR00T N1 (Success)
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/x7jxNTNg8cU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  ACT (Success)
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/pFOEiMeKiWQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  GR00T N1 (Failure)
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/CaIFHwCWR2w"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  ACT (Failure)
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/pCkPsHewTGk"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Error: Inference results are &lt;code&gt;NaN&lt;/code&gt; or the robot movement is jerky.
&lt;/h4&gt;

&lt;p&gt;The model itself might be fine, but the input from the robot could be incorrect.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ggfFMOJ5QhE"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Double-check your &lt;code&gt;cameras&lt;/code&gt; configuration and try rebuilding the environment from scratch.&lt;/p&gt;
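&lt;p&gt;To catch this before commands reach the motors, a small client-side guard can flag &lt;code&gt;NaN&lt;/code&gt; values or implausibly large jumps in the predicted actions. This is only a sketch: the &lt;code&gt;max_step&lt;/code&gt; threshold is made up and must be tuned to your robot's joint units:&lt;br&gt;
&lt;/p&gt;

```python
import math

def check_action(action, prev=None, max_step=0.5):
    """Classify a predicted action vector as 'nan', 'jerky', or 'ok'.

    max_step is a hypothetical per-joint limit on the change between
    consecutive predictions; tune it for your robot.
    """
    if any(math.isnan(v) for v in action):
        return "nan"
    if prev is not None and any(abs(a - b) > max_step for a, b in zip(action, prev)):
        return "jerky"
    return "ok"

# Example: skip or clamp any action that doesn't come back "ok".
```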

&lt;h1&gt;
  
  
  Part 4: Model Evaluation
&lt;/h1&gt;

&lt;p&gt;Finally, let's evaluate how well the trained model can reproduce the tasks from the dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing for Evaluation
&lt;/h2&gt;

&lt;p&gt;We'll work in the &lt;code&gt;gr00t&lt;/code&gt; environment. Download the evaluation dataset and model, and prepare the &lt;code&gt;modality.json&lt;/code&gt; file. This process is the same as in the fine-tuning section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda activate gr00t

&lt;span class="c"&gt;# Download the dataset&lt;/span&gt;
huggingface-cli download &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--repo-type&lt;/span&gt; dataset yuk6ra/so101-onetape-cleanup &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./demo_data/so101-onetape-cleanup

&lt;span class="c"&gt;# Download the model (use --revision to evaluate a specific step)&lt;/span&gt;
huggingface-cli download &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--repo-type&lt;/span&gt; model  yuk6ra/so101-onetape-cleanup &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./model/so101-onetape-cleanup
    &lt;span class="c"&gt;# --revision checkpoint-2000&lt;/span&gt;

&lt;span class="c"&gt;# Prepare modality.json&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;getting_started/examples/so100_dualcam__modality.json ./demo_data/so101-onetape-cleanup/meta/modality.json
&lt;span class="c"&gt;# Use vim to change the camera name to 'tip'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
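&lt;p&gt;Instead of editing &lt;code&gt;modality.json&lt;/code&gt; by hand in vim each time, you can script the rename. The key names below (&lt;code&gt;wrist&lt;/code&gt;, &lt;code&gt;tip&lt;/code&gt;) are assumptions for illustration; check your copied &lt;code&gt;so100_dualcam__modality.json&lt;/code&gt; for the actual camera key:&lt;br&gt;
&lt;/p&gt;

```python
def rename_camera_keys(obj, old="wrist", new="tip"):
    """Recursively rename a camera key anywhere in a modality-style dict.

    'wrist' and 'tip' are hypothetical key names -- verify against
    your own modality.json before running.
    """
    if isinstance(obj, dict):
        return {(new if k == old else k): rename_camera_keys(v, old, new)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_camera_keys(v, old, new) for v in obj]
    return obj
```

&lt;p&gt;Load the file with &lt;code&gt;json.load&lt;/code&gt;, apply the helper, and write the result back with &lt;code&gt;json.dump&lt;/code&gt;.&lt;/p&gt;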



&lt;h2&gt;
  
  
  Running the Evaluation
&lt;/h2&gt;

&lt;p&gt;Once ready, run the evaluation script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/eval_policy.py &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--plot&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--embodiment_tag&lt;/span&gt; new_embodiment &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model_path&lt;/span&gt; ./model/so101-onetape-cleanup/ &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--data_config&lt;/span&gt; so100_dualcam &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--dataset_path&lt;/span&gt; ./demo_data/so101-onetape-cleanup/ &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--video_backend&lt;/span&gt; torchvision_av &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--modality_keys&lt;/span&gt; single_arm gripper &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--denoising_steps&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; You'll need to modify &lt;code&gt;data_config.py&lt;/code&gt; to change the camera name to &lt;code&gt;tip&lt;/code&gt; here as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Error: &lt;code&gt;ModuleNotFoundError: No module named 'tyro'&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Double-check that you are in the correct &lt;code&gt;gr00t&lt;/code&gt; virtual environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation Results
&lt;/h2&gt;

&lt;p&gt;When the script finishes, it will generate a plot comparing the model's prediction (Prediction: green line) with the actual recorded action (Ground truth: orange line). This graph allows you to visually confirm how accurately the model has learned the task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdow33d3gk7dyc93o5j0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdow33d3gk7dyc93o5j0z.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparative Analysis
&lt;/h2&gt;

&lt;p&gt;Let's use the evaluation script to compare how training progress and task difficulty affect model performance. The plots visualize the difference between the model's prediction (green line) and the ground truth (orange line). The closer the curves are, the more accurately the model is reproducing the task.&lt;/p&gt;
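&lt;p&gt;The "closeness" of the two curves can be quantified as the mean squared error (MSE) the evaluation script reports. Conceptually, for a single joint dimension, it is just:&lt;br&gt;
&lt;/p&gt;

```python
def mse(pred, truth):
    """Mean squared error between a predicted and a ground-truth trajectory
    (one joint dimension, equal-length sequences)."""
    assert len(pred) == len(truth), "trajectories must have the same length"
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)

# A perfect reproduction gives 0.0; larger values mean larger deviation.
```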

&lt;h3&gt;
  
  
  Performance Comparison by Training Steps
&lt;/h3&gt;

&lt;p&gt;We'll compare the performance on the same task (&lt;code&gt;onetape-cleanup&lt;/code&gt;) at 2000 and 5000 training steps.&lt;/p&gt;

&lt;h4&gt;
  
  
  Evaluation at 2000 Steps
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8wtpcyiufh005vp1dq2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8wtpcyiufh005vp1dq2c.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;: At 2000 steps, the prediction (green) roughly follows the ground truth (orange), but there are noticeable deviations and oscillations. The arm's movement (&lt;code&gt;single_arm&lt;/code&gt;) in certain dimensions is not smooth, indicating that the model has not yet fully learned to reproduce the task.&lt;/p&gt;

&lt;h4&gt;
  
  
  Evaluation at 5000 Steps
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xz3mxos6w1qspjd73ry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xz3mxos6w1qspjd73ry.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;: After 5000 steps, the prediction and ground truth curves are nearly identical, showing that the model can reproduce the motion very smoothly. This clearly demonstrates that additional training improved the model's performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Comparison by Task Complexity
&lt;/h3&gt;

&lt;p&gt;Next, let's compare the performance of models trained for 5000 steps on tasks of varying complexity.&lt;/p&gt;

&lt;h4&gt;
  
  
  Easiest Task: &lt;code&gt;onetape-cleanup&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xz3mxos6w1qspjd73ry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xz3mxos6w1qspjd73ry.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;: As shown before, the model reproduces the easiest task almost perfectly.&lt;/p&gt;

&lt;h4&gt;
  
  
  Simple Task: &lt;code&gt;tapes-cleanup&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzkbn07r2k2r8hwiianl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzkbn07r2k2r8hwiianl.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;: With multiple tapes in the scene, the prediction (green line) deviates slightly more from the ground truth.&lt;/p&gt;

&lt;h4&gt;
  
  
  Complex Task: &lt;code&gt;pen-cleanup&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyaoe47d5fgw496of5p67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyaoe47d5fgw496of5p67.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;: For the more complex task of cleaning up pens (featured in the official blog), the gap between the prediction and ground truth becomes significant. There are large deviations in specific joint movements (e.g., &lt;code&gt;single_arm_4&lt;/code&gt;), suggesting that 5000 training steps are insufficient for this level of complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Effect of Further Training (7000 Steps)
&lt;/h3&gt;

&lt;p&gt;Let's see the evaluation for the complex &lt;code&gt;pen-cleanup&lt;/code&gt; task after training for 7000 steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzp9xcp6cpnvrxnvx1j6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzp9xcp6cpnvrxnvx1j6.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis&lt;/strong&gt;: Even after 7000 steps, a significant gap remains between the prediction and the ground truth. In fact, the MSE has increased relative to the 5000-step model, meaning the predictions have moved further from the ground truth. This suggests that simply increasing the number of training steps may not solve the problem; it could point to other issues, such as a lack of diversity or quantity in the dataset, or a model architecture that cannot handle the task's complexity.&lt;/p&gt;

&lt;p&gt;These comparisons reaffirm that &lt;strong&gt;you need a sufficient number of training steps and a high-quality, diverse dataset that corresponds to the complexity of your task.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this article, we walked through the entire workflow of fine-tuning an NVIDIA GR00T N1 model with custom data collected via LeRobot, followed by inference and evaluation on a physical robot, complete with detailed commands and logs.&lt;/p&gt;

&lt;p&gt;Here are the key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Data Consistency&lt;/strong&gt;: It is crucial to ensure that settings, especially camera names, are consistent between data collection (LeRobot) and training/inference (GR00T).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Environment Setup&lt;/strong&gt;: Properly separating virtual environments with &lt;code&gt;conda&lt;/code&gt; and installing the correct libraries are key to success.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Troubleshooting&lt;/strong&gt;: You'll need to be able to read error logs carefully and adapt to errors specific to your custom dataset, such as by directly editing configuration files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope this guide helps you in developing your own robotics applications with GR00T. Future work could involve tackling more complex tasks or comparing model performance across different numbers of training steps.&lt;/p&gt;

&lt;p&gt;If you have any feedback or find any mistakes, please don't hesitate to reach out. Let's build this exciting future together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.zrek.co/" rel="noopener noreferrer"&gt;https://www.zrek.co/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gr00t</category>
      <category>lerobot</category>
      <category>robotics</category>
    </item>
  </channel>
</rss>
