<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yosaKun</title>
    <description>The latest articles on DEV Community by yosaKun (@hanasite).</description>
    <link>https://dev.to/hanasite</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4005774%2F2f9db1bd-ae8b-4a82-92c7-b5b1ad6e7d07.jpg</url>
      <title>DEV Community: yosaKun</title>
      <link>https://dev.to/hanasite</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hanasite"/>
    <language>en</language>
    <item>
      <title>PAL: Giving AI Agents Hands in the Physical World</title>
      <dc:creator>yosaKun</dc:creator>
      <pubDate>Sat, 27 Jun 2026 20:11:18 +0000</pubDate>
      <link>https://dev.to/hanasite/pal-giving-ai-agents-hands-in-the-physical-world-48mj</link>
      <guid>https://dev.to/hanasite/pal-giving-ai-agents-hands-in-the-physical-world-48mj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;REPL Is All You Need.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A 19-year-old's proposal for an open standard that lets AI agents control hardware directly.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem: AI Agents Are Trapped in Screens
&lt;/h2&gt;

&lt;p&gt;AI agents can write code, search the web, deploy servers, and manage databases. They're incredibly capable — inside a container.&lt;/p&gt;

&lt;p&gt;But ask your favorite AI to &lt;strong&gt;flip a relay&lt;/strong&gt;, &lt;strong&gt;read a temperature sensor&lt;/strong&gt;, or &lt;strong&gt;scan an I2C bus&lt;/strong&gt; — and it can't. Not because it doesn't know how. It knows exactly what &lt;code&gt;machine.Pin(5, Pin.OUT)&lt;/code&gt; does. It just has nowhere to run that code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI agents lack a physical execution terminal.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CLI tools gave LLMs hands in the digital world (&lt;code&gt;df -h&lt;/code&gt;, &lt;code&gt;pip install&lt;/code&gt;, &lt;code&gt;docker compose up&lt;/code&gt;). Embedded systems can give them hands in the real world (&lt;code&gt;GPIO.on()&lt;/code&gt;, &lt;code&gt;ADC.read()&lt;/code&gt;, &lt;code&gt;I2C.scan()&lt;/code&gt;). But nobody has defined a standard for how agents should talk to hardware.&lt;/p&gt;

&lt;p&gt;That's what PAL is.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Landscape: What Exists, and Why It Falls Short
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ESP-Claw (Espressif Official)
&lt;/h3&gt;

&lt;p&gt;Espressif's "Chat Coding" framework puts a full AI agent on an ESP32-S3 — ReAct loop, LLM calls, tool registry, IM channels (Telegram, WeChat), Event Router. It's impressive engineering. It's also the wrong architecture.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent runs on the MCU&lt;/strong&gt; — 8MB PSRAM minimum, $15+ BOM&lt;/li&gt;
&lt;li&gt;
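&lt;strong&gt;Uses Lua&lt;/strong&gt;, which is roughly 50x rarer than Python in public GitHub repos (&amp;lt;0.5% versus 25%), so LLMs have far less of it to learn from and generate it less reliably&lt;/li&gt;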
&lt;li&gt;
&lt;strong&gt;Pre-registration required&lt;/strong&gt; — every GPIO needs a C struct registered as a Lua tool module&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM API calls from MCU&lt;/strong&gt; — one network timeout can stall the entire FreeRTOS task&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  mimiclaw
&lt;/h3&gt;

&lt;p&gt;A lighter ESP32 agent. Same architecture, GPIO bugs confirmed. Toy-grade.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bare-metal C + UART commands
&lt;/h3&gt;

&lt;p&gt;Rock solid. But adding a new operation means: write C → compile → flash → reboot. 10 minutes minimum. Agents can't iterate at that speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Raspberry Pi + GPIO
&lt;/h3&gt;

&lt;p&gt;Python libraries everywhere, but 30-second boot, 5W power draw, $35+ cost, no hard real-time. Overkill for controlling a relay.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Insight: MicroPython REPL IS the Agent Interface
&lt;/h2&gt;

&lt;p&gt;Here's what everyone missed: &lt;strong&gt;Python's REPL and an AI agent's interaction model are isomorphic.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REPL loop:                    Agent loop:
  &amp;gt;&amp;gt;&amp;gt; type code                 receive task
  execute                       reason → generate code
  see output                    send code to REPL
  &amp;gt;&amp;gt;&amp;gt; type next                 observe result → adjust
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't need a tool registry. You don't need JSON schema. You don't need a Skill Registry. &lt;strong&gt;The REPL is the world's simplest IPC.&lt;/strong&gt;&lt;/p&gt;
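
&lt;p&gt;The correspondence above can be tried on an ordinary computer. This is a minimal sketch, not part of the PAL spec: &lt;code&gt;run_snippet&lt;/code&gt; and &lt;code&gt;agent_loop&lt;/code&gt; are hypothetical names, and a local &lt;code&gt;exec&lt;/code&gt; call stands in for the remote MicroPython REPL.&lt;/p&gt;

```python
import contextlib
import io
import traceback


def run_snippet(code, namespace):
    """Stand-in for the remote REPL: run code, return (stdout, error_text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Hand the traceback back as text, the way a REPL would print it.
        return buffer.getvalue(), traceback.format_exc()
    return buffer.getvalue(), ""


def agent_loop(snippets):
    """Send each snippet in turn and record what an agent would observe."""
    namespace = {}  # persists across steps, like REPL session state
    observations = []
    for code in snippets:
        stdout, error_text = run_snippet(code, namespace)
        observations.append(
            {"code": code, "stdout": stdout, "error": bool(error_text)}
        )
        if error_text:
            break  # a real agent would read the traceback and adjust
    return observations


steps = agent_loop(["x = 2 + 3", "print(x * 2)", "print(undefined_name)"])
```

&lt;p&gt;State set in one step (&lt;code&gt;x&lt;/code&gt;) is visible in the next, and the failing third step comes back as an observation instead of crashing the loop. That is the whole interaction model: code in, text out.&lt;/p&gt;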

&lt;p&gt;And Python? It's the language LLMs generate best: 25% of public GitHub repos, versus &amp;lt;0.5% for Lua. Models like Claude have very likely seen a great many &lt;code&gt;machine.Pin()&lt;/code&gt; calls in training data, so they know how to write this code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: PAL in Two Cores
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────┐
│         Cloud Agent (AstrBot)         │  ← Reasoning, planning
└──────────────┬───────────────────────┘
               │ WebSocket JSON
┌──────────────▼───────────────────────┐
│        ESP32-S3 PAL Terminal          │
│                                       │
│  Core 0 (C, FreeRTOS, NEVER CHANGES): │  ← Hard real-time
│  · SPI/I2C/UART drivers               │
│  · Hardware watchdog                  │
│  · WiFi auto-reconnect               │
│  · Pin ownership table                │
│                                       │
│  Core 1 (MicroPython, ANYTHING GOES): │  ← Agent playground
│  · WebSocket → JSON → Python exec     │
│  · machine module → hardware          │
│  · uasyncio → concurrent tasks        │
│  · Crash? Core 0 restarts you.        │
└──────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Core 0 is the brake pedal. It never changes. Core 1 is the steering wheel. The agent can grip it however it wants.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the agent writes an infinite loop? Core 0 detects the heartbeat timeout → restarts the Core 1 VM. If the agent tries to access system pins? &lt;code&gt;machine.Pin()&lt;/code&gt; raises &lt;code&gt;OSError&lt;/code&gt;. If the agent crashes? Core 0 keeps running. The physical control link never breaks.&lt;/p&gt;
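
&lt;p&gt;To make the pin ownership idea concrete, here is a small host-side simulation. It is only a sketch under assumptions: the class name &lt;code&gt;PinOwnershipTable&lt;/code&gt;, its methods, and the reserved pin numbers are all hypothetical, and in PAL the real table would live in the Core 0 firmware, not in Python.&lt;/p&gt;

```python
class PinOwnershipTable:
    """Simulates a table that decides who may use which pin."""

    def __init__(self, system_pins):
        self._system_pins = frozenset(system_pins)
        self._claimed = {}  # pin number -> owner name

    def claim(self, pin, owner):
        """Grant the pin to owner, or raise OSError if that is not allowed."""
        if pin in self._system_pins:
            raise OSError("pin {} is reserved for Core 0".format(pin))
        current = self._claimed.get(pin)
        if current is not None and current != owner:
            raise OSError("pin {} is already owned by {}".format(pin, current))
        self._claimed[pin] = owner

    def release(self, pin, owner):
        """Give the pin back; only the current owner can release it."""
        if self._claimed.get(pin) == owner:
            del self._claimed[pin]


# Illustrative reserved pins only; a real board defines its own list.
table = PinOwnershipTable(system_pins={0, 19, 20})
table.claim(5, "agent")  # allowed: pin 5 is not reserved

try:
    table.claim(19, "agent")  # reserved pin, so this must fail
    denied = False
except OSError:
    denied = True
```

&lt;p&gt;Raising &lt;code&gt;OSError&lt;/code&gt; keeps the failure inside normal Python error handling, so the agent sees a traceback it can reason about and retry with a different pin.&lt;/p&gt;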




&lt;h2&gt;
  
  
  The Protocol: JSON That Executes Python
&lt;/h2&gt;

&lt;p&gt;No tool schemas. No pre-registration. Just Python code over WebSocket:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent → Terminal:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"msg_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exec"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"from machine import Pin; Pin(5, Pin.OUT).on()"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timeout_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Terminal → Agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"msg_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stdout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stderr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exec_time_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. One round-trip over WiFi: &amp;lt;10ms. Python execution: &amp;lt;1ms.&lt;/p&gt;
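
&lt;p&gt;On the agent side, the two messages above need very little code. The sketch below uses only the field names shown in the examples; the helper names &lt;code&gt;build_exec_request&lt;/code&gt; and &lt;code&gt;parse_result&lt;/code&gt; are hypothetical, and the checks on version, type, and id are my assumption about sensible client behavior, not something the v0.1 draft is quoted as requiring here.&lt;/p&gt;

```python
import json

PROTOCOL_VERSION = "1"


def build_exec_request(msg_id, code, timeout_ms=10000):
    """Serialize an exec request with the fields from the example above."""
    return json.dumps({
        "version": PROTOCOL_VERSION,
        "id": msg_id,
        "type": "exec",
        "code": code,
        "timeout_ms": timeout_ms,
    })


def parse_result(raw, expected_id):
    """Decode a result message and check it answers the request we sent."""
    message = json.loads(raw)
    if message.get("version") != PROTOCOL_VERSION:
        raise ValueError("unsupported protocol version")
    if message.get("type") != "result":
        raise ValueError("expected a result message")
    if message.get("id") != expected_id:
        raise ValueError("result does not match request id")
    return message


request = build_exec_request(
    "msg_001", "from machine import Pin; Pin(5, Pin.OUT).on()"
)

# The reply from the example above, as it would arrive on the socket.
reply = (
    '{"version": "1", "id": "msg_001", "type": "result", '
    '"stdout": "", "stderr": "", "error": false, "exec_time_ms": 12}'
)
result = parse_result(reply, "msg_001")
```

&lt;p&gt;Matching on &lt;code&gt;id&lt;/code&gt; matters once several requests are in flight on one socket, because replies may then arrive in a different order than the requests were sent.&lt;/p&gt;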




&lt;h2&gt;
  
  
  Why Cloud Agent, Not On-Device Agent
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Brain and hands should use different hardware
&lt;/h3&gt;

&lt;p&gt;Physical execution needs determinism, low latency, 24/7 stability. AI reasoning needs elastic compute, large memory, frequent iteration. Forcing both onto one MCU is asking one chip to do two contradictory things.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. One brain, many hands
&lt;/h3&gt;

&lt;p&gt;A single cloud agent can manage dozens of PAL terminals across a factory floor. Agent-on-Device requires one agent instance per node — no global perspective.&lt;/p&gt;
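
&lt;p&gt;A rough sketch of what "one brain, many hands" could look like in code. Everything here is illustrative: &lt;code&gt;TerminalPool&lt;/code&gt;, &lt;code&gt;fake_terminal&lt;/code&gt;, the terminal names, and &lt;code&gt;read_temperature()&lt;/code&gt; are hypothetical, and the fake transport returns canned replies instead of talking to real hardware.&lt;/p&gt;

```python
import itertools


class TerminalPool:
    """Tracks several terminals and fans one snippet out to all of them."""

    def __init__(self):
        self._senders = {}  # terminal name -> callable(message) returning a reply
        self._counter = itertools.count(1)

    def register(self, name, send):
        self._senders[name] = send

    def broadcast(self, code, timeout_ms=10000):
        """Send the same code to every terminal; return replies keyed by name."""
        replies = {}
        for name, send in self._senders.items():
            message = {
                "version": "1",
                "id": "msg_{:03d}".format(next(self._counter)),
                "type": "exec",
                "code": code,
                "timeout_ms": timeout_ms,
            }
            replies[name] = send(message)
        return replies


def fake_terminal(stdout_text):
    """Stand-in transport: echoes the request id and returns canned output."""
    def send(message):
        return {
            "version": "1",
            "id": message["id"],
            "type": "result",
            "stdout": stdout_text,
            "stderr": "",
            "error": False,
            "exec_time_ms": 0,
        }
    return send


pool = TerminalPool()
pool.register("line_a", fake_terminal("21.5\n"))
pool.register("line_b", fake_terminal("22.1\n"))
readings = pool.broadcast("print(read_temperature())")
```

&lt;p&gt;Because all replies land in one place, the agent can compare terminals against each other, which is the global perspective a per-node agent lacks. This sketch sends sequentially; a real agent would probably send concurrently.&lt;/p&gt;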

&lt;h3&gt;
  
  
  3. You can stop the brain anytime
&lt;/h3&gt;

&lt;p&gt;Cloud agent misbehaving? Cut the WebSocket. It's over. Agent-on-Device misbehaving on an MCU? You wait for the hardware watchdog to trigger — that's your only recovery mechanism.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Skills, MCP, tools run on cloud infrastructure
&lt;/h3&gt;

&lt;p&gt;Hermes-style skill accumulation, vector databases, MCP tool chains, SQLite persistence — all mature AI infrastructure. None of it needs to be squeezed into 8MB of PSRAM.&lt;/p&gt;




&lt;h2&gt;
  
  
  What PAL Is, and What It Isn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;PAL IS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ An agent-friendly embedded terminal standard (ESP32-class, MicroPython, dual-core)&lt;/li&gt;
&lt;li&gt;✅ A JSON protocol for executing Python code on hardware&lt;/li&gt;
&lt;li&gt;✅ A safety model (5-layer defense, pin ownership, Core 0 isolation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;PAL IS NOT:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ An I2C/SPI bus protocol&lt;/li&gt;
&lt;li&gt;❌ An agent framework&lt;/li&gt;
&lt;li&gt;❌ A hardware reference design&lt;/li&gt;
&lt;li&gt;❌ Tied to any specific MCU or peripheral&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Current Status
&lt;/h2&gt;

&lt;p&gt;PAL is a &lt;strong&gt;draft specification (v0.1)&lt;/strong&gt;. I'm a freshman at Anhui University of Science and Technology. The reference implementation (ESP32-S3 Core 0/1) is under development.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: I'm currently preparing for exams and still learning how to express technical ideas fluently in English, so I used Claude to help draft and polish this post. All ideas, architecture, and the PAL specification itself are my own work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This spec is open for discussion. I'm looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feedback on the architecture&lt;/li&gt;
&lt;li&gt;Core 0 boundary definition&lt;/li&gt;
&lt;li&gt;Multi-terminal coordination&lt;/li&gt;
&lt;li&gt;MCP integration (Phase 1-4 roadmap in the repo)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub (International):&lt;/strong&gt; &lt;a href="https://github.com/hanasite/pal-spec" rel="noopener noreferrer"&gt;github.com/hanasite/pal-spec&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gitee (China):&lt;/strong&gt; &lt;a href="https://gitee.com/yosakun/pal-spec" rel="noopener noreferrer"&gt;gitee.com/yosakun/pal-spec&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The best way to predict the future is to define it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;— PAL v0.1, 2026&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>iot</category>
      <category>discuss</category>
      <category>python</category>
    </item>
  </channel>
</rss>
