<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Interlap</title>
    <description>The latest articles on DEV Community by Interlap (@mobai_019d06386873d90ed58).</description>
    <link>https://dev.to/mobai_019d06386873d90ed58</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3820538%2F49bba752-99b1-4799-99b2-31d8f2a75d20.png</url>
      <title>DEV Community: Interlap</title>
      <link>https://dev.to/mobai_019d06386873d90ed58</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mobai_019d06386873d90ed58"/>
    <language>en</language>
    <item>
      <title>AI-Native Mobile Device Automation: Give Your AI Agent Eyes and Hands on Real Phones</title>
      <dc:creator>Interlap</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:13:30 +0000</pubDate>
      <link>https://dev.to/mobai_019d06386873d90ed58/ai-native-mobile-device-automation-give-your-ai-agent-eyes-and-hands-on-real-phones-43go</link>
      <guid>https://dev.to/mobai_019d06386873d90ed58/ai-native-mobile-device-automation-give-your-ai-agent-eyes-and-hands-on-real-phones-43go</guid>
      <description>&lt;h1&gt;
  
  
  AI-Native Mobile Device Automation: Your AI Agent Can Write Code — But Can It Use a Phone?
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By the MobAI team · Published April 2026 · 10 min read&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI coding agents — &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, &lt;a href="https://openai.com/index/introducing-codex/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt; — have crossed a threshold. They refactor entire modules, scaffold features, and ship pull requests without a human touching the keyboard. But mobile device automation has remained a human-only task. These agents can't tap a button, read a screen, or run a mobile test on a real iPhone or Android device.&lt;/p&gt;

&lt;p&gt;That's exactly the problem &lt;a href="https://mobai.run" rel="noopener noreferrer"&gt;MobAI&lt;/a&gt; was built to solve — an AI-native mobile automation tool that gives agents eyes and hands on real phones.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Mobile Device Automation Works for AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mobai.run" rel="noopener noreferrer"&gt;MobAI&lt;/a&gt; is a desktop application for AI-powered mobile device automation, connecting AI agents to physical and simulated iOS and Android devices. It works as an &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt;, an HTTP API, or both — meaning any AI agent that speaks MCP (Claude Code, Cursor, Codex) or HTTP can control a mobile device as naturally as it reads a file.&lt;/p&gt;

&lt;p&gt;The architecture is intentionally simple. MobAI runs on your Mac, Windows, or Linux machine, talks to your iOS or Android device, and exposes a unified interface on top. No &lt;a href="https://appium.io/" rel="noopener noreferrer"&gt;Appium&lt;/a&gt;. No Selenium grid. No YAML configs. Plug in a device, start the bridge, and the agent has a phone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Traditional Mobile Testing Tools Don't Work for AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://appium.io/" rel="noopener noreferrer"&gt;Appium&lt;/a&gt;, &lt;a href="https://wix.github.io/Detox/" rel="noopener noreferrer"&gt;Detox&lt;/a&gt;, &lt;a href="https://developer.android.com/training/testing/espresso" rel="noopener noreferrer"&gt;Espresso&lt;/a&gt;, &lt;a href="https://developer.apple.com/documentation/xctest" rel="noopener noreferrer"&gt;XCTest&lt;/a&gt; — these traditional mobile testing frameworks are built for humans writing test scripts. They assume you know the screen hierarchy in advance, that you'll write explicit waits, that you'll maintain page objects. They produce verbose, stateful sessions that burn through an LLM's context window before anything useful happens.&lt;/p&gt;

&lt;p&gt;AI agents need something different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compact UI snapshots&lt;/strong&gt; that fit in a context window, not multi-megabyte XML dumps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic element targeting&lt;/strong&gt; — "tap the button near the Email label" — not brittle XPath selectors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batched execution&lt;/strong&gt; — send a full flow, not one action per round trip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in failure handling&lt;/strong&gt; so the agent doesn't need to reinvent retry logic every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MobAI was designed for these constraints from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  MobAI vs. Appium: Key Differences for AI-Driven Mobile Testing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Appium&lt;/th&gt;
&lt;th&gt;MobAI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Designed for&lt;/td&gt;
&lt;td&gt;Human test scripts&lt;/td&gt;
&lt;td&gt;AI agents and LLMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI representation&lt;/td&gt;
&lt;td&gt;Verbose XML page source&lt;/td&gt;
&lt;td&gt;Compact, indexed accessibility tree&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Element targeting&lt;/td&gt;
&lt;td&gt;XPath / CSS selectors&lt;/td&gt;
&lt;td&gt;Semantic predicates (text, type, spatial)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution model&lt;/td&gt;
&lt;td&gt;One action per round trip&lt;/td&gt;
&lt;td&gt;Batched DSL with 30+ actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure handling&lt;/td&gt;
&lt;td&gt;Manual retry logic&lt;/td&gt;
&lt;td&gt;Built-in strategies (retry, skip, replan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup complexity&lt;/td&gt;
&lt;td&gt;Server + drivers + capabilities&lt;/td&gt;
&lt;td&gt;Plug in device, start bridge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-platform&lt;/td&gt;
&lt;td&gt;Separate drivers per platform&lt;/td&gt;
&lt;td&gt;Unified interface for iOS and Android&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window impact&lt;/td&gt;
&lt;td&gt;High (verbose sessions)&lt;/td&gt;
&lt;td&gt;Low (compact snapshots)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Accessibility Trees Optimized for LLM Context Windows
&lt;/h2&gt;

&lt;p&gt;When an agent needs to understand what's on screen, it asks MobAI to observe. The response is a structured accessibility tree — but not the raw platform dump. MobAI filters out noise (non-interactive containers, invisible elements), assigns global indices, and formats the tree to be compact and machine-readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[0] StaticText "Settings" (20,58 350x44)
[1] Button "Wi-Fi" (20,120 350x44)
[2] Switch "Wi-Fi" value=1 (330,120 51x31)
[3] Button "Bluetooth" (20,170 350x44)
[4] Button "General" (20,220 350x44)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every element has a type, text, bounds, and an index. The agent can reason about full screens without context window pressure. This is what we mean by agent-optimized: the snapshot is a first-class input to an LLM, not an afterthought.&lt;/p&gt;

&lt;p&gt;For apps with custom-rendered UIs — &lt;a href="https://reactnative.dev/" rel="noopener noreferrer"&gt;React Native&lt;/a&gt;, &lt;a href="https://flutter.dev/" rel="noopener noreferrer"&gt;Flutter&lt;/a&gt;, games — where the accessibility tree is sparse, MobAI offers an OCR fallback that returns recognized text with tap coordinates. The agent always has something to work with.&lt;/p&gt;

&lt;p&gt;When visual context is needed, MobAI captures lightweight, compressed screenshots sized for LLM consumption — small enough to reason about layout without blowing the token budget. But most of the time, the UI tree and OCR are enough. Structure is cheaper than pixels.&lt;/p&gt;
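&lt;p&gt;In practice, the agent chooses which signals to pull on each observation. A sketch of such a request, reusing the &lt;code&gt;observe&lt;/code&gt; action with its &lt;code&gt;include&lt;/code&gt; array; the &lt;code&gt;"ocr"&lt;/code&gt; and &lt;code&gt;"screenshot"&lt;/code&gt; include values here are illustrative, not confirmed API names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{"action": "observe", "include": ["ui_tree", "ocr", "screenshot"]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;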

&lt;h2&gt;
  
  
  The MobAI DSL: 30+ Mobile Automation Actions in One Tool
&lt;/h2&gt;

&lt;p&gt;Most MCP-based agent tools register a separate function for each capability: one for tap, one for swipe, one for type, one for screenshot. This explodes the tool surface, confuses the LLM's tool selection, and wastes tokens on schema overhead.&lt;/p&gt;

&lt;p&gt;MobAI takes a different approach. All mobile device automation flows through a single &lt;code&gt;execute_dsl&lt;/code&gt; call — a JSON script with a &lt;code&gt;steps&lt;/code&gt; array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open_app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"bundle_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"com.example.myapp"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wait_for"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"stable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"timeout_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text_contains"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sign In"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user@test.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"near"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text_contains"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"direction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"below"&lt;/span&gt;&lt;span class="p"&gt;}}},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Continue"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wait_for"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"stable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"timeout_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"observe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"include"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ui_tree"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"on_fail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"retry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"max_retries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One call. Opens the app, navigates a login flow, waits for the screen to settle, and returns the updated UI tree. This unified approach to mobile automation eliminates context switching and reduces token overhead — critical for agents running complex test flows.&lt;/p&gt;

&lt;p&gt;The DSL covers taps, swipes, scrolls, drags, pinches, text input, assertions, screenshots, screen recording, web automation inside WebViews, performance metrics — over 30 action types. Agents learn one tool and can do everything.&lt;/p&gt;
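&lt;p&gt;Gesture and assertion steps follow the same shape as the login flow above. This is an illustrative sketch; the &lt;code&gt;swipe&lt;/code&gt; and &lt;code&gt;assert&lt;/code&gt; parameter names are assumptions rather than verbatim MobAI reference syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "version": "0.2",
  "steps": [
    {"action": "swipe", "direction": "up"},
    {"action": "assert", "exists": {"text_contains": "Log Out"}}
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;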

&lt;h2&gt;
  
  
  Semantic Predicates: Finding Mobile UI Elements Without Coordinates
&lt;/h2&gt;

&lt;p&gt;The core innovation in MobAI's DSL is the predicate system. Instead of hardcoding coordinates or XPath expressions, agents describe what they're looking for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text_contains"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Settings"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"button"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"near"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text_contains"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Password"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"direction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"below"&lt;/span&gt;&lt;span class="p"&gt;}}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text_regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d+ results"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"bounds_hint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"top_half"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Predicates support text matching (exact, substring, regex), element types, accessibility labels, spatial relationships (&lt;code&gt;near&lt;/code&gt; with direction and distance), screen regions (&lt;code&gt;bounds_hint&lt;/code&gt;), and disambiguation by index. They work identically on iOS and Android. The agent never writes platform-specific code.&lt;/p&gt;

&lt;p&gt;This predicate-based approach is the foundation of agent-driven mobile test automation — the agent describes intent, and MobAI resolves it at runtime. That's what separates AI-powered mobile automation from traditional scripting.&lt;/p&gt;
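&lt;p&gt;When several elements match the same predicate, disambiguation by index picks one deterministically. A hypothetical example (the &lt;code&gt;index&lt;/code&gt; key name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{"predicate": {"type": "button", "text": "Delete", "index": 1}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;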

&lt;h2&gt;
  
  
  From AI Exploration to Deterministic Mobile Tests
&lt;/h2&gt;

&lt;p&gt;AI agents are naturally exploratory. They observe a screen, reason about it, take an action, observe again. That's great for discovery — but eventually you want deterministic, repeatable test cases that run in CI.&lt;/p&gt;

&lt;p&gt;MobAI bridges this gap with &lt;code&gt;.mob&lt;/code&gt; scripts — a human-readable, line-based format for cross-platform mobile automation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight robot_framework"&gt;&lt;code&gt;# Tags: smoke, auth&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;On-Fail:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;retry&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;open&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;com.example.myapp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;wait&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;stable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;3000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;tap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"Sign&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;In"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"user@test.com"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;near&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"Email"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;below&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"password123"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;near&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"Password"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;below&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;tap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"Continue"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;wait&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;stable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;5000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;assert&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;exists&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"Welcome&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;back"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line maps to one DSL step. An agent can create these scripts during exploration, then replay them deterministically. They're diffable in git, reviewable by humans, and executable in CI through MobAI's testing runner. The workflow is: agent explores → agent writes &lt;code&gt;.mob&lt;/code&gt; script → human reviews → CI runs → regressions caught.&lt;/p&gt;
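&lt;p&gt;The mapping is direct: the first &lt;code&gt;type&lt;/code&gt; line in the script above corresponds to roughly the same DSL step shown in the JSON example earlier in this post, though the exact lowering may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{"action": "type", "text": "user@test.com", "predicate": {"type": "input", "near": {"text_contains": "Email", "direction": "below"}}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;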

&lt;p&gt;Platform-specific blocks handle iOS and Android divergence in the same file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight robot_framework"&gt;&lt;code&gt;#[ios]&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;tap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"Allow"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#[end]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;#[android]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;tap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;"While&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;using&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;app"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#[end]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Detecting Animation Bugs and UI Transition Issues Automatically
&lt;/h2&gt;

&lt;p&gt;Unit tests verify logic. Snapshot tests verify layout. But neither catches a janky navigation transition, a white flash between screens, or a loading spinner that stutters before disappearing. These are visual, temporal bugs — and they've historically required a human staring at a phone to spot.&lt;/p&gt;

&lt;p&gt;MobAI's &lt;code&gt;record_start&lt;/code&gt; / &lt;code&gt;record_stop&lt;/code&gt; actions capture screenshots as fast as the device can produce them while other actions execute. Frames are grabbed continuously in the background — every capture starts the moment the previous one finishes. When the recording stops, all frames are saved to disk and run through computer vision analysis that flags anomalies automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Jump&lt;/strong&gt; — a sudden large visual change between consecutive frames (layout snapping instead of animating)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flash&lt;/strong&gt; — a brief brightness spike, like a white or black frame that appears for a single capture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stutter&lt;/strong&gt; — frame N and frame N+2 look nearly identical, but frame N+1 is different (a flicker)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structural change&lt;/strong&gt; — content shifts that are subtle in raw pixels but change the texture of the screen (catches dark-on-dark transitions that pixel diffs miss)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incoherent motion&lt;/strong&gt; — blocks on screen moving in inconsistent directions (layout jump vs. smooth animation)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"record_start"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Next"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wait_for"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Welcome"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"timeout_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"record_stop"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result includes a &lt;code&gt;transition_hints&lt;/code&gt; array — each hint tells the agent which frames are involved, what type of anomaly was detected, where on screen it occurred, and how severe it is. The agent doesn't need to eyeball 40 frames. It reads the transition hints to find flagged anomalies, then opens the specific frame screenshots to confirm visually whether each is a real issue or a false positive.&lt;/p&gt;
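&lt;p&gt;A single hint might look something like the following. The field names beyond &lt;code&gt;transition_hints&lt;/code&gt; are illustrative, chosen only to reflect the information described above (frames, anomaly type, location, severity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "transition_hints": [
    {"type": "flash", "frames": [12, 13], "region": "full_screen", "severity": "high"}
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;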

&lt;p&gt;This turns animation quality from a subjective human judgment into something an AI agent can measure, flag, and track across releases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Built-In Failure Handling for Reliable Mobile Automation
&lt;/h2&gt;

&lt;p&gt;Mobile automation is flaky by nature. Screens take time to load. Animations play. Network calls hang. Traditional mobile testing puts all the retry logic on the caller — which means the agent has to reason about failure handling at every step.&lt;/p&gt;

&lt;p&gt;MobAI moves failure handling into the DSL itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"on_fail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"retry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_retries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"retry_delay_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fallback_strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"strategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"skip"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five strategies: &lt;code&gt;abort&lt;/code&gt;, &lt;code&gt;skip&lt;/code&gt;, &lt;code&gt;retry&lt;/code&gt;, &lt;code&gt;replan&lt;/code&gt; (ask the agent to re-evaluate), and &lt;code&gt;require_user&lt;/code&gt; (pause for human input). These can be set per step or for the entire script, with fallback chains. The agent sends its intent; MobAI handles the resilience.&lt;/p&gt;
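&lt;p&gt;A per-step override lets one optional interaction (say, dismissing a promo dialog) fail quietly while the rest of the script stays strict. A sketch; the placement of &lt;code&gt;on_fail&lt;/code&gt; inside a step is assumed from the script-level syntax above, not confirmed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{"action": "tap", "predicate": {"text": "Maybe Later"}, "on_fail": {"strategy": "skip"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;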

&lt;h2&gt;
  
  
  Automating Native Apps and WebViews in the Same Flow
&lt;/h2&gt;

&lt;p&gt;Modern apps aren't purely native. WebViews are everywhere — payment flows, embedded content, hybrid frameworks. MobAI handles both native and web automation through the same DSL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"select_web_context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"page_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"web"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"css_selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#checkout-btn"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute_js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"script"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"document.title"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent switches between native and web automation seamlessly. Native chrome (navigation bars, tab bars) uses accessibility-based targeting. In-page content uses CSS selectors and JavaScript execution. Same DSL, same call, same device.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Agents Can Do with Mobile Device Automation
&lt;/h2&gt;

&lt;p&gt;Once an AI agent has reliable automated mobile testing and device automation capabilities, the applications go well beyond simple test scripts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Build and Verify in the Same Session
&lt;/h3&gt;

&lt;p&gt;The agent writes a feature in your codebase, then launches the app on a connected device to visually verify it works — not just that it compiles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Autonomous Mobile QA
&lt;/h3&gt;

&lt;p&gt;Describe a test in natural language. The agent translates it to a &lt;code&gt;.mob&lt;/code&gt; script, runs it, captures screenshots at each step, and reports pass/fail with visual evidence.&lt;/p&gt;
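
&lt;p&gt;As an illustrative sketch — the CSS selector belongs to a hypothetical app, and only actions shown earlier in this post are used; treat the surrounding &lt;code&gt;.mob&lt;/code&gt; file layout as out of scope — a request like "check that tapping Log in navigates away from the login page" might become a small batch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{"action": "select_web_context", "page_id": 0},
{"action": "tap", "context": "web", "predicate": {"css_selector": "#login-btn"}},
{"action": "execute_js", "script": "location.pathname"}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The agent then compares the returned pathname against its expectation and attaches the per-step screenshots as evidence.&lt;/p&gt;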

&lt;h3&gt;
  
  
  Accessibility Audits on Real Devices
&lt;/h3&gt;

&lt;p&gt;The agent navigates every screen, inspects the accessibility tree for missing labels, small tap targets, and broken semantics — then writes a report.&lt;/p&gt;

&lt;h3&gt;
  
  
  Competitor Research on Real Devices
&lt;/h3&gt;

&lt;p&gt;Install a competitor's app, walk through their onboarding, screenshot their paywall, and generate a comparison report. On a real device, not a browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Localization Testing
&lt;/h3&gt;

&lt;p&gt;Switch device language, navigate key flows, capture screenshots per locale, flag truncated strings and layout breaks.&lt;/p&gt;

&lt;p&gt;These aren't hypothetical. They're workflows MobAI users run today through Claude Code and other MCP-compatible agents.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Ready to give your AI agent mobile device control?&lt;/strong&gt; &lt;a href="https://mobai.run" rel="noopener noreferrer"&gt;Download MobAI&lt;/a&gt; — connect a device, start the bridge, and your agent has a phone. No Appium, no Selenium, no YAML.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is MobAI?
&lt;/h3&gt;

&lt;p&gt;MobAI is a desktop application that enables AI agents to control real iOS and Android devices. It exposes mobile device automation through an MCP server and HTTP API, allowing tools like Claude Code, Cursor, and Codex to tap, swipe, type, take screenshots, and run automated tests on physical phones and simulators.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is MobAI different from Appium?
&lt;/h3&gt;

&lt;p&gt;Appium was designed for human-written test scripts with verbose XML page sources and XPath selectors. MobAI was designed for AI agents, with compact accessibility tree snapshots, semantic predicate-based element targeting, batched DSL execution, and built-in failure handling — all optimized to fit within an LLM's context window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need a real device, or does MobAI work with simulators?
&lt;/h3&gt;

&lt;p&gt;MobAI works with both. You can connect physical iOS and Android devices via USB, or use iOS Simulators and Android Emulators. Real devices are recommended for testing camera, biometrics, and hardware-specific behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  What AI agents work with MobAI?
&lt;/h3&gt;

&lt;p&gt;Any AI agent that supports &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt; or HTTP can use MobAI. This includes &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, &lt;a href="https://openai.com/index/introducing-codex/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;, and any custom agent built with the &lt;a href="https://docs.anthropic.com/en/docs/agents-and-tools/agent-sdk" rel="noopener noreferrer"&gt;Claude Agent SDK&lt;/a&gt; or similar frameworks.&lt;/p&gt;
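
&lt;p&gt;MCP-compatible clients are typically pointed at a server through a JSON config. The shape below is the standard &lt;code&gt;mcpServers&lt;/code&gt; format used by Claude Code and similar tools; the server name and URL are placeholders, not MobAI's actual values — consult the MobAI docs for the real endpoint:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "mcpServers": {
    "mobai": {
      "type": "http",
      "url": "http://localhost:PORT/mcp"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;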

&lt;h3&gt;
  
  
  Can MobAI automate WebViews inside native apps?
&lt;/h3&gt;

&lt;p&gt;Yes. MobAI supports both native UI automation (via accessibility trees) and web automation inside WebViews (via CSS selectors and JavaScript execution). You can switch between native and web contexts within the same DSL script.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>mcp</category>
      <category>mobile</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
