krish pavuluri
How We Built a Chat AI Agent Into Live Device Testing Sessions

We ship a cloud device farm — real Android and iOS devices you can control from a browser. Our users are mostly SDETs and QA engineers running Appium tests.

The problem we kept hearing: finding the right locator wastes too much time.

The typical workflow looks like this:

  1. Open a device session
  2. Notice an element on screen
  3. Switch to Appium Inspector
  4. Inspect the element tree
  5. Copy the locator
  6. Paste it into your test
  7. Run the test, watch it fail, go back to step 3

We wanted to collapse that loop. So we built a Chat AI Agent that lives inside the device session.

What It Does

The agent can see the live device screen. You can ask it in plain English:

  • "What's the XPath for the equals button?"
  • "Give me a UIAutomator2 selector for the digit 7"
  • "What's the Accessibility ID of the login button?"

And it responds with working locators, formatted for whichever client language or framework you're using (Java, Python, Swift, Kotlin, WebDriverIO).

No switching tools. No Appium Inspector. Just ask.

How We Built It

Screen visibility

Our sessions already stream device screens via WebRTC. We grab frames from the stream at the point of the user's question — a single screenshot at query time. This keeps latency low and avoids sending a continuous video feed to the model.

The model

We send the screenshot + user message to a vision-capable LLM. The prompt is structured to return locators in a specific format — we parse the response and render it with syntax highlighting in the UI.
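The shape of that structured exchange looks roughly like the sketch below: the system prompt pins the reply to a JSON object, and the backend parses it strictly so anything non-JSON fails fast. The prompt wording and the `locators` key are illustrative assumptions, not our exact production prompt.

```python
import json

# Hypothetical system prompt: force the model to answer with machine-readable JSON.
SYSTEM_PROMPT = (
    "You are given a screenshot of a mobile device screen and a question about a "
    "UI element. Respond with JSON only, no prose: "
    '{"element": "<short description>", "locators": {"<format>": "<value>"}}'
)


def parse_locator_response(raw: str) -> dict:
    """Parse the model's reply into {format: locator}; raises if it isn't clean JSON."""
    data = json.loads(raw)
    return data["locators"]


reply = (
    '{"element": "equals button", "locators": {'
    '"xpath": "//android.widget.Button[@content-desc=\'equals\']", '
    '"accessibility_id": "equals"}}'
)
locators = parse_locator_response(reply)
```

A strict parse like this is what makes syntax highlighting in the UI reliable: each locator arrives as a clean string keyed by format, never buried in prose.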

Locator formats

We support:

  • XPath
  • CSS Selector
  • UIAutomator2 (Android)
  • XCUITest (iOS)
  • Accessibility ID

The model is instructed to return all applicable formats for the visible element, not just one.
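Since some formats are platform-specific, the response can be filtered against the session's platform before rendering. A minimal sketch, assuming simple format-name sets (the set contents mirror the list above; the function names are hypothetical):

```python
# Which locator formats apply to each platform (CSS selectors apply in webview contexts).
ANDROID_FORMATS = {"xpath", "css_selector", "uiautomator2", "accessibility_id"}
IOS_FORMATS = {"xpath", "css_selector", "xcuitest", "accessibility_id"}


def applicable_formats(platform: str) -> set[str]:
    return ANDROID_FORMATS if platform == "android" else IOS_FORMATS


def filter_locators(platform: str, locators: dict) -> dict:
    """Drop any formats the model returned that don't apply to this session's device."""
    allowed = applicable_formats(platform)
    return {fmt: loc for fmt, loc in locators.items() if fmt in allowed}
```

So a UIAutomator2 selector returned during an iOS session is silently dropped rather than shown to the user.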

Code output

Users pick their language from a dropdown (Java, Python, Swift, Kotlin, WebDriverIO). We wrap the locator in idiomatic framework code for each:

```python
# Python / Appium
driver.find_element(AppiumBy.XPATH, "//android.widget.Button[@content-desc='equals']")
```

```java
// Java / Appium
driver.findElement(By.xpath("//android.widget.Button[@content-desc='equals']"));
```
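The per-language wrapping reduces to a template table: one snippet shape per target, filled with the raw locator. A simplified sketch with three of the five targets (the template table and function name are illustrative, not our actual renderer):

```python
# Hypothetical template table: one idiomatic snippet shape per client language.
TEMPLATES = {
    "python": 'driver.find_element(AppiumBy.XPATH, "{xpath}")',
    "java": 'driver.findElement(By.xpath("{xpath}"));',
    "webdriverio": "await $('{xpath}')",
}


def wrap_locator(language: str, xpath: str) -> str:
    """Render a raw XPath as idiomatic client code for the chosen language."""
    return TEMPLATES[language].format(xpath=xpath)


snippet = wrap_locator("java", "//android.widget.Button[@content-desc='equals']")
```

Keeping code generation in templates rather than asking the model for final code also means the snippet syntax never varies between answers.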

UI integration

The panel sits alongside the device stream — it doesn't overlay the screen. Users can keep testing while asking questions. The conversation history stays within the session.

What We Learned

The hardest part wasn't the AI integration — it was the prompt engineering. Getting the model to return clean, parseable locator output (not prose with embedded code) required iteration.
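One recurring failure mode is a model wrapping its JSON in markdown fences or leading prose despite instructions. A defensive extraction step like the sketch below (a common pattern, not our exact code) made parsing far more robust:

```python
import json
import re


def extract_json(raw: str) -> dict:
    """Pull the JSON object out of a reply that may be wrapped in fences or prose."""
    # Prefer a fenced ```json block if one is present.
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Fall back to the outermost braces in the remaining text.
    start, end = candidate.find("{"), candidate.rfind("}")
    return json.loads(candidate[start : end + 1])


messy = 'Here you go:\n```json\n{"locators": {"xpath": "//btn"}}\n```'
clean = extract_json(messy)
```

The prompt still demands bare JSON; the extractor just stops a stray fence from turning into a user-facing error.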

We also found that grounding the model on the visible screen state (not a DOM or accessibility tree) made responses feel more natural. Users think in terms of what they see, not what's in the XML hierarchy.

Try It

The Chat AI Agent is live now in the RobotActions portal. Free trial available.

We'd love feedback from anyone doing Appium or mobile automation — especially if you've built similar tooling. Drop a comment or reach out directly.
