DEV Community

冯键(FENG JIAN)


🚀 Build an AI-Powered Test Automation Platform from Scratch

📌 Introduction

Over the past decade, test automation has evolved significantly — from handwritten scripts, to the Page Object Model, and now to no-code platforms.

But element selection remains the most tedious and brittle part of UI testing.

Frameworks like Selenium, Playwright, and Puppeteer are great at automating actions such as clicking or typing. However, they still rely heavily on hand-written selectors (XPath or CSS) to locate elements, and that’s where the pain lies.

Thanks to AI — especially large language models (LLMs) — we now have the ability to separate:

What to do (natural language instruction)
from
Where to do it (element selector)

In this tutorial, you’ll build a working solution using:

  • 🧠 Talk2Dom — to convert natural language into precise element selectors
  • 🧪 Selenium — to execute actions in a real browser

This means you can now write a test like:

“Find the login button”

…and watch it execute automatically.


🛠️ Quickstart: One-Click Setup with Docker Compose

After cloning the Talk2Dom repository, everything is pre-configured.

Simply run:

docker compose up -d

This starts:

  • The Talk2Dom backend (API server)
  • A database that stores all projects and API tokens

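Once the services are up, you’re ready to test! The e2e.py script below also connects to a Selenium server at http://localhost:4444, so make sure one is running as well (for example, a selenium/standalone-chrome container).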
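Because `docker compose up -d` returns as soon as the containers are started, the services may still be initializing when you launch a test. A small readiness check can help. The sketch below is a hypothetical `wait_for_port` helper using only the standard library; it simply waits until a TCP port accepts connections, which does not guarantee that the API behind it is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts TCP connections, False after `timeout` seconds.

    Hypothetical helper: an open port only means something is listening,
    not that the service behind it is ready to answer requests.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means the port is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a bit and retry
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8000)` for the Talk2Dom API and `wait_for_port("localhost", 4444)` for the Selenium server that e2e.py expects.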


🔁 End-to-End Test Script (e2e.py)

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Assumes a Selenium server (e.g. standalone Chrome) at localhost:4444
# and the Talk2Dom API at localhost:8000.
SELENIUM_URL = "http://localhost:4444/wd/hub"
TALK2DOM_URL = "http://localhost:8000/api/v1/inference/locator"

# 1. Start a remote Chrome session (Selenium 4 passes capabilities via options)
driver = webdriver.Remote(
    command_executor=SELENIUM_URL,
    options=webdriver.ChromeOptions(),
)
try:
    driver.get("https://example.com")
    # Wait until the page has finished loading instead of sleeping blindly
    WebDriverWait(driver, 10).until(
        lambda d: d.execute_script("return document.readyState") == "complete"
    )
    html = driver.page_source

    # 2. Call the Talk2Dom API to turn the instruction into a selector
    headers = {
        "x-api-key": "your_api_key",
        "x-project-id": "your_project_id",
    }
    resp = requests.post(
        TALK2DOM_URL,
        headers=headers,
        json={
            "instruction": "find the login button",
            "html": html,
            "url": driver.current_url,
        },
        timeout=30,
    )
    resp.raise_for_status()  # fail fast on auth or server errors
    selector = resp.json()["selector_value"]  # used as a CSS selector below

    # 3. Use Selenium to perform the action
    el = driver.find_element(By.CSS_SELECTOR, selector)
    el.click()
    print("✅ Test completed.")
finally:
    driver.quit()  # always release the browser session, even on failure
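To reuse steps 2 and 3 across many tests, they could be wrapped in a single helper. The sketch below is hypothetical: the name `find_by_instruction` and its injectable `post` parameter are not part of Talk2Dom. It assumes the same endpoint, headers, and `selector_value` response field as e2e.py, and like the script it treats the returned selector as CSS.

```python
from typing import Any, Callable, Optional

# Same endpoint as e2e.py (assumed local Talk2Dom deployment)
TALK2DOM_URL = "http://localhost:8000/api/v1/inference/locator"


def find_by_instruction(
    driver: Any,
    instruction: str,
    api_key: str,
    project_id: str,
    post: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper: ask Talk2Dom for a selector, then return the matching element."""
    if post is None:
        import requests  # imported lazily so tests can inject a fake `post`

        post = requests.post
    resp = post(
        TALK2DOM_URL,
        headers={"x-api-key": api_key, "x-project-id": project_id},
        json={
            "instruction": instruction,
            "html": driver.page_source,
            "url": driver.current_url,
        },
        timeout=30,
    )
    resp.raise_for_status()
    selector = resp.json()["selector_value"]
    # "css selector" is the value of Selenium's By.CSS_SELECTOR
    return driver.find_element("css selector", selector)
```

A test step then reads close to plain English: `find_by_instruction(driver, "find the login button", "your_api_key", "your_project_id").click()`.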

🔐 Security Best Practices

  • ✅ Deploy behind an internal firewall (optional)
  • ✅ Limit HTML payload size to reject oversized requests, and treat page HTML as untrusted input (it can carry injected instructions aimed at the LLM)
  • ✅ Use API key authentication if the service is public
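As a sketch of the payload-size guideline, the hypothetical `shrink_html` helper below could run on the client before calling the Talk2Dom API. It assumes that dropping `<script>` and `<style>` blocks does not hurt locator quality, and its 500 KB limit is an arbitrary placeholder; neither comes from Talk2Dom's documentation. The regex-based stripping is deliberately crude and is not an HTML sanitizer.

```python
import re

# Matches <script ...>...</script> and <style ...>...</style>, case-insensitively, across lines
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)


def shrink_html(html: str, max_bytes: int = 500_000) -> str:
    """Hypothetical helper: drop script/style blocks, then refuse payloads still over max_bytes."""
    slim = SCRIPT_STYLE_RE.sub("", html)
    size = len(slim.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"HTML payload is {size} bytes, limit is {max_bytes}")
    return slim
```

In e2e.py, this would mean sending `shrink_html(driver.page_source)` instead of the raw page source. A server-side limit is still needed, since clients can skip this step.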

📊 Conclusion

By combining:

  • Talk2Dom for natural language → selector translation
  • Selenium for real browser interaction

…we decouple what you want to test from how it’s implemented.

This clean separation leads to a powerful and lightweight AI-assisted test automation workflow.

🔥 Benefits

  • Boosts developer productivity
  • Enables QA teams to write tests in plain English
  • Lays the groundwork for a low-code automation platform
