📌 Introduction
Over the past decade, test automation has evolved significantly — from handwritten scripts, to the Page Object Model, and now to no-code platforms.
But element selection remains the most tedious and brittle part of UI testing.
Frameworks like Selenium, Playwright, or Puppeteer are great at automating actions like .click() or .type(). However, they still rely heavily on manual selectors (XPath or CSS) to locate elements — and that’s where the pain lies.
Large language models (LLMs) now make it possible to separate:
What to do (natural language instruction)
from
Where to do it (element selector)
In this tutorial, you’ll build a working solution using:
- 🧠 Talk2Dom — to convert natural language into precise element selectors
- 🧪 Selenium — to execute actions in a real browser
This means you can now write a test like:
“Find the login button”
…and watch it execute automatically.
🛠️ Quickstart: One-Click Setup with Docker Compose
The Talk2Dom repository ships pre-configured. After cloning it, run the following from the repository root:
docker compose up -d
This starts:
- The Talk2Dom backend (API server)
- A database that stores all projects and tokens
Once up and running, you’re ready to test!
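Containers started with `-d` can take a few seconds before they accept requests. As a rough sketch, the helper below polls a TCP port until something is listening. `wait_for_port` is a hypothetical name, not part of Talk2Dom, and port 8000 is assumed from the API URL used later in e2e.py.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not ready yet (refused or unreachable); retry shortly.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port("localhost", 8000)` before the test script runs should avoid connection errors from a backend that is still starting. It only confirms the port is open, not that the API is fully initialized.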
🔁 End-to-End Test Script (e2e.py)
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

# 1. Start a remote Chrome session (assumes a Selenium server on localhost:4444).
# Recent Selenium 4 releases no longer accept desired_capabilities; pass options instead.
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=webdriver.ChromeOptions(),
)

try:
    driver.get("https://example.com")
    time.sleep(2)  # Crude extra wait for dynamically rendered content
    html = driver.page_source

    # 2. Call the Talk2Dom API to locate the element
    headers = {
        "x-api-key": "your_api_key",
        "x-project-id": "your_project_id",
    }
    resp = requests.post(
        "http://localhost:8000/api/v1/inference/locator",
        headers=headers,
        json={
            "instruction": "find the login button",
            "html": html,
            "url": driver.current_url,
        },
        timeout=60,
    )
    resp.raise_for_status()  # Fail fast on HTTP errors
    data = resp.json()
    selector = data["selector_value"]

    # 3. Use Selenium to perform the action (the selector is treated as CSS here)
    el = driver.find_element(By.CSS_SELECTOR, selector)
    el.click()
    print("✅ Test completed.")
finally:
    driver.quit()  # Always close the browser, even if a step fails
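Once this works for one instruction, you will probably want to reuse it. Here is a minimal sketch that wraps steps 2 and 3 in a function and checks the response before handing the selector to Selenium. `extract_selector` and `click_by_instruction` are hypothetical helper names. The only response field assumed is `selector_value`, as used in the script, and `post` is expected to be `requests.post` or something with the same call shape.

```python
from typing import Any, Callable, Dict

# Endpoint taken from the e2e.py script.
LOCATOR_URL = "http://localhost:8000/api/v1/inference/locator"


def extract_selector(data: Any) -> str:
    """Return the selector from a locator response, or raise ValueError."""
    selector = data.get("selector_value") if isinstance(data, dict) else None
    if not isinstance(selector, str) or not selector.strip():
        raise ValueError(f"Locator response has no usable selector: {data!r}")
    return selector.strip()


def click_by_instruction(
    driver: Any,
    instruction: str,
    post: Callable[..., Any],
    headers: Dict[str, str],
) -> str:
    """Ask the locator API for a selector, click the element, return the selector."""
    resp = post(
        LOCATOR_URL,
        headers=headers,
        json={
            "instruction": instruction,
            "html": driver.page_source,
            "url": driver.current_url,
        },
        timeout=60,
    )
    resp.raise_for_status()
    selector = extract_selector(resp.json())
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    driver.find_element("css selector", selector).click()
    return selector
```

In a test this reads as `click_by_instruction(driver, "find the login button", requests.post, headers)`. Passing `post` in as a parameter also lets you unit-test the helper with a stub instead of a live backend.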
🔐 Security Best Practices
- ✅ Deploy behind an internal firewall (optional)
- ✅ Limit HTML payload size to guard against oversized or abusive requests
- ✅ Use API key authentication if the service is public
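Two of these practices can be applied on the client side as well. The sketch below drops `<script>` and `<style>` blocks before sending the page, rejects payloads over a size cap, and reads credentials from environment variables instead of hard-coding them. The 500 KB cap, the helper names, and the variable names `TALK2DOM_API_KEY` and `TALK2DOM_PROJECT_ID` are illustrative assumptions, not Talk2Dom settings. The regex is a simple heuristic rather than a full HTML parser.

```python
import os
import re

MAX_HTML_BYTES = 500_000  # Hypothetical cap; tune it for your pages

# Matches whole <script>...</script> and <style>...</style> blocks.
_SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)


def prepare_html(html: str, max_bytes: int = MAX_HTML_BYTES) -> str:
    """Drop script/style blocks, then reject payloads larger than max_bytes."""
    slim = _SCRIPT_STYLE.sub("", html)
    size = len(slim.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"HTML payload is {size} bytes; limit is {max_bytes}")
    return slim


def auth_headers() -> dict:
    """Build the request headers from environment variables (names are assumed)."""
    return {
        "x-api-key": os.environ["TALK2DOM_API_KEY"],
        "x-project-id": os.environ["TALK2DOM_PROJECT_ID"],
    }
```

In the test script this would replace the raw `driver.page_source` with `prepare_html(driver.page_source)` and the literal `headers` dict with `auth_headers()`.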
📊 Conclusion
By combining:
- Talk2Dom for natural language → selector translation
- Selenium for real browser interaction
…we decouple what you want to test from how it’s implemented.
This clean separation leads to a powerful and lightweight AI-assisted test automation workflow.
🔥 Benefits
- Boosts developer productivity
- Enables QA teams to write tests in plain English
- Lays the groundwork for a low-code automation platform