📌 Introduction
Over the past decade, test automation has evolved significantly — from handwritten scripts, to the Page Object Model, and now to no-code platforms.
But element selection remains the most tedious and brittle part of UI testing.
Frameworks like Selenium, Playwright, or Puppeteer are great at automating actions like .click() or .type(). However, they still rely heavily on manual selectors (XPath or CSS) to locate elements, and that’s where the pain lies.
Thanks to AI, especially large language models (LLMs), we can now separate what to do (a natural-language instruction) from where to do it (an element selector).
In this tutorial, you’ll build a working solution using:
- 🧠 Talk2Dom — to convert natural language into precise element selectors
- 🧪 Selenium — to execute actions in a real browser
This means you can write a test step like “Find the login button” and have it resolved to a selector and executed automatically.
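To make the split between “what” and “where” concrete, here is a minimal Python sketch. The Step class, the plan() function, and the dictionary-backed resolver are hypothetical illustrations and are not part of Talk2Dom. The test is written only as natural-language steps, and a separate resolver supplies the selector for each one. In a real run the resolver would call the Talk2Dom locator API. Here a dict stands in for it so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One test step: WHAT to do, written in natural language (hypothetical)."""
    action: str                 # "click" or "type"
    instruction: str            # e.g. "find the login button"
    text: Optional[str] = None  # text to enter, used only by "type" steps


def plan(steps: List[Step], resolve: Callable[[str], str]) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each step with a selector (WHERE) obtained from `resolve`."""
    planned = []
    for step in steps:
        if step.action not in ("click", "type"):
            raise ValueError(f"unsupported action: {step.action}")
        planned.append((step.action, resolve(step.instruction), step.text))
    return planned


# Stand-in resolver: a dict lookup instead of a real Talk2Dom API call
fake_selectors = {
    "find the username field": "#username",
    "find the login button": "button[type=submit]",
}
steps = [
    Step("type", "find the username field", "alice"),
    Step("click", "find the login button"),
]
print(plan(steps, fake_selectors.__getitem__))
# → [('type', '#username', 'alice'), ('click', 'button[type=submit]', None)]
```

Because plan() sees only instructions, you can replace the stand-in resolver with a real API call and leave the test steps unchanged.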
🛠️ Quickstart: One-Click Setup with Docker Compose
After cloning the Talk2Dom repository, everything is pre-configured.
Simply run:
docker compose up -d
This starts:
- The Talk2Dom backend (API server)
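- A database that stores all projects and API tokens
Once both are up and running, you’re ready to test. The end-to-end script below also connects to a Selenium server at localhost:4444. If one isn’t already running, start it separately, for example with the selenium/standalone-chrome Docker image.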
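Containers can take a few seconds before they accept connections, so it can help to wait for them before launching the test. The sketch below is a hypothetical helper and is not shipped with Talk2Dom. It polls TCP ports using Python's standard socket module. The ports (8000 for the Talk2Dom API, 4444 for Selenium) come from the e2e script.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves something is listening on the port,
            # not that the service behind it has finished initialising.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def services_ready(timeout: float = 30.0) -> bool:
    """Check the two ports used by the e2e script: Talk2Dom API (8000) and Selenium (4444)."""
    return all(wait_for_port("localhost", port, timeout) for port in (8000, 4444))
```

Call services_ready() before running the test, and abort if it returns False.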
🔁 End-to-End Test Script (e2e.py)
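import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Talk2Dom API endpoint and credentials (replace with your own values)
API_URL = "http://localhost:8000/api/v1/inference/locator"
HEADERS = {
    "x-api-key": "your_api_key",
    "x-project-id": "your_project_id",
}

# 1. Start Chrome on the remote Selenium server (Selenium 4 API: options, not desired_capabilities)
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=webdriver.ChromeOptions(),
)
try:
    # Replace with the page under test; example.com itself has no login button
    driver.get("https://example.com")
    # Wait until the document has finished loading instead of sleeping a fixed time
    WebDriverWait(driver, 10).until(
        lambda d: d.execute_script("return document.readyState") == "complete"
    )
    html = driver.page_source

    # 2. Ask Talk2Dom to translate the instruction into a selector
    resp = requests.post(API_URL, headers=HEADERS, timeout=60, json={
        "instruction": "find the login button",
        "html": html,
        "url": driver.current_url,
    })
    resp.raise_for_status()  # fail fast on authentication or server errors
    selector = resp.json()["selector_value"]

    # 3. Use Selenium to perform the action (assumes the API returns a CSS selector)
    el = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
    )
    el.click()
    print("✅ Test completed.")
finally:
    # Always release the remote browser session, even if a step fails
    driver.quit()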
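When a test has more than one step, it helps to wrap steps 2 and 3 in a reusable function. The sketch below shows one hypothetical way to do that. It uses the standard library's urllib instead of requests, so it has no extra dependencies. It reuses the endpoint, headers, and selector_value field from the script above and, like the script, assumes the returned selector is CSS.

```python
import json
import urllib.request

# Endpoint and field names copied from the e2e script
LOCATOR_URL = "http://localhost:8000/api/v1/inference/locator"


def build_locator_request(instruction: str, html: str, page_url: str,
                          api_key: str, project_id: str,
                          endpoint: str = LOCATOR_URL) -> urllib.request.Request:
    """Build the POST request that asks Talk2Dom for a selector."""
    body = json.dumps({"instruction": instruction, "html": html, "url": page_url})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
    )


def locate(driver, instruction: str, api_key: str, project_id: str,
           endpoint: str = LOCATOR_URL, timeout: float = 60.0):
    """Resolve `instruction` on the driver's current page and return the WebElement."""
    request = build_locator_request(instruction, driver.page_source,
                                    driver.current_url, api_key, project_id, endpoint)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, similar to raise_for_status()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # "css selector" is the value of Selenium's By.CSS_SELECTOR; assumes a CSS selector
    return driver.find_element("css selector", data["selector_value"])
```

With this helper, each step reduces to one call, such as locate(driver, "find the login button", api_key, project_id).click().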
🔐 Security Best Practices
- ✅ Deploy behind an internal firewall (optional)
- ✅ Limit HTML payload size so oversized pages cannot exhaust server resources
- ✅ Require API key authentication (the x-api-key header) if the service is publicly reachable
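The sketch below shows one way to apply the payload-size advice on the client side. The 500,000-byte cap (MAX_HTML_BYTES) is a hypothetical value. Stripping script, style, and comment blocks before sending is our own assumption: it shrinks the payload without touching most visible elements. The regular expression is a heuristic, not a full HTML parser, so the server should still enforce its own limit.

```python
import re

# Hypothetical client-side cap; keep it at or below whatever the server enforces
MAX_HTML_BYTES = 500_000

# Heuristic patterns for blocks that rarely help locate visible elements.
# A regex is not an HTML parser; unusual markup may slip through.
_NOISE = re.compile(
    r"<script\b.*?</script\s*>|<style\b.*?</style\s*>|<!--.*?-->",
    re.IGNORECASE | re.DOTALL,
)


def prepare_html(html: str, max_bytes: int = MAX_HTML_BYTES) -> str:
    """Strip scripts, styles and comments, then reject payloads larger than max_bytes."""
    trimmed = _NOISE.sub("", html)
    size = len(trimmed.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"HTML payload is {size} bytes; the limit is {max_bytes}")
    return trimmed
```

In the e2e script, you would then send prepare_html(driver.page_source) instead of the raw page source.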
📊 Conclusion
By combining:
- Talk2Dom for natural language → selector translation
- Selenium for real browser interaction
…you decouple what you want to test from how the page is implemented.
This separation yields a lightweight AI-assisted test automation workflow: when the markup changes, the instruction stays the same and only the generated selector changes.
🔥 Benefits
- Boosts developer productivity by removing the need to hand-write and maintain selectors
- Enables QA teams to write tests in plain English
- Lays the groundwork for a low-code automation platform