I got tired of manually inspecting HTML to find selectors. So I taught my framework to do it instead.
Here’s a question that kept me up at night:
Why am I spending more time finding selectors than writing actual tests?
I watched myself burn 30 minutes on a simple login test — not writing the test itself, but hunting through DevTools for the right selectors, creating fixture files, and crafting test data that would actually work.
What if the framework could just… look at the page and figure it out?
## The Problem Nobody Talks About
Here’s the dirty secret of test automation: writing the actual test is the easy part.
The hard part? Finding #username vs input[name="user"] vs .login-field. Creating realistic test data. Building fixture files that match the actual form structure.
Every new page means:

- Open DevTools
- Inspect elements
- Copy selectors
- Hope they're stable
- Create JSON fixtures
- Hope nothing changes tomorrow
Most “AI-powered” testing tools focus on running tests or analyzing failures. But what about the beginning — the tedious setup that drains your time before you write a single assertion?
## The Experiment: Teaching AI to See
The idea was simple but audacious: give the AI a URL and let it figure out everything else.
Not mock data. Not hardcoded selectors. Real selectors from real HTML.
Here’s what I wanted:
```bash
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login
```
And the framework should:

1. Fetch the actual page
2. Analyze the HTML structure
3. Extract real, working selectors
4. Generate meaningful test cases
5. Save everything as a Cypress fixture
6. Then generate tests that use that data
Sounds impossible? I thought so too.
## How It Actually Works
The magic happens in about 50 lines of Python:
```python
import json

import requests
from langchain_openai import ChatOpenAI

def generate_test_data_from_url(url: str, requirements: list) -> dict:
    # Step 1: Fetch the real page
    resp = requests.get(url, timeout=10, headers={'User-Agent': 'Mozilla/5.0'})
    html = resp.text[:5000]  # First 5KB is usually enough

    # Step 2: Ask AI to analyze it
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    prompt = f"""Analyze this HTML and generate test data.
URL: {url}
HTML: {html}
Return JSON with:
- Real selectors from the HTML
- Valid test case with working data
- Invalid test case for error handling
"""

    # Step 3: Parse and save as fixture
    test_data = json.loads(llm.invoke(prompt).content)
    with open("cypress/fixtures/url_test_data.json", "w") as f:
        json.dump(test_data, f, indent=2)
    return test_data
```
The AI doesn’t guess. It reads the actual HTML and extracts what’s really there.
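One practical wrinkle: models often wrap their JSON reply in a markdown fence, which makes a bare `json.loads` blow up. A small helper can strip that before parsing. This is my own sketch, not part of the framework as shown; `parse_llm_json` is a name I made up:

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Extract a JSON object from an LLM reply, tolerating ```json fences."""
    # If the model wrapped its answer in a markdown code fence, keep only
    # the content between the fences before handing it to json.loads.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    if match:
        raw = match.group(1)
    return json.loads(raw)
```

With this in place, `json.loads(llm.invoke(prompt).content)` becomes `parse_llm_json(llm.invoke(prompt).content)` and fenced replies stop crashing the pipeline.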
## What the AI Sees vs What It Returns
When I point it at a login page, here’s the actual flow:
Input: Just a URL
```bash
--url https://the-internet.herokuapp.com/login
```
What the AI analyzes:
```html
<input type="text" id="username" name="username">
<input type="password" id="password" name="password">
<button type="submit" class="radius">Login</button>
```
What it generates:
```json
{
  "url": "https://the-internet.herokuapp.com/login",
  "selectors": {
    "username": "#username",
    "password": "#password",
    "submit": "button[type='submit']"
  },
  "test_cases": [
    {
      "name": "valid_test",
      "username": "tomsmith",
      "password": "SuperSecretPassword!",
      "expected": "success"
    },
    {
      "name": "invalid_test",
      "username": "wronguser",
      "password": "badpassword",
      "expected": "error"
    }
  ]
}
```
Real selectors. Actual test data. Zero manual inspection.
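If you want extra confidence before those selectors land in a fixture, a quick sanity check can confirm that the simple `#id` selectors really appear in the fetched HTML. This is an illustrative sketch of my own, not part of the framework; `verify_id_selectors` only handles plain id selectors and marks everything else as unchecked:

```python
def verify_id_selectors(selectors: dict, html: str) -> dict:
    """Report whether each '#id' selector's id actually appears in the HTML.

    Only plain id selectors are checked via a substring match; attribute or
    class selectors are reported as 'unchecked'.
    """
    report = {}
    for field, sel in selectors.items():
        if sel.startswith("#"):
            report[field] = f'id="{sel[1:]}"' in html
        else:
            report[field] = "unchecked"
    return report
```

Run it against the raw page text right after the fetch, and any `False` in the report tells you the AI hallucinated a selector before a single test runs.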
## The Generated Test Uses It All
The framework then generates a Cypress test that consumes this fixture:
```javascript
describe('Login Tests', function () {
  beforeEach(function () {
    cy.fixture('url_test_data').then((data) => {
      this.testData = data;
    });
  });

  it('should login with valid credentials', function () {
    cy.visit(this.testData.url);
    const valid = this.testData.test_cases.find(tc => tc.name === 'valid_test');
    cy.get(this.testData.selectors.username).type(valid.username);
    cy.get(this.testData.selectors.password).type(valid.password);
    cy.get(this.testData.selectors.submit).click();
    cy.url().should('include', '/secure');
  });

  it('should show error with invalid credentials', function () {
    cy.visit(this.testData.url);
    const invalid = this.testData.test_cases.find(tc => tc.name === 'invalid_test');
    cy.get(this.testData.selectors.username).type(invalid.username);
    cy.get(this.testData.selectors.password).type(invalid.password);
    cy.get(this.testData.selectors.submit).click();
    cy.get('#flash').should('contain', 'invalid');
  });
});
```
Notice something? The selectors come from the fixture, not hardcoded in the test.
If the page changes, update the fixture. Tests stay clean.
## Two Ways to Feed Data
Sometimes you already have test data. Maybe from a previous run. Maybe from your team’s shared fixtures.
So I added a second option:
```bash
# Option 1: AI analyzes live URL
python qa_automation.py "Test login" --url https://example.com/login

# Option 2: Use existing JSON file
python qa_automation.py "Test login" --data cypress/fixtures/my_data.json
```
Same test generation. Different data sources. Your choice.
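Under assumptions about the CLI (the real script may wire this differently), the two data sources map naturally onto an `argparse` mutually exclusive group, so the user must pick exactly one:

```python
import argparse

def parse_args(argv=None):
    """Sketch of the CLI: one positional requirement, plus exactly one data source."""
    parser = argparse.ArgumentParser(
        description="Generate Cypress tests from a natural-language requirement")
    parser.add_argument("requirement", help='e.g. "Test login"')
    # Exactly one of --url / --data must be supplied
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--url", help="Live page for the AI to analyze")
    group.add_argument("--data", help="Path to an existing JSON fixture")
    return parser.parse_args(argv)
```

Passing both flags, or neither, fails fast with a usage error instead of silently preferring one source.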
## The Part That Surprised Me
I expected the AI to find basic selectors. What I didn’t expect was how well it understood context.
When analyzing a registration form, it didn’t just find #email — it generated test data like:
- Valid: `testuser@example.com`
- Invalid: `not-an-email`

For password fields:

- Valid: `SecurePass123!`
- Invalid: `123` (too short)
The AI understood what kind of data each field expected. Not because I told it — because it read the HTML attributes, labels, and validation patterns.
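To illustrate the kind of mapping the AI infers, here is a hand-written fallback table of my own; the real framework derives this from the page's attributes rather than a lookup:

```python
# Illustrative mapping from HTML input types to the sort of valid/invalid
# sample data the AI produces. This table is an assumption for illustration,
# not the framework's actual logic.
FIELD_SAMPLES = {
    "email": {"valid": "testuser@example.com", "invalid": "not-an-email"},
    "password": {"valid": "SecurePass123!", "invalid": "123"},
    "text": {"valid": "tomsmith", "invalid": ""},
}

def sample_for(input_type: str) -> dict:
    """Return valid/invalid sample data for an HTML input type,
    falling back to generic text data for unknown types."""
    return FIELD_SAMPLES.get(input_type, FIELD_SAMPLES["text"])
```

A static table like this covers the common cases; the point of the AI approach is that it also reads labels and validation patterns the table knows nothing about.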
## The Gotcha: Fixtures Need `function()` Syntax
One thing tripped me up for hours. Cypress fixtures with this.testData require a specific pattern:
```javascript
// WRONG - arrow functions don't have 'this'
describe('Test', () => {
  beforeEach(() => {
    cy.fixture('data').then((d) => { this.testData = d; }); // undefined!
  });
});

// RIGHT - function() preserves 'this'
describe('Test', function () {
  beforeEach(function () {
    cy.fixture('data').then((data) => { this.testData = data; });
  });

  it('works', function () {
    console.log(this.testData); // actual data!
  });
});
```
The framework now enforces this pattern in generated tests. Lesson learned the hard way.
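One way a generator could enforce the pattern is a quick regex lint over the emitted test source. This is a sketch of my own, not the framework's actual check; it flags `describe`/`it`/`beforeEach` hooks declared with arrow functions, since those won't bind `this`:

```python
import re

# Matches a hook call whose callback is an arrow function, e.g.
#   describe('Test', () =>
# but not one using function(), e.g.
#   describe('Test', function () {
ARROW_HOOK = re.compile(r"\b(?:describe|it|beforeEach)\([^)]*\)\s*=>")

def uses_arrow_hooks(test_source: str) -> bool:
    """Return True if any Mocha hook in the source uses an arrow callback."""
    return ARROW_HOOK.search(test_source) is not None
```

Nested arrows inside `.then()` callbacks are fine and stay unflagged; only the hook callbacks themselves matter for `this.testData`.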
## What This Means for Your Workflow
Before:

1. Open page in browser
2. Inspect elements manually
3. Copy selectors to notepad
4. Create fixture JSON by hand
5. Write test using those selectors
6. Fix typos in selectors
7. Run test
8. Debug why selectors don't work

After:

1. Run one command with URL
2. Framework handles the rest
That’s not an exaggeration. The 30-minute login test? Under 2 minutes now.
## Try It Yourself
The framework is open source:
```bash
git clone https://github.com/aiqualitylab/cypress-natural-language-tests.git
cd cypress-natural-language-tests
pip install -r requirements.txt
```
Set your API key:
```bash
export OPENAI_API_KEY=your_key_here
export OPENROUTER_API_KEY=your_openrouter_api_key_here
```
Generate tests from any URL:
```bash
python qa_automation.py "Test the login form" --url https://the-internet.herokuapp.com/login
```
Check what it created:
```bash
cat cypress/fixtures/url_test_data.json
cat cypress/e2e/generated/*.cy.js
```
## The Bigger Picture
We’re at an interesting moment in test automation. The tooling is getting smarter, but the real breakthrough isn’t replacing testers — it’s eliminating the tedious parts.
Finding selectors is tedious. Creating fixture files is tedious. Debugging why #submit-btn worked yesterday but not today is tedious.
Let AI handle tedious. Let humans handle important.
That’s the framework I’m building.
Follow for more AI + QA experiments:
GitHub: https://github.com/aiqualitylab/cypress-natural-language-tests.git

