Originally published at blog.gopenai.com

AI Natural Language Tests — Dual Framework Test Automation with Cypress & Playwright


Open-source AI test automation framework with natural language test generation, self-healing, and dual framework support

Writing end-to-end tests is one of those things every team knows they should do, but nobody really enjoys doing. You stare at a login page, figure out the selectors, write the steps, handle the waits, and repeat this for every feature. I kept thinking — what if I could just say what I want to test, and let AI handle the rest?

That’s exactly what I built.


(Architecture diagram)

What Is It?

ai-natural-language-tests is an open-source tool that takes a plain English description of a test scenario and generates a fully working Cypress or Playwright test file. No templates. No copy-pasting. You describe the test, point it at a URL, and it writes the code.

Here’s what a typical command looks like:

```bash
python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login
```

That single line does everything — fetches the page, reads the HTML, picks up the right selectors, and generates a complete test file you can run immediately.
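To give a feel for what that page-reading step involves, here is a minimal sketch of selector extraction using requests and BeautifulSoup. The function name and the exact fields it collects are my own illustration, not the project's actual code:

```python
# Minimal sketch of selector extraction, assuming requests + BeautifulSoup.
# extract_selectors() is an illustrative name, not the tool's actual API.
import requests
from bs4 import BeautifulSoup

def extract_selectors(url: str) -> list[dict]:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    selectors = []
    # Collect the interactive elements a test is most likely to need.
    for el in soup.find_all(["input", "button", "a", "select", "textarea"]):
        selectors.append({
            "tag": el.name,
            "id": el.get("id"),
            "name": el.get("name"),
            "type": el.get("type"),
            "text": el.get_text(strip=True)[:50],
        })
    return selectors

print(extract_selectors("https://the-internet.herokuapp.com/login"))
```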

Want Playwright instead of Cypress? Just add a flag:

```bash
python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login --framework playwright
```

How It Actually Works

Under the hood, the tool runs a 5-step workflow built with LangGraph:


(Complete workflow diagram)

Step 1 — It sets up a vector store. Think of this as a memory bank for test patterns.

Step 2 — It fetches the target URL, pulls the HTML, and extracts useful selectors like input fields, buttons, and links.

Step 3 — It searches the vector store for similar tests it has generated before. If you tested a login page last week, it remembers the patterns.

Step 4 — It sends everything to GPT-4 along with a carefully crafted prompt — the description, the selectors, and any matching patterns from history. The AI generates the actual test code.

Step 5 — Optionally, it runs the test right away using Cypress or Playwright.
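For readers curious how a pipeline like this is typically wired together, here is a rough LangGraph sketch of the five steps. The node names, state fields, and placeholder bodies are my own assumptions, not the project's actual implementation:

```python
# Rough sketch of a five-node LangGraph pipeline; node names, state fields,
# and placeholder bodies are illustrative assumptions, not the real code.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class TestGenState(TypedDict):
    description: str
    url: str
    selectors: list
    similar_patterns: list
    test_code: str

def setup_vector_store(state: TestGenState) -> dict:
    return {}                        # placeholder: create or load the pattern store

def fetch_and_parse_page(state: TestGenState) -> dict:
    return {"selectors": []}         # placeholder: fetch the URL, extract selectors

def retrieve_similar_patterns(state: TestGenState) -> dict:
    return {"similar_patterns": []}  # placeholder: similarity search over past tests

def generate_test_code(state: TestGenState) -> dict:
    return {"test_code": "// ..."}   # placeholder: call the LLM with a framework prompt

def run_generated_test(state: TestGenState) -> dict:
    return {}                        # placeholder: optionally run Cypress/Playwright

graph = StateGraph(TestGenState)
graph.add_node("setup_store", setup_vector_store)
graph.add_node("fetch_page", fetch_and_parse_page)
graph.add_node("retrieve_patterns", retrieve_similar_patterns)
graph.add_node("generate_code", generate_test_code)
graph.add_node("run_test", run_generated_test)

graph.set_entry_point("setup_store")
graph.add_edge("setup_store", "fetch_page")
graph.add_edge("fetch_page", "retrieve_patterns")
graph.add_edge("retrieve_patterns", "generate_code")
graph.add_edge("generate_code", "run_test")
graph.add_edge("run_test", END)

app = graph.compile()
result = app.invoke({
    "description": "Test login with valid credentials",
    "url": "https://the-internet.herokuapp.com/login",
})
```

Each node receives the shared state and returns only the fields it updates, which keeps a linear flow like this easy to extend with extra steps later.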

The interesting part is Step 3. Every test the tool generates gets saved as a pattern. Over time, it builds a library of patterns and uses them to write better tests. The first test for a login page might be decent. The tenth one will be much better because it has learned from all the previous ones.
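As a sketch of how that pattern memory can work, here is one way to do it with a Chroma vector store and OpenAI embeddings via LangChain; the actual project may use a different store or schema:

```python
# Sketch of the pattern-memory idea, assuming a Chroma vector store with
# OpenAI embeddings via LangChain; the project's storage choice may differ.
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

store = Chroma(
    collection_name="test_patterns",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./patterns",
)

# After each successful generation, save the scenario plus the test as a pattern.
store.add_texts(
    texts=["Test login with valid credentials\n\n<generated Cypress code here>"],
    metadatas=[{"framework": "cypress", "url": "https://the-internet.herokuapp.com/login"}],
)

# Before generating a new test, pull the most similar past patterns into the prompt.
similar = store.similarity_search("Test logout after successful login", k=3)
for doc in similar:
    print(doc.metadata, doc.page_content[:80])
```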

Why Two Frameworks?

I started with Cypress because it’s what most teams I’ve worked with use. But Playwright has been gaining serious traction — especially for teams that need multi-browser testing or prefer TypeScript.

So in v3.1, I added full Playwright support. The tool uses different prompts for each framework. The Cypress prompt focuses on chaining commands and cy.get() patterns. The Playwright prompt covers locators, async/await, network interception, multi-tab handling, and all the TypeScript-specific patterns.

You pick the framework. The AI adapts.
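Conceptually, the split looks something like this. The prompt text below is heavily abbreviated and only illustrative; the real prompts in the repo are far more detailed:

```python
# Illustrative only: abbreviated framework-specific system prompts and a helper
# for assembling the request; not the project's actual prompt text.
SYSTEM_PROMPTS = {
    "cypress": (
        "You are a Cypress expert. Generate JavaScript tests that chain commands, "
        "use cy.get()/cy.contains() with the provided selectors, and rely on "
        "Cypress's built-in retry-ability instead of hard waits."
    ),
    "playwright": (
        "You are a Playwright expert. Generate TypeScript tests that use "
        "page.locator(), async/await, web-first assertions, and route "
        "interception where the scenario needs it."
    ),
}

def build_messages(framework: str, description: str, selectors: list, patterns: list):
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[framework]},
        {"role": "user", "content": (
            f"Scenario: {description}\n"
            f"Selectors: {selectors}\n"
            f"Similar past tests: {patterns}"
        )},
    ]
```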

The Part I Didn’t Expect — Failure Analysis

While building this, I realized that generating tests is only half the problem. Tests fail. And reading Cypress or Playwright error logs can be painful, especially for someone newer to the frameworks.

So I added an AI-powered failure analyzer:

```bash
python qa_automation.py --analyze "CypressError: Timed out retrying after 4000ms"
```

It reads the error, explains what went wrong in plain language, and suggests a fix. You can also point it at a log file. It’s a small feature but it has saved me a surprising amount of time.
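A stripped-down version of that idea, using the OpenAI Python SDK, might look like this; the model name and prompt wording are assumptions rather than what the tool actually ships with:

```python
# Sketch of the --analyze idea using the OpenAI Python SDK; the model name and
# prompt wording are assumptions, not necessarily what the tool uses.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_failure(error_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a test automation expert. Explain the Cypress or "
                "Playwright error in plain language and suggest a concrete fix."
            )},
            {"role": "user", "content": error_text},
        ],
    )
    return response.choices[0].message.content

print(analyze_failure("CypressError: Timed out retrying after 4000ms"))
```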

Running It in CI/CD

The tool comes with a GitHub Actions workflow out of the box. You can trigger it manually from the Actions tab — type your test description, provide a URL, pick Cypress or Playwright, and it runs the full pipeline. Generate, execute, and get results — all inside your CI.


(CI/CD pipeline diagram)

This makes it practical for teams that want to try AI-generated tests without changing their existing setup. Just add the workflow and trigger it when you need a new test.

What I Learned Building This

A few things surprised me along the way:

Prompts matter more than the model. I spent more time refining the system prompts than on any other part of the codebase. A well-structured prompt with clear constraints produces dramatically better test code than a vague one, regardless of which GPT model you use.

Pattern learning is underrated. The vector store approach turned out to be more useful than I expected. When the tool has seen similar pages before, the generated tests are noticeably more accurate. It picks up things like common selector patterns and assertion styles from its history.

Keeping frameworks separate is important. Early on, I tried using a single generic prompt for both Cypress and Playwright. The results were mediocre for both. Dedicated prompts for each framework made a huge difference in output quality.

Try It Out

The project is open source and ready to use:

GitHub: github.com/aiqualitylab/ai-natural-language-tests

First Release — https://github.com/aiqualitylab/ai-natural-language-tests/releases/tag/v2026.02.01

Setup takes about five minutes — clone the repo, install dependencies, add your OpenAI API key, and you’re generating tests.

If you work in QA or test automation and you’ve been curious about how AI fits into your workflow, give it a try. I’d love to hear what you think.

Exploring how AI can make quality engineering more practical and less tedious. I write about this stuff regularly at AI Quality Engineer.

