<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Let's Automate 🛡️</title>
    <description>The latest articles on DEV Community by Let's Automate 🛡️ (@letsautomate).</description>
    <link>https://dev.to/letsautomate</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3582938%2Fd47e0b42-428a-4790-af53-79366dc1e7fc.png</url>
      <title>DEV Community: Let's Automate 🛡️</title>
      <link>https://dev.to/letsautomate</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/letsautomate"/>
    <language>en</language>
    <item>
      <title>AI-Assisted Testing vs AI Agents vs AI Agent Skills: A Practical Journey Through All Three</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sat, 07 Mar 2026 13:08:54 +0000</pubDate>
      <link>https://dev.to/qa-leaders/ai-assisted-testing-vs-ai-agents-vs-ai-agent-skills-a-practical-journey-through-all-three-48dj</link>
      <guid>https://dev.to/qa-leaders/ai-assisted-testing-vs-ai-agents-vs-ai-agent-skills-a-practical-journey-through-all-three-48dj</guid>
      <description>&lt;h4&gt;
  
  
  Most teams are only using one layer of AI in testing. Here is what the full picture looks like — and how I built tools across all three.
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AOHLYcxWt1ZlY-T2z" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AOHLYcxWt1ZlY-T2z" width="1024" height="1383"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by Possessed Photography on Unsplash&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Before any of this made sense, I had to answer a more basic question: what does AI QA Engineering actually mean?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://medium.com/ai-in-quality-assurance/what-is-ai-qa-engineering-and-why-qaes-sdets-and-qa-automation-engineers-should-pay-attention-e8d26e460153" rel="noopener noreferrer"&gt;What is AI QA Engineering — and Why QAEs, SDETs, and QA Automation Engineers Should Pay Attention&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;And before touching AI at all — the foundations still matter. Clean BDD tests. Reports that stakeholders can read.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://aiqualityengineer.com/how-to-add-beautiful-bdd-test-reports-to-your-reqnroll-project-using-expressium-livingdoc-aafaf799523d" rel="noopener noreferrer"&gt;How to Add Beautiful BDD Test Reports to Your Reqnroll Project Using Expressium LivingDoc&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before you automate smarter, you have to know what good looks like.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Layer 1 — AI-Assisted Testing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;AI speeds you up. You are still driving.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is where most teams start — and where most teams stay.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You write a prompt, get a test, review it, ship it. AI is a productivity multiplier. GitHub Copilot suggests the next line. ChatGPT drafts your test cases. Claude rewrites a flaky selector. You are in control at every step.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The catch? A bad prompt gives you a bad test — and it will look convincing. Garbage in, confident garbage out.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://blog.gopenai.com/crafting-effective-prompts-for-genai-in-software-testing-e5f76d2ccbf6" rel="noopener noreferrer"&gt;Crafting Effective Prompts for GenAI in Software Testing&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I built &lt;a href="https://github.com/aiqualitylab/ai-natural-language-tests" rel="noopener noreferrer"&gt;&lt;strong&gt;ai-natural-language-tests&lt;/strong&gt;&lt;/a&gt; at this layer. Give it a plain English requirement, and it generates Cypress or Playwright tests using GPT-4, LangChain, and LangGraph. Every output still needs your eyes on it — but the heavy lifting is done.&lt;/p&gt;

&lt;p&gt;Same idea with &lt;a href="https://github.com/aiqualitylab/JIRA-QA-Automation-with-AI" rel="noopener noreferrer"&gt;&lt;strong&gt;JIRA-QA-Automation-with-AI&lt;/strong&gt;&lt;/a&gt;: feed it a JIRA story with acceptance criteria, and BDD test scripts come out the other side. Human judgment is still required at the end. You own every decision.&lt;/p&gt;

&lt;p&gt;That last part is the definition of this layer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Layer 2 — AI Agents for Testing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;You give the goal. The agent executes, adapts, and decides.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;At this layer, you stop steering and start delegating.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You set the objective. The agent figures out how to get there — and when something breaks mid-run, it handles that too. No human in the loop for every step.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aiqualitylab/selenium-selfhealing-mcp" rel="noopener noreferrer"&gt;&lt;strong&gt;selenium-selfhealing-mcp&lt;/strong&gt;&lt;/a&gt; is a good example of what this looks like in practice. A UI change breaks a Selenium locator mid-execution. The agent inspects the DOM, finds the updated element, and keeps going — without stopping to ask you what to do. I submitted this to the Docker MCP Registry, and watching it recover from failures on its own still feels like a step-change from Layer 1.&lt;/p&gt;

&lt;p&gt;For .NET teams, &lt;a href="https://github.com/aiqualitylab/SeleniumSelfHealing.Reqnroll" rel="noopener noreferrer"&gt;&lt;strong&gt;SeleniumSelfHealing.Reqnroll&lt;/strong&gt;&lt;/a&gt; does the same with C#, NUnit, Reqnroll, and Semantic Kernel. And &lt;a href="https://github.com/aiqualitylab/IntelliTest" rel="noopener noreferrer"&gt;&lt;strong&gt;IntelliTest&lt;/strong&gt;&lt;/a&gt; takes it further — write your assertions in plain English, and the agent decides whether the application behaviour actually matches the intent.&lt;/p&gt;

&lt;p&gt;But there is a trap at this layer. Agents move fast and look thorough. It is easy to trust the output and skip the checks. Coverage looks complete — but the agent may have tested the wrong thing entirely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://medium.com/ai-in-quality-assurance/the-ai-qa-engineers-decision-framework-when-not-to-use-ai-in-testing-5be256108750" rel="noopener noreferrer"&gt;The AI QA Engineer’s Decision Framework: When NOT to Use AI in Testing&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;And if you are using AI agents to run tests, a harder question follows: how do you know the agent’s output is correct? That is the LLM evaluation problem, and it turns out to be one of the most interesting unsolved problems in this space.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://medium.com/ai-in-quality-assurance/llm-evaluation-explained-how-to-know-if-your-ai-is-actually-working-7c17ba59c3f4" rel="noopener noreferrer"&gt;LLM Evaluation Explained: How to Know If Your AI Is Actually Working&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3 — AI Agent Skills
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Not a tool. Not an agent. Expertise that travels.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Layer 3 is the one most people have not thought about yet.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Here is the pattern I kept running into: every new agent project started from scratch. New codebase, new prompts, same underlying knowledge — how to read a requirement, what makes a test meaningful, when to flag a risk. The expertise was always being rebuilt. That seemed wrong.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A skill is a portable, encoded unit of expertise. It is not tied to one agent or one project. Any compatible agent can load it and apply it — without rebuilding the logic again. You build it once, and it travels.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://medium.com/ai-in-quality-assurance/github-copilot-agent-skills-teaching-ai-your-repository-patterns-01168b6d7a25" rel="noopener noreferrer"&gt;GitHub Copilot Agent Skills: Teaching AI Your Repository Patterns&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/aiqualitylab/vibe-coding-checklist" rel="noopener noreferrer"&gt;&lt;strong&gt;vibe-coding-checklist&lt;/strong&gt;&lt;/a&gt; applies the same idea to AI code review — a shared quality framework that any team or any agent can use consistently.&lt;/p&gt;

&lt;p&gt;The shift in thinking is subtle but significant. At Layer 1, you build prompts and tools. At Layer 2, you build goals and trust boundaries. At Layer 3, you build expertise itself — in a form that outlasts any single project or team.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Difference That Matters
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcctx1duwy2nixyo5ieop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcctx1duwy2nixyo5ieop.png" width="800" height="315"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AI-Assisted Testing vs AI Agents vs AI Agent Skills&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Three layers. All called AI testing. Now you know which one you are actually in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All repos →&lt;/em&gt; &lt;a href="https://github.com/aiqualitylab" rel="noopener noreferrer"&gt;&lt;em&gt;github.com/aiqualitylab&lt;/em&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;More writing →&lt;/em&gt; &lt;a href="https://aiqualityengineer.com/" rel="noopener noreferrer"&gt;&lt;em&gt;aiqualityengineer.com&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>testautomation</category>
      <category>softwareengineering</category>
      <category>artificialintelligen</category>
      <category>agents</category>
    </item>
    <item>
      <title>The GitHub Copilot Features That Are Quietly Draining Your Premium Requests</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Thu, 19 Feb 2026 17:19:23 +0000</pubDate>
      <link>https://dev.to/qa-leaders/the-github-copilot-features-that-are-quietly-draining-your-premium-requests-i34</link>
      <guid>https://dev.to/qa-leaders/the-github-copilot-features-that-are-quietly-draining-your-premium-requests-i34</guid>
      <description>&lt;h4&gt;
  
  
  &lt;em&gt;10 optimisations most developers miss — including why the Copilot Coding Agent beats Agent Mode Chat every time&lt;/em&gt;
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most developers hit their monthly limit in the first week. Here’s what’s actually happening under the hood — and how to work smarter before it happens to you.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2APnmZ7qNMCsXjh1RO" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2APnmZ7qNMCsXjh1RO" width="1024" height="683"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by Resume Genius on Unsplash&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before diving in, it helps to understand what GitHub Copilot actually counts as a premium request, because most developers don’t find out until it’s too late.&lt;/p&gt;

&lt;p&gt;Inline code completions on paid plans are unlimited and cost nothing. What drains your monthly allowance is everything else — Copilot Chat, Agent Mode, Copilot Code Review, Copilot CLI, and the Copilot Coding Agent.&lt;/p&gt;

&lt;p&gt;Each model also carries a multiplier. Some models are included free on paid plans. Once your allowance is gone, premium features are locked for the rest of the billing cycle.&lt;/p&gt;

&lt;p&gt;Knowing that, here’s how to make every request count.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;1. Name your functions like they’re instructions&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Inline autocomplete is unlimited on paid plans and costs nothing from your premium allowance. The more precisely you name a function, the more accurately Copilot completes the body without any Chat involved. This is your primary tool, not a fallback.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;2. Write your intent as a comment above the cursor&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A detailed comment placed directly before your cursor is treated by Copilot as an instruction. You get the same outcome as a Chat message at zero premium cost. Use this for any logic you would otherwise describe to Copilot in conversation.&lt;/p&gt;
&lt;/blockquote&gt;
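
&lt;p&gt;To make tips 1 and 2 concrete, here is a small Python sketch. The comment and the function name are the inputs you control; the body is merely illustrative of the kind of completion inline autocomplete can produce from them (hand-written here, not captured Copilot output, and &lt;code&gt;is_valid_email&lt;/code&gt; is a hypothetical example):&lt;/p&gt;

```python
# Check that an address has exactly one "@" and a dot in the domain,
# returning True or False without raising. A precise comment like this,
# placed directly above the cursor, acts as a zero-cost instruction.
def is_valid_email(address):
    if address.count("@") != 1:
        return False
    local, _, domain = address.partition("@")
    return bool(local) and "." in domain

print(is_valid_email("qa@example.com"))  # True
print(is_valid_email("not-an-email"))    # False
```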

&lt;p&gt;&lt;strong&gt;3. Cycle through alternatives with Alt+] before opening Chat&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When the first inline suggestion misses, most developers immediately reach for Chat. Before doing that, cycle through alternative suggestions. The second or third option is often exactly what’s needed — and one saved Chat message multiplies across a full day of work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;4. Disable Agent Mode when you’re not actively using it&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent Mode keeps running in the background even when you’re not directing it. GitHub’s official documentation explicitly flags this as a common cause of unexpected quota drain. Disable it in your repository settings when it isn’t part of your current workflow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;5. Use the Copilot Coding Agent for complex tasks instead of Agent Mode Chat&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is one of the least-known optimisations available. The Copilot Coding Agent — the one that creates and modifies pull requests asynchronously — counts as one premium request per full session regardless of how much work it does. Agent Mode Chat charges one premium request per message, multiplied by the model rate. For any task involving multiple files or significant implementation work, the Coding Agent is dramatically more efficient.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;6. Start a new Chat thread when switching topics&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As a conversation grows, all prior messages remain in context and contribute to token consumption. GitHub’s documentation specifically calls this out as a driver of elevated usage. When you move to a new task or a different area of your codebase, start a fresh thread rather than continuing an existing one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;7. Understand the model multiplier before choosing one&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before switching to a powerful model, weigh whether the capability gain justifies the cost. For most day-to-day work, it doesn’t.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;8. Use auto model selection for a built-in discount&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you enable auto model selection in Copilot Chat in VS Code, GitHub applies a 10% multiplier discount across all premium model usage. It requires no change to your workflow and the saving compounds quietly across a full month.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;9. Use #file references instead of @workspace&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a class="mentioned-user" href="https://dev.to/workspace"&gt;@workspace&lt;/a&gt; scans your entire codebase on every message, consuming more than most questions require. Using #file:yourfile.ts targets exactly the context Copilot needs, which produces more focused answers with less back-and-forth and fewer requests spent getting there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;10. Set a budget alert before your allowance runs out&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;GitHub lets you configure alerts at 75%, 90%, and 100% of any spending threshold you define. Setting a low or zero spending budget with alerts enabled means you get notified well before premium features are cut off — without risking unexpected charges. Check your current usage anytime at &lt;strong&gt;github.com/settings/billing&lt;/strong&gt; or through the Copilot icon in your IDE status bar.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Principle Underneath All of It
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Every tip here points back to the same question worth asking before you open Chat:&lt;/em&gt; &lt;strong&gt;&lt;em&gt;is there a way to get this through autocomplete instead?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reference — &lt;a href="https://docs.github.com/en/copilot" rel="noopener noreferrer"&gt;https://docs.github.com/en/copilot&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most of the time, there is. And building that habit is what separates developers who hit the wall in week one from those who reach month end with room to spare.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>ai</category>
      <category>development</category>
      <category>softwaredevelopment</category>
      <category>softwaretesting</category>
    </item>
    <item>
      <title>AI Natural Language Tests — Dual Framework Test Automation with Cypress &amp; Playwright</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sun, 01 Feb 2026 16:55:23 +0000</pubDate>
      <link>https://dev.to/qa-leaders/ai-natural-language-tests-dual-framework-test-automation-with-cypress-playwright-1khp</link>
      <guid>https://dev.to/qa-leaders/ai-natural-language-tests-dual-framework-test-automation-with-cypress-playwright-1khp</guid>
      <description>&lt;h3&gt;
  
  
  AI Natural Language Tests — Dual Framework Test Automation with Cypress &amp;amp; Playwright
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Open-source AI test automation framework with natural language test generation, self-healing, and dual framework support
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;Writing end-to-end tests is one of those things every team knows they should do, but nobody really enjoys doing. You stare at a login page, figure out the selectors, write the steps, handle the waits, and repeat this for every feature. I kept thinking — what if I could just say what I want to test, and let AI handle the rest?&lt;/p&gt;

&lt;p&gt;That’s exactly what I built.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre19sjdwnfg3xlj0bw42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre19sjdwnfg3xlj0bw42.png" width="784" height="718"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What Is It?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/aiqualitylab/ai-natural-language-tests" rel="noopener noreferrer"&gt;&lt;strong&gt;ai-natural-language-tests&lt;/strong&gt;&lt;/a&gt; is an open-source tool that takes a plain English description of a test scenario and generates a fully working Cypress or Playwright test file. No templates. No copy-pasting. You describe the test, point it at a URL, and it writes the code.&lt;/p&gt;

&lt;p&gt;Here’s what a typical command looks like:&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;That single line does everything — fetches the page, reads the HTML, picks up the right selectors, and generates a complete test file you can run immediately.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Want Playwright instead of Cypress? Just add a flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login --framework playwright
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How It Actually Works
&lt;/h3&gt;

&lt;p&gt;Under the hood, the tool runs a 5-step workflow built with LangGraph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yynpcdmfm0ci9rsxkbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yynpcdmfm0ci9rsxkbp.png" width="784" height="1029"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Complete Workflow&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Step 1 — It sets up a vector store. Think of this as a memory bank for test patterns.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 2 — It fetches the target URL, pulls the HTML, and extracts useful selectors like input fields, buttons, and links.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 3 — It searches the vector store for similar tests it has generated before. If you tested a login page last week, it remembers the patterns.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 4 — It sends everything to GPT-4 along with a carefully crafted prompt — the description, the selectors, and any matching patterns from history. The AI generates the actual test code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 5 — Optionally, it runs the test right away using Cypress or Playwright.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The interesting part is Step 3. Every test the tool generates gets saved as a pattern. Over time, it builds a library of patterns and uses them to write better tests. The first test for a login page might be decent. The tenth one will be much better because it has learned from all the previous ones.&lt;/p&gt;
&lt;/blockquote&gt;
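
&lt;p&gt;The five steps can be sketched in miniature. This is a simplified Python sketch, not the repo’s code: the function names and in-memory pattern store are illustrative stand-ins, and the real tool wires these steps as a LangGraph graph with a GPT-4 call at Step 4.&lt;/p&gt;

```python
def setup_pattern_store():
    # Step 1: a memory bank for previously generated test patterns.
    return []

def extract_selectors(page_html):
    # Step 2: pull candidate selectors from the fetched page (faked here;
    # the real tool parses the HTML for inputs, buttons, and links).
    return ["#username", "#password", "button[type='submit']"]

def find_similar(store, description):
    # Step 3: recall earlier tests whose topic overlaps this description.
    return [p for p in store if p["topic"] in description.lower()]

def generate_test(description, selectors, patterns):
    # Step 4: the real tool prompts GPT-4 with the description, selectors,
    # and recalled patterns; this stub only shows the output's shape.
    steps = "\n".join('    cy.get("%s");' % s for s in selectors)
    return 'it("%s", () => {\n%s\n});' % (description, steps)

def run_pipeline(description, page_html, store):
    selectors = extract_selectors(page_html)
    patterns = find_similar(store, description)
    code = generate_test(description, selectors, patterns)
    # Step 5 optionally runs the test; here we just save the pattern
    # so the next run can learn from it.
    store.append({"topic": description.split()[0].lower(), "code": code})
    return code

store = setup_pattern_store()
test_code = run_pipeline("Test login with valid credentials", "(html)", store)
print(test_code)
```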
&lt;h3&gt;
  
  
  Why Two Frameworks?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;I started with Cypress because it’s what most teams I’ve worked with use. But Playwright has been gaining serious traction — especially for teams that need multi-browser testing or prefer TypeScript.&lt;/p&gt;

&lt;p&gt;So in v3.1, I added full Playwright support. The tool uses different prompts for each framework. The Cypress prompt focuses on chaining commands and cy.get() patterns. The Playwright prompt covers locators, async/await, network interception, multi-tab handling, and all the TypeScript-specific patterns.&lt;/p&gt;

&lt;p&gt;You pick the framework. The AI adapts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  The Part I Didn’t Expect — Failure Analysis
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;While building this, I realized that generating tests is only half the problem. Tests fail. And reading Cypress or Playwright error logs can be painful, especially for someone newer to the frameworks.&lt;/p&gt;

&lt;p&gt;So I added an AI-powered failure analyzer:&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py --analyze "CypressError: Timed out retrying after 4000ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;It reads the error, explains what went wrong in plain language, and suggests a fix. You can also point it at a log file. It’s a small feature but it has saved me a surprising amount of time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Running It in CI/CD
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The tool comes with a GitHub Actions workflow out of the box. You can trigger it manually from the Actions tab — type your test description, provide a URL, pick Cypress or Playwright, and it runs the full pipeline. Generate, execute, and get results — all inside your CI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid27xcjb19ddabf6vppe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid27xcjb19ddabf6vppe.png" width="784" height="1143"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;CI/CD PIPELINE&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This makes it practical for teams that want to try AI-generated tests without changing their existing setup. Just add the workflow and trigger it when you need a new test.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  What I Learned Building This
&lt;/h3&gt;

&lt;p&gt;A few things surprised me along the way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompts matter more than the model.&lt;/strong&gt; I spent more time refining the system prompts than on any other part of the codebase. A well-structured prompt with clear constraints produces dramatically better test code than a vague one, regardless of which GPT model you use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern learning is underrated.&lt;/strong&gt; The vector store approach turned out to be more useful than I expected. When the tool has seen similar pages before, the generated tests are noticeably more accurate. It picks up things like common selector patterns and assertion styles from its history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keeping frameworks separate is important.&lt;/strong&gt; Early on, I tried using a single generic prompt for both Cypress and Playwright. The results were mediocre for both. Dedicated prompts for each framework made a huge difference in output quality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Try It Out
&lt;/h3&gt;

&lt;p&gt;The project is open source and ready to use:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/aiqualitylab/ai-natural-language-tests" rel="noopener noreferrer"&gt;github.com/aiqualitylab/ai-natural-language-tests&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First Release —&lt;/strong&gt;  &lt;a href="https://github.com/aiqualitylab/ai-natural-language-tests/releases/tag/v2026.02.01" rel="noopener noreferrer"&gt;https://github.com/aiqualitylab/ai-natural-language-tests/releases/tag/v2026.02.01&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Setup takes about five minutes — clone the repo, install dependencies, add your OpenAI API key, and you’re generating tests.&lt;/p&gt;

&lt;p&gt;If you work in QA or test automation and you’ve been curious about how AI fits into your workflow, give it a try. I’d love to hear what you think.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Exploring how AI can make quality engineering more practical and less tedious. I write about this stuff regularly at&lt;/em&gt;&lt;/strong&gt; &lt;a href="https://aiqualityengineer.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;AI Quality Engineer&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;&lt;em&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




</description>
      <category>softwareengineering</category>
      <category>programming</category>
      <category>javascript</category>
      <category>artificialintelligen</category>
    </item>
    <item>
      <title>The AI QA Engineer’s Decision Framework: When NOT to Use AI in Testing</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sun, 25 Jan 2026 10:47:51 +0000</pubDate>
      <link>https://dev.to/qa-leaders/the-ai-qa-engineers-decision-framework-when-not-to-use-ai-in-testing-4lng</link>
      <guid>https://dev.to/qa-leaders/the-ai-qa-engineers-decision-framework-when-not-to-use-ai-in-testing-4lng</guid>
      <description>&lt;h4&gt;
  
  
  A Practical Guide for Quality Engineers Who Want Results, Not Hype
&lt;/h4&gt;

&lt;h3&gt;
  
  
  When NOT to Use AI in Testing: A Simple Guide
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Stop. Think. Then Decide.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Big Question
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Everyone talks about using AI in testing. But nobody talks about when to SKIP it.&lt;/p&gt;

&lt;p&gt;This guide helps you decide: &lt;strong&gt;AI or no AI?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;AI testing sounds cool. But it comes with baggage:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;It costs money&lt;/strong&gt; — AI tools need servers, licenses, and API calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It needs babysitting&lt;/strong&gt; — Models drift. Prompts need tuning. Things break in weird ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s hard to debug&lt;/strong&gt; — When AI tests fail, figuring out WHY is painful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your team might forget basics&lt;/strong&gt; — If AI does everything, manual debugging skills fade.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI isn’t bad. But it’s not always the answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  7 Times to Skip AI (Use Traditional Testing Instead)
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Math and Calculations
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Tax calculators, loan interest, pricing formulas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; The answer is either right or wrong. No guessing needed. No patterns to learn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Simple data-driven tests. Input goes in. Expected output comes out. Done.&lt;/p&gt;
&lt;/blockquote&gt;
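
&lt;p&gt;A minimal sketch of what “input goes in, expected output comes out” looks like in Python. &lt;code&gt;calculate_tax&lt;/code&gt; is a hypothetical stand-in for whatever formula you are testing; the point is the table of cases, with no AI anywhere:&lt;/p&gt;

```python
# Data-driven testing for deterministic math: a table of cases and a loop.
# calculate_tax is a hypothetical example formula, not a real library call.

def calculate_tax(amount, rate):
    return round(amount * rate, 2)

CASES = [
    # (amount, rate, expected)
    (100.00, 0.20, 20.00),
    (80.00, 0.25, 20.00),
    (0.00, 0.20, 0.00),
]

for amount, rate, expected in CASES:
    assert calculate_tax(amount, rate) == expected, (amount, rate)
print("all cases passed")
```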

&lt;h3&gt;
  
  
  2. Audit and Compliance Systems
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Banking apps, healthcare records, legal documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; Auditors want proof. They want to see EXACTLY what you tested. AI is unpredictable — same prompt, different results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Scripted tests with detailed logs. Every step recorded. Every result traceable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. Speed and Load Testing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Can your app handle 10,000 users at once?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; You’re measuring app speed. AI adds its own delay. You’d be measuring AI, not your app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Use tools built for this — JMeter, k6, Gatling. They’re fast and focused.&lt;/p&gt;
&lt;/blockquote&gt;
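&lt;p&gt;Real load tests belong in JMeter, k6, or Gatling, but this stdlib-only sketch shows the kind of raw numbers they give you. The HTTP call is stubbed out here; in a real run it would hit your app:&lt;/p&gt;

```python
# Measure the app directly: time each call, then report percentiles.
# fake_request is a stub standing in for a real HTTP call to your app.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request() -> float:
    start = time.perf_counter()
    # a real test would perform an HTTP GET against the app here
    return time.perf_counter() - start

def run_load(n_requests: int, workers: int) -> dict:
    """Fire n_requests across a thread pool and summarise latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(lambda _: fake_request(), range(n_requests)))
    latencies.sort()
    return {
        "count": len(latencies),
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[int(len(latencies) * 0.95)],
    }

stats = run_load(n_requests=100, workers=10)
```

&lt;p&gt;Put an LLM in that loop and the percentiles measure the model's latency, not your app's.&lt;/p&gt;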

&lt;h3&gt;
  
  
  4. Basic CRUD Operations
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Create user. Read user. Update user. Delete user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; It’s simple. AI is overkill. Like using a rocket to go to the grocery store.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Write one test template. Copy it for each operation. Fast and easy.&lt;/p&gt;
&lt;/blockquote&gt;
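&lt;p&gt;One way to express the "one template per operation" idea. &lt;code&gt;UserStore&lt;/code&gt; is a toy in-memory stand-in for your real API:&lt;/p&gt;

```python
# A toy in-memory user store with the four CRUD operations.
class UserStore:
    def __init__(self):
        self._users = {}
        self._next_id = 1

    def create(self, name):
        uid = self._next_id
        self._next_id += 1
        self._users[uid] = name
        return uid

    def read(self, uid):
        return self._users.get(uid)

    def update(self, uid, name):
        if uid in self._users:
            self._users[uid] = name
            return True
        return False

    def delete(self, uid):
        return self._users.pop(uid, None) is not None

# The "template": create, read back, update, delete, verify gone.
def crud_roundtrip(store) -> bool:
    uid = store.create("alice")
    assert store.read(uid) == "alice"
    assert store.update(uid, "alice2")
    assert store.read(uid) == "alice2"
    assert store.delete(uid)
    assert store.read(uid) is None
    return True

ok = crud_roundtrip(UserStore())
```

&lt;p&gt;Copy the roundtrip for each resource, swap the names, done. No rocket required.&lt;/p&gt;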

&lt;h3&gt;
  
  
  5. Screens That Never Change
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Internal admin panels. Old systems nobody touches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; AI shines when things CHANGE. Self-healing locators fix moving targets. No movement? No need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Regular automation. Page Object Model. Set it and forget it.&lt;/p&gt;
&lt;/blockquote&gt;
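&lt;p&gt;A bare-bones Page Object sketch. &lt;code&gt;FakeDriver&lt;/code&gt; stands in for a real Selenium or Playwright driver; the point is that the locators live in one class, so a stable page needs nothing smarter:&lt;/p&gt;

```python
# Page Object Model: selectors and actions live in the page class,
# so tests read like intent. FakeDriver is a stand-in for a real driver.
class FakeDriver:
    def __init__(self):
        self.typed = {}
        self.clicked = []

    def type(self, selector, text):
        self.typed[selector] = text

    def click(self, selector):
        self.clicked.append(selector)

class LoginPage:
    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "button[type='submit']"

    def __init__(self, driver):
        self.driver = driver

    def login(self, user, password):
        self.driver.type(self.USERNAME, user)
        self.driver.type(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)

driver = FakeDriver()
LoginPage(driver).login("admin", "s3cret")
```

&lt;p&gt;If the page never changes, these three locators never change. Set it and forget it.&lt;/p&gt;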

&lt;h3&gt;
  
  
  6. Security Testing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Finding SQL injection, XSS attacks, login bypasses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; Security needs creative thinking. Breaking things in new ways. AI follows patterns — hackers don’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Security tools (OWASP ZAP, Burp Suite) plus human testers who think like attackers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  7. Physical Device Testing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Barcode scanners, payment terminals, IoT sensors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why skip AI?&lt;/strong&gt; AI lives in software. It can’t press physical buttons or read blinking lights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do this instead:&lt;/strong&gt; Hardware test rigs. Human testers. Real-world verification.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Quick Decision Guide
&lt;/h3&gt;

&lt;p&gt;Ask yourself these 4 questions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin57e16hm04f6y9q9giy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fin57e16hm04f6y9q9giy.png" width="800" height="476"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;DECISION TABLE FRAMEWORK&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Before You Buy Any AI Tool, Answer These:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What exact problem am I solving?&lt;/strong&gt; (Not “we want AI” — a real problem)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can a simple script fix this?&lt;/strong&gt; (Seriously, can it?)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How will I know if it worked?&lt;/strong&gt; (What number goes up or down?)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who will maintain it?&lt;/strong&gt; (AI tools need constant care)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I explain it to my boss?&lt;/strong&gt; (If you can’t explain it, don’t buy it)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Simple Truth
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI is a tool. Not a magic wand.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Good testers know WHEN to use each tool:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq617lq3te9cpuqutxkx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq617lq3te9cpuqutxkx6.png" width="800" height="331"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;USAGE CHECKLIST&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  One Page Summary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;USE AI FOR:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Generating test ideas from requirements&lt;/p&gt;

&lt;p&gt;Handling UI changes automatically&lt;/p&gt;

&lt;p&gt;Analyzing why tests keep failing&lt;/p&gt;

&lt;p&gt;Creating test data variations&lt;/p&gt;

&lt;p&gt;Exploring edge cases&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;SKIP AI FOR:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Exact calculations (math, money, dates)&lt;/p&gt;

&lt;p&gt;Compliance and audit trails&lt;/p&gt;

&lt;p&gt;Performance/load measurements&lt;/p&gt;

&lt;p&gt;Simple CRUD operations&lt;/p&gt;

&lt;p&gt;Stable, unchanging systems&lt;/p&gt;

&lt;p&gt;Security penetration testing&lt;/p&gt;

&lt;p&gt;Physical hardware testing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Final Word
&lt;/h3&gt;

&lt;p&gt;The smartest move isn’t always the newest tool.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Sometimes a simple script beats a fancy AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Know when to use AI. Know when to skip it. That’s real skill.&lt;/strong&gt;
&lt;/h3&gt;




</description>
      <category>qualityassurance</category>
      <category>softwaredevelopment</category>
      <category>artificialintelligen</category>
      <category>testautomation</category>
    </item>
    <item>
      <title>Machine Learning Pipelines Made Easy for Quality Assurance Professionals</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sat, 10 Jan 2026 19:45:18 +0000</pubDate>
      <link>https://dev.to/qa-leaders/machine-learning-pipelines-made-easy-for-quality-assurance-professionals-12ei</link>
      <guid>https://dev.to/qa-leaders/machine-learning-pipelines-made-easy-for-quality-assurance-professionals-12ei</guid>
      <description>&lt;h4&gt;
  
  
  &lt;em&gt;A very simple guide to how machine learning works&lt;/em&gt;
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;Machine learning looks hard. But it is not.&lt;/p&gt;

&lt;p&gt;If you know QA, you already know the basics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;ML systems have three parts. We call them FTI:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;F = Feature (clean the data)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;T = Training (teach the model)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I = Inference (use the model)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let me explain each one.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 1: Feature Pipeline
&lt;/h3&gt;

&lt;h3&gt;
  
  
  What does it do?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;It cleans dirty data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Simple example:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;You have messy data. Names are written in different ways. Dates are in inconsistent formats. Numbers have errors.&lt;/p&gt;

&lt;p&gt;This pipeline fixes all that. It makes data clean and ready.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sjw3nhsg5p6a6vm15j6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sjw3nhsg5p6a6vm15j6.png" width="800" height="1117"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Feature Pipeline Detail&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  In QA words:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;You never test with bad data. You clean it first. This pipeline does the same thing.&lt;/p&gt;

&lt;p&gt;The clean data goes to a &lt;strong&gt;Feature Store&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
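&lt;p&gt;A toy feature pipeline in Python. The field names and formats are invented for illustration; the shape (messy records in, clean rows out) is the point:&lt;/p&gt;

```python
# A toy feature pipeline: messy records in, clean feature rows out.
# Field names and formats are invented for illustration.
from datetime import datetime

def clean_record(raw: dict) -> dict:
    """Normalise one messy record: names, dates, numbers."""
    return {
        "name": raw["name"].strip().title(),
        # assume source dates arrive as day/month/year strings
        "signup_date": datetime.strptime(raw["signup_date"], "%d/%m/%Y").date().isoformat(),
        "score": max(0.0, float(raw["score"])),  # clamp obvious errors
    }

messy = [
    {"name": "  aDa LOVELACE ", "signup_date": "01/12/2024", "score": "-3"},
    {"name": "alan turing", "signup_date": "15/06/2024", "score": "97.5"},
]
features = [clean_record(r) for r in messy]
```

&lt;p&gt;Those clean rows are what land in the Feature Store.&lt;/p&gt;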

&lt;h3&gt;
  
  
  Part 2: Training Pipeline
&lt;/h3&gt;

&lt;h3&gt;
  
  
  What does it do?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;It teaches the model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Simple example:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;You show the model 1000 pictures of cats. You tell it “this is a cat” each time. The model learns what a cat looks like.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  In QA words:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;You learn from requirements. Then you write test cases. The model learns from data. Then it can make predictions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Picture:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The trained model goes to a &lt;strong&gt;Model Registry&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t1c9wqdbwdfpkz7me92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t1c9wqdbwdfpkz7me92.png" width="800" height="139"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Training Pipeline Detail&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 3: Inference Pipeline
&lt;/h3&gt;

&lt;h3&gt;
  
  
  What does it do?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;It uses the model to answer questions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Simple example:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Someone shows a new picture. The model says “this is a cat” or “this is not a cat.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  In QA words:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;This is like running tests in production. The model is working and giving answers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wwuf796rcsspkak78gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wwuf796rcsspkak78gw.png" width="800" height="122"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Inference Pipeline Detail&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Important Storage Places
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Feature Store
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Keeps clean data&lt;/p&gt;

&lt;p&gt;Saves old versions&lt;/p&gt;

&lt;p&gt;Everyone uses the same data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Model Registry
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Keeps trained models&lt;/p&gt;

&lt;p&gt;Saves old versions&lt;/p&gt;

&lt;p&gt;You know which model is in production&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Full Picture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnobw22n1cn7eup1oshh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnobw22n1cn7eup1oshh8.png" width="800" height="92"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Full FTI Pipeline Overview&lt;/em&gt;&lt;/p&gt;
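&lt;p&gt;The whole FTI picture fits in a few lines of toy Python. "Training" here is just picking a threshold, which is an oversimplification, but the flow between the three pipelines and the two stores is the same:&lt;/p&gt;

```python
# All three pipelines in miniature: Feature cleans, Training learns,
# Inference answers. The two dicts play the role of the two stores.
feature_store = {}
model_registry = {}

def feature_pipeline(raw_scores):
    """F: clean the data and save it, versioned, in the feature store."""
    feature_store["v1"] = [float(s) for s in raw_scores]

def training_pipeline():
    """T: 'learn' a pass/fail threshold (here: just the mean)."""
    scores = feature_store["v1"]
    model_registry["v1"] = {"threshold": sum(scores) / len(scores)}

def inference_pipeline(new_score):
    """I: use the registered model to answer a question."""
    model = model_registry["v1"]
    return "pass" if new_score >= model["threshold"] else "fail"

feature_pipeline(["40", "60", "80"])
training_pipeline()
verdict = inference_pipeline(75)
```

&lt;p&gt;Test each function on its own, exactly as you would test three separate services.&lt;/p&gt;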

&lt;h3&gt;
  
  
  Why This is Easy for QA
&lt;/h3&gt;

&lt;p&gt;You already know:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;✓ How to check data quality → Test Feature Pipeline&lt;/p&gt;

&lt;p&gt;✓ How to compare old vs new → Test Training Pipeline&lt;/p&gt;

&lt;p&gt;✓ How to test in production → Test Inference Pipeline&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Five Things to Remember
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Three parts.&lt;/strong&gt; Feature, Training, Inference. That’s it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clean data is key.&lt;/strong&gt; Bad data = bad model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Save everything.&lt;/strong&gt; Keep old data. Keep old models. You can go back if needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test each part.&lt;/strong&gt; Don’t test everything together. Test one part at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your skills work here.&lt;/strong&gt; QA testing skills work for ML testing too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Last Words
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;ML is just &lt;strong&gt;software with a learning step.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You already know how to &lt;strong&gt;test software.&lt;/strong&gt; Now you can &lt;strong&gt;test ML too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start simple. Ask: &lt;strong&gt;“Show me the three pipelines.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then test each one.&lt;/p&gt;

&lt;p&gt;You can do this.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>qualityassurance</category>
      <category>softwaretesting</category>
    </item>
    <item>
      <title>I Built an AI-Powered Test Data Generator That Analyzes Any URL and Creates Test Data JSON</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Wed, 31 Dec 2025 19:12:47 +0000</pubDate>
      <link>https://dev.to/letsautomate/i-built-an-ai-powered-test-data-generator-that-analyzes-any-url-and-creates-test-data-json-48l2</link>
      <guid>https://dev.to/letsautomate/i-built-an-ai-powered-test-data-generator-that-analyzes-any-url-and-creates-test-data-json-48l2</guid>
      <description>&lt;h4&gt;
  
  
  &lt;em&gt;I got tired of manually inspecting HTML to find selectors. So I taught my framework to do it instead.&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl07bqppbcobwxqacbhu2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl07bqppbcobwxqacbhu2.gif" width="800" height="900"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture flow&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here’s a question that kept me up at night:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why am I spending more time finding selectors than writing actual tests?&lt;/p&gt;

&lt;p&gt;I watched myself burn 30 minutes on a simple login test — not writing the test itself, but hunting through DevTools for the right selectors, creating fixture files, and crafting test data that would actually work.&lt;/p&gt;

&lt;p&gt;What if the framework could just… look at the page and figure it out?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  The Problem Nobody Talks About
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Here’s the dirty secret of test automation: &lt;strong&gt;writing the actual test is the easy part.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The hard part? Finding #username vs input[name="user"] vs .login-field. Creating realistic test data. Building fixture files that match the actual form structure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every new page means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Open DevTools&lt;/p&gt;

&lt;p&gt;Inspect elements&lt;/p&gt;

&lt;p&gt;Copy selectors&lt;/p&gt;

&lt;p&gt;Hope they’re stable&lt;/p&gt;

&lt;p&gt;Create JSON fixtures&lt;/p&gt;

&lt;p&gt;Hope nothing changes tomorrow&lt;/p&gt;

&lt;p&gt;Most “AI-powered” testing tools focus on running tests or analyzing failures. But what about the beginning — the tedious setup that drains your time before you write a single assertion?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  The Experiment: Teaching AI to See
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The idea was simple but audacious: &lt;strong&gt;give the AI a URL and let it figure out everything else.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not mock data. Not hardcoded selectors. Real selectors from real HTML.&lt;/p&gt;

&lt;p&gt;Here’s what I wanted:&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the framework should:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fetch the actual page&lt;/p&gt;

&lt;p&gt;Analyze the HTML structure&lt;/p&gt;

&lt;p&gt;Extract real, working selectors&lt;/p&gt;

&lt;p&gt;Generate meaningful test cases&lt;/p&gt;

&lt;p&gt;Save everything as a Cypress fixture&lt;/p&gt;

&lt;p&gt;Then generate tests that use that data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sounds impossible? I thought so too.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Actually Works
&lt;/h3&gt;

&lt;p&gt;The magic happens in about 50 lines of Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_test_data_from_url(url: str, requirements: list) -&amp;gt; tuple:
    # Step 1: Fetch the real page
    resp = requests.get(url, timeout=10, headers={'User-Agent': 'Mozilla/5.0'})
    html = resp.text[:5000] # First 5KB is usually enough

    # Step 2: Ask AI to analyze it
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    prompt = f"""Analyze this HTML and generate test data.

    URL: {url}
    HTML: {html}

    Return JSON with:
    - Real selectors from the HTML
    - Valid test case with working data
    - Invalid test case for error handling
    """

    # Step 3: Parse and save as fixture
    test_data = json.loads(llm.invoke(prompt).content)

    with open("cypress/fixtures/url_test_data.json", 'w') as f:
        json.dump(test_data, f, indent=2)

    return test_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI doesn’t guess. It reads the actual HTML and extracts what’s really there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwsrrmhq11zuycl193gj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwsrrmhq11zuycl193gj.png" width="800" height="1717"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Complete Workflow&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What The AI Sees vs What It Returns
&lt;/h3&gt;

&lt;p&gt;When I point it at a login page, here’s the actual flow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; Just a URL&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--url https://the-internet.herokuapp.com/login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What the AI analyzes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;input type="text" id="username" name="username"&amp;gt;
&amp;lt;input type="password" id="password" name="password"&amp;gt;
&amp;lt;button type="submit" class="radius"&amp;gt;Login&amp;lt;/button&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it generates:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "url": "https://the-internet.herokuapp.com/login",
  "selectors": {
    "username": "#username",
    "password": "#password",
    "submit": "button[type='submit']"
  },
  "test_cases": [
    {
      "name": "valid_test",
      "username": "tomsmith",
      "password": "SuperSecretPassword!",
      "expected": "success"
    },
    {
      "name": "invalid_test", 
      "username": "wronguser",
      "password": "badpassword",
      "expected": "error"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real selectors. Actual test data. Zero manual inspection.&lt;/p&gt;
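&lt;p&gt;Still, whatever the model returns is worth validating before any test trusts it. A sanity check along these lines catches malformed output early (&lt;code&gt;validate_fixture&lt;/code&gt; is an illustrative helper, not part of the framework):&lt;/p&gt;

```python
# Sanity-check an AI-generated fixture before tests consume it:
# required keys present, selectors are non-empty strings.
# validate_fixture is an illustrative helper, not the framework's API.
import json

REQUIRED_KEYS = {"url", "selectors", "test_cases"}

def validate_fixture(raw: str) -> dict:
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"fixture missing keys: {sorted(missing)}")
    for field, selector in data["selectors"].items():
        if not isinstance(selector, str) or not selector:
            raise ValueError(f"bad selector for {field!r}")
    return data

fixture = validate_fixture(json.dumps({
    "url": "https://the-internet.herokuapp.com/login",
    "selectors": {"username": "#username", "password": "#password"},
    "test_cases": [],
}))
```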

&lt;h3&gt;
  
  
  The Generated Test Uses It All
&lt;/h3&gt;

&lt;p&gt;The framework then generates a Cypress test that consumes this fixture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;describe('Login Tests', function () {
    beforeEach(function () {
        cy.fixture('url_test_data').then((data) =&amp;gt; {
            this.testData = data;
        });
    });

    it('should login with valid credentials', function () {
        cy.visit(this.testData.url);
        const valid = this.testData.test_cases.find(tc =&amp;gt; tc.name === 'valid_test');

        cy.get(this.testData.selectors.username).type(valid.username);
        cy.get(this.testData.selectors.password).type(valid.password);
        cy.get(this.testData.selectors.submit).click();

        cy.url().should('include', '/secure');
    });
    it('should show error with invalid credentials', function () {
        cy.visit(this.testData.url);
        const invalid = this.testData.test_cases.find(tc =&amp;gt; tc.name === 'invalid_test');

        cy.get(this.testData.selectors.username).type(invalid.username);
        cy.get(this.testData.selectors.password).type(invalid.password);
        cy.get(this.testData.selectors.submit).click();

        cy.get('#flash').should('contain', 'invalid');
    });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Notice something? &lt;strong&gt;The selectors come from the fixture, not hardcoded in the test.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the page changes, update the fixture. Tests stay clean.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Two Ways to Feed Data
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Sometimes you already have test data. Maybe from a previous run. Maybe from your team’s shared fixtures.&lt;/p&gt;

&lt;p&gt;So I added a second option:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Option 1: AI analyzes live URL
python qa_automation.py "Test login" --url https://example.com/login

# Option 2: Use existing JSON file
python qa_automation.py "Test login" --data cypress/fixtures/my_data.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same test generation. Different data sources. Your choice.&lt;/p&gt;
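&lt;p&gt;The two entry points can be modelled with a mutually exclusive argument group. This parser is illustrative and merely mirrors the commands above; it is not the tool's actual code:&lt;/p&gt;

```python
# An argparse sketch of the two data sources: either --url or --data,
# never both. Argument names mirror the CLI shown above (illustrative).
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate Cypress tests")
    parser.add_argument("requirement", help="natural-language test description")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--url", help="analyze a live page with AI")
    group.add_argument("--data", help="path to an existing fixture JSON")
    return parser

args = build_parser().parse_args(
    ["Test login", "--url", "https://example.com/login"]
)
```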

&lt;h3&gt;
  
  
  The Part That Surprised Me
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;I expected the AI to find basic selectors. What I didn’t expect was how well it understood &lt;strong&gt;context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When analyzing a registration form, it didn’t just find #email — it generated test data like:&lt;/p&gt;

&lt;p&gt;Valid: &lt;a href="mailto:testuser@example.com"&gt;testuser@example.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Invalid: not-an-email&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For password fields:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Valid: SecurePass123!&lt;/p&gt;

&lt;p&gt;Invalid: 123 (too short)&lt;/p&gt;

&lt;p&gt;The AI understood what kind of data each field expected. Not because I told it — because it read the HTML attributes, labels, and validation patterns.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Gotcha: Fixtures Need function() Syntax
&lt;/h3&gt;

&lt;p&gt;One thing tripped me up for hours. Cypress fixtures with this.testData require a specific pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// WRONG - arrow functions don't have 'this'
describe('Test', () =&amp;gt; {
    beforeEach(() =&amp;gt; {
        cy.fixture('data').then((d) =&amp;gt; { this.testData = d; }); // undefined!
    });
});

// RIGHT - function() preserves 'this'
describe('Test', function () {
    beforeEach(function () {
        cy.fixture('data').then((data) =&amp;gt; { this.testData = data; });
    });

    it('works', function () {
        console.log(this.testData); // actual data!
    });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The framework now enforces this pattern in generated tests. Lesson learned the hard way.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Means For Your Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Open page in browser&lt;/p&gt;

&lt;p&gt;Inspect elements manually&lt;/p&gt;

&lt;p&gt;Copy selectors to notepad&lt;/p&gt;

&lt;p&gt;Create fixture JSON by hand&lt;/p&gt;

&lt;p&gt;Write test using those selectors&lt;/p&gt;

&lt;p&gt;Fix typos in selectors&lt;/p&gt;

&lt;p&gt;Run test&lt;/p&gt;

&lt;p&gt;Debug why selectors don’t work&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Run one command with URL&lt;/p&gt;

&lt;p&gt;Framework handles the rest&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not an exaggeration. The 30-minute login test? &lt;strong&gt;Under 2 minutes now.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Try It Yourself
&lt;/h3&gt;

&lt;p&gt;The framework is open source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/user/cypress-natural-language-tests
cd cypress-natural-language-tests
pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export OPENAI_API_KEY=your_key_here
export OPENROUTER_API_KEY=your_openrouter_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generate tests from any URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test the login form" --url https://the-internet.herokuapp.com/login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check what it created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat cypress/fixtures/url_test_data.json
cat cypress/e2e/generated/*.cy.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Bigger Picture
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;We’re at an interesting moment in test automation. The tooling is getting smarter, but&lt;/em&gt; &lt;strong&gt;&lt;em&gt;the real breakthrough isn’t replacing testers — it’s eliminating the tedious parts.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Finding selectors is tedious. Creating fixture files is tedious. Debugging why&lt;/em&gt; &lt;em&gt;#submit-btn worked yesterday but not today is tedious.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let AI handle tedious. Let humans handle important.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That’s the framework I’m building.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Follow for more AI + QA experiments:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/aiqualitylab/cypress-natural-language-tests.git" rel="noopener noreferrer"&gt;https://github.com/aiqualitylab/cypress-natural-language-tests.git&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>llm</category>
      <category>langgraph</category>
    </item>
    <item>
      <title>I Built an AI-Powered Cypress Framework That Analyses Test Failures for Free</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sun, 28 Dec 2025 14:03:59 +0000</pubDate>
      <link>https://dev.to/qa-leaders/i-built-an-ai-powered-cypress-framework-that-analyses-test-failures-for-free-5f78</link>
      <guid>https://dev.to/qa-leaders/i-built-an-ai-powered-cypress-framework-that-analyses-test-failures-for-free-5f78</guid>
      <description>&lt;h4&gt;
  
  
  Cypress test debugging is painful. This free AI-powered framework analyses failures instantly and tells you exactly what went wrong.
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcbcjpl0coe6p2wprcku.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcbcjpl0coe6p2wprcku.gif" width="900" height="350"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AI-Powered Cypress Framework That Analyses Test Failures for Free&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ever stared at a cryptic Cypress error message wondering what broke? 😩 We’ve all been there. That’s why I built something that changed my debugging workflow forever.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  Introducing &lt;strong&gt;v2.1&lt;/strong&gt; of my Cypress Natural Language Test Framework — now featuring &lt;strong&gt;🔍 AI Failure Analysis&lt;/strong&gt; that costs you absolutely nothing.
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7yciqspi8tbs2gialcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7yciqspi8tbs2gialcp.png" width="800" height="1806"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  😤 The Problem Every QA Engineer Knows
&lt;/h3&gt;

&lt;p&gt;Picture this: your CI pipeline fails with an error like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CypressError: Timed out retrying after 4000ms: Expected to find element: '#submit-btn', but never found it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you’re left guessing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🤔 Did the selector change?&lt;/p&gt;

&lt;p&gt;⏳ Is the page loading too slowly?&lt;/p&gt;

&lt;p&gt;✏️ Did someone rename the button?&lt;/p&gt;

&lt;p&gt;⚡ Is it a timing issue?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You spend the next hour digging through logs, comparing commits, and testing locally. Sound familiar?&lt;/p&gt;

&lt;h3&gt;
  
  
  💡 The Solution: AI That Debugs For You
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;With v2.1, debugging becomes a one-liner:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py --analyze "CypressError: Timed out retrying: Expected to find element: #submit-btn"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔍 Analyzing...
REASON: Element #submit-btn not found - selector likely changed during recent UI update
FIX: Use cy.get('[data-testid="submit"]') or add cy.wait() before the click action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Two lines. Problem identified. Solution provided. Done.&lt;/p&gt;

&lt;h3&gt;
  
  
  🏗️ System Architecture
&lt;/h3&gt;

&lt;p&gt;Here’s how the entire framework fits together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AZhfR1pLUFuBdtjCj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AZhfR1pLUFuBdtjCj.png" width="800" height="3621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ How It Works Under The Hood
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The implementation is surprisingly simple. Here’s the core function:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def analyze_failure(log: str) -&amp;gt; str:
    response = requests.post(
        url="https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek/deepseek-r1-0528:free",
            "messages": [{"role": "user", "content": f"Analyze this Cypress test failure. Reply ONLY:\nREASON: (one line)\nFIX: (one line)\n\n{log}"}],
            "max_tokens": 150
        }
    )
    return response.json()["choices"][0]["message"]["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;That’s it. About 15 lines of code that leverage OpenRouter’s free tier with DeepSeek R1. 🆓&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  🛠️ Three Ways To Use It
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1️⃣ Direct from command line:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py --analyze "Your error message here"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2️⃣ From a log file:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py --analyze -f cypress-output.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3️⃣ Piped from another command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat error.log | python qa_automation.py --analyze
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
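&lt;p&gt;Under the hood, all three entry points can funnel into a single resolver. The sketch below is illustrative only (the function name and exact flag handling are assumptions, not the repo's actual code):&lt;/p&gt;

```python
import argparse
import sys

def read_error_text(argv):
    # Resolve the error text from one of three sources:
    # an inline string, a log file via -f, or piped stdin.
    parser = argparse.ArgumentParser()
    parser.add_argument("--analyze", nargs="?", const="", default="")
    parser.add_argument("-f", "--file")
    args = parser.parse_args(argv)

    if args.file:                 # mode 2: read a whole log file
        with open(args.file) as fh:
            return fh.read()
    if args.analyze:              # mode 1: inline error string
        return args.analyze
    return sys.stdin.read()       # mode 3: piped input
```

&lt;p&gt;Whichever mode you use, the same analysis prompt is sent to the model, so the output format stays identical.&lt;/p&gt;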



&lt;h3&gt;
  
  
  🔄 CI/CD Integration
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The real power comes when you integrate this into your pipeline. Here’s how the updated GitHub Actions workflow looks:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03u2nnc3qchw9iiea2f6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03u2nnc3qchw9iiea2f6.png" width="800" height="1013"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Run Cypress tests
  id: tests
  continue-on-error: true
  run: |
    npx cypress run --spec "cypress/e2e/generated/**/*.cy.js" 2&amp;gt;&amp;amp;1 | tee test-output.log

- name: AI Failure Analysis
  if: steps.tests.outcome == 'failure'
  env:
    OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
  run: |
    echo "Analyzing failures with AI..."
    python qa_automation.py --analyze -f test-output.log

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When tests fail, your CI logs now include actionable insights instead of just error dumps. 📋&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 Setting It Up
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Get your free API key from &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;openrouter.ai&lt;/a&gt; 🔑&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Add to your .env:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENROUTER_API_KEY=your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Add requests to requirements.txt (if not already there) 📦&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; Start analyzing 🎉&lt;/p&gt;

&lt;p&gt;That’s the entire setup. No complex configurations. No paid subscriptions.&lt;/p&gt;
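&lt;p&gt;One guard worth adding to your own copy (this helper is hypothetical, not part of the framework): fail fast when the key is missing, so a typo in your .env surfaces as a clear error rather than a 401 from the API:&lt;/p&gt;

```python
import os

def require_api_key(name="OPENROUTER_API_KEY"):
    # Fail fast with a clear message instead of sending an
    # unauthenticated request that comes back as a 401.
    key = os.getenv(name)
    if not key:
        raise RuntimeError(name + " is not set; add it to your .env file")
    return key
```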

&lt;h3&gt;
  
  
  🖥️ Local Development Flow
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;For local development, the flow is just as smooth:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2A7hr1LYYMY2vpxfdY.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2A7hr1LYYMY2vpxfdY.png" width="800" height="3668"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  📦 What’s In v2.1
&lt;/h3&gt;

&lt;p&gt;Here’s everything new in this release:&lt;/p&gt;

&lt;h4&gt;
  
  
  Features
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;🔍 &lt;strong&gt;AI Failure Analyzer&lt;/strong&gt;: instant debugging with a free LLM&lt;/p&gt;

&lt;p&gt;🌐 &lt;strong&gt;OpenRouter Integration&lt;/strong&gt;: uses DeepSeek R1 at zero cost&lt;/p&gt;

&lt;p&gt;💻 &lt;strong&gt;CLI Flag&lt;/strong&gt;: simple --analyze command&lt;/p&gt;

&lt;p&gt;📁 &lt;strong&gt;File Input&lt;/strong&gt;: analyze entire log files with -f&lt;/p&gt;

&lt;p&gt;⚙️ &lt;strong&gt;CI/CD Ready&lt;/strong&gt;: updated GitHub Actions workflow&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Combined with v2.0 features:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🤖 Natural language test generation&lt;/p&gt;

&lt;p&gt;🔄 cy.prompt() self-healing tests&lt;/p&gt;

&lt;p&gt;📊 LangGraph workflow orchestration&lt;/p&gt;

&lt;p&gt;📚 Vector store documentation context&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  🌍 Real World Example
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Old approach:&lt;/strong&gt; Manual Investigation 😓&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New approach:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py --analyze -f nightly-run.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REASON: Login button selector changed from #login-btn to .auth-button
FIX: Update selector to cy.get('.auth-button') or use data-testid

REASON: API response timeout - server took 6s, test timeout was 4s
FIX: Increase timeout with cy.request({timeout: 10000}) or add retry logic

REASON: Element detached from DOM after React re-render
FIX: Add cy.wait() after state change or use {force: true} option
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
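&lt;p&gt;Because the analyzer emits a fixed REASON/FIX format, the output is easy to post-process, for example into a PR comment or a Slack message. A small helper (hypothetical, not part of the framework) could look like:&lt;/p&gt;

```python
import re

def parse_findings(output):
    # Pair each REASON line with the FIX line that follows it.
    pattern = r"REASON:\s*(.+)\s*\nFIX:\s*(.+)"
    return [{"reason": r.strip(), "fix": f.strip()}
            for r, f in re.findall(pattern, output)]
```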



&lt;h3&gt;
  
  
  🔗 Try It Yourself
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The framework is open source and available now:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/aiqualitylab/cypress-natural-language-tests" rel="noopener noreferrer"&gt;github.com/aiqualitylab/cypress-natural-language-tests&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Clone it, set up your API keys, and start generating tests and debugging failures with AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  💭 Final Thoughts
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;AI shouldn’t just generate code. It should help maintain it too. This failure analyzer is my attempt at closing that loop — from requirements to tests to debugging, all AI-assisted.&lt;/p&gt;

&lt;p&gt;The best part? It’s completely &lt;strong&gt;free&lt;/strong&gt; to use. 🆓&lt;/p&gt;

&lt;p&gt;Give it a try and let me know how much time it saves you! 💬&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;If this helped you, consider ⭐ starring the repo. It helps others discover it.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>llm</category>
      <category>langchain</category>
      <category>ai</category>
      <category>testautomation</category>
    </item>
    <item>
      <title>AI-Powered Cypress Test Generation from Natural Language v2.0 — Now with cy.prompt() Self-Healing</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sat, 27 Dec 2025 11:46:37 +0000</pubDate>
      <link>https://dev.to/qa-leaders/ai-powered-cypress-test-generation-from-natural-language-v20-now-with-cyprompt-self-healing-5ebe</link>
      <guid>https://dev.to/qa-leaders/ai-powered-cypress-test-generation-from-natural-language-v20-now-with-cyprompt-self-healing-5ebe</guid>
      <description>&lt;h3&gt;
  
  
  AI-Powered Cypress Test Generation from Natural Language — Now with cy.prompt() Self-Healing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Transform plain English requirements into production-ready Cypress tests using GPT-4, LangChain, and LangGraph — run locally or in CI/CD&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;My Open-source project: &lt;a href="https://github.com/aiqualitylab/cypress-natural-language-tests" rel="noopener noreferrer"&gt;&lt;strong&gt;github.com/aiqualitylab/cypress-natural-language-tests&lt;/strong&gt;&lt;/a&gt;, which utilizes Cypress’s official AI-powered &lt;strong&gt;cy.prompt()&lt;/strong&gt; command introduced at &lt;strong&gt;CypressConf 2025&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2ppga1md065afnq39qk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2ppga1md065afnq39qk.gif" width="720" height="720"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AI-Powered Cypress Test Generation from Natural Language v2.0 — Now with cy.prompt() Self-Healing&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Testing shouldn’t be complicated. You know what your application should do — why spend hours writing boilerplate test code?&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/aiqualitylab/cypress-natural-language-tests" rel="noopener noreferrer"&gt;&lt;strong&gt;cypress-natural-language-tests&lt;/strong&gt;&lt;/a&gt; to bridge the gap between your test ideas and working Cypress code. Just describe your test in plain English:&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test user login with valid credentials" --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; A complete .cy.js file generated and executed automatically!&lt;/p&gt;

&lt;p&gt;And now, with the latest update, the framework also supports &lt;strong&gt;Cypress’s new cy.prompt()&lt;/strong&gt; command for self-healing, AI-powered test execution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  What’s New: cy.prompt() Integration
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Cypress recently launched cy.prompt() — their official AI command that converts natural language into test steps at runtime. My framework now supports both approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate Mode&lt;/strong&gt;: creates complete .cy.js test files. Best for version control and CI/CD pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;cy.prompt() Mode&lt;/strong&gt;: generates tests using cy.prompt() syntax. Best for self-healing tests and rapid prototyping.&lt;/p&gt;

&lt;p&gt;You choose what works best for your workflow!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;👆 The complete workflow — from requirements to executed tests&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The framework supports &lt;strong&gt;two execution paths&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  🖥️ Local Machine Flow vs ⚙️ GitHub Actions CI Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lo22bbwvy5d8ssft8u3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lo22bbwvy5d8ssft8u3.gif" width="480" height="600"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;🖥️ Local Machine Flow vs ⚙️ GitHub Actions CI Flow&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Two Powerful Modes
&lt;/h3&gt;
&lt;h3&gt;
  
  
  Mode 1: Traditional Test Generation
&lt;/h3&gt;

&lt;p&gt;Generate standard Cypress test files that you own and version control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test user login with valid credentials"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;  &lt;strong&gt;01_test-user-login_20241223_102030.cy.js&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;describe('User Login', () =&amp;gt; {
  it('should login successfully with valid credentials', () =&amp;gt; {
    cy.visit('https://the-internet.herokuapp.com/login');
    cy.get('#username').type('tomsmith');
    cy.get('#password').type('SuperSecretPassword!');
    cy.get('button[type="submit"]').click();
    cy.get('.flash.success').should('be.visible');
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mode 2: cy.prompt() Generation
&lt;/h3&gt;

&lt;p&gt;Generate tests using Cypress’s new AI-powered cy.prompt() command for self-healing capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test user login" --use-cyprompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;  &lt;strong&gt;01_test-user-login_20241223_102030.cy.js&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;describe('User Login', () =&amp;gt; {
  it('should login successfully with valid credentials', () =&amp;gt; {
    cy.prompt([
      'Visit the login page at https://the-internet.herokuapp.com/login',
      'Type "tomsmith" in the username field',
      'Type "SuperSecretPassword!" in the password field',
      'Click the login button',
      'Verify the success message is visible'
    ]);
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why cy.prompt()?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔄 &lt;strong&gt;Self-healing&lt;/strong&gt;: tests adapt when the UI changes&lt;/p&gt;

&lt;p&gt;📝 &lt;strong&gt;Readable&lt;/strong&gt;: natural language steps in your test files&lt;/p&gt;

&lt;p&gt;🛡️ &lt;strong&gt;Resilient&lt;/strong&gt;: less maintenance when selectors change&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clone the repository
git clone https://github.com/aiqualitylab/cypress-natural-language-tests.git
cd cypress-natural-language-tests

# Set up Python environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure OpenAI API key
echo "OPENAI_API_KEY=your_key_here" &amp;gt; .env

# Initialize Cypress
npm install cypress --save-dev
npx cypress open
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Generate Your First Test
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Standard Cypress test
python qa_automation.py "Test user registration flow"

# With cy.prompt() syntax
python qa_automation.py "Test user registration flow" --use-cyprompt

# Generate and run immediately
python qa_automation.py "Test homepage loads correctly" --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Practical Examples
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Example 1: Multiple Test Requirements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py \
  "Test successful login with valid credentials" \
  "Test login fails with wrong password" \
  "Test login form shows validation errors for empty fields"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creates three separate test files — one for each requirement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: With Documentation Context (RAG)
&lt;/h3&gt;

&lt;p&gt;Supercharge test generation with your own documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py \
  "Test checkout API according to specifications" \
  --docs ./api-documentation \
  --persist-vstore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The framework indexes your docs into ChromaDB and uses them as context for more accurate test generation.&lt;/p&gt;
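&lt;p&gt;Before documentation lands in the vector store it has to be split into embedding-sized pieces. A minimal chunker, shown purely for illustration (the framework's actual splitter may differ), could look like:&lt;/p&gt;

```python
def chunk_text(text, size=500, overlap=50):
    # Split documentation into overlapping chunks so each
    # vector-store entry stays within the embedding context budget;
    # the overlap keeps sentences that straddle a boundary retrievable.
    step = max(size - overlap, 1)
    return [text[start:start + size] for start in range(0, len(text), step)]
```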

&lt;h3&gt;
  
  
  Example 3: Generate and Execute Locally
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py "Test user profile update" --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generates the test AND runs Cypress immediately. View results in your terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 4: CI/CD Integration
&lt;/h3&gt;

&lt;p&gt;Trigger via GitHub Actions to generate tests in your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Generate Tests
  run: python qa_automation.py "${{ github.event.inputs.requirement }}"

- name: Run Cypress
  run: npx cypress run

- name: Upload Artifacts
  uses: actions/upload-artifact@v3
  with:
    name: cypress-results
    path: |
      cypress/videos
      cypress/screenshots
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Choose This Framework?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Dual Mode Support&lt;/strong&gt;: standard Cypress OR cy.prompt(), your choice&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complete Test Files&lt;/strong&gt;: version control your generated tests&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation-Aware&lt;/strong&gt;: RAG integration for accurate, context-rich tests&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local &amp;amp; CI Ready&lt;/strong&gt;: works on your machine and in GitHub Actions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Flexibility&lt;/strong&gt;: use GPT-4, GPT-4o-mini, or GPT-3.5-turbo&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open Source&lt;/strong&gt;: full control, no vendor lock-in&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Change AI Model
&lt;/h3&gt;

&lt;p&gt;In qa_automation.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;llm = ChatOpenAI(
    model="gpt-4o-mini", # Options: gpt-4, gpt-4o, gpt-3.5-turbo
    temperature=0
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set Your Application URL
&lt;/h3&gt;

&lt;p&gt;Update the prompt template to target your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CY_PROMPT_TEMPLATE = """
...
- Use `cy.visit('https://your-app-url.com')` as the base URL.
...
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Get Started Now
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;🔗&lt;/strong&gt; &lt;a href="https://github.com/aiqualitylab/cypress-natural-language-tests" rel="noopener noreferrer"&gt;&lt;strong&gt;github.com/aiqualitylab/cypress-natural-language-tests&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/aiqualitylab/cypress-natural-language-tests.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⭐ Star the repo if you find it useful!&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Natural language test generation is here to stay. With &lt;strong&gt;cypress-natural-language-tests&lt;/strong&gt;, you get:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Two modes&lt;/strong&gt;  — Traditional Cypress or cy.prompt()&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Full ownership&lt;/strong&gt;  — Complete test files you control&lt;br&gt;&lt;br&gt;
&lt;strong&gt;CI/CD ready&lt;/strong&gt;  — Works locally and in GitHub Actions&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Documentation-aware&lt;/strong&gt;  — RAG for accurate test generation&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Open source&lt;/strong&gt;  — No vendor lock-in&lt;/p&gt;

&lt;p&gt;Stop writing boilerplate. Start describing tests in plain English.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What’s your experience with AI-powered test generation? Drop a comment below!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




</description>
      <category>openai</category>
      <category>ai</category>
      <category>softwaretesting</category>
      <category>cypress</category>
    </item>
    <item>
      <title>AI-Powered Cypress Test Automation: Automated Test Creation and Execution with Machine Learning</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Fri, 26 Dec 2025 13:56:41 +0000</pubDate>
      <link>https://dev.to/qa-leaders/ai-powered-cypress-test-automation-automated-test-creation-and-execution-with-machine-learning-1228</link>
      <guid>https://dev.to/qa-leaders/ai-powered-cypress-test-automation-automated-test-creation-and-execution-with-machine-learning-1228</guid>
      <description>&lt;h3&gt;
  
  
  How to Build Intelligent End-to-End Testing with OpenAI GPT-4, LangChain, LangGraph, and Continuous Integration Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofzxkk1y7dsf9tl5c7nk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofzxkk1y7dsf9tl5c7nk.gif" width="560" height="294"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AI-Powered Cypress Test Automation: Automated Test Creation and Execution with Machine Learning&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Transform natural language requirements into production-ready automated tests using OpenAI, LangChain, artificial intelligence and test automation best practices&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9v0lj3iglas5bh6s5n71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9v0lj3iglas5bh6s5n71.png" width="800" height="2122"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;COMPLETE WORKFLOW&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem That Started It All
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;As a QA engineer specializing in test automation, I’ve spent countless hours writing Cypress tests for web application testing. The manual test creation process was always the same: understand the requirement, inspect the DOM, find the right selectors, write the test code, handle edge cases, and repeat. A simple login test could take 30 minutes. Complex user flows? Hours.&lt;/p&gt;

&lt;p&gt;One day, after spending three hours writing automated tests for a basic checkout flow, I thought: “What if I could use artificial intelligence and machine learning to automatically generate test scripts from plain English requirements?”&lt;/p&gt;

&lt;p&gt;That question led to building an open-source AI-powered test automation framework that does exactly that — combining natural language processing, automated test generation, and continuous integration for intelligent software testing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  What I Built: An Intelligent Test Automation Framework
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The AI-powered testing framework accepts natural language requirements and generates production-ready Cypress E2E tests automatically using machine learning. This automated testing solution combines GPT-4 artificial intelligence with DevOps best practices for continuous testing. Here’s what the intelligent test automation looks like in action:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqba6tkjbc795ws5mppmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqba6tkjbc795ws5mppmu.png" width="800" height="2703"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;FULL WORKFLOW&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py \
  "Test user login with valid credentials" \
  "Test login fails with invalid password" \
  --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// 01_test-user-login-with-valid-credentials_20241221_120000.cy.js
describe('User Login', () =&amp;gt; {
  it('should login successfully with valid credentials', () =&amp;gt; {
    cy.visit('https://the-internet.herokuapp.com/login');
    cy.get('#username').type('tomsmith');
    cy.get('#password').type('SuperSecretPassword!');
    cy.get('button[type="submit"]').click();
    cy.get('.flash.success').should('contain', 'You logged into a secure area!');
  });

  it('should show error with invalid credentials', () =&amp;gt; {
    cy.visit('https://the-internet.herokuapp.com/login');
    cy.get('#username').type('invaliduser');
    cy.get('#password').type('wrongpassword');
    cy.get('button[type="submit"]').click();
    cy.get('.flash.error').should('contain', 'Your username is invalid!');
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The framework works both locally and in CI/CD pipelines, generating tests in seconds instead of hours.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovcwcvloor77pknsc4fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovcwcvloor77pknsc4fh.png" width="800" height="1561"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;LOCAL FLOW&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Technical Architecture
&lt;/h3&gt;
&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;p&gt;The system consists of four main pieces:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;1. Python Orchestration Layer&lt;/strong&gt; I built the core in Python, using LangGraph to manage the workflow. LangGraph provides a graph-based state management system perfect for orchestrating complex AI workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. OpenAI Integration&lt;/strong&gt; The heart of the system uses GPT-4o-mini. I chose this model for its balance of speed, cost-effectiveness, and code generation quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cypress Test Runner&lt;/strong&gt; The generated tests are standard Cypress JavaScript files that run without modification in any Cypress environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Optional Context Store&lt;/strong&gt; Using ChromaDB, the framework can index project documentation to provide additional context for more accurate test generation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  How It Works Internally
&lt;/h3&gt;

&lt;p&gt;Here’s the step-by-step process:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Requirement Parsing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def parse_cli_args(state: QAState) -&amp;gt; QAState:
    parser = argparse.ArgumentParser(
        description="Generate Cypress tests from natural language"
    )
    parser.add_argument("requirements", nargs="+")
    parser.add_argument("--run", action="store_true")
    args = parser.parse_args()
    state["requirements"] = args.requirements
    return state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: AI Generation&lt;/strong&gt; I crafted a prompt template that guides GPT-4 to generate Cypress-compliant code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CY_PROMPT_TEMPLATE = """You are a senior automation engineer.
Write a Cypress test for: {requirement}

Constraints:
- Use Cypress best practices
- Include describe and it blocks
- Use real selectors (id, class, name)
- Include positive and negative test paths
- Return ONLY runnable JavaScript code
"""

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Code Generation and Validation&lt;/strong&gt; The LLM returns raw JavaScript code, which I save with descriptive filenames:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_tests(state: QAState) -&amp;gt; QAState:
    for idx, req in enumerate(state["requirements"], start=1):
        code = generate_cypress_test(req)
        slug = slugify(req)[:60]
        filename = f"{idx:02d}_{slug}_{now_stamp()}.cy.js"
        filepath = Path(out_dir) / filename
        with open(filepath, "w") as f:
            f.write(f"// Requirement: {req}\n")
            f.write(code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
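&lt;p&gt;The slugify and now_stamp helpers referenced above are small filename utilities; a plausible sketch (the repo's actual versions may differ) is:&lt;/p&gt;

```python
import re
from datetime import datetime

def slugify(text):
    # Lower-case the requirement and keep only letters, digits
    # and hyphens so it is safe inside a filename.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def now_stamp():
    # Timestamp suffix so repeated runs never overwrite earlier specs.
    return datetime.now().strftime("%Y%m%d_%H%M%S")
```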



&lt;p&gt;&lt;strong&gt;Step 4: Optional Execution&lt;/strong&gt; If the --run flag is provided, the framework executes Cypress immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def run_cypress(state: QAState) -&amp;gt; QAState:
    if state.get("run_cypress"):
        specs = state.get("generated_files", [])
        subprocess.run(["npx", "cypress", "run", "--spec", ",".join(specs)])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Workflow
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;LangGraph enabled me to build a clean, maintainable workflow. Here’s the graph structure:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_workflow():
    graph = StateGraph(QAState)
    graph.add_node("ParseCLI", parse_cli_args)
    graph.add_node("BuildVectorStore", create_or_update_vector_store)
    graph.add_node("GenerateTests", generate_tests)
    graph.add_node("RunCypress", run_cypress)

    graph.set_entry_point("ParseCLI")
    graph.add_edge("ParseCLI", "BuildVectorStore")
    graph.add_edge("BuildVectorStore", "GenerateTests")
    graph.add_edge("GenerateTests", "RunCypress")
    graph.add_edge("RunCypress", END)

    return graph.compile()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This graph-based approach makes it easy to add new nodes (like validation, reporting, or test optimization) without refactoring the entire codebase.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  CI/CD Integration
&lt;/h3&gt;

&lt;p&gt;The framework shines in automated environments. I built a GitHub Actions workflow that:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg032o9pni85jcn918q20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg032o9pni85jcn918q20.png" width="800" height="1033"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;CI/CD&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Accepts test requirements as workflow inputs&lt;/li&gt;
&lt;li&gt;Sets up Node.js and Python environments&lt;/li&gt;
&lt;li&gt;Generates tests using AI&lt;/li&gt;
&lt;li&gt;Executes them with Cypress&lt;/li&gt;
&lt;li&gt;Uploads videos, screenshots, and test files as artifacts&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The workflow file looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: AI-Powered Cypress Tests
on:
  push:
  pull_request:
  workflow_dispatch:
    inputs:
      requirements:
        description: 'Test requirements (one per line)'
        required: true
jobs:
  generate-and-run-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20.x'

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          npm install
          pip install -r requirements.txt

      - name: Generate and run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python qa_automation.py \
            "Test login functionality" \
            "Test checkout process" \
            --run --out cypress/e2e/generated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
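&lt;p&gt;One wrinkle worth noting: &lt;code&gt;workflow_dispatch&lt;/code&gt; delivers the requirements as a single multiline string, while the CLI expects one argument per requirement. A small adapter on the Python side (hypothetical, not part of the published workflow) can split them:&lt;/p&gt;

```python
def requirements_to_args(raw):
    # One requirement per line; blank lines and stray whitespace ignored.
    return [line.strip() for line in raw.splitlines() if line.strip()]

requirements_to_args("Test login\n\nTest checkout\n")
# ['Test login', 'Test checkout']
```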



&lt;h3&gt;
  
  
  Challenges and Solutions
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Challenge 1: Selector Discovery
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; How does the AI know what selectors exist on the page?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; I refined the prompt to instruct the model to use common, semantic selectors. For better accuracy, I added an optional documentation context feature using ChromaDB:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_or_update_vector_store(state: QAState):
    docs_dir = state.get("docs_dir")
    if docs_dir:
        loader = DirectoryLoader(docs_dir, glob="**/*.*")
        documents = loader.load()
        splitter = RecursiveCharacterTextSplitter(chunk_size=800)
        chunks = splitter.split_documents(documents)
        db = Chroma.from_documents(chunks, embeddings, 
                                    persist_directory=VECTOR_STORE_DIR)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This allows users to provide API documentation or page structure files for more accurate selector generation.&lt;/p&gt;
&lt;/blockquote&gt;
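&lt;p&gt;At generation time, the retrieved chunks just get prepended to the prompt. A sketch of that glue step (the function name and character cap are my assumptions; in LangChain the chunks would come from something like &lt;code&gt;db.similarity_search(requirement, k=3)&lt;/code&gt;):&lt;/p&gt;

```python
def augment_prompt(base_prompt, context_chunks, max_chars=2000):
    # Prepend retrieved documentation (page structure, API docs) so the
    # model can use real selectors instead of guessing them; cap the
    # context size to keep token usage predictable.
    context = "\n\n".join(context_chunks)[:max_chars]
    if not context:
        return base_prompt
    return f"Relevant application docs:\n{context}\n\n{base_prompt}"
```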

&lt;h3&gt;
  
  
  Challenge 2: Test Quality Consistency
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; LLM outputs can vary in quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; I implemented strict prompt engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit instructions for Cypress best practices&lt;/li&gt;
&lt;li&gt;A requirement to include both positive and negative test cases&lt;/li&gt;
&lt;li&gt;A mandate for clear, descriptive assertions&lt;/li&gt;
&lt;li&gt;An instruction to return only executable JavaScript (no explanations)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Challenge 3: Handling Multiple Requirements
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Processing requirements sequentially was slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; While I kept sequential processing for simplicity and cost control, the architecture supports parallel processing. Each requirement is independent, making it trivial to parallelize in the future:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Future enhancement potential
from concurrent.futures import ThreadPoolExecutor
def generate_tests_parallel(state: QAState):
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(generate_cypress_test, req) 
                   for req in state["requirements"]]
        results = [f.result() for f in futures]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real-World Usage Examples
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Example 1: E-commerce Testing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py \
  "Test product search returns relevant results" \
  "Test adding multiple items to cart" \
  "Test checkout with valid payment information" \
  "Test order confirmation email is sent" \
  --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 2: User Authentication Flows
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py \
  "Test user registration with valid email" \
  "Test registration fails with existing email" \
  "Test login with correct credentials" \
  "Test password reset flow" \
  "Test account lockout after failed attempts" \
  --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 3: Form Validation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python qa_automation.py \
  "Test contact form with all fields filled correctly" \
  "Test form shows errors for empty required fields" \
  "Test email validation rejects invalid formats" \
  "Test phone number accepts international formats" \
  --run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Measurable Impact
&lt;/h3&gt;

&lt;p&gt;After using this framework for several projects:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Time savings:&lt;/strong&gt; 95% reduction in test writing time (30 minutes → 90 seconds per test)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test coverage:&lt;/strong&gt; Ability to generate 50+ tests in the time it previously took to write 2–3&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Maintenance:&lt;/strong&gt; Regenerating tests for UI changes takes seconds instead of hours&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Onboarding:&lt;/strong&gt; New team members can contribute tests on day one without Cypress expertise&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;The framework is open source and available on GitHub. Here’s how to set it up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/aiqualitylab/cypress-natural-language-tests
cd cypress-natural-language-tests
npm install
pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create .env file
echo "OPENAI_API_KEY=your_key_here" &amp;gt; .env

# Create cypress.config.js
cat &amp;gt; cypress.config.js &amp;lt;&amp;lt; 'EOF'
const { defineConfig } = require('cypress')
module.exports = defineConfig({
  e2e: {
    baseUrl: 'https://your-app.com',
    supportFile: false,
    video: true,
    screenshotOnRunFailure: true,
  },
})
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Generate and run tests
python qa_automation.py \
  "Your test requirement here" \
  --run

# Generate only (no execution)
python qa_automation.py \
  "Your test requirement here"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;h3&gt;
  
  
  On Prompt Engineering
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The quality of generated tests is directly proportional to prompt quality. I spent significant time iterating on the prompt template, testing with various requirement phrasings.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  On LLM Selection
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;GPT-4o-mini proved to be the sweet spot for this use case. GPT-3.5 was too inconsistent, while full GPT-4 was unnecessarily expensive for test generation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  On Workflow Design
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;LangGraph’s state-based approach simplified complex orchestration. The ability to visualize the workflow graph helped identify bottlenecks and optimization opportunities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  On Integration
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Making the framework work seamlessly in both local and CI/CD environments required thoughtful design. The key was keeping the core logic environment-agnostic and using configuration for environment-specific behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Conclusion: The Future of Intelligent Test Automation
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Building this AI-powered test automation framework transformed how I approach software testing and quality assurance. What once took hours now takes seconds. What required deep Cypress expertise now requires only a clearly written requirement in natural language.&lt;/p&gt;

&lt;p&gt;This framework isn’t just about speed — it’s about democratizing test automation. Anyone who can describe what should be tested can now generate automated tests, regardless of their programming background.&lt;/p&gt;

&lt;p&gt;The code is open source, the CI/CD workflow is extensible, and the approach generalizes beyond Cypress — from end-to-end testing to integration testing. I’m excited to see how the DevOps and testing community builds on this foundation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Try It Yourself
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/aiqualitylab/ai-natural-language-tests" rel="noopener noreferrer"&gt;https://github.com/aiqualitylab/ai-natural-language-tests&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation:&lt;/strong&gt; See the README for detailed setup and usage instructions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issues/Contributions:&lt;/strong&gt; Pull requests and feature suggestions welcome!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Connect With Me
&lt;/h3&gt;

&lt;p&gt;I’m passionate about AI-powered quality engineering and love discussing test automation innovations. Find me on:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/aiqualitylab" rel="noopener noreferrer"&gt;@aiqualitylab&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium:&lt;/strong&gt; Follow for more articles on AI and testing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What would you build with AI-generated tests? Share your ideas in the comments below!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Appendix: Complete Code Example
&lt;/h3&gt;

&lt;p&gt;Here’s a simplified version of the core generation function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()
def generate_cypress_test(requirement: str) -&amp;gt; str:
    """Generate Cypress test code from natural language requirement"""

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    prompt = f"""You are a senior automation engineer.
Write a Cypress test in JavaScript for: {requirement}
Requirements:
- Use Cypress best practices
- Include describe and it blocks  
- Use real page selectors
- Include positive and negative paths
- Return ONLY runnable JavaScript code
Code:"""

    result = llm.invoke(prompt)
    return result.content.strip()
# Example usage
test_code = generate_cypress_test("Test user login with valid credentials")
print(test_code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example demonstrates the core concept. The full framework adds error handling, state management, file organization, and CI/CD integration.&lt;/p&gt;
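&lt;p&gt;One piece of that error handling is worth spelling out: chat models sometimes wrap their output in markdown fences despite the "code only" instruction. A small post-processing helper (my sketch, not necessarily what the framework ships) keeps the saved files runnable:&lt;/p&gt;

```python
FENCE = "`" * 3  # markdown fence marker, built up to avoid a literal one here

def strip_markdown_fences(text):
    # Drop a leading fence line (possibly carrying a language tag such as
    # "javascript") and a trailing bare fence line, if present.
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines).strip()
```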

&lt;p&gt;&lt;em&gt;Thank you for reading! If you found this helpful, please give it a clap 👏 and share with others who might benefit from AI-powered test automation.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>softwaretesting</category>
      <category>ai</category>
      <category>langchain</category>
      <category>llm</category>
    </item>
    <item>
      <title>GitHub Copilot Agent Skills: Teaching AI Your Repository Patterns</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Sat, 20 Dec 2025 18:06:36 +0000</pubDate>
      <link>https://dev.to/qa-leaders/github-copilot-agent-skills-teaching-ai-your-repository-patterns-1oa8</link>
      <guid>https://dev.to/qa-leaders/github-copilot-agent-skills-teaching-ai-your-repository-patterns-1oa8</guid>
      <description>&lt;p&gt;&lt;em&gt;A practical guide to the new GitHub Copilot Agent Skills feature (announced December 18, 2025)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Repository:&lt;/strong&gt; &lt;a href="https://github.com/aiqualitylab/SeleniumSelfHealing.Reqnroll" rel="noopener noreferrer"&gt;SeleniumSelfHealing.Reqnroll&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every test automation engineer faces this challenge: you build a sophisticated framework with custom patterns, then AI assistants suggest brittle, outdated approaches that ignore your architecture.&lt;/p&gt;

&lt;p&gt;On December 18, 2025, GitHub announced Agent Skills — folders containing instructions, scripts, and resources that Copilot automatically loads when relevant to your prompt. This feature works across the Copilot coding agent, Copilot CLI, and agent mode in Visual Studio Code.&lt;/p&gt;

&lt;p&gt;Let me show you how I used this to teach Copilot our self-healing Selenium patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Agent Skills?
&lt;/h2&gt;

&lt;p&gt;According to GitHub's announcement, Agent Skills allow you to teach Copilot how to perform specialized tasks in a specific, repeatable way. When Copilot determines a skill is relevant to your task, it loads the instructions and follows them.&lt;/p&gt;

&lt;p&gt;You create skills by adding a &lt;code&gt;.github/skills/[skill-name]/SKILL.md&lt;/code&gt; file to your repository. The skills work automatically—no manual activation needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Brittle Selenium Tests
&lt;/h2&gt;

&lt;p&gt;Traditional Selenium tests break easily:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Typical brittle approach&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;button&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FindElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;XPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"//button[@id='submit-2023']"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="n"&gt;button&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When element IDs change, tests fail. Our framework uses AI-powered element recovery with semantic descriptions instead of hardcoded selectors. But without guidance, Copilot suggested the old brittle patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating the Agent Skill
&lt;/h2&gt;

&lt;p&gt;Here's the structure I implemented:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Selenium Self-Healing Automation Skills&lt;/span&gt;

&lt;span class="gu"&gt;## Purpose&lt;/span&gt;
Enable Copilot to:
&lt;span class="p"&gt;-&lt;/span&gt; Generate robust Selenium UI tests
&lt;span class="p"&gt;-&lt;/span&gt; Use AI-powered self-healing locator strategies
&lt;span class="p"&gt;-&lt;/span&gt; Follow BDD patterns with Reqnroll

&lt;span class="gu"&gt;## Hard Rules&lt;/span&gt;

&lt;span class="gu"&gt;### Must&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use self-healing WebDriver extensions
&lt;span class="p"&gt;-&lt;/span&gt; Prefer element descriptions over raw locators
&lt;span class="p"&gt;-&lt;/span&gt; Generate async step definitions
&lt;span class="p"&gt;-&lt;/span&gt; Log all healing attempts

&lt;span class="gu"&gt;### Must Not&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Hardcode XPath or CSS selectors
&lt;span class="p"&gt;-&lt;/span&gt; Use Thread.Sleep
&lt;span class="p"&gt;-&lt;/span&gt; Bypass self-healing logic

&lt;span class="gu"&gt;## Golden Example&lt;/span&gt;
Step Definition Pattern:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;When&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;@"I click the ""(.*)"""&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;WhenIClickElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;elementDescription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CssSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;elementDescription&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Search for Selenium
  &lt;span class="err"&gt;Given I navigate to "https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="err"&gt;//www.wikipedia.org"&lt;/span&gt;
  &lt;span class="nf"&gt;When &lt;/span&gt;I enter &lt;span class="s"&gt;"Selenium"&lt;/span&gt; into the &lt;span class="s"&gt;"search box"&lt;/span&gt;
  &lt;span class="nf"&gt;And &lt;/span&gt;I click the &lt;span class="s"&gt;"search button"&lt;/span&gt;
  &lt;span class="nf"&gt;Then &lt;/span&gt;I should see &lt;span class="s"&gt;"Selenium"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;After adding the skill, developers type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create step definition to click login button&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copilot now generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;When&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;@"I click the ""(.*)"""&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;WhenIClickElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;elementDescription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CssSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;elementDescription&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect! It follows our self-healing pattern with semantic descriptions instead of brittle locators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Components That Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Clear Rules&lt;/strong&gt; Define explicit must-do and must-not-do items. Specificity produces better results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working Examples&lt;/strong&gt; Use actual code from your repository. Copilot learns from real patterns, not theoretical ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context About Structure&lt;/strong&gt; Explain your project organization so Copilot places code correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Templates&lt;/strong&gt; Provide scaffolding for scenarios developers encounter frequently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifying It Works
&lt;/h2&gt;

&lt;p&gt;According to GitHub's documentation, when Copilot chooses to use a skill, the SKILL.md file will be injected in the agent's context.&lt;/p&gt;

&lt;p&gt;To verify:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open Copilot Chat&lt;/strong&gt; and ask: "How should I create a new step definition?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Look for references&lt;/strong&gt; to your SKILL.md in the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check suggestions&lt;/strong&gt; match your patterns (async methods, element descriptions, self-healing extensions)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a test file with a comment triggering your patterns and observe what Copilot suggests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Important Requirements
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot Agent Skills requires a paid plan (Individual, Business, or Enterprise). The feature is available with Copilot coding agent, GitHub Copilot CLI, and agent mode in Visual Studio Code Insiders. Support in the stable version of VS Code is coming soon.&lt;/p&gt;

&lt;p&gt;Without a paid plan, the SKILLS.md still serves as valuable documentation for your team.&lt;/p&gt;

&lt;p&gt;The feature may need 5–10 minutes to index new files. Reload your IDE if suggestions don't immediately reflect your patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Test Automation
&lt;/h2&gt;

&lt;p&gt;Testing frameworks evolve beyond standard practices. Self-healing locators, AI-powered recovery, custom assertions — these patterns don't exist in Copilot's base training.&lt;/p&gt;

&lt;p&gt;GitHub notes that you can write your own skills, or use skills shared by others, such as those in the anthropics/skills repository or GitHub's community-created github/awesome-copilot collection.&lt;/p&gt;

&lt;p&gt;This transforms AI from suggesting generic approaches to understanding your specific methodology. It's not about speed — it's about generating the right code that maintains your architecture's quality standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;.github/skills/your-skill/SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Document your most critical pattern&lt;/li&gt;
&lt;li&gt;Include one golden example&lt;/li&gt;
&lt;li&gt;Test with a new file&lt;/li&gt;
&lt;li&gt;Expand as you identify more patterns&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Start small. Focus on the pattern that matters most. You can expand later.&lt;/p&gt;

&lt;p&gt;Currently, skills can only be created at the repository level. Support for organization-level and enterprise-level skills is coming soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Changelog&lt;/strong&gt;: &lt;a href="https://github.blog/changelog/2025-12-18-github-copilot-now-supports-agent-skills/" rel="noopener noreferrer"&gt;GitHub Copilot now supports Agent Skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://docs.github.com/en/copilot/concepts/agents/about-agent-skills" rel="noopener noreferrer"&gt;About Agent Skills — GitHub Docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community Skills&lt;/strong&gt;: &lt;a href="https://github.com/github/awesome-copilot" rel="noopener noreferrer"&gt;github/awesome-copilot&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;What patterns would you teach GitHub Copilot in your test automation projects?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>githubcopilot</category>
      <category>testing</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Complete Guide to Testing Types: Traditional vs AI Era</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Thu, 18 Dec 2025 19:18:00 +0000</pubDate>
      <link>https://dev.to/qa-leaders/the-complete-guide-to-testing-types-traditional-vs-ai-era-1b92</link>
      <guid>https://dev.to/qa-leaders/the-complete-guide-to-testing-types-traditional-vs-ai-era-1b92</guid>
      <description>&lt;h1&gt;
  
  
  The Complete Guide to Testing Types: Traditional vs AI Era
&lt;/h1&gt;

&lt;p&gt;As someone deep in the AI-powered testing space, I've noticed a fascinating evolution happening. We're not replacing traditional testing — we're expanding our toolkit. Let me break down both worlds for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Testing Landscape: A Visual Map
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6ahhcxmyluupm5o6lk5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6ahhcxmyluupm5o6lk5.png" alt=" " width="800" height="1371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j3iwlgr04vz77ff87mv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j3iwlgr04vz77ff87mv.png" alt=" " width="800" height="1175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Traditional Testing - The Foundation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Functional Testing: Does It Work?
&lt;/h3&gt;

&lt;p&gt;This is where most QA engineers start. You're verifying the software behaves as expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unit Testing&lt;/strong&gt; - Think of this as testing individual LEGO blocks before building the castle. Each function or method gets its own test suite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration Testing&lt;/strong&gt; - Now we're connecting those LEGO blocks. Does the login module talk to the database correctly? Does the payment gateway integrate with the order system?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System Testing&lt;/strong&gt; - The entire castle is built. We're testing the whole application end-to-end in an environment that mirrors production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acceptance Testing&lt;/strong&gt; - This is where business stakeholders say "Yes, this meets our needs." Often called UAT (User Acceptance Testing).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regression Testing&lt;/strong&gt; - After adding new features, we verify nothing broke. This is where automation shines!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smoke Testing&lt;/strong&gt; - Quick sanity checks after deployment. "Is the application even running?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sanity Testing&lt;/strong&gt; - More focused than smoke tests. After a bug fix, we verify that specific area works without retesting everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-Functional Testing: How Well Does It Work?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Performance Testing&lt;/strong&gt; is my favorite category because it reveals how your app behaves under real-world conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load Testing&lt;/strong&gt;: Simulating expected user traffic. Can your app handle 10,000 concurrent users?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stress Testing&lt;/strong&gt;: Pushing beyond normal capacity. What's the breaking point?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spike Testing&lt;/strong&gt;: Sudden traffic surges (think Black Friday sales). Does your system gracefully handle it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security Testing&lt;/strong&gt; - Finding vulnerabilities before hackers do. SQL injection, XSS, authentication flaws.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usability Testing&lt;/strong&gt; - Can users actually navigate your interface? This often gets overlooked by developers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compatibility Testing&lt;/strong&gt; - Testing across browsers, devices, OS versions. Mobile vs desktop experiences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliability Testing&lt;/strong&gt; - Can your system run continuously without failure? Mean time between failures (MTBF) matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structural Testing: The Perspective Matters
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;White Box Testing&lt;/strong&gt; - You see the code. You're testing internal logic, code paths, and structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Black Box Testing&lt;/strong&gt; - You're testing like an end-user. No knowledge of how it's implemented.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gray Box Testing&lt;/strong&gt; - Best of both worlds. Partial knowledge helps design better test cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: AI/ML Testing - The New Frontier
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting. AI systems are fundamentally different from traditional software.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Testing: Garbage In, Garbage Out
&lt;/h3&gt;

&lt;p&gt;AI models are only as good as their training data. Data testing becomes critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Quality Testing&lt;/strong&gt; - Are your datasets complete? Accurate? Consistent? Free of missing values and duplicates?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Validation&lt;/strong&gt; - Checking schemas, data types, value ranges, statistical distributions. If your model expects images at 224x224 but gets 100x100, things break.&lt;/p&gt;
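&lt;p&gt;A small sketch of what such validation can look like in Python. The schema and field names here are invented for illustration:&lt;/p&gt;

```python
# Hypothetical schema: each field has a required type and value range.
SCHEMA = {
    "age":    {"type": int,   "min": 0,   "max": 120},
    "income": {"type": float, "min": 0.0, "max": 1e7},
}

def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable violations; empty list means clean."""
    errors = []
    for i, row in enumerate(rows):
        for field, rule in schema.items():
            if field not in row:
                errors.append(f"row {i}: missing field '{field}'")
                continue
            value = row[field]
            if not isinstance(value, rule["type"]):
                errors.append(f"row {i}: {field} has type {type(value).__name__}")
            elif not (rule["min"] <= value <= rule["max"]):
                errors.append(f"row {i}: {field}={value} outside range")
    return errors

clean = [{"age": 34, "income": 52000.0}]
dirty = [{"age": -5, "income": 52000.0}, {"age": 40}]
assert validate_rows(clean) == []
assert len(validate_rows(dirty)) == 2   # out-of-range age, missing income
```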

&lt;p&gt;&lt;strong&gt;Data Drift Testing&lt;/strong&gt; - Production data often differs from training data over time. User behavior changes. New edge cases emerge. Monitoring drift prevents model degradation.&lt;/p&gt;
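&lt;p&gt;One common drift metric is the Population Stability Index (PSI). A hand-rolled sketch (the 0.1 / 0.25 cutoffs are conventional rules of thumb, not hard limits):&lt;/p&gt;

```python
import math

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a training sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Small epsilon avoids log(0) for empty bins.
        return [(c + 1e-6) / (len(values) + n_bins * 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

training = [i / 100 for i in range(1000)]       # uniform on [0, 10)
no_drift = [i / 100 for i in range(1000)]
drifted  = [5 + i / 200 for i in range(1000)]   # shifted and narrowed

assert psi(training, no_drift) < 0.1    # rule of thumb: < 0.1 means stable
assert psi(training, drifted) > 0.25    # > 0.25 signals significant drift
```

In production this check would run on a schedule against fresh data, with tools like Evidently AI or DeepChecks doing the heavy lifting.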

&lt;h3&gt;
  
  
  Model Testing: Beyond Accuracy
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model Accuracy Testing&lt;/strong&gt; - Measuring precision, recall, F1-score, AUC-ROC. But accuracy alone isn't enough.&lt;/p&gt;
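&lt;p&gt;These metrics fall out of the confusion matrix. A from-scratch sketch (in practice you'd reach for scikit-learn's metrics module; the fraud example below is made up):&lt;/p&gt;

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for a binary classifier, from first principles."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# A model that flags "fraud" (1) too eagerly: perfect recall, weaker precision.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
m = classification_metrics(y_true, y_pred)
assert m["recall"] == 1.0      # every fraud case was caught
assert m["precision"] == 0.6   # but 2 of 5 alarms were false
```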

&lt;p&gt;&lt;strong&gt;Model Performance Testing&lt;/strong&gt; - Inference latency matters. A 99% accurate model that takes 10 seconds per prediction is useless in real-time systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Robustness Testing&lt;/strong&gt; - How does your model handle edge cases? Noisy input? Adversarial examples? Missing features?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metamorphic Testing&lt;/strong&gt; - Here's a clever technique: apply transformations that shouldn't change the outcome. Rotating an image of a cat should still classify it as a cat.&lt;/p&gt;
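&lt;p&gt;Here is the shape of such a test in Python. The &lt;code&gt;classify&lt;/code&gt; function is a toy stand-in (a mean-intensity rule, which happens to be genuinely rotation-invariant); a real test would call your model:&lt;/p&gt;

```python
def rotate90(img):
    """Rotate a 2D grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def classify(img):
    # Toy stand-in for a real model: label by dominant pixel intensity.
    flat = [px for row in img for px in row]
    return "bright" if sum(flat) / len(flat) > 0.5 else "dark"

def metamorphic_rotation_test(img):
    """The relation under test: rotation must not change the predicted label."""
    label = classify(img)
    rotated = img
    for _ in range(3):
        rotated = rotate90(rotated)
        assert classify(rotated) == label, "rotation changed the prediction"
    return label

cat_like = [[0.9, 0.8], [0.7, 0.9]]
assert metamorphic_rotation_test(cat_like) == "bright"
```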

&lt;h3&gt;
  
  
  AI System Testing: Production Reality
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Integration Testing&lt;/strong&gt; - How does your ML model integrate with APIs, databases, frontend applications?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;End-to-End Testing&lt;/strong&gt; - Testing complete workflows. User submits a photo → Model processes → Results displayed → Action taken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A/B Testing&lt;/strong&gt; - Running two model versions simultaneously to compare performance. Model v2 might be more accurate but slower.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shadow Testing&lt;/strong&gt; - Running new models alongside production without affecting users. Comparing predictions to validate before full deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ethical &amp;amp; Bias Testing: The Responsibility Factor
&lt;/h3&gt;

&lt;p&gt;This is where AI testing diverges significantly from traditional testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bias Testing&lt;/strong&gt; - Does your hiring algorithm discriminate based on gender? Does your loan approval model have racial bias?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fairness Testing&lt;/strong&gt; - Ensuring equitable outcomes across demographic groups. Statistical parity, equal opportunity, individual fairness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explainability Testing&lt;/strong&gt; - Can you explain why the model made a decision? Critical for regulated industries (healthcare, finance, legal).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adversarial Testing&lt;/strong&gt; - Intentionally crafting inputs to fool your model. Adding noise to images, manipulating text, poisoning data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fundamental Shift
&lt;/h2&gt;

&lt;p&gt;Traditional software is &lt;strong&gt;deterministic&lt;/strong&gt;. Same input → Same output. Every time.&lt;/p&gt;

&lt;p&gt;AI systems are &lt;strong&gt;probabilistic&lt;/strong&gt;. Same input → Potentially different outputs. Statistical validation required.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test assertions become threshold-based ("accuracy &amp;gt; 95%") rather than exact matches&lt;/li&gt;
&lt;li&gt;Continuous monitoring replaces point-in-time testing&lt;/li&gt;
&lt;li&gt;Data pipelines need as much testing as code&lt;/li&gt;
&lt;li&gt;Model versioning and rollback strategies become critical&lt;/li&gt;
&lt;li&gt;Ethical considerations join functional requirements&lt;/li&gt;
&lt;/ul&gt;
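&lt;p&gt;Concretely, the first bullet looks like this in test code. &lt;code&gt;noisy_model_accuracy&lt;/code&gt; is a stand-in for a real (stochastic) evaluation run; the thresholds are illustrative:&lt;/p&gt;

```python
import random

def noisy_model_accuracy(seed):
    """Stand-in for evaluating a stochastic model; accuracy varies run to run."""
    rng = random.Random(seed)
    return 0.96 + rng.uniform(-0.01, 0.01)

# Deterministic code gets an exact-match assertion; probabilistic systems get
# a threshold over repeated runs instead.
accuracies = [noisy_model_accuracy(seed) for seed in range(20)]
mean_accuracy = sum(accuracies) / len(accuracies)
assert mean_accuracy > 0.95, f"accuracy regression: {mean_accuracy:.3f}"
assert max(accuracies) - min(accuracies) < 0.05   # variance sanity check
```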

&lt;h2&gt;
  
  
  Practical Implications for QA Engineers
&lt;/h2&gt;

&lt;p&gt;If you're coming from traditional QA like I did, here's what changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learn statistics&lt;/strong&gt;: You'll need to understand confusion matrices, ROC curves, statistical significance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data engineering skills&lt;/strong&gt;: SQL, data pipelines, feature engineering become part of your toolkit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Domain knowledge matters more&lt;/strong&gt;: Understanding what "good" looks like for a medical diagnosis model requires healthcare knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing becomes ongoing&lt;/strong&gt;: Models degrade over time. Monitoring isn't optional.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Collaborate differently&lt;/strong&gt;: You'll work closely with data scientists, ML engineers, domain experts.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tools of the Trade
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Traditional Testing&lt;/strong&gt;: Selenium, Playwright, JUnit, NUnit, Postman, JMeter, Cypress&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Testing&lt;/strong&gt;: Great Expectations, MLflow, Evidently AI, DeepChecks, Weights &amp;amp; Biases, TensorBoard&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bridging Both&lt;/strong&gt;: That's where frameworks like my SeleniumSelfHealing.Reqnroll project come in - using AI to make traditional testing more robust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;We're not abandoning traditional testing principles. We're extending them. The fundamentals of good testing - clear objectives, reproducibility, comprehensive coverage - remain vital.&lt;/p&gt;

&lt;p&gt;But AI introduces new challenges: non-deterministic behavior, data dependencies, ethical considerations, continuous degradation. Our testing strategies must evolve accordingly.&lt;/p&gt;

&lt;p&gt;The future QA engineer needs a foot in both worlds. Master traditional testing techniques while embracing AI-specific methodologies. It's an exciting time to be in quality assurance.&lt;/p&gt;

&lt;p&gt;What testing types are you working with? Traditional, AI, or both? Drop a comment below!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. If you're interested in AI-powered test automation, check out my open-source projects on GitHub &lt;a href="https://github.com/aiqualitylab" rel="noopener noreferrer"&gt;@aiqualitylab&lt;/a&gt; or read more on &lt;a href="https://aiqualityengineer.com" rel="noopener noreferrer"&gt;aiqualityengineer.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: #testing #qa #ai #machinelearning #automation #softwaredevelopment #qualityassurance #devops&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>qa</category>
      <category>automation</category>
    </item>
    <item>
      <title>Testing AI Systems: Handling the Test Oracle Problem</title>
      <dc:creator>Let's Automate 🛡️</dc:creator>
      <pubDate>Wed, 17 Dec 2025 20:09:54 +0000</pubDate>
      <link>https://dev.to/qa-leaders/testing-ai-systems-handling-the-test-oracle-problem-3038</link>
      <guid>https://dev.to/qa-leaders/testing-ai-systems-handling-the-test-oracle-problem-3038</guid>
      <description>&lt;p&gt;AI systems are typically a blend of AI components, such as machine learning models, and non-AI components, like APIs, databases, or UI layers. Testing the non-AI parts of these systems is similar to testing traditional software. Standard techniques like boundary testing, equivalence partitioning, and automation can be applied effectively. However, the AI components present a different set of challenges. Their complexity, unpredictability, and data-driven nature require a specialized approach to testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvhuwppjeq7aardnsdcd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvhuwppjeq7aardnsdcd.png" alt=" " width="800" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Challenge: The Test Oracle Problem
&lt;/h2&gt;

&lt;p&gt;In traditional software testing, we compare the actual results of a test with the expected results, which serve as the "oracle." This comparison determines whether the test has passed or failed. However, in AI systems, defining what the "correct" output should be for every possible input is often difficult. This is known as the "test oracle problem."&lt;/p&gt;

&lt;p&gt;This difficulty arises because:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI behavior is probabilistic, not deterministic:&lt;/strong&gt; AI models, especially machine learning models, don't always produce the same output for the same input. There's often an element of randomness involved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outputs can vary even for similar inputs:&lt;/strong&gt; Small changes in the input data can sometimes lead to significant changes in the output, making it hard to predict the expected behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Techniques to Tackle the Oracle Problem
&lt;/h2&gt;

&lt;p&gt;Several techniques can be used to address the test oracle problem in AI systems:&lt;/p&gt;

&lt;h3&gt;
  
  
  Back-to-Back Testing
&lt;/h3&gt;

&lt;p&gt;This technique involves comparing the outputs of two systems performing the same task. One system can serve as a reference for the other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You run the same input through both systems and compare their outputs. If the outputs are significantly different, it indicates a potential issue in one of the systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regression testing:&lt;/strong&gt; Comparing the output of a new version of a model with the output of a previous, trusted version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline comparison:&lt;/strong&gt; Comparing the output of a new model with the output of a different model that is known to perform well.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Useful when a trusted baseline exists or for detecting regressions in model performance.&lt;/p&gt;
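&lt;p&gt;A back-to-back check in miniature. The two "models" here are toy threshold rules standing in for a trusted baseline and a candidate version; the 5% tolerance is an illustrative policy choice:&lt;/p&gt;

```python
def model_v1(x):
    # Trusted baseline (stand-in): classify by a fixed threshold.
    return 1 if x >= 0.5 else 0

def model_v2(x):
    # Candidate version: slightly different decision boundary.
    return 1 if x >= 0.48 else 0

def disagreement_rate(inputs, a, b):
    """Fraction of inputs on which the two systems disagree."""
    return sum(1 for x in inputs if a(x) != b(x)) / len(inputs)

inputs = [i / 100 for i in range(100)]   # 0.00 .. 0.99
rate = disagreement_rate(inputs, model_v1, model_v2)
assert rate <= 0.05, f"candidate diverges from baseline on {rate:.0%} of inputs"
```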

&lt;h3&gt;
  
  
  A/B Testing
&lt;/h3&gt;

&lt;p&gt;A/B testing involves comparing two versions of a model in a production environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Real users are randomly assigned to one of the two versions of the model. The performance of each version is then measured based on user behavior and feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-learning systems:&lt;/strong&gt; Evaluating the impact of new training data or model updates on real-world performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live model updates:&lt;/strong&gt; Ensuring that new model versions perform as expected before fully deploying them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Allows for testing with real user input and detecting changes, regressions, or data poisoning in a live environment.&lt;/p&gt;
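&lt;p&gt;The comparison between variants is usually settled statistically, for example with a two-proportion z-test on conversion counts. A hand-rolled sketch with made-up numbers:&lt;/p&gt;

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic comparing success rates of two model variants."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant B converts 12% vs A's 10%, over 5,000 users each (illustrative data).
z = two_proportion_z(conv_a=500, n_a=5000, conv_b=600, n_b=5000)
assert z > 1.96   # significant at the 5% level (two-sided)
```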

&lt;h3&gt;
  
  
  Metamorphic Testing
&lt;/h3&gt;

&lt;p&gt;Metamorphic testing relies on identifying logical relations between inputs and outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Instead of knowing the exact correct output for a given input, you define relationships that should hold true. For example, if you rotate an image of a cat, the model should still identify it as a cat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; If rotating a cat image still shows "cat," the model is consistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Helps find issues without knowing the exact correct output, which makes it especially valuable when no reliable test oracle is available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Tool support for metamorphic testing is still limited, so defining and checking metamorphic relations remains a largely manual process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other AI-Specific Testing Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Adversarial Testing
&lt;/h3&gt;

&lt;p&gt;Adversarial testing involves feeding tricky or intentionally misleading inputs to the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You create inputs that are designed to exploit weaknesses in the model and cause it to make incorrect predictions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security-sensitive systems:&lt;/strong&gt; Identifying vulnerabilities that could be exploited by malicious actors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety-critical systems:&lt;/strong&gt; Ensuring that the model can handle unexpected or unusual inputs without causing harm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Checks robustness and is useful in security-sensitive or safety-critical systems.&lt;/p&gt;
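&lt;p&gt;A toy robustness check: exhaustively perturb each feature of a stand-in linear model by ±ε and confirm the label holds. Real adversarial testing uses gradient-based attacks (e.g., FGSM) against the actual model; everything below is illustrative:&lt;/p&gt;

```python
from itertools import product

def linear_model(features, weights=(0.8, -0.5, 0.3)):
    # Toy stand-in for the model under attack: sign of a weighted sum.
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score >= 0 else 0

def robust_under_noise(x, epsilon):
    """Check that every ±epsilon perturbation per feature keeps the label."""
    baseline = linear_model(x)
    for deltas in product((-epsilon, epsilon), repeat=len(x)):
        perturbed = [f + d for f, d in zip(x, deltas)]
        if linear_model(perturbed) != baseline:
            return False
    return True

sample = [1.0, 0.2, 0.5]
assert robust_under_noise(sample, epsilon=0.05)       # small noise: stable
assert not robust_under_noise(sample, epsilon=2.0)    # large attack: label flips
```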

&lt;h3&gt;
  
  
  Data Poisoning Tests
&lt;/h3&gt;

&lt;p&gt;Data poisoning tests involve injecting bad or malicious data into the training sets to see if the model can be corrupted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You introduce flawed or biased data into the training data used to build the model. Then, you observe how the model's performance changes as a result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI systems exposed to untrusted or public data sources:&lt;/strong&gt; Protecting against malicious actors who might try to manipulate the model by injecting bad data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Important for AI systems exposed to untrusted or public data sources.&lt;/p&gt;
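&lt;p&gt;A miniature demonstration with a nearest-centroid "model" standing in for real training: a handful of mislabeled, out-of-range points injected into the training set is enough to corrupt predictions on clean data. The 1-D dataset is invented for illustration:&lt;/p&gt;

```python
def train_centroids(X, y):
    """Nearest-centroid 'model': one mean per class."""
    cents = {}
    for label in set(y):
        pts = [x for x, l in zip(X, y) if l == label]
        cents[label] = sum(pts) / len(pts)
    return cents

def predict(cents, x):
    return min(cents, key=lambda label: abs(x - cents[label]))

def accuracy(cents, X, y):
    return sum(predict(cents, x) == t for x, t in zip(X, y)) / len(y)

# Clean 1-D data: class 0 clusters near 1.0, class 1 near 5.0.
X = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
y = [0, 0, 0, 1, 1, 1]
clean_model = train_centroids(X, y)

# Poisoning attack: extreme points mislabeled as class 0 drag its centroid.
X_poisoned = X + [20.0, 20.0, 20.0]
y_poisoned = y + [0, 0, 0]
poisoned_model = train_centroids(X_poisoned, y_poisoned)

assert accuracy(clean_model, X, y) == 1.0
assert accuracy(poisoned_model, X, y) <= 0.5   # clean points now misclassified
```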

&lt;h3&gt;
  
  
  Pairwise Testing
&lt;/h3&gt;

&lt;p&gt;Pairwise testing involves testing all combinations of input parameter pairs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You identify the key input parameters that affect the model's behavior. Then, you create test cases that cover all possible combinations of these parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Reduces test set size while covering more interactions in complex models.&lt;/p&gt;
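&lt;p&gt;A greedy all-pairs sketch (the parameters are invented). Production suites usually lean on a dedicated tool such as Microsoft's PICT, but the core idea fits in a few lines:&lt;/p&gt;

```python
from itertools import combinations, product

def all_pairs(params):
    """Greedy all-pairs generator: every value pair across every two
    parameters appears in at least one case."""
    names = list(params)
    uncovered = set()
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for va, vb in product(params[a], params[b]):
            uncovered.add((i, va, j, vb))
    cases = []
    while uncovered:
        best, best_gain = None, -1
        for cand in product(*(params[n] for n in names)):
            gain = sum(
                (i, cand[i], j, cand[j]) in uncovered
                for i, j in combinations(range(len(names)), 2)
            )
            if gain > best_gain:
                best, best_gain = cand, gain
        cases.append(best)
        for i, j in combinations(range(len(names)), 2):
            uncovered.discard((i, best[i], j, best[j]))
    return cases

# Hypothetical test parameters.
params = {
    "browser": ["chrome", "firefox", "safari"],
    "os": ["windows", "macos", "linux"],
    "model": ["v1", "v2"],
}
suite = all_pairs(params)
assert len(suite) < 3 * 3 * 2   # all pairs covered in fewer than the 18 full combos
```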

&lt;h3&gt;
  
  
  Experience-Based Testing
&lt;/h3&gt;

&lt;p&gt;Experience-based testing leverages domain knowledge and tester intuition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Testers use their understanding of the system and the data to design test cases that are likely to uncover issues. This often includes Exploratory Data Analysis (EDA) to understand the data used in training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Useful when model behavior depends heavily on the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Neural Network Coverage
&lt;/h3&gt;

&lt;p&gt;Neural network coverage is similar to code coverage but applied to neural networks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You measure the extent to which the test cases exercise different parts of the neural network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt; Ensures all parts of the model logic are exercised. Useful for deep learning models to detect untested paths.&lt;/p&gt;
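&lt;p&gt;A simplified version of neuron coverage (in the spirit of the DeepXplore work). The tiny random-weight network below stands in for a trained model; the point is the measurement, not the model:&lt;/p&gt;

```python
import random

random.seed(0)

# Toy layer: 3 inputs -> 4 hidden ReLU neurons (the units we measure).
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

def hidden_activations(x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def neuron_coverage(test_inputs, threshold=0.0):
    """Fraction of hidden neurons activated above the threshold by at
    least one test input."""
    fired = set()
    for x in test_inputs:
        for idx, a in enumerate(hidden_activations(x)):
            if a > threshold:
                fired.add(idx)
    return len(fired) / len(W)

narrow_suite = [[0.1, 0.0, 0.0]]
diverse_suite = [[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]]
assert neuron_coverage(diverse_suite) >= neuron_coverage(narrow_suite)
assert neuron_coverage(diverse_suite) == 1.0   # every neuron fires for some input
```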

&lt;h2&gt;
  
  
  Summary for QA Teams
&lt;/h2&gt;

&lt;p&gt;Testing AI components isn't about verifying fixed outputs. It's about understanding behavior, patterns, and risks.&lt;/p&gt;

&lt;p&gt;Choosing the right mix of testing techniques depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Risk level:&lt;/strong&gt; (e.g., safety, security)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;System complexity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data quality&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model type:&lt;/strong&gt; (static vs. self-learning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining traditional testing with AI-specific methods, QA teams can validate AI systems effectively and ensure they're reliable, safe, and fair.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aqe</category>
      <category>qa</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
