DEV Community: Let's Automate 🛡️

EU AI Act Compliance: Why Open Source Is Your Smartest Move Before August 2026

Let's Automate 🛡️ — Sat, 09 May 2026 12:53:53 +0000

What the regulation actually demands, 8 steps to open source an AI repo the right way, and a real GitHub example that already does it.

The clock is ticking. August 2, 2026 is the hard deadline for EU AI Act compliance — and if your AI system touches hiring, finance, healthcare, or critical infrastructure, the rules apply to you.

But here is something most people are not talking about yet: your choice of development model matters as much as your compliance checklist.

Going open source is not just a philosophical decision. Right now, it is one of the most practical moves you can make toward EU AI Act readiness.

Here is why — and what it actually looks like when you do it right.

The Three Things the EU AI Act Actually Demands

Strip away the legal language and the Act comes down to three core demands:

Transparency. Regulators need to understand how your AI works. The logic, the decisions, the data — all of it needs to be explainable. A black box is not acceptable.

Traceability. Every change to an AI system needs to be logged. Who changed it, when, and why. If something goes wrong, there has to be a trail.

Human Oversight. A person must always be able to review, question, or stop an AI decision. Automation without a human check-in is a liability.

Now look at what open source gives you by default.

EU ACT

Why Open Source Matches These Demands Naturally

When a codebase is public, transparency is not something you need to retrofit — it is already there. Anyone can read the logic. Any regulator can audit it. Any auditor can trace it.

Git history is a natural audit trail. Every commit is timestamped and attributed. Every change is logged. That is traceability built into the workflow, not bolted on later.

And open source communities are faster at catching problems. When more people can see the code, issues get spotted and fixed sooner — which directly supports the Act’s requirement for ongoing monitoring.

The result: lower compliance effort, less documentation work, and a stronger trust signal when regulators come asking.

8 Steps to Open Source an AI Repo the Right Way

Open sourcing is not just making a repo public. Done carelessly, it can expose secrets, violate data licenses, or create legal problems. Here is how to do it properly:

Step 1 — Scan the repo history for secrets and sensitive data. Git remembers everything, including things you deleted. Run a tool like TruffleHog across the full commit history before going public. Better yet, add it to your CI pipeline so every future push is scanned automatically.

Step 2 — Review all training data licenses. The code and the data are separate questions. Check whether you have the right to share or reference any data used to train or run the model.

Step 3 — Check third-party dependency licenses. Every library you use has its own license. Make sure none of them restrict open distribution.

Step 4 — Define what is open and what stays private. Open source does not mean everything is public. Many projects open source the framework while keeping training data or deployment configuration private. Be explicit about the boundary.

Step 5 — Choose the right license. Apache 2.0 is a solid default for most AI projects — permissive, commercially friendly, and includes patent protection. If responsible use is a priority, look at RAIL (Responsible AI License).

Step 6 — Write a model card. A model card is a short document that explains what the AI does, what it was trained on, what it is good at, and where it has known limitations. It is essentially required under the EU AI Act for high-risk systems — and it takes less than an hour to write.

Step 7 — Document contribution and governance rules. Who can approve changes? How are reported problems handled? This does not need to be complex — a CONTRIBUTING.md and a short governance note will do.

Step 8 — Set up issue tracking and incident reporting. The Act requires a process for reporting and responding to AI failures. Open platforms like GitHub make this straightforward. Add a public issue tracker and a private channel for sensitive reports.

· AI Quality Lab aiqualitylab.org · github.com/aiqualitylab

A Real Example: What This Actually Looks Like

Theory is useful. A working example is better.

The ai-natural-language-tests repo from AI Quality Lab is an open source AI platform that generates end-to-end tests from plain English requirements. It is not a compliance demo — it is a real, actively developed tool. But it has quietly implemented most of what the EU AI Act asks for:

Secret scanning in CI — TruffleHog runs on every push, catching credentials before they hit the public repo

Model card — MODEL_CARD.md documents which AI providers are used, what data is sent to them, known limitations, and what the tool should not be used for

Incident response — INCIDENT_RESPONSE.md defines what counts as an incident, five steps to respond, and how to handle sensitive issues privately

Transparency notice — the README clearly states what gets sent to AI providers and what does not

Full audit trail — 324 commits, all public, every change attributed and timestamped

Open license — AGPL-3.0, clearly documented

None of this required a legal team or a compliance budget. It required clear thinking and a few well-placed documents.

The Checklist: Are You Ready?

Before making an AI repo public — or before your next compliance review — run through this:

CHECKLIST — EA ACT

Repo history scanned for secrets and sensitive data

Training data licenses reviewed and documented

Third-party dependency licenses checked

Clear boundary defined between what is open and what stays private

Open source license selected and applied

Model card written and included

Contribution and governance process documented

Issue tracking and incident reporting in place

Eight items. Most can be done in a day.

The Bottom Line

Open source does not automatically make you compliant with the EU AI Act. But it gives you a structure where compliance is far easier to demonstrate — because the transparency, traceability, and accountability regulators are asking for are already part of how open source works.

Organizations that treat this deadline as a burden will aim for the minimum. Those that treat it as a reason to build properly will end up with AI systems that are genuinely more trustworthy — and that matters beyond Europe, beyond 2026, and beyond any single regulation.

The deadline is fixed. The approach is a choice.

· AI Quality Lab aiqualitylab.org · github.com/aiqualitylab

How to Generate Cypress, Playwright, and WebdriverIO Tests From Natural Language Using AI

Let's Automate 🛡️ — Mon, 04 May 2026 23:01:02 +0000

A step-by-step breakdown of an open-source platform that converts plain English requirements into runnable E2E tests — no manual coding required

Writing end-to-end tests is one of those things every developer knows they should do well and almost nobody actually enjoys. You spend an hour getting a Playwright spec to click the right button, another hour figuring out why the selector breaks in CI, and by then the feature has already been redesigned anyway.

So when I came across a project that lets you describe what you want to test in plain English — and then generates the actual test code — I had to dig in.

The project is called AI Natural Language Tests , built under AI Quality Lab. It is open source on GitHub, has a published academic DOI on Zenodo, and as of this week just shipped v5.0.0. You can also try it right now in your browser on Hugging Face Spaces — no installation needed.

Here is what it does, how it works, and why it deserves a spot in your QA toolkit.

The Core Idea

Instead of writing:

cy.get('#username').type('admin')
cy.get('#password').type('secret')
cy.get('[type=submit]').click()
cy.contains('Dashboard').should('be.visible')

You just say:

"Test login with valid credentials"

The platform reads that sentence, visits the URL you point it at, analyzes the live HTML to find the actual form fields and selectors, then generates a complete runnable test — in Cypress, Playwright, or WebdriverIO, whichever you prefer.

That is the pitch. But the internals are more interesting than the pitch.

What Is Actually Happening Under the Hood

This is not a thin wrapper around a ChatGPT call. It runs a structured five-step workflow built with LangGraph — and each step has a clear purpose.

Step 1 — Understand the page. When you pass a --url, the system fetches the live HTML and extracts real selectors, form fields, and interactive elements. This is what prevents it from hallucinating IDs that do not exist on your page.

Step 2 — Check memory. The system keeps a vector database (FAISS + SQLite) of patterns from every test it has previously generated. Before writing anything new, it searches for similar past tests using semantic similarity. If it has seen a login flow before, it reuses what worked.

Step 3 — Generate with an LLM. The actual test code is produced by your choice of LLM — OpenAI, Anthropic Claude, or Google Gemini. LangChain handles prompt templating and output parsing, while LangGraph turns the multi-step flow into a repeatable, auditable pipeline rather than a single prompt-and-pray call.

Step 4 — Optional human review. There is a --approve flag that pauses execution before saving the generated test and asks a human to confirm. This Human-in-the-Loop gate is especially useful when running the tool against production-critical flows where you want a set of eyes before anything gets committed.

Step 5 — Run it. Pass --run and the tool immediately executes the generated test through the framework runner. If it fails, an AI-assisted failure analyzer categorizes the error and suggests a fix — more on that below.

Getting Started Takes About Five Minutes

git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
python -m venv .venv && source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt
npm ci
npx playwright install chromium

Add your API key to a .env file:

OPENAI_API_KEY=your_key

Then generate and immediately run a test:

python qa_automation.py "Test login with valid credentials" \
  --url https://the-internet.herokuapp.com/login \
  --framework playwright \
  --run

That single command fetches the page, generates a .spec.ts file, and runs it through Playwright — without you writing a line of test code.

If you just want to see it work before installing anything, the live Hugging Face Spaces demo lets you paste in a requirement and watch the generation happen in real time.

Three Frameworks, One Workflow

The tool supports all three major E2E frameworks with the same natural language interface. You switch between them with a single flag:

3 Frameworks, One Workflow

The Cypress integration is worth noting specifically — it supports two distinct modes. The traditional mode generates standard Cypress code. The prompt-powered mode uses cy.prompt() to keep natural language embedded directly in the test, which is useful for teams exploring the newer AI-native Cypress APIs.

If your team is mid-migration from Cypress to Playwright, you can generate equivalent tests in both frameworks from the same requirement and compare them side by side.

Writing Prompts That Actually Work

The output quality depends heavily on how specific you are. A few patterns that work well:

Name the expected outcome. “Test login fails with wrong password and shows an error message” produces a far more precise test than “Test login.”

Chain multiple requirements. You can pass several prompts in one run: "Test login" "Test logout" --url — each gets its own generated file.

Always use --url. Giving the tool a real page means it reads actual HTML instead of guessing selector names. This is the single biggest factor in test quality, because the generator extracts real element IDs and attributes from the live DOM.

Some practical examples:

Usage : https://github.com/aiqualitylab/ai-natural-language-tests#usage

When Tests Fail: AI-Assisted Diagnosis

One of the more practical features is the failure analyzer. Instead of staring at a cryptic Cypress error, you pass it to the tool:

python qa_automation.py --analyze "CypressError: Element not found"

The analyzer categorizes the error into one of ten types — SELECTOR, TIMING, ASSERTION, NETWORK, STATE, NAVIGATION, INTERACTION, CONFIGURATION, ENVIRONMENT, or DYNAMIC_URL — then gives you a plain-English explanation of the root cause and a concrete suggestion for fixing it.

You can also pipe in a full log file: python qa_automation.py --analyze -f error.log

The Quality Evaluation Layer

This is the part most people skip over in the README, but it is arguably the most important piece for teams that care about reliability.

Generating test code is only valuable if the generated tests are actually correct. The project includes two evaluation scripts that measure whether the output is grounded in the real page content.

Offline evaluation (no API key needed). The ragas_nlp_evaluator.py script compares generated output against a reference dataset using ROUGE and string similarity metrics. It runs entirely offline, exits with a non-zero code if quality drops below a configurable threshold, and is designed to run as a fast CI gate.

LLM-based evaluation (requires OpenAI key). The ragas_evaluator.py script goes further. It fetches the live page HTML, uses GPT-4o-mini to answer the test requirement using that HTML, then scores the generated test on four dimensions: faithfulness to the page, relevance to the requirement, context precision, and context recall.

Both evaluators are wired into the GitHub Actions CI pipeline. The offline script runs first as a baseline check. If it passes, three parallel jobs spin up — one per framework — each generating tests, evaluating them with the LLM evaluator, and then executing them. If the score drops below threshold, the pipeline blocks before the tests even run.

You are not shipping generated tests blindly. You have a measurable, automated quality signal at every stage.

Docker and CI/CD

The project ships pre-built Docker images on GitHub Container Registry. You can skip the clone entirely:

docker pull ghcr.io/aiqualitylab/ai-natural-language-tests:latest

docker run --rm \
  -e OPENAI_API_KEY=your_key \
  ghcr.io/aiqualitylab/ai-natural-language-tests:latest \
  "Test login" --url [https://the-internet.herokuapp.com/login](https://the-internet.herokuapp.com/login)

For CI/CD, pin to a specific release tag (v5.0.0) rather than latest for reproducibility. The recommended pipeline stages cover dependency installation, NLP baseline evaluation, test generation, LLM evaluation, test execution, and optional telemetry export to Grafana Tempo and Loki.

Observability (Optional but Thoughtful)

If your team runs Grafana, the project has native OpenTelemetry integration that exports traces to Grafana Tempo and ships logs to Loki. This is entirely optional — leaving the relevant environment variables unset disables it completely. But for teams that already operate a Grafana stack, having AI test generation traces alongside your application traces is a genuinely useful debugging surface.

What It Does Not Do Yet

To be fair about the limits: the current CLI works through URL-driven generation. A --data flag for passing raw JSON specifications directly is not implemented yet. If your tests target APIs or non-rendered content, you will need to adapt. Given the active release cadence — nine releases with v5.0.0 landing this week — that gap may close soon.

Why This Matters Beyond the Tool Itself

The bottleneck in most QA pipelines is not running tests — it is writing them. Engineers skip test authoring because it is slow, tedious, and breaks constantly as UIs change. This tool makes the first draft essentially free, which lowers the activation energy enough that more tests actually get written.

The pattern memory design compounds the value over time. Every test the system generates gets stored as a vector embedding. Future generations for similar requirements pull from those patterns, so the output becomes more consistent and more project-specific as usage grows. It is not just generating tests in isolation — it is building institutional knowledge about how your application is structured.

The Ragas evaluation layer means you can measure whether that knowledge is accurate, and block on it in CI if it is not.

Try It

The project is open source at github.com/aiqualitylab/ai-natural-language-tests.

Want to experiment without installing anything? The live demo is on Hugging Face Spaces.

Hugging Face Space

Are you using AI-assisted test generation in your pipeline?

Share what has worked — and what has not.

QA Bug Triage Pipeline: From App Reviews to Searchable Bug Reports

Let's Automate 🛡️ — Tue, 28 Apr 2026 19:02:48 +0000

A simple Python project that turns messy user reviews into structured QA bug reports using an LLM and RAG.

📖 Full guide: blog.aiqualitylab.org

Why this project

Product teams get lots of feedback, but most of it is noisy and unstructured. This project helps QA teams convert that feedback into consistent bug records that are easy to search and summarize.

Photo by Guille B on Unsplash

What it does

Collects reviews from Google Play

Routes review text (bug report vs non-bug)

Generates structured JSON bug reports with an LLM

Stores bugs in ChromaDB for semantic retrieval

Adds BM25 keyword matching for hybrid search

Produces short AI summaries for triage

Lets you clear the stored bugs from the UI

API key

This app uses BYOK (Bring Your Own Key):

Paste your OpenAI API key in the UI

The key is masked

Do not commit keys to source control

Main files

app.py: Gradio app flows

collect.py: review collection

triage.py: routing and structured triage logic

rag.py: storage and hybrid retrieval

eval/eval.py: evaluation script

Evaluation sample

Answer Relevancy: 0.868

Faithfulness: 0.292

Context Precision: 0.020

Cost target

For a short demo session, the expected usage is typically under $0.50.

Tips:

Keep review count low (5 to 10)

Avoid repeated large collection runs

Use short test inputs when validating triage

Tech stack

Python

Gradio

OpenAI GPT-4o

ChromaDB

rank-bm25

RAGAS

google-play-scraper

This project is useful for QA teams that want a lightweight bug triage assistant with searchable bug intelligence and fast summaries.

Prompt Injection Attacks Are Breaking AI Products — Here’s How to Stop Them

Let's Automate 🛡️ — Sat, 25 Apr 2026 12:23:02 +0000

The Simple, Non-Technical Guide to Defensive Prompting: How to Protect Your LLM-Powered App Before Someone Exploits It

📖 Full guide: blog.aiqualitylab.org

Your AI is only as safe as the thought you put into protecting it. Prompts aren’t just instructions — they’re the rules your AI lives by. Protect them like you’d protect any critical part of your product.

Photo by Nik Shuliahin 💛💙 on Unsplash

The teams winning at AI aren’t just the ones moving fast. They’re the ones moving fast and thinking about this.

AI Is Normal Now. The Problems Aren’t.

GitHub Copilot CLI Remote: Control Your AI Coding Agent From Phone and Web

Let's Automate 🛡️ — Fri, 17 Apr 2026 17:10:11 +0000

New copilot --remote preview lets you steer Copilot CLI sessions from GitHub.com and GitHub Mobile — here's what it does and why it matters

📖 Full guide, team scenarios, and honest limitations: blog.aiqualitylab.org

💻 Source on GitHub: aiqualitylab/blog

🔗 Official GitHub changelog: Remote control CLI sessions on web and mobile

If you use AI coding tools in your terminal, you know the problem. You start a 20-minute task, step away, and come back to find the agent stalled — waiting for you to approve something ten minutes ago.

On April 13, GitHub shipped a fix: copilot --remote.

GitHub Copilot CLI Remote: Control Your AI Coding Agent From Phone and Web

What it does

Turn on remote mode and your CLI session streams to GitHub in real time. Your terminal shows a link and a QR code. Open it on any phone or browser, and you get a live, two-way view. You can send messages, approve permissions, switch modes, and stop the session — all from your phone.

How to turn it on

copilot --remote

You need to be in a GitHub repo.

Copilot Business and Enterprise users need an admin to enable the policy first.

Pain-Driven Architecture: How Each Problem Led to the Next Layer of an AI Agentic Architecture

Let's Automate 🛡️ — Sun, 29 Mar 2026 20:11:00 +0000

How a weekend experiment turned into a full agentic architecture — and why I’m really proud, because the future of architecture is where my magical spices are.

🔗 GitHub: github.com/aiqualitylab/ai-natural-language-tests

🌐 Website: tests.aiqualitylab.org

One weekend, I wrote a script. Give it a sentence and a URL. It calls an LLM, gets some code back, saves it to a file. Dead simple.

That was v1. It was dumb. It worked sometimes.

But something about it felt right. So I kept going.

AI Agentic Architecture

It Started Simple

The first version was embarrassingly basic. One script. One API call. One prompt that said “write me a test for this requirement.”

No memory. No analysis. No structure.

The problems showed up immediately. But when it worked, it saved me time. When it didn’t, I spent time fixing it. The math still worked in my favour.

So instead of throwing it away, I started asking: what would it take to make this actually reliable?

Then the Architecture Grew

Over the few months, the script turned into a system. Not because I planned it that way — but because each pain pointed to a clear next step.

Pain: The AI picked terrible selectors.

Solution: I wrote a set of rules and injected them into every prompt. Always prefer stable attributes. Never use fragile selectors.

Pain: The AI was writing code blind.

Solution: I added a step before generation — fetch the page, analyze the structure, extract every element into a clean structured format.

Pain: Every run started from zero.

Solution: I added a vector database. Every generated output gets stored as an embedding. Next time a similar requirement comes in, the system pulls up past patterns as references. First run: scratch. Fiftieth run: it knows your style.

Pain: Locked into one framework.

Solution: I made the framework architecture simpler, Adding a third framework took almost 6 months.

Pain: Locked into one AI provider.

Solution: I built a thin layer over three providers — OpenAI, Anthropic, and Google. One flag switches the brain. The pipeline doesn’t care which model is thinking.

Pain: No idea why things break.

Solution: I added OpenTelemetry spans to every step. Traces go to Grafana Tempo. Logs go to Grafana Loki. Now when something goes wrong, I see the entire decision chain. Like a flight recorder for an AI system.

Pain: When things fail, you’re on your own.

Solution: I built a failure analyzer. Feed it an error, it classifies the failure and gives you a structured diagnosis — what went wrong, why, and how to fix it.

Where It Stands Today

The weekend script is now a five-step workflow.

It takes a requirement in plain English and a URL. It analyses the page. It searches its memory for similar patterns. It generates a complete, runnable spec — constrained by rules and informed by past experience. And if you want, it runs the test right there.

Three AI providers. Three frameworks. A growing pattern library. Structured prompt specs. A failure analyser. Full observability.

One sentence in. A working test out.

But Here’s What Gets Me Excited

The architecture mindset is shifting.

Automation engineer → Architectural LLM Engineer. That’s the shift.

I’m excited. Are you?

If so — wait and watch. The gun is loaded.

Photo by averie woodard on Unsplash

The Honest Part

I want to be straight about what this is and what it isn’t.

It doesn’t know your business logic. It doesn’t know which flows are critical and which don’t matter. It doesn’t know your edge cases unless you tell it.

The system handles the mechanical work. You handle the strategic work.

The project is open source under AGPL-3.0 at github.com/aiqualitylab/ai-natural-language-tests .

Clone it. Install dependencies.

Give it a sentence and a URL.

Watch it work.

It started as weekend idea. Now it’s a full agentic pipeline with memory, observability, and multi-framework support.

And the best part? I’m just getting started.

Follow the project on [_GitHub](https://github.com/aiqualitylab/ai-natural-language-tests)._

AI-Assisted Testing vs AI Agents vs AI Agent Skills: A Practical Journey Through All Three

Let's Automate 🛡️ — Sat, 07 Mar 2026 13:08:54 +0000

Most teams are only using one layer of AI in testing. Here is what the full picture looks like — and how I built across all three.

Photo by Possessed Photography on Unsplash

Before any of this made sense, I had to answer a more basic question: what does AI QA Engineering actually mean?

What is AI QA Engineering — and Why QAEs, SDETs, and QA Automation Engineers Should Pay Attention

And before touching AI at all — the foundations still matter. Clean BDD tests. Reports that stakeholders can read.

How to Add Beautiful BDD Test Reports to Your Reqnroll Project Using Expressium LivingDoc

Before you automate smarter, you have to know what good looks like.

Layer 1 — AI-Assisted Testing

AI speeds you up. You are still driving.

This is where most teams start — and where most teams stay.

You write a prompt, get a test, review it, ship it. AI is a productivity multiplier. GitHub Copilot suggests the next line. ChatGPT drafts your test cases. Claude rewrites a flaky selector. You are in control at every step.

The catch? A bad prompt gives you a bad test — and it will look convincing. Garbage in, confident garbage out.

Crafting Effective Prompts for GenAI in Software Testing

I built ai-natural-language-tests at this layer. Give it a plain English requirement, and it generates Cypress or Playwright tests using GPT-4, LangChain, and LangGraph. Every output still needs your eyes on it — but the heavy lifting is done.

Same idea with JIRA-QA-Automation-with-AI : feed it a JIRA story with acceptance criteria, and BDD test scripts come out the other side. Human judgment still required at the end. You own every decision.

That last part is the definition of this layer.

Layer 2 — AI Agents for Testing

You give the goal. The agent executes, adapts, and decides.

At this layer, you stop steering and start delegating.

You set the objective. The agent figures out how to get there — and when something breaks mid-run, it handles that too. No human in the loop for every step.

selenium-selfhealing-mcp is a good example of what this looks like in practice. A UI change breaks a Selenium locator mid-execution. The agent inspects the DOM, finds the updated element, and keeps going — without stopping to ask you what to do. I submitted this to the Docker MCP Registry, and watching it recover from failures on its own still feels like a step-change from Layer 1.

For .NET teams, SeleniumSelfHealing.Reqnroll does the same with C#, NUnit, Reqnroll, and Semantic Kernel. And IntelliTest takes it further — write your assertions in plain English, and the agent decides whether the application behaviour actually matches the intent.

But there is a trap at this layer. Agents move fast and look thorough. It is easy to trust the output and skip the checks. Coverage looks complete — but the agent may have tested the wrong thing entirely.

The AI QA Engineer’s Decision Framework: When NOT to Use AI in Testing

And if you are using AI agents to run tests, a harder question follows: how do you know the agent’s output is correct? That is the LLM evaluation problem, and it turns out to be one of the most interesting unsolved problems in this space.

LLM Evaluation Explained: How to Know If Your AI Is Actually Working

Layer 3 — AI Agent Skills

Not a tool. Not an agent. Expertise that travels.

Layer 3 is the one most people have not thought about yet.

Here is the pattern I kept running into: every new agent project started from scratch. New codebase, new prompts, same underlying knowledge — how to read a requirement, what makes a test meaningful, when to flag a risk. The expertise was always being rebuilt. That seemed wrong.

A skill is a portable, encoded unit of expertise. It is not tied to one agent or one project. Any compatible agent can load it and apply it — without rebuilding the logic again. You build it once, and it travels.

GitHub Copilot Agent Skills: Teaching AI Your Repository Patterns

vibe-coding-checklist applies the same idea to AI code review — a shared quality framework that any team or any agent can use consistently.

The shift in thinking is subtle but significant. At Layer 1, you build prompts and tools. At Layer 2, you build goals and trust boundaries. At Layer 3, you build expertise itself — in a form that outlasts any single project or team.

The Difference That Matters

AI-Assisted Testing vs AI Agents vs AI Agent Skills

Three layers. All called AI testing. Now you know which one you are actually in.

All repos → github.com/aiqualitylab

More writing → aiqualityengineer.com

The GitHub Copilot Features That Are Quietly Draining Your Premium Requests

Let's Automate 🛡️ — Thu, 19 Feb 2026 17:19:23 +0000

10 optimisations most developers miss — including why the Copilot Coding Agent beats Agent Mode Chat every time

Most developers hit their monthly limit in the first week. Here’s what’s actually happening under the hood — and how to work smarter before it happens to you.

Photo by Resume Genius on Unsplash

Before diving in, it helps to understand what GitHub Copilot actually counts as a premium request, because most developers don’t find out until it’s too late.

Inline code completions on paid plans are unlimited and cost nothing. What drains your monthly allowance is everything else — Copilot Chat, Agent Mode, Copilot Code Review, Copilot CLI, and the Copilot Coding Agent.

Each model also carries a multiplier. Some models are included free on paid plans. Once your allowance is gone, premium features are locked for the rest of the billing cycle.

Knowing that, here’s how to make every request count.

1. Name your functions like they’re instructions

Inline autocomplete is unlimited on paid plans and costs nothing from your premium allowance. The more precisely you name a function, the more accurately Copilot completes the body without any Chat involved. This is your primary tool, not a fallback.

2. Write your intent as a comment above the cursor

A detailed comment placed directly before your cursor is treated by Copilot as an instruction. You get the same outcome as a Chat message at zero premium cost. Use this for any logic you would otherwise describe to Copilot in conversation.

3. Cycle through alternatives with Alt+] before opening Chat

When the first inline suggestion misses, most developers immediately reach for Chat. Before doing that, cycle through alternative suggestions. The second or third option is often exactly what’s needed — and one saved Chat message multiplies across a full day of work.

4. Disable Agent Mode when you’re not actively using it

Agent Mode runs in the background and silently runs even when you’re not directing it. GitHub’s official documentation explicitly flags this as a common cause of unexpected quota drain. Disable it in your repository settings when it isn’t part of your current workflow.

5. Use the Copilot Coding Agent for complex tasks instead of Agent Mode Chat

This is one of the least-known optimisations available. The Copilot Coding Agent — the one that creates and modifies pull requests asynchronously — counts as one premium request per full session regardless of how much work it does. Agent Mode Chat charges one premium request per message, multiplied by the model rate. For any task involving multiple files or significant implementation work, the Coding Agent is dramatically more efficient.

6. Start a new Chat thread when switching topics

As a conversation grows, all prior messages remain in context and contribute to token consumption. GitHub’s documentation specifically calls this out as a driver of elevated usage. When you move to a new task or a different area of your codebase, start a fresh thread rather than continuing an existing one.

7. Understand the model multiplier before choosing one

Before switching to a powerful model, weigh whether the capability gain justifies the cost. For most day-to-day work, it doesn’t.

8. Use auto model selection for a built-in discount

When you enable auto model selection in Copilot Chat in VS Code, GitHub applies a 10% multiplier discount across all premium model usage. It requires no change to your workflow and the saving compounds quietly across a full month.

9. Use #file references instead of @workspace

@workspace scans your entire codebase on every message, consuming more than most questions require. Using #file:yourfile.ts targets exactly the context Copilot needs, which produces more focused answers with less back-and-forth and fewer requests spent getting there.

10. Set a budget alert before your allowance runs out

GitHub lets you configure alerts at 75%, 90%, and 100% of any spending threshold you define. Setting a low or zero spending budget with alerts enabled means you get notified well before premium features are cut off — without risking unexpected charges. Check your current usage anytime at github.com/settings/billing or through the Copilot icon in your IDE status bar.

The Principle Underneath All of It

Every tip here points back to the same question worth asking before you open Chat: is there a way to get this through autocomplete instead?

Reference — https://docs.github.com/en/copilot

Most of the time, there is. And building that habit is what separates developers who hit the wall in week one from those who reach month end with room to spare.

AI Natural Language Tests — Dual Framework Test Automation with Cypress & Playwright

Let's Automate 🛡️ — Sun, 01 Feb 2026 16:55:23 +0000

AI Natural Language Tests — Dual Framework Test Automation with Cypress & Playwright

Open-source AI test automation framework with natural language test generation, self-healing, and dual framework support

Writing end-to-end tests is one of those things every team knows they should do, but nobody really enjoys doing. You stare at a login page, figure out the selectors, write the steps, handle the waits, and repeat this for every feature. I kept thinking — what if I could just say what I want to test, and let AI handle the rest?

That’s exactly what I built.

Architecture

What Is It?

ai-natural-language-tests is an open-source tool that takes a plain English description of a test scenario and generates a fully working Cypress or Playwright test file. No templates. No copy-pasting. You describe the test, point it at a URL, and it writes the code.

Here’s what a typical command looks like:

python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login

That single line does everything — fetches the page, reads the HTML, picks up the right selectors, and generates a complete test file you can run immediately.

Want Playwright instead of Cypress? Just add a flag:

python qa_automation.py "Test login with valid credentials" --url https://the-internet.herokuapp.com/login --framework playwright

How It Actually Works

Under the hood, the tool runs a 5-step workflow built with LangGraph:

Complete Workflow

Step 1 — It sets up a vector store. Think of this as a memory bank for test patterns.

Step 2 — It fetches the target URL, pulls the HTML, and extracts useful selectors like input fields, buttons, and links.

Step 3 — It searches the vector store for similar tests it has generated before. If you tested a login page last week, it remembers the patterns.

Step 4 — It sends everything to GPT-4 along with a carefully crafted prompt — the description, the selectors, and any matching patterns from history. The AI generates the actual test code.

Step 5 — Optionally, it runs the test right away using Cypress or Playwright.

The interesting part is Step 3. Every test the tool generates gets saved as a pattern. Over time, it builds a library of patterns and uses them to write better tests. The first test for a login page might be decent. The tenth one will be much better because it has learned from all the previous ones.

Why Two Frameworks?

I started with Cypress because it’s what most teams I’ve worked with use. But Playwright has been gaining serious traction — especially for teams that need multi-browser testing or prefer TypeScript.

So in v3.1, I added full Playwright support. The tool uses different prompts for each framework. The Cypress prompt focuses on chaining commands and cy.get() patterns. The Playwright prompt covers locators, async/await, network interception, multi-tab handling, and all the TypeScript-specific patterns.

You pick the framework. The AI adapts.

The Part I Didn’t Expect — Failure Analysis

While building this, I realized that generating tests is only half the problem. Tests fail. And reading Cypress or Playwright error logs can be painful, especially for someone newer to the frameworks.

So I added an AI-powered failure analyzer:

python qa_automation.py --analyze "CypressError: Timed out retrying after 4000ms"

It reads the error, explains what went wrong in plain language, and suggests a fix. You can also point it at a log file. It’s a small feature but it has saved me a surprising amount of time.

Running It in CI/CD

The tool comes with a GitHub Actions workflow out of the box. You can trigger it manually from the Actions tab — type your test description, provide a URL, pick Cypress or Playwright, and it runs the full pipeline. Generate, execute, and get results — all inside your CI.

CI/CD PIPELINE

This makes it practical for teams that want to try AI-generated tests without changing their existing setup. Just add the workflow and trigger it when you need a new test.

What I Learned Building This

A few things surprised me along the way:

Prompts matter more than the model. I spent more time refining the system prompts than on any other part of the codebase. A well-structured prompt with clear constraints produces dramatically better test code than a vague one, regardless of which GPT model you use.

Pattern learning is underrated. The vector store approach turned out to be more useful than I expected. When the tool has seen similar pages before, the generated tests are noticeably more accurate. It picks up things like common selector patterns and assertion styles from its history.

Keeping frameworks separate is important. Early on, I tried using a single generic prompt for both Cypress and Playwright. The results were mediocre for both. Dedicated prompts for each framework made a huge difference in output quality.

Try It Out

The project is open source and ready to use:

GitHub: github.com/aiqualitylab/ai-natural-language-tests

First Release — https://github.com/aiqualitylab/ai-natural-language-tests/releases/tag/v2026.02.01

Setup takes about five minutes — clone the repo, install dependencies, add your OpenAI API key, and you’re generating tests.

If you work in QA or test automation and you’ve been curious about how AI fits into your workflow, give it a try. I’d love to hear what you think.

Exploring how AI can make quality engineering more practical and less tedious. I write about this stuff regularly at AI Quality Engineer .

The AI QA Engineer’s Decision Framework: When NOT to Use AI in Testing

Let's Automate 🛡️ — Sun, 25 Jan 2026 10:47:51 +0000

A Practical Guide for Quality Engineers Who Want Results, Not Hype

When NOT to Use AI in Testing: A Simple Guide

Stop. Think. Then Decide.

The Big Question

Everyone talks about using AI in testing. But nobody talks about when to SKIP it.

This guide helps you decide: AI or no AI?

Why This Matters

AI testing sounds cool. But it comes with baggage:

It costs money — AI tools need servers, licenses, and API calls.

It needs babysitting — Models drift. Prompts need tuning. Things break in weird ways.

It’s hard to debug — When AI tests fail, figuring out WHY is painful.

Your team might forget basics — If AI does everything, manual debugging skills fade.

AI isn’t bad. But it’s not always the answer.

7 Times to Skip AI (Use Traditional Testing Instead)

1. Math and Calculations

Example: Tax calculators, loan interest, pricing formulas.

Why skip AI? The answer is either right or wrong. No guessing needed. No patterns to learn.

Do this instead: Simple data-driven tests. Input goes in. Expected output comes out. Done.

2. Audit and Compliance Systems

Example: Banking apps, healthcare records, legal documents.

Why skip AI? Auditors want proof. They want to see EXACTLY what you tested. AI is unpredictable — same prompt, different results.

Do this instead: Scripted tests with detailed logs. Every step recorded. Every result traceable.

3. Speed and Load Testing

Example: Can your app handle 10,000 users at once?

Why skip AI? You’re measuring app speed. AI adds its own delay. You’d be measuring AI, not your app.

Do this instead: Use tools built for this — JMeter, k6, Gatling. They’re fast and focused.

4. Basic CRUD Operations

Example: Create user. Read user. Update user. Delete user.

Why skip AI? It’s simple. AI is overkill. Like using a rocket to go to the grocery store.

Do this instead: Write one test template. Copy it for each operation. Fast and easy.

5. Screens That Never Change

Example: Internal admin panels. Old systems nobody touches.

Why skip AI? AI shines when things CHANGE. Self-healing locators fix moving targets. No movement? No need.

Do this instead: Regular automation. Page Object Model. Set it and forget it.

6. Security Testing

Example: Finding SQL injection, XSS attacks, login bypasses.

Why skip AI? Security needs creative thinking. Breaking things in new ways. AI follows patterns — hackers don’t.

Do this instead: Security tools (OWASP ZAP, Burp Suite) plus human testers who think like attackers.

7. Physical Device Testing

Example: Barcode scanners, payment terminals, IoT sensors.

Why skip AI? AI lives in software. It can’t press physical buttons or read blinking lights.

Do this instead: Hardware test rigs. Human testers. Real-world verification.

The Quick Decision Guide

Ask yourself these 4 questions:

DECISION TABLE FRAMEWORK

Before You Buy Any AI Tool, Answer These:

What exact problem am I solving? (Not “we want AI” — a real problem)

Can a simple script fix this? (Seriously, can it?)

How will I know if it worked? (What number goes up or down?)

Who will maintain it? (AI tools need constant care)

Can I explain it to my boss? (If you can’t explain it, don’t buy it)

The Simple Truth

AI is a tool. Not a magic wand.

Good testers know WHEN to use each tool:

USAGE CHECKLIST

One Page Summary

USE AI FOR:

Generating test ideas from requirements

Handling UI changes automatically

Analyzing why tests keep failing

Creating test data variations

Exploring edge cases

SKIP AI FOR:

Exact calculations (math, money, dates)

Compliance and audit trails

Performance/load measurements

Simple CRUD operations

Stable, unchanging systems

Security penetration testing

Physical hardware testing

Final Word

The smartest move isn’t always the newest tool.

Sometimes a simple script beats a fancy AI.

Know when to use AI. Know when to skip it. That’s real skill.

Machine Learning Pipelines Made Easy for Quality Assurance Professionals

Let's Automate 🛡️ — Sat, 10 Jan 2026 19:45:18 +0000

A very simple guide to how machine learning works

Machine learning looks hard. But it is not.

If you know QA, you already know the basics.

ML systems have three parts. We call them FTI:

F = Feature (clean the data)

T = Training (teach the model)

I = Inference (use the model)

Let me explain each one.

Part 1: Feature Pipeline

What does it do?

It cleans dirty data.

Simple example:

You have messy data. Names are written in different ways. Dates are in wrong formats. Numbers have errors.

This pipeline fixes all that. It makes data clean and ready.

Feature Pipeline Detail

In QA words:

You never test with bad data. You clean it first. This pipeline does the same thing.

The clean data goes to a Feature Store.

Part 2: Training Pipeline

What does it do?

It teaches the model.

Simple example:

You show the model 1000 pictures of cats. You tell it “this is a cat” each time. The model learns what a cat looks like.

In QA words:

You learn from requirements. Then you write test cases. The model learns from data. Then it can make predictions.

Picture:

The smart model goes to a Model Registry.

Training Pipeline Detail

Part 3: Inference Pipeline

What does it do?

It uses the model to answer questions.

Simple example:

Someone shows a new picture. The model says “this is a cat” or “this is not a cat.”

In QA words:

This is like running tests in production. The model is working and giving answers.

Inference Pipeline Detail

Two Important Storage Places

Feature Store

Keeps clean data

Saves old versions

Everyone uses same data

Model Registry

Keeps trained models

Saves old versions

You know which model is in production

The Full Picture

Full FTI Pipeline Overview

Why This is Easy for QA

You already know:

✓ How to check data quality → Test Feature Pipeline

✓ How to compare old vs new → Test Training Pipeline

✓ How to test in production → Test Inference Pipeline

Five Things to Remember

Three parts. Feature, Training, Inference. That’s it.

Clean data is key. Bad data = bad model.

Save everything. Keep old data. Keep old models. You can go back if needed.

Test each part. Don’t test everything together. Test one part at a time.

Your skills work here. QA testing skills work for ML testing too.

Last Words

ML is just software with a learning step.

You already know how to test software. Now you can test ML too.

Start simple. Ask: “Show me the three pipelines.”

Then test each one.

You can do this.

I Built an AI-Powered Test Data Generator That Analyzes Any URL and Creates Test Data JSON

Let's Automate 🛡️ — Wed, 31 Dec 2025 19:12:47 +0000

I got tired of manually inspecting HTML to find selectors. So I taught my framework to do it instead.

Architecture flow

Here’s a question that kept me up at night:

Why am I spending more time finding selectors than writing actual tests?

I watched myself burn 30 minutes on a simple login test — not writing the test itself, but hunting through DevTools for the right selectors, creating fixture files, and crafting test data that would actually work.

What if the framework could just… look at the page and figure it out?

The Problem Nobody Talks About

Here’s the dirty secret of test automation: writing the actual test is the easy part.

The hard part? Finding #username vs input[name="user"] vs .login-field. Creating realistic test data. Building fixture files that match the actual form structure.

Every new page means:

Open DevTools

Inspect elements

Copy selectors

Hope they’re stable

Create JSON fixtures

Hope nothing changes tomorrow

Most “AI-powered” testing tools focus on running tests or analyzing failures. But what about the beginning — the tedious setup that drains your time before you write a single assertion?

The Experiment: Teaching AI to See

The idea was simple but audacious: give the AI a URL and let it figure out everything else.

Not mock data. Not hardcoded selectors. Real selectors from real HTML.

Here’s what I wanted:

python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login

And the framework should:

Fetch the actual page

Analyze the HTML structure

Extract real, working selectors

Generate meaningful test cases

Save everything as a Cypress fixture

Then generate tests that use that data

Sounds impossible? I thought so too.

How It Actually Works

The magic happens in about 50 lines of Python:

def generate_test_data_from_url(url: str, requirements: list) -> tuple:
    # Step 1: Fetch the real page
    resp = requests.get(url, timeout=10, headers={'User-Agent': 'Mozilla/5.0'})
    html = resp.text[:5000] # First 5KB is usually enough

    # Step 2: Ask AI to analyze it
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    prompt = f"""Analyze this HTML and generate test data.

    URL: {url}
    HTML: {html}

    Return JSON with:
    - Real selectors from the HTML
    - Valid test case with working data
    - Invalid test case for error handling
    """

    # Step 3: Parse and save as fixture
    test_data = json.loads(llm.invoke(prompt).content)

    with open("cypress/fixtures/url_test_data.json", 'w') as f:
        json.dump(test_data, f, indent=2)

    return test_data

The AI doesn’t guess. It reads the actual HTML and extracts what’s really there.

Complete Workflow

What The AI Sees vs What It Returns

When I point it at a login page, here’s the actual flow:

Input: Just a URL

--url https://the-internet.herokuapp.com/login

What the AI analyzes:

<input type="text" id="username" name="username">
<input type="password" id="password" name="password">
<button type="submit" class="radius">Login</button>

What it generates:

{
  "url": "https://the-internet.herokuapp.com/login",
  "selectors": {
    "username": "#username",
    "password": "#password",
    "submit": "button[type='submit']"
  },
  "test_cases": [
    {
      "name": "valid_test",
      "username": "tomsmith",
      "password": "SuperSecretPassword!",
      "expected": "success"
    },
    {
      "name": "invalid_test", 
      "username": "wronguser",
      "password": "badpassword",
      "expected": "error"
    }
  ]
}

Real selectors. Actual test data. Zero manual inspection.

The Generated Test Uses It All

The framework then generates a Cypress test that consumes this fixture:

describe('Login Tests', function () {
    beforeEach(function () {
        cy.fixture('url_test_data').then((data) => {
            this.testData = data;
        });
    });

it('should login with valid credentials', function () {
        cy.visit(this.testData.url);
        const valid = this.testData.test_cases.find(tc => tc.name === 'valid_test');

        cy.get(this.testData.selectors.username).type(valid.username);
        cy.get(this.testData.selectors.password).type(valid.password);
        cy.get(this.testData.selectors.submit).click();

        cy.url().should('include', '/secure');
    });
    it('should show error with invalid credentials', function () {
        cy.visit(this.testData.url);
        const invalid = this.testData.test_cases.find(tc => tc.name === 'invalid_test');

        cy.get(this.testData.selectors.username).type(invalid.username);
        cy.get(this.testData.selectors.password).type(invalid.password);
        cy.get(this.testData.selectors.submit).click();

        cy.get('#flash').should('contain', 'invalid');
    });
});

Notice something? The selectors come from the fixture, not hardcoded in the test.

If the page changes, update the fixture. Tests stay clean.

Two Ways to Feed Data

Sometimes you already have test data. Maybe from a previous run. Maybe from your team’s shared fixtures.

So I added a second option:

# Option 1: AI analyzes live URL
python qa_automation.py "Test login" --url https://example.com/login

# Option 2: Use existing JSON file
python qa_automation.py "Test login" --data cypress/fixtures/my_data.json

Same test generation. Different data sources. Your choice.

The Part That Surprised Me

I expected the AI to find basic selectors. What I didn’t expect was how well it understood context.

When analyzing a registration form, it didn’t just find #email — it generated test data like:

Valid: testuser@example.com

Invalid: not-an-email

For password fields:

Valid: SecurePass123!

Invalid: 123 (too short)

The AI understood what kind of data each field expected. Not because I told it — because it read the HTML attributes, labels, and validation patterns.

The Gotcha: Fixtures Need function() Syntax

One thing tripped me up for hours. Cypress fixtures with this.testData require a specific pattern:

// WRONG - arrow functions don't have 'this'
describe('Test', () => {
    beforeEach(() => {
        cy.fixture('data').then((d) => { this.testData = d; }); // undefined!
    });
});

// RIGHT - function() preserves 'this'
describe('Test', function () {
    beforeEach(function () {
        cy.fixture('data').then((data) => { this.testData = data; });
    });

    it('works', function () {
        console.log(this.testData); // actual data!
    });
});

The framework now enforces this pattern in generated tests. Lesson learned the hard way.

What This Means For Your Workflow

Before:

Open page in browser

Inspect elements manually

Copy selectors to notepad

Create fixture JSON by hand

Write test using those selectors

Fix typos in selectors

Run test

Debug why selectors don’t work

After:

Run one command with URL

Framework handles the rest

That’s not an exaggeration. The 30-minute login test? Under 2 minutes now.

Try It Yourself

The framework is open source:

git clone https://github.com/user/cypress-natural-language-tests
cd cypress-natural-language-tests
pip install -r requirements.txt

Set your API key:

export OPENAI_API_KEY=your_key_here
export OPENROUTER_API_KEY=your_openrouter_api_key_here

Generate tests from any URL:

python qa_automation.py "Test the login form" --url https://the-internet.herokuapp.com/login

Check what it created:

cat cypress/fixtures/url_test_data.json
cat cypress/e2e/generated/*.cy.js

The Bigger Picture

We’re at an interesting moment in test automation. The tooling is getting smarter, but the real breakthrough isn’t replacing testers — it’s eliminating the tedious parts.

Finding selectors is tedious. Creating fixture files is tedious. Debugging why #submit-btn worked yesterday but not today is tedious.

Let AI handle tedious. Let humans handle important.

That’s the framework I’m building.

Follow for more AI + QA experiments:

GitHub: https://github.com/aiqualitylab/cypress-natural-language-tests.git