Laura Hannah

Posted on May 28

The Role of QA in the New AI SDLC

#ai #testing #codequality #sdlc

QA’s role in the new AI SDLC is no longer just “test the finished application.”

It is becoming quality engineering across the entire lifecycle:

Requirements
Prompts
Data
Models
Generated code
Automation
Deployment
Monitoring
Governance
Production feedback

The big shift is this:

Old SDLC QA: Does the software meet the requirements?

New AI SDLC QA: Can we trust the system, the AI-generated work, the data, the model behavior, and the delivery process, and can we do so repeatedly, safely, and measurably?

AI does not eliminate QA.

It makes strong QA leadership more important.

The AI SDLC Quality Loop

Business Need / Product Idea
        ↓
Requirements + Risk Definition
        ↓
Spec-Driven Development
        ↓
Prompt / Agent / Workflow Design
        ↓
AI-Assisted Code + Test Generation
        ↓
Human Review + Automated Testing
        ↓
CI/CD Quality Gates
        ↓
Deployment
        ↓
Production Monitoring
        ↓
Feedback, Drift, Incidents, Metrics
        ↺ loops back into Requirements + Risk Definition

QA is not sitting at the end of this flow.

QA influences the entire loop:

QA / Quality Engineering
        ↳ Requirements
        ↳ Specs
        ↳ Prompts and agents
        ↳ Generated code
        ↳ Automated tests
        ↳ CI/CD quality gates
        ↳ Production monitoring
        ↳ Feedback and improvement
        ↳ Governance and audit evidence

1. Requirements and Risk Definition

QA should be involved before code exists.

For AI-enabled systems, requirements need to include not just functional behavior, but also risk, trust, and guardrails.

QA helps define:

What “good” output looks like
What “bad” output looks like
What the AI must never do
What needs human approval
What needs automated validation
What risks need mitigation
What security, privacy, compliance, bias, hallucination, and explainability concerns need to be addressed

This is one of the most important changes in the AI SDLC.

QA cannot wait until the end of the process and then try to test quality into the system. The quality strategy has to start at the beginning.
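
One way to make "what the AI must never do" testable from day one is to write each guardrail as a named check over model output. The sketch below is only an illustration: the `Guardrail` class, the `check_output` and `requires_human_approval` helpers, and both example rules are hypothetical, and a real list would come out of the team's own risk review.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrail:
    """One 'must never' rule, expressed as a predicate over model output."""
    name: str
    violated_by: Callable[[str], bool]
    needs_human_approval: bool = False


# Illustrative rules only (assumptions, not taken from any standard).
GUARDRAILS = [
    Guardrail(
        name="no_email_addresses",
        violated_by=lambda text: re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) is not None,
    ),
    Guardrail(
        name="no_guaranteed_outcomes",
        violated_by=lambda text: "guaranteed" in text.lower(),
        needs_human_approval=True,
    ),
]


def check_output(text: str) -> list[str]:
    """Return the name of every guardrail the output violates."""
    return [g.name for g in GUARDRAILS if g.violated_by(text)]


def requires_human_approval(text: str) -> bool:
    """True if any violated guardrail is marked as needing a human decision."""
    return any(g.needs_human_approval for g in GUARDRAILS if g.violated_by(text))
```

Because each rule has a name, the same list can serve as a requirements artifact, a test fixture, and a line item in release evidence.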

2. Spec-Driven Development and Testable Intent

In an AI SDLC, the specification becomes more important, not less.

If AI agents or copilots are generating code, tests, documentation, or workflows, then QA needs to help make the specification precise enough that AI can generate useful output.

QA should push for:

Clear business rules
Examples
Counterexamples
Edge cases
Negative scenarios
Test data assumptions
Explicit quality gates
Traceability from requirement to evidence

A useful traceability chain looks like this:

Requirement → Prompt/Spec → Generated Code → Tests → Evidence

This is where QA becomes a system designer of correctness, not just a defect finder.
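
The traceability chain can be checked mechanically if each requirement carries its links with it. This is a minimal sketch under assumed names: `TraceLink`, `missing_stages`, and `untraceable` are hypothetical, and the references are plain strings rather than links into any particular tracker or CI system.

```python
from dataclasses import dataclass, field


@dataclass
class TraceLink:
    """Follows one requirement through Spec -> Code -> Tests -> Evidence."""
    requirement_id: str
    spec_ref: str = ""
    code_refs: list[str] = field(default_factory=list)
    test_ids: list[str] = field(default_factory=list)
    evidence_refs: list[str] = field(default_factory=list)

    def missing_stages(self) -> list[str]:
        """Name every stage of the chain that has nothing recorded."""
        stages = {
            "spec": self.spec_ref,
            "code": self.code_refs,
            "tests": self.test_ids,
            "evidence": self.evidence_refs,
        }
        return [name for name, value in stages.items() if not value]


def untraceable(links: list[TraceLink]) -> dict[str, list[str]]:
    """Map requirement id -> missing stages, for incomplete chains only."""
    report: dict[str, list[str]] = {}
    for link in links:
        missing = link.missing_stages()
        if missing:
            report[link.requirement_id] = missing
    return report
```

A requirement with a spec and code but no tests or evidence would show up in the report as `["tests", "evidence"]`, which is the kind of gap that is easy to miss when AI produces code quickly.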

3. Prompt, Agent, and Workflow Validation

Many engineering teams are now using tools like Claude Code, GitHub Copilot, Cursor, ChatGPT, and internal AI agents to generate or modify software artifacts.

That means QA also needs to help test the prompts, skills, conventions, and workflows themselves.

QA should validate whether AI workflows:

Produce consistent results
Follow architecture standards
Generate useful and maintainable tests
Avoid hallucinated APIs or false assumptions
Respect security and data-handling rules
Handle edge cases
Fit repository conventions
Produce code that compiles, runs, and behaves correctly

For AI QE Architects, this is a major opportunity.

A strong QA function can create reusable prompts, skills, conventions, documentation, and evaluation checks so teams generate better software and better tests consistently.
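
As one small example of an evaluation check, the sketch below looks for a narrow form of hallucinated API: generated Python that imports a module that does not exist in the current environment. It uses the standard library's `ast` and `importlib.util.find_spec`. The function name `unresolved_imports` is hypothetical, and the check is deliberately limited. It does not verify that a function or attribute exists inside a real module.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Parse generated Python and list top-level modules that cannot be found.

    Raises SyntaxError if the source does not parse at all.
    """
    tree = ast.parse(source)
    modules: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which depend on package layout.
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

A check like this only tells you about the environment it runs in, so it belongs in the same environment the generated code is expected to run in.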

4. AI-Assisted Test Generation, But With Review

AI can generate a lot of tests quickly.

That is useful.

It is also risky if nobody checks whether those tests are meaningful.

QA’s role is to make sure AI-generated tests are:

Relevant
Deterministic where possible
Maintainable
Properly scoped
Not just happy-path coverage
Connected to real business risk
Running reliably in CI/CD
Producing evidence that humans can trust

The trap is believing this:

More tests automatically means better quality.

It does not.

QA needs to guard against shallow, duplicated, brittle, or misleading AI-generated tests.

The goal is not just volume. The goal is useful coverage, meaningful validation, and trustworthy release evidence.

5. Data Quality and Model Behavior

For systems using machine learning, large language models, recommendations, classification, scoring, summarization, or prediction, QA now has to care about data and model behavior too.

That includes:

Test data quality
Training and evaluation data assumptions
Bias and representativeness
Regression sets for model behavior
Prompt-response evaluation
Golden datasets
Drift detection
Accuracy
Precision
Recall
False positives
False negatives
Task-specific scoring
Human review workflows

Traditional software tests usually ask whether the code follows deterministic rules.

AI systems often require a broader question:

Is the behavior acceptable, safe, and reliable across the kinds of real-world inputs the system will receive?

That requires evaluation strategy, monitoring, and human judgment.
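
For a binary task such as "should this message be flagged?", the metrics listed above can be computed directly from a golden dataset of expected labels and the system's predictions. This is a minimal sketch: `BinaryMetrics` and `score` are hypothetical names, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int

    @property
    def precision(self) -> float:
        """Of everything flagged, how much was right."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        """Of everything that should have been flagged, how much was caught."""
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    @property
    def accuracy(self) -> float:
        total = (self.true_positives + self.false_positives
                 + self.false_negatives + self.true_negatives)
        return (self.true_positives + self.true_negatives) / total if total else 0.0


def score(expected: list[bool], predicted: list[bool]) -> BinaryMetrics:
    """Compare golden labels with predictions, position by position."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must be the same length")
    pairs = list(zip(expected, predicted))
    return BinaryMetrics(
        true_positives=sum(e and p for e, p in pairs),
        false_positives=sum((not e) and p for e, p in pairs),
        false_negatives=sum(e and (not p) for e, p in pairs),
        true_negatives=sum((not e) and (not p) for e, p in pairs),
    )
```

Keeping the raw counts next to the ratios matters, because a high accuracy number can hide a small but costly set of false negatives.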

6. CI/CD Quality Gates

QA should help define automated gates that prevent bad AI-generated or AI-enabled changes from reaching production.

Examples include:

Unit tests
API tests
UI tests
Integration tests
Contract tests
End-to-end tests
Static analysis
Dependency scans
Security scans
Prompt evaluation suites
LLM response regression checks
Accessibility checks
Performance checks
Synthetic production checks
Test coverage thresholds
Code review rules for AI-generated code
Required release evidence before deployment

The goal is not to slow everyone down.

The goal is to make fast delivery safe.

This is especially important when AI increases the speed at which teams can produce code.

Faster generation without stronger quality gates simply accelerates risk.
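
A quality gate can be as simple as a list of thresholds that a pipeline step evaluates before deployment. The sketch below is an assumption-heavy illustration: the `Gate` class, the `evaluate` function, the metric names, and every threshold value are hypothetical and would need to be set by the team. A metric that was never reported counts as a failure, so the gate fails closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


# Illustrative thresholds only.
GATES = [
    Gate("unit_test_pass_rate", 1.0),
    Gate("line_coverage", 0.80),
    Gate("prompt_eval_pass_rate", 0.95),
    Gate("critical_vulnerabilities", 0, higher_is_better=False),
]


def evaluate(metrics: dict[str, float]) -> list[str]:
    """Return one readable failure per gate that is missing or not met."""
    failures = []
    for gate in GATES:
        if gate.metric not in metrics:
            failures.append(f"{gate.metric}: no value reported")
        elif not gate.passes(metrics[gate.metric]):
            failures.append(
                f"{gate.metric}: {metrics[gate.metric]} vs threshold {gate.threshold}"
            )
    return failures
```

In a pipeline, a non-empty result from `evaluate` would typically translate into a non-zero exit code, and the failure messages double as release evidence.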

7. Production Monitoring and Feedback Loops

AI systems can degrade after release because the world around them changes.

Things that can change include:

Data
User behavior
Prompts
Models
Third-party APIs
Business expectations
Security threats
Regulatory expectations

QA therefore needs to stay involved after release through:

Observability
Defect trend analysis
Model and prompt performance monitoring
Data drift checks
Behavior drift checks
User feedback review
Incident analysis
Continuous improvement of test suites
Release quality metrics

This is one of the biggest mindset shifts:

Production becomes part of the test strategy.

In the AI SDLC, testing does not stop at deployment.

Production behavior becomes a source of quality information that feeds back into requirements, specs, tests, prompts, and governance.
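
A data drift check can start small. The sketch below computes the Population Stability Index (PSI), one common way to compare the distribution of a numeric input or score in production against a baseline sample. It is one option among several, not a method the workflow above prescribes. The function name, the `edges` and `floor` parameters, and the flooring of empty bins are choices made for this example.

```python
import math
from bisect import bisect_right


def population_stability_index(
    baseline: list[float],
    current: list[float],
    edges: list[float],
    floor: float = 1e-4,
) -> float:
    """PSI between two samples, using the given ascending bin edges.

    Values below edges[0] land in the first bin; values at or above
    edges[-1] land in the last bin.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins so the logarithm below stays defined.
        return [max(c / len(values), floor) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

Identical distributions give a PSI of 0, and larger values mean a larger shift. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but any alert threshold should be tuned against the team's own history rather than taken as given.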

8. Governance and Auditability

AI creates a new need for evidence.

QA can own or strongly influence the evidence trail.

That means documenting:

What was tested
What model, prompt, or version was used
What data was used
What risks were considered
What human approvals occurred
What known limitations remain
What monitoring is in place
Why the release was considered acceptable

This matters in regulated environments, but it also matters for any company trying to use AI responsibly.

Governance is not just paperwork.

Good governance helps teams prove that they understood the risks, tested the right things, and made informed release decisions.
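
The items in the list above map naturally onto a structured record that can be stored with each release. This is a sketch, not a standard schema: the `ReleaseEvidence` class, its field names, and the `gaps` and `fingerprint` helpers are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseEvidence:
    release: str
    model_version: str
    prompt_version: str
    dataset_refs: tuple[str, ...]
    tests_run: tuple[str, ...]
    risks_considered: tuple[str, ...]
    approvals: tuple[str, ...]
    known_limitations: tuple[str, ...]
    monitoring: tuple[str, ...]
    rationale: str

    def to_json(self) -> str:
        """Serialize with sorted keys so the same record always yields the same text."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def fingerprint(self) -> str:
        """SHA-256 of the serialized record, usable for detecting later edits."""
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()

    def gaps(self) -> list[str]:
        """Fields left empty, which a release review would likely want filled in."""
        return [name for name, value in asdict(self).items() if not value]
```

A record with no approvals and no rationale would report `["approvals", "rationale"]` from `gaps()`, which gives a release gate something concrete to block on.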

The New QA Title Is Closer to “Quality Architect”

In the AI SDLC, QA becomes less about manual validation at the end and more about designing a trustworthy delivery system.

Area	QA / QE Responsibility
Product idea	Identify quality risks early
Requirements	Make requirements testable, measurable, and risk-aware
Specs	Add examples, counterexamples, edge cases, and acceptance criteria
Prompts / agents	Validate consistency, correctness, guardrails, and failure modes
Generated code	Review AI-generated code for correctness, maintainability, and standards
Test automation	Generate, review, scale, and govern automated tests
Data / model quality	Validate datasets, model behavior, drift, and evaluation metrics
CI/CD	Build quality gates into pipelines
Deployment	Require release evidence before production
Production	Monitor quality after release
Governance	Preserve traceability, audit evidence, approvals, and known limitations

Traditional QA vs. AI SDLC QA

Traditional QA

Requirements → Code → Test → Release

Traditional QA often enters late and asks:

Does the software meet the requirements?

AI SDLC QA

Risk → Spec → Prompt → Generated Code → Test → Gate → Monitor → Improve

AI SDLC QA enters early and keeps asking:

How do we know this is correct, safe, maintainable, observable, and fit for purpose?

The Plainspoken Version

QA is becoming the group that answers:

How do we know this AI-assisted system is correct, safe, maintainable, observable, and fit for purpose?

That is a much bigger role than traditional testing.

It is also a huge opportunity for experienced QA architects, because AI makes weak engineering processes worse and strong engineering processes faster.

QA’s job is to make sure the organization gets the second outcome, not the first.

Bottom Line

In the new AI SDLC, QA is not just testing software.

QA is helping the organization build systems that are:

Correct
Safe
Trustworthy
Maintainable
Observable
Governed
Measurable
Ready for production
Continuously improving

AI does not replace QA. AI makes strong QA leadership more important.

References

These references are useful for grounding this model of QA in the AI SDLC.

NIST AI Risk Management Framework 1.0

NIST provides a practical framework for thinking about AI risk, organized around four core functions: Govern, Map, Measure, and Manage.

Useful for supporting the role of QA in risk definition, measurement, monitoring, governance, and lifecycle accountability.

https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

Google Cloud: MLOps Continuous Delivery and Automation Pipelines in Machine Learning

Google Cloud’s MLOps guidance explains why machine learning systems require CI/CD, continuous training, automation, monitoring, and production feedback loops.

Useful for supporting the idea that AI quality is not a one-time testing event.

https://docs.cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Google Cloud: Practitioners Guide to MLOps

This guide provides a broader view of operationalizing ML systems, including lifecycle practices, automation, monitoring, and production readiness.

Useful for grounding QA’s role in end-to-end ML system quality.

https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf

Microsoft Responsible AI Standard

Microsoft’s Responsible AI Standard provides concrete requirements for building AI systems responsibly.

Useful for supporting governance, accountability, transparency, reliability, safety, fairness, privacy, and inclusive design considerations.

https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Microsoft-Responsible-AI-Standard-General-Requirements.pdf

OWASP Top 10 for LLM Applications

OWASP identifies major security risks for LLM applications, including prompt injection, insecure output handling, training data poisoning, sensitive information disclosure, and supply-chain vulnerabilities.

Useful for supporting QA involvement in LLM-specific security and quality risks.

https://genai.owasp.org/llm-top-10/