Laura Hannah

The Role of QA in the New AI SDLC

QA’s role in the new AI SDLC is no longer just “test the finished application.”

It is becoming quality engineering across the entire lifecycle:

  • Requirements
  • Prompts
  • Data
  • Models
  • Generated code
  • Automation
  • Deployment
  • Monitoring
  • Governance
  • Production feedback

The big shift is this:

Old SDLC QA: Does the software meet the requirements?

New AI SDLC QA: Can we trust the system, the AI-generated work, the data, the model behavior, and the delivery process — repeatedly, safely, and measurably?

AI does not eliminate QA.

It makes strong QA leadership more important.


The AI SDLC Quality Loop

The loop can be drawn as a simple text diagram:

Business Need / Product Idea
        ↓
Requirements + Risk Definition
        ↓
Spec-Driven Development
        ↓
Prompt / Agent / Workflow Design
        ↓
AI-Assisted Code + Test Generation
        ↓
Human Review + Automated Testing
        ↓
CI/CD Quality Gates
        ↓
Deployment
        ↓
Production Monitoring
        ↓
Feedback, Drift, Incidents, Metrics
        ↺ loops back into Requirements + Risk Definition

QA is not sitting at the end of this flow.

QA influences the entire loop:

QA / Quality Engineering
        ↳ Requirements
        ↳ Specs
        ↳ Prompts and agents
        ↳ Generated code
        ↳ Automated tests
        ↳ CI/CD quality gates
        ↳ Production monitoring
        ↳ Feedback and improvement
        ↳ Governance and audit evidence

1. Requirements and Risk Definition

QA should be involved before code exists.

For AI-enabled systems, requirements need to include not just functional behavior, but also risk, trust, and guardrails.

QA helps define:

  • What “good” output looks like
  • What “bad” output looks like
  • What the AI must never do
  • What needs human approval
  • What needs automated validation
  • What risks need mitigation
  • What security, privacy, compliance, bias, hallucination, and explainability concerns need to be addressed

This is one of the most important changes in the AI SDLC.

QA cannot wait until the end of the process and then try to test quality into the system. The quality strategy has to start at the beginning.
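
One way to make "what the AI must never do" concrete is to write each guardrail as a small executable check that can run against any AI output. The sketch below is only an illustration: the `Guardrail` class, the two example rules, and both helper functions are hypothetical, and real rules would come out of the risk definition work described above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """One 'must never' rule, expressed as a regex over the AI output."""
    name: str
    pattern: str       # illustrative pattern; real rules need domain review
    needs_human: bool  # True if a match should route to human approval


# Hypothetical example rules, not a recommended or complete list.
GUARDRAILS = [
    Guardrail("no_ssn", r"\b\d{3}-\d{2}-\d{4}\b", needs_human=False),
    Guardrail("no_guarantees", r"(?i)\bguaranteed returns?\b", needs_human=True),
]


def violated_guardrails(text: str) -> list[str]:
    """Return the names of every guardrail the text violates."""
    return [g.name for g in GUARDRAILS if re.search(g.pattern, text)]


def requires_human_approval(text: str) -> bool:
    """True if any violated guardrail is marked as needing human review."""
    return any(g.needs_human for g in GUARDRAILS if re.search(g.pattern, text))
```

Pattern matching only covers the simplest guardrails. Rules about tone, bias, or factual accuracy usually need evaluation sets and human judgment instead.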


2. Spec-Driven Development and Testable Intent

In an AI SDLC, the specification becomes more important, not less.

If AI agents or copilots are generating code, tests, documentation, or workflows, then QA needs to help make the specification precise enough that AI can generate useful output.

QA should push for:

  • Clear business rules
  • Examples
  • Counterexamples
  • Edge cases
  • Negative scenarios
  • Test data assumptions
  • Explicit quality gates
  • Traceability from requirement to evidence

A useful traceability chain looks like this:

Requirement → Prompt/Spec → Generated Code → Tests → Evidence

This is where QA becomes a system designer of correctness, not just a defect finder.
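
The traceability chain can be kept as plain data, so gaps are easy to spot. This is a minimal sketch, assuming each requirement has an ID and that the other stages are stored as simple references. The `TraceLink` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TraceLink:
    """Links one requirement to the artifacts that implement and verify it."""
    requirement_id: str
    spec_ref: str = ""
    code_refs: list[str] = field(default_factory=list)
    test_ids: list[str] = field(default_factory=list)
    evidence_refs: list[str] = field(default_factory=list)

    def missing_stages(self) -> list[str]:
        """Return the chain stages that have nothing recorded yet."""
        stages = {
            "spec": self.spec_ref,
            "code": self.code_refs,
            "tests": self.test_ids,
            "evidence": self.evidence_refs,
        }
        return [name for name, value in stages.items() if not value]
```

A requirement whose `missing_stages()` result still includes tests or evidence is a reasonable candidate for blocking a release.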


3. Prompt, Agent, and Workflow Validation

Many engineering teams are now using tools like Claude Code, GitHub Copilot, Cursor, ChatGPT, and internal AI agents to generate or modify software artifacts.

That means QA also needs to help test the prompts, skills, conventions, and workflows themselves.

QA should validate whether AI workflows:

  • Produce consistent results
  • Follow architecture standards
  • Generate useful and maintainable tests
  • Avoid hallucinated APIs or false assumptions
  • Respect security and data-handling rules
  • Handle edge cases
  • Fit repository conventions
  • Produce code that compiles, runs, and behaves correctly

For AI QE Architects, this is a major opportunity.

A strong QA function can create reusable prompts, skills, conventions, documentation, and evaluation checks so teams generate better software and better tests consistently.
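
Some of these checks can be automated cheaply. As one example, the sketch below flags imports in generated Python code that do not resolve in the current environment, which is a common symptom of a hallucinated package. It uses only the standard library. The function name `unresolved_imports` is hypothetical, and the check is deliberately narrow: it does not verify that the functions or attributes being called actually exist.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that cannot be found."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which depend on the package layout
            modules.add(node.module.split(".")[0])
    # find_spec returns None when a top-level module is not installed
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

Run in the same environment as the project, a non-empty result is a signal to review the generated code before it goes any further.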


4. AI-Assisted Test Generation, But With Review

AI can generate a lot of tests quickly.

That is useful.

It is also risky if nobody checks whether those tests are meaningful.

QA’s role is to make sure AI-generated tests are:

  • Relevant
  • Deterministic where possible
  • Maintainable
  • Properly scoped
  • Not just happy-path coverage
  • Connected to real business risk
  • Running reliably in CI/CD
  • Producing evidence that humans can trust

The trap is believing this:

More tests automatically means better quality.

It does not.

QA needs to guard against shallow, duplicated, brittle, or misleading AI-generated tests.

The goal is not just volume. The goal is useful coverage, meaningful validation, and trustworthy release evidence.


5. Data Quality and Model Behavior

For systems using machine learning, large language models, recommendations, classification, scoring, summarization, or prediction, QA now has to care about data and model behavior too.

That includes:

  • Test data quality
  • Training and evaluation data assumptions
  • Bias and representativeness
  • Regression sets for model behavior
  • Prompt-response evaluation
  • Golden datasets
  • Drift detection
  • Accuracy
  • Precision
  • Recall
  • False positives
  • False negatives
  • Task-specific scoring
  • Human review workflows

Traditional software tests usually ask whether the code follows deterministic rules.

AI systems often require a broader question:

Is the behavior acceptable, safe, and reliable across the kinds of real-world inputs the system will receive?

That requires evaluation strategy, monitoring, and human judgment.
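
For a yes/no classifier, several of the metrics above fall out of four counts taken from a golden dataset. The sketch below shows the arithmetic, assuming the expected labels and the model's predictions are available as two equal-length lists of booleans. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0


def score_golden_set(expected: list[bool], predicted: list[bool]) -> BinaryMetrics:
    """Compare predictions against golden labels and count each outcome."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    pairs = list(zip(expected, predicted))
    return BinaryMetrics(
        tp=sum(e and p for e, p in pairs),
        fp=sum((not e) and p for e, p in pairs),
        fn=sum(e and (not p) for e, p in pairs),
        tn=sum((not e) and (not p) for e, p in pairs),
    )
```

Which metric matters most depends on the risk: a fraud screen may tolerate false positives that a medical triage tool cannot, and the reverse can hold for false negatives.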


6. CI/CD Quality Gates

QA should help define automated gates that prevent bad AI-generated or AI-enabled changes from reaching production.

Examples include:

  • Unit tests
  • API tests
  • UI tests
  • Integration tests
  • Contract tests
  • End-to-end tests
  • Static analysis
  • Dependency scans
  • Security scans
  • Prompt evaluation suites
  • LLM response regression checks
  • Accessibility checks
  • Performance checks
  • Synthetic production checks
  • Test coverage thresholds
  • Code review rules for AI-generated code
  • Required release evidence before deployment

The goal is not to slow everyone down.

The goal is to make fast delivery safe.

This is especially important when AI increases the speed at which teams can produce code.

Faster generation without stronger quality gates simply accelerates risk.
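
A quality gate is, at its simplest, a named threshold on a metric the pipeline already produces. The sketch below is one possible shape, not a reference to any particular CI product. The `Gate` class, the `evaluate_gates` function, and the metric names in the usage note are all hypothetical. A missing metric fails the gate on purpose, so an unreported result cannot pass silently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    metric: str
    threshold: float
    higher_is_better: bool = True


def evaluate_gates(gates: list[Gate], metrics: dict[str, float]) -> list[str]:
    """Return a human-readable failure message for every gate that does not pass."""
    failures = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None:
            # Fail closed: an unreported metric is treated as a failure.
            failures.append(f"{gate.name}: metric '{gate.metric}' missing")
            continue
        if gate.higher_is_better:
            passed = value >= gate.threshold
        else:
            passed = value <= gate.threshold
        if not passed:
            failures.append(f"{gate.name}: {value} vs threshold {gate.threshold}")
    return failures
```

A pipeline step could call `evaluate_gates` with, say, a coverage gate and a prompt evaluation pass-rate gate, then exit with a non-zero status if the returned list is not empty.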


7. Production Monitoring and Feedback Loops

AI systems can degrade after release because the world around them changes.

Things that can change include:

  • Data
  • User behavior
  • Prompts
  • Models
  • Third-party APIs
  • Business expectations
  • Security threats
  • Regulatory expectations

QA therefore needs to stay involved after release through:

  • Observability
  • Defect trend analysis
  • Model and prompt performance monitoring
  • Data drift checks
  • Behavior drift checks
  • User feedback review
  • Incident analysis
  • Continuous improvement of test suites
  • Release quality metrics

This is one of the biggest mindset shifts:

Production becomes part of the test strategy.

In the AI SDLC, testing does not stop at deployment.

Production behavior becomes a source of quality information that feeds back into requirements, specs, tests, prompts, and governance.
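
Data drift checks can start small. One widely used measure is the Population Stability Index (PSI), which compares how a numeric feature is distributed in production against a baseline sample. The sketch below is a minimal standard-library version, assuming equal-width bins taken from the baseline range and a small floor to avoid dividing by zero. The function name and defaults are hypothetical.

```python
import math


def population_stability_index(
    baseline: list[float], current: list[float], bins: int = 10
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range land in the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but the right alert threshold depends on the feature and should be tuned against real data.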


8. Governance and Auditability

AI creates a new need for evidence.

QA can own or strongly influence the evidence trail.

That means documenting:

  • What was tested
  • What model, prompt, or version was used
  • What data was used
  • What risks were considered
  • What human approvals occurred
  • What known limitations remain
  • What monitoring is in place
  • Why the release was considered acceptable

This matters in regulated environments, but it also matters for any company trying to use AI responsibly.

Governance is not just paperwork.

Good governance helps teams prove that they understood the risks, tested the right things, and made informed release decisions.
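
The evidence trail is easier to preserve when it is captured as a structured record at release time. This is a sketch under assumptions: the `ReleaseEvidence` class and its fields are hypothetical and simply mirror the list above, and the SHA-256 fingerprint is one simple way to notice if a stored record is changed later.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReleaseEvidence:
    release_id: str
    model_version: str
    prompt_version: str
    dataset_ref: str
    tests_run: list[str]
    risks_considered: list[str]
    approvals: list[str]
    known_limitations: list[str] = field(default_factory=list)
    monitoring: list[str] = field(default_factory=list)
    rationale: str = ""

    def to_json(self) -> str:
        """Serialize with sorted keys so the same record always yields the same text."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def fingerprint(self) -> str:
        """SHA-256 of the serialized record, useful for detecting later edits."""
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()
```

Storing the JSON alongside the release artifacts, and the fingerprint somewhere separate, gives auditors something concrete to verify.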


The New QA Title Is Closer to “Quality Architect”

In the AI SDLC, QA becomes less about manual validation at the end and more about designing a trustworthy delivery system.

| Area | QA / QE Responsibility |
| --- | --- |
| Product idea | Identify quality risks early |
| Requirements | Make requirements testable, measurable, and risk-aware |
| Specs | Add examples, counterexamples, edge cases, and acceptance criteria |
| Prompts / agents | Validate consistency, correctness, guardrails, and failure modes |
| Generated code | Review AI-generated code for correctness, maintainability, and standards |
| Test automation | Generate, review, scale, and govern automated tests |
| Data / model quality | Validate datasets, model behavior, drift, and evaluation metrics |
| CI/CD | Build quality gates into pipelines |
| Deployment | Require release evidence before production |
| Production | Monitor quality after release |
| Governance | Preserve traceability, audit evidence, approvals, and known limitations |

Traditional QA vs. AI SDLC QA

Traditional QA

Requirements → Code → Test → Release

Traditional QA often enters late and asks:

Does the software meet the requirements?

AI SDLC QA

Risk → Spec → Prompt → Generated Code → Test → Gate → Monitor → Improve

AI SDLC QA enters early and keeps asking:

How do we know this is correct, safe, maintainable, observable, and fit for purpose?


The Plainspoken Version

QA is becoming the group that answers:

How do we know this AI-assisted system is correct, safe, maintainable, observable, and fit for purpose?

That is a much bigger role than traditional testing.

It is also a huge opportunity for experienced QA architects, because AI makes weak engineering processes worse and strong engineering processes faster.

QA’s job is to make sure the organization gets the second outcome, not the first.


Bottom Line

In the new AI SDLC, QA is not just testing software.

QA is helping the organization build systems that are:

  • Correct
  • Safe
  • Trustworthy
  • Maintainable
  • Observable
  • Governed
  • Measurable
  • Ready for production
  • Continuously improving

AI does not replace QA. AI makes strong QA leadership more important.


References

These references are useful for grounding this model of QA in the AI SDLC.

NIST AI Risk Management Framework 1.0

NIST provides a practical framework for thinking about AI risk through governance, mapping, measurement, and management.

Useful for supporting the role of QA in risk definition, measurement, monitoring, governance, and lifecycle accountability.

https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

Google Cloud: MLOps Continuous Delivery and Automation Pipelines in Machine Learning

Google Cloud’s MLOps guidance explains why machine learning systems require CI/CD, continuous training, automation, monitoring, and production feedback loops.

Useful for supporting the idea that AI quality is not a one-time testing event.

https://docs.cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Google Cloud: Practitioners Guide to MLOps

This guide provides a broader view of operationalizing ML systems, including lifecycle practices, automation, monitoring, and production readiness.

Useful for grounding QA’s role in end-to-end ML system quality.

https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf

Microsoft Responsible AI Standard

Microsoft’s Responsible AI Standard provides concrete requirements for building AI systems responsibly.

Useful for supporting governance, accountability, transparency, reliability, safety, fairness, privacy, and inclusive design considerations.

https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Microsoft-Responsible-AI-Standard-General-Requirements.pdf

OWASP Top 10 for LLM Applications

OWASP identifies major security risks for LLM applications, including prompt injection, insecure output handling, training data poisoning, sensitive information disclosure, and supply-chain vulnerabilities.

Useful for supporting QA involvement in LLM-specific security and quality risks.

https://genai.owasp.org/llm-top-10/

ISO/IEC 42001:2023 AI Management System

ISO/IEC 42001 defines an AI management system standard for organizations that develop, provide, or use AI systems.

Useful for supporting auditability, governance, accountability, lifecycle management, and continuous improvement.

https://www.iso.org/standard/42001
