TL;DR for Busy Testers
In 30 seconds: We're drowning in a 5:1 developer-to-tester ratio with massive specs and fake sprints. AI-assisted test generation isn't fixing our broken process, but it's buying us time to breathe, test what matters, and build evidence for real change.
Who This Is For
- Test leads drowning in work with insufficient resources
- QA managers trying to justify better processes to leadership
- Teams stuck in "Wagile" looking for practical survival tactics
- Anyone who's been told to "just automate everything"
Key Terms
- MCP (Model Context Protocol): an open protocol from Anthropic for connecting LLMs to your tools
- Wagile: Waterfall pretending to be Agile
- Horizontal Slicing: Building all of one layer before moving to the next
- TAF: Test Automation Framework (your team's specific patterns)
Part 1: The Problem
The Reality Check: When Agile Isn't Really Agile
You're calling it "Agile," but it seems more like waterfall dressed as a sprint. You're familiar with the symptoms:
- Massive specifications that could double as doorstops
- Sprint planning ignores testing capacity
- Developers pack the sprint based on their bandwidth
- Features arrive in testing two sprints late
- "Just automate it all!" echoes from above
Does this sound familiar?
The Tick-Tock Death March
Here's how it plays out in this horizontally sliced world:
Sprint 1: Developers build the package logic. Everything looks great in isolation.
Sprint 2: The team builds mutual exclusion rules. Still seems fine — packages conflict as expected.
Sprint 4: The basket logic finally arrives. Out of the blue, nothing works properly.
The Problem: You can't test the configurator end-to-end because it's not a vertical slice. The packages were stubbed when the basket didn't exist. Now the basket exists, but the stubs are wrong. You're finding integration problems four sprints late. At the same time, you're testing the new features for Sprint 5.
Meanwhile, the testing tick-tock continues:
Tick: Developers grab work from the massive spec, packing the sprint. (After all, developers have to code; it can't be about flow, it has to be about being busy.) Code flies, and pull requests pile up.
Tock: Testers rush to handle half the tasks from Sprint 1 while also working on Sprint 2. You might be examining the specs in detail for the first time; after all, it's hard to remember 20,000 words from a 60-minute walk-through. You're checking a Figma file: is it the right one? Is it even up to date?
Tick: Developer work increases. The testing from the last sprint isn't done, and new features keep arriving.
Tock: The deadline looms, and you're seeing features for the first time, two sprints late. Is it correct? Is it good? Who knows — there's no time to find out.
Case Study: The Car Configurator Complexity
Imagine testing a system where:
- You can choose individual options (leather seats, sunroof, premium audio).
- OR you can choose packages (Sport Pack, Luxury Pack, Winter Pack).
- BUT packages have mutual exclusions (you can't have both the Sport Pack and the Eco Pack).
- AND some individual options conflict with packages.
- PLUS pricing changes based on combinations.
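To make the pain concrete, here is a sketch of the kind of scenario outline a tester ends up writing and maintaining by hand for rules like these. Only the Sport/Eco exclusion is stated above; the other rows in the examples table are illustrative assumptions, and the real matrix is far bigger.

  Scenario Outline: Combining a package with another selection
    Given I have selected the "<first selection>"
    When I attempt to add the "<second selection>"
    Then I should see the "<outcome>"

    Examples:
      | first selection | second selection | outcome          |
      | Sport Pack      | Eco Pack         | conflict warning |
      | Sport Pack      | leather seats    | updated total    |
      | Luxury Pack     | premium audio    | updated total    |
      | Winter Pack     | sunroof          | updated total    |

Every new option or exclusion multiplies that table, and every pricing rule multiplies it again.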
Part 2: Why Traditional Solutions Fail
"Just Automate Everything!"
As testers, we hear management say "automate everything!" But here's what they don't see...
- Reading specifications for the first time as testing starts.
- Checking the implementation to ensure it matches the Figma designs.
- Writing test cases while executing them.
- Dealing with the backlog of "completed" work that we have never tested.
It's like being told to build a ladder while you're falling off a cliff.
The Hidden Multipliers: Why It's Even Worse Than You Think
Beyond the automation dream, we face chaos that worsens the testing crisis:
The Revolving Door Problem
- Team members change all the time, but no one tells you who has left or joined.
- New faces pop up in stand-ups without any introductions.
- Knowledge walks away, taking important context with it.
- You’re left figuring out who does what by trial and error.
"Agile" Theatre
- Detailed specs show up fully formed (hello, waterfall!).
- Stand-ups turn into 15-minute individual status reports.
- The term "collaboration" appears in slides but not in practice.
- You haven’t seen a retrospective in months (do they even happen?).
The Onboarding Black Hole
- "Here's your laptop. Good luck!"
- No team directory, no architecture overview, no handover of domain knowledge; just videos about something, though you're never sure what, because nobody says.
- You learn by osmosis and asking the same questions many times.
- Six weeks in, you’re still uncovering critical systems you should test.
- There's the automation system with no README, a Word doc that doesn't work, and you have to work out which packages it needs through trial and error.
Process? What process?
- UI discrepancies pop up during testing (surprise!).
- Requirements live in Confluence, Jira, Slack, and someone's head.
- We design interactions in Figma, but which one, and is it up to date?
- Every day you ask "Should it have this?"
- Feedback loops are so long they feel like feedback spirals.
- "We will improve the process after this release" (spoiler: they won't).
Then management wonders why adding more testers doesn't help. A week later, the new hires are still catching up: reading specs, watching meeting recordings, and trying to grasp the odd "legacy service that sometimes fails."
We can’t automate our way out – we can’t even communicate our way in.
Part 3: Enter AI-Assisted Testing
Not a Silver Bullet, but a Pressure Release Valve
AI-assisted test generation isn't about achieving testing nirvana; it's about survival.
"Can't I just use Copilot or a public Large Language Model (LLM) for this?"
You absolutely can. Tools like Copilot or a direct ChatGPT session are easy to access: they can read open pages or take pasted specifications, which makes them quick for generating simple test cases or brainstorming ideas. But they come with significant drawbacks in a production testing environment:
High Hallucination Risk: General-purpose LLMs don't have specific context about your Jira tickets, Confluence specs, or your team's test patterns. They can generate test cases that sound good but are often wrong, irrelevant, or made up.
No Integration: They don't work directly with your project management or documentation tools. This leads to a lot of manual copy-pasting of context.
No Learning from Your Patterns: They won't adapt to your team's test automation framework (TAF) step patterns. This leads to inconsistent tests that are harder to maintain.
This is exactly where MCP shows its strength: it aims to close that gap by offering a more reliable, context-aware approach.
From Spec to Test in Minutes, Not Days
When you reach the configurator section of the spec, you no longer waste hours writing test cases by hand. Instead, you extract the rules:
- "Sport Pack includes premium audio."
- "Winter Pack excludes Summer Pack."
- "Electric engine can't have a Sport Pack."
From Figma, you get the process flow and state changes:
- "If you pick wood trim and change to Summer Pack, then the system shows an 'are you sure' prompt."
AI creates the test permutations. That lets you focus on the key questions: does this match what's in Figma, and does it make sense for users?
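As a rough illustration, the wood trim rule above might come back from the AI looking something like this; the exact wording depends on your TAF step patterns, and the final step is an assumption the spec would need to confirm.

  Scenario: Switching to the Summer Pack challenges an existing wood trim selection
    Given I have selected the "wood trim" option
    When I change my package selection to the "Summer Pack"
    Then I should see an "Are you sure?" confirmation prompt
    # Assumption: nothing changes until the user confirms the prompt
    And my "wood trim" selection should remain until I confirm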
- Catch-Up Is Possible: If you're seeing features two sprints late, AI-generated tests help you catch up and leave clear evidence of what you tested.
- Documentation Comes from Chaos: The AI-generated tests create the documentation that was missing. New team members (or you, three sprints later) can understand what the feature does.
- Some Automation Is Better Than None: You can't automate all tasks, but AI can ease the heavy load of configuration testing.
Part 4: Implementation Reality
Building Team Capability, Not Replacing It
Janet Gregory and Lisa Crispin emphasise that quality is woven into the fabric of what we do. AI doesn't weave that fabric — teams do. This tool handles the mundane threading so we can focus on the patterns.
Yes, our process is broken. But at least we have those large specs - we know what we're supposed to be doing. Our stories mostly have acceptance criteria (though when there are 300 changes and 2 ACs, something's wrong).
The point is: AI gives us breathing room to work on the real problem - the cultural shift toward continuous quality.
How MCP Works (In Plain English)
Think of MCP as a translator between your project tools and AI:
- It reads your Jira tickets and Confluence specs
- It understands your team's test patterns from examples
- It generates tests that match your style, not generic templates
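"Matches your style" matters more than it sounds. As a hedged illustration, suppose your TAF already implements a basket total step like the one used in the feature file below; a context-aware tool reuses that step instead of inventing a near-duplicate that someone then has to automate and maintain:

  # Generic phrasing a public LLM might invent (a brand new step to implement):
  #   Then the price shown in the cart equals 2600 pounds
  # Phrasing that reuses the existing TAF step (assumed pattern, same as the
  # feature file below):
  Then the basket total should remain "£2,600"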
Technical detail: It runs in a Docker container and connects via API tokens. (See appendix for setup details)
Here's a real example. The AI produced a feature file for the car configurator component, covering package conflicts and mutual exclusions drawn from the requirements. There's less risk of hallucination because MCP keeps it on track with your specific project context, and it may spot things you haven't.
Feature: Car Configurator Package Management

  Background:
    Given I am on the car configurator page
    And I have selected a "Hatch Back" model
    And the configurator displays available packages and options

  Scenario: Sport Package conflicts with Eco Package selection
    Given I have selected the "Sport Package" containing:
      | Option            | Price  |
      | Performance Tyres | £800   |
      | Sport Suspension  | £1,200 |
      | Sport Exhaust     | £600   |
    And the total package price is "£2,600"
    When I attempt to select the "Eco Package"
    Then I should see a conflict warning stating: "Eco Package cannot be combined with Sport Package."
    And the "Eco Package" option should be disabled
    And my current selection should remain "Sport Package"
    And the basket total should remain "£2,600"
The Survival Guide: Making It Work
1. Start Where You Are
Don't wait for the perfect process. If you're overwhelmed by complexity, let AI create test cases. You can then check the critical paths yourself.
2. Use It for Comprehension, Not Just Coverage
When you first see that spec section, use AI to quickly generate scenarios. This helps you understand the feature faster — what are all the combinations? What are the edge cases?
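For example, a generated scenario can force a question you had not thought to ask. This one is purely illustrative: the spec says the Sport Pack includes premium audio, but nothing above says what should happen when the pack is removed.

  Scenario: Removing the Sport Pack also removes the included premium audio
    Given I have selected the "Sport Pack", which includes "premium audio"
    When I remove the "Sport Pack" from my configuration
    # The spec never actually says this; the drafted assertion is what sends
    # you back to the product owner with a precise question
    Then "premium audio" should no longer appear in my selected options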
3. Focus Your Human Effort
With AI handling the combinatorial explosion, you can focus on:
- Does this match the Figma designs?
- Do the mutual exclusions make sense to users?
- What happens when rules conflict?
- Is this actually valuable to customers?
Quick Start for Desperate Teams
- First: Check if you're allowed to use AI tools (Really. Ask security. Using unapproved tools with company data is a career-limiting move.)
- Pick your most complex feature with multiple rules.
- Set up MCP with your Jira/Confluence (see setup guide).
- Generate test cases for just that feature.
- Compare time spent vs. manual creation.
- Use the time saved for exploratory testing.
- Document what you find!
Part 5: The Honest Path Forward
What This Fixes (and What It Doesn't)
Let's be real — AI won't fix your broken process. You still have:
- Too much WIP (work in progress).
- Artificial sprint boundaries destroying flow.
- A 5:1 developer-to-tester ratio that guarantees bottlenecks.
- Specifications that arrive fully formed rather than iteratively.
But AI can make an unsustainable situation slightly more bearable. It buys you time to demonstrate the value of comprehensive testing and build the case for the process changes you really need.
Build Evidence for Change
Plan to measure:
- Test case creation time (target: 70% reduction).
- Edge case coverage (how many combinations would you have missed?).
- Time freed for exploratory testing.
- Critical issues found with that freed time.
This evidence will help justify the process improvements we desperately need.
The Honest Truth
AI-assisted testing is a lifesaver, not the solution itself. It's like using a better bucket on a ship that’s taking on water. You can bail out faster, but it doesn’t solve the main issue.
AI helps make the case for proper testing. It shows what happens when your testers aren't overwhelmed. And it makes it clear why you need to change up your process.
Ideally you'd fix the process first, but that's not how it works. In reality, AI-assisted testing is about staying afloat and keeping your sanity. It frees up time to focus on quality rather than going through the motions.
For instance, AI can create a huge test suite in a few minutes – that's a massive help.
Part 6: Getting Started (Yes, Really)
What You'll Need
- Docker (or someone who can install it for you)
- Jira/Confluence access with API permissions
- About 30 minutes when no one's pinging you
- A complex feature to test (you've got plenty)
The 10-Minute Version
- Pull the MCP-Atlassian Docker image
- Create API tokens in Atlassian
- Set up your .env file with credentials
- Connect to Amazon Q
- Work with Amazon Q to add some rules to guide it
- Ask it about your most painful feature
What Success Looks Like
Within an hour, you should be generating test cases that:
- Actually match your testing patterns
- Cover edge cases you'd miss at 5 PM on a Friday
- Make sense to other team members (unlike that legacy automation)
Common Gotchas
- If you set up a corporate proxy during onboarding, you might need to build the Docker image yourself; make sure to include your certificates.
- API tokens expire. Set a calendar reminder.
- Start small. One feature. Prove the value.
Full setup guide: https://dev.to/paul_coles_633f698b10fd6e/ai-assisted-testing-a-survival-guide-to-implementing-mcp-with-atlassian-tools-2gnm