Eliana Lam

Posted on • Originally published at aws-user-group.com

Automated Testing using MCP & AI Agents

Mariana Chow @ AWS Hong Kong Community Day 2025



Preparation and Planning

Foundation: Testing Data

  • Often overlooked but critical

  • Task management system (e.g., Jira): Use webhooks to store updates in AWS S3 (a minimal handler sketch follows this list)

  • Swagger documentation: Provides API specifications and parameters

  • Historical test cases: Allows verification and retesting of previous cases
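
As a minimal sketch of the Jira-to-S3 idea above (the bucket name, key layout, and Lambda-style wiring are assumptions, not from the talk), a webhook handler might look like this:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "testing-knowledge-base"  # hypothetical bucket name

def handler(event, context):
    """Receive a Jira webhook (e.g., issue created/updated) and archive it in S3."""
    payload = json.loads(event["body"])
    issue_key = payload["issue"]["key"]    # e.g., "PROJ-123"
    event_type = payload["webhookEvent"]   # e.g., "jira:issue_updated"
    # Key by issue and timestamp so every update is kept for later graph building
    key = f"jira/{issue_key}/{payload['timestamp']}-{event_type}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
    return {"statusCode": 200, "body": "stored"}
```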

Connecting Data Sources

  • Traditional methods have limited capability to find relationships

  • Introduction of Large Language Models (LLMs) to bridge connections

  • Example of a hidden relationship:

  • Mariana (a career at AWS in cloud computing and AI)

  • Dario (co-founder of Anthropic, an AI company)

  • Anthropic developed a series of LLMs (the Claude family) available through Amazon Bedrock
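
For illustration, a minimal call to a Claude model through Amazon Bedrock's Converse API might look like the following; the model ID, region, and prompt are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID is illustrative; use any Claude model enabled in your account
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Which API endpoints does requirement ticket PROJ-123 touch?"}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```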

Execution

AI-Driven Testing Workflow

  • Leveraging AI to automate and enhance testing processes

  • Detailed dive into how AI integrates with existing testing frameworks

Reporting

Comprehensive Reporting

  • Generating insightful reports from automated tests

  • Utilizing AI to provide actionable insights and recommendations

  • Emphasis on transforming traditional testing workflows with AI

  • Invitation for feedback and discussion on the presented ideas



Understanding Relationships with Knowledge Graphs

  • Knowledge graphs capture complex relationships that simple text searches cannot

  • Example: Relationship between Mariana (AWS career) and Anthropic (the AI company co-founded by Dario)

  • Knowledge graph reveals hidden connections not found by simple searches

Importance of Knowledge Graphs in Automated Testing

  • Crucial for automated task and test case generation

  • Extracts entities and connects related items from data sources

  • Example:

  • Requirement tickets

  • API specifications

  • Historical test cases

Creating a Knowledge Graph

  • Relational databases (e.g., PostgreSQL) require join-heavy queries and give little visibility into relationships

  • NoSQL solutions (e.g., DynamoDB) are inefficient for traversing relationships

  • Graph databases (e.g., AWS Neptune, Neo4j) are better but require learning new concepts (nodes, edges, properties)

Solution: Amazon Neptune Analytics

  • Combines graph database capabilities with foundation models via Amazon Bedrock

  • Allows insertion and retrieval of information without complex syntax

  • Provides a beautiful graph view for data visualization
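
For concreteness, here is one way to insert and query relationships in a Neptune Analytics graph from Python, using the boto3 `neptune-graph` client with openCypher; the graph identifier, labels, and relationship names are invented for this example:

```python
import json
import boto3

graph = boto3.client("neptune-graph", region_name="us-east-1")
GRAPH_ID = "g-xxxxxxxxxx"  # placeholder Neptune Analytics graph identifier

# Link a requirement ticket to the API it exercises
graph.execute_query(
    graphIdentifier=GRAPH_ID,
    language="OPEN_CYPHER",
    queryString=(
        "MERGE (t:Ticket {key: 'PROJ-123'}) "
        "MERGE (a:Api {path: '/subscriptions'}) "
        "MERGE (t)-[:CALLS]->(a)"
    ),
)

# Later: find every historical test case connected to that ticket's APIs
result = graph.execute_query(
    graphIdentifier=GRAPH_ID,
    language="OPEN_CYPHER",
    queryString=(
        "MATCH (t:Ticket {key: 'PROJ-123'})-[:CALLS]->(a:Api)<-[:COVERS]-(c:TestCase) "
        "RETURN c"
    ),
)
print(json.load(result["payload"]))
```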

Graph Retrieval-Augmented Generation (GraphRAG)

  • Retrieval-Augmented Generation (RAG): the AI retrieves reference documents and uses them to ground its responses

  • GraphRAG performs the retrieval over a knowledge graph instead of a simple text search
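
A GraphRAG round trip can then be composed from the two previous sketches: pull related facts from the graph, and hand them to the model as grounding context. This is only an outline of the pattern, reusing the `graph`, `GRAPH_ID`, and `bedrock` objects defined above:

```python
def graph_rag_answer(question: str) -> str:
    """GraphRAG sketch: retrieve related graph facts, then let the LLM answer."""
    facts = graph.execute_query(
        graphIdentifier=GRAPH_ID,
        language="OPEN_CYPHER",
        queryString="MATCH (t:Ticket)-[r]->(n) RETURN t.key, type(r), n LIMIT 25",
    )
    context = json.load(facts["payload"])  # graph facts as plain JSON
    reply = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [
            {"text": f"Graph facts: {json.dumps(context)}\n\nQuestion: {question}"},
        ]}],
    )
    return reply["output"]["message"]["content"][0]["text"]
```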

AI-Driven Test Pipeline

  • Built on top of the knowledge graph foundation

  • Uses Behavior-Driven Development (BDD) with Gherkin format for test cases

  • Gherkin scenarios use keywords like Given, When, and Then to specify the initial state, action, and expected outcome of a feature

  • Example test case: checking homepage titles, with prerequisites and scenarios (a Gherkin sketch follows this list)

  • Knowledge graphs are essential for effective automated testing

  • Amazon Neptune Analytics simplifies graph database management and visualization

  • GraphRAG enhances AI-driven test case generation and execution
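
The homepage-title example mentioned above might be expressed in Gherkin roughly like this (the feature wording is illustrative, not the speaker's exact test case):

```gherkin
Feature: Homepage
  Background:
    Given the test environment is reachable

  Scenario: Homepage shows the correct title
    Given I am an anonymous visitor
    When I open the homepage
    Then the page title should contain "My Shop"
```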



AI-Driven Test Case Generation

Goal

  • Improve test case coverage and consistency using AI agents

AI Agent Capabilities

  • Chooses the best actions to perform and achieve testing goals

  • Analyzes business flows and reads requirements from the knowledge graph

Example: Subscription Management Feature

Business Flow Analysis

  • Knowledge graph identifies:

  • Validation of payment method in the UI

  • Calling payment APIs

Conflict Detection

  • AI agent detects requirement conflicts:

  • Old rule: User must verify email before accessing premium features

  • New rule: Trial users can access premium features for seven days without email verification

  • Updates related test cases accordingly

API Details Discovery

  • Extracts endpoints, required/optional parameters, and error responses from the graph DB

  • Identifies API dependencies through recorded data (e.g., successful subscription creation, payment failure)

Test Data Generation

  • Creates test data (illustrated after this list) covering:

  • Happy flow (successful scenarios)

  • Edge cases (boundary conditions)

  • Error conditions (failure scenarios)
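
The generated data for the subscription example might look like the sketch below; field names are hypothetical, and the card numbers are standard test numbers, not real ones:

```python
# Illustrative generated test data for the subscription feature
subscription_test_data = [
    # Happy flow: valid premium subscription
    {"plan": "premium", "card": "4242424242424242", "expect": "subscription_created"},
    # Edge cases: boundary of the 7-day trial rule
    {"plan": "trial", "trial_day": 7, "email_verified": False, "expect": "access_granted"},
    {"plan": "trial", "trial_day": 8, "email_verified": False, "expect": "verification_required"},
    # Error condition: declined payment
    {"plan": "premium", "card": "4000000000000002", "expect": "payment_declined"},
]
```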

Conclusion

  • AI agents enhance test case generation by:

  • Analyzing business flows

  • Detecting conflicts and updating test cases

  • Discovering API details and dependencies

  • Generating comprehensive test data



Refining and Executing Test Cases

Refining Test Cases with Business Rules

  • Use information from business flow analysis and identified rules to enhance scenarios

  • Convert refined scenarios into scenario-based test cases

Human-in-the-Loop Verification

  • AI can generate comprehensive test cases, but human experts are needed for validation

  • Human verification ensures edge cases and business contexts are captured

  • Jira ticket system, API documentation, and historical test cases may not cover all scenarios

Execution with Playwright

  • Playwright: fast and reliable end-to-end testing (a minimal test sketch follows this list)

  • Supports all modern rendering engines: Chromium, WebKit, and Firefox

  • Cross-platform: tests run on Windows, Linux, and macOS, locally or on CI

  • Well-known, open-source automated testing framework

  • Communicates directly with the browser for efficient testing

  • Supports multiple languages (Java, Python, C#/.NET, JavaScript, TypeScript)

  • Strong community support with 78.9K stars on GitHub

  • Native support for MCP (Model Context Protocol), enabling natural language test instructions

  • Built-in features:

  • Parallel execution for efficiency

  • Comprehensive tracing and debugging capabilities

  • Automated screenshot and video recording

  • Detailed logging for documenting the testing process and enabling later verification
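
To ground the bullets above, here is a minimal Playwright test in Python (sync API) that checks a homepage title and takes a screenshot; the URL and expected title are placeholders:

```python
from playwright.sync_api import sync_playwright

def test_homepage_title():
    with sync_playwright() as p:
        browser = p.chromium.launch()          # also works with p.webkit / p.firefox
        page = browser.new_page()
        page.goto("https://example.com")       # placeholder URL
        assert "Example Domain" in page.title()
        page.screenshot(path="homepage.png")   # built-in screenshot support
        browser.close()
```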

Conclusion

  • AI enhances test case generation, but human verification is crucial

  • Playwright offers efficient, versatile, and feature-rich test execution



Executing Test Cases with AI Agents

Task Executor Agent

  • AI agent for handling the execution phase

  • Extracts test parameters from generated tasks

  • Creates and stores Playwright scripts in the database for future reference

  • Automatically executes tests (a minimal executor sketch follows this list)
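
A toy version of the executor's render-store-run loop is sketched below; the task schema (`script_template`, `parameters`, `id`) is hypothetical:

```python
import os
import subprocess

def execute_task(task: dict) -> None:
    """Task-executor sketch: fill a stored Playwright script template with the
    task's parameters, persist the script, then run it with pytest."""
    script = task["script_template"].format(**task["parameters"])
    os.makedirs("generated", exist_ok=True)
    path = f"generated/test_{task['id']}.py"
    with open(path, "w") as f:
        f.write(script)  # keep the script for future reference and re-runs
    subprocess.run(["pytest", path], check=False)  # execute automatically
```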

Examples of AI-Driven Testing with Playwright

  • Front-end and back-end testing

  • Experiment using Playwright MCP with the Amazon Q Developer CLI (a sample MCP configuration follows this list)

  • Three test cases for a simple e-commerce website:

  • Purchasing a product with shipment information and completing the order process

  • Logging in and entering the product page without further action

  • Adding a product to the shopping cart without further action
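
For reference, wiring the Playwright MCP server into an MCP-aware client is typically a small JSON config entry like the one below; the file location and the package name (`@playwright/mcp`) should be checked against the Amazon Q Developer CLI documentation, as they are assumptions here:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```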

Post-Execution Verification

  • Test cases stored in the database

  • Utilization of features like video recording for later verification

  • Testers can review recorded videos to ensure alignment with expected behavior

Recap of Complete Test Execution Flow

  • Three AI agents work together:

  • Test case generator

  • Test case executor

  • Report generator (captures results and recordings)

  • Modularized approach provides flexibility and scalability

Leveraging Multiple MCPs

  • Use various MCPs for different testing needs:

  • MySQL MCP for generating realistic data

  • Redis MCP for storing recently used test cases

  • Simplifies testing processes

Key Message: Shift-Left Testing

  • Integrate automated testing early in the development cycle

  • Benefits:

  • Better quality

  • Improved efficiency

  • Lower costs

  • Reduced technical debt

Conclusion

  • AI-driven automated testing saves time and cost

  • Testers focus on verification, not test case generation

Top comments (1)

WalkingTree Technologies

Great write-up! MCP combined with AI agents is one of the most promising patterns we’ve seen recently for automated testing and QA orchestration. Once agents can access tools through a protocol layer like MCP, test flows become much easier to design, execute, and debug - especially when reasoning, tool calls, and logs are all visible in one place.

At WalkingTree Technologies, we’ve been exploring similar patterns for enterprise QA workflows. Have you tried using MCP for multi-agent pipelines (like a test-planner agent → execution agent → reviewer/analysis agent)? It’s showing interesting potential in our experiments.