Eliana Lam

Posted on • Originally published at aws-user-group.com

Automated Testing using MCP & AI Agents

Mariana Chow @ AWS Hong Kong Community Day 2025



Preparation and Planning

Foundation: Testing Data

  • Often overlooked but critical

  • Task management system (e.g., Jira): Use webhooks to store updates in AWS S3 (a minimal handler sketch follows this list)

  • Swagger documentation: Provides API specifications and parameters

  • Historical test cases: Allows verification and retesting of previous cases
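
As a minimal sketch of the Jira-to-S3 idea above (the bucket name, key layout, and Lambda-style wiring are assumptions, not from the talk), a webhook handler might look like this:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "testing-knowledge-base"  # hypothetical bucket name

def handler(event, context):
    """Receive a Jira webhook (e.g., issue created/updated) and archive it in S3."""
    payload = json.loads(event["body"])
    issue_key = payload["issue"]["key"]    # e.g., "PROJ-123"
    event_type = payload["webhookEvent"]   # e.g., "jira:issue_updated"
    # Key by issue and timestamp so every update is kept for later graph building
    key = f"jira/{issue_key}/{payload['timestamp']}-{event_type}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
    return {"statusCode": 200, "body": "stored"}
```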

Connecting Data Sources

  • Traditional methods have limited capability to find relationships

  • Introduction of Large Language Models (LLMs) to bridge connections

  • Example of a hidden relationship:

  • Mariana (a career at AWS in cloud computing and AI)

  • Dario (co-founder of Anthropic, an AI company)

  • Anthropic developed a series of LLMs (the Claude family) available through Amazon Bedrock
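
For illustration, a minimal call to a Claude model through Amazon Bedrock's Converse API might look like the following; the model ID, region, and prompt are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID is illustrative; use any Claude model enabled in your account
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Which API endpoints does requirement ticket PROJ-123 touch?"}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```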

Execution

AI-Driven Testing Workflow

  • Leveraging AI to automate and enhance testing processes

  • Detailed dive into how AI integrates with existing testing frameworks

Reporting

Comprehensive Reporting

  • Generating insightful reports from automated tests

  • Utilizing AI to provide actionable insights and recommendations

  • Emphasis on transforming traditional testing workflows with AI

  • Invitation for feedback and discussion on the presented ideas



Understanding Relationships with Knowledge Graphs

  • Knowledge graphs capture complex relationships that simple text searches cannot

  • Example: Relationship between Mariana (AWS career) and Anthropic (the AI company co-founded by Dario)

  • Knowledge graph reveals hidden connections not found by simple searches

Importance of Knowledge Graphs in Automated Testing

  • Crucial for automated task and test case generation

  • Extracts entities and connects related items from data sources

  • Example:

  • Requirement tickets

  • API specifications

  • Historical test cases

Creating a Knowledge Graph

  • Relational databases (e.g., PostgreSQL) require join-heavy queries and give little visibility into relationships

  • NoSQL solutions (e.g., DynamoDB) are inefficient for traversing relationships

  • Graph databases (e.g., AWS Neptune, Neo4j) are better but require learning new concepts (nodes, edges, properties)

Solution: Amazon Neptune Analytics

  • Combines graph database capabilities with foundation models via Amazon Bedrock

  • Allows insertion and retrieval of information without complex syntax

  • Provides a beautiful graph view for data visualization
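
For concreteness, here is one way to insert and query relationships in a Neptune Analytics graph from Python, using the boto3 `neptune-graph` client with openCypher; the graph identifier, labels, and relationship names are invented for this example:

```python
import json
import boto3

graph = boto3.client("neptune-graph", region_name="us-east-1")
GRAPH_ID = "g-xxxxxxxxxx"  # placeholder Neptune Analytics graph identifier

# Link a requirement ticket to the API it exercises
graph.execute_query(
    graphIdentifier=GRAPH_ID,
    language="OPEN_CYPHER",
    queryString=(
        "MERGE (t:Ticket {key: 'PROJ-123'}) "
        "MERGE (a:Api {path: '/subscriptions'}) "
        "MERGE (t)-[:CALLS]->(a)"
    ),
)

# Later: find every historical test case connected to that ticket's APIs
result = graph.execute_query(
    graphIdentifier=GRAPH_ID,
    language="OPEN_CYPHER",
    queryString=(
        "MATCH (t:Ticket {key: 'PROJ-123'})-[:CALLS]->(a:Api)<-[:COVERS]-(c:TestCase) "
        "RETURN c"
    ),
)
print(json.load(result["payload"]))
```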

Graph Retrieval-Augmented Generation (GraphRAG)

  • Retrieval-Augmented Generation (RAG): the AI retrieves reference documents and uses them to ground its responses

  • GraphRAG performs the retrieval over a knowledge graph instead of a simple text search
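
A GraphRAG round trip can then be composed from the two previous sketches: pull related facts from the graph, and hand them to the model as grounding context. This is only an outline of the pattern, reusing the `graph`, `GRAPH_ID`, and `bedrock` objects defined above:

```python
def graph_rag_answer(question: str) -> str:
    """GraphRAG sketch: retrieve related graph facts, then let the LLM answer."""
    facts = graph.execute_query(
        graphIdentifier=GRAPH_ID,
        language="OPEN_CYPHER",
        queryString="MATCH (t:Ticket)-[r]->(n) RETURN t.key, type(r), n LIMIT 25",
    )
    context = json.load(facts["payload"])  # graph facts as plain JSON
    reply = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [
            {"text": f"Graph facts: {json.dumps(context)}\n\nQuestion: {question}"},
        ]}],
    )
    return reply["output"]["message"]["content"][0]["text"]
```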

AI-Driven Test Pipeline

  • Built on top of the knowledge graph foundation

  • Uses Behavior-Driven Development (BDD) with Gherkin format for test cases

  • Gherkin scenarios use keywords like Given, When, and Then to specify the initial state, action, and expected outcome of a feature

  • Example test case: checking homepage titles, with prerequisites and scenarios (a Gherkin sketch follows this list)

  • Knowledge graphs are essential for effective automated testing

  • Amazon Neptune Analytics simplifies graph database management and visualization

  • GraphRAG enhances AI-driven test case generation and execution
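
The homepage-title example mentioned above might be expressed in Gherkin roughly like this (the feature wording is illustrative, not the speaker's exact test case):

```gherkin
Feature: Homepage
  Background:
    Given the test environment is reachable

  Scenario: Homepage shows the correct title
    Given I am an anonymous visitor
    When I open the homepage
    Then the page title should contain "My Shop"
```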



AI-Driven Test Case Generation

Goal

  • Improve test case coverage and consistency using AI agents

AI Agent Capabilities

  • Chooses the best actions to perform and achieve testing goals

  • Analyzes business flows and reads requirements from the knowledge graph

Example: Subscription Management Feature

Business Flow Analysis

  • Knowledge graph identifies:

  • Validation of payment method in the UI

  • Calling payment APIs

Conflict Detection

  • AI agent detects requirement conflicts:

  • Old rule: User must verify email before accessing premium features

  • New rule: Trial users can access premium features for seven days without email verification

  • Updates related test cases accordingly

API Details Discovery

  • Extracts endpoints, required/optional parameters, and error responses from the graph DB

  • Identifies API dependencies through recorded data (e.g., successful subscription creation, payment failure)

Test Data Generation

  • Creates test data (illustrated after this list) covering:

  • Happy flow (successful scenarios)

  • Edge cases (boundary conditions)

  • Error conditions (failure scenarios)
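
The generated data for the subscription example might look like the sketch below; field names are hypothetical, and the card numbers are standard test numbers, not real ones:

```python
# Illustrative generated test data for the subscription feature
subscription_test_data = [
    # Happy flow: valid premium subscription
    {"plan": "premium", "card": "4242424242424242", "expect": "subscription_created"},
    # Edge cases: boundary of the 7-day trial rule
    {"plan": "trial", "trial_day": 7, "email_verified": False, "expect": "access_granted"},
    {"plan": "trial", "trial_day": 8, "email_verified": False, "expect": "verification_required"},
    # Error condition: declined payment
    {"plan": "premium", "card": "4000000000000002", "expect": "payment_declined"},
]
```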

Conclusion

  • AI agents enhance test case generation by:

  • Analyzing business flows

  • Detecting conflicts and updating test cases

  • Discovering API details and dependencies

  • Generating comprehensive test data



Refining and Executing Test Cases

Refining Test Cases with Business Rules

  • Use information from business flow analysis and identified rules to enhance scenarios

  • Convert refined scenarios into scenario-based test cases

Human-in-the-Loop Verification

  • AI can generate comprehensive test cases, but human experts are needed for validation

  • Human verification ensures edge cases and business contexts are captured

  • Jira ticket system, API documentation, and historical test cases may not cover all scenarios

Execution with Playwright

  • Playwright: fast and reliable end-to-end testing (a minimal test sketch follows this list)

  • Supports all modern rendering engines: Chromium, WebKit, and Firefox

  • Cross-platform: tests run on Windows, Linux, and macOS, locally or on CI

  • Well-known, open-source automated testing framework

  • Communicates directly with the browser for efficient testing

  • Supports multiple languages (Java, Python, C#/.NET, JavaScript, TypeScript)

  • Strong community support with 78.9K stars on GitHub

  • Native support for MCP (Model Context Protocol), enabling natural language test instructions

  • Built-in features:

  • Parallel execution for efficiency

  • Comprehensive tracing and debugging capabilities

  • Automated screenshot and video recording

  • Detailed logging for documenting the testing process and enabling later verification
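
To ground the bullets above, here is a minimal Playwright test in Python (sync API) that checks a homepage title and takes a screenshot; the URL and expected title are placeholders:

```python
from playwright.sync_api import sync_playwright

def test_homepage_title():
    with sync_playwright() as p:
        browser = p.chromium.launch()          # also works with p.webkit / p.firefox
        page = browser.new_page()
        page.goto("https://example.com")       # placeholder URL
        assert "Example Domain" in page.title()
        page.screenshot(path="homepage.png")   # built-in screenshot support
        browser.close()
```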

Conclusion

  • AI enhances test case generation, but human verification is crucial

  • Playwright offers efficient, versatile, and feature-rich test execution



Executing Test Cases with AI Agents

Task Executor Agent

  • AI agent for handling the execution phase

  • Extracts test parameters from generated tasks

  • Creates and stores Playwright scripts in the database for future reference

  • Automatically executes tests (a minimal executor sketch follows this list)
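
A toy version of the executor's render-store-run loop is sketched below; the task schema (`script_template`, `parameters`, `id`) is hypothetical:

```python
import os
import subprocess

def execute_task(task: dict) -> None:
    """Task-executor sketch: fill a stored Playwright script template with the
    task's parameters, persist the script, then run it with pytest."""
    script = task["script_template"].format(**task["parameters"])
    os.makedirs("generated", exist_ok=True)
    path = f"generated/test_{task['id']}.py"
    with open(path, "w") as f:
        f.write(script)  # keep the script for future reference and re-runs
    subprocess.run(["pytest", path], check=False)  # execute automatically
```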

Examples of AI-Driven Testing with Playwright

  • Front-end and back-end testing

  • Experiment using Playwright MCP with the Amazon Q Developer CLI (a sample MCP configuration follows this list)

  • Three test cases for a simple e-commerce website:

  • Purchasing a product with shipment information and completing the order process

  • Logging in and entering the product page without further action

  • Adding a product to the shopping cart without further action
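
For reference, wiring the Playwright MCP server into an MCP-aware client is typically a small JSON config entry like the one below; the file location and the package name (`@playwright/mcp`) should be checked against the Amazon Q Developer CLI documentation, as they are assumptions here:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```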

Post-Execution Verification

  • Test cases stored in the database

  • Utilization of features like video recording for later verification

  • Testers can review recorded videos to ensure alignment with expected behavior

Recap of Complete Test Execution Flow

  • Three AI agents work together:

  • Test case generator

  • Test case executor

  • Report generator (captures results and recordings)

  • Modularized approach provides flexibility and scalability

Leveraging Multiple MCPs

  • Use various MCPs for different testing needs:

  • MySQL MCP for generating realistic data

  • Redis MCP for storing recently used test cases

  • Simplifies testing processes

Key Message: Shift-Left Testing

  • Integrate automated testing early in the development cycle

  • Benefits:

  • Better quality

  • Improved efficiency

  • Lower costs

  • Reduced technical debt

Conclusion

  • AI-driven automated testing saves time and cost

  • Testers focus on verification, not test case generation

Top comments (1)

WalkingTree Technologies

Great write-up! MCP combined with AI agents is one of the most promising patterns we’ve seen recently for automated testing and QA orchestration. Once agents can access tools through a protocol layer like MCP, test flows become much easier to design, execute, and debug - especially when reasoning, tool calls, and logs are all visible in one place.

At WalkingTree Technologies, we’ve been exploring similar patterns for enterprise QA workflows. Have you tried using MCP for multi-agent pipelines (like a test-planner agent → execution agent → reviewer/analysis agent)? It’s showing interesting potential in our experiments.