
John Vester

Effectively Managing AI Agents for Testing


Large language models and AI agents have already transformed many fields and are changing our lives in fundamental ways. In the testing domain, AI agents have a clear path for making immediate improvements in process and quality, and ultimately for producing reliable, performant, secure, and compliant software. Check out Demystifying Agentic Test Automation: What It Means for QA Teams.

But it’s not obvious how to take advantage of these capabilities. While AI agents are not fully predictable, they can be managed reliably via robust control mechanisms. Let's see how.

What does it mean to manage AI agents in QA?

There are several important aspects to managing AI agents, both in general and specifically in the testing domain.

  • Configuration and guardrails involve setting agent autonomy levels and boundaries, defining test objectives through prompts and constraints, and specifying which areas require human approval. These steps ensure that the system operates within controlled parameters while meeting goals.

    For example, you may allow your agentic AI system to write test code and even update tracking and reporting systems, but not modify your production code (a minimal code sketch of this idea follows this list).

  • Model selection and updates focus on deciding when to upgrade models versus maintaining stability. This includes testing model changes before rolling them out to production to avoid unexpected issues. If the quality of testing and coverage deteriorates when you upgrade models, it's a clear signal that your AI agents are overfitted to a specific model version.

  • Oversight and validation encompass quality gates and verification protocols, monitoring for flaky tests and low-value coverage, and managing the budget for execution costs. These practices help maintain test reliability and cost-effectiveness throughout the development lifecycle.
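
To make the guardrail idea concrete, here is a minimal sketch of how autonomy boundaries might be expressed in code. Everything here (the class, the action names, the approval rule) is an illustrative assumption, not a real framework API:

import java.util.EnumSet;
import java.util.Set;

// Hypothetical guardrail configuration; names are invented for illustration.
public class AgentGuardrails {

    // Actions the agent might attempt.
    enum Action { WRITE_TEST_CODE, UPDATE_REPORTS, MODIFY_PRODUCTION_CODE, DELETE_TESTS }

    // Allowed without asking a human.
    private final Set<Action> allowed = EnumSet.of(Action.WRITE_TEST_CODE, Action.UPDATE_REPORTS);

    // Allowed only with explicit human sign-off.
    private final Set<Action> requiresApproval = EnumSet.of(Action.DELETE_TESTS);

    // Gate every agent action through a single checkpoint.
    public boolean isPermitted(Action action, boolean humanApproved) {
        if (allowed.contains(action)) return true;
        if (requiresApproval.contains(action)) return humanApproved;
        return false; // everything else (e.g., MODIFY_PRODUCTION_CODE) is denied outright
    }
}

The benefit of funneling every action through one checkpoint is that the boundary lives in a single reviewable place rather than being scattered across prompts.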

How to manage AI agents in practice 

With those aspects in mind, let's see how to actually control your agents.

Consider a scenario where your team is migrating to agentic AI testing. You want to configure your agent's autonomy carefully. Here is a concrete example of how to go about it.

  • First, configure the agent system prompt. For example, you might specify: "You may generate API and UI test code for the /checkout and /payment flows. You may create test reports and update tracking dashboards. You must NOT modify production code or database schemas. All destructive operations (deleting tests, changing CI/CD pipelines) require human approval."

  • Next is tool integration. Connect your agent to your test management platform via APIs so it can read existing test cases and understand coverage gaps. This may be done through an MCP server like the Tosca MCP server or via custom tools you build.

  • Next, it’s time for a progressive rollout:

    Week 1 - Agent generates tests in "suggestion mode" only.

    Weeks 2-3 - Agent executes tests in an isolated staging environment.

    Week 4+ - Agent runs tests in pre-production with human review of failures.

  • Finally, set up quality gates in your CI/CD pipeline that require a minimum pass rate (for example, 80%) before agent-generated tests can block deployments. Monitor the false positive rate weekly and tune prompts accordingly.
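
As a rough sketch of that last quality-gate step, the gate itself can be a few lines run as a build step. The class name, threshold, and argument convention are assumptions for illustration; most CI systems will fail the job on a non-zero exit code:

// Hypothetical CI/CD quality gate: fail the build when agent-generated tests
// pass below the configured threshold. Invoked as: java QualityGate <passed> <total>
public class QualityGate {

    private static final double MIN_PASS_RATE = 0.80; // 80%, per the example above

    public static void main(String[] args) {
        int passed = Integer.parseInt(args[0]);
        int total = Integer.parseInt(args[1]);

        double passRate = total == 0 ? 0.0 : (double) passed / total;
        System.out.printf("Agent-generated test pass rate: %.1f%%%n", passRate * 100);

        if (passRate < MIN_PASS_RATE) {
            System.err.println("Quality gate failed: blocking deployment.");
            System.exit(1); // non-zero exit fails the pipeline step
        }
    }
}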

How to control AI agents: prompts, tools, and feedback loops

Learning to control your agents is critical. AI agents, contrary to common belief, don't actually have intelligence or agency. An agent is best understood as a system prompt combined with state/memory and a selection of tools. All the intelligence lies within the large language model (LLM), which receives the agent definition (system prompt plus tools) and the user prompt as context, decides which tools to invoke, and iterates until it generates a final answer.

There are three ways to help control your AI agents: prompt engineering, tool integration, and performance monitoring/feedback loops.

  • Prompt engineering serves as the primary interface with these agents. This process involves writing effective test objectives and acceptance criteria, building a library of proven prompts, and iterating based on the outputs generated by the agent. Well-crafted prompts guide the agent to deliver precise and useful results (a minimal sketch of a prompt library follows this list).

  • Tool integration can be achieved through an MCP server, connecting agents to source control, design documents, and CI/CD pipelines. This integration provides essential context that makes the agent far more effective. Additionally, emerging platforms like Applitools (for visual AI testing), Katalon Studio (for codeless test automation), and Selenium combined with AI extensions provide varying levels of autonomy and control for different organizational needs.

  • Performance monitoring and feedback loops track key metrics such as coverage, bug detection, false positives, and maintenance time. Real-time monitoring and alerting systems help determine when to reconfigure or retrain the agent for continued optimal performance.
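
As a concrete illustration of the prompt library mentioned above, here is a minimal sketch. The keys, template texts, and placeholder scheme are assumptions invented for this example; the point is that proven prompts become versioned, reusable artifacts rather than one-off chat messages:

import java.util.Map;

// Hypothetical prompt library: reusable templates keyed by test objective.
public class PromptLibrary {

    private static final Map<String, String> TEMPLATES = Map.of(
        "api-regression",
        "Generate API regression tests for %s. Cover the happy path, invalid input, "
            + "and auth failures. Acceptance criteria: every test asserts the status code "
            + "and response schema. Do NOT modify production code.",
        "ui-smoke",
        "Generate UI smoke tests for the %s flow. Acceptance criteria: each test "
            + "is independent and finishes in under 30 seconds.");

    public static String build(String key, String target) {
        return String.format(TEMPLATES.get(key), target);
    }

    public static void main(String[] args) {
        System.out.println(build("api-regression", "/checkout"));
    }
}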

Let's next consider the trajectory from traditional testing to agentic AI testing.

How to migrate from traditional testing to agentic AI testing

Transitioning from traditional testing to agentic AI testing will look different for each organization, as each starts with its own unique combination of test automation, skills, and culture.

Organizations with existing automation typically run agentic AI tests alongside their legacy scripts, adopting gradual migration strategies and increasing the system’s autonomy as confidence grows. 

This approach allows for a smoother shift without disrupting existing processes. The primary value here is flexibility and the ability to adapt quickly to changes.

Organizations that rely primarily on manual testing often start the process with low-risk regression suites to minimize risk. They focus on codifying tribal knowledge, transforming informal, human-centered test expertise into formalized, repeatable tests. 

Additionally, for these organizations there is a cultural shift within QA teams, moving from maintaining scripts towards overseeing autonomous agents. But in some ways it is easier to effect this change, because the potential value is so much greater than for organizations that already have automation in place.

Practical integration typically involves hybrid testing methods that combine manual, AI-assisted, and fully agentic approaches. Platforms like Tricentis Tosca and qTest enable unified management across these methods.

Example: evolving a traditional test to an agentic approach

Let's compare how a typical test scenario might evolve from traditional to agentic approaches.

In a traditional Selenium test (using manual scripting), a QA engineer might write explicit WebDriver code like this:

driver.findElement(By.id("username")).sendKeys("test@example.com");

But this test breaks when the UI changes (e.g., an ID is replaced by a class-based selector). Manual updates are required for every locator change. And with dynamic UI changes (very common when practicing A/B testing), the maintenance overhead and the chance for errors are significant.
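
One common stopgap before adopting AI tooling is a hand-rolled fallback chain, which hints at what self-healing locators automate. This is a minimal sketch using standard Selenium calls, not production-grade code:

import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

// Try locators in order until one matches. Self-healing tools effectively
// learn and maintain this candidate list instead of you hard-coding it.
public class FallbackLocator {

    public static WebElement find(WebDriver driver, By... candidates) {
        for (By by : candidates) {
            try {
                return driver.findElement(by);
            } catch (NoSuchElementException ignored) {
                // fall through to the next candidate locator
            }
        }
        throw new NoSuchElementException("No candidate locator matched");
    }
}

Usage would look like find(driver, By.id("username"), By.cssSelector("input[name='username']")).sendKeys("test@example.com"); the difference with AI-assisted tools is that the candidate list is discovered and updated for you.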

However, with AI-assisted testing (with tools like Tricentis Tosca or mabl):

  • The QA engineer records user actions via a visual test builder.

  • Self-healing locators adapt to minor UI changes automatically.

  • This approach still requires human intervention for test design and assertion logic.

And with fully agentic AI testing:

  • The QA engineer provides high-level intent: "Test the login flow with valid and invalid credentials, edge cases, and security scenarios." (A miniature sketch of this loop follows the list.)

  • The agent autonomously discovers UI elements, generates test cases, and creates assertions.

  • It self-adapts to UI changes and refactors test logic without human input.

  • It learns from failures and adjusts test strategies in real-time.
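
As promised above, here is a miniature, self-contained sketch of that loop. The TestAgent interface and its stub implementation are stand-ins invented for illustration; a real agent would call an LLM and a test runner behind these methods:

import java.util.List;

// The shape of the agentic control flow: intent in, generated-and-executed
// tests out, failures routed to a human for review.
public class AgenticTestRun {

    interface TestAgent {
        List<String> generateTests(String intent); // agent proposes test cases
        boolean execute(String testCase);          // agent runs one test
    }

    static void run(TestAgent agent, String intent) {
        for (String testCase : agent.generateTests(intent)) {
            if (!agent.execute(testCase)) {
                // Pre-production phase: failures go to a human, not an auto-retry.
                System.out.println("Needs human review: " + testCase);
            }
        }
    }

    public static void main(String[] args) {
        // A stub agent so the sketch compiles and runs end to end.
        TestAgent stub = new TestAgent() {
            public List<String> generateTests(String intent) {
                return List.of("valid credentials", "invalid password", "SQL injection in username");
            }
            public boolean execute(String testCase) {
                return !testCase.startsWith("SQL"); // pretend the security case fails
            }
        };
        run(stub, "Test the login flow with valid and invalid credentials, edge cases, and security scenarios.");
    }
}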

The key insight is that traditional testing requires constant developer/QA time. Agentic AI shifts effort from execution and maintenance to strategic oversight and prompt engineering. 

This is all great. But it’s also good to remember that agentic AI testing is a new paradigm, and migration to the bleeding edge always comes with challenges. So next, let's review some of the common challenges and solutions.

Common challenges in agentic AI testing 

A common challenge with implementing agentic testing is calibrating trust in the AI, which is best done through incremental rollout. 

Teams can introduce AI agents into the process gradually, validate their behavior on constrained scopes, and expand their responsibilities as confidence grows based on real outcomes and monitored performance.

Another persistent challenge is dealing with flaky tests, which can undermine trust and inflate maintenance costs.

Effective strategies include isolating unstable scenarios, tagging and quarantining flaky tests, and using AI agents to surface patterns in failures so teams can harden those areas. 
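
A simple quarantine rule illustrates the idea. The history format and the "mixed outcomes" heuristic are assumptions for this sketch:

import java.util.List;
import java.util.Map;

// A test that both passes and fails on identical code is flaky, not broken;
// pull it out of the gating suite instead of letting it erode trust.
public class FlakyTestTriage {

    static boolean isFlaky(List<Boolean> recentResults) {
        long failures = recentResults.stream().filter(r -> !r).count();
        return failures > 0 && failures < recentResults.size();
    }

    public static void main(String[] args) {
        Map<String, List<Boolean>> history = Map.of(
            "checkout_happy_path", List.of(true, true, true, true),
            "payment_timeout_retry", List.of(true, false, true, false));

        history.forEach((test, results) -> {
            if (isFlaky(results)) {
                System.out.println("Quarantine: " + test);
            }
        });
    }
}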

At the same time, cost control and coverage deduplication require continuously pruning redundant scenarios and ensuring that additional test coverage actually adds incremental value rather than just increasing execution spend.
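
Deduplication can start with something as blunt as comparing coverage signatures. The signature format below is a stand-in for whatever coverage data your tooling actually records:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Two tests with the same coverage signature add spend, not value: keep one.
public class CoverageDedup {

    public static void main(String[] args) {
        Map<String, Set<String>> coverage = new LinkedHashMap<>();
        coverage.put("login_valid", Set.of("/login", "/session"));
        coverage.put("login_valid_copy", Set.of("/login", "/session")); // redundant
        coverage.put("login_invalid", Set.of("/login", "/error"));

        Map<Set<String>, String> seen = new LinkedHashMap<>();
        coverage.forEach((test, signature) -> {
            String existing = seen.putIfAbsent(signature, test);
            if (existing != null) {
                System.out.println("Redundant with " + existing + ": " + test);
            }
        });
    }
}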

Finally, maintaining clear accountability is critical as agents take on more work. Even when agents generate, execute, and update tests, QA must own all agent outputs and remain the ultimate decision-maker on what ships. This means establishing review workflows, audit trails, and sign-off checkpoints so that autonomous activity is always paired with human oversight and responsibility.

But even here, AI agents can help, as they are perfectly capable of performing review tasks. Moreover, it is possible to try multi-agent review: run three review agents, let them debate their findings and come to an agreement, then have a human receive the final distilled report and perform their own sanity checks.
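
A miniature version of that multi-agent review might look like the following sketch, where the three reviewers are stubbed functions standing in for separate LLM calls with their own review prompts:

import java.util.List;
import java.util.function.Function;

// Three independent reviewers, a simple agreement rule, and a distilled
// report for the human sign-off described above.
public class MultiAgentReview {

    public static void main(String[] args) {
        String artifact = "generated checkout regression suite";

        List<Function<String, String>> reviewers = List.of(
            a -> "OK",
            a -> "OK",
            a -> "Missing negative-amount edge case");

        List<String> findings = reviewers.stream().map(r -> r.apply(artifact)).toList();

        // The simplest possible "agreement": anything that is not a clean pass
        // is surfaced for the human to sanity-check.
        List<String> forHuman = findings.stream().filter(f -> !f.equals("OK")).toList();

        System.out.println(forHuman.isEmpty()
            ? "All reviewers passed: " + artifact
            : "Human review needed: " + forHuman);
    }
}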

The end game is integrating agentic AI testing directly into the software development lifecycle, where agents don't just test code written by human developers. Instead, they operate as part of a multi-agent AI development team where:

  • AI engineers write code,

  • AI testers develop and execute tests for that code,

  • and together they iterate until the task is complete.

Conclusion

The key takeaways? To move to agentic AI:

  • Start with low autonomy and constrained scope.

  • Use MCP/tools to give agents context without giving them production write access.

  • Track failure rate, coverage, and costs.

  • Expand autonomy only after reliability is proven.

Managing agentic AI for testing shifts the focus from traditional script maintenance to strategic oversight. It requires QA teams to embrace a new mindset where they guide, monitor, and refine autonomous agents rather than manually updating scripts. 

To get started with agentic AI testing, organizations should begin small, validating AI agents on low-risk scenarios and gradually expanding their scope as confidence builds. 

By following best practices and leveraging robust tools, QA teams can transform their testing processes, accelerate releases, and improve software quality in an increasingly complex digital landscape.

Have a really great day!
