Nicolai Bohn

Launch week day 1: drive the full AI testing workflow from inside any AI tool

Building an AI agent happens in one tool; testing it happens in another. Every iteration ends with you switching to your test platform's UI, running tests, inspecting results, then switching back to your editor to fix what broke. Each context switch is small, but they add up.

Today we shipped the fix: the Rhesis Agent Skill, day 1 of Rhesis Launch Week.

If you build LLM agents and use Claude Code, Cursor, Codex, Gemini CLI, or any of 40+ other AI tools, you can now drive the full Rhesis testing workflow from inside the chat where you write the code.

What it does

The Agent Skill packages our domain knowledge into a portable skill file that any compatible AI tool can load. Once installed, your AI assistant gains:

  • Endpoint discovery in Quick or Comprehensive mode
  • Test suite design with behaviors, test sets, and metrics
  • Confirmation guards that wait for approval before anything is created
  • Test execution against your endpoints
  • Failure analysis with pass/fail summaries and links back to runs

All of this is powered by the Rhesis MCP server (27 tools covering test sets, behaviors, metrics, runs, and OData queries) and driven entirely in natural language.
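To make the failure-analysis bullet concrete, here is a rough sketch of what a pass/fail summary with severity-ranked failures could compute. It is an illustration only, not how Rhesis does it: the `CaseResult` shape, the `summarize` helper, and the severity scale are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity scale; Rhesis may rank failures differently.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class CaseResult:
    """One test outcome (hypothetical shape, not the Rhesis data model)."""
    name: str
    passed: bool
    severity: str = "low"  # only meaningful when the test failed


def summarize(results: List[CaseResult]) -> dict:
    """Return pass/fail counts plus failure names, most severe first."""
    failures = sorted(
        (r for r in results if not r.passed),
        # Unknown severities sort after every known one.
        key=lambda r: SEVERITY_ORDER.get(r.severity, len(SEVERITY_ORDER)),
    )
    return {
        "passed": sum(r.passed for r in results),
        "failed": len(failures),
        "failures": [r.name for r in failures],
    }
```

In practice the skill surfaces this kind of summary in the chat, so you would not write it yourself; the sketch only shows the shape of the output to expect.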

Install

Single command across all your AI tools:

npx skills add rhesis-ai/rhesis -g

The CLI detects which AI tools you have installed and configures the skill for each one. Then set your API token:

export RHESIS_API_KEY=rhs_your_token_here

Get a token at app.rhesis.ai/tokens.

Claude Code

Claude Code uses a plugin system that bundles the skill and MCP server config together:

/plugin marketplace add rhesis-ai/rhesis
/plugin install rhesis@rhesis-ai

Cursor

Add the MCP server to your .cursor/mcp.json:

{
  "mcpServers": {
    "rhesis": {
      "url": "https://api.rhesis.ai/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_RHESIS_API_KEY"
      }
    }
  }
}

For self-hosted backends, swap https://api.rhesis.ai/mcp for http://localhost:8080/mcp.
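If you'd rather not paste the token by hand, a small script can generate the same config from the environment. This is a minimal sketch: the `build_cursor_config` helper is hypothetical and not part of Rhesis or Cursor, while the two URLs and the `mcpServers` shape come from the snippet above.

```python
import json
import os

HOSTED_URL = "https://api.rhesis.ai/mcp"
SELF_HOSTED_URL = "http://localhost:8080/mcp"


def build_cursor_config(api_key: str, self_hosted: bool = False) -> dict:
    """Build the mcp.json structure for a hosted or self-hosted backend."""
    if not api_key:
        raise ValueError("RHESIS_API_KEY is empty; export it first")
    return {
        "mcpServers": {
            "rhesis": {
                "url": SELF_HOSTED_URL if self_hosted else HOSTED_URL,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # Uses the exported token, or the article's placeholder if none is set.
    key = os.environ.get("RHESIS_API_KEY", "rhs_your_token_here")
    print(json.dumps(build_cursor_config(key), indent=2))
```

Redirect the output into .cursor/mcp.json, and pass `self_hosted=True` to point at the local backend instead.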

Use it

Type something like:

"Test my support agent on billing scenarios, run it, and rank the failures by severity."

The skill walks the conversation through a 6-step loop:

  1. Discover: explores what your endpoint can do
  2. Plan: proposes a test suite with behaviors and metrics
  3. Review: waits for your approval before creating anything
  4. Create: builds entities on the platform following the approved plan
  5. Execute: runs tests once you confirm
  6. Analyze: surfaces a pass/fail summary, failure patterns, and links back to results
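One way to picture the loop above is as a small state machine with two approval gates. The sketch below is illustrative only and is not the skill's actual implementation; `Step`, `GATED`, `run_loop`, and the `approve` callback are hypothetical names.

```python
from enum import Enum
from typing import Callable, List


class Step(Enum):
    DISCOVER = "discover"
    PLAN = "plan"
    REVIEW = "review"
    CREATE = "create"
    EXECUTE = "execute"
    ANALYZE = "analyze"


# Steps that need explicit sign-off (assumption drawn from the list above:
# Review waits for plan approval, Execute runs only once you confirm).
GATED = {Step.REVIEW, Step.EXECUTE}


def run_loop(approve: Callable[[Step], bool]) -> List[Step]:
    """Walk the six steps in order, stopping at the first declined gate."""
    completed: List[Step] = []
    for step in Step:  # Enum members iterate in definition order
        if step in GATED and not approve(step):
            break
        completed.append(step)
    return completed
```

Declining the plan at Review stops the loop before anything is created, and declining at Execute leaves the created entities in place without running them.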

For ad-hoc operations:

"List my existing test sets."
"Improve the Safety Compliance metric. Make the threshold stricter."
"Compare my last two test runs."

The host agent's native confirmation handles the safety guard, so destructive actions never happen without your sign-off.
