Ask your OpenSearch data questions in plain English using any MCP-compatible AI assistant.
This works with any test framework — Robot Framework is used here because that is where the data already lives.
Introduction
In the previous part of this series, every Robot Framework test result was streamed into OpenSearch the moment it completed — failures visible in a live dashboard without waiting for the suite to finish.
That part solved visibility. This part solves what happens after the run ends.
Every result is already in OpenSearch: test name, suite, status, failure message, tags, duration, run ID. The data is there. The question is how to use it efficiently. Building filters in Dashboards and writing DQL queries in Discover works, but it is slow and it breaks focus at exactly the moment you need to act.
Model Context Protocol (MCP) is an open standard that lets AI assistants connect directly to external data sources. OpenSearch has an official MCP server. Connect it to any MCP-compatible AI assistant — Claude, Copilot, or others — and your test data becomes queryable in plain English, from wherever you are already working.
The Problem
After a long CI run, the questions are always the same. Which tests failed? Have they failed before? Do they share a tag or a suite? What does the error mean? Is this a test bug or an application bug?
Answering those manually means switching tools: open Dashboards, build a filter, switch to Discover for the full message, run another query for the historical view. Each step is small. Across a team running multiple pipelines a day, it adds up — and context evaporates while it is happening.
MCP removes those steps. The AI assistant queries the index directly. You ask in plain English, you get an answer, you stay in the editor.
What is MCP?
Model Context Protocol is an open standard, originally developed by Anthropic and now widely adopted, for connecting AI assistants to external tools and data sources. Instead of the assistant working from what you paste into a conversation, it calls out to your systems and works from live data.
OpenSearch published an official MCP server: opensearch-mcp-server-py. Register it with any MCP-compatible AI assistant and it can query any OpenSearch index directly — no custom integration code, no middleware.
Note: The examples below use Claude Code. Any MCP-compatible assistant follows the same registration pattern.
Architecture
┌─────────────────────────────────┐
│ Robot Framework (local)         │
│ tests run as normal             │
└──────────────┬──────────────────┘
               │ end_test fires after each test
               ▼
┌─────────────────────────────────┐
│ opensearch_listener.py          │
│ indexes result immediately      │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│ OpenSearch (Docker)             │
│ port 9200                       │
│ index: robot-results            │
└──────────┬──────────────────────┘
           │
     ┌─────┴────────────┐
     │                  │
     ▼                  ▼
┌──────────┐   ┌─────────────────┐
│Dashboard │   │   MCP Server    │
│port 5601 │   │  (local proc)   │
│live view │   └────────┬────────┘
└──────────┘            │
                        ▼
               ┌─────────────────┐
               │ MCP-compatible  │
               │  AI assistant   │
               └────────┬────────┘
                        │
                        ▼
               ┌─────────────────┐
               │ plain English   │
               │ queries + fixes │
               └─────────────────┘
The dashboard and MCP paths run on the same OpenSearch instance. The dashboard stays useful for live monitoring during a run. MCP is the interface for investigation and action after the run.
Tech Stack
- Robot Framework — test framework with a listener API for hooking into execution
- OpenSearch — stores every test result as it happens (from part one)
- opensearch-mcp-server-py — official OpenSearch MCP server, published by the OpenSearch project
- Claude Code — MCP-compatible AI assistant used in this setup (any MCP-compatible assistant works)
- Python — ties it all together
Step by Step
Prerequisites
Everything from part one is in place: OpenSearch running in Docker, the listener shipping results, the robot-results index populated. Verify OpenSearch is healthy before continuing:
curl http://localhost:9200
A JSON response means it is ready. If OpenSearch is down when the MCP server registers, it will appear connected but return errors on every query.
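The same check can be scripted, for example as a CI gate before registering the MCP server. A minimal sketch (the function names here are illustrative; a healthy root response from OpenSearch includes the cluster name and a version block):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError


def looks_healthy(body: dict) -> bool:
    """A valid OpenSearch root response carries cluster_name and a version block."""
    return "cluster_name" in body and "version" in body


def check_opensearch(url: str = "http://localhost:9200", timeout: float = 3.0) -> bool:
    """Return True only if OpenSearch answers the root endpoint with valid JSON."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return looks_healthy(json.load(resp))
    except (URLError, ValueError, OSError):
        # Connection refused, timeout, or a non-JSON body: not ready.
        return False
```

Running this before registration avoids the silent-failure mode described above, where the server looks connected but every query errors out.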
1. The MCP Server Package
opensearch-mcp-server-py is already in requirements.txt and was installed in part one. Nothing new to install.
Security note: Only add MCP servers from trusted sources. This package is the official OpenSearch MCP server, maintained by the OpenSearch project.
2. Environment Variables
The MCP server reads connection details from environment variables. Create a .env file in the project root (already in .gitignore):
OPENSEARCH_URL=http://localhost:9200
OPENSEARCH_NO_AUTH=true
OPENSEARCH_NO_AUTH=true is for local development only. Never use it on a shared or production instance.
3. Register with Claude Code
claude mcp add opensearch \
  -e OPENSEARCH_URL=http://localhost:9200 \
  -e OPENSEARCH_NO_AUTH=true \
  -- uv run --project /path/to/results-execution-monitoring python -m mcp_server_opensearch
Verify:
claude mcp list
You should see:
opensearch: ... ✓ Connected
4. Permanent Config for CLI and VS Code
The registration above is session-scoped. To persist it across sessions — and have it work in both the Claude Code CLI and the VS Code extension — add it to ~/.claude/settings.json:
{
  "mcpServers": {
    "opensearch": {
      "command": "uv",
      "args": [
        "run",
        "--project",
        "/path/to/results-execution-monitoring",
        "python",
        "-m",
        "mcp_server_opensearch"
      ],
      "env": {
        "OPENSEARCH_URL": "http://localhost:9200",
        "OPENSEARCH_NO_AUTH": "true"
      }
    }
  }
}
After saving, reload VS Code (Ctrl+Shift+P → Developer: Reload Window). Run claude mcp list to confirm.
Querying Results in Practice
Once connected, the assistant queries robot-results directly.
Isolating a specific run
Every document in a run carries the same run_id, which is printed at the start of each run:
What tests failed in run 88b407bf?
The assistant returns test names, suite names, failure messages, and elapsed times as a readable summary — not raw JSON.
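Behind the scenes, a question like this maps to an ordinary OpenSearch query. A minimal sketch of the equivalent query body follows; the field names (run_id, status, name, suite, message, elapsed) are assumptions based on what the part-one listener indexes, so adjust them to your actual mapping:

```python
def failed_tests_query(run_id: str) -> dict:
    """Query body behind "What tests failed in run <id>?".

    Assumes the listener indexes run_id, status, name, suite,
    message, and elapsed fields (check your own mapping).
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"run_id": run_id}},  # isolate one run
                    {"term": {"status": "FAIL"}},  # Robot Framework failure status
                ]
            }
        },
        "_source": ["name", "suite", "message", "elapsed"],
        "size": 100,
    }
```

The assistant constructs and runs something like this for you; the point of MCP is that you never have to write it by hand.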
In CI, pass the build number as the run_id so results are traceable to a specific pipeline build:
python -m robot \
  --listener opensearch_listener.OpenSearchListener:url=http://localhost:9200:run_id=${BUILD_NUMBER} \
  tests/
Then:
What failed in build 42?
Understanding failure patterns over time
A single failure is a data point. The same test failing across five runs over a week is a problem that needs a decision — is it a flaky test, a broken feature, or an environmental issue?
Show me all failed tests from the last 7 days
This tells you immediately whether today's failures are new or recurring. A test that first appeared today is a different priority from one that has been failing silently for a week.
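The historical question translates to a date-range filter plus an aggregation that buckets failures per test. A sketch under the same assumptions about field names; the .keyword suffix and the timestamp field depend on how your index is actually mapped:

```python
def recurring_failures_query(days: int = 7) -> dict:
    """Failures over a window, bucketed per test so recurrences stand out."""
    return {
        "size": 0,  # aggregation only, no raw hits needed
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": "FAIL"}},
                    {"range": {"timestamp": {"gte": f"now-{days}d"}}},
                ]
            }
        },
        "aggs": {
            "by_test": {
                "terms": {"field": "name.keyword", "size": 20},
                "aggs": {
                    # Distinct runs a test failed in: 1 means new today,
                    # many means chronic. Assumes run_id is keyword-mapped.
                    "distinct_runs": {"cardinality": {"field": "run_id"}}
                },
            }
        },
    }
```

A test with a high distinct_runs count is the chronic one; a count of one is today's fresh breakage.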
Triage by tag
Not all failures carry the same weight. Tests tagged smoke are meant to catch the most critical issues fastest. Knowing whether failing tests are smoke tests or deep regression tests changes how urgently you respond.
Show me failures with the smoke tag from the last 3 days
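This question is the same query shape with one extra filter on the tag field. A sketch, assuming the listener indexes tags as a keyword array so a plain term match works (use tags.keyword instead if the field is dynamically mapped as text):

```python
def tagged_failures_query(tag: str = "smoke", days: int = 3) -> dict:
    """Failures carrying a given tag within a recent window."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": "FAIL"}},
                    {"term": {"tags": tag}},  # term matches any element of a keyword array
                    {"range": {"timestamp": {"gte": f"now-{days}d"}}},
                ]
            }
        },
        "_source": ["name", "suite", "tags", "message"],
    }
```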
Acting on failures — in the same conversation
Once the assistant has the failure list, the conversation continues without switching context. The failure message, test name, suite, and tags are all in the indexed document. The error is already in scope.
The Division By Zero Fails test is failing with ZeroDivisionError.
What should the Robot Framework keyword look like to handle that safely?
Which of these failures look like test bugs vs application bugs?
Generate a test case that correctly validates that dividing by zero raises an exception.
The path from "what failed" to "here is the fix" happens in one conversation, without copy-pasting anything.
Result
Before MCP: run finishes → open Dashboards → filter by run → open Discover for error messages → cross-reference previous runs manually → copy error into a chat → figure out the fix → switch back to editor.
With MCP: run finishes → ask what failed → ask if it has happened before → ask what the fix looks like → fix it.
The time between "run finished" and "I know what to do" is shorter. Not because the failures changed, but because the path from data to action is direct.
Conclusion
The listener from part one moved test result availability from the end of the run to the moment each test completes. This part moves analysis from dashboards and query consoles to plain English, in the tool you are already using.
The approach is not specific to Robot Framework or Claude. Any test framework with a hook system can stream results to OpenSearch using the same listener pattern. Any MCP-compatible AI assistant can be registered against the same MCP server. The infrastructure stays the same regardless of what is being tested or which assistant is being used.
Store results as they happen. Query them in plain English. Act on what you find.
Resources
- GitHub — github.com/007bsd/results-execution-monitoring
- Model Context Protocol — modelcontextprotocol.io
- Introducing MCP in OpenSearch — opensearch.org/blog/introducing-mcp-in-opensearch
- opensearch-mcp-server-py — github.com/opensearch-project/opensearch-mcp-server-py
- Part one — Real-Time Test Results in Robot Framework
The complete code is in the GitHub repo. If anything in the MCP setup behaves differently, leave a comment.