Ask your OpenSearch data questions in plain English using any MCP-compatible AI assistant.
This works with any test framework — Robot Framework is used here because that is where the data already lives.
Introduction
In the previous part of this series, every Robot Framework test result was streamed into OpenSearch the moment it completed — failures visible in a live dashboard without waiting for the suite to finish.
That part solved visibility. This part solves what happens after the run ends.
Every result is already in OpenSearch: test name, suite, status, failure message, tags, duration, run ID. The data is there. The question is how to use it efficiently. Building filters in Dashboards and writing DQL queries in Discover works, but it is slow and it breaks focus at exactly the moment you need to act.
Model Context Protocol (MCP) is an open standard that lets AI assistants connect directly to external data sources. OpenSearch has an official MCP server. Connect it to any MCP-compatible AI assistant — Claude, Copilot, or others — and your test data becomes queryable in plain English, from wherever you are already working.
The Problem
After a long CI run, the questions are always the same. Which tests failed? Have they failed before? Do they share a tag or a suite? What does the error mean? Is this a test bug or an application bug?
Answering those manually means switching tools: open Dashboards, build a filter, switch to Discover for the full message, run another query for the historical view. Each step is small. Across a team running multiple pipelines a day, it adds up — and context evaporates while it is happening.
MCP removes those steps. The AI assistant queries the index directly. You ask in plain English, you get an answer, you stay in the editor.
What is MCP?
Model Context Protocol is an open standard, originally developed by Anthropic and now widely adopted, for connecting AI assistants to external tools and data sources. Instead of the assistant working from what you paste into a conversation, it calls out to your systems and works from live data.
OpenSearch published an official MCP server: opensearch-mcp-server-py. Register it with any MCP-compatible AI assistant and it can query any OpenSearch index directly — no custom integration code, no middleware.
Note: The examples below use Claude Code. Any MCP-compatible assistant follows the same registration pattern.
Architecture
┌─────────────────────────────────┐
│ Robot Framework (local)         │
│ tests run as normal             │
└──────────────┬──────────────────┘
               │ end_test fires after each test
               ▼
┌─────────────────────────────────┐
│ opensearch_listener.py          │
│ indexes result immediately      │
└──────────────┬──────────────────┘
               │
               ▼
┌─────────────────────────────────┐
│ OpenSearch (Docker)             │
│ port 9200                       │
│ index: robot-results            │
└──────────┬──────────────────────┘
           │
     ┌─────┴────────────┐
     │                  │
     ▼                  ▼
┌──────────┐   ┌─────────────────┐
│Dashboard │   │   MCP Server    │
│port 5601 │   │  (local proc)   │
│live view │   └────────┬────────┘
└──────────┘            │
                        ▼
               ┌─────────────────┐
               │ MCP-compatible  │
               │  AI assistant   │
               └────────┬────────┘
                        │
                        ▼
               ┌─────────────────┐
               │ plain English   │
               │ queries + fixes │
               └─────────────────┘
The dashboard and MCP paths run on the same OpenSearch instance. The dashboard stays useful for live monitoring during a run. MCP is the interface for investigation and action after the run.
Tech Stack
- Robot Framework — test framework with a listener API for hooking into execution
- OpenSearch — stores every test result as it happens (from part one)
- opensearch-mcp-server-py — official OpenSearch MCP server, published by the OpenSearch project
- Claude Code — MCP-compatible AI assistant used in this setup (any MCP-compatible assistant works)
- Python — ties it all together
Step by Step
Prerequisites
Everything from part one is in place: OpenSearch running in Docker, the listener shipping results, the robot-results index populated. Verify OpenSearch is healthy before continuing:
curl http://localhost:9200
A JSON response means it is ready. If OpenSearch is down when the MCP server registers, it will appear connected but return errors on every query.
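The same check can be scripted, for example as a CI gate before registering the MCP server. A minimal sketch (the function names here are illustrative; a healthy root response from OpenSearch includes the cluster name and a version block):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError


def looks_healthy(body: dict) -> bool:
    """A valid OpenSearch root response carries cluster_name and a version block."""
    return "cluster_name" in body and "version" in body


def check_opensearch(url: str = "http://localhost:9200", timeout: float = 3.0) -> bool:
    """Return True only if OpenSearch answers the root endpoint with valid JSON."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return looks_healthy(json.load(resp))
    except (URLError, ValueError, OSError):
        # Connection refused, timeout, or a non-JSON body: not ready.
        return False
```

Running this before registration avoids the silent-failure mode described above, where the server looks connected but every query errors out.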
1. The MCP Server Package
opensearch-mcp-server-py is already in requirements.txt and was installed in part one. Nothing new to install.
Security note: Only add MCP servers from trusted sources. This package is the official OpenSearch MCP server, maintained by the OpenSearch project.
2. Environment Variables
The MCP server reads connection details from environment variables. Create a .env file in the project root (already in .gitignore):
OPENSEARCH_URL=http://localhost:9200
OPENSEARCH_NO_AUTH=true
OPENSEARCH_NO_AUTH=true is for local development only. Never use it on a shared or production instance.
3. Register with Claude Code
claude mcp add opensearch \
  -e OPENSEARCH_URL=http://localhost:9200 \
  -e OPENSEARCH_NO_AUTH=true \
  -- uv run --project /path/to/results-execution-monitoring python -m mcp_server_opensearch
Verify:
claude mcp list
You should see:
opensearch: ... ✓ Connected
4. Permanent Config for CLI and VS Code
The registration above is session-scoped. To persist it across sessions — and have it work in both the Claude Code CLI and the VS Code extension — add it to ~/.claude/settings.json:
{
  "mcpServers": {
    "opensearch": {
      "command": "uv",
      "args": [
        "run",
        "--project",
        "/path/to/results-execution-monitoring",
        "python",
        "-m",
        "mcp_server_opensearch"
      ],
      "env": {
        "OPENSEARCH_URL": "http://localhost:9200",
        "OPENSEARCH_NO_AUTH": "true"
      }
    }
  }
}
After saving, reload VS Code (Ctrl+Shift+P → Developer: Reload Window). Run claude mcp list to confirm.
Querying Results in Practice
Once connected, the assistant queries robot-results directly.
Isolating a specific run
Every document in a run carries the same run_id, which is printed at the start of each run:
What tests failed in run 88b407bf?
The assistant returns test names, suite names, failure messages, and elapsed times as a readable summary — not raw JSON.
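Behind the scenes, a question like this maps to an ordinary OpenSearch query. A minimal sketch of the equivalent query body follows; the field names (run_id, status, name, suite, message, elapsed) are assumptions based on what the part-one listener indexes, so adjust them to your actual mapping:

```python
def failed_tests_query(run_id: str) -> dict:
    """Query body behind "What tests failed in run <id>?".

    Assumes the listener indexes run_id, status, name, suite,
    message, and elapsed fields (check your own mapping).
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"run_id": run_id}},  # isolate one run
                    {"term": {"status": "FAIL"}},  # Robot Framework failure status
                ]
            }
        },
        "_source": ["name", "suite", "message", "elapsed"],
        "size": 100,
    }
```

The assistant constructs and runs something like this for you; the point of MCP is that you never have to write it by hand.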
In CI, pass the build number as the run_id so results are traceable to a specific pipeline build:
python -m robot \
  --listener opensearch_listener.OpenSearchListener:url=http://localhost:9200:run_id=${BUILD_NUMBER} \
  tests/
Then:
What failed in build 42?
Understanding failure patterns over time
A single failure is a data point. The same test failing across five runs over a week is a problem that needs a decision — is it a flaky test, a broken feature, or an environmental issue?
Show me all failed tests from the last 7 days
This tells you immediately whether today's failures are new or recurring. A test that first appeared today is a different priority from one that has been failing silently for a week.
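The historical question translates to a date-range filter plus an aggregation that buckets failures per test. A sketch under the same assumptions about field names; the .keyword suffix and the timestamp field depend on how your index is actually mapped:

```python
def recurring_failures_query(days: int = 7) -> dict:
    """Failures over a window, bucketed per test so recurrences stand out."""
    return {
        "size": 0,  # aggregation only, no raw hits needed
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": "FAIL"}},
                    {"range": {"timestamp": {"gte": f"now-{days}d"}}},
                ]
            }
        },
        "aggs": {
            "by_test": {
                "terms": {"field": "name.keyword", "size": 20},
                "aggs": {
                    # Distinct runs a test failed in: 1 means new today,
                    # many means chronic. Assumes run_id is keyword-mapped.
                    "distinct_runs": {"cardinality": {"field": "run_id"}}
                },
            }
        },
    }
```

A test with a high distinct_runs count is the chronic one; a count of one is today's fresh breakage.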
Triage by tag
Not all failures carry the same weight. Tests tagged smoke are meant to catch the most critical issues fastest. Knowing whether failing tests are smoke tests or deep regression tests changes how urgently you respond.
Show me failures with the smoke tag from the last 3 days
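This question is the same query shape with one extra filter on the tag field. A sketch, assuming the listener indexes tags as a keyword array so a plain term match works (use tags.keyword instead if the field is dynamically mapped as text):

```python
def tagged_failures_query(tag: str = "smoke", days: int = 3) -> dict:
    """Failures carrying a given tag within a recent window."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": "FAIL"}},
                    {"term": {"tags": tag}},  # term matches any element of a keyword array
                    {"range": {"timestamp": {"gte": f"now-{days}d"}}},
                ]
            }
        },
        "_source": ["name", "suite", "tags", "message"],
    }
```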
Acting on failures — in the same conversation
Once the assistant has the failure list, the conversation continues without switching context. The failure message, test name, suite, and tags are all in the indexed document. The error is already in scope.
The Division By Zero Fails test is failing with ZeroDivisionError.
What should the Robot Framework keyword look like to handle that safely?
Which of these failures look like test bugs vs application bugs?
Generate a test case that correctly validates that dividing by zero raises an exception.
The path from "what failed" to "here is the fix" happens in one conversation, without copy-pasting anything.
Result
Before MCP: run finishes → open Dashboards → filter by run → open Discover for error messages → cross-reference previous runs manually → copy error into a chat → figure out the fix → switch back to editor.
With MCP: run finishes → ask what failed → ask if it has happened before → ask what the fix looks like → fix it.
The time between "run finished" and "I know what to do" is shorter. Not because the failures changed, but because the path from data to action is direct.
Conclusion
The listener from part one moved test result availability from the end of the run to the moment each test completes. This part moves analysis from dashboards and query consoles to plain English, in the tool you are already using.
The approach is not specific to Robot Framework or Claude. Any test framework with a hook system can stream results to OpenSearch using the same listener pattern. Any MCP-compatible AI assistant can be registered against the same MCP server. The infrastructure stays the same regardless of what is being tested or which assistant is being used.
Store results as they happen. Query them in plain English. Act on what you find.
Resources
- GitHub — github.com/007bsd/results-execution-monitoring
- Model Context Protocol — modelcontextprotocol.io
- Introducing MCP in OpenSearch — opensearch.org/blog/introducing-mcp-in-opensearch
- opensearch-mcp-server-py — github.com/opensearch-project/opensearch-mcp-server-py
- Part one — Real-Time Test Results in Robot Framework
The complete code is in the GitHub repo. If anything in the MCP setup behaves differently, leave a comment.