Pritesh Kiri for LitmusChaos

Posted on Nov 20

Making Chaos Conversational: A Beginner-Friendly Guide to the LitmusChaos MCP Server

#testing #mcp #devops #beginners

Modern software systems are becoming more distributed, more complex, and more dependent on reliability than ever before. But reliability isn’t something you bolt on at the end; it’s something you test deliberately. That’s exactly where chaos engineering comes in: by intentionally injecting controlled failures, teams can understand how their systems behave under stress and make them more resilient.

LitmusChaos is one of the most widely adopted open-source frameworks for cloud-native chaos engineering. But even with its intuitive UI (ChaosCenter), CRDs, and CLI, we know that YAML manifests and APIs can still feel intimidating, especially for teams just starting out.

So, how do we make chaos engineering easier, faster, and more accessible?

This is where the LitmusChaos MCP Server comes in.

What Is MCP and Why Does It Matter?

MCP (Model Context Protocol) is an open standard that allows AI assistants (like Claude or other LLM-powered tools) to communicate with external systems through structured, well-defined tools.

Think of MCP as a bridge that lets your AI assistant do all of the following from plain natural-language requests:
• List chaos experiments
• Run experiments
• Stop them
• Check statuses
• Explore infrastructures
• Build probes

So instead of writing YAML or navigating multiple UI screens, you can simply ask:

“Run a pod-delete experiment in my staging environment.”

Your AI assistant then calls the matching tool on the MCP Server, which takes care of the rest.
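
Under the hood, an MCP client talks to a server with JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below builds such a request for the prompt above. The tool name `run_chaos_experiment` comes from the tool list later in this post; the argument key `experimentId` and its value are assumptions made for illustration, since the real input schema is whatever the server publishes.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `experimentId` is a hypothetical argument name; check the server's
# published tool schema for the real one.
request = build_tool_call(
    "run_chaos_experiment", {"experimentId": "pod-delete-staging"}
)
print(json.dumps(request, indent=2))
```

In practice you never write this message yourself: the AI client produces it after deciding which tool fits your request.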

Why Did LitmusChaos Build an MCP Server?

Chaos engineering is powerful, but it can feel “too technical” or “too risky” for new practitioners. The LitmusChaos MCP Server addresses this by:

• Lowering the barrier to chaos. Anyone on your team can run chaos with natural language, with no CRDs required.
• Improving speed and accessibility. Fast discovery, instant experiment triggers, and quick monitoring are all available via an AI assistant.
• Integrating chaos into existing workflows. Teams already using AI agents can now tie chaos steps directly into operational workflows.
• Enforcing safe, scoped access. MCP tools only expose specific, controlled actions. Tokens and namespaces help keep execution safe.
• Making chaos collaborative. Experiment reviews, run summaries, and probe outputs become conversational and easy to share.

Ultimately, this brings chaos engineering closer to everyday development and reliability workflows.

Setting Up the LitmusChaos MCP Server

Below is a simple guide to getting the MCP Server running on your cloud desktop or local system.

Prerequisites

You will need:
• Go 1.21 or newer
• Access to an existing LitmusChaos ChaosCenter
• A valid project and API token
• Any MCP-enabled AI client (e.g., Claude Desktop)
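
If you are unsure whether your toolchain meets the Go requirement, a quick check like the one below can help. It is a minimal sketch that parses the text printed by `go version`; the helper names are made up for this example.

```python
import re
import shutil
import subprocess


def parse_go_version(output: str):
    """Extract (major, minor) from text such as 'go version go1.22.3 linux/amd64'."""
    match = re.search(r"go(\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def go_is_new_enough(output: str, minimum=(1, 21)) -> bool:
    """True when the reported Go version is at least `minimum`."""
    version = parse_go_version(output)
    return version is not None and version >= minimum


if __name__ == "__main__":
    if shutil.which("go") is None:
        print("go was not found on PATH")
    else:
        result = subprocess.run(["go", "version"], capture_output=True, text=True)
        print("Go 1.21+ available:", go_is_new_enough(result.stdout))
```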

Clone the repository

git clone https://github.com/litmuschaos/litmus-mcp-server.git
cd litmus-mcp-server

Build the binary

make build

Getting Your Credentials

• ChaosCenter endpoint: the URL of your LitmusChaos installation
• Project ID: found in the ChaosCenter project settings
• Access token: generated from ChaosCenter → Settings → Access Tokens
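
Before wiring these values into a client, it can help to sanity-check them. The sketch below only verifies that the three values are present and that the endpoint looks like an http(s) URL; it does not contact ChaosCenter. The variable names match the client configuration shown in the next section, and the `check_credentials` helper is hypothetical.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = (
    "CHAOS_CENTER_ENDPOINT",
    "LITMUS_PROJECT_ID",
    "LITMUS_ACCESS_TOKEN",
)


def check_credentials(env: dict) -> list:
    """Return a list of problems; an empty list means the values look usable."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    endpoint = env.get("CHAOS_CENTER_ENDPOINT", "")
    if endpoint:
        parsed = urlparse(endpoint)
        # Require an explicit scheme and host, e.g. http://localhost:8080
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("CHAOS_CENTER_ENDPOINT must be an http(s) URL")
    return problems


if __name__ == "__main__":
    for problem in check_credentials(dict(os.environ)):
        print(problem)
```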

Connecting It with Your MCP Client (Using Claude Here)

{
  "mcpServers": {
    "litmuschaos": {
      "command": "/path/to/litmuschaos-mcp-server",
      "env": {
        "CHAOS_CENTER_ENDPOINT": "http://localhost:8080",
        "LITMUS_PROJECT_ID": "your-project-id",
        "LITMUS_ACCESS_TOKEN": "your-token"
      }
    }
  }
}
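
If your client configuration already lists other MCP servers, you will want to merge this entry in rather than overwrite the file. The following is a minimal sketch of that merge on an in-memory dictionary. It assumes the same `mcpServers` layout as above; the `other-server` entry is a hypothetical placeholder, and locating and writing the actual config file (which varies by client and operating system) is left out.

```python
import json


def add_litmus_server(config, binary_path, endpoint, project_id, token):
    """Return a copy of `config` with a `litmuschaos` entry under `mcpServers`."""
    merged = dict(config)
    # Copy the existing server map so other entries are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers["litmuschaos"] = {
        "command": binary_path,
        "env": {
            "CHAOS_CENTER_ENDPOINT": endpoint,
            "LITMUS_PROJECT_ID": project_id,
            "LITMUS_ACCESS_TOKEN": token,
        },
    }
    merged["mcpServers"] = servers
    return merged


# `other-server` stands in for whatever is already configured (hypothetical).
existing = {"mcpServers": {"other-server": {"command": "/usr/local/bin/other"}}}
updated = add_litmus_server(
    existing,
    "/path/to/litmuschaos-mcp-server",
    "http://localhost:8080",
    "your-project-id",
    "your-token",
)
print(json.dumps(updated, indent=2))
```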

Available Tools

The server provides tools for the following chaos engineering operations:

Experiment Management

list_chaos_experiments - List all chaos experiments with filtering
get_chaos_experiment - Get detailed experiment information
run_chaos_experiment - Execute experiments immediately
stop_chaos_experiment - Stop running experiments

Execution Monitoring

list_experiment_runs - List experiment execution history
get_experiment_run_details - Get detailed run information with logs

Infrastructure Management

list_chaos_infrastructures - List all registered infrastructures
get_infrastructure_details - Get detailed infrastructure information
register_chaos_infrastructure - Register new Kubernetes infrastructures

Environment Organization

list_environments - List all environments
create_environment - Create new environments for the organization

Resilience Validation

list_resilience_probes - List all configured resilience probes
create_resilience_probe - Create HTTP, CMD, K8s, or Prometheus probes

Discovery & Analytics

list_chaos_hubs - List available ChaosHubs
get_chaos_faults - Browse available chaos faults
get_experiment_statistics - Get comprehensive platform statistics
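
One practical way to reason about this list is to separate tools that only read state from tools that change it, for example to ask for confirmation before anything that injects or stops chaos. The sketch below does that purely from the tool names: it assumes that `list_` and `get_` tools are read-only and that everything else mutates state. That naming convention is an inference from the list above, not documented server behavior.

```python
TOOLS = [
    "list_chaos_experiments", "get_chaos_experiment",
    "run_chaos_experiment", "stop_chaos_experiment",
    "list_experiment_runs", "get_experiment_run_details",
    "list_chaos_infrastructures", "get_infrastructure_details",
    "register_chaos_infrastructure",
    "list_environments", "create_environment",
    "list_resilience_probes", "create_resilience_probe",
    "list_chaos_hubs", "get_chaos_faults", "get_experiment_statistics",
]

# Assumption: these prefixes mark tools that never change state.
READ_ONLY_PREFIXES = ("list_", "get_")


def is_read_only(tool_name: str) -> bool:
    return tool_name.startswith(READ_ONLY_PREFIXES)


def needs_confirmation(tool_name: str) -> bool:
    """Treat every tool that is not clearly read-only as needing a human OK."""
    return not is_read_only(tool_name)


mutating = [name for name in TOOLS if needs_confirmation(name)]
print("Confirm before calling:", ", ".join(mutating))
```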

Examples

These are real-world prompts you can use once your MCP server is connected:

List all experiments

Show me all chaos experiments available in my staging environment.

Trigger an experiment

Run the pod-delete-basic experiment and share the run ID.

Check experiment status

Show me the timeline and probe results of run <RUN_ID>.

Stop running chaos

Stop the currently running pod delete experiment.

Explore ChaosHub

List all Kubernetes pod-level faults from the ChaosHub.

Build a probe

Create an HTTP probe that checks /health returns 200 within 2 seconds.

Add a new environment

Create a new environment called chaos-lab.
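
To make the probe prompt above concrete, here is a sketch of the structured information an assistant would need to extract from it, plus the pass condition the sentence describes. The `HttpProbeSpec` class and its field names are hypothetical and are not the input schema of `create_resilience_probe`; they only illustrate what "returns 200 within 2 seconds" means as a check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpProbeSpec:
    """Hypothetical shape for the details stated in the prompt."""
    path: str
    expected_status: int
    timeout_seconds: float


def probe_passes(spec: HttpProbeSpec, status: int, elapsed_seconds: float) -> bool:
    """Pass only if the status matches and the response arrived in time."""
    return status == spec.expected_status and elapsed_seconds <= spec.timeout_seconds


# "Create an HTTP probe that checks /health returns 200 within 2 seconds."
spec = HttpProbeSpec(path="/health", expected_status=200, timeout_seconds=2.0)

print(probe_passes(spec, status=200, elapsed_seconds=0.4))  # fast and healthy
print(probe_passes(spec, status=200, elapsed_seconds=2.5))  # healthy but too slow
print(probe_passes(spec, status=503, elapsed_seconds=0.1))  # fast but failing
```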

Why Use MCP?

As systems become more distributed and reliability becomes more critical, we need chaos engineering tools that integrate into everyday workflows.

The LitmusChaos MCP Server delivers:

Ease of use → chaos through natural language
Faster adoption → no need to learn CRDs or YAML
Better collaboration → experiments become team-friendly

The LitmusChaos MCP Server is more than just a technical integration; it’s a new way of practicing chaos engineering. Making chaos accessible through natural language removes barriers and encourages teams to adopt reliability as a mindset, not a one-time activity.

DEV Community