Modern software systems are becoming more distributed, more complex, and more dependent on reliability than ever before. But reliability isn’t something you bolt on at the end; it’s something you test deliberately. That’s exactly where chaos engineering comes in: by intentionally injecting controlled failures, teams can understand how their systems behave under stress and make them more resilient.
LitmusChaos is one of the most widely adopted open-source frameworks for cloud-native chaos engineering. But even with its intuitive UI (ChaosCenter), CRDs, and CLI, we know that YAML manifests and APIs can still feel intimidating, especially for teams just starting out.
So, how do we make chaos engineering easier, faster, and more accessible?
This is where the LitmusChaos MCP Server comes in.
What Is MCP and Why Does It Matter?
MCP (Model Context Protocol) is a new standard that allows AI assistants (like Claude or other LLM-powered tools) to communicate with external systems through structured, well-defined tools.
Think of MCP as a bridge that lets your AI assistant:
• List chaos experiments
• Run experiments
• Stop them
• Check statuses
• Explore infrastructures
• Build probes
All of this happens through plain natural language.
So instead of writing YAML or navigating multiple UI screens, you can simply ask:
“Run a pod-delete experiment in my staging environment.”
And the MCP Server will take care of it.
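To make the bridge idea concrete: MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method, passing the tool name and its arguments. The sketch below builds such a request for `run_chaos_experiment`. The argument name `experimentId` and its value are hypothetical, chosen only for illustration; the tool's actual input schema is not shown in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCallParams mirrors the "params" object of an MCP tools/call request.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// rpcRequest is a JSON-RPC 2.0 request envelope, which MCP uses on the wire.
type rpcRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// buildToolCall returns the JSON an MCP client could send to invoke a tool.
func buildToolCall(id int, tool string, args map[string]any) (string, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	}
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// "experimentId" is a hypothetical argument name used for illustration.
	out, err := buildToolCall(1, "run_chaos_experiment",
		map[string]any{"experimentId": "pod-delete-staging"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In practice the AI client builds and sends this message for you; the point is that a natural-language request ends up as a small, structured call to one named tool.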
Why Did LitmusChaos Build an MCP Server?
Chaos engineering is powerful, but it can feel “too technical” or “too risky” for new practitioners. The LitmusChaos MCP Server addresses this by:
Lowering the barrier to chaos. Anyone on your team can run chaos with natural language — no CRDs required.
Improving speed and accessibility. Fast discovery, instant experiment triggers, quick monitoring — all via an AI assistant.
Integrating chaos into existing workflows. Teams already using AI agents can now tie chaos steps directly into operational workflows.
Enforcing safe, scoped access. MCP tools expose only specific, controlled actions, and access tokens and namespaces limit where those actions can run.
Making chaos collaborative. Experiment reviews, run summaries, and probe outputs become conversational and easy to share.
Ultimately, this brings chaos engineering closer to everyday development and reliability workflows.
Setting Up the LitmusChaos MCP Server
Below is a simple guide to getting the MCP Server running on your cloud desktop or local system.
Prerequisites
You will need:
• Go 1.21 or newer
• Access to an existing LitmusChaos ChaosCenter
• A valid project and API token
• Any MCP-enabled AI client (e.g., Claude Desktop)
- Clone the repository
git clone https://github.com/litmuschaos/litmus-mcp-server.git
cd litmus-mcp-server
- Build the binary
make build
- Get your credentials
  - ChaosCenter endpoint: URL of your LitmusChaos installation
  - Project ID: Found in ChaosCenter project settings
  - Access token: Generate from ChaosCenter → Settings → Access Tokens
- Connect it to your MCP client (Claude is used here) by adding the server to the client's configuration
{
  "mcpServers": {
    "litmuschaos": {
      "command": "/path/to/litmuschaos-mcp-server",
      "env": {
        "CHAOS_CENTER_ENDPOINT": "http://localhost:8080",
        "LITMUS_PROJECT_ID": "your-project-id",
        "LITMUS_ACCESS_TOKEN": "your-token"
      }
    }
  }
}
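A misconfigured client often fails only when the first tool is called, so it can help to check the three variables up front. The following is a minimal preflight sketch, assuming the variable names from the configuration above; the helper `missingVars` is hypothetical and not part of the LitmusChaos MCP Server.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredVars are the variable names used in the client configuration.
var requiredVars = []string{
	"CHAOS_CENTER_ENDPOINT",
	"LITMUS_PROJECT_ID",
	"LITMUS_ACCESS_TOKEN",
}

// missingVars returns the required variables that lookup reports as
// unset or blank. Passing the lookup function keeps the check testable.
func missingVars(lookup func(string) string) []string {
	var missing []string
	for _, name := range requiredVars {
		if strings.TrimSpace(lookup(name)) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingVars(os.Getenv); len(missing) > 0 {
		fmt.Println("missing configuration:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all required variables are set")
}
```

Running it in the same environment the MCP client uses shows which values, if any, still need to be filled in.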
Available Tools
The server provides the following tools for chaos engineering operations, grouped by category:
Experiment Management
list_chaos_experiments - List all chaos experiments with filtering
get_chaos_experiment - Get detailed experiment information
run_chaos_experiment - Execute experiments immediately
stop_chaos_experiment - Stop running experiments
Execution Monitoring
list_experiment_runs - List experiment execution history
get_experiment_run_details - Get detailed run information with logs
Infrastructure Management
list_chaos_infrastructures - List all registered infrastructures
get_infrastructure_details - Get detailed infrastructure information
register_chaos_infrastructure - Register new Kubernetes infrastructures
Environment Organization
list_environments - List all environments
create_environment - Create new environments for the organization
Resilience Validation
list_resilience_probes - List all configured resilience probes
create_resilience_probe - Create HTTP, CMD, K8s, or Prometheus probes
Discovery & Analytics
list_chaos_hubs - List available ChaosHubs
get_chaos_faults - Browse available chaos faults
get_experiment_statistics - Get comprehensive platform statistics
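An MCP client discovers these tools by sending a `tools/list` request; the result carries a `tools` array whose entries include a `name` and a `description`. The sketch below parses such a response and extracts the tool names. The sample payload is hand-written and abbreviated for illustration, not captured output from the server.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tool holds the fields of an MCP tool entry that this sketch uses.
type tool struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

// listToolsResponse is the part of a tools/list response we care about.
type listToolsResponse struct {
	Result struct {
		Tools []tool `json:"tools"`
	} `json:"result"`
}

// sampleResponse is a hand-written, abbreviated example payload.
const sampleResponse = `{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {"name": "list_chaos_experiments", "description": "List all chaos experiments with filtering"},
      {"name": "run_chaos_experiment", "description": "Execute experiments immediately"}
    ]
  }
}`

// toolNames decodes a tools/list response and returns the tool names in order.
func toolNames(raw []byte) ([]string, error) {
	var resp listToolsResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(resp.Result.Tools))
	for _, t := range resp.Result.Tools {
		names = append(names, t.Name)
	}
	return names, nil
}

func main() {
	names, err := toolNames([]byte(sampleResponse))
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```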
Examples
Here are some prompts you can try once your MCP server is connected:
- List all experiments
Show me all chaos experiments available in my staging environment.
- Trigger an experiment
Run the pod-delete-basic experiment and share the run ID.
- Check experiment status
Show me the timeline and probe results of run <RUN_ID>.
- Stop running chaos
Stop the currently running pod-delete experiment.
- Explore ChaosHub
List all Kubernetes pod-level faults from the ChaosHub.
- Build a probe
Create an HTTP probe that checks /health returns 200 within 2 seconds.
- Add a new environment
Create a new environment called chaos-lab.
Why Use MCP?
As systems become more distributed and reliability becomes more critical, we need chaos engineering tools that integrate into everyday workflows.
The LitmusChaos MCP Server delivers:
- Ease of use → chaos through natural language
- Faster adoption → no need to learn CRDs or YAML
- Better collaboration → experiments become team-friendly
The LitmusChaos MCP Server is more than just a technical integration; it’s a new way of practicing chaos engineering. Making chaos accessible through natural language removes barriers and encourages teams to adopt reliability as a mindset, not a one-time activity.