Omar Fathy
Amazon EKS Model Context Protocol (MCP): Revolutionizing Kubernetes Development with AI-Powered Context Awareness

Abstract

They say a picture is worth a thousand prompts, but in the fast-paced world of cloud-native development, the Amazon EKS Model Context Protocol (MCP) says even more. Since its release, MCP has quickly distinguished itself as a breakthrough innovation and a clear example of how purposeful design can redefine best practices and significantly accelerate application development on Amazon EKS.

The Amazon EKS Model Context Protocol (MCP) Server represents a paradigm shift in cloud-native development, introducing AI-powered assistance directly into Kubernetes workflows. This open-source protocol bridges the gap between Large Language Models (LLMs) and EKS cluster management, enabling developers to interact with complex Kubernetes operations through natural language interfaces while maintaining enterprise-grade security and operational excellence.

Introduction

Containerized applications have become the cornerstone of modern cloud deployments, offering consistent environments, streamlined dependency management, and seamless scaling capabilities. However, the journey from application development to production deployment remains fraught with manual, time-consuming processes that require deep expertise in Kubernetes operations, AWS services, and infrastructure management.

AWS has recently announced the launch of the open-source Amazon EKS Model Context Protocol (MCP) Server, alongside the Amazon ECS MCP Server, marking a significant advancement in AI-assisted cloud-native development. This revolutionary tool brings artificial intelligence directly into the Kubernetes development workflow, transforming how developers interact with EKS clusters.

The Challenge

Traditional Kubernetes and EKS management requires developers to:

  • Master complex kubectl commands and YAML manifests
  • Navigate intricate AWS service integrations (IAM, VPC, EBS)
  • Manually troubleshoot cluster issues using multiple tools and documentation sources
  • Context-switch between various interfaces for cluster management, monitoring, and debugging

The Solution

The EKS MCP Server addresses these challenges by:

  • Simplifying cluster setup with automated prerequisite creation and best practice application
  • Streamlining application deployment through high-level workflows and automated code generation
  • Accelerating troubleshooting via intelligent debugging tools and integrated knowledge base access
  • Enabling natural language interactions for complex Kubernetes operations

What is Amazon EKS Model Context Protocol?

The Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools. Whether you're building an AI-powered IDE, enhancing a chat interface, or creating custom AI workflows, MCP provides a standardized way to connect LLMs with the context they need.

Why MCP Servers?

MCP servers enhance the capabilities of foundation models (FMs) in several key ways:

  • Improved Output Quality: By providing relevant information directly in the model's context, MCP servers significantly improve model responses for specialized domains like AWS services. This approach reduces hallucinations, provides more accurate technical details, enables more precise code generation, and ensures recommendations align with current AWS best practices and service capabilities.

  • Access to Latest Documentation: FMs may not have knowledge of recent releases, APIs, or SDKs. MCP servers bridge this gap by pulling in up-to-date documentation, ensuring your AI assistant always works with the latest AWS capabilities.

  • Workflow Automation: MCP servers convert common workflows into tools that foundation models can use directly. Whether it's CDK, Terraform, or other AWS-specific workflows, these tools enable AI assistants to perform complex tasks with greater accuracy and efficiency.

  • Specialized Domain Knowledge: MCP servers provide deep, contextual knowledge about AWS services that might not be fully represented in foundation models' training data, enabling more accurate and helpful responses for cloud development tasks.

In the context of Amazon EKS, integrating the EKS MCP server into AI code assistants enhances the development workflow across all phases: it simplifies initial cluster setup with automated prerequisite creation and application of best practices, streamlines application deployment with high-level workflows and automated code generation, and accelerates troubleshooting through intelligent debugging tools and knowledge base access. All of this reduces complex operations to natural language interactions inside your AI code assistant.

MCP in the EKS Ecosystem

The EKS MCP Server is a Model Context Protocol server for Amazon EKS that enables generative AI models to create and manage Kubernetes clusters on AWS through MCP tools. It specifically addresses the complexity of Kubernetes cluster management through:

  1. Context-Aware Operations: Understanding the current state of your EKS clusters and providing relevant suggestions
  2. EKS Cluster Management: Create and manage EKS clusters with dedicated VPCs, proper networking, and CloudFormation templates for reliable, repeatable deployments
  3. Kubernetes Resource Management: Create, read, update, delete, and list Kubernetes resources with support for applying YAML manifests
  4. Application Deployment: Generate and deploy Kubernetes manifests with customizable parameters for containerized applications
  5. Operational Support: Access pod logs, Kubernetes events, and monitor cluster resources
  6. CloudWatch Integration: Retrieve logs and metrics from CloudWatch for comprehensive monitoring
  7. Integrated Troubleshooting: Accessing AWS's internal EKS troubleshooting knowledge base
  8. Security-First Design: Configurable read-only mode, sensitive data access controls, and IAM integration for proper permissions management

Core Features and Capabilities

1. Kubernetes Resource Management

The EKS MCP Server provides comprehensive resource management capabilities without requiring deep kubectl expertise:

# Traditional approach - manual YAML creation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:1.21
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

With MCP: A natural language request like "Deploy a web application with 3 replicas using nginx 1.21 in the production namespace" automatically generates and applies the appropriate resources.
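To make that concrete, under the hood the assistant translates the request into MCP tool calls. The sketch below is illustrative only: it uses the generate_app_manifest tool and the parameter names from the tool reference later in this article, and the values are simply inferred from the request rather than captured server traffic.

// Illustrative sketch - values inferred from the request above
{
  "method": "generate_app_manifest",
  "params": {
    "app_name": "web-app",
    "image_uri": "nginx:1.21",
    "replicas": 3,
    "namespace": "production",
    "port": 80,
    "output_dir": "./manifests"
  }
}

The generated Deployment and Service manifests can then be applied to the cluster with the apply_yaml tool, all without you writing a line of YAML by hand.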

2. EKS Auto Mode Cluster Management

Automated Cluster Creation

# Traditional eksctl approach
eksctl create cluster \
  --name my-cluster \
  --version 1.29 \
  --region us-west-2 \
  --vpc-private-subnets subnet-xxx,subnet-yyy \
  --vpc-public-subnets subnet-aaa,subnet-bbb \
  --with-oidc \
  --managed

MCP Enhancement: A request such as "Create an EKS cluster with Auto Mode in us-west-2" triggers an automated CloudFormation stack deployment including:

  • Dedicated VPC with appropriate subnets
  • Security groups with least-privilege access
  • OIDC provider configuration
  • Auto Mode node pools with optimal instance selection
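If you want to verify what the assistant provisioned, a couple of standard AWS CLI calls confirm the stack and cluster state. The stack name below is a placeholder; use whatever name the assistant reports back.

# Confirm the CloudFormation stack completed (stack name is a placeholder)
aws cloudformation describe-stacks \
  --stack-name <your-eks-stack-name> \
  --region us-west-2 \
  --query 'Stacks[0].StackStatus'

# Confirm the cluster itself is ACTIVE
aws eks describe-cluster \
  --name my-cluster \
  --region us-west-2 \
  --query 'cluster.status'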

3. Intelligent Troubleshooting Engine

The MCP server includes direct access to AWS's internal EKS troubleshooting guide through the search_eks_troubleshoot_guide function:

// Example MCP function call (illustrative)
{
  "method": "search_eks_troubleshoot_guide",
  "params": {
    "query": "pod scheduling issues"
  }
}

4. Security-Centric Design

Default Read-Only Operation

# Starting the MCP server in secure mode (default, read-only)
uvx awslabs.eks-mcp-server@latest

# Enabling write operations (explicit flag required)
uvx awslabs.eks-mcp-server@latest --allow-write

Comparison With Traditional Approaches

The Reality Check: Before and After MCP

Let's be honest - working with Kubernetes has never been easy. Even experienced developers find themselves drowning in YAML files, debugging cryptic error messages, and spending hours on tasks that should take minutes. The traditional EKS experience often feels like this:

A Day in the Life: Traditional EKS Development

Picture this: You're a developer who just wants to deploy a simple Python web application. Here's what your day typically looks like:

  1. Morning Coffee & kubectl Confusion
   # You start with the basics, but even this requires research
   kubectl create namespace my-app
   kubectl create deployment my-app --image=my-python-app:latest
   # Wait, what's the right syntax for resource limits again?
  2. Afternoon YAML Wrestling 🤼‍♂️
   # After hours of Stack Overflow and documentation diving
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: my-python-app
     namespace: my-app
   spec:
     replicas: 3
     selector:
       matchLabels:
         app: my-python-app
     template:
       metadata:
         labels:
           app: my-python-app
       spec:
         containers:
         - name: app
           image: my-python-app:latest
           ports:
           - containerPort: 8080
           resources:
             requests:
               memory: "64Mi"
               cpu: "250m"
             limits:
               memory: "128Mi"
               cpu: "500m"
  3. Evening Troubleshooting Sessions 🌙
   # Your pods are failing, but why?
   kubectl describe pod my-python-app-xyz
   kubectl logs my-python-app-xyz
   kubectl get events --namespace my-app
   # 3 hours later, you realize it was a simple port mismatch

Enter MCP: The Game Changer

Now, imagine the same scenario with the EKS MCP Server. Here's how that same day transforms:

A Day in the Life: MCP-Enhanced Development

  1. Morning Simplicity ☀️
   You: "I have a Python app in my ECR repo at 123456789.dkr.ecr.eu-west-1.amazonaws.com/my-python-app:latest. 
        Can you deploy it to an EKS cluster called 'my-test-cluster'?"

   AI: "I'll help you deploy this! Let me check if the cluster exists and create the necessary resources."
  2. Automatic Infrastructure Creation 🏗️
    Behind the scenes, MCP intelligently:

    • Checks if my-test-cluster exists
    • Creates a CloudFormation stack with VPC, subnets, and security groups
    • Generates appropriate Kubernetes manifests
    • Deploys your application with best practices built-in
  3. Intelligent Problem Resolution 🧠
    When issues arise:

   You: "My pods seem to be failing. Can you investigate?"

   AI: "I found the issue! Your image architecture (ARM64) doesn't match your node group (AMD64). 
        I'll recreate the deployment with the correct node selector."

Real-World Impact: The Numbers Don't Lie

Based on real developer experiences and our analysis:

| Task | Traditional Time | MCP-Enhanced Time | Improvement |
|------|------------------|-------------------|-------------|
| New Cluster Setup | 45-90 minutes | 5-10 minutes | 85% faster |
| Application Deployment | 30-60 minutes | 3-5 minutes | 90% faster |
| Troubleshooting Issues | 2-8 hours | 15-45 minutes | 80% faster |
| Learning Basic Operations | 2-6 months | 1-3 weeks | 75% faster |

Use Cases and Real-World Examples

The "Vibe Coding" Revolution🎧💻

The EKS MCP Server isn't just about automation - it's about enabling what AWS engineers call "vibe coding." This means you can go from a rough idea to a deployed, production-ready application through natural conversation with your AI assistant.

Use Case 1: The Startup Sprint - Multi-Tenant SaaS Deployment

The Scenario: Meet Alex, a startup founder who needs to deploy a multi-tenant SaaS platform for their new customer management tool. They have limited DevOps experience but big ambitions.

The Traditional Nightmare 😰

# Alex would typically spend days on this:
# 1. Research namespace isolation patterns
# 2. Manually create network policies
# 3. Set up resource quotas for each tenant
# 4. Configure monitoring and logging
# 5. Debug inevitable security and networking issues

kubectl create namespace tenant-companya
kubectl create namespace tenant-companyb
# ... followed by dozens of YAML files and kubectl commands

The MCP Magic ✨

Alex: "I need to set up a multi-tenant environment for my SaaS app. I have tenants 'TechCorp' and 'StartupInc', 
      each should be isolated with 2GB RAM limits and auto-scaling between 2-10 pods based on demand."

AI Assistant: "Perfect! I'll create isolated environments for both tenants with proper security boundaries. 
             Let me set this up with network policies and resource quotas."

What Happens Behind the Scenes:

# Auto-generated with security best practices
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-techcorp
  labels:
    tenant: techcorp
    isolation: enabled
    created-by: mcp-server
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation-techcorp
  namespace: tenant-techcorp
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: techcorp
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          tenant: techcorp
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota-techcorp
  namespace: tenant-techcorp
spec:
  hard:
    requests.memory: "2Gi"
    limits.memory: "2Gi"
    pods: "10"
    services: "5"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: saas-app-hpa-techcorp
  namespace: tenant-techcorp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: saas-app-techcorp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

The Result: Alex goes from concept to secure, multi-tenant environment in under 10 minutes instead of 3-4 days of research and implementation.

Use Case 2: The "Oh No!" Moment - Production Troubleshooting

The Scenario: Jamie, a DevOps engineer, gets paged at 2 AM. The company's main application is down, customers are complaining, and the CEO is asking for updates every 15 minutes.

The Traditional Detective Work 🕵️‍♀️

# Jamie's typical 2 AM troubleshooting journey:
kubectl get pods --all-namespaces | grep -i CrashLoopBackOff
kubectl describe pod failing-pod-xyz
kubectl logs failing-pod-xyz --previous
kubectl get events --sort-by=.metadata.creationTimestamp
aws logs start-query --log-group-name /aws/eks/cluster-name/cluster
# 2 hours later, still searching through logs and documentation...

The MCP Superhero Moment 🦸‍♀️

Jamie: "The payment service pods in production are failing. Can you investigate what's happening?"

AI Assistant: "I'm analyzing the issue now. Let me check the pod status, events, and recent logs."

[MCP automatically invokes multiple tools:]
- Checks pod health across namespaces
- Retrieves recent events and error patterns  
- Pulls CloudWatch logs with error filtering
- Accesses EKS troubleshooting knowledge base

AI Assistant: "Found the issue! The payment service is failing due to insufficient IAM permissions 
             for accessing the RDS database. The IAM role is missing the 'rds:DescribeDBInstances' 
             permission. I can fix this by updating the service account's IAM policy."

Jamie: "Yes, please fix it."

AI Assistant: "Done! I've updated the IAM policy and restarted the affected pods. 
             The service should be healthy in about 2 minutes."

The Magic Behind the Scenes:
The MCP server automatically:

  1. Used list_k8s_resources to identify failing pods
  2. Called get_k8s_events to gather error context
  3. Invoked get_cloudwatch_logs with error filtering
  4. Searched the eks_troubleshoot_guide for IAM-related issues
  5. Used add_inline_policy to fix the permissions
  6. Applied the fix with manage_k8s_resource

The Result: Jamie resolves a critical production issue in 5 minutes instead of 2-3 hours, becoming the office hero.
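For readers curious what those invocations look like at the protocol level, here is a hedged sketch of steps 1 and 2. The parameter names come from the tool reference later in this article; the cluster, namespace, and pod names are invented for this scenario.

// Illustrative only - values are made up for the payment-service story
{
  "method": "list_k8s_resources",
  "params": {
    "cluster_name": "prod-cluster",
    "kind": "Pod",
    "api_version": "v1",
    "namespace": "payments",
    "field_selector": "status.phase!=Running"
  }
}

{
  "method": "get_k8s_events",
  "params": {
    "cluster_name": "prod-cluster",
    "kind": "Pod",
    "name": "payment-service-7d9f8-abcde",
    "namespace": "payments"
  }
}

Note that get_k8s_events (and the log retrieval in step 3) only works when the server was started with --allow-sensitive-data-access.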


How to Use MCP in EKS

Prerequisites: Getting Your Environment Ready for Magic 🪄

Before we dive into the magic, let's make sure you have everything you need. Think of this as preparing your workspace before starting a project:

Essential Tools (The Must-Haves):

  • AWS CLI - Installed and configured with credentials for your target account
  • Python and uv/uvx - Used to install and run the MCP server package
  • An MCP-capable AI assistant - Such as Cursor or the Amazon Q Developer CLI (both covered below)

Optional But Recommended (The Nice-to-Haves):

  • eksctl - For advanced cluster management
  • kubectl - For direct Kubernetes interaction when needed

🔐 Are You Authorized to Use MCP?

Before you can use the EKS MCP server to manage your Kubernetes resources, it's essential to ensure that your IAM role or user has the proper permissions. Without these, actions like querying cluster metadata, generating manifests, or deploying infrastructure will fail with authorization errors.

Let's walk through what permissions you need and why they matter.


🕵️‍♂️ Read-Only Permissions (For Observability and Safe Exploration)

If you're only querying information—such as cluster status, resource metrics, or IAM roles—grant your IAM principal the following read-only policy. This enables the MCP server to gather cluster insights, CloudWatch metrics, and IAM configurations without making changes:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "cloudformation:DescribeStacks",
        "cloudwatch:GetMetricData",
        "logs:StartQuery",
        "logs:GetQueryResults",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:ListRolePolicies",
        "iam:ListAttachedRolePolicies",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "eks-mcpserver:QueryKnowledgeBase"
      ],
      "Resource": "*"
    }
  ]
}

✅ Tip: Start with read-only mode for safer exploration, especially in production environments.
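One way to put this policy in place is to save it as a file (for example eks-mcp-readonly.json) and attach it as an inline policy to the IAM role your MCP session runs under. The role and policy names below are placeholders; if your credentials belong to an IAM user instead of a role, use aws iam put-user-policy.

# Attach the read-only policy above to the role used by your MCP session (names are placeholders)
aws iam put-role-policy \
  --role-name <your-mcp-role> \
  --policy-name eks-mcp-readonly \
  --policy-document file://eks-mcp-readonly.json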


✍️ Write Permissions (For Cluster Creation and Resource Deployment)

To fully leverage MCP's deployment automation—such as provisioning EKS clusters, creating networking resources, or applying manifests—you'll need broader permissions. We recommend attaching the following managed policies to your IAM role or user:

  • IAMFullAccess
    Grants the ability to create and manage IAM roles and policies needed by your EKS workloads.

  • AmazonVPCFullAccess
    Allows provisioning of VPCs, subnets, route tables, NAT gateways, and other essential networking components.

  • AWSCloudFormationFullAccess
    Required to deploy the CloudFormation stack located at:
    /awslabs/eks_mcp_server/templates/eks-templates/eks-with-vpc.yaml

  • Custom EKS Full Access Policy (needed for full cluster and node group operations):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "eks:*",
      "Resource": "*"
    }
  ]
}

🔄 Accessing the Kubernetes API: What You Should Know

Even with the correct IAM permissions, Kubernetes API access in EKS has a few additional rules. For your user or role to successfully interact with the Kubernetes API via MCP, one of the following conditions must be true:

  1. The IAM principal created the EKS cluster originally, and thus has automatic API access.
  2. An EKS Access Entry has been manually configured to grant access to your IAM principal.

If you encounter Unauthorized or Forbidden errors while performing Kubernetes actions, it's likely due to a missing access entry. Review the EKS documentation on Access Entries for instructions on granting permissions explicitly.
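If you hit those errors, creating an access entry for your IAM principal is usually the quickest fix. The commands below are a sketch with placeholder names; they assume access entries are enabled on the cluster, and the AmazonEKSClusterAdminPolicy shown grants broad access, so pick a narrower access policy where appropriate.

# Grant your IAM principal Kubernetes API access via an EKS access entry (placeholder ARNs)
aws eks create-access-entry \
  --cluster-name my-test-cluster \
  --principal-arn arn:aws:iam::123456789012:role/<your-mcp-role>

aws eks associate-access-policy \
  --cluster-name my-test-cluster \
  --principal-arn arn:aws:iam::123456789012:role/<your-mcp-role> \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster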

Setting Up Your AI Copilot

The beauty of the EKS MCP Server is that it works with multiple AI assistants. Here's how to set it up with the most popular options:

Option 1: Cursor IDE Setup (Recommended for Developers)

Cursor IDE has become the go-to choice for developers who want AI assistance integrated directly into their coding workflow.

Step 1: Basic Configuration

  1. Open Cursor and click the gear icon (⚙️) in the top-right corner
  2. Navigate to MCP → Add new global MCP server
  3. Paste this configuration:

For Mac/Linux:

{
  "mcpServers": {
    "awslabs.eks-mcp-server": {
      "autoApprove": [],
      "disabled": false,
      "command": "uvx",
      "args": [
        "awslabs.eks-mcp-server@latest",
        "--allow-write"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_PROFILE": "your-profile",
        "AWS_REGION": "us-west-2"
      },
      "transportType": "stdio"
    }
  }
}

For Windows:

{
  "mcpServers": {
    "awslabs.eks-mcp-server": {
      "autoApprove": [],
      "disabled": false,
      "command": "uvx",
      "args": [
        "--from",
        "awslabs.eks-mcp-server@latest",
        "awslabs.eks-mcp-server.exe",
        "--allow-write"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_PROFILE": "your-profile",
        "AWS_REGION": "us-west-2"
      },
      "transportType": "stdio"
    }
  }
}

After a few minutes, you should see a green indicator if your MCP server definition is valid.

Step 2: Test Your Setup
Open a chat panel in Cursor (Ctrl/⌘ + L) and try:

"Create a new EKS cluster named 'my-test-cluster' in the 'us-west-2' region using Kubernetes version 1.31."

Option 2: Amazon Q Developer CLI Setup

Step 1: Install Q Developer CLI

  1. Install the Amazon Q Developer CLI.
  2. The Q Developer CLI supports MCP servers for tools and prompts out of the box. Edit your Q Developer CLI's MCP configuration file, mcp.json, as shown in Step 2 below.
  3. Verify your setup by checking the available tools:

  # Check available tools
  q tools

Step 2: Configure MCP
Edit your mcp.json file:

For Mac/Linux:

  {
    "mcpServers": {
      "awslabs.eks-mcp-server": {
        "command": "uvx",
        "args": ["awslabs.eks-mcp-server@latest"],
        "env": {
          "FASTMCP_LOG_LEVEL": "ERROR"
        },
        "autoApprove": [],
        "disabled": false
      }
    }
  }

For Windows:

  {
    "mcpServers": {
      "awslabs.eks-mcp-server": {
        "command": "uvx",
        "args": ["--from", "awslabs.eks-mcp-server@latest", "awslabs.eks-mcp-server.exe"],
        "env": {
          "FASTMCP_LOG_LEVEL": "ERROR"
        },
        "autoApprove": [],
        "disabled": false
      }
    }
  }

Verify your setup by running the /tools command in the Q Developer CLI to see the available EKS MCP tools.

Understanding Security Flags and Configurations 🔒

The EKS MCP Server comes with built-in configurable arguments and environment variables as safety switches:

The args field in your MCP server definition allows you to customize how the EKS MCP server runs by passing specific command-line arguments. These flags control permissions, security behavior, and how the server interacts with Kubernetes and AWS resources.

You can fine-tune the behavior of the EKS MCP server using environment variables defined under the env field. These variables control everything from logging verbosity to AWS authentication settings.

🔧 Common Command Arguments

--allow-write Flag

When the --allow-write flag is enabled, the EKS MCP Server can create, modify, and delete resources, and it can create missing IAM permissions for EKS resources through the add_inline_policy tool. Note that this tool:

  • Only creates new inline policies; it never modifies existing policies.
  • Is useful for automatically fixing common permissions issues with EKS clusters.
  • Should be used with caution and with properly scoped IAM roles.

For the flag itself:

  • What it does: Enables creation, modification, and deletion of resources

  • When to use: Development environments, trusted automation

  • When NOT to use: Production clusters without proper review processes

// Conservative approach (read-only)
"args": ["awslabs.eks-mcp-server@latest"]

// Development approach (with write access)
"args": ["awslabs.eks-mcp-server@latest", "--allow-write"]

--allow-sensitive-data-access Flag

Enables access to sensitive data such as logs, events, and Kubernetes Secrets.

  • Default: false (Access to sensitive data is restricted by default)
  • What it does: Allows access to logs, events, and secrets
  • When to use: Troubleshooting, monitoring, development
  • When NOT to use: Shared environments or when logs contain sensitive data

// Full access (use carefully)
"args": [
  "awslabs.eks-mcp-server@latest",
  "--allow-write",
  "--allow-sensitive-data-access"
]

Important Security Note: Exercise caution when --allow-write and --allow-sensitive-data-access are enabled together, as this combination grants significant privileges to the MCP server. Only enable these flags when necessary and in trusted environments. For production use, consider creating more restrictive custom policies.

⚙️ Common Environment Variables

Here's a sample configuration snippet:

{
  "mcpServers": {
    "awslabs.eks-mcp-server": {
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_PROFILE": "my-profile",
        "AWS_REGION": "us-west-2"
      }
    }
  }
}

🔊 FASTMCP_LOG_LEVEL (optional)

Controls the verbosity of logs produced by the MCP server.

  • Accepted values: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"
  • Default: "WARNING"
  • Use case: Set to "ERROR" in production to reduce noise; use "DEBUG" when troubleshooting.

📌 Example:

"FASTMCP_LOG_LEVEL": "ERROR"

🔐 AWS_PROFILE (optional)

Specifies which named AWS CLI profile to use when authenticating with AWS services.

  • Default: If not set, the server falls back to the default credentials provider chain (e.g., environment, EC2 metadata).
  • Use case: Ideal when running the server locally with multiple profiles configured.

📌 Example:

"AWS_PROFILE": "my-profile"

🌍 AWS_REGION (optional)

Defines the target AWS region where EKS clusters are located. All MCP operations will use this region context.

  • Default: If not provided, AWS SDK default behavior will apply (which may vary based on environment).
  • Use case: Ensure MCP commands and deployments run in the intended region, especially when managing clusters across multiple environments.

📌 Example:

"AWS_REGION": "us-west-2"

Best Practices for Safe MCP Usage

The "Production Safety" Checklist ✅

  • [ ] Start Read-Only: Always begin with read-only mode for evaluation
  • [ ] Environment Separation: Use different configurations for dev/staging/prod
  • [ ] Access Control: Apply least-privilege IAM policies
  • [ ] Audit Everything: Enable comprehensive logging
  • [ ] Regular Updates: Keep MCP server updated with security patches

The "Developer Happiness" Checklist 😊

  • [ ] Enable Write Mode: For development environments, enable --allow-write
  • [ ] Sensitive Data Access: Enable for troubleshooting capabilities
  • [ ] Auto-Approve: Consider enabling for trusted, repeated operations
  • [ ] Multiple MCP Servers: Combine EKS with other AWS MCP servers as needed
  • [ ] Custom Regions: Set appropriate AWS regions for your infrastructure

Quick Troubleshooting Guide

"It's Not Working!" - Common Issues and Solutions

Issue: MCP server shows as disconnected

# Check AWS credentials
aws sts get-caller-identity

# Verify Python and uv installation
python --version
uv --version

# Check MCP server logs
# (Look in your AI assistant's debug/log output)

Issue: Permission denied errors

# Verify IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn $(aws sts get-caller-identity --query Arn --output text) \
  --action-names eks:DescribeCluster \
  --resource-arns "*"

Issue: Cluster connection problems

# Update kubeconfig
aws eks update-kubeconfig --region us-west-2 --name my-cluster

# Test connectivity
kubectl cluster-info

Architecture and Visual Overview

How Everything Connects: The Big Picture

Imagine the EKS MCP Server as a universal translator that sits between your natural language requests and the complex world of AWS and Kubernetes APIs. Here's how the magic happens:

  • AI Assistant (e.g., Cursor) at the top
  • MCP Protocol layer
  • EKS MCP Server in the middle
  • AWS Services (EKS, IAM, CloudWatch, VPC) at the bottom
  • Bidirectional data flow between each layer, with security boundaries and encryption along the way

The Intelligence Behind the Simplicity

What you see: Simple conversation with your AI assistant
What's actually happening: A sophisticated orchestration of AWS services

Your Input: "Deploy my Python app to EKS"
    ↓
AI Processing: Understanding intent and context
    ↓
MCP Translation: Converting to specific tool calls
    ↓
AWS API Calls: Executing infrastructure operations
    ↓
Kubernetes Operations: Managing application deployments
    ↓
Real-time Feedback: Monitoring and reporting status
    ↓
Human-friendly Response: "Your app is deployed and healthy!"

The Tools Under the Hood

The EKS MCP Server comes packed with an impressive array of tools. Think of them as specialized functions that handle different aspects of cluster management to automate and simplify management of your Amazon EKS clusters and Kubernetes resources. Each tool performs a targeted operation and can be invoked as part of your workflow for provisioning, managing, observing, and troubleshooting infrastructure.

Cluster Management Tools 🏗️

  • manage_eks_stacks - Your cluster lifecycle manager. Automates lifecycle management of EKS CloudFormation stacks. Features:
    • Generate CloudFormation templates for EKS clusters.
    • Deploy clusters with all necessary components (VPCs, subnets, IAM roles, etc.).
    • Describe stack metadata, status, outputs.
    • Delete stacks and clean up associated resources.
    • Operates only on stacks originally created by this tool.

Parameters:

  • operation: generate, deploy, describe, or delete
  • template_file: required for generate/deploy
  • cluster_name: required for all operations

  • search_eks_troubleshoot_guide - Your troubleshooting expert. Searches the AWS EKS Troubleshoot Guide for relevant issue resolutions.

Features:

  • Provides solutions for common EKS issues (bootstrap, node autoscaling, etc.)
  • Suggests short-term fixes and long-term resolutions

Parameters:

  • query

Kubernetes Resource Tools ⚙️

  • manage_k8s_resource - Your Swiss Army knife for Kubernetes objects. Manages any Kubernetes resource directly.

Features:

  • Supports create, replace, patch, delete, and read
  • Works with both namespaced and non-namespaced resources

Parameters:

  • operation, cluster_name, kind, api_version, name
  • namespace (optional), body (for create/replace/patch)

  • list_k8s_resources - Your resource discovery tool. Lists resources by type in a Kubernetes cluster.

Features:

  • Filters by namespace, label, or field selectors
  • Outputs metadata for matched resources

Parameters:

  • cluster_name, kind, api_version
  • namespace, label_selector, field_selector (all optional)

  • apply_yaml - Your manifest deployment specialist. Applies multi-resource YAML manifests to a cluster.

Features:

  • Accepts multi-document YAML files
  • Applies all resources within a specified namespace
  • Can force updates to existing resources

Parameters:

  • yaml_path, cluster_name, namespace, force

  • list_api_versions - Your Kubernetes API reference. Lists all API versions available in a Kubernetes cluster.

Features:

  • Includes both core (v1) and grouped (apps/v1, etc.) APIs
  • Useful for compatibility checks and YAML generation

Parameters:

  • cluster_name

Application Support Tools 🚀

  • generate_app_manifest - Your deployment template generator. Generates basic Kubernetes manifests for your application.

Features:

  • Produces Deployment and Service YAML files
  • Configurable replicas, resources, load balancer, etc.

Parameters:

  • app_name, image_uri, output_dir
  • Optional: port, replicas, cpu, memory, namespace, load_balancer_scheme

  • get_pod_logs - Your application debugger. Retrieves logs from a specific pod.

Features:

  • Filter by time window, line count, or byte size
  • Supports logs from specific containers
  • Requires --allow-sensitive-data-access

Parameters:

  • cluster_name, pod_name, namespace
  • Optional: container_name, since_seconds, tail_lines, limit_bytes

  • get_k8s_events - Your event investigator. Fetches Kubernetes events for a resource.

Features:

  • Returns detailed info: timestamps, reasons, component, and type
  • Supports both namespaced and cluster-wide resources
  • Requires --allow-sensitive-data-access

Parameters:

  • cluster_name, kind, name
  • Optional: namespace

CloudWatch Integration Tools 📊

  • get_cloudwatch_logs - Your centralized logging assistant. Fetches CloudWatch logs for specific EKS resources.

Features:

  • Query logs by time, resource type, name, filter patterns
  • Supports both infrastructure and application logs
  • Requires --allow-sensitive-data-access

Parameters:

  • cluster_name, log_type, resource_type
  • Optional: resource_name, minutes, start_time, end_time, limit, filter_pattern, fields

  • get_cloudwatch_metrics - Your performance monitoring tool. Fetches CloudWatch metrics for your workloads.

Features:

  • Query by metric name, namespace, dimensions
  • Configure range, granularity, and statistic
  • Supports custom dimensions

Parameters:

  • cluster_name, metric_name, namespace, dimensions
  • Optional: minutes, start_time, end_time, limit, stat, period

  • get_eks_metrics_guidance - Lists recommended metrics and dimensions for various EKS resource types.

Features:

  • Covers supported types: cluster, node, pod, namespace, service
  • Outputs available metrics, descriptions, and dimension mappings

Parameters:

  • resource_type

Implementation Note:
Generated from AWS Container Insights metrics using:

  uv pip install bs4
  python /scripts/update_eks_cloudwatch_metrics_guidance.py

IAM Integration 🔐

  • get_policies_for_role - Retrieves policy details for an IAM role.

Features:

  • Includes assume role policy, managed policies, and inline policies

Parameters:

  • role_name

  • add_inline_policy - Attaches a new inline policy to an IAM role.

Features:

  • Prevents accidental overwrite of existing policies
  • Accepts JSON policy document or list of statements
  • Requires --allow-write

Parameters:

  • role_name, policy_name, permissions
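As a rough illustration, the add_inline_policy call that fixed the RDS permission issue in the earlier troubleshooting story might look like the sketch below. The role name and policy content are assumptions for that scenario, not output captured from the server.

// Illustrative only - role name and permissions are assumptions
{
  "method": "add_inline_policy",
  "params": {
    "role_name": "payment-service-irsa-role",
    "policy_name": "allow-rds-describe",
    "permissions": {
      "Effect": "Allow",
      "Action": ["rds:DescribeDBInstances"],
      "Resource": "*"
    }
  }
}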

The Smart Design Philosophy

Why Unified Tools Instead of Separate Functions?

Traditional approaches would create individual tools for every Kubernetes resource type (pods, services, deployments, etc.). This would quickly overwhelm the AI's context window. Instead, the EKS MCP Server uses a clever approach:

Instead of:
- create_pod_tool
- create_service_tool  
- create_deployment_tool
- update_pod_tool
- update_service_tool
- ... (50+ tools)

We have:
- manage_k8s_resource (handles all CRUD operations)
- list_k8s_resources (handles all resource discovery)
- apply_yaml (handles manifest deployment)

This design keeps the context window manageable while providing comprehensive functionality.
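The practical upshot is that a single call shape covers every resource kind. As a hedged example, scaling an existing Deployment becomes one manage_k8s_resource patch call, with kind and api_version selecting the resource type; the cluster and resource names below are illustrative.

// Illustrative only - one generic tool handles any Kubernetes kind
{
  "method": "manage_k8s_resource",
  "params": {
    "operation": "patch",
    "cluster_name": "my-test-cluster",
    "kind": "Deployment",
    "api_version": "apps/v1",
    "name": "web-app",
    "namespace": "production",
    "body": {
      "spec": { "replicas": 5 }
    }
  }
}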


Security and Governance: Balancing Power with Control

Understanding the Security Paradigm

When we talk about granting AI agents permissions to manage your cloud infrastructure, it's natural to have concerns. The AWS team has designed EKS MCP with a fundamental security principle in mind: MCP servers only have access to what you already have access to. They cannot magically access secrets from other accounts or perform actions beyond your existing permissions.

Think of it this way: the MCP server operates with the same level of access that you, as a developer, would have. It's essentially acting as an intelligent extension of your existing credentials, not as a privileged escalation tool.

Critical Security Considerations in Production

The Reality of AI-Powered Operations

During AWS's internal discussions, the team emphasized a crucial point: these tools are incredibly powerful, and that power requires responsibility. As any Spider-Man fan knows, with great power comes great responsibility. 🕷️💻 As one AWS engineer put it during their live demo: "We are in some ways making it more powerful for them, making it easier for them to deploy... but again, make sure you check, please. Vibe coding and AI tools can take you far, but if you're flying blind, you might also crash hard."

Production Environment Safeguards

The Golden Rule: When running MCP servers on production clusters, ⚠️🛑 always turn off auto-approvals for write operations. Here's why this matters:

In live demonstrations, AWS engineers showed scenarios where:

  • An incorrect API endpoint was automatically corrected ✅ (helpful)
  • But in another case, when an endpoint was wrong, the system also changed the container image, saying "maybe use another image," and patched the deployment ❌ (potentially dangerous)

Recommendation: Approve write operations one by one in production environments to maintain control over what gets deployed.

Data Protection and Privacy

Redacting Sensitive Information

One of the most significant security features being implemented is automatic redaction of PII and sensitive data. This includes:

  • Passwords and secret keys
  • API tokens and credentials
  • Personal identifiable information
  • Sensitive configuration data

This data is redacted from both logs and AI model outputs, addressing concerns about secure data being passed to LLMs.

IAM Integration and Best Practices

Principle of Least Privilege in Practice

The MCP server follows AWS security best practices through:

  1. Dedicated IAM roles designed specifically for MCP operations with minimal required permissions
  2. Separate roles for read-only versus write operations
  3. Resource tagging strategies to limit actions to MCP-managed resources
  4. Regular permission audits using IAM Access Analyzer to identify and remove unused permissions

Kubernetes RBAC: Your Safety Net

Remember that even with proper IAM permissions, Kubernetes API access must be correctly configured. The MCP server operates within the same RBAC constraints that govern your manual kubectl operations.

Operational Security: The Human Element

The Importance of Vigilance

As AWS's product manager candidly shared: "I'm not an engineer by trade... I'm not exactly sure all of the guidelines that I need to make sure that I check. Since I'm not an engineer, I don't know what I don't know."

This honest admission highlights a critical point: monitoring and vigilance are essential. Whether you're a pro or new to the Kubernetes world, always:

  • Review what's being deployed to your account
  • Understand the changes before approving them
  • Set up proper monitoring and alerting
  • Implement resource limits and quotas

Guardrails and Control Mechanisms

The MCP server includes several built-in safety features:

  • Resource validation before deploying infrastructure
  • Template verification to prevent arbitrary stack deletion
  • Allowlists and denylists for specific resources
  • Consent requirements for sensitive operations

Future Potential and AWS Vision: The Evolution of AI-Driven Infrastructure

Where We Are Today vs. Tomorrow

Currently, we're in a "supervised state" with AI integrations, as AWS calls it. As one AWS engineer noted: "We're not quite there yet for unsupervised agents just monitoring your clusters and making actions. It'll be some time before we fully trust agents."

But the trend is evident and the opportunities are vast.

Near-Term Evolution

Improved Remote Features

Obstacle: Some AI tooling doesn't work well with remote MCP hosts yet.

Solution: The industry is trending toward:

  • Improved remote MCP server design
  • Pre-defined best-practice templates
  • Automatic updates and maintenance
  • Enhanced reliability for distributed deployments

Agent-to-Agent Communication

One of the more promising directions is agent-to-agent communication. Imagine agents that can:

  • Communicate with one another without direct user action
  • Collaborate on complicated deployment scenarios
  • Discuss ideas and help each other troubleshoot issues
  • Keep audit trails of all inter-agent operations

The Open Question: What guardrails should you put in place so that agents can act on their own while you still get to review the final results?

Addressing the Context Window Problem

The Current Limitation

Today there is a practical limit on the number of MCP tools an IDE can work with at any one time. This becomes a problem when you need to pick the right tools for the job across many servers.

The Future Solution

AWS is exploring:

  • Dynamic tool switching: automatically selecting the correct MCP server for the current context
  • Smart tool routing: choosing the most appropriate tool based on the task at hand
  • Dev environments outside the IDE: spinning up an environment where agents can directly modify project files
  • Standardized interfaces: making MCP servers easier to interchange and more reliable

Final Thoughts: The Long-Term Value and the Path from Supervised to Autonomous

Today: AI Help That’s Monitored

  • AI suggests actions
  • Humans review and approve
  • Clear audit trails
  • Safety nets and guardrails

Tomorrow: Smart Autonomous Operations

  • Proactive monitoring of the cluster health
  • Self-healing infrastructure
  • Predictive issue resolution
  • Human oversight by exception, intervening only when needed

The Big Idea: Trust via Transparency

The journey to autonomy is not one of eliminating human overseers, but of designing AI systems so trustworthy, transparent and predictable that these overseers become strategic rather than simply tactical.

Industry Benefits: Better Practices, Accelerated Innovation

The Feedback Loop Effect

Early feedback to AWS has revealed that customers are adopting better practices when setting up clusters through MCP. This forms a positive feedback cycle:

  1. AI recommends the right moves → better practices take hold
  2. Better practices → better applications
  3. System reliability rises → more reliance on AI assistance
  4. More confidence → greater acceptance of automation

Innovation Acceleration

Developers spend less time on infrastructure complexity and more time on:

  • Business logic and functionality
  • User experience enhancements
  • Creative problem-solving
  • Quick prototyping and iteration

Challenges and Honest Reflection

The Summarization Challenge

As AWS engineers noted while testing: "When the LLM is trying to diagnose the problem, it is asking multiple things and trying to summarize the result. Sometimes the summarization isn't a match for what we intended to do."

Example: In an EKS Auto Mode investigation, the AI correctly figured out which policies were needed, but it initially wanted to attach them to the node role rather than the cluster role. On a second pass, it corrected itself.

The Challenge Ahead: Getting the balance of data for AI models right - enough context for accurate troubleshooting without clogging up the context window.

The Consistent Installation Problem

Current problem: Not all MCP servers install the same way, even with the same server configuration. The industry is heading toward standardization to make these interactions more predictable and reliable.

The Bigger Picture: Democratizing Cloud Expertise

The ultimate vision extends beyond just making Kubernetes easier. It's about:

  • Democratizing cloud expertise: Making advanced cloud capabilities accessible to developers regardless of their infrastructure background
  • Reducing the expertise gap: Helping junior developers learn through AI-guided practice
  • Improving security posture: Making security best practices the default, not the exception
  • Accelerating innovation: Removing infrastructure complexity as a barrier to creativity

The convergence of AI and cloud infrastructure management represents one of the most significant shifts in how we build and operate systems. Amazon EKS MCP is positioned at the forefront of this transformation, providing both the power to accelerate development and the guardrails to do so safely.


Conclusion

The Amazon EKS Model Context Protocol represents a transformative advancement in cloud-native development, fundamentally changing how developers interact with Kubernetes infrastructure. By bridging the gap between natural language and complex cluster operations, MCP democratizes access to enterprise-grade container orchestration while maintaining the security and operational excellence that AWS customers demand.

Key Benefits Realized

  1. Accelerated Development Cycles: Reducing deployment times from hours to minutes
  2. Lowered Barrier to Entry: Making Kubernetes accessible to developers of all skill levels
  3. Enhanced Operational Excellence: Integrating best practices into every interaction
  4. Improved Security Posture: Implementing security-by-default with granular controls
  5. Cost Optimization: Intelligent resource management reducing unnecessary expenses

Strategic Implications

The introduction of MCP signals AWS's commitment to AI-driven infrastructure management, positioning the platform for the next generation of cloud-native applications. Organizations adopting MCP early will gain competitive advantages through:

  • Faster Time-to-Market: Reduced complexity in deployment pipelines
  • Improved Developer Satisfaction: Focus on business logic rather than infrastructure management
  • Enhanced Reliability: AI-assisted troubleshooting and preventive maintenance
  • Future-Proof Architecture: Foundation for emerging AI and ML workloads

What's Next?

To explore Amazon EKS MCP in your environment:

  1. Start with Evaluation: Deploy MCP in read-only mode for risk-free exploration
  2. Pilot Project: Choose a non-critical application for initial testing
  3. Team Training: Invest in AI-assisted development practices
  4. Gradual Adoption: Expand usage based on success metrics and team confidence
  5. Community Engagement: Contribute feedback and use cases to shape future development

The convergence of artificial intelligence and cloud infrastructure management is no longer a future possibility—it's today's reality. Amazon EKS MCP provides the foundation for this transformation, enabling organizations to harness the full potential of AI-assisted development while maintaining the reliability, security, and scalability that modern applications demand.

