
I Connected 12 MCP Servers to Amazon Q. Here's What Broke

πŸ‘‹ Hey there, tech enthusiasts!

I'm Sarvar, a Cloud Architect with a passion for transforming complex technological challenges into elegant solutions. With extensive experience spanning Cloud Operations (AWS & Azure), Data Operations, Analytics, DevOps, and Generative AI, I've had the privilege of architecting solutions for global enterprises that drive real business impact. Through this article series, I'm excited to share practical insights, best practices, and hands-on experiences from my journey in the tech world. Whether you're a seasoned professional or just starting out, I aim to break down complex concepts into digestible pieces that you can apply in your projects.

Let's dive in and explore the fascinating world of cloud technology together! πŸš€


Written from experience building AI agent integrations for AWS infrastructure management. Your mileage may vary, but the principles hold across different use cases.


Do We Still Need MCP When We Have Agent Skills?

As a cloud architect working with AI agents, I've spent the last few months exploring how to extend their capabilities. Specifically, I've been working with Amazon Q Developer Pro - AWS's AI assistant that helps with coding, infrastructure management, and cloud operations through a chat interface.

This article shares what I learned about two different approaches to extending AI agents: Model Context Protocol (MCP) and Agent Skills. While the specific implementation details are still evolving, the architectural patterns I describe here represent where the ecosystem is heading based on current capabilities and community standards.

The Problem I Was Solving

I needed Amazon Q Developer Pro to help me manage AWS infrastructure across multiple accounts. The agent needed to:

  • Check CloudWatch metrics
  • Query RDS databases
  • Send alerts to Slack
  • Generate cost reports
  • Review CloudFormation templates

I tried two approaches, and each had serious problems.

Approach 1: Connecting Everything via MCP

MCP is a standard protocol that lets AI agents connect to external tools. Think of it like USB ports on your computer - one standard interface that works with many devices.

I connected 12 MCP servers to Amazon Q Developer Pro:

  • AWS CloudWatch server
  • RDS query server
  • Slack messaging server
  • Cost Explorer server
  • GitHub server
  • And seven more
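For context, connecting servers like these to Amazon Q is a matter of listing them in an MCP configuration file. The shape below follows the common `mcpServers` convention; the package names and commands are placeholders, so check each server's README and the Q CLI docs for the exact values:

```json
{
  "mcpServers": {
    "aws-cost-explorer": {
      "command": "uvx",
      "args": ["aws-cost-explorer-mcp-server"]
    },
    "slack-notifications": {
      "command": "npx",
      "args": ["-y", "slack-mcp-server"]
    }
  }
}
```

Every entry in this file loads its full tool list into the agent's context at startup, which is exactly where the trouble starts.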

What Went Wrong

Before I even asked a question, 35% of the context window was already full. Each MCP server loaded all its available functions into memory. The CloudWatch server alone exposed 15 different functions with detailed parameter descriptions.

When I asked Amazon Q Developer Pro to "check if our xyz database is healthy," it had to scan through multiple function definitions to figure out which ones to use. Sometimes it picked the wrong ones.

Every function call and response consumed more context. After three or four operations, I was running out of space for the actual conversation.

Approach 2: Using Agent Skills

Agent Skills are different. Instead of connecting to external tools, you give the agent domain knowledge - a guide on how to think about a problem.

I created a skill called "database-health-check" with a simple file structure:

database-health-check/
  SKILL.md          (when to use this, what steps to follow)
  scripts/
    check_rds.py    (Python script to query RDS)
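To make the structure concrete, here is a minimal sketch of what `check_rds.py` could look like. The threshold logic mirrors the SKILL.md shown next; the CloudWatch call is illustrative, and the metric key names are my own invention:

```python
"""Sketch of check_rds.py: pull RDS metrics and grade them against SKILL.md thresholds."""
import datetime

def evaluate(metrics: dict) -> dict:
    """Pure decision logic: map metric values to HEALTHY / WARN / CRITICAL."""
    issues = []
    if metrics.get("CPUUtilization", 0) > 80:
        issues.append(("WARN", "High CPU usage detected"))
    if metrics.get("ConnectionsPct", 0) > 90:
        issues.append(("CRITICAL", "Connection pool nearly exhausted"))
    if metrics.get("ReplicaLagMs", 0) > 1000:
        issues.append(("WARN", "Replication falling behind"))
    if metrics.get("StorageUsedPct", 0) > 85:
        issues.append(("CRITICAL", "Low storage space"))
    if any(level == "CRITICAL" for level, _ in issues):
        status = "CRITICAL"
    elif issues:
        status = "WARN"
    else:
        status = "HEALTHY"
    return {"status": status, "issues": [msg for _, msg in issues]}

def fetch_cpu(instance_id: str) -> float:
    """Fetch average CPU over the last 5 minutes from CloudWatch (illustrative)."""
    import boto3  # the undeclared dependency that broke on my colleague's machine
    cw = boto3.client("cloudwatch")
    now = datetime.datetime.utcnow()
    resp = cw.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=now - datetime.timedelta(minutes=5),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    points = resp["Datapoints"]
    return points[0]["Average"] if points else 0.0
```

Note the split: `evaluate()` is pure logic that runs anywhere, while `fetch_cpu()` drags in boto3 and AWS credentials. That split foreshadows the portability problem described below.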

The SKILL.md file contained:

# Database Health Check Skill

## Trigger
Keywords: "database health", "RDS status", "database performance", "DB issues"

## Process
1. Check CPU utilization (threshold: 80%)
2. Check active connections (threshold: 90% of max)
3. Check replication lag (threshold: 1000ms)
4. Check storage space (threshold: 85%)

## Decision Logic
- If CPU > 80%: WARN - "High CPU usage detected"
- If connections > 90%: CRITICAL - "Connection pool nearly exhausted"
- If replication lag > 1000ms: WARN - "Replication falling behind"
- If storage > 85%: CRITICAL - "Low storage space"

## Output Format
Status: [HEALTHY|WARN|CRITICAL]
Issues: [list of problems found]
Recommendations: [suggested actions]

What Went Wrong

The skill worked perfectly on my laptop. Then my colleague tried to use it and got an error: "ModuleNotFoundError: No module named 'boto3'".

The Python script needed boto3, pandas, and psycopg2. My machine had them installed. His didn't. We had no standard way to declare or install these dependencies.

The Real Difference

After working with both, I realized they solve different problems:

MCP answers: "What can I do?"
It provides capabilities - functions the agent can call. Each MCP server is self-contained with its own dependencies and environment.

Skills answer: "How should I think?"
They provide expertise - decision logic, quality standards, and workflows. But they run in whatever environment the agent has.

The Solution: Use Both Together

Here's what actually works in practice, using a real example from last week.

Example: Automated Cost Anomaly Detection

I needed Amazon Q Developer Pro to monitor our AWS costs and alert us when something unusual happens.

The MCP Layer (The Hands)

I set up two MCP servers:

  1. aws-cost-explorer-mcp - exposes functions like get_daily_costs(), get_service_breakdown()
  2. slack-notifications-mcp - exposes send_message(), create_incident()

Each server runs in its own Docker container with all dependencies installed. Amazon Q Developer Pro doesn't need to know about boto3 or the Slack SDK.

The Skill Layer (The Brain)

I created a cost-monitoring skill with this logic in SKILL.md:

Trigger: When user asks about cost anomalies or unusual spending

Process:
1. Get last 30 days of daily costs
2. Calculate average and standard deviation
3. Check if today's cost is more than 2 standard deviations above average
4. If yes, get service breakdown to identify which service spiked
5. If spike is over $500, send high-priority Slack alert
6. If spike is under $500, just report in conversation

Quality checks:
- Always compare against same day of week (Monday vs Monday)
- Exclude known scheduled events (monthly backups, etc.)
- Include percentage change, not just absolute numbers
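The statistical core of that process fits in a few lines of Python. This is a simplified sketch (it skips the weekday matching and scheduled-event exclusion from the quality checks), assuming the MCP server hands back daily costs as plain floats:

```python
"""Sketch of the anomaly check: flag today's spend if it sits more than 2 std devs above the 30-day mean."""
import statistics

def is_anomalous(daily_costs, today, sigmas=2.0):
    """daily_costs: last 30 days of spend; today: today's spend so far."""
    mean = statistics.mean(daily_costs)
    stdev = statistics.pstdev(daily_costs)
    threshold = mean + sigmas * stdev
    spike = today - mean
    return {
        "anomalous": today > threshold,
        "pct_change": round(100 * spike / mean, 1) if mean else 0.0,  # report % change, not just absolutes
        "high_priority": today > threshold and spike > 500,           # the $500 Slack-alert cutoff
    }
```

A $450 day against a quiet $100/day baseline trips `anomalous` but stays below the high-priority cutoff, so it gets reported in conversation rather than paged to Slack.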

How They Work Together

When I ask Amazon Q Developer Pro: "Are our AWS costs normal today?"

  1. Amazon Q Developer Pro matches the question to the cost-monitoring skill
  2. The skill loads its SKILL.md (only 2KB of context)
  3. The skill requires cost data, so Amazon Q Developer Pro connects to the aws-cost-explorer-mcp server
  4. The skill orchestrates: call get_daily_costs(), analyze the data, decide if it's anomalous
  5. If anomalous, the skill calls the slack-notifications-mcp server
  6. After completion, the skill unloads from memory

The MCP servers handle the technical execution. The skill handles the business logic and decision-making.

Why This Architecture Works

Context Efficiency
Only the active skill loads into memory. MCP tools load on-demand when the skill needs them. I went from 35% context consumed at startup to 5%.

Portability
The same skill works on my laptop, my colleague's Windows machine, and our CI/CD pipeline. The MCP servers can run locally, in containers, or as remote services.

Reusability
The aws-cost-explorer-mcp server is used by three different skills:

  • cost-monitoring (detects anomalies)
  • budget-planning (forecasts spending)
  • cost-optimization (finds savings opportunities)

Each skill brings different expertise, but they share the same data source.

Maintainability
When AWS changes their Cost Explorer API, I update one MCP server. All skills continue working. When business rules change (new alert thresholds), I update the skill. The MCP servers don't need to change.

Visual Overview

Here's how the three layers work together:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Agent Layer (Amazon Q Developer Pro)  β”‚
β”‚   Matches tasks β†’ Loads skills          β”‚
β”‚   Connects to MCP servers               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Skill Layer (The Brain)         β”‚
β”‚   cost-monitoring.SKILL.md              β”‚
β”‚   - When to trigger                     β”‚
β”‚   - Decision logic                      β”‚
β”‚   - Quality checks                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         MCP Layer (The Hands)           β”‚
β”‚   aws-cost-explorer-mcp                 β”‚
β”‚   slack-notifications-mcp               β”‚
β”‚   - Tool execution                      β”‚
β”‚   - Environment isolation               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

A Practical Pattern

Here's the decision framework I use:

Put it in an MCP server if:

  • Multiple skills will use it
  • It has heavy dependencies (Python libraries, system tools)
  • It needs credentials or secrets
  • It's a stable, reusable capability

Put it in a skill if:

  • It's domain knowledge or business logic
  • It orchestrates multiple tools
  • It has conditional decision-making
  • It's specific to one workflow

Keep it as a simple script if:

  • It's a one-time operation
  • The entire workflow is tightly coupled
  • Splitting it would add unnecessary complexity

What's Still Missing

The ecosystem isn't mature yet. Here's what I wish existed:

Dependency Declaration
Skills need a standard way to say "I need these capabilities" without hardcoding specific MCP servers. Something like:

requires:
  - cloud_metrics
  - notifications

Then the runtime figures out which MCP servers provide those capabilities.
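No runtime does this today as far as I know, but a resolver could be as simple as a mapping from abstract capabilities to installed servers. Everything here, registry included, is hypothetical:

```python
"""Hypothetical capability resolver: match a skill's 'requires' list to installed MCP servers."""

# What each installed server advertises (invented registry for illustration)
SERVER_CAPABILITIES = {
    "aws-cost-explorer-mcp": {"cloud_metrics", "cost_data"},
    "slack-notifications-mcp": {"notifications"},
}

def resolve(required):
    """Return {capability: server}, raising if any capability has no provider."""
    resolved = {}
    for cap in required:
        providers = [s for s, caps in SERVER_CAPABILITIES.items() if cap in caps]
        if not providers:
            raise LookupError(f"No MCP server provides capability: {cap}")
        resolved[cap] = providers[0]  # naive: first match wins
    return resolved
```

With something like this, a skill declaring `requires: [cloud_metrics, notifications]` would bind to whichever servers happen to be installed, instead of hardcoding server names.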

Dynamic Loading
Right now, when you connect an MCP server, all its tools load immediately. I want skills to control this: "Load only the cost analysis tools for this task."

Graceful Fallback
If an MCP server is unavailable, the skill should automatically fall back to a built-in script or tell me clearly what's missing.
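The fallback pattern itself is easy to sketch, assuming the skill can detect a failed tool call. The names are illustrative:

```python
"""Sketch of graceful fallback: try the MCP tool, fall back to a local script, or fail with a clear message."""

def with_fallback(primary, fallback, capability):
    """Call primary(); on failure call fallback(); if both fail, explain what's missing."""
    try:
        return primary()
    except Exception as mcp_error:
        try:
            return fallback()
        except Exception:
            raise RuntimeError(
                f"Capability '{capability}' unavailable: MCP server failed "
                f"({mcp_error}) and the local fallback also failed"
            )
```

What's missing is for the agent runtime to do this automatically, so skills don't each reinvent the wrapper.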

Conclusion

After six months of exploring these patterns with Amazon Q Developer Pro and other AI agents, here's what I know:

MCP and Agent Skills aren't competing approaches. They're complementary layers of the same system.

MCP gives your agent reliable, isolated capabilities. Skills give your agent the expertise to use those capabilities intelligently.

You need both. MCP without skills is a toolbox with no craftsman. Skills without MCP are expertise with unreliable tools.

The architecture that works:

  • Skills encode what to do and when
  • MCP servers provide how to do it
  • The agent runtime connects them together

This isn't just theoretical. These patterns are being implemented in actual environments, managing real AWS infrastructure.

Getting Started

If you want to explore this architecture:

For MCP:

  • Check the official MCP specification at modelcontextprotocol.io
  • Browse existing MCP servers at github.com/modelcontextprotocol/servers
  • Amazon Q Developer supports MCP through the Q CLI

For Agent Skills:

  • Review the Agent Skills specification by Anthropic
  • Start with simple skills that encode your team's operational knowledge
  • Focus on decision logic and workflows, not heavy computation

Next Steps:

  1. Identify one repetitive task you do with your AI agent
  2. Ask: Does this need external tools (MCP) or decision logic (Skill)?
  3. Build the simplest version that works
  4. Iterate based on what you learn

The ecosystem is still maturing, but the core patterns are solid. Start small, learn from errors, and build up from there.


πŸ“Œ Wrapping Up

Thank you for reading! I hope this article gave you practical insights and a clearer perspective on the topic.

Was this helpful?

  • ❀️ Like if it added value
  • πŸ¦„ Unicorn if you’re applying it today
  • πŸ’Ύ Save for your next optimization session
  • πŸ”„ Share with your team

Follow me for more on:

  • AWS architecture patterns
  • FinOps automation
  • Multi-account strategies
  • AI-driven DevOps

πŸ’‘ What’s Next

More deep dives coming soon on cloud operations, GenAI, agentic AI, DevOps, and data workflows. Follow for weekly insights.


🌐 Portfolio & Work

You can explore my full body of work, certifications, architecture projects, and technical articles here:

πŸ‘‰ Visit My Website


πŸ› οΈ Services I Offer

If you're looking for hands-on guidance or collaboration, I provide:

  • Cloud Architecture Consulting (AWS / Azure)
  • DevSecOps & Automation Design
  • FinOps Optimization Reviews
  • Technical Writing (Cloud, DevOps, GenAI)
  • Product & Architecture Reviews
  • Mentorship & 1:1 Technical Guidance

🀝 Let’s Connect

I’d love to hear your thoughts. Drop a comment or connect with me on LinkedIn.

For collaborations, consulting, or technical discussions, feel free to reach out directly at simplynadaf@gmail.com

Happy Learning πŸš€
