Most GenAI use cases today focus on product teams. Build a customer chatbot. Generate marketing copy. Develop a new product feature.
But DevOps, Site Reliability Engineering (SRE), and Cloud Center of Excellence (CCoE) teams have use cases too. Investigate an incident. Create a runbook. Generate cost optimization recommendations.
These are repetitive tasks that take time away from reliability improvements.
It's not that operations teams don't see the potential of GenAI. They're waiting for something useful — something that fits into their actual workflows, with code they can deploy and evaluate.
The gap is relevance, not readiness. What's missing is:
- Practical use cases matched to real operational tasks
- Deployable code samples that are production-ready
- Flexible patterns that can be customized
The GenAI for Ops Demo Library was created to address this.
Introducing the GenAI for Ops Demo Library
The GenAI for Ops Demo Library is a collection of deployable code samples that demonstrate how generative AI can solve real operational challenges across security, cost optimization, resilience, and automation use cases. You can deploy each demo as-is or customize it for your environment.
There are currently 12 available demos:
| Use Case | Demos |
|---|---|
| Security | AI-Powered Security Posture with Prowler + DevOps Agent, AI Incident Response Playbook Builder |
| Cost Optimization | AI-Powered Graviton Migration Assessment, AWS GenAI Cost Optimization Kiro Power |
| Operations Automation | AI-Powered Technical Documentation Generation, AI-Powered Legacy System Automation, AI Password Reset Chatbot, AWS Services Lifecycle Tracker, AI Lambda Runtime Migration Assistant |
| Observability | Intelligent EKS Incident Investigation with AWS DevOps Agent, Intelligent AWS Site-to-Site VPN Tunnel Investigation with AWS DevOps Agent |
| Resilience | Natural Language Chaos Engineering with AWS FIS |
Technical Stack
Each demo is built on AWS services and AI integration patterns familiar to operations teams:
- Amazon CloudWatch for metrics, logs, and alarms
- AWS Lambda for serverless compute
- Amazon Simple Notification Service (SNS) for event routing
- AWS Cloud Development Kit (CDK) for infrastructure as code
- Amazon Bedrock and Amazon Nova for foundation model access
- Amazon Bedrock AgentCore for multi-step AI orchestration
- Model Context Protocol (MCP) servers for standardized tool integration
Demo Structure
Each demo includes a deployment guide, a technical design document, one or more deployment scripts, and cost estimates with optimization tips.
To show how these demos work in practice, here's a walkthrough of one.
Example: Site-to-Site VPN Tunnel Investigation with AWS DevOps Agent
AWS Site-to-Site VPN tunnels fail for many reasons: pre-shared key mismatches, IKE proposal incompatibilities, dead-peer-detection timeouts, Border Gateway Protocol (BGP) session drops, route withdrawals, and throughput degradation. When a tunnel goes down at 2:00 AM, your on-call SRE has to read through CloudWatch metrics, VPN tunnel logs, and IPsec configuration to figure out what happened. That takes time and increases your Mean Time to Resolution (MTTR). This demo shows how AWS DevOps Agent autonomously triages these and other incidents, providing a root cause analysis and recommended actions for resolution.
Overview
The demo deploys a self-contained VPN environment and creates a DevOps Agent Space to investigate failures automatically.
When a tunnel fails or performance drops, DevOps Agent:
- Reads VPN tunnel logs from CloudWatch and correlates metrics across both tunnels
- Queries a self-contained MCP server for business context (service dependencies, cost impact, compliance status)
- Produces a root cause analysis (RCA) and detailed mitigation plan
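To make the "correlates metrics across both tunnels" step concrete, here is a minimal sketch of the idea in Python. It is not how DevOps Agent works internally; the `TunnelSeries` type, the three-sample window, and the result strings are assumptions made for illustration. The only AWS-specific fact it relies on is that the per-tunnel `TunnelState` metric reports 1 when a tunnel is up and 0 when it is down.

```python
from dataclasses import dataclass


@dataclass
class TunnelSeries:
    """Recent TunnelState samples for one tunnel (1 = up, 0 = down), oldest first."""
    outside_ip: str
    states: list[int]


def is_down(series: TunnelSeries, window: int = 3) -> bool:
    """Treat a tunnel as down when its last `window` samples are all 0."""
    recent = series.states[-window:]
    return len(recent) == window and all(s == 0 for s in recent)


def classify_connection(tunnel_a: TunnelSeries, tunnel_b: TunnelSeries) -> str:
    """Correlate both tunnels to separate a loss of redundancy from a full outage."""
    down = [t.outside_ip for t in (tunnel_a, tunnel_b) if is_down(t)]
    if len(down) == 2:
        return "outage: both tunnels down"
    if len(down) == 1:
        return f"degraded: tunnel {down[0]} down, traffic on the remaining tunnel"
    return "healthy: both tunnels up"
```

Looking at both tunnels together matters because a single tunnel going down usually means lost redundancy, while both going down at once points to a shared cause such as the customer gateway itself.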
Architecture
The demo has three layers:
Network layer
- An Amazon Virtual Private Cloud (VPC) (10.0.0.0/16) and a simulated on-premises VPC (172.16.0.0/16) linked by a Site-to-Site VPN with two IPsec tunnels
- An Amazon EC2 instance that acts as the customer gateway, running Libreswan for IPsec and GoBGP for BGP on Amazon Linux 2023
Monitoring layer
- CloudWatch alarms that monitor tunnel state, performance, and other failure conditions
- An SNS topic to trigger a Lambda function that sends a webhook to DevOps Agent
Intelligence layer
- A DevOps Agent Space for DevOps Agent to access resources and investigate VPN operational issues
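As a rough illustration of the monitoring layer, the sketch below builds the parameters for a tunnel-down alarm on the `TunnelState` metric in the `AWS/VPN` namespace. The demo itself provisions its alarms through its own deployment scripts, so treat this as an assumption-laden example rather than the demo's code: the function name, alarm naming scheme, period, and evaluation settings are all hypothetical.

```python
def build_tunnel_down_alarm(tunnel_ip: str, sns_topic_arn: str) -> dict:
    """Return keyword arguments suitable for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"vpn-tunnel-down-{tunnel_ip}",  # hypothetical naming scheme
        "Namespace": "AWS/VPN",
        "MetricName": "TunnelState",  # 1 = tunnel up, 0 = tunnel down
        "Dimensions": [{"Name": "TunnelIpAddress", "Value": tunnel_ip}],
        "Statistic": "Maximum",  # a maximum below 1 means the tunnel never came up in the period
        "Period": 60,  # seconds; assumed value
        "EvaluationPeriods": 3,  # assumed value: three consecutive down periods
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify the SNS topic on ALARM
    }
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Creating one alarm per tunnel outside IP address keeps the two tunnels individually observable.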
How it Works
Tunnel Fails / Performance Degrades
↓
CloudWatch Alarm Changes State
↓
SNS Notification Received
↓
Lambda Function Invoked
↓
DevOps Agent Investigation Starts
↓
Investigation Completes
→ Root Cause Identified
→ Remediation Plan Generated
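The SNS-to-Lambda-to-webhook hop in this flow can be sketched as follows. This is a minimal example, not the demo's actual function: the `WEBHOOK_URL` environment variable and the payload field names are hypothetical, and any authentication the webhook requires is left out. The SNS event shape and the CloudWatch alarm message fields (`AlarmName`, `NewStateValue`, `NewStateReason`, `StateChangeTime`, `Trigger`) are the standard ones.

```python
import json
import os
import urllib.request


def build_payload(sns_event: dict) -> dict:
    """Extract the CloudWatch alarm details from an SNS-triggered Lambda event."""
    alarm = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    trigger = alarm.get("Trigger", {})
    return {
        # Payload keys are hypothetical; match them to your webhook's schema.
        "title": alarm["AlarmName"],
        "state": alarm["NewStateValue"],
        "reason": alarm.get("NewStateReason", ""),
        "timestamp": alarm.get("StateChangeTime", ""),
        "metric": trigger.get("MetricName", ""),
        "dimensions": {d["name"]: d["value"] for d in trigger.get("Dimensions", [])},
    }


def handler(event, context):
    """Lambda entry point: forward only ALARM transitions to the webhook."""
    payload = build_payload(event)
    if payload["state"] != "ALARM":
        return {"forwarded": False}
    request = urllib.request.Request(
        os.environ["WEBHOOK_URL"],  # hypothetical environment variable
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"forwarded": True, "status": response.status}
```

Filtering on the `ALARM` state means a recovery notification (`OK`) does not start a second investigation for the same incident.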
Common Failure Scenarios
The demo includes 10 failure scenarios that you can inject and then watch DevOps Agent investigate:
IKE
- PSK mismatch (key rotation gone wrong)
- DPD timeout (firewall blocking IKE traffic)
- Proposal mismatch (incompatible DH group)
- Traffic selector mismatch (subnet change breaking BGP)
- Tunnel shutdown (customer gateway-initiated teardown)
BGP
- BGP daemon down
- ASN mismatch after maintenance
- Hold timer expired (blocked keepalives)
Other
- BGP route withdrawal (prefix no longer advertised)
- Throughput degradation (performance drops while tunnels stay up)
The Results
Faster incident resolution. Autonomous investigation of VPN failures and performance degradation reduces MTTR from hours to minutes.
Fewer repeat incidents. Targeted recommendations address incident root causes and strengthen VPN tunnel resilience.
Greater operational efficiency. Less time spent on repetitive investigations and more time spent on high-value work.
Cost Estimate
Each demo is built with the Cost Optimization pillar of the AWS Well-Architected Framework in mind, so running costs stay minimal.
| Resource | Hourly Cost |
|---|---|
| VPN connection (1.25 Gbps) | $0.05 |
| 2× t3.micro EC2 instances | $0.03 |
| 4× Public IPv4 addresses | $0.02 |
| 4× CloudWatch alarms | < $0.01 |
| Lambda, SNS, CloudWatch | < $0.01 |
| Total | ~$0.12/hour |
This specific demo is designed to be deployed, tested, and torn down. If left running continuously, it would cost an estimated ~$88 per month ($0.12 × 730 hours).
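For a quick check of this arithmetic, or to estimate the cost of a shorter test window, the table can be summed in a few lines of Python. The sketch rounds each "< $0.01" row up to $0.01, which is an assumption that yields an upper bound; the function name is hypothetical.

```python
from decimal import Decimal

# Hourly figures from the cost table; "< $0.01" rows are rounded up to $0.01.
HOURLY_COSTS = {
    "VPN connection": Decimal("0.05"),
    "2x t3.micro EC2 instances": Decimal("0.03"),
    "4x public IPv4 addresses": Decimal("0.02"),
    "4x CloudWatch alarms": Decimal("0.01"),
    "Lambda, SNS, CloudWatch": Decimal("0.01"),
}


def estimate_cost(hours: int) -> Decimal:
    """Upper-bound cost in USD for running the demo for the given number of hours."""
    return sum(HOURLY_COSTS.values(), Decimal("0")) * hours


print(estimate_cost(1))    # hourly total
print(estimate_cost(730))  # one month of continuous running
print(estimate_cost(3))    # a three-hour deploy, test, and teardown session
```

Using `Decimal` rather than floats keeps the cents exact when the hourly figures are added and multiplied.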
Get Started
- Explore: Browse the demo library and choose a demo that aligns with your use case
- Try: Deploy the demo in your AWS account
- Contribute: Submit a pull request with your demo
- Feedback: Take the quick survey and share your feedback