Alexander Pazik

Posted on Jun 26

GenAI Isn't Just for Product Teams

#ai #aws #devops

Most GenAI use cases today focus on product teams. Build a customer chatbot. Generate marketing copy. Develop a new product feature.

But DevOps, Site Reliability Engineering (SRE), and Cloud Center of Excellence (CCoE) teams have use cases too. Investigate an incident. Create a runbook. Generate cost optimization recommendations.

These are repetitive tasks that take time away from reliability improvements.

It's not that operations teams don't see the potential of GenAI. They're waiting for something useful: something that fits into their actual workflows, with code they can deploy and evaluate.

The gap is relevance, not readiness. What's missing is:

Practical use cases matched to real operational tasks
Deployable code samples that are production-ready
Flexible patterns that can be customized

The GenAI for Ops Demo Library was created to address this.

Introducing the GenAI for Ops Demo Library

The GenAI for Ops Demo Library is a collection of deployable code samples that demonstrate how generative AI can solve real operational challenges across security, cost optimization, resilience, and automation use cases. You can deploy each demo as-is or customize it for your environment.

There are currently 12 available demos:

Use Case	Demos
Security	AI-Powered Security Posture with Prowler + DevOps Agent, AI Incident Response Playbook Builder
Cost Optimization	AI-Powered Graviton Migration Assessment, AWS GenAI Cost Optimization Kiro Power
Operations Automation	AI-Powered Technical Documentation Generation, AI-Powered Legacy System Automation, AI Password Reset Chatbot, AWS Services Lifecycle Tracker, AI Lambda Runtime Migration Assistant
Observability	Intelligent EKS Incident Investigation with Amazon DevOps Agent, Intelligent AWS Site-to-Site VPN Tunnel Investigation with Amazon DevOps Agent
Resilience	Natural Language Chaos Engineering with AWS FIS

Technical Stack

Each demo is built on AWS services and AI integration patterns familiar to operations teams:

Amazon CloudWatch for metrics, logs, and alarms
AWS Lambda for serverless compute
Amazon Simple Notification Service (SNS) for event routing
AWS Cloud Development Kit (CDK) for infrastructure as code
Amazon Bedrock and Amazon Nova for foundation model access
Amazon Bedrock AgentCore for multi-step AI orchestration
Model Context Protocol (MCP) servers for standardized tool integration

Demo Structure

Each demo includes a deployment guide, a technical design document, one or more deployment scripts, and cost estimates with optimization tips.

To show how these demos work in practice, here's a walkthrough of one.

Example: Site-to-Site VPN Tunnel Investigation with AWS DevOps Agent

AWS Site-to-Site VPN tunnels fail for many reasons: pre-shared key mismatches, IKE proposal incompatibilities, dead-peer-detection timeouts, Border Gateway Protocol (BGP) session drops, route withdrawals, and throughput degradation. When a tunnel goes down at 2:00 AM, your on-call SRE has to read through CloudWatch metrics, VPN tunnel logs, and the IPsec configuration to figure out what happened. That takes time and drives up your Mean Time to Resolution (MTTR). This demo shows how AWS DevOps Agent autonomously triages these and other incidents, providing a root cause analysis and recommended actions for resolution.

Overview

The demo deploys a self-contained VPN environment and creates a DevOps Agent Space to investigate failures automatically.

When a tunnel fails or performance drops, DevOps Agent:

Reads VPN tunnel logs from CloudWatch and correlates metrics across both tunnels
Queries a self-contained MCP server for business context (service dependencies, cost impact, compliance status)
Produces a root cause analysis (RCA) and detailed mitigation plan

Architecture

The demo has three layers:

Network layer

An Amazon Virtual Private Cloud (VPC) (10.0.0.0/16) and a simulated on-premises VPC (172.16.0.0/16) linked by a Site-to-Site VPN with two IPsec tunnels
A customer gateway hosted on an Amazon EC2 instance (Amazon Linux 2023), running Libreswan for IPsec and GoBGP for BGP

Monitoring layer

CloudWatch alarms that monitor tunnel state, tunnel performance, and other failure signals
An SNS topic to trigger a Lambda function that sends a webhook to DevOps Agent
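As a rough illustration of the monitoring layer, the sketch below builds the parameters for a CloudWatch alarm on the `TunnelState` metric in the `AWS/VPN` namespace and routes alarm actions to an SNS topic. The alarm name, period, evaluation window, tunnel IP address, and topic ARN are assumptions made for this example; the alarms the demo actually deploys may be defined differently.

```python
from typing import Any


def tunnel_state_alarm(tunnel_ip: str, sns_topic_arn: str) -> dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    All thresholds and names here are illustrative assumptions,
    not values taken from the demo.
    """
    return {
        "AlarmName": f"vpn-tunnel-down-{tunnel_ip}",  # hypothetical naming scheme
        "Namespace": "AWS/VPN",
        "MetricName": "TunnelState",  # 1 means the tunnel is up, 0 means it is not
        "Dimensions": [{"Name": "TunnelIpAddress", "Value": tunnel_ip}],
        "Statistic": "Maximum",
        "Period": 300,  # seconds; assumed evaluation granularity
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # assume missing data means trouble
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(tunnel_ip: str, sns_topic_arn: str) -> None:
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(
        **tunnel_state_alarm(tunnel_ip, sns_topic_arn)
    )
```

Keeping the parameter builder separate from the API call makes the alarm definition easy to review and unit test before anything is created in an account.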

Intelligence layer

A DevOps Agent Space that gives DevOps Agent access to the resources it needs to investigate VPN operational issues

How It Works

Tunnel Fails / Performance Degrades
             ↓
  CloudWatch Alarm Changes State
             ↓
    SNS Notification Received
             ↓
     Lambda Function Invoked
             ↓
DevOps Agent Investigation Starts
             ↓
     Investigation Completes
     → Root Cause Identified
     → Remediation Plan Generated
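The Lambda step in this flow could look roughly like the sketch below: it unwraps the CloudWatch alarm notification from the SNS event and forwards a summary to a webhook. The environment variable name `AGENT_WEBHOOK_URL`, the payload shape, and the absence of authentication are all assumptions for illustration; the real DevOps Agent webhook may expect a different payload and signing scheme, so treat the demo's deployment guide as the source of truth.

```python
import json
import os
import urllib.request
from typing import Any


def summarize_alarm(sns_record: dict[str, Any]) -> dict[str, Any]:
    """Extract key fields from a CloudWatch alarm notification delivered via SNS."""
    alarm = json.loads(sns_record["Sns"]["Message"])
    return {
        "alarm_name": alarm["AlarmName"],
        "new_state": alarm["NewStateValue"],
        "reason": alarm["NewStateReason"],
        "changed_at": alarm["StateChangeTime"],
    }


def handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    """Forward ALARM-state notifications to the webhook; ignore other states."""
    webhook_url = os.environ["AGENT_WEBHOOK_URL"]  # hypothetical variable name
    forwarded = 0
    for record in event.get("Records", []):
        summary = summarize_alarm(record)
        if summary["new_state"] != "ALARM":
            continue  # only start an investigation when the alarm fires
        request = urllib.request.Request(
            webhook_url,
            data=json.dumps(summary).encode("utf-8"),  # assumed payload shape
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
        forwarded += 1
    return {"forwarded": forwarded}
```

Filtering on the `ALARM` state keeps recovery notifications (`OK`) from starting a second investigation for the same incident.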

Common Failure Scenarios

The demo includes 10 failure scenarios that you can inject to watch DevOps Agent investigate:

IKE

PSK mismatch (key rotation gone wrong)
DPD timeout (firewall blocking IKE traffic)
Proposal mismatch (incompatible DH group)
Traffic selector mismatch (subnet change breaking BGP)
Tunnel shutdown (customer gateway-initiated teardown)

BGP

BGP daemon down
ASN mismatch after maintenance
Hold timer expired (blocked keepalives)

Other

BGP route withdrawal (prefix no longer advertised)
Throughput degradation (performance drops while tunnels stay up)
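If you plan to work through every scenario in a test session, a small catalog like the one below can act as a checklist. The grouping mirrors the list above; the data structure and helper function are hypothetical and not part of the demo, and the injection commands themselves live in the demo's own scripts.

```python
# Scenario names grouped as in the list above; the structure itself is illustrative.
SCENARIOS: dict[str, list[str]] = {
    "IKE": [
        "PSK mismatch",
        "DPD timeout",
        "Proposal mismatch",
        "Traffic selector mismatch",
        "Tunnel shutdown",
    ],
    "BGP": [
        "BGP daemon down",
        "ASN mismatch after maintenance",
        "Hold timer expired",
    ],
    "Other": [
        "BGP route withdrawal",
        "Throughput degradation",
    ],
}


def checklist() -> list[tuple[str, str]]:
    """Flatten the catalog into (category, scenario) pairs in listed order."""
    return [(category, name) for category, names in SCENARIOS.items() for name in names]


if __name__ == "__main__":
    for number, (category, name) in enumerate(checklist(), start=1):
        print(f"{number:2d}. [{category}] {name}")
```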

The Results

Faster incident resolution. Autonomous investigation of VPN failures and performance degradation reduces MTTR from hours to minutes.

Fewer repeat incidents. Targeted recommendations address incident root causes and strengthen VPN tunnel resilience.

Greater operational efficiency. Less time spent on repetitive investigations and more time spent on high-value work.

Cost Estimate

Each demo is built with the Cost Optimization pillar of the AWS Well-Architected Framework in mind, so running costs stay minimal.

Resource	Hourly Cost
VPN connection (1.25 Gbps)	$0.05
2× t3.micro EC2 instances	$0.03
4× Public IPv4 addresses	$0.02
4× CloudWatch alarms	< $0.01
Lambda, SNS, CloudWatch	< $0.01
Total	~$0.12/hour

This specific demo is designed to be deployed, tested, and torn down. If left running continuously, it is estimated to cost ~$88/month ($0.12 × 730 hours).
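To compare a short test session with leaving the demo running, the arithmetic can be sketched as follows. The hourly rate and the 730-hour month come from the estimate above; the three-hour session length is an assumption chosen for illustration.

```python
HOURLY_TOTAL_USD = 0.12  # approximate total from the cost table
HOURS_PER_MONTH = 730    # month length used in the estimate above


def running_cost(hours: float, hourly_rate: float = HOURLY_TOTAL_USD) -> float:
    """Estimated cost in USD of keeping the demo deployed for `hours`."""
    return round(hours * hourly_rate, 2)


if __name__ == "__main__":
    print(running_cost(3))                # assumed 3-hour test session: 0.36
    print(running_cost(HOURS_PER_MONTH))  # left running all month: 87.6
```

The gap between roughly $0.36 for a short session and roughly $88 for a month is the reason to tear the demo down once testing is finished.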

Get Started

Explore: Browse the demo library and choose a demo that aligns with your use case
Try: Deploy the demo in your AWS account
Contribute: Submit a pull request with your demo
Feedback: Take the quick survey to share what worked and what didn't
