ClawGear

Posted on May 11

35 ChatGPT Prompts for DevOps Engineers (Incidents, Docs, and Architecture Faster)

#devops #chatgpt #productivity #programming

DevOps engineers sit at the intersection of code, infrastructure, reliability, and developer experience — which means the writing workload is substantial. Runbooks, incident post-mortems, architecture decision records, on-call handoff notes, pipeline documentation, SLO definitions, and stakeholder updates are all regular deliverables. And they all compete with the actual infrastructure work.

ChatGPT won't debug your Terraform config or read your Prometheus alerts. But it can cut the time you spend on the communication and documentation layer, the work that separates an ops team that's trusted from one that's always catching up.

These 35 prompts are organized around the actual work of DevOps: incident management, infrastructure documentation, CI/CD, observability, security, team communication, and career growth.

Security note: Never paste real credentials, API keys, internal hostnames, IP addresses, or sensitive infrastructure details into ChatGPT. Use placeholders and anonymized examples in all prompts.

1. Incident Management

Prompt 1 — Incident Command Script

Write a structured incident command script for an on-call engineer taking over a [P1 / P2] incident. The incident type: [service outage / database degradation / deployment failure / security event]. Include: how to establish command, what initial questions to ask, how to organize the response channel, when to escalate, and how to structure status updates. Format as a repeatable playbook.

Prompt 2 — Status Update Template

Create a status update template for use during an active incident. The update should communicate: current status (investigating / identified / mitigating / resolved), what we know, what we're doing, customer impact, and next update time. Produce two versions: one for an internal Slack channel and one for an external status page. Keep both under 150 words.

Prompt 3 — Post-Mortem Report

Write a blameless post-mortem for an incident where [describe what happened, using generic terms]. Include sections: incident summary, timeline (I'll add specifics), root cause analysis, customer impact, what went well, what went wrong (process/tooling gaps), action items (with owners and due dates), and lessons learned. Format: follow the Google SRE post-mortem style.

Prompt 4 — Incident Timeline Reconstruction

I have these rough notes from an incident: [paste anonymized notes in chronological order]. Convert them into a clean incident timeline with: timestamp, event description, who took action, and the observable impact at each step. Flag any time gaps where we have unclear coverage.
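
The gap-flagging step in Prompt 4 can also be sanity-checked locally before or after you use the prompt. The following is a minimal sketch, assuming your notes are already reduced to (timestamp, description) pairs; the function name `find_gaps`, the sample events, and the 10-minute threshold are illustrative assumptions, not part of the prompt.

```python
from datetime import datetime, timedelta

def find_gaps(events, max_gap=timedelta(minutes=10)):
    """Return (start, end, duration) for consecutive events further apart than max_gap.

    events: list of (iso_timestamp, description) tuples, in any order.
    The 10-minute default is an arbitrary assumption; tune it to your incident pace.
    """
    parsed = sorted((datetime.fromisoformat(ts), desc) for ts, desc in events)
    gaps = []
    for (prev_ts, _), (next_ts, _) in zip(parsed, parsed[1:]):
        if next_ts - prev_ts > max_gap:
            gaps.append((prev_ts, next_ts, next_ts - prev_ts))
    return gaps

# Hypothetical, anonymized notes
notes = [
    ("2024-01-01T10:00:00", "Alert fired for [web-service]"),
    ("2024-01-01T10:04:00", "On-call acknowledged"),
    ("2024-01-01T10:31:00", "Rollback started"),
]

for start, end, duration in find_gaps(notes):
    print(f"Gap of {duration} between {start:%H:%M} and {end:%H:%M}")
```

With the sample notes, the only flagged gap is the 27 minutes between the acknowledgement and the rollback, which is exactly the kind of stretch worth asking participants about.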

Prompt 5 — Escalation Decision Criteria

Write a decision guide for on-call engineers on when to escalate an incident. Cover: criteria for escalating from Tier 1 to Tier 2, criteria for engaging a second service team, when to wake up a manager or executive, and how to structure an escalation message. Include a quick-reference checklist format.

Prompt 6 — On-Call Handoff Notes

Write an on-call handoff notes template for shift transition. Include: current open incidents (status, severity, next actions), known flaky alerts to watch for this week, recent deployments that might cause issues, any ongoing maintenance windows, and escalation contacts. Format: readable in 3 minutes by the incoming on-call engineer.

2. Infrastructure Documentation

Prompt 7 — Architecture Decision Record (ADR)

Write an Architecture Decision Record for this infrastructure decision: [describe the decision]. Include: context (why this decision needed to be made), decision drivers, options considered (at least 2, with pros/cons), decision made, rationale, consequences (positive and negative), and status. Format: standard ADR markdown.

Prompt 8 — Runbook: Service Recovery

Write a runbook for recovering [service name — use placeholder] when [failure scenario: e.g., the service is down / database connections are exhausted / the message queue is backing up]. Include: prerequisites, detection steps, investigation commands (generic placeholders), recovery steps in order, verification that recovery was successful, and rollback procedure if recovery makes things worse.

Prompt 9 — Infrastructure Component Overview

Write a documentation overview for a [component: load balancer / message queue / caching layer / service mesh / API gateway] in our infrastructure. Include: what it does, why we use it, how it fits into the overall architecture, key configuration parameters to know, how to monitor it, and how to troubleshoot common issues. Audience: new team member onboarding.

Prompt 10 — Disaster Recovery Plan Section

Write the [section: backup strategy / recovery procedures / RTO/RPO definitions / communication plan] section of a disaster recovery plan for a [web application / database / data pipeline]. Include: specific steps, responsible parties, decision trees, and success verification criteria. I'll review against our actual environment before finalizing.

Prompt 11 — Terraform Module Documentation

Write documentation for a Terraform module that [describe what the module does]. Include: module purpose, inputs (name, type, description, default, required), outputs, usage example (use generic placeholder values), important notes and gotchas, and dependencies. Format: standard Terraform module README structure.

3. CI/CD and Deployment

Prompt 12 — Deployment Checklist

Create a deployment checklist for deploying [service type] to [production / staging]. Include: pre-deployment checks (tests passing, review complete, documentation updated, feature flags configured), deployment steps (in order), post-deployment verification steps, rollback trigger criteria, and rollback procedure. Format: a step-by-step checklist with the responsible party noted for each item.

Prompt 13 — Pipeline Documentation

Write documentation for a CI/CD pipeline for [application type]. The pipeline stages are: [list stages: build / test / scan / deploy / notify]. For each stage: describe what it does, what tools are used (generic), what constitutes a failure, and what happens when a stage fails. Include a pipeline diagram in text/ASCII format.

Prompt 14 — Release Notes Template

Create a release notes template for engineering team releases. Include sections for: version and release date, what's new (features), bug fixes, breaking changes (with migration steps), infrastructure changes, deprecations, and known issues. Make it easy for engineers to fill out quickly and for stakeholders to scan.

Prompt 15 — Feature Flag Strategy Document

Write a feature flag strategy document for our team. Cover: when to use feature flags vs. branching, naming conventions, flag lifecycle (creation, rollout, cleanup), who can change flags in production, how to monitor flag state, and flag hygiene policies. This will be added to our engineering handbook.

4. Observability and Monitoring

Prompt 16 — SLO Definition Document

Write a Service Level Objective definition document for [service]. Include: service description, user journey being measured, SLI (what we measure), SLO target (what we commit to), error budget calculation, how we measure it (tooling placeholder), reporting cadence, and what happens when the error budget is exhausted. Follow Google SRE SLO conventions.
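
It helps to have the error budget arithmetic in hand before reviewing whatever ChatGPT drafts for this section. The sketch below assumes a time-based availability SLO over a rolling window; request-based SLIs use the same idea with request counts instead of minutes. The function names and example numbers are illustrative assumptions.

```python
def error_budget_minutes(slo_target_pct, window_days=30):
    """Allowed downtime in minutes for an availability SLO over the window."""
    if not 0 < slo_target_pct < 100:
        raise ValueError("SLO target must be strictly between 0 and 100 percent")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_target_pct) / 100

def budget_remaining_pct(slo_target_pct, downtime_minutes, window_days=30):
    """Percentage of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target_pct, window_days)
    return 100 * (budget - downtime_minutes) / budget

budget = error_budget_minutes(99.9)
print(f"Budget: {budget:.1f} min per 30 days")
print(f"Remaining after 10.8 min of downtime: {budget_remaining_pct(99.9, 10.8):.0f}%")
```

For a 99.9% target over 30 days the budget works out to about 43.2 minutes, so 10.8 minutes of downtime leaves roughly 75% of the budget.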

Prompt 17 — Alert Triage Guide

Create an alert triage guide for the [alert name / type] alert in our monitoring system. Include: what this alert means, common causes (ranked by frequency), initial investigation steps, how to determine severity, resolution paths for each common cause, and when to escalate. Format as a decision tree or numbered investigation flow.

Prompt 18 — Dashboard Documentation

Write documentation for a monitoring dashboard for [service / system]. Include: the purpose of the dashboard, who should use it, what each panel shows and why it matters, what normal looks like vs. signs of trouble, and how to use the dashboard during an incident. Audience: on-call engineers and team leads.

Prompt 19 — Capacity Planning Report

Write a capacity planning report section for [service / infrastructure component]. Include: current usage vs. capacity, growth trend (based on data I'll provide), projected time to capacity at current growth rate, recommended action (scale up / optimize / migrate), and cost implications. Audience: engineering leadership.
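
The "projected time to capacity" figure in this prompt is worth computing yourself rather than leaving to the model. A minimal sketch, assuming compound month-over-month growth (linear growth would need a different formula); `months_to_capacity` and the sample numbers are hypothetical.

```python
import math

def months_to_capacity(current_usage, capacity, monthly_growth_rate):
    """Months until usage reaches capacity under compound monthly growth.

    Solves current_usage * (1 + rate) ** months = capacity for months.
    Returns 0.0 if already at or over capacity, None if usage is not growing.
    """
    if current_usage <= 0 or capacity <= 0:
        raise ValueError("usage and capacity must be positive")
    if current_usage >= capacity:
        return 0.0
    if monthly_growth_rate <= 0:
        return None
    return math.log(capacity / current_usage) / math.log(1 + monthly_growth_rate)

# Hypothetical figures: 600 GB used of 1000 GB, growing 5% per month
months = months_to_capacity(600, 1000, 0.05)
print(f"Estimated months to capacity: {months:.1f}")
```

Feed the resulting number into the prompt as data, and let ChatGPT handle the narrative and recommendation framing around it.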

Prompt 20 — Observability Improvement Proposal

Write a proposal for improving observability for [service / system]. Current gaps: [describe what's missing — no distributed tracing / poor log structure / missing metrics / no SLOs]. Proposed improvements: [describe]. Include: the problem (what incidents or issues we can't diagnose well now), the solution, implementation phases, and expected outcome.

5. Security and Compliance

Prompt 21 — Security Incident Response Playbook

Write a security incident response playbook for [scenario: credential exposure / unauthorized access detected / dependency vulnerability / data exfiltration alert]. Include: detection signals, immediate containment steps, investigation checklist, notification requirements (security team / legal / exec), evidence preservation, remediation steps, and post-incident review.

Prompt 22 — Vulnerability Remediation Communication

Write an internal communication to the engineering team about a [high / critical] severity vulnerability in [dependency / infrastructure component — use generic name]. Include: what the vulnerability is, which systems are affected, the risk if left unpatched, the remediation plan (timeline and owner), and what developers need to do (if anything). Tone: clear and urgent without causing panic.

Prompt 23 — Access Review Documentation

Create a documentation template for a quarterly access review for [system/service]. Include: scope of the review, how to pull current access list, criteria for revoking access (role change, inactivity, departure), documentation of decisions, sign-off requirements, and audit trail. This will be used by the security or platform team.

Prompt 24 — Secret Rotation Runbook

Write a runbook for rotating [credential type: API key / database password / TLS certificate / service account key] for [service placeholder]. Include: when to rotate (scheduled, triggered by incident, or compromised), pre-rotation checklist, rotation steps (generic), verification steps, rollback procedure if the rotation breaks something, and post-rotation documentation.

6. Developer Experience and Team Communication

Prompt 25 — Engineering Handbook Section

Write an engineering handbook section on [topic: our deployment process / on-call expectations / how we use Slack / incident severity definitions / how to request infrastructure changes]. This will live in our internal wiki. Audience: all engineers, including new hires. Keep it practical, current, and easy to update over time.

Prompt 26 — Tech Debt Proposal

Write a proposal for addressing technical debt in [area: deployment pipeline / monitoring stack / infrastructure provisioning / legacy service]. Current pain: [describe]. Proposed solution: [describe]. Include: business case (what this costs us now in engineer time or risk), proposed approach, effort estimate (rough), and what "done" looks like. Audience: engineering leadership.

Prompt 27 — Infrastructure Cost Report Narrative

Write a monthly infrastructure cost report narrative for engineering leadership. Current spend: [total, by major category — use placeholders]. Trend vs. prior month: [up/down/flat by X%]. Biggest drivers: [describe]. Cost optimization opportunities identified: [list]. Recommended actions: [list]. Keep it to one page — decision-ready.
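
The trend and "biggest drivers" placeholders in this prompt are simple arithmetic you can prepare ahead of time. A minimal sketch, assuming you have per-category spend for two months as plain dictionaries; the category names and dollar amounts are made up for illustration.

```python
def cost_trend(prior, current):
    """Return (total change in percent, categories sorted by absolute change)."""
    total_prior = sum(prior.values())
    total_current = sum(current.values())
    change_pct = 100 * (total_current - total_prior) / total_prior
    categories = set(prior) | set(current)
    drivers = sorted(
        ((name, current.get(name, 0) - prior.get(name, 0)) for name in categories),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return change_pct, drivers

# Hypothetical monthly spend by category, in dollars
prior = {"compute": 1000, "storage": 400, "network": 100}
current = {"compute": 1200, "storage": 370, "network": 110}

change_pct, drivers = cost_trend(prior, current)
print(f"Total spend changed by {change_pct:+.1f}%")
for name, delta in drivers:
    print(f"  {name}: {delta:+d}")
```

In this example total spend is up 12%, driven mostly by compute, with storage slightly down.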

Prompt 28 — New Engineer Onboarding Guide: Infrastructure

Write an infrastructure onboarding guide for a new software engineer joining the team. Cover: overview of our infrastructure architecture (brief), how to get access to key systems, key tools they'll use (generic list), how deployments work (our workflow), where to find runbooks and documentation, and who to ask for help with different areas. Readable in 15 minutes.

Prompt 29 — Weekly Platform Update

Write a weekly platform / infrastructure update to share with the engineering team. Items to cover: [list: planned maintenance, recent incidents and resolutions, upcoming changes that affect developers, metrics highlights, new tooling or process changes]. Format: scannable in 2 minutes. Tone: informative and collegial.

7. Career and Professional Growth

Prompt 30 — DevOps Interview Prep

I'm interviewing for a [DevOps Engineer / SRE / Platform Engineer / Infrastructure Engineer] role at a [company type]. Generate 10 technical interview questions covering: CI/CD, infrastructure as code, observability, incident management, and cloud platforms. Add 5 behavioral/situational questions. For each, give me a framework for answering — not a scripted response.

Prompt 31 — Certification Study Plan

Create a 12-week study plan for the [AWS Solutions Architect / CKA / GCP Professional Cloud Architect / HashiCorp Terraform Associate / other] certification. Include: week-by-week topic breakdown, primary study resources (official docs + practice exams), hands-on lab focus areas, and a final-week review strategy.

Prompt 32 — Conference Talk Abstract

Write a talk abstract for a DevOps or SRE conference. Topic: [describe your experience — a tool you built / an incident you learned from / an architectural decision you made]. Include: title (under 10 words), 150-word abstract, 3 key takeaways, and target audience (beginner / intermediate / advanced DevOps/SRE). Avoid buzzwords.

Prompt 33 — Brag Document / Achievements Summary

Help me write a brag document for performance review season. My accomplishments this period: [list rough bullets — incidents resolved, automation built, cost savings achieved, reliability improvements, etc.]. For each: reframe it in terms of business impact (reduced MTTR by X%, saved $Y, enabled Z teams to ship faster). Tone: confident and specific.

Prompt 34 — Engineering Blog Post Intro

Write the opening 200 words for an engineering blog post about [topic: how we reduced deployment time / how we rebuilt our monitoring / how we handled a major incident / what we learned from a postmortem]. The intro should: hook the reader with the problem or challenge, establish why it matters, and tease the solution. Tone: direct and technical, but accessible.

Prompt 35 — Platform Team Charter

Write a platform team charter for a DevOps / SRE / platform engineering team. Include: team mission statement, what we own (services, systems, processes), what we don't own, our key stakeholders, how we take requests, our operational commitments (SLOs for the platform team itself), and our guiding principles. This will be shared with all engineering teams.

Getting the Most From These Prompts

Sanitize infrastructure details. Real hostnames, IPs, credentials, internal service names, and account IDs should never go into ChatGPT. Use placeholder names like [web-service], [db-host], and [env: production]. The output will be just as useful.
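
Part of this sanitizing can be scripted as a first pass before you paste anything. The sketch below is an assumption-heavy example, not a complete scrubber: the patterns cover IPv4 addresses, hosts under a hypothetical `internal.example.com` domain, and simple `key=value` secrets. Anything it misses still needs a manual read-through.

```python
import re

# Patterns are illustrative assumptions; extend them for your own environment.
PATTERNS = [
    # IPv4-looking addresses
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[ip-address]"),
    # Hosts under a hypothetical internal domain
    (re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.internal\.example\.com\b"), "[internal-host]"),
    # key=value or key: value style secrets
    (re.compile(r"(?i)\b(password|token|api[_-]?key|secret)\s*[=:]\s*\S+"), r"\1=[redacted]"),
]

def sanitize(text):
    """Replace likely-sensitive substrings with bracketed placeholders."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

raw = "db01.internal.example.com (10.2.3.4) refused login, password=hunter2"
print(sanitize(raw))
# [internal-host] ([ip-address]) refused login, password=[redacted]
```

Regex redaction is best treated as a safety net. Account IDs, customer names, and internal service names rarely follow a pattern, so review the text yourself before sending it.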

Use it for documentation velocity. The highest-leverage use in DevOps is documentation: runbooks, ADRs, post-mortems, and handoff notes. These are the things that most ops engineers deprioritize — and the things that cause the most pain 6 months later. Spend 15 minutes writing rough notes, then let ChatGPT turn them into polished docs.

Combine with your actual context. The more specific your input (what technology, what scale, what failure mode), the more useful the output. "Write a runbook for database connection pool exhaustion" is better than "write a runbook for a database issue."

Iterate on technical accuracy. ChatGPT will give you a solid structure and reasonable content, but you'll need to verify technical details (command syntax, configuration specifics, tool versions) against your actual environment and documentation.