Cursor Rules for DevOps: Infrastructure, CI/CD, and Container Rules That Ship
Ask Cursor for a Dockerfile and you get FROM node:latest, COPY . . before the install, CMD npm start running as root. Ask for a Terraform module and you get aws_s3_bucket with no versioning, no encryption, no block-public-access. Ask for a GitHub Actions workflow and you get uses: actions/checkout@v4 (unpinned), with: { token: ${{ secrets.PAT }} } exposed to every job, and permissions: write-all at the top. Ask for a Kubernetes Deployment and you get no resource limits, no liveness probe, and imagePullPolicy: Always pointing at the :latest tag.
Every one of those defaults is what pages someone at 3am.
DevOps is the discipline where mistakes don't fail locally — they fail in production, under load, with a blast radius. AI assistants trained on a decade of public infrastructure code have seen every anti-pattern, and they will happily reproduce them. Seven rules below for Terraform, Docker, Kubernetes, GitHub Actions, and monitoring — each with the failure mode and the fix.
How Cursor Rules Work for Infrastructure Code
Use .cursor/rules/*.mdc with globs targeting infra file patterns:
```
.cursor/rules/
  terraform.mdc       # globs: ["**/*.tf", "**/*.tfvars"]
  docker.mdc          # globs: ["**/Dockerfile*", "**/docker-compose*.yml"]
  kubernetes.mdc      # globs: ["k8s/**/*.yaml", "helm/**/*.yaml"]
  github-actions.mdc  # globs: [".github/workflows/*.yml"]
  monitoring.mdc      # globs: ["**/prometheus*.yml", "**/alerts/*.yaml"]
```
alwaysApply: false for all — these should fire only when the relevant files are open. Now the rules.
Rule 1: Terraform — State, Backends, and No local State
The default Terraform quick-start stores state in terraform.tfstate next to your .tf files. Commit it by accident, and anyone who clones the repo sees every secret, every resource ID, every aws_access_key. Don't commit it, and two engineers run terraform apply on different machines and overwrite each other's changes.
The rule:
```
- Remote backend required. S3 with DynamoDB locking for AWS; GCS for
  GCP (state locking is built in); Azure Blob (lease-based locking)
  for Azure. No `local` backend in committed configs, ever.
- Backend config NEVER contains secrets. Use `-backend-config=...`
  at init time, or environment variables.
- State files are encrypted at rest (SSE-KMS) and access-logged.
- Every resource has a `tags` block (owner, environment, cost-center).
- Destructive changes (delete, replace) require a separate PR labeled
  `destructive:`.
```
Before: backend "local" in committed code, terraform apply from a dev laptop.
After: S3 backend with DynamoDB lock, CI is the only thing that runs apply, and a failing terraform plan blocks the PR.
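A minimal backend block matching the rule might look like this — the bucket, key, and lock-table names are hypothetical:

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-tfstate-prod"      # hypothetical bucket: versioning + SSE-KMS on
    key            = "network/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"         # hypothetical table with "LockID" hash key
    encrypt        = true
  }
}
```

Anything environment-specific (bucket name, role ARN) can stay out of the file entirely and be supplied with `terraform init -backend-config=...`, which also keeps the committed block free of secrets.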
Rule 2: Docker — Multi-Stage, Non-Root, Pinned, Small
The five Docker antipatterns Cursor reproduces by default: FROM <lang>:latest, single-stage builds with dev dependencies in production, running as root, COPY . . before install (so the layer cache invalidates on every code change), and no HEALTHCHECK.
The rule:
```
- FROM pins a digest or a specific minor version:
    FROM node:20.11.1-alpine@sha256:abc...
    NOT: FROM node:latest or FROM node:20
- Multi-stage: `builder` has compilers/dev-deps, `runtime` is minimal.
  Final stage is distroless, -alpine, or -slim.
- `USER app` (or a numeric UID ≥ 10000) before CMD. Never run as root.
- .dockerignore excludes .git, node_modules, .env, tests, docs.
- COPY only what's needed; never `COPY . .` into the runtime stage.
- HEALTHCHECK and STOPSIGNAL SIGTERM on every service image.
- Image is scanned (Trivy/Grype) and signed (cosign) in CI.
```
A production Node image following this pattern is 80–150 MB. The FROM node:latest single-stage equivalent is 1.4 GB. The small one deploys faster, has a smaller attack surface, and costs less to store.
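A sketch of a Node image in that shape — the build output path, port, and health endpoint are assumptions, and in real use both FROM lines would also pin a digest:

```dockerfile
# build stage: compilers and dev deps live here only
FROM node:20.11.1-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./   # deps first, so the cache survives code edits
RUN npm ci
COPY . .                                  # fine here — this stage never ships
RUN npm run build

# runtime stage: minimal, non-root
FROM node:20.11.1-alpine
WORKDIR /app
RUN addgroup -S app && adduser -S -u 10001 -G app app
COPY --from=builder --chown=app:app /app/dist ./dist
COPY --from=builder --chown=app:app /app/node_modules ./node_modules
USER app
STOPSIGNAL SIGTERM
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -qO- http://localhost:3000/healthz || exit 1
CMD ["node", "dist/server.js"]
```

The `COPY . .` lives only in the builder stage; the runtime stage copies exactly two directories, owned by the non-root user.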
Rule 3: Kubernetes — Resources, Probes, and Non-Root Everywhere
Cursor writes Kubernetes manifests with no resources.limits, no probes, and securityContext: {}. The pod runs, schedules anywhere, and OOMs under load; kubelet has no way to know it's unhealthy; the container runs as UID 0 with every Linux capability.
The rule:
```
Every Deployment / StatefulSet / DaemonSet manifest has:

  resources:
    requests: { cpu: "100m", memory: "128Mi" }   # scheduling
    limits:   { cpu: "500m", memory: "512Mi" }   # cgroup cap
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities: { drop: [ALL] }
  livenessProbe:  { httpGet: { path: /healthz, port: http }, periodSeconds: 10 }
  readinessProbe: { httpGet: { path: /ready,   port: http }, periodSeconds: 5 }

No `image: myapp:latest` — always a pinned digest or SHA tag.
Every namespace has a NetworkPolicy: default deny + explicit allow.
```
Without limits, one runaway pod can exhaust a node's memory and get its neighbors evicted. Without probes, rolling updates route traffic to pods that aren't ready. Without a non-root securityContext, a container escape is root on the host.
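The default-deny half of that last line is a single manifest per namespace; every allowed flow is then an explicit policy on top. The namespace, labels, and port below are hypothetical:

```yaml
# default deny: selects every pod, allows nothing
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: myapp            # hypothetical namespace
spec:
  podSelector: {}             # empty selector = all pods in the namespace
  policyTypes: [Ingress, Egress]
---
# explicit allow: ingress to the app only from the ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-app
  namespace: myapp
spec:
  podSelector:
    matchLabels: { app: myapp }
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: ingress-nginx }
      ports:
        - port: 8080
```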
Rule 4: GitHub Actions — Pinned by SHA, Least Privilege, No PAT Sprawl
Every uses: actions/checkout@v4 is "whatever commit they decide to tag as v4 tomorrow." Every permissions: write-all gives every job write access to everything — code, packages, deployments. Every PAT in secrets is a long-lived credential with no expiry and no audit trail.
The rule:
```
- Third-party actions are pinned by commit SHA, not tag:
    uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
    NOT: uses: actions/checkout@v4
- `permissions:` declared at workflow OR job level; default to `contents: read`.
  Only grant write to the specific job that needs it.
- NEVER `permissions: write-all`.
- Use `id-token: write` + OIDC federation to AWS/GCP/Azure. No long-lived
  PATs. No access keys in `secrets`.
- Every `run:` step with untrusted input (issue titles, PR bodies)
  uses an env var, not `${{ ... }}` inline (script injection risk).
```
Injection example Cursor will reproduce:
```yaml
# BAD — ${{ github.event.issue.title }} is interpolated into the shell command
- run: echo "New issue: ${{ github.event.issue.title }}"

# GOOD — passed via env, the shell sees a literal string
- env:
    TITLE: ${{ github.event.issue.title }}
  run: echo "New issue: $TITLE"
```
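The OIDC pattern from the rule, sketched for AWS — the role ARN and account ID are placeholders, and the credentials action would also be SHA-pinned in a real workflow:

```yaml
name: deploy
on:
  push:
    branches: [main]

permissions:
  contents: read          # workflow-wide default: read-only

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write     # lets this job (only) mint a short-lived OIDC token
      contents: read
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: aws-actions/configure-aws-credentials@v4   # pin by SHA in real use
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy   # hypothetical role
          aws-region: us-east-1
      # subsequent aws CLI calls run with short-lived, role-scoped credentials
```

No secret ever lives in the repo: the trust is a one-time IAM role policy pointing at the repo's OIDC claims, and the credentials expire with the job.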
Rule 5: Infrastructure as Code Review — Plan Output in Every PR
Infra PRs merged without reviewing the plan are how you get "oh, it also recreated the database." The diff in .tf doesn't tell you the blast radius — only terraform plan does, and only for the exact current state.
The rule:
```
Every infra PR has `terraform plan` (or `helm diff`, `kubectl diff`)
output posted as a comment by CI:
  - Added resources (green).
  - Changed resources (yellow).
  - Replaced / destroyed resources (red — requires extra review).

PRs with red output require:
  1. A second reviewer.
  2. A PR description explaining the blast radius.
  3. A rollback plan.

Plan runs in a CI job with read-only credentials; apply runs
separately with elevated privileges, only after merge + manual
approval.
```
Never terraform apply from a laptop. Apply = CI only, after merge, with a human approval gate on destructive changes.
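One way to wire the plan comment, sketched with GitHub Actions — the backend-config file name is hypothetical, and the read-only cloud credentials (e.g. via OIDC as in Rule 4) are omitted for brevity:

```yaml
plan:
  runs-on: ubuntu-latest
  permissions:
    contents: read
    pull-requests: write          # only this job may comment on the PR
  steps:
    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
    - uses: hashicorp/setup-terraform@v3    # pin by SHA in real use
    - run: terraform init -backend-config=ci.s3.tfbackend   # hypothetical file
    - run: terraform plan -no-color 2>&1 | tee plan.txt
    - run: gh pr comment "$PR" --body-file plan.txt   # gh ships on GitHub runners
      env:
        GITHUB_TOKEN: ${{ github.token }}
        PR: ${{ github.event.pull_request.number }}
```

The apply job lives in a separate workflow gated on an environment with required reviewers, so elevated credentials never touch the PR-triggered job.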
Rule 6: Monitoring — Alerts on Symptoms, Not Causes
Cursor writes Prometheus rules like alert: HighCPU expr: cpu_usage > 0.8. That's an alert on a cause, not a symptom. CPU at 80% doesn't matter if the service is meeting its SLO. Users care about latency and errors, not CPU.
The rule:
```
Alerts fire on user-facing symptoms:
  - Latency: p95 response time > SLO for 5 min.
  - Errors: error rate > SLO for 5 min.
  - Saturation: queue length / dropped messages (the thing that
    will cause the symptom if it continues).

Alerts NEVER fire on:
  - CPU/memory utilization (capacity-plan, don't page).
  - Disk > 80% alone (the rate of fill matters, not the snapshot).
  - Pod restart count alone (restarts are normal).

Every alert has a runbook link in its annotations:
  annotations:
    runbook: "https://runbooks.internal/<alert-name>"
    summary: "p95 latency {{ $value }}s exceeds 500ms SLO"
```
The 3am page should be actionable. "CPU high" isn't — it tells you nothing about what to do.
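A symptom alert in that shape — the histogram metric name and `service` label are assumptions about how the app is instrumented:

```yaml
groups:
  - name: slo-symptoms
    rules:
      - alert: HighP95Latency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          ) > 0.5
        for: 5m                      # sustained, not a blip
        labels:
          severity: page
        annotations:
          runbook: "https://runbooks.internal/HighP95Latency"
          summary: "p95 latency {{ $value }}s exceeds 500ms SLO for {{ $labels.service }}"
```

The `for: 5m` clause is what separates "page a human" from "a single slow scrape."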
Rule 7: Rollback First, Deploy Second
Every deployment has a rollback plan. Cursor writes CD pipelines that deploy forward but have no scripted way back. When the deploy breaks prod, someone improvises under pressure.
The rule:
```
Every deployment pipeline has a matching rollback pipeline, tested:
  - Kubernetes: `kubectl rollout undo deployment/<name>` works because
    the previous ReplicaSet is still there (revisionHistoryLimit >= 5).
  - Terraform: previous state is versioned in S3; rollback = apply
    the prior state snapshot, reviewed like any other apply.
  - Docker/serverless: the previous image tag is always deployable;
    aliases/traffic-splitting allow 0→100% rollback in one command.

Before any release is marked "done," the rollback has been dry-run
at least once in a staging environment.
```
The test isn't "can we deploy?" The test is "can we un-deploy in 60 seconds?"
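The Kubernetes half of the rule is mostly one field plus one command; the Deployment fragment below is a sketch (the name is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  revisionHistoryLimit: 5   # keep old ReplicaSets so `rollout undo` has a target
  # ...rest of the spec...
```

Then `kubectl rollout undo deployment/myapp` (or `--to-revision=N` for a specific snapshot) is the scripted way back — and it only works if those old ReplicaSets were never garbage-collected.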
The DevOps Cursor Setup — Quick Start
.cursor/rules/devops-baseline.mdc:
```
---
description: DevOps baseline applied to infra, CI, and container files.
globs:
  - "**/*.tf"
  - "**/Dockerfile*"
  - "k8s/**/*.yaml"
  - ".github/workflows/*.yml"
alwaysApply: false
---

# Non-negotiables (security + reliability)
- Never commit secrets, PATs, keys, or state files.
- Pin everything: images by digest, actions by SHA, modules by version.
- Least privilege by default: IAM, GH permissions, K8s RBAC.
- Non-root containers, read-only rootfs, dropped capabilities.
- Resource requests AND limits on every workload.
- Remote Terraform state with lock + encryption.

# Operability
- Every service has liveness, readiness, and a /healthz endpoint.
- Every alert fires on symptoms, has a runbook, and is actionable.
- Every deployment has a tested rollback.
```
That's the spine. Cursor now writes Docker images you'd put in production, Terraform you'd actually apply, Kubernetes manifests that survive a node failure, and GitHub Actions workflows that don't leak your AWS keys to a third-party action's next release.
Want the full DevOps pack?
We maintain a Cursor Rules pack with production-ready rules for Terraform, Docker, Kubernetes, Helm, GitHub Actions, Ansible, and Prometheus — every rule tested, pinned, and scoped so your AI-written infra looks like it came from an SRE, not from Stack Overflow circa 2018.