"Merge to main = auto deploy to production" sounds like a goal. In practice it's a disaster waiting to happen.
Production deployments need gates: manual approval from a human, smoke tests that verify the deploy actually worked, and a rollback path when it doesn't. Claude Code generates the entire pipeline design from your CLAUDE.md. Here's what that looks like.
## What You Tell Claude Code (CLAUDE.md)

```markdown
## Deployment Rules

- main branch = production source of truth
- Direct push to main is prohibited; PRs required
- PR CI must pass: lint + type-check + tests (80% coverage) + docker build + trivy security scan
- Staging deploys automatically on merge to main
- Production deploys require manual approval (GitHub Environment with required reviewers)
- Smoke tests run after every deploy (staging and production)
- Rollback procedure: redeploy previous image tag via workflow_dispatch

## Infrastructure

- Container registry: Amazon ECR
- Runtime: ECS Fargate (service: myapp-api, cluster: myapp-cluster)
- Health check endpoint: /health
- Readiness endpoint: /ready
- AWS region: ap-northeast-1
```
This tells Claude Code the exact constraints before it generates a single line of YAML.
## Pipeline 1: PR CI (ci.yml)

```yaml
name: CI

on:
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        ports:
          - 5432:5432  # map to the runner so localhost:5432 is reachable from the steps
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - name: Lint
        run: npm run lint
      - name: Type check
        run: npm run type-check
      - name: Test with coverage
        run: npm test -- --coverage --coverageThreshold='{"global":{"lines":80}}'
        env:
          DATABASE_URL: postgres://postgres:testpass@localhost:5432/testdb
      - name: Build Docker image
        run: docker build --target production -t myapp:${{ github.sha }} .
      - name: Security scan (Trivy)
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          exit-code: '1'
          severity: 'HIGH,CRITICAL'
```
The PR cannot merge until all five gates pass: lint, types, tests at 80% coverage, a successful Docker build targeting the production stage, and no HIGH or CRITICAL vulnerabilities from Trivy.
`exit-code: '1'` is critical. Without it, Trivy reports vulnerabilities but doesn't fail the job. You'd have a green CI with known critical CVEs.
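The same gate can be reproduced on a laptop before opening a PR. A sketch using the `trivy` CLI directly — `myapp:local` is an assumed tag, not part of the pipeline:

```shell
# Build the production stage locally, then scan it exactly as CI does.
# trivy exits non-zero when HIGH/CRITICAL findings exist, so this can
# gate a local script the same way exit-code: '1' gates the job.
docker build --target production -t myapp:local .
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:local
```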
## Pipeline 2: Staging Auto-Deploy (deploy-staging.yml)

```yaml
name: Deploy to Staging

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-northeast-1
      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push
        run: |
          IMAGE=${{ steps.ecr-login.outputs.registry }}/myapp:${{ github.sha }}
          LATEST=${{ steps.ecr-login.outputs.registry }}/myapp:staging-latest
          docker build --target production -t $IMAGE -t $LATEST .
          docker push $IMAGE
          # staging-latest is the tag the ECS task definition references;
          # without pushing it, --force-new-deployment would restart the old image
          docker push $LATEST
      - name: Deploy to ECS staging
        run: |
          aws ecs update-service --cluster myapp-staging-cluster --service myapp-api --force-new-deployment --region ap-northeast-1
      - name: Wait for stability
        run: |
          aws ecs wait services-stable --cluster myapp-staging-cluster --services myapp-api --region ap-northeast-1
      - name: Smoke tests
        run: |
          BASE=https://staging.myapp.example.com
          curl --fail --retry 3 --retry-delay 5 $BASE/health
          curl --fail --retry 3 --retry-delay 5 $BASE/ready
```
`aws ecs wait services-stable` blocks the job until ECS confirms the new task is running and healthy. Only then do smoke tests run.
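When the wait times out, the first question is what ECS is actually doing. A sketch using `describe-services` to inspect the rollout — cluster and service names match the staging job above:

```shell
# Show the state of each active deployment for the service. A stuck
# rollout typically shows the new deployment with runningCount below
# desiredCount; the service events give the reason (failed health
# checks, missing image, capacity, etc.).
aws ecs describe-services \
  --cluster myapp-staging-cluster --services myapp-api \
  --region ap-northeast-1 \
  --query 'services[0].deployments[*].{status:status,desired:desiredCount,running:runningCount}' \
  --output table
```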
Smoke tests are intentionally minimal: hit `/health`, hit `/ready`. If either returns non-200, the deployment is marked failed. You'll know before any user hits the new version.
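If you want the smoke step to be stricter than `curl --fail`, here is a minimal sketch of a helper that accepts only HTTP 200 — `check_status` and `run_smoke` are hypothetical names, not something the generated pipeline contains:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Accept only HTTP 200; anything else (including curl's 000 on a
# connection failure) counts as a failed smoke test.
check_status() {
  [ "$1" = "200" ]
}

# Probe each endpoint and report the first failure with its status code.
run_smoke() {
  local base=$1 path code
  for path in /health /ready; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --retry 3 --retry-delay 5 "$base$path" || echo 000)
    check_status "$code" || { echo "smoke failed: $path returned $code" >&2; return 1; }
  done
}
```

The payoff over plain `--fail` is the error message: the job log names the failing endpoint and the exact status code it returned.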
## Pipeline 3: Production Deploy with Manual Approval (deploy-production.yml)

```yaml
name: Deploy to Production

on:
  workflow_dispatch:
    inputs:
      image_tag:
        description: 'Image tag to deploy (e.g. abc1234)'
        required: true
        type: string

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # GitHub Environment with required reviewers
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-northeast-1
      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
      - name: Retag image as production-latest
        run: |
          REGISTRY=${{ steps.ecr-login.outputs.registry }}
          SOURCE=$REGISTRY/myapp:${{ inputs.image_tag }}
          TARGET=$REGISTRY/myapp:production-latest
          docker pull $SOURCE
          docker tag $SOURCE $TARGET
          docker push $TARGET
      - name: Deploy to ECS production
        run: |
          aws ecs update-service --cluster myapp-cluster --service myapp-api --force-new-deployment --region ap-northeast-1
      - name: Wait for stability
        run: |
          aws ecs wait services-stable --cluster myapp-cluster --services myapp-api --region ap-northeast-1
      - name: Smoke tests
        run: |
          BASE=https://api.myapp.example.com
          curl --fail --retry 5 --retry-delay 10 $BASE/health
          curl --fail --retry 5 --retry-delay 10 $BASE/ready
```
The `environment: production` line is what triggers manual approval. In GitHub repository settings, you configure the production environment with required reviewers. When this workflow runs, it pauses at the deploy job and sends a notification to reviewers. No one can bypass it.

`workflow_dispatch` with an `image_tag` input means you explicitly choose which commit goes to production. The staging deploy creates a SHA-tagged image; production deploys that specific SHA, not whatever is latest on main.
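For reference, the dispatch can also be triggered from the terminal. A sketch assuming the `gh` CLI is installed and authenticated, with `abc1234` standing in for a real SHA tag:

```shell
# Kick off the production workflow with an explicit image tag.
# The run still pauses at the environment's required-reviewer gate.
gh workflow run deploy-production.yml -f image_tag=abc1234

# Check the run's status; it shows as "waiting" until a reviewer approves.
gh run list --workflow=deploy-production.yml --limit 1
```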
## Rollback Procedure

Rollback is just another `workflow_dispatch` run:

1. Run "Deploy to Production"
2. Set `image_tag` to `<previous-working-sha>`
ECS will deploy the previous image. Because `aws ecs wait services-stable` blocks until healthy, you'll know rollback succeeded before the job completes. The previous `production-latest` tag gets overwritten with the stable version.
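Finding the previous working SHA usually means listing recent tags in the registry. A sketch using `aws ecr describe-images` — repository name and region match the pipelines above:

```shell
# Show the five most recently pushed images, oldest first; the tag
# just before the currently deployed SHA is the rollback target.
aws ecr describe-images \
  --repository-name myapp --region ap-northeast-1 \
  --query 'sort_by(imageDetails,&imagePushedAt)[-5:].[imageTags[0],imagePushedAt]' \
  --output table
```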
## What Claude Code Actually Generates From This

Claude Code reads the CLAUDE.md and produces:

- `ci.yml` with the correct service container setup for postgres, the Trivy action with `exit-code: 1`, and coverage thresholds wired into the test command
- `deploy-staging.yml` with the `wait services-stable` step and smoke tests
- `deploy-production.yml` with `workflow_dispatch`, the retag pattern, and the same smoke test structure
- A `DEPLOYMENT.md` explaining the rollback procedure for the team
The CLAUDE.md is why Claude Code gets this right the first time. Without specifying "manual approval required" and "smoke tests after each deploy," the generated pipeline would be a straight push-to-deploy with no gates.
## Summary

| Pipeline | Trigger | Gates |
|---|---|---|
| `ci.yml` | PR to main | lint + types + tests (80%) + docker build + trivy |
| `deploy-staging.yml` | Push to main | ECS stable wait + smoke tests |
| `deploy-production.yml` | Manual `workflow_dispatch` | Human approval + ECS stable wait + smoke tests |
Three pipelines. Clear separation of responsibilities. Production can only be deployed intentionally, with a specific image tag, after a human approves.
If you want to go deeper on structuring Claude Code prompts for infrastructure and CI/CD — including how to describe AWS environments in CLAUDE.md for consistent output — my Code Review Pack covers multi-environment deployment patterns.
Code Review Pack (¥980) — available at prompt-works.jp under /code-review
What does your production deployment gate look like? Manual approval, automated canary, or something else?