augusthottie
I Built a Full AWS CI/CD Pipeline with Blue/Green Deployments, Here's Everything I Learned

A hands-on walkthrough of building an end-to-end AWS-native CI/CD pipeline using CodeCommit, CodeBuild, CodeDeploy, and CodePipeline with zero-downtime blue/green deployments.

Most DevOps tutorials stop at "set up a GitHub Actions workflow." That's fine, but if you're preparing for the AWS DevOps Professional exam or interviewing for AWS-heavy roles, you need to know the AWS-native CI/CD stack inside and out.

So I built a full pipeline from scratch! No GitHub Actions, no Jenkins, no third-party tools. Just AWS services talking to each other, ending with zero-downtime blue/green deployments.

This post walks through the entire build, the problems I ran into, and what I'd do differently.

What I Built

A simple Node.js/Express API deployed through a fully automated pipeline:

CodeCommit → CodeBuild → Manual Approval → CodeDeploy (Blue/Green)

Every git push to main triggers the pipeline. CodeBuild installs dependencies and runs tests using Bun, then waits for manual approval via an SNS email notification. Once approved, CodeDeploy spins up a fresh Auto Scaling Group, deploys the new version, validates health checks through an ALB, shifts traffic, and terminates the old instances.

Zero downtime. Fully automated after approval.

Why AWS-Native Instead of GitHub Actions?

I already had several projects using GitHub Actions. That's great, but it only tells half the story. AWS has its own CI/CD ecosystem: CodeCommit, CodeBuild, CodeDeploy, and CodePipeline. Companies running on AWS often use these services because they integrate tightly with IAM, VPCs, and the rest of their infrastructure.

Understanding both gives you range. And for the AWS DevOps Professional exam, these services are tested heavily.

The Application

I kept the app intentionally simple: the pipeline is the project, not the app. It's an Express API with four endpoints:

  • GET /health: returns a 200 with status info (used by the ALB and CodeDeploy for validation)
  • GET /: welcome message with available endpoints
  • GET /info: app version, uptime, memory usage
  • GET /deploy-info: deployment metadata from CodeDeploy environment variables

The /health endpoint is the most important one. It's what the ALB target group uses for health checks, and it's what the validate_service.sh lifecycle script hits after deployment to confirm everything is working. If it fails, CodeDeploy rolls back automatically.
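To make that concrete, here's a minimal sketch of what a `validate_service.sh` lifecycle script can look like. The port, path, and retry counts are assumptions for illustration, not my exact script:

```shell
#!/bin/bash
# validate_service.sh -- minimal sketch (port, path, and retry counts are assumptions).
# CodeDeploy marks the deployment failed (and triggers rollback) if this hook exits non-zero.

# Retry a command until it succeeds or attempts run out.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "${RETRY_DELAY:-5}"
    i=$((i + 1))
  done
  return 1
}

# Expect HTTP 200 from the app's health endpoint (port 3000 assumed).
check_health() {
  code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/health)
  [ "$code" = "200" ]
}

# In the real hook, the last line would be something like:
#   retry 30 check_health
```

The retry loop matters: the Node process usually needs a few seconds to bind the port after ApplicationStart, so a single immediate curl would produce false failures.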

Setting Up the Pipeline

CodeCommit

Nothing fancy here: create a repo, push your code. I used git-remote-codecommit for authentication instead of HTTPS Git credentials because it uses your existing AWS CLI credentials and doesn't require generating separate passwords.

```shell
pip install git-remote-codecommit
git remote add origin codecommit::us-east-1://aws-pipeline-demo
git push origin main
```

CodeBuild with Bun

CodeBuild uses a buildspec.yml file: think of it as a GitHub Actions workflow, but for AWS. Mine installs Bun, runs bun install --frozen-lockfile, executes tests with bun test, and packages the artifact to S3.

```yaml
version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 18
    commands:
      - curl -fsSL https://bun.sh/install | bash
      - export BUN_INSTALL="$HOME/.bun"
      - export PATH="$BUN_INSTALL/bin:$PATH"

  pre_build:
    commands:
      - export BUN_INSTALL="$HOME/.bun"
      - export PATH="$BUN_INSTALL/bin:$PATH"
      - bun install --frozen-lockfile
      - bun test

  build:
    commands:
      - export BUN_INSTALL="$HOME/.bun"
      - export PATH="$BUN_INSTALL/bin:$PATH"
      - echo "Build started on $(date)"
      - export APP_VERSION=$(bun -e "console.log(require('./package.json').version)")-$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | head -c 8)
      - echo "App version $APP_VERSION"

# Package everything for CodeDeploy; CodePipeline uploads this to S3
artifacts:
  files:
    - '**/*'
```

One thing that caught me early: bun install --frozen-lockfile requires a bun.lockb file in the repo. If you forget to commit it, the build fails with no helpful error message. Run bun install locally first and push the lockfile.

I also enabled S3 caching for node_modules and the Bun binary directory. The first build downloads everything, but subsequent builds skip the install step entirely when dependencies haven't changed. The actual build commands run in about 7 seconds; the rest of the ~4-minute build time is CodeBuild provisioning its container, which is unavoidable with on-demand compute.
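The cache configuration in buildspec.yml looks roughly like this (the exact paths are assumptions; the cache type itself is set to S3 on the CodeBuild project, not in the buildspec):

```yaml
cache:
  paths:
    - 'node_modules/**/*'
    - '/root/.bun/**/*'
```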

The Approval Gate

Between build and deploy, I added a manual approval stage with SNS notifications. When the build passes, CodePipeline sends an email asking you to approve or reject the deployment.

This is common in production pipelines: you don't always want every passing build to go straight to production. The approval stage gives you a checkpoint to review changes, run additional validation, or coordinate deployment timing.

One gotcha: the sns:Publish permission needs to be on the CodePipeline service role, not the CodeBuild role. The approval action is triggered by CodePipeline, not CodeBuild. I initially added it to the wrong role and spent time debugging why emails weren't sending.
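Concretely, the statement that has to live on the CodePipeline service role looks something like this (the topic ARN is a placeholder):

```json
{
  "Effect": "Allow",
  "Action": "sns:Publish",
  "Resource": "arn:aws:sns:us-east-1:111122223333:pipeline-approvals"
}
```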

Blue/Green Deployments (The Core of the Project)

This is the part I'm most proud of. In-place deployments are simpler, but blue/green is what production environments use when downtime isn't acceptable.

How It Works

CodeDeploy uses an appspec.yml file that defines where files go and which scripts to run at each lifecycle stage:

```yaml
version: 0.0
os: linux
files:
  - source: /
    destination: /home/ec2-user/app  # example install path, adjust to your setup
hooks:
  ApplicationStop:
    - location: scripts/stop_app.sh
  BeforeInstall:
    - location: scripts/before_install.sh
  AfterInstall:
    - location: scripts/after_install.sh
  ApplicationStart:
    - location: scripts/start_app.sh
  ValidateService:
    - location: scripts/validate_service.sh
```

Each script handles a specific step: stop the old app, clean up, install dependencies, start the new version, and validate it's healthy.
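As a concrete example, here's a minimal sketch of an ApplicationStop script (the PID file path is an assumption). One defensive detail: it exits 0 even when nothing is running, so a missing old process never fails the hook and the whole deployment with it:

```shell
#!/bin/bash
# stop_app.sh -- minimal sketch; the PID file path is an assumption.
# Always returns 0 so that "nothing was running" doesn't fail the deployment.
PID_FILE="${PID_FILE:-/var/run/aws-pipeline-demo.pid}"

stop_app() {
  if [ -f "$PID_FILE" ]; then
    pid=$(cat "$PID_FILE")
    kill "$pid" 2>/dev/null || true   # old process may already be gone
    rm -f "$PID_FILE"
  fi
  return 0
}

# In the real hook, the last line would simply call: stop_app
```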

The Deployment Flow

When CodeDeploy runs a blue/green deployment:

  1. Clones the ASG: creates a brand new Auto Scaling Group with the same configuration as the original
  2. Launches fresh instances: new EC2 instances spin up with the CodeDeploy agent
  3. Runs lifecycle hooks: your scripts execute in order (stop → before_install → after_install → start → validate)
  4. Health check: the ALB confirms the new instances return 200 on /health
  5. Shifts traffic: the ALB moves all traffic from the old target group to the new one
  6. Terminates the old environment: after a 5-minute wait, the original instances are killed

Users never see downtime because traffic only shifts after the new instances are confirmed healthy.
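That flow maps directly onto the blue/green settings of the CodeDeploy deployment group. Sketched as the configuration shape you'd pass to the CodeDeploy API (values here mirror the behavior described above):

```json
{
  "terminateBlueInstancesOnDeploymentSuccess": {
    "action": "TERMINATE",
    "terminationWaitTimeInMinutes": 5
  },
  "deploymentReadyOption": {
    "actionOnTimeout": "CONTINUE_DEPLOYMENT"
  },
  "greenFleetProvisioningOption": {
    "action": "COPY_AUTO_SCALING_GROUP"
  }
}
```

The 5-minute wait before terminating the old instances is that `terminationWaitTimeInMinutes` setting, and `COPY_AUTO_SCALING_GROUP` is the ASG-cloning behavior discussed below.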

The Thing Nobody Tells You About Blue/Green

I was confused when my first blue/green deployment created a second Auto Scaling Group. I thought, "Why is it making a new one? I already have one."

Here's what nobody explains upfront: your original ASG is just a template. CodeDeploy copies it on the first deployment and replaces it. On the second deployment, it copies the replacement and replaces that. Every deployment creates a new ASG and destroys the old one. Your original ASG only survives until the first successful deployment.

Once I understood this, the whole model clicked.

The IAM Nightmare (And How to Survive It)

IAM was the single biggest time sink in this project. Blue/green deployments require an unusually broad set of permissions because CodeDeploy needs to:

  • Create and delete Auto Scaling Groups
  • Launch and terminate EC2 instances
  • Modify ALB target groups and listeners
  • Pass IAM roles to new instances (iam:PassRole)
  • Read artifacts from S3

Each of these is a separate IAM action, and missing any one of them produces a vague error message. Here's what my CodeDeploy service role ended up needing:

  • Full Auto Scaling permissions (create, update, delete ASGs, lifecycle hooks, scaling policies)
  • EC2 permissions (describe, run, terminate instances, create tags)
  • Elastic Load Balancing permissions (describe and modify target groups, register/deregister targets)
  • S3 read access to the artifact bucket
  • iam:PassRole for EC2 and Auto Scaling services
  • SNS publish for notifications
  • CloudWatch for alarms

My advice: start with the AWS managed AWSCodeDeployRole policy and add custom permissions for blue/green. Don't try to build the policy from scratch; you'll miss something and spend hours debugging.
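For context, the trust policy on that service role is the one standard piece: it lets CodeDeploy assume the role, and the AWSCodeDeployRole managed policy plus your custom blue/green statements then layer on top:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "codedeploy.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```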

Build Optimization

A few things I did to keep builds fast and costs low:

S3 caching: CodeBuild caches node_modules and the Bun binary between builds. When dependencies haven't changed, bun install completes almost instantly.

Bun over npm: Bun's install and test execution is noticeably faster than npm/Jest. The entire install + test cycle takes about 7 seconds.

I tried using a custom Docker image (oven/bun:1) to skip installing Bun on every build. It worked locally, but Docker Hub's unauthenticated pull rate limit blocked it in CodeBuild. The fix would be pushing the image to Amazon ECR, but for this project the S3 cache approach was simpler.

Cost Breakdown

For anyone worried about AWS bills:

| Resource | Cost |
| --- | --- |
| CodePipeline | 1 free pipeline/month |
| CodeBuild | 100 free build minutes/month |
| EC2 t3.micro | Free tier eligible |
| ALB | ~$16/month (the main cost) |
| S3 + SNS | Negligible |

The ALB is the biggest expense. Tear it down when you're not actively working on the project, and spin it back up when needed.

What I'd Do Differently

Start with Terraform from day one. I set everything up manually in the console first to learn how each service works, which was valuable for understanding. But recreating the setup would mean clicking through dozens of console screens. If I were doing this again, I'd codify everything in Terraform as I go.

Use ECR for custom images. Instead of Docker Hub, I'd push the Bun image to ECR to avoid rate limits and get faster pulls within AWS.

Add CloudWatch alarms for auto-rollback. Right now, rollback only triggers if the health check fails during deployment. In production, you'd want CloudWatch alarms monitoring error rates and latency post-deployment, with automatic rollback if metrics spike.

Key Takeaways

If you're learning DevOps or preparing for AWS certifications, build something like this yourself. This pipeline taught me more than any course or practice exam:

  • IAM is the real skill. Anyone can configure a pipeline in the console. Understanding which roles need which permissions, and why, is what separates junior from mid-level DevOps engineers.

  • Blue/green isn't magic. It's just two environments, a load balancer, and a traffic switch. Once you understand the ASG cloning model, it's straightforward.

  • AWS-native CI/CD has tradeoffs. It integrates beautifully with IAM, VPCs, and other AWS services. But it's more complex to set up than GitHub Actions, and CodeBuild's container provisioning adds latency. Choose based on your environment.

  • The debugging is the learning. Every failed deployment, every permission error, every misconfigured health check taught me something I wouldn't have learned from documentation alone.

Links


I'm currently building my DevOps portfolio while working toward the AWS DevOps Professional certification. If you're on a similar journey, I'd love to connect: drop a comment or find me on LinkedIn.

Top comments (2)

Deborah Maiyaki

This is really amazing 🤩.

I'm looking up to you.

Super proud of you.

Wishing you all the best as you take your AWS DevOps professional certification.

augusthottie

Thank you hun!