Rizwan Saleem

Posted on May 30

How to set up CI/CD that your team will actually use

#webdev #ai #frontend

Building Production-Ready CI/CD Pipelines: A Complete Tutorial

What Is a CI/CD Pipeline?

A CI/CD pipeline is an automated workflow that moves code from a pull request to production through build, test, security scanning, environment promotion, and deployment stages. Pipelines are defined as code (usually YAML) in version control, making the delivery process versionable, reviewable, and reproducible.

By DORA's benchmarks, elite-performing teams achieve a lead time of under one hour from commit to production, with change failure rates below 15%.

Pipeline Stages: Build, Test, Deploy

Build Stage

The build stage compiles source code and packages it into a deployable artifact:

Java: Run Maven or Gradle to produce a JAR, then run docker build to create a container image
Node.js: Run npm ci (or npm install), run the build script, and produce a dist folder or Docker image
Unit tests and dependency scans also run here, early, while a failure is still cheap to diagnose

Test Stages

Map test types to the right pipeline stage:

Test Type	When to Run	Duration	Purpose
Unit tests	Every commit	Milliseconds-seconds	Fail fast, fail early
Integration tests	After successful build	Minutes	Need running application/dependent services
Contract tests	Before service-to-service integration	Seconds-minutes	Catch API breaking changes
Load tests	Before production promotion	10-30 minutes	Gate staging-to-production transition
Regression tests	Post-deployment (scheduled)	Varies	Catch production-only regressions

ML-based test selection (marketed by some CI vendors as Test Intelligence) can reportedly reduce test execution time by up to 80% by running only the tests relevant to the changed code.
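
A simpler, non-ML version of the same idea is change-based test selection: map source directories to the tests that cover them and run only the matching tests. The sketch below is an illustration of that idea, not how any vendor's product works; `COVERAGE_MAP` and every path in it are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from source directories to the test files that cover them.
COVERAGE_MAP = {
    "src/billing": ["tests/test_billing.py"],
    "src/auth": ["tests/test_auth.py", "tests/test_sessions.py"],
}


def select_tests(changed_files, coverage_map=COVERAGE_MAP):
    """Return the sorted test files covering the changed paths.

    Returns None when a changed file is not covered by the map, which the
    caller should treat as "run the full suite".
    """
    selected = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        owners = [d for d in coverage_map if path.is_relative_to(d)]
        if not owners:
            return None  # unknown impact: be safe and run everything
        for directory in owners:
            selected.update(coverage_map[directory])
    return sorted(selected)
```

Falling back to the full suite on unmapped files trades some speed for safety, which is usually the right default for a first version.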

Security Scanning

Security scanning belongs inside the pipeline (shift-left security):

SAST: Every PR (fast but noisy)
Container image scanning: Before any environment promotion
Dependency scanning (SCA): Every build
DAST: Nightly builds or before production (too slow for every commit)
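
One way to keep that schedule explicit is a small lookup from pipeline trigger to scan type. This is a minimal sketch of the list above; the trigger names (`pull_request`, `build`, `promotion`, `nightly`, `pre_production`) are labels chosen for this example, not identifiers from any CI system.

```python
# Which scans run for which pipeline trigger, following the list above.
SCANS_BY_TRIGGER = {
    "pull_request": ["sast"],         # fast but noisy
    "build": ["sca"],                 # dependency scanning
    "promotion": ["container_image"], # before any environment promotion
    "nightly": ["dast"],              # too slow for every commit
    "pre_production": ["dast"],
}


def scans_for(triggers):
    """Return the de-duplicated scans to run for the given triggers, in order.

    Raises KeyError for a trigger that has no entry in SCANS_BY_TRIGGER.
    """
    scans = []
    for trigger in triggers:
        for scan in SCANS_BY_TRIGGER[trigger]:
            if scan not in scans:
                scans.append(scan)
    return scans
```

A production promotion would pass several triggers at once, for example `scans_for(["build", "promotion", "pre_production"])`.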

Enforce SLSA compliance by generating provenance attestations at build time and verifying artifact integrity before deployment.
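
The core of the integrity check is comparing the digest recorded at build time with the digest of the artifact about to be deployed. The sketch below shows only that comparison. Real SLSA provenance is a signed attestation produced by the build platform, and the unsigned dictionary here is a stand-in chosen for illustration.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def build_provenance(artifact: bytes, commit: str, builder: str) -> dict:
    """Record what was built, from which commit, and by which builder."""
    return {"sha256": sha256_digest(artifact), "commit": commit, "builder": builder}


def verify_artifact(artifact: bytes, provenance: dict) -> bool:
    """True only if the artifact's digest matches the recorded one."""
    return hmac.compare_digest(sha256_digest(artifact), provenance["sha256"])
```

The deploy stage would call `verify_artifact` and refuse to continue on a mismatch.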

Optimizing for Speed: Caching Strategies

The easiest and quickest improvement is caching: three caching steps can cut pipeline time by 5-10 minutes.

Key Caching Strategies

Strategy	What to Cache	Benefit
Dependencies	npm/pip/Gradle packages	Stop reinstalling every time
Build artifacts	Compiled code	Reuse where possible
Shared caches	Across stages/jobs/pipelines	Maximum reuse
Incremental builds	Unchanged modules	Skip recompilation

Some CI platforms cache dependencies between runs automatically (marketed as Cache Intelligence), typically achieving 2-4× faster build times without changing application code.

GitHub Actions caching example:

- name: Cache npm dependencies
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
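
To see why the key and restore-keys are shaped this way, here is a small model of the lookup: an exact key hit is used first, and otherwise the newest cache whose key starts with a restore prefix is used. It is a simplified sketch. The dictionary stands in for the cache service, and its insertion order stands in for creation time.

```python
import hashlib


def cache_key(os_name: str, lockfile: bytes) -> str:
    """Mirror the `<os>-node-<lockfile hash>` key shape from the workflow."""
    return f"{os_name}-node-{hashlib.sha256(lockfile).hexdigest()}"


def restore(cache: dict, key: str, restore_prefixes: list):
    """Return (hit_kind, value): exact match first, then prefix fallback."""
    if key in cache:
        return "exact", cache[key]
    for prefix in restore_prefixes:
        matches = [k for k in cache if k.startswith(prefix)]
        if matches:
            # Use the most recently stored match (dict order as a stand-in).
            return "partial", cache[matches[-1]]
    return "miss", None
```

When package-lock.json changes, the exact key misses but the prefix still restores the previous cache, so npm only has to download what changed.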

Environment Management

Modern pipelines provision infrastructure within the delivery flow using Terraform/OpenTofu, not as separate pre-steps.

Best Practices

Environment isolation: Keep separate secrets for each environment
Shared IaC: Use the same IaC code for staging and production to prevent environment drift
Provisioning gates: Let the pipeline progress only after infrastructure provisioning succeeds
GitOps model: The pipeline writes desired state to Git; Argo CD reconciles the cluster to match
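
The reconcile step in the GitOps model boils down to diffing desired state against actual state and planning the changes. The function below is a toy sketch of that loop, not Argo CD's implementation. Resource names and image tags are made up, and real tools typically make deletion of extra resources an opt-in behavior.

```python
def plan_reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (for example, an image tag).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    # Anything running that Git no longer declares is a candidate for removal.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, None))
    return actions
```

An empty plan means the cluster already matches Git, which is the steady state the pipeline is aiming for.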

Environment-Specific Secrets (GitHub Actions)

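on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        type: choice
        options: [staging, production]
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}  # staging or production
    steps:
      - name: Deploy
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}  # Resolved from the selected environment
          API_KEY: ${{ secrets.API_KEY }}
        run: ./scripts/deploy.sh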

Secret Handling

Secrets management is critical: leaked API keys or credentials can cause data breaches and financial damage.

Critical Best Practices

Practice	Description	Priority
Use OIDC when possible	Eliminate long-lived credentials	Critical
Never hardcode secrets	Use CI/CD secret storage or external managers	Critical
Enable secret masking	Prevent secrets appearing in logs	Critical
Implement secret scanning	Detect accidentally committed secrets	High
Use environment isolation	Separate secrets per environment	High
Rotate secrets regularly	Limit exposure window for leaked secrets	High
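
Secret masking is the easiest of these to picture in code: before a log line is written, every known secret value is replaced with a placeholder. CI platforms do this for registered secrets. The function below is a minimal sketch of the idea, not any platform's implementation.

```python
def mask_secrets(line: str, secrets: list) -> str:
    """Replace every occurrence of each known secret value with '***'."""
    # Mask longer values first so a secret that contains another is fully hidden.
    for value in sorted(secrets, key=len, reverse=True):
        if value:  # an empty string would match everywhere
            line = line.replace(value, "***")
    return line
```

Masking only catches exact values, so a secret that is base64-encoded or split across lines can still leak. That is one reason the table also lists secret scanning and rotation.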

OIDC Authentication (Recommended)

OIDC eliminates long-lived credentials by using short-lived tokens:

# GitHub Actions with OIDC to AWS
permissions:
  id-token: write  # Lets the job request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: us-east-1
          # No stored secrets needed: the action exchanges the OIDC token for short-lived credentials
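
On the cloud side, the role's trust policy decides which workflows may assume it by matching claims in the OIDC token, chiefly the audience (`aud`) and the subject (`sub`), which for a branch build looks like `repo:<owner>/<repo>:ref:refs/heads/<branch>`. The sketch below models only that matching step. The actual check is performed by AWS IAM after verifying the token signature, and the repository name `acme/shop` is hypothetical.

```python
from fnmatch import fnmatchcase


def is_trusted(claims: dict, allowed_subjects: list,
               audience: str = "sts.amazonaws.com") -> bool:
    """True if the token's audience matches and its subject fits a pattern."""
    if claims.get("aud") != audience:
        return False
    subject = claims.get("sub", "")
    return any(fnmatchcase(subject, pattern) for pattern in allowed_subjects)
```

Keeping the subject pattern narrow (a specific repository and branch rather than a broad wildcard) limits which workflows can obtain production credentials.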

Secret Rotation

Automate monthly rotation to minimize exposure window:

# Secret rotation workflow (trigger only)
on:
  schedule:
    - cron: '0 0 1 * *'  # 00:00 UTC on the 1st of every month
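
The scheduled job needs to know which secrets are actually due. A minimal sketch, assuming you keep a record of each secret's last rotation date (the inventory shape and the 30-day reading of "monthly" are both assumptions of this example):

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)  # assumption: "monthly" treated as 30 days


def secrets_due(last_rotated: dict, today: date,
                max_age: timedelta = MAX_AGE) -> list:
    """Return the names of secrets whose last rotation is older than max_age."""
    return sorted(
        name for name, rotated in last_rotated.items()
        if today - rotated > max_age
    )
```

The rotation job can then loop over the returned names and call whatever rotation mechanism your secret manager provides.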

Deployment Strategies

Rolling Deployment

Instances are updated one at a time (or in small groups). Instances still on the old version keep serving traffic until they are replaced, so both versions are live during the rollout.

Pros: Simple to implement, no extra infrastructure cost
Cons: Both versions run concurrently, so the application must be backward-compatible
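
A tiny simulation makes the backward-compatibility requirement concrete: after every batch except the last, the fleet contains a mix of versions. This is an illustrative sketch with made-up version labels, not a deployment tool.

```python
def rolling_update(instances: list, new_version: str, batch_size: int = 1):
    """Yield the fleet's versions after each batch is replaced."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(instances)
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version
        yield list(fleet)  # snapshot, so callers can keep every step
```

With four instances and a batch size of two, the first snapshot holds two instances on each version, which is exactly the window in which old and new code must be able to coexist.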

Blue-Green Deployment

Two identical production environments run in parallel: blue (staging/new) and green (production/stable).

Process:

Deploy new version to inactive environment (blue)
Run QA and user acceptance testing in blue
Switch traffic via load balancer from green to blue
Rollback is instant: flip load balancer back

Tradeoff: Doubles infrastructure spend during transition window
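
The switch and the rollback are the same operation, which is what makes blue-green rollback so fast. The class below is a toy model of the load balancer pointer, using the article's naming (green is live, blue receives the new version). `BlueGreenRouter` is a name invented for this sketch.

```python
class BlueGreenRouter:
    """Tracks which of two environments receives production traffic."""

    def __init__(self, live: str = "green", idle: str = "blue"):
        self.live, self.idle = live, idle
        self.versions = {live: None, idle: None}

    def deploy_to_idle(self, version: str) -> None:
        """Install a version on the environment that serves no traffic."""
        self.versions[self.idle] = version

    def switch(self) -> str:
        """Swap live and idle; calling it again is the rollback."""
        self.live, self.idle = self.idle, self.live
        return self.live
```

Because the old environment is left untouched after the switch, rolling back never requires a redeploy.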

Canary Deployment

Releases incrementally to a subset of users (typically 1-5% initially).

Process:

Deploy new version alongside current version
Route 1-5% of traffic to new version
Monitor performance and user feedback
Gradually increase traffic (5% → 25% → 75% → 100%) if no issues
Fully transition once proven stable

Success criteria: Clear metrics (error rate, latency), not just "no errors"
Tradeoff: Validation complexity requires robust observability

Canary is often a strong default for production when teams have traffic routing, observability, and rollback automation.
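
The promotion logic itself is a short loop: raise the traffic share one step at a time and stop at the first unhealthy reading. In this sketch, `is_healthy` is a caller-supplied function standing in for your real error-rate and latency checks, and the step list follows the percentages in the process above.

```python
STEPS = [5, 25, 75, 100]  # percent of traffic sent to the new version


def run_canary(is_healthy, steps=STEPS):
    """Walk through the traffic steps; return (final_percent, history).

    `is_healthy(percent)` checks the metrics at the given traffic share.
    On the first failure, all traffic returns to the stable version.
    """
    history = []
    for percent in steps:
        history.append(percent)
        if not is_healthy(percent):
            history.append(0)  # roll back
            return 0, history
    return steps[-1], history
```

Keeping the health check as a parameter makes the loop easy to test and lets you tighten the criteria without touching the rollout logic.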

Testing in Pipelines

Testing Strategy by Stage

Pre-deployment: Unit, integration, contract, and load tests
Post-deployment: Automated verification comparing real-time metrics against baseline

AI-Assisted Deployment Verification

AI compares real-time metrics against a baseline window (typically the same time window from the previous deployment) and flags regressions such as:

Error rate rising more than 0.2% above baseline
Latency degradation (e.g., p99 latency up 15%)
Throughput drops or resource consumption spikes

When regression is detected, the pipeline rolls back automatically without waiting for human intervention. This makes daily or multiple-times-daily deployment reasonable.
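
Even without any AI, the comparison can start as plain threshold checks against the baseline. The sketch below uses the example thresholds from this section and makes two assumptions: the 0.2% error-rate margin is read as 0.2 percentage points, and the latency limit is a 15% relative increase in p99. The `Metrics` type is invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate_pct: float   # errors as a percentage of requests
    p99_latency_ms: float


def find_regressions(baseline: Metrics, current: Metrics,
                     max_error_delta_pct: float = 0.2,
                     max_latency_increase: float = 0.15) -> list:
    """Return the names of metrics that regressed beyond their thresholds."""
    reasons = []
    if current.error_rate_pct - baseline.error_rate_pct > max_error_delta_pct:
        reasons.append("error_rate")
    if current.p99_latency_ms > baseline.p99_latency_ms * (1 + max_latency_increase):
        reasons.append("p99_latency")
    return reasons
```

A non-empty result is the signal to trigger the automatic rollback. An ML-based verifier replaces the fixed thresholds with learned ones but feeds the same decision.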

Real Pipeline Walkthrough: GitHub Actions Example

Here's an example pipeline that pulls together most of the concepts above. Treat it as a starting point to adapt, not a drop-in production configuration:

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write  # For OIDC
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Caching dependencies: setup-node's built-in npm cache keys on
      # package-lock.json, so a separate actions/cache step for ~/.npm is redundant
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm test

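      # Super-Linter is a linter bundle configured through env vars;
      # pair it with a dedicated SAST tool for deeper analysis
      - name: Static analysis (lint)
        uses: github/super-linter@v5
        env:
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}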

      - name: Build application
        run: npm run build

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

  security-scan:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

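      # The image built in the build job does not exist on this runner,
      # so rebuild it here (or push it to a registry in the build job and pull it)
      - name: Build image for scanning
        run: docker build -t myapp:${{ github.sha }} .

      - name: Container image scanning
        uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in real use
        with:
          image-ref: 'myapp:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Secret scanning
        if: github.event_name == 'pull_request'  # base/head SHAs only exist on PR events
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.pull_request.base.sha }}
          head: ${{ github.event.pull_request.head.sha }}
          extra_args: --only-verified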

  deploy-staging:
    needs: [build, security-scan]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_STAGING_ROLE_ARN }}
          aws-region: us-east-1

      - name: Deploy to staging (rolling)
        run: |
          aws ecs update-service \
            --cluster staging \
            --service api \
            --force-new-deployment

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PRODUCTION_ROLE_ARN }}
          aws-region: us-east-1

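      - name: Canary deployment (5% → 25% → 100%)
        run: |
          set -euo pipefail
          # Sketch only: assumes a separate `api-canary` service (hypothetical)
          # whose traffic share is set by your load balancer or service mesh
          # Roll the new version out to the canary service (about 5% of traffic)
          aws ecs update-service \
            --cluster production \
            --service api-canary \
            --task-definition api-canary \
            --desired-count 1
          # Observe for 30 minutes, then gate on metrics instead of promoting blindly
          sleep 1800
          ./scripts/verify-deployment.sh
          # Metrics healthy: scale the canary to about 25%
          aws ecs update-service \
            --cluster production \
            --service api-canary \
            --desired-count 5
          sleep 1800
          ./scripts/verify-deployment.sh
          # Final promotion: roll the stable service to the new version (100%)
          aws ecs update-service \
            --cluster production \
            --service api \
            --task-definition api-production \
            --force-new-deployment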

      - name: Post-deployment verification
        run: |
          # Monitor error rate, latency for 10 minutes
          # Auto-rollback if error rate > 0.2% above baseline
          ./scripts/verify-deployment.sh

DORA Metrics to Track

Track these four metrics to measure pipeline performance:

Metric	Target (Elite)	Target (High)
Lead time for changes	Under 1 hour	Under 1 day
Deployment frequency	Multiple times per day	-
Change failure rate	Below 15%	-
Mean time to recovery (MTTR)	Under 1 hour	-
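
All four metrics can be derived from a simple log of deployments. The sketch below assumes a record shape invented for this example (`Deployment` with commit time, deploy time, a failure flag, and a restore time). Real numbers would come from your CI system and incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is fixed


def dora_metrics(deployments: list, period_days: int) -> dict:
    """Compute the four DORA metrics for deployments within one period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Using the median for lead time keeps one unusually slow change from skewing the picture.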

Quick Win Checklist

[ ] Add dependency caching (5-10 minute savings)
[ ] Use OIDC instead of long-lived credentials
[ ] Enable secret masking in all logs
[ ] Run SAST on every PR, DAST nightly
[ ] Implement canary deployment with auto-rollback
[ ] Add post-deployment metric verification
[ ] Set up secret rotation (monthly)

Start with automated tests on every push and a single staging environment, then add deployment strategies, security scanning, and performance optimization as your team grows.

Want to dive deeper into a specific tool (Jenkins, GitLab CI, AWS CodePipeline) or deployment strategy?

Rizwan Saleem — https://rizwansaleem.co
