DEV Community

Rizwan Saleem


How to set up CI/CD that your team will actually use


Building Production-Ready CI/CD Pipelines: A Complete Tutorial

What Is a CI/CD Pipeline?

A CI/CD pipeline is an automated workflow that moves code from a pull request to production through build, test, security scanning, environment promotion, and deployment stages. Pipelines are defined as code (usually YAML) in version control, making the delivery process versionable, reviewable, and reproducible.

Elite-performing teams, as classified by DORA (the DevOps Research and Assessment program), achieve a lead time of under one hour from commit to production and a change failure rate below 15%.

Pipeline Stages: Build, Test, Deploy

Build Stage

The build stage compiles source code and packages it into a deployable artifact:

  • Java: Run Maven/Gradle to produce a JAR, then docker build for a container image
  • Node.js: npm install, run the build script, produce a dist folder or Docker image
  • Unit tests and dependency scans run here, early, when failures are still cheap to diagnose

Test Stages

Map test types to the right pipeline stage:

| Test type | When to run | Duration | Purpose |
|---|---|---|---|
| Unit tests | Every commit | Milliseconds to seconds | Fail fast, fail early |
| Integration tests | After a successful build | Minutes | Need the running application and dependent services |
| Contract tests | Before service-to-service integration | Seconds to minutes | Catch breaking API changes |
| Load tests | Before production promotion | 10-30 minutes | Gate the staging-to-production transition |
| Regression tests | Post-deployment (scheduled) | Varies | Catch production-only regressions |

ML-based test selection tools (sold under names such as "Test Intelligence") report cutting test execution time by up to 80% by running only the tests relevant to the changed code.
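At its core, test selection is a lookup from changed files to the tests that exercise them. The following is a minimal sketch, assuming a coverage map from each test file to the source files it touches; the map and file names are hypothetical and might, for example, be exported from a coverage tool. ML-based tools learn and refine such a map, which this sketch does not attempt. Any changed file the map cannot account for triggers the full suite, so an incomplete map errs toward running more tests, not fewer.

```python
from typing import Dict, Iterable, List, Set


def select_tests(coverage_map: Dict[str, Set[str]],
                 changed_files: Iterable[str],
                 always_run: Iterable[str] = ()) -> List[str]:
    """Return the test files affected by a change, sorted by name."""
    changed = set(changed_files)
    test_files = set(coverage_map)
    known_sources = set().union(*coverage_map.values())

    # A changed file that no test covers (and that is not itself a test) has an
    # unknown impact, so fall back to running everything.
    if changed - known_sources - test_files:
        return sorted(test_files)

    selected = {test for test, sources in coverage_map.items() if sources & changed}
    selected |= changed & test_files  # edited tests always run themselves
    selected |= set(always_run)       # e.g. a fast smoke test
    return sorted(selected)


coverage = {
    "tests/test_cart.py": {"src/cart.py", "src/pricing.py"},
    "tests/test_auth.py": {"src/auth.py"},
    "tests/test_pricing.py": {"src/pricing.py"},
}
print(select_tests(coverage, ["src/pricing.py"]))
# ['tests/test_cart.py', 'tests/test_pricing.py']
```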

Security Scanning

Security scanning belongs inside the pipeline (shift-left security):

  • SAST: Every PR (fast but noisy)
  • Container image scanning: Before any environment promotion
  • Dependency scanning (SCA): Every build
  • DAST: Nightly builds or before production (too slow for every commit)

Enforce SLSA compliance by generating provenance attestations at build time and verifying artifact integrity before deployment.
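Generating and verifying signed SLSA provenance requires dedicated tooling. The integrity check at its core can be sketched with a plain SHA-256 digest: record the digest when the artifact is built, then refuse to deploy if the artifact's digest no longer matches. This is a minimal sketch with hypothetical file names. A real attestation also signs the digest and records how the artifact was built, which this sketch leaves out.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def verify_artifact(path: Path, expected_digest: str) -> None:
    """Refuse to continue if the artifact differs from the one recorded at build time."""
    actual = sha256_digest(path)
    if actual != expected_digest:
        raise RuntimeError(f"{path.name}: digest {actual} does not match {expected_digest}")


with tempfile.TemporaryDirectory() as workdir:
    artifact = Path(workdir) / "app.tar"
    artifact.write_bytes(b"build output")
    recorded = sha256_digest(artifact)    # build stage: record the digest
    verify_artifact(artifact, recorded)   # deploy stage: passes
    artifact.write_bytes(b"tampered output")
    try:
        verify_artifact(artifact, recorded)
    except RuntimeError as err:
        print("deployment blocked:", err)
```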

Optimizing for Speed: Caching Strategies

Caching is the easiest and quickest improvement: as few as three caching steps can cut 5-10 minutes from a pipeline run.

Key Caching Strategies

| Strategy | What to cache | Benefit |
|---|---|---|
| Dependencies | npm/pip/Gradle packages | Stop reinstalling every time |
| Build artifacts | Compiled code | Reuse where possible |
| Shared caches | Across stages/jobs/pipelines | Maximum reuse |
| Incremental builds | Unchanged modules | Skip recompilation |

Managed caching features (marketed by some CI platforms as "Cache Intelligence") cache dependencies between runs automatically; vendors typically cite 2-4× faster builds with no changes to application code.

GitHub Actions caching example:

- name: Cache npm dependencies
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
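To see why `restore-keys` matters, here is a simplified model of how `actions/cache` picks a cache to restore. An exact `key` match wins. Otherwise each `restore-keys` prefix is tried in order, and the most recently created cache whose key starts with that prefix is used. This model is for intuition only: the in-memory cache list and keys are made up, and it is not the action's implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    key: str
    created_at: int  # e.g. a Unix timestamp


def lockfile_key(os_name: str, lockfile: bytes) -> str:
    """Build a key that changes whenever the lockfile changes (the real
    hashFiles() computes its hash differently; only that property matters here)."""
    return f"{os_name}-node-{hashlib.sha256(lockfile).hexdigest()}"


def resolve_cache(entries: List[CacheEntry], key: str,
                  restore_keys: List[str]) -> Optional[CacheEntry]:
    # 1. Exact match: full cache hit.
    for entry in entries:
        if entry.key == key:
            return entry
    # 2. Prefix fallback, tried in order: restore the newest partial match so
    #    `npm ci` only downloads what changed; a fresh cache is saved under `key`.
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)
    return None  # cache miss: install from scratch


old_key = lockfile_key("Linux", b"lockfile v1")
entries = [CacheEntry(old_key, created_at=200), CacheEntry("Linux-node-stale", created_at=100)]
new_key = lockfile_key("Linux", b"lockfile v2")  # a dependency was added
restored = resolve_cache(entries, new_key, restore_keys=["Linux-node-"])
print(restored.key == old_key)  # True: the newest partial match wins
```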

Environment Management

Modern pipelines provision infrastructure within the delivery flow using Terraform/OpenTofu, not as separate pre-steps.

Best Practices

  • Environment isolation: Keep separate secrets for each environment
  • Environment parity: Use the same IaC code for staging and production to prevent environment drift
  • Provisioning gates: Advance the pipeline only when infrastructure provisioning succeeds
  • GitOps model: The pipeline writes the desired state to Git, and Argo CD reconciles the cluster to match
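The GitOps bullet describes a reconciliation loop. The following is a minimal sketch of one pass. It assumes that desired and actual state are both plain dictionaries mapping resource names to specs, which greatly simplifies what Argo CD actually compares. The controller diffs the two states and emits the create, update, and delete actions needed to converge. Resource names and images are made up.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, object]


def reconcile(desired: Dict[str, Spec], actual: Dict[str, Spec]) -> List[Tuple[str, str]]:
    """Return (action, resource) pairs that would make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))  # drift, or a new commit in Git
    for name in sorted(actual.keys() - desired.keys()):
        # Removed from Git; Argo CD only prunes like this when pruning is enabled
        actions.append(("delete", name))
    return actions


desired = {"api": {"image": "myapp:abc123", "replicas": 3},
           "worker": {"image": "worker:1.4", "replicas": 1}}
actual = {"api": {"image": "myapp:9f8e7d", "replicas": 3},
          "cron": {"image": "cron:0.1", "replicas": 1}}
print(reconcile(desired, actual))
# [('update', 'api'), ('create', 'worker'), ('delete', 'cron')]
```

Running the same comparison repeatedly, not just once per deploy, is what lets the controller also undo manual changes made directly in the cluster.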

Environment-Specific Secrets (GitHub Actions)

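on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        type: choice
        options: [staging, production]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}  # Scopes secrets to the chosen environment
    steps:
      - uses: actions/checkout@v4  # deploy.sh lives in the repository
      - name: Deploy
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}  # Environment-specific value
          API_KEY: ${{ secrets.API_KEY }}
        run: ./scripts/deploy.sh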

Secret Handling

Secrets management is critical: leaked API keys or credentials can lead to data breaches and financial damage.

Critical Best Practices

| Practice | Description | Priority |
|---|---|---|
| Use OIDC when possible | Eliminate long-lived credentials | Critical |
| Never hardcode secrets | Use CI/CD secret storage or external managers | Critical |
| Enable secret masking | Prevent secrets appearing in logs | Critical |
| Implement secret scanning | Detect accidentally committed secrets | High |
| Use environment isolation | Separate secrets per environment | High |
| Rotate secrets regularly | Limit exposure window for leaked secrets | High |
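Secret masking is the CI platform's job; GitHub Actions, for example, replaces registered secret values with `***` in logs. The mechanism is simple enough to sketch, and the sketch shows its main limit. This minimal version assumes you already know the secret values to redact. Masking can only hide values it has been told about, so a transformed copy of a secret (here, base64-encoded) passes through untouched.

```python
import base64
from typing import Iterable


def mask(line: str, secrets: Iterable[str], placeholder: str = "***") -> str:
    """Replace every occurrence of each known secret value in a log line."""
    # Skip empty values (replacing "" would insert the placeholder everywhere)
    # and mask longer values first, so a secret containing another is hidden whole.
    for value in sorted((s for s in secrets if s), key=len, reverse=True):
        line = line.replace(value, placeholder)
    return line


secrets = ["s3cr3t-token"]
print(mask("Authorization: Bearer s3cr3t-token", secrets))
# Authorization: Bearer ***

# A transformed copy of the secret is not recognized.
encoded = base64.b64encode(b"s3cr3t-token").decode()
print(mask(f"debug: {encoded}", secrets) == f"debug: {encoded}")  # True
```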

OIDC Authentication (Recommended)

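OIDC eliminates long-lived credentials. For each job, the CI provider issues a signed, short-lived identity token, and the cloud provider exchanges it for temporary credentials: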

# GitHub Actions with OIDC to AWS
permissions:
  id-token: write  # Lets the job request an OIDC token
  contents: read

steps:
  - name: Configure AWS credentials via OIDC
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
      aws-region: us-east-1
      # No stored secrets: the OIDC token is exchanged for short-lived credentials
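What keeps OIDC safe is the trust policy on the cloud side: the role accepts only tokens whose claims match an expected audience and subject. The following sketch illustrates that matching logic. It assumes a GitHub Actions token with a `sub` claim such as `repo:my-org/my-repo:ref:refs/heads/main` and the audience `sts.amazonaws.com`, which is the default used by `configure-aws-credentials`. The organization and repository names are hypothetical. The sketch decodes the payload without verifying the signature, so it only illustrates the claim check; in practice AWS STS performs signature verification itself.

```python
import base64
import json
import time
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def decode_claims(token: str) -> Dict:
    """Decode a JWT payload WITHOUT verifying its signature (illustration only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_allowed(claims: Dict, audience: str, allowed_subjects: List[str],
                   now: Optional[float] = None) -> bool:
    """Mimic a role trust policy: expected audience, matching subject, not expired."""
    now = time.time() if now is None else now
    if claims.get("aud") != audience or claims.get("exp", 0) <= now:
        return False
    # '*' wildcards, similar to a StringLike condition on the sub claim
    return any(fnmatchcase(claims.get("sub", ""), pattern) for pattern in allowed_subjects)


def fake_token(claims: Dict) -> str:
    """Build an unsigned demo token shaped like header.payload.signature."""
    def enc(part: Dict) -> str:
        return base64.urlsafe_b64encode(json.dumps(part).encode()).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.unsigned"


token = fake_token({"aud": "sts.amazonaws.com",
                    "sub": "repo:my-org/my-repo:ref:refs/heads/main",
                    "exp": time.time() + 300})
claims = decode_claims(token)
print(claims_allowed(claims, "sts.amazonaws.com", ["repo:my-org/my-repo:ref:refs/heads/main"]))  # True
print(claims_allowed(claims, "sts.amazonaws.com", ["repo:my-org/other-repo:*"]))  # False
```

Scoping the subject pattern to a branch or environment, rather than `repo:my-org/*`, keeps a workflow in an unrelated repository or branch from assuming the role.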

Secret Rotation

Automate rotation on a schedule (monthly, for example) to limit the exposure window:

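# Secret rotation workflow (trigger only; the rotation job depends on your secret store)
on:
  schedule:
    - cron: '0 0 1 * *'  # 00:00 UTC on the 1st of every month
  workflow_dispatch:      # Also allow manual, out-of-cycle rotation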
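The schedule only triggers the workflow; the job still has to decide what to rotate. The following is a minimal sketch of that decision. It assumes that each secret's last-rotation time is available. The `SecretRecord` type and the dates are hypothetical; real secret managers expose this metadata through their own APIs. Any secret older than the maximum age is due for rotation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class SecretRecord:
    name: str
    last_rotated: datetime  # timezone-aware


def due_for_rotation(records: List[SecretRecord], now: datetime,
                     max_age: timedelta = timedelta(days=30)) -> List[str]:
    """Names of secrets at or past max_age, oldest first."""
    stale = [r for r in records if now - r.last_rotated >= max_age]
    return [r.name for r in sorted(stale, key=lambda r: r.last_rotated)]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    SecretRecord("API_KEY", now - timedelta(days=45)),
    SecretRecord("DATABASE_URL", now - timedelta(days=3)),
    SecretRecord("WEBHOOK_SECRET", now - timedelta(days=31)),
]
print(due_for_rotation(records, now))  # ['API_KEY', 'WEBHOOK_SECRET']
```

With a monthly trigger, a secret just under the age limit waits a full extra cycle. Running the check more often than the maximum age avoids that.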

Deployment Strategies

Rolling Deployment

Instances are updated one at a time (or in small batches). Instances still on the old version keep serving traffic until they are replaced, so both versions handle requests during the rollout.

Pros: Simple to implement, no extra infrastructure cost
Cons: Both versions run concurrently, so the application must be backward-compatible
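A rolling deployment is usually controlled by how many instances may be out of service at once. The following is a minimal sketch that plans the batches. It assumes a plain list of hypothetical instance IDs. Orchestrators express the same idea through their own settings, such as minimum healthy percent in ECS or `maxUnavailable` in Kubernetes.

```python
from typing import Iterator, List


def rolling_batches(instances: List[str], max_unavailable: int) -> Iterator[List[str]]:
    """Yield groups of instances to replace, never taking more than
    max_unavailable out of service at once."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    for start in range(0, len(instances), max_unavailable):
        yield instances[start:start + max_unavailable]


fleet = [f"i-{n}" for n in range(5)]
for batch in rolling_batches(fleet, max_unavailable=2):
    # In a real rollout: drain, replace, and health-check each batch before
    # moving on; instances outside the batch keep serving traffic.
    print("replacing", batch)
# replacing ['i-0', 'i-1']
# replacing ['i-2', 'i-3']
# replacing ['i-4']
```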

Blue-Green Deployment

Two identical production environments run in parallel: blue (idle, receives the new version) and green (live, serves the stable version).

Process:

  1. Deploy new version to inactive environment (blue)
  2. Run QA and user acceptance testing in blue
  3. Switch traffic via load balancer from green to blue
  4. Rollback is instant: flip load balancer back

Tradeoff: Doubles infrastructure spend during transition window
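The blue-green steps reduce to a pointer that records which environment is live. The following is a minimal sketch, assuming a hypothetical `Router` class that stands in for the load balancer and a health-check callable that you supply. The router deploys to the idle side and switches only if the health check passes. The old side stays intact, so rollback is a single pointer flip.

```python
from typing import Callable, Dict


class Router:
    """Stand-in for a load balancer: tracks which environment gets traffic."""

    def __init__(self, live: str = "green") -> None:
        self.versions: Dict[str, str] = {"blue": "", "green": ""}
        self.live = live

    @property
    def idle(self) -> str:
        return "blue" if self.live == "green" else "green"

    def deploy_and_switch(self, version: str,
                          healthy: Callable[[str, str], bool]) -> bool:
        target = self.idle
        self.versions[target] = version   # 1. deploy to the inactive side
        if not healthy(target, version):  # 2. QA / smoke tests against it
            return False                  #    live traffic never moved
        self.live = target                # 3. switch traffic
        return True

    def rollback(self) -> None:
        self.live = self.idle             # 4. flip back; old version still runs


router = Router(live="green")
router.versions["green"] = "v1"
router.deploy_and_switch("v2", healthy=lambda env, version: True)
print(router.live, router.versions[router.live])  # blue v2
router.rollback()
print(router.live, router.versions[router.live])  # green v1
```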

Canary Deployment

Releases incrementally to a subset of users (typically 1-5% initially).

Process:

  1. Deploy new version alongside current version
  2. Route 1-5% of traffic to new version
  3. Monitor performance and user feedback
  4. Gradually increase traffic (5% → 25% → 75% → 100%) if no issues
  5. Fully transition once proven stable

Success criteria: Define clear metric thresholds (error rate, latency), not just "no errors"
Tradeoff: Validation complexity requires robust observability

Canary is often a strong default for production when teams have traffic routing, observability, and rollback automation.
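The canary process can be sketched as a loop over traffic weights with a health gate at each step. The following is a minimal sketch, assuming hypothetical `set_weight` and `is_healthy` callbacks that would talk to your router and your monitoring. The default weights follow the 5% → 25% → 75% → 100% progression in the canary process steps.

```python
from typing import Callable, Sequence


def run_canary(set_weight: Callable[[int], None],
               is_healthy: Callable[[int], bool],
               steps: Sequence[int] = (5, 25, 75, 100)) -> bool:
    """Shift traffic to the canary step by step; abort to 0% on the first failed check.

    Returns True if the canary reached 100%, False if it was rolled back.
    """
    for weight in steps:
        set_weight(weight)
        # In practice, wait for a soak period here before judging the metrics.
        if not is_healthy(weight):
            set_weight(0)  # send all traffic back to the stable version
            return False
    return True


history = []
ok = run_canary(history.append, is_healthy=lambda weight: weight < 75)
print(ok, history)  # False [5, 25, 75, 0]
```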

Testing in Pipelines

Testing Strategy by Stage

  • Pre-deployment: Unit, integration, contract, and load tests
  • Post-deployment: Automated verification comparing real-time metrics against baseline

AI-Assisted Deployment Verification

AI-assisted verification compares real-time metrics against a baseline window (typically the same window after the previous deployment) and flags:

  • Error rate rising more than 0.2 percentage points above baseline
  • Latency degradation (e.g., p99 latency up 15%)
  • Throughput drops or resource consumption spikes

When a regression is detected, the pipeline rolls back automatically without waiting for human intervention. That safety net is what makes deploying daily, or several times a day, reasonable.
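Whether an AI model or fixed rules do the comparison, the rollback decision reduces to checking current metrics against the baseline. The following is a minimal rule-based sketch, not an ML model. It treats the "+0.2%" error-rate threshold as 0.2 percentage points and uses the 15% p99 latency example. The 20% throughput-drop threshold and the `Metrics` fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    error_rate: float      # fraction of failed requests, e.g. 0.004 = 0.4%
    p99_latency_ms: float
    throughput_rps: float


def regressions(baseline: Metrics, current: Metrics,
                max_error_increase: float = 0.002,   # 0.2 percentage points
                max_latency_increase: float = 0.15,  # p99 up 15%
                max_throughput_drop: float = 0.20,   # assumed: requests/s down 20%
                ) -> List[str]:
    """Return the reasons to roll back; an empty list means the deploy looks healthy."""
    reasons = []
    if current.error_rate - baseline.error_rate > max_error_increase:
        reasons.append("error rate")
    if current.p99_latency_ms > baseline.p99_latency_ms * (1 + max_latency_increase):
        reasons.append("p99 latency")
    if current.throughput_rps < baseline.throughput_rps * (1 - max_throughput_drop):
        reasons.append("throughput")
    return reasons


baseline = Metrics(error_rate=0.004, p99_latency_ms=200, throughput_rps=1000)
current = Metrics(error_rate=0.007, p99_latency_ms=240, throughput_rps=950)
print(regressions(baseline, current))  # ['error rate', 'p99 latency']
```

A non-empty result is the signal for the pipeline to trigger its rollback.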

Real Pipeline Walkthrough: GitHub Actions Example

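Here is an end-to-end GitHub Actions pipeline that ties these concepts together. Treat it as a starting point rather than a drop-in file: it assumes existing ECS services, IAM roles that trust GitHub's OIDC provider, and a ./scripts/verify-deployment.sh script of your own.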

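# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write         # For OIDC
  contents: read
  security-events: write  # For uploading CodeQL (SAST) results
  actions: read           # Required by CodeQL in private repositories

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Caching dependencies (setup-node's `cache: 'npm'` option does the
      # same job, so use one mechanism or the other, not both)
      - name: Cache npm dependencies
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm test

      # SAST with CodeQL (super-linter is a linter, not a SAST scanner)
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript-typescript

      - name: Build application
        run: npm run build

      - name: Run CodeQL analysis
        uses: github/codeql-action/analyze@v3

      - name: Build Docker image
        run: |
          docker build -t myapp:${{ github.sha }} .
          # Each job runs on a fresh runner, so pass the image on as a file
          docker save myapp:${{ github.sha }} -o image.tar

      - name: Upload image for scanning
        uses: actions/upload-artifact@v4
        with:
          name: docker-image
          path: image.tar

  security-scan:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for secret scanning

      - name: Download image
        uses: actions/download-artifact@v4
        with:
          name: docker-image

      - name: Container image scanning
        uses: aquasecurity/trivy-action@master  # Pin to a release tag or commit SHA
        with:
          input: image.tar
          severity: 'CRITICAL,HIGH'
          ignore-unfixed: true
          exit-code: '1'  # Fail the job, and block deployment, on findings

      - name: Secret scanning
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.pull_request.base.sha }}
          head: ${{ github.event.pull_request.head.sha }}
          extra_args: --only-verified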

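  deploy-staging:
    needs: [build, security-scan]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_STAGING_ROLE_ARN }}
          aws-region: us-east-1

      # --force-new-deployment restarts tasks with the service's current task
      # definition; this assumes that definition already points at the new image
      # (pushing the image to a registry is not shown here)
      - name: Deploy to staging (rolling)
        run: |
          aws ecs update-service \
            --cluster staging \
            --service api \
            --force-new-deployment
          # Wait for the rolling update to finish so failures fail this job
          aws ecs wait services-stable --cluster staging --services api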

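  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PRODUCTION_ROLE_ARN }}
          aws-region: us-east-1

      # Assumes a second ECS service, api-canary, registered with the same load
      # balancer target group as api, so traffic splits roughly by task count.
      # The 20-task total is an example; scale the numbers to your service.
      - name: Canary deployment (5% → 25% → 100%)
        run: |
          set -euo pipefail
          # ~5%: 1 canary task beside ~19 stable tasks
          aws ecs update-service --cluster production --service api-canary \
            --task-definition api-canary --desired-count 1
          aws ecs wait services-stable --cluster production --services api-canary
          sleep 1800  # Observe for 30 minutes
          ./scripts/verify-deployment.sh  # Assumed to exit non-zero on regression
          # ~25%: 5 canary tasks beside 15 stable tasks
          aws ecs update-service --cluster production --service api-canary --desired-count 5
          aws ecs update-service --cluster production --service api --desired-count 15
          aws ecs wait services-stable --cluster production --services api-canary api
          sleep 1800
          ./scripts/verify-deployment.sh
          # 100%: move the stable service to the new version, then retire the canary
          aws ecs update-service --cluster production --service api \
            --task-definition api-production --desired-count 20 --force-new-deployment
          aws ecs wait services-stable --cluster production --services api
          aws ecs update-service --cluster production --service api-canary --desired-count 0

      - name: Post-deployment verification
        run: |
          # Monitor error rate, latency for 10 minutes
          # Auto-rollback if error rate > 0.2% above baseline
          ./scripts/verify-deployment.sh

      # Covers failures during the canary phases; rolling back the stable service
      # after full promotion needs its previous task definition revision
      - name: Roll back canary
        if: failure()
        run: |
          aws ecs update-service --cluster production --service api-canary --desired-count 0
          aws ecs update-service --cluster production --service api --desired-count 20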

DORA Metrics to Track

Track these four DORA metrics to measure software delivery performance:

| Metric | Target (Elite) | Target (High) |
|---|---|---|
| Lead time for changes | Under 1 hour | Under 1 day |
| Deployment frequency | Multiple times per day | - |
| Change failure rate | Below 15% | - |
| Mean time to recovery (MTTR) | Under 1 hour | - |
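The four metrics can be computed from deployment records. The following is a minimal sketch. It assumes that each record carries the commit time, the deploy time, whether the change failed in production, and when service was restored. The `Deployment` type is hypothetical; in practice this data comes from your CI/CD and incident tools. The sketch reports the median lead time, a common choice because a few slow changes would otherwise skew the mean. It measures recovery from the failing deploy, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                    # caused a failure in production?
    restored_at: Optional[datetime] = None  # when service recovered, if it failed


def dora_metrics(deploys: List[Deployment], period_days: float) -> Dict[str, object]:
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.failed]
    # Simplification: recovery is measured from the failing deploy to restoration.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        "mttr": sum(recoveries, timedelta()) / len(recoveries) if recoveries else None,
    }


t0 = datetime(2024, 6, 3, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(minutes=40)),
    Deployment(t0, t0 + timedelta(minutes=50), failed=True,
               restored_at=t0 + timedelta(minutes=80)),
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0, t0 + timedelta(minutes=90)),
]
m = dora_metrics(deploys, period_days=2)
print(m["lead_time"], m["deploys_per_day"], m["change_failure_rate"], m["mttr"])
# 0:45:00 2.0 0.25 0:30:00
```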

Quick Win Checklist

  • [ ] Add dependency caching (5-10 minute savings)
  • [ ] Use OIDC instead of long-lived credentials
  • [ ] Enable secret masking in all logs
  • [ ] Run SAST on every PR, DAST nightly
  • [ ] Implement canary deployment with auto-rollback
  • [ ] Add post-deployment metric verification
  • [ ] Set up secret rotation (monthly)

Start with automated tests on every push and a single staging environment, then add deployment strategies, security scanning, and performance optimization as your team grows.

Want to dive deeper into a specific tool (Jenkins, GitLab CI, AWS CodePipeline) or deployment strategy?


Rizwan Saleem — https://rizwansaleem.co
