How to set up CI/CD that your team will actually use
Building Production-Ready CI/CD Pipelines: A Complete Tutorial
What Is a CI/CD Pipeline?
A CI/CD pipeline is an automated workflow that moves code from a pull request to production through build, test, security scanning, environment promotion, and deployment stages. Pipelines are defined as code (usually YAML) in version control, making the delivery process versionable, reviewable, and reproducible.
By DORA's benchmarks, elite-performing teams achieve a lead time of under one hour from commit to production, with change failure rates below 15%.
Pipeline Stages: Build, Test, Deploy
Build Stage
The build stage compiles source code and packages it into a deployable artifact:
- Java: Run Maven/Gradle to produce a JAR, then docker build for a container image
- Node.js: Run npm ci and the build script to produce a dist folder or Docker image
- Unit tests and dependency scans run here, early, when failure is cheap to diagnose
Test Stages
Map test types to the right pipeline stage:
| Test Type | When to Run | Duration | Purpose |
|---|---|---|---|
| Unit tests | Every commit | Milliseconds-seconds | Fail fast, fail early |
| Integration tests | After successful build | Minutes | Need running application/dependent services |
| Contract tests | Before service-to-service integration | Seconds-minutes | Catch API breaking changes |
| Load tests | Before production promotion | 10-30 minutes | Gate staging-to-production transition |
| Regression tests | Post-deployment (scheduled) | Varies | Catch production-only regressions |
ML-based test selection (marketed by some vendors as Test Intelligence) runs only the tests relevant to the changed code. Vendors claim this can reduce test execution time by up to 80%.
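The idea of running only the tests relevant to a change can be illustrated without any ML. The sketch below assumes a hypothetical, hand-maintained map from source directories to test files (`COVERAGE_MAP` and all paths in it are made up for illustration). Real test-selection tools derive this relationship automatically.

```python
# Hypothetical mapping from source directories to the test files covering them.
COVERAGE_MAP = {
    "src/billing": ["tests/test_billing.py", "tests/test_invoices.py"],
    "src/auth": ["tests/test_auth.py"],
    "src/search": ["tests/test_search.py"],
}


def select_tests(changed_files, coverage_map=COVERAGE_MAP):
    """Return the sorted test files covering the changed paths.

    Returns None when a changed file is not covered by the map, which the
    caller should treat as "run the full suite" to stay on the safe side.
    """
    selected = set()
    for changed in changed_files:
        owners = [d for d in coverage_map if changed.startswith(d + "/")]
        if not owners:
            return None  # unknown territory: do not guess
        for directory in owners:
            selected.update(coverage_map[directory])
    return sorted(selected)
```

Falling back to the full suite for unmapped files keeps the shortcut from silently skipping coverage.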
Security Scanning
Security scanning belongs inside the pipeline (shift-left security):
- SAST: Every PR (fast but noisy)
- Container image scanning: Before any environment promotion
- Dependency scanning (SCA): Every build
- DAST: Nightly builds or before production (too slow for every commit)
Enforce SLSA compliance by generating provenance attestations at build time and verifying artifact integrity before deployment.
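In its simplest form, verifying artifact integrity before deployment means comparing digests. The Python sketch below assumes the build stage recorded a SHA-256 digest for the artifact. The function names are illustrative, and this is only the digest check, not full SLSA provenance verification (which also covers signatures and builder identity).

```python
import hashlib
import hmac


def sha256_digest(path, chunk_size=1 << 20):
    """Hash the file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as artifact:
        for chunk in iter(lambda: artifact.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_digest):
    """Return True only if the artifact matches the digest recorded at build time."""
    return hmac.compare_digest(sha256_digest(path), expected_digest.lower())
```

A deploy step could call `verify_artifact` and refuse to continue when it returns False.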
Optimizing for Speed: Caching Strategies
Caching is the easiest and quickest improvement: just three caching steps can cut pipeline time by 5-10 minutes.
Key Caching Strategies
| Strategy | What to Cache | Benefit |
|---|---|---|
| Dependencies | npm/pip/Gradle packages | Stop reinstalling every time |
| Build artifacts | Compiled code | Reuse where possible |
| Shared caches | Across stages/jobs/pipelines | Maximum reuse |
| Incremental builds | Unchanged modules | Skip recompilation |
Some CI platforms cache dependencies between runs automatically (marketed as Cache Intelligence). Vendors report builds that are typically 2-4× faster, without changes to application code.
GitHub Actions caching example:
- name: Cache npm dependencies
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
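The key and restore-keys in this example interact in a way that is easy to miss: the key changes whenever the lockfile changes, and the prefix lets a run fall back to an older cache instead of starting cold. The Python sketch below models that behaviour in a simplified form. It is not how actions/cache is implemented, and `cache_key` and `restore` are hypothetical names.

```python
import hashlib


def cache_key(runner_os, lockfile_bytes):
    """Build a key shaped like '<os>-node-<hash of lockfile>'."""
    return f"{runner_os}-node-{hashlib.sha256(lockfile_bytes).hexdigest()}"


def restore(cache, key, restore_prefixes):
    """Return (entry, exact_hit) for a simplified cache lookup.

    `cache` is a dict ordered from oldest to newest entry.
    """
    if key in cache:
        return cache[key], True
    for prefix in restore_prefixes:
        # Fall back to the newest entry whose key starts with the prefix.
        for stored_key in reversed(list(cache)):
            if stored_key.startswith(prefix):
                return cache[stored_key], False
    return None, False
```

A partial hit still saves time, because the package manager only has to fetch what changed since the older cache was written.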
Environment Management
Modern pipelines provision infrastructure within the delivery flow using Terraform/OpenTofu, not as separate pre-steps.
Best Practices
- Environment isolation: Separate secrets per environment
- Same IaC code: Use identical Terraform/OpenTofu code for staging and production to prevent environment drift
- Provisioning gates: Let the pipeline progress only after infrastructure provisioning succeeds
- GitOps model: Pipeline writes desired state to Git; Argo CD reconciles cluster to match
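The reconciliation step in the GitOps model comes down to diffing desired state against actual state. The Python sketch below shows that diff under simplifying assumptions: resources are plain dictionaries keyed by name, and `reconcile` is a hypothetical function, not Argo CD's implementation.

```python
def reconcile(desired, actual):
    """Compute the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

When the two states already match, the function returns an empty list, which is the steady state a reconciler aims for.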
Environment-Specific Secrets (GitHub Actions)
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}  # staging or production
    steps:
      - name: Deploy
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}  # Environment-specific
          API_KEY: ${{ secrets.API_KEY }}
        run: ./scripts/deploy.sh
Secret Handling
Secrets management is critical: leaked API keys or credentials can cause data breaches and financial damage.
Critical Best Practices
| Practice | Description | Priority |
|---|---|---|
| Use OIDC when possible | Eliminate long-lived credentials | Critical |
| Never hardcode secrets | Use CI/CD secret storage or external managers | Critical |
| Enable secret masking | Prevent secrets appearing in logs | Critical |
| Implement secret scanning | Detect accidentally committed secrets | High |
| Use environment isolation | Separate secrets per environment | High |
| Rotate secrets regularly | Limit exposure window for leaked secrets | High |
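Secret masking, as listed in the table, amounts to replacing known secret values before a log line is written. The Python sketch below shows the idea with a hypothetical `mask_secrets` helper. CI platforms implement their own masking, so treat this as an illustration of the technique rather than any platform's behaviour.

```python
def mask_secrets(line, secrets, placeholder="***"):
    """Replace every occurrence of a known secret value in a log line."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # an empty string would match everywhere
            line = line.replace(secret, placeholder)
    return line
```

Masking only covers values the system already knows about, which is why the table lists secret scanning as a separate practice.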
OIDC Authentication (Recommended)
OIDC eliminates long-lived credentials by using short-lived tokens:
# GitHub Actions with OIDC to AWS
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: us-east-1
      # No stored secrets needed: the job uses short-lived tokens
Secret Rotation
Automate monthly rotation to minimize the exposure window:
# Secret rotation workflow
on:
  schedule:
    - cron: '0 0 1 * *'  # Monthly, at 00:00 UTC on the 1st
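A scheduled rotation job first needs to know which secrets are due. The Python sketch below assumes the rotation timestamps are available as a dictionary, which is a hypothetical data shape. Where those timestamps come from depends on the secret manager in use.

```python
from datetime import datetime, timedelta, timezone


def secrets_due_for_rotation(last_rotated, now, max_age=timedelta(days=30)):
    """Return the names of secrets last rotated at least `max_age` ago."""
    return sorted(
        name
        for name, rotated_at in last_rotated.items()
        if now - rotated_at >= max_age
    )
```

For example, with `now = datetime.now(timezone.utc)` the function returns every secret that has gone a month or more without rotation.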
Deployment Strategies
Rolling Deployment
Instances are updated one at a time (or in small groups). Instances still on the old version keep serving traffic until they are replaced, so both versions serve traffic during the rollout.
Pros: Simple to implement, no extra infrastructure cost
Cons: Both versions run concurrently, so the application must be backward-compatible
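The batching logic of a rolling deployment is small enough to sketch directly. The Python example below assumes two caller-supplied callbacks, `update` and `is_healthy`, both hypothetical. It halts at the first unhealthy batch so the remaining instances stay on the old version.

```python
def rolling_batches(instances, batch_size):
    """Yield successive groups of instances to update together."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


def rolling_update(instances, batch_size, update, is_healthy):
    """Update batch by batch; return (updated_instances, succeeded)."""
    updated = []
    for batch in rolling_batches(instances, batch_size):
        for instance in batch:
            update(instance)
            updated.append(instance)
        if not all(is_healthy(instance) for instance in batch):
            return updated, False  # halt: the rest stay on the old version
    return updated, True
```

A smaller batch size limits the blast radius of a bad release at the cost of a slower rollout.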
Blue-Green Deployment
Two identical production environments run in parallel: blue (staging/new) and green (production/stable).
Process:
- Deploy new version to inactive environment (blue)
- Run QA and user acceptance testing in blue
- Switch traffic via load balancer from green to blue
- Rollback is instant: flip load balancer back
Tradeoff: Doubles infrastructure spend during transition window
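The switch and rollback in a blue-green deployment are the same operation run in opposite directions. The Python sketch below models that with a hypothetical `BlueGreenRouter`. A real load balancer change would replace the attribute swap.

```python
class BlueGreenRouter:
    """Tracks which environment receives production traffic."""

    def __init__(self, live="green", idle="blue"):
        self.live = live
        self.idle = idle

    def switch(self):
        """Send traffic to the idle environment; the old live one becomes idle."""
        self.live, self.idle = self.idle, self.live
        return self.live

    rollback = switch  # rolling back is the same flip in reverse


def cut_over(router, passes_acceptance):
    """Switch traffic only if the idle environment passes its checks."""
    if not passes_acceptance(router.idle):
        return False
    router.switch()
    return True
```

Because the previous environment stays running after the switch, `rollback` needs no redeploy.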
Canary Deployment
A canary deployment releases the new version incrementally, starting with a small subset of users (typically 1-5%).
Process:
- Deploy new version alongside current version
- Route 1-5% of traffic to new version
- Monitor performance and user feedback
- Gradually increase traffic (5% → 25% → 75% → 100%) if no issues
- Fully transition once proven stable
Success criteria: Define clear metrics (error rate, latency), not just "no errors"
Tradeoff: Validation complexity requires robust observability
Canary is often a strong default for production when teams have traffic routing, observability, and rollback automation.
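The canary process can be sketched as a loop over traffic percentages with a health gate after each step. In the Python example below, `set_traffic_percent` and `metrics_healthy` are hypothetical callbacks standing in for the traffic router and the observability stack.

```python
def run_canary(steps, set_traffic_percent, metrics_healthy):
    """Walk through traffic percentages, rolling back on the first bad reading.

    Returns the final traffic percentage served by the new version.
    """
    for percent in steps:
        set_traffic_percent(percent)
        if not metrics_healthy(percent):
            set_traffic_percent(0)  # roll back: all traffic to the current version
            return 0
    return steps[-1] if steps else 0
```

In practice `metrics_healthy` would wait for an observation window before answering, so each step gets enough traffic to be judged.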
Testing in Pipelines
Testing Strategy by Stage
- Pre-deployment: Unit, integration, contract, and load tests
- Post-deployment: Automated verification comparing real-time metrics against baseline
AI-Assisted Deployment Verification
AI-assisted verification compares real-time metrics against a baseline window (typically the same time window from the previous deployment) and flags regressions such as:
- Error rate rising more than 0.2% above baseline
- Latency degradation (e.g., p99 latency up 15%)
- Throughput drops or resource consumption spikes
When regression is detected, the pipeline rolls back automatically without waiting for human intervention. This makes daily or multiple-times-daily deployment reasonable.
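The core comparison can be expressed as plain threshold checks, without any AI. The Python sketch below makes two assumptions: the 0.2% error-rate threshold is read as percentage points above baseline, and the 15% latency threshold as a relative increase. The `Metrics` type and `find_regressions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate_pct: float  # errors as a percentage of requests
    p99_latency_ms: float


def find_regressions(baseline, current,
                     max_error_increase_pts=0.2,
                     max_latency_increase_pct=15.0):
    """Return the names of regressed metrics (an empty list means healthy)."""
    findings = []
    if current.error_rate_pct - baseline.error_rate_pct > max_error_increase_pts:
        findings.append("error_rate")
    if baseline.p99_latency_ms > 0:
        change_pct = (
            (current.p99_latency_ms - baseline.p99_latency_ms)
            / baseline.p99_latency_ms * 100
        )
        if change_pct > max_latency_increase_pct:
            findings.append("p99_latency")
    return findings
```

A pipeline step could trigger the rollback whenever the returned list is non-empty.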
Real Pipeline Walkthrough: GitHub Actions Example
Here's a pipeline that combines the concepts above:
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
permissions:
  id-token: write  # For OIDC
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # super-linter needs full history to find changed files
      # Caching dependencies
      - name: Cache npm dependencies
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          # setup-node's built-in cache: 'npm' option is an alternative to the
          # explicit cache step above; use one or the other, not both
      - name: Install dependencies
        run: npm ci
      - name: Run unit tests
        run: npm test
      - name: SAST security scan
        uses: github/super-linter@v5
        env:  # super-linter reads its settings from environment variables
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Build application
        run: npm run build
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
  security-scan:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      # Each job runs on a fresh runner, so the image from the build job is not
      # available here. Rebuild it (assumes the Dockerfile builds the app itself),
      # or push it to a registry in the build job and pull it here.
      - name: Build Docker image for scanning
        run: docker build -t myapp:${{ github.sha }} .
      - name: Container image scanning
        uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in real use
        with:
          image-ref: 'myapp:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Secret scanning
        uses: trufflesecurity/trufflehog@main  # pin to a release tag or commit SHA in real use
        with:
          path: ./
          # base and head are only populated on pull_request events
          base: ${{ github.event.pull_request.base.sha }}
          head: ${{ github.event.pull_request.head.sha }}
          extra_args: --only-verified
  deploy-staging:
    needs: [build, security-scan]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_STAGING_ROLE_ARN }}
          aws-region: us-east-1
      - name: Deploy to staging (rolling)
        run: |
          aws ecs update-service \
            --cluster staging \
            --service api \
            --force-new-deployment
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PRODUCTION_ROLE_ARN }}
          aws-region: us-east-1
      - name: Canary deployment (5% → 25% → 100%)
        run: |
          # Assumption: a separate api-canary service sits behind the same load
          # balancer as api, so its share of traffic roughly tracks its share of
          # running tasks. Updating the api service itself here would replace
          # production instead of running a canary next to it.
          # Start the canary with about 5% of traffic
          aws ecs update-service \
            --cluster production \
            --service api-canary \
            --task-definition api-canary \
            --desired-count 1
          # Wait, then check metrics before widening the rollout
          sleep 1800  # 30 minutes
          ./scripts/verify-deployment.sh
          # Metrics healthy: increase to about 25%
          aws ecs update-service \
            --cluster production \
            --service api-canary \
            --desired-count 5
          sleep 1800
          ./scripts/verify-deployment.sh
          # Final promotion to 100%: roll the main service to the new version
          aws ecs update-service \
            --cluster production \
            --service api \
            --task-definition api-production \
            --force-new-deployment
      - name: Post-deployment verification
        run: |
          # Monitor error rate, latency for 10 minutes
          # Auto-rollback if error rate > 0.2% above baseline
          ./scripts/verify-deployment.sh
DORA Metrics to Track
Track these four metrics to measure pipeline performance:
| Metric | Target (Elite) | Target (High) |
|---|---|---|
| Lead time for changes | Under 1 hour | Under 1 day |
| Deployment frequency | Multiple times per day | - |
| Change failure rate | Below 15% | - |
| Mean time to recovery (MTTR) | Under 1 hour | - |
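Two of these metrics can be computed from a simple deployment log. The Python sketch below assumes each deployment is a dictionary with `committed_at`, `deployed_at`, and `failed` fields. Those field names are hypothetical, and real data would come from the CI system and incident tracker.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(deployments):
    """Median time from commit to production deploy."""
    return median(d["deployed_at"] - d["committed_at"] for d in deployments)


def change_failure_rate(deployments):
    """Percentage of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["failed"])
    return 100.0 * failed / len(deployments)
```

The median is used here instead of the mean so that a single slow release does not dominate the figure.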
Quick Win Checklist
- [ ] Add dependency caching (5-10 minute savings)
- [ ] Use OIDC instead of long-lived credentials
- [ ] Enable secret masking in all logs
- [ ] Run SAST on every PR, DAST nightly
- [ ] Implement canary deployment with auto-rollback
- [ ] Add post-deployment metric verification
- [ ] Set up secret rotation (monthly)
Start with automated tests on every push and a single staging environment, then add deployment strategies, security scanning, and performance optimization as your team grows.
Rizwan Saleem — https://rizwansaleem.co