Originally published on TechSaaS Cloud
CI/CD Pipeline Optimization: From 20-Minute to 3-Minute Builds
Real numbers from a startup that cut build times by 85% — every step with code.
The Problem: 20 Minutes of Watching Spinners
Our CI pipeline took 20 minutes per run. On a busy day with 30+ PRs, that meant 10 hours of cumulative CI time. Developers context-switched while waiting. Reviews stalled. Deployments backed up.
We're a 12-person team running 84 Docker containers on self-hosted infrastructure. Our stack: Python + TypeScript + Go microservices, GitHub Actions CI, Docker-based deploys, PostgreSQL + Redis.
Every optimization below is free. No paid CI tools. No enterprise cache services. Just configuration changes and architectural decisions.
The 6 Changes That Got Us to 3 Minutes
1. Docker Layer Caching (Saved: 6 minutes)
Before: Every build pulled fresh base images and reinstalled all dependencies.
# BAD: invalidates cache on every code change
FROM python:3.12-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
After: Separate dependency installation from code changes.
# GOOD: dependencies cached until requirements.txt changes
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt /app/
RUN pip install -r requirements.txt
COPY . /app
In GitHub Actions, enable BuildKit cache:
- name: Build
  uses: docker/build-push-action@v5
  with:
    context: .
    cache-from: type=gha
    cache-to: type=gha,mode=max
    push: true
Impact: First build unchanged. Subsequent builds skip the 6-minute dependency installation step entirely. Cache hit rate: ~92%.
2. Parallel Test Sharding (Saved: 5 minutes)
Before: 847 tests ran sequentially in 8 minutes.
After: Split across 4 parallel runners using pytest-split:
strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - name: Run tests
    run: |
      pytest --splits 4 --group ${{ matrix.shard }} \
        --splitting-algorithm least_duration
The least_duration algorithm uses historical test timing data to balance shards evenly. We store timing data in .test_durations committed to the repo.
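That durations file drifts as tests are added and runtimes change, so it needs a periodic refresh. Here is a sketch of what that refresh job can look like (the job name, bot identity, and schedule are illustrative); pytest-split's --store-durations flag rewrites the file from a full run:
# Sketch: scheduled job that refreshes .test_durations (job name and bot identity are illustrative)
refresh-test-durations:
  runs-on: self-hosted
  steps:
    - uses: actions/checkout@v4
    - name: Run full suite and record timings
      run: pytest --store-durations --durations-path .test_durations
    - name: Commit updated timings
      run: |
        git config user.name "ci-bot"
        git config user.email "ci-bot@users.noreply.github.com"
        git add .test_durations
        git commit -m "chore: refresh test durations" || echo "no timing changes"
        git push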
Impact: 8 minutes → 2.5 minutes (longest shard). Parallelism ties up four runners at once and burns slightly more total runner-minutes, but wall-clock time dropped 69%.
For Indian startups on GitHub's free tier (2,000 minutes/month), this is a trade-off. We self-host our runners on the same bare-metal server as our staging environment; more on that in step 6.
3. Dependency Pre-Build with Docker Compose (Saved: 3 minutes)
Before: Every microservice built its own node_modules or venv from scratch.
After: A shared base image with pre-installed dependencies, rebuilt only when lockfiles change.
# docker-compose.ci.yml
services:
  deps-python:
    build:
      context: .
      dockerfile: Dockerfile.deps-python
    image: registry.local/deps-python:latest
  service-api:
    build:
      context: ./services/api
      args:
        BASE_IMAGE: registry.local/deps-python:latest
# Dockerfile.deps-python
FROM python:3.12-slim
COPY requirements/*.txt /deps/
RUN pip install -r /deps/base.txt -r /deps/test.txt
A separate nightly CI job rebuilds the deps image. Feature branch builds pull it from our local registry.
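As a sketch, that nightly job can be a small scheduled workflow; the cron expression is arbitrary, and the image name and Dockerfile match the compose file above (adjust for whatever auth your registry needs):
# Sketch: nightly rebuild of the shared deps image (schedule is illustrative)
name: rebuild-deps-image
on:
  schedule:
    - cron: "30 1 * * *"        # nightly
  push:
    paths:
      - "requirements/*.txt"     # also rebuild when lockfiles change
jobs:
  deps-python:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Build and push deps image
        run: |
          docker build -f Dockerfile.deps-python -t registry.local/deps-python:latest .
          docker push registry.local/deps-python:latest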
Impact: Eliminated redundant dependency installation across 6 Python services. Saved ~3 minutes per build.
4. Smart Test Selection (Saved: 2 minutes)
Not every commit needs every test. We built a simple mapper:
# .github/scripts/test_selector.py
import subprocess

changed = subprocess.check_output(
    ["git", "diff", "--name-only", "origin/main...HEAD"]
).decode().strip().split("\n")

test_map = {
    "services/api/": "tests/api/",
    "services/auth/": "tests/auth/",
    "services/billing/": "tests/billing/",
    "shared/": "tests/",  # shared code = run everything
}

tests_to_run = set()
for file in changed:
    for src, test_dir in test_map.items():
        if file.startswith(src):
            tests_to_run.add(test_dir)

# If nothing matched, run everything (safety net)
if not tests_to_run:
    tests_to_run.add("tests/")

print(" ".join(tests_to_run))
- name: Select tests
  id: tests
  run: echo "dirs=$(python .github/scripts/test_selector.py)" >> $GITHUB_OUTPUT
- name: Run tests
  run: pytest ${{ steps.tests.outputs.dirs }}
Impact: Most PRs touch 1-2 services. Running only relevant tests: 2.5 minutes → 45 seconds. Full suite still runs on merge to main.
5. Artifact Caching for Lint and Type Checks (Saved: 2 minutes)
ESLint, mypy, and tsc have incremental modes. Use them:
- name: Cache mypy
  uses: actions/cache@v4
  with:
    path: .mypy_cache
    key: mypy-${{ hashFiles('**/*.py') }}
    restore-keys: mypy-
- name: Type check
  run: mypy --incremental src/
For ESLint:
- name: Cache ESLint
  uses: actions/cache@v4
  with:
    path: .eslintcache
    key: eslint-${{ hashFiles('**/*.ts', '**/*.tsx') }}
- name: Lint
  run: eslint --cache --cache-location .eslintcache src/
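tsc follows the same pattern if you cache its incremental build info. A sketch, assuming --incremental writes a tsconfig.tsbuildinfo file at the project root (adjust the path if you set tsBuildInfoFile):
- name: Cache tsc build info
  uses: actions/cache@v4
  with:
    path: '**/*.tsbuildinfo'
    key: tsc-${{ hashFiles('**/*.ts', '**/*.tsx') }}
    restore-keys: tsc-
- name: Type check
  run: npx tsc --incremental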
Impact: Incremental lint/type-check: 2 minutes → 15 seconds on most PRs.
6. Self-Hosted Runners (Saved: 2 minutes of queue time)
GitHub-hosted runners have 30-90 second startup times plus queue time during peak hours. We run our CI on the same bare metal server as our staging environment.
runs-on: self-hosted
# In our runner setup (systemd service)
# Runner installed at /opt/actions-runner
# Runs as dedicated ci-runner user with Docker socket access
Setup (one-time, 15 minutes; a shell sketch follows this list):
- Download GitHub Actions runner binary
- Create systemd service
- Give the runner user Docker socket access
- Configure labels for routing
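A condensed sketch of that setup on a Linux host; the runner version, paths, and repo URL are illustrative, so grab the current release from github.com/actions/runner/releases and a registration token from your repo's Settings > Actions > Runners page:
# One-time runner setup (version, paths, and repo URL are illustrative)
sudo useradd -m ci-runner
sudo usermod -aG docker ci-runner            # Docker socket access
sudo mkdir -p /opt/actions-runner && cd /opt/actions-runner
curl -L -o runner.tar.gz \
  https://github.com/actions/runner/releases/download/v2.316.0/actions-runner-linux-x64-2.316.0.tar.gz
tar xzf runner.tar.gz && sudo chown -R ci-runner: .
# Register against the repo and set routing labels
sudo -u ci-runner ./config.sh --url https://github.com/your-org/your-repo \
  --token <REGISTRATION_TOKEN> --labels self-hosted,linux
# Install and start as a systemd service running as ci-runner
sudo ./svc.sh install ci-runner
sudo ./svc.sh start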
Self-hosted runners start instantly — no cloud VM boot, no image pull. Queue time went from 30-90 seconds to 0.
For teams in India or Southeast Asia, this also eliminates the latency penalty of GitHub's US-based runners pulling from your APAC Docker registry.
Impact: 2 minutes of queue/startup time eliminated. Free. Forever.
The Result
| Step | Before | After |
|---|---|---|
| Queue + startup | 1.5 min | 0 min |
| Dependency install | 6 min | 0 min (cached) |
| Lint + type check | 2 min | 0.25 min |
| Build | 3 min | 0.5 min |
| Tests | 8 min | 2.5 min |
| Total | 20.5 min | 3.25 min |
85% reduction. Zero additional cost.
Common Mistakes That Negate These Gains
We've seen teams implement all six optimizations and still have slow pipelines. Here's why.
Mistake 1: Flaky tests that force re-runs. If 5% of your test suite is flaky, you'll re-run CI on average once every 3-4 PRs. That re-run costs the full pipeline time. We quarantine flaky tests into a separate non-blocking job: they run, their results are logged, but they don't block the PR. A weekly "flaky test cleanup" ticket keeps the quarantine from growing forever.
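A minimal version of that quarantine, assuming flaky tests carry a pytest marker we apply by hand (the marker and job names are ours, and the marker has to be registered in pytest.ini):
# Quarantined tests get their own job; the main test job adds -m "not flaky" to its pytest command
flaky-tests:
  runs-on: self-hosted
  continue-on-error: true        # results are logged but never block the PR
  steps:
    - uses: actions/checkout@v4
    - name: Run quarantined tests
      run: pytest -m flaky --tb=short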
Mistake 2: Not pinning dependency versions. If your requirements.txt has unpinned ranges (requests>=2.28), the dependency resolution step runs every time — even with caching — because pip needs to check if a newer version satisfies the constraint. Pin exact versions (requests==2.31.0) and use Dependabot or Renovate for updates. This alone can save 30-60 seconds per build.
Mistake 3: Running security scans synchronously. SAST/DAST tools (Snyk, Trivy, Bandit) are important but slow. Run them in a parallel job that doesn't block the main build. Your pipeline reports results, but developers can merge without waiting for a 3-minute vulnerability scan. Critical findings trigger a separate alert. This principle extends to secret scanning too — we cover the full secret management pipeline in our dedicated guide.
Mistake 4: Over-building in CI. Some teams build Docker images for every microservice on every PR, even when the service code didn't change. Use the same path-based filtering from Step 4 to skip builds for unchanged services. Our docker-compose.ci.yml has a --profile flag per service — CI only activates profiles for services with code changes.
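As a sketch, the profile wiring looks like this (service names follow the compose file from Step 3 and the path map from Step 4; the billing entry is illustrative). CI maps changed paths to profile names, then passes them as --profile flags so only those services build:
# docker-compose.ci.yml (excerpt): each service opts into a named profile
services:
  service-api:
    build:
      context: ./services/api
    profiles: ["api"]
  service-billing:
    build:
      context: ./services/billing
    profiles: ["billing"]
# CI then builds only the profiles for changed services, e.g.:
#   docker compose -f docker-compose.ci.yml --profile api build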
Mistake 5: Ignoring the feedback loop. After optimizing, most teams stop measuring. We track CI build times in Prometheus and alert if the p95 build time exceeds 5 minutes. Performance degrades slowly — a new dependency here, an extra test there — and without monitoring, you're back to 15 minutes within 6 months.
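A sketch of that alert as a Prometheus rule, assuming build durations are exported as a histogram (the metric name ci_build_duration_seconds is ours, pushed from a post-build step):
# Prometheus alerting rule (metric name is illustrative)
groups:
  - name: ci-pipeline
    rules:
      - alert: CIBuildTimeRegression
        expr: >
          histogram_quantile(0.95,
            sum(rate(ci_build_duration_seconds_bucket[1d])) by (le)) > 300
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "p95 CI build time has exceeded 5 minutes"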
Security Considerations in Fast Pipelines
Fast pipelines are only valuable if they're secure. Skipping security checks for speed is a false economy.
Our approach: security scans run in parallel, never blocking the main build path, but their results are mandatory before deploy. The build completes in 3 minutes, the security scan completes in 5, and the deploy job waits for both.
jobs:
  build-and-test:        # 3 minutes
    runs-on: self-hosted
    steps: [...]
  security-scan:         # 5 minutes, runs in parallel
    runs-on: self-hosted
    steps:
      - uses: aquasecurity/trivy-action@master
      - run: bandit -r src/ -f json -o bandit-report.json
  deploy:                # waits for BOTH
    needs: [build-and-test, security-scan]
    if: github.ref == 'refs/heads/main'
    steps: [...]
This means the critical path is still 5 minutes (the slower security scan), but the developer feedback loop (did my tests pass?) is 3 minutes. Developers get fast feedback; deploys get security guarantees.
For teams handling sensitive credentials in their pipelines, the secret management guide we published today covers how to avoid leaking secrets through CI logs — a common issue with fast, parallelized builds.
What We'd Add Next
- Bazel or Nx for true incremental builds across a monorepo. We're not there yet — our repo isn't big enough to justify the complexity.
- Test impact analysis using coverage data to be even more surgical about test selection.
- Merge queues (GitHub's native feature) to batch CI runs and reduce total runner time.
- Remote build caching (Turborepo, Gradle remote cache) for teams with larger monorepos — we've seen this shave another 40% off already-optimized builds.
The ROI Math
The ROI on CI optimization is absurd. A 12-person team saving 17 minutes per build across 30 daily builds reclaims 8.5 engineering hours per day. That's a full-time engineer's worth of productivity — recovered by spending 2 days on pipeline optimization.
But the real ROI isn't time saved — it's behavior change. When CI takes 3 minutes, developers wait for results before context-switching. When it takes 20 minutes, they start another task and the PR review sits for hours. Fast CI changes how your entire team works. The same build vs buy analysis applies here: investing 2 days in pipeline optimization is always better than buying an expensive CI SaaS tool.
Frequently Asked Questions
Q: Does this work for monorepos?
Yes, with adjustments. Steps 4 (smart test selection) and the Docker profile trick become even more valuable in monorepos because the ratio of "code changed" to "total code" is smaller. For monorepos over 50 services, consider Bazel, Nx, or Turborepo for incremental build tracking — they maintain a dependency graph that makes test selection automatic rather than manual.
Q: What about Windows or macOS builds?
Self-hosted runners (Step 6) work on all platforms, but the Docker caching strategy (Steps 1 and 3) is Linux-specific. For macOS CI (common in mobile development), focus on dependency caching (Cocoapods, Carthage) and parallel test sharding (XCTest supports this natively). The ROI is even higher for macOS builds because GitHub-hosted macOS runners are 10x more expensive than Linux runners.
Q: We use GitLab CI / Jenkins / CircleCI — does this still apply?
Every optimization except the GitHub-specific YAML applies to any CI system. Docker layer caching works everywhere Docker runs. Parallel test sharding works with any test framework. Dependency pre-builds work with any registry. Self-hosted runners exist for GitLab (gitlab-runner), Jenkins (agents), and CircleCI (self-hosted runner). The concepts transfer; only the config syntax changes.
Related Reading
- Self-Hosted LLMs vs API: Cost Comparison — the self-hosted runner approach from Step 6 applied to AI inference infrastructure
- Build vs Buy Framework — should you build your own CI tooling or buy? (Spoiler: optimize what you have first)
- Secret Management for DevOps — keeping credentials secure in fast CI/CD pipelines
We help teams audit and optimize their CI/CD pipelines. If your builds take longer than 5 minutes, there's almost certainly low-hanging fruit.
Subscribe to our newsletter for weekly deep-dives into developer productivity and infrastructure optimization.