The Deploy Fear
We deployed once a week. On Thursdays. With a 2-hour deployment window. Three engineers on standby. A rollback plan printed on paper (yes, really).
Everyone was terrified of deployments because they were big, risky, and painful.
The Paradox: Deploy More = Fail Less
Counter-intuitive but well documented: increasing deployment frequency lowers the change failure rate. Here's how it played out for us:
| Cadence | Risk profile | Avg changeset | Failure rate | MTTR when it fails |
|---|---|---|---|---|
| Weekly | Big changes, high risk, hard to debug | 15 PRs, 2,000+ lines | 18% | 45 min (too many suspects) |
| Daily | Medium changes, moderate risk | 3 PRs, ~400 lines | 8% | 15 min |
| 20x/day | Tiny changes, low risk, easy to debug | 1 PR, <100 lines | 2% | 3 min (one suspect = instant rollback) |
Phase 1: Remove Manual Gates (Week 1-2)
Our deploy process had 6 manual steps:
Before:
1. Developer opens deploy request (Jira ticket)
2. Lead reviews and approves (wait 2-4 hours)
3. QA runs manual test suite (wait 1-2 hours)
4. Ops team schedules deploy window (wait 1 day)
5. Ops runs deploy script manually
6. Developer verifies in production
After:
1. Developer opens PR
2. CI runs tests automatically (10 min)
3. PR approved by peer (30 min)
4. Merge to main = auto-deploy to staging
5. Automated smoke tests pass = auto-deploy to production
6. Automated verification (health checks + canary metrics)
Total time: 4-24 hours → 45 minutes.
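Step 6 of the new flow ("automated verification") replaces the human eyeball in production. A minimal sketch of such a verifier, assuming hypothetical health-check endpoints and an injected `fetch` function (both are illustrative, not our actual tooling):

```python
def verify_deploy(health_urls, fetch, required_status=200):
    """Return True only if every health endpoint responds with the
    required status; any failure should trigger an automatic rollback."""
    for url in health_urls:
        try:
            status = fetch(url)  # thin wrapper around an HTTP GET, injected for testability
        except Exception:
            return False  # an unreachable endpoint counts as a failed check
        if status != required_status:
            return False
    return True
```

Injecting `fetch` keeps the gate trivially unit-testable, which matters when the gate itself must never be flaky.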
Phase 2: Test Confidence (Week 3-6)
You can't deploy fast without fast, reliable tests:
test_pyramid:
  unit_tests:
    count: 2000
    run_time: 90 seconds
    reliability: 99.9%  # No flaky tests allowed
  integration_tests:
    count: 200
    run_time: 5 minutes
    reliability: 99.5%
  e2e_tests:
    count: 30
    run_time: 8 minutes
    reliability: 98%
  total_ci_time: 14 minutes  # Must be under 15

rules:
  - flaky_test_policy: "Fix or delete within 48 hours"
  - new_feature_requires: "unit + integration tests"
  - ci_time_budget: "Never exceed 15 minutes"
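The 48-hour flaky-test rule is easier to enforce when quarantine candidates are computed from CI history instead of argued about. A small sketch (the history format is an assumption; real CI systems expose run results in their own shapes):

```python
def flaky_tests(history, min_reliability=0.999):
    """history maps test name -> list of booleans (True = passed).
    Returns the tests whose pass rate falls below the reliability bar,
    i.e. the ones the policy says to fix or delete within 48 hours."""
    flagged = []
    for name, runs in history.items():
        if runs and sum(runs) / len(runs) < min_reliability:
            flagged.append(name)
    return sorted(flagged)
```

Run it nightly against the last N pipeline executions and file a ticket per flagged test, and the policy enforces itself.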
Phase 3: Progressive Delivery (Week 7-10)
deploy_pipeline:
  stages:
    - name: build_and_test
      duration: 14 min
      gate: all_tests_pass
    - name: deploy_staging
      duration: 2 min
      gate: automated_smoke_tests
    - name: canary_production
      traffic: 5%
      duration: 10 min
      gate: "error_rate < 0.5% and latency < 2x baseline"
    - name: gradual_rollout
      steps: [25%, 50%, 100%]
      duration: 15 min per step
      gate: all_metrics_healthy
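The canary gate above can be written as a pure function over the metrics window, so the exact same check runs at 5%, 25%, 50%, and 100%. A sketch using the thresholds from the pipeline config (the metric names are illustrative):

```python
def canary_gate(error_rate, p95_latency_ms, baseline_latency_ms,
                max_error_rate=0.005, max_latency_factor=2.0):
    """Mirror the pipeline gate: error_rate < 0.5% AND latency < 2x baseline.
    Returning False should halt the rollout and trigger an automatic rollback."""
    return (error_rate < max_error_rate
            and p95_latency_ms < max_latency_factor * baseline_latency_ms)
```

Keeping the gate stateless means you can replay it against historical metrics to tune thresholds before trusting it in production.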
Phase 4: Feature Flags (Week 11-14)
Deploy code without enabling features:
from feature_flags import is_enabled

def get_recommendations(user):
    if is_enabled('new_recommendation_engine', user=user):
        return new_engine.recommend(user)  # Deployed, but enabled for only 5% of users
    return old_engine.recommend(user)
This separates deployment (technical) from release (business). Deploy 20x/day, release features when ready.
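One common way an `is_enabled` check decides the "5% of users" is a stable hash of the user ID against the flag name, so each user gets a consistent verdict without storing any state. A sketch of that idea (one plausible implementation, not necessarily what a given flag library does):

```python
import hashlib

def is_enabled(flag_name, user_id, rollout_percent):
    """Deterministically bucket a user into [0, 100) and compare against the
    rollout percentage. The same user + flag always hashes to the same bucket,
    so users don't flip between old and new behavior on every request."""
    key = f"{flag_name}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

A nice property of `bucket < rollout_percent`: raising the percentage from 5 to 25 only ever adds users; nobody who already saw the new engine gets kicked back to the old one.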
The Culture Change
Old mindset: "Deployments are dangerous events"
New mindset: "Deployments are routine operations"
Old: "Let's batch these changes for Thursday"
New: "Ship it now, it's one small change"
Old: "Who's on deploy duty?"
New: "Everyone deploys their own code"
Results After 6 Months
| Metric | Before | After |
|---|---|---|
| Deploy frequency | 1x/week | 18-22x/day |
| Lead time (commit to prod) | 5 days | 45 minutes |
| Change failure rate | 18% | 2.1% |
| MTTR | 45 min | 3 min |
| Developer satisfaction | 3.2/5 | 4.7/5 |
The DORA metrics improved across the board. But the biggest win was cultural: engineers stopped fearing production.
If you want AI-powered deployment safety that gives your team the confidence to ship fast, check out what we're building at Nova AI Ops.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com