# The DevOps metrics that actually matter (and how to track them)
## The 4 DORA metrics

| Metric | Elite | Low |
| --- | --- | --- |
| Deployment Frequency | Multiple per day | Less than once per month |
| Lead Time for Changes | Under 1 hour | Over 1 month |
| Change Failure Rate | 0-5% | 15-30% |
| Mean Time to Recovery (MTTR) | Under 1 hour | Over 1 week |
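The thresholds above are easy to turn into a quick self-assessment. A minimal sketch for lead time (function and tier names are mine, thresholds from the table; everything between elite and low is collapsed into a single "mid" bucket for brevity):

```python
def lead_time_tier(seconds):
    """Bucket lead time for changes against the DORA thresholds above.

    The gap between elite (< 1 hour) and low (> 1 month) is
    simplified into one 'mid' bucket.
    """
    HOUR = 3600
    MONTH = 30 * 24 * 3600  # approximate a month as 30 days
    if seconds < HOUR:
        return 'elite'
    if seconds > MONTH:
        return 'low'
    return 'mid'

print(lead_time_tier(1800))  # 30-minute lead time -> elite
```

The same bucketing applies to the other three metrics once you emit them as numbers, which is exactly what the snippets below do.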
## Track deployment frequency in GitHub Actions

```yaml
- name: Record deployment metric
  if: success()
  run: |
    aws cloudwatch put-metric-data \
      --namespace "DevOps/Deployments" \
      --metric-name "DeploymentCount" \
      --value 1 \
      --dimensions Service=${{ env.SERVICE_NAME }},Environment=production
```
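The same pattern covers change failure rate: record a companion metric when the deploy job fails, then divide failures by total deployments. A hedged sketch (the step and metric names are illustrative, not part of the original workflow):

```yaml
- name: Record deployment failure
  if: failure()
  run: |
    aws cloudwatch put-metric-data \
      --namespace "DevOps/Deployments" \
      --metric-name "DeploymentFailureCount" \
      --value 1 \
      --dimensions Service=${{ env.SERVICE_NAME }},Environment=production
```

Change failure rate is then a CloudWatch metric-math expression over a period, e.g. `SUM(failures) / (SUM(deployments) + SUM(failures))`.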
## Track lead time (commit → production)

```yaml
- name: Record lead time
  if: success()
  run: |
    COMMIT_TIME=$(git show -s --format=%ct ${{ github.sha }})
    LEAD_TIME=$(($(date +%s) - COMMIT_TIME))
    aws cloudwatch put-metric-data \
      --namespace "DevOps/Deployments" \
      --metric-name "LeadTimeSeconds" \
      --value $LEAD_TIME \
      --dimensions Service=${{ env.SERVICE_NAME }}
```
## Track MTTR via alarm state changes (Lambda)

```python
from datetime import datetime

import boto3

ssm = boto3.client('ssm')
cw = boto3.client('cloudwatch')

def handler(event, context):
    alarm = event['detail']['alarmName']
    state = event['detail']['state']['value']
    ts = datetime.fromisoformat(event['time'].replace('Z', '+00:00'))

    if state == 'ALARM':
        # Incident starts: persist the alarm timestamp in Parameter Store.
        ssm.put_parameter(Name=f'/incidents/{alarm}/start',
                          Value=ts.isoformat(), Type='String', Overwrite=True)
    elif state == 'OK':
        # Incident ends: recovery time = OK timestamp minus stored start.
        start = ssm.get_parameter(Name=f'/incidents/{alarm}/start')
        mttr = (ts - datetime.fromisoformat(start['Parameter']['Value'])).total_seconds()
        cw.put_metric_data(Namespace='DevOps/Incidents',
                           MetricData=[{'MetricName': 'MTTR', 'Value': mttr,
                                        'Unit': 'Seconds'}])
```
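For the handler above to fire, an EventBridge rule has to match CloudWatch alarm state-change events and target the Lambda. A minimal event pattern (target wiring and IAM omitted; assumes the default event bus):

```json
{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"]
}
```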
## Leading indicators

The DORA metrics are lagging: by the time they move, the damage is done. A few leading indicators warn you earlier:

- Alert fatigue rate: rising means alert signal quality is degrading
- Deployment size: larger changesets correlate with higher failure rates
- Test coverage trend: a declining trend predicts more future failures
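As a sketch of the first indicator: if each alert event carries an `acted_on` flag (a hypothetical field; map whatever your paging tool exports onto it), the fatigue rate is simply the share of alerts nobody acted on, tracked week over week:

```python
def alert_fatigue_rate(alerts):
    """Fraction of alerts that fired but required no human action.

    `alerts` is a list of dicts with a boolean 'acted_on' key --
    an assumed shape, adapt it to your paging tool's export format.
    """
    if not alerts:
        return 0.0
    ignored = sum(1 for a in alerts if not a['acted_on'])
    return ignored / len(alerts)

week_of_alerts = [
    {'name': 'cpu-high', 'acted_on': True},
    {'name': 'disk-warn', 'acted_on': False},
    {'name': 'disk-warn', 'acted_on': False},
    {'name': '5xx-spike', 'acted_on': True},
]
print(alert_fatigue_rate(week_of_alerts))  # 0.5
```

A rate trending toward 1.0 means most pages are noise, which is exactly when real incidents start getting missed.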
Step2Dev includes deployment metrics instrumentation in the workflow it generates.
## Top comments (1)

> The MTTR tracking via Lambda alarm state changes is a clean approach I haven't seen before. Most teams I've worked with track MTTR manually in spreadsheets; this automates the painful part.