The DevOps metrics that actually matter (and how to track them)

#devops #dora #aws #cloudwatch

The 4 DORA metrics

Metric                  Elite            Low
Deployment Frequency    Multiple/day     < once/month
Lead Time for Changes   < 1 hour         > 1 month
Change Failure Rate     0-5%             15-30%
MTTR                    < 1 hour         > 1 week
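Change Failure Rate is the one metric in the table that the snippets in this post do not record directly. A minimal sketch of the arithmetic, assuming you already have a count of production deployments and a count of those that needed a rollback or hotfix (both inputs, and the function name, are hypothetical):

```python
def change_failure_rate(total_deployments: int, failed_deployments: int) -> float:
    """Percentage of deployments that led to a failure in production.

    Both counts are assumed inputs; where they come from (a second
    CloudWatch metric, an incident tracker, ...) is up to you.
    """
    if total_deployments == 0:
        return 0.0  # no deployments yet, so nothing has failed
    return 100.0 * failed_deployments / total_deployments


# 2 failed deployments out of 50 is 4%, inside the 0-5% Elite band above
print(change_failure_rate(50, 2))
```

What counts as a "failed" deployment is a team decision; whichever definition you pick, keep it stable so the trend stays comparable.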

Track deployment frequency in GitHub Actions

- name: Record deployment metric
  if: success()
  run: |
    # One data point per successful production deploy
    aws cloudwatch put-metric-data \
      --namespace "DevOps/Deployments" --metric-name "DeploymentCount" \
      --value 1 --unit Count \
      --dimensions "Service=${{ env.SERVICE_NAME }},Environment=production"
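Publishing the data point is only half of it; to get a frequency you have to read the metric back. The sketch below sums `DeploymentCount` per day with the CloudWatch `get_metric_statistics` call and averages it. The function name and the `days` window are assumptions for illustration, and `cw` is expected to be a boto3 CloudWatch client that you create yourself.

```python
from datetime import datetime, timedelta, timezone


def deployments_per_day(cw, service, days=7, now=None):
    """Average production deployments per day over the last `days` days.

    `cw` is assumed to be boto3.client("cloudwatch") or anything with a
    compatible get_metric_statistics method.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    resp = cw.get_metric_statistics(
        Namespace="DevOps/Deployments",
        MetricName="DeploymentCount",
        # CloudWatch treats every dimension combination as its own metric,
        # so these must match what the workflow step published.
        Dimensions=[
            {"Name": "Service", "Value": service},
            {"Name": "Environment", "Value": "production"},
        ],
        StartTime=start,
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    total = sum(point["Sum"] for point in resp["Datapoints"])
    return total / days
```

Called as `deployments_per_day(boto3.client("cloudwatch"), "my-service")`, where `my-service` stands in for whatever `SERVICE_NAME` is set to in the workflow.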

Track lead time (commit → production)

- name: Record lead time
  if: success()
  run: |
    # Committer timestamp of the deployed commit (needs a prior actions/checkout step)
    COMMIT_TIME=$(git show -s --format=%ct ${{ github.sha }})
    LEAD_TIME=$(( $(date +%s) - COMMIT_TIME ))
    aws cloudwatch put-metric-data \
      --namespace "DevOps/Deployments" --metric-name "LeadTimeSeconds" \
      --value "$LEAD_TIME" --unit Seconds \
      --dimensions "Service=${{ env.SERVICE_NAME }}"
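Once `LeadTimeSeconds` is flowing, it helps to map a value back onto the bands from the DORA table. A small sketch, using only the two thresholds the table gives (Elite under 1 hour, Low over 1 month); the function name and the 30-day month are assumptions:

```python
HOUR = 3600
MONTH = 30 * 24 * HOUR  # assumption: treat "1 month" as 30 days


def lead_time_band(seconds: float) -> str:
    """Place a lead time (in seconds) relative to the Elite and Low thresholds."""
    if seconds < HOUR:
        return "elite"
    if seconds > MONTH:
        return "low"
    return "between elite and low"


print(lead_time_band(25 * 60))    # a 25-minute lead time
print(lead_time_band(3 * 86400))  # a 3-day lead time
```

Note that the workflow step measures from the deployed commit's timestamp, so a long-lived branch whose last commit is recent will look faster than it really was.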

Track MTTR via alarm state changes (Lambda)

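from datetime import datetime

import boto3

ssm = boto3.client('ssm')
cw  = boto3.client('cloudwatch')

def handler(event, context):
    # Invoked by an EventBridge rule for "CloudWatch Alarm State Change" events
    alarm = event['detail']['alarmName']
    state = event['detail']['state']['value']
    ts    = datetime.fromisoformat(event['time'].replace('Z', '+00:00'))
    key   = f'/incidents/{alarm}/start'  # assumes the alarm name is a valid SSM parameter path

    if state == 'ALARM':
        # Incident opened: remember when it started
        ssm.put_parameter(Name=key, Value=ts.isoformat(), Type='String', Overwrite=True)
    elif state == 'OK':
        try:
            start = ssm.get_parameter(Name=key)
        except ssm.exceptions.ParameterNotFound:
            return  # OK without a recorded ALARM, e.g. coming from INSUFFICIENT_DATA
        mttr = (ts - datetime.fromisoformat(start['Parameter']['Value'])).total_seconds()
        cw.put_metric_data(Namespace='DevOps/Incidents',
                           MetricData=[{'MetricName':'MTTR','Value':mttr,'Unit':'Seconds'}])
        ssm.delete_parameter(Name=key)  # close the incident so it is not counted twice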
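To check the ALARM-to-OK pairing without deploying anything, the same logic can be replayed locally. This is a sketch under assumptions: a plain dict stands in for Parameter Store, the helper names and the `api-5xx` alarm are made up, and the sample events carry only the fields the handler reads. Unlike the handler, this version simply skips an OK event that has no open incident.

```python
from datetime import datetime


def restore_seconds(events):
    """Replay alarm state-change events; return one restore time per incident."""
    started = {}    # alarm name -> ALARM timestamp (stand-in for Parameter Store)
    durations = []
    for event in events:
        alarm = event["detail"]["alarmName"]
        state = event["detail"]["state"]["value"]
        ts = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
        if state == "ALARM":
            started[alarm] = ts
        elif state == "OK" and alarm in started:
            durations.append((ts - started.pop(alarm)).total_seconds())
    return durations


def sample_event(alarm, state, time):
    """Trimmed-down shape of a CloudWatch Alarm State Change event."""
    return {
        "source": "aws.cloudwatch",
        "detail-type": "CloudWatch Alarm State Change",
        "time": time,
        "detail": {"alarmName": alarm, "state": {"value": state}},
    }


events = [
    sample_event("api-5xx", "ALARM", "2024-03-18T10:00:00Z"),
    sample_event("api-5xx", "OK", "2024-03-18T10:30:00Z"),
    sample_event("api-5xx", "OK", "2024-03-18T10:45:00Z"),  # skipped: no open incident
]
print(restore_seconds(events))  # [1800.0]
```

Each value is the restore time for one incident; the "mean" in MTTR comes from averaging those values, which the CloudWatch `Average` statistic does for you on the published metric.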

Leading indicators

Alert fatigue rate: when it rises, signal quality is degrading
Deployment size: larger deployments tend to fail more often
Test coverage trend: a decline points to more failures later
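The three indicators above come down to simple arithmetic once you choose definitions. The sketch below uses assumed ones: alert fatigue as the share of alerts nobody acted on, deployment size as lines changed according to `git diff --shortstat`, and coverage trend as the average change per sample. The function names are hypothetical.

```python
import re


def alert_noise_rate(total_alerts: int, actioned_alerts: int) -> float:
    """Share of alerts that nobody acted on (assumed definition of alert fatigue)."""
    if total_alerts == 0:
        return 0.0
    return (total_alerts - actioned_alerts) / total_alerts


def deployment_size(shortstat: str) -> int:
    """Lines changed, parsed from the output of `git diff --shortstat`."""
    insertions = re.search(r"(\d+) insertion", shortstat)
    deletions = re.search(r"(\d+) deletion", shortstat)
    return sum(int(m.group(1)) for m in (insertions, deletions) if m)


def coverage_trend(samples: list[float]) -> float:
    """Average change per sample; a negative value means coverage is declining."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


print(alert_noise_rate(40, 10))
print(deployment_size(" 3 files changed, 42 insertions(+), 7 deletions(-)"))
print(coverage_trend([82.0, 81.0, 79.0]))
```

Any of these can be published with the same `put-metric-data` pattern as the deployment metrics, so they end up on the same dashboard.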

Step2Dev includes deployment metrics instrumentation in the workflow it generates.

👉 step2dev.com

Top comments (2)

klement Gunndu • Mar 18

The MTTR tracking via Lambda alarm state changes is a clean approach I haven't seen before. Most teams I've worked with track MTTR manually in spreadsheets -- this automates the painful part.

Yash • Mar 19

Thanks! I noticed MTTR tracking is often manual, so I wanted to automate it using alarm state changes to make it more accurate and real-time.