DORA stands for DevOps Research and Assessment, a research group that Google acquired in 2018. Its goal is to help organizations achieve high DevOps and organizational performance through data-driven insights. In their 2018 book, Accelerate, they identified four metrics that organizations can use to measure, and then improve, their teams' software delivery performance. Improving delivery performance, in turn, has an impact on the business.
These four key metrics are:
- Deployment Frequency: how often an organization successfully releases to production. This metric doesn't tell you whether you are releasing valuable software, but if you don't release frequently, there's nothing to measure.
- Lead Time for Changes: the amount of time it takes a commit to get into production. This metric is useful for improving the release process: if the lead time is too long, the process probably needs some changes.
- Time to Restore Services: how long it takes an organization to recover from a failure in production.
- Change Failure Rate: the percentage of deployments causing a failure in production. From a business point of view, it’s essentially an indication of the business risk of making changes.
The first two are velocity metrics; the last two are stability metrics.
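To make the definitions concrete, here is a minimal sketch of how the four metrics could be computed from a handful of records. The `Deployment` and `Incident` classes, their field names, and the sample data are my own assumptions for illustration, and using the median for the two time-based metrics is one reasonable choice, not the only one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape, not taken from any specific tool.
    deployed_at: datetime
    commit_times: List[datetime]  # timestamps of the commits shipped by this deployment
    failed: bool = False          # True if the deployment caused a failure in production


@dataclass
class Incident:
    opened_at: datetime
    resolved_at: datetime


def deployment_frequency(deployments: List[Deployment], days: int) -> float:
    """Average number of deployments per day over a window of `days` days."""
    return len(deployments) / days


def lead_time_for_changes(deployments: List[Deployment]) -> Optional[timedelta]:
    """Median time between a commit and the deployment that shipped it."""
    seconds = [
        (d.deployed_at - c).total_seconds()
        for d in deployments
        for c in d.commit_times
    ]
    return timedelta(seconds=median(seconds)) if seconds else None


def time_to_restore(incidents: List[Incident]) -> Optional[timedelta]:
    """Median time between an incident being opened and resolved."""
    seconds = [(i.resolved_at - i.opened_at).total_seconds() for i in incidents]
    return timedelta(seconds=median(seconds)) if seconds else None


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


# Made-up sample data covering one week.
deployments = [
    Deployment(
        deployed_at=datetime(2024, 1, 1, 12, 0),
        commit_times=[datetime(2024, 1, 1, 10, 0)],
    ),
    Deployment(
        deployed_at=datetime(2024, 1, 3, 12, 0),
        commit_times=[datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 3, 8, 0)],
        failed=True,
    ),
]
incidents = [
    Incident(
        opened_at=datetime(2024, 1, 3, 12, 30),
        resolved_at=datetime(2024, 1, 3, 13, 30),
    ),
]

print("Deployments per day:", deployment_frequency(deployments, days=7))
print("Lead time for changes:", lead_time_for_changes(deployments))
print("Time to restore:", time_to_restore(incidents))
print("Change failure rate:", change_failure_rate(deployments))
```

The first two functions cover the velocity side and the last two the stability side, so a team can see both at once from the same small set of records.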
I spent a long time looking for a tool that could help my team track these metrics. I knew that Jira, GitLab, and some other tools implement these features, but I wanted something highly customizable and not tied to a specific product (we use different systems for deployments, changes, and incidents). Then one day I discovered FourKeys by chance.
FourKeys is an open-source project developed by Google. It consists of a generalized ETL pipeline that can be extended to process inputs from a wide variety of sources. The pipeline ingests the data into BigQuery, and the results are then displayed in a Grafana dashboard.
The GitHub repo contains all the instructions to set up and run the project, so here we can focus on the architecture.
The events are generated by your development environment or, more precisely, by the tools you use in it. GitLab, GitHub, Tekton, and CircleCI are supported out of the box, and you can extend the supported set of tools.
The handler is a Cloud Run endpoint: it collects the events and pushes them into the appropriate Pub/Sub topic.
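The routing idea behind the handler can be sketched in a few lines. This is not the project's actual code: `InMemoryBus` is a stand-in I made up in place of Pub/Sub, the one-topic-per-source naming is an assumption, and only two sources are detected, via the `X-GitHub-Event` and `X-Gitlab-Event` headers that GitHub and GitLab attach to their webhooks.

```python
from collections import defaultdict
from typing import DefaultDict, List, Mapping

# Webhook header (lowercased) -> source name.
# The source names double as topic names here, which is an assumption.
SOURCE_HEADERS = {
    "x-github-event": "github",
    "x-gitlab-event": "gitlab",
}


class InMemoryBus:
    """Hypothetical stand-in for Pub/Sub: stores messages per topic in memory."""

    def __init__(self) -> None:
        self.topics: DefaultDict[str, List[bytes]] = defaultdict(list)

    def publish(self, topic: str, data: bytes) -> None:
        self.topics[topic].append(data)


def detect_source(headers: Mapping[str, str]) -> str:
    """Work out which tool sent the event by looking at its headers."""
    lowered = {name.lower() for name in headers}
    for header, source in SOURCE_HEADERS.items():
        if header in lowered:
            return source
    raise ValueError("unsupported event source")


def handle_event(headers: Mapping[str, str], body: bytes, bus: InMemoryBus) -> str:
    """Forward the raw event body, untouched, to the topic for its source."""
    topic = detect_source(headers)
    bus.publish(topic, body)
    return topic


bus = InMemoryBus()
handle_event({"X-GitHub-Event": "push"}, b'{"ref": "refs/heads/main"}', bus)
handle_event({"X-Gitlab-Event": "Push Hook"}, b'{"ref": "refs/heads/main"}', bus)
print({topic: len(messages) for topic, messages in bus.topics.items()})
```

Keeping the handler this thin means that supporting a new tool is mostly a matter of recognizing its events and giving them a topic; parsing is left to the consumer.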
Another Cloud Run instance consumes the events, applies some transformations, and then loads the data into BigQuery. The data first land in a raw events table (it's your operation log). From there, scheduled queries read from the raw events table and write to three different tables: Incidents, Changes, and Deployments.
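In the real project this step runs as scheduled queries inside BigQuery; the sketch below only mirrors the idea in plain Python. The `event_type` field, its values, and the mapping to tables are assumptions of mine, not the project's schema.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from a raw event type to the derived table it feeds.
TABLE_FOR_EVENT_TYPE = {
    "push": "changes",
    "deployment": "deployments",
    "incident": "incidents",
}


def derive_tables(raw_events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Split the raw event log into the three derived tables."""
    tables: Dict[str, List[dict]] = {
        "changes": [],
        "deployments": [],
        "incidents": [],
    }
    for event in raw_events:
        table = TABLE_FOR_EVENT_TYPE.get(event.get("event_type"))
        if table is None:
            # Events that feed no derived table stay only in the raw log.
            continue
        tables[table].append(event)
    return tables


# Made-up raw events, as they might sit in the raw events table.
raw_events = [
    {"id": "1", "event_type": "push", "source": "github"},
    {"id": "2", "event_type": "deployment", "source": "tekton"},
    {"id": "3", "event_type": "incident", "source": "github"},
    {"id": "4", "event_type": "ping", "source": "github"},
]

tables = derive_tables(raw_events)
for name, rows in tables.items():
    print(name, [row["id"] for row in rows])
```

Because the raw log is never modified, the derived tables can be rebuilt from scratch whenever the mapping changes.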
These tables are used to calculate the metrics and show them inside a dashboard.