The Art of SLOs with Alex Bramley
Today on the podcast, Jon Foust is back with Mark Mirchandani as we talk about SLOs and the importance of measuring service reliability with Alex Bramley. As a member of the Google SRE team, Alex and his coworkers help customers optimally run their services on Google Cloud. They collaborate with the client, weighing client needs and user needs to develop a plan that is affordable, efficient, and has the highest reliability for the user. Recently, they’ve been working to automate functions such as detection of outages, so that Google and the customer can work together quickly to get everything working smoothly again.
Later, Alex, describes the steps developers go through at his workshop, The Art of SLOs, which was designed to help companies measure and improve reliability. At this workshop, attendees are encouraged to set SLO targets and error budgets. They are given theoretical reliability problems to solve, allowing them to practice without the added pressure of messy, real-world problems. The Art of SLOs helps developers understand what measurements are beneficial and why and the best way to implement projects that can take those measurements accurately. Alex was able to make the materials for the workshop free online!