These are my notes on Chapter 7, "The Evolution of Automation at Google", from the book Site Reliability Engineering: How Google Runs Production Systems.
This post is part of a series. The previous post can be found here:
SRE book notes: Monitoring Distributed Systems
Hercules Lemke Merscher ・ Jan 18 ・ 2 min read
doing automation thoughtlessly can create as many problems as it solves
It isn’t appropriate to automate every component of every system, and not everyone has the ability or inclination to develop automation at a particular time. Some essential systems started out as quick prototypes, not designed to last or to interface with automation.
Automate Yourself Out of a Job: Automate ALL the Things!
We graduated from optimizing our infrastructure for a lack of failover to embracing the idea that failure is inevitable, and therefore optimizing to recover quickly through automation.
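To make "optimizing to recover quickly through automation" a bit more concrete, here is a minimal sketch of an automated failover decision. It is not how Google does it. The `Replica` type, the `pick_new_primary` function, and the `is_healthy` callback are all hypothetical names I made up for illustration, and a real system would also need fencing, replication-lag checks, and consensus about who the primary is.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str         # hypothetical identifier, e.g. "db-1"
    is_primary: bool  # whether this replica currently serves as primary


def pick_new_primary(
    replicas: Sequence[Replica],
    is_healthy: Callable[[Replica], bool],
) -> Optional[Replica]:
    """Return the replica that should act as primary, or None if none is healthy.

    Assumption: is_healthy is a caller-supplied health check; this sketch
    does not define how health is actually measured.
    """
    # Keep the current primary when it is still healthy: no failover needed.
    for replica in replicas:
        if replica.is_primary and is_healthy(replica):
            return replica
    # Otherwise promote the first healthy replica, in list order.
    for replica in replicas:
        if is_healthy(replica):
            return replica
    # Nothing is healthy: report that instead of guessing.
    return None


fleet = [Replica("db-1", True), Replica("db-2", False), Replica("db-3", False)]
down = {"db-1"}  # pretend the current primary just failed
chosen = pick_new_primary(fleet, lambda r: r.name not in down)
print(chosen.name if chosen else "no healthy replica")  # db-2
```

The point of the sketch is that the failure path is ordinary code that runs without a human in the loop, rather than an exceptional event someone has to handle by hand.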
A team not running automation has no incentive to build systems that are easy to automate.
The most functional tools are usually written by those who use them.
shipping and iterating rapidly might allow you to implement functionality faster, yet rarely makes for a resilient system.
A post worth reading, from the Engine Yard blog: