The answer is the one that a lot of people don’t like: “It depends.”
So let’s look at what toil actually is, when it’s worth automating, and, at the end, a bit of my personal opinion on the topic.
What is Toil?
Toil isn’t just annoying work. It has distinct characteristics:
- Manual: Requires human intervention, even if it’s just running a script.
- Repetitive: Happens frequently, consuming time without creating lasting improvements.
- Automatable: Could be replaced by a machine or designed away entirely.
- Lacks Enduring Value: Doesn’t leave the system in a better state after completion.
- Scales Linearly: Work grows in proportion to some source metric, such as traffic or the number of users, so it becomes harder to manage as the service grows. A quick test: if the number of users jumped from one thousand to one million, would this particular chore grow with it?
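To make the checklist concrete, here is a minimal sketch in Python that records the five characteristics as flags and reports which ones a task matches. The `ToilChecklist` class, the `matched_traits` function, and the example task are hypothetical names invented for illustration. Counting matches is only a rough aid, not a formal definition of toil.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToilChecklist:
    """One flag per characteristic from the list above (hypothetical names)."""
    manual: bool
    repetitive: bool
    automatable: bool
    lacks_enduring_value: bool
    scales_linearly: bool


def matched_traits(task: ToilChecklist) -> list[str]:
    """Return the names of the characteristics the task exhibits."""
    return [f.name for f in fields(task) if getattr(task, f.name)]


# Illustrative example: restarting a stuck worker by hand after each alert.
restart_worker = ToilChecklist(
    manual=True,
    repetitive=True,
    automatable=True,
    lacks_enduring_value=True,
    scales_linearly=False,  # assumption: alert volume does not track user count
)

traits = matched_traits(restart_worker)
print(f"{len(traits)}/5 traits matched: {', '.join(traits)}")
```

A task that matches most of the flags is a good candidate for a closer look. One that matches only one or two is probably ordinary work that merely feels tedious.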
When Should Toil Be Automated?
Automation isn’t always the right answer, and deciding when to automate requires careful consideration. Key factors include:
- High-frequency tasks: If a repetitive issue slows down operations, automation is often the right move.
- Low cognitive load: Simple, deterministic tasks are great candidates for automation, while judgment-based work may require human oversight.
- Strategic impact: If eliminating toil frees engineers to work on high-value projects, the investment is worth it.
- System complexity: Over-automating fragile or unpredictable systems can introduce new risks instead of solving problems.
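The frequency and complexity factors above can be turned into a rough back-of-the-envelope estimate. The sketch below compares the hours a task costs when done by hand with the hours needed to build and maintain the automation. The function names and every number in it are assumptions made up for illustration, and the model deliberately ignores the harder-to-measure factors such as risk and strategic impact.

```python
def hours_saved(runs_per_week: float, minutes_per_run: float, weeks: int) -> float:
    """Manual hours the task would consume over the given horizon."""
    return runs_per_week * minutes_per_run * weeks / 60


def net_benefit_hours(
    runs_per_week: float,
    minutes_per_run: float,
    weeks: int,
    build_hours: float,
    upkeep_hours_per_week: float,
) -> float:
    """Hours saved minus the cost of building and maintaining the automation.

    A positive result suggests automation pays off within the horizon.
    """
    saved = hours_saved(runs_per_week, minutes_per_run, weeks)
    cost = build_hours + upkeep_hours_per_week * weeks
    return saved - cost


# Hypothetical frequent task: 10 runs a week, 15 minutes each, over one year.
frequent = net_benefit_hours(10, 15, 52, build_hours=40, upkeep_hours_per_week=0.5)

# Hypothetical rare task with the same automation cost: 1 run a week, 5 minutes.
rare = net_benefit_hours(1, 5, 52, build_hours=40, upkeep_hours_per_week=0.5)

print(f"frequent task: {frequent:+.1f} hours")
print(f"rare task:     {rare:+.1f} hours")
```

With these made-up inputs the frequent task comes out ahead and the rare one does not. The upkeep term matters most for fragile systems, where maintaining the automation can quietly eat the savings.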
How Can Toil Be Reduced Without Over-Automating?
While automation is a powerful tool, other approaches can help optimize efficiency:
- Improve system design – Reduce unnecessary complexity to minimize recurring manual work.
- Refine operational processes – Standardizing workflows can make toil more manageable.
- Prioritize high-impact fixes – Service Level Objectives (SLOs) help engineers focus on the most valuable improvements.
- Balance automation and human oversight – Some tasks require both structured processes and automation to function effectively.
Opinions
The books draw a direct line between toil and non-toil: toil is repetitive work that feels more like a chore than work that demands cognitive effort. In the near future this might change, as LLMs and MCP servers give AIs the tools to handle some of that cognitive load as well. That would mean setting aside more time for maintaining and enhancing those tools.
Sources:
- xkcd 1319, "Automation": https://xkcd.com/1319/
- Google SRE Book, Chapter 5: https://sre.google/sre-book/eliminating-toil/
- Google SRE Workbook, Chapter 6: https://sre.google/workbook/eliminating-toil/