Originally published on Failure is Inevitable.
We live in the era of reliability. The most important feature of a service is how dependable it is in the eyes of its users. Companies are hiring with this in mind. In a 2019 LinkedIn article, site reliability engineer was listed as the second most promising career in the United States.
But how do you get started as an SRE? In this blog post, we’ll look at:
- Key comprehensions and skills for an SRE
- Positions and credentials that can develop into the SRE role
- The career paths of some successful SREs
Key Comprehensions and Skills for an SRE
SRE is a multifaceted role. You will contribute to an organization's code base, policy, culture, and more. To succeed, you’ll need skills in a variety of categories.
Technical skills
Writing code may not be the primary duty of an SRE, but some technical knowledge is required. Systems thinking, in particular, is helpful for SREs. TechTarget defines systems thinking as "a holistic approach to analysis that focuses on the way that a system's constituent parts interrelate and how systems work over time and within the context of larger systems."
SREs will need to communicate with different teams, from server administration to testing. You’ll need to understand their roles and responsibilities to cooperate effectively. You’ll also need to understand how the work they do affects the system as a whole. Then, you’ll need to be able to communicate this to other teams to create shared context. Here are some skills that you’ll find valuable in creating this shared context:
- Reading and writing some common programming languages such as Python
- Expertise in administrative tools such as Jira
- Configuring and understanding monitoring tools
- Understanding of server administration or cloud management
- Writing and executing runbooks
- Setting up SRE practices such as SLOs
- Knowledge of system architecture and orchestration platforms such as Kubernetes
- Recognizing and implementing processes that can be automated
The requirements of each organization will vary based on architecture and tech stack. The important thing is to understand the fundamentals of these technical areas. You’ll be able to learn the specifics of each position more easily once you know the basics.
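As one small illustration of the SLO item in the list above, here is a minimal Python sketch of an error budget calculation. The function name, the 99.9% target, and the request counts are all assumptions made up for this example; real SLOs are usually computed by a monitoring tool over a rolling window.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


# Hypothetical numbers: a 99.9% availability target over 1,000,000 requests
# tolerates roughly 1,000 failures; 250 failures have been observed so far.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.0%}")
```

With these made-up numbers, about three quarters of the budget is left. A negative result would mean the service has already missed its target for the period.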
Policy and process skills
A major part of the SRE role is setting up policies and procedures. In some cases, this means directing the reliability strategy for an entire organization. In other cases, you’ll consult with other teams to align on reliability goals and processes. Here are some skills that can help:
- Creating templates for incident retrospectives
- Scheduling planned reviews of incident retrospectives
- Setting standards for documentation
- Making incident classification tables
- Building reliability models
- Collaboratively creating on-call schedules
- Advising on building an SRE tool stack
- Creating communication infrastructure, like Slack channels
- Determining policy around alerting and escalation
Again, there is no single right way to make any of these policies or processes. Become comfortable with tackling these challenges in any circumstance. It helps if you understand the purpose of each document. Then you can know what a successful implementation will accomplish.
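To make the incident classification and escalation items above more concrete, here is a rough Python sketch of how such a table could be written down as code. The severity names, thresholds, and escalation wording are hypothetical placeholders; each organization would choose its own.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # most severe
    SEV2 = 2
    SEV3 = 3  # least severe


@dataclass(frozen=True)
class Incident:
    customers_affected_pct: float  # share of customers affected, 0 to 100
    core_feature_down: bool


def classify(incident: Incident) -> Severity:
    """Map an incident to a severity using illustrative, made-up thresholds."""
    if incident.core_feature_down or incident.customers_affected_pct >= 50:
        return Severity.SEV1
    if incident.customers_affected_pct >= 5:
        return Severity.SEV2
    return Severity.SEV3


# Hypothetical escalation policy keyed by severity.
ESCALATION = {
    Severity.SEV1: "Page the on-call engineer and an incident commander immediately",
    Severity.SEV2: "Page the on-call engineer",
    Severity.SEV3: "Open a ticket for the next business day",
}

incident = Incident(customers_affected_pct=12.0, core_feature_down=False)
severity = classify(incident)
print(severity.name, "->", ESCALATION[severity])
```

Writing the table down this explicitly, whether in code or in a document, makes it easier for a team to review and agree on the thresholds before an incident happens.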
Cultural implementation skills
One of the most important aspects of being an SRE is driving cultural shifts in an organization. The skills required for this relate to an attitude you cultivate through experience. Here are some examples:
- Leading blameless retrospective meetings that promote learning
- Educating the organization on the meaning of reliability
- Keeping the focus on customer satisfaction
- Championing the importance of SRE principles and practices
- Recognizing and distributing glue work
- Creating a psychologically safe environment
- Encouraging risk-taking, agency, and ownership
- Changing attitudes about failure from fear to celebration
While these "soft skills" often take a backseat to the technical ones, they are just as important, if not more so. Without them, SREs will find it much harder to drive reliability initiatives.
Positions and Credentials for SREs
SRE is a holistic approach. As such, many roles can evolve into the SRE position. Even people from outside tech disciplines can learn to become SREs. When organizations build an SRE team, they look for diversity of perspectives and experiences.
With that said, there are some common career paths that lead into the SRE role. We’ll take a look at some of them.
What positions can develop into SRE?
Here is a chart showing how different positions can develop into the SRE role:
What credentials help with becoming an SRE?
Here are some credentials that will help in securing an SRE position:
- Completion of the Site Reliability Engineering course on Coursera
- Certifications for specific coding languages or tools (examples here)
- Certifications in soft skills courses e.g. conflict management, communication
- Completion of the O’Reilly SRE Workbook
Accumulating these credentials can bolster your resume and help you secure interviews.
Career Paths of SREs
Many SREs have shared their journey. Learning from their paths can be an inspiration for your own. Here are a few that may light your own path.
Dan Lüdtke breaks down his path to SRE. This includes specifics on how he built his resume and the challenges he faced in interviewing.
Alice Goldfuss writes about her journey to SRE. She offers practical tips on what to look for in a position and an extensive list of further reading.
Tigran Hakobyan writes about his experiences becoming the first SRE at Buffer. He covers his motivation for becoming an SRE, and what he still hopes to achieve in the role.
Krishelle Hardson-Hurley talks about how she discovered the SRE role in an article for Hacker Noon. It features a fun “SRE Compatibility Quiz”!
Samira Sarraf asked SREs to describe the skills that helped them grow into the position. The article in Computerworld Australia discusses both technical knowledge and soft skills you ought to acquire.
If you’re interested in learning more about what being an SRE entails, check out Blameless, a tool used by SREs at companies like Home Depot, Iterable, Under Armour, and more. Try our sandbox environment today.