re: Making On-Call Not Suck


Hi Molly, great article!

Let me share my admittedly controversial view. Isn't the mere existence of a dedicated team of SREs a problem?

IMO several companies have been drinking the Google Kool-Aid and "rebranding" Infrastructure / Networking / Support people, and eventually Devs with some tooling expertise, into "SREs". Companies promise people more time to work on automation and product optimization but, in reality, they are still a small number of people mostly putting out 🔥 on projects that they know little about. This is still silo thinking in disguise.

On the other side of the wall, Devs without the required Ops experience are churning out half-baked, overly complex, wasteful and insecure deployments. Great, now Developers "own" it and everyone can be on call: stressed, burned out and panicking. Hooray, DevOps! Quick, let's all send our CVs to the next company! We'll all get a better-paying job until the cycle repeats itself.

Alternative solution: Hire more people with Ops experience and an SRE mindset, and place them in real DevOps teams (yes, you heard me: at least one talented Ops person per team). Let people with Ops backgrounds and Dev backgrounds really work together and learn from each other from the beginning. Make sure that SREs (as well as Devs) have a good relationship with their respective Chapters so that standards naturally emerge. Devs learn some tricks from Ops experts, write better infrastructure as code, and get to understand and account for what is required to monitor and troubleshoot the product that they are building from the start. Ops actually get the time to work on automation and learn the feature side of things as it's being built. Plus, they also get proper code reviews and learn a trick or two from experienced Devs. (No disrespect, but going through Python / Bash / Ruby scripts written by Ops guys can be a nightmare. As bad as the Terraform / Ansible stuff that Devs put together.) With enough time the team finds its pace and agrees on a sane on-call schedule.

Your SRE guy/girl is a very senior expert with a great overall picture of the product and the ability to optimize things across the board? No worries... Give him/her (and everyone else) the freedom to move across teams. But still make sure that he/she is part of a team that actually delivers features. Make sure that he/she stays in the team just long enough to spread some of his/her knowledge, as well as to learn more of the specifics of what is being built ATM.

I'm not saying that there's no place for dedicated Infrastructure and First Line Support teams with skilled engineers and innovative solutions. Nor am I saying that we can neglect the overall picture and the few Devs that can actually handle it. I'm just saying that Google's SRE model is not for everyone. As it stands, it feels like most companies are getting DevOps wrong (as much as they get the core values of Agile wrong). IMO a team of SREs, huge or small, will always be simultaneously overworked (responsibility-wise) and underutilized (in terms of their actual potential). Even if you are blessed with a few people with the rare combination of domain, infrastructure and development expertise to do the job properly, it still sounds like a huge waste of their time and brainpower.

What do you think? Am I right? Or am I getting the SRE side of things completely wrong?


I can't speak for other companies, but I do know that for us at Kenna our SRE team has been a godsend! Having a team that can focus solely on the reliability and scalability of our system has been a big win for us, and it has greatly improved the quality of our platform for customers and for our dev and operations teams internally. I actually wrote a blog post about what our SRE team focuses on.

As for knowledge sharing, we actually do a dev SRE rotation that allows devs to get a peek at what we focus on as SREs. In addition, the SRE team works very closely with the dev teams that are pumping out features to ensure those features are performant. They are definitely not off in a corner on their own.

Sure, having an SRE team might not be for everyone but in our case, it has been an enormous benefit.
