
The Serverless Edge

Posted on • Originally published at theserverlessedge.com

Operational Excellence examples for Well Architected

We talk through our Operational Excellence examples from our well architected experiences. This post is the first in a series of conversations on the well architected framework and its pillars.

Operational Excellence examples and Well Architected

Dave Anderson  

We've written about well architected and the well architected pillars of SCORP or SCORPS. There are now six well architected pillars. Well architected is really interesting because AWS, Google and Azure each have their own version of it. They're all quite similar. We have had great success from working through these pillars.

So we figured we'd hit each pillar and have a quick chat about them, starting with operational excellence. Is there anything else you would like to say at a high level about well architected?

Mark McCann  

It's something we have found to be incredibly useful. It gives a frame of reference and a structure for asking better questions of our teams, systems, structures, processes and practices. It has been hugely useful for trying to evolve engineering practices and companies. It's hardened and approved, and it's been battle tested in thousands of companies, which gives it a lot of credibility. It's not just Dave, Mark and Mike's opinions. It's good practice that has been proven to work.

Mike O'Reilly  

That's a major strength, isn't it? I like the ubiquity. Whether you're an architect, an engineer or a manager in one organisation, when you go to another, it'll make sense.

Operational Excellence should be part of continuous architecture

Dave Anderson  

Well architected is not a once-a-year compliance exercise. It should be part of continuous architecture. The reason why I always encourage people to get certification is not for a bit of paper or a free water bottle, it's because you have to learn well architected as part of certification. So, starting with operational excellence, the AWS pillar breaks down into three areas. Each area has five or six questions. The three areas in the operational excellence pillar are prepare, operate and evolve.

Photo by Ameya Sawant on Unsplash

Operational Excellence Pillar - prepare, operate and evolve

Dave Anderson  

Operational excellence means a lot of things to a lot of people, but let's chat about prepare. What have you found in the prepare part of this?

Mark McCann  

It's great to go into new areas and teams and ask these questions:

  • Do you know who your users are?
  • What is the purpose of your team?
  • Do you know what your highest priority is?

Some are very simple, basic questions.

Are you set up to meet the challenges that you're faced with, the business requirements that you're going to pursue or the needs you're trying to meet?

Asking simple questions can be revealing

Simple questions like 'how do you determine what your priorities are?' can be very revealing. If you are in a safe space with the whole team involved, you can get a really good conversation. You might hear: 'We know our priorities for this week and for next week, but we're not quite sure what we're doing for the month after.' It's a good conversation to tease out whether you are aligned with the strategic direction. Do you have a prioritisation framework or are you making it up 'on the hoof'?

Mike O'Reilly  

This pillar needs the whole team involved in the conversation. Some questions require management to be involved, some require the tech lead or the engineer to understand the big picture and operations. We talk about consistency. In this section there are recommendations for playbooks, runbooks and standards for preparing your operation: prepare for failure or, as the saying goes, 'everything fails all the time'.

Operational Excellence: Prepare

You have got to prepare to move on to post-implementation and to hand off to a different team, or to a point where you're bringing on new engineers. Do you have the runbooks for the operations in a particular workload? Do you have playbooks that are linked to the observability in your dashboards, so that when things go wrong there's a solid set of instructions to deal with the problem, and people don't have to go in and unpack what you've built? There's a lot of good, solid foundational guidance here. From an architecture perspective (we're all architects), it's table stakes for consistency across teams.
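One way to keep playbooks tied to observability is to check that every alarm points at a playbook. The sketch below is only an illustration: the `Alarm` type, the alarm names, the URLs and the `find_alarms_without_playbook` helper are all hypothetical and are not part of any AWS API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Alarm:
    """A hypothetical alarm definition, e.g. exported from a dashboard."""
    name: str
    playbook_url: Optional[str] = None  # where responders find instructions


def find_alarms_without_playbook(alarms: List[Alarm]) -> List[str]:
    """Return the names of alarms that have no playbook linked."""
    return [alarm.name for alarm in alarms if not alarm.playbook_url]


# Example data: names and URLs are made up for illustration.
alarms = [
    Alarm("orders-api-5xx-rate", "https://wiki.example.com/playbooks/orders-5xx"),
    Alarm("orders-queue-depth"),  # no playbook linked yet
]

missing = find_alarms_without_playbook(alarms)
print(missing)
```

A check like this could run in a pipeline, so a new alarm without instructions is caught before anyone is paged for it.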

Dave Anderson  

'Prepare' looks at tribal knowledge, like when you ask a question and the response is 'Fred says'. In other words: 'I don't know why we do that, but Fred says we do that'. Or the response is: 'ask my manager'. But what happens when your manager isn't there? We need leadership and empowerment within the team, and we need that knowledge written down for everyone. So 'Prepare' checks team culture.

Mark McCann  

It also checks simple stuff like: do you have enough people to meet the challenges? Do you have assigned owners who are going to be responsible for processes, practices and operations? If you can get these foundations in place early, you can evolve, go down through the lifecycle and start applying the other well architected pillars. Your chance of success greatly improves because the operational excellence pillar has set the foundation.

Operational Excellence: Operate

Dave Anderson  

The next area is operate. So you start with prepare and then move to operate. I like operate because there's a lot of observability. I like thinking of a workload as an asset: how to understand the health of that asset and how to monitor it to make sure it's working well.

Mike O'Reilly  

It's about getting the team ready for production. A particular bugbear of mine is when teams aren't thinking about how to validate in production and how to spot regression. What are the key performance indicators of the workload? When things go wrong, are they able to spot it, and have they thought about how to remediate or correct those sorts of things?
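Spotting regression against a key performance indicator can start very simply. This is a minimal sketch that assumes a "lower is better" KPI such as latency or error rate; the `has_regressed` function, the 10% tolerance and the sample numbers are all assumptions for illustration.

```python
def has_regressed(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """Return True if `current` is worse than `baseline` by more than `tolerance`.

    Assumes a "lower is better" KPI, so "worse" means "higher".
    `tolerance` is a fraction of the baseline (0.10 means 10%).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    relative_change = (current - baseline) / baseline
    return relative_change > tolerance


# Hypothetical p95 latency in milliseconds before and after a release.
print(has_regressed(baseline=200.0, current=230.0))  # 15% slower than baseline
print(has_regressed(baseline=200.0, current=210.0))  # 5% slower than baseline
```

The first call reports a regression and the second does not, because only the first exceeds the 10% tolerance. In practice the baseline and tolerance would come from the team's own agreed KPIs.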

Things do go wrong

You go back to prepare again. There's always something that is going to go wrong: something you haven't predicted, or an alternate path that has been missed. So when those things happen, have you got the correct procedures for learning what that defect teaches, so you can bake it in and toughen up your operation going forward? It's a holistic way of thinking, and you need those mechanisms to show you how your workload performs by product.

Mark McCann  

It's critical to have those information radiators and dashboards available, and not just for the team. If you have proper observability you can show the C-suite the team working on a particular capability, feature or value stream and how it relates to our vision and strategy. That's proper operational observability across everything, including not only the health of your workload but the health of your team. DORA key metrics should be part of how you operate with a sustainable pace for the team.
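Assuming the delivery metrics meant here are the four DORA metrics, two of them (change failure rate and lead time for changes) can be computed from a simple deployment log. The `Deployment` record, its field names and the sample data below are hypothetical; a real team would pull these from its own pipeline.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List


@dataclass(frozen=True)
class Deployment:
    """A hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure (0.0 if there are none)."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)


def median_lead_time_hours(deploys: List[Deployment]) -> float:
    """Median time from commit to deploy, in hours. Raises on an empty list."""
    return median(
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    )


# Made-up sample data: four deployments with lead times of 2, 6, 4 and 8 hours.
deploys = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11), False),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 15), True),
    Deployment(datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 13), False),
    Deployment(datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 17), False),
]

print(change_failure_rate(deploys))     # one failure in four deployments
print(median_lead_time_hours(deploys))  # median of 2, 4, 6 and 8 hours
```

Numbers like these belong on the same information radiators as workload health, so the team and leadership see delivery pace alongside system behaviour.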

Operational Excellence: Evolve

Dave Anderson  

The last one is evolve. You go through prepare, operate and then evolve. And it's quite simply about how you evolve operations which doesn't mean cutting costs and reducing the budget!

Mark McCann  

It's what Mike said earlier. It's about having a continuous improvement mindset with feedback loops in place. We're big into mapping, and evolution is a cornerstone of Wardley mapping. If you don't take these signals from your systems and your workloads on board and use them to evolve, improve and get better, then there's no point having observability and dashboards.

Mike O'Reilly  

That's the key point. We've written about the SCORPS process as a driver of continuous improvement. Your operations are going to generate a lot of data and useful information that you, as an engineer, manager or architect, can use to evolve your current setup. You should always be looking to learn.

There is always room for operational excellence improvement

Dave Anderson  

The operational excellence pillar sets us up nicely because once you think through evolve and operations, you're evolving the other pillars of cost, security, reliability, performance and sustainability. You can always save more money and make the thing faster, more reliable and more secure. People think operations are done because it's rolling and it's fine. But there are always things you can improve.

Mark McCann  

You set up for success and you put the foundational building blocks in place to increase your chances of a successful development cycle.

Dave Anderson  

So that's the operational excellence pillar from well architected. That's the craic. We'll be talking some more about the pillars. There are posts on this on TheServerlessEdge.com, on Twitter @ServerlessEdge, LinkedIn and Medium. So thanks very much. 

Transcribed by https://otter.ai
