DEV Community

Ryan Pothecary


The re-re-rebirth of AWS Systems Manager

If you were launching an EC2 Instance back in 2015, you’d have noticed that the console had a lot of things going on. Nestled in the Instances subsection on the left-hand side was a brand new option called ‘Commands’.


I think the best AWS services and features are the simplest to explain and yet usually the most powerful to use. What does Run Command do? Guess what: it runs a command or script, or these days performs all sorts of actions, on your fleet of Instances. So simple, and yet so potentially powerful and versatile. Add to Run Command a suite of other features, all with specific functions and all focused on managing your long-running instances.
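To make that concrete, here is a minimal sketch of calling Run Command from Python with boto3. It assumes your nodes are already managed by Systems Manager and that you have credentials configured. The tag (`Environment=prod`), the `uptime` command, and the helper names are made up for illustration; `AWS-RunShellScript` is the AWS-managed document for Linux shell commands.

```python
import json


def build_run_command_request(tag_key, tag_value, commands, comment=""):
    """Build keyword arguments for the SSM SendCommand API.

    Targets nodes by tag rather than by instance ID, so the same call
    keeps working as the fleet grows or shrinks.
    """
    request = {
        "DocumentName": "AWS-RunShellScript",  # AWS-managed document for Linux shell commands
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": list(commands)},
    }
    if comment:
        request["Comment"] = comment
    return request


def run_on_fleet(request):
    """Send the command and return its ID (needs boto3, credentials, managed nodes)."""
    import boto3  # imported here so the builder above works without boto3 installed

    ssm = boto3.client("ssm")
    response = ssm.send_command(**request)
    return response["Command"]["CommandId"]


if __name__ == "__main__":
    # Hypothetical tag and command, chosen only for illustration.
    request = build_run_command_request("Environment", "prod", ["uptime"], "Check uptime")
    print(json.dumps(request, indent=2))
```

Building the request separately from sending it means you can inspect exactly what would be sent before pointing it at a real fleet.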

And yet, AWS Systems Manager has never received the credit that it rightly deserves.
Is this due to the promise of the immutable nirvana of DevOps?

When talking about AWS Systems Manager with customers I would often hear ‘Why would you even need to ‘manage’ EC2 instances?’ The idea behind this was explained to me early on with the rather brutal ‘Pets versus Cattle’ analogy. We shouldn’t be ‘managing’ servers, we certainly shouldn’t be ‘logging on’ to them, nor patching them, nor taking backups, nor doing any of the things that we’d been doing for years in our on-premises existence. Instead, Instances were disposable. If something was wrong, terminate it and allow auto-scaling or your CloudFormation template to rebuild it fresh and trouble-free. Our entire immutable infrastructure is rebuilt nightly and therefore doesn’t require patching or managing.

I really didn’t want to be managing Instances, so all of this was fantastic news to me.

Except…
My reality, and more accurately my customers’ reality, is that there are a number of server roles which need to be up and running 24/7/365. In any reasonably sized company you’ll find applications running on a single server, running something potentially critical (or more likely something unknown), and these applications are not going to be happy with being rebuilt nightly.

If you are saying ‘containerisation’ to me now, then believe me I’m saying something equally unpleasant right back 'atcha.

Then there’s anything Microsoft-related, all connected to an Active Directory which was designed to register new servers and doesn’t deal very well with servers being removed and reregistered every 24 hours. And it’s those servers that AWS Systems Manager ably takes care of.

Fast forward to November 2017. My colleagues and I had created a Workshop to run at AWS re:Invent which focused on managing those long-lived EC2 Instances and advocating AWS Directory Service as your Domain Controller replacement.
On reflection, it was lucky that our session was scheduled for the Wednesday, since in Andy Jassy’s Tuesday morning keynote he introduced a new service called AWS Systems Manager. The new AWS Systems Manager service took all the management features out of the EC2 Console and plonked them in their own service. This release, which we were very much not expecting, broke everything in our workshop. Our workshop guide was wrong and had to be rewritten, and everything had to be tested again. All overnight.

As an aside, re:Invent is a terrible time if you are an AWS employee. You have zero idea what is being launched until the presenters are on stage saying it. So, you find out about new stuff at the exact same time as your customers (who expect you to know all about it). Even the product teams are unaware whether their new feature or service is dropping during pre:Invent, re:Invent or afterwards.

We finally fixed everything and the Workshop was a success. We ran it again at re:Invent 2018, since not a huge amount had changed for the service.

And since 2018, there have been new features which have been interesting to explore, adding more in the way of Service Management to AWS Systems Manager. This little service now also serves 450 million nodes and runs 2.5 billion scripts every month*.

But now we have a new version of AWS Systems Manager with a whole new raison d'être.
Since the original AWS Systems Manager, the world has embraced cloud, and companies are now working across multiple AWS accounts and even multiple cloud platforms. The new AWS Systems Manager is focused on this disparate fleet of servers and gives you a single place to manage them all.
Hang on, hasn’t AWS Systems Manager always been able to manage servers on-prem and in other cloud platforms via its agent? Well yes, but now it looks like someone has actually designed it to be used this way. The new Node Management screen very clearly shows your managed and unmanaged instances/servers/nodes across your entire estate/accounts/clouds/on-prem. It also integrates with Amazon Q (which I am slowly falling in love with) to provide insights via familiar GenAI text prompts.
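As a rough illustration of the "it could always do this" point, the long-standing `DescribeInstanceInformation` API already returns EC2 instances and agent-registered hybrid nodes side by side. The sketch below is not the new Node Management experience: it only covers managed nodes in a single account and Region, and the sample data and helper names are invented for the example.

```python
from collections import Counter


def summarise_nodes(nodes):
    """Count nodes by (resource type, ping status).

    Each item is expected to look like an entry from the
    InstanceInformationList of a DescribeInstanceInformation response.
    """
    return Counter(
        (node.get("ResourceType", "Unknown"), node.get("PingStatus", "Unknown"))
        for node in nodes
    )


def fetch_managed_nodes():
    """Yield every managed node in the current account and Region.

    Needs boto3 and AWS credentials.
    """
    import boto3  # imported here so summarise_nodes works without boto3 installed

    paginator = boto3.client("ssm").get_paginator("describe_instance_information")
    for page in paginator.paginate():
        yield from page["InstanceInformationList"]


if __name__ == "__main__":
    # Made-up sample data standing in for a real API response.
    sample = [
        {"InstanceId": "i-0123456789abcdef0", "ResourceType": "EC2Instance", "PingStatus": "Online"},
        {"InstanceId": "mi-0123456789abcdef0", "ResourceType": "ManagedInstance", "PingStatus": "Online"},
        {"InstanceId": "mi-0fedcba9876543210", "ResourceType": "ManagedInstance", "PingStatus": "ConnectionLost"},
    ]
    for (resource_type, status), count in sorted(summarise_nodes(sample).items()):
        print(f"{resource_type:16} {status:16} {count}")
```

Swap `sample` for `fetch_managed_nodes()` to run it against a real account; nodes registered through the agent outside EC2 show up with the `ManagedInstance` resource type.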

Some real thought has gone into the new AWS Systems Manager. Drawing on feedback from customers, AWS has focused on three specific areas (VIEW, GOVERN, ACT), and the new experience clearly lets you do all three.

View, Govern, Act in the new AWS Systems Manager

One area that has recently been given a breath of fresh air is the Automation feature, which now offers a no-code, drag-and-drop way to create a runbook of all the steps in your automation. Very powerful. It also has a range of pre-created runbooks, nicely organised by the task you’d like to complete. We know we should be automating everything, and this has made the task a lot simpler.
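Behind the drag-and-drop canvas, a runbook is still a document made up of named steps. Here is a small hand-written sketch of one, built as a Python dict, that stops and then starts a single instance using the `aws:changeInstanceState` action. The runbook itself, its step names, and the helper functions are my own illustration rather than one of the pre-created runbooks.

```python
import json


def build_restart_runbook():
    """Return a small Automation runbook (schema version 0.3) as a dict."""
    return {
        "schemaVersion": "0.3",
        "description": "Stop and then start a single EC2 instance.",
        "parameters": {
            "InstanceId": {"type": "String", "description": "Instance to restart"},
        },
        "mainSteps": [
            {
                "name": "StopInstance",
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "stopped"},
            },
            {
                "name": "StartInstance",
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "running"},
            },
        ],
    }


def register_runbook(name, runbook):
    """Create the runbook in Systems Manager (needs boto3 and credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    boto3.client("ssm").create_document(
        Content=json.dumps(runbook),
        Name=name,
        DocumentType="Automation",
        DocumentFormat="JSON",
    )


if __name__ == "__main__":
    print(json.dumps(build_restart_runbook(), indent=2))
```

Steps run in the order they are listed, which is the same top-to-bottom flow the visual editor shows.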

I’ve a feeling that there’s a lot more to come in 2025 for this service now that new, firmer foundations are in place and there’s a vision of where the service sits. I really do feel that all companies would benefit from re-evaluating AWS Systems Manager. It’s much more than 21 individual features these days.

I’d like to see a refresh of all the other features that perhaps have not received much care and attention since they launched. I’d also love to see tighter integration with CloudWatch and a sprinkle of some more Amazon Q magic dust to draw my attention to potential issues amongst my fleet of servers, and perhaps pre-emptively fix them via automation. Session Manager is a great tool, but it needs an authorisation workflow, and if I’m advising customers to use Session Manager I want a quick way of disabling Instance Connect without delving into IAM Policies. There’s a lot to be done, but the recent changes are a great start, and I hope they give customers the opportunity to look at AWS Systems Manager with a fresh pair of eyes.

What changes would you like to see?
