DEV Community

Durrell Gemuh

AWS Service Spotlight: AWS Systems Manager (SSM)

Welcome to my AWS Service Spotlight series, where I break down AWS services, how they work, when to use them, and how they fit into real-world DevOps systems.

This week we're talking about AWS Systems Manager (SSM), one of those services that quietly does a lot of heavy lifting in production environments yet rarely gets the spotlight it deserves.

What is AWS Systems Manager?

Simply put, SSM is AWS's operations hub for managing your infrastructure at scale. Think of it as a remote control for your EC2 instances — and a whole lot more.

More technically: SSM is a collection of tools that lets you automate operational tasks, run commands across fleets of instances, manage configuration, patch systems, and access instances securely — all without needing a bastion host or open SSH/RDP ports.

Why Use It?

The Problem It Solves

Managing dozens or hundreds of servers manually is a nightmare. You'd need to SSH into each one, run scripts, hope nothing breaks, and repeat. SSM eliminates that entirely.

Use it when you need to:

  • Run commands across many instances simultaneously
  • Install software or agents on a fleet without manual access
  • Access instances that have no public IP or open ports
  • Automate patching and compliance checks
  • Store and retrieve secrets and config values securely

Who should care:

  • DevOps and platform engineers managing cloud infrastructure
  • Security teams who want auditability and zero open ports
  • Anyone deploying software to EC2 at scale

How I Used It This Week

This week I had a real, practical challenge: deploy the Datadog monitoring agent across a mixed fleet of Linux and Windows EC2 instances — in a way that any AWS account could run, without hardcoding credentials or writing instance-specific scripts.

Here's what I did:

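I created two public SSM Command documents, one for Linux and one for Windows, and published them from a central AWS account with public permissions, so other AWS accounts can call them by ARN. One caveat: SSM documents are regional, so a shared document is only available to accounts working in the Region where it was published.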

Each document accepts just two parameters at runtime:

  • DDApiKey — the Datadog API key
  • DDSite — the Datadog intake region (defaults to datadoghq.com)

For Linux, the document runs the official Datadog shell installer via curl. For Windows, it uses msiexec with the Datadog MSI package — and getting the PowerShell quoting right (outer single quotes, inner escaped double quotes) was the key to making it work reliably through SSM.

The result: a one-click, parameterized, cross-OS monitoring deployment that any team can run against their fleet in minutes — no SSH, no RDP, no manual steps.

Step-by-Step: How I Built and Published the SSM Documents

Here's exactly how I did it — entirely through the AWS Console, no CLI required.

Step 1: Create the Linux Document

  1. Go to AWS Console → Systems Manager → Documents
  2. Click "Create document" → "Command or Session"
  3. Fill in the details:
    • Name: InstallDatadogAgent-Linux
    • Document type: Command document
    • Content format: JSON
  4. Paste the following content:
{
  "schemaVersion": "2.2",
  "description": "Installs the Datadog Agent (v7) on Linux.",
  "parameters": {
    "DDApiKey": {
      "type": "String",
      "description": "(Required) Your Datadog API Key"
    },
    "DDSite": {
      "type": "String",
      "description": "Datadog intake site.",
      "default": "datadoghq.com",
      "allowedValues": [
        "datadoghq.com",
        "datadoghq.eu",
        "us3.datadoghq.com",
        "us5.datadoghq.com",
        "ap1.datadoghq.com"
      ]
    }
  },
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "InstallDatadogLinux",
      "inputs": {
        "runCommand": [
          "DD_API_KEY={{ DDApiKey }} DD_SITE={{ DDSite }} DD_AGENT_MAJOR_VERSION=7 bash -c \"$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)\""
        ]
      }
    }
  ]
}
  5. Click "Create document"
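
If you'd rather script this step than click through the console, a minimal sketch with boto3 could look like the following. It assumes boto3 is installed and credentials are configured; the helper names (`build_create_document_request`, `create_document`) and the default Region are my own illustrative choices, not part of the original setup.

```python
import json


def build_create_document_request(name: str, document: dict) -> dict:
    """Return keyword arguments for boto3's ssm.create_document call."""
    # Cheap sanity checks before anything is sent to AWS.
    if document.get("schemaVersion") != "2.2":
        raise ValueError("expected a schemaVersion 2.2 Command document")
    if not document.get("mainSteps"):
        raise ValueError("document has no mainSteps")
    return {
        "Name": name,
        "DocumentType": "Command",
        "DocumentFormat": "JSON",
        "Content": json.dumps(document, indent=2),
    }


def create_document(name: str, document: dict, region: str = "us-east-1") -> dict:
    """Create the document in the given Region (the Region is an assumption)."""
    import boto3  # imported lazily so the builder above works without boto3

    ssm = boto3.client("ssm", region_name=region)
    return ssm.create_document(**build_create_document_request(name, document))
```

Load the JSON above with `json.load`, pass the resulting dict to `create_document("InstallDatadogAgent-Linux", doc)`, and the result should match what the console produces.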

Step 2: Create the Windows Document

  1. Back in Documents → Create document → Command or Session
  2. Fill in the details:
    • Name: InstallDatadogAgent-Windows
    • Document type: Command document
    • Content format: JSON
  3. Paste the following content:
{
  "schemaVersion": "2.2",
  "description": "Installs the Datadog Agent (v7) on Windows.",
  "parameters": {
    "DDApiKey": {
      "type": "String",
      "description": "(Required) Your Datadog API Key"
    },
    "DDSite": {
      "type": "String",
      "description": "Datadog intake site.",
      "default": "datadoghq.com",
      "allowedValues": [
        "datadoghq.com",
        "datadoghq.eu",
        "us3.datadoghq.com",
        "us5.datadoghq.com",
        "ap1.datadoghq.com"
      ]
    }
  },
  "mainSteps": [
    {
      "action": "aws:runPowerShellScript",
      "name": "InstallDatadogWindows",
      "inputs": {
        "runCommand": [
          "$p = Start-Process -Wait -PassThru msiexec -ArgumentList '/qn /i \"https://windows-agent.datadoghq.com/datadog-agent-7-latest.amd64.msi\" /log C:\\Windows\\SystemTemp\\install-datadog.log APIKEY=\"{{ DDApiKey }}\" SITE=\"{{ DDSite }}\"'",
          "if ($p.ExitCode -ne 0) {",
          "  Write-Host \"msiexec failed with exit code $($p.ExitCode). Check C:\\Windows\\SystemTemp\\install-datadog.log\" -ForegroundColor Red",
          "  exit $p.ExitCode",
          "}"
        ]
      }
    }
  ]
}
  4. Click "Create document"

Step 3: Make Both Documents Public

This is the step that makes the documents usable from other AWS accounts.

For each document:

  1. Go to Documents → Owned by me
  2. Click the document name
  3. Go to the "Permissions" tab
  4. Select "Public", acknowledge the confirmation prompt, and save.

That's it. The document is now publicly accessible via its ARN.
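
For reference, the same permission change can be sketched with boto3's `modify_document_permission`. This is an illustration rather than what I ran: the helper names are hypothetical, and the special value `"all"` is what marks a document as public.

```python
def build_share_request(document_name: str, public: bool = True, account_ids=()) -> dict:
    """Return keyword arguments for boto3's ssm.modify_document_permission."""
    # "all" shares the document publicly; otherwise list 12-digit account IDs.
    targets = ["all"] if public else list(account_ids)
    if not targets:
        raise ValueError("provide at least one account ID or set public=True")
    return {
        "Name": document_name,
        "PermissionType": "Share",
        "AccountIdsToAdd": targets,
    }


def share_document(document_name: str, region: str = "us-east-1", **options) -> dict:
    """Apply the share in one Region (the default Region is an assumption)."""
    import boto3  # lazy import keeps the builder usable without boto3

    ssm = boto3.client("ssm", region_name=region)
    return ssm.modify_document_permission(**build_share_request(document_name, **options))
```

Passing `public=False` with a list of account IDs gives you a narrower share if a fully public document feels too open.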

Step 4: Copy the Document ARN

On each document's detail page, copy the ARN. It looks like:

arn:aws:ssm:us-east-1:123456789012:document/InstallDatadogAgent-Linux
arn:aws:ssm:us-east-1:123456789012:document/InstallDatadogAgent-Windows

Anyone in another AWS account can now reference these ARNs directly in Run Command, with no need to copy or recreate the documents, as long as they run the command in the Region where the document was shared.
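
The ARN format above is simple enough to build and take apart in code, which is handy when a script needs to know which Region a shared document lives in. A small sketch follows; the function names are hypothetical and the regular expression is my own approximation of the format, not an official grammar.

```python
import re

# Approximate pattern for an SSM document ARN (an assumption, not an AWS spec).
_DOCUMENT_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):ssm:(?P<region>[a-z0-9-]+):"
    r"(?P<account_id>\d{12}):document/(?P<name>[A-Za-z0-9_.\-]+)$"
)


def document_arn(region: str, account_id: str, name: str, partition: str = "aws") -> str:
    """Build the ARN of an SSM document owned by account_id in region."""
    return f"arn:{partition}:ssm:{region}:{account_id}:document/{name}"


def parse_document_arn(arn: str) -> dict:
    """Split a document ARN into partition, region, account_id and name."""
    match = _DOCUMENT_ARN.match(arn)
    if match is None:
        raise ValueError(f"not an SSM document ARN: {arn!r}")
    return match.groupdict()


arn = document_arn("us-east-1", "123456789012", "InstallDatadogAgent-Linux")
print(arn)
print(parse_document_arn(arn)["region"])
```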

Step 5: Run the Documents

From any AWS account:

  1. Go to Systems Manager → Run Command
  2. Click "Run command"
  3. In the search box → select "Document name prefix" → paste the full ARN
  4. Fill in parameters:
    • DDApiKey → your Datadog API key
    • DDSite → datadoghq.com (or your region)
  5. Under Targets → choose instances by tag or select manually
  6. Click "Run"
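
The console steps above map onto a single `send_command` call. Here is a hedged sketch of that call with boto3. The tag key and value (`Environment` / `production`) are placeholders I made up for illustration, and the helper names are hypothetical. Note that Run Command expects each parameter value as a list of strings.

```python
ALLOWED_SITES = (
    "datadoghq.com",
    "datadoghq.eu",
    "us3.datadoghq.com",
    "us5.datadoghq.com",
    "ap1.datadoghq.com",
)


def build_send_command_request(
    document_arn: str,
    api_key: str,
    site: str = "datadoghq.com",
    tag_key: str = "Environment",   # hypothetical tag, adjust to your fleet
    tag_value: str = "production",  # hypothetical value
) -> dict:
    """Return keyword arguments for boto3's ssm.send_command."""
    # Mirror the document's allowedValues so mistakes fail before the API call.
    if site not in ALLOWED_SITES:
        raise ValueError(f"unsupported Datadog site: {site}")
    return {
        "DocumentName": document_arn,
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"DDApiKey": [api_key], "DDSite": [site]},
    }


def run_document(region: str, **options) -> str:
    """Send the command and return its CommandId."""
    import boto3  # lazy import keeps the builder usable without boto3

    ssm = boto3.client("ssm", region_name=region)
    response = ssm.send_command(**build_send_command_request(**options))
    return response["Command"]["CommandId"]
```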

A Note on Document Content

The two documents are intentionally kept minimal and focused:

  • No hardcoded values — API key and site are always passed at runtime
  • Separate documents per OS — avoids the SSM precondition quirk that causes false failures on mixed fleets
  • No environment or tag parameters — kept lean so anyone can run it without knowing your internal tagging conventions
  • Linux uses the official Datadog shell installer — the same script you'd run manually
  • Windows uses the official MSI installer via PowerShell's Start-Process — the quoting pattern (outer single quotes, inner escaped double quotes) is critical for SSM to pass the arguments correctly to msiexec
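
To make the quoting layers concrete, here is a small Python sketch that rebuilds the Windows install line and shows what JSON encoding does to it. There are two layers: PowerShell sees single quotes around the whole argument string, while JSON escapes the inner double quotes and doubles the backslashes. The function name is my own; the command text is the one from the document above.

```python
import json


def msiexec_line(api_key: str, site: str) -> str:
    """Build the PowerShell line as PowerShell itself will see it."""
    args = (
        '/qn /i "https://windows-agent.datadoghq.com/datadog-agent-7-latest.amd64.msi" '
        r"/log C:\Windows\SystemTemp\install-datadog.log "
        f'APIKEY="{api_key}" SITE="{site}"'
    )
    # Outer single quotes: PowerShell passes the argument string through literally.
    return f"$p = Start-Process -Wait -PassThru msiexec -ArgumentList '{args}'"


# Keep the SSM placeholders in place; SSM substitutes them at run time.
line = msiexec_line("{{ DDApiKey }}", "{{ DDSite }}")

# json.dumps adds the second layer: \" for inner quotes, \\ for backslashes.
encoded = json.dumps(line)

print(line)     # what PowerShell receives
print(encoded)  # what goes into the runCommand array of the JSON document
```

The second printed line should match the first `runCommand` entry in the Windows document, which is a quick way to check an edit before pasting it into the console.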

IAM Requirement

Before SSM can communicate with an instance, the instance needs an IAM Role attached with this single policy:

AmazonSSMManagedInstanceCore

To set it up:

  1. IAM → Roles → Create role → EC2
  2. Attach AmazonSSMManagedInstanceCore
  3. Name it EC2-SSM-Role → Create
  4. EC2 → Select instance → Actions → Security → Modify IAM role → attach the role

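No reboot is needed in most cases. The instance usually shows up as a managed node within a few minutes; if it doesn't, restarting the SSM Agent typically makes it pick up the new role.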
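
The console steps above can also be scripted. The sketch below assumes boto3 and IAM permissions to create roles; the function names are hypothetical. One difference from the console: when you use the API, the instance profile has to be created explicitly, whereas the console creates it for you behind the scenes.

```python
import json

# AWS managed policy referenced in this section.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy() -> dict:
    """Trust policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_ssm_role(role_name: str = "EC2-SSM-Role") -> None:
    """Create the role, attach the managed policy, and wrap it in an instance profile."""
    import boto3  # lazy import keeps ec2_trust_policy usable without boto3

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=SSM_CORE_POLICY_ARN)
    # EC2 attaches instance profiles, not roles, so create one with the same name.
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(InstanceProfileName=role_name, RoleName=role_name)
```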

Other Real-World Use Cases

DevOps Pipelines
Trigger SSM Run Command from a CI/CD pipeline to deploy application updates across an auto-scaling group after a build completes.

Kubernetes (EKS)
Use SSM Session Manager to access EKS worker nodes securely without exposing SSH. Great for debugging node-level issues.

Security & Compliance
Use SSM Patch Manager to automatically patch OS vulnerabilities on a schedule and audit compliance across your fleet.

Secrets & Config Management
Store database passwords and API keys as encrypted SecureString parameters, and feature flags as plain String parameters, in SSM Parameter Store. Pull them at runtime in Lambda, ECS, or EC2 so nothing sensitive is hardcoded.
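
As a rough illustration, reading a value back from Parameter Store with boto3 takes one call. The parameter name `/datadog/api_key` is a made-up example, and the client is passed in so the function is easy to exercise with a stub.

```python
def read_parameter(ssm_client, name: str) -> str:
    """Fetch one parameter, decrypting it if it is a SecureString."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def read_datadog_api_key(region: str = "us-east-1") -> str:
    """Example caller; the parameter name and Region are hypothetical."""
    import boto3  # lazy import keeps read_parameter usable without boto3

    ssm = boto3.client("ssm", region_name=region)
    return read_parameter(ssm, "/datadog/api_key")
```

The caller needs `ssm:GetParameter` on the parameter, plus permission to use the KMS key when the parameter is a SecureString encrypted with a customer managed key.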

Incident Response
Use Run Command to instantly restart services, collect logs, or run diagnostics across an entire fleet during an incident — in seconds, not hours.

Hybrid & On-Prem
SSM works with on-premises servers too via Hybrid Activations. Manage your data center the same way you manage your cloud.

Key Features

  • Run Command — execute scripts across any number of instances simultaneously
  • Session Manager — browser-based terminal, no SSH keys or open ports needed
  • Parameter Store — secure storage for config values and secrets
  • Patch Manager — automated OS patching with compliance reporting
  • State Manager — enforce desired configuration state continuously
  • Distributor — package and deploy software agents at scale
  • Documents — reusable, versionable, shareable automation scripts
  • Public Documents — shareable across any AWS account via ARN

How It Works (High-Level)

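Every managed instance runs the SSM Agent (preinstalled on many AWS-provided AMIs, such as Amazon Linux and recent Ubuntu and Windows Server images). The agent only makes outbound HTTPS connections to the SSM service endpoints and polls them for work, so nothing on the instance has to listen for inbound traffic.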

When you trigger a Run Command or Session:

You (Console/API)
      ↓
AWS SSM Service
      ↓
SSM Agent on Instance (outbound HTTPS — no inbound ports needed)
      ↓
Executes command, streams output back

The instance needs:

  1. SSM Agent installed and running
  2. IAM Role with AmazonSSMManagedInstanceCore policy attached
  3. Outbound HTTPS (port 443) to SSM endpoints

No VPN, no bastion, no inbound security group rules.
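
One way to check that an instance meets all three requirements is to ask Systems Manager which nodes it can currently reach. The sketch below uses `describe_instance_information` and groups online nodes by platform, which is also a quick way to see how a mixed Linux and Windows fleet splits. The function names and default Region are my own assumptions.

```python
def online_instances_by_platform(response: dict) -> dict:
    """Group instance IDs with PingStatus 'Online' by their PlatformType."""
    grouped = {}
    for info in response.get("InstanceInformationList", []):
        if info.get("PingStatus") == "Online":
            grouped.setdefault(info["PlatformType"], []).append(info["InstanceId"])
    return grouped


def fetch_instance_information(region: str = "us-east-1") -> dict:
    """Collect every page of describe_instance_information into one dict."""
    import boto3  # lazy import keeps the grouping helper usable without boto3

    ssm = boto3.client("ssm", region_name=region)
    paginator = ssm.get_paginator("describe_instance_information")
    merged = {"InstanceInformationList": []}
    for page in paginator.paginate():
        merged["InstanceInformationList"].extend(page["InstanceInformationList"])
    return merged
```

An instance that is missing from the result is usually missing one of the three items above: the agent, the IAM role, or outbound access on port 443.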

Integration with Other AWS Services

| Service | How SSM Works With It |
|---|---|
| EC2 | Core target: manage instances directly via agent |
| IAM | Role-based access controls who can run what documents |
| S3 | Stream command output to S3 for long-running jobs |
| CloudWatch | Send SSM logs and metrics to CloudWatch for alerting |
| EKS | Access worker nodes securely via Session Manager |
| Lambda | Pull config/secrets from Parameter Store at function runtime |
| EventBridge | Trigger SSM automations on schedule or in response to events |

Alternatives

AWS Alternatives:

  • AWS OpsWorks: configuration management using Chef/Puppet, heavier setup (AWS retired the OpsWorks services in 2024, so it mostly matters for legacy context)
  • AWS Config — focused on compliance auditing, not execution
  • EC2 User Data — runs scripts at launch only, not on-demand

Non-AWS Alternatives:

  • Ansible: powerful, but by default it needs SSH or WinRM connectivity to every host, plus a control node and inventory to maintain
  • Chef / Puppet — enterprise config management, complex overhead
  • Terraform — infrastructure provisioning, not runtime operations

SSM wins when you're already in AWS and want zero additional infrastructure to manage.

When NOT to Use It

  • You need complex configuration management with dependency resolution — Ansible or Chef handles that better
  • Your instances are not on AWS and you don't want Hybrid Activations overhead
  • You need real-time streaming logs — CloudWatch or a dedicated log agent is better suited
  • You're managing containers directly — native ECS/EKS tooling is more appropriate
  • Your team is already deeply invested in Ansible — adding SSM creates duplication without enough gain

Final Thoughts

SSM is one of those services you don't fully appreciate until you've managed infrastructure without it. Once it clicks — the secure access, the fleet-wide automation, the public reusable documents — it becomes a default part of how you think about AWS operations.

This week's use case was a great reminder that SSM isn't just for patching or basic scripts. With a little thought, you can build reusable, cross-account, cross-OS automation that scales to any team or environment.

If your EC2 instances don't have SSM set up yet, that's the first thing I'd fix.

Let's Connect

If this helped you think differently about AWS operations, drop a comment or share it with someone managing EC2 fleets.

I'm also building NextGen Playground — a platform helping engineers gain real-world DevOps experience through hands-on projects, mentorship, and practical learning.

If you're trying to level up your cloud and DevOps skills with real projects, not just tutorials — check it out and let's build together.
