Esther Ninyo for AWS Community Builders

Monitor EC2 instance metrics with Datadog (step-by-step)

Hi there,
Ever wanted a straightforward way to monitor your EC2 instance metrics with Datadog but couldn't find a simple solution? Look no further!

The three phases to get this up and running are:
Phase one: Enable the AWS integration in Datadog
Phase two: Deploy the Datadog agent on your EC2 instance
Phase three: Start creating your monitors

The Datadog agent can be installed directly on your EC2 instances, which lets you start collecting metrics such as memory, CPU and disk usage shortly after installation.
For a fuller understanding of how this works, please see the Datadog blog post on the topic.

Prerequisites:

To continue with this hands-on, make sure you have the following:

  • EC2 instance

  • Datadog account

Project deep dive

For the scope of this project, we will monitor the following system-level EC2 metrics:

  • High CPU Utilization

  • High Memory Utilization

  • High Disk Utilization

PHASE ONE

This phase consists of enabling the AWS integration in Datadog to allow monitoring of the EC2 instance.

  • We will set up the Datadog integration using Terraform. You can get it here.
Folder structure
--> EC2 monitoring
------> provider.tf
------> main.tf
------> variables.tf

main.tf

data "aws_caller_identity" "current" {}

# Trust policy: who may assume the integration role, gated on the external ID
# that Datadog generates for this integration.
data "aws_iam_policy_document" "datadog_aws_integration_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      # Note: this trusts the caller's own account. Datadog's setup guide names
      # Datadog's AWS account as the principal, so check which one you need.
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [datadog_integration_aws.sandbox.external_id]
    }
  }
}

# Permissions granted to the integration role.
data "aws_iam_policy_document" "datadog_aws_integration" {
  statement {
    actions = [
      "ec2:Describe*",
      "ec2:GetTransitGatewayPrefixListReferences",
      "ec2:SearchTransitGatewayRoutes",
    ]

    # Scoped to a single instance. Many ec2:Describe* actions do not support
    # resource-level permissions, so this may need to be ["*"] in practice.
    resources = ["arn:aws:ec2:${var.region}:${data.aws_caller_identity.current.account_id}:instance/${var.instance_id}"]
  }
}

resource "aws_iam_policy" "datadog_aws_integration" {
  name   = "TutorialDatadogAWSIntegrationPolicy"
  policy = data.aws_iam_policy_document.datadog_aws_integration.json
}

resource "aws_iam_role" "datadog_aws_integration" {
  name               = "TutorialDatadogAWSIntegrationRole"
  description        = "Role for Datadog AWS Integration"
  assume_role_policy = data.aws_iam_policy_document.datadog_aws_integration_assume_role.json
}

resource "aws_iam_role_policy_attachment" "datadog_aws_integration" {
  role       = aws_iam_role.datadog_aws_integration.name
  policy_arn = aws_iam_policy.datadog_aws_integration.arn
}

resource "datadog_integration_aws" "sandbox" {
  account_id = data.aws_caller_identity.current.account_id
  # Literal name rather than a reference to the role resource, which would
  # create a dependency cycle through external_id.
  role_name = "TutorialDatadogAWSIntegrationRole"
}
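
To confirm what the integration created, it can help to expose a couple of values after terraform apply. The block below is a minimal sketch, assuming a hypothetical outputs.tf file in the same folder; the output names are illustrative and not part of the original project.

```hcl
# outputs.tf (hypothetical file, not part of the original folder structure)

# ARN of the IAM role created for the Datadog AWS integration.
output "datadog_integration_role_arn" {
  description = "ARN of the IAM role created for the Datadog integration."
  value       = aws_iam_role.datadog_aws_integration.arn
}

# External ID that Datadog generated and that the trust policy checks.
output "datadog_external_id" {
  description = "External ID generated by Datadog for this integration."
  value       = datadog_integration_aws.sandbox.external_id
}
```

With these in place, terraform plan would also list the outputs in addition to the resources.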

variables.tf

variable "region" {
  type        = string
  description = "The AWS region to use."
  default     = "eu-west-1"
}

variable "datadog_api_key" {
  type        = string
  description = "The Datadog API key."
  default     = "<REDACTED>"
  sensitive   = true # keep the key out of plan and apply output
}

variable "datadog_app_key" {
  type        = string
  description = "The Datadog application key."
  default     = "<REDACTED>"
  sensitive   = true # keep the key out of plan and apply output
}

variable "instance_id" {
  type        = string
  description = "EC2 instance ID."
  default     = "<REDACTED>"
}
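
Because default values in variables.tf end up wherever the code is committed, one option is to supply the real values from a separate file instead. The sketch below assumes a terraform.tfvars file next to the other files (Terraform loads a file with this name automatically, and its values take precedence over the defaults); the placeholder values are illustrative.

```hcl
# terraform.tfvars
# Assumption: this file is kept out of version control, for example by
# listing it in .gitignore. Replace each placeholder with your own value.
region          = "eu-west-1"
datadog_api_key = "<YOUR_DATADOG_API_KEY>"
datadog_app_key = "<YOUR_DATADOG_APP_KEY>"
instance_id     = "<YOUR_EC2_INSTANCE_ID>"
```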

provider.tf

terraform {
  required_version = "~> 1.6"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
    }
    datadog = {
      source  = "DataDog/datadog"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Environment = terraform.workspace,
      ManagedBy   = "Terraform"
    }
  }
}

# Configure the Datadog provider
provider "datadog" {
    api_key = var.datadog_api_key
    app_key = var.datadog_app_key
    api_url = "https://api.datadoghq.eu"
}


Get the Datadog application key, API key and API URL

  • Click your Datadog profile at the bottom left and select Organization Settings.

datadog profile image

  • In the navigation pane on the left (1), under Access (2), click Application Keys (3) to create a new application key. Then click API Keys (4) to create a new API key.

datadog key

  • Click on this link to find the API URL for the Datadog site you use. It is the site URL with "app" replaced by "api".
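
The "replace app with api" rule can also be expressed in Terraform rather than typed by hand. This is a minimal sketch, assuming a hypothetical datadog_site variable that is not in the original variables.tf:

```hcl
# Hypothetical variable: the Datadog site you log in to, without the "app." prefix.
variable "datadog_site" {
  type        = string
  description = "Datadog site, for example datadoghq.eu or datadoghq.com."
  default     = "datadoghq.eu"
}

locals {
  # app.datadoghq.eu in the browser becomes https://api.datadoghq.eu here.
  datadog_api_url = "https://api.${var.datadog_site}"
}
```

The Datadog provider block could then set api_url = local.datadog_api_url instead of the hard-coded string.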

WHAT NEXT?
The next step is to initialise, plan and apply your Terraform changes. To do this, run the commands below from the project's root folder:
terraform init

terraform plan

terraform apply

If terraform plan succeeds, it lists the resources that terraform apply will create, as in the output below:

terraform plan
data.aws_iam_policy_document.datadog_aws_integration: Reading...
data.aws_caller_identity.current: Reading...
data.aws_iam_policy_document.datadog_aws_integration: Read complete after 0s [id=1400131043]
data.aws_caller_identity.current: Read complete after 0s [id=134130342652]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.aws_iam_policy_document.datadog_aws_integration_assume_role will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "datadog_aws_integration_assume_role" {
      + id   = (known after apply)
      + json = (known after apply)

      + statement {
          + actions = [
              + "sts:AssumeRole",
            ]

          + condition {
              + test     = "StringEquals"
              + values   = [
                  + (known after apply),
                ]
              + variable = "sts:ExternalId"
            }

          + principals {
              + identifiers = [
                  + "arn:aws:iam::<REDACTED>:root",
                ]
              + type        = "AWS"
            }
        }
    }

  # aws_iam_policy.datadog_aws_integration will be created
  + resource "aws_iam_policy" "datadog_aws_integration" {
      + arn         = (known after apply)
      + id          = (known after apply)
      + name        = "TutorialDatadogAWSIntegrationPolicy"
      + name_prefix = (known after apply)
      + path        = "/"
      + policy      = jsonencode(
            {
              + Statement = [
                  + {
                      + Action   = [
                          + "ec2:SearchTransitGatewayRoutes",
                          + "ec2:GetTransitGatewayPrefixListReferences",
                          + "ec2:Describe*",
                        ]
                      + Effect   = "Allow"
                      + Resource = "arn:aws:ec2:<REDACTED>:instance/<REDACTED>"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + policy_id   = (known after apply)
      + tags_all    = {
          + "Environment" = "default"
          + "ManagedBy"   = "Terraform"
        }
    }

  # aws_iam_role.datadog_aws_integration will be created
  + resource "aws_iam_role" "datadog_aws_integration" {
      + arn                   = (known after apply)
      + assume_role_policy    = (known after apply)
      + create_date           = (known after apply)
      + description           = "Role for Datadog AWS Integration"
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "TutorialDatadogAWSIntegrationRole"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = {
          + "Environment" = "default"
          + "ManagedBy"   = "Terraform"
        }
      + unique_id             = (known after apply)
    }

  # aws_iam_role_policy_attachment.datadog_aws_integration will be created
  + resource "aws_iam_role_policy_attachment" "datadog_aws_integration" {
      + id         = (known after apply)
      + policy_arn = (known after apply)
      + role       = "TutorialDatadogAWSIntegrationRole"
    }

  # datadog_integration_aws.sandbox will be created
  + resource "datadog_integration_aws" "sandbox" {
      + account_id                       = "<REDACTED>"
      + cspm_resource_collection_enabled = (known after apply)
      + external_id                      = (known after apply)
      + id                               = (known after apply)
      + metrics_collection_enabled       = (known after apply)
      + resource_collection_enabled      = (known after apply)
      + role_name                        = "TutorialDatadogAWSIntegrationRole"
    }

Plan: 4 to add, 0 to change, 0 to destroy.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.
PHASE TWO

The second phase is to deploy the agent.
Use the command below to install the agent on an Ubuntu server:

DD_API_KEY=<API_KEY> DD_SITE=<DATADOG_SITE> DD_APM_INSTRUMENTATION_ENABLED=host bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"


where:
API_KEY = your Datadog api key
DATADOG_SITE = your Datadog site. For this exercise, we use "datadoghq.eu".

If your instance runs a different operating system, navigate to this site to get the matching command for installing the Datadog agent.

After the Datadog agent has been installed, go to your Datadog account and navigate to Metrics. You will start to see your EC2 metrics reported in Datadog, as in the image below:

Datadog agent installation image

PHASE THREE

In this phase, we will create monitors for our EC2 instance for the metrics listed at the beginning of this tutorial.

A. HIGH CPU UTILISATION

  • On the Monitors page in Datadog, click New Monitor at the top right.

metrics image

  • Click on Metric and configure your monitor.

metrics image

  • The images below show the configuration needed to monitor your EC2 CPU utilisation.

metrics image

Your monitor should look like this after creation:

metrics image

To understand each option used in creating the monitor, click here.
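
Since the screenshots carry the exact settings, here is a rough equivalent of a CPU monitor written with the Datadog Terraform provider's datadog_monitor resource. It is a sketch under stated assumptions, not the configuration from the images: the host name my-ec2-host, the 80% threshold, the 5-minute window and the message text are all placeholders, and it relies on the provider.tf shown earlier.

```hcl
# Sketch only: host name, thresholds, and message are placeholder assumptions.
resource "datadog_monitor" "high_cpu" {
  name    = "High CPU utilisation on EC2 instance"
  type    = "metric alert"
  message = "CPU utilisation has been above 80% for 5 minutes on {{host.name}}."

  # The agent reports system.cpu.idle as a percentage, so idle below 20
  # corresponds to utilisation above 80%.
  query = "avg(last_5m):avg:system.cpu.idle{host:my-ec2-host} by {host} < 20"

  monitor_thresholds {
    critical = 20 # must match the value in the query
    warning  = 30 # warn earlier, while 30% of CPU is still idle
  }
}
```

Keeping monitors in Terraform alongside the integration means they can be reviewed and recreated the same way as the IAM resources.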

B. HIGH MEMORY UTILISATION

metrics image

Your monitor should look like this after creation:

metrics image
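
A memory monitor can be sketched in Terraform in a similar way. The block below assumes the agent metric system.mem.pct_usable (the usable fraction of physical memory, between 0 and 1) and uses a placeholder host name and placeholder thresholds; the settings in the screenshots may differ.

```hcl
# Sketch only: host name, thresholds, and message are placeholder assumptions.
resource "datadog_monitor" "high_memory" {
  name    = "High memory utilisation on EC2 instance"
  type    = "metric alert"
  message = "Less than 10% of memory is usable on {{host.name}}."

  # Usable memory below 0.1 means more than 90% of memory is in use.
  query = "avg(last_5m):avg:system.mem.pct_usable{host:my-ec2-host} by {host} < 0.1"

  monitor_thresholds {
    critical = 0.1 # must match the value in the query
    warning  = 0.2 # warn when less than 20% is usable
  }
}
```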

C. HIGH DISK UTILISATION

metrics image
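
For disk usage, a comparable sketch could use the agent metric system.disk.in_use (the fraction of disk space in use, between 0 and 1), grouped by device so each volume alerts separately. As before, the host name, thresholds and message are assumptions rather than the values from the screenshots.

```hcl
# Sketch only: host name, thresholds, and message are placeholder assumptions.
resource "datadog_monitor" "high_disk" {
  name    = "High disk utilisation on EC2 instance"
  type    = "metric alert"
  message = "Disk {{device.name}} on {{host.name}} is more than 90% full."

  # One alert per host and device, triggered when usage exceeds 90%.
  query = "avg(last_5m):avg:system.disk.in_use{host:my-ec2-host} by {host,device} > 0.9"

  monitor_thresholds {
    critical = 0.9 # must match the value in the query
    warning  = 0.8 # warn at 80% full
  }
}
```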

Conclusion

I hope you were able to follow along and create the Datadog monitors for your metrics. Do you have any questions? Please send them my way. Kindly follow me on LinkedIn.
