DEV Community

Jade

How to monitor Openshift using Datadog Operator

Co-authors: Luiz Bernardo Levenhagen and Leonardo Araujo

 

In this article, we demonstrate how to integrate Openshift with Datadog using the Datadog Operator to collect metrics, logs, events, and application data.

In this article we use the following versions:

  • Openshift v4.13.11
  • Datadog Operator v1.3.0
  • Datadog account (more information on how to request a trial at the bottom of the blog)

 

About

  • This article is aimed at users who would like to integrate or monitor their Openshift Cluster using the Datadog monitoring solution.

  • We will use the Datadog Operator to instantiate the agent and collect cluster and container metrics, cluster and container/pod logs, network traffic, CPU and memory consumption, and application data.

  • Red Hat does not support the Datadog Operator or its configuration. For any questions about the platform or the operator, contact Datadog.

Prerequisites

  • User with the cluster-admin cluster role
  • Openshift 4.10 or later
  • Datadog account (more information on how to request a trial at the bottom of the blog)

Procedure

Datadog

Add API Keys

  • To add a new Datadog API key, navigate to Organization Settings > API Keys.
  • If you have permission to create API keys, click New Key in the top right corner.
  • Give the key a name that will help you identify it later.
  • Once the key is created, copy it so we can use it later.


Add Application keys

  • To add a new Datadog application key, navigate to Organization Settings > Application Keys.
  • If you have permission to create application keys, click New Key in the top right corner.
  • Give the key a name that will help you identify it later.
  • Once the key is created, copy it so we can use it later.


Openshift

Datadog Operator Install

  • In the Openshift console, in the left side menu, click Operators > OperatorHub and type datadog in the search field.


💡 Tip
Whenever available, use a certified option.

 

  • As we can see, we are using version 1.3.0 of the operator. Click Install.


 

  • On this screen, we will keep all the default options:
    • Update channel: stable
    • Installation mode: All namespaces on the cluster (default)
    • Installed Namespace: openshift-operators
    • Update approval: Automatic
      • Note: If you prefer, you can use the Manual option.
    • Click Install.


 

  • Wait until the installation is complete.


Create secret with Datadog keys (not mandatory, but good practice)

  • In the terminal, switch to the openshift-operators namespace


$ oc project openshift-operators



 

  • Now let's create a secret to store the API key and the application key. Replace the values below with the keys we generated previously in the Datadog console.


$ oc create secret generic datadog-secret \
--from-literal=api-key=REPLACE_ME \
--from-literal=app-key=REPLACE_ME
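
If you would rather not paste the keys directly on the command line, one option is to render the secret from environment variables and pipe it to oc. This is only a minimal sketch: render_datadog_secret is a hypothetical helper, and DD_API_KEY and DD_APP_KEY are variable names we assume here, not something the operator requires. It relies on the stringData field of a Kubernetes Secret, which accepts plain (not base64-encoded) values.

```shell
#!/bin/sh
# Sketch: render the datadog-secret manifest from environment variables so the
# keys are not typed as literals on the command line.
# DD_API_KEY and DD_APP_KEY are assumed variable names (hypothetical).
render_datadog_secret() {
  # Abort with a message if either key is missing.
  : "${DD_API_KEY:?DD_API_KEY is not set}"
  : "${DD_APP_KEY:?DD_APP_KEY is not set}"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
  namespace: openshift-operators
type: Opaque
stringData:
  api-key: "${DD_API_KEY}"
  app-key: "${DD_APP_KEY}"
EOF
}

# Usage (requires a logged-in oc session):
#   render_datadog_secret | oc apply -f -
```

The secret name and key names match the ones referenced later in the agent manifest (datadog-secret, api-key, app-key), so either approach should work with the rest of this article.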



 

  • Let's now instantiate our Datadog agent using the YAML below


$ cat <<EOF > datadog_agent.yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: openshift-operators
spec:
  features:
    apm:
      enabled: true
      unixDomainSocketConfig:
        enabled: true
    clusterChecks:
      enabled: true
      useClusterChecksRunners: true
    dogstatsd:
      originDetectionEnabled: true
      unixDomainSocketConfig:
        enabled: true
    eventCollection:
      collectKubernetesEvents: true
    liveContainerCollection:
      enabled: true
    liveProcessCollection:
      enabled: true
    logCollection:
      containerCollectAll: true
      enabled: true
    npm:
      collectDNSStats: true
      enableConntrack: true
      enabled: true
  global:
    clusterName: DemoLab
    credentials:
      apiSecret:
        keyName: api-key
        secretName: datadog-secret
      appSecret:
        keyName: app-key
        secretName: datadog-secret
    criSocketPath: /var/run/crio/crio.sock
    kubelet:
      tlsVerify: false
    site: datadoghq.eu
  override:
    clusterAgent:
      containers:
        cluster-agent:
          securityContext:
            readOnlyRootFilesystem: false
      replicas: 2
      serviceAccountName: datadog-agent-scc
    nodeAgent:
      hostNetwork: true
      securityContext:
        runAsUser: 0
        seLinuxOptions:
          level: s0
          role: system_r
          type: spc_t
          user: system_u
      serviceAccountName: datadog-agent-scc
      tolerations:
      - operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
EOF
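
Before creating the object, a quick sanity check of the generated file can catch copy and paste mistakes. The sketch below is a hypothetical helper (check_manifest) that only greps for a few strings we expect from the manifest above. It does not parse YAML and is no substitute for validation by the API server.

```shell
#!/bin/sh
# Sketch: grep-based sanity check of the DatadogAgent manifest.
# check_manifest is a hypothetical helper; it only looks for expected strings.
check_manifest() {
  file="$1"
  for pattern in \
    'kind: DatadogAgent' \
    'secretName: datadog-secret' \
    'keyName: api-key' \
    'keyName: app-key'
  do
    # Report the first expected string that is missing and stop.
    if ! grep -q -- "$pattern" "$file"; then
      echo "missing: $pattern" >&2
      return 1
    fi
  done
  echo "ok: $file"
}

# Usage:
#   check_manifest datadog_agent.yaml
```

If a cluster is already available, running the create command with --dry-run=server is another way to have the API server validate the manifest without persisting it.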


  • Some explanations about what we are enabling in this agent

Enabling the APM (Application Performance Monitoring) feature



apm:
  enabled: true
  unixDomainSocketConfig:
    enabled: true



 

Cluster checks extend Autodiscovery to non-containerized resources, such as datastores or endpoints running outside the cluster. With useClusterChecksRunners enabled, those checks run on dedicated runner pods.



clusterChecks:
  enabled: true
  useClusterChecksRunners: true



 

DogStatsD collects custom metrics and events from applications, aggregates them, and periodically forwards them to Datadog.



dogstatsd:
  originDetectionEnabled: true
  unixDomainSocketConfig:
    enabled: true
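
To make the DogStatsD part more concrete, here is a minimal sketch of the datagram an application sends, in the form name:value|type|#tags. The helper dogstatsd_datagram is hypothetical, and the metric name and tag in the usage lines are made up for illustration. The socket path and the presence of socat in the container image are assumptions as well.

```shell
#!/bin/sh
# Sketch: build a DogStatsD datagram of the form name:value|type|#tags.
# dogstatsd_datagram is a hypothetical helper, not part of any Datadog tool.
dogstatsd_datagram() {
  name="$1"; value="$2"; type="$3"; tags="$4"
  if [ -n "$tags" ]; then
    printf '%s:%s|%s|#%s' "$name" "$value" "$type" "$tags"
  else
    printf '%s:%s|%s' "$name" "$value" "$type"
  fi
}

# Usage from a pod that mounts the agent socket (socket path and socat
# availability are assumptions):
#   dogstatsd_datagram demo.page.views 1 c env:lab \
#     | socat - UNIX-SENDTO:/var/run/datadog/dsd.socket
```

In practice, applications usually rely on a DogStatsD client library rather than building datagrams by hand. The sketch is only meant to show what travels over the socket.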



 

Here we enable the collection of Kubernetes events and of logs from all containers in the cluster, together with the live container and live process views, and send them to Datadog.



eventCollection:
  collectKubernetesEvents: true
liveContainerCollection:
  enabled: true
liveProcessCollection:
  enabled: true
logCollection:
  containerCollectAll: true
  enabled: true



 

With NPM (Network Performance Monitoring), we can have visibility of all traffic in our cluster, nodes, containers, availability zones, etc.



npm:
  collectDNSStats: true
  enableConntrack: true
  enabled: true



 

In the credentials block under global, we reference the secret created previously, which holds the API key and the application key.



credentials:
  apiSecret:
    keyName: api-key
    secretName: datadog-secret
  appSecret:
    keyName: app-key
    secretName: datadog-secret



 

In this block, we define the path to the CRI-O socket, disable TLS verification for communication with the kubelet, and use site to select which Datadog site receives the data (datadoghq.eu here; use the site your account belongs to).



criSocketPath: /var/run/crio/crio.sock
kubelet:
  tlsVerify: false
site: datadoghq.eu
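
Since the site value must match the Datadog site of your account, it can be worth checking the API key against that site before deploying. The sketch below derives the API base URL from the site value. datadog_api_url is a hypothetical helper and DD_API_KEY is an assumed variable name. The commented curl call uses Datadog's key validation endpoint and needs network access.

```shell
#!/bin/sh
# Sketch: derive the API base URL from the Datadog site used in the manifest.
# datadog_api_url is a hypothetical helper.
datadog_api_url() {
  printf 'https://api.%s' "$1"
}

# Optional key check (needs network access; DD_API_KEY is an assumed variable):
#   curl -s -H "DD-API-KEY: ${DD_API_KEY}" \
#     "$(datadog_api_url datadoghq.eu)/api/v1/validate"
```

If the key belongs to a different site than the one configured, the agent will start but its data will not show up in your account, so this check can save some debugging time.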



 

In the clusterAgent block under override, we add securityContext settings and specify which service account the datadog-cluster-agent pods should use.



clusterAgent:
  containers:
    cluster-agent:
      securityContext:
        readOnlyRootFilesystem: false
  replicas: 2
  serviceAccountName: datadog-agent-scc



❗Note
The datadog-agent-scc serviceaccount is created automatically by the operator and already has all the necessary permissions for the agent to run correctly.

 

In the nodeAgent block under override, we define securityContext settings for the datadog-agent pods and use the same datadog-agent-scc service account. We also define tolerations for tainted nodes, in our case the master nodes.



nodeAgent:
  hostNetwork: true
  securityContext:
    runAsUser: 0
    seLinuxOptions:
      level: s0
      role: system_r
      type: spc_t
      user: system_u
  serviceAccountName: datadog-agent-scc
  tolerations:
  - operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/master




 

  • After some explanations, let's deploy our Datadog agent. Execute this command to create the object:


$ oc -n openshift-operators create -f datadog_agent.yaml


  • Once created, we will validate that our agent was created correctly


$ oc -n openshift-operators get datadogagent
$ oc -n openshift-operators get pods
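
To check the "one agent per node" expectation without counting by eye, the pod list can be piped through a small filter. This is a sketch that assumes the pod naming shown in this article (datadog-agent-xxxxx for node agents). count_running_agents is a hypothetical helper that reads the output of oc get pods --no-headers on standard input.

```shell
#!/bin/sh
# Sketch: count Running node-agent pods from `oc get pods --no-headers` output.
# Column 1 is the pod name and column 3 is the status.
# The prefix does not match datadog-cluster-agent-* pods.
count_running_agents() {
  awk '$1 ~ /^datadog-agent-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Usage (requires a logged-in oc session):
#   agents=$(oc -n openshift-operators get pods --no-headers | count_running_agents)
#   nodes=$(oc get nodes --no-headers | wc -l)
#   echo "running agents: $agents, nodes: $nodes"
```

If the two numbers differ, the tolerations in the nodeAgent block are a good first place to look.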



❗Note
Here we should have a datadog-agent running on each available openshift node.


❗Information

  • The datadog-agent-xxxxx pods are responsible for collecting all metrics, events, traces, and logs from each node in the cluster.
  • The datadog-cluster-agent-xxxxx pods act as a proxy between the API server and the node-based agents, which helps reduce the load on the API server.

 

  • Now let's check the logs of the datadog-agent-xxxxx pods to identify any communication errors.


$ oc logs -f -l app.kubernetes.io/managed-by=datadog-operator --max-log-requests 10
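
The agent logs are verbose, so it can help to keep only warnings and errors. The filter below is a sketch that assumes the pipe-separated log layout "timestamp | COMPONENT | LEVEL | ...". If your agent version formats its logs differently, the pattern needs adjusting. filter_agent_problems is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: keep only WARN and ERROR lines from agent logs read on stdin.
# Assumes the layout "timestamp | COMPONENT | LEVEL | ..." (an assumption).
filter_agent_problems() {
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -E '\| (WARN|ERROR) \|' || true
}

# Usage (requires a logged-in oc session):
#   oc -n openshift-operators logs -l app.kubernetes.io/managed-by=datadog-operator \
#     --max-log-requests 10 --tail=200 | filter_agent_problems
```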




 

Datadog platform/UI

  • Now on the Datadog platform, in the left side menu, click Infrastructure and then Infrastructure List.


❗Information
Server data, such as status, cpu information, memory and other details, may take a few minutes to be displayed.

 

  • To view more details about a specific node, click on the node name and navigate through the available tabs. This is the simplest way to check your nodes/hosts.


 

  • Under the Infrastructure menu, Datadog also provides a dedicated Kubernetes menu that gives you the full picture of your cluster. You can check the state of all your Kubernetes resources, troubleshoot patterns, access out-of-the-box dashboards, and enable some recommended alerts to monitor your environment.


  • You can also dig deeper into the containers running in your Openshift environment by going to Infrastructure > Containers. Here you can analyze container logs, traces, the networking layer, the processes running inside each container, and so on.


  • To view more details about network traffic, in the left side menu, go to Infrastructure > Network Map.


 

  • To view the logs received from the cluster or from any application or technology running in your Kubernetes environment, go to Logs > Analytics in the left side menu. On this screen, we can view all the details, filter application logs, and even view the processes.


 

  • To view all collected metrics, in the left side menu, go to Metrics > Explorer. Here we can view all metrics, run and save queries, or create dashboards based on queries.


 

  • Datadog provides out-of-the-box dashboards that can be used and customized. To use one, in the left side menu, go to Dashboards > Dashboard List, choose a dashboard, and click its name.


❗Note:
To customize a dashboard provided by Datadog, use the Clone feature to make the desired changes and save.

 
 

Conclusion

Using the Datadog Operator, we get a complete monitoring solution for our Openshift cluster, with main features such as APM, network analysis, logs, events, and metrics.

To request an Openshift trial and learn more about our solution, click here.

To request a Datadog trial and be able to replicate this knowledge, click here.

 


Top comments (2)

Benoit COUETIL 💫

Hello, welcome here, and thank you for your detailed walkthrough !

I was wondering : is it different from installing to a vanilla Kubernetes, or would it apply the same ? What are the key differences ? Does it depend on the Cloud provider ?

Jade • Edited

Hey @bcouetil ! happy to see here in my first post. I have not tried an env different than Openshift, so not sure if this will work in a "simple" Vanilla k8s (even though Openshift uses Vanilla, there are a lot of functionalities out of the box extending the Kubernetes, like native Operators). And no matter the Cloud provider, it should work the same.