Stefano Maffulli for Kubecost

Posted on May 12, 2021 • Originally published at blog.kubecost.com

How to set real-time cost alerts in Kubernetes

#kubernetes #devops

Kubernetes gave engineering teams great power to scale application infrastructure without much fuss. Sadly, budgets don't scale as easily. DevOps and finance teams can rely on the open source Kubecost project for real-time alerting on cost overruns.
In corporate environments it’s crucial to give all stakeholders the ability to track resource usage and attribute cost to each service within a shared cluster. This clear ownership improves budget-based resource planning, identifies resource bottlenecks, and even helps manage security risks. Once your ecosystem is properly organized, the next step towards clarity is to set up cost alerts.

How Kubecost Adds Clarity

Two of the key features of Kubecost are its cost allocation views for granular insights and its notifications triggered by cost alerts.

1. Granular Insights

Kubecost can break costs down to any Kubernetes component level (according to usage), down to individual workloads. The cost allocation model supports all native Kubernetes concepts, including cluster, namespace, controller, deployment, service, label, pod, and container.

The deployment view shows a deployment along with its allocated costs, its resource efficiency score (computed by comparing idle to used resources), and its health score (calculated from checks that come pre-configured according to industry best practices).

These historical cost measurements can serve as benchmarks for setting initial alerting or budget thresholds.

2. Notifications

Kubecost supports the following types of cost alert notifications:

Recurring Update: Great for creating scheduled cost reports allocated by namespace.
Budget: Great for surfacing a budget overrun relative to a defined threshold.
Spend Change: Great for detecting a jump in your spending habits (based on a historical moving average).
Efficiency: Great for detecting over-provisioned CPU, memory, or storage according to a defined efficiency ratio threshold between 0 and 1.

Let us expand some more on these alerting use cases.

Recurring Update Alert

This mode of alerting is better thought of as a scheduled report. Suppose an engineering manager responsible for a Kubernetes cluster has created multiple namespaces to delegate self-administration to various application teams. The engineering manager can schedule a “recurring update” alert to receive a regular report of usage and cost allocated by namespace to ensure that each group uses a fair portion of the shared cluster.

Budget Alert

The budget alert compares daily spending to a preset threshold (which you can enter in the Kubecost user interface) and fires only if the threshold is exceeded. Combined with Kubecost's cost allocation feature, this alert notifies the person who can actually fix a cost overrun on the very first day the excess occurs, avoiding an end-of-month surprise.
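The comparison behind a budget alert can be sketched in a few lines of Python. This is only an illustration of the idea, not Kubecost's implementation: the `BudgetAlert` class, the `over_budget` function, and the cost figures are all hypothetical, and the threshold is assumed to be in the same unit as the daily costs.

```python
from dataclasses import dataclass


@dataclass
class BudgetAlert:
    """Hypothetical in-memory stand-in for a budget alert definition."""
    aggregation: str  # dimension the alert is scoped to, e.g. "namespace"
    filter: str       # item within that dimension, e.g. "elasticsearch"
    threshold: float  # daily budget, assumed to share a unit with the costs


def over_budget(alert: BudgetAlert, daily_costs: dict[str, float]) -> bool:
    """Return True when the filtered item's daily cost exceeds the threshold."""
    spent = daily_costs.get(alert.filter, 0.0)
    return spent > alert.threshold


alert = BudgetAlert(aggregation="namespace", filter="elasticsearch", threshold=0.50)
costs = {"elasticsearch": 0.72, "kubecost": 0.31}  # made-up daily costs per namespace
print(over_budget(alert, costs))  # prints True: 0.72 is above the 0.50 budget
```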

Spend Change Alert

This type of alert is most helpful if you don't have a preset budget or would like to avoid setting thresholds altogether. Instead, you would simply like to know if your spending experiences a sudden unexpected increase. By comparing your current spending to a historical trend line, you will receive an alert as soon as your spending breaks from a typical daily pattern.
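As a rough illustration, the check can be thought of as a relative change against the average of a baseline window. The sketch below assumes a plain arithmetic mean and a positive threshold; the function names and the numbers are hypothetical, and Kubecost's own calculation may differ.

```python
from statistics import mean


def relative_change(current: float, baseline_costs: list[float]) -> float:
    """Proportional change of the current cost relative to the baseline average."""
    baseline = mean(baseline_costs)
    if baseline == 0:
        raise ValueError("baseline average is zero; relative change is undefined")
    return (current - baseline) / baseline


def spend_change_fires(current: float, baseline_costs: list[float],
                       relative_threshold: float) -> bool:
    """Return True when spending rose by more than the (positive) threshold."""
    return relative_change(current, baseline_costs) > relative_threshold


# Thirty made-up days at 10.0 per day, then a day at 13.0: a 30% jump.
history = [10.0] * 30
print(spend_change_fires(13.0, history, relative_threshold=0.20))  # prints True
```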

Efficiency Alert

An increase in spending is not always bad, as it may be directly related to increased workload or increased business activity. The efficiency index identifies waste in your Kubernetes cluster, but it also detects bottlenecks. The efficiency alert notifies administrators of over- and under-provisioning of resources, both of which often go undetected for long periods.
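To make the idea concrete, here is a simplified sketch that treats efficiency as used resources divided by requested resources. That definition is an assumption made for illustration, as are the function names and the rule that a ratio above 1 signals under-provisioning; Kubecost's actual efficiency score may be computed differently.

```python
def efficiency_ratio(used: float, requested: float) -> float:
    """Fraction of the requested resource that is actually used (assumed definition)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return used / requested


def classify(used: float, requested: float, threshold: float) -> str:
    """Label a workload by comparing its efficiency ratio to a threshold in (0, 1]."""
    ratio = efficiency_ratio(used, requested)
    if ratio < threshold:
        return "over-provisioned"   # most of the request sits idle
    if ratio > 1.0:
        return "under-provisioned"  # usage exceeds what was requested
    return "ok"


# A workload requesting 1 CPU core but using 0.2 of it, with a 0.4 threshold.
print(classify(used=0.2, requested=1.0, threshold=0.4))  # prints over-provisioned
```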

To set the scope of an alert, simply add criteria to the aggregation (as defined in the aggregated cost model API) and filter settings. Aggregation supports dimensions such as cluster, namespace, controller, deployment, service, label, pod, and container. Filters let you choose which aggregations (such as a specific namespace) to include in the notification.
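Conceptually, aggregation groups cost records by one dimension and the filter keeps only the groups you care about. The sketch below mimics that with plain dictionaries; the record layout, the `aggregate_costs` function, and the cost values are hypothetical and do not reflect the aggregated cost model API itself.

```python
from collections import defaultdict


def aggregate_costs(records, aggregation, include=None):
    """Sum costs per value of `aggregation`, keeping only keys in `include` if given."""
    totals = defaultdict(float)
    for record in records:
        key = record[aggregation]
        if include is not None and key not in include:
            continue  # filtered out of the notification
        totals[key] += record["cost"]
    return dict(totals)


# Made-up cost records, each tagged with the dimensions it belongs to.
records = [
    {"cluster": "prod-cluster", "namespace": "kubecost", "cost": 0.25},
    {"cluster": "prod-cluster", "namespace": "default", "cost": 0.5},
    {"cluster": "prod-cluster", "namespace": "kubecost", "cost": 0.25},
]

print(aggregate_costs(records, "namespace", include={"kubecost"}))  # {'kubecost': 0.5}
print(aggregate_costs(records, "cluster"))                          # {'prod-cluster': 1.0}
```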

When you set up alerts based on usage per namespace (or per cluster), you can conveniently define daily budget thresholds in the UI unique to each project or team and direct notifications to relevant parties and collaborative tools. In this way, you not only cut out the noise, but you also deliver valuable information directly to the stakeholders who can act on it.

Via Email

Via Slack

Via Webhook

A third option is to simply use a generic webhook to integrate with just about any third-party tool such as PagerDuty or OpsGenie.
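When wiring a webhook into a third-party tool, it can help to exercise the receiving endpoint with a hand-built request first. The sketch below only constructs a JSON POST request with Python's standard library; the URL and the payload fields are placeholders invented for this example and are not the format Kubecost sends.

```python
import json
from urllib.request import Request


def build_webhook_request(url: str, alert: dict) -> Request:
    """Build (but do not send) a JSON POST request carrying an alert payload."""
    body = json.dumps(alert).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; the real alert body may have different fields.
payload = {"type": "budget", "aggregation": "namespace", "filter": "elasticsearch"}
request = build_webhook_request("https://example.com/hooks/cost-alerts", payload)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually deliver it, which is left out here so the example has no network side effects.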

How to Get Started with Kubecost Alerts

1. Install Kubecost

Installing Kubecost in your Kubernetes cluster only takes a few minutes using Helm. Follow the installation guide; you can then configure all of the alerts in the Helm values file, as described in the next step.

2. Set up Cost Alerts

You can configure your cost alerts from the Kubecost Helm values file. For each alert you complete the following:

Define your thresholds based on your budgetary goals
Filter namespaces unrelated to a given project or team
Add notifications for stakeholder awareness

The following is an example of a Helm values block.

  notifications:
    # Kubecost alerting configuration
    # Ref: http://docs.kubecost.com/alerts
    alertConfigs:
      enabled: false # the example values below are never read unless enabled is set to true
      frontendUrl: http://localhost:9090 # optional, used for linkbacks
      globalSlackWebhookUrl: "https://hooks.slack.com/services/<REDACTED>" # optional, used for Slack alerts
      kubecostHealth: true # Alerts generated for kubecost uptime. Uses the globalSlackWebhookUrl to deliver the alert
      globalAlertEmails:
        - user1@example.com
      alerts: # Alerts generated by kubecost, about cluster data
          # Daily namespace budget alert on namespace `elasticsearch`
        - type: budget # supported: budget, recurringUpdate, spendChange
          threshold: 0.50 # optional, required for budget alerts
          window: daily # or 1d
          aggregation: namespace
          filter: elasticsearch
          ownerContact: # optional, overrides globalAlertEmails default
            - user1@example.com
            - user2@example.com
          slackWebhookUrl: "https://hooks.slack.com/services/T069Z9TFF/<REDACTED>" # optional, used for alert-specific Slack alerts
          # Daily cluster budget alert (clusterCosts alert) on cluster `prod-cluster`
        - type: budget
          threshold: 1.0 # optional, required for budget alerts
          window: daily # or 1d
          aggregation: cluster
          filter: prod-cluster # does not accept csv
          # Recurring weekly update (weeklyUpdate alert)
        - type: recurringUpdate
          window: weekly # or 7d
          aggregation: namespace
          filter: '*'
          # Recurring weekly namespace update on kubecost namespace
        - type: recurringUpdate
          window: weekly # or 7d
          aggregation: namespace
          filter: kubecost
          # Spend Change Alert
        - type: spendChange  # change relative to moving avg
          relativeThreshold: 0.20  # Proportional change relative to baseline. Must be greater than -1 (can be negative)
          window: 1d                # accepts 'd', 'h'
          baselineWindow: 30d       # previous window, offset by window
          aggregation: namespace
          filter: kubecost, default # accepts csv
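Because a typo in the values file may only surface after deployment, a small pre-flight check can catch obvious mistakes. The sketch below is a hypothetical helper, not part of Kubecost: it works on alert entries already loaded into Python dictionaries and only enforces the rules stated in the comments of the example above (budget alerts need a `threshold`, and `relativeThreshold` must be greater than -1). It knows only the three alert types that appear in that example.

```python
KNOWN_TYPES = {"budget", "recurringUpdate", "spendChange"}  # types seen in the example


def validate_alert(alert: dict) -> list[str]:
    """Return a list of human-readable problems found in one alert entry."""
    problems = []
    kind = alert.get("type")
    if kind not in KNOWN_TYPES:
        problems.append(f"unknown type: {kind!r}")
    if kind == "budget" and "threshold" not in alert:
        problems.append("budget alerts require a threshold")
    if kind == "spendChange":
        rel = alert.get("relativeThreshold")
        if rel is None or rel <= -1:
            problems.append("spendChange needs a relativeThreshold greater than -1")
    return problems


# Entries mirroring the example values, plus one deliberately broken budget alert.
alerts = [
    {"type": "budget", "threshold": 0.50, "window": "daily",
     "aggregation": "namespace", "filter": "elasticsearch"},
    {"type": "spendChange", "relativeThreshold": 0.20, "window": "1d",
     "baselineWindow": "30d", "aggregation": "namespace", "filter": "kubecost, default"},
    {"type": "budget", "window": "daily", "aggregation": "cluster", "filter": "prod-cluster"},
]

for index, entry in enumerate(alerts):
    for problem in validate_alert(entry):
        print(f"alert {index}: {problem}")
```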

3. Scale Across Clusters

For production deployments, we recommend configuring Kubecost through the Helm values file rather than the UI. The same values file can then be reused across multiple clusters: provide the path of the Helm chart and the values file to your continuous delivery (CD) system (such as ArgoCD) and let it deploy the product.

Once deployed, you'll see the following workloads in your cluster:

4. View

If you would just like to quickly explore the product:

Run the following command:

kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

Navigate to http://localhost:9090.
Explore.

Let Kubecost run for a couple of hours before taking a full tour so that there is enough data in the system to populate all fields.

When moving to production, you can expose Kubecost via ingress, SAML, or another mechanism that meets your security requirements. This is similar to how you might give access to other resources, such as a Jenkins UI.

Setup Tips

Kubecost integrates with most identity providers (such as Google Auth) and also supports SAML-based authentication. All of this can be configured in the Helm values file.
Kubecost uses Prometheus Alertmanager for alert delivery. If you already have an Alertmanager instance running, you can configure its endpoint in the Helm values file and Kubecost can use it for alert notifications.

  notifications:
    # Kubecost alerting configuration
    # Ref: http://docs.kubecost.com/alerts
    alertConfigs:
  ...

    alertmanager: # Supply an alertmanager FQDN to receive notifications from the app.
      enabled: true # If true, allow kubecost to write to your alertmanager
      fqdn: http://kubecost-prometheus-alertmanager #example fqdn. Ignored if prometheus.enabled: true

See the Kubecost Alerts troubleshooting guide or join the Kubecost Slack group for community advice and support.

Conclusion

Kubecost provides a holistic view of infrastructure cost by collating both in-cluster and out-of-cluster resources with support for all leading public cloud providers. It takes this a step further by generating timely alerts and actionable scheduled reports for your stakeholders, allowing them to better control overspending on cloud resources.

From an operational standpoint, setup is quick and painless. Kubecost integrates with all leading identity providers, and it can use an existing Alertmanager or Prometheus installation, which is a huge plus because most teams already have a Prometheus deployment running in their clusters.
