DEV Community

amirreza valizade

Manage Prometheus TSDB in a Better Way!

Prometheus is a powerful monitoring system that handles the retention of old data with the --storage.tsdb.retention.size and --storage.tsdb.retention.time flags. These flags set the maximum size and age of the data kept in the Prometheus database, and they apply to every series alike. In some cases, however, you need to keep certain metrics for the long term and delete the unnecessary ones sooner. This article shows how to label Prometheus targets and delete old data through the Admin API to meet that requirement.

Labeling Prometheus Targets

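To retain specific metrics for a longer period, label the targets from which Prometheus scrapes them. Add a new label, such as retention_time, in the scrape job configuration for each target. Its value should name the duration for which you want to keep the data, for example "one-month", "three-month", "twelve-month", or any other value that suits your needs. The label itself does not change how long Prometheus stores anything; it only marks series so that they can be selected for deletion later. Keep the global --storage.tsdb.retention.time at least as long as the longest period you label, since Prometheus removes data older than that regardless of labels.

Here is an example scrape job that adds the retention_time label to every target it discovers: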

  - job_name: 'node_exporter'
    file_sd_configs:
    - files:
      - node_exporter.yml
    relabel_configs:
      - target_label: retention_time
        replacement: "one-month"
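
After Prometheus has reloaded its configuration, it can help to confirm that the label is really attached before relying on it for deletions. The sketch below is one possible check, assuming Prometheus listens on localhost:9090 and that sh and curl are available. PROM_URL and both function names are hypothetical helpers introduced here, not part of Prometheus. It asks the standard instant-query API to count the up series per retention_time value.

```shell
#!/bin/sh
# Assumption: Prometheus is reachable here; override PROM_URL to match your setup.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print a PromQL expression that counts scraped targets per label value.
# "up" exists for every scraped target, so it is a convenient series to group.
retention_label_query() {
  printf 'count by (%s) (up)' "${1:-retention_time}"
}

# Send the expression to the instant-query endpoint (/api/v1/query).
# --data-urlencode lets curl escape the parentheses and spaces.
check_retention_labels() {
  curl -sS "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$(retention_label_query "$@")"
}
```

Calling check_retention_labels should return a JSON result with one entry per retention_time value. Targets that did not receive the label are grouped under an empty label set, which makes a missed job easy to spot.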

Deleting Labeled Data Using the Admin API

Once the targets are labeled, you can use the Prometheus Admin API to delete old data that is no longer required. The Admin API is disabled by default; enable it by starting Prometheus with the --web.enable-admin-api flag. The DeleteSeries endpoint deletes data for a selection of series in a time range. The deleted data still exists on disk and is cleaned up in future compactions, or you can clean it up explicitly with the CleanTombstones endpoint.

To delete data for a particular time range, use the match[] URL query parameter to select the series, along with the start and end timestamps. The following example uses DeleteSeries to delete samples older than one month from series that carry the retention_time="one-month" label. The date -d syntax it relies on is specific to GNU date:

$ curl -X PUT \
  -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={retention_time="one-month"}&end='"$(date -d '1 month ago' +%s)"

URL query parameters:

  • match[]=<series_selector>: Repeated label matcher argument that selects the series to delete. At least one match[] argument must be provided.
  • start=<rfc3339 | unix_timestamp>: Start timestamp. Optional and defaults to minimum possible time.
  • end=<rfc3339 | unix_timestamp>: End timestamp. Optional and defaults to maximum possible time. Omitting both start and end clears all data for the matched series in the database.
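
Because start and end accept either form, the cutoff can be produced as a Unix timestamp or as an RFC 3339 string. The following is a minimal sketch, assuming GNU date (the -d option is not available in BSD or macOS date) and a Prometheus at localhost:9090. The function names and PROM_URL are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Convert a GNU date expression (e.g. "1 month ago") to Unix seconds.
to_unix() {
  date -u -d "$1" +%s
}

# Convert the same kind of expression to an RFC 3339 UTC timestamp.
to_rfc3339() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

# Build the DeleteSeries URL for one retention_time value and an end timestamp.
# start is left out on purpose, so it defaults to the minimum possible time.
delete_series_url() {
  printf '%s/api/v1/admin/tsdb/delete_series?match[]={retention_time="%s"}&end=%s' \
    "${PROM_URL:-http://localhost:9090}" "$1" "$2"
}
```

A request could then be sent with curl -X POST -g "$(delete_series_url one-month "$(to_unix '1 month ago')")". Printing the URL first is a cheap way to check the selector and cutoff before any data is deleted.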

Note: DeleteSeries marks the samples from the selected series as deleted, but it does not necessarily prevent the associated series metadata from still being returned in metadata queries for the affected time range. You can use the CleanTombstones endpoint to remove the deleted data from disk and clean up the existing tombstones.

$ curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones

Now we have labels that define how long each metric should be kept and a request that removes data based on those labels. Putting the two together gives a script like this:

#!/bin/sh

# The end timestamp is computed per label; start defaults to the minimum possible time.
# Note: "date -d" requires GNU date.
curl -X PUT -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={retention_time="one-month"}&end='"$(date -d '1 month ago' +%s)"
curl -X PUT -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={retention_time="three-month"}&end='"$(date -d '3 months ago' +%s)"
curl -X PUT -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={retention_time="twelve-month"}&end='"$(date -d '12 months ago' +%s)"

# Remove the deleted data from disk and clean up the tombstones.
curl -X PUT 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones' -H 'accept: */*'

Don't forget to automate this, for example with a cron job, so that old metrics are deleted regularly without any manual intervention.
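
One way to automate the cleanup is a small script run by cron. The sketch below is not the author's script but a variation on it under stated assumptions: GNU date, curl, a Prometheus at localhost:9090 with the Admin API enabled, and the three label values used in this article. The names cutoff_for, purge_label, purge_all, PROM_URL, and DRY_RUN are hypothetical. With DRY_RUN=1 it only prints the requests, which is a safe way to try it first.

```shell
#!/bin/sh
PROM_URL="${PROM_URL:-http://localhost:9090}"   # assumption: local Prometheus

# Map a retention_time label value to a GNU date expression.
# Only the label values used in this article are known; extend as needed.
cutoff_for() {
  case "$1" in
    one-month)    echo '1 month ago' ;;
    three-month)  echo '3 months ago' ;;
    twelve-month) echo '12 months ago' ;;
    *)            return 1 ;;
  esac
}

# Delete samples older than the cutoff for one label value.
# With DRY_RUN=1 the request is printed instead of sent.
purge_label() {
  when=$(cutoff_for "$1") || { echo "unknown label: $1" >&2; return 1; }
  end=$(date -d "$when" +%s) || return 1
  url="$PROM_URL/api/v1/admin/tsdb/delete_series?match[]={retention_time=\"$1\"}&end=$end"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "POST $url"
  else
    # -f makes curl exit non-zero on HTTP errors; -g keeps [] and {} literal.
    curl -fsS -X POST -g "$url"
  fi
}

# Purge every label, then reclaim disk space by cleaning tombstones.
# Stops at the first failed request so tombstones are not cleaned after an error.
purge_all() {
  for label in one-month three-month twelve-month; do
    purge_label "$label" || return 1
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "POST $PROM_URL/api/v1/admin/tsdb/clean_tombstones"
  else
    curl -fsS -X POST "$PROM_URL/api/v1/admin/tsdb/clean_tombstones"
  fi
}
```

To use it, add a final line that calls purge_all, save the file somewhere such as /usr/local/bin/prom-retention.sh (a hypothetical path), and schedule it with a crontab entry like 0 3 * * * /usr/local/bin/prom-retention.sh to run daily at 03:00.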
