Histogram of request time in Grafana with Telegraf

Jane Radetska
Web Developer from Kiev, Ukraine, currently living in Washington, DC. Book worm. Formula 1. Swing dancing. Harmony. @cheviana on github

This post covers a handy tool for analyzing backend call times. The code that makes the backend calls and the monitoring setup are described in the previous post.

A Grafana panel can plot more than line graphs. It can also:

  • show last reading of metric
  • show table of metric values
  • show bar plots
  • show heatmaps (histogram over time)

A heatmap helps you quickly understand the distribution of backend response times: it may be that most requests complete in under 50 msec, but some are slow and take more than 500 msec. The average request time doesn't show this. In the previous examples, we plotted only the average.
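
To see why the average alone can mislead, here is a minimal Python sketch. The request times are made-up numbers chosen for illustration, not measurements from this setup:

```python
from statistics import mean

# Hypothetical request times in msec: most are fast, two are slow outliers.
request_times_ms = [20, 25, 30, 35, 40, 45, 30, 25, 600, 650]

average_ms = mean(request_times_ms)

# Split the samples the way the text describes: fast (< 50 msec) vs. slow (> 500 msec).
fast = [t for t in request_times_ms if t < 50]
slow = [t for t in request_times_ms if t > 500]

print(f"average: {average_ms} msec")
print(f"under 50 msec: {len(fast)} requests, over 500 msec: {len(slow)} requests")
```

The average here is 150 msec, yet no single request took anywhere near that long: eight finished in under 50 msec and two took more than 500 msec.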

We can easily add a heatmap for request execution time:
Create heatmap
Set Y axis to msec

To do this, add a new panel, pick the measurement details, and select "Heatmap" in the "Visualization" collapsible in the right column.
Every 10 seconds, a new set of bricks appears on the panel. Brick color represents how many measurements fall into that bucket (e.g. 5 fall in the 10 msec - 20 msec range, hence that brick is pink). You can set a fixed bucket size, fix the number of buckets, or let the default values do their magic.
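
The idea behind the bricks can be sketched in a few lines of Python. This is only an illustration of the counting, not how Grafana implements it: the sample data, the `heatmap_cells` helper, and the half-open bucket boundaries are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical (timestamp in seconds, request time in msec) samples.
samples = [(1, 12.0), (3, 15.5), (4, 18.0), (7, 11.0), (9, 19.9),
           (12, 14.0), (15, 250.0), (18, 42.0)]

TIME_WINDOW_S = 10   # one column of bricks per 10 seconds
BUCKET_SIZE_MS = 10  # fixed Y-axis bucket size

def heatmap_cells(samples, window_s, bucket_ms):
    """Count samples per (window start, bucket lower edge) cell."""
    cells = Counter()
    for ts, value in samples:
        window_start = int(ts // window_s) * window_s
        bucket_start = int(value // bucket_ms) * bucket_ms
        cells[(window_start, bucket_start)] += 1
    return cells

cells = heatmap_cells(samples, TIME_WINDOW_S, BUCKET_SIZE_MS)
for (window_start, bucket_start), count in sorted(cells.items()):
    print(f"t={window_start}s, {bucket_start}-{bucket_start + BUCKET_SIZE_MS} msec: {count}")
```

Each (time window, value bucket) pair corresponds to one brick, and its count decides the color. In the first window of this sample data, 5 measurements land in the 10-20 msec cell.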

If Telegraf sends every metric data point to InfluxDB, this is a real heatmap. However, Telegraf is often configured to send only aggregated values (min, avg, max) calculated over a short period (e.g. 10 sec) in order to reduce metrics reporting traffic. A heatmap based on such aggregated values is not a real heatmap, because the distribution within each period is already lost by the time the data reaches the database.

It is possible to configure a histogram aggregator in the Telegraf config (full Telegraf config with histogram aggregator):

[[aggregators.histogram]]
  period = "30s"
  drop_original = false
  reset = true
  cumulative = false

  [[aggregators.histogram.config]]
    buckets = [1.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 30.0, 40.0]
    measurement_name = "aiohttp-request-exec-time"
    fields = ["value"]

I set reset = true and cumulative = false, which causes bucket values to be calculated anew for each 30-second period. The value ranges (buckets) have to be set manually, and the correct measurement_name must be specified. If fields is not specified, histogram buckets are computed for all fields of the measurement. Here's how bucket values appear in InfluxDB:
InfluxDB raw data for buckets

The number of request execution times that fall into a bucket is saved under the "value_bucket" field name; "gt" ("greater than") and "le" ("less than or equal to") are the bucket edge values, and they appear as tags.
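
As a rough illustration of what non-cumulative bucketing means, the following Python sketch counts values into (gt, le] ranges using the bucket edges from the config above. It is not Telegraf's code: the `bucket_counts` helper and the sample values are hypothetical, and the open-ended first and last buckets (-Inf and +Inf edges) are an assumption about how out-of-range values are handled.

```python
import math
from bisect import bisect_left

# Bucket edges copied from the aggregator config above.
BUCKETS = [1.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 30.0, 40.0]

def bucket_counts(values, edges):
    """Return {(gt, le): count} with non-cumulative counts, where gt < value <= le."""
    # Assumption: values outside the configured edges go to open-ended buckets.
    bounds = [-math.inf, *edges, math.inf]
    counts = {(bounds[i], bounds[i + 1]): 0 for i in range(len(bounds) - 1)}
    for v in values:
        # bisect_left finds the first edge >= v, which is the "le" edge of v's bucket.
        i = bisect_left(edges, v)
        counts[(bounds[i], bounds[i + 1])] += 1
    return counts

# Hypothetical request execution times collected during one 30-second period.
period_values = [0.5, 10.0, 11.0, 11.5, 13.0, 25.0, 50.0]

counts = bucket_counts(period_values, BUCKETS)
for (gt, le), count in counts.items():
    print(f"gt={gt} le={le} value_bucket={count}")
```

Because the counts are not cumulative, each value is counted in exactly one bucket, so the counts for a period add up to the number of measurements in that period.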

Let's plot these values using the "Bar gauge" panel visualization type:
Configure histogram
Configure histogram: calculate last

Let's create 2 separate panels, one for python.org stats and one for mozilla.org (add 'where domain = python.org' in the query editor).

Now we can compare at a glance the request execution time distribution over the last 30 sec for python.org and for mozilla.org:
Compare python.org and mozilla.org histogram
