<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jane Radetska</title>
    <description>The latest articles on DEV Community by Jane Radetska (@cheviana).</description>
    <link>https://dev.to/cheviana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F73431%2F34df66c1-2844-42d4-a3b2-688c79559315.jpg</url>
      <title>DEV Community: Jane Radetska</title>
      <link>https://dev.to/cheviana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cheviana"/>
    <language>en</language>
    <item>
      <title>Setup Knative Eventing with Kafka from scratch, scale based on events volume, and monitor</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Thu, 04 Jan 2024 17:46:10 +0000</pubDate>
      <link>https://dev.to/cheviana/knative-switchboard-series-part-1-setup-knative-eventing-with-kafka-from-scratch-scale-based-on-events-volume-and-monitor-3pcm</link>
      <guid>https://dev.to/cheviana/knative-switchboard-series-part-1-setup-knative-eventing-with-kafka-from-scratch-scale-based-on-events-volume-and-monitor-3pcm</guid>
      <description>&lt;p&gt;I am going to describe how to create a new Kubernetes cluster and install Knative Eventing, Kafka flavor, in it. I am actually going to create two Kafka clusters with mirroring enabled, so I can perform some experiments later on.&lt;/p&gt;

&lt;p&gt;I am also going to describe the steps you can follow to ensure Knative scales well when message volume increases, and I will point to resources on how to install monitoring for such a cluster.&lt;/p&gt;

&lt;p&gt;A Kubernetes cluster with Knative Eventing should fit within Google Cloud trial quotas, but monitoring and scaling workloads on top of that might not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster creation
&lt;/h2&gt;

&lt;p&gt;Create a new Kubernetes cluster: one zone, 4-6 nodes, each node compute-optimized (at least c2-standard-4), with a 100 GB disk (ideally pd-ssd, but pd-standard or pd-balanced work too). The trial quota allows 4 c2-standard-4 nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Kafka and Knative
&lt;/h2&gt;

&lt;p&gt;Create the namespace &lt;code&gt;knative-eventing&lt;/code&gt;.&lt;/p&gt;
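
&lt;p&gt;As a manifest, that's simply (equivalent to &lt;code&gt;kubectl create namespace knative-eventing&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Namespace
metadata:
  name: knative-eventing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;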

&lt;p&gt;Follow &lt;a href="https://strimzi.io/quickstarts/"&gt;Strimzi quickstart&lt;/a&gt; to install &lt;code&gt;kafka&lt;/code&gt; in &lt;code&gt;knative-eventing&lt;/code&gt; namespace, but use different Kafka cluster definition, see below. Knative workloads are expecting to be run in &lt;code&gt;knative-eventing&lt;/code&gt; namespace, otherwise issues arise. And it's easier to keep Knative and Kafka in one namespace.&lt;br&gt;
Use &lt;a href="https://github.com/CheViana/poc-files/blob/main/gke-yamls/knative-with-kafka-mirror/1-kafka-clusters.yaml"&gt;kafka-cluster.yaml&lt;/a&gt; as kafka cluster resource instead of the one used in Strimzi quickstart (&lt;code&gt;kafka-single-persistent.yaml&lt;/code&gt;). If you're not limited on disk, best to set &lt;code&gt;storage: size: 50Gi&lt;/code&gt; or &lt;code&gt;100Gb&lt;/code&gt; in kafka-cluster yaml, and at least 25Gb for zookeeper storage. For trial quota, you're limited to 20Gb and 10Gb for zookeeper (if we're doing 2 Kafka clusters, if one - can be more).&lt;/p&gt;

&lt;p&gt;Follow &lt;a href="https://knative.dev/docs/install/yaml-install/eventing/install-eventing-with-yaml/#install-knative-eventing"&gt;knative docs&lt;/a&gt; to install Knative eventing. Install all Kafka components too: Kafka sink, Kafka broker, Kafka event source. Use &lt;a href="https://knative.dev/blog/articles/single-node-kafka-development/#setting-the-kafka-broker-class-as-default"&gt;this publication&lt;/a&gt; to configure broker config to be &lt;code&gt;Kafka broker class&lt;/code&gt; (replication: 1).&lt;/p&gt;

&lt;p&gt;Also make sure to install the &lt;a href="https://knative.dev/docs/eventing/sources/kafka-source/"&gt;Kafka source&lt;/a&gt; component. kafka-source-dispatcher will have 0 pods until some Kafka sources are created.&lt;/p&gt;
&lt;h2&gt;
  
  
  Autoscaling Knative
&lt;/h2&gt;

&lt;p&gt;On GCP trial quota, you likely won't have room for the Keda controller or upscaled Knative workloads. Otherwise:&lt;/p&gt;

&lt;p&gt;Follow &lt;a href="https://knative.dev/blog/articles/improved-ha-configuration/"&gt;this blog&lt;/a&gt; to configure HA for Knative workloads. I would set HA to 6 though, and keep an eye on memory/CPU consumption of the workloads in case you're got significant events traffic going through the system. Otherwise there's going to be slowdown in events delivery. &lt;/p&gt;

&lt;p&gt;Install the scaling controller for Kafka sources, the &lt;a href="https://github.com/knative-extensions/eventing-autoscaler-keda/tree/main"&gt;Keda autoscaler&lt;/a&gt;. HPA parameters are controlled by annotations on the KafkaSource yaml definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;autoscaling.knative.dev/class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keda.autoscaling.knative.dev&lt;/span&gt;
    &lt;span class="na"&gt;autoscaling.knative.dev/minScale&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
    &lt;span class="na"&gt;autoscaling.knative.dev/maxScale&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
    &lt;span class="na"&gt;keda.autoscaling.knative.dev/pollingInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;
    &lt;span class="na"&gt;keda.autoscaling.knative.dev/cooldownPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;
    &lt;span class="na"&gt;keda.autoscaling.knative.dev/kafkaLagThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kafka of course has its own parallelism mechanism: topic partitions. A higher partition count for a topic lets more consumers in a group read in parallel, and lets more brokers share the load.&lt;/p&gt;
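
&lt;p&gt;For example, raising the partition count on a Strimzi-managed topic is just an edit to its &lt;code&gt;KafkaTopic&lt;/code&gt; spec (names here match the example topics later in this post; note partitions can be increased but not decreased):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: input-topic
  namespace: knative-eventing
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 6   # raised from 1; more partitions allow more parallel consumers
  replicas: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;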

&lt;h2&gt;
  
  
  Monitoring Knative and Kafka
&lt;/h2&gt;

&lt;p&gt;Follow &lt;a href="https://snourian.com/kafka-kubernetes-strimzi-part-3-monitoring-strimzi-kafka-with-prometheus-grafana/"&gt;this publication&lt;/a&gt; to setup Prometeus monitoring for Kafka cluster. DataDog has a nice description of &lt;a href="https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/"&gt;what those metrics&lt;/a&gt; mean. &lt;/p&gt;

&lt;p&gt;Knative has a &lt;a href="https://knative.dev/docs/serving/observability/metrics/collecting-metrics/"&gt;tutorial&lt;/a&gt; on how to set up monitoring. However, I ended up creating a &lt;code&gt;Service&lt;/code&gt; and &lt;code&gt;ServiceMonitor&lt;/code&gt; by hand for the Knative workloads in order to monitor them.&lt;/p&gt;

&lt;p&gt;Here's an example &lt;code&gt;Service&lt;/code&gt; and &lt;code&gt;ServiceMonitor&lt;/code&gt; for &lt;code&gt;kafka-sink-receiver&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-sink-service&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-sink-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-sink-receiver&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http-metrics&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9090&lt;/span&gt;
      &lt;span class="na"&gt;target-port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http-metrics&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceMonitor&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-sink-service-monitor&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-sink-service-mon&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-sink-service&lt;/span&gt;
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http-metrics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Knative exposes a few of its own metrics (like processing delays) and also a huge number of Kafka metrics for its consumers/producers. I ended up curl-ing the Knative services on the metrics port, and &lt;a href="https://github.com/CheViana/grafana-dashboard-from-metric-list"&gt;scripting a tool&lt;/a&gt; that helps create a primitive Grafana dashboard from a list of metric names and a datasource uid. See the readme on how to use the tool. Alternatively, replace the datasource uid in &lt;code&gt;dashboard-*.json&lt;/code&gt; with your datasource uid, and make sure the &lt;code&gt;job&lt;/code&gt; selectors in the dashboard JSON match the service name that sends metrics.&lt;/p&gt;

&lt;p&gt;The Knative dashboards, together with &lt;a href="https://github.com/strimzi/strimzi-kafka-operator/tree/main/examples/metrics/grafana-dashboards"&gt;Kafka's dashboards&lt;/a&gt;, shed light on almost any aspect of what's going on in the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  More tuning
&lt;/h2&gt;

&lt;p&gt;Some useful production-grade considerations for Knative can be found &lt;a href="https://developers.redhat.com/articles/2023/03/08/configuring-knative-broker-apache-kafka#"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Knative exposes consumer and producer configs for brokers and other workloads as &lt;code&gt;configmap&lt;/code&gt;s. I had more luck with setting&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;auto.offset.reset=latest
enable.auto.commit=true
auto.commit.interval.ms=1500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;for the Knative sink-receiver config (a commit interval of about 1.5 seconds is roughly half of the default consumer &lt;code&gt;heartbeat.interval.ms&lt;/code&gt; of 3000 ms).&lt;/p&gt;
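
&lt;p&gt;These properties go into the data-plane ConfigMap the component reads; a sketch, assuming the &lt;code&gt;config-kafka-sink-data-plane&lt;/code&gt; ConfigMap and property-file key names (verify the exact names in your version with &lt;code&gt;kubectl get configmap -n knative-eventing&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: config-kafka-sink-data-plane
  namespace: knative-eventing
data:
  # Kafka consumer properties for the sink data plane
  config-kafka-sink-consumer.properties: |
    auto.offset.reset=latest
    enable.auto.commit=true
    auto.commit.interval.ms=1500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;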

&lt;p&gt;More on Kafka consumer and producer tuning:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://strimzi.io/blog/2021/01/07/consumer-tuning/"&gt;https://strimzi.io/blog/2021/01/07/consumer-tuning/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://strimzi.io/blog/2020/10/15/producer-tuning/"&gt;https://strimzi.io/blog/2020/10/15/producer-tuning/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Make sure it works
&lt;/h2&gt;

&lt;p&gt;You can create a setup in which messages from one Kafka topic are transferred to another topic using Knative machinery:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input-topic -&amp;gt; knative source -&amp;gt; knative broker -&amp;gt; knative trigger (opt: filter by message headers) -&amp;gt; knative sink -&amp;gt; output-topic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example definitions are below. Apply the topics and broker first, and make sure they reach status Ready (&lt;code&gt;kubectl get kafkatopic -n knative-eventing&lt;/code&gt;, &lt;code&gt;kubectl get broker -n knative-eventing&lt;/code&gt;). Then apply the sink and source, and make sure they're ready too. Finally, apply the trigger.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka.strimzi.io/v1beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaTopic&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;input-topic&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-eventing&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-cluster&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;partitions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;retention.ms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;7200000&lt;/span&gt;
    &lt;span class="na"&gt;segment.bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1073741824&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka.strimzi.io/v1beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaTopic&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;output-topic&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-eventing&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-cluster&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;partitions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;retention.ms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;7200000&lt;/span&gt;
    &lt;span class="na"&gt;segment.bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1073741824&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eventing.knative.dev/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Broker&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-broker&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-eventing&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eventing.knative.dev/broker.class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kafka&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sources.knative.dev/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaSource&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;input-topic-source&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-eventing&lt;/span&gt;
&lt;span class="c1"&gt;# keda autoscaler annotations here if using keda&lt;/span&gt;
&lt;span class="c1"&gt;# see Autoscaling section of blog, above&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;consumerGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;input-topic-source-group&lt;/span&gt;
  &lt;span class="na"&gt;bootstrapServers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;my-cluster-kafka-bootstrap.knative-eventing:9092&lt;/span&gt;
  &lt;span class="na"&gt;topics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;input-topic&lt;/span&gt;
  &lt;span class="na"&gt;sink&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eventing.knative.dev/v1&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Broker&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-broker&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eventing.knative.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaSink&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;output-topic-sink&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-eventing&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;output-topic&lt;/span&gt;
  &lt;span class="na"&gt;bootstrapServers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;my-cluster-kafka-bootstrap.knative-eventing:9092&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eventing.knative.dev/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Trigger&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;output-trigger&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knative-eventing&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;broker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-broker&lt;/span&gt;
  &lt;span class="c1"&gt;# can define a filter for messages based on header, input Kafka headers get `kafkaheader` prefix. So if message was sent on `input-topic` with header `Ce-my-header: my-value`, it's filter here will be `kafkaheadercemyheader: my-value`&lt;/span&gt;
  &lt;span class="c1"&gt;# filter:&lt;/span&gt;
  &lt;span class="c1"&gt;#  attributes:&lt;/span&gt;
  &lt;span class="c1"&gt;#    kafkaheadercemyheader: my-value&lt;/span&gt;
  &lt;span class="na"&gt;subscriber&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eventing.knative.dev/v1alpha1&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaSink&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;output-topic-sink&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a primitive Python &lt;a href="https://github.com/CheViana/poc-files/tree/main/webapps/echo"&gt;web app&lt;/a&gt; that simply logs each message upon arrival. You can use the echo app as the destination sink instead of a second topic. The deployment for the echo web app should be in the &lt;code&gt;knative-eventing&lt;/code&gt; namespace and expose a &lt;code&gt;ClusterIP&lt;/code&gt;-type &lt;code&gt;Service&lt;/code&gt; that maps port 80 to 8083. If you're not familiar with how to create a deployment and a service for it, use the k8s docs or the Google Console "new deployment" button (you'll need to push the image to Docker Hub or another artifact registry first). &lt;/p&gt;
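
&lt;p&gt;A minimal Deployment and Service sketch for the echo app (the image name and labels are placeholders; point the image at wherever you pushed the echo image):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
  namespace: knative-eventing
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
      - name: echo
        image: docker.io/your-user/echo:latest   # placeholder: your pushed echo image
        ports:
        - containerPort: 8083
---
apiVersion: v1
kind: Service
metadata:
  name: echo
  namespace: knative-eventing
spec:
  type: ClusterIP
  selector:
    app: echo
  ports:
  - port: 80          # service port the trigger subscriber can target
    targetPort: 8083  # port the echo app listens on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;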

&lt;p&gt;Let's send some messages.&lt;/p&gt;

&lt;p&gt;Launch a listener for output-topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl -n knative-eventing run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.37.0-kafka-3.5.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic output-topic --from-beginning --property print.headers=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another tab, launch a producer for input-topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl -n knative-eventing run kafka-producer -ti --image=quay.io/strimzi/kafka:0.37.0-kafka-3.5.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic input-topic --property parse.headers=true  --property headers.delimiter=\t --property headers.separator=, --property headers.key.separator=:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And post the following payload to input-topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ce-my-header:my-value\t{"msg":"content"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same message should arrive on output-topic, with the original headers given the kafkaheader prefix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ce_specversion:1.0,ce_id:...,ce_source:...,content-type:application/json; charset=utf-8,kafkaheadercemyheader:my-value {"msg":"content"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>knative</category>
      <category>kafka</category>
      <category>kubernetes</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>Performance testing Strimzi Kafka in the k8s cluster using xk6-kafka</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Mon, 01 Jan 2024 20:25:53 +0000</pubDate>
      <link>https://dev.to/cheviana/performance-testing-kafka-server-using-xk6-kafka-lh4</link>
      <guid>https://dev.to/cheviana/performance-testing-kafka-server-using-xk6-kafka-lh4</guid>
      <description>&lt;p&gt;I'm going to describe how to performance-test reading from and writing to a Kafka topic with multiple partitions, using the &lt;a href="https://github.com/mostafa/xk6-kafka"&gt;xk6-kafka plugin&lt;/a&gt; for k6. All resources mentioned here are available in &lt;a href="https://github.com/CheViana/strimzi-kafka-xk6-test/tree/main"&gt;the repo&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Topic to test
&lt;/h2&gt;

&lt;p&gt;Here's the topic definition, using &lt;a href="https://strimzi.io/"&gt;Strimzi Kafka&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka.strimzi.io/v1beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaTopic&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-topic&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster-1&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;partitions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Test scenario
&lt;/h2&gt;

&lt;p&gt;This topic has three partitions, so it makes sense to test it with three virtual users, each reading from a separate partition.&lt;br&gt;
The test scenario executes like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Virtual user 1: create producer and consumer, produce 1000 messages to partitions 0 1 2, read 333 messages from partition 0, teardown producer and consumer
Virtual user 2: create producer and consumer, produce 1000 messages to partitions 0 1 2, read 333 messages from partition 1, teardown producer and consumer
Virtual user 3: create producer and consumer, produce 1000 messages to partitions 0 1 2, read 333 messages from partition 2, teardown producer and consumer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the code for the scenario script. It has debug prints that can be helpful for inspecting how messages are consumed by virtual users. For more logs, set the environment variable &lt;code&gt;LOG_LEVEL=debug&lt;/code&gt;, and pass the param &lt;code&gt;connectLogger: true&lt;/code&gt; to the &lt;code&gt;Writer&lt;/code&gt; and &lt;code&gt;Reader&lt;/code&gt; constructors.&lt;/p&gt;

&lt;p&gt;One important aspect: when using the Kafka bootstrap server, it is important to set &lt;code&gt;groupID&lt;/code&gt;, &lt;code&gt;groupTopics&lt;/code&gt; and &lt;code&gt;groupBalancers&lt;/code&gt;. ReaderConfig's &lt;code&gt;topic&lt;/code&gt; param doesn't quite work with a bootstrap server; it works when pointed directly at a Kafka broker's address with an explicit partition number set.&lt;/p&gt;

&lt;p&gt;Another important aspect is that &lt;code&gt;consumer&lt;/code&gt; is instantiated in the test code (the &lt;code&gt;default&lt;/code&gt; function), meaning each virtual user uses its own consumer object. All consumers should belong to the same consumer group though (the &lt;code&gt;groupID&lt;/code&gt; param). It is important to close the consumer at the end of the function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Writer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;Reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;SCHEMA_TYPE_STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;SchemaRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;GROUP_BALANCER_ROUND_ROBIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;SECONDS&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;k6/x/kafka&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;check&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;k6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bootstrapServers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost:9091&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;vus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;3h&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;kafka_writer_error_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;count == 0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;kafka_reader_error_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;count == 0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topicName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;topic1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;schemaRegistry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SchemaRegistry&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;messageAmount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Writer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;brokers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bootstrapServers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;topicName&lt;/span&gt;&lt;span class="p"&gt;,,&lt;/span&gt;
      &lt;span class="na"&gt;balancer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;balancer_roundrobin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;requiredAcks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;batchSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;connectLogger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VU 1, writing messages. Iter &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;__ITER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;firstMessageContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;lastMessageContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;messageAmount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;msgContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test-value-string-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-vu-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;__VU&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-iter-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;__ITER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;firstMessageContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;msgContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nx"&gt;messageAmount&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;lastMessageContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;msgContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;schemaRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
              &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msgContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;schemaType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SCHEMA_TYPE_STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
          &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;First published msg: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;firstMessageContent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Last published msg: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;lastMessageContent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Reader&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;brokers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bootstrapServers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;groupID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;topicName&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-group&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;groupTopics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;topicName&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;groupBalancers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;GROUP_BALANCER_ROUND_ROBIN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;connectLogger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;commitInterval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;SECONDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;heartbeatInterval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;3.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;SECONDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;messageAmount&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;batchSize&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Amount of msgs received: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;, VU &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;__VU&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;, iter &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;__ITER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

      &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Topic equals to&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;topic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nx"&gt;topicName&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No messages received&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the output this scenario produces. The offset of the first message each consumer reads isn't zero: the topic already contained prior messages, and the same consumer group had already read them, so the offsets start at 33.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: /var/test-scenario/test-scenario.js
     output: -

  scenarios: (100.00%) 1 scenario, 3 max VUs, 10m30s max duration (incl. graceful stop):
           * default: 3 iterations shared among 3 VUs (maxDuration: 10m0s, gracefulStop: 30s)

time="2024-01-01T18:50:56Z" level=info msg="VU 1, writing messages. Iter 0" source=console

...

time="2024-01-01T18:51:01Z" level=info msg="Amount of msgs received: 333, VU 3, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="First msg value test-value-string-99-vu-1-iter-0, offset33, partition 0, VU 3, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="Last msg value test-value-string-993-vu-1-iter-0, offset365, partition 0, VU 3, iter 0" source=console

...

time="2024-01-01T18:51:01Z" level=info msg="Amount of msgs received: 333, VU 1, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="First msg value test-value-string-2-vu-1-iter-0, offset33, partition 2, VU 1, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="Last msg value test-value-string-998-vu-1-iter-0, offset365, partition 2, VU 1, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="Amount of msgs received: 333, VU 2, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="First msg value test-value-string-1-vu-1-iter-0, offset33, partition 1, VU 2, iter 0" source=console
time="2024-01-01T18:51:01Z" level=info msg="Last msg value test-value-string-997-vu-1-iter-0, offset365, partition 1, VU 2, iter 0" source=console

...


     ✓ all messages returned
     ✓ Topic equals to

     █ teardown

     checks.............................: 100.00%     ✓ 6             ✗ 0            
     ...  
     iterations.........................: 3           
     kafka_reader_dial_count............: 3           
     ... 
   ✓ kafka_reader_error_count...........: 0           0/s
     kafka_reader_fetch_bytes...........: 66 kB              
     kafka_reader_fetches_count.........: 6           
     kafka_reader_lag...................: 0           min=0           max=0          
     kafka_reader_message_bytes.........: 33 kB       
     kafka_reader_message_count.........: 1001        
     kafka_reader_offset................: 366         min=366         max=368        
     ...  
     kafka_reader_rebalance_count.......: 3           
     kafka_reader_timeouts_count........: 0           
     ...                  
     kafka_writer_batch_bytes...........: 56 kB       
     kafka_writer_batch_max.............: 1           min=1           max=1          
     ... 
     kafka_writer_batch_size............: 1000        
     ... 
   ✓ kafka_writer_error_count...........: 0           0/s
     kafka_writer_message_bytes.........: 56 kB       
     kafka_writer_message_count.........: 1000        
     ...      
     kafka_writer_write_count...........: 1000        
     ...
     vus................................: 3           min=3           max=3          
     vus_max............................: 3           min=3           max=3          


running, 0/3 VUs, 3 complete and 0 interrupted iterations
default ✓ [ 100% ] 3 VUs  3/3 shared iters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test results to watch out for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kafka_reader_error_count - should be zero or low&lt;/li&gt;
&lt;li&gt;kafka_writer_error_count - should be zero or low&lt;/li&gt;
&lt;li&gt;kafka_writer_message_count and kafka_reader_message_count should match&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There can be intermittent issues, so the error counts may not be exactly zero. Still, they shouldn't exceed something like 5 out of 1000, and what's acceptable ultimately depends on the SLO you set for your system. &lt;code&gt;Reader&lt;/code&gt; and &lt;code&gt;Writer&lt;/code&gt; are instantiated with &lt;code&gt;maxAttempts: 3&lt;/code&gt;, so they retry failed writes/reads. &lt;br&gt;
If the reader receives no messages in one iteration, no checks fail; it will simply pick those messages up in the next test iteration. The main thing is that the totals match: &lt;code&gt;kafka_writer_message_count == kafka_reader_message_count&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pod used to run test scenario, and command to run test in the k8s cluster
&lt;/h2&gt;

&lt;p&gt;Here's a pod definition that can be used to run the script in the k8s cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-xk6-loadtest&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-xk6-loadtest-1&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;loadtest&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;run&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/var/test-scenario/test-scenario.js'&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mostafamoradian/xk6-kafka:latest&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;loadtest-xk6&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LOG_LEVEL&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debug&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/test-scenario&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-scenario&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-scenario&lt;/span&gt;
      &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kx6-test-scenario&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the commands to run the scenario in your k8s cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create --namespace kafka topic.yaml &amp;lt;-- Strimzi definition of my-topic, see above
kubectl create --namespace loadtest configmap kx6-test-scenario --from-file=test-scenario.js &amp;lt;-- JS file with test scenario, see above
kubectl apply -f test-pod.yml  &amp;lt;-- Pod definition, see above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
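&lt;p&gt;The first command references a &lt;code&gt;topic.yaml&lt;/code&gt; with the Strimzi topic definition from earlier. In case it's not handy, here's a minimal sketch of such a KafkaTopic (the cluster name and replica count are assumptions for your setup; the partition count matches the three partitions seen in the test output above):&lt;/p&gt;

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: topic1
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster   # must match your Kafka CR name (assumption)
spec:
  partitions: 3   # one partition per VU, as in the test output above
  replicas: 1     # bump for a real cluster
```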



&lt;p&gt;See test results using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs test-xk6-loadtest-1 -n loadtest -f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In case your Kafka cluster has TLS or other auth options enabled, the xk6-kafka repo has useful &lt;a href="https://github.com/mostafa/xk6-kafka/blob/main/scripts/test_sasl_auth.js"&gt;examples&lt;/a&gt; of how to set those up. The server certificate can be mounted into the pod using volumes and volumeMounts.&lt;/p&gt;
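&lt;p&gt;As a hedged sketch of what those auth options look like (the option shapes follow xk6-kafka's &lt;code&gt;test_sasl_auth.js&lt;/code&gt; example; the credentials and cert path below are placeholders, and exact field names may differ between extension versions):&lt;/p&gt;

```javascript
// Sketch of SASL + TLS connection options for xk6-kafka's Writer/Reader.
// All values below are placeholders -- adapt to your cluster.
const saslConfig = {
  username: "myuser",
  password: "mypassword",
  algorithm: "sasl_plain", // the extension also exports SCRAM variants
};

const tlsConfig = {
  enableTls: true,
  insecureSkipTlsVerify: false,
  // CA cert mounted into the pod, e.g. from a Secret via volumes/volumeMounts
  serverCaPem: "/var/kafka-certs/ca.pem",
};

// These would then be passed alongside brokers/topic, e.g.:
// new Writer({ brokers: bootstrapServers, topic: topicName,
//              sasl: saslConfig, tls: tlsConfig });
```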

&lt;p&gt;Happy New Year!&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>k6</category>
      <category>strimzi</category>
      <category>performance</category>
    </item>
    <item>
      <title>KubeCon CloudNativeCon Europe 2023 homepage made usable</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Fri, 21 Apr 2023 19:38:53 +0000</pubDate>
      <link>https://dev.to/cheviana/kubecon-cloudnativecon-europe-2023-homepage-made-usable-1h9d</link>
      <guid>https://dev.to/cheviana/kubecon-cloudnativecon-europe-2023-homepage-made-usable-1h9d</guid>
      <description>&lt;p&gt;Especially for virtual attendees on laptop or workstation.&lt;/p&gt;

&lt;p&gt;The page in question: &lt;a href="https://kubecon-cloudnativecon-europe.com/home-full/"&gt;https://kubecon-cloudnativecon-europe.com/home-full/&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Homepage before:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Vcd5huoT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/CheViana/kubecon-homepage-fixed/blob/main/imgs/homepage-before.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Vcd5huoT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/CheViana/kubecon-homepage-fixed/blob/main/imgs/homepage-before.png%3Fraw%3Dtrue" alt="Homepage before" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Homepage after:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CPaHqEnH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/homepage-after.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CPaHqEnH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/homepage-after.png" alt="Homepage afterwards, schedule has a full screen to itself" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I am unhappy about in the initial homepage
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;teeny-tiny table with the talks list&lt;/li&gt;
&lt;li&gt;lots of useless elements that occupy screen space&lt;/li&gt;
&lt;li&gt;scroll-in-a-scroll&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sched is not cutting it for me - there's no video of the talk in there, not even a link to the video stream.&lt;br&gt;
I just need a long list of talk videos... right on the homepage.&lt;/p&gt;

&lt;p&gt;I found this page &lt;a href="https://kubecon-cloudnativecon-europe.com/agenda/"&gt;https://kubecon-cloudnativecon-europe.com/agenda/&lt;/a&gt; too late. "Co-located Events + Sessions" just doesn't sound like what I was looking for. And it, too, has a useless right column...&lt;/p&gt;

&lt;h2&gt;
  
  
  How to add JS to make homepage pretty
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: autoloading the script that adjusts the KubeCon homepage, using Tampermonkey or a similar Chrome plugin.
&lt;/h3&gt;

&lt;p&gt;Chrome, version 112.0&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;a href="https://chrome.google.com/webstore/detail/tampermonkey/dhdgffkkebhmkfjojejmpbldmpobfkfo"&gt;Tampermonkey plugin&lt;/a&gt; into Chrome&lt;/li&gt;
&lt;li&gt;Pin the Tampermonkey plugin in the browser header&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NS5ZNNsK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/pin-plugin.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NS5ZNNsK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/pin-plugin.png" alt="Pin plugin" width="710" height="625"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click the Tampermonkey plugin icon in the browser header and select "Create a new script..."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lgacIlOV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/create-new-script.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lgacIlOV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/create-new-script.png" alt="Create new user script" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add the contents of the &lt;a href="//fix-homepage.js"&gt;fix-homepage.js&lt;/a&gt; file to the script body. You can examine the JS - there's nothing fancy in there, just find-element-set-style.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9YiY5Z0k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/add-user-script.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9YiY5Z0k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/add-user-script.png" alt="Add user script content" width="800" height="997"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select tab "Settings" right above script edit area. Section "Includes/Excludes", "User matches" box, click "Add..." button - put &lt;a href="https://kubecon-cloudnativecon-europe.com/home-full/"&gt;https://kubecon-cloudnativecon-europe.com/home-full/&lt;/a&gt; in the pop-up window&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CZqF6c5d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/add-user-script-settings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CZqF6c5d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/add-user-script-settings.png" alt="Update user script settings" width="800" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;!! Click "Save" !!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Cld2YEq5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/tamper-monkey-save.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Cld2YEq5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/tamper-monkey-save.png" alt="Save user script" width="561" height="1284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Reload the KubeCon homepage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The script should now run automatically every time the KubeCon homepage loads&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Option 2. Use content snippets. No plugins. No autoload.
&lt;/h3&gt;

&lt;p&gt;Chrome browser&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load webpage &lt;a href="https://kubecon-cloudnativecon-europe.com/home-full/"&gt;https://kubecon-cloudnativecon-europe.com/home-full/&lt;/a&gt; , right click on page content, click "Inspect"&lt;/li&gt;
&lt;li&gt;In developer tools panel, select "Sources" tab, in that select "Snippets" tab&lt;/li&gt;
&lt;li&gt;Click "+ New snippet"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FP_0YDAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/new-snippet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FP_0YDAA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/new-snippet.png" alt="New snippet" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Put the contents of the &lt;a href="//fix-homepage.js"&gt;fix-homepage.js&lt;/a&gt; file into the snippet body window. Save the snippet (Ctrl+S).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the snippet - click the run button&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vBEJyUoF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/play-snippet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vBEJyUoF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/play-snippet.png" alt="Play snippet" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You don't have to create the snippet each time, but you do have to run it each time the page loads&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Manual (artistic) process of fixing the homepage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Delete the buttons on the right to the schedule
&lt;/h3&gt;

&lt;p&gt;Who needs these buttons here?! They are also in the left-side menu, and that's where I would look for them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--o5mpK0R0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--o5mpK0R0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img1.png" alt="Delete right column of the homepage grid" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Make left column full-width
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---UX_rrRt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---UX_rrRt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img2.png" alt="Make left column full-width" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Remove the useless "Community in Bloom" header
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OCAqTRBc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OCAqTRBc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img3.png" alt='Remove the useless "Community in Bloom" header' width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Remove equally useless "Hello, Jane" header
&lt;/h3&gt;

&lt;p&gt;Seriously? You just put "Hello user" there?! Did you make a sysadmin write this website for you? :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--q9oUKqIy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--q9oUKqIy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img4.png" alt='Remove the "Hello user" header' width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Find container with schedule and remove fixed height to get rid of, oh gosh, scroll-in-a-scroll
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Zy0k0uiZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Zy0k0uiZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://raw.githubusercontent.com/CheViana/kubecon-homepage-fixed/main/imgs/img5.png" alt="Make height of schedule box to be same as content height" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  P.S. What about Mobile?
&lt;/h2&gt;

&lt;p&gt;Are they hiding the schedule at all for mobile devices? LOL&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Histogram of request time in Grafana with Telegraf</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Fri, 18 Dec 2020 14:44:44 +0000</pubDate>
      <link>https://dev.to/cheviana/histogram-of-request-time-in-grafana-with-telegraf-2ja9</link>
      <guid>https://dev.to/cheviana/histogram-of-request-time-in-grafana-with-telegraf-2ja9</guid>
      <description>&lt;p&gt;This is a writing about a cool tool useful for analyzing backend call time. Code that does backend calls and monitoring setup described in &lt;a href="https://dev.to/cheviana/monitoring-sync-and-async-network-calls-in-python-using-tig-stack-3al5"&gt;previous post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A Grafana panel can not only plot line graphs, but also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;show the last reading of a metric&lt;/li&gt;
&lt;li&gt;show a table of metric values&lt;/li&gt;
&lt;li&gt;show bar plots&lt;/li&gt;
&lt;li&gt;show heatmaps (histograms over time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A heatmap is helpful for quickly understanding the distribution of backend response time: it can be the case that most requests complete in under 50 msec, while some requests are slow and take &amp;gt;500 msec. The average request time doesn't show this. In the previous examples, we're plotting just the average.&lt;/p&gt;
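&lt;p&gt;A tiny standalone illustration (with made-up timings) of why the average hides the slow tail:&lt;/p&gt;

```python
from statistics import mean, quantiles

# Hypothetical response times in msec: most requests are fast, a few are slow
timings = [42, 45, 38, 50, 41, 44, 39, 47, 620, 710]

avg = mean(timings)
p90 = quantiles(timings, n=10)[-1]  # 90th percentile

print(f"average: {avg:.0f} msec")          # looks moderate
print(f"90th percentile: {p90:.0f} msec")  # the slow tail only shows up here
```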

&lt;p&gt;We can easily add a heatmap for request execution time:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6s9tqcmzy7kxao3z8bfq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6s9tqcmzy7kxao3z8bfq.png" alt="Create heatmap"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyyzv6x59997g1inmqcxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyyzv6x59997g1inmqcxv.png" alt="Set Y axis to msec"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add a new panel, pick the measurement details, and select "Heatmap" in the "Visualization" collapsible in the right column.&lt;br&gt;
Every 10 seconds, a new set of bricks appears on the panel. Brick color represents how many measurements fall into that bucket (e.g. 5 fall into the 10 msec - 20 msec range, hence that brick is pink). Set a fixed bucket size, fix the number of buckets, or let the default values do their magic.&lt;/p&gt;

&lt;p&gt;If Telegraf sends all metrics data to InfluxDB, that's a real heatmap. However, Telegraf is often configured to send only aggregated values (min, avg, max), calculated over a short period of time (e.g. 10 sec), to the database in order to reduce metrics-reporting traffic. A heatmap based on such aggregated values is not a real heatmap.&lt;/p&gt;

&lt;p&gt;It is possible to configure &lt;a href="https://github.com/influxdata/telegraf/tree/master/plugins/aggregators/histogram" rel="noopener noreferrer"&gt;histogram aggregate&lt;/a&gt; in Telegraf config (&lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/telegraf-histogram.conf" rel="noopener noreferrer"&gt;full Telegraf config with histogram aggregator&lt;/a&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[[aggregators.histogram]]
  period = "30s"
  drop_original = false
  reset = true
  cumulative = false

  [[aggregators.histogram.config]]
    buckets = [1.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 30.0, 40.0]
    measurement_name = "aiohttp-request-exec-time"
    fields = ["value"]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I set &lt;code&gt;reset=true&lt;/code&gt; and &lt;code&gt;cumulative=false&lt;/code&gt;, which causes bucket values to be calculated anew for each 30-second period. The value ranges (&lt;code&gt;buckets&lt;/code&gt;) need to be set manually, and the correct &lt;code&gt;measurement_name&lt;/code&gt; specified. If &lt;code&gt;fields&lt;/code&gt; is not specified, histogram buckets are computed for all fields of the measurement. Here's how bucket values appear in InfluxDB:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsbx7cbkqo0u6bttssu4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsbx7cbkqo0u6bttssu4s.png" alt="InfluxDB raw data for buckets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The number of request execution times that fall into a bucket is saved under the "value_bucket" field name; "gt" ("greater than") and "le" ("less than or equal to") are bucket edge values that appear as tags.&lt;/p&gt;
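&lt;p&gt;To make the bucket semantics concrete, here is a small sketch (not Telegraf's actual code) that counts values into non-cumulative buckets the same way, using the bucket edges from the config above:&lt;/p&gt;

```python
from bisect import bisect_left
from collections import Counter

def bucket_counts(values, bucket_edges):
    # Each value lands in the bucket with the smallest upper edge ("le")
    # that is not below it. Assumes every value fits under the largest
    # edge (Telegraf keeps a +Inf bucket for the rest).
    hits = Counter(bucket_edges[bisect_left(bucket_edges, v)] for v in values)
    return {le: hits.get(le, 0) for le in bucket_edges}

edges = [1.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 30.0, 40.0]
print(bucket_counts([5, 11, 11.5, 13, 25, 39], edges))
```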

&lt;p&gt;Let's plot these values using "Bar gauge" panel visualization type:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6gtddx5jtavg15q41892.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6gtddx5jtavg15q41892.png" alt="Configure histogram"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fay4geojg8c91pwq222xv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fay4geojg8c91pwq222xv.png" alt="Configure histogram: calculate last"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's create two separate panels, one for python.org stats and one for mozilla.org (add 'where domain = python.org' in the query editor).&lt;/p&gt;

&lt;p&gt;Now we can compare, at a glance, the request execution time distribution over the last 30 seconds for python.org and mozilla.org:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhmfz4wydbr1vai0p6vmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhmfz4wydbr1vai0p6vmm.png" alt="Compare python.org and mozilla.org histogram"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>telegraf</category>
      <category>grafana</category>
      <category>histogram</category>
      <category>heatmap</category>
    </item>
    <item>
      <title>Monitoring sync and async network calls in Python using TIG stack</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Fri, 18 Dec 2020 14:34:16 +0000</pubDate>
      <link>https://dev.to/cheviana/monitoring-sync-and-async-network-calls-in-python-using-tig-stack-3al5</link>
      <guid>https://dev.to/cheviana/monitoring-sync-and-async-network-calls-in-python-using-tig-stack-3al5</guid>
      <description>&lt;p&gt;Republished by author. First appeared in &lt;a href="https://calendar.perfplanet.com/2020/monitoring-network-calls-in-python-using-tig-stack/" rel="noopener noreferrer"&gt;Web Performance Calendar 2020&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Web applications and API endpoints are known to perform backend calls. Often that is all an application does: fetch data from a couple of backends, combine it, and produce a response.&lt;/p&gt;

&lt;p&gt;Monitoring how much time fetching data takes is essential. There are plenty of production-ready, buy-and-snap-on solutions that provide such monitoring, but they might not be a good fit for every case. And I think it's fun to dig deeper into things to get more understanding of how it all works.&lt;/p&gt;

&lt;p&gt;Let's look at code examples that use popular Python networking libraries and are instrumented to report HTTP request execution time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I'm going to explore in this post
&lt;/h3&gt;

&lt;p&gt;I'm going to compare how request timings look when fetching HTML pages using the &lt;code&gt;requests&lt;/code&gt; library versus asynchronously fetching the same HTML pages using the &lt;code&gt;aiohttp&lt;/code&gt; library. I aim to visualize the difference in timings and to introduce tools that can be used for such monitoring.&lt;/p&gt;

&lt;p&gt;To be fair, the &lt;code&gt;requests&lt;/code&gt; library has &lt;a href="https://github.com/spyoungtech/grequests" rel="noopener noreferrer"&gt;plugins&lt;/a&gt; that enable asynchronous IO, and there are many other ways to achieve this in Python... I picked &lt;code&gt;aiohttp&lt;/code&gt; as it provides neat request-timing tracing opportunities, and I use this library a lot in the wild.&lt;/p&gt;

&lt;p&gt;To monitor request timings we will use the &lt;a href="https://www.influxdata.com/time-series-platform/telegraf/" rel="noopener noreferrer"&gt;Telegraf&lt;/a&gt;, &lt;a href="https://www.influxdata.com/products/influxdb/" rel="noopener noreferrer"&gt;InfluxDB&lt;/a&gt; and &lt;a href="https://grafana.com/grafana" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt; stack. These tools are very easy to set up locally, open source, free for personal use, and can be used in a production environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/readme.md" rel="noopener noreferrer"&gt;Running code examples section&lt;/a&gt; describes in detail how to run example code and setup monitoring infrastructure (Telegraf, InfluxDB, Grafana).&lt;/p&gt;

&lt;p&gt;All code from this post is available in the &lt;a href="https://github.com/CheViana/network-calls-stats/" rel="noopener noreferrer"&gt;repo&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 0: monitor &lt;code&gt;requests&lt;/code&gt; request time
&lt;/h2&gt;

&lt;p&gt;Let's dive into the first Python code example. Here's what it does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in a forever loop, executes two HTTP requests using the &lt;code&gt;requests&lt;/code&gt; Python library&lt;/li&gt;
&lt;li&gt;reports request time and request exceptions to Telegraf&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's request execution time plotted on the dashboard:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp8aguotdgigmwx38mvhc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp8aguotdgigmwx38mvhc.png" alt="Request execution time plot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Full code of Example 0 can be found in &lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/example-0-requests-send-stats.py" rel="noopener noreferrer"&gt;example-0-requests-send-stats.py&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The high-level execution flow can be followed from the &lt;code&gt;main&lt;/code&gt; part of the program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == '__main__':
    while True:
        result = call_python_and_mozilla_using_requests()
        print(result)
        time.sleep(3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside &lt;code&gt;call_python_and_mozilla_using_requests&lt;/code&gt;, two simple HTTP requests are performed one after another, and their response text is used to compose the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def call_python_and_mozilla_using_requests():
    py_response = get_response_text('https://www.python.org/')
    moz_response = get_response_text('https://www.mozilla.org/en-US/')
    return (
        f'Py response piece: {py_response[:60].strip()}... ,\n'
        f'Moz response piece: {moz_response[:60].strip()}...'
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_response_text&lt;/code&gt; function executes an HTTP request for a given URL, with primitive exception handling, and hooks in to report the request execution time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def profile_request(start_time, response, *args, **kwargs):
    elapsed_time = round((
        time.perf_counter() - start_time
    ) * 1000)
    send_stats(
        'requests_request_exec_time',
        elapsed_time,
        {'domain': URL(response.url).raw_host}
    )


def get_response_text(url):
    try:
        request_complete_callback = partial(
            profile_request,
            time.perf_counter()
        )
        response = requests.get(
            url,
            hooks={'response': request_complete_callback}
        )
        response.raise_for_status()
        return response.content.decode()
    except RequestException as e:
        send_stats(
            'requests_request_exception',
            1,
            {'domain': URL(url).raw_host, 'exception_class': e.__class__.__name__}
        )
        return f'Exception occured: {e}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code uses the &lt;code&gt;requests&lt;/code&gt; library (&lt;a href="https://requests.readthedocs.io/en/master/" rel="noopener noreferrer"&gt;docs&lt;/a&gt;). Basic usage to get text content from a URL is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = requests.get(url).content.decode()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;requests.get&lt;/code&gt; accepts an optional &lt;code&gt;hooks&lt;/code&gt; argument, which specifies a function to be called after the request completes - &lt;code&gt;request_complete_callback&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This callback function may look funny if you're not familiar with functional programming. &lt;code&gt;partial(profile_request, time.perf_counter())&lt;/code&gt; is itself a function. It's the same function as &lt;code&gt;profile_request&lt;/code&gt;, but with the first argument already filled in - the value of &lt;code&gt;time.perf_counter()&lt;/code&gt; was passed as the &lt;code&gt;start_time&lt;/code&gt; argument. This trick supplies the correct &lt;code&gt;start_time&lt;/code&gt; for each request, since &lt;code&gt;request_complete_callback&lt;/code&gt; is constructed anew for each request, while the code that sends the request execution time stays isolated in the &lt;code&gt;profile_request&lt;/code&gt; function. We can rewrite that as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_response_text(url):
    try:
        start_time = time.perf_counter()

        def profile_request(response, *args, **kwargs):
            elapsed_time = round((time.perf_counter() - start_time) * 1000)
            send_stats('requests_request_exec_time', elapsed_time, ...)

        response = requests.get(url, hooks={'response': profile_request})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it's going to work alright. But now there's a function defined inside a function, and &lt;code&gt;get_response_text&lt;/code&gt; is bloated with profiling concerns, which is not something I like.&lt;/p&gt;

&lt;p&gt;You can read more about &lt;a href="https://en.wikipedia.org/wiki/Partial_application" rel="noopener noreferrer"&gt;partial functions&lt;/a&gt; and &lt;a href="https://docs.python.org/3/library/functools.html" rel="noopener noreferrer"&gt;Python functools&lt;/a&gt;.&lt;/p&gt;
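&lt;p&gt;A minimal standalone example of how &lt;code&gt;partial&lt;/code&gt; pre-fills arguments (the names here are made up for illustration):&lt;/p&gt;

```python
from functools import partial

def report(prefix, value):
    return f"{prefix}: {value}"

# partial returns a new callable with the first argument already filled in
report_time = partial(report, "exec time")

print(report_time(150))  # exec time: 150
```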

&lt;p&gt;&lt;code&gt;time.perf_counter()&lt;/code&gt; is used to measure execution time in Python (&lt;a href="https://docs.python.org/3/library/time.html#time.perf_counter" rel="noopener noreferrer"&gt;docs&lt;/a&gt;). &lt;code&gt;time.perf_counter()&lt;/code&gt; returns a float number of seconds, which is converted to milliseconds using &lt;code&gt;* 1000&lt;/code&gt;.&lt;/p&gt;
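&lt;p&gt;The measurement pattern in isolation:&lt;/p&gt;

```python
import time

start_time = time.perf_counter()
time.sleep(0.05)  # stand-in for a network call
elapsed_ms = round((time.perf_counter() - start_time) * 1000)
print(f"elapsed: {elapsed_ms} msec")  # roughly 50
```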

&lt;h3&gt;
  
  
  Sending stats
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;send_stats&lt;/code&gt; function is used to report measurements to Telegraf: the metric name is &lt;code&gt;'requests_request_exec_time'&lt;/code&gt;, the metric value is the time the request execution took, and the tags include additional useful information (the domain of the URL).&lt;br&gt;
&lt;code&gt;get_response_text&lt;/code&gt; also invokes &lt;code&gt;send_stats&lt;/code&gt; when an exception occurs, passing a different metric name this time - &lt;code&gt;'requests_request_exception'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I have &lt;a href="https://dev.to/cheviana/reporting-measurements-from-python-code-in-real-time-4g5"&gt;another post&lt;/a&gt; that describes ways to send stats from Python program to Telegraf.&lt;/p&gt;

&lt;p&gt;In short, &lt;code&gt;send_stats&lt;/code&gt; accepts a metric name, a metric value, and a tags dictionary. Those are converted to one string and sent to the socket on which Telegraf listens for measurement data. Telegraf forwards the received metrics to a database (InfluxDB). The Grafana dashboard queries the database to put a dot on the graph for each reported metric value.&lt;/p&gt;
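&lt;p&gt;A minimal sketch of what such a function could look like. The real implementation lives in the repo; the line-protocol format and the UDP host/port below are assumptions and should match your Telegraf &lt;code&gt;socket_listener&lt;/code&gt; config:&lt;/p&gt;

```python
import socket

def format_line(metric_name, value, tags=None):
    # InfluxDB line protocol: measurement,tag1=v1,tag2=v2 field=value
    tag_str = ''.join(f',{key}={val}' for key, val in (tags or {}).items())
    return f'{metric_name}{tag_str} value={value}'

def send_stats(metric_name, value, tags=None):
    # Fire-and-forget UDP datagram to a local Telegraf socket_listener;
    # 127.0.0.1:8094 is an assumed address, not a universal default
    line = format_line(metric_name, value, tags)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode(), ('127.0.0.1', 8094))
```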
&lt;h3&gt;
  
  
  &lt;code&gt;profile&lt;/code&gt; decorator
&lt;/h3&gt;

&lt;p&gt;A decorator suitable for any function (async or sync, class method or plain function) is adapted here to measure the execution time of the decorated function.&lt;br&gt;
The &lt;code&gt;profile&lt;/code&gt; decorator is used to profile the total execution time of the functions &lt;code&gt;call_python_and_mozilla_using_requests&lt;/code&gt; and &lt;code&gt;call_python_and_mozilla_using_aiohttp&lt;/code&gt; (see the following examples).&lt;br&gt;
Don't confuse it with another useful tool - &lt;a href="https://github.com/rkern/line_profiler" rel="noopener noreferrer"&gt;line_profiler&lt;/a&gt; - which also provides a &lt;code&gt;profile&lt;/code&gt; decorator.&lt;/p&gt;
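&lt;p&gt;A sketch of such a decorator, assuming a &lt;code&gt;send_stats&lt;/code&gt; reporting helper (stubbed here with &lt;code&gt;print&lt;/code&gt; so the example is self-contained):&lt;/p&gt;

```python
import asyncio
import functools
import time

def send_stats(name, value):
    # Stub for illustration; the real function reports to Telegraf
    print(f'{name}={value}')

def profile(func):
    # Wraps both sync and async functions to report their execution time
    if asyncio.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = await func(*args, **kwargs)
            send_stats(f'{func.__name__}_exec_time',
                       round((time.perf_counter() - start) * 1000))
            return result
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        send_stats(f'{func.__name__}_exec_time',
                   round((time.perf_counter() - start) * 1000))
        return result
    return sync_wrapper
```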
&lt;h3&gt;
  
  
  &lt;code&gt;requests&lt;/code&gt; execution time on dashboard
&lt;/h3&gt;

&lt;p&gt;Let's run this example and set up all the monitoring tools. See &lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/readme.md" rel="noopener noreferrer"&gt;Running code examples&lt;/a&gt; on how to run the example code and set up the monitoring infrastructure.&lt;/p&gt;

&lt;p&gt;We can configure a panel that shows request execution time:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F94cpy3ul3qtqaijua7eg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F94cpy3ul3qtqaijua7eg.png" alt="Request execution time configure panel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The blue dots of total execution time roughly correspond to the sum of the time the request to &lt;code&gt;python.org&lt;/code&gt; and the request to &lt;code&gt;mozilla.org&lt;/code&gt; took (green and yellow dots). They measure at approximately 150 msec on average.&lt;/p&gt;
&lt;h3&gt;
  
  
  Need more exceptions
&lt;/h3&gt;

&lt;p&gt;If we change '&lt;a href="http://www.python.org" rel="noopener noreferrer"&gt;www.python.org&lt;/a&gt;' to '&lt;a href="http://www.python1.org" rel="noopener noreferrer"&gt;www.python1.org&lt;/a&gt;' in the function &lt;code&gt;call_python_and_mozilla_using_requests&lt;/code&gt;, exceptions appear in the terminal output, and exception metrics are sent to Telegraf:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    Reported stats: aiohttp_request_exception=1, tags={'domain': 'www.python1.org', 'exception_class': 'ClientConnectorError'}
    'Py response piece: ...Exception occured: Cannot conn... 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure a separate Grafana panel to see exceptions on the dashboard:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6rxk1xggisn981v6o2bd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6rxk1xggisn981v6o2bd.png" alt="Configure exceptions panel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The exception class is sent as a tag along with the metric value. This gives us the ability to plot separate lines for exceptions of different classes. To achieve this, pick 'group by - tag(exception_class)' when editing the request exceptions panel.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example 0 improved: reuse connection
&lt;/h2&gt;

&lt;p&gt;The code of Example 0 can be improved to reuse the same connection for all calls performed in that forever-running &lt;code&gt;while&lt;/code&gt; loop - here's an &lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/example-0-plus-requests-reuse-conn.py" rel="noopener noreferrer"&gt;improved version&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The only significant code change is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_python_and_mozilla_using_requests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session creation is moved out of the &lt;code&gt;while&lt;/code&gt; loop, so the underlying connection is established once and reused for all calls.&lt;/p&gt;

&lt;p&gt;Let's compare how much time request execution takes when a connection is reused:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpjgmpnimnkecma14gw8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpjgmpnimnkecma14gw8b.png" alt="Compare timing when connection is reused and not, for requests lib"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dots on the left are measurements for the original version of Example 0, and the ones on the right come from the improved version. We can definitely notice how total execution time gets lower, below 100 msec on average.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example 1: monitor &lt;code&gt;aiohttp&lt;/code&gt; request time
&lt;/h2&gt;

&lt;p&gt;Let's dive into the next code example. Here's what it does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in a forever loop, executes two asynchronous HTTP requests using &lt;code&gt;aiohttp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;hooks into &lt;code&gt;aiohttp&lt;/code&gt; request lifecycle signals&lt;/li&gt;
&lt;li&gt;reports request time and request exceptions to Telegraf&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full code of Example 1 can be found in &lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/example-1-aiohttp-send-stats-basic.py" rel="noopener noreferrer"&gt;example-1-aiohttp-send-stats-basic.py&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The high-level execution flow is similar to Example 0; the way content is fetched from the URLs differs.&lt;/p&gt;
&lt;h3&gt;
  
  
  The tale of two HTTP requests
&lt;/h3&gt;

&lt;p&gt;Let's start with the function &lt;code&gt;call_python_and_mozilla_using_aiohttp&lt;/code&gt;, which executes two asynchronous HTTP requests and returns pieces of the response content. It is the sister of &lt;code&gt;call_python_and_mozilla_using_requests&lt;/code&gt; from Example 0:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_response_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_configs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Profiler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ClientError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Exception occured: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="nd"&gt;@profile&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_python_and_mozilla_using_aiohttp&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;py_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;moz_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;get_response_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.python.org/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nf"&gt;get_response_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.mozilla.org/en-US/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Py response piece: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;py_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;... ,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Moz response piece: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;moz_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;aiohttp&lt;/code&gt; library's &lt;code&gt;ClientSession&lt;/code&gt; is used to execute the request (&lt;a href="https://docs.aiohttp.org/en/stable/client.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;). Basic usage to get text content from URL is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which is basically what happens in &lt;code&gt;get_response_text&lt;/code&gt;. &lt;code&gt;get_response_text&lt;/code&gt; also calls &lt;code&gt;response.raise_for_status()&lt;/code&gt;, which raises an exception when the response status code indicates an error. That exception is caught in &lt;code&gt;get_response_text&lt;/code&gt;, so &lt;code&gt;get_response_text&lt;/code&gt; always returns a &lt;code&gt;str&lt;/code&gt;: either the response content or an exception message.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;call_python_and_mozilla_using_aiohttp&lt;/code&gt; takes care of calling the two URLs concurrently using &lt;code&gt;asyncio.gather&lt;/code&gt;. The execution order for &lt;code&gt;call_python_and_mozilla_using_aiohttp&lt;/code&gt; is on the right:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2w6f1nkl8pgrvuibm89d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2w6f1nkl8pgrvuibm89d.png" alt="Async and sync flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;await asyncio.gather&lt;/code&gt; returns the result after both requests are complete. The total execution time is approximately the time of the longer of the two requests. This is what's called non-blocking IO: instead of blocking, this kind of IO operation frees the execution thread until the result is ready.&lt;/p&gt;

&lt;p&gt;Synchronous, blocking IO, as in Example 0, follows a different execution order (see the chart above, on the left). The total execution time is approximately the sum of both requests' execution times. For positive numbers, &lt;code&gt;A + B &amp;gt; MAX(A, B)&lt;/code&gt; always holds. Hence, asynchronous execution takes less time than synchronous execution, provided enough CPU is available in both cases.&lt;/p&gt;
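
&lt;p&gt;The &lt;code&gt;A + B&lt;/code&gt; versus &lt;code&gt;MAX(A, B)&lt;/code&gt; difference is easy to reproduce with simulated requests. Here is a minimal stdlib-only sketch where &lt;code&gt;asyncio.sleep&lt;/code&gt; stands in for the HTTP calls (the 0.1 and 0.2 second delays are made up for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def fake_request(delay):
    # stand-in for an HTTP request that takes `delay` seconds
    await asyncio.sleep(delay)

async def compare():
    loop = asyncio.get_running_loop()

    start = loop.time()
    await fake_request(0.1)   # sequential: takes about A + B
    await fake_request(0.2)
    sequential = loop.time() - start

    start = loop.time()
    await asyncio.gather(     # concurrent: takes about MAX(A, B)
        fake_request(0.1),
        fake_request(0.2),
    )
    gathered = loop.time() - start
    return sequential, gathered

sequential, gathered = asyncio.run(compare())
print(f'sequential: {sequential:.2f}s, gathered: {gathered:.2f}s')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running this prints roughly 0.30 s for the sequential variant and 0.20 s for the gathered one.&lt;/p&gt;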

&lt;p&gt;On the panel that shows requests execution time and their total execution time, it's possible to notice that total execution time &lt;code&gt;call_python_and_mozilla_using_aiohttp_exec_time&lt;/code&gt; almost matches the longer-executing request time:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbyr1ja8zn0w8omfea0db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbyr1ja8zn0w8omfea0db.png" alt="Async requests execution time and total time of both requests"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The total execution time for both requests is 75-100 msec.&lt;/p&gt;

&lt;p&gt;Next, we're going to look at how execution time of each &lt;code&gt;aiohttp&lt;/code&gt; request is reported.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;code&gt;aiohttp&lt;/code&gt; requests signals
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;aiohttp&lt;/code&gt; provides a way to execute a custom function as an HTTP request progresses through its lifecycle stages: before the request is sent, when the connection is established, after a response chunk is received, etc. For that, a tracer object is passed to &lt;code&gt;aiohttp.ClientSession&lt;/code&gt; via &lt;code&gt;trace_configs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Profiler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TraceConfig&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_exception&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_configs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Profiler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Profiler&lt;/code&gt; is a subclass of &lt;code&gt;aiohttp.TraceConfig&lt;/code&gt;. It "hooks up" functions that are executed when a request starts (&lt;code&gt;on_request_start&lt;/code&gt;), when it ends (&lt;code&gt;on_request_end&lt;/code&gt;), and when a request exception is encountered (&lt;code&gt;on_request_exception&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_request_start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_request_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;elapsed_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request_start&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;aiohttp_request_exec_time&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;elapsed_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;domain&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_request_exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;send_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;aiohttp_request_exception&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;domain&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exception_class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__class__&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how the timestamp is computed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is recommended to use the event loop’s internal monotonic clock to compute time deltas in asynchronous code.&lt;/p&gt;
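
&lt;p&gt;The same clock can time any awaited operation. A stdlib-only sketch (the 50 msec sleep is arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def timed_sleep():
    loop = asyncio.get_running_loop()
    start = loop.time()   # monotonic, immune to wall-clock adjustments
    await asyncio.sleep(0.05)
    # elapsed msec, rounded the same way the request-end hook does
    return round((loop.time() - start) * 1000)

print(asyncio.run(timed_sleep()))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Unlike &lt;code&gt;time.time()&lt;/code&gt;, the loop's clock never jumps backwards, so deltas computed from it are always non-negative.&lt;/p&gt;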

&lt;p&gt;Function-hooks have arguments &lt;code&gt;session, trace_config_ctx, params&lt;/code&gt;. Let's look at what they are.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;session&lt;/code&gt; is an instance of &lt;code&gt;aiohttp.ClientSession&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;trace_config_ctx&lt;/code&gt; is a context object that is passed through the callbacks. Custom values can be added to it when the request is made:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace_request_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;flag&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_request_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;trace_config_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trace_request_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;flag&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;red&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way a function-hook can be programmed to behave differently for different request calls, or to report additional data.&lt;/p&gt;

&lt;p&gt;The request-end hook uses the &lt;code&gt;trace_config_ctx.request_start&lt;/code&gt; value, set in the request-start hook, to compute the total time the request took.&lt;/p&gt;
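
&lt;p&gt;This hand-off between the two hooks can be sketched without &lt;code&gt;aiohttp&lt;/code&gt;: below, a plain &lt;code&gt;SimpleNamespace&lt;/code&gt; plays the role of &lt;code&gt;trace_config_ctx&lt;/code&gt;, and &lt;code&gt;asyncio.sleep&lt;/code&gt; plays the request in flight (everything here is an illustrative stand-in, not &lt;code&gt;aiohttp&lt;/code&gt; internals):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
from types import SimpleNamespace

async def on_request_start(ctx):
    ctx.request_start = asyncio.get_running_loop().time()

async def on_request_end(ctx):
    elapsed = asyncio.get_running_loop().time() - ctx.request_start
    return round(elapsed * 1000)

async def simulate_request():
    ctx = SimpleNamespace()        # one context object per request
    await on_request_start(ctx)
    await asyncio.sleep(0.05)      # the "request" in flight
    return await on_request_end(ctx)

print(asyncio.run(simulate_request()), 'msec')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;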

&lt;p&gt;The &lt;code&gt;params&lt;/code&gt; argument of &lt;code&gt;on_request_end&lt;/code&gt; is an &lt;code&gt;aiohttp.TraceRequestEndParams&lt;/code&gt; and as such has a &lt;code&gt;url&lt;/code&gt; property of type &lt;code&gt;yarl.URL&lt;/code&gt;. &lt;code&gt;params.url.raw_host&lt;/code&gt; returns the domain of the URL that was requested. The domain is sent as a tag for the metric, which makes it possible to plot separate lines for different URLs.&lt;/p&gt;
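
&lt;p&gt;For comparison, the stdlib's &lt;code&gt;urllib.parse&lt;/code&gt; extracts the same host component; &lt;code&gt;yarl&lt;/code&gt;'s &lt;code&gt;raw_host&lt;/code&gt; differs mainly in skipping IDNA decoding, which makes no difference for ASCII domains like these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from urllib.parse import urlsplit

# the same domain tag the hooks send, extracted with the stdlib instead of yarl
domain = urlsplit('https://www.python.org/').hostname
print(domain)  # www.python.org
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;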

&lt;h3&gt;
  
  
  Calling asynchronous code from synchronous
&lt;/h3&gt;

&lt;p&gt;To call an async function in a sync execution context, special tooling is used, which is adapted from &lt;a href="https://www.roguelynn.com/words/asyncio-graceful-shutdowns/" rel="noopener noreferrer"&gt;another publication&lt;/a&gt;. I'm not going to dive into Python's asynchronous ways in this post. Read more about Python's &lt;a href="https://python.readthedocs.io/en/latest/library/asyncio.html" rel="noopener noreferrer"&gt;asyncio&lt;/a&gt;, it's pretty cool.&lt;/p&gt;
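
&lt;p&gt;For simple cases, &lt;code&gt;asyncio.run&lt;/code&gt; is all the bridging that's needed; the tooling from the linked publication adds graceful-shutdown handling on top of this. A minimal sketch (&lt;code&gt;main_async&lt;/code&gt; here is a toy stand-in for the real async entry point):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def main_async():
    # placeholder for the real async entry point
    await asyncio.sleep(0)
    return 'done'

def main():
    # creates an event loop, runs the coroutine to completion, closes the loop
    return asyncio.run(main_async())

print(main())  # done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;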

&lt;h3&gt;
  
  
  Compare results for Example 0 and 1
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn12rqh6x23turzzkdhk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn12rqh6x23turzzkdhk3.png" alt="Compare example 0 and 1"&gt;&lt;/a&gt;&lt;br&gt;
The connection is not reused in either case here. Execution time for the async version is lower, as expected.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example 2: more, more stats
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;aiohttp&lt;/code&gt; provides hooks to measure more than just request execution time and request exceptions.&lt;/p&gt;

&lt;p&gt;It's possible to report stats for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DNS resolution time&lt;/li&gt;
&lt;li&gt;DNS cache hit/miss&lt;/li&gt;
&lt;li&gt;waiting for available connection time&lt;/li&gt;
&lt;li&gt;connection establishing time&lt;/li&gt;
&lt;li&gt;connection being reused&lt;/li&gt;
&lt;li&gt;redirect happening&lt;/li&gt;
&lt;li&gt;response content chunk received&lt;/li&gt;
&lt;li&gt;request chunk sent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Impressive, isn't it? Documentation on tracing in &lt;code&gt;aiohttp&lt;/code&gt; is &lt;a href="https://docs.aiohttp.org/en/stable/tracing_reference.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's add more request lifecycle hooks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Profiler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TraceConfig&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_redirect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_redirect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_request_exception&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_request_exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_connection_queued_start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_connection_queued_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_connection_queued_end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_connection_queued_end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_connection_create_start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_connection_create_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_connection_create_end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_connection_create_end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_dns_resolvehost_start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_dns_resolvehost_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_dns_resolvehost_end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_dns_resolvehost_end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_response_chunk_received&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_response_chunk_received&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_connection_reuseconn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_connection_reuseconn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_dns_cache_hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_dns_cache_hit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_dns_cache_miss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_dns_cache_miss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I won't bore you with the code for each function like &lt;code&gt;on_dns_resolvehost_end&lt;/code&gt;; it's quite similar to &lt;code&gt;on_request_end&lt;/code&gt;. Full code of Example 2 is &lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/example-2-aiohttp-send-more-stats.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Reported stats on dashboard for example 2:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnq96oyf3peum69nt5hph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnq96oyf3peum69nt5hph.png" alt="aiohttp reporting more stats"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that DNS resolution takes a couple of milliseconds, connection establishment takes 30-40 msec, and both happen for every call. Also, the DNS cache is never hit: DNS is resolved for every call.&lt;/p&gt;

&lt;p&gt;We can definitely improve on that - in Example 3.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example 3: &lt;code&gt;aiohttp&lt;/code&gt; reuse session
&lt;/h2&gt;

&lt;p&gt;Let's modify the Example 2 code so that &lt;code&gt;ClientSession&lt;/code&gt; is created once, outside the &lt;code&gt;while&lt;/code&gt; loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main_async&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_configs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Profiler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_python_and_mozilla_using_aiohttp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And check out how stats look now:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq624987todzk7mp1zz82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq624987todzk7mp1zz82.png" alt="aiohttp reuse session timings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's only one dot for connection establishment, and one DNS-resolution dot per domain. There are plenty of dots for the connection-reuse event.&lt;br&gt;
Total execution time is below 50 msec. Cool.&lt;/p&gt;

&lt;p&gt;Full source code of Example 3 is &lt;a href="https://github.com/CheViana/network-calls-stats/blob/master/example-3-aiohttp-reuse-session.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compare sync and async URL fetch, with and without reusing connection
&lt;/h2&gt;

&lt;p&gt;Total time for both requests (very approximate):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Connection not reused&lt;/th&gt;
&lt;th&gt;Connection reused&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sync&lt;/td&gt;
&lt;td&gt;150 msec&lt;/td&gt;
&lt;td&gt;80 msec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async&lt;/td&gt;
&lt;td&gt;80 msec&lt;/td&gt;
&lt;td&gt;40 msec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>python</category>
      <category>monitoring</category>
      <category>aiohttp</category>
      <category>requests</category>
    </item>
    <item>
      <title>Reporting Measurements from Python Code in Real Time: a Beginner-Friendly Tutorial</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Tue, 01 Dec 2020 14:55:26 +0000</pubDate>
      <link>https://dev.to/cheviana/reporting-measurements-from-python-code-in-real-time-4g5</link>
      <guid>https://dev.to/cheviana/reporting-measurements-from-python-code-in-real-time-4g5</guid>
      <description>&lt;h1&gt;
  
  
  Reporting measurements from Python code in real time
&lt;/h1&gt;

&lt;p&gt;A simple example of how to send measurements from Python code to a real-time monitoring stack (Telegraf/InfluxDB/Grafana).&lt;/p&gt;

&lt;p&gt;Code-reported measurements can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the price of an order a user just submitted&lt;/li&gt;
&lt;li&gt;the number of free beds in a hospital&lt;/li&gt;
&lt;li&gt;how long a backend call took&lt;/li&gt;
&lt;li&gt;the percent of a file that is already processed, and the percent that's left&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;li&gt;any number the program is aware of and which might be useful to track &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I don't think I need to make a lot of arguments in favor of real-time monitoring: it's a blessing in times of turmoil (outages). The collected data (from good times and from outages) can be analyzed later for various purposes: to notice weird patterns in performance over time, significant features of traffic that can be leveraged, what happens right before an outage, and so on. &lt;/p&gt;

&lt;p&gt;We will start with simple examples of Python programs that report measurement data. But first we need to configure the components that will listen for, record, and display these measurements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tutorial materials
&lt;/h2&gt;

&lt;p&gt;All files mentioned are available in the repo &lt;a href="https://github.com/CheViana/python-send-stats" rel="noopener noreferrer"&gt;CheViana/python-send-stats&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking for a quick, ready, robust solution?
&lt;/h2&gt;

&lt;p&gt;Set up Grafana, InfluxDB, and Telegraf, and use the Example 1 code snippet and Telegraf config.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup Grafana, InfluxDB, Telegraf
&lt;/h2&gt;

&lt;p&gt;In short, install Grafana, InfluxDB, Telegraf:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit &lt;a href="https://portal.influxdata.com/downloads/" rel="noopener noreferrer"&gt;https://portal.influxdata.com/downloads/&lt;/a&gt; for information on how to install InfluxDB and Telegraf&lt;/li&gt;
&lt;li&gt;Visit &lt;a href="https://grafana.com/grafana/download" rel="noopener noreferrer"&gt;https://grafana.com/grafana/download&lt;/a&gt; for information on how to install Grafana&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Launch Grafana and InfluxDB with default configs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt; cd grafana-7.1.0
&amp;gt; bin/grafana-server


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In other terminal tab:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt; influxd -config /usr/local/etc/influxdb.conf


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Example 1. The simplest example of how to send stats from Python code in 6 lines, and of suitable Telegraf config
&lt;/h2&gt;

&lt;p&gt;First, we're going to make Telegraf listen on an Internet datagram (UDP) socket for JSON-formatted measurements that the Python code will send. Telegraf will write received measurements to the database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CheViana/python-send-stats/blob/master/telegraf-1-stats-simple-datagram-json.conf:" rel="noopener noreferrer"&gt;https://github.com/CheViana/python-send-stats/blob/master/telegraf-1-stats-simple-datagram-json.conf:&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

...

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "socket-stats"

[[inputs.socket_listener]]
  service_address = "udp://:8094"
  data_format = "json"
  json_name_key = "metric_name"


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Launch Telegraf with this config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt; telegraf -config telegraf-1-stats-simple-datagram-json.conf


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;More info on telegraf plugin that enables listening for data on socket: &lt;a href="https://github.com/influxdata/telegraf/blob/release-1.14/plugins/inputs/socket_listener/README.md" rel="noopener noreferrer"&gt;socket_listener docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CheViana/python-send-stats/blob/master/1-stats-simple-datagram-json.py" rel="noopener noreferrer"&gt;1-stats-simple-datagram-json.py&lt;/a&gt; is a simple Python program that sends measurements to a UDP socket. Measurements are sent in &lt;a href="https://github.com/influxdata/telegraf/tree/master/plugins/parsers/json" rel="noopener noreferrer"&gt;Telegraf JSON format&lt;/a&gt; every 2 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CheViana/python-send-stats/blob/master/1-stats-simple-datagram-json.py" rel="noopener noreferrer"&gt;1-stats-simple-datagram-json.py&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import time
import socket
import json
import random


while True:
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(
            json.dumps({'metric_name': 'good_metric_name', 'value1': 10, 'value2': random.randint(1, 10)}).encode(),
            ('localhost', 8094)
        )
        print('Sending sample data...')
        sock.close()
    except socket.error as e:
        print(f'Got error: {e}')

    time.sleep(2)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Start the program that sends stats to socket:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&amp;gt; python3 1-stats-simple-datagram-json.py


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is a complete working example: a tiny piece of code that does what you want it to do - report measurements. Records like these end up in the database:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

good_metric_name,value1=10,value2=7
good_metric_name,value1=10,value2=2
good_metric_name,value1=10,value2=5
...


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this example, the measurement name is not tied to the Telegraf config - Telegraf uses the measurement name found under the key 'metric_name' in the JSON that is sent to it. More about this below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metric name gotchas
&lt;/h2&gt;

&lt;p&gt;A metric name (also a tag name, tag value, or any reported string value) should not contain ':', '|', ',', or '='. It's better to use '-', '_', or '.' as a delimiter in a metric name. Special characters in reported string values can cause errors during measurement parsing in Telegraf or in InfluxDB, and these errors are easy to miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Grafana Dashboard
&lt;/h2&gt;

&lt;p&gt;Add a data source for the InfluxDB database "socket-stats".&lt;br&gt;
Create a new dashboard and add a panel that will display the measurements sent to Telegraf.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F567reez3f2bh9o118s2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F567reez3f2bh9o118s2m.png" alt="Example 1 Grafana dashboard config"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Provided all 4 processes are running (Grafana, InfluxDB, Telegraf, and the Python program that sends stats), you should see measurements appear on the dashboard in real time. Exciting, isn't it?&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 2. JSON measurements over TCP socket (UNIX domain)
&lt;/h2&gt;

&lt;p&gt;For UDP sockets there's no need to keep a connection open, because of how the protocol works. However, it might not be possible to use UDP sockets in some network setups, or the rate of dropped packets is too high and most measurement readings are lost.&lt;br&gt;
The alternative is to use TCP (stream) sockets. For stream sockets it's an overhead to open and close the connection each time a measurement is sent, which could be around 10 times per second - opening and closing connections is a CPU-expensive operation.&lt;br&gt;
A stream socket can be UNIX domain or Internet domain. UNIX domain sockets are better suited for processes that run on the same network host, but can't be used when the communicating processes run on different network hosts. Better suited because the low-level code that handles UNIX domain socket communication skips some checks that are needed for an Internet socket.&lt;br&gt;
For our Python snippets, the code difference between UNIX domain and Internet domain is just the socket address and socket type value. See Example 3 for an Internet domain example.&lt;/p&gt;

&lt;p&gt;Resources on socket types are mentioned in the "More about sockets" section below; a good starting point is &lt;a href="https://pymotw.com/2/socket/index.html" rel="noopener noreferrer"&gt;this one&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A program that uses a UNIX domain stream socket in such a way that the connection is established when the program starts and closed when the program exits is available in &lt;a href="https://github.com/CheViana/python-send-stats/blob/master/2-stats-json.py" rel="noopener noreferrer"&gt;2-stats-json.py&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import time
import socket
import json
import random
import atexit


def format_measurement_data_json(data):
    data['format'] = 'json'
    return json.dumps(data) + '\n'


class StatsReporter:
    def __init__(
        self,
        socket_type,
        socket_address,
        encoding='utf-8',
        formatter=None
    ):
        self._socket_type = socket_type
        self._socket_address = socket_address
        self._encoding = encoding
        self._formatter = formatter if formatter else lambda d: str(d)
        self.create_socket()

    def create_socket(self):
        try:
            sock = socket.socket(*self._socket_type)
            sock.connect(self._socket_address)
            self._sock = sock
            print('Created socket')
        except socket.error as e:
            print(f'Got error while creating socket: {e}')

    def close_socket(self):
        try:
            self._sock.close()
            print('Closed socket')
        except (AttributeError, socket.error) as e:
            print(f'Got error while closing socket: {e}')

    def send_data(self, data):
        try:
            sent = self._sock.send(
                self._formatter(data).encode(self._encoding)
            )
            print(f'Sending sample data... {sent}')
        except (AttributeError, socket.error) as e:
            print(f'Got error while sending data on socket: {e}')

            # attempt to recreate socket on error
            self.close_socket()
            self.create_socket()


reporter = StatsReporter(
    (socket.AF_UNIX, ),
    '/tmp/telegraf.sock',
    formatter=format_measurement_data_json
)
atexit.register(reporter.close_socket)


while True:
    reporter.send_data({'value1': 10, 'value2': random.randint(1, 10)})
    time.sleep(1)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This program opens the connection once and sends a measurement over it every second. If the send fails, the connection is reestablished. When the program exits, the socket is closed using &lt;a href="https://docs.python.org/3/library/atexit.html" rel="noopener noreferrer"&gt;atexit&lt;/a&gt;. An even better approach would be to reestablish the connection periodically, say every minute.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;StatsReporter&lt;/code&gt; class encapsulates socket operations: &lt;br&gt;
creating the socket, sending data, and closing it; it also keeps a reference to the open socket in a field that all those methods can use.&lt;/p&gt;

&lt;p&gt;Formatting the measurement data from a Python dict into the string sent over the wire is performed in the &lt;code&gt;format_measurement_data_json&lt;/code&gt; function. This function is passed as an argument to the &lt;code&gt;StatsReporter&lt;/code&gt; class, so it will be easy to change the data format in future examples. &lt;br&gt;
A tag corresponding to the data format is added in order to distinguish these measurements from the ones reported in a different example, and also just as an example of a tag.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;\n&lt;/code&gt; at the end of the string is crucial: this is how Telegraf recognizes the end of a measurement. Without &lt;code&gt;\n&lt;/code&gt; at the end of a measurement string, one can encounter errors like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

  2020-11-10T14:42:17Z E! [inputs.socket_listener] Unable to parse incoming line: invalid character '{' after top-level value
```

Stop the Example 1 Python program and Telegraf, then run the Example 2 Python program [2-stats-json.py](https://github.com/CheViana/python-send-stats/blob/master/2-stats-json.py) and launch Telegraf for it with the config [telegraf-2-stats-json.conf](https://github.com/CheViana/python-send-stats/blob/master/telegraf-2-stats-json.conf):
```
&amp;gt; python3 2-stats-json.py

# In another terminal tab
&amp;gt; telegraf -config telegraf-2-stats-json.conf
```

You should see measurements on the dashboard in real time:

![Example 2 Grafana dashboard config and results](https://dev-to-uploads.s3.amazonaws.com/i/ubu8v7s51j5jvrlgxda1.png)


[telegraf-2-stats-json.conf](https://github.com/CheViana/python-send-stats/blob/master/telegraf-2-stats-json.conf#L658) specifies the field `name_override = "good_metric_name"`, which is used as the measurement name in database records:

```
[[inputs.socket_listener]]
  service_address = "unix:///tmp/telegraf.sock"
  data_format = "json"
  name_override = "good_metric_name"
  tag_keys = ["format"]
```

The default measurement name would be the non-descriptive input plugin name (e.g. `socket_listener`). It is also possible to specify the key `json_name_key` in the Telegraf config to store a measurement in the database under a custom name:

```
[[inputs.socket_listener]]
  service_address = "unix:///tmp/telegraf.sock"
  data_format = "json"
  json_name_key = "metric_name"
```

Then when Telegraf receives the following measurement data:

```
{"metric_name": "speed", "value": 10}
```

A measurement named `speed` with `value=10` will be saved to the DB.
This way is more flexible and avoids the need to update the config when the measurement name varies.

See more in the [JSON Telegraf format docs](https://github.com/influxdata/telegraf/tree/master/plugins/parsers/json).
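As a quick sketch of that flexibility (assuming the UDP `socket_listener` from Example 1 combined with `json_name_key = "metric_name"`; the `encode_measurement` helper name is made up for this illustration), the sender can vary the measurement name per payload without touching the Telegraf config:

```python
import json
import socket

def encode_measurement(name, **fields):
    # With json_name_key = "metric_name", Telegraf uses this value
    # as the measurement name stored in InfluxDB.
    return json.dumps({'metric_name': name, **fields}).encode()

# Two differently named measurements sent over the same socket:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(encode_measurement('speed', value=10), ('localhost', 8094))
sock.sendto(encode_measurement('temperature', value=21), ('localhost', 8094))
sock.close()
```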

The Example 2 Telegraf config also specifies `tag_keys = ["format"]` - meaning that in the measurement data dictionary `{'value': 1, 'format': 'json'}`, `format` will be used as a tag for the measurement (consult the [InfluxDB docs](https://docs.influxdata.com/influxdb/v2.0/reference/key-concepts/) if that doesn't mean much to you).


## Example 3. Wavefront (VMWare) Telegraf data format over TCP socket (INTERNET domain)

Python code that sends measurements in Wavefront format is in [3-stats-wavefront.py](https://github.com/CheViana/python-send-stats/blob/master/3-stats-wavefront.py), and the Telegraf config is [telegraf-3-stats-wavefront.conf](https://github.com/CheViana/python-send-stats/blob/master/telegraf-3-stats-wavefront.conf). Stop the other examples and run this one:

```
&amp;gt; python3 3-stats-wavefront.py

# In another terminal tab
&amp;gt; telegraf -config telegraf-3-stats-wavefront.conf
```

The [3-stats-wavefront.py](https://github.com/CheViana/python-send-stats/blob/master/3-stats-wavefront.py) code differs from Example 2 in a couple of lines - the formatting function and the socket type/address:

```
...
import math

...

def format_measurement_data_wavefront(data):
    lines = []
    for key, value in data.items():
        line = (
            f'prefix_metric_name.{key} {value} '
            f'{math.floor(time.time())} '
            f'source=localhost format="wavefront"\n'
        )
        lines.append(line)
    return ''.join(lines)

...

reporter = StatsReporter(
    (socket.AF_INET, socket.SOCK_STREAM),
    ('127.0.0.1', 8094),
    formatter=format_measurement_data_wavefront
)

...

```

The Wavefront format uses a timestamp in seconds, so the timestamp is set in Python code using `time.time()` with the decimal fraction dropped. Omitting the timestamp didn't work out for me.
The `\n` at the end of the string that is sent is quite crucial (same as for Example 2, or any code snippet using a TCP socket). The Wavefront format also requires a `source` tag. The `format="wavefront"` part of the string is an example of how measurement tags should be added.
More about the Wavefront data format is in the [Wavefront docs](https://docs.wavefront.com/wavefront_data_format.html).
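To make the line format concrete, here's a small self-contained sketch of the same formatting logic with the timestamp injectable for testing (the `prefix_metric_name` prefix and `source=localhost` tag follow the snippet above; `format_wavefront` is a hypothetical name):

```python
import math
import time

def format_wavefront(data, ts=None):
    # One line per field: metric value epoch-seconds source=host tags
    ts = math.floor(time.time()) if ts is None else ts
    return ''.join(
        f'prefix_metric_name.{key} {value} {ts} source=localhost format="wavefront"\n'
        for key, value in data.items()
    )

print(format_wavefront({'value1': 10}, ts=1600000000), end='')
# prefix_metric_name.value1 10 1600000000 source=localhost format="wavefront"
```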

The Wavefront code piece uses a TCP socket, Internet domain. This snippet is suitable when the program that sends metrics and the Telegraf process run on different hosts. Generally, this snippet should work in any network configuration, so it can be called more universal than the previous examples. The TCP connection is reused in a similar fashion as in Example 2 for the UNIX stream socket.

The Wavefront example also uses different measurement names. Wavefront can only do a single field value per measurement, whereas the JSON and Influx Line formats can do measurements with multiple fields - [more about multiple-field measurements](https://stackoverflow.com/questions/45368535/influxdb-single-or-multiple-measurement). So you will have to update the dashboard or make a new panel to see the results:

![Example 3 Grafana dashboard config and results](https://dev-to-uploads.s3.amazonaws.com/i/vxje435q0gnun5p1d1wx.png)


## Example 4. Influx Line format over UDP socket

Python code that sends measurements in Influx Line format is in [4-stats-influx-line.py](https://github.com/CheViana/python-send-stats/blob/master/4-stats-influx-line.py), and the Telegraf config is [telegraf-4-stats-influx-line.conf](https://github.com/CheViana/python-send-stats/blob/master/telegraf-4-stats-influx-line.conf). Stop the other examples and run this one:

```
&amp;gt; python3 4-stats-influx-line.py

# In another terminal tab
&amp;gt; telegraf -config telegraf-4-stats-influx-line.conf
```

The Grafana config is the same as for Example 2, so you should be able to see real-time results on the dashboard:

![Example 4 Grafana dashboard config and results](https://dev-to-uploads.s3.amazonaws.com/i/vdjjiow5r69i16nqcdbe.png)

The [4-stats-influx-line.py](https://github.com/CheViana/python-send-stats/blob/master/4-stats-influx-line.py) code differs from Examples 2 and 3 in a couple of lines - the formatting function and UDP-socket-related things:

```
...
def format_measurement_to_str_influxline(data):
    measurement_name = 'good_metric_name'

    fields = []
    for key, value in data.items():
        fields.append(f'{key}={value}')
    fields_str = ','.join(fields)

    tags = {'format': 'influxline'}
    tags_strs = []
    for tag_key, tag_value in tags.items():
        tags_strs.append(f'{tag_key}={tag_value}')
    tags_str = (',' + ','.join(tags_strs)) if tags else ''

    return f'{measurement_name}{tags_str} {fields_str}\n'

...

def create_socket(self):
    try:
        sock = socket.socket(*self._socket_type)
        # no sock.connect
        self._sock = sock

...

def send_data(self, data):
    try:
        sent = self._sock.sendto(  # sendto not send
            self._formatter(data).encode(self._encoding),
            self._socket_address  # socket address
        )

...

reporter = StatsReporter(
    (socket.AF_INET, socket.SOCK_DGRAM),
    ('localhost', 8094),
    formatter=format_measurement_to_str_influxline
)

...
```

The Influx Line data format is a string of the form `'{measurement_name}{tags_str} {fields_str}'`.
More about the Influx Line data format is in [its docs](https://docs.influxdata.com/influxdb/v1.8/write_protocols/line_protocol_tutorial/).
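A minimal self-contained sketch of a formatter producing that shape (same logic as the snippet above, condensed; dict insertion order keeps the output deterministic on Python 3.7+):

```python
def format_influx_line(measurement, fields, tags=None):
    # 'measurement[,tag=value...] field=value[,field=value...]' plus newline
    tags_str = ''.join(f',{k}={v}' for k, v in (tags or {}).items())
    fields_str = ','.join(f'{k}={v}' for k, v in fields.items())
    return f'{measurement}{tags_str} {fields_str}\n'

line = format_influx_line(
    'good_metric_name', {'value1': 10, 'value2': 7}, {'format': 'influxline'}
)
print(line, end='')
# good_metric_name,format=influxline value1=10,value2=7
```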

The Influx Line example code uses a UDP socket (Internet domain datagram socket).
Notice how the networking code for the UDP socket differs from Examples 2 and 3: there is no need to connect to the socket (no `socket.connect` call). The datagram is just sent to the specified network address. There's no need to keep an established connection, and no need to recreate a connection once in a while - which is rather convenient for sending stats: less socket management code. The downside is that UDP doesn't guarantee datagram delivery the way TCP does for packets of one data transmission sent over an established connection. UDP communication might not be a good option for every network setup - measure how many packets are lost before using it.

I am not covering the UNIX domain datagram socket config in this tutorial, but if the Telegraf config has:
```
  service_address = "unixgram:///tmp/telegraf.sock"
```
and the code of Example 4 has:
```
  reporter = StatsReporter(
      (socket.AF_UNIX, socket.SOCK_DGRAM),
      '/tmp/telegraf.sock',
      ...
  )
```
that should do it. I haven't tried it, though.



## More about sockets

If you're curious to learn more about sockets, suggested reading is https://pymotw.com/2/socket/index.html (and the "see also" list on that page). The code there is for Python 2, so method names might be outdated, but the concepts are valid (and older than Python itself).

I'm providing code snippets that send measurements to a UNIX stream socket (Example 2), an Internet stream socket (Example 3), and an Internet datagram socket (Examples 1 and 4). You can just use those if you're not interested in the technical details of network communications. If unsure which one is best for you, I suggest using the code and config from Example 1 or Example 4.

You can check out the sockets the Telegraf process uses with the command `lsof -p [pid of Telegraf process]`. To get the `pid` (process id) of the Telegraf process, use `ps aux | grep telegraf`. `lsof` will show things like the device name associated with Telegraf's socket, the socket type, and other curiosities.


## Troubleshooting

If data doesn't appear on the dashboards, you can launch Telegraf with the `--debug` option to make it print more information about errors in processing the received data.

When Telegraf successfully receives measurements and writes them to InfluxDB, it should produce console output similar to:

![telegraf output](https://dev-to-uploads.s3.amazonaws.com/i/gojd6qzhjon4dy5bce0h.png)


You can see it also says that the buffer is not full. That means all incoming metrics are making it to the database - no dropped readings on Telegraf's side. In a real setup, some metrics could be lost in the network before they get to Telegraf, but this is not likely when everything runs on the same machine.

In case of issues, it's also a good idea to check that:
- InfluxDB is launched
- the InfluxDB address in the Telegraf config matches the one in the InfluxDB config
- the Grafana dashboard configuration is correct - the InfluxDB address, database name, and measurement names
- the Python code sends data to the correct socket address - the one Telegraf listens on (specified in the Telegraf config)
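The last check can be partly automated with a tiny smoke test (a sketch; `can_connect` is a made-up helper, and the address should match whatever your Telegraf config specifies - this only works for TCP `socket_listener` setups, since UDP gives no connection feedback):

```python
import socket

def can_connect(addr=('localhost', 8094), timeout=2.0):
    # Quick smoke test: can we reach the TCP socket Telegraf listens on?
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False
```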

### InfluxDB data investigation

To debug what's being written to InfluxDB, you can use the [Influx CLI](https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/) or the [Flux query language](https://docs.influxdata.com/influxdb/v2.0/query-data/get-started/). I've used the Influx CLI and `SELECT` statements, as that's what I'm more familiar with.
Launch the Influx CLI with the command `influx`. To show the list of available databases, use the command `show databases`. Switch to the database Telegraf sends data to using the `use "socket-stats"` command. Show all measurement names using `show measurements`. To see what's going on in a particular measurement, you can use `select *::field from "value1"` - it will show all fields and all data for the measurement called "value1". `select *::field from "value1" limit 3` will show the 3 oldest data points, and `select last(*::field) from "value1"` will show the newest data point.

![Influx CLI example](https://dev-to-uploads.s3.amazonaws.com/i/zc29oeqbm04bf7ctnl8q.png)
![Influx CLI latest measurement](https://dev-to-uploads.s3.amazonaws.com/i/kefuyxl4t0hjtumx5mak.png)

These screenshots show my trouble: the `value2` timestamp value is not correct - it's millisecond-precision Unix time, whereas the data format requires nanosecond-precision Unix time (like the "test.value2" timestamp). So the `value2` timestamp is interpreted as a way older timestamp than it should be (it has a late-60s vibe), and won't show up on a "last 5 min" Grafana dashboard.

![Readings from the past](https://dev-to-uploads.s3.amazonaws.com/i/xhpiqxduv3u4uh2g4j0i.png)


### Measurement timestamp

It is possible to report the timestamp of a measurement from Python code, or leave it up to InfluxDB to record the timestamp of when the reading arrives. The delay between the two events is usually negligible: on the same machine it's really tiny; over a network it depends on the network, but it's on the order of a couple of milliseconds, maybe a hundred. My suggestion is to leave it up to InfluxDB, to avoid issues when the time reported from Python is incorrect due to bugs, or when different machines have different clock time. If the exact time of the reading with nanosecond precision is important to you, add a timestamp field in the Python code. 
Anyway, if the reporting program and InfluxDB run on different machines, make sure [Network Time Protocol (NTP)](http://www.ntp.org/) is used to keep clocks in sync.
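If you do decide to attach the timestamp yourself, here's a sketch in Influx Line format using `time.time_ns()` (Python 3.7+) so the precision is right; `influx_line_with_ts` is a hypothetical helper name:

```python
import time

def influx_line_with_ts(measurement, fields, ts_ns=None):
    # Influx line protocol expects a nanosecond-precision Unix timestamp
    # as the last space-separated token.
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    fields_str = ','.join(f'{k}={v}' for k, v in fields.items())
    return f'{measurement} {fields_str} {ts_ns}\n'

print(influx_line_with_ts('good_metric_name', {'value1': 10},
                          ts_ns=1600000000000000000), end='')
# good_metric_name value1=10 1600000000000000000
```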

### Dashboard issues

In case you're having difficulties configuring Grafana dashboards, a complete JSON that can be used to import the dashboard configuration is in the [grafana-dashboard-complete.json](https://github.com/CheViana/python-send-stats/blob/master/grafana-dashboard-complete.json) file. You can try importing it as a new dashboard or compare its panels' JSON with your panels.


## What I might write about in the next post:

- overloading TCP socket (Unix socket, UDP socket) with metrics, and checking out what happens; looking into `read_buffer_size` in Telegraf config and system socket listen queue size; techniques to measure dropped readings rate
- reporting stats of backend calls (`aiohttp` and `requests`)
- optimal uWSGI configurations, for best performance when all is good, and backend failure-resistant configurations
- uWSGI serving Django with aiohttp communications
- babel 7 configurations for less JS in bundle
- running Python tests in parallel, and test coverage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>python</category>
      <category>monitoring</category>
      <category>tutorial</category>
      <category>telegraf</category>
    </item>
    <item>
      <title>uWSGI stats monitoring from scratch using Telegraf InfluxDB and Grafana</title>
      <dc:creator>Jane Radetska</dc:creator>
      <pubDate>Tue, 25 Aug 2020 00:35:00 +0000</pubDate>
      <link>https://dev.to/cheviana/uwsgi-stats-monitoring-from-scratch-using-tig-stack-2ik9</link>
      <guid>https://dev.to/cheviana/uwsgi-stats-monitoring-from-scratch-using-tig-stack-2ik9</guid>
      <description>&lt;p&gt;At the end of this tutorial, you'll end up with a &lt;a href="https://snapshot.raintank.io/dashboard/snapshot/Y71MST4SUXUnJaqYyakeKRagjmcl1SyC" rel="noopener noreferrer"&gt;dashboard like this&lt;/a&gt;. Each panel shows some uWSGI metric as a time series.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F88esovg47ken2a2ohg5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F88esovg47ken2a2ohg5x.png" alt="uWSGI stats, changing in time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Watching over uWSGI-reported statistics like worker busyness is super helpful for investigating uWSGI configurations. It's also useful in a production environment - it's possible to extend the described approach to monitor real-life uWSGI web applications, but this post doesn't aim to cover that.&lt;/p&gt;

&lt;p&gt;In the future, I plan to add posts about how different uWSGI options influence the behavior of the web server, and how to monitor aspects of a Python web app, such as networking, using the same tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tutorial roadmap
&lt;/h2&gt;

&lt;p&gt;Here's what will be described:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;set up and run a simple uWSGI web server&lt;/li&gt;
&lt;li&gt;load that web server using wrk2&lt;/li&gt;
&lt;li&gt;install and run InfluxDB&lt;/li&gt;
&lt;li&gt;install, configure, and run Telegraf&lt;/li&gt;
&lt;li&gt;install and run Grafana&lt;/li&gt;
&lt;li&gt;create a dashboard in Grafana to show uWSGI metrics&lt;/li&gt;
&lt;li&gt;a dive into each monitored uWSGI metric&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code and configs mentioned in here can be found in the &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring" rel="noopener noreferrer"&gt;repo&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The diagram
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0ybme0sza9l4ztb689ei.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0ybme0sza9l4ztb689ei.jpg" alt="Diagram of uWSGI webserver and monitoring tools"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  uWSGI stats
&lt;/h2&gt;

&lt;p&gt;uWSGI can &lt;a href="https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html" rel="noopener noreferrer"&gt;expose stats&lt;/a&gt; on a separate socket. &lt;/p&gt;

&lt;p&gt;The simplest way to see these metrics is to use &lt;code&gt;uwsgitop&lt;/code&gt; - &lt;a href="https://github.com/xrmx/uwsgitop" rel="noopener noreferrer"&gt;https://github.com/xrmx/uwsgitop&lt;/a&gt;. However, &lt;code&gt;uwsgitop&lt;/code&gt; only shows current metrics readings, not history data, just like Linux &lt;code&gt;top&lt;/code&gt; command. &lt;/p&gt;

&lt;p&gt;Some time ago I created a &lt;a href="https://github.com/CheViana/uwsgitop" rel="noopener noreferrer"&gt;fork&lt;/a&gt; of &lt;code&gt;uwsgitop&lt;/code&gt; because of an encoding-related bug in its output. I had a fun time trying to figure out why &lt;code&gt;uwsgitop&lt;/code&gt; wouldn't work. That bug seems to be fixed now, though.&lt;/p&gt;

&lt;p&gt;I think it's nicer to be able to see &lt;em&gt;uWSGI stats metrics over a continuous period of time&lt;/em&gt;, as that provides more information than just the current readings. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;uwsgitop&lt;/code&gt; is like a speedometer readings - change every second. ⌚&lt;/p&gt;

&lt;p&gt;Monitoring dashboard is like a cardiogram - recorded readings over time. 📈&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing monitoring tools
&lt;/h2&gt;

&lt;p&gt;One option for continuous monitoring is to use Prometheus with an exporter for uWSGI stats; here, I'll describe another option - the &lt;a href="https://hackernoon.com/monitor-your-infrastructure-with-tig-stack-b63971a15ccf" rel="noopener noreferrer"&gt;TIG stack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The TIG stack differs from Prometheus mainly in that it relies on an agent to push metrics to the time series DB, instead of the pull model Prometheus uses. In practice, both stacks can do push and pull, and there's &lt;a href="https://giedrius.blog/2019/05/11/push-vs-pull-in-monitoring-systems/" rel="noopener noreferrer"&gt;discussion&lt;/a&gt; going on about &lt;a href="https://prometheus.io/docs/introduction/comparison/" rel="noopener noreferrer"&gt;which method works better&lt;/a&gt; for which kind of target you want to watch over. &lt;/p&gt;

&lt;p&gt;For the kind of local comparative experiments I'm planning to run on uWSGI web servers in the following posts, &lt;em&gt;I don't think there's much difference&lt;/em&gt; which monitoring solution is used.&lt;/p&gt;

&lt;p&gt;I picked TIG for this post because I have some experience with it, and the Prometheus approach is described &lt;a href="https://www.apsl.net/blog/2018/10/01/using-prometheus-monitoring-django-applications-kubernetes/" rel="noopener noreferrer"&gt;elsewhere&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One little pebble thrown in the direction of Prometheus: its &lt;a href="https://github.com/timonwong/uwsgi_exporter" rel="noopener noreferrer"&gt;uwsgi-exporter&lt;/a&gt; seems to &lt;a href="https://github.com/timonwong/uwsgi_exporter/issues/21" rel="noopener noreferrer"&gt;not support&lt;/a&gt; reporting of uWSGI worker status, which is a very useful metric. Support for that will probably be added in time.&lt;/p&gt;

&lt;p&gt;The list of metrics the Telegraf uWSGI plugin can monitor is &lt;a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/uwsgi" rel="noopener noreferrer"&gt;rather impressive&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple uWSGI web server
&lt;/h2&gt;

&lt;p&gt;Here's the code of a "Hello World" uWSGI app in &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring/blob/master/uwsgi-hello-world.py" rel="noopener noreferrer"&gt;uwsgi-hello-world.py&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time


def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    time.sleep(0.25)  # sleep 250 msec
    return [b'Hello World']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And basic uWSGI configs in &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring/blob/master/uwsgi-hello-world-configs.ini" rel="noopener noreferrer"&gt;uwsgi-hello-world-configs.ini&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[uwsgi]
http-socket = 127.0.0.1:9090
wsgi-file = uwsgi-hello-world.py
master = true
processes = 4
threads = 2 
stats = 127.0.0.1:9191
stats-http = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This code doesn't do anything useful: the worker sleeps for 250 msec and returns a string.&lt;br&gt;
You can use the &lt;code&gt;wsgi.py&lt;/code&gt; of a more interesting Django or Flask server, or any other Python web app (point &lt;code&gt;wsgi-file = path/to/Django-app/wsgi.py&lt;/code&gt; at it in the options).&lt;/p&gt;
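
&lt;p&gt;As a side note, the WSGI callable can be sanity-checked without uWSGI at all, by calling it directly with a stub &lt;code&gt;start_response&lt;/code&gt;. A minimal sketch (the callable is repeated inline, since a file named uwsgi-hello-world.py isn't importable under that name):&lt;/p&gt;

```python
import time


def application(env, start_response):
    # Same callable as in uwsgi-hello-world.py
    start_response('200 OK', [('Content-Type', 'text/html')])
    time.sleep(0.25)  # sleep 250 msec, as in the original
    return [b'Hello World']


# Stub start_response that just records what it receives
captured = {}

def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

body = application({}, fake_start_response)
print(captured['status'], body)
```

&lt;p&gt;uWSGI does the same thing per request, just with a real HTTP socket on the other end of &lt;code&gt;start_response&lt;/code&gt;.&lt;/p&gt;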

&lt;p&gt;uWSGI can be installed as a Python package. Let's assume you have &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;Python installed&lt;/a&gt; and know how to create and activate a Python virtual environment (if not, please consult the &lt;a href="https://virtualenv.pypa.io/en/latest/index.html" rel="noopener noreferrer"&gt;virtualenv docs&lt;/a&gt;; &lt;a href="https://virtualenvwrapper.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;virtualenvwrapper&lt;/a&gt; is handy too). &lt;/p&gt;

&lt;p&gt;To install the uWSGI Python package:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; mkvirtualenv --python=python3 uwsgi-playground
&amp;gt; pip install uWSGI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To run uWSGI server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; uwsgi --ini uwsgi-hello-world-configs.ini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That last command should produce output similar to:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
uwsgi socket 0 bound to TCP address 127.0.0.1:9090
*** Operational MODE: preforking+threaded ***
WSGI app 0 ... ready in 0 seconds ...
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: ...)
spawned uWSGI worker 1 (pid: ..., cores: 2)
spawned uWSGI worker 2 (pid: ..., cores: 2)
spawned uWSGI worker 3 (pid: ..., cores: 2)
spawned uWSGI worker 4 (pid: ..., cores: 2)
*** Stats server enabled on 127.0.0.1:9191 fd: ... ***
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This means that the web server has launched and it's listening on port 9090.&lt;/p&gt;

&lt;p&gt;You can visit the hello-world server in a local web browser at &lt;code&gt;http://127.0.0.1:9090/&lt;/code&gt;: &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fonn0zzkbiijrvbz1b4ji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fonn0zzkbiijrvbz1b4ji.png" alt="hello world web server response"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After we visit the hello-world page, the uWSGI process logs to the console that it processed the requests:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[pid: ...|app: 0|req: 1/1] 127.0.0.1 () {42 vars in 783 bytes} [Sun Jul 19 20:21:38 2020] GET / =&amp;gt; generated 11 bytes in 250 msecs (HTTP/1.1 200) 1 headers in 44 bytes (2 switches on core 0)
[pid: ...|app: 0|req: 1/2] 127.0.0.1 () {38 vars in 670 bytes} [Sun Jul 19 20:21:38 2020] GET /favicon.ico =&amp;gt; generated 11 bytes in 143 msecs (HTTP/1.1 200) 1 headers in 44 bytes (2 switches on core 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;uWSGI can also be configured to log those lines to a file (see &lt;a href="https://uwsgi-docs.readthedocs.io/en/latest/Logging.html" rel="noopener noreferrer"&gt;uWSGI logging docs&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;You can also visit the stats endpoint in a local web browser at &lt;code&gt;http://127.0.0.1:9191/&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
"workers": [
    {
        "id":2,
        "pid":...,
        "accepting":1,
        "requests":1,
        "delta_requests":1,
        "avg_rt":71928
        ....
]
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That JSON contains "speedometer" readings of what's going on inside the uWSGI web server.&lt;/p&gt;
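
&lt;p&gt;As a quick illustration of what can be derived from that JSON, here's a small Python sketch that summarizes worker states from a stats payload. The sample dict below is abbreviated and hypothetical; a live reading would be fetched from &lt;code&gt;http://127.0.0.1:9191/&lt;/code&gt; instead:&lt;/p&gt;

```python
from collections import Counter

# Abbreviated, hypothetical stats payload in the shape uWSGI returns;
# a live version would be json.load(urllib.request.urlopen("http://127.0.0.1:9191/"))
stats = {
    "listen_queue": 0,
    "workers": [
        {"id": 1, "status": "busy", "requests": 12, "avg_rt": 251304},
        {"id": 2, "status": "idle", "requests": 9, "avg_rt": 250873},
        {"id": 3, "status": "idle", "requests": 11, "avg_rt": 252110},
        {"id": 4, "status": "cheap", "requests": 0, "avg_rt": 0},
    ],
}

# Count workers per state and total requests served
status_counts = Counter(w["status"] for w in stats["workers"])
total_requests = sum(w["requests"] for w in stats["workers"])

# avg_rt is reported in microseconds; average over workers that served requests
active = [w["avg_rt"] for w in stats["workers"] if w["requests"]]
avg_rt_ms = sum(active) / len(active) / 1000

print(dict(status_counts), total_requests, round(avg_rt_ms, 1))
```

&lt;p&gt;This is essentially the kind of reduction that uwsgitop and the monitoring pipeline below perform for you, continuously.&lt;/p&gt;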

&lt;h2&gt;
  
  
  Loading web server with benchmark tool wrk2
&lt;/h2&gt;

&lt;p&gt;To keep the uWSGI web server busy (so that stats are not all zero), let's load the uWSGI hello-world server using the &lt;a href="https://github.com/giltene/wrk2" rel="noopener noreferrer"&gt;&lt;code&gt;wrk2&lt;/code&gt; benchmarking tool&lt;/a&gt;. You can use &lt;a href="https://k6.io/blog/comparing-best-open-source-load-testing-tools" rel="noopener noreferrer"&gt;any other artificial load generator&lt;/a&gt;; I picked &lt;code&gt;wrk2&lt;/code&gt; because I don't need a complicated test scenario for this post. Some load testing tools can easily be configured to write to InfluxDB, which makes observing the results of a load test real handy - right next to the web server stats and the stats reported from web app code.&lt;/p&gt;

&lt;p&gt;To install wrk2:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; git clone https://github.com/giltene/wrk2
&amp;gt; cd wrk2
&amp;gt; make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If &lt;code&gt;make&lt;/code&gt; doesn't work for you, you can use the &lt;code&gt;wrk&lt;/code&gt; tool, which provides &lt;a href="https://github.com/wg/wrk/wiki/Installing-wrk-on-Mac-OS-X" rel="noopener noreferrer"&gt;install wiki pages&lt;/a&gt; for all platforms.&lt;/p&gt;

&lt;p&gt;To run wrk2 or wrk:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; ./wrk -t2 -c2 -d1200s -R1 http://127.0.0.1:9090/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This command creates 2 threads that try to load the web server with 1 RPS, keeping 2 HTTP connections open.&lt;br&gt;
It will create load for 1200 sec, which is 20 min. Feel free to adjust.&lt;/p&gt;
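
&lt;p&gt;What sets &lt;code&gt;wrk2&lt;/code&gt; apart from plain &lt;code&gt;wrk&lt;/code&gt; is that it holds a constant request rate and measures latency against the ideal schedule, which avoids the coordinated omission problem. A rough Python sketch of that scheduling idea (not of wrk2 itself):&lt;/p&gt;

```python
# Constant-rate scheduling: request i is due at start + i/rate,
# regardless of how long earlier requests took.
def schedule(start, rate_rps, count):
    return [start + i / rate_rps for i in range(count)]

due = schedule(0.0, 1.0, 5)  # 1 RPS, 5 requests

# Latency is measured from the *scheduled* send time, not the actual
# send time, so time spent queued behind a slow response still counts.
# E.g. if request 2 completed at t=2.7, its latency includes the wait:
latency = 2.7 - due[2]
print(due, latency)
```

&lt;p&gt;A naive closed loop that only sends the next request after the previous one returns would silently under-report latency during stalls; the fixed schedule is what keeps the measurement honest.&lt;/p&gt;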

&lt;h2&gt;
  
  
  Tools for real-time monitoring: Grafana, Telegraf, InfluxDB (also known as the TIG stack)
&lt;/h2&gt;

&lt;p&gt;To install all tools locally on MacOS:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; brew install influxdb  &amp;lt;-- Database for metrics
&amp;gt; brew install telegraf  &amp;lt;-- agent-collector of metrics
&amp;gt; brew install grafana  &amp;lt;-- UI for metrics exploration and plotting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To download all tools binaries locally on Linux:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; wget https://dl.influxdata.com/influxdb/releases/influxdb-1.8.2_linux_amd64.tar.gz
&amp;gt; tar xvfz influxdb-1.8.2_linux_amd64.tar.gz
&amp;gt; wget https://dl.influxdata.com/telegraf/releases/telegraf-1.15.2_linux_amd64.tar.gz
&amp;gt; tar xf telegraf-1.15.2_linux_amd64.tar.gz
&amp;gt; wget https://dl.grafana.com/oss/release/grafana-7.1.4.linux-amd64.tar.gz
&amp;gt; tar -zxvf grafana-7.1.4.linux-amd64.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For other platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; go to &lt;a href="https://portal.influxdata.com/downloads/" rel="noopener noreferrer"&gt;https://portal.influxdata.com/downloads/&lt;/a&gt; for InfluxDB and Telegraf&lt;/li&gt;
&lt;li&gt;for Grafana visit &lt;a href="https://grafana.com/grafana/download" rel="noopener noreferrer"&gt;https://grafana.com/grafana/download&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker setup instructions are available in &lt;a href="https://hackernoon.com/monitor-your-infrastructure-with-tig-stack-b63971a15ccf" rel="noopener noreferrer"&gt;this post&lt;/a&gt;, or:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; docker pull influxdb
&amp;gt; docker pull telegraf
&amp;gt; docker run -d --name=grafana -p 3000:3000 grafana/grafana
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Run time series database InfluxDB
&lt;/h3&gt;

&lt;p&gt;Launch InfluxDB:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; influxd -config /usr/local/etc/influxdb.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This starts up the DB process.&lt;br&gt;
Create a database for uWSGI metrics (in another shell tab):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; influx -precision rfc3339  &amp;lt;-- opens CLI
Connected to http://localhost:8086 version v1.8.1
InfluxDB shell version: v1.8.1
&amp;gt; CREATE DATABASE localmetrics  &amp;lt;-- creates our DB
&amp;gt; SHOW DATABASES  &amp;lt;-- shows DB list
name: databases
name
----
_internal
localmetrics
&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It is now possible to run an "INSERT ..." command in the CLI, which will add a metric reading to the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run Telegraf - stats pull/push agent
&lt;/h3&gt;

&lt;p&gt;We want uWSGI stats sent to the database automatically every N seconds, in a format InfluxDB will understand.&lt;br&gt;
uWSGI passively exposes stats data on port 9191.&lt;br&gt;
We need something that will query the uWSGI stats endpoint and send the metrics data to the InfluxDB database.&lt;/p&gt;

&lt;p&gt;This is where Telegraf comes into play. Telegraf can retrieve metrics data, transform it via plugins into a format InfluxDB understands, and send it over to the InfluxDB database.&lt;br&gt;
Telegraf has a bunch of input plugins: to watch over CPU consumption levels, to read and parse a log file's tail, to listen for messages on a socket, and various web server integrations, including uWSGI.&lt;/p&gt;
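
&lt;p&gt;Conceptually, the uWSGI input plugin plus the InfluxDB output plugin do something like: poll the stats JSON, reshape each reading into InfluxDB line protocol, and POST it. A simplified Python sketch of that reshaping - the measurement, tag, and field names mirror the ones used in queries later in this post, but real line protocol escaping and string-field quoting are omitted:&lt;/p&gt;

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    # InfluxDB line protocol: measurement,tag=val field=val timestamp
    # (simplified: no escaping, integer fields get the "i" suffix)
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(
        f'{k}={v}i' if isinstance(v, int) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "uwsgi_workers",
    {"worker_id": 1, "uwsgi_host": "localhost"},
    {"requests": 42, "avg_rt": 251304},
    1595207920000000000,
)
print(line)
```

&lt;p&gt;Telegraf builds lines like this for every worker on every polling tick and batches them into &lt;code&gt;POST /write&lt;/code&gt; requests, which is what shows up in the InfluxDB console output below.&lt;/p&gt;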

&lt;p&gt;It's a matter of adding a few lines to the Telegraf config to enable uWSGI stats reporting. The resulting telegraf.conf is available in the &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring/blob/master/telegraf.conf" rel="noopener noreferrer"&gt;repo&lt;/a&gt;.&lt;br&gt;
To understand the Telegraf tool better, let's use the &lt;code&gt;telegraf -sample-config&lt;/code&gt; utility to compose a config with which telegraf can consume uWSGI metrics:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; telegraf -sample-config --input-filter uwsgi --output-filter influxdb &amp;gt; telegraf.conf
&amp;gt; cat telegraf.conf
...
# Read uWSGI metrics.
[[inputs.uwsgi]]
## List with urls of uWSGI Stats servers. URL must match pattern:
## scheme://address[:port]
##
## For example:
## servers = ["tcp://localhost:5050", "http://localhost:1717", "unix:///tmp/statsock"]
servers = ["tcp://127.0.0.1:1717"]

## General connection timeout
# timeout = "5s"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;tcp://127.0.0.1:1717&lt;/code&gt; part doesn't match where our uWSGI exposes stats - that's &lt;code&gt;http://127.0.0.1:9191&lt;/code&gt;. We need to update that in the &lt;code&gt;telegraf.conf&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Another thing that needs to be updated is the address the stats are sent to:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
## The full HTTP or UDP URL for your InfluxDB instance.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
# urls = ["unix:///var/run/influxdb.sock"]
# urls = ["udp://127.0.0.1:8089"]
urls = ["http://127.0.0.1:8086"]

## The target database for metrics; will be created as needed.
## For UDP url endpoint database needs to be configured on server side.
database = "uWSGI"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I uncommented the &lt;code&gt;urls = ["http://127.0.0.1:8086"]&lt;/code&gt; line and added &lt;code&gt;database = "uWSGI"&lt;/code&gt; so that metrics from the uWSGI stats server flow into a separate database.&lt;/p&gt;

&lt;p&gt;Resulting telegraf.conf is in the &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring/blob/master/telegraf.conf" rel="noopener noreferrer"&gt;repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To run telegraf:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; telegraf -config telegraf.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Looking at the InfluxDB console output, we can see:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2020-07-20T01:18:26.727302Z info    Executing query {"log_id": "0O6K1AQG000", "service": "query", "query": "CREATE DATABASE uWSGI"}
[httpd] 127.0.0.1 - - [19/Jul/2020:21:18:26 -0400] "POST /query HTTP/1.1" 200 57 "-" "Telegraf/1.14.5" e9bd1368-ca26-11ea-8005-88e9fe853b3a 428
[httpd] 127.0.0.1 - - [19/Jul/2020:21:18:40 -0400] "POST /write?db=uWSGI HTTP/1.1" 204 0 "-" "Telegraf/1.14.5" f1a77b04-ca26-11ea-8006-88e9fe853b3a 137748
[httpd] 127.0.0.1 - - [19/Jul/2020:21:18:50 -0400] "POST /write?db=uWSGI HTTP/1.1" 204 0 "-" "Telegraf/1.14.5" f79d5380-ca26-11ea-8007-88e9fe853b3a 7695
[httpd] 127.0.0.1 - - [19/Jul/2020:21:19:00 -0400] "POST /write?db=uWSGI HTTP/1.1" 204 0 "-" "Telegraf/1.14.5" fd928134-ca26-11ea-8008-88e9fe853b3a 8534
[httpd] 127.0.0.1 - - [19/Jul/2020:21:19:10 -0400] "POST /write?db=uWSGI HTTP/1.1" 204 0 "-" "Telegraf/1.14.5" 0388c512-ca27-11ea-8009-88e9fe853b3a 9345
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This means Telegraf is already busy writing uWSGI metrics to the database, but we can't see them yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run Grafana - UI for metrics monitoring
&lt;/h3&gt;

&lt;p&gt;Launch the Grafana web server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; cd path/to/dir/with/installed/graphana
&amp;gt; bin/grafana-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Navigating to &lt;code&gt;http://localhost:3000/&lt;/code&gt; in a browser opens the Grafana UI. Log in using "admin"/"admin" and create a new password when asked.&lt;/p&gt;

&lt;p&gt;I'm providing screenshots of how to deal with the Grafana UI, as it was confusing to me. Grafana v7.1.0 is featured in the screenshots; the UI might look different in other versions.&lt;/p&gt;

&lt;p&gt;What's left to set up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;create a data source in Grafana for InfluxDB&lt;/li&gt;
&lt;li&gt;add panels that will show metric readings over time - either the "import it all" way or manually&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Create InfluxDB data source in Grafana
&lt;/h3&gt;

&lt;p&gt;Pick "Configuration -&amp;gt; Data Sources -&amp;gt; Add data source". Select InfluxDB.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Faxtsfvi7tq1yacl2tnv6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Faxtsfvi7tq1yacl2tnv6.png" alt="Grafana configuration menu"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Put "&lt;a href="http://127.0.0.1:8086" rel="noopener noreferrer"&gt;http://127.0.0.1:8086&lt;/a&gt;" in HTTP/URL input and database name "uWSGI" in "InfluxDB Details/Database" input.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqmldpn522ei7f5b4dk0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqmldpn522ei7f5b4dk0i.png" alt="New datasource screen in Grafana"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on "Save and Test" button in the bottom of the screen, green noty reading "Data source is working" should appear.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpicc491nkgdfc7pdpkth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpicc491nkgdfc7pdpkth.png" alt="Success new datasource in Grafana"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Visualizing uWSGI metrics in Grafana
&lt;/h3&gt;

&lt;p&gt;Now that Grafana can read metrics from the database, let's visualize them - add a graph panel for each metric.&lt;/p&gt;

&lt;p&gt;A Grafana dashboard consists of panels; a panel can show how a particular metric changed over time. There are lots of panel types, but we will only deal with the Time Series panel in this tutorial.&lt;/p&gt;

&lt;p&gt;Here's a &lt;a href="https://snapshot.raintank.io/dashboard/snapshot/IFfCGXltm0Z6Kz5T65Vf0UGfEsWEaKpN" rel="noopener noreferrer"&gt;dashboard snapshot&lt;/a&gt; with measurements of the hello-world app with 4 workers being loaded by artificial users. That dashboard monitors the following uWSGI metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;harakiri count&lt;/li&gt;
&lt;li&gt;worker status (busy/idle/cheap/total)&lt;/li&gt;
&lt;li&gt;listen queue size&lt;/li&gt;
&lt;li&gt;workers amount&lt;/li&gt;
&lt;li&gt;worker requests&lt;/li&gt;
&lt;li&gt;in-request sum&lt;/li&gt;
&lt;li&gt;respawn count&lt;/li&gt;
&lt;li&gt;worker avg request time&lt;/li&gt;
&lt;li&gt;worker running time&lt;/li&gt;
&lt;li&gt;load sum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are more things that can be monitored: the amount of data transmitted per worker, the number of exceptions per worker, etc. I either didn't need these metrics or prefer to monitor them in other ways.&lt;/p&gt;

&lt;p&gt;You can also configure &lt;a href="https://uwsgi-docs.readthedocs.io/en/latest/Options.html#memory-report" rel="noopener noreferrer"&gt;uWSGI memory reporting&lt;/a&gt; - how much memory uWSGI consumes. I prefer to watch memory consumption via system monitoring though, along with CPU consumption.&lt;/p&gt;

&lt;p&gt;uWSGI even allows you to &lt;a href="https://uwsgi-docs.readthedocs.io/en/latest/Metrics.html" rel="noopener noreferrer"&gt;expose your own metrics&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The cheaper subsystem plugin, and other plugins, have their own metrics too.&lt;/p&gt;

&lt;p&gt;I am providing instructions on how to &lt;strong&gt;set up all panels at once&lt;/strong&gt;, using dashboard JSON, or how to &lt;strong&gt;manually add panels one by one&lt;/strong&gt; and learn a bit more in the process.&lt;/p&gt;

&lt;h4&gt;
  
  
  Manual dashboard setup: add panels one by one
&lt;/h4&gt;

&lt;p&gt;It might be beneficial to read &lt;a href="https://grafana.com/docs/grafana/latest/panels/add-a-panel/" rel="noopener noreferrer"&gt;How to add a panel in Grafana&lt;/a&gt; first.&lt;/p&gt;

&lt;p&gt;How to create a new panel in a new dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to "Dashboards -&amp;gt; Manage dashboards", click on "New dashboard"&lt;/li&gt;
&lt;li&gt;click "New panel" or "Add panel" in top right corner&lt;/li&gt;
&lt;li&gt;on dropdown next to new panel title pick "Edit"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How to populate the panel with avg worker request time data:&lt;br&gt;
In panel edit mode, select the query source - InfluxDB.&lt;br&gt;
Modify the query builder inputs: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From default &lt;em&gt;uwsgi_workers&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Select field(&lt;em&gt;avg_rt&lt;/em&gt;) mean()
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa2nexncht47p9ygr09nb.png" alt="Edit panel query"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is important to tell the panel that the metric it displays is measured in microseconds: in the right column, expand "Axes", "Left Y", "Unit" - "microseconds".&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fra7go02d80v4saxghyxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fra7go02d80v4saxghyxx.png" alt="Edit panel series units"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, it's nice to configure the point thickness so that you can see points clearly ("Display" - "Line width" in the right column).&lt;br&gt;
Give the panel a meaningful name (at the top of the right column) and save it (click "Apply" in the top right corner).&lt;/p&gt;

&lt;p&gt;Congrats - now you have a panel that shows the avg request time of all uwsgi workers in real time. Make sure the refresh frequency selector is set to something like 10 sec (the tiny dropdown in the top right corner of the dashboard page) and that the webserver, wrk2, telegraf, and InfluxDB are all still running.&lt;/p&gt;

&lt;p&gt;A similar process should be repeated for the rest of the panels - find the queries to use below in this post; you can paste them into the panel's query input.&lt;/p&gt;

&lt;h4&gt;
  
  
  Automatic dashboard setup: import uWSGI stats dashboard JSON
&lt;/h4&gt;

&lt;p&gt;To set up the uWSGI stats monitoring dashboard, you can use the JSON in &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring/blob/master/uwsgi-dashboard-model.json" rel="noopener noreferrer"&gt;uwsgi-dashboard-model.json&lt;/a&gt; and import it into a new dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to "Dashboards -&amp;gt; Manage dashboards", click on "New dashboard"&lt;/li&gt;
&lt;li&gt;Go to Dashboard settings ("wheel" button at the top right of the page with new dashboard)&lt;/li&gt;
&lt;li&gt;In the left menu, pick "JSON Model"&lt;/li&gt;
&lt;li&gt;In the JSON, find the "panels" field (it should be empty) and paste in the contents of the "panels" field from &lt;a href="https://github.com/CheViana/uwsgi-playground-monitoring/blob/master/uwsgi-dashboard-model.json" rel="noopener noreferrer"&gt;uwsgi-dashboard-model.json&lt;/a&gt;. Pasting the whole JSON doesn't work for me for some reason.&lt;/li&gt;
&lt;li&gt;Click "Save changes" at the bottom of Dashboard settings page.
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fap9fsc6djvoayhwqlljw.png" alt="Export dashboard 1"&gt;
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fib3bwuqlwoavqjycn7o7.png" alt="Export dashboard 2"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Congrats - now you have a dashboard that shows uWSGI stats in real time. Make sure the refresh frequency selector is set to something like 10 sec (the tiny dropdown in the top right corner of the dashboard page) and that the webserver, wrk2, telegraf, and InfluxDB are all still running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grafana specifics: time filter and time interval
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;$timeFilter&lt;/code&gt;, seen in the panel query builder, stands for the time range picked in the Grafana UI - the period of time for which you want to see metrics.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$__interval&lt;/code&gt;, seen in the panel query builder, stands for the time interval: the time between the two nearest dots on a series, which depends on the time range you're looking at in the dashboard. Looking at a 1 hour range (put &lt;code&gt;from=now-1h&amp;amp;to=now&lt;/code&gt; in the URL or use the time range picker in the upper right corner of the dashboard), I see a dot at 11:01:02. The next dot is at 11:01:06, so the time interval is 4 seconds.&lt;/p&gt;

&lt;p&gt;Using the time interval matters so that a dashboard covering a large range (e.g. 30 days) loads in a reasonable amount of time. &lt;/p&gt;

&lt;p&gt;Telegraf is configured to send data from the uWSGI stats server to InfluxDB once every 10 seconds. If the time interval is bigger than 10 seconds, one dot in the series corresponds to the avg/sum/... (whichever is set in the query) of the multiple uWSGI stats readings that fall into that interval.&lt;/p&gt;
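
&lt;p&gt;That aggregation can be pictured as bucketing the raw 10-second readings into interval-sized bins and reducing each bin, which is roughly what &lt;code&gt;GROUP BY time($__interval)&lt;/code&gt; with &lt;code&gt;mean()&lt;/code&gt; does. A Python sketch with made-up readings:&lt;/p&gt;

```python
from collections import defaultdict

# (timestamp_sec, value) readings arriving every 10 s
readings = [(0, 4.0), (10, 6.0), (20, 8.0), (30, 2.0), (40, 4.0), (50, 6.0)]
interval = 30  # seconds, analogous to $__interval

# Assign each reading to the bucket its timestamp falls into
buckets = defaultdict(list)
for ts, value in readings:
    buckets[ts - ts % interval].append(value)

# One dot per interval: the mean of the readings that fell into it
series = {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
print(series)
```

&lt;p&gt;Six raw readings become two dots - which is exactly why a 30-day dashboard stays responsive, at the cost of per-reading detail.&lt;/p&gt;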

&lt;h2&gt;
  
  
  uWSGI stats - by metric
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Harakiri count
&lt;/h3&gt;

&lt;p&gt;uWSGI has a useful feature: kill a worker if it executes a request for longer than a defined time (e.g. 10 sec). It's called "harakiri". To configure it, set the uwsgi option &lt;code&gt;harakiri=10&lt;/code&gt; (in uwsgi.ini), where 10 is the longest allowed request time in seconds.&lt;br&gt;
This option can protect from DDoS attacks that exploit long-executing requests. Such attacks flood a website with long-executing requests to such an extent that the website doesn't have the capacity (free workers) to serve regular user traffic, since all the workers are busy executing the attacker's requests.&lt;br&gt;
Setting harakiri low can bite you if the web server is expected to serve long-executing requests in some rare cases; one needs to analyze the longest valid request time. Consider refactoring code to avoid long-executing requests in worker processes. There are lots of options depending on the specifics of the problem: the uWSGI spooler to send emails, uWSGI mules, or some kind of task queue for long-executing work that's separate from the webserver. &lt;/p&gt;

&lt;p&gt;Harakiri panel query is as follows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT sum("harakiri_count") FROM "uwsgi_workers" WHERE $timeFilter GROUP BY time($__interval) fill(null)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This shows the sum of harakiri events per time interval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Worker status
&lt;/h3&gt;

&lt;p&gt;A uWSGI worker can be in one of a few states: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;idle (not working on a request)&lt;/li&gt;
&lt;li&gt;busy (working on requests)&lt;/li&gt;
&lt;li&gt;cheap (see uWSGI cheaper subsystem docs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I configured the Worker status panel to show idle, busy, cheap, and total worker counts.&lt;/p&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT count("status") FROM "uwsgi_workers" WHERE  $timeFilter and "status"='busy' GROUP BY time($__interval) fill(null)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;for "busy" series, for other series replace "busy" with "idle" or "cheap", for total - omit status clause (... WHERE  $timeFilter GROUP BY ...).&lt;/p&gt;

&lt;p&gt;This panel can be configured differently - to show worker busyness as a percentage over the previous period of time. I find that approach more useful for &lt;code&gt;uwsgitop&lt;/code&gt; and more confusing when metrics are monitored over time by a separate monitoring stack.&lt;/p&gt;
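
&lt;p&gt;For illustration, "busyness in percent" boils down to: over the last period, what fraction of sampled worker statuses were "busy". A sketch on hypothetical samples:&lt;/p&gt;

```python
# One status sample per worker per polling tick (hypothetical data:
# 4 workers, 3 ticks)
samples = [
    ["busy", "busy", "idle", "idle"],   # tick 1
    ["busy", "idle", "idle", "idle"],   # tick 2
    ["busy", "busy", "busy", "idle"],   # tick 3
]

# Fraction of all sampled statuses that were "busy"
flat = [status for tick in samples for status in tick]
busy_percent = 100 * flat.count("busy") / len(flat)
print(busy_percent)
```

&lt;p&gt;A count-per-state series, as configured above, carries the same information but keeps the raw worker counts visible.&lt;/p&gt;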

&lt;h3&gt;
  
  
  Listen queue size
&lt;/h3&gt;

&lt;p&gt;If all uWSGI workers are busy working on requests while new requests arrive, the new ones are first put in a queue (the socket listen queue) to wait for the next free worker.&lt;br&gt;
The size of the listen queue is configurable via the option &lt;code&gt;listen=64&lt;/code&gt;, but the max allowed value depends on the system's max socket listen queue size, so you might need to &lt;a href="https://community.webcore.cloud/tutorials/uwsgi_your_server_socket_listen_backlog_is_limited/" rel="noopener noreferrer"&gt;increase the system value first&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Panel query is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT sum("listen_queue") FROM "uwsgi_overview" WHERE $timeFilter GROUP BY time($interval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  The number of workers
&lt;/h3&gt;

&lt;p&gt;The number of workers uWSGI is currently running. This makes more sense when the cheaper subsystem is in use, or when a cluster has multiple web server instances with scaling (all reporting to the same DB) - then this count changes.&lt;/p&gt;

&lt;p&gt;The query to get the number of workers with IDs 1 - 4: &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT count("avg_rt") FROM "uwsgi_workers" WHERE $timeFilter AND ("worker_id"='1' OR "worker_id"='2' OR "worker_id"='3' OR "worker_id"='4') GROUP BY time($__interval) fill(null)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is an indirect metric: it counts how many times the &lt;code&gt;avg_rt&lt;/code&gt; metric was reported for "worker_id"='1' during each time interval. If the time interval is bigger than 10 sec (telegraf queries uWSGI stats once every 10 sec) - e.g. when the time range you're looking at is 6 hours - this actually shows incorrect data. &lt;br&gt;
A question to figure out if you're curious and don't mind digging into Grafana docs: how does one make it show correct data regardless of the chosen time range? Comment below if you find out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Worker requests
&lt;/h3&gt;

&lt;p&gt;How many requests each worker has executed since its process was started. While the web server is serving requests, this measurement rises smoothly; when a worker restarts, it falls back to zero.&lt;/p&gt;

&lt;p&gt;Query is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT mean("requests") FROM "uwsgi_workers" WHERE $timeFilter GROUP BY time($interval), "worker_id"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
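&lt;p&gt;If a per-second request rate is more convenient than the ever-growing counter, InfluxQL's &lt;code&gt;non_negative_derivative&lt;/code&gt; can be applied to the same field. A sketch (the &lt;code&gt;10s&lt;/code&gt; unit assumes telegraf's 10-second collection interval):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT non_negative_derivative(mean("requests"), 10s) FROM "uwsgi_workers" WHERE $timeFilter GROUP BY time($interval), "worker_id"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;non_negative&lt;/code&gt; variant also hides the negative spike that would otherwise appear when a worker respawns and its counter falls back to zero.&lt;/p&gt;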
&lt;h3&gt;
  
  
  In-request sum
&lt;/h3&gt;

&lt;p&gt;How many requests uWSGI is working on right now, summed across all workers for each time interval.&lt;/p&gt;

&lt;p&gt;Query:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT sum("in_request") FROM "uwsgi_cores" WHERE $timeFilter GROUP BY time($interval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Respawn count
&lt;/h3&gt;

&lt;p&gt;Query:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT mean("respawn_count") FROM "uwsgi_workers" WHERE $timeFilter GROUP BY time($interval), "uwsgi_host"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;How many times workers were respawned since uWSGI start.&lt;/p&gt;

&lt;p&gt;For a production system it's recommended to gracefully respawn uWSGI workers after some time, to avoid excessive memory consumption by long-living processes, etc.&lt;/p&gt;

&lt;p&gt;To set the number of executed requests after which a worker is respawned, add to the uWSGI options:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;max-requests=10000
max-requests-delta=1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This will respawn a worker after 10000+(1000*worker_id) requests. The purpose of the delta is to avoid respawning all workers at the same time.&lt;/p&gt;

&lt;p&gt;To set the amount of time after which a worker is respawned, add to the options:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;max-worker-lifetime=36000
max-worker-lifetime-delta=3600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Lifetime values are in seconds. This will respawn a worker once 10h+(1h*worker_id) has passed since its last respawn.&lt;/p&gt;

&lt;p&gt;One can use both limits together; whichever one is hit first for a particular worker will take effect.&lt;/p&gt;
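&lt;p&gt;Put together, both limits from the examples above would look like this in the uWSGI options:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;max-requests=10000
max-requests-delta=1000
max-worker-lifetime=36000
max-worker-lifetime-delta=3600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;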

&lt;h3&gt;
  
  
  Worker avg request time
&lt;/h3&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT mean("avg_rt") FROM "uwsgi_workers" WHERE $timeFilter GROUP BY time($__interval), "worker_id" fill(null)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The average amount of time a worker spends on a request, shown per worker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Worker running time
&lt;/h3&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT mean("running_time") FROM "uwsgi_workers" WHERE  $timeFilter GROUP BY time($interval), "worker_id"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;How long each worker has been running, i.e. the time since its last respawn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load sum
&lt;/h3&gt;

&lt;p&gt;Query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT mean("load") FROM "uwsgi_overview" WHERE $timeFilter GROUP BY time($interval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I am not 100% sure what this is exactly. It seems to me this is something like the "load average" in CPU top, but summed. This metric rises dramatically when the web server is overloaded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;At this point, we have a basic uWSGI web server and a monitored playground to watch over it while we experiment with uWSGI configurations.&lt;/p&gt;

&lt;p&gt;I encourage you to try out the effects of changing uWSGI options on the web server. Start by setting &lt;code&gt;max-requests=10&lt;/code&gt; in &lt;code&gt;uwsgi.ini&lt;/code&gt; and compare how that changes the stats for the web server under load. You can also check how it affects the results in the load tool (wrk2) summary.&lt;/p&gt;
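&lt;p&gt;For example, a wrk2 run like this one (the URL, rate, connection count and duration here are placeholders for your setup) prints a latency summary you can compare before and after the change:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wrk -t2 -c64 -d60s -R500 --latency http://127.0.0.1:8080/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;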

</description>
      <category>uwsgi</category>
      <category>webdev</category>
      <category>grafana</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
