<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nikolay Sivko</title>
    <description>The latest articles on DEV Community by Nikolay Sivko (@def).</description>
    <link>https://dev.to/def</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F812051%2F40157b16-49e9-4c13-a345-c7bd3504d2d0.png</url>
      <title>DEV Community: Nikolay Sivko</title>
      <link>https://dev.to/def</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/def"/>
    <language>en</language>
    <item>
      <title>Delay accounting: an underrated feature of the Linux kernel</title>
      <dc:creator>Nikolay Sivko</dc:creator>
      <pubDate>Tue, 15 Mar 2022 16:02:59 +0000</pubDate>
      <link>https://dev.to/def/delay-accounting-an-underrated-feature-of-the-linux-kernel-2hjf</link>
      <guid>https://dev.to/def/delay-accounting-an-underrated-feature-of-the-linux-kernel-2hjf</guid>
      <description>&lt;p&gt;Nowadays, in the era of microservices, infrastructures have become super-complex: dynamic nodes provisioning, autoscaling, dozens or even hundreds of containers working side by side. In order to maintain control over such infrastructure, we need to be able to know what has happened to each application at any given time.&lt;/p&gt;

&lt;p&gt;First, let's look at computing resources. Usually, when engineers talk about resources, they are actually referring to utilization, which is not always the right lens. For example, high CPU utilization by a container is not an issue in itself. The real problem is that it can cause another container on the same machine to run slower due to a lack of CPU time.&lt;/p&gt;

&lt;p&gt;Few people know that the Linux kernel tracks exactly how long each task has been waiting for kernel resources to become available. For instance, a task can wait for CPU time or for synchronous block I/O to complete. Such delays usually directly affect application latency, so measuring them is quite reasonable.&lt;/p&gt;

&lt;p&gt;When researching how we could detect CPU-related issues, we came to the conclusion that the CPU delay of a container is a perfect starting point for investigation. The kernel provides per-PID or per-TGID &lt;a href="https://www.kernel.org/doc/html/latest/accounting/delay-accounting.html"&gt;statistics&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The per-PID (Process ID) statistics correspond only to the main thread of a process.&lt;/li&gt;
&lt;li&gt;The per-TGID (Task Group ID) statistics are the sum over all threads of a process.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/coroot/coroot-node-agent"&gt;Node-agent&lt;/a&gt; gathers per-TGID statistics from the kernel through the &lt;a href="https://man7.org/linux/man-pages/man7/netlink.7.html"&gt;Netlink&lt;/a&gt; protocol and aggregates it to the per-container metrics:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ItM0b8oj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/delay_accounting_aggregation.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ItM0b8oj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/delay_accounting_aggregation.svg" alt="aggregation" width="880" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's see how one of the resulting metrics can help in detecting various CPU-related issues. I'll use failure scenarios from Failurepedia, our library of recorded failure scenarios, to illustrate this.&lt;/p&gt;

&lt;h2&gt;A noisy neighbor&lt;/h2&gt;

&lt;p&gt;In this scenario, a CPU-intensive container (&lt;code&gt;stats-aggregator&lt;/code&gt;) runs on the same node as a latency-sensitive app (&lt;code&gt;reservations&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--816yzH6u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hzngfw3sd2orvgf3t89n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--816yzH6u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hzngfw3sd2orvgf3t89n.png" alt="slo1" width="880" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the chart above, we can see that the &lt;code&gt;reservations&lt;/code&gt; app is not meeting its SLOs: requests are performing slowly (the red area) and failing (the black area). Now let's look at the CPU metrics related to this application:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2-4SEVuw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cn4aam22u8vyypae1v17.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2-4SEVuw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cn4aam22u8vyypae1v17.png" alt="cpu1" width="880" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Comparing these metrics, we can easily conclude that &lt;code&gt;stats-aggregator&lt;/code&gt; caused the issue. But could we have done such an analysis without the &lt;a href="https://coroot.com/docs/metrics/node-agent#container_resources_cpu_delay_seconds_total"&gt;container_resources_cpu_delay_seconds_total&lt;/a&gt; metric?&lt;/p&gt;

&lt;p&gt;By contrast, here are the same metrics for a case where &lt;code&gt;stats-aggregator-(be)&lt;/code&gt; was started with the lowest CPU priority (the &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort"&gt;Best-Effort&lt;/a&gt; QoS class was assigned).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---0IMdBI_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g0bn0lnm3ixdm40bw3m9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---0IMdBI_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g0bn0lnm3ixdm40bw3m9.png" alt="slo2" width="880" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the CPU utilization of the node is 100%, but this is not significantly affecting the &lt;code&gt;reservations&lt;/code&gt;' SLIs.&lt;/p&gt;

&lt;p&gt;At Coroot, we came up with the following algorithm to detect CPU-related issues of a particular application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the app is meeting its SLOs, there is no need to check anything else.&lt;/li&gt;
&lt;li&gt;If the total CPU delay of all the app's containers correlates with its affected SLIs, the app is experiencing a lack of CPU time.&lt;/li&gt;
&lt;/ul&gt;
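
&lt;p&gt;The correlation check can be sketched as a plain Pearson correlation over two aligned time series. This is a minimal illustration; Coroot's actual preprocessing and thresholds are more involved.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation coefficient of two
// equally sized, time-aligned series.
func pearson(x, y []float64) float64 {
	n := float64(len(x))
	var sx, sy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
	}
	mx, my := sx/n, sy/n
	var cov, vx, vy float64
	for i := range x {
		cov += (x[i] - mx) * (y[i] - my)
		vx += (x[i] - mx) * (x[i] - mx)
		vy += (y[i] - my) * (y[i] - my)
	}
	return cov / math.Sqrt(vx*vy)
}

func main() {
	// Illustrative samples: total CPU delay vs. the latency SLI.
	cpuDelay := []float64{0.1, 0.2, 0.9, 1.1, 1.0}
	latencySLI := []float64{0.05, 0.07, 0.42, 0.55, 0.50}
	fmt.Printf("correlation: %.2f\n", pearson(cpuDelay, latencySLI))
}
```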

&lt;p&gt;I should clarify why checking correlation can be tricky. As seen in the first scenario above, several SLIs (latency and success rate) were affected at once. This is why Coroot also checks the correlation between the metric and combinations of the relevant SLIs.&lt;/p&gt;

&lt;p&gt;In the first stage, we had an issue with the application response time, so there was a strong correlation between this SLI and the total CPU delay.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hoyRbQEc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kk27rwp0cmdsuwxozu8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hoyRbQEc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kk27rwp0cmdsuwxozu8q.png" alt="correlation1" width="880" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the errors appeared, the correlation between cpu_delay and the response time became weak. However, the correlation between cpu_delay and the sum of errors and slow requests was strong enough to confirm that the container did indeed experience a lack of CPU time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JXOj3iAO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bhkjxc4q4t8w8hul06ks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JXOj3iAO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bhkjxc4q4t8w8hul06ks.png" alt="correlation2" width="880" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A lack of CPU time can occur for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A container competes for CPU time against the other containers running on the same node.&lt;/li&gt;
&lt;li&gt;The Linux CPU scheduler (CFS) &lt;a href="https://tempesta-tech.com/blog/nginx-tail-latency#the-linux-scheduler"&gt;preempts&lt;/a&gt; the processes of a container in favor of the other containers on the same node.&lt;/li&gt;
&lt;li&gt;A container consumes all available CPU time on its own.&lt;/li&gt;
&lt;li&gt;A container has reached its CPU limit and has been throttled by the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;CPU throttling&lt;/h2&gt;

&lt;p&gt;If a container is limited in CPU time (throttled), the &lt;code&gt;cpu_delay&lt;/code&gt; metric is bound to increase. However, in this case there is a separate metric that measures how long each container has been throttled — &lt;a href="https://coroot.com/docs/metrics/node-agent#container_resources_cpu_throttled_seconds_total"&gt;container_resources_cpu_throttled_seconds_total&lt;/a&gt;. Coroot uses it in addition to &lt;code&gt;cpu_delay&lt;/code&gt; to determine that a container has reached its CPU limit and that this is what caused the lack of CPU time.&lt;/p&gt;
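
&lt;p&gt;Since this metric is a counter, a detector works on its increase over a scrape interval. Here is an illustrative sketch (the function name and the sample values are mine, not Coroot's):&lt;/p&gt;

```go
package main

import "fmt"

// throttledShare converts two samples of the
// container_resources_cpu_throttled_seconds_total counter into the
// fraction of wall-clock time the container spent throttled.
func throttledShare(prev, cur, intervalSeconds float64) float64 {
	if intervalSeconds == 0 {
		return 0
	}
	if prev > cur {
		return 0 // the counter was reset, e.g. the container restarted
	}
	return (cur - prev) / intervalSeconds
}

func main() {
	// 12s of throttling accumulated over a 60s scrape interval.
	fmt.Printf("throttled %.0f%% of the time\n", throttledShare(100, 112, 60)*100)
}
```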

&lt;p&gt;Here is a failure scenario where the number of requests to the &lt;code&gt;image-resizer&lt;/code&gt; app gradually increases until its containers reach their CPU limits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TX2t2nHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/756i2l9rqj3l54c64ccw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TX2t2nHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/756i2l9rqj3l54c64ccw.png" alt="slo3" width="880" height="659"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detection works in the same way: by checking the correlation between the app's affected SLIs and throttled_time. If the correlation is strong, we can conclude that the lack of CPU time is caused by throttling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_XjHmjOm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3bv3etbnprchae07brh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_XjHmjOm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3bv3etbnprchae07brh9.png" alt="correlation4" width="880" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Useful links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/coroot/coroot-node-agent"&gt;node-agent&lt;/a&gt; (Apache-2.0 License)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coroot.com/docs/inspections/cpu"&gt;CPU Inspection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coroot.com/auth/signup"&gt;Try Coroot for free&lt;/a&gt; (14-day trial)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>monitoring</category>
      <category>sre</category>
      <category>kubernetes</category>
      <category>linux</category>
    </item>
    <item>
      <title>How ping measures network round-trip time accurately using SO_TIMESTAMPING</title>
      <dc:creator>Nikolay Sivko</dc:creator>
      <pubDate>Mon, 28 Feb 2022 15:13:07 +0000</pubDate>
      <link>https://dev.to/def/how-ping-measures-network-round-trip-time-accurately-using-sotimestamping-1b5g</link>
      <guid>https://dev.to/def/how-ping-measures-network-round-trip-time-accurately-using-sotimestamping-1b5g</guid>
      <description>&lt;p&gt;While working on &lt;a href="https://github.com/coroot/coroot-node-agent"&gt;node-agent&lt;/a&gt;, we set out to measure network latency between containers and the services they communicate with. Since the agent has already discovered the endpoints that each container communicates with, we just need to measure network latency. We embedded "pinger" directly into the agent to measure end-to-end latency because the ICMP Echo requests should be sent from within the network namespace of each container.&lt;/p&gt;

&lt;p&gt;I've looked at the most popular pinger implementations in Go, and unfortunately, all of them use userspace-generated timestamps to calculate RTTs (round-trip times). This can lead to significant measurement errors, especially when the node lacks CPU time due to throttling or high utilization.&lt;/p&gt;

&lt;p&gt;Let's look at how ping actually measures RTT.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2fcKkxxT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping_rtt.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2fcKkxxT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping_rtt.svg" alt="ping" width="880" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Though it looks simple enough, the devil is in the details. First, let's see how a packet is sent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ff5kh4__--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping_sending.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ff5kh4__--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping_sending.svg" alt="ping sending" width="880" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main problem here is that a considerable amount of time can elapse between requesting the current timestamp and actually sending the packet. The same error occurs at the receiving stage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3djLW_Om--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping_receiving.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3djLW_Om--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping_receiving.svg" alt="ping receiving" width="880" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To illustrate a case where the error is quite noticeable, let's run ping in a container that is limited in CPU time. The &lt;code&gt;-U&lt;/code&gt; flag tells ping to use userspace-generated timestamps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; &lt;span class="nt"&gt;--cpu-period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10000 &lt;span class="nt"&gt;--cpu-quota&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="nt"&gt;--name&lt;/span&gt; pinger ping 8.8.8.8 &lt;span class="nt"&gt;-U&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I'll start a CPU-consuming app in the same container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; pinger stress &lt;span class="nt"&gt;--cpu&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2TxaQ3nB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2TxaQ3nB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/ping.png" alt="ping affected" width="880" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the measurement results were severely affected. If we run ping without &lt;code&gt;-U&lt;/code&gt;, stress does not affect the RTT at all. So how does ping generate timestamps that exclude the extra time spent before the packet is actually sent or after it is received?&lt;/p&gt;

&lt;p&gt;There is an interface called &lt;a href="https://www.kernel.org/doc/Documentation/networking/timestamping.txt"&gt;SO_TIMESTAMPING&lt;/a&gt; that allows a userspace program to request the transmission/reception timestamp of a particular packet from the kernel. &lt;code&gt;SO_TIMESTAMPING&lt;/code&gt; supports multiple timestamp sources, but we will only use those that do not require special hardware or device driver support.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SOF_TIMESTAMPING_TX_SCHED&lt;/code&gt;: request TX timestamps prior to entering the packet scheduler.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SOF_TIMESTAMPING_RX_SOFTWARE&lt;/code&gt;: these timestamps are generated just after a device driver hands a packet to the kernel receive stack.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;SO_TIMESTAMPING&lt;/code&gt; can be enabled for a particular packet (using &lt;a href="https://man7.org/linux/man-pages/man3/cmsg.3.html"&gt;control messages&lt;/a&gt;) or for every packet passing through the socket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOF_TIMESTAMPING_SOFTWARE&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOF_TIMESTAMPING_RX_SOFTWARE&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOF_TIMESTAMPING_TX_SCHED&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOF_TIMESTAMPING_OPT_CMSG&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOF_TIMESTAMPING_OPT_TSONLY&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetsockoptInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socketFd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOL_SOCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SO_TIMESTAMPING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Immediately after sending a packet to the socket, we can ask the kernel for the timestamp of the packet transmission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pkt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;span class="n"&gt;oob&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// a buffer for Out-Of-Band data where the kernel will write the timestamp&lt;/span&gt;
&lt;span class="c"&gt;// MSG_ERRQUEUE indicates that we want to receive a message from the socket's error queue&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oobn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Recvmsg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socketFd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pktBuf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MSG_ERRQUEUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sentAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;getTimestampFromOutOfBandData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oobn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we parse the received message to extract the timestamp:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;getTimestampFromOutOfBandData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oob&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oobn&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;cms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSocketControlMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oob&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;oobn&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;cms&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOL_SOCKET&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SO_TIMESTAMPING&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;unix&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ScmTimestamping&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LittleEndian&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"no timestamp found"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the receive path, the ancillary data arrives along with the packet data, so there is no need to read anything from the socket's error queue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oobn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ra&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadMsgIP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pktBuf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;//...&lt;/span&gt;
&lt;span class="n"&gt;receivedAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;getTimestampFromOutOfBandData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;oobn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally, we can calculate the RTT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;rtt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;receivedAt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another thing worth noting is that the timestamp of when the packet has been sent must be stored in the app's memory, not in the packet's payload. Otherwise, it can lead to &lt;a href="https://twitter.com/m_ou_se/status/1480184732562374656"&gt;funny consequences&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The resulting round-trip time reflects the real network latency much more accurately and is not subject to the errors described above. The agent uses this method to collect the &lt;a href="https://coroot.com/docs/metrics/node-agent#container_net_latency_seconds"&gt;container_net_latency_seconds&lt;/a&gt; metric for each container. Since this metric is broken down by &lt;code&gt;destination_ip&lt;/code&gt;, you can always find out the network latency between a container and each of the services it communicates with. The limitation of this method is that it doesn't work when ICMP traffic is blocked, as in the case of Amazon RDS instances.&lt;/p&gt;

&lt;h2&gt;Useful links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/coroot/coroot-node-agent"&gt;node-agent&lt;/a&gt; (Apache-2.0 License)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coroot.com/docs/inspections/upstream-service#network"&gt;How Coroot checks the correlation between network RTT and the app's SLIs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coroot.com/auth/signup"&gt;Try Coroot for free&lt;/a&gt; (14-day trial)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>monitoring</category>
      <category>linux</category>
    </item>
    <item>
      <title>Building a service map using eBPF</title>
      <dc:creator>Nikolay Sivko</dc:creator>
      <pubDate>Fri, 25 Feb 2022 13:12:24 +0000</pubDate>
      <link>https://dev.to/def/building-a-service-map-using-ebpf-2bfl</link>
      <guid>https://dev.to/def/building-a-service-map-using-ebpf-2bfl</guid>
<description>&lt;p&gt;Distributed request tracing is a popular method for monitoring distributed systems because it allows you to see the specific execution stages of any request, such as calls to other services and databases. However, the cost of integration can be quite high, since it requires changing the code of every component. Additionally, it's practically impossible to achieve 100% coverage, because many third-party components don't support such instrumentation.&lt;/p&gt;

&lt;p&gt;To address these disadvantages, we implemented eBPF-based container tracing as part of our open-source Prometheus exporter &lt;a href="https://github.com/coroot/coroot-node-agent"&gt;node-agent&lt;/a&gt;. It passively monitors all TCP connections on a node, associates every connection with the related container, and exports metrics in Prometheus format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# HELP container_net_tcp_successful_connects_total Total number of successful TCP connects&lt;/span&gt;
&lt;span class="c"&gt;# TYPE container_net_tcp_successful_connects_total counter&lt;/span&gt;
container_net_tcp_successful_connects_total&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;actual_destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.128.0.43:443"&lt;/span&gt;,container_id&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/k8s/default/prometheus-0/prometheus-server"&lt;/span&gt;,destination&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.52.0.1:443"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; 2

&lt;span class="c"&gt;# HELP container_net_tcp_active_connections Number of active outbound connections used by the container&lt;/span&gt;
&lt;span class="c"&gt;# TYPE container_net_tcp_active_connections gauge&lt;/span&gt;
container_net_tcp_active_connections&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;actual_destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.128.0.43:443"&lt;/span&gt;,container_id&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/k8s/default/prometheus-0/prometheus-server"&lt;/span&gt;,destination&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.52.0.1:443"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; 1

&lt;span class="c"&gt;# HELP container_net_tcp_failed_connects_total Total number of failed TCP connects&lt;/span&gt;
&lt;span class="c"&gt;# TYPE container_net_tcp_failed_connects_total counter&lt;/span&gt;
container_net_tcp_failed_connects_total&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;container_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/k8s/kube-system/konnectivity-agent-56cdbd78f-f7r7j/konnectivity-agent"&lt;/span&gt;,destination&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.48.2.2:10250"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; 20

&lt;span class="c"&gt;# HELP container_net_tcp_listen_info Listen address of the container&lt;/span&gt;
&lt;span class="c"&gt;# TYPE container_net_tcp_listen_info gauge&lt;/span&gt;
container_net_tcp_listen_info&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;container_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/k8s/default/paymentservice-5849646947-b744v/server"&lt;/span&gt;,listen_addr&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.48.0.4:50051"&lt;/span&gt;,proxy&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These metrics are obtained using the &lt;code&gt;sock:inet_sock_set_state&lt;/code&gt; kernel tracepoint. As the name implies, this tracepoint is called whenever a TCP connection changes its state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connection establishing
&lt;/h2&gt;

&lt;p&gt;First, let's recall the TCP state transitions while establishing an outbound connection:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kUSL84NR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/tcp_states_connect.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kUSL84NR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/tcp_states_connect.svg" alt="tcp connect" width="880" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As seen in the diagram, a connection can be considered successfully established if the &lt;code&gt;SYN_SENT&lt;/code&gt; -&amp;gt; &lt;code&gt;ESTABLISHED&lt;/code&gt; transition has occurred. Conversely, &lt;code&gt;SYN_SENT&lt;/code&gt; -&amp;gt; &lt;code&gt;CLOSED&lt;/code&gt; means a failed connection attempt. The tricky part is that an eBPF program can only get the &lt;code&gt;PID&lt;/code&gt; of the connection initiator during the &lt;code&gt;CLOSED&lt;/code&gt; -&amp;gt; &lt;code&gt;SYN_SENT&lt;/code&gt; transition. The solution is to save the initiator's PID in a kernel-space map, keyed by the socket's pointer, so it can be looked up on subsequent transitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// saving the connection initiator's PID&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oldstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_CLOSE&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_SYN_SENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;sk_info&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_get_current_pid_tgid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;bpf_map_update_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sk_info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skaddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_ANY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;//getting the PID&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;sk_info&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sk_info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skaddr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;bpf_map_delete_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sk_info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skaddr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full source code of the &lt;code&gt;inet_sock_set_state&lt;/code&gt; handler can be found &lt;a href="https://github.com/coroot/coroot-node-agent/blob/92ef04b486daf0703591edd7526b34fdb4d9cf54/ebpftracer/ebpf/tcp_state.c#L55"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connection closing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--65OFHhOD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/tcp_states_close.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--65OFHhOD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/tcp_states_close.svg" alt="tcp close" width="880" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Depending on which peer closes the connection, either an &lt;code&gt;ESTABLISHED&lt;/code&gt; -&amp;gt; &lt;code&gt;FIN_WAIT1&lt;/code&gt; (active close) or an &lt;code&gt;ESTABLISHED&lt;/code&gt; -&amp;gt; &lt;code&gt;CLOSE_WAIT&lt;/code&gt; (passive close) transition will occur. The only thing worth noting here is that we cannot get a &lt;code&gt;PID&lt;/code&gt; in the passive-close case, so we decided to resolve this in the userspace code. Since the handler does not need to differentiate between active and passive closes, it triggers on both transitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oldstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_ESTABLISHED&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_FIN_WAIT1&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_CLOSE_WAIT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EVENT_TYPE_CONNECTION_CLOSE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  TCP LISTEN
&lt;/h2&gt;

&lt;p&gt;In addition to outgoing connection tracing, we need to discover all listening sockets and containers associated with them. This can also be done by handling the &lt;code&gt;CLOSED&lt;/code&gt; -&amp;gt; &lt;code&gt;LISTEN&lt;/code&gt; and &lt;code&gt;LISTEN&lt;/code&gt; -&amp;gt; &lt;code&gt;CLOSED&lt;/code&gt; transitions. The initiator's &lt;code&gt;PID&lt;/code&gt; is available in both cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oldstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_CLOSE&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_LISTEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EVENT_TYPE_LISTEN_OPEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oldstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_LISTEN&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newstate&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;BPF_TCP_CLOSE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EVENT_TYPE_LISTEN_CLOSE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the IP address of a listen socket is unspecified (&lt;code&gt;0.0.0.0&lt;/code&gt;), the agent replaces it with the IP addresses assigned to the corresponding network namespace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Network Address Translation (NAT)
&lt;/h2&gt;

&lt;p&gt;Each event sent by the eBPF program to the agent contains the source and destination &lt;code&gt;IP:PORT&lt;/code&gt; pairs. However, the destination address can be virtual, as in the case of Kubernetes services, where the service's &lt;code&gt;ClusterIP&lt;/code&gt; is replaced with the IP of a particular Pod by &lt;code&gt;iptables&lt;/code&gt; or &lt;code&gt;IPVS&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---1Xb-FBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/k8s_service_nat.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---1Xb-FBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://coroot.com/static/img/blog/k8s_service_nat.svg" alt="tcp close" width="880" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since we need to know both the services each container interacted with and the specific containers involved in that interaction, the agent resolves the actual destination of each connection by querying the &lt;code&gt;conntrack&lt;/code&gt; table over the Netlink protocol. As a result, each metric contains both &lt;code&gt;destination&lt;/code&gt; and &lt;code&gt;actual_destination&lt;/code&gt; labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.96.0.1:80"&lt;/span&gt;, &lt;span class="nv"&gt;actual_destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.11:80"&lt;/span&gt;, &lt;span class="nv"&gt;container_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/k8s/default/client-pod/server"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Initialization
&lt;/h2&gt;

&lt;p&gt;In addition to being notified of every new connection, the agent needs to detect all connections that were established before it started. To do this, on startup it reads the established connections from &lt;code&gt;/proc/net/tcp&lt;/code&gt; and &lt;code&gt;/proc/net/tcp6&lt;/code&gt; in each network namespace. The socket inode of each connection from these files is then matched against &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/fd&lt;/code&gt; entries to find the corresponding container.&lt;/p&gt;
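&lt;p&gt;As an illustration (a hedged sketch, not the agent's actual implementation), here is how the hex-encoded address fields in those files can be decoded. On little-endian hosts, &lt;code&gt;/proc/net/tcp&lt;/code&gt; stores the IPv4 address bytes reversed, while the port is plain big-endian hex:&lt;/p&gt;

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parseHexAddr decodes an IPv4 "ADDR:PORT" field from /proc/net/tcp,
// where the address is a little-endian hex number and the port is
// big-endian hex, e.g. "0100007F:0050" -> "127.0.0.1:80".
func parseHexAddr(s string) (string, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return "", fmt.Errorf("malformed address: %q", s)
	}
	raw, err := hex.DecodeString(parts[0])
	if err != nil || len(raw) != 4 {
		return "", fmt.Errorf("bad IPv4 part: %q", parts[0])
	}
	// on little-endian hosts the address bytes appear reversed
	ip := net.IPv4(raw[3], raw[2], raw[1], raw[0])
	port, err := strconv.ParseUint(parts[1], 16, 16)
	if err != nil {
		return "", fmt.Errorf("bad port part: %q", parts[1])
	}
	return fmt.Sprintf("%s:%d", ip, port), nil
}

func main() {
	addr, err := parseHexAddr("0100007F:0050")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr) // 127.0.0.1:80
}
```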

&lt;h2&gt;
  
  
  How we use these metrics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;container_net_tcp_successful_connects_total&lt;/code&gt; and &lt;code&gt;container_net_tcp_active_connections&lt;/code&gt; metrics show which &lt;code&gt;IP:PORT&lt;/code&gt; each container is communicating with.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;container_net_tcp_listen_info&lt;/code&gt; metric shows on which &lt;code&gt;IP:PORT&lt;/code&gt; each container accepts inbound TCP connections.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Joining these metrics by &lt;code&gt;IP:PORT&lt;/code&gt; allows us to build a map of container-to-container communications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GeWJPhRe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2zwdj553x0zdj9fx4mp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GeWJPhRe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2zwdj553x0zdj9fx4mp2.png" alt="instance-to-instance" width="880" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To build a map of service-to-service communications, Coroot aggregates individual containers into applications, such as &lt;code&gt;Deployments&lt;/code&gt; or &lt;code&gt;StatefulSets&lt;/code&gt;, using metrics from the &lt;code&gt;kube-state-metrics&lt;/code&gt; exporter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--REIfGPjs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/55lpeyan47ljz7n4qvkj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--REIfGPjs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/55lpeyan47ljz7n4qvkj.png" alt="app-to-app" width="880" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A service map like this can give you an overview of the distributed system architecture.&lt;/p&gt;

&lt;p&gt;Here I've only described a portion of the network metrics needed to build a service map. Next time we will talk about metrics that are extremely useful for troubleshooting network-related issues. Stay tuned!&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/coroot/coroot-node-agent"&gt;node-agent&lt;/a&gt;  (Apache-2.0 License)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://coroot.com/docs/inspections/upstream-service#network"&gt;Upstream Service Inspection&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://coroot.com/auth/signup"&gt;Try Coroot for free&lt;/a&gt; (14-day trial)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ebpf</category>
      <category>sre</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Mining metrics from unstructured logs</title>
      <dc:creator>Nikolay Sivko</dc:creator>
      <pubDate>Thu, 24 Feb 2022 11:27:39 +0000</pubDate>
      <link>https://dev.to/def/mining-metrics-from-unstructured-logs-a8h</link>
      <guid>https://dev.to/def/mining-metrics-from-unstructured-logs-a8h</guid>
      <description>&lt;p&gt;Looking through endless postmortem reports and talking to other SREs, I feel that about 80% of outages are caused by similar factors: infrastructure failures, network errors/delays, lack of computing resources, etc. However, the cause of the remaining 20% can vary significantly since not all applications are built the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logs
&lt;/h2&gt;

&lt;p&gt;All well-designed applications log their errors, so using a log management system can help you navigate through the logs quite efficiently. Yet, the cost of such systems can be unreasonably high. In fact, engineers don't even need all the logs when investigating an incident, because it's impossible to read every single message in a short period of time. Instead, they try to extract some sort of summary from the logs related to a specific timespan. With console utilities, this would look something like the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Getting the stream of the related log:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt; cat | tail | journalctl | kubectl logs &amp;gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Identifying the format to filter messages by severity:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... | grep ERROR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If needed, doing something with multi-line messages:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... | grep -C&amp;lt;N&amp;gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Clustering (grouping) messages to identify the most frequent errors:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... |  &amp;lt; some awk | sed | sort magic &amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting summary is usually a list of message groups with a sample and the number of occurrences.&lt;/p&gt;

&lt;p&gt;It seems like this particular type of analysis can be done without a centralized log management system. Moreover, with some automation, we can greatly speed up this type of investigation.&lt;/p&gt;

&lt;p&gt;At Coroot, we implemented automated log parsing in our open-source Prometheus exporter &lt;a href="https://github.com/coroot/coroot-node-agent" rel="noopener noreferrer"&gt;node-agent&lt;/a&gt;. To explain how it works, let's follow the same steps I mentioned above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Container logs discovery
&lt;/h2&gt;

&lt;p&gt;The agent can discover the logs of each container running on a node by using different data sources depending on the container type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A process might log directly to a file in &lt;code&gt;/var/log&lt;/code&gt;. In this case, the agent detects this by reading the list of open files of each process from &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/fd&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Services managed by &lt;code&gt;systemd&lt;/code&gt; usually use &lt;code&gt;journald&lt;/code&gt; as a logging backend, so the agent can read a container's log from &lt;code&gt;journald&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To find logs of Kubernetes pods or standalone containers, the agent uses meta-information from &lt;code&gt;dockerd&lt;/code&gt; or &lt;code&gt;containerd&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Handling multi-line messages
&lt;/h2&gt;

&lt;p&gt;To extract multi-line messages from a log stream, we use a few dead-simple heuristics. The general one: if a line contains a field that looks like a datetime and the following lines don't, those lines are probably the continuation of a multi-line message. In addition, the agent uses special &lt;a href="https://github.com/coroot/logparser/blob/d998966a74541e3fde1a69fde7801bc2e58317c8/multiline.go#L101" rel="noopener noreferrer"&gt;parsing rules&lt;/a&gt; for Java stack traces and Python tracebacks.&lt;/p&gt;

&lt;p&gt;According to our tests, this simple approach works quite accurately and can even handle custom multi-line messages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Severity level detection
&lt;/h2&gt;

&lt;p&gt;I would say it's not actually detection; it's more like guessing. For most log formats, the agent simply looks for a mention of a known level at the beginning of each message. In addition, we implemented special parsers for the &lt;a href="https://github.com/kubernetes/klog" rel="noopener noreferrer"&gt;glog/klog&lt;/a&gt; and &lt;a href="https://github.com/coroot/logparser/blob/d998966a74541e3fde1a69fde7801bc2e58317c8/level.go#L112" rel="noopener noreferrer"&gt;redis&lt;/a&gt; log formats. If the severity level cannot be detected, the &lt;code&gt;UNKNOWN&lt;/code&gt; level is used.&lt;/p&gt;
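&lt;p&gt;A simplified sketch of this guessing (not the agent's actual parser, and the 64-character window is an illustrative assumption) could look like this:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

var levels = []string{"DEBUG", "INFO", "WARNING", "ERROR", "FATAL"}

// guessSeverity looks for the earliest mention of a known level near
// the beginning of the message; if none is found, it returns UNKNOWN.
func guessSeverity(msg string) string {
	head := strings.ToUpper(msg)
	if len(head) > 64 {
		head = head[:64] // only inspect the beginning of the message
	}
	best, bestIdx := "UNKNOWN", len(head)
	for _, l := range levels {
		if i := strings.Index(head, l); i >= 0 && i < bestIdx {
			best, bestIdx = l, i
		}
	}
	return best
}

func main() {
	fmt.Println(guessSeverity("2022-03-15 16:02:59 error: disk full")) // ERROR
	fmt.Println(guessSeverity("something happened"))                   // UNKNOWN
}
```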

&lt;h2&gt;
  
  
  Messages clustering
&lt;/h2&gt;

&lt;p&gt;A great compilation of publications on automated log parsing is available &lt;a href="https://github.com/logpai/logparser#log-parsers-currently-available" rel="noopener noreferrer"&gt;here&lt;/a&gt;. After reviewing these papers, we realized that our problem is much simpler than the ones most researchers have tried to solve. We just want to cluster messages into groups identified by some fingerprint; we don't need to recognize the message format itself.&lt;/p&gt;

&lt;p&gt;Our approach is entirely empirical and based on the observation that a log entry is actually a combination of a template and variables, such as the date, time, severity level, component, and other user-defined values. We discovered that after removing everything that looks like a variable from a message, the remaining part can itself serve as the message's pattern.&lt;/p&gt;

&lt;p&gt;In the first step, the agent removes quoted parts, parts in brackets, HEX numbers, UUIDs, and numeric values. The remaining words are then used to calculate the message group's fingerprint. We extracted the log-parsing code into a separate &lt;a href="https://github.com/coroot/logparser" rel="noopener noreferrer"&gt;repository&lt;/a&gt;, along with a command-line tool that can parse a log stream from &lt;code&gt;stdin&lt;/code&gt;.&lt;/p&gt;
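&lt;p&gt;A rough sketch of this fingerprinting idea, where the regular expressions and the hash choice are illustrative assumptions rather than the exact rules used by &lt;code&gt;logparser&lt;/code&gt;:&lt;/p&gt;

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"regexp"
	"strings"
)

// variable-looking fragments that get stripped before fingerprinting:
// quoted parts, parts in brackets, hex numbers / UUIDs, and plain numbers
var varPatterns = []*regexp.Regexp{
	regexp.MustCompile(`"[^"]*"|'[^']*'`),      // quoted parts
	regexp.MustCompile(`\([^)]*\)|\[[^\]]*\]`), // parts in brackets
	regexp.MustCompile(`\b[0-9a-fA-F-]{8,}\b`), // hex numbers and UUIDs
	regexp.MustCompile(`\b\d+\b`),              // numeric values
}

// fingerprint removes everything that looks like a variable and hashes
// the remaining words, yielding a stable ID for the message pattern.
func fingerprint(msg string) string {
	for _, re := range varPatterns {
		msg = re.ReplaceAllString(msg, " ")
	}
	words := strings.Fields(msg)
	h := sha1.Sum([]byte(strings.Join(words, " ")))
	return fmt.Sprintf("%x", h[:4])
}

func main() {
	a := fingerprint(`connection to "10.0.0.1:5432" failed after 3 retries`)
	b := fingerprint(`connection to "10.0.0.5:5432" failed after 7 retries`)
	fmt.Println(a == b) // true: same pattern, different variables
}
```

&lt;p&gt;Two messages that differ only in their variable parts end up with the same fingerprint and thus land in the same group.&lt;/p&gt;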

&lt;p&gt;Here is a summary for a log from the &lt;a href="https://github.com/logpai/loghub" rel="noopener noreferrer"&gt;logpai/loghub&lt;/a&gt; dataset (kudos to the Logpai team for sharing this dataset):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat Zookeeper_2k.log | docker run -i --rm ghcr.io/coroot/logparser
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcduhowe5bi66zya4anf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcduhowe5bi66zya4anf5.png" alt="logs summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics
&lt;/h2&gt;

&lt;p&gt;Node-agent parses the logs of every container running on a node and exports the &lt;a href="https://coroot.com/docs/metrics/node-agent#container_log_messages_total" rel="noopener noreferrer"&gt;container_log_messages_total&lt;/a&gt; metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;container_log_messages_total&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;container_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;ID of the container writing this log&amp;gt;"&lt;/span&gt;,
    &lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;log path or the stream source&amp;gt;"&lt;/span&gt;,
    &lt;span class="nv"&gt;severity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;DEBUG|INFO|WARNING|ERROR|FATAL|UNKNOWN&amp;gt;"&lt;/span&gt;,
    &lt;span class="nv"&gt;pattern_hash&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;hash of the recognized pattern&amp;gt;"&lt;/span&gt;,
    &lt;span class="nv"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;a sample message that is matched to this pattern&amp;gt;"&lt;/span&gt;, &lt;span class="c"&gt;# this can be replaced with Prometheus examplars in the future&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/coroot/coroot-aws-agent" rel="noopener noreferrer"&gt;AWS-agent&lt;/a&gt; exports the similar metrics &lt;a href="https://coroot.com/docs/metrics/aws-exporter#aws_rds_log_messages_total" rel="noopener noreferrer"&gt;aws_rds_log_messages_total&lt;/a&gt; related to every discovered RDS instance.&lt;/p&gt;

&lt;p&gt;After Prometheus has collected the metrics from the node-agents, you can query them using PromQL expressions like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sum &lt;/span&gt;by&lt;span class="o"&gt;(&lt;/span&gt;level&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;rate&lt;span class="o"&gt;(&lt;/span&gt;container_log_messages_total&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;container_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/docker/prometheus"&lt;/span&gt;&lt;span class="o"&gt;}[&lt;/span&gt;1m]&lt;span class="o"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How we use these metrics
&lt;/h2&gt;

&lt;p&gt;Coroot's Logs Inspection (part of our commercial product) uses these log-based metrics to highlight errors that correlate with the application's SLIs. Here is a sample report:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv77gkppk4gpj5lgz9lj2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv77gkppk4gpj5lgz9lj2.png" alt="logs inspection"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For each pattern, Coroot provides a full-text sample and a timeline of events broken down by the app's instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vr8cpy3grnbua80pb0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vr8cpy3grnbua80pb0y.png" alt="full log sample"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/coroot/coroot-node-agent" rel="noopener noreferrer"&gt;node-agent&lt;/a&gt;  (Apache-2.0 License)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/coroot/logparser" rel="noopener noreferrer"&gt;logparser&lt;/a&gt;  (Apache-2.0 License)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coroot.com/docs/inspections/logs" rel="noopener noreferrer"&gt;Logs Inspection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://coroot.com/auth/signup" rel="noopener noreferrer"&gt;Try Coroot for free&lt;/a&gt; (14-day trial)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>monitoring</category>
      <category>sre</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
