<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: captain.io</title>
    <description>The latest articles on DEV Community by captain.io (@mattiacapitanio).</description>
    <link>https://dev.to/mattiacapitanio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F384144%2F41965c27-d4d6-4cb4-a511-2233d587fc84.png</url>
      <title>DEV Community: captain.io</title>
      <link>https://dev.to/mattiacapitanio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattiacapitanio"/>
    <language>en</language>
    <item>
      <title>Distributed tracing: how to propagate the context with Redis</title>
      <dc:creator>captain.io</dc:creator>
      <pubDate>Sun, 17 May 2020 13:34:22 +0000</pubDate>
      <link>https://dev.to/mattiacapitanio/distributed-tracing-with-redis-451i</link>
      <guid>https://dev.to/mattiacapitanio/distributed-tracing-with-redis-451i</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://www.flickr.com/photos/baccharus/"&gt;Milestoned&lt;/a&gt; on &lt;a href="https://www.flickr.com/photos/baccharus/5817342671/"&gt;flickr.com&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;During an application-monitoring workshop, I was introduced to distributed tracing. It caught my interest immediately, and I saw the potential this method offers for monitoring a distributed system in production.&lt;/p&gt;

&lt;p&gt;So, I started to learn about &lt;a href="https://www.jaegertracing.io/"&gt;Jaeger&lt;/a&gt;¹, &lt;a href="https://opentracing.io/"&gt;OpenTracing&lt;/a&gt;², traces, spans, tags, …&lt;br&gt;
Then I instrumented my first APIs, using HTTP headers as the transport layer, and everything was fine. I was able to see tracing in action!&lt;/p&gt;

&lt;p&gt;But then I wondered: how can I instrument services that don’t expose APIs and don’t talk to each other over HTTP? I mean services that are part of the same pipeline, that work on the same data in an event-driven architecture, or that communicate through direct requests, and where I can’t use an HTTP header to propagate the context information downstream.&lt;/p&gt;
&lt;h2&gt;
  
  
  Distributed tracing: a very brief description
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Distributed tracing is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.³&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That means you can easily understand and monitor your services in production in a visual way. Tracing adds observability. &lt;/p&gt;

&lt;p&gt;Thanks to that, troubleshooting teams can analyze issues and debug every part of a system, which shortens the time needed to find a root cause. It also helps developers understand the system better when they add a feature or introduce an improvement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kx7rcDpy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2148/1%2AQRpDqUaLtHCVhGhW1QM_JA.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kx7rcDpy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2148/1%2AQRpDqUaLtHCVhGhW1QM_JA.jpeg" alt="alt text" title="Example of trace and spans"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since this article is not an introduction to distributed tracing, I won’t go deeper into the topic here. If you want to understand better what tracing is, see the articles listed in the References section below.&lt;/p&gt;
&lt;h2&gt;
  
  
  Redis as a message queue
&lt;/h2&gt;

&lt;p&gt;Coming back to my earlier question: when you can’t propagate the context information through HTTP headers, you can use a &lt;em&gt;message queue&lt;/em&gt; instead.&lt;/p&gt;

&lt;p&gt;So, I created the example application &lt;a href="https://github.com/mattiacapitanio/distributed-tracing-with-redis"&gt;&lt;em&gt;Distributed Tracing with Redis&lt;/em&gt;&lt;/a&gt;⁴ in which I simulated a pipeline composed of 3 different apps: one &lt;em&gt;main service&lt;/em&gt; and two different &lt;em&gt;workers&lt;/em&gt; that work on the same data. &lt;/p&gt;

&lt;p&gt;In this example, the &lt;em&gt;main service&lt;/em&gt; starts the two &lt;em&gt;workers&lt;/em&gt; by executing a shell command, and it passes each of them a parameter that contains the &lt;em&gt;job id&lt;/em&gt;.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
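
&lt;p&gt;As a rough illustration of that launch step, here is a minimal Python sketch that starts two worker processes one after the other and hands each the job id as a command-line argument. It is not the code of the example project: &lt;code&gt;WORKER_SOURCE&lt;/code&gt;, &lt;code&gt;run_workers&lt;/code&gt;, the worker names, and the job id are all hypothetical, and the "worker" here only echoes what it receives.&lt;/p&gt;

```python
import subprocess
import sys

# Hypothetical stand-in for a worker app: it only echoes its name and job id.
WORKER_SOURCE = "import sys; print('worker', sys.argv[1], 'got job', sys.argv[2])"


def run_workers(job_id, worker_names=("first", "second")):
    """Run each worker as a separate process, one after the other."""
    outputs = []
    for name in worker_names:
        # subprocess.run blocks until the process exits, so the second
        # worker starts only after the first one has completed.
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, name, job_id],
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    for line in run_workers("job-42"):
        print(line)
```

&lt;p&gt;Only the job id crosses the process boundary here. The tracing context itself travels through Redis, as described in the next section.&lt;/p&gt;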


&lt;h3&gt;
  
  
  The application’s transaction cycle
&lt;/h3&gt;

&lt;p&gt;In the application, the &lt;em&gt;main service&lt;/em&gt; starts a new tracing span and then propagates the span context by saving it in Redis, which acts as our message queue. The context is saved using the &lt;em&gt;job id&lt;/em&gt; as the key. Then the &lt;em&gt;main service&lt;/em&gt; starts the two &lt;em&gt;workers&lt;/em&gt;, providing only the &lt;em&gt;job id&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;When the two &lt;em&gt;workers&lt;/em&gt; start, it is their responsibility to retrieve the context from Redis, create a new span, and establish a relationship with the main span.&lt;/p&gt;

&lt;p&gt;The two &lt;em&gt;workers&lt;/em&gt; are executed sequentially, so the second starts only when the &lt;em&gt;first worker&lt;/em&gt; has completed. All the apps simulate the execution of internal tasks, sending tracing spans, tags, and logs to the Jaeger Agent.&lt;/p&gt;

&lt;p&gt;The following image illustrates what these apps do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W-niZkjN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A8-f-T2GJyBFSAfm3KWO0jw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W-niZkjN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A8-f-T2GJyBFSAfm3KWO0jw.jpeg" alt="alt text" title="The application's transaction cycle"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The architecture
&lt;/h3&gt;

&lt;p&gt;The application’s architecture is mainly composed of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Apps module&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Redis, to propagate the span context&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the Jaeger stack: Agent, Collector, Query/UI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elasticsearch, to save the traces&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The image below shows the tracing architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DA8jRaCf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2008/1%2Ah49fpKFCDuBZM9VziE1dPg.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DA8jRaCf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2008/1%2Ah49fpKFCDuBZM9VziE1dPg.jpeg" alt="alt text" title="Tracing architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s jump into the code!
&lt;/h2&gt;

&lt;p&gt;I instrumented the components using OpenTracing, adding to each of them the code that sends spans to the Jaeger Agent. To do that, I used the &lt;a href="https://www.jaegertracing.io/docs/1.17/client-libraries/"&gt;Jaeger clients&lt;/a&gt; and the &lt;a href="https://opentracing.io/guides/"&gt;OpenTracing libraries&lt;/a&gt; for NodeJS and Python.&lt;/p&gt;

&lt;p&gt;In the following sections, I briefly describe how I instrumented the code to propagate the context.&lt;/p&gt;

&lt;h3&gt;
  
  
  The main service
&lt;/h3&gt;

&lt;p&gt;In the following snippets of code, the &lt;em&gt;main service&lt;/em&gt; creates the new &lt;em&gt;main span&lt;/em&gt;, saves the context information, and then starts up the two &lt;em&gt;workers&lt;/em&gt; providing the &lt;em&gt;job id&lt;/em&gt;. &lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;As shown in the code, the span context is saved in Redis as a key-value pair, with the &lt;em&gt;job id&lt;/em&gt; as the key. The value holds the span context in the &lt;a href="https://doc.esdoc.org/github.com/opentracing/opentracing-javascript/variable/index.html#static-variable-FORMAT_TEXT_MAP"&gt;TEXT_MAP format&lt;/a&gt; provided by OpenTracing.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
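
&lt;p&gt;To make the key-value layout concrete, here is a minimal Python sketch that uses only the standard library. It assumes the text-map carrier has already been filled (with OpenTracing this is what &lt;code&gt;tracer.inject&lt;/code&gt; does) and that it is stored as a JSON string. &lt;code&gt;InMemoryStore&lt;/code&gt;, &lt;code&gt;save_span_context&lt;/code&gt;, the job id, and the ids inside the carrier are hypothetical names and values chosen for illustration, not taken from the example project.&lt;/p&gt;

```python
import json


class InMemoryStore:
    """Stand-in for a Redis client; redis-py clients offer set/get under the same names."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def save_span_context(store, job_id, carrier):
    """Store the text-map carrier under the job id as a JSON string."""
    store.set(job_id, json.dumps(carrier))


# Jaeger's text-map propagation uses the "uber-trace-id" key with the value
# "trace-id:span-id:parent-span-id:flags". The ids below are made up.
carrier = {"uber-trace-id": "4bf92f3577b34da6:00f067aa0ba902b7:0:1"}

store = InMemoryStore()
save_span_context(store, "job-42", carrier)
print(store.get("job-42"))
```

&lt;p&gt;Serializing the carrier as JSON is one possible choice. Any encoding works as long as the workers decode it the same way.&lt;/p&gt;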


&lt;h3&gt;
  
  
  The worker apps
&lt;/h3&gt;

&lt;p&gt;Below is the snippet of code that the two apps use to read the span context from Redis.&lt;/p&gt;

&lt;p&gt;When the &lt;em&gt;worker&lt;/em&gt; apps start, the &lt;em&gt;job id&lt;/em&gt; is provided as an input parameter. The &lt;em&gt;worker&lt;/em&gt; extracts the main span’s context from Redis using the &lt;em&gt;job id&lt;/em&gt; key and then creates a &lt;em&gt;continuation span&lt;/em&gt; that is attached to the main span.&lt;/p&gt;

&lt;p&gt;Then some tasks are executed internally by the &lt;em&gt;worker&lt;/em&gt;. &lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
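
&lt;p&gt;The read side can be sketched in the same spirit. The following standard-library Python example assumes the carrier was stored as a JSON string under the job id and that it uses Jaeger’s &lt;code&gt;uber-trace-id&lt;/code&gt; key. With OpenTracing, the parsing step is what &lt;code&gt;tracer.extract&lt;/code&gt; does for you. &lt;code&gt;ParentContext&lt;/code&gt; and &lt;code&gt;load_parent_context&lt;/code&gt; are hypothetical names, and a plain dict stands in for Redis.&lt;/p&gt;

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentContext:
    trace_id: str
    span_id: str
    sampled: bool


def load_parent_context(store, job_id):
    """Read the carrier saved under the job id and parse its uber-trace-id value."""
    raw = store.get(job_id)
    if raw is None:
        raise KeyError(f"no span context stored for job {job_id!r}")
    if isinstance(raw, bytes):  # a real Redis client may return bytes
        raw = raw.decode("utf-8")
    carrier = json.loads(raw)
    trace_id, span_id, _parent_id, flags = carrier["uber-trace-id"].split(":")
    # The lowest bit of the hexadecimal flags field marks the trace as sampled.
    return ParentContext(trace_id, span_id, int(flags, 16) % 2 == 1)


# A plain dict stands in for Redis here; the ids are made up.
store = {"job-42": json.dumps({"uber-trace-id": "4bf92f3577b34da6:00f067aa0ba902b7:0:1"})}
print(load_parent_context(store, "job-42"))
```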


&lt;p&gt;During the execution of the internal tasks, new &lt;em&gt;child spans&lt;/em&gt; are created, each with a &lt;em&gt;follows from&lt;/em&gt; reference to the propagated parent span context. All the &lt;em&gt;child spans&lt;/em&gt; are displayed in the same trace, as children of the main process span.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
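
&lt;p&gt;What such a reference amounts to can be shown with a toy data model. The Python sketch below is not the OpenTracing API: &lt;code&gt;Span&lt;/code&gt; and &lt;code&gt;start_follows_from&lt;/code&gt; are hypothetical, and the operation names are invented. It only illustrates that a span created this way shares the parent’s trace id and records a &lt;em&gt;follows from&lt;/em&gt; link to the parent’s span id.&lt;/p&gt;

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Span:
    operation: str
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    # Each reference is a (reference_type, referenced_span_id) pair.
    references: list = field(default_factory=list)


def start_follows_from(operation, parent):
    """Create a span in the parent's trace that references it as follows_from."""
    return Span(operation, parent.trace_id, references=[("follows_from", parent.span_id)])


main_span = Span("main-service", trace_id=secrets.token_hex(8))
tasks = [start_follows_from(name, main_span) for name in ("load-data", "process-data")]
for task in tasks:
    print(task.operation, task.trace_id == main_span.trace_id, task.references)
```

&lt;p&gt;Because every span carries the same trace id, a backend such as Jaeger can group them into a single trace and use the references to place them under the main span.&lt;/p&gt;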


&lt;p&gt;A very similar Python version of the code is displayed below for the &lt;em&gt;second worker&lt;/em&gt;.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h2&gt;
  
  
  Visualizing traces
&lt;/h2&gt;

&lt;p&gt;If you deploy and start all the components and open a browser at &lt;a href="http://localhost:16686/"&gt;http://localhost:16686/&lt;/a&gt;, you will be able to see all the traces sent by the three apps in the Jaeger UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TQYs2_ND--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3040/1%2ApYgtT_-i3AFFr21bmQNoZA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TQYs2_ND--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3040/1%2ApYgtT_-i3AFFr21bmQNoZA.png" alt="alt text" title="Jaeger UI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you select a trace, you can see its details and the time spent by each component, including the time spent on internal tasks.&lt;/p&gt;

&lt;p&gt;In the example below, we can see: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;the main span created by the &lt;em&gt;main service&lt;/em&gt; (in blue)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the spans created around the tasks executed by the &lt;em&gt;first worker&lt;/em&gt; (in yellow)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the spans created and attached by the &lt;em&gt;second worker&lt;/em&gt; (in brown)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All the spans are part of the same trace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6pNide93--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3072/1%2AvR6KGFdStQt5zWcTK8ZW7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6pNide93--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3072/1%2AvR6KGFdStQt5zWcTK8ZW7g.png" alt="alt text" title="Application's trace and spans"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, I presented a possible way to instrument apps for distributed tracing when the services in your system are triggered by direct requests rather than through REST APIs. In that case, HTTP headers are not available to propagate the context, and you don’t want to introduce them into your system.&lt;/p&gt;

&lt;p&gt;The goal can be reached by instrumenting the apps with Jaeger and OpenTracing, and by adding a message queue to the architecture as the transport layer.&lt;/p&gt;

&lt;p&gt;The presented solution can also be adapted to an event-driven architecture, in which the services already use a message queue to share process information. You can simply add the span context information to the messages in the queue.&lt;/p&gt;
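
&lt;p&gt;For the event-driven variant, one way to carry the context is to wrap it in the message itself. The Python sketch below uses the standard library &lt;code&gt;queue.Queue&lt;/code&gt; as a stand-in for a real message queue. The envelope fields &lt;code&gt;payload&lt;/code&gt; and &lt;code&gt;trace_context&lt;/code&gt;, the helper names, and the ids are all assumptions made for illustration.&lt;/p&gt;

```python
import json
import queue


def publish(q, payload, carrier):
    """Wrap the payload and the tracing carrier in one JSON message."""
    q.put(json.dumps({"payload": payload, "trace_context": carrier}))


def consume(q):
    """Return the payload and the carrier of the next message."""
    message = json.loads(q.get_nowait())
    return message["payload"], message["trace_context"]


events = queue.Queue()
publish(
    events,
    {"job_id": "job-42"},
    {"uber-trace-id": "4bf92f3577b34da6:00f067aa0ba902b7:0:1"},  # made-up ids
)
payload, carrier = consume(events)
print(payload, carrier)
```

&lt;p&gt;With this layout the consumer no longer needs a separate lookup by job id, because the context arrives together with the data it describes.&lt;/p&gt;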




&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;p&gt;Some very useful Medium articles that helped me:&lt;/p&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/swlh/microservices-observability-with-distributed-tracing-32ae467bb72a" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hpJaKeaE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2AqtGKCVmp772Rr8G9rfTcbg.jpeg" alt="Uzziah Eyee"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/swlh/microservices-observability-with-distributed-tracing-32ae467bb72a" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Microservices Observability with Distributed Tracing.&lt;/h2&gt;
      &lt;h3&gt;Uzziah Eyee ・ &lt;time&gt;Jul 23, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_EkM13RG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-2c57b3ec653e92a3d6207e708f1e4987099fc69342e556aaf9f035b1968b3f26.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;



&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/@codeboten/redis-tributed-distributed-tracing-through-redis-9b671187da47" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0AbJV8JI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2AR-kx0Dvn4tAZ4fs-fBej7g.png" alt="Alex Boten"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/@codeboten/redis-tributed-distributed-tracing-through-redis-9b671187da47" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Redis-tributed: distributed tracing through Redis - Alex Boten - Medium&lt;/h2&gt;
      &lt;h3&gt;Alex Boten ・ &lt;time&gt;Mar 31, 2019&lt;/time&gt; ・ 5 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_EkM13RG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-2c57b3ec653e92a3d6207e708f1e4987099fc69342e556aaf9f035b1968b3f26.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;



&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/jaegertracing/weave-trimmed-troubleshooting-fat-cut-api-response-time-from-seconds-to-milliseconds-with-jaeger-ab594c22c921" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7TmcZiUV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/0%2ADMbsu4pYegfyqfxU." alt="Orate"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/jaegertracing/weave-trimmed-troubleshooting-fat-cut-api-response-time-from-seconds-to-milliseconds-with-jaeger-ab594c22c921" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Weave trimmed troubleshooting fat, cut API response time from seconds to milliseconds with Jaeger&lt;/h2&gt;
      &lt;h3&gt;Orate ・ &lt;time&gt;Feb 28, 2020&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_EkM13RG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-2c57b3ec653e92a3d6207e708f1e4987099fc69342e556aaf9f035b1968b3f26.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;





&lt;h3&gt;
  
  
  Footnotes
&lt;/h3&gt;

&lt;p&gt;[1] &lt;em&gt;Jaeger&lt;/em&gt;: open-source software for tracing transactions between distributed services.&lt;/p&gt;

&lt;p&gt;[2] &lt;em&gt;OpenTracing&lt;/em&gt;: a specification that gives developers a standard way to instrument their code base for distributed tracing.&lt;/p&gt;

&lt;p&gt;[3] &lt;em&gt;What is tracing&lt;/em&gt;: &lt;a href="https://opentracing.io/docs/overview/what-is-tracing/"&gt;https://opentracing.io/docs/overview/what-is-tracing/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;em&gt;GitHub project&lt;/em&gt;: &lt;a href="https://github.com/mattiacapitanio/distributed-tracing-with-redis"&gt;https://github.com/mattiacapitanio/distributed-tracing-with-redis&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tracing</category>
      <category>jaeger</category>
      <category>observability</category>
      <category>redis</category>
    </item>
  </channel>
</rss>
