<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Julien Breux</title>
    <description>The latest articles on DEV Community by Julien Breux (@julienbreux).</description>
    <link>https://dev.to/julienbreux</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F162865%2F25f4cf26-50dd-40ec-b53f-8f8ce6a958e4.JPG</url>
      <title>DEV Community: Julien Breux</title>
      <link>https://dev.to/julienbreux</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/julienbreux"/>
    <language>en</language>
    <item>
      <title>From Istio 1.3.x to 1.4.x after a memory leak 🚀</title>
      <dc:creator>Julien Breux</dc:creator>
      <pubDate>Wed, 08 Jul 2020 12:27:28 +0000</pubDate>
      <link>https://dev.to/julienbreux/from-istio-1-3-x-to-1-4-x-after-a-memory-leak-443l</link>
      <guid>https://dev.to/julienbreux/from-istio-1-3-x-to-1-4-x-after-a-memory-leak-443l</guid>
      <description>&lt;h1&gt;
  
  
  Context
&lt;/h1&gt;

&lt;p&gt;Recently, I designed the new Ornikar platform with &lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt; and &lt;a href="https://istio.io/"&gt;Istio&lt;/a&gt; on &lt;a href="https://cloud.google.com/"&gt;Google Cloud&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt;, it's good, everyone knows and it's fantastic. OK.&lt;br&gt;
But &lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt; without service-mesh and without the super powers of &lt;a href="https://istio.io/"&gt;Istio&lt;/a&gt;, it's a bit like Superman can only fly two hours a day, it's cool, but it's very limited.&lt;/p&gt;

&lt;p&gt;Joking aside, I set up &lt;a href="https://istio.io/"&gt;Istio&lt;/a&gt; version &lt;code&gt;1.3.x&lt;/code&gt;. The trouble started when I detected a memory leak on one of our gateways.&lt;/p&gt;

&lt;h2&gt;
  
  
  First memory leak detection
&lt;/h2&gt;

&lt;p&gt;The leak was first detected thanks to external observability.&lt;br&gt;
I use &lt;a href="https://www.pingdom.com/"&gt;Pingdom&lt;/a&gt; to measure the uptime, availability, and response time of my external endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkh1czk6c1mqvvdlvw9cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkh1czk6c1mqvvdlvw9cb.png" alt="Pingdom problem" width="684" height="932"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Naturally, I first checked whether this was a false positive on the &lt;a href="https://www.pingdom.com/"&gt;Pingdom&lt;/a&gt; side.&lt;br&gt;
It was not.&lt;/p&gt;
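&lt;p&gt;For readers unfamiliar with external uptime monitoring, the kind of check Pingdom runs can be approximated by a small probe: request an endpoint from outside the cluster, record whether it answered with a non-error status, and time the round trip. The sketch below only illustrates the idea with the Python standard library; it is not how Pingdom is implemented, and the &lt;code&gt;probe&lt;/code&gt; function and any URL you pass to it are hypothetical.&lt;/p&gt;

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> dict:
    """Fetch `url` once and report availability and response time.

    The check counts as "up" when the server answers with a non-error
    HTTP status (below 400) before `timeout` seconds elapse.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no status at all.
        status = None
    elapsed_ms = (time.monotonic() - start) * 1000
    up = status is not None and 400 > status
    return {"url": url, "up": up, "status": status, "response_ms": elapsed_ms}
```

&lt;p&gt;Running such a probe on a schedule from several locations, and alerting when &lt;code&gt;up&lt;/code&gt; is false or &lt;code&gt;response_ms&lt;/code&gt; keeps drifting upward, is roughly what a hosted uptime service does for you.&lt;/p&gt;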

&lt;h2&gt;
  
  
  Memory leak detection confirmation
&lt;/h2&gt;

&lt;p&gt;Then I quickly confirmed it with our amazing monitoring stack: &lt;a href="https://prometheus.io/"&gt;Prometheus&lt;/a&gt;, &lt;a href="https://thanos.io/"&gt;Thanos&lt;/a&gt;, and &lt;a href="https://grafana.com/grafana/"&gt;Grafana&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa8j5qpauu8nfqks6yfkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa8j5qpauu8nfqks6yfkc.png" alt="Grafana problem" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This graph lines up with the uptime graph above.&lt;br&gt;
The memory leak is clearly visible.&lt;br&gt;
At that point, I had to act!&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem solved
&lt;/h2&gt;

&lt;p&gt;To fix the problem, a simple upgrade of &lt;a href="https://istio.io/"&gt;Istio&lt;/a&gt; to the next minor version, &lt;code&gt;1.4.x&lt;/code&gt;, was enough.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fi572dfn282vmon8qn593.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fi572dfn282vmon8qn593.png" alt="Github PR" width="800" height="624"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was very simple for us because the whole infrastructure is managed "as code", and we use &lt;a href="https://github.com/"&gt;GitHub&lt;/a&gt; and &lt;a href="https://codefresh.io/"&gt;Codefresh&lt;/a&gt; to deploy it.&lt;/p&gt;

&lt;p&gt;Finally, I just had to restart each deployment so that its pods picked up the updated &lt;a href="https://www.envoyproxy.io/"&gt;Envoy&lt;/a&gt; sidecar.&lt;/p&gt;
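&lt;p&gt;The restart is needed because upgrading the Istio control plane does not touch running pods: the Envoy sidecar is injected when a pod is created, so only recreated pods receive the new proxy version (assuming automatic sidecar injection). With &lt;code&gt;kubectl&lt;/code&gt; 1.15 or later, &lt;code&gt;kubectl rollout restart deployment -n NAMESPACE&lt;/code&gt; restarts every Deployment in a namespace by bumping a pod-template annotation. The Python sketch below reproduces that idea, assuming the official &lt;code&gt;kubernetes&lt;/code&gt; client is installed and a kubeconfig is available; &lt;code&gt;restart_patch&lt;/code&gt; and &lt;code&gt;restart_all_deployments&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```python
from datetime import datetime, timezone
from typing import Optional


def restart_patch(now: Optional[datetime] = None) -> dict:
    """Build a patch like the one `kubectl rollout restart` applies.

    Bumping this pod-template annotation changes the template, so the
    Deployment rolls out fresh pods; with automatic injection, each new
    pod gets the sidecar version of the upgraded control plane.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"kubectl.kubernetes.io/restartedAt": stamp}
                }
            }
        }
    }


def restart_all_deployments(namespace: str) -> list[str]:
    """Restart every Deployment in `namespace` (needs the `kubernetes` package)."""
    # Imported here so restart_patch() stays usable without the dependency.
    from kubernetes import client, config

    config.load_kube_config()  # uses the current kubectl context
    apps = client.AppsV1Api()
    restarted = []
    for deployment in apps.list_namespaced_deployment(namespace).items:
        name = deployment.metadata.name
        apps.patch_namespaced_deployment(name=name, namespace=namespace, body=restart_patch())
        restarted.append(name)
    return restarted
```

&lt;p&gt;Patching the template rather than deleting pods keeps the Deployment's rolling-update strategy in charge, so pods are replaced gradually instead of all at once.&lt;/p&gt;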

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6ercq2r1c7uclucchzje.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6ercq2r1c7uclucchzje.png" alt="Grafana fix graph daily" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graph above shows that the upgraded version of Istio did fix the memory leak.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4yhv5h49pqwtcxey6hr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4yhv5h49pqwtcxey6hr5.png" alt="Grafana fix graph" width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This longer-range view also shows that the fix holds over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some conclusions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  First conclusion
&lt;/h3&gt;

&lt;p&gt;When people talk to you about "observability", take it seriously.&lt;br&gt;
Many people forget that observability is, above all, about having eyes on what we run.&lt;br&gt;
In this case, if I had not set up &lt;a href="https://www.pingdom.com/"&gt;Pingdom&lt;/a&gt; and &lt;a href="https://grafana.com/grafana/"&gt;Grafana&lt;/a&gt;, I would never have been able to detect the failure.&lt;br&gt;
Without them, we could not have been proactive: the platform's users would have been the ones reporting the disruptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Second conclusion
&lt;/h3&gt;

&lt;p&gt;Each time you install a component, whether infrastructure or software, add the right metrics to know whether it is healthy.&lt;/p&gt;
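&lt;p&gt;As a concrete illustration of "adding the right metrics", a component can expose a few health gauges over HTTP in the Prometheus text format so that Prometheus can scrape them. In practice you would use an official client library such as &lt;code&gt;prometheus_client&lt;/code&gt; for Python; the standard-library sketch below only shows the shape of the output, and the metric names are hypothetical.&lt;/p&gt;

```python
from http.server import BaseHTTPRequestHandler


def render_metrics(gauges: dict[str, tuple[str, float]]) -> str:
    """Render gauges in the Prometheus text exposition format.

    `gauges` maps a metric name to a (help text, current value) pair.
    """
    lines = []
    for name, (help_text, value) in sorted(gauges.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def current_gauges() -> dict[str, tuple[str, float]]:
    # Hypothetical health signals; replace with real checks for your component.
    return {
        "myapp_healthy": ("1 if the self-check passes, else 0.", 1.0),
        "myapp_queue_depth": ("Jobs waiting to be processed.", 0.0),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves current_gauges() on /metrics for Prometheus to scrape."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(current_gauges()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

&lt;p&gt;Passing &lt;code&gt;MetricsHandler&lt;/code&gt; to &lt;code&gt;http.server.HTTPServer&lt;/code&gt; and calling &lt;code&gt;serve_forever()&lt;/code&gt; would expose the endpoint; a client library also handles details this sketch ignores, such as escaping help text and label values.&lt;/p&gt;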

&lt;h3&gt;
  
  
  Last conclusion
&lt;/h3&gt;

&lt;p&gt;I like Istio, I like service mesh and I love Open Source. 🚀&lt;/p&gt;

&lt;h3&gt;
  
  
  Cover credit
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://julienberthier.org/love-love.html"&gt;julienberthier.org&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>istio</category>
      <category>grafana</category>
      <category>pingdom</category>
    </item>
  </channel>
</rss>
