Julien Breux

From Istio 1.3.x to 1.4.x after a memory leak 🚀

Context

Recently I designed the new Ornikar platform with Kubernetes and Istio on Google Cloud.

Kubernetes is good; everyone knows it, and it's fantastic. OK.
But Kubernetes without a service mesh, and without the superpowers of Istio, is a bit like a Superman who can only fly two hours a day: it's cool, but it's very limited.

Joking aside, I set up Istio in version 1.3.x. The trouble started when I detected a memory leak on one of our gateways.

First memory leak detection

The first detection came from external observability.
I use Pingdom to measure the uptime, availability and response time of my external endpoints.

[Image: Pingdom problem]

Naturally, I first checked whether this was a false positive on Pingdom's side.
It was not.

Memory leak detection confirmation

Then I easily found the cause with our amazing stack: Prometheus, Thanos and Grafana.

[Image: Grafana problem]

This graph should be read alongside the uptime graph.
We can clearly see the memory leak.
At this point, I had to act!
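The same reading of the graph can be approximated in code. The following is a minimal sketch, not what was actually used here: it assumes you have already exported memory samples as `(unix_seconds, bytes)` pairs (for example from Prometheus), and the function names and the 10 MiB/hour threshold are hypothetical choices for illustration.

```python
from statistics import mean


def memory_growth_per_hour(samples):
    """Return the least-squares slope of (unix_seconds, bytes) samples, in bytes per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    t_mean, v_mean = mean(times), mean(values)
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0:
        raise ValueError("samples must span more than one timestamp")
    slope_per_second = sum((t - t_mean) * (v - v_mean) for t, v in samples) / denom
    return slope_per_second * 3600


def looks_like_leak(samples, threshold_bytes_per_hour=10 * 1024 * 1024):
    """Flag a sustained upward trend; the threshold is an arbitrary example value."""
    return memory_growth_per_hour(samples) > threshold_bytes_per_hour
```

A steady upward slope over several hours is only a hint, not proof of a leak, so it is worth comparing against traffic before acting on it.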

Solving the problem

To correct the problem, a simple upgrade to the next version of Istio, 1.4.x, was enough.

[Image: GitHub PR]

This was very simple for us because the whole infrastructure is "as code" and we use GitHub and Codefresh to deploy.

To finish, I just had to restart each deployment so that its Envoy sidecar was updated.
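The post does not say how the restarts were done. One possible way, sketched below, is `kubectl rollout restart`, which assumes kubectl 1.15 or later and automatic sidecar injection so that recreated pods pick up the upgraded proxy. The namespace and deployment names are hypothetical.

```python
import subprocess


def rollout_restart_commands(deployments_by_namespace):
    """Build one `kubectl rollout restart` command per deployment."""
    commands = []
    for namespace, deployments in sorted(deployments_by_namespace.items()):
        for name in deployments:
            commands.append(
                ["kubectl", "rollout", "restart", f"deployment/{name}", "-n", namespace]
            )
    return commands


def restart_all(deployments_by_namespace, dry_run=True):
    """Print the commands by default; run them only when dry_run is False."""
    for command in rollout_restart_commands(deployments_by_namespace):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical namespace and deployment names, for illustration only.
    restart_all({"web": ["api", "frontend"]})
```

Keeping the dry run as the default makes it easy to review the list of restarts before touching the cluster.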

[Image: Grafana fix graph daily]

We can see on the graph above that the upgraded version of Istio did fix the memory leak.

[Image: Grafana fix graph]

We can also see that the fix is stable over time.

Some conclusions

First conclusion

When people talk to you about "observability", take it seriously.
Many people tend to forget that observability is, above all, about having eyes on what we do.
In this case, if I had not set up Pingdom and Grafana, I would never have been able to detect the failure myself.
Instead, in a non-proactive way, it would have been the platform users who reported the disturbances.

Second conclusion

Each time you install a component, whether infrastructure or software, add the right metrics to let you know whether the component is healthy.

Last conclusion

I like Istio, I like service mesh and I love Open Source. 🚀

Cover credit

julienberthier.org
