Vasyl Kutsyk

Posted on Nov 8, 2021

Know your Azure environment better with Grafana

#grafana #azure #monitoring #devops

Originally posted here

Monitoring a production environment is an essential part of any product, regardless of its size. For big production environments, efficient monitoring is what separates a good quality of service from a bad one.

A brief backstory: our product is delivered to several different clients. That means we run a multi-tenant environment, and each tenant requires proper monitoring. Every tenant has its own production subscription and independent infrastructure.

At the start, all our monitoring was built on native Azure resources: all data went into Application Insights (Log Analytics), and dashboards were built in the same environment. As a result, when we needed to investigate the root cause of an issue, we spent approximately 1 hour finding the exact component that caused it, which was really painful for our clients. Incidents felt like looking for a needle in a haystack.

Here is an example of a dashboard in Azure:

If you are using Azure, I’m sure you know how bad its mobile compatibility is, so I won’t even include a screenshot of that here.

So the things we were missing were:

Multi-chart graphs
Dynamic variables, to easily choose the resources we want to show
A usable date picker (the Azure one could compete for a worst-design award)
A responsive view
Data from sources other than Azure metrics
Simple access management
Change history

We started looking for a better approach to observability and, logically, tried the top open-source tool for this: Grafana.

Grafana is the open-source analytics and monitoring solution for every database. https://grafana.com/grafana

I won’t cover how to spin up your Grafana instance, but I’ll show you how we improved our observability and how that improved our issue detection.

Some of the pros of using Grafana that we found are:

Different data sources
Responsive dashboards
Multi-chart graphs
Great choice of plugins
Awesome date picker that just works
Dynamic variables
Clear access management
Change history (versioning)
Rich community dashboard library

All these features brought a new level of observability to our reliability team.

Here I’ll show you an example of how we are monitoring the load on our database depending on the number of requests and nodes from our API.

We just need to choose a subscription in the first variable, and all the other variables are automatically populated with the proper values through dynamic queries.

This is only a brief example of the possibilities Grafana opens up for observability of Azure resources. The ability to connect different data sources helps you build a dashboard on which you can monitor both your background jobs hosted in Azure and the data layer hosted in Elasticsearch.

To connect your Azure resources to Grafana you should use the Azure Monitor plugin: https://grafana.com/grafana/plugins/grafana-azure-monitor-datasource/

With this plugin, you can create dynamic variables that make it easy to choose the resource you want to get data for.
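To make the idea of chained variables more concrete, here is a minimal Python sketch that builds the `templating` part of a dashboard as a plain dictionary. It is an illustration under several assumptions, not our exact setup: the data source name, the variable names, and the `Microsoft.Sql/servers/databases` namespace are hypothetical choices, and the function-style queries (`Subscriptions()`, `ResourceGroups(...)`, `ResourceNames(...)`) follow the template query syntax the plugin documented around the time of writing, so check the plugin page for the syntax your version supports.

```python
import json

# Assumption: the name you gave the Azure Monitor data source in Grafana.
DATASOURCE = "Azure Monitor"


def query_variable(name, label, query):
    """Return one query-type template variable shaped like Grafana's dashboard JSON."""
    return {
        "type": "query",
        "name": name,
        "label": label,
        "datasource": DATASOURCE,
        "query": query,
        "refresh": 1,  # assumption: 1 means "refresh on dashboard load"
    }


# Each variable's query references the variable defined before it,
# so picking a subscription narrows every later drop-down.
variables = [
    query_variable("subscription", "Subscription", "Subscriptions()"),
    query_variable("resource_group", "Resource group",
                   "ResourceGroups($subscription)"),
    query_variable("resource", "Resource",
                   "ResourceNames($subscription, $resource_group, "
                   "Microsoft.Sql/servers/databases)"),
]

templating = {"list": variables}
print(json.dumps(templating, indent=2))
```

The order matters: Grafana resolves `$subscription` first, then uses its value when running the resource group query, and so on down the chain.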

A full list of supported queries is available on the plugin page. After you have created your variables, configure your charts to use the template variables in their data source queries:

This lets you build awesome dashboards that help you detect issues more efficiently.
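As a rough sketch of what using template variables in a chart query means, the Python snippet below defines an illustrative panel query that refers to variables by `$name` and then mimics the substitution Grafana performs when you pick values in the drop-downs. The field names in `target` only approximate the plugin's panel JSON and may differ between versions, the selected values are made up, and `resolve` is a hypothetical helper for illustration, not part of Grafana.

```python
from string import Template

# Illustrative panel query; the field names are assumptions, not a guaranteed schema.
target = {
    "queryType": "Azure Monitor",
    "subscription": "$subscription",
    "azureMonitor": {
        "resourceGroup": "$resource_group",
        "resourceName": "$resource",
        "metricDefinition": "Microsoft.Sql/servers/databases",
        "metricName": "cpu_percent",
        "aggregation": "Average",
    },
}


def resolve(node, selected):
    """Recursively replace $name placeholders with the selected variable values."""
    if isinstance(node, dict):
        return {key: resolve(value, selected) for key, value in node.items()}
    if isinstance(node, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(node).safe_substitute(selected)
    return node


# Hypothetical values chosen in the dashboard drop-downs.
selected = {
    "subscription": "tenant-a-prod",
    "resource_group": "rg-data",
    "resource": "sql-main/orders",
}

resolved = resolve(target, selected)
print(resolved["azureMonitor"]["resourceName"])
```

Because the panel only stores placeholders, the same chart works for every tenant: switching the subscription variable changes the resolved query without editing the panel.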

Here are some of our dashboards:

Combining all these possibilities, we built a General dashboard that shows all the important metrics of each system and whether there are any issues. This decreased our time to detection from ~1h to ~10min, which helped us prevent a lot of incidents in production and sped up incident resolution.

Our postmortems became more scientific: we could see many more metrics, and thanks to the awesome date picker our timelines became precise instead of approximate.

Since introducing Grafana in our product, we consider it one of the best choices we have made to improve our monitoring.

And what did you do to improve your observability?

Thanks for reading!
If you have an interesting experience with Grafana or you are interested in another topic, please add comments and upvote 👍. I'm interested in the dialog.
