DEV Community: Sonia Belokur

What should your key considerations be when choosing an IT monitoring solution?

Sonia Belokur — Thu, 07 Apr 2022 12:03:45 +0000

Choosing an IT monitoring solution? What should your key considerations be?

Define the problem you are trying to solve.

For example: you provide a software solution to your clients, but you only learn about an issue after they report it;
You use specialized software that requires many specialists to support;
You have a #monitoringsystem, but you're unhappy with the cost / quality / functionality.
Once the problem is defined, it is easier to move on.
Consider your preferences.

Think carefully about the monitoring solution you need.
Remember that free #monitoringtools come with limited features, so you may need to combine several IT monitoring tools to get complete functionality, which complicates maintenance and configuration.
Decide on your budget.
Different infrastructure solutions use different pricing models: per server, per instance, etc.
Consider a pay-as-you-go model to save budget and avoid paying for extras: you can start small by monitoring a single cluster and expand gradually over time. Note that even with “free” systems, training, implementation, and support fees end up costing between $50,000 and $100,000.
Implementation takes time.
Remember that choosing a monitoring solution doesn’t mean you will be proficient with it tomorrow.
Ensuring #compatibility between your IT infrastructure and a monitoring solution is a complex task for your provider. With a free monitoring system, watch out for risks such as poor performance and a cumbersome user interface, as these systems are often strung together from many open-source modules written by different people.
Pay attention to monitoring essentials.

Out-of-the-box #automation, customizable visualization, and multi-vendor support enhance your monitoring procedures and make it easier to scale your IT infrastructure. Having all of the above in one monitoring solution saves time and maintenance resources.

Need an #ITmonitoring solution or just a recommendation regarding your monitoring? Contact our InsightCat team via https://insightcat.com/
We appreciate new contacts and your monitoring cases :)

If you need an observability solution, check it out!

Sonia Belokur — Fri, 11 Feb 2022 13:43:44 +0000

https://youtu.be/tqcv_4bpRBE

InsightCat eBook about full-stack monitoring is coming soon!

Sonia Belokur — Thu, 03 Feb 2022 11:59:15 +0000

What is full-stack monitoring and why do you need it within your organization? What are the main processes and how does it work?

Get ready to learn more about full-stack #monitoring and keep up to date on the latest monitoring trends. The InsightCat #eBook is coming soon. Stay tuned so you don't miss it!

We are doing our best to fill the eBook with the most relevant information. Share your thoughts about topics that would interest you!

IT infrastructure monitoring: will your IT infrastructure survive the holiday season?

Sonia Belokur — Tue, 14 Dec 2021 13:50:41 +0000

At the end of the year, companies worldwide start preparing their corporate IT infrastructure for intensive traffic and unstable loads. This is when companies, especially in e-commerce, wholesale, and retail, make most of their annual revenue in just a few weeks. But the risk of being hit by a software outage increases too, no matter what industry your business belongs to.

Don’t see any risk of software downtime? Trust your provider? Well, as we saw in December 2021, even AWS was hit by an outage that took down websites and services such as Disney+, Slack, Alexa, etc. Within minutes, reports on Reddit from Amazon’s warehouse and delivery operations indicated that the issue had become nationwide. Reddit news: https://www.reddit.com/.../so_apparently_amazon_is_down.../

We have prepared useful tips and recommendations that can help your IT infrastructure handle traffic spikes, avoid software downtime, and keep you from losing revenue during the holiday season.

Top 10 holiday-proof tips to keep your IT infrastructure up during the holidays:

Monitor business-critical metrics with proper thresholds. Choose the right framework for IT infrastructure monitoring and alerting.
Get actionable insights. Don’t just collect raw data; get the most out of it so you know which steps to take, especially under the pressure of intensive traffic.
Streamline alerting. Make sure your alerts matter so you don't waste time on false alarms, e.g. when someone accidentally hits the load balancer with faulty requests.
Make sure your service can handle higher loads through auto-scaling. Check that the auto-scaling policy is configured and working.
Make sure your storage and disks have enough free space. Don’t rely on luck; always keep spare resources available.
Perform synthetic monitoring for your websites and APIs. It’s better to benchmark web services with emulated traffic than to discover their limits under real, unpredictable loads.
Define a “plan B” workflow in case third-party services go down. Be cautious, since even the most widely used platforms may fail.
Define the on-call workflow and engineers for weekends and holidays. It’s not a manager’s preference; it’s a necessity.
Keep software up to date. Apply updates, patches, and hotfixes in advance so that a bug doesn't cause downtime.
Enhance IT security. The holiday season (in fact, any season) isn't the right time to experience data breaches, leaks, or infected data from the Internet.
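To make the threshold, alerting, and free-space tips more concrete, here is a minimal Python sketch, assuming a simple polling setup where a script samples a metric periodically. It fires an alert only when free disk space stays below a threshold for several consecutive checks, which is one common way to filter out one-off blips. The `SustainedThresholdAlert` class, the 15% threshold, and the 3-sample window are hypothetical illustrations, not InsightCat's alerting logic or any product's API.

```python
import shutil
from collections import deque


def disk_free_percent(path: str = "/") -> float:
    """Return free space (as reported by shutil.disk_usage) as a percentage of total."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


class SustainedThresholdAlert:
    """Hypothetical helper: fire only when a metric stays below `threshold`
    for `window` consecutive samples.

    Requiring several consecutive breaches filters out one-off dips
    that would otherwise page the on-call engineer for nothing.
    """

    def __init__(self, threshold: float, window: int = 3):
        self.threshold = threshold
        # deque with maxlen keeps only the most recent `window` samples
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record a sample and return True if the alert should fire."""
        self.samples.append(value)
        return len(self.samples) == self.samples.maxlen and all(
            v < self.threshold for v in self.samples
        )


if __name__ == "__main__":
    # Hypothetical policy: alert when free space stays under 15% for 3 checks.
    alert = SustainedThresholdAlert(threshold=15.0, window=3)
    for reading in [20.0, 12.0, 18.0, 14.0, 13.0, 11.0]:
        print(reading, alert.observe(reading))
    print(f"Current free space on /: {disk_free_percent('/'):.1f}%")
```

With these sample readings, only the last one (11.0) triggers the alert, because it completes three consecutive values under 15%; the single dip to 12.0 earlier does not. In a real setup, the threshold and window would be tuned per metric, and the readings would come from your monitoring agent rather than a hard-coded list.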

Take care of your IT infrastructures!

Provided by InsightCat https://insightcat.com/

Full-Stack Monitoring & Incident Response Tendencies 2021

Sonia Belokur — Tue, 30 Nov 2021 15:48:33 +0000

While figures are persuasive and experience can be useful (and sometimes painful), trends can reveal some unexpected facts. We have put them together in a human-readable format to explain the main trends in the #monitoring and #incidentresponse niches.

What are the most frequently used monitoring approaches? What do DevOps engineers struggle with? Why is full-stack monitoring worth investing in?

InsightCat has analyzed survey data from technical experts, surveys from third-party sources and social media, and, of course, our own day-to-day experience working with businesses worldwide, from SMBs to enterprises.

The study had 100+ respondents, including tech experts and managers. The third-party resources are listed below:

Learn more about full-stack monitoring and incident response trends in 2021 👇

https://www.facebook.com/insightcat

Root cause analysis for enterprise & SMB infrastructure. Feedback is appreciated!

Sonia Belokur — Tue, 16 Nov 2021 14:25:01 +0000

For the past few months, my team has worked hard on developing our new root cause analysis solution, Incident Timeline.

InsightCat has launched Incident Timeline, a root cause analysis solution for IT experts who manage, view, and investigate software incidents. It is built into the InsightCat platform and provides IT specialists with automated root cause analysis, downtime details, and system behavior details.

Incident Timeline allows you to:

Obtain root cause analysis
Surface relevant insights
Enhance observability

Check out InsightCat's new update below to see how it works in practice, or register with InsightCat and try Incident Timeline for free.

Any feedback is appreciated, don't hesitate to share it in the comment section :)

InsightCat registration: https://portal.insightcat.com/register
Website: https://insightcat.com/

What do you do to prevent software downtime? Assessing? Tools? Metrics?

Sonia Belokur — Wed, 10 Nov 2021 10:01:28 +0000

I hope I'm not the only person interested in this question because their IT infrastructure is prone to failure (like any software in the world :)).

For the past few months, I have been searching for information about software downtime prevention: tips and tricks, best practices, and recommendations. I came across items like preventive maintenance, personnel training, etc.

I've also heard that some vendors offer software that predicts downtime. Such tools let IT experts get technical metrics that indicate the likelihood of downtime, track anomalies, reduce unplanned failures, etc. For example, you might be aware of solutions that diagnose IT infrastructure, like InsightCat https://insightcat.com/, Datadog https://www.datadoghq.com/, Dynatrace https://www.dynatrace.com/, etc.

How do you assess your system health and predict downtime? Do you use a downtime prevention tool for this? What critical metrics indicate that something is wrong with the system?

Thank you in advance.

Full-stack monitoring in one solution, or is it better to split the functions across several applications?

Sonia Belokur — Tue, 02 Nov 2021 12:25:09 +0000

First this question had my attention; now it has my curiosity :) Since I'm digging into the full-stack monitoring niche, I would appreciate your opinion on the question above.

On the one hand, it's always convenient to have a single monitoring solution for several purposes, e.g. IT infrastructure monitoring, log management, synthetic monitoring, etc. Whether your company is a startup or an enterprise, the benefits of an all-in-one solution are obvious:

One place for monitoring, logging, user experience, etc., and consequently one console instead of several.
Observability. Explore why an application or system behaves in a certain manner.
Real-time metrics. Correlate application metrics, transaction metrics, and infrastructure metrics to see what’s going on in your apps.

Logically, such an option helps save the company's budget and the mental health of managers and DevOps engineers :)

But on the other hand, most people I know are convinced that existing B2B solutions can't always handle all the processes and produce the expected results. I don't know why. Of course, we're talking about solutions such as InsightCat, Elastic.co, Splunk, etc. The main argument is that "there is no decent “full stack” option".

Perhaps it depends on the company's budget and its tech people's preferences. But from my personal perspective, a single full-stack solution is worth considering, especially if you have multiple clouds, SaaS, and a few hundred apps.

My company uses InsightCat https://insightcat.com/ and we like it. So I'm curious to know which approach your company prefers and why.

Do you stick with all-in-one or split things up?

Thank you.

What is the full-stack monitoring solution you use?

Sonia Belokur — Mon, 25 Oct 2021 08:26:17 +0000

Hello guys!

Could you please share which full-stack monitoring solution you use in your organization?

I've heard that many companies are using InsightCat (https://insightcat.com/), Datadog (https://www.datadoghq.com/), and Zabbix (https://www.zabbix.com/).
Personally, I've heard a lot about Zabbix (at least from my DevOps and sysadmin colleagues :)). If you use one of them, please tell me why, and what you would really recommend setting up in a big organization (enterprise, SMB).

To make the question more specific, let's focus on full-stack, infrastructure monitoring, and log management.

Thank you!

Full-stack infrastructure monitoring solution and my experience with InsightCat. From a DevOps newcomer.

Sonia Belokur — Tue, 05 Oct 2021 08:54:05 +0000

Hello all!

A little bit about me: I'm a newcomer to the DevOps world, and right now I'm trying to understand what I, as a junior DevOps engineer, should know and use to monitor IT infrastructure successfully.

My first major tasks sounded frightening at first. But later I understood that I just needed to find the right solution with built-in automation that I could confidently rely on. So my goal was to find a SaaS product to monitor a big IT ecosystem, prevent downtime, analyze log data, etc.

I went through different blogs, forums, and review websites, and the systems I came across everywhere were Datadog, New Relic, Zabbix, ELK, etc. My team and I decided to test all of them and then pick our favorite.

We started with Datadog. I have to admit that, as a person without a strong tech background, I couldn't really understand what to do in the initial steps. The learning curve is really steep, and it's hard to stay calm when you don't understand how to set the product up. The picture was much the same with the other products.

OK, I know my scenario perhaps isn't relevant for senior DevOps or SecOps specialists. When you've spent years in IT, it's obvious to you how to write tons of code and configure tools like Datadog. But my case is more about people who are trying to figure everything out quickly and hope that a tool will cover all business-critical needs. Especially when you want your boss to be proud of you :)

So, the choice was https://insightcat.com/. I had never come across them before and was curious to try this product.

The first good sign: I set it up quickly without needing to write a lot of code. Wow! I installed the agent using Telegraf, then connected my infrastructure to the InsightCat server, and finally started receiving data. Thanks to auto-discovery, InsightCat explored my system automatically.

My personal favorite parts are Logs and Insights. I was worried that I wouldn't find a solution that displays log data in such a user-friendly way.

And my need for downtime prevention was met by Insights. This tool lets you see the most important metrics, check whether the figures have increased, and get root cause analysis.

I'm not saying that other solutions are bad. Once again, I'm not a deep tech person. Yes, my skills need to get deeper. But even in my situation, there was a suitable solution.

Good job, guys from InsightCat!

Share your experience if you also used InsightCat or any other products mentioned above. Was Datadog also difficult for you? Maybe New Relic?

Thank you!