
TL;DR
Monitoring can be complex, especially given the number of available tools and the dynamic nature of environments.
In this article...
Definitely going to give Odigos a spin. That and Prometheus. Both would be for my self-hosted server stuff.
Happy to hear any feedback 😃
You also have Zabbix. It takes a little tweaking but is worth the effort. We use Zabbix because it's flexible and the data insert API is so simple a preschooler could write a client for it. We monitor about 250m data points every 24 hours.
IMHO: in 2023 Zabbix works only for infrastructure monitoring, such as hypervisors or Windows/Linux servers and desktops. For microservices or local development it's better to use something like VictoriaMetrics/Prometheus/Netdata. I think you know Zabbix's issues with SQL databases and how fast its database grows from GiB to TiB.
Great overview of monitoring tools! Having scaled our dev teams at Bubobot, I'd add a key consideration: monitoring frequency.
Most tools on this list check at 1-5 minute intervals, which can be problematic for critical services. For teams managing customer-facing applications, I recommend looking at monitoring interval as a critical factor. Our approach at Bubobot was to offer 20-second checks while preventing alert fatigue through "Confirmation Period" and "Recovery Period".
Feel free to try it here: bubobot.com
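To make the "Confirmation Period" / "Recovery Period" idea above concrete, here is a minimal Python sketch of how such debouncing might work. The semantics are an assumption, not Bubobot's documented behavior: a failure has to persist for the whole confirmation period before an alert fires, and a success has to persist for the whole recovery period before the alert is resolved. The `AlertDebouncer` class and `replay` helper are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlertDebouncer:
    """Hypothetical debouncer; the period semantics are assumed, not vendor-documented."""

    confirmation_period: float  # seconds a failure must persist before alerting
    recovery_period: float      # seconds a success must persist before resolving
    alerting: bool = False
    _streak_start: Optional[float] = None  # start time of the pending state change

    def observe(self, timestamp: float, healthy: bool) -> Optional[str]:
        """Feed one check result; return "ALERT", "RECOVERED", or None."""
        # A result that agrees with the current state cancels any pending change,
        # so a single failed (or single passing) check is treated as a blip.
        if healthy != self.alerting:
            self._streak_start = None
            return None

        # The result contradicts the current state: start or continue a streak.
        if self._streak_start is None:
            self._streak_start = timestamp

        required = self.recovery_period if self.alerting else self.confirmation_period
        if timestamp - self._streak_start >= required:
            self.alerting = not self.alerting
            self._streak_start = None
            return "ALERT" if self.alerting else "RECOVERED"
        return None


def replay(
    debouncer: AlertDebouncer, results: List[bool], interval: float = 20.0
) -> List[Tuple[float, str]]:
    """Replay evenly spaced check results and collect the emitted events."""
    events = []
    for i, healthy in enumerate(results):
        timestamp = i * interval
        event = debouncer.observe(timestamp, healthy)
        if event is not None:
            events.append((timestamp, event))
    return events
```

With 20-second checks, a 40-second confirmation period, and a 60-second recovery period, a single failed check produces no event, while three consecutive failures produce one alert. The trade-off is detection latency: the alert can never fire sooner than the confirmation period after the first failed check.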
Just a note: you start the article talking about K8s but most of these tools are not specific to that technology. Otherwise a good roundup! I'd also add SignalFx/Splunk to the list, though they are expensive (see-also: Datadog)
I ended up with HyperDX. It's also a good choice, is open source, and can be self-hosted. My previous choice was Uptrace, which is also not bad.
I recommend trying docs.victoriametrics.com/Single-se... for self-hosting, which supports ingesting metrics from DataDog, or victoriametrics.cloud.
New Relic is pretty good too. It has synthetic monitoring for testing and monitoring APIs.
downhound.com/ provides a good overview of external services that are down, in case you use any of those. Monitors several hundred services. (Disclosure: I'm the developer.)