Monitoring systems like Zabbix are often introduced with a clear promise: visibility, control, and early warning. In theory, if something starts to fail, you should see it before users notice. In practice, however, monitoring can quietly shift from being a tool for understanding systems to something that merely confirms they are still technically “alive.” This is where it starts to go wrong.
A common pattern in many environments is to reduce monitoring to a small set of indicators, often because of time constraints or a lack of clarity about what actually matters. Ping checks and basic disk usage are typical examples. They are easy to set up, easy to understand, and they produce clean green or red states in a dashboard. A host is either reachable or it is not. A disk is either above a threshold or below it. On the surface, this looks like responsible system oversight.
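To make the pattern concrete, here is a rough sketch in Python of what that minimal setup amounts to. The hostname, the single root mount, and the 70 percent threshold are illustrative assumptions, and the ping call assumes a Linux-style `ping` binary:

```python
import shutil
import subprocess

# What ping-and-disk monitoring boils down to: two booleans.
# HOST and the 70 percent threshold are illustrative, not recommendations.
HOST = "app01.example.internal"
DISK_THRESHOLD_PCT = 70

def host_pings(host: str) -> bool:
    """One ICMP echo request; assumes a Linux-style `ping`."""
    result = subprocess.run(["ping", "-c", "1", host], capture_output=True)
    return result.returncode == 0

def disk_usage_pct(path: str = "/") -> float:
    """Percentage of disk space in use at `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

if __name__ == "__main__":
    # The entire "health model" of this style of monitoring.
    print("reachable:", host_pings(HOST))
    print("disk ok:  ", disk_usage_pct() < DISK_THRESHOLD_PCT)
```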
The problem is that these signals say very little about whether a system is actually healthy.
A server can respond to ping perfectly while the application running on it is completely broken. Services can be degraded, queues can be backing up, authentication can be failing, and yet from a monitoring perspective everything appears fine. Ping only confirms network reachability, not usability. It is the equivalent of checking whether a building’s front door opens, without ever looking inside to see whether anything is on fire.
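Ping cannot see any of that, but an application-level probe can. A minimal sketch, assuming a hypothetical `/health` endpoint on the same host that answers ping:

```python
import urllib.error
import urllib.request

# An application-level probe: does the service actually answer usefully?
# The /health endpoint is hypothetical; substitute whatever the app exposes.
HEALTH_URL = "http://app01.example.internal/health"

def service_responds(url: str, timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A 2xx answer within the timeout is a bar ping never sets.
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP 5xx: all invisible to ping.
        return False

if __name__ == "__main__":
    print("service healthy:", service_responds(HEALTH_URL))
```

A host can pass the first check and fail this one for hours; that divergence is exactly the blind spot described above.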
Disk usage has a similar limitation. Knowing that a disk is 70 percent full does not tell you whether performance is degrading, whether logs are spiraling out of control, or whether a sudden spike is about to cause a critical outage. More importantly, it does not reflect the actual user experience or business impact. A system can have “healthy” disk levels and still be functionally unusable due to database locks, slow queries, or application-level failures.
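What matters for that “sudden spike” scenario is the trend, not the level. A rough sketch of the difference: two samples taken a few minutes apart give a crude time-to-full estimate, which a static threshold never captures (the interval and mount point are illustrative):

```python
import shutil
import time

# Percentage-in-use misses the trend. Two samples a few minutes apart
# give a crude time-to-full estimate; the interval is illustrative.
SAMPLE_INTERVAL_S = 300

def bytes_used(path: str = "/") -> int:
    return shutil.disk_usage(path).used

first = bytes_used()
time.sleep(SAMPLE_INTERVAL_S)
second = bytes_used()

growth_per_s = (second - first) / SAMPLE_INTERVAL_S
free = shutil.disk_usage("/").free

if growth_per_s > 0:
    hours_to_full = free / growth_per_s / 3600
    # A disk at 40% that fills in two hours is more urgent than a
    # quiet disk sitting at 85%.
    print(f"estimated time to full: {hours_to_full:.1f} h")
else:
    print("disk usage is flat or shrinking")
```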
The deeper issue is not the choice of metrics themselves, but the illusion of completeness they create. When monitoring is reduced to a handful of system-level checks, it becomes very easy to assume that “green” means “healthy.” This shifts attention away from what monitoring is supposed to achieve: understanding system behavior in a meaningful, contextual way.
Zabbix, like many monitoring tools, is not the problem. It is capable of deep observability if it is used that way. The issue lies in how it is often configured to reflect infrastructure states rather than service states. Infrastructure tells you what is happening at the machine level. Services tell you what is happening for users. The gap between those two is where incidents hide.
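Zabbix itself can carry service-level signals just as easily as infrastructure ones, for instance through a trapper item fed by `zabbix_sender`. The sketch below assumes such a trapper item already exists in Zabbix; the item key `checkout.success_rate` and the host names are hypothetical:

```python
import subprocess

# Feed a service-level number into Zabbix via a trapper item.
# Assumes `zabbix_sender` is installed and a trapper item with the
# (hypothetical) key "checkout.success_rate" exists on the host.
ZABBIX_SERVER = "zabbix.example.internal"
MONITORED_HOST = "app01.example.internal"
ITEM_KEY = "checkout.success_rate"

def send_to_zabbix(value: float) -> None:
    subprocess.run(
        [
            "zabbix_sender",
            "-z", ZABBIX_SERVER,   # Zabbix server/proxy address
            "-s", MONITORED_HOST,  # host name as configured in Zabbix
            "-k", ITEM_KEY,        # trapper item key
            "-o", str(value),      # the value itself
        ],
        check=True,
    )

# e.g. 0.97 = 97% of checkout requests succeeded in the last interval
send_to_zabbix(0.97)
```

The design point is that the number Zabbix now alerts on is a service outcome, not a machine state.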
A more mature approach to monitoring focuses less on whether individual components are responding and more on whether the system as a whole is delivering its intended outcome. That means looking at request success rates instead of just server availability, latency instead of just CPU load, and error rates instead of just disk space. It means treating infrastructure metrics as supporting evidence rather than the main story.
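As a sketch of what “supporting evidence rather than the main story” looks like in practice, those service-level indicators can be derived from ordinary request records. The hard-coded samples below stand in for parsed access logs or traces:

```python
import statistics

# Derive service-level indicators from request outcomes.
# Each tuple is (http_status, latency_ms) for one request; in practice
# these would be parsed from access logs rather than hard-coded.
requests = [
    (200, 120), (200, 95), (500, 40), (200, 310),
    (200, 88), (503, 30), (200, 150), (200, 2400),
]

statuses = [s for s, _ in requests]
latencies = [l for _, l in requests]

success_rate = sum(1 for s in statuses if s < 500) / len(statuses)
# quantiles(..., n=20) returns 19 cut points; index 18 is the 95th percentile
p95_latency = statistics.quantiles(latencies, n=20)[18]

print(f"success rate: {success_rate:.1%}")
print(f"p95 latency:  {p95_latency:.0f} ms")
```

Note how the failed requests and the one 2.4-second outlier dominate these numbers, while CPU load and disk space would look unremarkable throughout.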
When monitoring is limited to ping and disk, it becomes reactive and shallow. It can tell you that something has already failed, but rarely why, and it almost never warns you that a failure is coming or shows how it affects real usage. Over time, teams begin to trust dashboards that are technically correct but operationally misleading.
Good monitoring should introduce doubt, not false certainty. It should make it harder to assume everything is fine when it is not. And it should reflect the system as users experience it, not just as machines report it.
In that sense, monitoring does not become “wrong” because it is inaccurate. It becomes wrong when it is too narrow to tell the truth that actually matters.