Monitoring and observability tools (Grafana, Prometheus, traces, logs) tell you that something is wrong and where. They do not tell you what the host operating system was doing at that moment: which processes were consuming memory, what the kernel OOM killer decided, whether a filesystem was having an I/O contention problem, what the block device queue looked like, what firewall rules were in effect. That data lives on the node, is often ephemeral, and disappears or changes as the system recovers.
The purpose of integrating the widely available open-source sos report Linux command into the pipeline is to capture that OS-level snapshot automatically, at the moment of the alert, before the evidence degrades, without requiring a human to log into the node and collect it manually.
More specifically, it achieves four things:
Speed of diagnosis. The data is already collected and analysed by the time the SRE opens the alert. They review findings instead of gathering evidence.
Evidence preservation. Memory state, kernel ring buffer entries, and process tables are ephemeral. Automated collection catches them before the system recovers and overwrites them.
Reduced toil. Manual OS diagnostics during an incident are slow, error-prone, and inconsistent between engineers. Presets make the collection reproducible and automatic.
Completeness. Every incident of the same type produces the same shape of data, making cross-incident comparison and pattern recognition (including by an AI analysis tool) meaningful and reliable.
In short: monitoring tells you the what, tracing tells you the where, and sos report presets tell you the why, automatically, consistently, and fast enough to be useful during the incident rather than after it.
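To make the idea of alert-driven collection concrete, here is a minimal sketch of how an alert handler might turn an alert name into a narrowly scoped, non-interactive sos report run. It is an illustration under stated assumptions, not the setup described in the linked article: the alert names, the `ALERT_PLUGINS` mapping, and the `build_sos_command` and `collect` helpers are all hypothetical, and the plugin names should be checked against `sos report --list-plugins` on the target distribution.

```python
import shutil
import subprocess

# Hypothetical mapping from alert name to sos plugins. The alert names are
# invented for illustration, and plugin availability varies by distribution
# (verify with `sos report --list-plugins`).
ALERT_PLUGINS = {
    "NodeMemoryPressure": ["memory", "kernel", "process"],
    "NodeDiskIOSaturation": ["block", "filesys", "kernel"],
}


def build_sos_command(alert_name, tmp_dir="/var/tmp"):
    """Return the argv for a non-interactive sos report run for this alert."""
    cmd = [
        "sos", "report",
        "--batch",                      # do not prompt for input
        "--tmp-dir", tmp_dir,           # where the archive is written
        "--label", alert_name.lower(),  # tag the archive with the alert name
    ]
    plugins = ALERT_PLUGINS.get(alert_name)
    if plugins:
        # Restrict collection to the plugins relevant to this alert type.
        cmd += ["--only-plugins", ",".join(plugins)]
    return cmd


def collect(alert_name):
    """Run the collection on the node; sos report normally needs root."""
    if shutil.which("sos") is None:
        raise RuntimeError("sos was not found on PATH")
    return subprocess.run(build_sos_command(alert_name), check=True)
```

Calling `collect("NodeMemoryPressure")` from whatever receives the alert would then run the scoped collection. If a preset has already been saved on the node, the explicit plugin list could probably be replaced with `--preset` and the preset's name, which keeps the collection definition on the host rather than in the handler.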
The best part is that you do not need to install anything.
If you would like to know how this can be done, this article contains detailed instructions for a concrete production environment involving Kubernetes, Grafana and Ansible.
Visit sos-vault for a complete reference on how to use the sos report command effectively.