DEV Community

Discussion on: For the Love of Bleep! Building a Scalable Monitoring System

Rafael Jesus • Edited

Good to see yet another SRE team taking ownership of monitoring! At HelloFresh we were in the same situation: we had tons of infrastructure and product services w/out any sort of monitoring.

With our move to k8s, anything that runs on top of it can leverage system metrics (CPU, memory, network, etc.). Services that expose HTTP endpoints at the k8s edge (ingress) get RED metrics (request rate, errors, duration) automatically. Since edge metrics are common to every service, we were able to automate dashboards away by creating one general dashboard that ppl can filter by service name. Automating alerts was also possible.
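To make the idea concrete, here is a minimal sketch of how RED metrics could be derived per service from edge request records, plus one generic alert rule applied to every service. It is an illustration only, not the actual HelloFresh setup: the `Request` fields, the `red_metrics` and `firing_alerts` helpers, and the 5% error threshold are all assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one request seen at the ingress.
    service: str       # service name, e.g. derived from the ingress host
    status: int        # HTTP status code
    duration_s: float  # request duration in seconds


def red_metrics(requests, window_s):
    """Compute Rate, Errors, Duration per service over a time window."""
    by_service = defaultdict(list)
    for r in requests:
        by_service[r.service].append(r)

    metrics = {}
    for service, reqs in by_service.items():
        errors = sum(1 for r in reqs if r.status >= 500)
        durations = sorted(r.duration_s for r in reqs)
        # Nearest-rank 95th percentile of the request durations.
        p95_index = max(0, math.ceil(0.95 * len(durations)) - 1)
        metrics[service] = {
            "rate": len(reqs) / window_s,        # requests per second
            "error_ratio": errors / len(reqs),   # share of 5xx responses
            "p95_duration_s": durations[p95_index],
        }
    return metrics


def firing_alerts(metrics, max_error_ratio=0.05):
    """One generic rule for all services (threshold is an assumed default)."""
    return sorted(
        service
        for service, m in metrics.items()
        if m["error_ratio"] > max_error_ratio
    )
```

Because every service produces the same three numbers, a single dashboard or alert rule keyed on the service name covers all of them, which is what makes the automation possible.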

We truly believe that w/out monitoring, software ownership is not possible. Incidents are now detected and recovered from much faster (lower MTTD and MTTR).

We tune alerts religiously. TBH, I don't even know how we could fly w/out the monitoring we have nowadays.

Molly Struve (she/her)

Right?! Once you have a good monitoring system in place, it's hard to envision life without it!