When you build something, and tweak it to satisfy all of the scenarios it can cover, how/when/how much do you test it for failure?
Let's say that I build a monitor to watch the ratio between two specific metrics, and I want it to alert me when that ratio drops below 0.8, rather than 1 (indicating that there is no issue), or 0.9 (indicating that we might have something righting itself, i.e. an autoscaling host being killed off as it's no longer needed).
I've built this monitor and tweaked the thresholds based on historical examples of:
- Times when we wanted to be alerted, based on what was going on, and what the ratio looked like at that time
- Times when we expect the ratio to not be 1, but we don't need to alert, as we have scheduled a change during that time period
I've researched this, tested in, even did some new example tests of #1 and #2. Based on everything I've tested thus far, the new monitor I've built satisfies everything and would have alerted on all times when we wanted it to, and would have ignored all of the times we wanted it to ignore the metric ratio. I present the results of my testing, my research, my reasonings, and the monitor, to my manager, who says:
Remember, I have:
- tested different metrics, and combinations/ratios, to find the optimal way to monitor these scenarios
- tweaked my thresholds to satisfy when I do and do NOT want the monitor to alert us
Testing for unknown unknowns is always difficult. In this case, I'm being asked to make a monitor that is completely perfect, and will not need to be tweaked in the future even if our infrastructure changes.
Can/should this be done? How/why/why not?
Throughout the last year, I have worked part-time as a working student and also studied at the university. I was not the first and not the last one who has combined that during their studies, but the problem for me was, that at the end of the day I have felt absolutely exhausted mentally and physically. That caused problems with my health and motivation to continue working on my goals or anything. (yeah, “goals,” I wish I had something more specific at that time).