Discussion on: How do you wrap your head around observability?

We also need to know when a service appears stable by one measure but irregular by another.

For example, we are setting up anomaly detection to alert us when an error rate exceeds an acceptable threshold.
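At its simplest, this check is a fixed threshold on the error rate over a time window. The sketch below uses hypothetical names (`WindowCounts`, `exceeds_threshold`) and an arbitrary 5% threshold. Real anomaly detection often compares against a learned baseline rather than a constant, so treat this as an illustration only.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request and error counts seen in one time window (hypothetical shape)."""
    requests: int
    errors: int


def error_rate(window: WindowCounts) -> float:
    # A window with no traffic has no errors to report; avoid dividing by zero.
    if window.requests == 0:
        return 0.0
    return window.errors / window.requests


def exceeds_threshold(window: WindowCounts, threshold: float = 0.05) -> bool:
    # 0.05 (5%) is an assumed example value, not a recommendation.
    return error_rate(window) > threshold


if __name__ == "__main__":
    print(exceeds_threshold(WindowCounts(requests=1000, errors=12)))  # False (1.2%)
    print(exceeds_threshold(WindowCounts(requests=1000, errors=80)))  # True (8%)
```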

Or when the number of items on a queue or dead-letter queue stays above a certain amount for a sustained period. When that happens, it means our system is too slow, or it isn't recovering well when it hits errors or low volume.
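The "for a period" part is what keeps a single spike from triggering an alert. Here is a minimal sketch, assuming queue depth is sampled at a fixed interval. `SustainedQueueAlert`, `limit`, and `window` are hypothetical names. The alert fires only when every one of the last `window` samples is above `limit`.

```python
from collections import deque


class SustainedQueueAlert:
    """Fires when queue depth stays above `limit` for `window` consecutive samples.

    How often to sample (e.g. once a minute) is left to the caller.
    """

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        # A bounded deque keeps only the most recent `window` samples.
        self.recent = deque(maxlen=window)

    def observe(self, depth: int) -> bool:
        self.recent.append(depth)
        # Require a full window, and every sample in it over the limit.
        return len(self.recent) == self.window and all(
            d > self.limit for d in self.recent
        )


if __name__ == "__main__":
    alert = SustainedQueueAlert(limit=100, window=3)
    print([alert.observe(d) for d in [150, 90, 120, 130, 140]])
    # [False, False, False, False, True]
```

The dip to 90 resets the streak, so the alert only fires once three consecutive samples are over 100.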

Oh, that's the other thing: we correlate metrics. We play detective and ask: did the increase in volume correspond to an increase in error rate? Did one cause the other, or was it just coincidence?
Why is one synthetic test location, or one of the servers, consistently slower than the others? Or why does one server get fewer requests than the others, even though they are all weighted equally?
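A Pearson correlation coefficient is one simple way to check whether request volume and error rate move together. The sketch below assumes both series are already aligned to the same timestamps, and the sample numbers are made up. A value near 1 only says the two rose together. It cannot say whether one caused the other, which is exactly the detective question.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally sized series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-minute samples, aligned by timestamp.
    volume = [100, 120, 150, 200, 260]
    errors = [1, 1, 2, 3, 4]
    print(round(pearson(volume, errors), 3))  # close to 1: strongly correlated
```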
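Spotting the odd server or test location out can start with comparing each member against the group. This is a minimal sketch under stated assumptions: `flag_outliers`, the server and location names, and the 20% tolerance are all hypothetical. With equal weighting, every server should see roughly the same request count, and every location should see roughly the same latency. The sketch uses the median as the baseline so that one outlier doesn't drag the expected value toward itself.

```python
from __future__ import annotations

from statistics import median


def flag_outliers(per_member: dict[str, float], tolerance: float = 0.2) -> dict[str, float]:
    """Return members whose value deviates from the group median by more than `tolerance`.

    Each flagged member maps to its relative deviation: -0.5 means 50% below the median.
    """
    baseline = median(per_member.values())
    if baseline == 0:
        return {}
    deviations = {name: (value - baseline) / baseline for name, value in per_member.items()}
    return {name: dev for name, dev in deviations.items() if abs(dev) > tolerance}


if __name__ == "__main__":
    # Hypothetical request counts for three equally weighted servers.
    print(flag_outliers({"server-a": 1000, "server-b": 980, "server-c": 520}))
    # Hypothetical median latency (ms) per synthetic test location.
    print(flag_outliers({"us-east": 210, "eu-west": 205, "ap-south": 480}))
```

This flags only `server-c` (too few requests) and `ap-south` (too slow). Finding out *why* is still detective work.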