Discussion on: How do you wrap your head around observability?

To me, a system has observability when I can ask questions of collected telemetry that I didn't know beforehand I was going to need to ask. The telemetry may take many forms, such as application logs, db query logs, traces, metrics, probes or even user analytics!

In an ideal world, all of the telemetry you collect would carry as much context as possible, with common threads running through it, so that given an error log I can step back through traces, graph related metrics, partition by user type or geographic location, etc. This is possible for logs, traces and analytics, but at present most metrics stores will choke on high-cardinality dimensions.
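To make the cardinality problem concrete, here is a rough back-of-the-envelope sketch. It assumes a label-based metrics store where every unique combination of label values becomes its own time series, so the worst case is the product of the distinct values per label. The label names and counts are made up for illustration.

```python
from math import prod

# Hypothetical distinct-value counts per label on a single metric.
label_cardinality = {"service": 20, "region": 10, "status_code": 8}

# Worst-case series count: one series per unique label combination.
series_low = prod(label_cardinality.values())

# Adding one high-cardinality dimension (a hypothetical user_id label)
# multiplies the worst case by the number of distinct users.
label_cardinality["user_id"] = 100_000
series_high = prod(label_cardinality.values())

print(series_low)   # 1600
print(series_high)  # 160000000
```

Real traffic rarely hits every combination, so treat this as an upper bound, but it shows why a per-user dimension is fine on a log line or trace span and painful on a metric.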

It's an important capability that a lot of the current tools are lacking: being able to drill into telemetry with questions like these. How many users were affected? Does it affect a specific type of user or all users? Does it affect all services or just a couple? What were response times like around these specific events?
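As a minimal sketch of what answering those questions can look like, assume each request produced one wide event with the context attached. The events and field names below are invented for illustration; in practice this would be a query against whatever store holds your telemetry, not an in-memory list.

```python
from collections import Counter

# Hypothetical wide events, one per request; all field names are illustrative.
events = [
    {"user_id": "u1", "user_type": "free", "service": "checkout", "status": 500, "duration_ms": 820},
    {"user_id": "u2", "user_type": "paid", "service": "checkout", "status": 200, "duration_ms": 95},
    {"user_id": "u1", "user_type": "free", "service": "checkout", "status": 500, "duration_ms": 790},
    {"user_id": "u3", "user_type": "free", "service": "search", "status": 200, "duration_ms": 40},
    {"user_id": "u4", "user_type": "free", "service": "checkout", "status": 500, "duration_ms": 910},
]

errors = [e for e in events if e["status"] >= 500]

# How many users were affected?
affected_users = {e["user_id"] for e in errors}

# Does it affect a specific type of user, or all users?
errors_by_user_type = Counter(e["user_type"] for e in errors)

# Does it affect all services or just a couple?
errors_by_service = Counter(e["service"] for e in errors)

# What were response times like around these events?
mean_error_duration_ms = sum(e["duration_ms"] for e in errors) / len(errors)

print(len(affected_users), dict(errors_by_user_type), dict(errors_by_service), mean_error_duration_ms)
```

None of these questions had to be decided before the data was collected. They only work because user_id, user_type and service were recorded on every event.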

When I speak to folks about observability, I tend to frame it as thinking about what information they'd throw into a debug statement to troubleshoot their service. Oftentimes (not always, but often) this is good information for observability. This information can then be wrapped into whatever telemetry tools you have available.
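One way to picture "wrapping the debug statement" is to emit it as a structured event instead of a free-text line. This is only a sketch using Python's standard logging and json modules; the helper names (build_event, emit), the logger name and every field are hypothetical, and your own tooling may offer a better hook for this.

```python
import json
import logging
import sys

logger = logging.getLogger("checkout")  # hypothetical service name
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)


def build_event(name, **context):
    """Return a flat dict holding the event name plus any context fields."""
    return {"event": name, **context}


def emit(event):
    """Write the event as a single JSON line so it can be filtered and grouped later."""
    logger.info(json.dumps(event, sort_keys=True))


# The same details you would have put in a debug statement, kept as fields.
event = build_event(
    "payment_failed",
    user_id="u1",
    user_type="free",
    region="eu-west",
    trace_id="abc123",  # the common thread back to the trace
    duration_ms=820,
    error="card_declined",
)
emit(event)
```

Keeping the context as separate fields rather than interpolating it into a message string is what makes it possible to partition by user type or region afterwards.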

Of course, collecting all of this telemetry is worthless if you don't have the tools to explore it and answer the questions you have, which brings me back around to my starting paragraph. Observability is the ability to ask questions of collected telemetry that I didn't know I was going to need to ask. Just collecting metrics, logs and traces is not enough for a system to have observability.