The Missing Piece

#programming #productivity #observability #metrics

Somewhere in 2017, I made the decision to move into DevOps and DevTools. As a notorious bookworm, I made my way through quite a few books while learning about this new realm. In particular, I remember reading through the Google SRE handbook, or as some of my friends like to call it, “The Dev Bible.” There are endless takeaways in it, but my personal favorite is how Google built the first planet-scale web service using nothing more than metrics for their core Observability.

For the past few years, I have felt that metrics are something we are missing out on at Rookout. We’ve focused heavily on perfecting the core debugging experience and integrating with some of the cooler kids on the block (distributed tracing, I’m looking at you), and never really got around to taking a stab at metrics.

And then, our engineering team stepped up and decided to make it happen.

Do you already have great metrics?

Well, metrics are roughly divided into two categories. Metrics for off-the-shelf code (databases, queues, libraries, frameworks, etc.) and metrics for your code.

Metrics for off-the-shelf code are easy. Everybody knows that code, and most APM services provide excellent instrumentation for it out of the box. Period.

Metrics for your code are hard. Unfortunately, these are the important metrics. These are the metrics Google talked about in that book. Another well-known example of this type of metric is the Facebook Like button, which became the company’s North Star and the cornerstone of its operational monitoring.

Code-level and business metrics provide you insights into vital questions such as:

Are you serving your customers well?
Are you generating value for the relevant stakeholders?
Is this function being used?
Is this code path hot or cold?

Only engineers on your team can effectively add those code-level metrics. After all, you kind of have to understand the code to do that.

What’s so hard about that?

Well, I’m glad you asked. Every time you want to add a new code-level metric, you need to write a line of code that reports that metric. And then, you need to build and deploy a new version of your application with that line to the appropriate environment. That’s where things often go wrong.
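To make that concrete, here is a minimal sketch in Python of what a hand-written code-level metric tends to look like. Everything in it is a stand-in I made up for illustration: the `Counter` class, the `checkout` function, and the `checkout.calls` metric name are hypothetical and not any vendor's API.

```python
import threading
from collections import defaultdict


class Counter:
    """A tiny in-process counter registry (hypothetical, for illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = defaultdict(int)

    def increment(self, name, amount=1):
        # Guard the update so concurrent request threads don't lose counts.
        with self._lock:
            self._values[name] += amount

    def value(self, name):
        # .get avoids creating an entry for a metric that was never reported.
        with self._lock:
            return self._values.get(name, 0)


metrics = Counter()


def checkout(cart):
    # This is the one line you had to write, review, build, and deploy.
    metrics.increment("checkout.calls")
    return sum(cart.values())
```

The reporting line itself is trivial. The cost is everything around it: a code change, a review, a build, and a rollout before the first data point shows up.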

Defining the metrics you need to monitor your code effectively is a process of experimentation, trial, and error. Over time, the metrics will improve, but you’ll undoubtedly run into various blind spots. Maybe you are troubleshooting an incident and need to split a metric into two. Maybe you are designing a new feature and need to answer a one-time question. Either way, traditional, static, code-level metrics won’t be able to get you the data in an efficient and timely manner.

This leads us to metrics FOMO (you might have read my previous blog post on Logging FOMO): the urge to add endless metrics in the hope that you’ll cover everything and never have missing data. Well, you won’t. With most Observability providers charging by the volume of data, you won’t hear them complaining about it. But your CFO might.

I could go on and on about how hard it is to correlate metrics with the code that’s producing them, and how metrics that were added to answer a single question once are left in the code for years. But it might be time to shift to that new feature I promised.

How’s Rookout different?

In the new Live Metrics mode, you can instantly add a new metric to any line of code by simply adding a Non-Breaking Breakpoint on that line. Think of it as zooming out of our “regular” Non-Breaking Breakpoint, where each hit would have taken a full snapshot of the application state.

Live Metrics offers rate and counter metrics without any quotas or usage-based fees, with additional metric types and export functionality in the works.
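If the difference between a counter and a rate is fuzzy, here is a rough Python sketch of both, computed from the timestamps at which a line of code was hit. This is only an illustration of the two metric types under my own assumptions, not how Rookout computes them internally; the `HitMetric` class and its methods are hypothetical names made up for this example.

```python
from collections import deque


class HitMetric:
    """Tracks hits on one line of code (hypothetical illustration)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.count = 0          # counter: total hits since creation
        self._recent = deque()  # timestamps of hits inside the trailing window

    def record_hit(self, timestamp):
        """Register one hit at `timestamp` (seconds, monotonically increasing)."""
        self.count += 1
        self._recent.append(timestamp)
        self._evict(timestamp)

    def rate(self, now):
        """Hits per second over the trailing window ending at `now`."""
        self._evict(now)
        return len(self._recent) / self.window_seconds

    def _evict(self, now):
        # Drop timestamps that have fallen out of the trailing window.
        cutoff = now - self.window_seconds
        while self._recent and self._recent[0] <= cutoff:
            self._recent.popleft()
```

The counter only ever goes up and answers "is this function being used at all?", while the rate forgets old hits and answers "is this code path hot or cold right now?".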

Summary

Live Metrics is in early access, so reach out, and we’ll hook you up :)
https://www.rookout.com/

DEV Community
