Shahar Azulay

Cloud-Native Observability: Breaking the Link Between Cost and Depth in Monitoring

Traditional observability architectures and tools, including those built for distributed cloud-native systems, forced engineering teams to compromise early between performance, affordability, and data depth, and inevitably left dev teams feeling like they were getting the short end of the stick. But the application monitoring game has come a long way, opening up strong alternatives to these outdated solutions.
By rethinking established monitoring fundamentals, it’s possible to build a monitoring workflow that provides deep insights both quickly and affordably, giving everybody what they need.

The good old days

We’ve all heard the adage that you can only have two of these three options: good, fast, and cheap. But back in the day, that decision point just didn’t exist. When services ran on systems the size of family cars in single locations, there was little complexity to deal with and no vast quantities of data to collect, ship, and store. Just grab an integration plugin and start siphoning those logs and metrics straight out of the app. No problem to design, dead easy to do, super quick, and the cost was almost non-existent.
Sounds like a good thing, right? Well, there had to be something good about the old monolithic way of doing things. All those massive single points of failure couldn’t have been all bad.

Monoliths to microservices

Most technology teams worldwide have spent a good proportion of the last ten years migrating their legacy systems and services to the cloud. Once you start taking advantage of the cloud model to improve availability and performance while reducing TCO and enabling endless scale, it very quickly becomes apparent that there is a downside here too.
What was a single integration with one giant system has become dozens of individual microservice integrations in several different locations, each with its own set of requirements. Pulling logs directly from a single system has turned into data collection from many endpoints, likely all very different.
Just when you were wondering whether this cloud computing revolution was all it was cracked up to be, along came egress charges. Moving data, particularly between cloud providers, is not cheap, and then you’ve got the storage fees to deal with. Sure, there are cheaper storage tiers, but when you’re dealing with data at scale, even a lower unit cost mounts up, and fast.
Wouldn’t it be great if you could speed all this up while reducing costs at the same time? It seems sensible to collect and transport only the data you need rather than pulling in every log you can find. But to do that you’d need to know what you want to analyze ahead of time, and you’d risk incomplete datasets and repeated collection jobs that make the existing situation worse. And just like that, we’re back to the speed, cost, and quality conundrum we started with. But fear not!
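
To make that trade-off concrete, here is a minimal sketch of naive source-side filtering (the field names and log levels are hypothetical, not any particular logging format): it ships only error-level records to save egress and storage, but the decision about what matters is baked in before you know what an investigation will need.

```python
import json

# Decide ahead of time which records are worth shipping.
KEEP_LEVELS = {"ERROR", "FATAL"}

def filter_log_line(raw_line: str):
    """Return the parsed record if it should be shipped, else None."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # unparseable lines are silently dropped
    return record if record.get("level") in KEEP_LEVELS else None

sample = [
    '{"level": "INFO", "msg": "request started"}',
    '{"level": "DEBUG", "msg": "cache miss for key=abc"}',
    '{"level": "ERROR", "msg": "upstream timeout"}',
]

shipped = [rec for line in sample if (rec := filter_log_line(line)) is not None]
print(shipped)  # only the ERROR record survives
```

If an incident later calls for the INFO and DEBUG lines around that timeout, they were never collected, which is exactly the incomplete-dataset risk described above.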

Have your cake and eat it

By rethinking what you need from your monitoring solution and focusing on outcomes rather than inputs, you can break out of the cost/depth loop entirely. It’s not complicated: instead of collecting every single dataset you can get hold of for analysis, or taking random cuts of data as it streams by, apply intelligence.
Identify the right datasets at source and send only the data you need to your observability platform, translating that data into actionable metrics that trigger automated analysis processes on arrival. An instrumentation approach to monitoring means wrapping application code with the monitoring tool’s logic. If that logic is too complex, the overhead can kill the process: the more analysis that happens to data before it’s sent out, the greater the impact on the service being monitored. This is why traditional monitoring tools keep it simple by either pulling all the data or taking random cuts.
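
As an illustration, here is a minimal sketch of what that wrapping looks like in practice (the decorator and metric sink are hypothetical, not any specific vendor SDK): the monitoring logic wraps the application call, so whatever analysis runs inside the wrapper adds latency to every single request.

```python
import functools
import time

def instrument(metric_sink):
    """Wrap a function so every call is timed and recorded."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                duration_ms = (time.perf_counter() - start) * 1000
                # Any extra processing added here runs in the request path.
                metric_sink.append({"op": fn.__name__, "duration_ms": duration_ms})
        return wrapper
    return decorator

metrics = []

@instrument(metrics)
def handle_request(payload):
    # Stand-in for real application work.
    return {"ok": True, "size": len(payload)}

handle_request("hello")
print(metrics)
```

The heavier that wrapper gets, the more it taxes the very service it is supposed to observe.
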
That overhead problem can be overcome with eBPF, which runs sandboxed programs inside the Linux kernel so data can be observed, filtered, and aggregated at source without wrapping application code. The result is lower volumes of data that need to be transferred, analyzed, and stored. With the focus only on data that is relevant, the data you select for analysis is the data you need for reliable insights.
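
As a rough sketch of the idea (assuming a Linux host with the bcc toolkit installed; this is an illustration, not groundcover's implementation), the following eBPF program counts tcp_sendmsg calls per process inside the kernel and only ships the aggregated counts, rather than every packet or log line.

```python
from bcc import BPF
import time

# Kernel-side program: count tcp_sendmsg() calls per process.
program = r"""
BPF_HASH(counts, u32, u64);

int count_send(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event="tcp_sendmsg", fn_name="count_send")

time.sleep(10)  # sample window

# Only these per-process aggregates leave the node, a tiny fraction
# of the raw traffic they summarize.
for pid, count in b["counts"].items():
    print(f"pid={pid.value} tcp_sendmsg_calls={count.value}")
```

Because the filtering and aggregation happen in the kernel, nothing has to be wrapped in application code, and only the already-reduced data is sent on for storage and analysis.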

The power of cloud-native observability

Cloud-native data management is a complicated game, and the complexity and volume of logs, metrics, and trace data will likely only increase. Maintaining visibility without paying ever-increasing storage and egress charges means a change in approach, and that’s what inspired groundcover.
There was a time when you couldn’t have it all, but now you can: an observability architecture that allows you to scale sustainably by targeting relevant data at source. It frees the power of the cloud and eliminates the cost, speed, and quality decision point by reducing overhead on source services, improving efficiency, eliminating irrelevant data, cutting network usage, and lowering costs.
