How Dashbird innovates serverless monitoring

Taavi Rehemägi ・4 min read

At first glance, all serverless monitoring services seem similar and aim to solve the same problems. However, at Dashbird, we have made decisions since day one that fundamentally differentiate us from our competitors. Over time, those differences have grown, and we have found increasing confirmation of and confidence in our approach.

Dashbird's product strategy is based on three core pillars. According to our customers, together they make up the most complete and compelling serverless monitoring offering on the market.

The three pillars of Dashbird are:

  • Centralization of distributed data
  • Automation of alerts
  • Continuous Well-Architected insights (we'll go into each one later)

Additionally, Dashbird is the only serverless monitoring service that does not instrument Lambda functions.

Below, I will walk you through the decisions, benefits, and strategy behind our platform and the fundamental idea of what a good serverless monitoring approach should consist of in this day and age.

Focus on centralization instead of tracing

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

Observability in Wikipedia

The core idea of Dashbird is that we can determine the inner state of an application by collecting, correlating, and analyzing already available system outputs (logs, metrics, X-Ray traces, and resource configurations). Currently, we integrate with seven AWS services (AWS Lambda, API Gateway, DynamoDB, Step Functions, SQS, Kinesis, and ECS) with a total setup time of less than two minutes (deploying a simple CloudFormation template to your AWS account).
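To make the idea of working from already available outputs concrete, here is a minimal sketch in Python. It assumes the `REPORT` line that Lambda writes to CloudWatch Logs at the end of each invocation and extracts duration and memory figures from it, with no code running inside the function. The helper name `parse_report_line` and the returned field names are hypothetical, and this is not a description of how Dashbird itself is implemented.

```python
import re
from typing import Optional

# Pattern for the REPORT line Lambda writes at the end of an invocation.
# Fields are usually tab-separated; \s+ also tolerates plain spaces.
REPORT_RE = re.compile(
    r"REPORT RequestId:\s*(?P<request_id>\S+)\s+"
    r"Duration:\s*(?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration:\s*(?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size:\s*(?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used:\s*(?P<used_mb>\d+) MB"
)


def parse_report_line(line: str) -> Optional[dict]:
    """Return invocation stats from a REPORT line, or None for other lines."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    memory_mb = int(m["memory_mb"])
    used_mb = int(m["used_mb"])
    return {
        "request_id": m["request_id"],
        "duration_ms": float(m["duration_ms"]),
        "billed_ms": float(m["billed_ms"]),
        "memory_mb": memory_mb,
        "used_mb": used_mb,
        # Share of configured memory the invocation actually used.
        "memory_utilization": used_mb / memory_mb,
    }


# Sample line in the shape Lambda emits (values are made up).
sample = (
    "REPORT RequestId: 3604209a-e9a3-11e6-939a-754dd98c7be3\t"
    "Duration: 12.34 ms\tBilled Duration: 100 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 18 MB"
)
stats = parse_report_line(sample)
```

Because the data comes from the log stream rather than from a wrapper around the handler, a parser like this adds no latency to the function.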

Virtually all other serverless monitoring services instrument functions or other compute resources, collecting telemetry during execution time. We believe this is not the optimal approach in serverless and that the perfect solution lies outside of low-level information gathering.

This opinion is reinforced by what we have learned from building large-scale serverless applications ourselves and from speaking to thousands of serverless users over the years.

  1. In serverless, the complexity shifts from the code level to the orchestration level, reducing the importance of getting granular function execution details (tracing). At the same time, application logic is now distributed across hundreds of moving parts, increasing cognitive overload and the amount of available data.
  2. AWS Lambda is a big part of serverless, but other services introduce risks and challenges too. What all serverless services have in common is the limited code access while providing logs, metrics, and traces in a predefined format.
  3. The true effectiveness of an engineering team is dictated by its ability to access, interrogate, and understand operational data.

Abstract and automate failure detection across the stack

The biggest a-ha moment so far has been when we launched an error detection feature based on CloudWatch logs. For Dashbird users, that means that the moment they onboard, they can reduce their mean time to discovery (MTTD) by up to 80%. From our own experience and from talking to customers, finding a failing function, or another failing resource, among hundreds of resources is a daunting task for most teams.

This is why the second pillar of Dashbird is that we continuously analyze every log line and every metric across the system, and apply prebuilt alarms and filters, ready to detect a single point of failure amongst thousands of signals. For transparency, we have also published the library of our alarms here.

Prebuilt alarms and insights in Dashbird.
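As a rough illustration of applying prebuilt filters to every log line, the sketch below scans log events from several resources against a small rule set. The three rules resemble messages that Lambda commonly writes for timeouts, out-of-memory kills, and Python errors, but they are illustrative assumptions and not Dashbird's published alarm library. The names `LogRule`, `RULES`, and `detect_failures` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRule:
    name: str
    pattern: re.Pattern


# Illustrative rules only (assumption): not Dashbird's actual alarm library.
RULES = [
    LogRule("timeout", re.compile(r"Task timed out after [\d.]+ seconds")),
    LogRule("out_of_memory", re.compile(r"Runtime exited with error: signal: killed")),
    LogRule("unhandled_exception", re.compile(r"\[ERROR\]|Traceback \(most recent call last\)")),
]


def detect_failures(events):
    """Scan (resource_name, log_line) pairs and return one finding per failing line."""
    findings = []
    for resource, line in events:
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append({"resource": resource, "rule": rule.name, "line": line})
                break  # first matching rule wins for a given line
    return findings


# Made-up log events from three hypothetical functions.
events = [
    ("checkout-fn", "2020-06-01T10:00:03Z abc Task timed out after 3.00 seconds"),
    ("orders-fn", "START RequestId: abc Version: $LATEST"),
    ("resize-fn", "RequestId: def Error: Runtime exited with error: signal: killed"),
    ("orders-fn", "[ERROR] KeyError: 'order_id'"),
]
findings = detect_failures(events)
failing_resources = sorted({f["resource"] for f in findings})
```

Keeping the rules as data rather than code means new failure signatures can be added for every resource at once, which is the appeal of shipping alarms by default.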

There are multiple reasons why this is especially valuable for our customers:

  1. It is very hard to map out all the possible known and unknown failure scenarios across the stack. Dashbird's value is in the research and analysis we have done for those services and in offering all of those alarms by default to all of our customers.
  2. Manually managing log filters and metric alarms across the stack is a lot of work, and can also be really expensive.

Continuous insights towards the Well-Architected Framework and best practices

Adopting serverless means using a variety of managed cloud services and educating the whole team in the best practices and ways to use those services. These are two fundamental challenges that organizations struggle with when building and operating a serverless stack.

Therefore, the third pillar of Dashbird is all about:

  1. Automatically aligning the stack with community best practices.
  2. Educating engineering teams about the best practices and optimal settings and patterns of serverless.

Insights library for API Gateway.

The approach of collecting and analyzing all types of data about the infrastructure put us in a great position to build a collection of checks that produce insights on reliability, security, performance, cost optimization, and operational excellence. Today, this Well-Architected insights engine features over 70 checks, run continuously at intervals ranging from 5 minutes to 1 day.
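One way to picture such an engine is as a registry of checks, each with its own interval, evaluated against resource configurations. The following is a minimal sketch under that assumption. The two checks, the configuration field names (`timeout_s`, `throttling_enabled`), and the helpers `due_checks` and `run_checks` are hypothetical and are not taken from Dashbird's actual check library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Check:
    name: str
    pillar: str       # Well-Architected pillar the check belongs to
    interval_s: int   # how often the check should run, in seconds
    evaluate: Callable[[dict], Optional[str]]


def lambda_timeout_at_maximum(res: dict) -> Optional[str]:
    # 900 seconds is the maximum timeout AWS Lambda allows.
    if res.get("type") == "lambda" and res.get("timeout_s", 0) >= 900:
        return "timeout is set to the Lambda maximum of 900 seconds"
    return None


def api_without_throttling(res: dict) -> Optional[str]:
    if res.get("type") == "api_gateway" and not res.get("throttling_enabled", False):
        return "no throttling is configured"
    return None


CHECKS = [
    Check("lambda-timeout-at-maximum", "cost optimization", 300, lambda_timeout_at_maximum),
    Check("api-without-throttling", "reliability", 86400, api_without_throttling),
]


def due_checks(checks, last_run: dict, now: float):
    """Return checks whose interval has elapsed; never-run checks are always due."""
    return [c for c in checks if now - last_run.get(c.name, float("-inf")) >= c.interval_s]


def run_checks(checks, resources):
    """Evaluate each check against each resource and collect insights."""
    insights = []
    for check in checks:
        for res in resources:
            message = check.evaluate(res)
            if message:
                insights.append({"check": check.name, "pillar": check.pillar,
                                 "resource": res["name"], "message": message})
    return insights


# Made-up resource configurations.
resources = [
    {"name": "report-fn", "type": "lambda", "timeout_s": 900},
    {"name": "public-api", "type": "api_gateway", "throttling_enabled": False},
    {"name": "orders-fn", "type": "lambda", "timeout_s": 30},
]
all_insights = run_checks(CHECKS, resources)
# The Lambda check ran 200 seconds ago, so only the API check is due now.
due = due_checks(CHECKS, {"lambda-timeout-at-maximum": 1000.0}, now=1200.0)
```

Separating the schedule (`due_checks`) from the evaluation (`run_checks`) lets cheap checks run every few minutes while expensive ones run daily.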

From a user's perspective

Combining those three pillars gives teams an end-to-end view of their modern cloud stack via:

  1. Increased service quality and reduced risk of incidents. This is driven by Dashbird drastically reducing the time to discovery for most incidents in the cloud.
  2. Time back for developers to focus on the product and customer. When operating a serverless application in production, you have to set alarms, build dashboards, and make monitoring data consumable. Dashbird takes away that undifferentiated heavy lifting.
  3. Posture and best practices management. Users of Dashbird achieve better performance, cost, reliability, and security of their cloud environment with significantly less effort.

Five years from now

The seismic shift in cloud computing will be the adoption of single-purpose, managed cloud services that free engineers from undifferentiated work so they can spend most of their time and effort on creating value specific to their organization. Over time, the importance of compute will diminish in favor of managed services. Applications will be built out of Lego blocks offered by cloud providers and third-party vendors.

Dashbird will be a centralization platform that does not just ingest and centralize operational data from popular managed services but transforms that data into universally understandable insights. The three pillars of Dashbird mentioned above will extend to all managed services, lowering the barrier to entry and making serverless monitoring, operating, and scaling simpler.
