Tal Borenstein for Keep

Building a new shift-left approach for alerting

Hi Community!
Looking forward to hearing your thoughts on this!

Keep is an open-source alerting CLI tool that @shaharglazner and I wrote out of a pain we felt throughout our careers as developers and as managers of development teams.
Alerting (aka monitors/alarms) has always felt like a second-class citizen in monitoring, observability, and infrastructure tools: the feature set is very narrow, which in turn leads to poor alerts, alert fatigue (yes, your muted Slack channel), an unreliable product, and complete alerting hell.

Alerts as part of SDLC

Beyond the difficulty of creating better application and infrastructure alerts, it is also tough to maintain them and make sure they keep working over time.

Organizations today use so many tools for alerting that managing them is becoming an absolute nightmare.

Alerting as a first-class citizen

The best way to describe what we had in mind when we first built Keep is how one of our first users puts it:

"Keep is doing to alerting what GitHub Actions did to CI/CD"

There were three main guidelines when we started coding:

  1. Good alerts are not just thresholds over metrics or logs; they should be treated as workflows with multiple "tests" (steps/actions).

  2. The tool should be 100% data agnostic: it should not matter where the data resides (not only "traditional" data sources but also, for example, a DB). There's no real reason why that shouldn't be abstracted away from developers.

  3. Alerts should be maintained in, and live with, your code, so they can be integrated with all CI/CD processes (imagine a gate that fails your PR when you break alerts).
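To make the third guideline more concrete, here is a minimal Python sketch of what such a CI gate could check. It is not Keep's actual schema or CLI: `AlertDefinition`, `validate_alerts`, and the provider names are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlertDefinition:
    """Hypothetical in-repo alert definition (not Keep's real schema)."""
    name: str
    provider: str  # where the data lives, e.g. "postgres"
    steps: list[str] = field(default_factory=list)


def validate_alerts(alerts: list[AlertDefinition], known_providers: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the gate passes."""
    problems: list[str] = []
    seen: set[str] = set()
    for alert in alerts:
        if alert.name in seen:
            problems.append(f"{alert.name}: duplicate alert name")
        seen.add(alert.name)
        if alert.provider not in known_providers:
            problems.append(f"{alert.name}: unknown provider '{alert.provider}'")
        if not alert.steps:
            problems.append(f"{alert.name}: alert has no steps")
    return problems
```

A CI job could load the alert definitions from the repository, call `validate_alerts`, and exit with a non-zero status when the returned list is not empty, which is what would fail the PR.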

Multi-step alert example
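As a rough illustration of the first two guidelines, a multi-step alert over different data sources might be modelled like this. This is a hypothetical Python sketch, not Keep's actual syntax or API: `Provider`, `InMemoryProvider`, `Step`, and `run_alert` are assumed names, and the in-memory provider stands in for a real metrics store or database.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Provider(Protocol):
    """Anything that can answer a query: a metrics store, a log index, a DB."""

    def query(self, expression: str) -> float: ...


class InMemoryProvider:
    """Stand-in data source so the sketch runs without external services."""

    def __init__(self, values: dict[str, float]) -> None:
        self._values = values

    def query(self, expression: str) -> float:
        return self._values[expression]


@dataclass
class Step:
    name: str
    provider: Provider
    expression: str
    condition: Callable[[float], bool]  # True means this "test" is violated


def run_alert(steps: list[Step]) -> bool:
    """Fire only if every step's condition holds; stop at the first that does not."""
    for step in steps:
        value = step.provider.query(step.expression)
        if not step.condition(value):
            return False
    return True


# Two steps against two different (fake) data sources.
metrics = InMemoryProvider({"error_rate": 0.07})
database = InMemoryProvider({"failed_orders_last_hour": 12})
steps = [
    Step("high error rate", metrics, "error_rate", lambda v: v > 0.05),
    Step("orders actually failing", database, "failed_orders_last_hour", lambda v: v > 10),
]
print(run_alert(steps))  # prints True: both conditions are violated
```

The steps only depend on the `Provider` interface, so the alert logic does not need to know whether a value came from a monitoring tool or a database query.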

What's Ahead?

We constantly try to improve on our promise:

Try our first mock alert and get it up and running in <5 minutes

So we're adding plenty more deployment options, providers, and functions. We're also working on simplifying the syntax further.

What do you think about the need for this kind of "abstraction"? What do you think about alerts as post-production tests? How do you manage and control your alerting chaos right now?

Would love to hear your thoughts; feel free to comment here, on our GitHub repo, or in our Slack.

Top comments (4)

Vijay Jangir

A couple of questions.
From an implementation point of view:

  • If alerts are part of the PR, how would changing only an alert threshold for an existing release version work? Would a new PR mean cutting a new version and a new deployment? Keeping the alerts in a separate repository would again become a problem, as we still try to avoid adding a new repo for GitOps as well.

From a configuration point of view:

  • How do we configure alerts? Do you have any examples that cover most of the scenarios for classic alerts used with SRE principles?
Tal Borenstein

"If alerts are part of the PR, how would changing only an alert threshold for an existing release version work? Would a new PR mean cutting a new version and a new deployment? Keeping the alerts in a separate repository would again become a problem, as we still try to avoid adding a new repo for GitOps as well."

It's an interesting question and depends on the implementation details, but the user should configure how to handle it.
Perhaps a separate version for alerts?

"How do we configure alerts? Do you have any examples that cover most of the scenarios for classic alerts used with SRE principles?"

I'm not fully sure I got your question here; would you mind clarifying it? 🙏🏼

shaharglazner

Shahar here to answer any question ❤️

Tal Borenstein

🚨