Chalom Ellezam

Posted on Jun 15 • Edited on Jun 18

Stop sending every alert to Slack. The 4-channel routing matrix I built after 18 months of being paged at the wrong time.

#devops #observability #productivity #webdev

Disclosure: I am a senior backend tech lead in Paris and I run Belmo, a small European PaaS. This article mentions Belmo once near the end. Everything else is platform-agnostic.

It is 02:47 on a Tuesday. My phone is silent. Stripe's webhook signature has been failing for two hours, every paid signup is queueing into a retry table that nobody is draining, and the alert sits in a Slack channel I last opened on Friday at 18h. I find out at 09:14 when a customer emails asking why his subscription is "stuck on pending".

That morning I rebuilt my routing. Eighteen months and roughly 4,000 alerts later, the version below is what stuck. None of it is platform-specific. The mistake I made, and that I see in almost every solo founder's setup I review, is treating "alerts" as one thing that goes to one place.

Why piping everything into one channel stops working around month three

When you ship your first MVP, you have maybe five alerts a week. They fit in one Slack channel called #alerts, you open it when you sit at your desk, and life is good. Around month three something changes. You added Sentry, a status check, a cron monitor, a payment-failure log, an LLM cost-spike detector. Now you have 50 alerts a week. Most are noise, a few are urgent, and one of them is the one bleeding money right now.

Slack rewards triage during business hours. After hours it is no better than email, because nobody opens Slack at 23h to "check the alerts channel". Telegram on the other hand pushes notifications past Do Not Disturb if you let it, which is exactly what you want for a P0 and exactly what you do not want for a deploy-finished ping. Email is great for digests and awful for anything time-sensitive. Discord is where your community lives, which means alerts there either get drowned by #general or read by users you did not intend.

So the question is never "which channel is best". It is "which alert belongs in which channel". Every production alert is answering four questions. Does this need to wake me up? Does it need a human in the next five minutes? Is it for the team, or only for me? And is the content sensitive (DB errors with row data, customer emails, Stripe events)?

Once you answer those four, the channel picks itself.

The four channels, evaluated by what they actually do at 03h

I want to be specific, because most "Slack vs Telegram" posts compare features in a vacuum. Here is what each channel actually does to your sleep and your inbox.

Slack. Slack is great for team-facing alerts that need context (a thread, a screenshot, a back-and-forth). It is terrible for waking a single human up. Its mobile notifications respect your phone's Do Not Disturb, which is what 99% of users want, but it means a 03h alert in Slack will not ring your phone unless you specifically configured a high-priority keyword for it. Slack is where your team triages incidents the next morning. It is not where you find out the site is down.

Telegram bot DMs. A Telegram bot that messages you directly (not a group chat, a 1:1 DM with the bot) is the cheapest pager you will ever build. It bypasses DND if you whitelist the contact, the API is one curl away, there is no SaaS bill, and the message lands on your lock screen with a sound. The downside: it is private to you. Anything you want the team to see in the morning has to go elsewhere too.

Discord. Discord is where your community lives, and that is a feature for community-facing alerts (a "new signup" cheer in a founder server, a deploy log in a build channel) and a bug for anything else. Webhook posting is one line, but if your alert channel lives in the same server as your users, accidents happen. Use a separate, private Discord server for ops if you go this route.

Email. Email is excellent for one thing: digests. The daily 09h summary, the weekly cost report, the "here are 12 things that happened overnight that you should know about but did not need to act on". It is awful for anything that needs a response in less than four hours, because by then it is already buried under 30 newer emails.

I deliberately leave SMS and PagerDuty off this list. SMS costs money per message and rate-limits at the worst time. PagerDuty is excellent and also overkill for a solo founder with two services. If you are a team of five with on-call rotation, add PagerDuty. Below that, Telegram-as-pager is fine.

The matrix that picks the channel for you

Here is the routing logic I run for every project I ship now. It maps a single severity tag (P0, P1, P2, P3) and an audience flag (me, team, community) to a channel.

Severity	What it means	Audience	Channel
P0	Site down, payments failing, data loss in progress	me	Telegram bot DM (+ optional SMS)
P1	Degraded, urgent, customer-visible within 1h	me	Telegram bot DM
P2	Anomaly, needs eyes today, not bleeding	team	Slack (or private Discord)
P3	FYI, monitoring noise, "we processed 412 jobs"	me	Email digest, daily
P3	Deploy finished, build succeeded, new signup	community/team	Discord or Slack

The thing this matrix forces you to do is tag every alert at the source. You no longer write "if (error) sendToSlack". You write sendAlert({ severity: "P0", audience: "me", message: ... }) and a tiny router decides where it lands. Severity tagging is the unsexy work that pays off the most. It is the same discipline that PagerDuty enforces with their incident levels, but you do not need to pay PagerDuty to enforce it on yourself.
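One way to keep the matrix honest is to store it as data instead of branching logic. This is a minimal sketch, not a finished router: `ROUTES` and `pickChannel` are hypothetical names, the channel values are plain labels, and falling back to Telegram for an unknown tag is an assumption (a spurious buzz seemed safer than a dropped alert).

```javascript
// Hypothetical sketch: the routing matrix as a lookup table.
// Channel names are plain labels here, not real transports.
const ROUTES = {
  P0: { me: "telegram", team: "telegram", community: "telegram" },
  P1: { me: "telegram", team: "telegram", community: "telegram" },
  P2: { me: "telegram", team: "slack", community: "telegram" },
  P3: { me: "email-digest", team: "discord", community: "discord" },
};

function pickChannel(severity, audience) {
  const row = ROUTES[severity];
  // Assumption: an unknown severity or audience pages you instead of vanishing.
  if (!row || !row[audience]) return "telegram";
  return row[audience];
}
```

Reviewing a table like this takes seconds, which makes it harder for a P3 to drift into the pager without anyone noticing.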

A 30-line router that does all of this

Here is the Node.js version. Python and Go are trivial translations. The thing I want you to see is how short it is. Most founders assume "alert routing" is a project. It is a function.

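// alerts.js (ES module; needs Node 18+ for the built-in fetch)
// Assumption: ./db.js is your own module exporting a Postgres client
// with a pg-style query(text, params). Adjust the path to your project.
import { db } from "./db.js";
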
const TELEGRAM_TOKEN = process.env.TELEGRAM_BOT_TOKEN;
const TELEGRAM_CHAT_ID = process.env.TELEGRAM_CHAT_ID;
const SLACK_WEBHOOK = process.env.SLACK_WEBHOOK_URL;
const DISCORD_WEBHOOK = process.env.DISCORD_WEBHOOK_URL;

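// POST a JSON payload and fail loudly: an alert that silently
// never arrives is worse than a thrown error.
async function postJson(url, payload) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    // The URL is left out of the message on purpose: it contains secrets.
    throw new Error(`Alert delivery failed: HTTP ${res.status}`);
  }
}

// No parse_mode: alerts go out as plain text, so an underscore or
// asterisk in an error message cannot make Telegram reject the
// message with a Markdown parse error.
async function sendTelegram(text) {
  await postJson(
    `https://api.telegram.org/bot${TELEGRAM_TOKEN}/sendMessage`,
    { chat_id: TELEGRAM_CHAT_ID, text }
  );
}

async function sendSlack(text) {
  await postJson(SLACK_WEBHOOK, { text });
}

async function sendDiscord(text) {
  await postJson(DISCORD_WEBHOOK, { content: text });
}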

// Email digest is just an INSERT into a daily_digest table.
// A cron at 09h flushes it. No external service needed.
async function queueEmailDigest(text) {
  await db.query(
    "INSERT INTO daily_digest (created_at, body) VALUES (now(), $1)",
    [text]
  );
}

export async function sendAlert({ severity, audience, message }) {
  const prefix = `[${severity}] `;
  const text = prefix + message;

  if (severity === "P0" || severity === "P1") {
    return sendTelegram(text);
  }
  if (severity === "P2") {
    return audience === "team" ? sendSlack(text) : sendTelegram(text);
  }
  if (severity === "P3") {
    if (audience === "community" || audience === "team") {
      return sendDiscord(text);
    }
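    return queueEmailDigest(text);
  }
  // Unknown or mistyped severity: page rather than silently drop the alert.
  return sendTelegram(text);
}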

Every alert in your code now becomes a single function call with three keys. The router takes care of the rest. The two extra lines of discipline (tagging severity and audience at the call site) are what give you back your weekend.
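The flush side of queueEmailDigest is not shown above. Here is a minimal sketch of what the 09h cron could do with the queued rows, assuming each row mirrors the daily_digest columns (created_at, body). `buildDigest` is a hypothetical helper. The SELECT, the mail send, and the DELETE that clears the table are left to whatever client and mailer you already use.

```javascript
// Hypothetical helper: turn queued daily_digest rows into one email.
// Each row is assumed to look like { created_at: Date, body: string }.
function buildDigest(rows) {
  if (rows.length === 0) return null; // nothing queued: send no email

  // Oldest first, without mutating the caller's array.
  const sorted = [...rows].sort((a, b) => a.created_at - b.created_at);

  const lines = sorted.map((row) => {
    // HH:MM in UTC, sliced out of the ISO timestamp.
    const time = row.created_at.toISOString().slice(11, 16);
    return `${time}  ${row.body}`;
  });

  const plural = rows.length === 1 ? "" : "s";
  return {
    subject: `Daily digest: ${rows.length} alert${plural}`,
    text: lines.join("\n"),
  };
}
```

Returning null for an empty queue is a deliberate choice in this sketch: a digest that arrives every morning whether or not anything happened becomes one more email you learn to skip.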

The three routing mistakes that cost me real sleep

I do not want this post to read as if I figured this out in a meeting. I figured it out by getting it wrong, three different ways, on three different projects.

Mistake one: routing P0 to Slack because "the team is there". This is the 03h Stripe-webhook story above. The team is there but the team is asleep, and Slack is asleep with them. P0 means "wake one human up", and that human is you. Slack will not. A Telegram bot will.

Mistake two: routing everything to Telegram because "it works". I did this for six weeks after the Stripe incident. By week three I had alert fatigue so badly that I silenced the bot. By week five I missed a real P0 because I had trained myself to ignore the buzz. Alerts have a budget. Demote ruthlessly. If it is not actionable within an hour, it does not go to Telegram.

Mistake three: no dedup window. A failing cron that runs every minute will send you 1,440 identical Telegram messages in a day. The fix is two lines: hash the alert message, look up the hash in Redis or a small sent_alerts table, skip if seen in the last 30 minutes. I added this after the cron that watches my email worker spammed me with 600 messages on a Sunday. The number to remember is 30 minutes: long enough to suppress noise, short enough that a real incident gets a fresh ping if it persists.
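A minimal sketch of that dedup window, under a few assumptions: it runs in a single process, so an in-memory Map stands in for Redis or the sent_alerts table, and it keys on the raw message where the text above hashes it (with Redis you would hash first to keep keys short). `shouldSend` and `DEDUP_WINDOW_MS` are hypothetical names.

```javascript
// Hypothetical in-memory dedup window; swap the Map for Redis or a
// sent_alerts table once you run more than one process.
const DEDUP_WINDOW_MS = 30 * 60 * 1000; // 30 minutes
const lastSent = new Map(); // alert message -> timestamp (ms) of last delivery

function shouldSend(message, now = Date.now()) {
  const previous = lastSent.get(message);
  if (previous !== undefined && now - previous < DEDUP_WINDOW_MS) {
    return false; // same alert delivered inside the window: suppress it
  }
  lastSent.set(message, now); // first sighting, or the window has expired
  return true;
}
```

The window is measured from the last delivered alert, not the last suppressed one, so an incident that persists pings again every 30 minutes instead of going quiet forever. Call it as a guard at the top of the router and return early when it yields false. The Map never evicts old keys in this sketch, which is fine for a handful of distinct alerts and another reason to move to Redis with a TTL later.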

What I built around all of this

The reason I think about this so much: I built Belmo's monitoring around exactly this routing instinct. Past clients like BeReal had whole observability teams writing these routers in-house. The smaller clients I review on the side had none of it. Alerts went to one Slack channel, the channel got muted in week six, then a customer complaint became the monitoring system. So I built an AI watcher that reads logs, classifies anomalies into severities (retry loops, token spikes, hot Sentry fingerprints, silent cron failures), and pushes the P0s and P1s straight to a Telegram bot. The free tier never sleeps, which matters specifically because the alerting layer must not be the thing that fails. That is the whole pitch. You do not need Belmo for any of this to work, only a tagging discipline and the 30-line router.

What to do tonight, regardless of which platform you ship on

Seven steps, in order, none of them more than 10 minutes.

1. Open every place an alert currently fires in your code. Count them. If you do not know the number, that is the first finding.
2. Add a severity and audience argument to your existing alert function. Default both to safe values (P2, me) so nothing breaks.
3. For every existing alert, set the severity. Be honest. Most of what you have today is P2 or P3.
4. Set up the Telegram bot. There is a whole post on this earlier in the series (#4 below). Total time, 5 minutes.
5. Set up a daily digest table and a 09h cron to email yourself everything that hit P3 during the night. This is the alert-fatigue antidote.
6. Add the 30-minute dedup window. Two lines and a Redis key, or a row in Postgres with a unique constraint on the hash and an expiry.
7. Test it. Throw a fake P0 at the router. Make sure your phone buzzes when it should and does not when it should not. Then go to sleep.
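For the last step, it helps to check the routing rules without touching a real channel. The sketch below is one way to do that, not the production router: `makeRouter` and `recordingTransports` are hypothetical names, the rules mirror sendAlert, the defaults are the safe values from the second step, and the Telegram fallback for unknown severities is an assumption.

```javascript
// Hypothetical test harness: same routing rules as sendAlert, but the
// transports are injected so a test can record calls instead of sending.
function makeRouter(transports) {
  return function route({ severity = "P2", audience = "me", message }) {
    const text = `[${severity}] ${message}`;
    if (severity === "P0" || severity === "P1") {
      return transports.telegram(text);
    }
    if (severity === "P2") {
      return audience === "team"
        ? transports.slack(text)
        : transports.telegram(text);
    }
    if (severity === "P3") {
      if (audience === "community" || audience === "team") {
        return transports.discord(text);
      }
      return transports.digest(text);
    }
    // Assumption: unknown severities page you rather than disappear.
    return transports.telegram(text);
  };
}

// Fake transports that only remember what they were asked to send.
function recordingTransports() {
  const calls = [];
  const record = (channel) => (text) => {
    calls.push({ channel, text });
  };
  return {
    calls,
    telegram: record("telegram"),
    slack: record("slack"),
    discord: record("discord"),
    digest: record("digest"),
  };
}

const fake = recordingTransports();
const route = makeRouter(fake);
route({ severity: "P0", audience: "team", message: "fake outage, ignore" });
route({ severity: "P3", audience: "me", message: "processed 412 jobs" });
console.log(fake.calls.map((call) => call.channel)); // telegram, then digest
```

This only proves the routing decision. The buzz test still has to go through the real bot once, with the phone locked and Do Not Disturb switched on.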

What I still get wrong

The matrix above is not right for every team. If you are at five engineers with on-call rotation, PagerDuty pays for itself within a quarter. If your alerts contain customer PII, Telegram bots are not where you want them landing (encrypted email or self-hosted Mattermost is better). And there is a category I have not solved well: alerts that need both wake-up and team triage. Today I send those to Telegram first, then re-post the resolved post-mortem to Slack the next morning. Two steps where one would be better.

The question I want to leave you with: which alert channel is on your phone screen right now, and how many of its alerts have you ignored for more than 48 hours? The answer tells you which severity is in the wrong place. Drop the count in the comments. I will publish whatever pattern I see most often next.

DEV Community