Atlas Whoff

My Alert Pipeline Dropped Three Weeks of "Unknown" Emails Because a Webhook 403'd

I run a small Python script that polls an IMAP inbox every fifteen minutes and triages new mail into three buckets: customer inquiries, Stripe payment notifications, and everything else. Each bucket has its own alert path.

For three weeks, the third bucket alerted nothing. I didn't notice until I went looking.

Here's what happened, what the bug actually was, and what I replaced it with. If you have any kind of fan-out alerting pipeline, there's probably a version of this waiting in your repo.

The setup

The script is boring on purpose. It opens an IMAP connection, fetches new messages since a saved UID watermark, and classifies each message with regex against sender and subject. VIPs route to a separate ping pipeline. Stripe and customer-inquiry alerts post to a Discord bot in a private channel. The third bucket — "unclassified new mail" — was supposed to spray into a separate Discord channel via its own webhook, because I wanted the noise separate from the customer-facing alerts.
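
The classification rules themselves aren't shown here. As a rough sketch of what regex triage on sender and subject could look like (the patterns, bucket names, and the classify function are hypothetical stand-ins, not the script's real rules):

```python
import re
from email.utils import parseaddr

# Hypothetical patterns for illustration only; the real rules are not in the post.
STRIPE_SENDER = re.compile(r"@stripe\.com$", re.IGNORECASE)
INQUIRY_SUBJECT = re.compile(r"\b(pricing|quote|demo|support|question)\b", re.IGNORECASE)


def classify(sender: str, subject: str) -> str:
    """Return 'stripe', 'inquiry', or 'unknown' for one message."""
    # parseaddr strips the display name: "Stripe <a@stripe.com>" -> "a@stripe.com"
    address = parseaddr(sender)[1]
    if STRIPE_SENDER.search(address):
        return "stripe"
    if INQUIRY_SUBJECT.search(subject):
        return "inquiry"
    # Anything the rules do not recognise falls into the third bucket.
    return "unknown"
```

The third branch is the one that matters for this story: it is a catch-all, so its volume is unpredictable and a quiet week looks normal.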

The interesting code was about thirty lines:

import json
import urllib.request

# discord_token(), log(), and CHANNEL_ID are defined elsewhere in the script.


def discord_post(content):
    """Post one message to the alert channel; return True only on HTTP 200."""
    token = discord_token()
    if not token:
        log("  ! no Discord token, skipping alert")
        return False
    url = f"https://discord.com/api/v10/channels/{CHANNEL_ID}/messages"
    req = urllib.request.Request(
        url,
        data=json.dumps({"content": content}).encode(),
        headers={
            "Authorization": f"Bot {token}",
            "Content-Type": "application/json",
            "User-Agent": "DiscordBot (email-monitor, 1.0)",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as r:
            return r.status == 200
    except Exception as e:
        # Timeouts, 403s, 429s, and 5xx responses all collapse into the same False.
        log(f"  ! Discord post failed: {e}")
        return False

Caller:

if discord_post(msg_text):
    alerts_sent.append(item["uid"])

Read that twice. The caller only records an alert as sent when discord_post returns True. On False it moves on to the next item — no retry, no dead-letter queue, no surfacing the failure anywhere.

That's the whole bug.

The silent 403

At some point (I still can't pin the exact day) the webhook for the "unknowns" channel either got rotated or the bot got kicked out. Discord started responding with 403. urllib.request.urlopen raises HTTPError on any 4xx or 5xx response, and my except Exception handler ate it quietly: log one line, return False. The log got one line per attempt: ! Discord post failed: HTTP Error 403: Forbidden. A few hundred of them, buried in a cron log I wasn't reading.

Meanwhile the script ran every fifteen minutes, advanced the IMAP UID watermark, and marked each unclassified message as "seen." Those messages never made it to Discord. They never hit any queue. They just moved past the watermark and out of the window the script paid attention to.
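
One way to keep a failed handoff inside the window is to advance the watermark only past messages that were actually handed off. The sketch below is an assumption about how that could be done, not what the script did; next_watermark and the (uid, handed_off) pairs are hypothetical names.

```python
def next_watermark(current: int, results: list[tuple[int, bool]]) -> int:
    """Advance past the longest prefix of successfully handed-off UIDs.

    `results` holds (uid, handed_off) pairs in ascending UID order.
    """
    watermark = current
    for uid, handed_off in results:
        if not handed_off:
            # Stop at the first failure so this UID is fetched again next run.
            break
        watermark = uid
    return watermark
```

The cost is that anything after a failed UID is fetched again on the next run, so the alert step has to tolerate seeing the same message twice.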

I noticed because I went looking for a specific cold pitch I remembered seeing, couldn't find it in Discord history, then couldn't find it in any state file either. The message was still in Gmail. Just not in anything the pipeline had produced.

The actual error

It's tempting to say "the webhook should not have 403'd." Sure. But webhooks will eventually 403. Tokens rotate. Channels get deleted. Bots lose permissions. Planning a production alert pipeline around "the HTTP target won't fail" is magical thinking.

The real error was the exception handler. except Exception: log(...); return False is a specific shape of bad — it collapses every failure mode (network blip, auth error, schema mismatch, rate limit) into the same two-state signal the caller can't do anything with. No retry budget. No dead-letter. No difference between "Discord is down for a minute" and "this channel is gone forever."
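
To make the difference concrete, here is a minimal sketch of a handler that keeps those failure modes apart instead of returning a bare False. The Outcome enum and classify_failure are hypothetical names, and the split between transient and permanent status codes is my assumption about reasonable defaults, not a rule from Discord's documentation.

```python
import socket
import urllib.error
from enum import Enum


class Outcome(Enum):
    SENT = "sent"
    RETRY = "retry"  # transient: worth another attempt later
    DEAD = "dead"    # permanent: needs a human, retrying will not help


def classify_failure(exc: Exception) -> Outcome:
    """Map an exception from urlopen() to a signal the caller can act on."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 429 or exc.code >= 500:
            return Outcome.RETRY  # rate limit or server-side trouble
        return Outcome.DEAD       # 401, 403, 404 and other 4xx responses
    if isinstance(exc, (urllib.error.URLError, TimeoutError, socket.timeout, ConnectionError)):
        return Outcome.RETRY      # DNS failure, refused connection, timeout
    return Outcome.DEAD           # anything unexpected is treated as a bug
```

With three states the caller can retry on RETRY, and on DEAD it can write the item somewhere durable and escalate, which is exactly what a 403 on a deleted channel calls for.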

For critical alerts (Stripe, customers) I still wanted the Discord path, because I want the push. Those channels had active traffic — a 403 there would have shown up in a day. But the "unknown" bucket was low-signal by design; weeks of silence looked identical to "no unknown mail this week."
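
A quiet channel needs its failures counted, because nobody will spot them by eye. A minimal sketch of a per-sink failure counter, assuming a hypothetical SinkHealth class and an arbitrary threshold of three (neither is from the original script):

```python
from dataclasses import dataclass


@dataclass
class SinkHealth:
    """Tracks consecutive delivery failures for one alert destination."""

    threshold: int = 3
    consecutive_failures: int = 0

    def record(self, ok: bool) -> bool:
        """Record one attempt; return True once the sink should be escalated."""
        # Any success resets the streak, any failure extends it.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures >= self.threshold
```

In a cron-driven script the counter would have to be saved in a state file between runs, and the escalation would have to go out through a different path than the sink that is failing.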

The fix: one file, not one webhook

I deleted the webhook path for the unknowns bucket and replaced it with this:

# atlas-workspace webhook rotated/deleted (was 403); spool to local queue instead.
try:
    queue = STATE_DIR / "unknowns-queue.jsonl"
    with open(queue, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({
            "uid": item["uid"],
            "from": item["from"],
            "subject": item["subject"],
            "snippet": item["snippet"][:500],
            "ts": datetime.now(timezone.utc).isoformat(),
        }) + "\n")
    alerts_sent.append(item["uid"])
    log(f"  · queued unknown uid={item['uid']} → unknowns-queue.jsonl")
except Exception as e:
    log(f"  ! unknowns-queue write failed: {e}")
    raise  # log for context, then fail the whole run instead of swallowing it

A JSONL file. A separate cron reads new lines once an hour and builds me a digest.
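
The digest job isn't shown here. As a sketch of how a reader could pick up only the new lines, assuming it tracks a byte offset into the file (read_new_entries and the offset handling are hypothetical, not the actual digest cron):

```python
import json
from pathlib import Path


def read_new_entries(queue: Path, offset: int = 0) -> tuple[list[dict], int]:
    """Return entries appended after `offset`, plus the offset to resume from."""
    if not queue.exists():
        return [], offset
    entries = []
    # Binary mode keeps seek() and tell() in plain byte offsets.
    with open(queue, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                # End of file, or a line still being written; leave it for next time.
                break
            offset = fh.tell()
            if line.strip():
                entries.append(json.loads(line))
    return entries, offset
```

Returning the new offset instead of saving it inside the function lets the caller persist it only after the digest has gone out, so a failed digest run rereads the same lines rather than skipping them.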

This looks like a downgrade. A Discord ping is pushier than a file append. But "pushy" was exactly the wrong property for this bucket. The right property was durable: if nothing reads the file, nothing is lost. If something reads it, it gets the full record, not the 300-character preview Discord was rendering.

The failure mode for a local file write is also much narrower than for an HTTP request. open() or the write raises OSError when the disk is full or the filesystem is read-only, and I want both of those to blow up the whole script loudly, not fail silently one line at a time. I didn't wrap this one in the return False swallow. If it raises, the run bails and I see it on the next heartbeat.

One lesson

Pick the primitive that matches the urgency. Push (webhook, Discord, page) is right when a human or automated consumer must act in minutes. Pull (a file, a queue, a database row) is right when "by end of day" is fine. Conflating them by shoving low-urgency noise through a push primitive all but guarantees the push path will eventually break silently, on exactly the traffic you can't afford to lose.

The tradeoff I accepted: I lose the ambient "someone pitched us" notification in Discord. In exchange, the unknowns bucket is now the only part of the pipeline I actively trust. A file doesn't 403.
