DEV Community

Muhammad Niaz Ali


The Push Notification Bug That Took Three Layers to Find

1:00 AM to 2:27 AM. One bug, three root causes, zero clean error messages.

It started with a simple complaint: an admin sends a push notification, and the user never receives it. No crash, no red error in the console, nothing obviously broken. Just silence on the other end.

That is the most frustrating kind of bug. Everything looks like it's working. The permission prompt shows up fine. The admin panel says "sent." And yet nothing arrives.

By 1 AM, after a long day already spent on a fairly large project, this was the last thing left to fix before calling it a night. It turned into an hour and a half of tracing one silent failure into another.

Layer One: The CSP Was Blocking the Fix Before It Could Even Start

The first clue showed up in the browser console: a Content Security Policy violation, quietly blocking a script that OneSignal's SDK needed to complete its own initialization. The permission popup looked completely normal, so it was easy to assume the subscription step was working. It wasn't. The script that OneSignal used internally to finish setting up the subscription was being blocked by the site's own security headers.

The fix was small: add the missing domain to the script-src directive. But finding it meant not trusting what the UI looked like it was doing, and instead reading the console output and the actual network requests line by line.
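As a rough illustration of that change, here is a minimal sketch of a helper that appends one allowed origin to an existing script-src directive. The helper name addScriptSrc and the origin https://cdn.onesignal.com are assumptions for the example; the post does not say which domain was actually missing, so use whatever origin the console violation reports.

```typescript
// Hypothetical helper: adds `source` to the script-src directive of a
// Content-Security-Policy header value. If the policy has no script-src
// directive, it is returned unchanged, because adding one would override
// the default-src fallback and could make the policy stricter.
function addScriptSrc(csp: string, source: string): string {
  const directives = csp
    .split(";")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);

  const updated = directives.map((directive) => {
    const [name, ...values] = directive.split(/\s+/);
    if (name.toLowerCase() !== "script-src") return directive;
    if (values.includes(source)) return directive; // already allowed
    return [name, ...values, source].join(" ");
  });

  return updated.join("; ");
}

// Example policy; the origin below is an assumption, not taken from the post.
const before = "default-src 'self'; script-src 'self'";
const after = addScriptSrc(before, "https://cdn.onesignal.com");
console.log(after);
```

The function is idempotent, so running it again on its own output does not duplicate the entry.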

Layer Two: "Sent" and "Delivered" Are Not the Same Thing

Once the CSP was fixed, notifications appeared to send successfully. The API returned a success response, an ID was created, and the admin panel showed a "sent" confirmation.

Except the user still got nothing.

This turned out to be a subtler problem. OneSignal's newer API doesn't return a recipient count in that initial response, so a message could be "created" successfully by OneSignal's servers while still reaching zero actual devices. The code was treating message creation as proof of delivery, which is not the same thing at all.

The fix involved polling OneSignal's delivery-status endpoint a few seconds after sending, and actually checking how many devices were reached, instead of trusting the presence of an ID as a signal of success.
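A minimal sketch of that polling step might look like the following. The function confirmDelivery, the DeliveryStats shape, and its successful and failed fields are assumptions for illustration; the actual lookup against OneSignal is left to a caller-supplied fetchStats callback rather than guessed at here.

```typescript
// Assumed shape of a delivery-status lookup result (field names are
// illustrative, not confirmed by the post).
interface DeliveryStats {
  successful: number;
  failed: number;
}

// Caller-supplied function that queries the provider's delivery-status
// endpoint for one notification ID.
type StatsFetcher = (notificationId: string) => Promise<DeliveryStats>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Waits, then checks delivery stats, up to `attempts` times. Returns as soon
// as at least one device was reached; otherwise returns the last stats seen.
async function confirmDelivery(
  notificationId: string,
  fetchStats: StatsFetcher,
  attempts = 3,
  delayMs = 2000,
): Promise<DeliveryStats> {
  let last: DeliveryStats = { successful: 0, failed: 0 };
  for (let i = 0; i < attempts; i++) {
    await sleep(delayMs);
    last = await fetchStats(notificationId);
    if (last.successful > 0) return last;
  }
  return last;
}
```

With this in place, the admin panel can report "created, but reached 0 devices" when successful is still 0 after the last attempt, instead of treating the notification ID as proof of delivery.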

Layer Three: A Ghost Service Worker

This was the one that took the longest to catch, and honestly, the one that felt the most satisfying to finally nail down.

Even after confirming real delivery through OneSignal's own stats, one specific user still wasn't receiving anything. Digging through the project's public folder turned up a leftover default service worker file from an earlier setup, sitting quietly next to the custom one that was supposed to be handling push events.

Any browser that had visited the site before this custom worker was introduced may have registered that old default file first. Even after the new service worker file was deployed, a browser that already had the old one registered wouldn't automatically know to drop it and start using the new one right away. The push was arriving. It just had nowhere active to go.

The fix was twofold: remove the unused leftover file from the project, and have the affected user manually unregister their existing service worker once, so the browser could pick up the correct one cleanly.
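The manual unregister step can also be scripted. The sketch below is one possible approach, not necessarily what was done here: it unregisters every service worker registration whose script is not the expected one. The function name unregisterStaleWorkers and the path /custom-sw.js are hypothetical. The small interfaces mirror the parts of the browser's ServiceWorkerContainer and ServiceWorkerRegistration that the code touches, so navigator.serviceWorker can be passed in directly.

```typescript
// Minimal structural types matching the browser objects used below.
interface WorkerLike {
  scriptURL: string;
}
interface RegistrationLike {
  active: WorkerLike | null;
  waiting: WorkerLike | null;
  installing: WorkerLike | null;
  unregister(): Promise<boolean>;
}
interface ContainerLike {
  getRegistrations(): Promise<readonly RegistrationLike[]>;
}

// Unregisters every registration whose worker script does not end with
// `expectedPath` (query string ignored). Returns the script URLs removed.
async function unregisterStaleWorkers(
  container: ContainerLike,
  expectedPath: string,
): Promise<string[]> {
  const removed: string[] = [];
  for (const reg of await container.getRegistrations()) {
    const worker = reg.active ?? reg.waiting ?? reg.installing;
    const url = worker ? worker.scriptURL : "";
    const withoutQuery = url.split("?")[0];
    if (withoutQuery.endsWith(expectedPath)) continue; // keep the correct worker
    if (await reg.unregister()) removed.push(url);
  }
  return removed;
}

// In a browser (hypothetical path for the custom worker):
// await unregisterStaleWorkers(navigator.serviceWorker, "/custom-sw.js");
```

After the stale registration is removed, a reload lets the page register the intended worker fresh.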

What This Actually Was

Nothing about this bug involved an obvious crash. It was three separate systems, each behaving in a way that looked reasonable in isolation:

A security policy doing exactly what security policies are supposed to do, just one line too strict.
An API response that was technically accurate, just incomplete.
A browser caching behavior that exists for good reasons, just working against the deployment here.

Every layer passed the blame quietly to the next one, which is exactly why it took so long to trace. There was no single error message pointing at the real cause. Just three assumptions that each seemed fine on their own, and only fell apart when followed all the way through.

By 2:27 AM, the notification finally landed on the right device. Tired, but working, which on a night like that is a good enough place to stop.
