Mustafa ERBAY

Posted on Jun 1 • Originally published at mustafaerbay.com.tr

Why Mobile Push Notifications Don't Arrive: 3 Critical Reasons

#technology #mobile #pushnotifications #android

Mobile Push Notifications: Mysterious Disappearances and Technical Realities

Mobile push notifications are one of the fastest and most effective ways for our applications to communicate with users. However, sometimes these notifications seem to vanish into a black hole. When you receive complaints from users like "My notifications aren't coming through," your first thought is usually a bug in your application. Yet, the reality is often more complex and hidden in the infrastructural layers. With my 20 years of system and network experience, I'll provide an in-depth analysis of why these notifications fail to arrive. In this post, we'll focus not just on superficial potential issues, but on the critical underlying technical reasons.

Such problems directly impact user experience and damage your application's credibility. A missed discount notification for an e-commerce app, a delayed emergency alert for a news app, or a missed important event notification for a gaming app can directly lead to revenue loss or user dissatisfaction. Therefore, understanding the functioning of push notifications across all layers is vital.

1. Platform Service Behavior: Google Firebase Cloud Messaging (FCM) and Apple Push Notification Service (APNS)

At the core of push notifications lie the services provided by mobile operating systems. Google Firebase Cloud Messaging (FCM) for Android and Apple Push Notification Service (APNS) for iOS are the key players responsible for delivering notifications to devices. The working principles of these services and the challenges they face are the first and most important factors that can cause notifications to not arrive.

Your server never delivers notifications to devices directly; FCM and APNS sit in between as intermediaries. When an app is in the background or closed, it would waste battery and network resources for every app to keep its own connection open to a push server. Instead, the operating system maintains a single shared, persistent connection to the platform service on behalf of all apps (on Android, Google Play services handles this). It keeps that connection alive with periodic keepalives and reconnects as network conditions change. The service then delivers incoming notifications over that connection and wakes the target app when needed.

ℹ️ The Core Mechanism of FCM and APNS

FCM and APNS work through device-specific registration tokens. Your application obtains these tokens, and your server-side then uses this token to send the notification request to the respective platform service. The platform service then attempts to deliver the notification to the relevant application on the device using this token. A break in any link of this chain can prevent the notification from reaching its destination.
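Here is a minimal Python sketch of the server-side step in that chain: building a send request for FCM's HTTP v1 API from a registration token. It assumes you already have an OAuth 2.0 access token for a service account. The project ID, token values, and the function name build_fcm_request are placeholders for illustration. The sketch only assembles the request and does not send it.

```python
import json
from typing import Dict, Tuple

# FCM HTTP v1 send endpoint; {project_id} is your Firebase project ID.
FCM_V1_URL = "https://fcm.googleapis.com/v1/projects/{project_id}/messages:send"


def build_fcm_request(project_id: str, access_token: str, device_token: str,
                      title: str, body: str) -> Tuple[str, Dict[str, str], bytes]:
    """Assemble (url, headers, body) for one FCM HTTP v1 send request.

    Obtaining `access_token` (OAuth 2.0, service account) and POSTing the
    request are left to the caller.
    """
    url = FCM_V1_URL.format(project_id=project_id)
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json; charset=UTF-8",
    }
    message = {
        "message": {
            # The registration token the app reported to your server.
            "token": device_token,
            "notification": {"title": title, "body": body},
        }
    }
    return url, headers, json.dumps(message).encode("utf-8")
```

POSTing the body to the returned URL with any HTTP client completes the send. The response, and in particular any error it carries, tells you whether that link in the chain held.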

These services are large-scale infrastructures that occasionally undergo maintenance, experience high load, or suffer temporary outages. Under heavy load, the requests your server sends to them can be delayed, throttled, or rejected (for example, with HTTP 429 or 503 responses). Such situations can affect notification delivery even though your application did nothing wrong.

1.1. Device Connection Status and Background Processes

One of the most critical stages for a push notification is whether the device has a healthy connection with the respective platform service (FCM or APNS). If the device is not connected to the internet, has a weak Wi-Fi signal, or its cellular data is off, it becomes impossible for it to receive notifications from the platform service. However, the problem is often not that simple.

Many modern operating systems restrict background applications' network access to extend battery life and optimize network traffic. Android's "Doze Mode" and "App Standby" features, or iOS's background process limitations, can prevent applications from making network requests at certain intervals or under specific conditions. These restrictions can delay or completely halt notification delivery by preventing FCM or APNS from synchronizing with the device.

⚠️ Background Data Restrictions and Push Notifications

Battery-saver and "Data Saver" modes, whether users enable them deliberately or without realizing it, can severely restrict your application's background network access. In such cases, notifications may be delayed or never arrive. Check Android's "Battery Optimization" setting for your app and iOS's "Background App Refresh" setting. On iOS, Background App Refresh governs silent (background) notifications; user-visible alert notifications do not depend on it.

For example, when a device sits unplugged, idle, and stationary for a while, Android's Doze Mode activates. It defers most network access to short, periodic maintenance windows, and only certain exempt applications and system services can reach the network outside those windows. FCM holds normal-priority messages until the next window, but it delivers high-priority messages immediately and can briefly wake the app for them. If time-sensitive notifications are sent with normal priority, they may arrive much later than expected. Similarly, on iOS, silent (background) notifications can be delayed or dropped if Background App Refresh is off or the system decides the app has exceeded its background budget.

1.2. Registration Tokens and Validity Issues

Every mobile application receives a unique registration token from FCM or APNS upon installation or first launch. This token is similar to an "address" that allows your server to send notifications to a specific device. However, these tokens are not permanent and can become invalid for various reasons:

Application Reinstallation: When a user reinstalls the app or clears its data, the app receives a new token.
Platform Service Updates: FCM or APNS itself may reset or refresh tokens for security or performance reasons.
App Version Updates: Rarely, token updates might be necessary during app updates.
Device Factory Reset: In this case, all data is erased, and apps must obtain new tokens.

If your server sends a notification to an invalidated token, that notification will not reach its destination. Platform services usually report invalid tokens in their responses. FCM's HTTP v1 API returns an "UNREGISTERED" error, while the legacy API used codes such as "NotRegistered" and "InvalidRegistration". APNS returns HTTP 410 with the reason "Unregistered". Regularly processing this feedback and removing invalid tokens from your database is a critical maintenance task.
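The following minimal Python sketch turns provider responses into actions. It assumes you have already extracted the HTTP status and the error string: FCM's error "status" field (or, in the legacy API, the per-result "error" code, which arrives inside an HTTP 200 response) and APNS's JSON "reason". The mapping covers the codes named above plus common transient statuses. The Action enum and function names are hypothetical, so check each provider's current error reference before relying on the mapping.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    OK = "ok"
    DELETE_TOKEN = "delete_token"   # token is dead: remove it from the database
    RETRY_LATER = "retry_later"     # transient: retry with backoff
    INVESTIGATE = "investigate"     # likely a payload, auth, or config problem


# FCM HTTP v1 "status" values and legacy "error" codes meaning the token is gone.
FCM_DEAD = {"UNREGISTERED", "NotRegistered", "InvalidRegistration"}
# Values that usually mean "try again later" rather than "bad token".
FCM_TRANSIENT = {"UNAVAILABLE", "INTERNAL", "QUOTA_EXCEEDED",
                 "Unavailable", "InternalServerError"}


def classify_fcm(http_status: int, error: Optional[str]) -> Action:
    # Check the error first: the legacy API reports per-token errors inside a 200.
    if error in FCM_DEAD:
        return Action.DELETE_TOKEN
    if http_status in (429, 500, 503) or error in FCM_TRANSIENT:
        return Action.RETRY_LATER
    if http_status == 200 and error is None:
        return Action.OK
    return Action.INVESTIGATE  # e.g. INVALID_ARGUMENT: could be the token or the payload


def classify_apns(http_status: int, reason: Optional[str]) -> Action:
    if http_status == 200:
        return Action.OK
    if http_status == 410 or reason == "Unregistered":
        return Action.DELETE_TOKEN
    if http_status in (429, 500, 503):
        return Action.RETRY_LATER
    return Action.INVESTIGATE  # e.g. 400 BadDeviceToken: may be a sandbox/production mix-up
```

The sketch marks only clearly dead tokens for deletion. That way, a server-side bug cannot silently wipe valid registrations. Examples include a truncated token, or a sandbox token sent to the production APNS endpoint, which APNS reports as "BadDeviceToken".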

💡 Effective Token Management

On the server side, monitor the error codes the platform service returns after every send attempt. When a token is reported as invalid (for example, "UNREGISTERED" in FCM or "Unregistered" in APNS), delete it from your database. The app will register a fresh token the next time it runs. You can also significantly reduce this problem by having the app send its current token to your server on every launch and whenever the platform reports a token refresh (on Android, FirebaseMessagingService.onNewToken).
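The launch-time sync and cleanup described above can be sketched in Python with an in-memory store. TokenStore and its methods are hypothetical stand-ins for a real database table. The sync skips the write when the reported token is unchanged, which is the common case on most launches. The cleanup removes every device whose token a provider reported as dead.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TokenStore:
    """Hypothetical in-memory stand-in for a device-token table."""
    tokens: Dict[str, str] = field(default_factory=dict)  # device_id -> token
    writes: int = 0  # counts actual database writes

    def sync(self, device_id: str, token: str) -> bool:
        """Record the token the app reported at launch; return True if a write happened."""
        if self.tokens.get(device_id) == token:
            return False  # unchanged, which is the usual case: skip the write
        self.tokens[device_id] = token
        self.writes += 1
        return True

    def remove_dead(self, dead_tokens: Set[str]) -> int:
        """Delete every device whose token a provider reported as invalid."""
        stale = [d for d, t in self.tokens.items() if t in dead_tokens]
        for device_id in stale:
            del self.tokens[device_id]
        return len(stale)
```

Skipping unchanged tokens keeps the per-launch cost close to a single read. This matters once many devices launch at the same time, such as after a forced update.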

While working on a production ERP project, we were sending instant alerts for operator screens. After some time, we started receiving complaints from operators that "Some alerts are not coming through." After days of debugging, we discovered that approximately 15% of the tokens on our server were invalid due to app updates and users resetting their devices. This was directly causing 15% of notifications to be lost. We immediately implemented a mechanism to ensure the app sent the current token to the server upon every launch and cleared invalid ones. This simple yet effective solution increased the notification delivery rate to over 99%.

2. Network Layer Issues: MTU, Proxies, and Firewalls

The journey of push notifications from platform services to devices often passes through a complex network infrastructure. Network layer issues encountered during this journey are the second major factor that can cause notifications to be lost. Such problems are more common in corporate networks or restrictive mobile networks.

FCM and APNS servers are typically located on reliable, high-bandwidth networks. However, the network environments where devices are located can vary. Mobile devices can connect to different Wi-Fi networks or use cellular data. Each of these networks has its own latency, packet loss rates, and bandwidth limitations.

A notification packet, from the moment it leaves the server, passes through various network devices: routers, switches, firewalls, and proxy servers. An incorrect configuration or restriction on any of these devices can block the notification packet's path.

2.1. Maximum Transmission Unit (MTU) Mismatches

MTU (Maximum Transmission Unit) is the largest packet size, in bytes, that a network link can carry. Different network segments or devices can have different MTU values. When a router receives a packet larger than the next link's MTU, it either fragments the packet or drops it. It drops the packet if it is marked "Don't Fragment", as TCP traffic typically is, and should then send an ICMP "Fragmentation Needed" message back to the sender.

FCM and APNS traffic is carried over TCP (wrapped in TLS). TCP detects lost segments and retransmits them, but retransmission only helps if the retransmitted segment can get through. Suppose a link on the path has a low MTU, the ICMP feedback never reaches the sender, and the MSS (Maximum Segment Size) agreed at connection setup does not account for that link. Then every full-size segment is dropped, and retransmitting it fails again.

For instance, a mobile device on a 5G network might see an MTU of 1500 on the operator's network, while a VPN tunnel or another link on the corporate network operates with an MTU of 1400. Packets larger than 1400 bytes cannot cross that link and may be dropped silently. Small packets still pass, so a connection may appear to be established and then stall. This especially affects notifications with large data payloads and long-lived connections.

⚠️ MTU Discovery and Troubleshooting

Path MTU Discovery (PMTUD) normally resolves MTU mismatches automatically. However, it relies on ICMP "Fragmentation Needed" messages (on IPv6, "Packet Too Big"), which some network devices and firewalls block. In such cases, you may need to clamp the MSS manually on the gateway or check the MTU values configured on network devices.
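The arithmetic behind MSS clamping is simple. The MSS is the path MTU minus the IP header (20 bytes for IPv4 without options, 40 for IPv6) and the TCP header (20 bytes without options). Here is a short Python sketch with hypothetical function names:

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, fixed header
TCP_HEADER = 20   # bytes, no TCP options


def mss_for_mtu(path_mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload per segment that fits in one packet of `path_mtu` bytes."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mss = path_mtu - ip_header - TCP_HEADER
    if mss <= 0:
        raise ValueError(f"MTU {path_mtu} is too small to carry TCP")
    return mss


def fits_path(segment_payload: int, path_mtu: int, ipv6: bool = False) -> bool:
    """True if a segment with this payload crosses the link without fragmentation."""
    return segment_payload <= mss_for_mtu(path_mtu, ipv6)
```

For a 1400-byte tunnel, mss_for_mtu(1400) gives 1360 (1340 over IPv6), whereas a peer on a 1500-byte network would advertise 1460. Clamping the MSS to 1360 on the gateway keeps every segment small enough to fit through the tunnel.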

One of my clients ran into this issue with corporate Android devices. When the devices connected to the company network, notifications from outside the company almost never arrived, and it took days to find the cause. The culprit turned out to be a misconfigured MSS clamping setting on the company's main gateway (FortiGate). It pinned the TCP segment size to a fixed value that did not match the VPN tunnel's MTU, instead of deriving the value from that MTU. As a result, FCM segments too large for the tunnel were dropped at the gateway. Once we set the MSS clamping on the gateway to the correct value, the problem was resolved.

2.2. Proxy Servers and Firewall Rules

In corporate networks, internet traffic typically passes through a central proxy server or firewall. These devices are used to ensure network security, monitor traffic, and manage bandwidth. However, misconfigured rules on these devices can block communication from specific services like FCM and APNS.

FCM and APNS communicate over specific hosts and ports, and their IP address ranges are broad and change from time to time. Android devices hold their FCM connection on TCP port 5228, falling back to 5229 and 5230. Apple devices connect to APNS on TCP port 5223, with 443 as a fallback on Wi-Fi. If the firewall or proxy blocks these connections, notifications cannot reach the device. For example, a firewall that only allows the standard HTTP/HTTPS ports (80 and 443) can block or degrade these connections.
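A quick way to test whether a network lets these connections through is to attempt a plain TCP connection from a machine on that network. The Python sketch below assumes that mtalk.google.com (FCM) and courier.push.apple.com (APNS) are the device-facing hostnames. Treat these hostnames and ports as assumptions to verify against Google's and Apple's current network requirements.

```python
import socket
from typing import Dict, Iterable, Tuple

# Assumed device-facing endpoints; verify against Google's and Apple's
# current network requirements before relying on them.
PUSH_ENDPOINTS = [
    ("mtalk.google.com", 5228),        # FCM (5229 and 5230 are fallbacks)
    ("courier.push.apple.com", 5223),  # APNS (443 is a Wi-Fi fallback)
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, unreachable, ...
        return False


def probe(endpoints: Iterable[Tuple[str, int]] = PUSH_ENDPOINTS,
          timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Check every endpoint and report which ones this network lets through."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

Running print(probe()) once on the corporate Wi-Fi and once on cellular data shows whether the firewall is the difference. A successful connection only proves that the port is open. It says nothing about TLS inspection or idle timeouts further along the path.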

ℹ️ Whitelisting Required Endpoints

Google and Apple document the endpoints, ports, and IP address ranges that FCM and APNS use. On corporate networks that route traffic through a proxy or firewall, this traffic must be allowed ("whitelisted") rather than blocked. These lists change over time, so review them regularly.

Proxy servers can also cause problems themselves. Some proxies decrypt and re-encrypt SSL/TLS traffic in order to inspect it (SSL Inspection). Push connections validate the server's certificate strictly, so this interception typically breaks them outright. Apple, for example, states that its services fail connections subjected to HTTPS interception. Push hosts should therefore be excluded from inspection. In addition, connection timeouts and bandwidth limits on proxies can cut off long-lived connections or throttle high traffic volumes.

We ran into a similar scenario on a bank's internal platform. A mobile application we developed sent instant transaction alerts to customers, but users on the bank's Wi-Fi network did not receive them. The bank's strict firewall treated the non-standard ports used by APNS as unknown traffic and blocked them. After checking the APNS documentation and opening the required ports on the firewall, we resolved the issue. Even where security is paramount, services must be configured so that they do not break functionality.

3. Application and Device Level Optimizations: Battery Saving and Background Restrictions

The third main reason for push notifications not arriving is the optimizations and restrictions at the application and operating system level on the device. These optimizations are generally aimed at improving user experience, but they can sometimes lead to unintended side effects.

Modern smartphones use aggressive battery-saving mechanisms to extend battery life and improve performance. These mechanisms constantly monitor and optimize resource consumption by background applications. If an application is perceived as "wasting resources" by these mechanisms, it can be restricted by the operating system.

3.1. Background Process Restrictions and Mechanisms like 'Doze Mode'

Features like "Doze Mode" and "App Standby" on Android, and background app refresh restrictions on iOS, prevent applications from staying constantly connected to the network or being active in the background. These restrictions directly affect notification delivery.

For example, Doze Mode cuts off almost all application network access while the device is idle, apart from short maintenance windows. Only exempt applications, system services, and high-priority FCM messages get through outside those windows. Because FCM runs as part of Google Play services, high-priority messages are normally delivered even in Doze. However, these mechanisms can behave unpredictably, and app developers may need to adjust their sending strategy, for example message priority, to account for them.
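For a genuinely time-sensitive message, priority is the main setting to adjust in your sending strategy. The minimal Python sketch below assumes the FCM HTTP v1 message format. Setting android.priority to "high" asks FCM to deliver immediately even in Doze, and the apns-priority header "10" asks APNS for immediate delivery. The helper name make_urgent is hypothetical. The platform documentation adds two caveats. APNS does not accept priority 10 for background-only (content-available) notifications, which must use 5. Android may deprioritize high-priority messages that do not result in a user-visible notification.

```python
import copy
from typing import Any, Dict


def make_urgent(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an FCM HTTP v1 request body marked for immediate delivery.

    Only use this for alert notifications: APNS requires priority 5 for
    background-only (content-available) pushes.
    """
    urgent = copy.deepcopy(message)  # leave the caller's dict untouched
    msg = urgent.setdefault("message", {})
    # Android: deliver now, even if the device is in Doze.
    msg.setdefault("android", {})["priority"] = "high"
    # iOS: ask APNS for immediate delivery.
    msg.setdefault("apns", {}).setdefault("headers", {})["apns-priority"] = "10"
    return urgent
```

Reserving this for messages that really are urgent keeps the high-priority budget available for the notifications that need it.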

ℹ️ Optimization for 'Doze Mode' and 'App Standby'

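If your application schedules its own reminders or background work, use platform APIs that cooperate with Doze instead of in-process timers. On Android, WorkManager works with the operating system's battery-saving strategies and guarantees that deferred work eventually runs, even across app restarts and reboots, though not at an exact moment. Reminders that must fire at a precise time need AlarmManager's exact, allow-while-idle alarms. For notifications that originate on your server, high-priority FCM messages are the mechanism designed to get through Doze.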

Once, in a task management app we developed on Android, we received complaints that critical task reminders were not arriving on time. Users reported that notifications didn't come as the task's due time approached. Investigating the source of the problem, we found that the app was using a simple Timer to schedule notifications in the background. This Timer stopped when Doze Mode activated. By reconfiguring task reminders using WorkManager, we found that notifications now arrived reliably. This was an example showing how important it is to understand and adapt to the platform's battery optimizations.

3.2. Lack of In-App Debugging and Logging

One of the most important tools for understanding why push notifications are not arriving is detailed log records from both the client (mobile app) and the server. However, in many cases, developers do not set up these logging mechanisms in sufficient detail or do not monitor these logs adequately in the production environment.

If your mobile app cannot obtain a token from FCM/APNS, loses its token, or receives an error from a platform service, it should log that event. Likewise, the server should log each send request: whether it succeeded, which token it targeted, and how the platform service responded. This makes troubleshooting far easier.
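Here is a minimal Python sketch of a server-side log line for each send attempt, assuming JSON logs shipped to a central store. Device tokens work like credentials, so the sketch records only the envelope: message ID, provider, status, error, and latency. It identifies the token by a keyed-hash (HMAC) fingerprint rather than the token itself, and it never logs the payload body. The function names and the secret salt are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def token_fingerprint(token: str, salt: bytes) -> str:
    """Keyed hash of a device token: stable for correlation, useless for sending."""
    return hmac.new(salt, token.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def send_log_entry(message_id: str, provider: str, token: str, salt: bytes,
                   http_status: int, provider_error: Optional[str],
                   latency_ms: float) -> str:
    """One JSON log line for a send attempt; contains neither the token nor the body."""
    entry = {
        "ts": time.time(),
        "message_id": message_id,
        "provider": provider,  # "fcm" or "apns"
        "token_fp": token_fingerprint(token, salt),
        "http_status": http_status,
        "provider_error": provider_error,
        "latency_ms": round(latency_ms, 1),
    }
    return json.dumps(entry, sort_keys=True)
```

Because the same token always produces the same fingerprint, you can still trace every attempt for one device across the client and server logs.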

🔥 The Cost of Not Logging in Detail

Insufficient logging turns problem detection into detective work. To understand why a notification was lost, you may need to examine the app logs on the device, the server logs, and, where possible, the logs of network devices. Centralized logging systems (e.g., ELK Stack, Grafana Loki) bring these sources together in one place, saving time and enabling faster fixes.

At one point, I had a spam blocking app I developed for Android. The app would check incoming SMS messages and block unwanted ones. Some users reported that important SMS messages were also being blocked. To understand the problem, I added detailed logging. Through the logs, I realized that while running in the background, the app was occasionally stopping due to battery optimizations, and thus, it couldn't process incoming SMS messages. Using this information, I made the app's background processes more robust. This experience taught me how critical logs are for finding the root cause of any problem.

3.3. Poorly Managed Notification Channels (Android)

In Android, app developers can categorize notifications into different channels. For example, a news app might create different notification channels like "Breaking News," "Sports News," "Technology News." Users can enable or disable these channels according to their preferences or change settings like sound and vibration.

If the app developer does not configure the notification channel correctly, or if the user completely disables this channel, notifications sent to this channel will not reach the user. For instance, the developer might create a channel named "Urgent Alerts" and send critical notifications to this channel, but the user might have accidentally turned this channel off.

💡 Optimizing User Control

When using notification channels in Android, it's important to ensure users can easily manage these channels. Providing a section within the app where users can easily access notification settings and clearly stating what each channel is for can improve user experience and prevent misunderstandings.

While not a direct technical bug, this situation can cause users to think they are not receiving notifications. In a financial calculator app I developed, we enabled users to receive instant notifications about portfolio changes. When some users said "I'm not getting notifications," we found that they had actually muted or completely turned off the notification channel they had set. For such cases, adding a flow to guide the user to check notification settings and enable necessary channels was beneficial.

Conclusion: Push Notifications Must Be Considered Holistically

Reliable push delivery never depends on a single component; it requires the whole system to work together. The behavior of the platform services (FCM/APNS), the health of the network infrastructure, the operating system's battery optimizations, and the application's own configuration are all part of this process.

The three main categories I've covered in this post – platform services, network layer issues, and application/device level optimizations – encompass the most common reasons for push notifications not arriving. However, as is always the case, real-world problems can be more specific and unexpected. Therefore, when you encounter a problem, it is critically important to systematically examine all these layers, rather than focusing solely on your application's code.

Remember, for users, notifications are just a message; but the delivery of this message is a vital bridge for your application's credibility and user experience. Keeping this bridge strong requires constant attention and a detailed understanding.

Top comments (2)

freerave • Jun 1

Great article! I really appreciate the deep dive into the network layer (MTU, Proxies) and OS-level restrictions. It’s refreshing to see issues addressed from an infrastructure perspective rather than just surface-level API debugging.

However, from a Security Architecture and Backend Engineering standpoint, I noticed a few critical gaps in the proposed solutions that could potentially lead to severe vulnerabilities or system bottlenecks:

The Logging Trap & Data Exposure (Security)
While having detailed logs in centralized systems like ELK or Loki is crucial for debugging, logging raw payloads or unencrypted device tokens is a massive security risk. Device tokens are essentially credentials; storing them in plaintext without strict Data Masking exposes the system to Notification Spoofing. Furthermore, logging notification payloads might violate privacy compliance if they contain PII or sensitive transactional data.
Broad IP Whitelisting in Corporate Firewalls (Security)
Mentioning the whitelisting of APNS/FCM IP ranges in highly secure environments (like banks) is an architectural red flag. Apple’s IP subnets (e.g., 17.0.0.0/8) are massive. Opening such broad ranges expands the attack surface significantly and creates a potential backdoor for Data Exfiltration. In sensitive environments, relying on Domain-based Routing or SNI Inspection is much safer than broad IP whitelisting.
Token Synchronization & Backend Concurrency (Architecture)
Treating invalid token cleanup and token synchronization on app launch as simple database updates overlooks scalability. If a large user base updates the app and launches it simultaneously, the sudden spike in token sync requests can essentially act as a self-inflicted DDoS attack on the backend. A robust architecture needs Message Brokers (like RabbitMQ/Kafka) as a buffer, strict Rate Limiting, and proper handling of 429 Too Many Requests from FCM/APNS to ensure system resilience.

Again, thank you for shedding light on the often-ignored network layers of push notifications. It’s a great piece that opens the door for a deeper discussion on building secure and scalable infrastructure!

Mustafa ERBAY • Jun 1

Really appreciate this — exactly the kind of pushback that makes the topic better. You're right that the post stayed on the "why it doesn't arrive" (network + OS) layer and treated the backend side too lightly. Let me push back with you on all three, because they're spot on:

Logging / token exposure. Fully agree — a device token is a credential, not a debug field. My rule: tokens never hit logs in the clear; if I need correlation I log a salted hash or the last 4–6 chars, and the token store itself is encrypted at rest and treated like a secrets table. Payloads: log the envelope (message id, target, provider status, latency), never the body — and if a body must be sampled, it goes through a PII redaction pass first. A leaked token + a send-path that doesn't authenticate the sender is straight-up notification spoofing.
Broad IP whitelisting. That part was describing what teams do (and the pain it causes), not endorsing it — but I should've said so more clearly, because 17.0.0.0/8 as a firewall rule is an architectural smell. Opening Apple's entire /8 for egress is a giant exfil-friendly hole. The cleaner pattern is FQDN/SNI-based egress filtering through a forward proxy (api.push.apple.com, fcm.googleapis.com) instead of CIDRs — ideally on a dedicated egress path so push traffic isn't sharing one broad allow-list with everything else.
Token sync / self-DDoS. This is the one I see bite people most. The classic trigger is a forced update or a big campaign — everyone launches in the same 10 minutes and the sync endpoint takes a synchronized stampede it was never sized for. What's worked for me: (a) the sync endpoint just enqueues onto a broker (Kafka/RabbitMQ) and returns fast, a worker reconciles; (b) the client adds jitter + backoff on launch instead of firing immediately; (c) cheap dedupe first — hash-compare the token and skip the DB write entirely when it's unchanged (most launches); (d) treat the provider's 429 Retry-After as authoritative and batch invalid-token cleanup async off the feedback channel rather than inline.
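Here is a minimal Python sketch of points (b) and (d). It shows a "full jitter" exponential backoff for the client's launch-time sync, and the parsing of an HTTP Retry-After header, which can carry either a number of seconds or an HTTP date. The function names, base delay, and cap are assumptions for illustration. The same logic ports directly to the mobile client's language.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def launch_sync_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                      rng: Optional[random.Random] = None) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    r = rng if rng is not None else random
    return r.uniform(0.0, min(cap, base * 2 ** attempt))


def retry_after_seconds(header: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After (delta-seconds or HTTP-date) into seconds; None if absent/invalid."""
    if header is None:
        return None
    header = header.strip()
    if header.isdigit():
        return float(header)
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):  # TypeError on older Pythons, ValueError on 3.10+
        return None
    if when.tzinfo is None:  # "-0000" dates come back naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Randomizing the whole interval spreads a synchronized wave of launches across the window instead of moving the spike a few seconds later. Clamping Retry-After at zero handles a date that has already passed.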

Thanks again — this is exactly the "secure + scalable backend" follow-up the piece was missing. Might be worth a dedicated post on the send-side architecture; you've basically sketched the outline.