Push Notification Reliability: 3 Core Misconceptions


Common Mistakes and Realities in Push Notification Delivery

Push notifications are one of the most direct ways to engage with users. However, behind this seemingly simple communication lie technical details that are often overlooked. While working on a production ERP and on the financial calculator app I developed, instant information delivery to the user was critically important. At these points, I had to think deeply about the reliability of push notifications. In real-world scenarios, settling for a simple "notification sent" status can lead to serious operational issues.

In this post, I will discuss the three most common misconceptions I encounter regarding push notification reliability and their tangible impacts. By diving into the technical depth, we will explore why these misconceptions arise and how we can build a more reliable push notification infrastructure in our systems. The goal is to produce practical solutions to real-world problems by understanding the core principles rather than relying on superficial fixes.

Misconception 1: Push Notification Services Are Always Up and Running

Most developers believe that platforms like Apple Push Notification Service (APNS) and Firebase Cloud Messaging (FCM) are always 100% operational. While this seems true at first glance, the operational reality is much more complex. Although these services keep their core infrastructures highly available, they can still experience internal outages or exhibit unexpected behaviors.

A few years ago, when I was working on a mobile app for a large Turkish e-commerce site, we experienced serious delays in announcing critical campaigns. Users were not getting notified about the campaigns, which directly resulted in lost revenue. There were no public outage reports on either the APNS or FCM side. However, we observed that a portion of the sent notifications either did not reach the target devices or arrived very late. This situation was caused by momentary capacity issues or routing errors within the services themselves, rather than a direct issue on our end.

# Example APNS delivery command (simplified)
# curl's --cert takes the certificate file; the private key is passed separately with --key.
curl --http2 --cacert cacert.pem \
     --cert aps_cer.pem --key aps_key.pem \
     --data '{"aps":{"alert":"New campaign has started!","sound":"default"}}' \
     --header "apns-topic: com.example.app" \
     --header "apns-push-type: alert" \
     --header "apns-priority: 10" \
     --header "apns-expiration: 0" \
     https://api.push.apple.com/3/device/DEVICE_TOKEN

In such cases, it was critical to understand that a "successful" response returned by APNS or FCM does not mean the notification actually reached the user. These responses indicate that the notification was accepted and queued for processing by the service, but they offer no guarantee of final delivery. This misconception can cause major issues, especially in systems that send high-volume and time-sensitive notifications.
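One way to keep "accepted by the provider" and "seen on the device" apart is to track them as separate states. The sketch below is a minimal in-memory illustration, not the system described above: `DeliveryTracker`, its method names, and the idea that the app reports receipt back to our server are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DeliveryRecord:
    sent_at: float
    accepted_at: Optional[float] = None   # provider returned a success response
    confirmed_at: Optional[float] = None  # the app on the device reported receipt


class DeliveryTracker:
    """Hypothetical tracker: sent -> accepted (provider) -> confirmed (device)."""

    def __init__(self) -> None:
        self._records: Dict[str, DeliveryRecord] = {}

    @staticmethod
    def _ts(now: Optional[float]) -> float:
        return time.time() if now is None else now

    def mark_sent(self, notification_id: str, now: Optional[float] = None) -> None:
        self._records[notification_id] = DeliveryRecord(sent_at=self._ts(now))

    def mark_accepted(self, notification_id: str, now: Optional[float] = None) -> None:
        self._records[notification_id].accepted_at = self._ts(now)

    def mark_confirmed(self, notification_id: str, now: Optional[float] = None) -> None:
        self._records[notification_id].confirmed_at = self._ts(now)

    def status(self, notification_id: str) -> str:
        record = self._records[notification_id]
        if record.confirmed_at is not None:
            return "confirmed"
        if record.accepted_at is not None:
            return "accepted"  # queued by the provider, final delivery unknown
        return "sent"

    def unconfirmed(self, older_than_seconds: float, now: Optional[float] = None) -> List[str]:
        """IDs sent at least `older_than_seconds` ago that no device has confirmed."""
        cutoff = self._ts(now) - older_than_seconds
        return [
            nid for nid, record in self._records.items()
            if record.confirmed_at is None and record.sent_at <= cutoff
        ]


tracker = DeliveryTracker()
tracker.mark_sent("campaign-1", now=0.0)
tracker.mark_accepted("campaign-1", now=1.0)
print(tracker.status("campaign-1"))           # accepted
print(tracker.unconfirmed(60.0, now=120.0))   # ['campaign-1']
```

A real system would persist these records and would need the client app to send the confirmation; the point here is only that a provider success response moves a notification to "accepted", never to "confirmed".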

Transient Issues of APNS and FCM

The assumption that APNS and FCM are constantly "working" is usually based on their massive-scale infrastructures. However, even these infrastructures can experience temporary performance degradation due to sudden traffic spikes, maintenance work, or rare underlying cloud provider issues. For example, during a holiday season or a major product launch, there can be a sudden surge in notification traffic. This situation can cause backlogs in the services' queues.
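For temporary degradation like this, a common mitigation on the sender side is to retry with exponential backoff and jitter. The following is a sketch under assumptions: `send` is a caller-supplied function that returns an HTTP status code, and treating 429, 500, and 503 as retryable is a choice made for this example rather than a complete rule.

```python
import random
import time

# Statuses treated as transient in this sketch (rate limiting and server-side errors).
RETRYABLE_STATUSES = {429, 500, 503}


def send_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns a (status, attempts) tuple.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in RETRYABLE_STATUSES:
            return status, attempt
        if attempt < max_attempts:
            # Exponential backoff with full jitter, so many senders do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    return status, max_attempts


# Usage with a fake sender that fails twice and then succeeds (no real waiting).
responses = iter([503, 503, 200])
print(send_with_backoff(lambda: next(responses), sleep=lambda seconds: None))  # (200, 3)
```

For time-sensitive notifications, the total retry budget should stay below the point where the message stops being useful.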

Once, in an Android spam blocker app I built, I set up a mechanism to block spam notifications that arrived in bursts during a specific time window. It filtered out and deleted notifications once their volume exceeded a certain threshold. One day, however, a heavy surge of legitimate notification traffic exceeded that threshold, and real notifications were filtered out too. The problem arose because my filtering logic relied on the assumption that APNS/FCM always behaved perfectly.

// Example of a possible error response body from APNS (returned with HTTP status 400)
{
  "reason": "BadDeviceToken"
}

Such cases show that there can be significant gaps between the moment a notification is sent and the moment it reaches the recipient. This gap can cause unacceptable delays in situations requiring immediate action (such as emergency alerts or critical workflow notifications). Therefore, we should not settle for just sending notifications; we must also build mechanisms to monitor delivery results and proactively detect potential issues.
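A minimal sketch of what such monitoring could compute is shown below. It assumes we already collect, for each notification, the send time and (if available) a device confirmation time; the function names and the alert thresholds are illustrative values, not recommendations.

```python
import math


def delivery_stats(records):
    """records: list of (sent_at, confirmed_at_or_None) pairs, times in seconds."""
    latencies = sorted(confirmed - sent for sent, confirmed in records if confirmed is not None)
    total = len(records)
    confirmed_rate = len(latencies) / total if total else 0.0
    p95_latency = None
    if latencies:
        # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-based.
        rank = math.ceil(0.95 * len(latencies))
        p95_latency = latencies[rank - 1]
    return {"confirmed_rate": confirmed_rate, "p95_latency": p95_latency}


def should_alert(stats, min_rate=0.9, max_p95_seconds=30.0):
    """True when too few notifications are confirmed or the slow tail is too slow."""
    if stats["confirmed_rate"] < min_rate:
        return True
    return stats["p95_latency"] is not None and stats["p95_latency"] > max_p95_seconds


window = [(0.0, 2.0), (0.0, 4.0), (0.0, None), (0.0, 800.0)]
stats = delivery_stats(window)
print(stats)                # {'confirmed_rate': 0.75, 'p95_latency': 800.0}
print(should_alert(stats))  # True
```

Evaluating this over a sliding time window makes a provider-side slowdown visible even when every send call returns success.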

Misconception 2: A Single Push Service Is Enough

Many mobile apps move forward using only one of APNS or FCM. This can be a reasonable approach, especially for early-stage projects. However, from the perspective of operational resilience and geo-redundancy, relying on a single service is a risky strategy. Temporary issues or regional access problems that any service might experience internally can affect your entire user base.

I experienced a similar situation while developing mobile notifications for the supply chain module of a manufacturing company's ERP system. We needed to provide operators with instant information about critical tasks or errors. Initially, we were using only FCM. One day, due to a network issue on FCM servers in a specific region, operators in that region could not access critical information. This situation caused a brief halt on the production line.

⚠️ Risks of Single Service Dependency

Relying on a single push notification service carries serious operational risks. Transient outages, maintenance work, or regional network issues within APNS or FCM can completely disable your app's notification capabilities. This is unacceptable, especially for time-sensitive notifications.

After this incident, we decided to migrate to an architecture that supports both APNS and FCM. Although this made the development process slightly more complex, it significantly increased our system's reliability in the long run. We started sending notifications by selecting the appropriate service based on our users' device platforms. This approach served as an important buffer against potential failures of a single service.

Multi-Service Architecture and Its Trade-offs

Supporting multiple push notification services naturally brings some additional costs and complexity. You need to handle issues like separate API integrations for each service, device token management, and tracking delivery results. However, this extra overhead is more than compensated for by the resilience and flexibility it provides. Especially in enterprise software development projects, this kind of redundancy often becomes a necessity.

In an app we developed for a bank's internal platform, users needed to be notified immediately about critical account activities. In this system, we used APNS and FCM together. We added a "device capability" service to select the correct service based on the device's operating system. This service determined which provider was more suitable using the information coming from the user's device.

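# Simple Python example: Service selection based on device type
# APNSService and FCMService are placeholder wrappers, stubbed here so the example runs.
class PushService:
    def __init__(self, config):
        self.config = config

    def send_notification(self, token, message, data):
        # A real implementation would call the provider's API here.
        print(f"{type(self).__name__} -> {token}: {message} {data}")

class APNSService(PushService):
    pass

class FCMService(PushService):
    pass

def get_push_service(device_info):
    platform = device_info.get('platform')
    if platform == 'ios':
        return APNSService(config='apns_config.json')
    if platform == 'android':
        return FCMService(config='fcm_config.json')
    return None  # Unknown platform: the caller decides how to handle it

# Usage example
user_device = {'platform': 'ios', 'token': 'DEVICE_TOKEN_IOS'}
service = get_push_service(user_device)
if service:
    service.send_notification(user_device['token'], 'Critical account activity!', {'type': 'alert'})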

This multi-service approach prevented a potential issue with one service provider from affecting all of our users. For example, when a transient issue occurred on the APNS side, Android users continued to receive notifications. This was critical for business continuity. Of course, we had to manage token handling correctly for both services and aggressively monitor delivery results. However, this was much more manageable than the risk brought by a single service dependency.
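The token handling and failure isolation mentioned above can be sketched as follows. This is an illustration under assumptions: `TokenRegistry`, `send_to_user`, and the shape of the `senders` mapping (platform name to a callable taking a token and a message) are hypothetical names for this example only.

```python
from collections import defaultdict


class TokenRegistry:
    """Hypothetical in-memory store: user_id -> {token: platform}."""

    def __init__(self):
        self._tokens = defaultdict(dict)

    def register(self, user_id, platform, token):
        self._tokens[user_id][token] = platform

    def remove(self, user_id, token):
        self._tokens[user_id].pop(token, None)

    def devices(self, user_id):
        return list(self._tokens.get(user_id, {}).items())


def send_to_user(registry, senders, user_id, message):
    """Send to every device of a user; one provider failing does not stop the others."""
    results = {}
    for token, platform in registry.devices(user_id):
        sender = senders.get(platform)
        if sender is None:
            results[token] = "no_sender"
            continue
        try:
            sender(token, message)
            results[token] = "accepted"
        except Exception as exc:  # isolate failures per provider call
            results[token] = f"failed: {exc}"
    return results


def failing_apns(token, message):
    raise RuntimeError("apns unavailable")


registry = TokenRegistry()
registry.register("user-1", "ios", "TOKEN_IOS")
registry.register("user-1", "android", "TOKEN_ANDROID")
senders = {"ios": failing_apns, "android": lambda token, message: None}
print(send_to_user(registry, senders, "user-1", "Critical account activity!"))
# {'TOKEN_IOS': 'failed: apns unavailable', 'TOKEN_ANDROID': 'accepted'}
```

The per-token result map is also a natural place to feed delivery monitoring and to decide which tokens to prune.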

Misconception 3: Delayed Notifications Do Not Matter

Many developers do not see delays of a few seconds or minutes as a problem, especially for notifications that are not time-critical. The idea that "the notification will arrive eventually" is common. However, this is a major misconception, especially regarding user experience and workflows. A delayed notification can sometimes be as bad as receiving no notification at all.

In a task management app I developed, I was using push notifications to send reminders to users. One day, a user reached out to me and asked, "The reminder arrived right when I was already doing the task; what's the point of it after my work is done?" This feedback showed how a simple delay can undermine the very purpose of a notification. Because the user did not receive the reminder on time, they were on the verge of abandoning my app.

# Example log line: Difference between notification send time and delivery acknowledgment time
2026-06-02 10:05:15 INFO [NotificationService] Sending push notification to user_id: 12345, type: reminder, payload: {"task_id": 987}
2026-06-02 10:18:30 INFO [NotificationService] Push notification for user_id: 12345 acknowledged by APNS/FCM
# The two lines above are about 13 minutes apart, which matches the delay the user reported.
# The actual on-device delivery time is not recorded in this log.
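The gap between the two log lines can be computed directly from their timestamps. The sketch below assumes every line starts with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp, as in the sample; the helper names are made up for this example.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def log_timestamp(line):
    # The timestamp is the first 19 characters of the line.
    return datetime.strptime(line[:19], LOG_TIME_FORMAT)


def ack_gap_seconds(send_line, ack_line):
    """Seconds between the 'Sending' line and the provider acknowledgment line."""
    return (log_timestamp(ack_line) - log_timestamp(send_line)).total_seconds()


send_line = ('2026-06-02 10:05:15 INFO [NotificationService] Sending push notification '
             'to user_id: 12345, type: reminder, payload: {"task_id": 987}')
ack_line = ('2026-06-02 10:18:30 INFO [NotificationService] Push notification '
            'for user_id: 12345 acknowledged by APNS/FCM')
print(ack_gap_seconds(send_line, ack_line))  # 795.0, i.e. 13 minutes 15 seconds
```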

After this experience, I started paying much more attention to notification scheduling and prioritization mechanisms. I saw the difference between just sending a notification versus sending it at the right time, with the right content, and with the correct priority. This is vital, especially for apps that automate business processes or provide real-time information flow to the user.
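As a small illustration of prioritization on the sending side, the sketch below uses a heap so that critical notifications leave the queue before less urgent ones. The priority levels and the `OutboundQueue` class are assumptions for this example, not the mechanism used in the app described above.

```python
import heapq
import itertools

# Lower number = sent first. The levels are illustrative.
PRIORITY = {"critical": 0, "normal": 1, "low": 2}


class OutboundQueue:
    def __init__(self):
        self._heap = []
        # The counter breaks ties, so items with equal priority stay in FIFO order.
        self._counter = itertools.count()

    def push(self, level, notification):
        heapq.heappush(self._heap, (PRIORITY[level], next(self._counter), notification))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = OutboundQueue()
queue.push("low", "weekly summary")
queue.push("critical", "task due in 5 minutes")
queue.push("normal", "new comment")
queue.push("critical", "task overdue")
while queue:
    print(queue.pop())
# task due in 5 minutes
# task overdue
# new comment
# weekly summary
```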

Real Impacts of Notification Delays

The impacts of notification delays can go beyond simple user annoyance. In critical scenarios like financial transactions, security alerts, or emergency notifications, even a delay of a few seconds can lead to major losses or risks. For example, if a suspicious transaction notification in a banking app is delayed, it might be too late by the time the user notices the situation.

In a client project, we were developing a system to provide instant alerts for a cybersecurity incident response team. In this system, when a security breach was detected, a notification had to be sent to the relevant team within seconds. Initially, we used a simple queue system, and notifications sometimes took 5-10 minutes to arrive. This delay extended the response time and increased risks.

ℹ️ Timing and Prioritization

The timing and prioritization of push notifications are critical to user experience and the success of workflows. Delayed notifications can lose their purpose, lower user satisfaction, and even lead to major losses in critical situations. Therefore, you should not ignore timing and prioritization mechanisms when designing your notification delivery infrastructure.

To solve this issue, we migrated our notification delivery infrastructure to a lower-latency message queue (such as Redis Streams or Kafka) and started actively utilizing parameters like apns-priority and time_to_live (exposed as ttl in the FCM HTTP v1 API) provided by APNS/FCM. Thanks to these optimizations, we managed to reduce the delivery time of critical notifications to a few seconds. This not only shortened the response time but also increased the team's efficiency.
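One way to apply these parameters consistently is to define a delivery profile per notification type. The sketch below is hypothetical: the type names and TTL values are illustrative assumptions, while the apns-priority values "10" (send immediately) and "5" (deliver with power considerations) are the ones APNS defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryProfile:
    apns_priority: str  # "10" = send immediately, "5" = power-considerate delivery
    ttl_seconds: int    # how long the provider may hold the message for an offline device


# Illustrative values only; tune them to how quickly each type goes stale.
PROFILES = {
    "security_alert": DeliveryProfile(apns_priority="10", ttl_seconds=60),
    "reminder": DeliveryProfile(apns_priority="10", ttl_seconds=900),
    "digest": DeliveryProfile(apns_priority="5", ttl_seconds=86400),
}


def profile_for(notification_type):
    # Unknown types fall back to the least urgent profile.
    return PROFILES.get(notification_type, PROFILES["digest"])


print(profile_for("security_alert"))
# DeliveryProfile(apns_priority='10', ttl_seconds=60)
```

A short TTL on urgent notifications means a device that comes online an hour later does not receive an alert that is no longer actionable.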

Causes of Delays and Solutions

There can be many different reasons for notification delays:

Network Latency: Network issues between the sender server and APNS/FCM servers.
Queue Congestion: Notifications waiting to be processed due to high load on APNS/FCM servers.
Device Status: The device being turned off, in airplane mode, or in low power mode.
App Permissions: The user having disabled notification permissions for the app.
Platform Limitations: Delivery limits or policies set by APNS/FCM themselves.
Issues on Our End: Performance issues, misconfigurations, or poor queue management on our own servers.

Each of these causes requires a separate solution. For example, for network latency, running the sending servers in data centers with low network latency to the APNS/FCM endpoints can be beneficial. To reduce queue congestion, it is important to adjust the sending rate (throttling) and use higher priority for critical notifications.
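Throttling the sending rate is often implemented with a token bucket. The following is a minimal single-threaded sketch; the `TokenBucket` class and its parameters are assumptions for illustration, and the rate shown is not a limit published by APNS or FCM.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then `rate_per_second` sends on average."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self):
        now = self._clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=100, capacity=200)
if bucket.try_acquire():
    pass  # send the next notification here
else:
    pass  # leave it in the queue and try again shortly
```

Critical notifications can be given their own bucket, or bypass the limiter, so that bulk traffic never delays them.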

# Example: Using time_to_live (TTL) in FCM (in seconds)
# This specifies how long the notification should be stored.
# If the device is offline, the notification expires after this duration.
# This prevents stale notifications from reaching the user, but might not
# satisfy instant notification needs.
# Below, android.ttl "3600s" means 1 hour and apns-priority "10" means high priority.
# (The comments are kept outside the payload because JSON does not allow comments.)
{
  "message": {
    "token": "DEVICE_TOKEN",
    "notification": {
      "title": "Emergency Alert",
      "body": "A critical error was detected in the system."
    },
    "android": {
      "ttl": "3600s"
    },
    "apns": {
      "headers": {
        "apns-push-type": "alert",
        "apns-priority": "10"
      },
      "payload": {
        "aps": {
          "alert": {
            "title": "Emergency Alert",
            "body": "A critical error was detected in the system."
          },
          "sound": "default"
        }
      }
    }
  }
}
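The payload above sets a lifetime only for Android. On the APNS side, the equivalent is the apns-expiration header, which takes a UNIX timestamp in seconds rather than a duration. The sketch below builds a message with both values derived from one TTL; `build_message` is a hypothetical helper, and it only constructs the JSON body without sending anything.

```python
import json
import time


def build_message(token, title, body, ttl_seconds, now=None):
    """Build an FCM HTTP v1 style message with matching Android and APNS lifetimes."""
    now = time.time() if now is None else now
    return {
        "message": {
            "token": token,
            "notification": {"title": title, "body": body},
            # Android expects a duration string such as "3600s".
            "android": {"ttl": f"{int(ttl_seconds)}s"},
            "apns": {
                "headers": {
                    "apns-push-type": "alert",
                    "apns-priority": "10",
                    # APNS expects an absolute expiry time (UNIX epoch seconds).
                    "apns-expiration": str(int(now) + int(ttl_seconds)),
                }
            },
        }
    }


message = build_message(
    "DEVICE_TOKEN",
    "Emergency Alert",
    "A critical error was detected in the system.",
    ttl_seconds=3600,
)
print(json.dumps(message, indent=2))
```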

Factors like user permissions and device status are generally out of our direct control. However, to manage these situations, it is important to inform the user within the app and provide an interface where they can easily manage permissions. Solving issues within our own infrastructure, on the other hand, comes down to establishing comprehensive monitoring and logging mechanisms. Debugging and performance optimization must be a continuous process.