DEV Community

Mustafa ERBAY

Posted on • Originally published at mustafaerbay.com.tr

Mobile Push Notification Reliability: The Cost of Building on Updates…

The Hidden Cost of Push Notifications: The Update Saga

A shipment report from a production ERP system kept coming back incomplete, and it took three days to find the reason. A Play Store update for my mobile app took two weeks because of a metadata rejection. These experiences made me think hard about the reliability of push notifications, a cornerstone of mobile applications. Updates are an inevitable part of a developer's work, but building critical functionality like push notifications directly on top of them carries significant risk. In this post, I will explain how I keep push notifications reliable in my mobile apps and what price I paid for "building on updates." I will use concrete examples from my own projects and from cases I have observed to show the problems this update dependency can cause.

Especially in the Android ecosystem, Firebase Cloud Messaging (FCM), which relies on Google Play services on the device, is widely used for push notifications. Its updates bring new features, performance improvements, and bug fixes. For me as a developer, however, the real test is how compatible each update will be with my existing application. Last month, I integrated a new version of the FCM SDK directly into the Android version of a financial calculator app I develop. One of the app's core functions is delivering real-time stock data updates via notifications. A change in the new SDK triggered an unexpected bug in my old notification handling logic. Around 3:30 PM on April 28th, all notifications started appearing with only a title and a single line of content, and the detail screen would no longer open. This completely undermined my goal of giving users a real-time flow of information.

Situations like these do not only affect individual developers like me; they apply to enterprise-level applications as well. For an e-commerce platform, a campaign announcement system that stops working after a third-party SDK update can mean lost sales, potentially worth millions of dollars. Events like these show how wrong it is to treat the push notification infrastructure as a mere "plugin." It is a function at the heart of the application. Laying a solid foundation for this heart, which means minimizing its dependency on updates, is therefore vital.

The Fragility of Structures Built on Updates

The two most commonly used infrastructures for push notification systems are Google's FCM and Apple's Apple Push Notification service (APNs). These services let developers send messages from their servers to mobile devices. Both the services themselves and their mobile SDKs are updated constantly, and once an app is released, keeping these SDKs up to date becomes one of the developer's most frequent tasks. These updates usually bring new features, security patches, and performance improvements. Yet this is precisely where the fragility of a "structure built on updates" emerges.

In an Android app I developed, there is a feature that lets users track their order status in real time, and I use FCM to provide it. Last month, I integrated a new version of the FCM SDK into the app immediately after its release. With that upgrade, some methods of the NotificationCompat.Builder class (which belongs to the AndroidX core library that notification code is built on, rather than to FCM itself) had changed. The app's old code used these methods to customize the content and click actions of notifications. After the update, every notification fell back to its default appearance, and tapping one no longer took users to the correct page in the app. Debugging took 3 days. Along the way, I found a note in Google's documentation stating that these methods had been removed, but it was buried deep in the documentation.

This experience taught me that while updates are inevitable, building core functionality directly on top of them is a failure of proactive risk management. Features that directly affect the user experience and let the app fulfill its core promise need a more robust architectural approach. For example, instead of calling the APIs of such SDKs or services directly, you can create a "wrapper" layer that isolates the impact of future SDK updates.

⚠️ Update Risk

If a mobile app's core functionality (e.g., push notifications) depends directly on the update cycle of a third-party SDK, the app's stability is seriously at risk. Even a small change in the SDK can disable a critical feature of the app.

The Silent Wars of SDK Updates

In the world of mobile development, SDK updates are often like a "silent war" in the background. Developers walk a fine line between releasing products quickly and ensuring product stability in the midst of this war. Push notifications, the driving force of an application, are usually provided by one or more third-party services. The SDKs of these services change over time, new versions are released, and old versions become unsupported. This puts developers under constant pressure to update.

For example, an Android app I developed for a client last year used push notifications to inform users instantly about their order status. At the time, we relied on a popular third-party notification service for these notifications. The service provider announced that it would stop supporting the old version of its SDK and that we needed to migrate to the new one. In the new SDK, some of the APIs we used to customize notifications had changed completely. The migration became a nightmare for us: we had to rewrite a large portion of the old code, and some features we expected in the new SDK were missing or worked differently. This cost us approximately 10 days of additional development time and a 2-week testing period. Those 10 extra days of development were essentially the cost of "building on updates." If we had designed the core notification logic in a more abstract layer, independent of the SDK, the migration could have been much smoother.

In such scenarios, the challenges developers face are not only technical; they also carry a significant cost in time and money. Tracking API changes in third-party SDKs, reviewing their documentation, and adapting the codebase can pull a project away from its main goals. In an Android spam blocker app I developed, I used an SDK to capture and analyze SMS notifications. An update to this SDK restricted the ability to read the content of SMS messages, which undermined the app's fundamental purpose. Because the app ran in the background and gave me no immediate feedback, it took me a week to notice. In the end, I had to redesign the app with a different approach.

ℹ️ Creating an Abstraction Layer

Instead of directly using third-party SDKs, creating a "wrapper" or "adapter" layer for them helps minimize the impact of future SDK updates on your application. This layer translates the SDK's APIs into a more abstract interface that your application needs. This way, when the SDK is updated, you only need to update this layer.
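As a rough illustration of such a wrapper layer, here is a minimal, language-neutral sketch in Python. All names in it (`AppNotification`, `PushSender`, `VendorPushAdapter`, `FakeVendorClient`) are hypothetical and invented for this example; `FakeVendorClient` merely stands in for whatever third-party SDK is in use and does not mirror any real vendor API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class AppNotification:
    """App-owned notification model; no SDK types leak past this point."""
    title: str
    body: str
    deep_link: str  # screen to open when the user taps the notification
    data: dict[str, str] = field(default_factory=dict)


class PushSender(Protocol):
    """The only push interface the rest of the application depends on."""
    def send(self, device_token: str, notification: AppNotification) -> bool: ...


class FakeVendorClient:
    """Hypothetical stand-in for a third-party SDK client (not a real API)."""
    def __init__(self) -> None:
        self.sent: list[dict] = []

    def push(self, payload: dict) -> int:
        self.sent.append(payload)
        return 200


class VendorPushAdapter:
    """Translates AppNotification into the vendor's payload shape.

    If the vendor changes its API, only this class should need to change.
    """
    def __init__(self, client: FakeVendorClient) -> None:
        self._client = client

    def send(self, device_token: str, notification: AppNotification) -> bool:
        payload = {
            "to": device_token,
            "title": notification.title,
            "text": notification.body,
            "extras": {**notification.data, "deep_link": notification.deep_link},
        }
        return self._client.push(payload) == 200


def notify_order_shipped(sender: PushSender, token: str, order_id: str) -> bool:
    """Application code: knows about PushSender only, never the vendor SDK."""
    note = AppNotification(
        title="Order shipped",
        body=f"Order {order_id} is on its way.",
        deep_link=f"app://orders/{order_id}",
    )
    return sender.send(token, note)


client = FakeVendorClient()
ok = notify_order_shipped(VendorPushAdapter(client), "device-123", "A42")
```

The point of the design is that `notify_order_shipped` never imports anything from the vendor. Swapping providers, or absorbing a breaking SDK release, would then mean writing a new adapter rather than touching application code.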

More Robust Architectural Approaches for Push Notifications

Simply updating third-party SDKs is not enough to ensure the reliability of push notifications. A more robust architectural approach needs to be adopted. This covers every step, from how notifications are sent to how they are received and processed. In my experience, there are several ways to ensure this robustness.

First, it's important to separate the logic of sending notifications from the app's main workflow. For example, a change in an order's status should not trigger a push notification directly. Instead, it should create an event, which is then placed on a message queue. A separate microservice or background job processes the events in the queue and triggers the push notifications. This keeps the app's main workflow from being affected by a problem in the push notification service. I used this approach in my side project, a set of financial calculators, to provide real-time stock data updates to users. Data fetching runs in one service, while processing and notification sending run in another. This way, even if the data fetching service has an issue, users can still see the most recently received notifications.
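The decoupling described above might look roughly like the following sketch. It uses Python's in-process `queue.Queue` purely as a stand-in for a real message broker, and the names `OrderStatusChanged`, `update_order_status`, and `drain_events` are assumptions made up for this example.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OrderStatusChanged:
    order_id: str
    new_status: str


# In-process queue standing in for a real message broker.
events: queue.Queue = queue.Queue()


def update_order_status(orders: dict[str, str], order_id: str, status: str) -> None:
    """Main workflow: persist the change and enqueue an event, nothing more."""
    orders[order_id] = status
    events.put(OrderStatusChanged(order_id, status))


def drain_events(send_push: Callable[[str], None]) -> tuple[int, int]:
    """Worker side: turn queued events into pushes; failures stay isolated here."""
    sent = failed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return sent, failed
        try:
            send_push(f"Order {event.order_id} is now {event.new_status}")
            sent += 1
        except Exception:
            failed += 1  # a real worker would log and re-queue or dead-letter


orders: dict[str, str] = {}
update_order_status(orders, "A42", "shipped")
update_order_status(orders, "A43", "delivered")


def flaky_push(message: str) -> None:
    """Simulated push provider that fails for one order."""
    if "A43" in message:
        raise ConnectionError("push provider unavailable")


result = drain_events(flaky_push)
```

Note that both order updates succeed even though one push fails: the failure is counted on the worker side and never reaches `update_order_status`.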

Second, the content and targeting of notifications should be as dynamic as possible. This means using templates rather than hard-coded messages for each notification. These templates can be pulled from a database or a configuration file. This makes it easier to send customized messages for the same notification type in different languages or for different user segments. Furthermore, when the content of notifications needs to be changed, simply updating the templates is sufficient without altering the application code. For instance, on an e-commerce site, sending a more targeted and dynamic message like "20% off on women's shoes, don't miss out!" instead of a general notification like "Discount has started!" increases user engagement.
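A template lookup of this kind can be sketched with Python's standard `string.Template`. The `TEMPLATES` dictionary, its keys, and the `render` helper are hypothetical; in practice the templates would come from a database or configuration file as described above.

```python
from string import Template

# Hypothetical template store keyed by (notification type, language).
TEMPLATES = {
    ("discount", "en"): Template("$percent% off on $category, don't miss out!"),
    ("discount", "tr"): Template("$category için %$percent indirim, kaçırmayın!"),
}


def render(kind: str, lang: str, default_lang: str = "en", **params: str) -> str:
    """Pick the template for the user's language, falling back to the default."""
    template = TEMPLATES.get((kind, lang)) or TEMPLATES[(kind, default_lang)]
    # substitute() raises KeyError if a placeholder is missing, so a broken
    # template fails loudly instead of sending a half-filled message.
    return template.substitute(**params)


english = render("discount", "en", percent="20", category="women's shoes")
fallback = render("discount", "de", percent="20", category="women's shoes")
turkish = render("discount", "tr", percent="20", category="Kadın ayakkabıları")
```

Because the wording lives in data rather than code, changing a message or adding a language would not require a new app release.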

Third, it is essential to have a mechanism for tracking whether notifications have actually been delivered. This is usually done with an "acknowledgement" system: the server records each notification it sends and waits for confirmation that it reached the device, either from the delivery feedback the push service provides or from the app itself reporting back on receipt. If no acknowledgement arrives for a notification within a set period, an alert can be raised for the administrator. This allows "sneaky" notification issues to be detected early. For example, in a manufacturing company's ERP system, notifying operators instantly about critical tasks is vital. If these notifications are not delivered on time, production can halt. In such cases, monitoring notification delivery helps find the source of the problem quickly.
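The bookkeeping behind such an acknowledgement system can be quite small. The sketch below is an in-memory illustration only; `DeliveryTracker` and the 60-second timeout are assumptions for the example, and a production version would persist its state and receive acknowledgements from the push provider or the app.

```python
class DeliveryTracker:
    """Tracks sent notifications and reports those never acknowledged."""

    def __init__(self, ack_timeout_s: float = 60.0) -> None:
        self.ack_timeout_s = ack_timeout_s
        self._sent_at: dict[str, float] = {}  # notification id -> send time

    def mark_sent(self, notification_id: str, now: float) -> None:
        self._sent_at[notification_id] = now

    def mark_acked(self, notification_id: str) -> None:
        # Unknown or duplicate acknowledgements are ignored.
        self._sent_at.pop(notification_id, None)

    def overdue(self, now: float) -> list[str]:
        """Ids that have waited longer than the timeout without an ack."""
        return sorted(
            nid for nid, sent_at in self._sent_at.items()
            if now - sent_at > self.ack_timeout_s
        )


tracker = DeliveryTracker(ack_timeout_s=60.0)
tracker.mark_sent("task-1", now=0.0)
tracker.mark_sent("task-2", now=10.0)
tracker.mark_acked("task-1")

early = tracker.overdue(now=65.0)  # task-2 has waited 55 s: not yet overdue
late = tracker.overdue(now=75.0)   # task-2 has waited 65 s: overdue
```

Timestamps are passed in explicitly here so the logic is easy to test; a periodic job could call `overdue()` with the current time and alert on any non-empty result.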

💡 Backward Compatibility and Isolation

When migrating to new SDK versions, using a layer that abstracts the APIs of older versions helps maintain backward compatibility. This both simplifies the migration process and allows you to isolate potential issues.

The Price of Real-Time Data: Problems and Solutions

Real-time data flow in mobile applications is critical for user experience. Especially in fields like finance, news, or gaming, users want to access the latest information instantly. Push notifications are one of the most effective ways to provide this flow. However, providing this real-time capability comes with its own set of challenges.

A while ago, I added a feature to an Android app I developed that notified users instantly with live stock market data. For this feature, I fetched current prices in JSON format from a data provider and processed the data to create push notifications. Initially, everything went well. However, when the data flow sped up during intense market movements, problems began to surface. My background task fetched new data every 5 seconds, but when the device's network connection slowed down or the server responded late, the task timed out. As a result, notifications were delayed or not sent at all.

To solve this problem, I took several steps. First, I made the data fetching frequency dynamic: if the network connection was slow or the server was not responding, I lengthened the fetch interval from 5 seconds to 10-15 seconds. Second, I established a more robust "state management" structure. By storing the data of the last sent notification, I prevented the same data from being sent repeatedly. Third, I added a "retry" mechanism for the notifications themselves, so that a notification that could not be sent would be retried at specific intervals. Implementing these three changes increased my notification reliability to over 98%.
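The first two steps could be sketched as follows. The 5-second base and 15-second ceiling come from the text; the doubling rule, the `next_interval` function, and the `Deduplicator` class are assumptions chosen for illustration, not a description of the original implementation.

```python
import hashlib
import json

BASE_INTERVAL_S = 5.0   # normal fetch interval
MAX_INTERVAL_S = 15.0   # upper bound when the network or server is slow


def next_interval(current: float, last_fetch_ok: bool) -> float:
    """Wait longer after a slow or failed fetch; reset once fetches succeed."""
    if last_fetch_ok:
        return BASE_INTERVAL_S
    return min(current * 2, MAX_INTERVAL_S)  # doubling is an assumed policy


class Deduplicator:
    """Remembers a fingerprint of the last payload sent per key."""

    def __init__(self) -> None:
        self._last: dict[str, str] = {}

    def should_send(self, key: str, payload: dict) -> bool:
        encoded = json.dumps(payload, sort_keys=True).encode()
        digest = hashlib.sha256(encoded).hexdigest()
        if self._last.get(key) == digest:
            return False  # identical to what was last sent for this key
        self._last[key] = digest
        return True


dedup = Deduplicator()
first = dedup.should_send("ACME", {"price": 101.5})
repeat = dedup.should_send("ACME", {"price": 101.5})
changed = dedup.should_send("ACME", {"price": 102.0})
```

Serializing with `sort_keys=True` keeps the fingerprint stable regardless of key order in the incoming JSON.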

This experience taught me that real-time data flow is not just about fetching data. Processing, transmitting, and presenting this data to the user are as important as fetching it. With push notifications in particular, saying "I sent it" is not enough; you also need to know the answer to the question "Did it arrive?". A system that monitors the status of notifications, catches errors, and activates automatic correction mechanisms when necessary therefore pays off in the long run. I run a similar monitoring and debugging process for my private financial calculators, which are hosted on my own VPS. This increases the platform's reliability.

🔥 Timeouts and Data Loss

Improperly managing timeouts in background tasks can lead to serious data loss and notification issues, especially in applications requiring real-time data flow. Timeouts and retry mechanisms should be carefully adjusted, considering network conditions and server response times.
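One common way to combine a per-attempt timeout with retries is exponential backoff. The sketch below is a generic illustration; `send_with_retry`, its parameters, and the default values are assumptions that would need tuning against real network conditions and server response times.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[float], None],
    attempts: int = 3,
    timeout_s: float = 2.0,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send(timeout_s) up to `attempts` times with exponential backoff.

    `send` is expected to raise TimeoutError or ConnectionError on failure.
    Returns True on the first success, False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            send(timeout_s)
            return True
        except (TimeoutError, ConnectionError):
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return False


calls: list[float] = []
delays: list[float] = []


def flaky_send(timeout_s: float) -> None:
    """Simulated sender that times out twice, then succeeds."""
    calls.append(timeout_s)
    if len(calls) < 3:
        raise TimeoutError("push provider did not respond in time")


delivered = send_with_retry(flaky_send, sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run and test. A caller that exhausts its attempts would typically hand the notification to a delivery-monitoring or alerting step rather than drop it silently.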

Industry Trends and My Approach

The world of mobile app development is changing rapidly, and push notifications are also part of this change. It's no longer enough to send simple text messages. Users expect more interactive, rich, and personalized notifications. These trends also influence my architectural decisions.

Firstly, "rich push notifications," which contain images, buttons, or more complex UI elements, are becoming increasingly popular. These types of notifications allow users to perform actions within the app even when they are outside the app. For example, in an e-commerce app, a notification with a direct "Buy Now" button announcing a campaign can increase conversion rates. To support these types of notifications, I need to ensure that the notification infrastructure I use can process this rich content seamlessly.

Secondly, artificial intelligence (AI) integration is starting to play a significant role in personalizing push notifications. Sending customized notifications based on the user's past behavior, preferences, and interests increases engagement and satisfaction. When notification text is generated by a language model, techniques such as "prompt engineering" and "retrieval-augmented generation" (RAG) can support this. For instance, a news app can analyze the articles a user has read and send notifications about new articles they might be interested in, which helps keep the user engaged with the app. In one of my projects, I use a simple AI model to identify the most relevant content based on a user's past interactions and generate notifications for them. This makes notifications more relevant and engaging.

However, all these innovations rely on the robustness of the infrastructure. My fundamental approach is always to start from the very basics. The reliability of push notifications is part of the overall reliability of the application. Therefore, I never compromise on core principles such as minimizing the risks brought by SDK updates, isolating the processes of sending and receiving notifications, and managing real-time data flow. Even when adopting new technologies, I do not overlook these core principles. For example, even when creating AI-powered notifications, my priority is to ensure that notifications are delivered on time and correctly. In my personal financial calculators, I aim to deliver the most up-to-date and accurate information to my users by applying these principles.

ℹ️ Balancing Personalization and Reliability

While personalizing notifications with AI is great, this personalization should not jeopardize the fundamental reliability of notifications. First, ensure that notifications are delivered on time and correctly, then add the personalization layer.

Conclusion: Building Reliability

The reliability of push notifications in mobile applications is not just a technical detail but a fundamental cornerstone of user experience. Considering SDK updates, API changes, and evolving technological trends, building the push notification infrastructure "on updates" is not a sustainable long-term approach. Such a dependency increases the risk of the application's critical functionality becoming inoperable at unexpected moments.

Based on my own experiences, I have seen the need to adopt more robust architectural approaches to ensure the reliability of push notifications. These approaches include separating the notification sending logic from the main workflow, creating abstraction layers, using template-based dynamic content, and establishing a comprehensive monitoring and debugging mechanism. The challenges encountered while managing real-time data flow once again highlight the importance of these fundamental principles.

Future trends will bring innovations such as AI-powered personalization and rich notification content. However, when implementing these innovations, it is essential not to compromise on the fundamental reliability of the infrastructure. For me, the priority is always to ensure that users receive the value promised by the application seamlessly. This is only possible with careful architectural design and continuous risk management.
