Mustafa ERBAY

Posted on May 16 • Originally published at mustafaerbay.com.tr

Offline-First Synchronization Strategies in Mobile Applications

#tutorials #mobiledevelopment #synchronization #offlinefirst

The Importance of Data Synchronization in Mobile Applications

Mobile applications are no longer just tools we use when we're online. Users expect full performance from their apps, even in environments with weak or no internet connectivity. This is where offline-first synchronization strategies in mobile applications come into play. While developing the mobile interface for a production ERP system, ensuring data consistency for situations where the internet was unavailable in the field was critical. Scenarios like these directly impact user experience and determine the reliability of the application.

The offline-first architecture advocates for data to be primarily stored on the device and then synchronized with the server when appropriate. This not only improves user experience but also reduces the risk of data loss. This approach is indispensable, especially for critical data such as financial transactions, inventory tracking, or field service reports. Implementing these strategies correctly will enhance your application's value in the eyes of your users.

Core Offline-First Synchronization Models

There are several fundamental ways to implement an offline-first architecture. The model you choose depends on the complexity of your application, its data structure, and how often and how fresh the synchronized data needs to be. We generally encounter two main models:

One-Way Synchronization (Client-to-Server): In this model, changes made on the device are sent to the server, but changes on the server are either not pulled back to the device at all or fetched only manually or at fixed intervals. This can work for simple task lists or note-taking apps where data flows in a single direction. However, it's insufficient for most modern mobile applications.
Two-Way Synchronization (Bi-directional Synchronization): This is the most common and complex model. Both device and server data are synchronized with each other. This is ideal for situations where multiple devices access the same data, or where data can be edited both online and offline. Consider the data flow between operator screens on a production line and the central ERP system; two-way synchronization is essential here.

Each of these models has its own advantages and disadvantages. While one-way synchronization offers a simpler structure, two-way synchronization brings additional challenges, such as managing data conflicts.

One-Way Synchronization: Simplicity and Speed

One-way synchronization, as the name suggests, refers to a scenario where data moves in a single direction. In its simplest form, changes you make on your mobile device are only transmitted to the server. For example, when you mark a task as complete in a mobile app, this information is sent to the server. When there's an update in the server's database, it doesn't automatically reflect on the mobile device.

The biggest advantage of this approach is its simplicity. Development is less complex because you don't need to handle data conflicts, and since data travels in only one direction, network traffic can be lower. The mirror image, server-to-client only, is equally simple: in an Android spam blocker app, for example, the blocked numbers list can live only on the device while a central blacklist is consumed in read-only mode. However, as soon as the same data can change on both sides, a one-way flow is no longer sufficient to keep it consistent.

ℹ️ Limitations of One-Way Synchronization

One-way synchronization falls short, especially in situations where users can update the same data from multiple devices or both online and offline. In this model, server updates are not reflected on devices, which can lead to data inconsistency.

When a data change occurs on the server side, an additional mechanism might be needed for the mobile device to receive this change. This often involves periodically querying the server or waiting for the user to manually issue a "synchronize" command. This situation may be unacceptable for applications requiring real-time data currency.
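The periodic query described above is often built around a cursor: the client remembers the newest change it has seen and asks only for what came after. The following is a minimal sketch under assumptions, not a definitive implementation. The `changes_since` function, the row shape, and the `updated_at` field are hypothetical, and the timestamps are assumed to be assigned by the server so that device clocks play no role.

```python
from typing import Dict, List, Tuple

def changes_since(server_rows: List[Dict], cursor: float) -> Tuple[List[Dict], float]:
    """Return rows changed after `cursor` plus the cursor to use next time.

    `updated_at` is assumed to be set by the server, never by the device.
    """
    changed = [row for row in server_rows if row["updated_at"] > cursor]
    # If nothing changed, keep the old cursor so the next poll repeats the same window.
    new_cursor = max((row["updated_at"] for row in changed), default=cursor)
    return changed, new_cursor

# Hypothetical server state: two rows, modified at different server times.
rows = [
    {"id": 1, "status": "open", "updated_at": 100.0},
    {"id": 2, "status": "done", "updated_at": 150.0},
]

changed, cursor = changes_since(rows, cursor=120.0)
```

On each poll or manual "synchronize" command, the client would apply `changed` to its local store and persist `cursor` for the next round.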

Two-Way Synchronization: Complexity and Consistency

Two-way synchronization is the most needed and frequently encountered scenario in mobile applications. In this model, both device and server data are brought into alignment. When a user updates a record on their mobile device, this change is sent to the server. Simultaneously, changes made on the server by another user or a background process are also synchronized to the mobile device.

The biggest challenge with this approach is managing data conflicts. What should happen when the same data is changed in two different places becomes a critical question. For example, suppose an order's status is marked "Completed" in the mobile app while, at the same time, someone changes it to "Cancelled" in the web interface. A decision must then be made about which change is valid.

Data Conflict Resolution Strategies

Various strategies exist for managing conflicts:

Last Write Wins (LWW): One of the simplest strategies. It looks at which change was made later and considers that change valid. The disadvantage of this strategy is that one change can completely overwrite another. In a production planning module, operator data being overwritten by the plan from the central system can lead to undesirable outcomes. Timestamps form the basis of this strategy.
First Write Wins (FWW): The reverse of LWW. It looks at which change was made first and considers that change valid. This can lead to data loss, just like LWW.
Coexistence / Ask the User: In this strategy, the system does not make an automatic decision in case of a conflict. Instead, it presents both changes to the user and asks them to decide which one is valid. This can complicate the user experience but is one of the safest ways for data integrity. In a cost calculator, asking the user about a conflict between a manually entered value and a system-calculated value might be logical.
Conflict Resolution Logic: You can develop custom solutions based on your application's business logic. For example, a "cancel" operation on an order status might always take precedence over a "complete" operation, or in an inventory count, a manually entered count value might be considered more important than the system's stock.
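The automatic strategies above can be sketched as small resolver functions. This is an illustrative sketch, not a prescribed implementation: the `Change` class, its fields, and the function names are all hypothetical, and timestamps are assumed to be comparable UTC epoch seconds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    """One edit to an order's status (hypothetical shape)."""
    value: str          # e.g. "Completed" or "Cancelled"
    modified_at: float  # UTC epoch seconds when the edit was made
    source: str         # "mobile", "web", ...

def last_write_wins(a: Change, b: Change) -> Change:
    # The later timestamp wins; on a tie we keep `a`.
    return b if b.modified_at > a.modified_at else a

def first_write_wins(a: Change, b: Change) -> Change:
    # The earlier timestamp wins; on a tie we keep `a`.
    return b if b.modified_at < a.modified_at else a

def cancel_takes_precedence(a: Change, b: Change) -> Change:
    # Business rule from the text: a cancellation beats a completion
    # regardless of timing. Anything else falls back to LWW.
    for change in (a, b):
        if change.value == "Cancelled":
            return change
    return last_write_wins(a, b)

mobile = Change("Completed", modified_at=1000.0, source="mobile")
web = Change("Cancelled", modified_at=990.0, source="web")
```

With these two sample edits, LWW keeps the mobile "Completed" change, FWW keeps the web "Cancelled" change, and the custom rule keeps "Cancelled" no matter which edit arrived later. The "ask the user" strategy would instead surface both `Change` objects in the UI.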

Conflict management is the most critical and challenging part of two-way synchronization. Databases like PostgreSQL use MVCC (Multi-Version Concurrency Control) to isolate concurrent transactions on the server, but MVCC does not decide between edits made offline on different devices. That decision has to be implemented as additional logic at the application level.

⚠️ Conflict Management Requires Attention

Conflict management is not just a technical problem but also a business logic problem. The strategy chosen should be determined by the application's use case and the criticality of the data. An incorrect choice can lead to data loss or inconsistency.

Synchronization Mechanisms and Data Models

There are fundamental data models and mechanisms that support offline-first synchronization. These determine how data is stored, how updates are tracked, and how communication with the server occurs.

Local Database Selection

Local databases are commonly used to store data on mobile devices. These databases must be able to hold the information necessary for synchronization (e.g., when changes were made, which data changed).

SQLite: The most widely used local database on mobile platforms. It is lightweight, fast, and has cross-platform support. However, it does not directly support complex synchronization logic, thus requiring an additional layer.
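Realm: A database specifically designed for mobile devices. It is object-oriented and may offer more advanced features than SQLite. Its companion service, "Realm Sync" (later renamed Atlas Device Sync), was built to simplify two-way synchronization. MongoDB has since announced the deprecation of that service, so check its current status before adopting it.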
Others: NoSQL-based solutions like Couchbase Lite and PouchDB can also be preferred for offline-first and synchronization needs. They are advantageous, especially when dealing with large datasets and flexible schemas.

In a customer project I worked on, for the mobile interface of a production ERP, we used SQLite as the local database. To track changes, we added a timestamp column such as last_modified and a sync_status field (pending, synced, failed) to each table. This allowed us to monitor whether each row had actually reached the server.
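A table along those lines might look like the sketch below. Only the last_modified and sync_status columns come from the description above. The `work_orders` table, its other columns, and the helper functions are hypothetical, and the upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # an on-device file path in a real app
conn.execute("""
    CREATE TABLE work_orders (
        id            TEXT PRIMARY KEY,
        status        TEXT NOT NULL,
        last_modified REAL NOT NULL,   -- UTC epoch seconds
        sync_status   TEXT NOT NULL DEFAULT 'pending'
                      CHECK (sync_status IN ('pending', 'synced', 'failed'))
    )
""")

def save_locally(order_id: str, status: str) -> None:
    # Every local write (insert or update) flags the row as pending again.
    conn.execute(
        """
        INSERT INTO work_orders (id, status, last_modified, sync_status)
        VALUES (?, ?, ?, 'pending')
        ON CONFLICT(id) DO UPDATE SET
            status = excluded.status,
            last_modified = excluded.last_modified,
            sync_status = 'pending'
        """,
        (order_id, status, time.time()),
    )
    conn.commit()

def mark_synced(order_id: str) -> None:
    conn.execute("UPDATE work_orders SET sync_status = 'synced' WHERE id = ?", (order_id,))
    conn.commit()

def unsynced_rows() -> list:
    # Rows that still need to be sent, including earlier failures.
    return conn.execute(
        "SELECT id, status FROM work_orders WHERE sync_status != 'synced'"
    ).fetchall()
```

The synchronization job would read `unsynced_rows()`, send them, and call `mark_synced` (or set 'failed') per row depending on the server's answer.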

Change Tracking

At the core of synchronization is knowing which data has changed. Several methods are used for this:

Timestamp-Based Tracking: Each record has a created_at and updated_at timestamp. During synchronization, the client, knowing the last synchronization time, asks the server for all records changed since that time, and sends the server its own records changed in the same window. While this method is simple, it can lead to conflicts when device clocks drift or when timestamps are stored in local time rather than UTC.
Version Numbers: Each record has a version number that increments with each update. Because the ordering no longer depends on clocks, a stale write can be detected precisely instead of being guessed from timestamps. In PostgreSQL, the xmin system column changes whenever a row is updated and is sometimes used as a change token, but it is a transaction ID rather than a counter, so an explicit version column is the more predictable choice.
Change Logs / Journals: A separate table or structure is used to record all changes made at the database or application level. These records detail which data changed, when, and how. This is the most comprehensive tracking method and supports complex conflict resolution.

For example, in a spam blocking app I developed for Android, I used a simple timestamp-based approach for updating the blocked numbers list. However, this caused minor data synchronization issues on some devices due to differing device clocks. In such cases, version numbers or a more advanced change logging system would offer a more robust solution.
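To illustrate why version numbers are more robust than device clocks, here is a minimal sketch of a version check on the server side. `ServerStore` and its `push` method are hypothetical stand-ins for a real backend: the client sends the version it last saw, and the write is accepted only if that version is still current.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class ServerStore:
    """In-memory stand-in for the server (hypothetical)."""
    records: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # id -> (value, version)

    def push(self, record_id: str, value: str, base_version: int) -> Tuple[bool, int]:
        """Accept the write only if the client edited the current version.

        Returns (accepted, current_version). A record that does not exist yet
        is treated as version 0.
        """
        _, current = self.records.get(record_id, ("", 0))
        if base_version != current:
            return False, current  # stale edit: the caller must resolve the conflict
        self.records[record_id] = (value, current + 1)
        return True, current + 1

store = ServerStore()
first = store.push("number-1", "blocked", base_version=0)    # new record
stale = store.push("number-1", "allowed", base_version=0)    # based on an old version
retry = store.push("number-1", "allowed", base_version=1)    # after refreshing
```

A rejected push does not lose anything: the client still holds its edit and can apply whichever conflict strategy the application uses before retrying.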

💡 Synchronization with Change Data Capture (CDC)

For more advanced scenarios, using Change Data Capture (CDC) mechanisms at the database level can be an effective way to track changes. PostgreSQL's Logical Replication or tools like Debezium can capture the stream of changes in the database, automating the synchronization process.

Synchronization Protocols and Network Communication

How data is transmitted between the device and the server is crucial for the efficiency and reliability of synchronization.

HTTP/REST vs. WebSocket

HTTP and WebSocket are the most commonly used communication protocols for synchronization.

HTTP/REST: Uses the traditional request-response model. The mobile device sends a request to the server (e.g., "send me changes from the last hour"), and the server responds to this request. This is suitable for simple scenarios but may require constant server polling, which increases battery consumption and uses network resources inefficiently. A few years ago, while developing the backend for an e-commerce site, we used a REST API for data synchronization between the mobile app and the server. However, the need for constant data fetching led to performance issues, especially on low-bandwidth networks.
WebSocket: Opens a persistent two-way communication channel, which allows the server to send data to the client at any time. The mobile app keeps the connection open and receives updates from the server in real time. Compared with frequent polling, this can be more efficient in terms of battery and network use and offers a better user experience, although the app must handle reconnection when the link drops. Using WebSocket for real-time stock updates on the operator screens of a manufacturing company's ERP made a significant performance difference.

When porting a custom financial calculator to a mobile app, I used a WebSocket-based API for real-time currency exchange rate updates. This allowed users to see instant changes in exchange rates seamlessly.

Data Compression and Batch Operations

Data compression and batch operations are critical for reducing network traffic and increasing synchronization speed.

Data Compression: Compression algorithms like Gzip can be used to reduce the size of the transmitted data. This significantly reduces data usage and synchronization time, especially on mobile networks.
Batch Operations: Sending multiple changes in a single request instead of individual records reduces the overhead of network communication. For example, sending 100 changes in one batch request to the server is much more efficient than sending 100 separate requests.

In a customer project, we worked on optimizing the batch size to improve the application's synchronization performance. Initially, we used batches of 10 records, but our tests showed that batches of 50-100 records offered a more balanced performance in terms of network traffic and server load.
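Batching and compression combine naturally: split the pending changes into fixed-size groups and gzip each group before sending. The sketch below uses only the standard library. The function names and the sample data are hypothetical, and the batch size of 100 is just one value from the range mentioned above, not a recommendation.

```python
import gzip
import json
from typing import Dict, Iterator, List

def batches(changes: List[Dict], size: int) -> Iterator[List[Dict]]:
    # Yield consecutive slices of at most `size` changes.
    for start in range(0, len(changes), size):
        yield changes[start:start + size]

def encode_batch(batch: List[Dict]) -> bytes:
    # Compact JSON, then gzip. The server must be told the body is gzip-encoded.
    raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)

def decode_batch(payload: bytes) -> List[Dict]:
    return json.loads(gzip.decompress(payload).decode("utf-8"))

# 230 hypothetical pending changes become three request bodies.
changes = [{"id": i, "status": "done"} for i in range(230)]
payloads = [encode_batch(batch) for batch in batches(changes, size=100)]
```

Each element of `payloads` would be sent as one request body. Repetitive JSON like this compresses well, but the best batch size still has to be measured against real network conditions and server load, as in the project described above.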

ℹ️ Optimizations for Efficiency

When designing synchronization mechanisms, it's important to consider not only functionality but also efficiency. Data compression, intelligent batch sizes, and choosing appropriate communication protocols directly impact user experience.

Security and Authorization

In offline-first synchronization, security is as important as data consistency. Even when devices are offline, data must be protected against unauthorized access.

Local Data Encryption

Encrypting data stored on the mobile device is essential to protect privacy if the device is lost, stolen, or otherwise physically compromised. Modern mobile operating systems offer storage encryption at the OS level, but adding an encryption layer at the application level for sensitive data is also good practice.

For instance, if you are storing users' personal information in a financial calculator app, you should encrypt it with a strong algorithm such as AES before saving it to the local database, and keep the key in the platform's secure storage (Android Keystore or iOS Keychain) rather than in the app's code.

Authentication and Authorization

When synchronizing with the server, the client's identity must be verified, and it must be ensured that it can only access authorized data.

Token-Based Authentication (JWT, OAuth2): A token obtained after the user logs in is used for all subsequent requests. This token carries the user's identity and permissions. The server verifies the incoming token and processes the request.
API Keys: By using an API key for the application itself, the application's identity can be verified on the server side. However, this is not sufficient for user-based authorization.

In a customer project, we set up a JWT-based authentication system for the mobile application's backend. When users logged in, they received an access token, which was then used in synchronization requests. We easily implemented this token verification mechanism using FastAPI.
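To show what the server checks on each synchronization request, here is a standard-library sketch of HS256 token signing and verification. It is not the project's actual FastAPI code, which the text does not show. The function names and claim values are hypothetical, and a production system should rely on a maintained JWT library instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_token(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, claims)]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)

def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    header = json.loads(_b64url_decode(header_b64))
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims

secret = b"demo-secret-change-me"  # hypothetical; load from configuration in practice
token = sign_token({"sub": "tech-42", "exp": 2000}, secret)
```

The client would send `token` in the `Authorization: Bearer` header of every synchronization request. Note that an offline device may come back online with an expired token, so the sync layer needs a refresh step before retrying.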

⚠️ Security Principles Are Indispensable

Offline-first architecture means data is stored locally on the device. Therefore, encrypting local data and using secure authentication methods during server communication are critical steps for data security.

Real-World Scenarios and Application Examples

Let's look at a few real-world scenarios to make this theoretical knowledge concrete.

Scenario 1: Field Service Application

A field service technician is at a customer's site fixing a fault. There is no internet connection. The technician uses the mobile app to log the fault, list the work performed, and record spare parts used. The app saves all this data to the device's local database. When the technician finishes their work and returns to the office, or when an internet connection is established, the app automatically synchronizes the local data with the server. In this scenario, two-way synchronization and conflict management (if multiple technicians update the same customer record simultaneously) are important.
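The "save locally, send when connected" flow in this scenario is often implemented as an outbox that is flushed whenever connectivity returns. The sketch below is a simplified illustration: `flush_outbox`, the entry shape, and the `send` callback are hypothetical, and a real app would back the outbox with the local database instead of a list.

```python
from typing import Callable, Dict, List

def flush_outbox(outbox: List[Dict], send: Callable[[Dict], bool]) -> int:
    """Send unsynced entries in order and return how many succeeded.

    Stops at the first failure so later entries are never applied
    before earlier ones.
    """
    sent = 0
    for entry in outbox:
        if entry["sync_status"] == "synced":
            continue
        try:
            ok = send(entry)
        except OSError:  # covers ConnectionError, timeouts raised by sockets, etc.
            ok = False
        if not ok:
            entry["sync_status"] = "failed"
            break
        entry["sync_status"] = "synced"
        sent += 1
    return sent

# Hypothetical records captured on site without connectivity.
outbox = [
    {"id": "fault-1", "sync_status": "pending"},
    {"id": "work-1", "sync_status": "pending"},
    {"id": "parts-1", "sync_status": "pending"},
]

def flaky_send(entry: Dict) -> bool:
    # Simulates the connection dropping on the second record.
    if entry["id"] == "work-1":
        raise ConnectionError("network unreachable")
    return True

first_attempt = flush_outbox(outbox, flaky_send)
second_attempt = flush_outbox(outbox, lambda entry: True)
```

Because a failed entry is retried on the next flush, the server endpoint should be idempotent, for example by keying each record on a client-generated ID, so a retry after a lost response does not create duplicates.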

Scenario 2: E-Commerce Mobile App

A user adds items to their shopping cart. The internet connection is lost. The user continues to make changes to the cart. When the connection is restored, the cart information is synchronized with the server. One-way synchronization may be sufficient if the cart is edited on a single device, while two-way synchronization is needed once the same cart is open on several devices. In the latter case, applying LWW to the whole cart can discard changes, so a finer-grained strategy such as resolving conflicts per item helps ensure that the user's changes are not lost.
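One way to reduce lost changes in a cart is to resolve conflicts per item instead of per cart. The sketch below assumes a hypothetical representation in which each cart maps a SKU to a (quantity, modified_at) pair and a quantity of 0 marks a removed item, so that removals can win over older additions.

```python
from typing import Dict, Tuple

Cart = Dict[str, Tuple[int, float]]  # sku -> (quantity, modified_at)

def merge_carts(local: Cart, remote: Cart) -> Cart:
    """Per-item last-write-wins merge of two versions of the same cart.

    Items present on only one side are kept. For items on both sides,
    the entry with the later timestamp wins (the remote one on a tie).
    """
    merged = dict(remote)
    for sku, (quantity, modified_at) in local.items():
        if sku not in merged or modified_at > merged[sku][1]:
            merged[sku] = (quantity, modified_at)
    return merged

# Offline edits on the phone versus the cart stored on the server.
local_cart = {"A": (2, 10.0), "B": (0, 12.0)}                   # raised A, removed B
remote_cart = {"A": (1, 5.0), "B": (1, 8.0), "C": (3, 9.0)}     # C added elsewhere

merged = merge_carts(local_cart, remote_cart)
```

Here the offline quantity change to A and the removal of B both survive, and item C added from another device is kept as well. Whole-cart LWW would have dropped one side entirely.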

Scenario 3: Production Tracking System

An operator on a production line enters the status of the machine they are using and the number of parts produced via a mobile terminal. The system must transfer this information to the central ERP system almost instantly. However, the network connection in the operator's location may not be stable. In this case, reliable offline storage and fast, two-way synchronization mechanisms are required to prevent data loss and provide near real-time tracking. In this project, we ran the synchronization services as systemd units, and PostgreSQL's WAL (Write-Ahead Logging) mechanism ensured the durability of committed data on the server.

💡 A Snippet from My Own Experience

While developing a production ERP, the offline working capability for mobile operator screens was critical. Initially, we used a simple synchronization mechanism, but connection issues encountered in the field led to data loss. Later, we overcame this problem with a more robust two-way synchronization mechanism, either a more advanced solution like Realm or one we developed ourselves. This experience taught me that offline-first synchronization is not just a feature but the cornerstone of user experience.

Conclusion and Next Steps

Offline-first synchronization strategies are indispensable for a good user experience in mobile applications, precisely because connectivity cannot be taken for granted. With the right data model, effective conflict management, efficient communication protocols, and robust security measures, you can provide your users with a seamless experience.

When implementing these strategies, it is important to choose the most suitable solution by considering your project's specific needs. While one-way synchronization may suffice for simple applications, two-way synchronization and advanced conflict resolution mechanisms are inevitable for situations requiring more complex data interactions.

As a next step, analyze your application's data flow and decide which synchronization model is most appropriate for you. Then, begin building the necessary technical infrastructure to bring your chosen model to life.
