DEV Community

rajat
rajat

Posted on

Inconsistency Resolution in Distributed Systems: Versioning

Modern distributed databases such as Amazon DynamoDB and Apache Cassandra replicate data across multiple servers to improve scalability, fault tolerance, and availability.

However, replication introduces a major challenge: data inconsistency during concurrent updates.

To address this problem, distributed systems use a technique called versioning.

In this article, we’ll explore why versioning is needed, how it works, and how it helps resolve conflicts in eventually consistent systems.


Why Versioning Is Needed

In an eventually consistent system, updates do not reach all replicas at the same time due to network delays or partitions.

Consider a system with three replicas:

N = 3
Replicas: s0, s1, s2
Enter fullscreen mode Exit fullscreen mode

Two clients update the same key at the same time:

Client A → put(key1 = 10)
Client B → put(key1 = 20)
Enter fullscreen mode Exit fullscreen mode

Because of network delays, the replicas may temporarily store different values:

s0 = 10
s1 = 20
s2 = 10
Enter fullscreen mode Exit fullscreen mode

Now the system contains conflicting values for the same key.

This situation is called a write conflict.


What Versioning Does

Versioning allows the system to track the history of writes.

Each update is assigned a version identifier.

Example:

(key1, value = 10, version = v1)
(key1, value = 20, version = v2)
Enter fullscreen mode Exit fullscreen mode

Instead of immediately overwriting data, the system may temporarily store multiple versions of the same value.

This helps the system determine:

  • Which write happened earlier
  • Whether two writes occurred concurrently

Two Possible Scenarios

1. One Version Is Newer

v1 → value = 10
v2 → value = 20
Enter fullscreen mode Exit fullscreen mode

If the system detects that v2 happened after v1, then the newer version replaces the older one.

v2 replaces v1
Enter fullscreen mode Exit fullscreen mode

In this case, no conflict occurs because the system knows the correct order of writes.


2. Concurrent Versions

Sometimes the system cannot determine the order of writes.

Example:

v1 → value = 10
v2 → value = 20
Enter fullscreen mode Exit fullscreen mode

These updates are considered concurrent.

Instead of choosing one automatically, the database may return both versions:

key1:
 value1 = 10
 value2 = 20
Enter fullscreen mode Exit fullscreen mode

Now the client application must resolve the conflict.


Client Reconciliation

The application decides how to resolve conflicting values.

Common strategies include:

1. Last Write Wins

The system selects the value with the latest timestamp.

Latest timestamp → chosen value
Enter fullscreen mode Exit fullscreen mode

2. Merging Values

Sometimes values can be combined instead of overwritten.

Example in a shopping cart system:

cart1 = {apple}
cart2 = {banana}
Enter fullscreen mode Exit fullscreen mode

Merged result:

cart = {apple, banana}
Enter fullscreen mode Exit fullscreen mode

This approach ensures that no data is lost.


3. Application Logic

The application can apply custom business rules to determine the correct value.

For example:

  • Priority-based updates
  • User-based conflict resolution
  • Domain-specific merge logic

Versioning with Vector Clocks

Some distributed databases use vector clocks to track the history of updates.

A vector clock stores a list of node counters representing update history.

Example:

v1 = {s0:1}
v2 = {s1:1}
Enter fullscreen mode Exit fullscreen mode

If two versions have different histories, the system treats them as concurrent updates.

Vector clocks help databases automatically detect conflicts without relying only on timestamps.


Key Takeaways

Versioning allows distributed databases to:

  • Detect conflicting updates
  • Temporarily store multiple versions of data
  • Resolve conflicts through reconciliation strategies

This approach enables eventual consistency while keeping the system highly available and scalable.


Final Thoughts

Versioning is a fundamental technique in distributed systems. It allows modern databases to maintain consistency without blocking writes, which is critical for systems operating at large scale.

By tracking the history of updates and allowing conflict resolution later, versioning ensures that distributed systems remain reliable, flexible, and highly performant.

Top comments (1)

Collapse
 
pierre profile image
👨‍💻Pierre-Henry ✨ • Edited

Always a great read Rajat! This one hits different after a prod incident 😅