DEV Community

rajat


How Vector Clocks Work in Distributed Systems

In distributed databases, multiple servers replicate the same data to improve availability and scalability. However, concurrent updates to the same data item can lead to conflicts because different replicas may process writes independently.

To detect and resolve these conflicts, distributed systems often use vector clocks.

A vector clock helps determine whether two versions of data are:

  • Ordered (one happened after another)
  • Concurrent (a conflict exists)

What Is a Vector Clock?

A vector clock is a collection of [server, version] pairs associated with a data item.

It tracks which server modified the data and how many updates that server has made.

A vector clock can be represented as:

D([S1, v1], [S2, v2], …, [Sn, vn])

Where:

  • D → data item
  • Si → server ID
  • vi → version counter for that server

When a server writes a data item:

  • If [Si, vi] already exists → increment vi
  • Otherwise → create a new entry [Si, 1]
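The write rule above can be sketched in a few lines of Python. This is a minimal illustration, not any particular database's implementation; the dictionary representation and the name `record_write` are assumptions made for the example.

```python
from typing import Dict

# Assumed representation: a vector clock is a dict of server ID -> counter.
Clock = Dict[str, int]

def record_write(clock: Clock, server_id: str) -> Clock:
    """Return a copy of the clock after server_id handles a write.

    An existing entry is incremented; a missing entry is created at 1.
    """
    updated = dict(clock)
    updated[server_id] = updated.get(server_id, 0) + 1
    return updated

d1 = record_write({}, "Sx")   # first write handled by Sx
d2 = record_write(d1, "Sx")   # second write handled by Sx
d3 = record_write(d2, "Sy")   # third write handled by Sy
print(d1, d2, d3)
```

Returning a copy rather than mutating in place keeps older versions intact, which matters once several versions of the same item have to be compared.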

Example: How Vector Clocks Work

Let’s walk through the example shown in the diagram.


1. First Write

A client writes data item D1.

The write is handled by server Sx.

Vector clock becomes:

D1([Sx, 1])

This means server Sx has processed the first version of the data.


2. Second Update

Another client reads D1, updates it, and writes it back.

The write is again handled by Sx.

The counter for Sx increments:

D2([Sx, 2])

This version descends from D1, so it overwrites the previous version.


3. Update by Another Server

A client reads D2, modifies it, and the write is handled by Sy.

Vector clock becomes:

D3([Sx, 2], [Sy, 1])

This indicates:

  • Two updates occurred on Sx
  • One update occurred on Sy

4. Concurrent Update

Another client also reads D2, modifies it, and writes through Sz.

Vector clock becomes:

D4([Sx, 2], [Sz, 1])

Now the system has two different versions:

D3([Sx, 2], [Sy, 1])
D4([Sx, 2], [Sz, 1])

These updates occurred independently, creating a conflict.


5. Conflict Resolution

When a client reads both D3 and D4, it detects a conflict.

This happened because D2 was modified by both Sy and Sz.

The client reconciles the conflict and writes a new version handled by Sx.

Vector clock becomes:

D5([Sx, 3], [Sy, 1], [Sz, 1])

This new version incorporates both updates.
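One way to picture the reconciliation step is as a merge followed by a normal write: take the highest counter seen for each server across the conflicting versions, then increment the counter of the server that handles the reconciled write. The sketch below assumes that approach; `merge` and `reconcile` are hypothetical helper names, and it covers only the clock, not how the application merges the data itself.

```python
from typing import Dict

Clock = Dict[str, int]

def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum of two clocks; a missing entry counts as 0."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}

def reconcile(a: Clock, b: Clock, coordinator: str) -> Clock:
    """Merge two conflicting clocks, then record the write on the coordinator."""
    merged = merge(a, b)
    merged[coordinator] = merged.get(coordinator, 0) + 1
    return merged

d3 = {"Sx": 2, "Sy": 1}
d4 = {"Sx": 2, "Sz": 1}
d5 = reconcile(d3, d4, coordinator="Sx")
print(d5)
```

With D3 and D4 as inputs and Sx as the coordinator, the result has the same counters as D5 above: Sx at 3, Sy at 1, Sz at 1.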


Detecting Conflicts Using Vector Clocks

Vector clocks allow systems to determine relationships between versions.


Ancestor Relationship (No Conflict)

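Version X is an ancestor of version Y if every counter in Y is greater than or equal to the corresponding counter in X. A server that is missing from a clock is treated as having a counter of 0.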

Example:

D([s0,1], [s1,1])
D([s0,1], [s1,2])

Since every counter in the second version is greater than or equal to its counterpart in the first, the first version preceded the second.

Therefore:

No conflict exists.
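The ancestor check can be written as a single comparison over the counters. This is a small sketch under the assumption that a clock is a dictionary of server ID to counter; `is_ancestor` is a name chosen for the example.

```python
from typing import Dict

Clock = Dict[str, int]

def is_ancestor(x: Clock, y: Clock) -> bool:
    """True if every counter in y is >= the matching counter in x.

    Servers missing from y count as 0. Identical clocks also return True.
    """
    return all(y.get(server, 0) >= count for server, count in x.items())

first = {"s0": 1, "s1": 1}
second = {"s0": 1, "s1": 2}
print(is_ancestor(first, second))   # first precedes second
print(is_ancestor(second, first))   # the reverse does not hold
```

Note that this check is "ancestor or equal": callers that need a strict ordering would also confirm that the two clocks differ.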


Sibling Relationship (Conflict)

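Two versions are siblings if neither dominates the other, that is, neither is an ancestor of the other: each version has at least one counter that is higher than in the other.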

Example:

D([s0,1], [s1,2])
D([s0,2], [s1,1])

Here:

  • First version has higher s1
  • Second version has higher s0

Because neither version fully dominates the other, they are concurrent.

Therefore:

A conflict exists.
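Running the dominance check in both directions classifies any pair of versions. The following is an illustrative sketch; the function names and the returned strings are assumptions for the example rather than a standard API.

```python
from typing import Dict

Clock = Dict[str, int]

def dominates(a: Clock, b: Clock) -> bool:
    """True if every counter in a is >= the matching counter in b."""
    return all(a.get(server, 0) >= count for server, count in b.items())

def compare(a: Clock, b: Clock) -> str:
    """Classify the relationship between two vector clocks."""
    a_ge_b, b_ge_a = dominates(a, b), dominates(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a descends from b"
    if b_ge_a:
        return "b descends from a"
    return "concurrent"  # siblings: the versions conflict

print(compare({"s0": 1, "s1": 2}, {"s0": 2, "s1": 1}))
print(compare({"Sx": 2, "Sy": 1}, {"Sx": 2, "Sz": 1}))
print(compare({"Sx": 2, "Sy": 1}, {"Sx": 2}))
```

The first two calls report "concurrent" (the sibling example here and the D3/D4 pair from the walkthrough), while the third reports that D3 descends from D2.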


Downsides of Vector Clocks

Although vector clocks are powerful, they have some drawbacks.

1. Client Complexity

Clients must implement conflict resolution logic, which increases application complexity.


2. Growing Vector Size

The number of [server, version] pairs may grow quickly as more servers update the data.

To control this, systems often:

  • Set a maximum vector size
  • Remove the oldest entries

However, removing entries can make it harder to determine exact ancestor relationships.
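To make the trade-off concrete, here is a rough sketch of size-based truncation. It assumes each entry also carries the time of its last update so that the oldest entry can be identified; the `TimedClock` layout, the `prune` helper, and the timestamps are all hypothetical.

```python
from typing import Dict, Tuple

# Assumed layout: server ID -> (counter, time of last update).
TimedClock = Dict[str, Tuple[int, float]]

def prune(clock: TimedClock, max_entries: int) -> TimedClock:
    """Keep only the max_entries most recently updated entries."""
    if len(clock) <= max_entries:
        return dict(clock)
    newest_first = sorted(clock.items(), key=lambda item: item[1][1], reverse=True)
    return dict(newest_first[:max_entries])

clock = {"Sx": (3, 100.0), "Sy": (1, 140.0), "Sz": (1, 150.0)}
pruned = prune(clock, max_entries=2)   # Sx is the oldest entry and is dropped

# After pruning, the clock no longer proves that it descends from D2([Sx, 2]).
ancestor = {"Sx": 2}
counters = {server: count for server, (count, _) in pruned.items()}
still_descends = all(counters.get(s, 0) >= c for s, c in ancestor.items())
print(pruned, still_descends)
```

Once the Sx entry is gone, a comparison against the older version can no longer establish the ancestor relationship, so the two versions may be reported as conflicting even though one actually descends from the other.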


Final Thoughts

Vector clocks are a fundamental technique used in distributed systems to detect causal relationships between updates.

They help distributed databases:

  • Detect concurrent writes
  • Identify conflicting versions
  • Allow applications to reconcile conflicts safely

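Amazon's Dynamo, the key-value store described in the 2007 Dynamo paper (not the same system as the later DynamoDB service), used vector clocks to version data under eventual consistency while preserving high availability. The technique itself is older and dates back to work by Fidge and Mattern in the late 1980s.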

Despite their complexity, vector clocks remain one of the most effective mechanisms for conflict detection in distributed databases.
