In distributed databases, multiple servers replicate the same data to improve availability and scalability. However, concurrent updates to the same data item can lead to conflicts because different replicas may process writes independently.
To detect and resolve these conflicts, distributed systems often use vector clocks.
A vector clock helps determine whether two versions of data are:
- Ordered (one happened after another)
- Concurrent (a conflict exists)
What Is a Vector Clock?
A vector clock is a collection of [server, version] pairs associated with a data item.
It tracks which server modified the data and how many updates that server has made.
A vector clock can be represented as:
D([S1, v1], [S2, v2], …, [Sn, vn])
Where:
- D → data item
- Si → server ID
- vi → version counter for that server
When a server writes a data item:
- If [Si, vi] already exists → increment vi
- Otherwise → create a new entry [Si, 1]
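The write rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a vector clock is stored as a plain dict that maps a server ID to its counter; the function name `record_write` is hypothetical and not taken from any particular database.

```python
def record_write(clock, server_id):
    """Return a new clock reflecting one more write handled by server_id.

    Assumption: a clock is a dict of {server_id: counter}.
    """
    updated = dict(clock)  # copy so the caller's clock is not mutated
    # Increment the existing counter, or start at 1 if the server is new.
    updated[server_id] = updated.get(server_id, 0) + 1
    return updated


d1 = record_write({}, "Sx")   # {"Sx": 1}
d2 = record_write(d1, "Sx")   # {"Sx": 2}
d3 = record_write(d2, "Sy")   # {"Sx": 2, "Sy": 1}
```

Returning a copy rather than mutating in place keeps each version's clock intact, which matters when several versions of the same item have to be compared later.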
Example: How Vector Clocks Work
Let’s walk through the example shown in the diagram.
1. First Write
A client writes data item D1.
The write is handled by server Sx.
Vector clock becomes:
D1([Sx, 1])
This means server Sx has processed the first version of the data.
2. Second Update
Another client reads D1, updates it, and writes it back.
The write is again handled by Sx.
The counter for Sx increments:
D2([Sx, 2])
This version descends from D1, so it overwrites the previous version.
3. Update by Another Server
A client reads D2, modifies it, and the write is handled by Sy.
Vector clock becomes:
D3([Sx, 2], [Sy, 1])
This indicates:
- Two updates occurred on Sx
- One update occurred on Sy
4. Concurrent Update
Another client also reads D2, modifies it, and writes through Sz.
Vector clock becomes:
D4([Sx, 2], [Sz, 1])
Now the system has two different versions:
D3([Sx,2], [Sy,1])
D4([Sx,2], [Sz,1])
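Neither version descends from the other: D3 has an entry for Sy that D4 lacks, and D4 has an entry for Sz that D3 lacks. The updates occurred independently, creating a conflict.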
5. Conflict Resolution
When a client reads both D3 and D4, it detects a conflict.
This happened because D2 was modified by both Sy and Sz.
The client reconciles the conflict and writes a new version handled by Sx.
Vector clock becomes:
D5([Sx, 3], [Sy, 1], [Sz, 1])
This new version incorporates both updates.
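The five steps can be replayed in Python as a rough sketch. The text above only shows the resulting clock for D5; merging the two conflicting clocks by taking the per-server maximum and then incrementing the counter of the handling server is one common way to arrive at it, and is an assumption here. The helper names `record_write` and `merge` are hypothetical.

```python
def record_write(clock, server_id):
    """Return a copy of clock with server_id's counter incremented."""
    updated = dict(clock)
    updated[server_id] = updated.get(server_id, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the larger counter for each server."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


d1 = record_write({}, "Sx")             # step 1: first write via Sx
d2 = record_write(d1, "Sx")             # step 2: second write via Sx
d3 = record_write(d2, "Sy")             # step 3: update of D2 via Sy
d4 = record_write(d2, "Sz")             # step 4: concurrent update of D2 via Sz
d5 = record_write(merge(d3, d4), "Sx")  # step 5: reconciled write via Sx

print(d5 == {"Sx": 3, "Sy": 1, "Sz": 1})  # prints True
```

Because D5's clock is at least as large as both D3's and D4's in every position, any replica that later sees D5 can safely discard the two conflicting versions.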
Detecting Conflicts Using Vector Clocks
Vector clocks allow systems to determine relationships between versions.
Ancestor Relationship (No Conflict)
Version X is an ancestor of Y if every counter in Y is greater than or equal to the corresponding counter in X (treating a missing entry as 0) and the two clocks are not identical.
Example:
D([s0,1], [s1,1])
D([s0,1], [s1,2])
Since every counter in the second version is greater than or equal to its counterpart in the first, the first version preceded the second.
Therefore:
No conflict exists.
Sibling Relationship (Conflict)
Two versions are siblings if neither dominates the other.
Example:
D([s0,1], [s1,2])
D([s0,2], [s1,1])
Here:
- First version has higher s1
- Second version has higher s0
Because neither version fully dominates the other, they are concurrent.
Therefore:
A conflict exists.
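Both relationships can be checked mechanically. The following Python sketch assumes clocks are dicts from server ID to counter and treats a missing entry as 0; the names `descends` and `compare` are illustrative, not part of any specific system's API.

```python
def descends(a, b):
    """True if clock a is >= clock b for every server that appears in b."""
    return all(a.get(server, 0) >= count for server, count in b.items())


def compare(x, y):
    """Classify the relationship between clocks x and y."""
    x_ge_y = descends(x, y)
    y_ge_x = descends(y, x)
    if x_ge_y and y_ge_x:
        return "equal"
    if y_ge_x:
        return "ancestor"      # x happened before y, no conflict
    if x_ge_y:
        return "descendant"    # y happened before x, no conflict
    return "concurrent"        # siblings: neither dominates, conflict


# The two examples from this section:
print(compare({"s0": 1, "s1": 1}, {"s0": 1, "s1": 2}))  # ancestor
print(compare({"s0": 1, "s1": 2}, {"s0": 2, "s1": 1}))  # concurrent
```

The same check classifies D3([Sx,2], [Sy,1]) and D4([Sx,2], [Sz,1]) from the walkthrough as concurrent, since each has an entry the other lacks.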
Downsides of Vector Clocks
Although vector clocks are powerful, they have some drawbacks.
1. Client Complexity
Clients must implement conflict resolution logic, which increases application complexity.
2. Growing Vector Size
The number of [server, version] pairs may grow quickly as more servers update the data.
To control this, systems often:
- Set a maximum vector size
- Remove the oldest entries
However, removing entries can make it harder to determine exact ancestor relationships.
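One way to picture the truncation is the sketch below. It assumes each entry also carries a last-update timestamp so that "oldest" is well defined; that timestamp, the size limit, and the function name `truncate` are all assumptions made for illustration rather than details given above.

```python
def truncate(clock, max_entries):
    """Keep only the max_entries most recently updated entries.

    Assumption: clock maps server_id -> (counter, last_update_time).
    """
    if len(clock) <= max_entries:
        return dict(clock)
    # Sort entries from newest to oldest by their timestamp.
    newest_first = sorted(clock.items(), key=lambda item: item[1][1], reverse=True)
    return dict(newest_first[:max_entries])


clock = {"Sx": (3, 100), "Sy": (1, 50), "Sz": (1, 75)}
pruned = truncate(clock, 2)
print(pruned)  # {'Sx': (3, 100), 'Sz': (1, 75)}
```

After pruning, the entry for Sy is gone, so a later comparison can no longer tell that this version already includes the update from Sy. That is the loss of precision described above.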
Final Thoughts
Vector clocks are a fundamental technique used in distributed systems to detect causal relationships between updates.
They help distributed databases:
- Detect concurrent writes
- Identify conflicting versions
- Allow applications to reconcile conflicts safely
Systems such as Amazon's Dynamo use vector clocks to reconcile divergent versions under eventual consistency while preserving high availability.
Despite their complexity, vector clocks remain one of the most effective mechanisms for conflict detection in distributed databases.
