In MongoDB, a write concern of w:1 indicates that a write operation is considered successful once the primary node acknowledges it, without waiting for the data to be replicated to secondary nodes. While this reduces latency, it also introduces the risk that if the primary fails before replication occurs, the written data could be lost. In replica sets with multiple voters, such writes can be rolled back if a failure happens before a majority acknowledges the change.
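The failure window can be sketched with a toy in-memory model (illustrative only; `Node`, `ReplicaSet`, and their methods are invented names, not MongoDB APIs):

```python
# Toy model: w:1 acknowledges after the primary's local apply, before
# replication; w:"majority" waits for a majority of nodes.
class Node:
    def __init__(self, name):
        self.name = name
        self.data = []      # locally applied writes
        self.alive = True

class ReplicaSet:
    def __init__(self, nodes):
        self.nodes = nodes
        self.primary = nodes[0]

    def write(self, doc, w):
        self.primary.data.append(doc)       # primary applies locally
        if w == 1:
            return True                     # acknowledged before replication
        acked = 1
        for n in self.nodes[1:]:
            if n.alive:
                n.data.append(doc)          # replicate to reachable secondaries
                acked += 1
        return acked > len(self.nodes) // 2  # majority acknowledged?

rs = ReplicaSet([Node("m1"), Node("m2"), Node("m3")])
for n in rs.nodes[1:]:
    n.alive = False                         # partition the secondaries
ok = rs.write({"name": "new"}, w=1)         # acknowledged with w:1...
rs.primary.alive = False                    # ...then the primary fails
survivors = [n.data for n in rs.nodes if n.name != "m1"]
print(ok, survivors)                        # acknowledged, yet no survivor has it
```

The write was reported as successful, but no surviving node holds a copy of it.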
This is not the default setting. Most clusters (Primary-Secondary-Secondary) use an implicit w:majority write concern, which ensures durability in the event of a zone failure. The implicit default write concern is w:1 only when an arbiter is present (Primary-Secondary-Arbiter) or when the topology lowers the number of data-bearing voters.
For performance reasons, you may sometimes write with w:1. However, it's important to understand the consequences this setting might have in certain failure scenarios. To clarify, here is an example.
I started a three-node replica set using Docker:
docker network create lab
docker run -d --network lab --name m1 --hostname m1 mongo --bind_ip_all --replSet rs
docker run -d --network lab --name m2 --hostname m2 mongo --bind_ip_all --replSet rs
docker run -d --network lab --name m3 --hostname m3 mongo --bind_ip_all --replSet rs
docker exec -it m1 mongosh --host m1 --eval '
rs.initiate( {_id: "rs", members: [
{_id: 0, priority: 3, host: "m1:27017"},
{_id: 1, priority: 2, host: "m2:27017"},
{_id: 2, priority: 1, host: "m3:27017"}]
});
'
I created a collection with one "old" document:
docker exec -it m1 mongosh --host m1 --eval '
db.myCollection.drop();
db.myCollection.insertOne(
{name: "old"},
{writeConcern: {w: "majority", wtimeout: 15000}}
);
'
I checked that the document is there:
docker exec -it m1 mongosh --host m1 --eval 'db.myCollection.find()'
[ { _id: ObjectId('691df945727482ee30fa3350'), name: 'old' } ]
I disconnected two nodes, so I no longer had a majority. However, I quickly inserted a new document before the primary stepped down and became a secondary that would not accept new writes:
docker network disconnect lab m2
docker network disconnect lab m3
docker exec -it m1 mongosh --host m1 --eval '
db.myCollection.insertOne(
{name: "new"},
{writeConcern: {w: 1, wtimeout: 15000}}
);
'
Note the use of writeConcern: {w: 1} to explicitly reduce the durability guarantee. Without this setting, the default is "majority", and the write operation would have waited until the timeout, allowing the application to recognize that durability could not be guaranteed and that the write was unsuccessful.
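The difference in client-visible behavior can be sketched as follows (a simulation under invented names; `write_with_concern` is not a MongoDB API, and real replication is far more involved):

```python
import time

# Sketch: with w:"majority", the client blocks until a majority acknowledges
# or wtimeout elapses, then receives an error instead of a false success.
def write_with_concern(reachable_nodes, total_nodes, w, wtimeout_ms):
    majority = total_nodes // 2 + 1
    if w == 1:
        return "acknowledged"               # primary-only ack, returns at once
    deadline = time.monotonic() + wtimeout_ms / 1000
    while time.monotonic() < deadline:
        if reachable_nodes >= majority:     # replication would catch up here
            return "acknowledged"
        time.sleep(0.01)
    return "WriteConcernTimeout"            # durability could not be guaranteed

# Two of three nodes unreachable: only the primary can acknowledge.
print(write_with_concern(reachable_nodes=1, total_nodes=3, w="majority", wtimeout_ms=100))
print(write_with_concern(reachable_nodes=1, total_nodes=3, w=1, wtimeout_ms=100))
```

With the majority concern the application learns about the problem; with w:1 it gets an immediate, and in this scenario misleading, success.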
With writeConcern: {w: 1}, the operation was acknowledged and the data became visible:
docker exec -it m1 mongosh --host m1 --eval 'db.myCollection.find()'
[
{ _id: ObjectId('691df945727482ee30fa3350'), name: 'old' },
{ _id: ObjectId('691dfa0ff09d463d36fa3350'), name: 'new' }
]
Keep in mind that this is visible when using the default 'local' read concern, but not when using 'majority':
docker exec -it m1 mongosh --host m1 --eval '
db.myCollection.find().readConcern("majority")
'
[
{ _id: ObjectId('691df945727482ee30fa3350'), name: 'old' }
]
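The visibility rule behind these two results can be sketched with a toy majority-commit point (invented structures for illustration; not the MongoDB implementation):

```python
# Sketch of read-concern visibility: "local" returns everything applied on the
# node; "majority" returns only writes at or below the majority commit point.
oplog = [
    {"ts": 1, "doc": {"name": "old"}},   # majority-committed
    {"ts": 2, "doc": {"name": "new"}},   # applied locally, never replicated
]
majority_commit_point = 1                # last ts acknowledged by a majority

def find(read_concern):
    if read_concern == "local":
        return [e["doc"] for e in oplog]
    return [e["doc"] for e in oplog if e["ts"] <= majority_commit_point]

print(find("local"))     # both documents
print(find("majority"))  # only the majority-committed one
```

The 'majority' read concern hides exactly the writes that could still be rolled back.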
I checked the Oplog to confirm that the idempotent version of my change was present:
docker exec -it m1 mongosh --host m1 local --eval '
db.oplog.rs
.find({ns:"test.myCollection"},{op:1, o:1, t:1})
.sort({ ts: -1 });
'
[
{
op: 'i',
o: { _id: ObjectId('691dfa0ff09d463d36fa3350'), name: 'new' },
t: Long('1')
},
{
op: 'i',
o: { _id: ObjectId('691df945727482ee30fa3350'), name: 'old' },
t: Long('1')
}
]
The primary node accepted w:1 writes only briefly, during the interval between losing quorum and stepping down. Afterwards, it automatically switched to SECONDARY, and since no quorum was present, there was no PRIMARY. This state can persist for some time:
docker exec -it m1 mongosh --host m1 --eval '
rs.status().members
'
[
{
_id: 0,
name: 'm1:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 1172,
optime: { ts: Timestamp({ t: 1763572239, i: 1 }), t: Long('1') },
optimeDate: ISODate('2025-11-19T17:10:39.000Z'),
optimeWritten: { ts: Timestamp({ t: 1763572239, i: 1 }), t: Long('1') },
optimeWrittenDate: ISODate('2025-11-19T17:10:39.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T17:10:39.685Z'),
lastDurableWallTime: ISODate('2025-11-19T17:10:39.685Z'),
lastWrittenWallTime: ISODate('2025-11-19T17:10:39.685Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
configVersion: 1,
configTerm: 1,
self: true,
lastHeartbeatMessage: ''
},
{
_id: 1,
name: 'm2:27017',
health: 0,
state: 8,
stateStr: '(not reachable/healthy)',
uptime: 0,
optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeWritten: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeDurableDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeWrittenDate: ISODate('1970-01-01T00:00:00.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastDurableWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastWrittenWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastHeartbeat: ISODate('2025-11-19T17:26:03.626Z'),
lastHeartbeatRecv: ISODate('2025-11-19T17:10:37.153Z'),
pingMs: Long('0'),
lastHeartbeatMessage: 'Error connecting to m2:27017 :: caused by :: Could not find address for m2:27017: SocketException: onInvoke :: caused by :: Host not found (authoritative)',
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
configVersion: 1,
configTerm: 1
},
{
_id: 2,
name: 'm3:27017',
health: 0,
state: 8,
stateStr: '(not reachable/healthy)',
uptime: 0,
optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeWritten: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeDurableDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeWrittenDate: ISODate('1970-01-01T00:00:00.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastDurableWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastWrittenWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastHeartbeat: ISODate('2025-11-19T17:26:03.202Z'),
lastHeartbeatRecv: ISODate('2025-11-19T17:10:37.153Z'),
pingMs: Long('0'),
lastHeartbeatMessage: 'Error connecting to m3:27017 :: caused by :: Could not find address for m3:27017: SocketException: onInvoke :: caused by :: Host not found (authoritative)',
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
configVersion: 1,
configTerm: 1
}
]
When there is no primary, no further writes are accepted—even if you set writeConcern: {w: 1}:
docker exec -it m1 mongosh --host m1 --eval '
db.myCollection.insertOne(
{name: "new"},
{writeConcern: {w: 1, wtimeout: 15000}}
);
'
MongoServerError: not primary
The system may remain in this state for some time. When at least one secondary comes back online, it will pull the Oplog and replicate the write to a quorum, making the acknowledged write durable.
Using writeConcern: {w: 1} boosts performance, as the primary doesn't wait for acknowledgments from other nodes. This write concern tolerates a single node failure, since the quorum remains, and can even withstand another brief failure. However, if a failure persists, additional writes aren't accepted, which limits the number of unacknowledged writes at risk. Usually, when a node recovers, it synchronizes via the Oplog, and the primary resumes accepting writes.
In the common scenario where brief, transient failures occur, using writeConcern: {w: 1} means the database remains available when the failure is just a momentary glitch. However, the point here is to illustrate the worst-case scenario: if one node accepts a write that is not acknowledged by any other node, and that node fails before any others recover, the write may be lost.
To illustrate this scenario, I first disconnected this node and then reconnected the remaining ones:
docker network disconnect lab m1
docker network connect lab m2
docker network connect lab m3
In this worst-case scenario, a new quorum is formed with a state that predates when the write could be synchronized to the replicas. However, progress continues because a new primary is established:
docker exec -it m2 mongosh --host m2 --eval '
rs.status().members
'
[
{
_id: 0,
name: 'm1:27017',
health: 0,
state: 8,
stateStr: '(not reachable/healthy)',
uptime: 0,
optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeWritten: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeDurableDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeWrittenDate: ISODate('1970-01-01T00:00:00.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastDurableWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastWrittenWallTime: ISODate('2025-11-19T17:10:34.194Z'),
lastHeartbeat: ISODate('2025-11-19T17:39:02.913Z'),
lastHeartbeatRecv: ISODate('2025-11-19T17:10:38.153Z'),
pingMs: Long('0'),
lastHeartbeatMessage: 'Error connecting to m1:27017 :: caused by :: Could not find address for m1:27017: SocketException: onInvoke :: caused by :: Host not found (authoritative)',
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
configVersion: 1,
configTerm: 1
},
{
_id: 1,
name: 'm2:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
uptime: 1952,
optime: { ts: Timestamp({ t: 1763573936, i: 1 }), t: Long('2') },
optimeDate: ISODate('2025-11-19T17:38:56.000Z'),
optimeWritten: { ts: Timestamp({ t: 1763573936, i: 1 }), t: Long('2') },
optimeWrittenDate: ISODate('2025-11-19T17:38:56.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T17:38:56.678Z'),
lastDurableWallTime: ISODate('2025-11-19T17:38:56.678Z'),
lastWrittenWallTime: ISODate('2025-11-19T17:38:56.678Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: 'Could not find member to sync from',
electionTime: Timestamp({ t: 1763573886, i: 1 }),
electionDate: ISODate('2025-11-19T17:38:06.000Z'),
configVersion: 1,
configTerm: 2,
self: true,
lastHeartbeatMessage: ''
},
{
_id: 2,
name: 'm3:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 58,
optime: { ts: Timestamp({ t: 1763573936, i: 1 }), t: Long('2') },
optimeDurable: { ts: Timestamp({ t: 1763573936, i: 1 }), t: Long('2') },
optimeWritten: { ts: Timestamp({ t: 1763573936, i: 1 }), t: Long('2') },
optimeDate: ISODate('2025-11-19T17:38:56.000Z'),
optimeDurableDate: ISODate('2025-11-19T17:38:56.000Z'),
optimeWrittenDate: ISODate('2025-11-19T17:38:56.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T17:38:56.678Z'),
lastDurableWallTime: ISODate('2025-11-19T17:38:56.678Z'),
lastWrittenWallTime: ISODate('2025-11-19T17:38:56.678Z'),
lastHeartbeat: ISODate('2025-11-19T17:39:02.679Z'),
lastHeartbeatRecv: ISODate('2025-11-19T17:39:01.178Z'),
pingMs: Long('0'),
lastHeartbeatMessage: '',
syncSourceHost: 'm2:27017',
syncSourceId: 1,
infoMessage: '',
configVersion: 1,
configTerm: 2
}
]
This replica set has a primary and is accepting new writes with a new Raft term (configTerm: 2). However, during recovery, it ignored a pending write from the previous term (configTerm: 1) that originated from an unreachable node.
The write made with w:1 after the quorum was lost, but before the primary stepped down, is gone:
docker exec -it m2 mongosh --host m2 --eval '
db.myCollection.find()
'
[ { _id: ObjectId('691df945727482ee30fa3350'), name: 'old' } ]
After reconnecting the first node, it enters recovery mode and synchronizes with the other nodes, all of which are on term 2:
docker network connect lab m1
docker exec -it m1 mongosh --host m1 --eval '
db.myCollection.find()
'
MongoServerError: Oplog collection reads are not allowed while in the rollback or startup state.
The rollback process employs the 'Recover To A Timestamp' algorithm to restore the node to the highest majority-committed point. While rolling back, the node transitions to the ROLLBACK state, suspends user operations, finds the common point with the sync source, and recovers to the stable timestamp.
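The core idea—find the common point with the sync source and discard everything after it—can be sketched as follows (a simplified illustration; the real algorithm recovers to a stable timestamp rather than comparing entries one by one):

```python
# Sketch of rollback: the rejoining node finds the last oplog entry it shares
# with the sync source (the common point) and truncates everything after it,
# discarding writes from the old term.
def rollback(local_oplog, sync_source_oplog):
    common = -1
    for i, entry in enumerate(local_oplog):
        if i < len(sync_source_oplog) and entry == sync_source_oplog[i]:
            common = i                     # still matching the sync source
        else:
            break
    kept = local_oplog[:common + 1]        # shared history
    truncated = local_oplog[common + 1:]   # entries rolled back
    return kept, truncated

m1_oplog = [("t1", "insert old"), ("t1", "insert new")]       # old primary
m2_oplog = [("t1", "insert old"), ("t2", "noop on election")]  # new primary
kept, rolled_back = rollback(m1_oplog, m2_oplog)
print(kept)         # [('t1', 'insert old')]
print(rolled_back)  # [('t1', 'insert new')] — the lost w:1 write
```

After truncating, the node would then apply the sync source's entries from the new term and rejoin as a secondary.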
After recovery, changes made in term 1 that did not receive quorum acknowledgment are truncated from the Oplog. This behavior is an extension of the standard Raft algorithm:
docker exec -it m1 mongosh --host m1 local --eval '
db.oplog.rs
.find({ns:"test.myCollection"},{op:1, o:1, t:1})
.sort({ ts: -1 });
'
[
{
op: 'i',
o: { _id: ObjectId('691df945727482ee30fa3350'), name: 'old' },
t: Long('1')
}
]
A w:1 write that was visible at one point, and acknowledged to the client, but never actually committed to the quorum, has now disappeared:
docker exec -it m1 mongosh --host m1 --eval '
db.myCollection.find()
'
[ { _id: ObjectId('691df945727482ee30fa3350'), name: 'old' } ]
With writeConcern: {w: 1}, the developer must be aware that such an issue can arise if a write occurs immediately after quorum is lost and the primary fails before other nodes recover.
While SQL databases typically abstract physical concerns such as persistence and replication, MongoDB shifts more responsibility to developers. By default, acknowledged writes are considered durable only once a majority of nodes confirm they are synced to disk.
In some cases, strict write guarantees are unnecessary and can be relaxed for improved performance. Developers can adjust the write concern to suit their application's needs. When using writeConcern: {w: 1}, this affects two aspects of ACID:
- Durability: If there is a failure impacting both the primary and replicas, and only replicas recover, writes not acknowledged by replicas may be rolled back—similar to PostgreSQL's synchronous_commit = local.
- Isolation: Reads with the default 'local' concern may see writes that were acknowledged to the client but not yet confirmed by a majority, and that could still be rolled back. There is no PostgreSQL equivalent to MongoDB's 'majority' read concern (MVCC visibility tracking what was applied on the replicas).
Although writeConcern: {w: 1} is sometimes described as permitting 'dirty reads', the term is misleading because it is also used as a synonym for 'read uncommitted' in relational databases. In SQL databases with a single read-write instance, 'read uncommitted' refers to single-server isolation (the I in ACID). With writeConcern: {w: 1} and a 'majority' read concern, however, uncommitted reads do not occur: only committed changes are visible to other sessions. The real challenge involves durability (the D in ACID) in the context of a replica set. With traditional SQL database replication, writes might be visible before all peers (replica, WAL, application) have fully acknowledged them, since no single atomic operation covers them all. MongoDB's w:1 is similar, and calling it a 'dirty read' is useful to highlight the implications for developers.