Franck Pachot

Posted on Mar 29

Consistency boundaries in SQL databases vs. MongoDB

#oracle #mongodb #database #consistency

Two common myths surround database consistency:

Myth 1: Only SQL databases provide strong consistency. In fact, MongoDB—a NoSQL database—supports ACID transactions with strong consistency, even in sharded clusters, with clear serializable boundaries: the document.
Myth 2: SQL consistency is simple and fully abstracted, so you can treat views like tables and run transactions as if race-condition anomalies never occur. In reality, no RDBMS fully achieves this ideal, so you must understand each system’s consistency boundaries (column?, scanned rows?, returned rowset?, range?, partition?, table?).

With MongoDB, the consistency boundaries follow domain driven design aggregates: concurrent writes to the same document conflict, even if they effectively read or write different fields. This guarantees that the database cannot break the business invariants set by the application. I've detailed an example in Serializable Transactions in MongoDB: The Doctor's On-Call Shift example. Document-level operations provide stronger integrity constraints, since schema validation can cover the entire aggregate, including one-to-many relationships. SQL databases need assertions to provide the same guarantees, but these are rarely implemented in RDBMSs, so only a limited subset of integrity constraints are available, like foreign keys.

Relational databases and SQL are meant to hide locking internals from developers, and full support for the serializable isolation level would make this feasible. In practice, though, serializable is often incomplete or hurts scalability, so developers use weaker isolation levels and must consider what is locked. They commonly add explicit locking, like SELECT FOR UPDATE, to prevent race-condition anomalies.

Before looking at an example, ask yourself whether you understand the consistency boundaries of your databases. Updating a single column often locks the entire row of that column. Some databases lock at a finer granularity (for example, YugabyteDB’s column-level locking), while others may lock more via lock escalation.

Normalization also affects this by spreading a logical aggregate—which in Domain-Driven Design corresponds to a consistency boundary—across multiple tables and rows. If you run a SELECT ... FOR UPDATE with a JOIN, do you expect all the tables to be locked? The FOR UPDATE clause’s OF option lets PostgreSQL specify tables and Oracle Database specify columns, but in both cases, the underlying effect is to lock all the rows involved.

Defaults may be inconsistent. SELECT FOR UPDATE without OF locks all joined rows, but UPDATE on a join behaves differently as we will demonstrate below. Column-level locking also raises another issue without joins: check constraints must see a consistent state, even for columns that were not updated (see the now-fixed YugabyteDB issue: https://github.com/yugabyte/yugabyte-db/issues/15196).

SQL is an abstraction where the tables referenced in queries can also be views. In Oracle Database, join views can be updatable, which raises an important question about consistency: when you update a single column from one underlying table through a join view, should locks be taken on both underlying tables, or only on the one that’s actually being updated?

Demonstration

I'll use Oracle Database for this demonstration because it’s the only one I know that supports updates on joins. Those who say “this wouldn’t happen on PostgreSQL” are right that PostgreSQL has a true serializable isolation level, but updatable views are more limited and would not allow to build such an example. There’s no single best database—only different ones with different trade-offs. Expertise means understanding those differences.

Here is an example. I create two tables to store the employees salary and commission separately, and build a view on top of them to expose it as one:

create table emp_salary ( 
  empno      number primary key,
  salary     number
);

create table emp_commission ( 
  empno      number primary key,
  commission number
);

insert into emp_salary     values (99, 1000);
insert into emp_commission values (99,  100);
commit;

create view emp as 
 select * from emp_salary 
 join emp_commission 
 using ( empno )
;

This view is updateable and exposes all information per employee:

21:44:07 SQL> select * from emp;

   EMPNO    SALARY    COMMISSION
________ _________ _____________
      99     1,000           100

Now, an instruction is sent to the Human Resources department to add 42 to this employee, and the HR agent can choose to add it either to the salary or to the commission.

Alice, in HR, starts a transaction (using the serializable isolation level, which does not change the behavior here because in Oracle it only achieves Snapshot Isolation). She checks the employee’s salary and commission and decides to add 42 to the commission. As an extra safety measure, to ensure the operation is idempotent, she adds a condition on the salary and commission so that the update occurs only if neither has already been increased:

21:43:58 SQL> set transaction isolation level serializable;

Transaction ISOLATION succeeded.

21:44:07 SQL> select * from emp;

   EMPNO    SALARY    COMMISSION
________ _________ _____________
      99     1,000           100

21:44:11 SQL> update emp
  2   set commission=commission+42
  3   where empno=99
  4   and salary=1000 and commission=100
  5* ;

1 row updated.

21:44:19 SQL>

Alice didn't commit yet, and her colleague, Bob, in another office got the same instruction. He does the same, but decides to add 42 to the salary:

21:43:58 SQL> set transaction isolation level serializable;

Transaction ISOLATION succeeded.

21:44:26 SQL> select * from emp;

   EMPNO    SALARY    COMMISSION
________ _________ _____________
      99     1,000           100

21:44:29 SQL> update emp
  2   set salary=salary+42
  3   where empno=99
  4   and salary=1000 and commission=100
  5* ;

1 row updated.

21:44:37 SQL>

As both check the current state (salary=1000 and commission=100) only one should actually update a row. However, nothing blocks, and nothing fails. Alice is able to commit:

21:44:19 SQL> commit;

Commit complete.

21:44:45 SQL> select * from emp;

   EMPNO    SALARY    COMMISSION
________ _________ _____________
      99     1,000           142

Bob is also allowed to commit:

21:44:37 SQL> commit;

Commit complete.

21:44:50 SQL> select * from emp;

   EMPNO    SALARY    COMMISSION
________ _________ _____________
      99     1,042           142

The employee incorrectly received a raise of 42 in both salary and commission because the database failed to prevent a race condition.

Explanation

This is a Multi-Version Concurrency Control (MVCC) anomaly: the WHERE clause was evaluated on a consistent snapshot, stale, but the UPDATE was applied to the current state, and the resulting conflict went undetected. You can reproduce this in db<>fiddle by using an autonomous transaction to simulate a second user.

Neither Bob or Alice are responsible for this error because both of them checked the initial salary and commission in the same statement that updated one of them. It's the database that lied to them, or rather didn't expose that it was not one consistnt state. Users are not supposed to know that they query a view where data is physically stored in different tables and that the database locked only one of the two rows that appeared as one after the join.

Solutions

Write to lock. For safety, users should update both columns, even when values do not change, so that write conflicts protect against concurrent updates. Hibernate does this by default, updating all mapped columns. It is less efficient, but this consistency boundary is one reason dynamic update is not the default. Object–Relational Mappers (ORMs) and document databases such as MongoDB not only map domain model entities but also help ensure their consistency when those entities are mapped to a relational model that splits aggregates behind the abstraction layer.

Explicit locking. Another option is to add FOR UPDATE to a first SELECT in the transaction. Without an OF clause, it locks all rows in the join. Bob’s session then waits, sees Alice’s changes, and skips its own update. However, unless this is wrapped in a stored procedure, it adds an extra database round trip and holds locks across context switches and network calls, which harms scalability.

Avoid abstraction. A third solution is to avoid using views on joins, as they create an ambiguous consistency boundary where the user can’t tell what is locked. Still, the user could update on a join and experience the same:

update (
  select * from emp_salary join emp_commission using ( empno )
) set salary=salary+42
  where empno=99 and salary=1000 and commission=100
;

You might then ask how JSON-relational Duality views work, since they are also updatable views over multiple tables. The following example is similar to a join:

CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW emp_dv AS
  emp_salary @insert @update @delete
  {
    _id      : empno,
    salary     : salary @update,
    commission : emp_commission @update
    @link ( from: empno, to: empno ) {
     empno : empno
     commission: commission
    }
  }
/

If I try to update different fields concurrently, the conflict is detected:

UPDATE emp_dv
  SET data = JSON_TRANSFORM(data, 
             SET '$.commission.commission' = 10)
;

declare
 -- using autonomous transaction to simulate a concurrent transaction 
 pragma autonomous_transaction;
begin
 UPDATE emp_dv
  SET data = JSON_TRANSFORM(data, 
             SET '$.salary' = 100)
;
 commit;
end;
/

declare
*
ERROR at line 1:
ORA-40896: Cannot update JSON Relational Duality View 'EMP_DV': Concurrent modification detected to document with ID 'FB03C16400'.
ORA-06512: at line 5
Help: https://docs.oracle.com/error-help/db/ora-40896/

It fails, so you must implement retry logic (MongoDB would automatically retry here). JSON-Relational Duality Views use ETAG to detect conflicts on the same document rather than using row locks.

You may also wonder how SQL assertions work, since they require consistency across multiple rows. They use a different locking mechanism to ensure a consistent state. I’ve included an example in the next post.

Conclusion

Relational databases aim to hide physical storage and locking so developers don’t have to manage them directly. But to achieve good performance, and avoid consistency anomalies, developers still need to understand locking boundaries. Not all RDBMSs support updatable join views (SQL-92 only specifies updatability for single-table views), though all provide explicit locking mechanisms such as SELECT FOR UPDATE. Row-store databases usually lock at the row level, but views may aggregate multiple rows without the developer knowing it.

By contrast, MongoDB, as a NoSQL database, exposes a simpler model: the document is the consistency boundary, like the aggregate in Domain-Driven Design. Developers update documents, not logical views, and write consistency applies to the entire document rather than a portion of it.

DEV Community

Consistency boundaries in SQL databases vs. MongoDB

Demonstration

Explanation

Solutions

Conclusion

Top comments (0)