Discussion on: Deep Dive: NewSQL Databases

View post

Hi Stephen,

Thanks for your great article on NewSQL databases!

I could not help but notice some wording in your article that could be misunderstood and hence would like to add some clarifications.

In the article there is a section that says:

For example, if I am trying to write to a table that has a billion rows of data, and it has foreign key to a table with a billion rows of data, the RDBMS has to inspect both tables prior to committing the write.

It makes it sound like that if I were to, e.g. insert a new row in a table A with a billion rows in it, which in turn has a foreign key to another table B with a billion rows in it, that the RDBMS has to go and inspect every single row of both tables before or while a COMMIT occurs. In the event that readers came to such a belief, I wanted to clarify that adding a row in such a big table is actually still a very lightweight operation for the RDMBS. The target table (table A) would not have to be inspected at all, all the RDBMS may have to check is probably a unique key index for the primary key, in case that a primary key has been specified. Given that such indexes in RDBMSs are commonly B-Tree data structures, i.e self-balancing, sorted data structures, such a lookup would, in fact, be very fast and only require a couple of I/Os. As to table B, the very same principle applies. In order to create a foreign key on table A, a primary key on table B has to be present to which the foreign key points to, equally with its own B-Tree unique key index. All that the RDBMS has to do in order to check whether the parent record exists is to check the primary key index of table B but never table B itself.

I have taken the liberty to verify my claim by creating two such 1 billion rows tables with a simple foreign key between them in an Oracle Database running inside a Docker container with 4 CPUs and 10GB of memory:

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> show parameter cpu
NAME                            TYPE    VALUE
------------------------------- ------- -----
cpu_count                       integer 4
cpu_min_count                   string  4
parallel_threads_per_cpu        integer 1
resource_manager_cpu_allocation integer 4

SQL> show parameter sga_target
NAME       TYPE        VALUE
---------- ----------- -----
sga_target big integer 10G

I called my target table A actually readings and my table B is called meters. Additionally, I have also added a couple of NOT NULL constraints which will have to be validated as well, just for fun really.

CREATE TABLE meters
(
   meter_id       NUMBER(10)    PRIMARY KEY,
   location       VARCHAR2(2)   NOT NULL, 
   install_date   DATE          NOT NULL
);

CREATE TABLE readings
(
   meter_id      NUMBER(10) NOT NULL,
   reading       NUMBER,
   reading_tms   DATE       NOT NULL, 
   FOREIGN KEY (meter_id)
      REFERENCES meters (meter_id)
);

As said, both of the tables contain 1 billion rows each:

SQL> select count(1) from readings;

  COUNT(1)
----------
1000000000

SQL> select count(1) from meters;

  COUNT(1)
----------
1000000000

And when inserting rows into readings the database performs that task in an instant:

SQL> insert into readings (meter_id, reading, reading_tms) values (123456789, 22.4, SYSDATE);

1 row inserted.

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |          |     1 |    17 |     1   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | READINGS |       |       |            |          |
-------------------------------------------------------------------------------------

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               3  CPU used when call started
               3  DB time
               3  Requests to/from client
               2  enqueue releases
               5  enqueue requests
              30  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
              14  physical read total IO requests
               1  pinned cursors current
               1  recursive calls
              15  session logical reads
               4  user calls
Elapsed: 00:00:00.192

SQL> insert into readings (meter_id, reading, reading_tms) values (789, 22.4, SYSDATE);
1 row inserted.

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |          |     1 |    17 |     1   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | READINGS |       |       |            |          |
-------------------------------------------------------------------------------------

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               1  CPU used when call started
               4  Requests to/from client
               1  enqueue releases
               1  enqueue requests
               7  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               3  physical read total IO requests
               1  pinned cursors current
              36  process last non-idle time
               1  recursive calls
              13  session logical reads
               4  user calls
Elapsed: 00:00:00.032
SQL> commit;

Commit complete.

Elapsed: 00:00:00.013

There are a couple of interesting insights here:

Neither INSERT statement took longer than a couple of milliseconds to execute
Just a few physical read total IO requests have occurred, 14 and 3 respectively, confirming that not both the entire tables had to be scanned
The second time the physical IO requests were significantly fewer than the first time and so was the execution time - I will come back to that in a bit
The meters table is nowhere to be seen in the execution plan
The database correctly identifies a 1-row insert operation of 17 bytes in total
The COMMIT also didn't take much longer than a couple of milliseconds, it was actually the fastest operation of all three of them

So while it may seem intuitive at first that the database has to scan all 2 billion rows in order to insert this new record, in fact, the database has to do very little to accomplish the operation.

I would also like to clarify that if readers came to the belief that the referential integrity is verified during/prior to the COMMIT that this is not the case, with Oracle Database anyway. It is, in fact, done during the INSERT operation itself and can quickly be validated by inserting a record with a non-existing parent record:

SQL> insert into readings (meter_id, reading, reading_tms) values (1000000001, 18.92, SYSDATE);
Error starting at line : 1 in command -
insert into readings (meter_id, reading, reading_tms) values (1000000001, 18.92, SYSDATE)
Error report -
ORA-02291: integrity constraint (TEST.SYS_C008780) violated - parent key not found

Elapsed: 00:00:00.030

Also in this case it can be observed that the foreign-key validation took only a couple of milliseconds.

Coming back to insight number 3 above and

People have tried to solve this by making RDBMS systems in-memory, ...

RDBMSs are actually "in-memory" for a long, long time. All common RDBMSs allocate memory (I have shown before that the memory of my Oracle Database, i.e. SGA_TARGET is set to 10GB) to cache data and many other things in memory. This explains why there was a drop in physical I/Os when executing a second INSERT statement. At that point in time, the database had already some of the B-Tree index blocks cached and could just look them up in memory rather than from disk. Caching also explains why the overall execution time dropped for the second INSERT operation.

Last I would like to add to this section:

As a one-off this is not a big deal, but if I am doing this with a 1,000 threads writing a 1,000 rows per second it begins to become problematic.

So far I have only performed 2 insert operations and indeed they have been quite fast but those two statements were just one-offs and done from a command-line prompt. However, given that the database actually doesn't have to scan 2 billion rows every time I insert a row, I took the liberty to run one last test. Now, I don't have a big server at my disposal to run 1000 threads on, only a compute instance in the cloud with 8 CPU cores where 4 of them are already given to the database. Instead, I just wrote an endless loop in a little Java program that just keeps inserting data into the readings table, just to see how the database performs with a continuous load. The Java program keeps count of how many INSERT operations per second it has performed:

Gerald Term1: java -jar load-1.0.jar
0 inserts/sec
1701 inserts/sec
3019 inserts/sec
3232 inserts/sec
3281 inserts/sec
3308 inserts/sec
3450 inserts/sec
3479 inserts/sec
3470 inserts/sec
3463 inserts/sec
3317 inserts/sec
3248 inserts/sec
3380 inserts/sec
3340 inserts/sec
3090 inserts/sec
3163 inserts/sec
3159 inserts/sec
3171 inserts/sec
3266 inserts/sec
3477 inserts/sec
3534 inserts/sec
3393 inserts/sec
3315 inserts/sec
3406 inserts/sec
^CShutting down ...
Gerald Term1:

The results are actually quite impressive. The Java program has, for 21 consecutive seconds, inserted more than 3000 rows per second! That means that the database did more than 3 inserts per millisecond with just 4 CPUs and 10GB of RAM on this 1 billion+ rows table!

I can confirm that by querying the 1 billion+ table as well (note the two 1-row inserts from earlier on):

SQL> SELECT TO_CHAR(reading_tms, 'YYYY-MM-DD HH24:MI:SS'), count(1)
  2    FROM readings
  3      WHERE reading_tms >= TO_DATE('2020-07-14 00', 'YYYY-MM-DD HH24')
  4       GROUP BY TO_CHAR(reading_tms, 'YYYY-MM-DD HH24:MI:SS')
  5        ORDER BY 1;

TO_CHAR(READING_TMS   COUNT(1)
------------------- ----------
2020-07-14 00:00:56          1
2020-07-14 00:01:32          1
2020-07-14 00:41:25       1702
2020-07-14 00:41:26       3019
2020-07-14 00:41:27       3232
2020-07-14 00:41:28       3281
2020-07-14 00:41:29       3308
2020-07-14 00:41:30       3450
2020-07-14 00:41:31       3479
2020-07-14 00:41:32       3470
2020-07-14 00:41:33       3463
2020-07-14 00:41:34       3317
2020-07-14 00:41:35       3248
2020-07-14 00:41:36       3380
2020-07-14 00:41:37       3340
2020-07-14 00:41:38       3090
2020-07-14 00:41:39       3163
2020-07-14 00:41:40       3159
2020-07-14 00:41:41       3170
2020-07-14 00:41:42       3267
2020-07-14 00:41:43       3477
2020-07-14 00:41:44       3534
2020-07-14 00:41:45       3393
2020-07-14 00:41:46       3315
2020-07-14 00:41:47       3406
2020-07-14 00:41:48       2965

Of course, at this stage I could go on and start a second instance of that Java program and test the upper limits of that particular database, however, that's probably a topic for a post on its own. :)

I hope that I have given readers a better understanding of how an RDBMS actually performs referential integrity checks well, at least Oracle Database, and of how fast these are done even on bigger tables.

Thanks once again!