Stephen Goldberg for HarperDB

Posted on Jul 10, 2020

Deep Dive: NewSQL Databases

#database #sql #discuss #architecture

Overview

One of my colleagues, @margo_hdb, recently posted a great article Database Architectures & Use Cases - Explained here on dev.to. In response a user asked for a deeper dive on NewSQL databases, so I thought I would put one together.

The term NewSQL was coined in 2011 by 451 Group analyst Matthew Aslett.

Wikipedia defines the term below:

“ NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.
Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. The only options previously available for these organizations were to either purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMS. Both approaches feature high infrastructure costs and/or development costs. NewSQL systems attempt to reconcile the conflicts.”

The “NewSQL” Confusion

The problem in my mind is that the term is so new, and not widely adopted, that what is defined as a NewSQL database can be really confusing. DB-Engines, doesn’t even list NewSQL databases as a category. I understand why, because I think until the term is fully cemented, listing it would create more confusion than clarity.

Here is one list I found of “NewSQL” databases

Amazon Aurora
Apache Trafodion
HarperDB
Altibase
c-treeACE
CLustrix DB
CockroachDB
MemSQL
NuoDB
VoltDB
Google Spanner

I obviously, and selfishly like that it includes HarperDB, and I think a lot of the players mentioned deserve to be on that list, but Amazon Aurora is by no means a NewSQL database. It is basically just an RDBMS like Postgres and MySQL.

The Storage Engine

Why do I think that Aurora should not be on the list? Because to me, what makes a “NewSQL” database a “NewSQL” database is the underlying storage engine. I don’t think you would find Postgres or MongoDB, two of the most popular databases in the world, on a list of NewSQL databases. That’s because while they are both capable of supporting SQL (columns and rows) and NoSQL (documents/json), their underlying storage engines are not designed to fulfill the definition of “NewSQL” above.

Wikipedia defines a “Storage Engine” as follows:

“A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application programming interface (API) that allows the user to interact with their underlying engine without going through the user interface of the DBMS.”

If you remove the element of time from the innovation lifecycle of a product and compress it from beginning to end into one moment, most products at a high-level are building very similar things. They may build them in a different order and in different ways, but they are trying to accomodate the same market feedback. For example if you look at the first automobiles on the road they were wildly different, but over time began to look pretty similar with different high-level features that distinguished them.

As a result almost all databases will eventually accommodate SQL, NoSQL, GraphQL, etc… These are really just interfaces for how you interact with the storage engine of a product. Kind of like all cars eventually having seat-belts, four wheels, gas pedals.

Most people look at the following categories when evaluating a database:

What interfaces does it have? (REST, JDBC, ODBC, CLI, Web-Sockets, etc…)
Is it ACID compliant?
Does it support SQL and/or NoSQL?
What is the cost?
How is the performance?
What are the security features?
Can it scale?
Is it easy to use?
How popular is it?
How long will it take to migrate?
How long will it take for my dev team to get up to speed?
What does the ecosystem look like?

These are all really valid points, but what’s not on that list is the underlying storage engine. Obviously some very sophisticated data experts will look at that, but the vast majority of people do not. What most folks do not realize is that the underlying storage engine impacts nearly all of those questions.

Additionally, once a database has picked an underlying storage engine it is a nearly Herculean effort to migrate to another. MongoDB for example, realized the need to provide SQL, and instead of changing their underlying storage engine which could never accommodate SQL in any performant manner, appended to their storage engine which created a host of issues which I have outlined in this article: Multimodel Databases- A Mistake.

For an example with HarperDB, it took us almost a year to migrate from one storage engine (file system) to another (LMDB) and we were only able to do that because we are not nearly as widely used as MongoDB. In contrast, for us to add in Geospatial search capability into HarperDB took only a few weeks.

So to use my analogy above, automobile manufactures can start from scratch when building a new car. When you buy a new car, the legacy technology in your previous car has no impact on the new car. They may want to keep a consistent user experience across the two cars, but each new car can be redesigned from the ground up entirely without the old car causing issues.

The same is not true of databases. If you upgrade from AcmeDB 1.0 with a document store to AcmeDB 2.0 with a columnar store your data is going to look VASTLY different. You could have data corruption, loss of features, new cost, down-time, migration time, security issues, etc. Whereas if you buy car 1 with a gas engine and switch to car 2 with an electric engine, it has no impact on car 2.

That is why IMHO, in order to categorize something as a NewSQL database, the underlying storage engine has to have been specifically designed for or able to accommodate the definition of NewSQL in terms of scale, flexibility, and analytical capability.

So what makes it a “NewSQL” Database?

Databases are typically designed for specific use cases. Timescale for example was purpose built to solve time-series analysis needs that are challenging in other DBMS systems, i.e. analyzing data over time. It’s really good at it. That said, no one would use Timescale to power a simple web page when a document store like MongoDB could do it much easier, or even Postgres which Timescale sits on top of would be a better choice. If however you try to do time-series analysis in MongoDB, good luck to you, and please make sure you have a budget the size of a small country’s GDP.

So when we look at what defines a “NewSQL” database we see a few key things:

“…a class of relational database management systems that seek to provide the scalability of NoSQL systems… “
“….while maintaining the ACID guarantees…”
“Many enterprise systems … are too large for conventional relational databases, but … are not practical for NoSQL systems.”
“The only options previously available for these organizations were to either purchase more powerful computers”
“…or to develop custom middleware…”
“Both approaches feature high infrastructure costs and/or development costs. NewSQL systems attempt to reconcile the conflicts.”

Thus a NewSQL system must accommodate the following:

Scale with the level and flexibility of a NoSQL system
Be ACID compliant
Be cost effective and efficient
Not solve the challenge through massive computing needs
Not require middleware either internalized or externalized

So let’s dig deeper. The reason I droned on so much about the storage engine is because they are all impacted by them significantly.

Document Stores, what are more commonly called NoSQL databases, are fantastic at write scale. They are also highly flexible because they are not enforcing schema, and as a result simply store a new version of your record for every transaction. The scale comes from the fact that unlike a relational database, they don’t need to update other tables on write, check data consistency, guarantee a transaction, etc… Basically each write is independent, has nothing to do with other rows of data, and just creates a new document each time. Typically one attribute, the hash, is indexed by default, and if you want to add additional indexes then they actually create copies of these records. You can then search by those other attributes; however, you are searching a copy of that record, not the original. This means you have data duplication, eventual consistency, which drives up your compute costs, slows down writes, and creates a lot of overhead.

This means though that they can’t do performant joins, they are not ACID compliant, and it’s challenging to do highly scalable analytics. Thus, they cannot be NewSQL databases. Again some document stores try to achieve this, but read my other blog to understand why that’s a bad idea.

RDBMS systems on the other hand, struggle with high-scale data ingestion, but are fantastic at read-scale. They struggle with high-scale data ingestion because they are capable of maintaining relationships of data, they have a lot of background processes to guarantee transactions and consistency, and each write might affect numerous rows of data causing row-locking etc… This causes a ton of overhead on write, whereas Document Stores have the overhead on read. It can also lead to crashes on high-scale writes. For example, if I am trying to write to a table that has a billion rows of data, and it has foreign key to a table with a billion rows of data, the RDBMS has to inspect both tables prior to committing the write. As a one-off this is not a big deal, but if I am doing this with a 1,000 threads writing a 1,000 rows per second it begins to become problematic. RDBMS tight enforce of schema, which is much needed at times for certain use cases, is what also makes them inflexible.

People have tried to solve this by making RDBMS systems in-memory, but this just pushes the problem down the road by requiring really expensive computing needs, and when you run out of memory leading to massive crashes, data inconsistency, downtime etc… Our team previously managed a multi-terabyte in-memory database for streaming analytics. When we had enough memory for all our data it was AWESOME; however, the moment we began to run close to our memory limits it was a nightmare. It was like fighting off a zombie horde. We were always racing to prevent a crash, because the moment we crashed, it was nearly impossible to catch back up to the stream. Restarting the DB and loading everything into memory took forever, and we were processing hundreds of thousands of rows of data a second. So minutes were a long time. Additionally, making an RDBMS an in-memory RDBMS doesn’t make it any more flexible or change how the storage engine works.

In my mind then, to be a “NewSQL” database, the storage engine needs to be optimized to minimize the overhead of read AND write as much as possible. It needs to do this in a flexible manner that allows for dynamic data, without driving up costs.

Without going into too much of a sales pitch, that is exactly how we designed HarperDB. We optimized it for read, by making every attribute fully indexed dynamically upon write. We did this without duplicating data. This allowed us to achieve high read performance without the pitfalls of multimodal or document stores. We optimized it for write by ensuring each data element was atomic. This allows for us to ensure high scale writes without bottlenecks because updating one attribute is a totally different transaction than another. If you want to learn more you can read this somewhat dated blog about our storage engine. It is dated, because instead of the file system we now use LMDB. Pingcap has a similar, more complex, model.

Our solution and Pincap’s are not the ONLY way to solve the “NewSQL” challenge, but in my mind in order to be “NewSQL” the approach needs to be new to accommodate these needs. That said, this is just my view of the NewSQL database landscape and I would love to hear your thoughts! Am I wrong? What am I missing?

Top comments (3)

Gerald Venzl 🚀 • Jul 14 '20

Hi Stephen,

Thanks for your great article on NewSQL databases!

I could not help but notice some wording in your article that could be misunderstood and hence would like to add some clarifications.

In the article there is a section that says:

For example, if I am trying to write to a table that has a billion rows of data, and it has foreign key to a table with a billion rows of data, the RDBMS has to inspect both tables prior to committing the write.

It makes it sound like that if I were to, e.g. insert a new row in a table A with a billion rows in it, which in turn has a foreign key to another table B with a billion rows in it, that the RDBMS has to go and inspect every single row of both tables before or while a COMMIT occurs. In the event that readers came to such a belief, I wanted to clarify that adding a row in such a big table is actually still a very lightweight operation for the RDMBS. The target table (table A) would not have to be inspected at all, all the RDBMS may have to check is probably a unique key index for the primary key, in case that a primary key has been specified. Given that such indexes in RDBMSs are commonly B-Tree data structures, i.e self-balancing, sorted data structures, such a lookup would, in fact, be very fast and only require a couple of I/Os. As to table B, the very same principle applies. In order to create a foreign key on table A, a primary key on table B has to be present to which the foreign key points to, equally with its own B-Tree unique key index. All that the RDBMS has to do in order to check whether the parent record exists is to check the primary key index of table B but never table B itself.

I have taken the liberty to verify my claim by creating two such 1 billion rows tables with a simple foreign key between them in an Oracle Database running inside a Docker container with 4 CPUs and 10GB of memory:

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> show parameter cpu
NAME                            TYPE    VALUE
------------------------------- ------- -----
cpu_count                       integer 4
cpu_min_count                   string  4
parallel_threads_per_cpu        integer 1
resource_manager_cpu_allocation integer 4

SQL> show parameter sga_target
NAME       TYPE        VALUE
---------- ----------- -----
sga_target big integer 10G

I called my target table A actually readings and my table B is called meters. Additionally, I have also added a couple of NOT NULL constraints which will have to be validated as well, just for fun really.

CREATE TABLE meters
(
   meter_id       NUMBER(10)    PRIMARY KEY,
   location       VARCHAR2(2)   NOT NULL, 
   install_date   DATE          NOT NULL
);

CREATE TABLE readings
(
   meter_id      NUMBER(10) NOT NULL,
   reading       NUMBER,
   reading_tms   DATE       NOT NULL, 
   FOREIGN KEY (meter_id)
      REFERENCES meters (meter_id)
);

As said, both of the tables contain 1 billion rows each:

SQL> select count(1) from readings;

  COUNT(1)
----------
1000000000

SQL> select count(1) from meters;

  COUNT(1)
----------
1000000000

And when inserting rows into readings the database performs that task in an instant:

SQL> insert into readings (meter_id, reading, reading_tms) values (123456789, 22.4, SYSDATE);

1 row inserted.

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |          |     1 |    17 |     1   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | READINGS |       |       |            |          |
-------------------------------------------------------------------------------------

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               3  CPU used when call started
               3  DB time
               3  Requests to/from client
               2  enqueue releases
               5  enqueue requests
              30  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
              14  physical read total IO requests
               1  pinned cursors current
               1  recursive calls
              15  session logical reads
               4  user calls
Elapsed: 00:00:00.192

SQL> insert into readings (meter_id, reading, reading_tms) values (789, 22.4, SYSDATE);
1 row inserted.

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |          |     1 |    17 |     1   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | READINGS |       |       |            |          |
-------------------------------------------------------------------------------------

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               1  CPU used when call started
               4  Requests to/from client
               1  enqueue releases
               1  enqueue requests
               7  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               3  physical read total IO requests
               1  pinned cursors current
              36  process last non-idle time
               1  recursive calls
              13  session logical reads
               4  user calls
Elapsed: 00:00:00.032
SQL> commit;

Commit complete.

Elapsed: 00:00:00.013

There are a couple of interesting insights here:

Neither INSERT statement took longer than a couple of milliseconds to execute
Just a few physical read total IO requests have occurred, 14 and 3 respectively, confirming that not both the entire tables had to be scanned
The second time the physical IO requests were significantly fewer than the first time and so was the execution time - I will come back to that in a bit
The meters table is nowhere to be seen in the execution plan
The database correctly identifies a 1-row insert operation of 17 bytes in total
The COMMIT also didn't take much longer than a couple of milliseconds, it was actually the fastest operation of all three of them

So while it may seem intuitive at first that the database has to scan all 2 billion rows in order to insert this new record, in fact, the database has to do very little to accomplish the operation.

I would also like to clarify that if readers came to the belief that the referential integrity is verified during/prior to the COMMIT that this is not the case, with Oracle Database anyway. It is, in fact, done during the INSERT operation itself and can quickly be validated by inserting a record with a non-existing parent record:

SQL> insert into readings (meter_id, reading, reading_tms) values (1000000001, 18.92, SYSDATE);
Error starting at line : 1 in command -
insert into readings (meter_id, reading, reading_tms) values (1000000001, 18.92, SYSDATE)
Error report -
ORA-02291: integrity constraint (TEST.SYS_C008780) violated - parent key not found

Elapsed: 00:00:00.030

Also in this case it can be observed that the foreign-key validation took only a couple of milliseconds.

Coming back to insight number 3 above and

People have tried to solve this by making RDBMS systems in-memory, ...

RDBMSs are actually "in-memory" for a long, long time. All common RDBMSs allocate memory (I have shown before that the memory of my Oracle Database, i.e. SGA_TARGET is set to 10GB) to cache data and many other things in memory. This explains why there was a drop in physical I/Os when executing a second INSERT statement. At that point in time, the database had already some of the B-Tree index blocks cached and could just look them up in memory rather than from disk. Caching also explains why the overall execution time dropped for the second INSERT operation.

Last I would like to add to this section:

As a one-off this is not a big deal, but if I am doing this with a 1,000 threads writing a 1,000 rows per second it begins to become problematic.

So far I have only performed 2 insert operations and indeed they have been quite fast but those two statements were just one-offs and done from a command-line prompt. However, given that the database actually doesn't have to scan 2 billion rows every time I insert a row, I took the liberty to run one last test. Now, I don't have a big server at my disposal to run 1000 threads on, only a compute instance in the cloud with 8 CPU cores where 4 of them are already given to the database. Instead, I just wrote an endless loop in a little Java program that just keeps inserting data into the readings table, just to see how the database performs with a continuous load. The Java program keeps count of how many INSERT operations per second it has performed:

Gerald Term1: java -jar load-1.0.jar
0 inserts/sec
1701 inserts/sec
3019 inserts/sec
3232 inserts/sec
3281 inserts/sec
3308 inserts/sec
3450 inserts/sec
3479 inserts/sec
3470 inserts/sec
3463 inserts/sec
3317 inserts/sec
3248 inserts/sec
3380 inserts/sec
3340 inserts/sec
3090 inserts/sec
3163 inserts/sec
3159 inserts/sec
3171 inserts/sec
3266 inserts/sec
3477 inserts/sec
3534 inserts/sec
3393 inserts/sec
3315 inserts/sec
3406 inserts/sec
^CShutting down ...
Gerald Term1:

The results are actually quite impressive. The Java program has, for 21 consecutive seconds, inserted more than 3000 rows per second! That means that the database did more than 3 inserts per millisecond with just 4 CPUs and 10GB of RAM on this 1 billion+ rows table!

I can confirm that by querying the 1 billion+ table as well (note the two 1-row inserts from earlier on):

SQL> SELECT TO_CHAR(reading_tms, 'YYYY-MM-DD HH24:MI:SS'), count(1)
  2    FROM readings
  3      WHERE reading_tms >= TO_DATE('2020-07-14 00', 'YYYY-MM-DD HH24')
  4       GROUP BY TO_CHAR(reading_tms, 'YYYY-MM-DD HH24:MI:SS')
  5        ORDER BY 1;

TO_CHAR(READING_TMS   COUNT(1)
------------------- ----------
2020-07-14 00:00:56          1
2020-07-14 00:01:32          1
2020-07-14 00:41:25       1702
2020-07-14 00:41:26       3019
2020-07-14 00:41:27       3232
2020-07-14 00:41:28       3281
2020-07-14 00:41:29       3308
2020-07-14 00:41:30       3450
2020-07-14 00:41:31       3479
2020-07-14 00:41:32       3470
2020-07-14 00:41:33       3463
2020-07-14 00:41:34       3317
2020-07-14 00:41:35       3248
2020-07-14 00:41:36       3380
2020-07-14 00:41:37       3340
2020-07-14 00:41:38       3090
2020-07-14 00:41:39       3163
2020-07-14 00:41:40       3159
2020-07-14 00:41:41       3170
2020-07-14 00:41:42       3267
2020-07-14 00:41:43       3477
2020-07-14 00:41:44       3534
2020-07-14 00:41:45       3393
2020-07-14 00:41:46       3315
2020-07-14 00:41:47       3406
2020-07-14 00:41:48       2965

Of course, at this stage I could go on and start a second instance of that Java program and test the upper limits of that particular database, however, that's probably a topic for a post on its own. :)

I hope that I have given readers a better understanding of how an RDBMS actually performs referential integrity checks well, at least Oracle Database, and of how fast these are done even on bigger tables.

Thanks once again!

Yaron Levi • Jul 11 '20

Thanks for the article! We should also not forget Yugabyte and Fauna.

Stephen Goldberg • Jul 12 '20

Hi Yaron,

You are most welcome! I in no way was trying to make a definitive list of NewSQL dbs, but rather give better defention to the term. Agree with you that Fauna falls into the category, not familiar with Yugabyte so not comfortable giving an opinion there.

DEV Community

Deep Dive: NewSQL Databases

Overview

The “NewSQL” Confusion

The Storage Engine

So what makes it a “NewSQL” Database?

Top comments (3)

Read next

🚀 Introducing Amazon Aurora DSQL: The Next Evolution in Databases 📊

How to Monitor SQL Performance in Spring Boot

Clean Code Essentials: YAGNI, KISS, DRY

Mastering Essential Software Architecture Patterns: A Comprehensive Guide🛠️