James Mikrut for Payload CMS

Posted on May 11, 2023 • Originally published at payloadcms.com

SQL vs. NoSQL - cutting through the Tech Twitter noise

#database #sql #nosql #mongodb

We've done a lot of thinking around database support lately, and even hosted a community discussion on Discord. Right now we're MongoDB-only, but our most upvoted roadmap item is adding support for additional databases. And transitioning from NoSQL thinking to an SQL mindset has really spotlighted the concrete differences.

If you've ever wondered exactly what NoSQL is good for, and alternatively, why SQL might be a better fit for certain applications, and you're tired of Tech Twitter telling you what to think, then read on for a real-world comparison.

In this post, I'm going to cover the big challenges that we'll face when adapting our features and functionalities over from NoSQL to SQL, and along the way, I'll highlight the strengths and weaknesses of each paradigm with some objective ramifications.

Data Definition Language

Since we're currently solely on MongoDB, we don't need to do things manually like adding a new table column when you add a new field to your Payload config. But with SQL, we need to think about that (and make it painless).

DDL, or Data Definition Language, is how SQL databases define and maintain their structure. Even when you use something like Prisma, which takes care of a lot of the headache for you, you need to become intimately familiar with how Prisma migrations work. There, Prisma translates its own schema syntax into SQL migrations for you.

Don't know what any of this means? Go ahead and try to simply follow the "Create a column in a table" example. If you know SQL and if this is not new to you, then you're probably okay with this. But there's work involved here.

When you decide you want to store a new field in a NoSQL database like MongoDB, you literally have to do nothing besides just start storing the data. No migrations, no generated DDL, no schema / database "sync".

That's obviously a big win for NoSQL, but that also means that the NoSQL structure is significantly more loosey-goosey and that can lead to errors. Lots of devs (not to mention project requirements) straight up require the structural constraints of SQL, and allowing any type of data to be stored in a NoSQL document can present issues in terms of data integrity. So there are definitely pros and cons to both approaches.

Migrations

Fascinatingly, another very hotly requested Payload feature is a first-party way of handling migrations. And that got us thinking. When you look at the Prisma migration resources above, clearly, they are using the word "migration" to describe the "migration" of your database shape from A to B. This is the expected and most widely used meaning of "migration".

But we're on MongoDB, and that doesn't apply. But developers are still asking for migration support. There is a specific distinction here that I'd like to make. Migrations can mean many things in many different contexts.

Changing database shape

Most often, they are referred to as simply modifying your database structure (DDL). This is a requirement for SQL in general. For example, adding a column to a table is a manual ALTER that needs to be run at some point.

Transforming existing data

But there's a whole different side to the story. What about if you need to change or transform your existing data? What if you had a full_name field, and you needed to split it up into two separate fields, first_name and last_name? Stuff like this is super common, and requires manual, programmatic management. Whenever anyone has asked Payload for migration support, this is what they meant.

Due to this second aspect of "migrations", that means that the concept is relevant for both NoSQL and SQL databases alike. Application frameworks like Laravel handle this very well, with a full, true "migrations" workflow that allows you to handle both DDL and data-based transformations within the same paradigm, all based on database transactions. I personally think this is by far the best way to solve this and this is an often-overlooked aspect of ORMs in general. Some ORMs come with great programmatic migration support, while others solely handle the DDL side (the pain in the ass side) in raw, generated SQL. Excuse my French, but I've been spoiled here by MongoDB, and I spoiled myself intentionally.

If you end up listening to our community planning call, you'll see that we cover our goals and future vision in-depth on that call. We want to model ourselves after Laravel's consistent (solve for both) approach by not outputting raw SQL, but rather, outputting TS files that give you granular, functional control.

Relational data and performance

One of the biggest wins for SQL in the minds of engineers is its heavily structured relational architecture. Without a doubt, relationships are a first-class citizen in SQL. Much of the querying language itself is dedicated specifically to combining and relating data from separate tables.
But relationships can certainly be done in NoSQL, so what's the deal? What are the real-world differences?

First up, there is an entire array of functionality within SQL that actually enforces relational data integrity. Within MongoDB, those types of constraints would all need to be written manually, and you'll need to ultimately "own" the integrity of your data. In NoSQL, you might have a relationship to another document. You could go delete that document, but unless you've handled this yourself, the original relationship would remain, yet be completely invalid. That's not ideal, and that places more work on you.

Let's take Payload, for example. Surprise, surprise. We have a relationship field, and it can store IDs to other related documents which are seamlessly merged in when you retrieve documents from the DB. We leverage the dataloader pattern to batch together all "populations" required for a given query, returning them all super fast and with as few separate queries to the DB as possible.
We actually even outperform SQL-based frameworks quite a bit. In a purely relational test, we were 3x faster than Directus and 7x faster than Strapi while both were running Postgres, and we were on MongoDB.

Without knowing what's under the hood of either of those platforms, my takeaway here is that relationships are possible in both NoSQL and SQL. If you were to write out queries deliberately for very specific joins between tables, and have the ability to define queries so that a single query is made vs. dynamic populations, then SQL will likely outshine MongoDB in terms of relational performance. But - your application logic will determine which is faster, and it's easy to get it wrong with either side.

One thing to note here is that if you are using a GraphQL API, or any type of dynamic / expressive querying language, it's impossible to make a single query that returns all populated "relations". This is because it's up to the shape of the query to determine what to "populate". This is known as the N+1 problem. You first need to load the main document, and only then can you determine what to load after. If this applies to you, then most relational performance gains from SQL are non-applicable.

Handling complex data structures

One absolutely massive win that NoSQL has over SQL is that you can store complex data directly in a single document. It can be stored in the same shape as you want it to be in when you read it back from the database. Some SQL databases like Postgres have pretty great JSON support via JSON columns, but if you over-use JSON columns in a SQL database, you're not taking advantage of the strengths of SQL in the first place and might as well go with NoSQL.

So what's it look like to store a complex data structure in SQL, without losing the benefits of SQL? For a simple example, take Payload's Group field:

{
  "id": "some-id-here",
  "myGroup": {
    "mySubField": "hello",
    "anotherSubField": "goodbye"
  }
}

The above field can be stored directly in NoSQL as-is, which is fantastic. But what about SQL-based databases? Do we transform the nested architecture to a flattened column structure, using a double underscore to separate field "paths"?

id	myGroup__mySubField	myGroup__anotherSubField
`some-id-here`	`hello`	`goodbye`

This could functionally perform the same, but we'd need to "transform" the data on its way out from the database somehow back to its intended shape. Doable, but not nearly as elegant.

Alternatively, we could "join in" another table that includes columns that represent properties within the group. That'll become a bit less performant, because we're now doing a join that we didn't need to do in NoSQL land.

Finally, do we just say "screw it" and embed the group and all of its properties, from that point on, into a single JSON column?

Field-based localization

One specific feature that we support, which is simple in NoSQL, but more difficult in SQL is field-based localization. You might have 30 locales, and any given text field needs to be translated into each of the 30 locales.

{
  "id": "some-id-here",
  "myLocalizedField": {
    "en": "hello",
    "es": "hola",
    "de": "hallo"
  }
}

Again, in MongoDB, this is a cake-walk. Storing the locales all on a parent document is great, because they are fundamentally a part of that parent document.

But in SQL we've gotta get crafty again. You might have 30 locales or more. Does that mean 30 columns for every single text field, all in one database?

id	myLocalizedField__en	myLocalizedField__es	myLocalizedField__de
`some-id-here`	`hello`	`hola`	`hallo`

If you have too many columns in SQL, you'll start to see performance implications. But this needs to be solved one way or another. Again, here, we could jump to a JSON column but then we're losing a lot.

Array-based structures

In NoSQL, storing arrays is dead simple. Just save it on the doc and move on.

{
  "id": "some-id-here",
  "myArray": [
    {
      "subField": "hello"
    },
    {
      "subField": "goodbye"
    }
  ]
}

The well-accepted and correct way to handle array-based data in SQL is to simply create a new table for each array-based field (like Payload's Arrays and Blocks fields). Then you'd just join in each row of array data when you retrieve the record back. It's more complicated, sure, and we're losing some performance here because we have to make joins. But it's logical and at least the answer is clearer than the last few examples.

Takeaways

Ultimately I think having a healthy dose of skepticism when reading anything that Tech Twitter or Reddit tells you will do you a great deal of good as an engineer. The truth is that there are certain cases where SQL will perform better for you, and NoSQL will be a better fit elsewhere. But it's up to your own unique requirements.

Overall, I think it's safe to say that SQL will win for your project if you:

Know the shape of your data well in advance
Are confident that your schema won't change often or are comfortable dealing with lots of DDL migrations
Have a relatively "flat" and not complex schema
Require heavily relational structures

On the flip-side, NoSQL will win out if you:

Frequently manage complex, nested data structures like Groups, Arrays, Blocks, localized field data, and similar
Use relationships solely for simple structures like Categories, Tags, Authors, etc.
Want to avoid dealing with DDL stuff entirely and make your life easier
Are okay with manually enforcing data integrity and handling edge cases yourself, or are using a framework that handles that stuff for you on top of NoSQL

Ah, life. Everything's a compromise. As much as I'd like to go hard and declare a victor, I totally understand why Payload SQL support has been asked for, although I personally think MongoDB is still a better fit for most of our users. In the end, you'll be able to pick whatever's right for you.

Planning session recording

We recorded the whole Discord planning session that took place last week. To hear more about how we're approaching future SQL support in Payload, including ORM evaluation and more, give it a listen.

Where can you help?

First up, we want to hear from you! If you haven't yet, make your voice and vote known on our Roadmap GitHub Discussion.

From there, once we have our future database adapter pattern established, anyone can build an adapter. We'd love to see some community-generated database and ORM support and that will only speed up / pressure test what we're building.

We're really appreciative of the engagement we saw on that call and are super pumped about doing more of those in the future. The next one will likely be regarding replacing Webpack, but we'll see. Follow us on Twitter to keep in the loop or join our Discord to keep an eye on the conversation there.

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read full post →

Forem

SQL vs. NoSQL - cutting through the Tech Twitter noise

Data Definition Language

Migrations

Changing database shape

Transforming existing data

Relational data and performance

Handling complex data structures

Field-based localization

Array-based structures

Takeaways

Planning session recording

Where can you help?

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

Top comments (0)

The Next Generation Developer Platform

Read next

Mastering Data Cleaning: Your Guide to a Cleaner, Reliable Dataset 🚀

Idempotency in DynamoDB

SQL Interview Questions: A Comprehensive Guide for Developers

New AI Training Method Cuts Data Needs in Half While Boosting Performance by 20%

Okay