This article was written by Christina Kopecky and was originally published at Educative, Inc.
One of the things that we may struggle with as developers when working on a green field project is our stack. Choosing the right tech to solve a problem can be a harrowing experience. Databases in particular can be a bit tough if we’re unsure how our data is going to be used.
This article aims to assist you in choosing the right type of database for your project by introducing or refreshing your memory on NoSQL and RBDMS databases, exploring the differences between the two, and discussing when you should use certain kinds of databases, namely MongoDB and PostgreSQL. To be most successful with this article, you should already have a basic understanding of databases and frameworks.
To get caught up before reading, check out our beginner's tutorial Database Design Tutorial.
Today we will go over:
- Why do we need databases?
- Important terms to know
- Overview of MongoDB and PostgreSQL
- Choosing between MongoDB and PostgreSQL
- Wrapping up and resources
Why do we need databases?
This question may be a bit obvious, but understanding why we need databases helps when it comes to choosing a database structure for your stack. Databases are a basic foundation of software development, and they serve many purposes for building projects of all sizes and types. Let's take a deeper dive into the uses of a database.
How would you feel if you were to visit a website, add items to a shopping cart, and navigate away from the site only to come back to an empty cart? A frustrating experience, I’m sure. This is where databases come in. Having a database allows for session persistence so that a user can log in and stay logged in for an extended period of time. When thinking of your database choice, what type of data do you need to be persisted? The type of data you are using help you choose the database that will most suit your data and client needs.
In one of my first jobs, I recall using a system that used 5x7 ledger cards to keep track of customers, payments, and the rental inventory we carried in the warehouse. It was cumbersome, not without human error, and led to redundancies in records. A database with an inventory management system could have helped this company come into the 21st century and reduce the need for a bookkeeper and inventory reconciliation.
A database assists in normalizing your data. There are several different flavors of normalization, but the high level explanation is that it reduces redundancy and anomalies in your data. The retail store example from above could have certainly used a computerized database to increase productivity and reduce the amount of manual tabulating. With the organization of a database, you can learn a lot more about your data, as it makes that information readily available to assist decision making. How you organize that data will help you choose the database that will most suit your data and client needs.
Businesses thrive on data. Having a database to collect customer information, such as likes, dislikes, order history, or articles read, allows a business or organization to target their consumers more readily. This will lead to higher sales, more traffic, and better targeted ads. Data collection and analysis is key for any business to survive in this big data era. How you want to access and use data will help you choose the database that will most suit your data and client needs.
Important terms to know before making a database decision
Now that we are familiar with the main reasons we should use a database, let's look at some important terms we need to know before making a database decision. The following list is certainly not an exhaustive list, but knowing these basic terms will assist you in choosing a database that’s right for your project.
ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. It’s a property of relational databases that guarantees database validation and is quite reliable. Definitions of the four properties are as follows:
Atomicity. The basic idea behind atomicity is that it supports a transaction paradigm. This paradigm is used primarily by relational-based databases. When we say transaction paradigm, what does it actually mean? Either a transaction fails completely or succeeds completely, such as a transfer on funds from one account to another.
Consistency. When there is no effect on either bank account because of a failed transaction, it means that the database is in a consistent state. Consistency tells us that a transaction has brought the database from one valid state (pre-transaction) to another valid state (post-transaction). Valid in this sense means that the data is set according to defined rules or constraints.
Isolation. Imagine we have two separate people who are using an ATM to each retrieve cash from the same account at the same time. The idea behind isolation means they wouldn’t be able to fool the bank into thinking they are only pulling $100 from the account when actually they are pulling $200 ($100 from each ATM). The bank’s database treats them as sequential transactions.
Durability. If there are power failures, catastrophic outages, or crashes, durability guarantees that completed transactions are already recorded.
BASE is an acronym that means Basically Available, Soft State, Eventually Consistent. It is unique to non-relational databases. These types of databases don’t have the ACID guarantee, as they are eventually consistent. This means that there could be times where the database is not reliable, but over time it will reach consistency. So your data is basically in a fluid state until it becomes consistent.
This is a term used in relational databases to connect two tables. It reduces the amount of redundancy in your data. In SQL, a
JOIN clause is used to combine rows from two or more tables, based on a common column, and there are three types of
JOIN clauses for different needs.
A term coined for database systems (i.e. VoltDB and MemSQL) that combines the best aspects of relational databases (ACID) with the efficiency and horizontal scalability of NoSQL databases.
Normalization is the process of structuring a relational database. It allows us to use tables and columns to reduce redundancy in data, minimize anomalies in data modification, and simplify queries. Normalization is used in relational databases.
A type of database system that does not necessarily use traditional structured query language (SQL) to query database systems. Examples of NoSQL databases include MongoDB and DynamoDB. NoSQL databases are non-tabular, and they vary based on their different data models, such are document, wide-column, key-value, or graph. Their structure provides flexible schemas, and they can be scaled easily.
As we discussed in our first section, data is persistent when it outlives the process that created it. Persistence refers to a process or object that continues to exist after the parent ceases or after the system is switched off.
RDBMS is an acronym that stands for Relational Database Management System. It’s usually a SQL-based database such as PostgreSQL or MySQL and meets the ACID requirement. RDBMS makes it easy to access and locate values in a database. We call it "relational" because the values in a table and tables themselves are related, making it possible to run queries across many tables at the same time.
Structured Query Language is designed for performing CRUD (Create, Read, Update, Delete) operations on a database. We use SQL to communicate with a database, and we can use SQL statements to perform tasks like updating or retrieving data from a database.
Keep the learning going.
Learn the fundamentals of relational databases and SQL without scrubbing through videos or documentation. Educative's text-based courses are easy to skim and feature live coding environments.
Overview of MongoDB and PostgreSQL
When choosing your database structure, it’s important to factor in speed, reliability and accuracy. On the one hand, we have a relational paradigm that can guarantee data to be in a valid state. On the other hand, we have a non-relational paradigm that guarantees eventual consistency. So, how do we choose? Let’s take a look at two of the most popular databases that come from each paradigm: MongoDB and PostgreSQL.
MongoDB: a NoSQL Database
NoSQL basically means one thing: it doesn’t only use SQL to query its database (The “No” in NoSQL stands for “not only”). MongoDB is a NoSQL, document-oriented database that uses documents to manage information. Let's look at the key features on MongoDB to get a better sense of its uses.
Using JSON allows you to change your schema on a whim without repercussion. Documents can vary in terms of key/value pairs. Unlike relational databases, where altering your table is necessary to make any changes, MongoDB is a bit more flexible because it uses the JSON/BSON format. Individual entries are their own instance of the schema that was written. As time goes on, the schema can be changed with no consequence to the database. The frontend developer would just need to perform some error handling if null values are present in the
ACID properties in MongoDB?
NoSQL databases don’t usually conform to the ACID properties but instead adopt eventual consistency. This makes NoSQL databases less ideal for financial institutions where the validity of its transactions is most important.
That being said, MongoDB has adopted more to the ACID format in 2018, though it still has yet to measure up to the rigidity of those properties. A study published in May 2020 identified a bug that affects the claim that Mongo performs ACID transactions at an acceptable level. If you are leaning toward making ACID a priority, research should be done to investigate if transactions are performed in an acceptable manner for your product.
PostgreSQL: an SQL-based Relational Database
PostgreSQL, also known as Postgres, is an open-source relational database management system that emphasizes extensibility and SQL compliance. Let's look at the key features on Postgres to get a better sense of its uses.
Postgres employs SQL ultimately under the hood, a structured query language, to define, to access and to manipulate the database. Postgres does use its own flavor of SQL called PL/pgSQL (procedural language/postgreSQL). The big difference between the two is that the latter can perform more complex queries than SQL. Other relational database models have their own flavor of SQL, which leads to minor differences across the board between the different databases.
Postgres transactions follow the ACID principle. This means that the validity of its data is quite reliable.
Because this database has a relational structure, the primary component that defines the difference between this and MongoDB covered earlier is that the whole schema needs to be designed and configured at creation. Altering a table after onset can be done, but can lead to not easily identifiable bugs down the road.
PostgreSQL databases can use foreign keys. Foreign keys allow us to keep our data normalized by referencing an object from one table in another so the second table has access to the first table’s keys and values.
Choosing between MongoDB and PostgreSQL
So, now that we know what each database has to offer, we need to determine when to choose each depending on the data, organization, and requirements in question. Let's break it down to inform our database decision. Keep in mind that there is no perfect database. The key is to identify your needs and best match the abilities and benefits with those guidelines.
When to use MongoDB
Schema evolves as your application evolves: MongoDB is great to use when you need to have a database whose schema can evolve as your application evolves. Sometimes, we just don’t know what the entire scope of our application will be like and having the flexibility to adjust the schema to fit your needs is desirable.
Horizontal scaling means that MongoDB is fast: Because MongoDB doesn’t depend on more processing power as it scales, but instead combines power with additional machines, it can be a bit faster. If you plan to have tens or hundreds of thousands of documents of data, it might be a good idea to use the horizontal scaling approach by using MongoDB.
ACID not a priority: MongoDB is great if ACID is not your first priority, even with the advent of Mongo’s ACID transaction paradigm. It’s really up to what the higher priority is for you and your business.
When to use PostgreSQL
Data depends on reliability of ACID: Because transactions in this model follow ACID properties, it’s a good choice for any sort of fintech business. When you absolutely need to control the state of your data, use a relational database like Postgres.
Schemas have an identified relationship: If the structure of your data can be identified with a 1:1, 1:many, or many:1 relationship, having tables that identify those schemas and having foreign keys will be useful.
Downtime to upgrade isn’t a drawback: Because PostgreSQL scales vertically (as it gets bigger, more space or more memory is needed), it requires downtime to upgrade. Power outages or failures could also take your database down for a while. But often times, the choice to have a relational structure and ACID principles outweighs the possibility of outages.
Wrapping up and resources
MongoDB and PostgreSQL are both reputable, robust databases that have their advantages and disadvantages: there isn’t one that’s particularly better than the other. What is most important is how your data is going to be used, what structure will it have, and how will your application scale. Once the answers to these questions become evident, the choice will become clearer.
The best way to choose the best database is learn more about the options available to you. Educative's hands-on course
Database Design Fundamentals for Software Engineers
walks you through the need-to-know concepts and essential techniques for optimizing your database. You learn about:
- Relational databases
- Data modeling
- and more
Get started today and take charge of your career!
- Database Design Tutorial (article)
- MySQL Tutorial: the beginner's guide to using MySQL (article)
- Angular vs Vue vs React: choosing the best framework in 2020 (article)
- The Definitive Guide to MongoDB (course)
Top comments (5)
Great overview! Christina does a great job explaining complex topics in clear and simple terms, something I think is often lost in the database world. I'm looking forward to reading some more of your articles! 🤓
That makes it sound like you cannot modify the schema in a RDBMS, which obviously you can. The main difference is that in a relational database you only have 1 schema for all your data. Adding a column means all data also gets this piece of information.
With a typical NoSQL database you need to make sure your application supports all versions of the "schema" you have stored in it. If you don't, accessing old data will not be possible without issues.
Modifying a schema in a relational database can be expensive though. Adding or removing columns is often cheap (in modern PostgreSQL adding a new column with a default value is just as cheap as adding a nullable column). But converting data requires rewriting every row.
The relational database approach requires you think about your data model, where with NoSQL you can postpone it. However, postponing thinking about the data model never works out. So even with a NoSQL approach you need to thing about your data model otherwise you will need to write, and maintain, a lot of supporting logic in your software.
Thank you for the great write up!
Extremely thorough article thanks!