In this post, I'm not trying to explain every point in detail. Instead, I give an overview, with a brief description of some important aspects to keep in mind when working with databases.
Types of databases
Relational databases
Data is structured in tables with columns and rows. The relationships between tables are defined in advance by the schema (for example, through foreign keys). SQL is used to query the data in the database.
E.g. PostgreSQL, MySQL, MariaDB, SQLite.
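As a minimal sketch of the idea, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The authors and posts tables and their columns are made up for illustration; they show two tables linked by a predefined relationship and queried with SQL.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Two hypothetical tables; posts.author_id points at authors.id.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'John Doe')")
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'Hello databases')")

# Follow the relationship between the tables with a JOIN.
rows = conn.execute(
    "SELECT authors.name, posts.title "
    "FROM posts JOIN authors ON authors.id = posts.author_id"
).fetchall()
print(rows)  # [('John Doe', 'Hello databases')]
```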
Non-relational databases (NoSQL, a.k.a. "Not Only SQL")
Data is not organized in tables but has a more flexible structure. These databases are designed to scale horizontally with little effort and to be easy to use. The main types of non-relational databases are:
Document databases
Document databases store each item in a JSON, BSON, or XML encoded object called a document.
E.g. MongoDB, CouchDB
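To make the flexible structure concrete, here is a rough sketch in plain Python. It does not use any real document database; the documents, their fields, and the find helper are invented for illustration. Note that the two documents do not share the same set of fields.

```python
import json

# Two hypothetical documents in one "collection"; their fields differ.
documents = [
    {"_id": 1, "name": "John", "last_name": "Doe", "tags": ["admin"]},
    {"_id": 2, "name": "Jane", "address": {"city": "Berlin"}},
]

def find(collection, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

# A document can be encoded as JSON for storage or transfer.
encoded = json.dumps(documents[1])

matches = find(documents, name="Jane")
print(matches[0]["address"]["city"])  # Berlin
```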
Key-Value store
Each item has a key and a value. E.g. name=John, last_name=Doe.
E.g., Redis, Memcached
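The access pattern can be sketched with a small in-memory class. This is only an illustration of the get/set/delete interface; KeyValueStore is a hypothetical name and not the API of Redis or Memcached.

```python
class KeyValueStore:
    """A toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Overwrites any existing value stored under the same key.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup is by key only; there is no querying by value.
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.set("name", "John")
store.set("last_name", "Doe")
print(store.get("name"))       # John
print(store.get("age", "n/a"))  # n/a
```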
Column-oriented databases
Data is stored by column rather than by row. It is normally used when our data items have a large number of properties (each property is stored in one column) and we only want to query some of them, so we avoid reading unnecessary data.
E.g., Cassandra, HBase (these two are often described more precisely as wide-column stores)
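The difference between the two layouts can be sketched with plain Python data structures. The records and field names are made up, and real column-oriented systems add compression and on-disk formats that this ignores.

```python
# Row-oriented layout: each record keeps all of its properties together.
rows = [
    {"id": 1, "name": "John", "age": 30, "city": "Hanoi"},
    {"id": 2, "name": "Jane", "age": 25, "city": "Berlin"},
    {"id": 3, "name": "Alex", "age": 41, "city": "Lima"},
]

# Column-oriented layout: each property is stored as its own list.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Computing over one property only touches that one column,
# instead of reading every field of every record.
ages = columns["age"]
average_age = sum(ages) / len(ages)
print(average_age)  # 32.0
```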
Graph databases
Data is held in nodes and edges. The main focus of this type of database is the connection between nodes (using edges).
E.g., Neo4j, Amazon Neptune
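As a small sketch of why connections are the focus, the snippet below keeps nodes and edges in a Python dictionary and asks how many hops separate two nodes. The names are invented, and this is not how Neo4j or Neptune are queried; it only illustrates the data model.

```python
from collections import deque

# Hypothetical "knows" relationships between people (undirected edges).
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

# Adjacency map: node -> set of directly connected nodes.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

print(shortest_path_length(graph, "alice", "dave"))  # 3
```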
Consistency model
ACID
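Atomicity, Consistency, Isolation, Durability. This model favors strict correctness: a transaction is applied completely or not at all, and once committed it is not lost.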
BASE
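Basically Available, Soft state, Eventually consistent. This model favors availability: replicas may briefly disagree, but they converge to the same state over time.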
Database replication
Copies of the data are kept on multiple interconnected machines, which can be in different locations. This reduces access latency and increases availability and read throughput. Some typical database replication techniques are:
Single-Leader
New data is written to a single machine called the Leader. The data is then replicated to the other machines, called Replicas. Data can be read from both the Leader and the Replicas.
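The flow can be sketched in a few lines of Python. The Node and Leader classes are hypothetical, and replication here is synchronous to keep the example short; real systems often replicate asynchronously, so a Replica may briefly lag behind the Leader.

```python
class Node:
    """A machine holding a copy of the data (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Leader(Node):
    """The only node that accepts writes; it forwards them to the replicas."""

    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Assumption: synchronous replication, applied in order.
        for replica in self.replicas:
            replica.data[key] = value


replicas = [Node("replica-1"), Node("replica-2")]
leader = Leader("leader", replicas)

leader.write("greeting", "hello")

# Reads can be served by the leader or by any replica.
print(leader.read("greeting"))       # hello
print(replicas[1].read("greeting"))  # hello
```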
Multi-Leader
Same as Single-Leader, but there are multiple Leaders, typically one per data center. Data written to one Leader is replicated to the other Leaders.
Leaderless
There is no Leader. The Replicas receive read and write requests directly.
Transaction
We use transactions when we want to execute a set of operations that must either all succeed or have no effect at all.
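A minimal sketch with Python's built-in sqlite3 module shows the all-or-nothing behavior. The accounts table, the transfer function, and the CHECK constraint are assumptions made for this example. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commit on success, rollback on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The second update violated the CHECK; the first was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer(conn, "alice", "bob", 500)  # alice cannot afford this
after_failure = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failure)     # False {'alice': 100, 'bob': 50}
print(succeeded, after_success)  # True {'alice': 70, 'bob': 80}
```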
Index
An index is a data structure that makes reads faster, at the cost of slower write operations and extra storage space.
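One way to see the effect is to ask the database how it plans to run a query before and after an index exists. The sketch below uses sqlite3 and SQLite's EXPLAIN QUERY PLAN; the users table and the idx_users_email index are invented for the example, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)
conn.commit()

def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT name FROM users WHERE email = ?"
params = ("user500@example.com",)

plan_before = query_plan(conn, lookup, params)  # expected to be a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, params)   # expected to mention the index

print(plan_before)
print(plan_after)
```

The trade-off mentioned above still applies: every insert or update on users now also has to maintain idx_users_email.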
Database backup
Database backup is a way to make a copy of the data. In case of a disaster that causes data loss, we can use this copy to restore our important data.
Full backup
Back up everything.
Differential backup
Only back up what has changed since the last full backup.
Incremental backup
Only back up what has changed since the last backup of any type (full or incremental).
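The difference between the last two types comes down to which reference point is used. The sketch below is a simplified illustration: the file names, the day numbers, and the files_to_back_up helper are all hypothetical, and real backup tools track changes in more robust ways than a single timestamp.

```python
def files_to_back_up(modified_at, since):
    """Return the files changed strictly after the given point in time."""
    return sorted(name for name, changed in modified_at.items() if changed > since)

# Hypothetical timeline, in days:
#   day 0: full backup
#   day 1: a.db changes, then a backup runs
#   day 2: b.db changes, and we are about to back up again
modified_at = {"a.db": 1, "b.db": 2, "c.db": 0}
last_full_backup = 0
last_backup_of_any_type = 1

full = sorted(modified_at)                                           # everything
differential = files_to_back_up(modified_at, last_full_backup)       # since the full backup
incremental = files_to_back_up(modified_at, last_backup_of_any_type)  # since the latest backup

print(full)          # ['a.db', 'b.db', 'c.db']
print(differential)  # ['a.db', 'b.db']
print(incremental)   # ['b.db']
```

Incremental backups are smaller, but a restore needs the full backup plus every incremental after it. A differential restore needs only the full backup and the latest differential.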
Database migration
Our software often requires changes to both the code and the database schema. When we deploy new code to production, the database should be updated as well. This process is called database migration. We have to make sure that, during the migration process, the database can always be rolled back to its original state if something goes wrong.
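A rough sketch of the idea, assuming SQLite and its user_version pragma as the place to record the schema version. The MIGRATIONS list and the migrate function are hypothetical; each step pairs an upgrade statement with the statement that undoes it, so the schema can move in either direction. Dedicated migration tools handle much more, such as data changes and failure recovery.

```python
import sqlite3

# Hypothetical migrations: (version, upgrade SQL, rollback SQL).
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)", "DROP TABLE users"),
    (2, "CREATE INDEX idx_users_name ON users (name)", "DROP INDEX idx_users_name"),
]

def current_version(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]

def migrate(conn, target):
    """Upgrade or roll back the schema until it is at the target version."""
    version = current_version(conn)
    # Upgrade: apply each missing step in ascending order.
    for number, upgrade, _rollback in MIGRATIONS:
        if version < number <= target:
            conn.execute(upgrade)
            conn.execute(f"PRAGMA user_version = {number}")
    # Roll back: undo each extra step in descending order.
    for number, _upgrade, rollback in reversed(MIGRATIONS):
        if target < number <= version:
            conn.execute(rollback)
            conn.execute(f"PRAGMA user_version = {number - 1}")
    conn.commit()

def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")

migrate(conn, 2)
version_after_upgrade = current_version(conn)
tables_after_upgrade = table_names(conn)

migrate(conn, 0)  # roll everything back to the original state
version_after_rollback = current_version(conn)
tables_after_rollback = table_names(conn)

print(version_after_upgrade, tables_after_upgrade)    # 2 ['users']
print(version_after_rollback, tables_after_rollback)  # 0 []
```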
(Bonus) Serverless databases
Cloud providers like Amazon or Google manage everything (provisioning, scaling, maintenance, etc.) for us. We only pay for what we use (pay-per-use).