How you store your data can affect how well your application or system works. There are two main ways to store data: by rows or by columns. Each way has its pros and cons.
Think of a library full of books. Each book is a row of data, and its chapters and pages are data points.
Storing data by rows is like putting these books on a shelf in order, one after another. All the data for one record (book) stays together, even if you don't use some parts (chapters or pages) often.
Storing data by columns is like grouping the chapters of all the books together on one shelf, then all the pages on another, and so on. This makes it faster to read one kind of data (like chapters) across many books (rows), because you don't have to pull every whole book off the shelf.
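To make the analogy concrete, here is a minimal Python sketch of the same small table stored both ways. The table, its values, and the function names are hypothetical. Real databases lay data out in disk pages rather than Python lists, but the access pattern is the same idea: summing one field is cheap in the column layout, while fetching one whole record is cheap in the row layout.

```python
# Hypothetical "books" table, used only for illustration.
COLUMNS = ("title", "author", "pages")

# Row-oriented layout: each tuple keeps one complete record together.
row_store = [
    ("Book A", "Author 1", 412),
    ("Book B", "Author 2", 474),
    ("Book C", "Author 3", 202),
]

# Column-oriented layout: the same data, one list per field.
column_store = {
    name: [row[i] for row in row_store] for i, name in enumerate(COLUMNS)
}


def total_pages_row(rows):
    # Visits every full record, even though only the "pages" field is needed.
    pages = COLUMNS.index("pages")
    return sum(row[pages] for row in rows)


def total_pages_column(columns):
    # Reads a single list; the "title" and "author" lists are never touched.
    return sum(columns["pages"])


def get_book_row(rows, i):
    # One lookup returns the whole record.
    return rows[i]


def get_book_column(columns, i):
    # Rebuilding one record means visiting every column.
    return tuple(columns[name][i] for name in COLUMNS)


print(total_pages_row(row_store), total_pages_column(column_store))  # 1088 1088
print(get_book_row(row_store, 1) == get_book_column(column_store, 1))  # True
```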
What's different?
| Feature | Storing by Rows | Storing by Columns |
|--------------------|-----------------------------------------------------------|------------------------------------------------------------------|
| Data organization | Keeps all values of a row together | Keeps all values of a column together |
| Access patterns | Good for reading whole rows and frequent updates | Good for reading a few columns and running analytical queries |
| Performance | Usually slower for analytical queries that scan many rows | Often much faster for aggregations; slower for single-row writes |
| Storage efficiency | Compresses less well, because each row mixes data types | Compresses well, because similar values sit next to each other |
| Examples | MySQL, PostgreSQL, Oracle Database | Amazon Redshift, ClickHouse, Google BigQuery |
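The "Storage efficiency" row comes down to how well values compress when similar ones sit next to each other. The sketch below uses run-length encoding, which is only one simple technique among several that columnar systems use; the orders table and its values are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical orders table: (order_id, country, status)
rows = [
    (1, "US", "shipped"),
    (2, "US", "shipped"),
    (3, "US", "pending"),
    (4, "DE", "shipped"),
    (5, "DE", "shipped"),
    (6, "DE", "shipped"),
]

# Column layout: repeated values end up next to each other.
country_column = [row[1] for row in rows]
status_column = [row[2] for row in rows]

print(run_length_encode(country_column))  # [('US', 3), ('DE', 3)]
print(run_length_encode(status_column))   # [('shipped', 2), ('pending', 1), ('shipped', 3)]

# Row layout: neighbouring values alternate between id, country and status,
# so there are no runs of equal values to collapse.
flattened_rows = [value for row in rows for value in row]
print(len(run_length_encode(flattened_rows)) == len(flattened_rows))  # True
```

Six country values shrink to two pairs in the column layout, while the same data in row order gains nothing from this encoding.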
Here are some AWS services that use row-based or column-based storage:
1. Storing by columns:
Amazon Redshift: A fully managed data warehouse service for fast analysis and querying of very large datasets. It stores table data by column, which makes it well suited to analytical workloads that aggregate and filter many rows.
Amazon DynamoDB: A serverless key-value and document database. It does not actually offer a columnar storage option: each item is stored together with all of its attributes, so its layout is closer to row-based storage.
Amazon Quantum Ledger Database (QLDB): A managed ledger database that records transactions in a verifiable and immutable way. It appends data to a journal made of cryptographically chained blocks so that records cannot be altered. These blocks are not columns, so QLDB is not columnar in the analytical sense described above.
2. Storing by rows:
Amazon Aurora: A high-performance relational database compatible with MySQL and PostgreSQL. It stores data in rows, which makes it well suited to traditional transactional workloads and applications that need strong consistency.
Amazon Relational Database Service (RDS): A service that manages different database engines, including MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server. These engines usually store data by rows.
Amazon SimpleDB: A simple NoSQL database that stores data as attributes and values. It can be seen as row-based, since each item keeps its own set of attribute-value pairs together, but it doesn't follow the fixed row-and-column structure of relational databases.
How to choose the best storage type:
Query patterns: Storing by columns is better for analytical queries that aggregate or filter large numbers of rows. Storing by rows is better for transactional workloads that read and update whole rows often.
Data structure: Storing by columns works best for structured data with a fixed, well-defined schema. Storing by rows (including the item-based layout of many NoSQL databases) is usually more flexible for semi-structured data.
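Performance requirements: Storing by columns usually performs better for queries that scan many rows but read only a few columns, such as aggregations and reports. Storing by rows usually performs better for looking up, inserting, or updating individual records.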
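A back-of-the-envelope way to reason about these trade-offs is to count how many bytes a full-table scan has to read. This sketch assumes fixed-width, uncompressed columns and no indexes, which real systems rarely have, so treat the numbers as a rough illustration only. The table shape and the `bytes_scanned` helper are hypothetical.

```python
def bytes_scanned(num_rows, column_widths, columns_needed, layout):
    """Rough bytes read by a full-table scan (assumes no indexes, no compression)."""
    if layout == "row":
        # Every row is read in full, including columns the query ignores.
        return num_rows * sum(column_widths.values())
    if layout == "column":
        # Only the requested columns are read.
        return num_rows * sum(column_widths[name] for name in columns_needed)
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical table: 1 million rows, 10 columns of 8 bytes each.
widths = {f"col{i}": 8 for i in range(10)}
query = ["col3"]  # e.g. SELECT SUM(col3) FROM t

row_bytes = bytes_scanned(1_000_000, widths, query, "row")
column_bytes = bytes_scanned(1_000_000, widths, query, "column")
print(row_bytes, column_bytes)  # 80000000 8000000
```

The opposite holds for reading or updating one complete row: a row store finds all 80 bytes of that row in one place, while a column store has to visit ten separate columns.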
✨ I hope this clearly explains the differences between the two types of storage!