We want to make our databases fast so that the queries which we run should return the results fast.
By changing how we store the data in our database we can make the database respond to the queries and optimize our database to either serve the analytical workload or the transactional workload which we call OLAP and OLTP.
Computers in general, even the databases, read the data from disks, hard disks. We are not talking about the RAM data which gets deleted once the computer is shut down but we are talking about the hard disks where the data is stored permanently.
On these storage disks data is stored in the form of blocks. it is evident that if the computer has to read fewer blocks, it is going to take less time and if the computer is going to read data from more blocks, it is going to take more time.
If the data which we query are on fewer blocks, then the response of the database would be faster.
Let us understand this with an example
This is a sample sales table
Let us see how this data will be stored in a row-based data store.
In the row-based format, the entire record is stored in one block meaning as you can see in the figure, the record of the first customer — — -the customer name, item ID, and sales amount is stored in the first block.
Therefore, if we are looking for the first customers details you can find all that in one single block. This type of data stores will be great for transactional databases where we need the details of a single entity.
If I have to take the average of all the sales then I would have scan through all the five blocks of data, resulting in more IO and more cost.
Let us see how this data will be stored in a row-based data store.
Data in the column stores are stored differently. Here entire columns are written down inside one block so here we have all the customer names in one block all the item IDS in the next block and all the sales amount in the third block.
So, instead of having each customer’s data together we have the whole column of data together.
If I want to access the sum or the average of all the sales amount this database would be fast because everything would be saved in one single block.
I will use this type of database for analytical queries like sum or avarages etc
Columnar database is also very suitable for compression because each block will have the same type of data either numbers or strings etc. so it would be very easy to compress the data in the block efficiently.
How would queries work with these different storage methods
Transactional style query
Eg:
SELECT Customer_name , Sales
FROM data
WHERE Item_ID=3000
This is a transactional style query where we need all details of one person.
In the row database, we need to scan the data from just one block whereas in the columnar database we would have to scan through a lot of blocks.
Analytical style query
Eg:
SELECT Item_ID, count(1)
FROM data
GROUP BY Item_ID
Here we want to count how many observations we have from each Item_ID, which could be done by reading a single block in the Columnar database, whereas in the row database we would have to scan through all the blocks
Happy Databasing !!!!
By, Sabiha Ali, Solutions Architect, ScaleCapacity Inc.
Top comments (0)