Consider a table called ORDERS with 100,000,000 rows and the following columns: Amount (8-byte decimal), OrderDate (8 bytes), Country (dictionary-encoded; 1-byte code), OrderID (4-byte int), CustomerID (4-byte int), and five additional miscellaneous columns averaging 8 bytes each. In a row store, the approximate row width is ≈ 4 + 4 + 1 + 8 + 8 + (5 × 8) = 65 bytes, which we round up to 72 bytes to allow for per-row overhead and alignment.
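As a sanity check, here is a minimal Python sketch of that arithmetic. The column sizes are the assumptions stated above, and the 72-byte padded width is an assumed allowance for per-row overhead, not a measured value:

```python
# Back-of-envelope row-width math for the ORDERS table described above.
column_bytes = {
    "OrderID": 4,       # 4-byte int
    "CustomerID": 4,    # 4-byte int
    "Country": 1,       # 1-byte dictionary code
    "Amount": 8,        # 8-byte decimal
    "OrderDate": 8,     # 8-byte date
}
misc_bytes = 5 * 8      # five miscellaneous columns, ~8 bytes each

raw_width = sum(column_bytes.values()) + misc_bytes   # 65 bytes
padded_width = 72                                     # assumed overhead/alignment

rows = 100_000_000
print(f"raw row width: {raw_width} B")                          # 65 B
print(f"table size:    {rows * padded_width / 1e9:.1f} GB")     # 7.2 GB
```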
Example query: SELECT SUM(Amount) FROM ORDERS WHERE Country = 'IN'.
Assume Country = 'IN' matches 5% of the rows (5% selectivity). The cost of fetching data from a row-store table is as follows:
Case A: Complete table scan. Bytes read ≈ rows × row_size = 100,000,000 × 72 B = 7.2 GB. However, the query needs only the Amount column, about 11% of each row (8 of 72 bytes), so nearly 90% of what is read is wasted. That wasted I/O also means more CPU work and slower performance.
Case B: B-tree index on Country. Using the index, find the 5% of matching rows = 5,000,000 row IDs. Bytes touched in the base table ≈ 5,000,000 × 72 B ≈ 360 MB, plus tens to hundreds of MB of index pages and the CPU overhead of chasing random pointers. Even though far fewer total bytes are read than in a full scan, performance can still lag behind a tight sequential column scan because of random access and cache misses.
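The bytes-touched math for both cases can be reproduced in a few lines of Python. The 72-byte row and 5% selectivity come from the setup above; the index-page cost is left as a side note because it varies by engine:

```python
rows = 100_000_000
row_bytes = 72
selectivity = 0.05  # 5% of rows match Country = 'IN'

# Case A: a full table scan touches every row.
full_scan_bytes = rows * row_bytes                 # 7.2 GB
useful_fraction = 8 / row_bytes                    # only the 8-byte Amount column matters
print(f"full scan:  {full_scan_bytes / 1e9:.1f} GB "
      f"({useful_fraction:.0%} of it useful)")

# Case B: a B-tree lookup touches only matching rows, plus index pages.
matching_rows = int(rows * selectivity)            # 5,000,000
base_table_bytes = matching_rows * row_bytes       # 360 MB
print(f"index scan: {base_table_bytes / 1e6:.0f} MB of base table "
      f"(plus index pages and random-I/O overhead)")
```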
Data fetch cost in a column-store table: a column store keeps each column separately, so for the same query it reads only the columns it needs (Country and Amount). Bytes read ≈ rows × column_size: Amount column → 100 million × 8 bytes = 800 MB; Country column → 100 million × 1 byte = 100 MB.
To find which rows belong to 'IN', it first scans the Country column (100 MB) and marks the matches. It then scans the Amount column (800 MB) and sums only the marked rows. Because the data is stored sequentially and processed in batches, the CPU works very efficiently, and compression or min/max pruning can push the read size below 900 MB. So a column store reads about 900 MB of tightly packed, sequential data instead of the 7.2 GB a row store scans: roughly 8× less data, which makes the query approximately eight times faster and much more CPU-friendly.
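To make the two-pass evaluation concrete, here is a toy NumPy sketch of the same query, scaled down to 1 million rows so it runs instantly. The dictionary code for 'IN' and the random data are illustrative assumptions, not anything from a real engine:

```python
import numpy as np

rows = 1_000_000
rng = np.random.default_rng(0)

# Columnar layout: each column is its own contiguous array.
country = rng.integers(0, 20, size=rows, dtype=np.uint8)  # 1-byte dictionary codes
amount = rng.uniform(1, 500, size=rows)                   # 8-byte amounts

IN_CODE = 7                    # hypothetical dictionary code for 'IN'
mask = country == IN_CODE      # pass 1: scan the 1-byte Country column
total = amount[mask].sum()     # pass 2: sum only the matching Amount values

print(f"matched {mask.sum():,} rows, SUM(Amount) = {total:,.2f}")
```

Both passes are tight sequential loops over packed arrays, which is exactly why the column-store plan is so cache- and CPU-friendly compared with chasing 72-byte rows.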