I guess one problem with the concept of processing terabytes of information is deciding which parts of the information you are actually using, and whether you are counting the "in flight" size of the data or the size of the information as stored.
For many years, column-based databases have been able to process billions of records, representing terabytes of data at the row level, in a matter of seconds. Are they processing terabytes of data? Well, yes and no. The algorithms in such systems rely heavily on tokenization, compression, and similarity-based encoding, so the input terabytes become quick accesses to compressed information, stored sequentially and organized around the questions being asked.
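A minimal sketch of the kind of trick being described, using dictionary encoding (one common form of tokenization in columnar engines, not any specific product's implementation): repeated values in a column are replaced with small integer codes, so a scan touches a compact code array instead of the raw bytes.

```python
def dictionary_encode(column):
    """Map each distinct value to a small integer code."""
    codes, dictionary = [], {}
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes, dictionary

def count_matching(codes, dictionary, target):
    """Count rows equal to `target` by comparing integer codes only."""
    if target not in dictionary:
        return 0
    code = dictionary[target]
    return sum(1 for c in codes if c == code)

# Example: a low-cardinality column compresses to a tiny dictionary
# plus an array of small integers, which is what actually gets scanned.
country = ["US", "DE", "US", "FR", "US", "DE"] * 1_000
codes, dictionary = dictionary_encode(country)
print(count_matching(codes, dictionary, "US"))  # 3000
```

The point is that the "terabytes" the query appears to cover are mostly never read in their raw form.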
Yes, you are right. Calculating on terabytes of data does not always mean traversing all of it.
However, here I said "process", as in an ETL operation; for that kind of scenario we often need to traverse all the data, sometimes more than once.
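A minimal sketch (file names and the transform are placeholders, not from the article) of why an ETL job can traverse the full dataset more than once: a first pass computes a global statistic, and a second pass uses it to rewrite every row.

```python
import csv

def first_pass_max(path, field):
    """Pass 1: scan every row just to find the maximum of one field."""
    maximum = float("-inf")
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            maximum = max(maximum, float(row[field]))
    return maximum

def second_pass_normalize(src, dst, field, maximum):
    """Pass 2: scan every row again, writing the field scaled to [0, 1]."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row[field] = str(float(row[field]) / maximum)
            writer.writerow(row)

# Usage (paths are hypothetical): every byte of input.csv is read twice.
# m = first_pass_max("input.csv", "amount")
# second_pass_normalize("input.csv", "output.csv", "amount", m)
```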