HyperLogLog (HLL) Types
HyperLogLog (HLL) is an approximation algorithm for efficiently counting the number of distinct values in a dataset. It computes faster and uses less space than exact methods, because you store only the HLL data structure instead of the dataset itself. When new data is added to the dataset, compute a hash of the data and insert the hash value into an HLL. The distinct count is then estimated from the HLL.
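To make the "hash, insert, estimate" workflow concrete, here is a minimal Python sketch of the general HyperLogLog technique. It is an illustration only, not the database's implementation. The class name `HyperLogLog`, the use of SHA-1 as the 64-bit hash, and the bias-correction constants are assumptions for this sketch. The top `log2m` bits of each hash select a bucket (register). Each register keeps the largest "leading zeros + 1" rank seen in the remaining bits, and the estimate is derived from all registers together.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HLL sketch with 2**log2m registers (illustrative only)."""

    def __init__(self, log2m: int = 14) -> None:
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value: str) -> int:
        # Assumption: first 8 bytes of SHA-1 as a well-mixed 64-bit hash.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value: str) -> None:
        h = self._hash64(value)
        width = 64 - self.log2m
        bucket = h >> width                      # top log2m bits pick the register
        rest = h & ((1 << width) - 1)            # remaining bits
        rank = width - rest.bit_length() + 1     # leading zeros in `rest`, plus 1
        self.registers[bucket] = max(self.registers[bucket], rank)

    def cardinality(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant, valid for m >= 128
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting on empty registers).
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(log2m=14)
    for i in range(100_000):
        hll.add(f"user-{i % 20_000}")  # 20,000 distinct values, each added 5 times
    print(round(hll.cardinality()))
```

Re-adding a value never changes the registers, which is why duplicates do not inflate the count. Memory stays at `2**log2m` registers however many values are added.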
Table 13 compares HLL with other algorithms.
Table 13 Comparison between HLL and other algorithms
HLL has advantages over the other algorithms in both computing speed and storage requirements. In terms of time complexity, the sorting algorithm needs O(n log n) time to sort the data, while the hash algorithm and HLL each need O(n) time for a full table scan. In terms of storage, the sorting and hash algorithms must store the raw data before collecting statistics. HLL stores only its own data structure rather than the raw data, so it occupies a fixed space of about 16 KB.
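For contrast, the two exact approaches from the comparison can be sketched in a few lines of Python. The function names are hypothetical and the code is illustrative, not taken from the database. The sort-based count needs a full sorted copy of the data and O(n log n) time. The hash-based count runs in one O(n) pass, but its set grows with the number of distinct values. HLL also makes one O(n) pass, yet keeps only a fixed-size register array.

```python
from typing import Hashable, Iterable, List


def distinct_by_sorting(values: Iterable) -> int:
    """Exact distinct count: sort a full copy (O(n log n)), then count value changes."""
    ordered: List = sorted(values)  # stores all raw values
    count = 0
    for i, v in enumerate(ordered):
        if i == 0 or v != ordered[i - 1]:
            count += 1
    return count


def distinct_by_hashing(values: Iterable[Hashable]) -> int:
    """Exact distinct count: one O(n) pass into a hash set."""
    seen = set()  # grows with the number of distinct values
    for v in values:
        seen.add(v)
    return len(seen)


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "b", "a"]
    print(distinct_by_sorting(data), distinct_by_hashing(data))
```

Both functions return exact results. HLL accepts a small, bounded estimation error in exchange for constant memory.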
Table 14 describes the main HLL data structures.
Table 14 Main HLL data structures
When you create a column of the HLL type, you can specify 0 to 4 input parameters. Their meanings and value ranges are the same as those of the hll_empty function:
·log2m: the base-2 logarithm of the number of buckets. Value range: 10 to 16.
·log2explicit: the threshold of the explicit mode. Value range: 0 to 12.
·log2sparse: the threshold of the sparse mode. Value range: 0 to 14.
·duplicatecheck: whether to enable duplicate check. Value: 0 or 1.
If a parameter is set to -1, the default value of that HLL parameter is used. You can run the \d or \d+ command to view the parameters of the HLL type.
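The parameter rules above can be expressed as a small client-side check. This is a hypothetical Python helper (`check_hll_params` is not part of the database). It encodes only the ranges and the -1 convention stated above. Omitted or -1 parameters map to `None`, meaning "let the server apply its default", because the defaults themselves are not listed here.

```python
from typing import Dict, Optional, Sequence

# Ranges as documented above: (name, min, max). -1 means "use the default".
HLL_PARAM_RANGES = [
    ("log2m", 10, 16),
    ("log2explicit", 0, 12),
    ("log2sparse", 0, 14),
    ("duplicatecheck", 0, 1),
]


def check_hll_params(params: Sequence[int]) -> Dict[str, Optional[int]]:
    """Validate up to four HLL type parameters (hypothetical helper).

    Returns {name: value}, where None means omitted or -1 (default used).
    Raises ValueError for too many parameters or out-of-range values.
    """
    if len(params) > len(HLL_PARAM_RANGES):
        raise ValueError(f"at most {len(HLL_PARAM_RANGES)} parameters, got {len(params)}")
    result: Dict[str, Optional[int]] = {}
    for i, (name, low, high) in enumerate(HLL_PARAM_RANGES):
        if i >= len(params) or params[i] == -1:
            result[name] = None
            continue
        value = params[i]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        result[name] = value
    return result


if __name__ == "__main__":
    print(check_hll_params([14, -1, 12, 1]))
```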
The following describes HLL application scenarios.
·Scenario 1: “Hello World”
The following example shows how to use the HLL data type:
·Scenario 2: Collect statistics about website visitors.
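Counting unique visitors per day and then for a whole week is where HLL's mergeability helps. Daily distinct counts cannot simply be added, because a visitor who comes on two days would be counted twice. The Python sketch below illustrates the idea and is not the database's implementation. It keeps one register list per day, using hypothetical helpers `new_sketch`, `add`, `union`, and `estimate`, with SHA-1 as an assumed hash. Days are merged with a register-wise maximum, which is the standard way to combine HyperLogLog sketches built with the same settings.

```python
import hashlib
import math
from typing import List

LOG2M = 12          # assumption: 4096 registers for this illustration
M = 1 << LOG2M


def new_sketch() -> List[int]:
    return [0] * M


def add(sketch: List[int], visitor_id: str) -> None:
    h = int.from_bytes(hashlib.sha1(visitor_id.encode("utf-8")).digest()[:8], "big")
    width = 64 - LOG2M
    bucket = h >> width
    rank = width - (h & ((1 << width) - 1)).bit_length() + 1
    sketch[bucket] = max(sketch[bucket], rank)


def union(*sketches: List[int]) -> List[int]:
    # Register-wise max: equals the sketch of the combined input.
    return [max(regs) for regs in zip(*sketches)]


def estimate(sketch: List[int]) -> float:
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)  # small-range correction
    return raw


if __name__ == "__main__":
    monday, tuesday = new_sketch(), new_sketch()
    for i in range(0, 3000):
        add(monday, f"visitor-{i}")
    for i in range(2000, 5000):       # visitors 2000-2999 come on both days
        add(tuesday, f"visitor-{i}")
    print(round(estimate(monday)), round(estimate(tuesday)))
    print(round(estimate(union(monday, tuesday))))  # close to 5000, not 6000
```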
·Scenario 3: The data to be inserted does not meet the requirements of the HLL data structure.
When inserting data into a column of the HLL type, make sure the data meets the requirements of the HLL data structure. If the parsed data does not meet these requirements, an error is reported. In the following example, E\1234 does not meet the requirements of the HLL data structure after being parsed, so an error is reported.