DEV Community

Qing

Data Types (6)

HyperLogLog (HLL) Types

HLL is an approximation algorithm for efficiently counting the number of distinct values in a dataset. It computes quickly and uses little space, because you store only the HLL data structure instead of the dataset itself. When new data is added to the dataset, hash the data and insert the result into the HLL. The distinct count can then be estimated from the HLL alone.
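The hash-and-insert flow described above can be sketched in a few lines of Python. This is a toy illustration of the general HyperLogLog technique, not the database's implementation: the class name `HyperLogLog`, the use of SHA-256 as the hash, and the default of 2^14 buckets are all assumptions made for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**log2m registers (illustrative only)."""

    def __init__(self, log2m=14):  # bucket count is an assumption
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    def add(self, value):
        # Hash the value to 64 bits; the raw value itself is never stored.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The low log2m bits choose a register (bucket).
        index = h & (self.m - 1)
        # The remaining bits give the rank: position of the first 1 bit.
        rest = h >> self.log2m
        width = 64 - self.log2m
        rank = width - rest.bit_length() + 1
        # Each register keeps only the largest rank it has seen.
        self.registers[index] = max(self.registers[index], rank)

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

Adding the same value twice leaves the registers unchanged, which is why duplicates do not inflate the estimate, and the register array never grows no matter how many values are added.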

Table 13 compares HLL with other algorithms.

Table 13 Comparison between HLL and other algorithms

Image description

HLL has advantages over the other algorithms in both computing speed and storage space. In terms of time complexity, the sorting algorithm needs O(n log n) time for sorting, whereas the hash algorithm and HLL need only O(n) time for a full table scan. In terms of storage, the sorting and hash algorithms must store the raw data before collecting statistics, whereas HLL stores only the HLL data structure rather than the raw data, so it occupies a fixed space of about 16 KB.
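For contrast, here is a minimal Python sketch of the two exact approaches mentioned above. The function names are hypothetical. Both give exact answers, but each has to hold on to the raw values, which is the storage cost that HLL avoids.

```python
def count_distinct_by_sort(values):
    """Exact count: O(n log n) time, keeps a sorted copy of the raw data."""
    ordered = sorted(values)
    count = 0
    previous = object()  # sentinel that equals no real value
    for v in ordered:
        if v != previous:
            count += 1
            previous = v
    return count


def count_distinct_by_hash(values):
    """Exact count: O(n) expected time, keeps every distinct raw value."""
    seen = set()
    for v in values:
        seen.add(v)
    return len(seen)


visits = ["alice", "bob", "alice", "carol", "bob"]
print(count_distinct_by_sort(visits))  # 3
print(count_distinct_by_hash(visits))  # 3
```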

Image description

Table 14 describes the main HLL data structures.

Table 14 Main HLL data structures

Image description

When you create an HLL data type, you can specify zero to four input parameters. Their meanings and specifications are the same as those of the hll_empty function:

·log2m (first parameter): the base-2 logarithm of the number of buckets. Value range: 10 to 16.

·log2explicit (second parameter): the threshold in Explicit mode. Value range: 0 to 12.

·log2sparse (third parameter): the threshold in Sparse mode. Value range: 0 to 14.

·duplicatecheck (fourth parameter): whether to enable duplicate check. Value: 0 or 1.

If an input parameter is set to -1, the default value of that HLL parameter is used. You can run the \d or \d+ command to view the parameters of the HLL type.
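The parameter rules above can be restated as a small Python check. This is only a sketch of the stated ranges: `check_hll_params` is a hypothetical helper, not a database function, and it returns None for "use the default" rather than guessing what the defaults are.

```python
# Valid ranges as listed above; -1 means "use the default".
HLL_PARAM_RANGES = {
    "log2m": (10, 16),
    "log2explicit": (0, 12),
    "log2sparse": (0, 14),
    "duplicatecheck": (0, 1),
}


def check_hll_params(*params):
    """Validate 0 to 4 positional HLL parameters (hypothetical helper).

    Returns a dict mapping each parameter name to its value,
    or to None when the default should be used.
    """
    names = list(HLL_PARAM_RANGES)
    if len(params) > len(names):
        raise ValueError("at most 4 parameters are supported")
    resolved = {name: None for name in names}
    for name, value in zip(names, params):
        if value == -1:
            continue  # keep None: the default applies
        low, high = HLL_PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        resolved[name] = value
    return resolved


print(check_hll_params(12, 4))
# {'log2m': 12, 'log2explicit': 4, 'log2sparse': None, 'duplicatecheck': None}
```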

Image description

The following describes HLL application scenarios.

·Scenario 1: “Hello World”

The following example shows how to use the HLL data type:

Image description

·Scenario 2: Collect statistics about website visitors.
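As a rough illustration of this scenario, the Python sketch below keeps one small sketch per day and merges days to count unique visitors across them. It is a toy model of the general HyperLogLog technique, not the database's implementation: the helper names, the SHA-256 hash, the 2^12 bucket count, and the visit data are all assumptions made for the example.

```python
import hashlib
import math

LOG2M = 12          # assumed bucket count of 2**12 for this toy example
M = 1 << LOG2M


def new_sketch():
    return [0] * M


def add(sketch, value):
    # Hash the visitor id; only the register update is kept.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")
    index = h & (M - 1)
    rank = (64 - LOG2M) - (h >> LOG2M).bit_length() + 1
    sketch[index] = max(sketch[index], rank)


def union(a, b):
    # Merging two sketches is a register-wise maximum.
    return [max(x, y) for x, y in zip(a, b)]


def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)  # linear counting for small counts
    return raw


# Hypothetical visit log: 3000 visitors per day, 1000 of them on both days.
daily = {"mon": new_sketch(), "tue": new_sketch()}
for user in range(0, 3000):
    add(daily["mon"], f"user-{user}")
for user in range(2000, 5000):
    add(daily["tue"], f"user-{user}")

both_days = union(daily["mon"], daily["tue"])
print(round(estimate(daily["mon"])), round(estimate(both_days)))
```

Because the merge is a register-wise maximum, the union of the daily sketches is identical to a sketch built from all visits at once, so visitors who return on several days are counted only once.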

·Scenario 3: The data to be inserted does not meet the requirements of the HLL data structure.

When you insert data into a column of the HLL type, the data must meet the requirements of the HLL data structure. If the parsed data does not meet these requirements, an error is reported. In the following example, the value E\1234 does not meet the requirements of the HLL data structure after being parsed, so the insert fails with an error.

Image description
