DEV Community

MustafaLSailor

Apriori

The Apriori algorithm is a widely used data mining algorithm for finding sets of items (itemsets) that occur frequently in a data set. It is especially common in applications such as market basket analysis.

The Apriori algorithm gets its name from the term "a priori", which means "known in advance": when it examines larger itemsets, it reuses what it already knows about smaller ones. This points to the basic logic of the algorithm: if a set of items occurs frequently, every one of its subsets occurs at least as frequently. Conversely, if a set of items is rarely seen, every one of its supersets is at least as rare, so those supersets never need to be counted.
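
As a small illustration of this subset rule, here is a minimal sketch of the pruning check. The helper name `has_infrequent_subset` and the item names are made up for this example; they are not part of any library.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if any subset of `candidate` with one item removed
    is missing from `frequent_smaller` (hypothetical helper)."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent pairs found in an earlier pass.
frequent_pairs = {
    frozenset(pair)
    for pair in [("bread", "milk"), ("bread", "eggs"),
                 ("milk", "eggs"), ("milk", "tea")]
}

# All three pairs inside {bread, milk, eggs} are frequent: worth counting.
keep = not has_infrequent_subset(frozenset({"bread", "milk", "eggs"}), frequent_pairs)

# {bread, tea} is not a frequent pair, so {bread, milk, tea} can be skipped.
skip = has_infrequent_subset(frozenset({"bread", "milk", "tea"}), frequent_pairs)

print(keep, skip)
```

A candidate that fails this check cannot be frequent, so it can be thrown away without scanning the data set at all.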

The algorithm generally consists of two steps:

1. It calculates the frequency of individual items and keeps those at or above a certain threshold.
2. It combines the surviving items into larger itemsets and calculates their frequency, discarding those below the threshold.

Step 2 is repeated with larger and larger itemsets until no new frequent itemsets can be created. As a result, the frequently occurring itemsets are identified.
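
The two steps above can be sketched in plain Python roughly as follows. This is a simplified illustration, not an optimized or reference implementation; the function name `apriori`, its parameters, and the sample baskets are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Step 1: keep the individual items that reach the threshold.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 2
    while current:
        frequent.update(current)
        # Step 2: join frequent (k-1)-itemsets into k-item candidates...
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ...drop candidates that contain an infrequent (k-1)-subset...
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # ...and count the rest against the data.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        k += 1
    return frequent

# Hypothetical shopping baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "tea"],
]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With these baskets, `tea` is dropped in the first pass, and the triple {bread, milk, eggs} is never counted because the pair {bread, eggs} is not frequent.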

The Apriori algorithm is one of a number of algorithms used to find relationships in databases. However, it can be slow on large data sets, because it has to scan the whole data set again for every itemset size and may generate a very large number of candidate itemsets to count.

min support

"Min support" (minimum support) is a term used in association rule learning and in data mining algorithms such as the Apriori algorithm. The minimum support value sets the lowest frequency at which an item or set of items is still accepted.

More specifically, the support of an itemset is the proportion of all transactions that contain it, and minimum support is the lower bound placed on that proportion. For example, in a grocery basket analysis, the support of the items 'bread' and 'milk' together is the number of transactions that contain both, divided by the total number of shopping transactions.
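
To make the calculation concrete, here is a minimal sketch of a support function. The baskets below are invented for illustration, and `support` is just a name chosen for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Hypothetical grocery baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

# Bread and milk appear together in 3 of the 5 baskets.
print(support({"bread", "milk"}, baskets))  # 0.6
```

With a minimum support of 0.5, {bread, milk} would count as frequent here, while {bread, butter} (2 of 5 baskets, support 0.4) would not.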

The minimum support value is usually chosen by the user. Itemsets whose support is at or above this value are considered "frequent" or "interesting" by the algorithm, and itemsets below it are discarded.

This parameter helps prevent misleading or random relationships from being detected. However, if the minimum support value is set too high, some potentially interesting relationships may be missed. Choosing the minimum support value is therefore a matter of balance.

Image description

To explain using this picture: we have a database (DB) in which each row lists the products bought together in one transaction. Our minimum support is set at 50 percent, that is, an itemset must appear in at least half of the total number of transactions.
Table C1 lists how many times each distinct product appears in the DB. Products that appear in fewer than 50 percent of the transactions are then eliminated.
Here, 4 has a frequency of 1, which is below the threshold, so 4 is eliminated. Note that the threshold is measured against the total number of transactions, not against the highest count in the table. Then we move to table C2, which contains the pairs of items; their frequencies are shown in the L2 table. {1,2} and {1,5} each occur only once, so they are eliminated. We continue in this way with groups of three, and the frequent group of 3 we end up with is {2,3,5}.
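
The walkthrough can be traced in code. The picture's table is not reproduced in the text, so the four transactions below are an assumption, chosen only because they match the counts quoted above (4 appears once, {1,2} and {1,5} appear once, and {2,3,5} survives). The helper names are also made up for this sketch.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: stand-in for the DB in the picture, not taken from it.
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_count = 0.5 * len(db)  # 50 percent of 4 transactions = 2.0

def count_candidates(db, k, prev_frequent=None):
    """Count the k-itemsets that occur in the DB, skipping any whose
    (k-1)-subsets are not all in `prev_frequent`."""
    counts = Counter()
    for t in db:
        for combo in combinations(sorted(t), k):
            if prev_frequent is not None and any(
                sub not in prev_frequent for sub in combinations(combo, k - 1)
            ):
                continue
            counts[combo] += 1
    return counts

def keep_frequent(counts, min_count):
    """Keep only the itemsets whose count reaches the threshold."""
    return {itemset: c for itemset, c in counts.items() if c >= min_count}

c1 = count_candidates(db, 1)
l1 = keep_frequent(c1, min_count)      # item 4 (count 1) drops out
c2 = count_candidates(db, 2, l1)
l2 = keep_frequent(c2, min_count)      # (1, 2) and (1, 5) drop out
c3 = count_candidates(db, 3, l2)
l3 = keep_frequent(c3, min_count)

for name, table in [("C1", c1), ("L1", l1), ("C2", c2), ("L2", l2), ("C3", c3), ("L3", l3)]:
    print(name, sorted(table.items()))
```

Under this assumed DB, the only entry left in `l3` is `(2, 3, 5)` with a count of 2, which agrees with the result described above.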