DEV Community

Ulhak


TWEET ANALYSIS ALGORITHM

The algorithm takes two inputs. The first is a sample space of ‘n’ tweets, where ‘n’ is predefined. The second is the ‘x’ highest trending trends. Its output is the tweets of the sample space ranked in decreasing order of a description index.

The algorithm also uses two dictionaries. The first contains words that contribute little to describing the content and serve mainly as grammatical tools, namely articles, prepositions and conjunctions. The second consists of all common nouns, adjectives, adverbs and verbs, along with their derivatives. Hereafter the former is called ‘filter’ and the latter ‘cnfilter’.
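As an illustration only, the two dictionaries can be held as sets so that each lookup takes constant time. The entries in `FILTER` and `CNFILTER` below are tiny hypothetical samples, and the helper `word_category` is an assumed name, not part of the original description; a real ‘cnfilter’ covering all common nouns, adjectives, adverbs and verbs would be loaded from a full word list.

```python
# Tiny hypothetical samples; real dictionaries would be loaded from word lists.
FILTER = frozenset({"a", "an", "the", "in", "on", "of", "at", "and", "or", "but"})
CNFILTER = frozenset({"game", "great", "quickly", "win", "wins", "winning", "city"})


def word_category(word: str) -> str:
    """Return 'filter', 'cnfilter' or 'other' for a word (case-insensitive)."""
    w = word.lower()
    if w in FILTER:
        return "filter"
    if w in CNFILTER:
        return "cnfilter"
    # Words in neither dictionary, e.g. names of people or events.
    return "other"


if __name__ == "__main__":
    for w in ["The", "wins", "Messi"]:
        print(w, word_category(w))
```

Words that fall into neither dictionary, such as proper nouns, are reported as `other`; how each category is then weighted is left open by the post.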

The sample space is stored in a file, with consecutive tweets separated by an end-of-tweet marker such as ‘%%’. Once the tweets are acquired, the frequency of every word used in the file is computed. The ‘#’ tags (hashtags), the ‘@’ tags (mentions) and any URLs in the tweets are excluded from this count. The resulting list of words and their corresponding frequencies is then stored.
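The counting step can be sketched as follows, assuming ‘%%’ as the separator and a deliberately simple tokenizer (runs of word characters, lower-cased). The function names and the regular expressions are illustrative assumptions; URLs are stripped first so that fragments such as `https` or `co` are never counted, and tokens starting with ‘#’ or ‘@’ are skipped.

```python
import re
from collections import Counter

# Simple, assumed patterns; a production tokenizer would need more care.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def split_tweets(raw: str, sep: str = "%%") -> list[str]:
    """Split the sample-space text into tweets, dropping empty pieces."""
    return [t.strip() for t in raw.split(sep) if t.strip()]


def load_tweets(path: str) -> list[str]:
    """Read a sample-space file (hypothetical layout: tweets separated by '%%')."""
    with open(path, encoding="utf-8") as f:
        return split_tweets(f.read())


def content_words(tweet: str) -> list[str]:
    """Lower-cased words of a tweet, skipping URLs, #hashtags and @mentions."""
    text = URL_RE.sub(" ", tweet)
    return [t.lower() for t in TOKEN_RE.findall(text) if t[0] not in "#@"]


def word_frequencies(tweets: list[str]) -> Counter:
    """Frequency table of every remaining word across the sample space."""
    freq = Counter()
    for tweet in tweets:
        freq.update(content_words(tweet))
    return freq


if __name__ == "__main__":
    raw = "Great final tonight #WorldCup @fifa https://t.co/abc %% What a final! #WorldCup %%"
    freq = word_frequencies(split_tweets(raw))
    print(freq.most_common(3))
```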

The next step is to check how the highest trending hashtag is associated with the other high trending hashtags. Tweets about the same event or person hold useful content and can be assumed to contain more relevant data. Tweets that carry both the highest trending hashtag and another high trending hashtag are therefore counted a second time, and their word frequencies are added to the previously generated frequency table.
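A minimal sketch of this association update follows. It assumes the trends are given as hashtag strings ordered from highest trending down, so `trends[0]` is the highest trending hashtag; `add_association_counts` and the helpers are hypothetical names. A tweet is counted again only if it carries the top hashtag and at least one of the others, and the original table is left unchanged.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokens(tweet: str) -> list[str]:
    """Tokens of a tweet after removing URLs (so URL fragments are not hashtags)."""
    return TOKEN_RE.findall(URL_RE.sub(" ", tweet))


def hashtags(tweet: str) -> set[str]:
    return {t.lower() for t in tokens(tweet) if t.startswith("#")}


def content_words(tweet: str) -> list[str]:
    return [t.lower() for t in tokens(tweet) if t[0] not in "#@"]


def add_association_counts(freq: Counter, tweets: list[str], trends: list[str]) -> Counter:
    """Return a copy of freq in which tweets pairing the top hashtag with
    another trending hashtag are counted a second time."""
    top = trends[0].lower()
    others = {t.lower() for t in trends[1:]}
    updated = Counter(freq)
    for tweet in tweets:
        tags = hashtags(tweet)
        if top in tags and tags & others:
            updated.update(content_words(tweet))
    return updated


if __name__ == "__main__":
    tweets = ["Messi scores #WorldCup #ARG", "Nice weather #WorldCup", "Messi again #ARG"]
    base = Counter(w for t in tweets for w in content_words(t))
    print(add_association_counts(base, tweets, ["#WorldCup", "#ARG"]))
```

Only the first sample tweet carries both hashtags, so only its words (`messi`, `scores`) gain an extra count.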

Once the frequency list is obtained, each word is rated to obtain its weighted score. These weighted scores are used to compute the cumulative score of each tweet, by which the tweets can be ranked according to their content relevance.
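The post does not say how words are rated, so the sketch below uses one plausible weighting, labeled here as an assumption: a word's weighted score is its frequency divided by the largest frequency, multiplied by a factor of 0 for ‘filter’ words, 1 for ‘cnfilter’ words and 2 for anything else (such as names). Taking the cumulative score as the sum over a tweet's words is also an assumption. The weights and the function names are hypothetical, and the small tokenizer is repeated so the sketch runs on its own.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")

# Hypothetical weights; the original post does not specify the rating.
FILTER_WEIGHT = 0.0     # articles, prepositions, conjunctions
CNFILTER_WEIGHT = 1.0   # common nouns, adjectives, adverbs, verbs
OTHER_WEIGHT = 2.0      # words in neither dictionary (e.g. names)


def content_words(tweet: str) -> list[str]:
    """Lower-cased words of a tweet, skipping URLs, #hashtags and @mentions."""
    text = URL_RE.sub(" ", tweet)
    return [t.lower() for t in TOKEN_RE.findall(text) if t[0] not in "#@"]


def weighted_scores(freq: Counter, filter_words: set[str],
                    cnfilter_words: set[str]) -> dict[str, float]:
    """Score = dictionary factor * frequency / largest frequency."""
    max_f = max(freq.values(), default=1)
    scores = {}
    for word, f in freq.items():
        if word in filter_words:
            factor = FILTER_WEIGHT
        elif word in cnfilter_words:
            factor = CNFILTER_WEIGHT
        else:
            factor = OTHER_WEIGHT
        scores[word] = factor * f / max_f
    return scores


def rank_tweets(tweets: list[str], scores: dict[str, float]) -> list[tuple[float, str]]:
    """Cumulative score per tweet (sum over its words), highest first."""
    ranked = [(sum(scores.get(w, 0.0) for w in content_words(t)), t) for t in tweets]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


if __name__ == "__main__":
    filter_words = {"the", "a", "an", "of", "and"}
    cnfilter_words = {"scores", "goal", "weather"}
    tweets = ["Messi scores the goal #WorldCup", "The weather tonight", "Messi"]
    freq = Counter(w for t in tweets for w in content_words(t))
    for score, tweet in rank_tweets(tweets, weighted_scores(freq, filter_words, cnfilter_words)):
        print(f"{score:.2f}  {tweet}")
```

Normalizing by the largest frequency keeps the scores comparable between sample spaces of different sizes; any other monotone weighting could be substituted without changing the ranking step.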

Also proposed is a way to learn from newer tweets about the hashtag and so obtain a more accurate tweet ranking.
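The learning mechanism is not described in the post. One simple interpretation, assumed here purely for illustration, is to fold the words of newly arriving tweets that carry the hashtag into the existing frequency table and then re-rank. The function `learn_from_new_tweets` is a hypothetical name; it returns an updated copy and leaves the old table untouched.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokens(tweet: str) -> list[str]:
    """Tokens of a tweet after removing URLs."""
    return TOKEN_RE.findall(URL_RE.sub(" ", tweet))


def learn_from_new_tweets(freq: Counter, new_tweets: list[str], hashtag: str) -> Counter:
    """Return a copy of freq updated with the words of new tweets carrying hashtag."""
    tag = hashtag.lower()
    updated = Counter(freq)
    for tweet in new_tweets:
        toks = tokens(tweet)
        if tag in (t.lower() for t in toks):
            # Count only ordinary words, as in the original frequency table.
            updated.update(t.lower() for t in toks if t[0] not in "#@")
    return updated


if __name__ == "__main__":
    freq = Counter({"messi": 2})
    new = ["Messi hat-trick #WorldCup", "Unrelated post"]
    print(learn_from_new_tweets(freq, new, "#WorldCup"))
```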
