Vidushi Gupta

Analyzing Reactions on Political Issues in Social Media Using Hierarchical and K-Means Clustering Methods

Section-wise summary of the research paper titled "Analyzing Reactions on Political Issues in Social Media Using Hierarchical and K-Means Clustering Methods" by Edi Irawan, Teddy Mantoro, Media Anugerah Ayu, M. Agni Catur Bhakti and I Komang Yogi Trisna Permana.

Abstract

  • The authors note that social media usage grows every day.
  • Twitter allows people to post their opinions and thoughts in the form of tweets. This makes it a valuable source of data.
  • Twitter data has many dimensions, which makes it difficult to cluster. This creates a need to find the most appropriate method for clustering it.
  • The paper clusters the data based on the most common words that users use when reacting to a political issue.
  • The study shows a comparison between hierarchical and k-means clustering.

Introduction

  • Social media posts can be facts, opinions or news.
  • The number of Twitter users is increasing worldwide, which the authors argue makes Twitter a highly suitable data source. It provides textual content from a large number of users.
  • Political issues are among the important topics discussed by Twitter users. To monitor a trend, data is scraped from Twitter and analyzed using statistics and analytics.
  • Twitter data should not be clustered as-is, because the dataset:
    • Is unorganized
    • Has too many dimensions
    • Contains noise
  • Reasons for choosing clustering for this study:
    • Target variables are not set
    • Target variables are not labelled
    • Target variables can be identified using unsupervised learning algorithms, i.e. clustering
  • Clustering is based on the most common similar words tweeted by users
  • Software used: RStudio IDE
  • Language used: R
  • Packages used: twitteR, tm

Literature Review

Twitter as Valuable and Significant Dataset Source

  • Uses of Twitter data:
    • To cluster data to find main topics of interest
    • To compare two clustering methods on text
    • To cluster users for market research
    • To analyze trends of a certain event
    • To perform sentiment analysis

Hierarchical Clustering and K-Means as Effective Clustering Methods

  • Hierarchical clustering:

    • Uses Euclidean distance as the distance measure
    • To choose the most appropriate number of clusters, use variants of cluster validity measures and statistical model selection.
  • K-means:

    • Most common clustering method for big data
    • Can be used for large datasets.
    • Accuracy depends on the initialization algorithm used
    • Clustering speed can be increased by improving the algorithm
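
To make the K-means points above concrete, here is a minimal sketch in Python using only the standard library (the paper itself works in R, so this is not the authors' code). The function names `kmeans` and `euclidean`, the `seed` parameter and the sample points are all hypothetical. The centroids are initialized naively by picking random rows, which is exactly the kind of step the initialization algorithms mentioned above aim to improve.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means: returns (labels, centroids).

    Assumption: naive initialization that picks k rows at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up 2-D points forming two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The label values are arbitrary cluster indices, so only the grouping is meaningful, not which group is called 0 or 1.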

Methodology

  1. Acquire data from Twitter using the Twitter API.
  2. Keywords used: Trump Impeachment Inquiry.
  3. Preprocess the data obtained. Deal with noise such as mixed letter case, punctuation, numbers and stop words.
  4. Cluster the data using hierarchical clustering and K-means, and extract the main topics:
    • Read the text file with raw data
    • Clean and pre-process to form the document matrix
    • Compute the Euclidean distance between elements

Euclidean distance formula
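
For reference, the standard definition of the Euclidean distance between two rows p and q of the document matrix, each with n term counts, is given below. The symbols p, q and n are assumed notation and may differ from the paper's.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```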

    • Visualize the important words to spot trivial words that can be removed because they do not matter for the analysis
    • Form clusters
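
The preprocessing and distance steps above can be sketched as follows. This is an illustrative Python version using only the standard library, not the paper's R code (which uses the tm package). The helper names `preprocess` and `document_term_matrix`, the tiny `STOP_WORDS` set and the three sample tweets are all invented for the example.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, far shorter than a real one.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "on", "for"}


def preprocess(tweet):
    """Lowercase, drop punctuation and digits, then remove stop words."""
    text = tweet.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # also drops non-ASCII letters
    return [w for w in text.split() if w not in STOP_WORDS]


def document_term_matrix(token_lists):
    """Return (vocabulary, rows), one row of term counts per document."""
    vocab = sorted({w for tokens in token_lists for w in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Invented sample tweets, not data from the paper.
tweets = [
    "The impeachment inquiry is on the news!",
    "Impeachment inquiry news, 2019",
    "Senate votes on the inquiry",
]
tokens = [preprocess(t) for t in tweets]
vocab, matrix = document_term_matrix(tokens)

# Pairwise distances between documents.
distances = [[euclidean(r1, r2) for r2 in matrix] for r1 in matrix]

# Term ranking across all tweets, useful for spotting trivial words.
term_totals = Counter(w for doc in tokens for w in doc)
print(vocab)
print(term_totals.most_common(3))
```

A pairwise distance matrix like `distances` is what a hierarchical clustering routine would take as input, while K-means would work on the rows of `matrix` directly.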

Conclusion

  • Clustering can be done using hierarchical and K-means clustering.
  • Good clusters are produced for K-means with 5 centroids.
  • The main topics are determined by ranking the terms used by users, so that the most relevant results are shown.

Reference

E. Irawan, T. Mantoro, M. A. Ayu, M. A. Catur Bhakti and I. K. Y. T. Permana, "Analyzing Reactions on Political Issues in Social Media Using Hierarchical and K-Means Clustering Methods," 2020 6th International Conference on Computing Engineering and Design (ICCED), 2020, pp. 1-5, doi: 10.1109/ICCED51276.2020.9415839.
