Hi, I'm Vini! Nice to meet you! I'm from Brazil and I am soon graduating from Minerva Schools at KGI, an American university whose undergraduate program includes living in seven different countries across the world! For my senior thesis, I turned a critical eye to the reproducibility of the nascent literature on hate speech classification and provided an application of transfer learning for this classification task. You can find my official-looking abstract below!
In recent years, the spread of hate speech has attracted the attention of academia, governments, and industry alike in response to its dire consequences, online and otherwise. For example, the United Nations Human Rights Council (2018) implicated social media platforms in contributing to the genocide in Myanmar by allowing the systematic dissemination of hatred against Rohingya Muslims. The sheer scale at which online content is generated makes automatic moderation mechanisms, such as the classification and removal of hate speech, a necessity. The first contribution of this research project is a critical evaluation of the state of the art in the hate speech classification literature, carried out by replicating and analyzing three prominent papers. The second contribution is a blueprint for a generalizable training strategy for a hate speech classifier that incorporates transfer learning strategies in Natural Language Processing (NLP).
A Critical Evaluation of Hate Speech Classifiers
This repository contains replication and auxiliary resources for my senior thesis, which is available here. My goal was to replicate three prominent papers in the field of hate speech classification and provide an application of transfer learning via ULMFiT to this task.
Table of Contents
- Data Preprocessing
- Training and Testing Classifier
- Data Augmentation
- Augmented ULMFiT
- CV Augmented ULMFiT
99. Early Implementation with Class Breakdown
1. Data Preprocessing.
Focuses on showing text data preprocessing step by step. It makes use of helper functions in utils.py, available in this repo. In later notebooks, the data preprocessing is largely hidden under the hood because fast.ai's API handles it.
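To give a feel for what step-by-step text preprocessing can look like, here is a minimal sketch. It is not the code in utils.py: the function names (clean_text, tokenize), the placeholder tokens, and the choice of steps are all assumptions for illustration, written with short social-media-style posts in mind.

```python
import re

# Hypothetical patterns; the real helpers in utils.py may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize one post (illustrative only, not the thesis code)."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)        # replace links with a placeholder
    text = MENTION_RE.sub(" <user> ", text)   # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(r"\1", text)        # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z0-9<>'\s]", " ", text)  # drop remaining punctuation
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse repeated whitespace


def tokenize(text: str) -> list:
    """Split the cleaned text on whitespace."""
    return clean_text(text).split()


if __name__ == "__main__":
    print(tokenize("Check THIS out @someone: https://example.com #NLP!!"))
```

Replacing links and mentions with placeholder tokens, rather than deleting them, keeps the signal that a link or a mention was present without letting the vocabulary grow with every unique URL or username.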
2. Training and Testing Classifier.
Aims to replicate Hemker (2018) and makes heavy use of helper functions in classifier_utils.py. The notebook demonstrates that the parameters reported by Hemker (2018) do not yield a functional classifier.
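This README does not spell out how the replicated classifier fails. One common failure mode on imbalanced data, assumed here purely for illustration, is a model that collapses to predicting the majority class and still reports decent accuracy. The sketch below shows a simple per-class recall check that would expose this; the function names and the labels "hate" and "none" are hypothetical and are not taken from classifier_utils.py.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def is_degenerate(y_true, y_pred):
    """True when at least one class in y_true is never predicted correctly."""
    return any(r == 0.0 for r in per_class_recall(y_true, y_pred).values())


if __name__ == "__main__":
    # Hypothetical labels: a model that always answers "none".
    y_true = ["hate", "none", "none", "none"]
    y_pred = ["none", "none", "none", "none"]

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)                          # 0.75
    print(per_class_recall(y_true, y_pred))  # {'hate': 0.0, 'none': 1.0}
    print(is_degenerate(y_true, y_pred))     # True
```

In this toy case the accuracy of 0.75 looks acceptable on its own, while the per-class breakdown shows that no hateful example is ever caught.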
I have a very short slide deck with takeaways here. Generally, it's just scary how fragile research is when you take a very close look and try to replicate it. I changed my project slightly midway because my original goal of building extensions to the classifiers I investigated became a bit futile once they failed to replicate. On the other hand, I truly enjoyed learning more about natural language processing and deep learning through this project, especially given my deep interest in the topic of hate speech.