DEV Community

Lawrence Juma


Detecting Fake News with Python and Machine Learning

1. Introduction

 
We consume a lot of news throughout the day from various media, but it can be difficult to tell which news is fake and which is genuine.

Do you trust the information you get from online media?
Not every piece of news we hear is true. If we listen to or read fake news, we gather incorrect information about the world. This can affect society, because a person's perspective or opinions can change after consuming fake news that they believe to be true.

Since not all the news we encounter in our daily lives is authentic, how do we determine whether a piece of news is fake or real?

Here, we will concentrate on text-based news and build a model that helps us determine whether a given piece of news is fake or real.

2. Terminology used in this article

2.1 What exactly is fake news?

Fake news refers to news that is likely to be a hoax and is widely disseminated via social media and other online media. It is typically spread to push certain ideas and is frequently tied to political agendas.
Such news items may contain false or exaggerated claims, and they may end up being made viral by algorithms, trapping users in a filter bubble.

2.2 What exactly is a TfidfVectorizer?

Term Frequency (TF): The term frequency of a word is the number of times it appears in a document. A higher value indicates that the term appears more often than others, so the document is a good match when the term is part of the search terms.
Inverse Document Frequency (IDF): Words that appear frequently in one document but also appear frequently in many others may be irrelevant. IDF is a measure of how significant a term is across the entire corpus.
The TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.
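
To make the two definitions concrete, here is a minimal sketch that computes TF-IDF weights by hand with the textbook formula `tf * log(N / df)`. The function name `tf_idf` and the sample sentences are made up for illustration. sklearn's `TfidfVectorizer` uses a smoothed IDF and normalizes each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        term_counts = Counter(tokens)  # raw term frequency per document
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical mini-corpus, for illustration only.
docs = [
    "the election results were announced",
    "the aliens rigged the election",
    "the weather is sunny",
]
scores = tf_idf(docs)
for term, weight in sorted(scores[1].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

A word such as "the", which occurs in every document, gets a weight of zero, while a word that occurs in only one document gets the highest weight.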

2.3 What exactly is a Passive Aggressive Classifier?

Passive Aggressive algorithms are online learning algorithms. Such an algorithm remains passive when a classification outcome is correct and becomes aggressive in the event of a misclassification, updating and adjusting its weights. It does not converge, unlike most other algorithms. Its goal is to make updates that correct the loss while causing very little change in the norm of the weight vector.
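
The passive and aggressive behaviour described above can be sketched as a single update step. This is a simplified illustration of the commonly published PA-I rule for labels of +1 and -1, not sklearn's actual implementation. The function name `pa_update` and the parameter `C` (a cap on the step size) are assumptions made for this example.

```python
def pa_update(weights, features, label, C=1.0):
    """Apply one Passive Aggressive (PA-I) step; label must be +1 or -1."""
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss on this single example
    if loss == 0.0:
        # Passive: the example is classified correctly with enough margin.
        return weights
    norm_sq = sum(x * x for x in features)
    if norm_sq == 0.0:
        # An all-zero feature vector gives nothing to update on.
        return weights
    # Aggressive: take the smallest step that corrects the loss, capped by C.
    tau = min(C, loss / norm_sq)
    return [w + tau * label * x for w, x in zip(weights, features)]


start = [0.0, 0.0]
after_mistake = pa_update(start, [1.0, 0.0], label=1)
after_correct = pa_update(after_mistake, [1.0, 0.0], label=1)
print(after_mistake, after_correct)
```

Note that in this formulation the weights also change when a prediction is correct but its margin is below 1, which is slightly stricter than updating only on outright mistakes.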

3. Project

 
We need to build a machine learning model that accurately classifies a collection of news items as real or fake.
To do this, we will create a Python project using 'sklearn' and apply 'TfidfVectorizer' to our news data, which we will gather from online media.
After this first step, we will initialize the classifier, then transform the data and fit the model. Finally, we will measure the model's performance using appropriate performance metrics. Once we have calculated these metrics, we will be able to see how well our model performs.
Applying these tools in practice is quite simple, as this article demonstrates step by step.
Let's start.
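
As a rough preview, the steps outlined above could look like the following sketch. The eight sentences and their labels are invented stand-ins for the real dataset, and the parameter values (`test_size`, `max_iter`, `random_state`) are illustrative choices rather than settings taken from this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Invented toy data: 1 = fake, 0 = real (same convention as the dataset).
texts = [
    "The city council approved the new budget on Monday",
    "The central bank kept interest rates unchanged",
    "Local school opens a new library for students",
    "Heavy rain is expected across the region this weekend",
    "Aliens secretly control the national election results",
    "Miracle pill cures every disease overnight, doctors shocked",
    "Celebrity reveals the moon landing was filmed in a basement",
    "Drinking bleach makes you immune to all viruses",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hold out a quarter of the data for evaluation, keeping both classes present.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Step 1: turn raw text into TF-IDF features (fit on training data only).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_train = vectorizer.fit_transform(x_train)
tfidf_test = vectorizer.transform(x_test)

# Step 2: initialize and fit the classifier.
classifier = PassiveAggressiveClassifier(max_iter=50, random_state=42)
classifier.fit(tfidf_train, y_train)

# Step 3: evaluate with accuracy and a confusion matrix.
predictions = classifier.predict(tfidf_test)
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
print(f"Accuracy: {accuracy:.2f}")
print(matrix)
```

With so few examples the scores are not meaningful. The point is only the order of the steps: split, vectorize, fit, then evaluate.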

3.1 Data Analysis

This section describes the dataset.
This Python project uses a dataset in CSV format. The dataset contains ***** rows and *** columns.
The dataset has four columns, three of which are described below:
1. Title: the title of the news item.
2. Text: the body of the news item.
3. Label: a binary column indicating whether the news is fake (1) or real (0).
The dataset is open source and can be found here
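
Loading and inspecting a CSV with this layout might look like the sketch below. Because the actual file is not named here, the example reads a two-row stand-in from memory. The column names `Title`, `Text`, and `Label` follow the description above, but their exact spelling in the real file is an assumption.

```python
import io

import pandas as pd

# In-memory stand-in for the real CSV file; replace it with the file path.
sample_csv = io.StringIO(
    "Title,Text,Label\n"
    "Budget approved,The council approved the annual budget on Monday.,0\n"
    "Miracle cure found,Doctors hate this one secret trick.,1\n"
)

df = pd.read_csv(sample_csv)
print(df.shape)                    # (rows, columns)
print(df["Label"].value_counts())  # how many fake (1) and real (0) items

# Separate the input text from the target labels.
features = df["Text"]
labels = df["Label"]
```

Checking the shape and the label balance first is a quick way to confirm that the file was parsed as expected before any model is trained.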

3.2 Libraries

This project relies on basic data science libraries such as sklearn, pandas, and NumPy, along with some more specialized libraries such as transformers.
