DEV Community

Chris Mwalimo

Detecting Fake News with Python and Machine Learning

Hello There!

We keep up with current events through many channels, but they all lead back to one thing: news. News tells us what we need to know, but not every story is legitimate or trustworthy. Anyone can spread false information, and that can cause real confusion and needless panic. Don't worry, though: Python comes to the rescue!

A fake news prediction model can be built with Python and machine learning. The model is trained on a labeled dataset and then predicts whether a given news article is real or fake. So get ready, and let's begin coding.

Fake News Prediction Model with Python
Before beginning, ensure you have the following libraries, which the model needs:

  • scikit-learn (imported as sklearn)
  • numpy
  • pandas
  • itertools (part of Python's standard library, so there is nothing to install)

You can check which packages are installed from Command Prompt (Windows) or Terminal (macOS):

pip list
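If you would rather check from Python than read through the `pip list` output, a minimal sketch along these lines reports which modules can be imported. The helper name `check_modules` is hypothetical and not part of any library.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# scikit-learn is imported as "sklearn"; itertools ships with Python itself.
for name, found in check_modules(["sklearn", "numpy", "pandas", "itertools"]).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Anything reported as missing, apart from standard-library modules, needs to be installed with pip.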

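If any of the packages are missing, install them by running this command in Command Prompt or Terminal. The package is named scikit-learn on PyPI, even though it is imported as sklearn. If the installation fails with a permissions error, try running Command Prompt as Administrator:

pip install scikit-learn numpy pandas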

Once everything is installed, open a new Jupyter Notebook in your code editor and let's get coding!

Opening a new Jupyter Notebook on VSCode

  1. Press Ctrl + Shift + P to open the Command Palette.
  2. Type "Jupyter Notebook" and the command Create: New Jupyter Notebook will appear. Press Enter.
  3. A new Jupyter Notebook with the file extension .ipynb will be created. Now you can work on the prediction model with less hassle.

1. Libraries

With the libraries installed, we can import what we need:

import pandas as pd
import numpy as np
import itertools
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
  • Scikit-learn (sklearn): a machine learning library for Python. We use it to split the data, extract features, train the classifier and evaluate its predictions.
  • Pandas: a Python library for data analysis and manipulation.
  • Itertools: a standard-library module with tools for building and combining iterators, which can be stepped through with a for-loop.
  • Numpy: a Python library for scientific computing.

2. Reading dataset

Using pandas, we can read data from a .csv file and have our dataset ready:

df = pd.read_csv('magazines.csv')

After reading the file containing the required dataset, we can get the first five rows of the dataset by using head() and get the number of rows and columns as a tuple using shape:

print(df.head())  # first five rows
print(df.shape)   # (number of rows, number of columns)
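The contents of magazines.csv are not shown here, so the following is only a sketch using a tiny in-memory stand-in. The column names `title`, `text` and `label` are assumptions; the later steps only rely on `text` and `label`. It shows the same read and inspect calls, plus an early check that the needed columns exist.

```python
import io

import pandas as pd

# Hypothetical stand-in for magazines.csv: the real file's columns are assumed
# to include at least "text" and "label", as the later steps rely on them.
csv_data = io.StringIO(
    "title,text,label\n"
    "Story A,The council approved the new budget on Monday.,REAL\n"
    "Story B,Aliens secretly replaced the entire city council.,FAKE\n"
    "Story C,The central bank kept interest rates unchanged.,REAL\n"
)

df = pd.read_csv(csv_data)

print(df.head())  # head() shows up to five rows, so all three appear here
print(df.shape)   # (rows, columns)

# Fail early if a column that later steps depend on is missing.
missing = {"text", "label"} - set(df.columns)
if missing:
    raise ValueError(f"dataset is missing columns: {sorted(missing)}")
```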

The labels, REAL and FAKE, are what the model will learn to predict, so we extract them:

labels = df.label
labels.head()
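Before training, it can help to see how many articles fall into each class, since an accuracy figure is easier to interpret when REAL and FAKE are roughly balanced. A small sketch, using made-up labels in place of the real `df.label`:

```python
import pandas as pd

# Hypothetical labels standing in for df.label from the real dataset.
labels = pd.Series(["REAL", "FAKE", "REAL", "REAL", "FAKE"], name="label")

counts = labels.value_counts()                # absolute count per class
shares = labels.value_counts(normalize=True)  # fraction per class

print(counts)
print(shares.round(2))
```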

3. Splitting dataset

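After getting the labels from our dataframe, the next step is splitting the data into training and testing sets using train_test_split. Here 20% of the rows are held out for testing, and random_state fixes the shuffle so the split is reproducible: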

x_train, x_test, y_train, y_test = train_test_split(df['text'], labels, test_size=0.2, random_state=7)
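To see what the split does, here is a sketch on ten toy items. The data is made up, and the `stratify` argument is an optional extra that the call above does not use; it keeps the REAL/FAKE ratio the same in both sets.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: ten short "articles" with alternating labels.
texts = [f"article {i}" for i in range(10)]
labels = ["REAL", "FAKE"] * 5

x_train, x_test, y_train, y_test = train_test_split(
    texts,
    labels,
    test_size=0.2,    # 20% of the rows are held out for testing
    random_state=7,   # fixed seed so the split is reproducible
    stratify=labels,  # optional: keep the REAL/FAKE ratio equal in both sets
)

print(len(x_train), len(x_test))
```

With ten items and test_size=0.2, eight end up in the training set and two in the test set.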

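4. Initializing TfidfVectorizer

TfidfVectorizer converts a collection of raw documents into a matrix of term frequency (TF)-inverse document frequency (IDF) features. The stop_words setting drops common English words, and the max_df setting of 0.7 ignores terms that appear in more than 70% of the documents.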

tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
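To get a feel for what TF-IDF weighting does, here is a rough pure-Python sketch. It assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalization, which is scikit-learn's default behaviour, but it simplifies tokenization to a lowercase whitespace split and skips stop-word removal. The function name `tfidf_rows` is hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(documents):
    """Compute L2-normalized TF-IDF weights, one dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency: rarer terms get larger values.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights = {term: count * idf[term] for term, count in tf.items()}
        # Fall back to 1.0 for an empty document to avoid dividing by zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


docs = ["economy grows again", "economy shrinks", "aliens land"]
for row in tfidf_rows(docs):
    print({term: round(weight, 3) for term, weight in row.items()})
```

In the first document, "economy" gets a lower weight than "grows" because it also appears in the second document.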

After initializing tfidf_vectorizer, fit it on the training set and transform that set in one step with fit_transform(), then apply the same fitted vectorizer to the test set with transform():

tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
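The reason for calling transform() rather than fit_transform() on the test set is that the vocabulary and IDF values should come from the training data only. A small sketch with made-up texts shows the effect: words that appear only in the test texts are ignored, and both matrices end up with the same number of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpora standing in for x_train and x_test.
train_texts = ["economy grows again", "aliens land downtown"]
test_texts = ["economy shrinks", "unicorns land"]

vectorizer = TfidfVectorizer()
tfidf_train = vectorizer.fit_transform(train_texts)  # learns vocabulary and IDF
tfidf_test = vectorizer.transform(test_texts)        # reuses them unchanged

print(sorted(vectorizer.vocabulary_))  # only words seen in the training texts
print(tfidf_train.shape, tfidf_test.shape)
```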

5. Initializing PassiveAggressiveClassifier

PassiveAggressiveClassifier is an online learning algorithm that stays passive when a sample is classified correctly (with enough margin) and turns aggressive on a misclassification, updating its weights to correct the mistake.
After initializing the classifier, fit it on the training set and predict labels for the test set.

pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)

y_pred = pac.predict(tfidf_test)
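To make the "passive" and "aggressive" behaviour concrete, here is a minimal sketch of a single update step in the PA-I variant of the algorithm, with labels encoded as +1 and -1. It is an illustration of the idea, not scikit-learn's internal code; the function name `pa_update` is hypothetical, and `C` is assumed to act as a cap on the step size.

```python
def pa_update(weights, features, label, C=1.0):
    """Apply one PA-I update; label must be +1 or -1."""
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return list(weights)       # passive: correct with enough margin
    sq_norm = sum(x * x for x in features)
    if sq_norm == 0.0:
        return list(weights)       # an all-zero example carries no information
    tau = min(C, loss / sq_norm)   # aggressive: step size, capped at C
    return [w + tau * label * x for w, x in zip(weights, features)]


w = [0.0, 0.0]
w = pa_update(w, [1.0, 0.0], +1)  # margin is 0, so the weights move
print(w)
w = pa_update(w, [2.0, 0.0], +1)  # margin is now 2, so the weights stay put
print(w)
```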

6. Calculating accuracy score of the prediction model and confusion matrix

The accuracy score is the fraction of test articles the model labeled correctly, while the confusion matrix breaks the predictions down into true and false positives and negatives.

score = accuracy_score(y_test, y_pred)
print(f"Accuracy: {round(score*100, 2)}%")

# Rows are true labels, columns are predicted labels, in the order FAKE, REAL.
cm = confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL'])
print(cm)
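To see how the two metrics relate, here is a small pure-Python sketch that builds the same kind of matrix by hand, with rows for the true label and columns for the predicted label. The predictions are made up for illustration and are not results from the dataset; the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred, labels):
    """Return a matrix where row i is the true label and column j the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical predictions, not results from the article's dataset.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]

matrix = confusion_counts(y_true, y_pred, labels=["FAKE", "REAL"])

# Accuracy is the sum of the diagonal (correct predictions) over all samples.
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(y_true)

print(matrix)    # [[1, 1], [1, 2]]
print(accuracy)  # 0.6
```

The diagonal holds the correct predictions, so one FAKE and two REAL articles were classified correctly out of five.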

Conclusion:
Congratulations! You have built a prediction model that classifies news as real or fake. There is a lot packed into this model, so take your time to go through the code, learn something new, and get to know some of the algorithms and functions used in machine learning.
Happy coding!
