DEV Community

Jason Omondi



Detecting Fake News with Python and Machine Learning.

Hi 👋! Nice to meet you, devs 😄. In this article we will look at how Python can be used to detect fake news. It walks you through building a machine learning model with Python and natural language processing (NLP) that classifies news articles as real or fake.
You should be familiar with Python fundamentals, basic data structures and algorithms, libraries such as pandas, NumPy and scikit-learn, and data-analysis methods such as the Passive Aggressive Classifier.

Key Terms

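• TfidfVectorizer: the scikit-learn class that turns a collection of raw documents into a matrix of TF-IDF features
• IDF (Inverse Document Frequency): a weight that is low for terms appearing in many documents and high for rare ones
• TF (Term Frequency): how often a term occurs in a given document
• Passive Aggressive Classifier: an online linear classifier that leaves its weights unchanged when an example is classified correctly with enough margin (passive) and updates them just enough to fix the error otherwise (aggressive)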

Preparing our dataset and work environment
First, install a supported version of Python 3. You can download the installer from python.org; the exact installation steps depend on your operating system. Then install the libraries used in this article (pandas, NumPy and scikit-learn), for example by running pip install pandas numpy scikit-learn.
Acquiring a suitable dataset is one of the most crucial parts of any data-science project. Fake news datasets are available on the Kaggle website.


Now that Python, the libraries and the dataset are in place, launch your editor (Jupyter Notebook works well) and import the libraries you need. We can then view the first four records. Inspecting them shows that the dataset has the following columns: id, title, author, text and label. The columns of interest are text and label: text holds the contents of the article, while label indicates whether the article is factual. The label is already encoded in binary form, using ‘1’ (‘REAL’) and ‘0’ (‘FAKE’).
Before training, split the data into training and test sets and convert the article text into TF-IDF features with TfidfVectorizer. This produces the feature matrices tfidf_train and tfidf_test together with the labels y_train and y_test.
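The loading, splitting and vectorizing steps described above can be sketched as follows, assuming pandas and scikit-learn. The article does not name the dataset file, so the sketch uses a small hypothetical in-memory CSV with the same columns (id, title, author, text, label). The split settings (test_size=0.2, random_state=7), the vectorizer settings (stop_words="english", max_df=0.7) and the name tfidf_vectorizer are assumptions; the remaining variable names match the code used later.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Kaggle CSV; with the real file you would call
# pd.read_csv("path/to/your_file.csv") instead (the path is an assumption).
csv_data = io.StringIO(
    """id,title,author,text,label
0,Budget vote,A. Smith,The senate passed the annual budget after a long debate,1
1,Miracle cure,Anon,Doctors hate this one weird trick that cures every disease overnight,0
2,Rate decision,B. Lee,The central bank kept interest rates unchanged this quarter,1
3,Moon base,Anon,Secret moon base confirmed by anonymous insider sources,0
4,School funding,C. Diaz,The city council approved new funding for public schools,1
5,Celebrity clone,Anon,Famous actor replaced by a clone according to viral post,0
6,Trade report,D. Kim,Exports rose slightly last month according to official trade data,1
7,Weather control,Anon,Government admits it controls the weather with hidden towers,0
8,Election results,E. Park,Officials certified the election results on Tuesday,1
9,Water hoax,Anon,Drinking tap water makes you invisible claims shocking report,0
"""
)
df = pd.read_csv(csv_data)

# View the first four records
print(df.head(4))

# Keep only what we need: the article text (features) and the label (target)
texts = df["text"].fillna("")  # guard against articles with missing text
labels = df["label"]

# Hold out part of the data for testing
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=7
)

# Learn the vocabulary and IDF weights on the training text only,
# then apply the same fitted transformation to the test text
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
```

Fitting the vectorizer on the training split only keeps information from the test articles out of the features the model learns from.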
Next, we initialize a PassiveAggressiveClassifier and fit it on the training features (tfidf_train) and the training labels (y_train).

# Initialize the PassiveAggressiveClassifier and fit it on the training set
from sklearn.linear_model import PassiveAggressiveClassifier

pa_classifier = PassiveAggressiveClassifier(max_iter=50)
pa_classifier.fit(tfidf_train, y_train)
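To show roughly what fit does under the hood, here is a minimal from-scratch sketch of the PA-I update rule on dense NumPy arrays. It assumes labels mapped to -1/+1 and no intercept. scikit-learn documents its default loss="hinge" as equivalent to PA-I, but its implementation also fits an intercept and shuffles the data by default, so its weights will not match this sketch exactly. The function names pa1_fit and pa1_predict and the toy data are hypothetical; n_epochs plays the role of max_iter (passes over the training data).

```python
import numpy as np


def pa1_fit(X, y, C=1.0, n_epochs=5):
    """Train a linear PA-I classifier without intercept; labels must be -1 or +1."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):  # one epoch = one pass over the training data
        for x_i, y_i in zip(X, y):
            # Hinge loss is zero when the example is correct with a margin of at least 1
            loss = max(0.0, 1.0 - y_i * np.dot(w, x_i))
            sq_norm = np.dot(x_i, x_i)
            if loss == 0.0 or sq_norm == 0.0:
                continue  # passive: keep the weights unchanged
            # Aggressive: the step that would make the margin exactly 1, capped at C (PA-I)
            tau = min(C, loss / sq_norm)
            w = w + tau * y_i * x_i
    return w


def pa1_predict(w, X):
    """Predict +1 or -1 from the sign of the linear score."""
    return np.where(X @ w >= 0, 1, -1)


# Hypothetical toy data: two linearly separable groups of 2-D feature vectors
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = pa1_fit(X, y)
print(w, pa1_predict(w, X))
```

The cap C limits how far a single noisy or mislabeled article can drag the weights, which is why PA-I is usually preferred over the uncapped update on real-world text.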

Finally, we use the trained classifier to predict whether each article in the test set (already transformed by the vectorizer) is reliable, and then calculate the model’s accuracy.

# Predict on the test set and calculate accuracy
from sklearn.metrics import accuracy_score

y_pred = pa_classifier.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score * 100, 2)} %')
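Classifying a brand-new article works the same way: pass its text through the already fitted vectorizer's transform method, then call predict. The self-contained sketch below uses a hypothetical six-article training set and hypothetical names (vectorizer, classifier, new_articles); random_state=0 is an assumption added for reproducibility.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy training data (1 = REAL, 0 = FAKE, following the article's encoding)
train_texts = [
    "The senate passed the annual budget after a long debate",
    "The central bank kept interest rates unchanged this quarter",
    "Officials certified the election results on Tuesday",
    "Secret moon base confirmed by anonymous insider sources",
    "Doctors hate this one weird trick that cures every disease overnight",
    "Drinking tap water makes you invisible claims shocking report",
]
train_labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(train_texts)

classifier = PassiveAggressiveClassifier(max_iter=50, random_state=0)
classifier.fit(features, train_labels)

# New text must go through the already fitted vectorizer (transform, not fit_transform)
new_articles = ["Anonymous insider sources confirm a secret base on the moon"]
new_features = vectorizer.transform(new_articles)
prediction = classifier.predict(new_features)
print("REAL" if prediction[0] == 1 else "FAKE")
```

Calling fit_transform on the new text instead would learn a different vocabulary, so the feature columns would no longer line up with the weights the classifier learned.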

The printed value is the model’s accuracy on the test set. Accuracy alone, however, does not tell you how many predictions were correct or wrong for each class. A confusion matrix breaks the results down into true and false positives and negatives, and it can be built as follows:

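# Build the confusion matrix (rows: true labels, columns: predicted labels)
from sklearn.metrics import confusion_matrix

# Labels follow the dataset's encoding: 0 = FAKE, 1 = REAL
confusion_matrix(y_test, y_pred, labels=[0, 1])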

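Reading the matrix is easier once its cells are unpacked. The sketch below uses hypothetical true and predicted labels to show how scikit-learn lays the matrix out for labels=[0, 1], how ravel() gives the four counts, and how accuracy follows from them as the correct predictions (the diagonal) divided by all predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (0 = FAKE, 1 = REAL, as in the article)
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Rows are true labels and columns are predicted labels, in the order given by `labels`
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# With labels=[0, 1] and REAL (1) as the positive class, ravel() yields tn, fp, fn, tp:
# tn = fake articles predicted fake, fp = fake predicted real,
# fn = real predicted fake,        tp = real predicted real
tn, fp, fn, tp = cm.ravel()
print(f"Correct: {tn + tp}, wrong: {fp + fn}")

# Accuracy is the share of correct predictions: the diagonal divided by the total
accuracy = (tn + tp) / cm.sum()
print(accuracy, accuracy_score(y_test, y_pred))
```

For a fake news detector the false positives (fake articles passed off as real) are often the costliest error, which is exactly what the accuracy figure alone hides.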

Models like this one need to be developed further if they are to combat misinformation effectively in these difficult times.
