Hi! Nice to meet you, devs. In this article, we will look at how to use Python to detect fake news. It walks you through building a machine learning model with Python and natural language processing (NLP) that classifies news articles as real or fake.
To follow along, you should know the fundamentals of Python, basic data structures and algorithms, and libraries such as pandas and NumPy. Familiarity with classification methods such as the Passive Aggressive Classifier, which we use here through scikit-learn, will also help.
Key Terms
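• TfidfVectorizer: the scikit-learn class that converts a collection of raw documents into a matrix of TF-IDF features
• IDF (Inverse Document Frequency): a weight that is higher for words that appear in fewer documents, so very common words count for less
• TF (Term Frequency): how often a word appears in a given document
• Passive Aggressive Classifier: an online learning algorithm that leaves the model unchanged when an example is classified correctly with enough margin (passive) and otherwise updates it just enough to correct the error (aggressive)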
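To make TF, IDF, and TF-IDF concrete, here is a minimal pure-Python sketch of the weighting that scikit-learn's TfidfVectorizer applies with its default settings: raw term counts as TF, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalization of each document vector. It assumes simple whitespace tokenization, which is cruder than the vectorizer's real tokenizer, and the two sample sentences are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF vectors like TfidfVectorizer's defaults:
    raw counts, smoothed IDF, then L2 normalization per document.
    Tokenization here is a simplified lowercase whitespace split."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = {term: sum(term in tokens for tokens in tokenized) for term in vocab}
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so rarer terms weigh more
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # TF: raw count of each term in this document
        row = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        vectors.append([v / norm for v in row])  # L2-normalize the row
    return vocab, vectors


docs = ["fake news spreads fast", "real news spreads slowly"]
vocab, vectors = tfidf(docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term:8s} {weight:.3f}")
```

Words that appear in both sentences ("news", "spreads") end up with lower weights than words unique to one sentence ("fake", "fast"), which is exactly the effect IDF is meant to have.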
Preparing our dataset and work environment
First, install a supported version of Python 3. You can download the installer for your operating system from python.org and run it; the exact steps vary by operating system. Then install the libraries used in this article with pip, for example: pip install numpy pandas scikit-learn
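A quick way to confirm the environment is ready is a small script that checks whether the packages used in this article can be imported. This is a sketch; the package list (numpy, pandas, and scikit-learn, which is imported as sklearn) is an assumption based on the code in this article, and missing_packages is a hypothetical helper.

```python
import importlib.util
import sys

# Packages this tutorial relies on: import name -> pip package name (assumed list)
REQUIRED = {"numpy": "numpy", "pandas": "pandas", "sklearn": "scikit-learn"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```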
Dataset
Acquiring a suitable dataset is one of the most crucial steps in any data science project. Fake news datasets are available on the Kaggle website.
Coding
Now that Python, the libraries, and the dataset are in place, launch your editor (Jupyter Notebook works well), import the libraries you need, and load the dataset. Viewing the first four records shows that the dataset is divided into the following columns: id, title, author, text, and label.
The columns of interest are text and label. The text column contains the body of the article, while the label column indicates whether the article is real or fake. The labels are already encoded in binary form: 1 (REAL) and 0 (FAKE). Before training, split the text and labels into training and test sets, then convert the text into TF-IDF features with TfidfVectorizer; this produces tfidf_train, tfidf_test, y_train, and y_test.
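Here is a sketch of the steps that produce tfidf_train, tfidf_test, y_train, and y_test: select the text and label columns, split them into training and test sets, and vectorize the text. The file name "train.csv", the tiny in-memory DataFrame that stands in for it, the 25% test split, and the vectorizer settings (stop_words="english", max_df=0.7) are all assumptions for illustration, not the author's exact code.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# With the real data you would load the Kaggle CSV instead:
# df = pd.read_csv("train.csv")  # hypothetical file name
# A tiny made-up stand-in with the same columns keeps this sketch runnable.
df = pd.DataFrame({
    "id": range(8),
    "title": [f"title {i}" for i in range(8)],
    "author": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "text": [
        "scientists confirm new vaccine passed clinical trials",
        "aliens secretly control the world banking system",
        "city council approves budget for new public library",
        "miracle fruit cures every disease overnight doctors hate it",
        "central bank raises interest rates by a quarter point",
        "celebrity clone spotted at secret moon base",
        "local team wins regional football championship",
        "drinking bleach makes you immune says anonymous post",
    ],
    "label": [1, 0, 1, 0, 1, 0, 1, 0],  # 1 = REAL, 0 = FAKE
})
print(df.head(4))

# Keep only the columns we need and drop rows with missing text
df = df[["text", "label"]].dropna()

# Hold out 25% of the articles for testing; stratify keeps the
# REAL/FAKE ratio the same in both splits
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.25, random_state=7, stratify=df["label"])

# Learn the vocabulary and IDF weights on the training text only,
# then reuse them to transform the test text
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
print(tfidf_train.shape, tfidf_test.shape)
```

The vectorizer is fitted on the training text only; calling fit_transform on the test set as well would let information from the test data leak into the features.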
Next, initialize a PassiveAggressiveClassifier and fit it on the training features (tfidf_train) and the training labels (y_train).
```python
from sklearn.linear_model import PassiveAggressiveClassifier

# Initialize the PassiveAggressiveClassifier and fit it on the training set
pa_classifier = PassiveAggressiveClassifier(max_iter=50)
pa_classifier.fit(tfidf_train, y_train)
```
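Under the hood, a Passive Aggressive Classifier learns online, one example at a time. The sketch below is a simplified pure-Python version of the PA-I update, which scikit-learn documents as equivalent to its default loss="hinge": if an example is classified correctly with a margin of at least 1, the weights stay as they are (passive); otherwise they move just far enough to remove the hinge loss, with the step size capped by C (aggressive). pa_fit and pa_predict are hypothetical helpers, and unlike scikit-learn's class this sketch uses +1/-1 labels, no intercept, and a fixed number of epochs instead of max_iter with early stopping.

```python
def pa_fit(samples, labels, C=1.0, epochs=5):
    """Simplified PA-I learner (no intercept). labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)  # hinge loss
            sq_norm = sum(xi * xi for xi in x)
            if loss > 0.0 and sq_norm > 0.0:
                # Aggressive: step just far enough to fix the loss, capped by C
                tau = min(C, loss / sq_norm)
                w = [wi + tau * y * xi for wi, xi in zip(w, x)]
            # Passive: margin >= 1 means the weights are left unchanged
    return w


def pa_predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Made-up 2-feature data: +1 = REAL, -1 = FAKE
samples = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels = [1, -1, 1, -1]
w = pa_fit(samples, labels)
print(w, pa_predict(w, [0.8, 0.2]))
```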
Finally, use the trained classifier to predict labels for the vectorized test set (tfidf_test) and calculate the model's accuracy.
```python
from sklearn.metrics import accuracy_score

# Predict on the test set and calculate accuracy
y_pred = pa_classifier.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score * 100, 2)} %')
```
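To classify an article that is not in the dataset, the new text has to go through the same fitted vectorizer before it reaches the classifier. One way to guarantee that is to chain both steps in a scikit-learn pipeline, as in this sketch. The four training sentences and the new headline are made up, so the printed label only illustrates the mechanics and says nothing about real-world accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Made-up stand-in for the training data (1 = REAL, 0 = FAKE)
texts = [
    "scientists publish peer reviewed study on climate data",
    "government announces new infrastructure budget",
    "aliens built the pyramids says secret insider",
    "miracle pill melts fat overnight without diet",
]
labels = [1, 1, 0, 0]

# The pipeline fits the vectorizer during training and only calls
# transform on new text at prediction time
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=50, random_state=0),
)
model.fit(texts, labels)

new_articles = ["secret insider reveals miracle pill"]
prediction = model.predict(new_articles)[0]
print("REAL" if prediction == 1 else "FAKE")
```

Words in the new article that never appeared in training (such as "reveals" here) are simply ignored by the fitted vectorizer.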
The printed value is the percentage of test articles the model classified correctly. Accuracy alone, however, does not tell you how many real and fake articles were classified correctly or incorrectly. A confusion matrix gives you that breakdown, and you can build one like this:
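```python
from sklearn.metrics import confusion_matrix

# Build the confusion matrix; labels must match the dataset's encoding (0 = FAKE, 1 = REAL)
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
```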
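The raw counts in a confusion matrix translate directly into more informative metrics. This pure-Python sketch assumes the layout scikit-learn returns for labels=[0, 1] (rows are true labels, columns are predictions, with 0 = FAKE first) and computes accuracy plus precision and recall for the FAKE class. metrics_from_confusion is a hypothetical helper, and the counts are invented for illustration.

```python
def metrics_from_confusion(cm):
    """cm uses scikit-learn's layout for labels=[0, 1] (0 = FAKE, 1 = REAL):
    [[true FAKE predicted FAKE, true FAKE predicted REAL],
     [true REAL predicted FAKE, true REAL predicted REAL]]"""
    (ff, fr), (rf, rr) = cm
    total = ff + fr + rf + rr
    return {
        # Share of all articles classified correctly
        "accuracy": (ff + rr) / total,
        # Of the articles flagged FAKE, how many really were fake?
        "fake_precision": ff / (ff + rf) if ff + rf else 0.0,
        # Of the truly fake articles, how many did the model catch?
        "fake_recall": ff / (ff + fr) if ff + fr else 0.0,
    }


# Illustrative counts only, not results from the model above
cm = [[90, 10],
      [5, 95]]
m = metrics_from_confusion(cm)
print(m)
```

Recall on the FAKE class is often the number to watch in this task, because it tells you what share of the fake articles the model actually caught.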
Conclusion
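In this article, we converted news text into TF-IDF features, trained a Passive Aggressive Classifier, and evaluated it with accuracy and a confusion matrix. Models like this one need further development to combat misinformation effectively.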