EphraimX

Posted on Jan 29, 2023 • Edited on Jan 8, 2024

NLP: Real and Fake News Detection Using MindsDB and MongoDB.

#discuss #help

Machine learning and artificial intelligence are growing rapidly, and their capabilities keep expanding. OpenAI recently released ChatGPT, a chatbot built on a Generative Pre-trained Transformer (GPT) large language model (LLM) that produces human-like text in response to the questions it is asked. This chatbot is a good example of a natural language processing (NLP) system.

NLP is a branch of artificial intelligence that aims to enable machines to understand human language (hence the term natural language), both spoken and written. As a result of this, our interactions with our everyday devices become easier, and we save time in the process. NLP applications include email spam detection, speech recognition, grammar checkers, chatbots, AI paraphrasers, and a slew of others.

Aside from the applications listed above, NLP can be used to distinguish between fake and real news. According to a study by MIT Sloan professor Sinan Aral and Deb Roy and Soroush Vosoughi of the MIT Media Lab, falsehoods are 70% more likely to be retweeted than the truth and reach the first 1,500 people six times faster. Misinformation has the potential to cause unintended conflicts and to undermine a country's democratic process, particularly during an election.

In this article, you will learn how to distinguish between fake and authentic news using Hugging Face models in MindsDB. MongoDB will also be used as a datasource in this article. If you're not familiar with MongoDB, you can learn more about it here.

What is MindsDB

Photo by Natasha Connell on Unsplash

MindsDB is an in-database machine learning platform that aims to make building and deploying models as simple as possible. Traditionally, creating a machine learning model required several steps, such as exploratory data analysis, feature selection, model training, model validation, and model deployment. MindsDB, on the other hand, offers a quick path from training to deployment.

You can run a SQL statement on your data in any of the supported MindsDB databases and create a predictor to predict the variable of interest. Overall, MindsDB assists organizations and individuals in reducing the time spent from model creation to model deployment.

Hugging Face + MindsDB

To expand its capabilities and enable natural language processing in databases, MindsDB collaborated with Hugging Face, an American AI company whose mission is to democratize machine learning and make it more accessible to both organizations and individuals. The company is well known for its Transformers library and its hub of pre-trained models, which are widely used in natural language processing applications.

Detecting Real And Fake News Using MindsDB and MongoDB

Photo by Brett Jordan on Unsplash

Take a look at the scenario below to get a better idea of what MindsDB can do:

Trust News (TN) is a non-profit organization whose mission it is to debunk false news and assist users in distinguishing between fake and authentic news. To accomplish this, TN software engineers decided to follow the procedure outlined below:

Scrape news titles and content from traditional media outlets as well as social media pages.
Sort through this data with MindsDB to distinguish between fake and real news.
List the titles with a link to the original content, as well as the likelihood that it is real or fake.

Engineers at TN have successfully set up a pipeline to scrape new data on a consistent basis, and they are now moving on to the next stage of the process to distinguish between real and fake news. The scraped data is stored in a MongoDB database. In the following section, you will assist TN engineers in developing a demo system that will distinguish between real and fake news using MindsDB.

Prerequisites

You will need the following items to build this system for TN:

A MongoDB Atlas account. If you are unfamiliar with it, you can learn more about it here.
MongoDB Compass (version 1.33.1 was used for this article).
A MindsDB cloud account. More information is available in the official documentation.
A dataset made up of real and fake news data samples. The data is available on Kaggle here.

Setting Up MongoDB Atlas

To set up MongoDB Atlas:

Make a new project called TN News Detection.
Next, head over to the Database tab and follow the instructions to build a database.
- Click on Build A Database.
- Choose the desired pricing. Choose the FREE plan for this article.
- Select your desired cluster options, and change the cluster name to News-Detection.
- Next, create a user by entering a username and password.
- After that, enter your IP address and click Finish and Close to complete the process.
Then, choose Browse Collections to create a database and insert the appropriate collections.
- Select the Add My Own Data button on the popup that appears.
- Create a database called tn_news_detection and a collection called news_data.

Uploading The Dataset

To add the dataset to the newly created database and collection, follow these steps:

Select the Connect option from the Database tab.
Next, select Connect using MongoDB Compass.
If you do not already have MongoDB Compass installed, you can install it now. Once it is installed, copy the connection string shown in Atlas and open MongoDB Compass.
Select New Connection on your MongoDB Compass, and then paste the copied connection string into the space provided.
Click Advanced Connection Options, then Authentication, and enter your password. After that, click the Connect button.
After connecting successfully, select the tn_news_detection database, followed by the news_data collection.
Import both the True.csv and False.csv files into the collection one after the other.
On a successful import, Compass displays the imported documents in the news_data collection.

Connecting MindsDB To MongoDB

To begin:

Launch MongoDB Compass and open a new connection window.
Connect to your MindsDB cloud by setting the host to cloud.mindsdb.com, port to 27017, username to your MindsDB username/email and password to your MindsDB password.
On successful connection, open up the mongosh terminal below and run the command use mindsdb to make changes to the mindsdb database.

Still on the mongosh terminal, run the following command to insert your Atlas database as a datasource in MindsDB.

db.databases.insertOne({
  name: "mongo_tn_news_detection",
  engine: "mongodb",
  connection_args : {
    "port" : 27017,
    "host" : "mongodb+srv://<username>:<password>@<cluster host/ID>.mongodb.net",
    "database" : "tn_news_detection"
  }
})

You can get the connection host URI from MongoDB Atlas. If successful, you should see the following response:

{ acknowledged: true,
  insertedId: ObjectId("63d41c6a3f016fa32ce93bac") }
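
If your Atlas password contains characters such as @ or /, the connection string will not parse unless they are percent-encoded. The sketch below shows one way to assemble the datasource document with encoded credentials. The helper name buildDatasourceDoc, its parameters, and the placeholder credentials are assumptions made for illustration; only the document layout comes from the command above.

```javascript
// Hypothetical helper: builds the document passed to db.databases.insertOne().
// The field layout mirrors the command above; the function name and its
// parameters are assumptions made for this sketch.
function buildDatasourceDoc({ name, username, password, clusterHost, database }) {
  // Percent-encode the credentials so characters such as "@", ":" or "/"
  // do not break the connection string.
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return {
    name,
    engine: "mongodb",
    connection_args: {
      port: 27017,
      host: `mongodb+srv://${user}:${pass}@${clusterHost}`,
      database,
    },
  };
}

// Placeholder values only; substitute your own Atlas details.
const datasourceDoc = buildDatasourceDoc({
  name: "mongo_tn_news_detection",
  username: "tn_user",
  password: "p@ss/word",
  clusterHost: "example-cluster.mongodb.net",
  database: "tn_news_detection",
});

console.log(datasourceDoc.connection_args.host);
```

In mongosh, you could paste the function and then run db.databases.insertOne(datasourceDoc) instead of typing the document by hand.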

Creating The Model

Photo by DeepMind on Unsplash

As previously stated, the collaboration between MindsDB and Hugging Face expanded the former's natural language processing capabilities, and one of the outcomes is zero-shot classification. Zero-shot classification assigns labels to text without any task-specific training data: you supply the candidate labels, and a pre-trained model decides which one fits each text best.
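
To make the idea more concrete, here is a toy sketch of how a natural language inference (NLI) model such as facebook/bart-large-mnli is typically used for zero-shot classification: each candidate label is turned into a hypothesis sentence, the model scores how strongly the text entails each hypothesis, and the scores are normalized with a softmax. The entailment scores below are made-up numbers rather than real model output, and the function names are assumptions for this illustration. The hypothesis template is, to my understanding, the default used by the Hugging Face zero-shot pipeline.

```javascript
// Toy illustration only: no model is called here.
const HYPOTHESIS_TEMPLATE = "This example is {}.";

// Turn each candidate label into a hypothesis sentence for the NLI model.
function buildHypotheses(labels) {
  return labels.map((label) => HYPOTHESIS_TEMPLATE.replace("{}", label));
}

// Numerically stable softmax over a list of scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Pick the label whose hypothesis received the highest normalized score.
function pickLabel(labels, entailmentScores) {
  const probs = softmax(entailmentScores);
  let best = 0;
  probs.forEach((p, i) => {
    if (p > probs[best]) best = i;
  });
  return { label: labels[best], probability: probs[best] };
}

const candidateLabels = ["Real News", "Fake News"];
const madeUpScores = [2.0, 0.5]; // hypothetical entailment scores

console.log(buildHypotheses(candidateLabels));
console.log(pickLabel(candidateLabels, madeUpScores));
```

MindsDB handles all of this for you; the sketch is only meant to show why no labeled training set is needed.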

Run the following command in your mongosh terminal to create a zero-shot classification model:

db.models.insertOne({
  name: 'real_fake_news_classification',
  predict: 'real_fake_news',
  training_options: {
    engine: 'huggingface',
    task: 'zero-shot-classification',
    model_name: 'facebook/bart-large-mnli',
    input_column: 'text',
    candidate_labels: ['Real News', 'Fake News']
  }
})

If you are successful, you should get the following response:

{
  acknowledged: true,
  insertedId: ObjectId("63f247de9cbaa30c341d99e7")
}

You can check the status of your model using the command:

db.getCollection('models').find({'name': 'real_fake_news_classification'})

The command will produce results similar to:

{
  NAME: 'real_fake_news_classification',
  ENGINE: 'huggingface',
  PROJECT: 'mindsdb',
  VERSION: 1,
  STATUS: 'generating',
  ACCURACY: null,
  PREDICT: 'real_fake_news',
  UPDATE_STATUS: 'up_to_date',
  MINDSDB_VERSION: '23.1.3.2',
  ERROR: null,
  SELECT_DATA_QUERY: null,
  TRAINING_OPTIONS: "{'target': 'real_fake_news', 'using': {'task': 'zero-shot-classification', 'model_name': 'facebook/bart-large-mnli', 'input_column': 'text', 'candidate_labels': ['Real News', 'Fake News']}}",
  TAG: null,
  _id: ObjectId("000000000000007477395456")
}

The model's response above indicates that it is still being generated; once completed, the model's status will read complete.
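
Rather than re-running the status command by hand, you could poll until the model is ready. The following is a minimal sketch, assuming the status document has the STATUS and ERROR fields shown above. The helper waitForModel and its options are hypothetical, and the lookup and pause functions are passed in so the sketch can run outside mongosh with stand-ins.

```javascript
// Hypothetical polling helper. fetchModel returns the model's status document
// and pause waits between attempts; both are injected by the caller.
function waitForModel(fetchModel, pause, { maxAttempts = 30, delayMs = 5000 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const model = fetchModel();
    // Stop early if the status document reports an error.
    if (model && model.ERROR) {
      throw new Error(`Model failed: ${model.ERROR}`);
    }
    if (model && model.STATUS === "complete") {
      return model;
    }
    pause(delayMs);
  }
  throw new Error(`Model not complete after ${maxAttempts} attempts`);
}

// Stand-ins so the sketch runs anywhere: the status changes on the third call.
const fakeStatuses = ["generating", "generating", "complete"];
let fakeCalls = 0;
const fakeFetch = () => ({ STATUS: fakeStatuses[fakeCalls++], ERROR: null });
const noPause = () => {};

const readyModel = waitForModel(fakeFetch, noPause, { maxAttempts: 5, delayMs: 0 });
console.log(readyModel.STATUS, "after", fakeCalls, "checks");
```

In mongosh, fetchModel could wrap the find command shown earlier (for example, taking the first element of its toArray() result), and pause could be the shell's built-in sleep function. Both substitutions are assumptions you should verify against your own setup.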

Running Predictions

Photo by Jen Theodore on Unsplash

To predict real or fake news using the dataset in the MongoDB database, run the following command in your mongosh terminal.

db.real_fake_news_classification.find(
    {'collection': 'mongo_tn_news_detection.news_data'},
    {'real_fake_news_classification.real_fake_news': 'real_fake_news',
     'news_data.text': 'text'
    }
)

The query above applies the model developed above to all of the texts in the news_data collection and predicts whether they are real or fake news. Depending on the number of records in the collection, this may take some time to run.

Take a look at the syntax of the query above to better understand it:

db.model_name.find(
    {'collection': 'database_integration_name.collection_name'},
    {'model_name.predict_variable': 'predict_variable',
     'collection_name.field_name': 'field_name'
    }
)
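
If you plan to reuse this pattern with other models or collections, the two arguments can be assembled from their parts. This is a minimal sketch; the helper buildPredictionQuery and its parameter names are assumptions, and it only reproduces the shape of the syntax template above.

```javascript
// Hypothetical helper: returns the filter and projection used by
// db.<model>.find(filter, projection) in the template above.
function buildPredictionQuery({ integration, collection, model, target, field }) {
  return {
    filter: { collection: `${integration}.${collection}` },
    projection: {
      // Computed keys produce the dotted names "model.target" and "collection.field".
      [`${model}.${target}`]: target,
      [`${collection}.${field}`]: field,
    },
  };
}

const predictionQuery = buildPredictionQuery({
  integration: "mongo_tn_news_detection",
  collection: "news_data",
  model: "real_fake_news_classification",
  target: "real_fake_news",
  field: "text",
});

console.log(JSON.stringify(predictionQuery, null, 2));
```

With these values, predictionQuery.filter and predictionQuery.projection match the two arguments of the prediction query shown earlier, so in mongosh they could be passed to db.real_fake_news_classification.find().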

Following execution, you should receive the following response:

{ real_fake_news: 'Real News', text: 'some text...' }
{ real_fake_news: 'Real News', text: 'some text...' }
{ real_fake_news: 'Fake News', text: 'some text...' }
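
Once you have the predictions, a quick tally per label gives a first impression of how the model splits the data. The sketch below counts labels in an array of result documents shaped like the response above. The helper countLabels is hypothetical, and the sample rows are placeholders rather than real predictions.

```javascript
// Hypothetical helper: counts how many rows received each predicted label.
function countLabels(rows, target = "real_fake_news") {
  const counts = {};
  for (const row of rows) {
    const label = row[target];
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Placeholder rows mirroring the shape of the response above.
const sampleRows = [
  { real_fake_news: "Real News", text: "some text..." },
  { real_fake_news: "Real News", text: "some text..." },
  { real_fake_news: "Fake News", text: "some text..." },
];

console.log(countLabels(sampleRows));
```

In mongosh, the rows could come from calling toArray() on the prediction query's cursor.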

Conclusion

In this article, you built a natural language processing system with MindsDB and Hugging Face, using MongoDB as a datasource. With this demo, you can show the engineers at Trust News how to build a system that tackles misinformation by classifying news content as fake or real.

If you want to learn more about what MindsDB and Hugging Face can do together, you can look here. You can also ask questions and share ideas with the team and other community members by joining the community group on Slack. That's all for now; I'll see you in my next article.
