The fields of machine learning and artificial intelligence are growing at a remarkable pace, and their capabilities keep expanding. OpenAI recently released ChatGPT, a chatbot built on a large language model (LLM) from its GPT (Generative Pre-trained Transformer) family that can produce human-like text in response to users' questions. ChatGPT is a prime example of a natural language processing (NLP) system.
NLP is a branch of artificial intelligence that aims to enable machines to understand human language (hence the term natural language), both spoken and written. As a result, our interactions with everyday devices become easier, and we save time in the process. NLP applications include email spam detection, speech recognition, grammar checkers, chatbots, AI paraphrasers, and many others.
Aside from the applications listed above, NLP can be used to distinguish between fake and real news. A study by MIT Sloan professor Sinan Aral and Deb Roy and Soroush Vosoughi of the MIT Media Lab found that falsehoods are 70% more likely to be retweeted than the truth and reach their first 1,500 people six times faster. Misinformation can cause unintended conflicts and undermine a country's democratic process, particularly during an election.
In this article, you will learn how to distinguish between fake and authentic news using Hugging Face models in MindsDB, with MongoDB as the datasource. If you're not familiar with MongoDB, you can learn more about it here.
What Is MindsDB?
Photo by Natasha Connell on Unsplash
MindsDB is an in-database machine learning platform that aims to make building and deploying models as simple as possible. Traditionally, creating a machine learning model involves several steps, such as exploratory data analysis, feature selection, model training, model validation, and model deployment. MindsDB, on the other hand, offers a quick path from training to deployment.
Once MindsDB is connected to one of its supported databases, you can run a SQL statement on your data to create a predictor for the variable of interest; as this article shows, MongoDB users can do the same with MongoDB queries. Overall, MindsDB helps organizations and individuals shorten the time from model creation to model deployment.
Hugging Face + MindsDB
To expand its capabilities and enable natural language processing in databases, MindsDB collaborated with Hugging Face, an American AI company whose mission is to democratize machine learning and make it more accessible to both organizations and individuals. Hugging Face is well known for its Transformers library and its hub of pre-trained models, which are widely used in natural language processing applications.
Detecting Real And Fake News Using MindsDB and MongoDB
Photo by Brett Jordan on Unsplash
Take a look at the scenario below to get a better idea of what MindsDB can do:
Trust News (TN) is a non-profit organization whose mission is to debunk false news and help users distinguish between fake and authentic news. To accomplish this, TN software engineers decided to follow the procedure outlined below:
- Scrape news titles and content from traditional media outlets as well as social media pages.
- Sort through this data with MindsDB to distinguish between fake and real news.
- List the titles with a link to the original content, as well as the likelihood that it is real or fake.
Engineers at TN have set up a pipeline that regularly scrapes new data and stores it in a MongoDB database. They are now moving on to the next stage of the process: distinguishing between real and fake news. In the following sections, you will help TN engineers build a demo system that does this using MindsDB.
Prerequisites
You will need the following items to build this system for TN:
- A MongoDB Atlas account. If you are unfamiliar with it, you can learn more about it here.
- MongoDB Compass (version 1.33.1 was used for this article).
- A MindsDB cloud account. More information is available in the official documentation.
- A dataset made up of real and fake news data samples. The data is available on Kaggle here.
Setting Up MongoDB Atlas
To set up MongoDB Atlas:
- In your MongoDB Atlas account, head over to the Database tab and follow the instructions to build a database:
  - Click Build A Database.
  - Choose your desired pricing tier. This article uses the FREE plan.
  - Select your desired cluster options, and change the cluster name to News-Detection.
  - Create a database user by entering a username and password.
  - Enter your IP address, then click Finish and Close to complete the process.
- Then, choose Browse Collections to create a database and add the appropriate collection:
  - Select the Add My Own Data button in the pop-up that appears.
  - Create a database called tn_news_detection and a collection called news_data.
Uploading The Dataset
To add the dataset to the newly created database and collection, follow these steps:
- If you do not already have MongoDB Compass installed, install it now. Then copy your cluster's connection string from MongoDB Atlas.
- In MongoDB Compass, select New Connection and paste the copied connection string into the space provided.
- Click Advanced Connection Options, then Authentication, and enter your password. After that, click the Connect button.
- Once connected, select the tn_news_detection database, followed by the news_data collection.
- Import both the True.csv and False.csv files into the collection, one after the other.
Connecting MindsDB To MongoDB
To begin:
- Launch MongoDB Compass and open a new connection window.
- Connect to MindsDB Cloud by setting the host to cloud.mindsdb.com, the port to 27017, the username to your MindsDB username or email, and the password to your MindsDB password.
- Once connected, open the mongosh terminal at the bottom of the window and run use mindsdb to switch to the mindsdb database.
- Still in the mongosh terminal, run the following command to add your Atlas database as a datasource in MindsDB:
db.databases.insertOne({
  name: "mongo_tn_news_detection",  // name of the new datasource inside MindsDB
  engine: "mongodb",
  connection_args: {
    "port": 27017,
    "host": "mongodb+srv://<username>:<password>@<cluster host/ID>.mongodb.net",
    "database": "tn_news_detection" // the Atlas database created earlier
  }
})
You can find the host URI in your cluster's connection string in MongoDB Atlas. If the command succeeds, you should see a response like this:
{ acknowledged: true, insertedId: ObjectId("63d41c6a3f016fa32ce93bac") }
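The host value above embeds your credentials in a mongodb+srv:// URI, so special characters in the username or password (such as @, : or /) need to be percent-encoded. As a minimal sketch, assuming you prefer to build the datasource document in a script rather than by hand, the hypothetical helper buildAtlasDatasource below assembles the same document passed to db.databases.insertOne and encodes the credentials with encodeURIComponent. The helper name and the placeholder values are illustrative only, not part of MindsDB's API.

```javascript
// Hypothetical helper (not part of MindsDB): builds the document passed to
// db.databases.insertOne() for a MongoDB Atlas datasource.
function buildAtlasDatasource({ name, username, password, clusterHost, database }) {
  // Percent-encode credentials so characters like '@' or ':' do not break the URI.
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return {
    name,
    engine: "mongodb",
    connection_args: {
      port: 27017,
      host: `mongodb+srv://${user}:${pass}@${clusterHost}`,
      database,
    },
  };
}

// Placeholder values; replace them with your own Atlas details.
const datasource = buildAtlasDatasource({
  name: "mongo_tn_news_detection",
  username: "tn_user",
  password: "p@ss:word",
  clusterHost: "cluster0.example.mongodb.net",
  database: "tn_news_detection",
});

console.log(datasource.connection_args.host);
// mongodb+srv://tn_user:p%40ss%3Aword@cluster0.example.mongodb.net
```

In mongosh, after running use mindsdb, you could then pass the object directly with db.databases.insertOne(datasource).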
Creating The Model
As mentioned earlier, the collaboration between MindsDB and Hugging Face expanded MindsDB's natural language processing capabilities, and one of the outcomes is zero-shot classification. Zero-shot classification assigns labels to text without any task-specific training data: you supply the candidate labels, and a pre-trained model decides which one best fits each piece of text.
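To make the idea more concrete: NLI-based zero-shot classifiers such as facebook/bart-large-mnli treat each candidate label as a hypothesis and score how strongly the input text entails it. Hugging Face's zero-shot-classification pipeline, for example, fills the template "This example is {}." with each label and, when only one label may apply, normalizes the entailment scores with a softmax across the labels. The sketch below reproduces only that final scoring step in plain JavaScript, using invented entailment logits; it illustrates the technique and is not how MindsDB or Hugging Face implement it internally.

```javascript
// Illustrative sketch of NLI-based zero-shot label scoring.
// The logits below are invented for demonstration only.

// Turn each candidate label into an NLI hypothesis sentence.
function buildHypotheses(labels, template = "This example is {}.") {
  return labels.map((label) => template.replace("{}", label));
}

// Softmax over the entailment logits (one per label), then rank labels by score.
function rankLabels(labels, entailmentLogits) {
  const max = Math.max(...entailmentLogits); // subtract the max for numerical stability
  const exps = entailmentLogits.map((x) => Math.exp(x - max));
  const total = exps.reduce((sum, x) => sum + x, 0);
  return labels
    .map((label, i) => ({ label, score: exps[i] / total }))
    .sort((a, b) => b.score - a.score);
}

const labels = ["Real News", "Fake News"];
console.log(buildHypotheses(labels));

// Hypothetical entailment logits for a single article.
const ranked = rankLabels(labels, [2.0, 0.0]);
console.log(ranked[0].label); // Real News
```

Because the scores are normalized across the candidate labels, they always sum to 1, which is why the choice of labels ('Real News' versus 'Fake News') directly shapes the result.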
Run the following command in your mongosh terminal to create a zero-shot classification model:
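db.models.insertOne({
  name: 'real_fake_news_classification',    // name of the new model
  predict: 'real_fake_news',                // name of the output (prediction) field
  training_options: {
    engine: 'huggingface',                  // use the Hugging Face integration
    task: 'zero-shot-classification',
    model_name: 'facebook/bart-large-mnli', // pre-trained model from the Hugging Face Hub
    input_column: 'text',                   // field that holds the news text
    candidate_labels: ['Real News', 'Fake News'] // labels the model chooses between
  }
})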
If you are successful, you should get the following response:
{
acknowledged: true,
insertedId: ObjectId("63f247de9cbaa30c341d99e7")
}
You can check the status of your model using the command:
db.getCollection('models').find({'name': 'real_fake_news_classification'})
The command will produce results similar to:
{
NAME: 'real_fake_news_classification',
ENGINE: 'huggingface',
PROJECT: 'mindsdb',
VERSION: 1,
STATUS: 'generating',
ACCURACY: null,
PREDICT: 'real_fake_news',
UPDATE_STATUS: 'up_to_date',
MINDSDB_VERSION: '23.1.3.2',
ERROR: null,
SELECT_DATA_QUERY: null,
TRAINING_OPTIONS: "{'target': 'real_fake_news', 'using': {'task': 'zero-shot-classification', 'model_name': 'facebook/bart-large-mnli', 'input_column': 'text', 'candidate_labels': ['Real News', 'Fake News']}}",
TAG: null,
_id: ObjectId("000000000000007477395456")
}
The STATUS field above shows that the model is still being generated; once it is ready, the status will read complete.
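Rather than re-running the status query by hand, you could poll it in a loop. The sketch below is a minimal, hypothetical helper (waitForModel is not a MindsDB function): it calls a status function until it returns 'complete', stops on 'error', and treats any other status, such as 'generating', as not ready yet. The sleep function is injected so the helper also runs outside mongosh.

```javascript
// Hypothetical helper: polls a model's STATUS until it reads 'complete'.
// getStatus: function returning the current status string.
// sleepFn:   function that pauses for the given number of milliseconds
//            (in mongosh you can pass the built-in sleep()).
function waitForModel(getStatus, { intervalMs = 5000, maxAttempts = 60, sleepFn = () => {} } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = getStatus();
    if (status === "complete") return { status, attempts: attempt };
    if (status === "error") throw new Error("Model creation failed; check the ERROR field.");
    // Any other status (e.g. 'generating') means the model is not ready yet.
    sleepFn(intervalMs);
  }
  throw new Error(`Model not ready after ${maxAttempts} attempts`);
}

// Simulated status sequence standing in for real queries against MindsDB.
const statuses = ["generating", "generating", "complete"];
let call = 0;
const result = waitForModel(() => statuses[call++]);
console.log(result); // { status: 'complete', attempts: 3 }
```

In mongosh, getStatus could wrap the status query shown above, for example () => db.getCollection('models').find({'name': 'real_fake_news_classification'}).toArray()[0].STATUS, with sleepFn set to mongosh's built-in sleep. Whether toArray() behaves identically against MindsDB's MongoDB API is an assumption worth checking.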
Running Predictions
Photo by Jen Theodore on Unsplash
To predict real or fake news using the dataset in the MongoDB database, run the following command in your mongosh terminal.
db.real_fake_news_classification.find(
  {'collection': 'mongo_tn_news_detection.news_data'},                // datasource.collection to read from
  {'real_fake_news_classification.real_fake_news': 'real_fake_news', // predicted label
   'news_data.text': 'text'                                           // original article text
  }
)
The query above applies the model to every text in the news_data collection and predicts whether each one is real or fake news. Depending on the number of records in the collection, it may take some time to run.
Take a look at the syntax of the query above to better understand it:
db.model_name.find(
{'collection': 'database_integration_name.collection_name'},
{'model_name.predict_variable': 'predict_variable',
'collection_name.field_name': 'field_name'
}
)
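Following that syntax, the two arguments to find() can be assembled from a handful of names. The sketch below uses a hypothetical helper, buildPredictionQuery, that is not part of MindsDB; it simply produces the same filter and projection objects as the query above, which can be handy if you reuse the pattern for other models or collections.

```javascript
// Hypothetical helper: builds the (filter, projection) arguments for
// db.<model_name>.find(...) following the syntax shown above.
function buildPredictionQuery({ integration, collection, model, target, inputField }) {
  const filter = { collection: `${integration}.${collection}` };
  const projection = {
    [`${model}.${target}`]: target,               // predicted value from the model
    [`${collection}.${inputField}`]: inputField,  // original field from the collection
  };
  return [filter, projection];
}

const [filter, projection] = buildPredictionQuery({
  integration: "mongo_tn_news_detection",
  collection: "news_data",
  model: "real_fake_news_classification",
  target: "real_fake_news",
  inputField: "text",
});

console.log(filter); // { collection: 'mongo_tn_news_detection.news_data' }
console.log(projection);
```

In mongosh, db.getCollection('real_fake_news_classification').find(filter, projection) should then be equivalent to the query above.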
After execution, you should receive a response similar to the following:
{ real_fake_news: 'Real News', text: 'some text...' }
{ real_fake_news: 'Real News', text: 'some text...' }
{ real_fake_news: 'Fake News', text: 'some text...' }
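To get a quick overview of the results, such as how many articles were flagged as fake, you could tally the predicted labels. The sketch below is plain JavaScript run on sample documents shaped like the response above; the helper name countPredictions is hypothetical. In mongosh you could pass it the query's documents via .toArray(), assuming the result set is small enough to hold in memory.

```javascript
// Counts how many documents received each predicted label.
function countPredictions(docs, field = "real_fake_news") {
  const counts = {};
  for (const doc of docs) {
    const label = doc[field];
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Sample documents shaped like the response above.
const predictions = [
  { real_fake_news: "Real News", text: "some text..." },
  { real_fake_news: "Real News", text: "some text..." },
  { real_fake_news: "Fake News", text: "some text..." },
];

console.log(countPredictions(predictions)); // { 'Real News': 2, 'Fake News': 1 }
```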
Conclusion
In this article, you built a natural language processing system with MindsDB and Hugging Face, using MongoDB as the datasource. With this demo, you can show the engineers at Trust News how to build a system that tackles misinformation by classifying news content as fake or real.
If you want to learn more about what MindsDB and Hugging Face make possible, you can look here. You can also ask questions and share ideas with the team and other community members by joining the community group on Slack. That's all for now; I'll see you in my next article.