DEV Community

f2010126

MindsDB Tutorial: Predicting the Genre of Books

Introduction

MindsDB introduces AI and machine learning into databases to help data teams – and everyday data users – identify patterns, predict trends, and train models.

The goal of this tutorial is to predict the genre of a book from its synopsis. The only prerequisite is a basic knowledge of SQL and databases. Knowledge of machine learning is a bonus but not a deal-breaker.

The workflow is divided into 3 main steps:

  1. Connect to the database
  2. Fire an SQL query to create and train an ML predictor.
  3. Once training is complete, execute another SQL query to make predictions using the trained predictor.

Development Environment

MindsDB offers a cloud version, which we can sign up for here, and it can also be run locally via pip or Docker. In this tutorial, we use the cloud version to highlight the ease of access.
The only drawback is the upper limit on data size: the cloud version accepts up to 10K rows and raises an error for larger tables, and data files are restricted to a maximum size of 10MB.

Data Setup

Data

We are using the Book Genre Prediction data set from Kaggle, downloadable here .

About the data

This dataset contains a single file named data.csv. The data is divided across 4 columns: index, title, genre and summary. The file is about 10MB and has 4656 records. The data has the following format.

index | title | genre | summary
1 | Drowned Wednesday | fantasy | Drowned Wednesday is the first Trustee among the Morrow Days who is on Arthur's side and wishes the...
2 | Violets are Blue | thriller | The book begins where Roses are Red ended, with Dr. Alex Cross at the home of murdered FBI Agent Be...
3 | Starcross | science | Protagonist Arthur ("Art") Mumby and his older sister Myrtle are invited to the Starcross hotel on ...

Connecting to the datasource

For this tutorial, our data source is a CSV file uploaded to MindsDB cloud. We begin by downloading the CSV data file here and uploading the file "data.csv". Follow the guide to upload the file to MindsDB and give the table an appropriate name. I have used "BookGenres".

Once uploaded, you can run queries directly on the database. Let's start by previewing the data we will use to train the model:
SELECT * FROM files.BookGenres LIMIT 10;

For a data visualization, showing the distribution of data, click on DATA INSIGHTS.
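The same distribution can also be checked with a query. The following is a minimal sketch, assuming the uploaded file table accepts aggregate queries (COUNT and GROUP BY); the alias num_books is only illustrative. It counts how many books fall under each genre, which helps spot imbalanced classes before training.

```sql
-- Count the number of books per genre in the uploaded file table.
-- Assumes files.BookGenres is the table name chosen during upload.
SELECT genre, COUNT(*) AS num_books
FROM files.BookGenres
GROUP BY genre
ORDER BY num_books DESC;
```

If a few genres dominate the counts, the trained model is likely to favour those genres in its predictions.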

Training

Let's create and train the machine learning model. For that, we use the CREATE PREDICTOR syntax, where we specify which query to train FROM and which column we want to PREDICT. In this case, we select the 'title', 'genre' and 'summary' columns from the table: 'title' and 'summary' serve as input features, and 'genre' is the target to predict.

CREATE PREDICTOR mindsdb.predict_genre_model FROM files (SELECT title, genre, summary FROM BookGenres) PREDICT genre;

It may take a couple of minutes for the training to complete. You can monitor the status of your model like this:

SELECT * FROM mindsdb.predictors WHERE name='predict_genre_model';
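To keep the output short while waiting, the status check can be narrowed to a few columns. This is a sketch that assumes the mindsdb.predictors table exposes name, status and accuracy columns, as it did in the version used here; column names may differ in other releases.

```sql
-- Show only the training status and reported accuracy of the model.
-- The status typically moves from 'generating' to 'training' to 'complete';
-- accuracy is filled in once training has finished.
SELECT name, status, accuracy
FROM mindsdb.predictors
WHERE name = 'predict_genre_model';
```

Re-run the query until the status reads complete before moving on to predictions.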

MindsDB's AutoML chooses the model for you, and you can use DESCRIBE to see how this model was built. Internally, MindsDB trains several candidate models on the data and then picks the best-performing one as the final model used to make predictions.

DESCRIBE PREDICTOR mindsdb.predict_genre_model.model;
The query lists the candidate models trained on the data along with metrics such as performance and accuracy.

To see the parameters used by MindsDB's AutoML pipeline, execute the following SQL:

DESCRIBE PREDICTOR mindsdb.predict_genre_model.ensemble;
The result is a JSON object listing the hyperparameters and values used to train your candidate predictor.

Making Predictions

Once training is completed, your model is ready to predict. Making a prediction with MindsDB is as easy as running an SQL query: pass the synopsis in the WHERE clause and select the predicted genre.

SELECT genre FROM mindsdb.predict_genre_model WHERE summary = "The Prince of no value Brishen Khaskem, prince of the Kai, has lived content as the nonessential spare heir to a throne secured many times over. A trade and political alliance between the human kingdom of Gaur and the Kai kingdom of Bast-Haradis requires that he marry a Gauri woman to seal the treaty. Always a dutiful son, Brishen agrees to the marriage and discovers his bride is as ugly as he expected and more beautiful than he could have imagined. The noblewoman of no importance Ildiko, niece of the Gauri king, has always known her only worth to the royal family lay in a strategic marriage. Resigned to her fate, she is horrified to learn that her intended groom isn’t just a foreign aristocrat but the younger prince of a people neither familiar nor human. Bound to her new husband, Ildiko will leave behind all she’s known to embrace a man shrouded in darkness but with a soul forged by light. Two people brought together by the trappings of duty and politics will discover they are destined for each other, even as the powers of a hostile kingdom scheme to tear them apart.";
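Predictions can also be made for many rows at once by joining a table with the model. The sketch below assumes the JOIN syntax MindsDB documents for batch predictions and reuses the files.BookGenres table from earlier; the aliases t, m, actual_genre and predicted_genre are only illustrative.

```sql
-- Batch prediction: join the uploaded table with the trained model
-- and compare the stored genre with the predicted one.
SELECT t.title,
       t.genre AS actual_genre,
       m.genre AS predicted_genre
FROM files.BookGenres AS t
JOIN mindsdb.predict_genre_model AS m
LIMIT 5;
```

Because these rows were part of the training data, this is only a quick sanity check and not a measure of how well the model generalises to unseen books.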

Conclusions

It is very easy to start making predictions with machine learning even without being a data scientist or having any formal knowledge of machine learning. It took me under an hour to get my model up and running.
Tools like MindsDB take away the burden of choosing the right model, the right parameters and training time so you can focus on the problem at hand and not the technical aspects.
Feel free to check this for yourself. The documentation is well written and the developer community is constantly improving the product.
