Hridya for Learn Earn & Fun

Predicting Water Purity using MindsDB

Introduction

If you don't know what MindsDB is, go check out my blog to learn more.

Basically, MindsDB is an open-source AI layer for existing, traditional databases such as PostgreSQL, MariaDB, MySQL, etc.

In this tutorial, we are going to predict the purity of water from several parameters, using a dataset found on Kaggle!

Importing Data into MindsDB Cloud

To import the dataset into MindsDB Cloud, we first need to download it from Kaggle and then upload it to MindsDB using the steps below.

Step 1: Create a MindsDB Cloud account, if you haven't already done so

Step 2: Download this Dataset

Step 3: Go to Add Data -> Files -> Import File and add the dataset. (The download is a .zip file, so you have to extract it and import the CSV inside.)

Step 4: Name the Table: WaterPU (You can name it anything you like!)

Step 5: To verify that the dataset was imported successfully, go to the Editor tab and run this command:

SHOW TABLES FROM files;

If you see WaterPU (or whatever you named it), the import was successful!
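
As an optional sanity check, you can also look at the extracted CSV locally to see how many rows it has and which columns contain empty cells. Below is a minimal Python sketch of that idea. The three sample rows are made up purely for illustration (they are not taken from the Kaggle file), the column names are the ones used in the queries in this tutorial, and count_missing is a hypothetical helper name.

```python
import csv
import io

# Illustrative sample only: these three rows are made up for the sketch.
# With the real file, pass open("your_file.csv", newline="") instead.
SAMPLE_CSV = """ph,Hardness,Solids,Chloramines,Sulfate,Conductivity,Organic_carbon,Trihalomethanes,Turbidity,Potability
7.1,200.5,20000.0,7.2,,420.3,14.1,66.4,3.9,0
,180.2,18000.5,6.5,330.1,390.8,12.7,,4.2,1
6.8,210.9,21000.1,8.0,340.7,450.2,15.3,70.1,3.5,0
"""


def count_missing(csv_file):
    """Return (row_count, {column: number_of_empty_cells})."""
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            # A short row yields None for absent cells, so treat None as empty.
            if (row[name] or "").strip() == "":
                missing[name] += 1
    return rows, missing


rows, missing = count_missing(io.StringIO(SAMPLE_CSV))
print("rows:", rows)
print("columns with empty cells:", {k: v for k, v in missing.items() if v})
```

Columns with many empty cells are worth knowing about before training, since they are the ones you may end up leaving out of a prediction query.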

Training a Model

MindsDB provides simple SQL statements for carrying out different tasks from its editor. We will now follow the steps below to get the model ready.

Step 1: Create a model. We will be creating a Predictor model, for which MindsDB provides this syntax:

CREATE PREDICTOR mindsdb.predictor_name       -- your predictor name
FROM database_name                            -- your database name
(SELECT columns FROM table_name LIMIT 10000)  -- your columns and table name
PREDICT target_parameter;                     -- your target parameter

Simply replace the parameters with the ones you want to use.

For me, the actual query looks like this:

CREATE PREDICTOR mindsdb.water_purity
FROM files 
(SELECT * FROM WaterPU LIMIT 10000)
PREDICT Potability;

NOTE: We are predicting Potability because that is the name of the target column in the dataset. If you write anything else, e.g. Purity or Quality, MindsDB won't find that column and you won't be able to predict the purity!
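
If you generate the statement from a script, the target-name mistake from the note can be caught before anything is sent to MindsDB. Here is a small Python sketch under that assumption. build_create_predictor is a hypothetical helper (it is not part of MindsDB), and the column list is simply the set of column names used in this tutorial's queries.

```python
# Column names as used in this tutorial's queries.
COLUMNS = [
    "ph", "Hardness", "Solids", "Chloramines", "Sulfate",
    "Conductivity", "Organic_carbon", "Trihalomethanes",
    "Turbidity", "Potability",
]


def build_create_predictor(predictor, source, table, target, columns, limit=10000):
    """Build the CREATE PREDICTOR statement, refusing unknown target columns."""
    if target not in columns:
        raise ValueError(
            f"target {target!r} is not a column of {table}; choose one of {columns}"
        )
    # Plain string formatting, no escaping: only use trusted names here.
    return (
        f"CREATE PREDICTOR mindsdb.{predictor}\n"
        f"FROM {source}\n"
        f"(SELECT * FROM {table} LIMIT {limit})\n"
        f"PREDICT {target};"
    )


print(build_create_predictor("water_purity", "files", "WaterPU", "Potability", COLUMNS))
```

Calling it with "Purity" instead of "Potability" raises a ValueError locally, which is easier to spot than a failed training run.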

Step 2: Wait for the model to train. Depending on the size of the dataset, this might take some time.

There are 3 stages once you run the command to create the model:

  1. Generating: The model is being generated
  2. Training: The model is being trained on the dataset
  3. Complete: The model is ready to make predictions

To check the status, use this syntax:

SELECT status
FROM mindsdb.predictors
WHERE name='Name_of_the_Predictor';

The actual query looks like this:

SELECT status
FROM mindsdb.predictors
WHERE name='water_purity';

Once it returns complete, we can start predicting with it!
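
If you would rather not re-run the status query by hand, the check can be wrapped in a small polling loop. The Python sketch below shows only the loop itself. fetch_status is a hypothetical callable that you would implement to run the status query above and return the status string; here a fake one stands in, so the example runs without any database connection.

```python
import time


def wait_until_complete(fetch_status, max_checks=60, delay_seconds=5.0):
    """Poll fetch_status() until it reports 'complete'; return the statuses seen."""
    seen = []
    for _ in range(max_checks):
        status = fetch_status()
        seen.append(status)
        # Compare case-insensitively, in case the status comes back capitalised.
        if status.lower() == "complete":
            return seen
        time.sleep(delay_seconds)
    last = seen[-1] if seen else None
    raise TimeoutError(f"model not complete after {max_checks} checks (last status: {last!r})")


# Stand-in for a real database call: replays the three stages described above.
_fake_statuses = iter(["generating", "training", "training", "complete"])
print(wait_until_complete(lambda: next(_fake_statuses), delay_seconds=0))
```

The max_checks limit is there so that a model that never finishes does not leave the script waiting forever.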

Describe the Model

Before we proceed to the final part of predicting the water quality, let us first understand the model that we just trained.

MindsDB provides the following 3 types of descriptions for the model using the DESCRIBE statement.

  1. By Features
  2. By Model
  3. By Model Ensemble

By Features

DESCRIBE mindsdb.predictor_model_name.features;

This query shows the role of each column for the predictor model along with the type of encoders used on the columns while training.

By Model

DESCRIBE mindsdb.predictor_model_name.model;

This query lists all the underlying candidate models that were used during training. The one with the best performance is selected: it has the value 1 in the selected column, while the others are set to 0.

By Model Ensemble

DESCRIBE mindsdb.predictor_model_name.ensemble;

This query returns JSON output containing the parameters that helped choose the best candidate model for the predictor.

Now that we understand our predictor model, let's move on to predicting values in the next section.

Predicting the Target Value

Predicting the water purity (the Potability column) is as easy as running a simple SELECT statement against the predictor.

Water purity depends on many feature parameters, so it is best to pass all of them. However, we can start by passing just a few.

The syntax for the query will be something like this:

SELECT target_value_name, target_value_confidence, target_value_explain
FROM mindsdb.predictor_name
WHERE feature1=value1 AND feature2=value2 AND ...;

Now, replacing the variables in the above query, the actual query will be like this.

SELECT Potability,Potability_confidence,Potability_explain
FROM mindsdb.water_purity
WHERE ph=2.6 AND Hardness=210 AND Solids=18645.233 AND Chloramines=6.546;


Since the predicted Potability is 0, the model considers this water unsafe for human consumption.
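
Since the prediction query always has the same shape, it can be generated from a dictionary of feature values when you want to try several samples. The following Python sketch does only the string building. build_prediction_query is a hypothetical helper, and it assumes every feature value is numeric (as they are in this dataset), so no quoting or escaping is done.

```python
def build_prediction_query(predictor, target, features):
    """Build the SELECT used to query a predictor from a dict of numeric features."""
    if not features:
        raise ValueError("pass at least one feature value")
    # Numeric values only: strings would need quoting, which this sketch skips.
    conditions = " AND ".join(f"{name}={value}" for name, value in features.items())
    return (
        f"SELECT {target}, {target}_confidence, {target}_explain\n"
        f"FROM mindsdb.{predictor}\n"
        f"WHERE {conditions};"
    )


sample = {"ph": 2.6, "Hardness": 210, "Solids": 18645.233, "Chloramines": 6.546}
print(build_prediction_query("water_purity", "Potability", sample))
```

With the sample values shown, the generated WHERE clause matches the one in the query above, and adding more features is just a matter of adding keys to the dictionary.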

We will now pass the remaining feature parameters as well (leaving Sulfate as NULL) to obtain a more accurate prediction of the water quality. The query now becomes:

SELECT Potability,Potability_confidence,Potability_explain
FROM mindsdb.water_purity
WHERE ph=6.9 AND Hardness=201 
AND Solids=11350.675 AND Chloramines=4.3 AND Sulfate=NULL 
AND Conductivity=467.5 AND Organic_carbon=9.98 AND Trihalomethanes=89.686 AND Turbidity=4.99;

Since the predicted Potability is 1, the model considers this water safe for human consumption.

Fantastic! We have now successfully predicted the water quality using a Predictor.

The three columns selected in these queries return the following:

target_parameter (here Potability): the value we want to predict.
target_parameter_confidence (here Potability_confidence): how confident the model is about the prediction.
target_parameter_explain (here Potability_explain): further details about the predicted value.
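
To make a result row easier to read, the predicted value and its confidence can be turned into a short verdict. This is a rough Python sketch of that idea. describe_prediction is a hypothetical helper, the 0.8 confidence threshold is an arbitrary assumption rather than a MindsDB recommendation, and the row passed in at the end is an illustrative example, not real model output.

```python
def describe_prediction(row, min_confidence=0.8):
    """Turn one prediction row into a short, human-readable verdict."""
    # Values may arrive as strings or numbers, so normalise both fields.
    potable = int(float(row["Potability"])) == 1
    confidence = float(row["Potability_confidence"])
    verdict = "safe to drink" if potable else "not safe to drink"
    note = "" if confidence >= min_confidence else " (low confidence, treat with caution)"
    return f"Predicted {verdict}, confidence {confidence:.2f}{note}"


# Illustrative row only: these values are made up for the example.
print(describe_prediction({"Potability": "0", "Potability_confidence": 0.93}))
```

A prediction is only as good as the model behind it, so a verdict like this should not replace an actual water test.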

Conclusion

That concludes the tutorial. Before we wrap up, here's a quick recap of what we did: we created a MindsDB Cloud account, uploaded the dataset as a table using the cloud UI, trained a Predictor model, described the model, and finally predicted the target water potability value.

Lastly, before you leave, I would love to know your feedback in the Comments section below and would be really motivated if you drop a LIKE on this article.
