DEV Community

Cover image for Predict Diamond prices with SQL Alchemy and MindsDB
Odumuyiwa Teslim
Odumuyiwa Teslim

Posted on

Predict Diamond prices with SQL Alchemy and MindsDB

Introduction

In this tutorial, we will learn how to create, train and query a machine-learning model using a Python library called SQLAlchemy. SQLAlchemy is the most popular open-source library for working with relational databases from Python. It is one of the ORM libraries that provides an interface for using object-oriented programming to interact with a database. MindsDB is an open-source machine learning tool that brings predictive capabilities to your database. For this tutorial, we will be making use of the Kaggle dataset diamond.csv to predict the price of the diamonds.

Prerequisite

The following are the requirements for this tutorial:

  • Python
  • SQLAlchemy
  • MindsDB install via pip
  • pymysql
  • Any IDE of your choice preferably VS code.

Connecting our Database

Firstly, we need to upload our dataset into the MindsDB Cloud Interface follow this guide to learn how to upload to the interface here, and then you can download the diamonds.csv dataset on kaggle here

Now, navigate into a working directory , then create a main.py file. To create a database connection, simply follow this guide on how to do that using Sqlachemy and pymysql.

When you are done, you should have something like this:

from sqlalchemy import create_engine

user = "teslimodumuyiwa@gmail.com"
password = "MindsDB Cloud Password"
host = "cloud.mindsdb.com"
port = 3306
database = ""

def establish_connection():
        engine =  create_engine(url=f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}")
        return engine
try:
        engine = establish_connection()
        engine.connect()
        print("Connection to the database is established")
except Exception as e:
        print("Couldn't connect to the database:\n",e)
Enter fullscreen mode Exit fullscreen mode

You can also check that your connection is successful by running the python file from your terminal using python main.py

SQL Alchemy Connection successful

To run further checks on your connection, you might want to run queries on the database to see if it returns some data.

from sqlalchemy import create_engine

user = "teslimodumuyiwa@gmail.com"
password = "Your MindsDB Cloud Password"
host = "cloud.mindsdb.com"
port = 3306
database = ""

def establish_connection():
        engine =  create_engine(url=f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}")
        return engine
try:
        engine = establish_connection()
        with engine.connect() as eng:
                query = eng.execute("SELECT * FROM files.diamonds LIMIT 5;")
                for row in query:
                        print(row)
except Exception as e:
        print("Couldn't connect to the database:\n",e)
Enter fullscreen mode Exit fullscreen mode

Expected output would be:

Query Table ouput in Terminal

Training a Predictor With CREATE PREDICTOR

With that done, we can now train our machine-learning predictor. For that, we are going to use the CREATE PREDICTOR syntax where we would specify what query we will train FROM and what to PREDICT.

By implementing the following code to predict the price of our diamond:

from sqlalchemy import create_engine

user = "teslimodumuyiwa@gmail.com"
password = "Your MindsDB Cloud Password"
host = "cloud.mindsdb.com"
port = 3306
database = ""

def establish_connection():
        engine =  create_engine(url=f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}")
        return engine
try:
        engine = establish_connection()
        with engine.connect() as eng:
                query = eng.execute("CREATE PREDICTOR mindsdb.diamond_price FROM files (SELECT * FROM diamonds) PREDICT price;")

except Exception as e:
        print("Couldn't connect to the database:\n",e)
Enter fullscreen mode Exit fullscreen mode

Checking the Status of our Predictor

We can check the status of the model with the syntax below. If the query returns Complete, then the model is ready to be used, or else wait if it returns Training.

query = eng.execute("SELECT status FROM mindsdb.predictors WHERE name='diamond_price';")
                for i in query:
                        print(i)
Enter fullscreen mode Exit fullscreen mode

If the training is not complete, you should get this printed in your terminal:

('training',)
Enter fullscreen mode Exit fullscreen mode

If the training is complete, you should get this printed in your terminal:

('complete',)
Enter fullscreen mode Exit fullscreen mode

Making Predictions

Now that we have our Prediction Model, we can simply execute some simple SQL query statements to predict the target value based on the feature parameters.

Making a Single Prediction

You can make predictions by querying the predictor as if it were a table. The [SELECT](https://docs.mindsdb.com/sql/api/select/) statement lets you make predictions for the diamonds on the chosen feature parameter.

// main.py
query = eng.execute("SELECT price, price_explain FROM mindsdb.diamond_price WHERE carat = 0.23 AND depth = 56.9;")
                for i in query:
                        print(i)
Enter fullscreen mode Exit fullscreen mode

Expected output should be:

Single Prediction

Make Batch Predictions with JOIN

Now let’s make bulk predictions or multiple predictions to predict the price by joining our table with our model.

bulk = text("SELECT t.price AS real_price, m.price_explain AS explained_price, t.carat,  t.cut, t.color, t.depth, t.table FROM files.diamonds AS t JOIN mindsdb.diamond_price AS m LIMIT 10;")
                query = eng.execute(bulk)
                for i in query:
                        print(i)
Enter fullscreen mode Exit fullscreen mode

On execution, you should get this printed into your terminal

Bulk Prediction on MindsDB

What’s Next?

Have fun while trying it out yourself!

Give a like or a comment if this tutorial was helpful

Top comments (0)