Introduction
In this tutorial, we will learn how to create, train and query a machine-learning model using a Python library called SQLAlchemy. SQLAlchemy is the most popular open-source library for working with relational databases from Python. It is one of the ORM libraries that provides an interface for using object-oriented programming to interact with a database. MindsDB is an open-source machine learning tool that brings predictive capabilities to your database. For this tutorial, we will be making use of the Kaggle dataset diamond.csv
to predict the price of the diamonds.
Prerequisite
The following are the requirements for this tutorial:
- Python
- SQLAlchemy
- MindsDB install via pip
- pymysql
- Any IDE of your choice preferably VS code.
Connecting our Database
Firstly, we need to upload our dataset into the MindsDB Cloud Interface follow this guide to learn how to upload to the interface here, and then you can download the diamonds.csv
dataset on kaggle here
Now, navigate into a working directory , then create a main.py
file. To create a database connection, simply follow this guide on how to do that using Sqlachemy and pymysql.
When you are done, you should have something like this:
from sqlalchemy import create_engine
user = "teslimodumuyiwa@gmail.com"
password = "MindsDB Cloud Password"
host = "cloud.mindsdb.com"
port = 3306
database = ""
def establish_connection():
engine = create_engine(url=f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}")
return engine
try:
engine = establish_connection()
engine.connect()
print("Connection to the database is established")
except Exception as e:
print("Couldn't connect to the database:\n",e)
You can also check that your connection is successful by running the python file from your terminal using python main.py
To run further checks on your connection, you might want to run queries on the database to see if it returns some data.
from sqlalchemy import create_engine
user = "teslimodumuyiwa@gmail.com"
password = "Your MindsDB Cloud Password"
host = "cloud.mindsdb.com"
port = 3306
database = ""
def establish_connection():
engine = create_engine(url=f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}")
return engine
try:
engine = establish_connection()
with engine.connect() as eng:
query = eng.execute("SELECT * FROM files.diamonds LIMIT 5;")
for row in query:
print(row)
except Exception as e:
print("Couldn't connect to the database:\n",e)
Expected output would be:
Training a Predictor With CREATE PREDICTOR
With that done, we can now train our machine-learning predictor. For that, we are going to use the CREATE PREDICTOR
syntax where we would specify what query we will train FROM
and what to PREDICT
.
By implementing the following code to predict the price of our diamond:
from sqlalchemy import create_engine
user = "teslimodumuyiwa@gmail.com"
password = "Your MindsDB Cloud Password"
host = "cloud.mindsdb.com"
port = 3306
database = ""
def establish_connection():
engine = create_engine(url=f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}")
return engine
try:
engine = establish_connection()
with engine.connect() as eng:
query = eng.execute("CREATE PREDICTOR mindsdb.diamond_price FROM files (SELECT * FROM diamonds) PREDICT price;")
except Exception as e:
print("Couldn't connect to the database:\n",e)
Checking the Status of our Predictor
We can check the status of the model with the syntax below. If the query returns Complete, then the model is ready to be used, or else wait if it returns Training.
query = eng.execute("SELECT status FROM mindsdb.predictors WHERE name='diamond_price';")
for i in query:
print(i)
If the training is not complete, you should get this printed in your terminal:
('training',)
If the training is complete, you should get this printed in your terminal:
('complete',)
Making Predictions
Now that we have our Prediction Model, we can simply execute some simple SQL query statements to predict the target value based on the feature parameters.
Making a Single Prediction
You can make predictions by querying the predictor as if it were a table. The [SELECT](https://docs.mindsdb.com/sql/api/select/)
statement lets you make predictions for the diamonds
on the chosen feature parameter.
// main.py
query = eng.execute("SELECT price, price_explain FROM mindsdb.diamond_price WHERE carat = 0.23 AND depth = 56.9;")
for i in query:
print(i)
Expected output should be:
Make Batch Predictions with JOIN
Now let’s make bulk predictions or multiple predictions to predict the price
by joining our table with our model.
bulk = text("SELECT t.price AS real_price, m.price_explain AS explained_price, t.carat, t.cut, t.color, t.depth, t.table FROM files.diamonds AS t JOIN mindsdb.diamond_price AS m LIMIT 10;")
query = eng.execute(bulk)
for i in query:
print(i)
On execution, you should get this printed into your terminal
What’s Next?
Have fun while trying it out yourself!
- Star the MindsDB repository on GitHub.
- Sign up for a free MindsDB account
- Engage with the MindsDB community on Slack or GitHub to ask questions and share your ideas and thoughts.
Give a like or a comment if this tutorial was helpful
Top comments (0)