Data science and machine learning have been dubbed the sexiest jobs of the twenty-first century, but despite the hype, corporate executives and managers have come to appreciate the value they provide to any organization. Values such as assisting management in making sound decisions, assisting companies in defining goals and targets, boosting income through effective identification of target audiences, and recruiting the right people These advantages have led to increased growth and adoption among enterprises in a variety of industries.
However, developing competence in data science entails learning not only programming languages like Python or R but also principles in topics like arithmetic and statistics, making the learning curve a bit steep for prospective and current developers. Aside from the individual's learning curve, corporations will need to employ a variety of individuals to deploy an end-to-end machine learning model in production. This increases the challenge of developing and growing machine learning systems, particularly for small and medium-sized businesses.
To solve these issues, MindsDB is developing a platform that enables developers with a basic knowledge of Structured Query Language (SQL) to develop and deploy models using machine learning in databases. And in this tutorial, you'll learn how to create these models with the MindsDB SQL editor.
Getting Started by Creating a MindsDB Account.
To create an account;
- Click the
Get Started
button on the homepage. - Fill in your details in the respective fields below.
- You will be sent receive an email requesting that you validate your email address.
- On validation, you will be directed to MindsDB SQL editor.
To follow along with this tutorial, you need to upload the dataset to MindsDB GUI.
Uploading the Dataset to MindsDB GUI.
- First, dowload the zip file of the dataset here and extract its contents (a train.csv file).
- Next, click on the
Add Data
button at the top of your MindsDB SQL Editor. - Next, select the files tab, and click the import file button.
- Click the import file button, name the table
sales_data
. Click Save and Continue to upload your dataset. - On Successful upload, you should be presented with the SQL Editor.
Understanding the Data.
Every dataset you upload to MindsDB is kept in the files
database as a table. As a result, you must run your queries for a table against this database.
Run the query below to see the first five records in the sales data
table.
SELECT * FROM files.sales_data LIMIT 5;
This dataset contains eighteen columns, including the sales column. However, for this lesson, the order date, ship date, ship mode, segment, and category columns will be used to make predictions.
To see these columns, run the query below.
SELECT `Order Date`, `Ship Date`, `Ship Mode`, Segment, Category, Sales FROM files.sales_data
What do these columns stand for?
- Order Date: The date the product was ordered by the consumer.
- Ship Date: The date the product was shipped by the company.
- Ship Mode: The shipment class that the customer has selected. There are four classes available here: Same Day, First Class, Standard Class, and Second Class.
- Segment: The product's segmentation. This field has three possible values: consumer, corporate, and home office.
- Category: The category to which the products belong. There are three values in this column: furniture, office supplies, and technology.
Building the Model.
Before you start developing the model, you might want to know the steps you'll need to follow to build it with Python and machine learning tools like Scikit-Learn and Pandas.
- Data Cleaning: First, you will clean the data by removing duplicates, filling in missing values, and verifying data types in a field are consistent, among other things.
- Exploratory Data Analysis: Next, you will analyze the data to identify patterns and trends that may be relevant to the problem at hand.
- Feature Selection: Not every column in a dataset is important. At this point, you will remove any fields that you believe will have little to no effect on your machine learning model's performance.
- Data training: This is when you try out multiple machine learning algorithms to see which ones perform the best on your data. If solving a regression problem, you choose a regression algorithm; the same is true for classification and time-series analysis, among other things. Not to mention before training the data, you must ensure that all categorical data (words) have been converted to numerical data (numbers) and that these numerical data have been normalized.
- Model validation: Once the model has been trained, it is tested against the validation and test data to determine its accuracy. The model's parameters are then tweaked until you get a consistent and good performance.
- Deploying the model: Now that you've trained and tested your model, you must convert it to a binary file using Pickle or Joblib to deploy it and make predictions. These are the steps you would follow if you were creating a model using Python and machine learning tools, but MindsDB takes care of these steps and enables you to create your model using SQL queries.
Run the query below to create a model for the 'sales data' table using the columns supplied.
CREATE PREDICTOR mindsdb.predict_sales
FROM files
(SELECT `Order Date`, `Ship Date`, `Ship Mode`, Segment, Category, Sales FROM sales_data)
PREDICT Sales;
If everything is done correctly, you should see a "Query Successful" message.
The following syntax illustrates how the query works.
CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
(SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column];
- [predictor name]: The name of the model.
- [integration name]: The name of the database where your table is stored, for example, files.
- [column name,...]: The field to predict, as well as the fields to train for the prediction.
- [table name]: The table containing the columns, for example,'sales data'.
- [target column]: The column to be predicted, such as Sales.
Making Predictions.
Consider the following question for this section.
A customer orders a product on March 28th, 2022, and the product is dispatched on March 31st, 2022. The product is from the consumer segment, the furniture category, and was shipped first class. How much did the item cost?
To find out, use the query below.
Select Sales_confidence, Sales_explain
FROM mindsdb.predict_sales
WHERE `Order Date` = '28/03/2022' AND
`Ship Date` = '31/03/2022'AND
`Ship Mode` = 'First Class'AND
Segment = 'Consumer'AND
Category = 'Funiture';
Using the newly constructed model, the product costs around $201.27. The model has a confidence level of 76%, or 0.76.
That's all. MindsDB makes it simple to build machine learning models.
That's all for now, guys. This tutorial shows how MindsDB is democratizing access to machine learning with SQL, a language that has been used for decades and is still relevant today.
You can connect on Slack to ask questions about MindsDB, and you can also raise an issue on GitHub if you run into any problems.
You may learn more about MindsDB by reading the documentation.
Thank you for reading this post. if you liked it, please share it.
To understand more about the field of machine learning and artificial intelligence in general, you can follow me on Twitter and LinkedIn, and please leave a comment below if you have any questions. For the time being, bye.
Top comments (0)