We'll pick up where we left off last week with our Think Lab 2124 Tutorial A: Build & Deploy a Data Join Experiment!
AutoAI in Cloud Pak for Data automates ETL(Extract, Transform, and Load) and feature engineering process for relational data, saves data scientists months of manual data prep time and acheives results comparable to top performing data scientists.
The AutoAI graphical tool in Watson Studio automatically analyzes your data and generates candidate model pipelines customized for your predictive modeling problem. These model pipelines are created iteratively as AutoAI analyzes your dataset and discovers data transformations, algorithms, and parameter settings that work best for your problem setting. Results are displayed on a leaderboard, showing the automatically generated model pipelines ranked according to your problem optimization objective.
Collect your input data in a CSV file or files. Where possible, AutoAI will transform the data and impute missing values.
- Your data source must contain a minimum of 100 records (rows).
- You can use the IBM Watson Studio Data Refinery tool to prepare and shape your data.
- Data can be a file added as connected data from a networked file system (NFS). Follow the instructions for adding a data connection of the type Mounted Volume. Choose the CSV file to add to the project so you can select it for training data.
Using AutoAI, you can build and deploy a machine learning model with sophisticated training features and no coding. The tool does most of the work for you.
AutoAI automatically runs the following tasks to build and evaluate candidate model pipelines:
- Data pre-processing
- Automated model selection
- Automated feature engineering
- Hyperparameter optimization
In this Think Lab, you will see how to join several data sources and then build an AutoAI experiment from the joined data. The scenario we’ll explore in Part B of the Lab is for a mobile company that wants to understand the key factors that have an impact on user experience in their call center. You will use IBM AutoAI to automate data analysis for a dataset collected from a fictional call center. The objective of the analysis is to gain more insight on factors that impact customer experience so that the company can improve customer service. The data consists of historical information about customer interaction with call agents, call type, customer wireless plans and call type resolution.
- Create an IBM Cloud Lite Tier Account
- Create a Watson Studio Instance
- Provision Watson Machine Learning & Cloud Object Storage Instances
- Create a New Project
- Download the Call Center Dataset from the Gallery
- Unzip the Call Center Dataset's .zip File
- Add the Call Center Datasets to the Project
In Tutorial B of this Think Lab, you will use IBM AutoAI to automate data analysis for a dataset collected from a fictional call center. The objective of the analysis is to gain more insight on factors that impact customer experience so that the company can improve customer service. The data consists of historical information about customer interaction with call agents, call type, customer wireless plans and call type resolution. Each source of information is kept in a separate table (a CSV file).
Using the data join capabilities of AutoAI, you will connect the tables using common coloumns, or keys, to create a single data source, without needing to write SQL-like queries. Additionally, AutoAI will do some automated data preparation, or feature engineering on the combined data before using the data to train the model.
The data is divided as follows:
- User_experience: User experience reflects the satisfactory feedback from customers to each call agent daily.
- Call_log: Records historical information about the calls from customers to the call center in the last 3 years.
- Call_Type: Records call type information.
- Wireless_Plans: Records the kind of wireless plans customers are subscribed to.
- Call_Resolution_Type: Records type of call resolution.
This tutorial presents the basic steps for joining data sets then training a machine learning model using AutoAI:
- Add and join the data
- Train the experiment
- Deploy the trained model
- Test the deployed model
- Create a New AutoAI Experiment
- Build the Data Join Schema
- Run the AutoAI Experiment
- Explore the Holdout & Training Data Insights
- Deploy the Trained Model
- Score the Model
- View the Prediction Results
In the data join canvas you will create a left join that connects all of the data sources to the main source.
Choose User_Experience as the column to predict.
AutoAI analyzes your data and determines that the User_Experience column contains information making the data suitable for a Multiclass Classification model. The default metric for a Multiclass Classification model is optimized for Accuracy & run time automatically by AutoAI.
- Based on analyzing a subset of the data set, AutoAI chooses a default model type: binary classification, multiclass classification, or regression. Binary is selected if the target column has two possible values, multiclass if it has a discrete set of 3 or more values, and regression if the target column is a continuous numeric variable. You can override this selection.
- AutoAI chooses a default metric for optimizing. For example, the default metric for a binary classification model is Accuracy.
- By default, ten percent of the training data is held out to test the performance of the model.
Eplore the Leading Pipeline
From the feature importance chart, the most important feature is count (Call_Type_Description) which is the total calls in a day.
Other important features are from the After_Call_Work_Time, which are talk time and queue time. These features affect users experience the most.
The call center management team should pay attention to these features and try to figure out how to improve user’s experience by adjusting these features.
To score the model, you create a batch job that will pass new data to the model for processing, then output the predictions to a file. Note: For this tutorial, you will submit the training files as the scoring files as a way to demonstrate the process and view results.
Tune in next week for our next Tutorial Tuesday post.
Follow for more Cloud Native & Watson AI/ML content: