A machine learning (ML) project requires collaboration across multiple roles in a business. We’ll introduce the high-level steps of the end-to-end ML lifecycle and how different roles can collaborate to complete an ML project.
Machine learning is a powerful tool to help solve different problems in your business. The article “Building your first machine learning model” gives you basic ideas of what it takes to build a machine learning model. In this article, we’ll talk about what the end-to-end machine learning project lifecycle looks like in a real business. The chart below shows the high-level steps from project initiation to completion. Completing an ML project requires collaboration across multiple roles, including product manager, product developer, data scientist, and MLOps engineer. Failing to execute any one of these steps correctly will result in misleading insights or models with no practical value.
When talking about machine learning, people usually have high expectations of what it can achieve. Before starting a machine learning project, the product team should collaborate on the problem definition. Here are some questions that should be clarified at this step:
Machine learning can be used to solve various problems (e.g., reducing manual work or ranking products). Before starting the project, we need to clearly define the problem and the expected outcome. We should think about whether this is a valuable problem to solve and estimate how much value machine learning can bring.
There are different objectives when using machine learning, and we should be clear about how to measure the model’s success for each objective. If we want to use a machine learning model to reduce manual work, we should measure whether the model can give results as good as a human’s. If we want to use machine learning to rank products on a website, we can measure whether we get a higher click-through rate after using the model to rank the products.
Now that we have the idea, we need to think about one practical thing: do we have the data? A machine learning model learns from past data and makes predictions on new data. If you don’t have enough data, machine learning won’t be a good choice for you.
No matter what model we want to build, we first need to collect two types of data. The first type of data contains the labels (the target variable we want to predict) or can be used to create the labels. The second type of data can be used to generate features that affect the model’s predictions. For example, if we want to build a model to predict whether a user will churn, we at least need a table containing data that indicates whether each user has churned. In addition, we also want to collect user events to generate more features that can contribute to the model’s predictions.
Product developers are usually responsible for collecting the data after getting data requirements from data scientists. If you have a good habit of logging events, you’ll have a much easier time building machine learning models. If you don’t have good logging in your product, start adding it now. This data will help you understand your product better even if you don’t have an immediate need for machine learning models. Next, the work can be handed over to data scientists to prepare the data and train the model.
Data preparation, also called “feature engineering”, is one of the most complex steps in the machine learning lifecycle. If you don’t have data processing experience and want to learn it, this Developer Education series will be a good resource for you.
Here are the basic steps of feature engineering:
In machine learning, the “label” is the target variable you want the model to predict. To prepare the data for model training, we need to identify whether we have a label column in our dataset. If there’s no explicit label column, we need to create the labels first.
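As a sketch of label creation for the churn example, here’s how a churn label could be derived from an event log with pandas. The table, column names, and the 30-day churn definition below are illustrative assumptions, not from the article:

```python
import pandas as pd

# Hypothetical event log: one row per user event (names are illustrative).
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "event_time": pd.to_datetime(
        ["2024-05-01", "2024-06-25", "2024-04-10", "2024-06-28"]
    ),
})

# Pick a reference date and define "churned" as no events in the last 30 days.
as_of = pd.Timestamp("2024-06-30")
last_seen = events.groupby("user_id")["event_time"].max()
labels = ((as_of - last_seen).dt.days > 30).rename("churned").reset_index()
```

The resulting `labels` table (one row per user, with a boolean `churned` column) is exactly the kind of label column the model training step needs.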
Machine learning algorithms learn from the features. Here are some ways to create features:
Expand the existing features. For example, you can expand your date feature to “year”, “month”, “day”, and “days since holiday” features.
Aggregate the events feature. One example is to count the number of user events over the past 7 days, 30 days, or 90 days. Another example is to count the number of page view events from Google and Facebook respectively.
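The two feature-creation techniques above can be sketched with pandas. The tables, column names, and the 7-day window below are illustrative assumptions:

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": [1, 2],
    "signup_date": pd.to_datetime(["2024-03-15", "2024-06-01"]),
})

# Expand the existing date feature into year / month / day components.
users["signup_year"] = users["signup_date"].dt.year
users["signup_month"] = users["signup_date"].dt.month
users["signup_day"] = users["signup_date"].dt.day

# Aggregate an event log into per-user counts over a recent window.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "event_time": pd.to_datetime(
        ["2024-06-05", "2024-06-25", "2024-06-29", "2024-06-28"]
    ),
})
as_of = pd.Timestamp("2024-06-30")
recent = events[events["event_time"] >= as_of - pd.Timedelta(days=7)]
counts_7d = recent.groupby("user_id").size().rename("events_7d").reset_index()

# Join the aggregated counts back onto the user table as a new feature.
features = users.merge(counts_7d, on="user_id", how="left").fillna({"events_7d": 0})
```

Similar window aggregations (30-day, 90-day, per-source counts) follow the same `groupby` pattern with different filters.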
After creating labels and features, we need to get our data ready for machine learning algorithms.
Impute: Real-world datasets usually have missing values, and many machine learning algorithms don’t handle missing values well. Thus, we need to fill in missing values with values inferred from the existing data.
Encode: Machine learning algorithms require numeric inputs. Thus, we need to convert text features to numbers.
Scale: Numbers with larger ranges will have a higher impact on the model output. We need to adjust the values of numeric columns to fall within similar ranges so that large numbers (such as seconds since epoch) don’t affect the prediction disproportionately compared to smaller values (such as age).
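One common way to combine these three steps is a scikit-learn preprocessing pipeline. The toy dataset and column names below are illustrative assumptions:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative dataset: a missing value, a text column, and
# numeric columns on very different scales.
df = pd.DataFrame({
    "age": [25, None, 40, 31],
    "country": ["US", "CA", "US", "DE"],
    "seconds_since_signup": [86_400, 2_592_000, 604_800, 31_536_000],
})

numeric = ["age", "seconds_since_signup"]
categorical = ["country"]

preprocess = ColumnTransformer([
    # Impute missing numbers with the median, then scale to similar ranges.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # Encode text categories as one-hot numeric columns.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = preprocess.fit_transform(df)  # 4 rows, 2 scaled + 3 one-hot columns
```

Wrapping the steps in a pipeline also lets you apply the exact same transformations to new data at prediction time, which avoids training/serving skew.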
After data is prepared, we split the dataset into a training set and a test set, select an algorithm, and then start training the model with the training set. We briefly introduced some machine learning algorithms in the Fundamentals of being an AI/ML sorcerer supreme article. We’ll discuss different algorithms in detail in a future blog article.
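A minimal sketch of the split-and-train step, using synthetic data and a random forest as stand-ins (the article doesn’t prescribe a specific dataset or algorithm):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a prepared feature table (illustrative only).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Hold out 20% of the rows as a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```

The held-out `X_test` / `y_test` pair is what the next step uses to evaluate the model.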
After model training completes, we need to evaluate the model’s performance on the test set. Common metrics include precision and recall for classification models, and mean absolute error (MAE) and root mean squared error (RMSE) for regression models.
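These metrics can be computed with scikit-learn; the small label and prediction arrays below are made up for illustration:

```python
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, precision_score, recall_score
)

# Classification: compare predicted labels against true test-set labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
precision = precision_score(y_true, y_pred)  # of predicted positives, share correct
recall = recall_score(y_true, y_pred)        # of actual positives, share found

# Regression: measure the average size of the prediction errors.
r_true = [100.0, 150.0, 200.0]
r_pred = [110.0, 140.0, 195.0]
mae = mean_absolute_error(r_true, r_pred)
rmse = mean_squared_error(r_true, r_pred) ** 0.5
```

Here both precision and recall come out to 0.75: the model predicted four positives of which three were right, and found three of the four actual positives.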
The article “How to improve the performance of a machine learning (ML) model” introduced the strategies for improving models, including comparing multiple algorithms, hyperparameter tuning, and more feature engineering.
Once model training is done and the team is satisfied with the model’s performance, the data scientist hands the model over to the MLOps engineer, who deploys it to production. The product developer then integrates the model into the product.
There are generally two ways to integrate models and make predictions: online predictions and offline batch predictions.
For online predictions, we can deploy the model to an online web service and make API calls to that service to get predictions. This is useful when we need real-time predictions, e.g., real-time product ranking.
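As a rough sketch of the online-prediction pattern: a client builds a request payload and POSTs it to the model service. The endpoint URL and payload schema below are hypothetical, not a real API:

```python
import json

# Hypothetical prediction endpoint (illustrative, not a real service).
PREDICT_URL = "https://ml.example.com/v1/models/ranker:predict"


def build_payload(user_id, product_ids):
    """Serialize one ranking request as the JSON body of an API call."""
    return json.dumps({"user_id": user_id, "product_ids": product_ids})


payload = build_payload(42, [101, 102, 103])
# In production, something like:
#   response = requests.post(PREDICT_URL, data=payload, timeout=0.2)
#   ranked_ids = response.json()["ranked_product_ids"]
```

Because the call happens on the request path, online services usually need tight latency budgets (hence the short timeout in the commented call).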
For other models, we don’t necessarily need real-time predictions. We can use an offline batch prediction job to get predictions for a large number of data points on a regular basis. These predictions are then stored in a database and can be made available to developers or end users. For example, for a demand forecast model, we can estimate the demand for products on a daily basis for the upcoming year with an offline batch prediction job.
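A minimal sketch of the batch-prediction pattern, using a synthetic model and dataset as stand-ins (the column names and the commented database write are illustrative assumptions):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for a model trained earlier in the lifecycle (synthetic data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# Offline batch job: score every row at once, e.g. as a nightly job.
batch = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
batch["churn_probability"] = model.predict_proba(X)[:, 1]

# In production, the scored table would then be written to a database, e.g.:
#   batch.to_sql("daily_predictions", db_connection, if_exists="replace")
```

Downstream consumers then read predictions from the table instead of calling the model directly, which trades freshness for throughput and simplicity.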
After integrating the model into production, you can run an experiment to evaluate the model’s performance with real production traffic. For example, if you build a ranking model for your e-commerce website, you can split the website traffic 50/50: half of the users see the products in the original order (control group), and the other half see the products in the order determined by the ranking model (treatment group). We can then compare the target metrics (e.g., click-through rate) between the users in the control and treatment groups.
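A sketch of comparing click-through rates between the two groups. The counts are made up, and the chi-squared test on the 2x2 click table is one common choice of significance test, not something the article prescribes:

```python
from scipy.stats import chi2_contingency

# Hypothetical experiment counts: clicks and views in each group.
control_clicks, control_views = 480, 10_000
treatment_clicks, treatment_views = 560, 10_000

control_ctr = control_clicks / control_views        # 4.8%
treatment_ctr = treatment_clicks / treatment_views  # 5.6%

# Chi-squared test on the 2x2 contingency table of clicks vs. non-clicks.
table = [
    [control_clicks, control_views - control_clicks],
    [treatment_clicks, treatment_views - treatment_clicks],
]
chi2, p_value, _, _ = chi2_contingency(table)
significant = p_value < 0.05
```

With these (made-up) counts the treatment lift is statistically significant, which would support rolling the ranking model out to all traffic.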
Congratulations! With the team’s hard work, your model is finally live! You evaluated the model via experimentation and got the expected outcome. Is this everything you need to do for the model? The answer is no. Model performance can degrade over time. It’s important to set up a good monitoring system to make sure the model works correctly in production over time.
Multiple things could go wrong in production. One of the most common issues is data drift, which means the distribution of the target variable or the input data changes over time. The model-monitoring system should monitor the model’s performance on production data, detect data drift, and provide feedback for further model improvement (e.g., model retraining). Stay tuned for a future article about model monitoring.
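As one illustrative approach to drift detection (not prescribed by the article), a two-sample Kolmogorov–Smirnov test can flag when a feature’s production distribution differs from its training distribution; the data below is synthetic:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Feature values seen at training time vs. in production (synthetic:
# the production distribution has a shifted mean, simulating drift).
training_values = rng.normal(loc=0.0, scale=1.0, size=2_000)
production_values = rng.normal(loc=0.6, scale=1.0, size=2_000)

# KS test: a small p-value suggests the two distributions differ.
statistic, p_value = ks_2samp(training_values, production_values)
drift_detected = p_value < 0.01
```

A monitoring job could run a check like this per feature on a schedule and trigger an alert, or a retraining pipeline, when drift is detected.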
The whole machine learning lifecycle is a lengthy process, which requires expertise across multiple roles.
The product team defines the problem.
The product developer collects the data.
The data scientist prepares the data and trains the model.
The MLOps engineer deploys the model into production.
The product developer integrates the model into the product.
The MLOps engineer sets up the model monitoring system.
If you’re wondering whether there’s a way to simplify the process, Mage helps handle all the work from “prepare data” to “monitor model”. Mage also provides suggestions on what types of problems you can solve with ML and what data is needed.