Machine learning is an area of high interest among tech enthusiasts. Viewed as a branch of artificial intelligence (AI), it is essentially an algorithm or model that improves itself through “learning” and, as a result, becomes increasingly proficient at its task. The applications of machine learning are widespread, as it is fast becoming an integral part of fields such as medicine, e-commerce, and banking. Today, we will break down machine learning as a process and understand the steps involved from its inception to its practical application.
The process of machine learning will be broken down into the 7 steps listed below. To illustrate the significance and function of each step, we will use the example of a simple model responsible for differentiating between an apple and an orange. Machine learning is capable of much more complex tasks, but a basic example keeps the relevant concepts easy to explain.
For the purpose of developing our machine learning model, our first step is to gather relevant data that can be used to differentiate between the 2 fruits. Many parameters could be used to classify a fruit as either an orange or an apple. For the sake of simplicity, we will take only 2 features for our model to work with: the first is the color of the fruit, and the second is the shape of the fruit. Using these features, we hope that our model can accurately differentiate between the 2 fruits.
| Color  | Shape | Apple or Orange? |
|--------|-------|------------------|
| Red    | Round | Apple            |
| Orange | Round | Orange           |
A mechanism is needed to gather the data for our 2 chosen features. For instance, to collect data on color we may use a spectrometer, and for the shape data we may use pictures of the fruits so that they can be treated as 2D figures. While collecting data, we should try to get as many different types of apples and oranges as possible in order to create diverse data sets for our features. To this end, we may search the markets for oranges and apples from different parts of the world.
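To make the data concrete, here is a minimal sketch of how the gathered readings might be represented. The specific numbers, the 0-to-1 scales, and the field order are all illustrative assumptions, not real spectrometer output:

```python
# Hypothetical readings: each sample is (hue, roundness, label).
# hue: 0.0 ≈ deep red, 1.0 ≈ bright orange (standing in for a spectrometer reading)
# roundness: 0.0 ≈ irregular, 1.0 ≈ perfect sphere (derived from the 2D photos)
samples = [
    (0.10, 0.72, "apple"),
    (0.15, 0.80, "apple"),
    (0.85, 0.95, "orange"),
    (0.90, 0.97, "orange"),
]
```

A diverse data set would contain many more rows than this, covering apple and orange varieties from different regions.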
The step of gathering data is the foundation of the machine learning process. Mistakes such as choosing the incorrect features or focusing on limited types of entries for the data set may render the model completely ineffective. That is why it is imperative that the necessary considerations are made when gathering data, as errors made in this stage will only amplify as we progress to later stages.
Once we have gathered the data for the two features, our next step is to prepare the data for further steps. A key focus of this stage is to recognize and minimize any potential biases in our data sets for the 2 features. First, we randomize the order of our data for the 2 fruits, because we do not want the order to have any bearing on the model’s choices. Furthermore, we examine our data sets for any skew towards a particular fruit. A skewed data set would mean that the model becomes adept at identifying one fruit correctly but struggles with the other, so identifying and rectifying such an imbalance is another way to remove a potential bias.
Another major component of data preparation is breaking down the data sets into 2 parts. The larger part (~80%) is used for training the model, while the smaller part (~20%) is used for evaluation purposes. This is important because using the same data sets for both training and evaluation would not give a fair assessment of the model’s performance in real-world scenarios. Apart from the data split, additional steps are taken to refine the data sets, such as removing duplicate entries and discarding incorrect readings.
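The shuffle-then-split procedure described above can be sketched in a few lines. The helper name, the 80/20 default, and the fixed seed are illustrative choices, not part of any standard API:

```python
import random

def shuffle_and_split(samples, train_fraction=0.8, seed=42):
    """Randomize the order, then cut the data into training and evaluation parts."""
    shuffled = samples[:]                      # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)      # remove any ordering bias
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]      # ~80% for training, ~20% for evaluation

data = [(i, i % 2) for i in range(10)]         # stand-in entries
train_set, eval_set = shuffle_and_split(data)
```

Fixing the seed makes the split reproducible, which is handy when comparing training runs; in production you might prefer a stratified split so both parts keep the same apple-to-orange ratio.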
Well-prepared data can improve your model’s efficiency. It helps reduce the model’s blind spots, which translates to more accurate predictions. Therefore, it makes sense to deliberate over and review your data sets so that they can be fine-tuned to produce better, more meaningful results.
The selection of the model type is our next course of action once we are done with the data-centric steps. There are various existing models developed by data scientists which can be used for different purposes. These models are designed with different goals in mind. For instance, some models are more suited to dealing with text, while others may be better equipped to handle images. With regards to our model, a simple linear regression model is suitable for differentiating between the fruits: its numeric output can be thresholded to decide between the 2 classes. In this case, the type of fruit is our dependent variable, while the color of the fruit and the shape of the fruit are the 2 predictors, or independent variables.
In our example, the model selection was fairly straightforward. In more complex scenarios, we need to make the choice that matches our intended outcome. The options for machine learning models can be explored across 3 broad categories. The first category is supervised learning models. In such models, the outcome is known, so we continuously refine the model itself until our output reaches the desired accuracy level. The linear regression model chosen for our fruit model is an example of supervised learning. If the outcome is unknown and we need classification to be done, then the second category, unsupervised learning, is used. Examples of unsupervised learning include K-means clustering and the Apriori algorithm. The third category is reinforcement learning, which focuses on learning to make better decisions on the basis of trial and error. Such models are often used in business environments; Markov decision processes are an example.
At the heart of the machine learning process is the training of the model. The bulk of the “learning” is done at this stage. Here we use the part of the data set allocated for training to teach our model to differentiate between the 2 fruits. If we view our model in mathematical terms, the inputs, i.e. our 2 features, have coefficients called the weights of the features. There is also a constant, or y-intercept, involved, which is referred to as the bias of the model. Determining their values is a process of trial and error. Initially, we pick random values for them and provide inputs. The achieved output is compared with the actual output, and the difference is minimized by trying different values for the weights and the bias. The iterations are repeated using different entries from our training data set until the model reaches the desired level of accuracy.
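The trial-and-error loop above can be sketched as simple gradient-descent updates. This is a minimal illustration, assuming two numeric features (hue and roundness, each scaled 0 to 1) and labels encoded as 0 = apple, 1 = orange; the learning rate and epoch count are arbitrary choices:

```python
def train(samples, lr=0.1, epochs=500):
    """Fit the weights and bias by repeatedly shrinking the output error."""
    w1, w2, b = 0.0, 0.0, 0.0                      # starting guesses for weights/bias
    for _ in range(epochs):                        # repeated sweeps over the data
        for hue, roundness, label in samples:
            pred = w1 * hue + w2 * roundness + b   # model's current output
            error = pred - label                   # difference from the actual output
            w1 -= lr * error * hue                 # nudge each value a little
            w2 -= lr * error * roundness           # in the direction that
            b  -= lr * error                       # reduces the error
    return w1, w2, b

# Toy training entries: (hue, roundness, label) with 0 = apple, 1 = orange.
training_set = [(0.10, 0.70, 0), (0.20, 0.80, 0),
                (0.80, 0.95, 1), (0.90, 0.90, 1)]
w1, w2, b = train(training_set)
```

After training, an output above 0.5 can be read as “orange” and below 0.5 as “apple”, matching the thresholded-regression idea from the model-selection step.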
Training requires patience and experimentation. It is also useful to have knowledge of the field where the model will be implemented. For instance, if a machine learning model is to be used for identifying high-risk clients for an insurance company, knowledge of how the insurance industry operates will expedite training, as more educated guesses can be made during the iterations. Training can prove to be highly rewarding if the model starts to succeed in its role. It is comparable to when a child learns to ride a bicycle: initially they may have multiple falls, but after a while they develop a better grasp of the process and are able to react to different situations while riding.
With the model trained, it needs to be tested to see whether it will operate well in real-world situations. That is why the part of the data set set aside for evaluation is used to check the model’s proficiency. This puts the model in a scenario where it encounters situations that were not part of its training. In our case, it could mean trying to identify a type of apple or orange that is completely new to the model. Through its training, however, the model should be capable of extrapolating from the information it has and deciding whether the fruit is an apple or an orange.
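Evaluation usually boils down to a single score, such as accuracy on the held-out data. Here is a small sketch; the rule-based stand-in model and the sample readings are invented for illustration:

```python
def accuracy(model, held_out):
    """Fraction of never-before-seen samples the model labels correctly."""
    correct = sum(1 for features, label in held_out if model(features) == label)
    return correct / len(held_out)

# Stand-in model: call any fruit with a high hue reading an orange.
rule = lambda hue: "orange" if hue > 0.5 else "apple"

# Held-out (hue, label) pairs the model never saw during training.
evaluation_set = [(0.2, "apple"), (0.9, "orange"), (0.8, "orange"), (0.6, "apple")]
print(accuracy(rule, evaluation_set))  # → 0.75 (one fruit misclassified)
```

The one misclassified sample here (an apple with an unusually orange hue) is exactly the kind of case that sends data scientists back to the earlier steps to diversify the data or add features.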
Evaluation becomes highly important when it comes to commercial applications. It allows data scientists to check whether the goals they set out to achieve were met. If the results are not satisfactory, then the prior steps need to be revisited so that the root cause behind the model’s underperformance can be identified and, subsequently, rectified. If evaluation is not done properly, the model may not excel at fulfilling its desired commercial purpose. The company that designed and sold the model may lose goodwill with the client, and its reputation may suffer, as future clients become hesitant to trust the company’s acumen regarding machine learning models. Proper evaluation of the model is essential for avoiding these ill-effects.
If the evaluation is successful, we proceed to the step of hyperparameter tuning. This step tries to improve upon the positive results achieved during the evaluation step. For our example, we would see if we can make our model even better at recognizing apples and oranges. There are different ways to go about improving the model. One is revisiting the training step and using multiple sweeps of the training data set to train the model. This can lead to greater accuracy, as the longer duration of training provides more exposure and improves the quality of the model. Another is refining the initial values given to the model. Random initial values often produce poor results that are only gradually refined by trial and error; if we can come up with better initial values, or perhaps initialize the model using a distribution instead of a single value, our results could improve. There are other parameters we could adjust to refine the model, but the process is more intuitive than logical, so there is no definite approach to it.
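One common way to organize this otherwise intuitive process is a grid search: retrain and re-evaluate the model for every combination of candidate hyperparameter values and keep the best. The sketch below assumes a hypothetical `score_fn` that stands in for “retrain the model with these settings and measure its accuracy”; the toy scorer and its peak at lr=0.1 are invented for illustration:

```python
def grid_search(score_fn, grid):
    """Try every hyperparameter combination and keep the best-scoring one."""
    best_params, best_score = None, float("-inf")
    for params in grid:
        score = score_fn(**params)         # retrain + evaluate with these settings
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scorer standing in for a real retrain-and-evaluate run;
# it simply peaks at lr=0.1 with 200 sweeps of the training data.
def toy_score(lr, epochs):
    return 1.0 - abs(lr - 0.1) - abs(epochs - 200) / 1000

grid = [{"lr": lr, "epochs": e} for lr in (0.01, 0.1, 1.0) for e in (100, 200)]
best, best_score = grid_search(toy_score, grid)
```

Grid search is exhaustive and therefore expensive; with many hyperparameters, random or guided search over the same grid is often preferred.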
Naturally, the question arises: why do we need hyperparameter tuning in the first place when our model is already achieving its targets? This can be answered by looking at the competitive nature of machine learning service providers. Clients can choose from multiple options when they seek a machine learning model to solve their problem, and they are most likely to be enticed by the one that produces the most accurate results. That is why, for ensuring the commercial success of a machine learning model, hyperparameter tuning is a necessary step.
The final step of the machine learning process is prediction. This is the stage where we consider the model ready for practical applications. Our fruit model should now be able to answer whether a given fruit is an apple or an orange. The model gains independence from human intervention and draws its own conclusions on the basis of its data sets and training. The challenge for the model remains whether it can outperform, or at least match, human judgment in the relevant scenarios.
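At prediction time, the trained weights and bias are simply applied to an unseen fruit's features and the output is thresholded, as described in the model-selection step. The specific weight values below are illustrative stand-ins for what training might have produced, not real trained parameters:

```python
def predict(w1, w2, b, hue, roundness):
    """Apply the trained weights to an unseen fruit; output above 0.5 means orange."""
    output = w1 * hue + w2 * roundness + b
    return "orange" if output >= 0.5 else "apple"

# Illustrative weights a trained model might have learned (hue dominates).
print(predict(1.4, 0.2, -0.3, hue=0.9, roundness=0.95))  # → orange
print(predict(1.4, 0.2, -0.3, hue=0.1, roundness=0.75))  # → apple
```

Note that no label is supplied here: the model is on its own, which is exactly what makes this step the practical payoff of the whole process.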
The prediction step is what the end-user sees when they use the machine learning model within their respective industry. This step highlights why many consider machine learning to be the future of various industries. A complex but well-executed machine learning model can improve its owners’ decision-making process. Humans can only handle a certain amount of data and a limited number of relevant factors when arriving at a decision. Machine learning models, on the other hand, can process and link large amounts of data, and these links allow the models to gain insights that may not have been uncovered through the usual manual approach. As a result, valuable human resources are freed from the burden of processing information before arriving at decisions; people can simply use the machine learning model as a tool and reach better decisions with much less effort.
With the help of machine learning, we were able to differentiate between apples and oranges. Though it may not sound like an impressive model, the steps we took are the same for most machine learning models. These steps may change in the future as machine learning and AI in general advance, but remember them for the next time you need to work on an ML project:
- Gathering data
- Preparing that data
- Choosing a model
- Training the model
- Evaluating the model
- Hyperparameter tuning
- Prediction
Thanks for reading, and remember to subscribe: I’m posting new content on ML, AI, programming, and everything related to computer science several times a week.