Hello everyone!
Most of you who are new to the field of Machine Learning have probably heard phrases like these quite frequently:
"You have to train your Data in Machine Learning"
"The Machine Learning Models Learn from the Data"
Well, today I am going to dive deep into this whole process, and by the end of this article you will have a better intuition for how Machine Learning works. So let's start:
Let's say we have been given the task of creating a system that answers whether a drink is wine or beer.
-->The question-answering system that we build is, in technical terms, called a MODEL, and this model is created via a process called TRAINING.
In Machine Learning, the goal of training is to build a system that answers our questions correctly most of the time. But in order to train a model, we first need to collect data to train on.
In our example, the data will be collected from glasses of wine and beer. There are many aspects of a drink that we could collect data on, everything from the amount of foam to the shape of the glass.
But for our purposes, we will just pick two simple ones:
1).The color of the drink
2).The alcohol content of the drink (as a percentage)
The hope is that we can tell our two types of drinks apart based on these two factors alone. We'll call these FEATURES from now on (again, please make a note of this technical term, as you will be hearing it quite frequently in the Machine Learning world).
-->The first step of our process will be to run to the grocery store, buy a bunch of different drinks and get some equipment to do our measurements:
1).A spectrometer for measuring the color
2).A hydrometer for measuring the alcohol content
Now that we have all of our equipment set up, it's time for the first important step of Machine Learning: "GATHERING THE DATA". This step is important because the quality and quantity of the data that we gather will directly determine how good our predictive model can be.
In our case, the data that we collect will be the color and alcohol content of each drink. This will yield a table with three columns: color, alcohol content, and whether the drink is wine or beer.
This will be our TRAINING DATA. So after a few hours of measurements, we will have gathered all of our training data.
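To make the idea of this table concrete, here is a minimal sketch of how it could look in Python. The numbers are made up purely for illustration (they are not real measurements), and storing the color as a single number (assumed here to be a wavelength in nanometres) is just one possible choice.

```python
# Illustrative, made-up measurements (NOT real data).
# Each row: (color as a wavelength in nm, alcohol content in %, label)
training_data = [
    (610.0, 5.0, "beer"),
    (599.0, 4.5, "beer"),
    (693.0, 13.0, "wine"),
    (685.0, 12.5, "wine"),
]


def print_table(rows):
    """Print the rows as a simple fixed-width table."""
    print(f"{'color':>8} {'alcohol %':>10} {'label':>6}")
    for color, alcohol, label in rows:
        print(f"{color:>8.1f} {alcohol:>10.1f} {label:>6}")


print_table(training_data)
```

The first two columns are the features, and the last column is the answer we want the model to learn to give.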
-->Now it's time for our next step of Machine Learning, which is DATA PREPARATION, where we load our data into a suitable place and prepare it for use in Machine Learning training.
This is also a good time to do some visualisations of our data. They help us see whether there is any relationship between different variables, and they also show whether there are any DATA IMBALANCES.
For example, if we collected way more data about beer than about wine, then the model that we train will be heavily biased towards guessing beer.
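A quick way to spot such an imbalance is simply to count how many examples we have of each drink. Below is a small sketch; the list of labels is hypothetical and only there to show the idea.

```python
from collections import Counter

# Hypothetical labels, chosen to show an imbalanced data set.
labels = ["beer", "beer", "beer", "beer", "beer", "beer", "wine", "wine"]

counts = Counter(labels)          # how many examples of each drink
total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} examples ({share:.0%})")
```

Here three quarters of the examples are beer, so a model could look quite accurate just by answering "beer" almost every time.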
We also need to split our data into two parts. The first part, used for training our model, will be the majority of our data set. The second part will be used for evaluating our trained model's performance.
We don't want to evaluate the model on the same data that it was trained on, since it could then just memorise the "questions". It's the same reason you wouldn't want the questions from your maths homework to show up in the maths exam.
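The split described above can be sketched in a few lines of Python. The function name `split_data`, the 80/20 ratio and the fixed seed are my own assumptions for illustration; the article only says that the training part should be the majority.

```python
import random


def split_data(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows, then cut them into a training part and an evaluation part."""
    rows = list(rows)                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(rows)      # shuffle so both parts contain a mix of drinks
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


rows = list(range(10))                     # stand-in for 10 measured drinks
train_rows, eval_rows = split_data(rows)
print(len(train_rows), "rows for training,", len(eval_rows), "rows for evaluation")
```

Shuffling before cutting matters: if all the beers were measured first, an unshuffled split could leave the evaluation part with wine only.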
The next step in our workflow is CHOOSING A MODEL. There are many models that researchers and data scientists have created over the years. Some are very well suited to image data, others to sequences such as text or music, some to numerical data, and others to text-based data.
In our case, we have just two features, color and alcohol percentage, so we can use a simple linear model.
Now we will move on to the most important part of the Machine Learning process: the training process.
In this step, we will use our data to incrementally improve our model's ability to predict whether a given drink is wine or beer.
In some ways, this is similar to someone who is learning to drive.
At first, they don't know how any of the pedals, knobs and switches work, or when they should be pressed or used. However, after lots of practice and correcting their mistakes, a licensed driver emerges.
In the same way, our model will improve through practice on our drinks data.
We know that the formula for a straight line is:
y = m*x + b
where x is the input,
m is the slope of the line,
y is the output,
and b is the y-intercept.
The only values that we can adjust, or train, are m and b. There is no other way to change the position of the line, since the only other variables are x, our input, and y, our output.
In Machine Learning there are usually many m's, since there may be many features.
The collection of these values is usually arranged into a matrix, denoted W, called the weights matrix. Similarly, the b values (the y-intercepts) are grouped together, and they are called the biases.
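With two features, the straight-line formula simply gets one weight per feature plus a single bias. Here is a minimal sketch of that calculation; the function name `predict_score` and the example numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict_score(features, weights, bias):
    """Compute w1*x1 + w2*x2 + ... + b, the multi-feature version of y = m*x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


weights = [2.0, -1.0]     # one "m" per feature (made-up values)
bias = 0.5                # the "b" (made-up value)
features = [3.0, 4.0]     # one drink: [color, alcohol content] (made-up values)

score = predict_score(features, weights, bias)
print(score)              # 2.0*3.0 + (-1.0)*4.0 + 0.5 = 2.5
```

The model can then turn this score into an answer, for example "wine" if the score is positive and "beer" otherwise.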
The training process involves initialising W and b with random values and attempting to predict the output with those values. As you might expect, it does pretty poorly at first. But we can compare our model's predictions with the outputs it should have produced, and then adjust the values in W and b so that the predictions are more accurate the next time around. This process then repeats.
Each iteration or cycle of updating the weights and the biases is called one training step. So let's look at what that means for our data set.
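Here is a small sketch of what such training steps could look like in code. The article does not say which update rule is used, so this example assumes the classic perceptron rule, which is one simple choice among many. It also starts from zeros instead of random values so that the run is reproducible, and the four drinks are made-up numbers scaled to the range 0 to 1.

```python
def predict(features, weights, bias):
    """Return 1 (wine) if the weighted sum is positive, else 0 (beer)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(rows, learning_rate=0.1, epochs=10):
    """Repeat perceptron-style training steps over all rows, `epochs` times."""
    weights, bias = [0.0, 0.0], 0.0        # assumption: start at zero, not random
    for _ in range(epochs):
        for features, target in rows:
            # Compare the prediction with the correct answer: error is -1, 0 or +1.
            error = target - predict(features, weights, bias)
            # One training step: nudge the weights and the bias to reduce the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Made-up, scaled measurements: ((color, alcohol), label) with 0 = beer, 1 = wine.
rows = [((0.2, 0.3), 0), ((0.3, 0.2), 0), ((0.8, 0.9), 1), ((0.9, 0.7), 1)]

weights, bias = train(rows)
predictions = [predict(features, weights, bias) for features, _ in rows]
print(weights, bias, predictions)
```

Whenever a prediction is wrong, the weights and bias move a little in the direction that would have made it right. When the prediction is already correct, the error is 0 and nothing changes.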
When we first start the training, it's as if we drew a random line through the data. Then, as each step of the training progresses, the line moves closer, step by step, to the ideal separation between wine and beer.
Once the training is completed, it's time to see if the model is any good. We do so through EVALUATION. This is the step where the data set that we set aside earlier comes into play. Evaluation allows us to test our model against data that has never been used for training.
This gives us a representative picture of how the model might perform in the real world.
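One common way to put a number on "how good" the model is, is accuracy: the fraction of held-out drinks that the model labels correctly. The article does not name a specific metric, so accuracy is an assumption here, and both lists below are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical model answers and true answers for five held-out drinks.
predictions = ["wine", "beer", "beer", "wine", "beer"]
labels = ["wine", "beer", "wine", "wine", "beer"]

score = accuracy(predictions, labels)
print(f"accuracy: {score:.0%}")   # 4 of the 5 answers match
```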
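Once we have done the evaluation, we may want to see if we can further improve our training in any way. We can do so by tuning some of the settings we used during training, such as how many times we run through the training data. This is known as PARAMETER TUNING (these settings are often called hyperparameters).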
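As a rough sketch of what tuning can look like, the example below trains the same tiny model with different numbers of passes over the training data and keeps the setting that scores best on the evaluation data. Everything here is an assumption for illustration: the perceptron-style update rule, the made-up scaled data, and the choice to tune only the number of passes (`epochs`).

```python
def predict(x, w, b):
    """Return 1 (wine) if the weighted sum is positive, else 0 (beer)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train(rows, learning_rate, epochs):
    """Perceptron-style training, starting from zero weights and bias."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in rows:
            error = target - predict(x, w, b)
            w = [w[0] + learning_rate * error * x[0],
                 w[1] + learning_rate * error * x[1]]
            b += learning_rate * error
    return w, b


def accuracy(rows, w, b):
    """Fraction of rows that the model labels correctly."""
    return sum(predict(x, w, b) == y for x, y in rows) / len(rows)


# Made-up, scaled data: ((color, alcohol), label) with 0 = beer, 1 = wine.
train_rows = [((0.2, 0.3), 0), ((0.3, 0.2), 0), ((0.8, 0.9), 1), ((0.9, 0.7), 1)]
eval_rows = [((0.25, 0.25), 0), ((0.85, 0.8), 1)]

results = {}
for epochs in (1, 2, 5, 10):               # the setting we are tuning
    w, b = train(train_rows, learning_rate=0.1, epochs=epochs)
    results[epochs] = accuracy(eval_rows, w, b)

best_epochs = max(results, key=results.get)
print(results, "best:", best_epochs)
```

On real projects it is common to keep a third slice of data aside for this kind of tuning, so that the final evaluation data stays truly unseen.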
So, to sum up this article, Machine Learning is an interesting way of answering questions using our data.
I hope you liked my article and that it provided some value to you.
Signing off for now :D
Connect with me on LinkedIn: https://www.linkedin.com/in/deepanshu-arora-a87846132/