Hi and welcome! Let’s talk about **Machine Learning**, which is a subset of Artificial Intelligence. Machine learning is everywhere these days, so let’s see how machine learning works and how it can help you with your data!

So to look at how machine learning works, let’s imagine that we work for Scuba Syndrome and have just completed an expedition where we have been monitoring sharks. We have looked at the length, girth, and weight of sharks and now we can plot a graph of the data that we collected. So if we have a simple graph with length on our x axis and weight on our y axis, we can take each of the data points we collected in our sea exploration and plot them on our graph.

So as we plot our sharks on our graph, we can see there is a linear relationship, and we can easily draw a line, or a trend line, through our data set. And this trend line can be used to predict the weight of other sharks.
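To make the trend line a little more concrete, here is a minimal sketch in plain Python. The shark measurements are made up for illustration (they are not from a real expedition), and `fit_line` and `predict_weight` are names I picked for this example. It uses the classic least-squares formula, which is one common way to draw a trend line.

```python
# Hypothetical measurements, invented for this example:
# shark length in metres and weight in kilograms.
lengths = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [52.0, 148.0, 251.0, 349.0, 450.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, divided by how much x moves on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(lengths, weights)


def predict_weight(length):
    """Read a weight off the trend line for a shark of the given length."""
    return slope * length + intercept


print(f"A 3.5 m shark probably weighs about {predict_weight(3.5):.0f} kg")
```

In practice you would probably reach for a library rather than writing the formula by hand, but the idea is the same: find the line, then read predictions off it.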

And if we went on another exploration and saw other sharks, we could put each of those sharks on our graph, and we could determine that if a shark is this length, then it probably weighs this amount.

And if we saw another shark that was longer, then it would most likely be heavier as well. The data we used to draw our line becomes our **Training Data**. But we have to ask ourselves, *is this the best line* or *is there a better line to predict the weight of sharks based on the length*? What if we drew a different line through all of our sharks (the data points we collected); maybe that line is a better way to predict the weight of sharks.
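One common way to decide which line is "better" is to measure how far each line's predictions land from the real weights. Here is a small sketch that compares two candidate lines using mean squared error. The data and both candidate lines are made up for illustration, and mean squared error is my choice of yardstick here, not the only possible one.

```python
# Hypothetical training data, invented for this example:
# shark length in metres and weight in kilograms.
lengths = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [52.0, 148.0, 251.0, 349.0, 450.0]


def mean_squared_error(slope, intercept, xs, ys):
    """Average of the squared gaps between predicted and actual weights."""
    squared_gaps = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_gaps) / len(squared_gaps)


# Two candidate trend lines, each written as (slope, intercept).
line_a = (100.0, -50.0)
line_b = (80.0, 0.0)

error_a = mean_squared_error(*line_a, lengths, weights)
error_b = mean_squared_error(*line_b, lengths, weights)

# The line with the smaller error fits our sharks more closely.
better = "A" if error_a < error_b else "B"
print(f"Line A error: {error_a}, line B error: {error_b}, better line: {better}")
```

The smaller the error, the closer the line sits to our sharks, so "the best line" just means the line with the smallest error.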

Well, we just do not know, but we can add more data from our exploration. Because on our exploration we collected lots of data about sharks. And this new data is our **Testing Data**.

And *testing data* is simply data that was not used to draw our trend line. We can use it to test our trend line and see how well it predicts the weight of a shark, based on the shark’s length, for sharks it has never seen before.
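Here is a rough sketch of that idea in plain Python. All of the numbers are made up for illustration, and `fit_line` and `mean_absolute_error` are names I chose for this example. The line is drawn using only the training data, and then we check how far off it is on the testing data.

```python
# Hypothetical (length in metres, weight in kilograms) pairs, invented for this example.
training_data = [(1.0, 52.0), (2.0, 148.0), (3.0, 251.0), (4.0, 349.0), (5.0, 450.0)]
testing_data = [(1.5, 103.0), (3.5, 297.0), (4.5, 402.0)]


def fit_line(points):
    """Least-squares trend line through the points, as (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_absolute_error(points, slope, intercept):
    """Average size of the miss, in kilograms, over the given sharks."""
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


# The trend line only ever sees the training data...
slope, intercept = fit_line(training_data)

# ...and is then scored on sharks it has never seen.
test_error = mean_absolute_error(testing_data, slope, intercept)
print(f"Average miss on unseen sharks: {test_error:.1f} kg")
```

If the miss on the testing data is small, we have some evidence that the line works for new sharks and not just the ones we drew it through.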

And this type of machine learning model is called **Linear Regression**. Linear Regression is great to use when we want to get a numeric value from our model. And in our example, we used Linear Regression to get the weight of the shark based on the shark’s length. But there is also **Logistic Regression** which is great when you want your outcome to be a binary output, so a yes or no, a 1 or 0. So if we used Logistic Regression as our model, we would be looking to see if yes, that is a shark or no, that is not a shark.
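To show the yes-or-no flavor of Logistic Regression, here is a small sketch written from scratch in plain Python. The animals, their lengths, and the idea of guessing "shark or not" from length alone are all assumptions I made up for this example. The training loop is basic gradient descent, which is one simple way to fit this kind of model, and the learning rate and step count are just values that work for this toy data.

```python
import math

# Hypothetical sea animals, invented for this example:
# body length in metres, and 1 if it was a shark, 0 if it was not.
lengths = [0.2, 0.4, 0.5, 0.8, 1.5, 2.0, 2.5, 3.0]
is_shark = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Squash any number into the range 0..1 so it reads like a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Start with a model that knows nothing, then nudge it to reduce its mistakes.
coef, bias = 0.0, 0.0
learning_rate = 0.5
for _ in range(5000):
    grad_coef = grad_bias = 0.0
    for x, y in zip(lengths, is_shark):
        error = sigmoid(coef * x + bias) - y
        grad_coef += error * x
        grad_bias += error
    coef -= learning_rate * grad_coef / len(lengths)
    bias -= learning_rate * grad_bias / len(lengths)


def shark_probability(length):
    """The model's estimate of how likely this animal is a shark."""
    return sigmoid(coef * length + bias)


def predict_is_shark(length):
    """Turn the probability into a yes (True) or no (False)."""
    return shark_probability(length) >= 0.5


print(predict_is_shark(2.8), predict_is_shark(0.3))
```

Notice the difference from Linear Regression: the output is not a weight in kilograms, it is a probability that we turn into a yes or a no.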

And there are lots of other types of machine learning models out there. **Support Vector Machines** and **Decision Trees** are both commonly used as classification models.

So at a very basic level, the way machine learning works is that it starts off with our *data*, then we choose an *algorithm* which we believe will best represent our data. Then we take our *data* *plus* the *algorithm* and we train our model, which is trying to find the best line to fit our data. And then the output of the training is the *model* itself, which we can use to make our *predictions* (and here we can use our testing data to check how good those predictions are).
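Those steps (data, algorithm, training, model, predictions) can be sketched in a few lines of plain Python. The data is made up for illustration, and `train` and `predict` are names I chose. For the training step I am using gradient descent, which is one common way of "trying to find the best line": start with a bad line and keep nudging it so the error shrinks. The learning rate and step count are assumptions picked to suit this toy data.

```python
# 1. Data: hypothetical shark lengths (metres) and weights (kilograms).
lengths = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [52.0, 148.0, 251.0, 349.0, 450.0]


# 2. Algorithm: a straight line, weight = slope * length + intercept.
# 3. Training: nudge the line a little at a time to reduce the squared error.
def train(xs, ys, learning_rate=0.02, steps=5000):
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# 4. Model: the output of training is just the two numbers describing the line.
model = train(lengths, weights)


# 5. Predictions: use the model on a shark we have not measured.
def predict(model, length):
    slope, intercept = model
    return slope * length + intercept


print(f"Model: {model}, predicted weight at 3.5 m: {predict(model, 3.5):.0f} kg")
```

The "model" that comes out of training is nothing mysterious here: it is just the slope and intercept of the line.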

This is a very basic picture of how machine learning works. In reality there is so much more to it, and so many more attributes we could use from our exploration: What is the size of the **shark’s eyes**? What is the **tooth shape**? Is the shark **oviparous** or **viviparous**? Does the shark have a **spiracle**? What is the **color** of the shark? Does the shark have a **pointed snout**? What is the **shark’s body shape**?

Most data sets you will see have many more than the two attributes (length and weight) we are using in this example, and that is the power of machine learning. The machine learning algorithms can take all the dimensions of our data, perform calculations across those dimensions, and then infer answers across large, multi-dimensional data sets.
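As a small taste of what "more dimensions" looks like, here is a sketch that predicts weight from two attributes, length and girth, instead of just one. The measurements are made up, and I generated them from a known formula (weight = 60 × length + 40 × girth − 30) so we can check that the fit recovers it. The helper `solve` is a name I chose; it is a tiny Gaussian elimination routine so the example runs without installing anything.

```python
# Hypothetical sharks, invented for this example: (length m, girth m, weight kg).
sharks = [
    (1.0, 0.5, 50.0),
    (2.0, 1.2, 138.0),
    (3.0, 1.4, 206.0),
    (4.0, 2.3, 302.0),
    (5.0, 2.4, 366.0),
]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


# Each row is [1, length, girth]; the leading 1 lets the model learn an intercept.
X = [[1.0, length, girth] for length, girth, _ in sharks]
y = [weight for _, _, weight in sharks]

# Least squares via the normal equations: (X^T X) coefficients = X^T y.
k = 3
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
intercept, per_metre_length, per_metre_girth = solve(xtx, xty)


def predict_weight(length, girth):
    return intercept + per_metre_length * length + per_metre_girth * girth


print(f"Predicted weight of a 3.5 m shark with 2.0 m girth: {predict_weight(3.5, 2.0):.0f} kg")
```

With two attributes the "line" becomes a flat plane, and with more attributes it becomes something we cannot draw at all, but the calculation works the same way.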

I am just starting to learn about machine learning myself, but please reach out with any questions or if I can help!

Check out AWSJulie for more information!
