I will start with an example:
We have a dataset containing two species: cats and dogs. Each entry is a sample.
For each sample we know the length of its tail and its ear shape (e.g. a sharp tip). These are the features.
class | tail | ear |
---|---|---|
cat | long | sharp |
dog | short | round |
cat | medium | sharp |
... | ... | ... |
dog | short | round |
dog | medium | round |
cat | long | sharp |
If we plot this dataset on a 2D graph, we get a feature space that might look something like this:
Through training, the classifier determines the green decision boundary. When a new sample comes in, the classifier predicts which class it belongs to based on the region the sample falls in.
This decision boundary is what a classification algorithm usually aims to find. In practice, the line (or higher-dimensional hyperplane) can be far more complicated than the one in this picture, and the data may be spread across so many dimensions that we humans cannot visualize it.
Machine Learning is about acquiring the samples, figuring out which features matter, and training a classifier that makes good predictions.
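To make the idea concrete, here is a minimal sketch of a classifier on the cat/dog table above. It uses a simple 1-nearest-neighbour rule: a new sample gets the class of the closest training sample in feature space. The numeric encodings of the features (tail: short=0, medium=1, long=2; ear: round=0, sharp=1) are my own assumption for illustration and not part of the original dataset.

```python
# Assumed numeric encodings so the categorical features
# become points in a 2D feature space.
TAIL = {"short": 0, "medium": 1, "long": 2}
EAR = {"round": 0, "sharp": 1}

# Training samples taken from the table above (class, tail, ear).
samples = [
    ("cat", "long", "sharp"),
    ("dog", "short", "round"),
    ("cat", "medium", "sharp"),
    ("dog", "short", "round"),
    ("dog", "medium", "round"),
    ("cat", "long", "sharp"),
]

def encode(tail, ear):
    """Map a sample onto a point in the 2D feature space."""
    return (TAIL[tail], EAR[ear])

def predict(tail, ear):
    """Classify a new sample as the class of its nearest training point."""
    x = encode(tail, ear)

    def sq_dist(sample):
        p = encode(sample[1], sample[2])
        return (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2

    return min(samples, key=sq_dist)[0]

print(predict("long", "sharp"))   # prints "cat"
print(predict("short", "round"))  # prints "dog"
```

Here the "decision boundary" is implicit: it is wherever the nearest training point switches from a cat to a dog. A trained classifier like the one described above learns an explicit boundary instead, but the prediction step is the same idea: look at which region of the feature space the new sample lands in.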