Classification in ML is a type of supervised learning where the goal is to predict which class a given input belongs to. The model learns to make these categorical predictions from labeled training data fed to the algorithm. Some examples of where classification can be applied include predicting what’s in a given image, or whether or not someone will default on their credit card bills.
HOW IT WORKS
In the training phase, the algorithm receives a dataset with input features (e.g., age and income) and known labels (e.g., approved/denied loans). The data is then analysed to find patterns or relationships between the features, and these patterns are captured in a mathematical model. In the pattern recognition phase, the algorithm looks for decision boundaries that separate the different classes. For example: if income > 50k AND credit score > 700, then the loan is likely approved.
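The decision boundary described above can be sketched as a hand-written rule. This is a minimal illustration, assuming hypothetical loan data with two features (income in thousands and credit score); a real model would learn these thresholds from data rather than have them hard-coded.

```python
# A hand-written stand-in for a learned decision boundary (hypothetical thresholds).
def predict_loan(income_k, credit_score):
    """Return 'approved' if both features clear the boundary, else 'denied'."""
    if income_k > 50 and credit_score > 700:
        return "approved"
    return "denied"

print(predict_loan(60, 750))  # approved
print(predict_loan(40, 750))  # denied
```

Training a classifier amounts to discovering rules like this automatically from the labeled examples.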
MODELS USED FOR CLASSIFICATION
Decision Trees
This is simply a tree of yes/no questions: the model makes decisions by asking a series of questions about the data’s features. It’s easy to understand but can overfit.
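A short sketch of a decision tree with scikit-learn, assuming a small hypothetical loan dataset of `[income_k, credit_score]` pairs (not real data). Capping `max_depth` is one simple way to limit the overfitting mentioned above.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: [income in thousands, credit score] -> 1 approved, 0 denied
X = [[30, 600], [45, 650], [60, 720], [80, 750], [55, 710], [35, 580]]
y = [0, 0, 1, 1, 1, 0]

# A shallow tree (max_depth=2) asks at most two yes/no questions per prediction
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[70, 730]]))  # classify a new applicant
```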
K-Nearest Neighbours
A simple algorithm that classifies a data point based on the majority class among its “k” nearest neighbours.
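A minimal k-NN sketch, again with made-up 2D points in two clusters. With `k=3`, a new point takes the majority label of its three closest neighbours.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two hypothetical clusters: class "a" near the origin, class "b" around (8, 8)
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ["a", "a", "a", "b", "b", "b"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# (2, 2) sits inside the first cluster, so its 3 nearest neighbours are all "a"
print(knn.predict([[2, 2]]))  # ['a']
```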
Random Forest
This is an ensemble technique that combines multiple decision trees to improve accuracy and reduce the risk of overfitting.
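Using the same hypothetical loan data as before, a random forest trains many trees on random subsets of the data and features, then lets them vote:

```python
from sklearn.ensemble import RandomForestClassifier

# Same hypothetical toy data: [income in thousands, credit score] -> 1 approved, 0 denied
X = [[30, 600], [45, 650], [60, 720], [80, 750], [55, 710], [35, 580]]
y = [0, 0, 1, 1, 1, 0]

# 10 trees, each trained on a bootstrap sample; the forest averages their votes
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X, y)

print(rf.predict([[70, 730]]))
```

Because the final answer is a vote across many slightly different trees, a single tree’s overfitted quirks tend to get averaged out.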
Logistic Regression
Typically used for binary classification. This algorithm fits a probability curve to estimate the likelihood of each class. For instance, an applicant might get a 0.8 probability of approval and a 0.2 probability of denial; the final prediction is the class with the highest probability.
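A sketch of the probability output described above, assuming a hypothetical one-feature dataset (income in thousands) where higher incomes were approved. `predict_proba` returns a probability for each class, and `predict` picks the larger one.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: income in thousands -> 0 denied, 1 approved
X = [[30], [35], [45], [55], [65], [80]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

proba = clf.predict_proba([[70]])[0]  # [P(denied), P(approved)]
print(proba)                          # probabilities for each class
print(clf.predict([[70]]))            # class with the highest probability
```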
Applications of classification algorithms can be found in spam filtering, sentiment analysis, image recognition, fraud detection, and medical diagnosis, among others.
PERSONAL THOUGHTS AND INSIGHTS
For a long time, I've been fascinated by how different models work to predict outcomes and boost productivity. I had no idea there were so many approaches to this, like classification, regression, and more. Exploring them, both in my academic classes and for personal projects, has helped me understand how to efficiently use each method. I've learned that the key isn't to just use the latest or most complex model, but to choose the right one for the specific problem at hand.
I believe that the next major trend in AI is its role as a "second brain". My view is that while our brains are exceptional at creative leaps and intuition, they aren't built to store and instantly recall every piece of information we encounter. This is where AI, particularly Generative AI, comes in as the perfect complement.
It's not about replacing our minds but about offloading the mundane tasks of information retrieval and organization. It's an external memory that doesn't just hold data; it actively helps you connect ideas, synthesize information, and spark new insights.
CHALLENGES FACED SO FAR
While I’m still a newbie and haven’t had much experience working with classification models yet, some of the challenges I’ve faced include:
Dealing with imbalanced datasets
I was trying to build a model to detect a rare event, and I quickly learned that if my training data had way more examples of one class than another, the model would become biased. This forced me to explore techniques like oversampling and undersampling to balance the classes.
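The oversampling idea can be sketched with scikit-learn’s `resample` utility: duplicate minority-class rows (sampling with replacement) until both classes are the same size. The dataset here is made up purely for illustration.

```python
from sklearn.utils import resample

# Hypothetical imbalanced dataset: 10 majority rows (label 0), 2 minority rows (label 1)
majority = [[i, 0] for i in range(10)]
minority = [[i, 1] for i in range(2)]

# Oversample the minority class with replacement until it matches the majority size
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)

balanced = majority + minority_up
print(len(balanced))  # 20 rows, now evenly split between the two classes
```

Undersampling is the mirror image: drop majority rows (e.g. `resample(majority, replace=False, n_samples=len(minority))`) at the cost of discarding data.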
Deciding on the right Features
Another challenge was realizing that the raw data often isn’t enough. It’s not just about having a lot of data; it’s about having the right data, and sometimes you have to create it yourself.
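"Creating the right data yourself" usually means feature engineering: deriving a new column from the raw ones. A tiny sketch with hypothetical loan records, where a debt-to-income ratio may separate classes better than income or debt alone:

```python
# Hypothetical raw records; the derived ratio is the engineered feature
rows = [
    {"income": 60000, "debt": 12000},
    {"income": 40000, "debt": 30000},
]

for r in rows:
    # New feature: fraction of income already committed to debt
    r["debt_to_income"] = r["debt"] / r["income"]

print(rows[0]["debt_to_income"])  # 0.2
print(rows[1]["debt_to_income"])  # 0.75
```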