A compilation of different Machine Learning Algorithms/Models for beginners in Data Science Competitions (Kaggle)

#datascience #kaggle #machinelearning #ai

Hello,

Before you read on, please note that I am not an expert in data science. I am an SDE by profession. I have started spending quite a lot of my time on Kaggle and learning about data science in general.

Here I have compiled a list of ML algorithms frequently used by various Kaggle Grandmasters, so that I can look them up quickly and keep adding to the list during future competitions. (This post is just meant to be my cache.)

If you consider yourself an expert, please skip this post.

1) Linear Models
1. Especially good for sparse, high-dimensional data.
2. Usually split the feature space into two sub-spaces with a line (in 2D) or a hyperplane.
3. Linear models are usually regularized in competitions. The penalty is part of model fitting, and it works best when features are scaled during preprocessing.

e.g.:

Logistic Regression
Support Vector Machines

Best Implementations:

scikit-learn
Vowpal Wabbit
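To make the points above concrete for myself, here is a minimal sketch with scikit-learn. The data is synthetic and every size, the `true_w` vector and the `C=1.0` values are my own assumptions for illustration, not something taken from a competition.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic sparse, high-dimensional data (an assumption for illustration):
# 4000 rows, 1000 features, roughly 5% of the entries are non-zero.
rng = np.random.default_rng(0)
X = sparse.random(4000, 1000, density=0.05, format="csr", random_state=0)
true_w = rng.normal(size=1000)        # hypothetical hidden hyperplane
y = (X @ true_w > 0).astype(int)      # label = which side of the hyperplane

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both models are L2-regularized by default in scikit-learn.
# C is the inverse regularization strength: smaller C = stronger penalty.
models = {
    "logistic_regression": LogisticRegression(C=1.0, max_iter=1000),
    "linear_svm": LinearSVC(C=1.0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)       # sparse input is accepted directly
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

In a real competition I would tune `C` with cross-validation instead of leaving it at 1.0.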

2) Tree-Based Methods (use decision trees as building blocks)

Here we keep dividing the space into sub-spaces until the probability of one class within each sub-space is high enough (or another stopping rule is hit).

e.g.:

Random Forest
Gradient Boosted Decision Trees (each new tree is fit to improve on the summed predictions of the previous trees)
ExtraTrees Classifier
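The boosting idea is easier to see in code. Below is a hand-rolled sketch of it for regression with squared error, which I chose because the arithmetic is simplest; it is not how XGBoost or LightGBM are implemented. The data, `learning_rate`, tree depth and number of rounds are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from a constant model
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(50):
    residual = y - prediction            # what the current sum still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)                # new tree learns to correct the sum
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print("training MSE before:", round(initial_mse, 4), "after:", round(final_mse, 4))
```

The final model is the constant plus the shrunken sum of all trees, which is what "based on the sum of the previous ones" means here.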

Disadvantages:

Hard to capture a linear dependency when one exists, because a tree needs many axis-parallel splits to approximate a diagonal boundary.
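A small experiment shows this disadvantage. The diagonal boundary `x0 > x1`, the shallow `max_depth=2` tree and the sample sizes are assumptions I picked to make the effect obvious.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Points in the unit square, labelled by a diagonal (linear) boundary.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 2))
y = (X[:, 0] > X[:, 1]).astype(int)

X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

# A shallow tree can only cut parallel to the axes, so it builds a coarse
# staircase around the diagonal.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# A linear model can represent the diagonal boundary directly.
linear = LogisticRegression(C=100.0, max_iter=1000).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
linear_acc = linear.score(X_test, y_test)
print("tree:", round(tree_acc, 3), "linear:", round(linear_acc, 3))
```

A deeper tree narrows the gap, but it needs many more splits to do what the linear model does with two weights.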

Best Implementations:

scikit-learn
XGBoost
LightGBM
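Here is a quick way to try the three tree ensembles from scikit-learn side by side. The synthetic dataset and `n_estimators=100` are assumptions. As far as I know, XGBoost and LightGBM also ship scikit-learn style wrappers with the same `fit`/`predict` pattern, but I have left them out to keep the sketch dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Synthetic tabular data (assumed for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "gbdt": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Mean 3-fold cross-validated accuracy per model.
cv_scores = {
    name: cross_val_score(model, X, y, cv=3).mean()
    for name, model in models.items()
}
for name, score in cv_scores.items():
    print(name, round(score, 3))
```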

3) k-NN (k-nearest neighbours) methods

Based on the intuition/assumption that points close to each other have similar labels.

The best implementations are in scikit-learn.
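The idea is small enough to write from scratch. This is only a sketch for understanding, with a made-up helper `knn_predict` and made-up points; for real work I would use scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated clusters (hypothetical data).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # query near the first cluster
print(knn_predict(points, labels, (5.1, 5.0)))  # query near the second cluster
```

Because everything depends on distance, feature scaling matters a lot for this family of models.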

4) Neural Networks

According to a Kaggle Grandmaster, the most used ones are feed-forward neural networks, which produce smooth non-linear decision boundaries.

Best Implementations:

TensorFlow
Keras
MXNet
PyTorch
Lasagne
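To see the non-linear boundary in action without installing a deep learning framework, here is a sketch using scikit-learn's `MLPClassifier`. That choice is my own assumption to keep things light; it is not one of the libraries listed above. The two-moons data and the layer sizes are also assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two interleaving half-circles: not separable by a straight line.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: a linear model can only draw a straight boundary.
linear = LogisticRegression().fit(X_train, y_train)

# Small feed-forward network with two hidden layers of 16 units each.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)

linear_acc = linear.score(X_test, y_test)
mlp_acc = mlp.score(X_test, y_test)
print("linear:", round(linear_acc, 3), "feed-forward net:", round(mlp_acc, 3))
```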

Making Inferences from Decision Surfaces

If the decision boundaries run parallel to the axes and still look fairly smooth, the model is probably a Random Forest.

Important: Choose a model for a particular competition based on the use case, as no model is better than the others in all situations.
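One way to check this without plotting is to evaluate the models on a grid and count how many distinct probability values appear. This is a rough sketch; the grid ranges, the two-moons data and the "count the levels" trick are my own assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Dense grid covering the data; each grid point is one "pixel" of the surface.
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Probability of class 1 at every grid point, reshaped into a 2-D surface.
tree_surface = tree.predict_proba(grid)[:, 1].reshape(xx.shape)
forest_surface = forest.predict_proba(grid)[:, 1].reshape(xx.shape)

# A single fully grown tree gives hard rectangular regions, while averaging
# many trees gives many intermediate levels, so the surface looks smoother.
tree_levels = np.unique(tree_surface).size
forest_levels = np.unique(np.round(forest_surface, 6)).size
print("distinct levels, tree:", tree_levels, "forest:", forest_levels)
```

Both surfaces are still built from axis-parallel cuts, since every split in every tree tests a single feature.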
