DEV Community

anirudhkannan

A Compilation of Different Machine Learning Algorithms/Models for Beginners in Data Science Competitions (Kaggle)

Hello,

Before reading, I want the reader to know that I am not an expert in data science. I am an SDE by profession. I have started spending quite a lot of my time on Kaggle and learning about data science in general.

Here I have compiled a list of ML algorithms frequently used by various Kaggle Grandmasters, so that I can look it up often and keep adding to it for faster lookup during future competitions. (This post is just meant to be my cache.)

If you consider yourself an expert, please skip this post.

1) Linear Model
1. Especially good for sparse, high-dimensional data.
2. Usually splits a given space into two sub-spaces with a line/hyperplane.
3. Regularization is usually applied to linear models during competitions, and the features are typically scaled in preprocessing so that the penalty treats them evenly.

eg:

  1. Logistic Regression
  2. Support Vector Machines

Best Implementations:-

  • scikit-learn
  • Vowpal Wabbit
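
To make the points above concrete, here is a minimal sketch of the two linear models on sparse, high-dimensional data using scikit-learn. The synthetic data set, the `true_weights` vector, and the value `C=1.0` (the inverse regularization strength) are assumptions chosen only for illustration, not settings taken from any competition.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic sparse data (assumption): 2000 rows, 500 features, ~2% non-zero.
rng = np.random.default_rng(0)
X = sparse.random(2000, 500, density=0.02, format="csr", random_state=0)

# Labels come from a hidden hyperplane, so a linear model suits the problem.
true_weights = rng.normal(size=500)
y = (X @ true_weights > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both models are L2-regularized by default; a smaller C means a stronger penalty.
log_reg = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
svm = LinearSVC(C=1.0, max_iter=5000).fit(X_train, y_train)

log_reg_acc = log_reg.score(X_test, y_test)
svm_acc = svm.score(X_test, y_test)
print(f"Logistic Regression accuracy: {log_reg_acc:.3f}")
print(f"Linear SVM accuracy: {svm_acc:.3f}")
```

Both estimators accept the sparse matrix directly, which is a large part of why linear models are convenient for this kind of data.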

2) Tree-Based Methods (use decision trees to build models)

Here we keep dividing the space into sub-spaces until each sub-space mostly contains a single class; the class probability in a sub-space is then used as the prediction.

eg:

  1. Random Forest
  2. Gradient Boosted Decision Trees (each new tree is fit to correct the errors of the summed predictions of the previous trees)
  3. ExtraTrees Classifier
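
The idea behind gradient boosting in the list above can be sketched by hand. This is a simplified illustration, not how XGBoost or LightGBM are implemented: it uses a regression target with squared error instead of class probabilities, and the data, `learning_rate`, tree depth, and number of rounds are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (assumption): a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
# Start from a constant prediction: the mean of the target.
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(100):
    # Each new tree is fit to what the sum of previous trees still gets wrong.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(f"MSE before boosting: {initial_mse:.4f}")
print(f"MSE after boosting:  {final_mse:.4f}")
```

The final model is the initial constant plus the scaled sum of all the small trees, which is why the training error keeps shrinking as trees are added.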

Disadvantages:

  1. Hard to capture a linear dependency (for example, a diagonal class boundary) if one exists, because each split is made on a single feature
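
A small experiment can illustrate this disadvantage. In the sketch below the true class boundary is the diagonal line x0 = x1; the data set and the choice of `max_depth=2` for the tree are assumptions made to keep the effect easy to see.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Points in a square, labelled by which side of the diagonal they fall on.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] > X[:, 1]).astype(int)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# A shallow tree can only approximate the diagonal with a few axis-parallel steps.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
# A linear model can represent the diagonal boundary directly.
linear = LogisticRegression().fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
linear_acc = linear.score(X_test, y_test)
print(f"Shallow tree accuracy: {tree_acc:.3f}")
print(f"Logistic regression accuracy: {linear_acc:.3f}")
```

A deeper tree would get closer, but it needs many splits to imitate a boundary that a linear model expresses with two coefficients.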

Best Implementations:

  • scikit-learn
  • XGBoost
  • LightGBM
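
As a rough starting point, the three tree-based models named above can be compared with scikit-learn alone. The synthetic data set and the hyperparameters here are assumptions for illustration; XGBoost and LightGBM offer their own classifiers with a similar fit/predict style, which are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split

# Synthetic classification data (assumption): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```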

3) K-NN (K nearest neighbours) methods

Based on the intuition/assumption that nearest neighbours have similar labels.

Best implementations are in scikit-learn
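
The intuition is simple enough to sketch in plain Python. The helper `knn_predict` and the toy points below are hypothetical, written only to show the idea; in practice scikit-learn's `KNeighborsClassifier` would be the usual choice.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k nearest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Two small, well-separated clusters (toy data, assumption).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_points, train_labels, (1.1, 1.0))
prediction_near_b = knn_predict(train_points, train_labels, (5.1, 5.0))
print(prediction_near_a, prediction_near_b)
```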

4) Neural Networks

  • The most used ones, according to a Kaggle Grandmaster, are feed-forward neural networks, which produce smooth non-linear decision boundaries.

Best Implementations:

  • TensorFlow
  • Keras
  • MXNet
  • PyTorch
  • Lasagne
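
For a quick feel of a feed-forward network without installing any of the frameworks above, the sketch below swaps in scikit-learn's `MLPClassifier`. That is a substitution made for brevity, not one of the listed libraries, and the layer sizes, activation, and solver are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Two interleaving half-moons: a problem that needs a non-linear boundary.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# A small feed-forward network with two hidden layers of 16 units each.
mlp = MLPClassifier(
    hidden_layer_sizes=(16, 16),
    activation="tanh",
    solver="lbfgs",
    max_iter=1000,
    random_state=0,
)
mlp.fit(X, y)

train_acc = mlp.score(X, y)
print(f"Training accuracy: {train_acc:.3f}")
```

The smooth `tanh` activations are what give the curved, non-axis-aligned boundary, in contrast to the step-like boundaries of tree models.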

Making Inferences from Decision Surfaces

  1. If the boundaries run parallel to the axes but look fairly smooth, then it's probably a Random Forest
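
One way to look at a decision surface is to evaluate a fitted model on a grid of points. The sketch below, with an assumed toy data set and grid range, compares a single decision tree with a Random Forest: the single tree gives hard, step-like probabilities, while averaging many trees gives many intermediate values, which is what makes the forest's surface look smoother.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# A 100 x 100 grid covering the data (range chosen by eye, assumption).
xx, yy = np.meshgrid(np.linspace(-2, 3, 100), np.linspace(-1.5, 2, 100))
grid = np.column_stack([xx.ravel(), yy.ravel()])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Probability of class 1 at every grid point; reshape to xx.shape for plotting.
tree_surface = tree.predict_proba(grid)[:, 1]
forest_surface = forest.predict_proba(grid)[:, 1]

# Count how many distinct probability values each surface contains.
tree_levels = len(np.unique(tree_surface))
forest_levels = len(np.unique(forest_surface))
print(f"Distinct probability levels, single tree: {tree_levels}")
print(f"Distinct probability levels, random forest: {forest_levels}")
```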

Important: Choose a model for a particular competition based on the use case, as no model is better than the others in all situations.
