## DEV Community

Pol Monroig Company

Posted on

# Making algorithms work together

In every machine learning application you may have found that every machine learning algorithm you try has its flaws. As the no free lunch theorem states there is no single learning algorithm in any domain always induces the most accurate learner. Learning is ill-posed since we do not have all the data and each algorithm converges for a specific subset of the original data. What if there was a way to combine the knowledge of different algorithms?(Spoiler: there is!)

# Combining base-learners

The idea is to have multiple base learners, each an expert in a subset of the dataset and accurate in it. This way each learner complements each other. There are two aspects we must consider:

• We must choose base-learners that complement each other (otherwise we would have unnecessary redundancy). A synonym to this is that they must not be correlated.
• Each learner must try to have the highest accuracy, we should also find a way to combine the output of each learner to achieve it.

# Diversifying learners

There are multiple ways in which we can diversify each learning algorithm. For instance, we can use different algorithms, we can also try similar algorithms but with different hyperparameters or each algorithm could be trained with a different data subset.

### Voting

Voting is the most simple technique in which the output of multiple classifiers is combined using a linear combination (i.e. sum, product, median, weighted sum, maximum, minimum). The main idea behind this method is the we create a voting scheme over high variance and low bias models, thus bias remains small and variance is reduced. An analogy to this is when we have multiple noisy samples of the same data, if we combine them it is much easier to sort out the noise.

### Bagging

This is variance of the voting method. The difference is that each base-learner is trained on different training sets. To generate each set we draw instances randomly with replacement. By replacing the instances it is possible that many instances are repeated and that some are not used at all.

### Boosting

This type of mixer works by using simple learners that actively learn on the mistake of the previous learners, the most commonly and effective use of boosting is called AdaBoost. For AdaBoost to work, the learners must be very weak, otherwise we have a high risk of overfitting. AdaBoost combines the votes based on weights proportional to the accuracy of each base-learner in the training set.

### Stacking

Stacking is mechanism in which multiple base-learners are trained using the train dataset. After that, a new training data is created by voting the predictions of each base-learner. Finally, another learner (called the mixer) learns from the predictions of the other algorithms.