Domas Bitvinskas

Posted on Jul 22, 2020 • Originally published at closeheat.com

ELU Activation Function

#machinelearning #python #pytorch #tensorflow

Exponential Linear Unit (ELU) is a popular activation function that tends to speed up training and improve accuracy compared to ReLU. This article introduces ELU, compares it to other popular activation functions, and shows how to use it with PyTorch and TensorFlow.

Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter introduced ELU in November 2015. At the time, ELU networks outperformed ReLU-based networks on CIFAR-100. ELUs remain popular among machine learning engineers and are well studied by now.

What is ELU?

ELU is an activation function based on ReLU that has an extra alpha constant (α), which controls the shape of the curve for negative inputs: the output smoothly approaches -α as the input becomes more negative.

ELU calculation

For positive input, the ELU output is the input itself (identity). For negative input, the output is α(e^x - 1), a smooth curve that starts at 0 and saturates towards -α as the input decreases. The higher the alpha constant, the more negative the output for negative inputs gets.
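The calculation above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a framework implementation; the function name `elu` and the sample inputs are chosen here for the example.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Identity for positive x, alpha * (exp(x) - 1) otherwise."""
    # math.expm1(x) computes exp(x) - 1 accurately for small x
    return x if x > 0 else alpha * math.expm1(x)


# Print the output for a few inputs under different alpha values.
# Negative inputs approach -alpha; positive inputs pass through unchanged.
for alpha in (0.5, 1.0, 2.0):
    outputs = [round(elu(x, alpha), 3) for x in (-5.0, -1.0, 0.0, 2.0)]
    print(f"alpha={alpha}: {outputs}")
```

Running it shows that changing alpha only affects the negative side of the curve.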

ELU vs ReLU

ELU and ReLU are among the most popular activation functions. Here are the advantages and disadvantages of ELU compared to ReLU.

Advantages of ELU

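Tends to converge faster than ReLU (because mean ELU activations are closer to zero)
Often generalizes better than ReLU
Continuous everywhere
Differentiable everywhere when α = 1 (for other values of α the derivative jumps from α to 1 at zero)
Mitigates the vanishing gradient problem (the gradient is 1 for positive inputs, although it shrinks towards zero for very negative inputs)
Does not amplify gradients by itself (the derivative never exceeds max(1, α))
Does not have the dying ReLU problem (negative inputs still receive a non-zero gradient)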
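Two of the points above can be checked numerically: the mean activation and the gradient for negative inputs. The sketch below is illustrative only. It assumes inputs drawn from a standard normal distribution, and all helper names are made up for this example.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * math.expm1(x)


def relu_grad(x):
    # ReLU passes no gradient for negative inputs
    return 1.0 if x > 0 else 0.0


def elu_grad(x, alpha=1.0):
    # Derivative of alpha * (exp(x) - 1) is alpha * exp(x)
    return 1.0 if x > 0 else alpha * math.exp(x)


# Assumption: pre-activations follow a standard normal distribution
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

relu_mean = sum(relu(x) for x in samples) / len(samples)
elu_mean = sum(elu(x) for x in samples) / len(samples)

print(f"mean ReLU activation: {relu_mean:.3f}")
print(f"mean ELU activation:  {elu_mean:.3f}")
print(f"ReLU gradient at x=-2: {relu_grad(-2.0)}")
print(f"ELU gradient at x=-2:  {elu_grad(-2.0):.3f}")
```

Under this assumption the mean ELU activation lands closer to zero than the mean ReLU activation, and the ELU gradient at a negative input is small but non-zero.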

Disadvantages of ELU

Slower to compute (because of the exponential for negative input values)

ELU is slower to compute, but it compensates for this with faster convergence during training. At test time, though, ELU remains slower to compute than ReLU.

Usage in machine learning frameworks

PyTorch ELU usage

To use ELU in PyTorch, use the torch.nn.ELU module. Here's an example model that uses ELU:

import torch
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()

        # 3D convolution followed by ELU with a custom alpha
        self.layer1 = nn.Sequential(
            nn.Conv3d(in_channels=4, out_channels=2, kernel_size=2),
            nn.ELU(alpha=2.0)
        )

    def forward(self, x):
        # x is expected to have shape (batch, 4, depth, height, width)
        return self.layer1(x)

Note that alpha is 1 by default. To learn more, read the PyTorch ELU documentation.
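As a quick sanity check, the module form can be compared with the functional form torch.nn.functional.elu and with the formula written out by hand. This is a small sketch; the input values are arbitrary and chosen only for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

alpha = 2.0
x = torch.tensor([-3.0, -1.0, 0.0, 2.0])

# Module form, as used inside nn.Sequential
module_out = nn.ELU(alpha=alpha)(x)

# Functional form, convenient inside a custom forward()
functional_out = F.elu(x, alpha=alpha)

# The same calculation written out by hand
manual_out = torch.where(x > 0, x, alpha * (torch.exp(x) - 1))

print(module_out)
print(torch.allclose(module_out, functional_out))
print(torch.allclose(module_out, manual_out))
```

The module form fits naturally into nn.Sequential, while the functional form avoids creating a layer object when alpha is fixed in the forward pass.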

TensorFlow ELU usage

The easiest way to use ELU in TensorFlow is to use Keras layers. Here's an example TensorFlow model that uses ELU:

import tensorflow as tf

class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # 2D convolution followed by ELU with a custom alpha
        self.layer1 = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(4, kernel_size=(2, 2), input_shape=(6, 6, 1)),
            tf.keras.layers.ELU(alpha=2.0)
        ])

    def call(self, x):
        # x is expected to have shape (batch, 6, 6, 1)
        return self.layer1(x)

Note that alpha is 1 by default. For more details, see the TensorFlow ELU documentation.

Next steps

Now you know how ELU works and where it stands compared to other popular activation functions.

Here's what you can do next:

Learn more about other activation functions.
Try different values for alpha and see how they influence your model's accuracy and training speed.
Use ELU in your machine learning models (try LSTMs).

Top comments (1)

Domas Bitvinskas • Jul 22 '20

Hey! Wanted to dive deeper into the activation functions in machine learning. ELUs are most commonly used nowadays, so wanted to see how alpha influences the curve, calculations, pros/cons. Hope it also helps somebody else!
