Pranjal Sharma

Understanding Multi-Layer Perceptrons (MLPs)...

In the world of machine learning, neural networks have garnered significant attention due to their ability to model complex patterns. At the foundation of neural networks lies the perceptron, a simple model that, despite its limitations, has paved the way for more advanced architectures. In this blog, we will explore the limitations of perceptrons, how these can be visualized using TensorFlow Playground, and how Multi-Layer Perceptrons (MLPs) address these issues.
A perceptron on linearly separable data

The Problem with Simple Perceptrons

A perceptron is the simplest type of artificial neural network, consisting of a single neuron with adjustable weights and a bias. While perceptrons can solve linearly separable problems, they struggle with more complex tasks.

Linearly Separable vs. Non-Linearly Separable Problems

  • Linearly Separable: A problem is linearly separable if a single straight line (or hyperplane in higher dimensions) can separate the data points into distinct classes. For example, classifying points on a plane based on whether they are above or below a line.
  • Non-Linearly Separable: If no single line can separate the classes, the problem is non-linearly separable. An example is the XOR problem, where data points cannot be separated by a straight line.
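
To make this concrete, here is a minimal NumPy sketch of the classic perceptron learning rule (the function names, learning rate, and epoch count are my own illustrative choices, not from any particular library). It converges on the linearly separable AND problem but can never reach 100% accuracy on XOR:

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic perceptron learning rule with a step activation."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            error = target - pred
            w += lr * error * xi   # weights change only on misclassified points
            b += lr * error
    return w, b

def accuracy(X, y, w, b):
    preds = (X @ w + b > 0).astype(int)
    return (preds == y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

for name, y in [("AND", y_and), ("XOR", y_xor)]:
    w, b = train_perceptron(X, y)
    print(f"{name}: accuracy = {accuracy(X, y, w, b):.2f}")
# AND reaches 1.00; XOR never reaches 1.00 no matter how long you train.
```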

Visualization with TensorFlow Playground

To understand the limitations of perceptrons and the power of MLPs, we can use TensorFlow Playground, an interactive tool that visualizes neural networks in action.

Access TensorFlow Playground: https://playground.tensorflow.org/

Experimenting with a Perceptron

  1. Select Dataset: Start with the "XOR" dataset, a classic example of a non-linearly separable problem.
  2. Configure the Network:

    • Input features: x₁ and x₂
    • Hidden layers: None (just the output layer, making it a simple perceptron)
    • Activation function: Linear
  3. Run the Model: Click "Run" to train the model.
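
TensorFlow Playground runs in the browser, but you can approximate this experiment in code. The sketch below (assuming TensorFlow 2.x is installed) builds a model with no hidden layers on XOR-style data; the data generation is my rough approximation of the Playground's "Exclusive or" dataset, and I use a sigmoid output with cross-entropy rather than the Playground's linear output and squared loss. The conclusion is the same: with no hidden layer, the decision boundary is a straight line.

```python
import numpy as np
import tensorflow as tf

# XOR-style data: the label depends on the sign of x1 * x2,
# roughly how the Playground's "Exclusive or" dataset is laid out.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")

# No hidden layers: a single output unit, i.e. a plain perceptron.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy: {acc:.2f}")   # hovers near chance (~0.5): a line cannot separate XOR
```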

A perceptron on non-linearly separable data (the XOR dataset)

Observations

  • The perceptron fails to classify the data points correctly because it can only draw a single straight line to separate them, and no such line exists for the XOR dataset.

Introducing Multi-Layer Perceptrons (MLPs)

To overcome the limitations of perceptrons, we introduce additional layers of neurons, creating what is known as a Multi-Layer Perceptron (MLP). An MLP can model complex, non-linear relationships by using multiple hidden layers and non-linear activation functions.


Structure of an MLP

  1. Input Layer: This layer consists of neurons that receive the input features. The number of neurons in this layer equals the number of input features.

  2. Hidden Layers: These layers perform most of the computations required by the network. Each neuron in a hidden layer applies a weighted sum of inputs, adds a bias term, and passes the result through a non-linear activation function.

  3. Output Layer: The final layer of the network produces the output. The number of neurons in this layer depends on the task (e.g., one neuron for binary classification, multiple neurons for multi-class classification).
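
As a rough illustration of this structure, the NumPy sketch below pushes a small batch of inputs through one hidden layer and an output layer. The layer sizes and random weights are arbitrary placeholders chosen just to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(42)

n_features, n_hidden, n_outputs = 2, 4, 1   # input, hidden, and output layer sizes

# Parameters of the hidden layer and the output layer (randomly initialized).
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_outputs)), np.zeros(n_outputs)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Input layer: each row is one example with 2 features.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

hidden = relu(X @ W1 + b1)           # hidden layer: weighted sum + bias + non-linearity
output = sigmoid(hidden @ W2 + b2)   # output layer: one neuron for binary classification

print(hidden.shape, output.shape)    # (4, 4) and (4, 1)
```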

Activation Functions

Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. Some common activation functions include:

  • Sigmoid: σ(x) = 1 / (1 + e^(-x)). Outputs a value between 0 and 1. Useful for binary classification outputs.

  • Tanh: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Outputs a value between -1 and 1. Often used in hidden layers.

  • ReLU (Rectified Linear Unit): ReLU(x) = max(0, x). Outputs the input directly if it is positive; otherwise, it outputs zero. Helps mitigate the vanishing gradient problem.
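
For reference, here is how these three functions can be written in NumPy (a minimal sketch; the sample points are only for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for negatives, identity for positives

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", np.round(sigmoid(z), 3))
print("tanh:   ", np.round(tanh(z), 3))
print("relu:   ", np.round(relu(z), 3))
```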

Notation
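
A common way to write this network compactly (my own choice of symbols, since the original notation figure is not reproduced here) is:

```latex
% One common convention: layer l has weights W^{[l]}, biases b^{[l]},
% pre-activations z^{[l]}, activations a^{[l]}, and activation function g^{[l]}.
\begin{aligned}
a^{[0]} &= x                              && \text{(input layer)} \\
z^{[l]} &= W^{[l]} a^{[l-1]} + b^{[l]}    && \text{(weighted sum plus bias)} \\
a^{[l]} &= g^{[l]}\big(z^{[l]}\big)       && \text{(non-linear activation)} \\
\hat{y} &= a^{[L]}                        && \text{(output of the final layer } L \text{)}
\end{aligned}
```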

Training an MLP

Training an MLP involves adjusting the weights and biases to minimize the difference between the predicted output and the actual output. This process is typically done using backpropagation and optimization algorithms like gradient descent.

  1. Forward Propagation: Compute the output of the network given the current weights and biases.
  2. Loss Calculation: Measure the difference between the predicted output and the actual output using a loss function (e.g., mean squared error, cross-entropy).
  3. Backward Propagation: Calculate the gradient of the loss with respect to each weight and bias.
  4. Weight Update: Adjust the weights and biases using an optimization algorithm to minimize the loss.
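
Here is a compact NumPy sketch of these four steps for a tiny 2-4-1 MLP trained on XOR with sigmoid activations and mean squared error. The architecture, seed, learning rate, and epoch count are illustrative choices, not the only way to do it.

```python
import numpy as np

rng = np.random.default_rng(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer parameters
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # 1. Forward propagation
    a1 = sigmoid(X @ W1 + b1)          # hidden activations
    a2 = sigmoid(a1 @ W2 + b2)         # network output

    # 2. Loss calculation (mean squared error)
    loss = np.mean((a2 - y) ** 2)

    # 3. Backward propagation (chain rule, layer by layer)
    dz2 = 2 * (a2 - y) / len(X) * a2 * (1 - a2)
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # 4. Weight update (plain gradient descent)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("final loss:", round(loss, 4))
print("predictions:", pred.round(2).ravel())   # typically close to [0, 1, 1, 0]
```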

Visualizing MLPs with TensorFlow Playground

Let's revisit TensorFlow Playground to see how MLPs can solve the XOR problem.

  1. Select Dataset: Choose the "XOR" dataset again.
  2. Configure the Network:

    • Input features: x₁ and x₂
    • Hidden layers: Add one hidden layer with 4 neurons
    • Activation function: ReLU
  3. Run the Model: Click "Run" to train the model.
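
The equivalent model in code (again a sketch assuming TensorFlow 2.x, with the same approximation of the Playground's XOR data as before) simply adds the hidden layer of 4 ReLU neurons:

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")   # XOR-style labels by quadrant

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="relu"),     # one hidden layer with 4 neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=200, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy: {acc:.2f}")   # usually well above 0.9 once the hidden layer is added
```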

An MLP on non-linearly separable data (the XOR dataset)

Observations

  • The MLP successfully learns to classify the XOR dataset by creating non-linear decision boundaries. The hidden layer allows the network to combine the input features in complex ways, making it possible to separate the classes correctly.

Conclusion

By visualizing neural networks using TensorFlow Playground, we can gain a deeper understanding of the limitations of perceptrons and the capabilities of Multi-Layer Perceptrons. MLPs address the shortcomings of simple perceptrons by introducing hidden layers and non-linear activation functions, enabling them to model complex, non-linear relationships in data.

In the upcoming sections, we will explore more advanced topics, such as the role of different activation functions, the impact of network architecture, and the process of training MLPs on real-world datasets.


Stay tuned for the next blog, where we'll delve deeper into these topics.

Stay connected! Visit my GitHub for the code.

Join our Telegram Channel and let the adventure begin! See you there, Data Explorer! πŸŒπŸš€
