# Learning the Differences between Softmax and Sigmoid for Image Classification

Happy second week of the #100DaysofCode challenge, and Happy Thanksgiving! Check out week one, where I discussed parsing CSV rows into separate text files.

During this past week I continued research and development on the Analyzing Deforestation and Urbanization Using Intel AI Technologies project. In this project, I have been working with Intel Optimized TensorFlow for image classification of satellite imagery on the Intel NUC. In this second week I focused on getting a better understanding of neural networks and how they can use softmax or sigmoid for image classification, depending on the desired output.

The first thing that I found interesting to research was the difference between multi-class and multi-label classification models. In a multi-class model, each input is assigned exactly one label from the many that exist. For example, when looking at images of veggies you can have squash, cucumber, and carrot labels, but the image of the carrot will receive only one label as the output and cannot be multiple veggies at once. In contrast, multi-label classification can assign multiple labels to an image. This is easy to see in text, where a single document can talk about multiple topics at the same time. Once I understood the difference between multi-class and multi-label, I started to look into how softmax and sigmoid could be used for each case and why.
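To make the distinction concrete, here is a minimal sketch of how the targets for the two kinds of model might be encoded. The `CLASSES` list and the `encode_labels` helper are hypothetical names made up for illustration; they are not part of the project's code.

```python
# Hypothetical label set, borrowed from the veggie example above.
CLASSES = ["squash", "cucumber", "carrot"]

def encode_labels(present):
    """Return a 0/1 vector with a 1 for every class named in `present`."""
    return [1 if name in present else 0 for name in CLASSES]

# Multi-class: exactly one label per image, so the vector is one-hot.
multi_class_target = encode_labels({"carrot"})

# Multi-label: any number of labels per image, so several entries can be 1.
multi_label_target = encode_labels({"squash", "carrot"})

print(multi_class_target)  # [0, 0, 1]
print(multi_label_target)  # [1, 0, 1]
```

The only difference is how many entries are allowed to be 1, and that is what drives the choice between softmax and sigmoid.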

### Softmax

Through my research, it became apparent that a softmax layer is good for multi-class classification while a sigmoid is good for multi-label. The softmax layer of a neural network is a generalization of the logistic function to multiple classes. Softmax allows us to handle labels that take one of k values, where k is the number of classes. It calculates a probability distribution over the k labels: each output is between 0 and 1, and all of the outputs sum to 1. For a multi-class problem, this returns a probability for each label, ideally with the target label having the highest probability. This is ideal when predicting one label out of a set of labels. This can be represented in TensorFlow as follows:

```python
# tf.nn.softmax
final_tensor = tf.nn.softmax(logits, name=final_tensor_name)
```
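To see what the softmax computes, here is a rough pure-Python sketch that uses only the standard library. The logit values are made up for illustration, and this is not how TensorFlow implements the op internally.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three classes (squash, cucumber, carrot).
logits = [1.0, 2.0, 4.0]
probs = softmax(logits)

print([round(p, 3) for p in probs])  # [0.042, 0.114, 0.844]
```

Because the outputs share one denominator, raising the score of one class necessarily lowers the probability of the others, which is what makes softmax a fit for mutually exclusive labels.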


The `tf.nn.softmax` call sets the final tensor in the image reclassification script by computing the softmax activations. Another part of the network that can be matched to softmax is the cross entropy function. This function, as seen below, measures the probability error in discrete classification tasks in which the classes are mutually exclusive.

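```python
# tf.nn.softmax_cross_entropy_with_logits
# Named arguments: TensorFlow 1.x rejects positional ones for this op.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    labels=ground_truth_input, logits=logits)
```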


This means that each image belongs to exactly one class, i.e. one image is of one fruit chosen from a set of fruits, and the image cannot be more than one fruit.
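As a rough illustration of what the softmax cross entropy measures, the sketch below computes it by hand for a single example with a one-hot target. The function name and the logit values are assumptions made for illustration; TensorFlow's own op works on batches of tensors.

```python
import math

def softmax_cross_entropy(logits, one_hot_labels):
    """Cross entropy between a one-hot target and softmax(logits)."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    # log softmax(x_i) = x_i - log(sum_j exp(x_j))
    log_probs = [x - log_sum_exp for x in logits]
    return -sum(y * lp for y, lp in zip(one_hot_labels, log_probs))

# Made-up logits for three mutually exclusive classes.
logits = [1.0, 2.0, 4.0]
loss_correct = softmax_cross_entropy(logits, [0, 0, 1])  # target is the top-scoring class
loss_wrong = softmax_cross_entropy(logits, [1, 0, 0])    # target is the lowest-scoring class

print(loss_correct < loss_wrong)  # True
```

The loss is small when the class marked in the one-hot target gets most of the probability, and large when it does not.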

### Sigmoid Function

In contrast, a sigmoid function can be used in multi-label classification. The sigmoid function is another logistic function that has a characteristic "S-curve", or a smoothed-out version of a step function. Sigmoids are often introduced into neural nets to provide non-linearity to the model and are typically used for clustering, pattern classification, and function approximation. Unlike softmax, which gives a probability distribution over k classes, sigmoid functions give an independent probability for each class. When looking at a sigmoid function as a neuron in a neural network, the input can be any real number and the output, the sigmoid of that input, is a value between 0 and 1. This can be represented in TensorFlow as follows:

```python
# tf.nn.sigmoid
final_tensor = tf.nn.sigmoid(logits, name=final_tensor_name)
```
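To show what "independent probabilities" means in practice, here is a small pure-Python sketch. The logits, the `CLASSES` list, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into the range (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, this equivalent form avoids overflow in exp.
    e = math.exp(x)
    return e / (1.0 + e)

# Hypothetical classes and made-up logits for one image.
CLASSES = ["squash", "cucumber", "carrot"]
logits = [2.0, -1.0, 3.0]
probs = [sigmoid(x) for x in logits]

# Each class is judged on its own; 0.5 is an assumed threshold.
predicted = [name for name, p in zip(CLASSES, probs) if p > 0.5]

print([round(p, 3) for p in probs])  # [0.881, 0.269, 0.953]
print(predicted)                     # ['squash', 'carrot']
```

Unlike softmax, the outputs here do not have to sum to 1, so more than one class can be predicted with high confidence at the same time.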


The `tf.nn.sigmoid` call sets the final tensor in the image reclassification script by computing the sigmoid activations. Another part of the network that can be matched to sigmoid is the cross entropy function. This function, as seen below, measures the probability error for discrete classification tasks.

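```python
# tf.nn.sigmoid_cross_entropy_with_logits
# Named arguments: TensorFlow 1.x rejects positional ones for this op.
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=ground_truth_input, logits=logits)
```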


In these tasks the classes are not mutually exclusive, and each class is scored independently. Therefore, this function allows for multi-label classification, where an image can contain multiple fruits that need to be detected.
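As a sketch of what the sigmoid cross entropy computes, the snippet below evaluates the numerically stable per-class formula given in the TensorFlow documentation, `max(x, 0) - x * z + log(1 + exp(-abs(x)))`, for a single example. The function name, logits, and labels are made up for illustration.

```python
import math

def sigmoid_cross_entropy(logits, labels):
    """Per-class binary cross entropy computed directly from logits."""
    return [
        max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        for x, z in zip(logits, labels)
    ]

# Made-up logits and a multi-hot target: the image contains classes 0 and 2.
logits = [2.0, -1.0, 3.0]
labels = [1, 0, 1]
losses = sigmoid_cross_entropy(logits, labels)

print([round(loss, 3) for loss in losses])
```

Each class gets its own loss value, so a mistake on one label does not change the loss of the others. In training, these per-class values would typically be averaged or summed into a single number.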

What interesting problems have you faced this week?

### Discussion

Pimpwhippa

Your article gives me so many Eureka!s. How are the satellite images got loaded into your Optimized TensorFlow? I have never used TensorFlow so I don't know the pre-processing step. How do you turn the images into blobs? Do you use Opencv? I still cannot detect snakes that don't curl :P

vocong25

Nice explanation! Can you recommend me some techniques or some papers mentions about how to build a multi-label dataset? This work requires a lot of efforts.