Getting Familiar with Activation Functions and Their Types


Hey folks, in this blog we are going to discuss activation functions in artificial neural networks and their different types.

Before going there, let’s get some idea of what an artificial neural network is.

Artificial Neural Network (ANN)

Architecture of Artificial Neural Network

Artificial neural networks are a biologically inspired sub-field of artificial intelligence, modeled after the brain.

An ANN is a computational network based on biological neural networks that tries to mimic the human brain.

Like the human brain, an ANN has neurons that are interconnected across the various layers of the network.
These neurons are called nodes.

Let’s find out how the artificial neural network is inspired by the neural network in our brain.

Biological Neural Network

  • The neural network in the human brain is composed of groups of chemically connected or functionally associated neurons.
  • It has a hierarchical structure made up of neurons, dendritic trees, synapses, and feedback connections.
  • A neural circuit is a population of neurons interconnected by synapses to carry out a specific function when activated.
    Neural circuits are interconnected with one another to form the large biological neural network of the brain.
  • Neurons are the elementary functional units of the nervous system.
    They generate electrical signals called action potentials, which allow them to transmit information quickly over long distances.
  • A neuron comprises three major parts:
    • Cell Body (i.e., Soma)
    • Dendrites
    • Axon
  • The cell body, or soma, is the cell’s life support center. It processes the incoming messages and produces the output.
  • The dendrites are fibers that branch in different directions and connect to many cells in the cluster.
    They receive messages from other cells.
  • The axon is responsible for passing messages away from the cell body to other neurons, muscles, or glands.

Artificial Neural Network

Artificial Neural Network recognising the input provided.
  • An ANN is a computing system designed to simulate the way the human brain analyses and processes information.
  • It is a foundation of artificial intelligence (AI) and solves problems that are difficult or impossible to solve by human or conventional statistical standards.
  • It has self-learning capabilities that enable it to produce better results as more training data is provided.
  • An ANN consists of three types of layers:
    • Input Layer
    • Hidden Layer
    • Output Layer

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell Body or Soma            Nodes
Synapse                      Weights
Axon                         Output

Table 1: BNN vs. ANN

What is an Activation Function?

A simple neural network depicting the role of Activation Function.

Activation functions are among the most important parts of a neural network.
Very complicated tasks like object detection, language translation, and human face detection are carried out with the help of neural networks and activation functions. Without them, these tasks would be extremely difficult to handle.

A neuron calculates the weighted sum of its inputs and adds a bias to it; the activation function then decides, based on that value, whether the neuron will be activated or not.
The goal of the activation function is to introduce non-linearity into the output of a neuron.
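
The weighted sum, bias, and activation step can be sketched in a few lines of Python. This is only a minimal illustration: the input values, weights, and bias are made-up numbers, and sigmoid is picked here just as one possible activation function.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values only (not taken from any real model).
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

output = neuron_output(inputs, weights, bias, sigmoid)
print(output)  # a value between 0 and 1
```

Swapping `sigmoid` for any other function of one number changes how the neuron responds without touching the weighted-sum part.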

Many activation functions also squash their output into a fixed range for any input, for example 0 to 1 for sigmoid or -1 to 1 for tanh. Others, such as ReLU, are unbounded on one side.

The selected activation function should be efficient and keep computation time low, because a neural network is sometimes trained on millions of data points.

In effect, the activation function checks whether the input received by a neuron is relevant or irrelevant.

In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs.
For example, a standard integrated circuit can be seen as a digital network of activation functions that can be “ON” or “OFF” depending on the input.

Need for Activation Functions

We require an activation function in our neural networks to introduce non-linearity into the output of neurons.

If the activation function is absent, the weighted sum plus bias is just a linear equation, i.e., a polynomial of degree one. Such a model is easy to solve but limited in its ability to handle complex problems or fit higher-degree polynomials.
This type of model behaves just like a linear regression model.
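
To see why a network without activation functions stays linear, here is a small sketch with two stacked one-input "layers". The weights and biases are arbitrary example numbers. Composing the two layers gives exactly the same result as a single linear layer with weight w2*w1 and bias w2*b1 + b2.

```python
def linear_layer(x, w, b):
    """A layer with no activation function: just w * x + b."""
    return w * x + b


# Arbitrary example parameters for two stacked layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def two_layers(x):
    return linear_layer(linear_layer(x, w1, b1), w2, b2)


# The equivalent single layer, obtained by expanding w2 * (w1 * x + b1) + b2.
w_eq = w2 * w1
b_eq = w2 * b1 + b2


def one_layer(x):
    return linear_layer(x, w_eq, b_eq)


for x in [-2.0, 0.0, 1.5]:
    print(x, two_layers(x), one_layer(x))  # the last two columns match
```

However many such layers are stacked, the same expansion collapses them into one linear equation.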

On the other hand, if we introduce an activation function, the network applies a non-linear transformation to its input and gains the ability to solve complex problems such as face recognition and image classification.

Neurons in a neural network work according to their weights, biases, and respective activation functions. The weights and biases of the neurons are updated based on the error at the output.
This process is called back-propagation. The activation function makes it possible because its derivative supplies the gradients that, together with the error, are used to update the weights and biases.
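
As a rough sketch of that update, here is a single neuron with one input trained by gradient descent. The assumptions are mine and not part of the original discussion: a sigmoid activation, a squared-error loss of 0.5 * (y - target)^2, a learning rate of 0.5, and one made-up training example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, w, b, lr):
    """One forward pass and one gradient-descent update for a single neuron."""
    z = w * x + b
    y = sigmoid(z)
    error = y - target
    # Chain rule: dLoss/dz = dLoss/dy * dy/dz, where dy/dz = y * (1 - y)
    # is the derivative of the sigmoid activation.
    dz = error * y * (1.0 - y)
    w_new = w - lr * dz * x
    b_new = b - lr * dz
    loss = 0.5 * error ** 2
    return w_new, b_new, loss


w, b = 0.0, 0.0
losses = []
for _ in range(100):
    w, b, loss = train_step(x=1.0, target=1.0, w=w, b=b, lr=0.5)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss shrinks as the weights are updated
```

The term `y * (1.0 - y)` is where the activation function enters the update: without a usable derivative there, the error could not be passed back to the weight and bias.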


Okay, let’s move further and learn the types of activation functions.

Types of Activation Functions

The most commonly used activation functions are the following:

  • Linear
  • Binary step
  • ReLU
  • Leaky ReLU
  • Sigmoid
  • Tanh
  • Softmax

Linear Activation Function

Curve of Linear Activation Function.

A simple straight-line activation function, where the output is directly proportional to the weighted sum of the inputs.

A line with a positive slope increases the firing rate as the input increases, and linear activation functions give a wide range of activations.

Equation: f(x) = mx

Range: -∞ to +∞

No matter how many layers the neural network has, if each one uses a linear activation function, the output of the final layer is still just a linear function of the input to the first layer.

The issue with the linear activation function is that its derivative is a constant: differentiating f(x) = mx gives m, which no longer depends on the input “x”. The gradient therefore carries no information about the input, and the function brings no non-linear behavior to the algorithm.
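
The constant slope is easy to check numerically. The sketch below assumes m = 2 as an example and estimates the derivative with a central difference at a few very different inputs.

```python
def linear(x, m=2.0):
    """Linear activation f(x) = m * x (m = 2 is just an example value)."""
    return m * x


def numerical_slope(f, x, h=1e-3):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


slopes = [numerical_slope(linear, x) for x in [-10.0, 0.0, 3.0, 250.0]]
print(slopes)  # every entry is approximately m, whatever the input was
```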


Binary Step Activation Function

Curve for Binary Step Activation Function

A very basic activation function, and the first one that comes to mind when we want to bound our output. It is basically a classifier that classifies the output based on a threshold.

In this function, we decide the threshold value.

If the input is greater than the threshold, the neuron is activated; otherwise it is deactivated.

Equation: f(x) = 1 if x > 0
                 0 if x ≤ 0

For binary classifiers or binary problems, we set the threshold value to 0.
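
A minimal sketch of the binary step function follows. The `threshold` parameter name is my own, and treating an input exactly equal to the threshold as "not activated" is a convention assumed here.

```python
def binary_step(x, threshold=0.0):
    """Return 1 (activated) if x is above the threshold, else 0 (deactivated)."""
    return 1 if x > threshold else 0


for value in [-2.0, 0.0, 0.3]:
    print(value, binary_step(value))
```

Because the output jumps straight from 0 to 1, the function is flat everywhere else, so it gives no gradient to learn from.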

ReLU Activation Function

Curve for Rectified Linear Unit Activation Function.

ReLU stands for Rectified Linear Unit and is the most widely used activation function.

It is primarily used in the hidden layers of artificial neural networks.

Equation: f(x) = max(0,x)

It gives an output x if x is positive and 0 otherwise.

All negative values are converted to zero immediately, so the function cannot map or fit the negative part of the data properly, which creates a problem.

It is non-linear in nature, which means that errors can easily be backpropagated and multiple layers of neurons can be activated.

Range: [0, ∞)

ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.
Only a few neurons are activated at a time, which makes the network sparse, efficient, and easy to compute.
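
Here is a small sketch of ReLU and its derivative applied to a handful of made-up pre-activation values, to show the sparsity described above. Using 0 as the derivative at exactly x = 0 is a common convention assumed here.

```python
def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)


def relu_derivative(x):
    """Slope is 1 for positive inputs and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# Made-up values standing in for the weighted sums of five neurons.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(z) for z in pre_activations]
active = sum(1 for y in outputs if y > 0)

print(outputs)  # [0.0, 0.0, 0.0, 1.5, 3.0]
print(active)   # 2
```

Only two of the five neurons produce a non-zero output, and no exponentials are needed to compute any of them.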

Leaky ReLU Activation Function

Curve for Leaky Rectified Linear Unit Activation Function

The Leaky ReLU function is an improved version of the ReLU activation function.
It has a small slope for negative values instead of a flat slope.

It solves the “Dying ReLU” problem, in which ReLU turns all negative input values into zero and thereby deactivates the neurons in that region.
Leaky ReLU does not convert negative inputs to zero but to values near zero, which solves this major issue of the ReLU activation function.

Equation: f(x) = max(0.01*x, x)

It returns x for positive input, but for a negative value of x it returns a very small value, 0.01 times x.
Thus it gives an output for negative values as well.
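
In code, Leaky ReLU can be sketched as below. The name `alpha` for the negative-side slope is my own choice; its default of 0.01 comes from the equation.

```python
def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and alpha * x for the rest."""
    return x if x > 0 else alpha * x


def leaky_relu_derivative(x, alpha=0.01):
    """Slope is 1 for positive inputs and alpha otherwise, so it is never 0."""
    return 1.0 if x > 0 else alpha


for value in [-3.0, -0.5, 0.0, 2.0]:
    print(value, leaky_relu(value), leaky_relu_derivative(value))
```

Since the derivative is never exactly zero, a neuron that receives negative inputs still gets a small gradient and can keep learning.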

Sigmoid Activation Function

Curve for Sigmoid Activation Function

A widely used activation function because it does its task with great efficiency.
It is a probabilistic approach to decision-making.

Equation: f(x) = 1/(1 + e^(-x))

It is non-linear in nature. When x lies between -2 and 2, the curve is very steep, which means that a small change in x brings a large change in the value of y.

Range: 0 to 1

It is usually used in the output layer of binary classifiers, where the result is either 0 or 1.

When we have to make a decision or predict an output, we use the sigmoid activation function because its output is confined to the narrow range of 0 to 1 and can be read as a probability.
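
A sketch of the sigmoid function and how it can drive a 0/1 decision is shown below. The two-branch form is a standard trick to avoid overflow for large negative inputs, and the 0.5 cutoff and the `predict_label` helper are illustrative assumptions.

```python
import math


def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)), written so that exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_derivative(x):
    """The derivative is f(x) * (1 - f(x)); it peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def predict_label(x, cutoff=0.5):
    """Hypothetical helper: turn the sigmoid output into a 0/1 decision."""
    return 1 if sigmoid(x) >= cutoff else 0


for value in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    print(value, sigmoid(value), sigmoid_derivative(value), predict_label(value))
```

The printed derivative is largest near x = 0 and becomes tiny beyond roughly ±4, which matches the steep middle section of the curve.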

Hyperbolic Tangent Activation Function (Tanh)

Curve for Hyperbolic Tangent Activation Function

Tanh stands for the hyperbolic tangent activation function.
It almost always works better than the sigmoid function.

Equation: f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1
OR tanh(x) = 2 * sigmoid(2x) - 1

Range: -1 to 1

Like the sigmoid activation function, it is used in hidden layers. Because its values lie between -1 and 1, the mean of the hidden layer’s outputs comes out to be 0 or very close to it, which helps center the data. This makes learning easier for the next layer and helps in predicting or differentiating between two classes. Note that it maps negative inputs to negative outputs only.
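
The relation between tanh and sigmoid given in the equation can be checked with a few lines of Python. `math.tanh` is the standard-library implementation, and the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """tanh(x) = 2 * sigmoid(2x) - 1"""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_derivative(x):
    """The derivative of tanh is 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
for x in samples:
    print(x, tanh_from_sigmoid(x), math.tanh(x))

# For inputs symmetric around 0, the outputs average out to about 0.
mean_output = sum(math.tanh(x) for x in samples) / len(samples)
print(mean_output)
```

The two columns agree up to floating-point rounding, and the mean of the outputs sits at zero, which is the centering effect described above.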

Softmax Activation Function

Curve for Softmax activation function

The softmax function is a generalization of the sigmoid function to multiple classes, and is mostly used for classification problems.

It is used primarily at the last layer, i.e., the output layer, for decision making, in the same way the sigmoid function is.

Both sigmoid and softmax can be considered for binary classification problems, but softmax is the one to use when we handle multi-class classification problems. It squeezes the output for each class between 0 and 1 and also divides by the sum of the outputs, so that the values across all classes add up to 1.
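
A minimal softmax sketch follows, using the usual definition softmax(z_i) = e^(z_i) / sum over j of e^(z_j). Subtracting the largest score before exponentiating is a common numerical-stability trick that does not change the result. The three scores are made-up values for an imaginary three-class problem.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # shift by the max so that exp never overflows
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# With two classes and scores [x, 0], softmax reduces to sigmoid(x).
two_class = softmax([1.5, 0.0])
print(two_class[0], sigmoid(1.5))
```

The class with the highest score gets the highest probability, and the two-class check shows why softmax can be seen as the multi-class counterpart of sigmoid.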

Conclusion

So, in this blog, we got an idea about biological neural networks and artificial neural networks. We discussed the activation function and why it is needed. We also discussed the 7 most commonly used types of activation functions. All of these activation functions serve the same purpose, and which one to use depends on the conditions.


Written by 

Durgesh Gupta is a Software Consultant working in the domain of AI/ML.