Activation Functions: Everything You Need to Know

An activation function is a mathematical function that takes a neuron's input (typically the weighted sum of its inputs plus a bias) and produces that neuron's output. In other words, it maps the input of a perceptron in a given layer to its output and decides how strongly the neuron activates. It is usually a non-linear transformation that we apply to the input before sending it to the next layer of neurons. It is also known as a transfer function.

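We employ activation functions for a variety of reasons:

1. To introduce non-linearity. Without it, any stack of layers collapses into a single linear transformation of the input.
2. To normalise values by squashing them into a fixed range, such as 0 to 1 or -1 to 1 (depending on the function).
3. To keep the values passed between layers bounded, when a bounded function such as sigmoid or tanh is used.
4. To decide whether the presented data is relevant to the model's prediction, that is, whether and how strongly a neuron should fire.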
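
To make the non-linearity point concrete, here is a minimal sketch in plain Python. The one-input "layers", their weights, and the helper names are made up for illustration. It shows that two stacked layers with no activation behave exactly like one linear layer, while putting a ReLU between them breaks that equivalence.

```python
def linear(w, b):
    """Return a one-input 'layer' that computes w * x + b (no activation)."""
    return lambda x: w * x + b

def relu(x):
    """A simple non-linear activation, used only to show the contrast."""
    return max(0.0, x)

# Hypothetical weights and biases, chosen only for illustration.
layer1 = linear(2.0, 1.0)
layer2 = linear(3.0, -4.0)

def stacked_linear(x):
    """Two layers with no activation in between."""
    return layer2(layer1(x))

# Algebraically, 3 * (2x + 1) - 4 = 6x - 1, so one layer does the same job.
collapsed = linear(6.0, -1.0)

def stacked_relu(x):
    """The same two layers with a ReLU between them."""
    return layer2(relu(layer1(x)))
```

For every x, stacked_linear(x) equals collapsed(x), so the second layer adds no modelling power. stacked_relu(x) differs from collapsed(x) whenever the first layer's output is negative.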

Types of Activation Functions

Activation functions come in a variety of forms. Each is significant in its own way. Let’s take a look at each one individually.

Sigmoid (Logistic) Activation Function

The sigmoid function, f(x) = 1 / (1 + e^(-x)), is an S-shaped curve whose output ranges from 0 to 1.

Pros:
1. It works well for binary classification tasks, because the output can be read as a probability.
2. Its non-linearity lets the network model more complex relationships, allowing us to use it for more difficult tasks.

Cons:
1. For inputs of large magnitude the curve saturates and the gradient approaches 0, so little or no learning happens.
2. The derivative f′(x) lies between 0 and 0.25. If you use the sigmoid function in every layer of an n-layer neural network, the gradient gets smaller and smaller as the signal is back-propagated, leading to the vanishing gradient problem.

Figure: f(x) is the sigmoid function and g(x) is its derivative.
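
As a rough sketch of the above, the sigmoid and its derivative can be written in a few lines of plain Python. The function names are my own, and the two-branch form is just one common way to avoid overflow for very negative inputs. The last helper shows how quickly the best-case gradient shrinks when it passes through several sigmoid layers.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe here: x is negative, so e is at most 1
    return e / (1.0 + e)

def sigmoid_grad(x):
    """Derivative of the sigmoid: f(x) * (1 - f(x)). It peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def max_gradient_through_layers(n_layers):
    """Best-case product of n sigmoid derivatives (every input exactly 0)."""
    return sigmoid_grad(0.0) ** n_layers
```

Since the derivative never exceeds 0.25, ten sigmoid layers scale the gradient by at most 0.25 ** 10, which is roughly one millionth. This ignores the weights, so it is only an illustration of the trend, not a full back-propagation.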

Tanh Activation Function

The tanh function has the same S shape as the sigmoid function; in fact, tanh(x) = 2 * sigmoid(2x) - 1. The main difference is that its output ranges from -1 to 1 and is centred on zero.

Pros:
1. Optimisation is easier, because the output is zero-centred.
2. The derivative of the tanh function, f′(x), lies between 0 and 1, so its gradients can be larger than those of the sigmoid.

Cons:
1. It is slower to compute, because it relies on the exponential function.
2. It still saturates for inputs of large magnitude, so it also suffers from the vanishing gradient problem.

Figure: f(x) is the tanh function and g(x) is its derivative.
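
Here is a small sketch of tanh and its derivative using Python's built-in math.tanh. The helper names are my own. It also checks the relationship tanh(x) = 2 * sigmoid(2x) - 1, which is why the two curves look so similar.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2. It equals 1 at x = 0."""
    t = math.tanh(x)
    return 1.0 - t * t

def sigmoid(x):
    """Plain logistic sigmoid; fine for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh rebuilt from the sigmoid: a rescaled and shifted copy of it."""
    return 2.0 * sigmoid(2.0 * x) - 1.0
```

For example, tanh_grad(0.0) is 1.0, while for an input such as 5.0 the derivative is already close to 0, which is the saturation behind the vanishing gradient.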


ReLU Activation Function (ReLU: Rectified Linear Unit)

As the name implies, it is a rectified (clipped) form of a linear function: f(x) = max(0, x). For positive inputs its derivative is exactly 1, which mitigates the vanishing gradient problem that sigmoid and tanh suffer from.

Pros:
1. It does not activate all neurons or perceptrons at the same time: neurons with a negative input output 0.
2. Computationally efficient.
3. Very fast convergence in practice.

Cons:
1. The derivative is 0 for all negative input values.
2. If a neuron's input stays negative, its output is 0 and so is its gradient, which means its weights stop being updated (often called the "dying ReLU" problem).
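
A minimal sketch of ReLU and its derivative in plain Python follows. The function names are my own, and returning 0 for the derivative at exactly x = 0 is a convention I am assuming here, since the true derivative is not defined at that point.

```python
def relu(x):
    """ReLU: pass positive inputs through unchanged, clip the rest to 0."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise.

    The value at x == 0 is a convention (assumed to be 0 here).
    """
    return 1.0 if x > 0 else 0.0
```

Notice that relu_grad is 0 for every negative input, no matter how small, which is exactly the weakness listed under the cons.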

Leaky ReLU Activation Function

Leaky ReLU is a small modification of ReLU. Instead of outputting 0 for negative inputs, it multiplies them by a small slope (0.01 is a common choice), which solves the problem of the vanishing gradient for negative input values.

Unlike with ReLU, negative inputs and the derivatives for negative inputs are not discarded.
One issue with Leaky ReLU is that the derivative is not defined at input = 0.
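
Leaky ReLU can be sketched the same way. The slope parameter name alpha and its default of 0.01 are assumptions for illustration, as is the choice of returning alpha for the derivative at exactly x = 0.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for positive inputs, alpha * x for the rest."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Derivative: 1 for positive inputs, alpha otherwise.

    The value at x == 0 is a convention (assumed to be alpha here).
    """
    return 1.0 if x > 0 else alpha
```

Because the derivative for negative inputs is alpha rather than 0, a neuron that receives negative input still gets a small gradient and can keep learning.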

Softmax / Normalized Exponential Function

The softmax activation function is a generalisation of the sigmoid function to more than two classes.

It is a mathematical function that turns a vector of numbers into a vector of probabilities: every entry lies between 0 and 1, and the entries sum to 1. The softmax function is widely used as the output-layer activation in multi-class classification.

“The probability for a data point belonging to each particular class is obtained by the Softmax function.”
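
Below is a short sketch of softmax in plain Python. The function name and the example scores are my own. Subtracting the largest score before exponentiating is a common trick to avoid overflow; it does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)                            # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]   # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
```

The largest score gets the largest probability, and the entries of probs add up to 1. With only two classes and scores [x, 0], the first probability equals sigmoid(x), which is the sense in which softmax generalises the sigmoid.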


Here, I have attempted to discuss a few widely used activation functions. Other activation functions exist as well, but the overall concept stays the same, and better ones are still being researched. I hope you now have a better understanding of activation functions.