MachineX: The alphabets of Artificial Neural Network


In this blog, we will talk about neural networks, the foundation of deep learning, which has given machine learning a major edge in the current AI revolution. Let’s get started!

Before diving into deep learning, let’s first answer one question:

Why Deep Learning?

Well, there are plenty of reasons. A few of them are:

  • Deep learning tends to outperform shallow learning methods once you have a huge quantity of data (either labelled or not).
  • State-of-the-art performance on tasks involving text, sound, or images, with major advances in computer vision, NLP, and speech recognition.
  • Feature learning (abstract representations): the network learns useful representations from the data itself, so we don’t need to spend a lot of time on feature engineering.

Okay, got it? Let’s go ahead and learn more about neural networks.


A neuron is a computational unit that takes one or more inputs, does some calculations, and produces an output. That’s it, no big deal.

The figure above shows the kind of neuron we use in a neural network. The neuron receives an input vector and has a vector of weights (parameters), and we take the dot product of these two vectors. This produces the result, a continuous value anywhere from -infinity to +infinity.
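To make the dot product concrete, here is a minimal Python sketch of a single neuron’s computation before any activation function is applied. The function name `neuron_output` is hypothetical, and the optional `bias` term is an assumption (the post only introduces bias units later, when sizing layers).

```python
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Weighted sum of one neuron (hypothetical helper), before any activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Dot product: multiply each input by its weight and add the products up.
    return sum(x * w for x, w in zip(inputs, weights)) + bias


z = neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(z)  # 0.5 - 2.0 + 6.0 = 4.5
```

Nothing limits the value of `z`: larger inputs or weights give arbitrarily large (or negative) results, which is why an activation function is applied next.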

If we want to limit the range of the output values, we use an activation function.

The activation function squashes the output value into a range that depends on the kind of activation function.

We will look at three of them: sigmoid ranges from 0 to 1, tanh from -1 to 1, and ReLU from 0 to +infinity.

Activation Function

  • Sigmoid function (σ): g(z) = 1 / (1 + e^{-z}). It is recommended only for the output layer, so that we can easily interpret the output as a probability, since the output is restricted to between 0 and 1. One of the main disadvantages of using the sigmoid function in hidden layers is that its gradient is very close to zero over a large portion of its domain, which makes learning slow and harder for the learning algorithm.
  • tanh function: g(z) = (e^{z} - e^{-z}) / (e^{z} + e^{-z}). tanh is similar to the logistic sigmoid but better in one respect: its output is zero-centered. The range of the tanh function is from -1 to 1, and tanh is also sigmoidal (S-shaped). The advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero, as the tanh graph shows. The tanh function is mainly used for classification between two classes.
  • Rectified Linear Unit (ReLU): g(z) = max{0, z}. Models that are close to linear are easy to optimize. Since ReLU shares many of the properties of linear functions, it tends to work well on most problems. The only issue is that the derivative is not defined at z = 0, which we can overcome by setting the derivative to 0 at z = 0. However, this means that for z ≤ 0 the gradient is zero, so the neuron again can’t learn.
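The three activation functions and their derivatives can be sketched in plain Python as follows. The function names are hypothetical; `relu_grad` follows the convention above of using a derivative of 0 at z = 0, and the two-branch `sigmoid` is simply a numerically stable way to evaluate the same formula 1 / (1 + e^{-z}).

```python
import math


def sigmoid(z: float) -> float:
    # Two branches avoid overflow in math.exp for large |z|; both equal 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of the sigmoid: s * (1 - s)


def tanh(z: float) -> float:
    return math.tanh(z)


def tanh_grad(z: float) -> float:
    return 1.0 - math.tanh(z) ** 2  # derivative of tanh: 1 - tanh^2


def relu(z: float) -> float:
    return max(0.0, z)


def relu_grad(z: float) -> float:
    # Undefined at z == 0; by convention we use 0 there, so the gradient is 0 for all z <= 0.
    return 1.0 if z > 0 else 0.0


for z in (-10.0, 0.0, 10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.4f} (grad {sigmoid_grad(z):.6f})  "
          f"tanh={tanh(z):+.4f} (grad {tanh_grad(z):.6f})  "
          f"relu={relu(z):.1f} (grad {relu_grad(z):.0f})")
```

Printing the values at z = -10, 0, and 10 shows the points made above: the sigmoid and tanh gradients are close to zero at both extremes, while the ReLU gradient is exactly zero for every z ≤ 0.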

Yup! That’s the neuron.

Neural Network

A neural network is a set of layers (each layer has a set of neurons) stacked together sequentially.

The output of one layer is the input of the next layer.

In the image above, we have three layers:

  1. Input layer: A set of input neurons, where each neuron represents one feature in our dataset. It takes the inputs and passes them to the next layer.
  2. Hidden layer: A set of n neurons, where each neuron has its own weights (parameters). It takes the input from the previous layer, computes the dot product of the inputs and weights, applies an activation function (as we have seen above), and passes the result to the next layer.

Note: We can have any number (n) of hidden layers in between. For the sake of understanding, let’s take only one hidden layer.

  3. Output layer: It’s the same as a hidden layer, except that it gives the final result (outcome/class/value).
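Stacking layers is just the same step applied repeatedly: the output list of one layer becomes the input list of the next. The sketch below assumes a tiny network with 3 input features, one hidden layer of 4 ReLU neurons, and 1 sigmoid output neuron, with random weights; all names are hypothetical. Biases are stored per neuron here, which is equivalent to the extra bias unit the post adds to the previous layer.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per neuron


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs: Vector, weights: Matrix, biases: Vector,
                activation: Callable[[float], float]) -> Vector:
    """One fully connected layer: per-neuron dot product plus bias, then activation."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_inputs: int, n_neurons: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Random initial weights in [-1, 1] and zero biases for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons


rng = random.Random(0)
x = [0.2, 0.7, 0.1]                           # input layer: 3 features
w_hidden, b_hidden = random_layer(3, 4, rng)  # hidden layer: 4 neurons
w_out, b_out = random_layer(4, 1, rng)        # output layer: 1 neuron

hidden = dense_layer(x, w_hidden, b_hidden, relu)    # output of the hidden layer...
output = dense_layer(hidden, w_out, b_out, sigmoid)  # ...is the input of the output layer
print(hidden, output)
```

Adding more hidden layers only means calling `dense_layer` again on the previous layer’s output.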

So how do we decide the number of neurons in each layer, and in the whole network?

Input layer: the number of input neurons is decided by the number of features in the dataset.

N_input_neurons = N_features + 1 (bias)

Hidden layers: we can define as many neurons and layers as we wish (it depends on the data and the problem). However, a good starting point is to use more neurons than there are features, and to give all hidden layers the same number of neurons.


Output layer: for regression, use 1 neuron; for binary classification, you can have 1 or 2 neurons; and for multi-class classification, more than 2 neurons (typically one per class).

Note: there is no bias unit here, as it is the last layer in the network.
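The sizing rules of thumb above can be collected into one small function. This is a hypothetical helper, not a standard API, and it rests on two assumptions: "more neurons than features" is interpreted as twice as many by default, and each hidden layer, like the input layer, gets one extra bias unit while the output layer gets none.

```python
from typing import List, Optional


def suggest_layer_sizes(n_features: int, task: str, n_classes: Optional[int] = None,
                        n_hidden_layers: int = 1,
                        hidden_width: Optional[int] = None) -> List[int]:
    """Hypothetical helper: neuron counts per layer, bias units included where used."""
    n_input = n_features + 1  # one neuron per feature + 1 bias unit
    if hidden_width is None:
        hidden_width = 2 * n_features  # assumption: "more than the features" -> twice as many
    # Same width for every hidden layer, plus 1 bias unit each (assumption).
    hidden = [hidden_width + 1] * n_hidden_layers
    if task == "regression":
        n_output = 1
    elif task == "binary":
        n_output = 1  # a single sigmoid unit; 2 neurons would also work
    elif task == "multiclass":
        if n_classes is None or n_classes <= 2:
            raise ValueError("multiclass needs n_classes > 2")
        n_output = n_classes  # one neuron per class
    else:
        raise ValueError(f"unknown task: {task!r}")
    return [n_input, *hidden, n_output]  # no bias unit on the last layer


print(suggest_layer_sizes(4, "multiclass", n_classes=3))  # [5, 9, 3]
```

These are starting points only; the right depth and width still depend on the data and the problem.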

Now that we have a basic understanding of neural networks, let’s go deeper.

Once you have identified the dataset and the problem, you can follow these steps:

1. Pick the network architecture (initialize it with random weights).
2. Do a forward pass (forward propagation).
3. Calculate the total error (this is what we need to minimize).
4. Back propagate the error and update the weights (back propagation).
5. Repeat steps 2-4 for a number of epochs, or until the error is minimal.
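Here is a minimal sketch of steps 1-5, assuming the simplest possible network: a single sigmoid neuron (no hidden layer) trained with plain gradient descent on a toy AND dataset, using binary cross-entropy as the total error. The dataset, learning rate, and epoch count are illustrative assumptions. With one neuron, back propagation collapses to a single derivative; the multi-layer case is what the next blog covers.

```python
import math
import random


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Toy dataset (assumption for illustration): the logical AND of two inputs.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 0.0, 0.0, 1.0]


def train(epochs: int = 5000, lr: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    # Step 1: architecture = one sigmoid neuron with 2 weights + bias, random init.
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    b = rng.uniform(-0.5, 0.5)
    losses = []
    for _ in range(epochs):  # Step 5: repeat steps 2-4
        grad_w, grad_b, total_error = [0.0, 0.0], 0.0, 0.0
        for (x1, x2), y in zip(X, Y):
            # Step 2: forward pass
            y_hat = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # Step 3: error (binary cross-entropy), clamped to avoid log(0)
            p = min(max(y_hat, 1e-12), 1.0 - 1e-12)
            total_error += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            # Step 4: for sigmoid + cross-entropy, dError/dz = y_hat - y
            delta = y_hat - y
            grad_w[0] += delta * x1
            grad_w[1] += delta * x2
            grad_b += delta
        n = len(X)
        # Step 4 (continued): update weights against the averaged gradient
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(total_error / n)
    return w, b, losses


w, b, losses = train()
preds = [int(sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5) for x1, x2 in X]
print(preds, losses[0], losses[-1])
```

The recorded `losses` should fall steadily over the epochs, and the thresholded predictions should reproduce the AND truth table.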

There are two key algorithms in neural networks:

1. Forward propagation.

2. Back propagation.

We will talk about these steps, and about forward and backward propagation, in the next blog.

Stay tuned!

Written by 

Shubham Goyal is a Data Scientist at Knoldus Inc. He is also an artificial intelligence researcher, interested in research on problems from different domains, and a regular contributor to the community through blogs and webinars on machine learning and artificial intelligence. He has also written a few research papers on machine learning. Moreover, he is a conference speaker and an official author at Towards Data Science.