If you are looking for a short answer: real-life image datasets are not as small and simple as MNIST (28 x 28 grayscale digits), so building a model for them with a fully connected neural network quickly becomes impractical. But let's produce some dopamine and explore the convolutional neural network in a bit more depth.
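To put the short answer in numbers, here is a back-of-the-envelope sketch comparing the number of weights in one fully connected layer with those in one convolutional layer. The 224 x 224 RGB image size, the 1,000-neuron layer and the 32 filters of size 3 x 3 are assumptions picked only for illustration, and the helper names are hypothetical.

```python
# Rough parameter count: fully connected layer vs. convolutional layer.
# Image size, neuron count and filter count are illustrative assumptions.

def dense_params(height, width, channels, units):
    """Every unit gets one weight per input value, plus one bias."""
    inputs = height * width * channels
    return inputs * units + units

def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus a bias,
    and the same filter is reused at every position of the image."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

mnist_dense = dense_params(28, 28, 1, 1000)
photo_dense = dense_params(224, 224, 3, 1000)
photo_conv = conv_params(3, 3, 3, 32)

print(f"Dense layer on 28x28 grayscale (MNIST): {mnist_dense:,}")
print(f"Dense layer on 224x224 RGB photo:       {photo_dense:,}")
print(f"Conv layer, 32 filters of 3x3 on RGB:   {photo_conv:,}")
```

With these assumed sizes, the fully connected layer needs about 150 million weights for a single photo-sized input, while the convolutional layer needs 896, because each filter is reused across the whole image.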
Our visual cortex focuses on particular areas to identify an image, and similarly a convolutional neural network focuses on a few small areas of the image to extract features from it. When we provide the image to the input layer (assuming you already know the basics of neural networks), the neurons in the first hidden layer are not connected to every single pixel. Instead, each one connects only to a few neighbouring pixels (a small area, if we picture the input as an image). Likewise, each neuron in the next hidden layer is connected to only a few neurons of the previous layer, and so on for the following hidden layers. The picture below depicts this:
Before going further, we need to understand the following terms:
Receptive field: This is the area described above, i.e. the small region of the previous layer that a single neuron is connected to. In terms of pixels, it can be 3 x 3, 3 x 4, 5 x 7, and so on. Look at the image below to understand it better.
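As a small sketch of this idea, assuming no zero padding, the code below computes which input pixels one neuron is connected to and cuts that patch out of a toy 5 x 5 image. The function names `receptive_field` and `extract_patch` are hypothetical, made up for this example.

```python
# Which input pixels does one neuron "see"? Sketch assuming no padding.

def receptive_field(row, col, kernel_h, kernel_w, stride=1):
    """Return the (row range, column range) of input pixels that the neuron
    at position (row, col) of the next layer is connected to."""
    top, left = row * stride, col * stride
    return range(top, top + kernel_h), range(left, left + kernel_w)

def extract_patch(image, row, col, kernel_h, kernel_w, stride=1):
    """Cut the receptive field of neuron (row, col) out of a 2-D list."""
    rows, cols = receptive_field(row, col, kernel_h, kernel_w, stride)
    return [[image[r][c] for c in cols] for r in rows]

# A 5 x 5 "image" whose pixel values are 0..24, so positions are easy to read.
image = [[r * 5 + c for c in range(5)] for r in range(5)]

# The neuron at (1, 2) with a 3 x 3 receptive field and stride 1 sees:
for line in extract_patch(image, 1, 2, 3, 3):
    print(line)  # [7, 8, 9] then [12, 13, 14] then [17, 18, 19]
```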
Zero padding: To make sure the pixels along the sides and corners are processed as well, we add extra pixels with the value zero around the border of the input. This is called zero padding. It also lets us control the output size, for example keeping it the same as the input size. In the above image, the grey squares represent the zero padding.
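A minimal sketch of zero padding on a toy 3 x 3 image, using plain Python lists; `zero_pad` is a hypothetical helper, not a library function.

```python
# Zero padding sketch: surround a 2-D list with `pad` rows/columns of zeros.

def zero_pad(image, pad):
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # rows of zeros on top
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # zeros left and right
    padded.extend([0] * width for _ in range(pad))     # rows of zeros at the bottom
    return padded

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

for row in zero_pad(image, 1):
    print(row)
```

With one ring of zeros, a 3 x 3 receptive field can be centred on every original pixel, including the corners, so a 3 x 3 filter with stride 1 now produces a 3 x 3 output instead of a single value.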
Stride: The distance between two consecutive receptive fields is called the stride. This is important because the stride is one of the hyperparameters we choose while building the model: a larger stride means fewer, less overlapping receptive fields and a smaller output. The image below should help you understand it better.
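The effect of the stride can be sketched with the standard output-size formula for one dimension, out = floor((n + 2p - k) / s) + 1, where n is the input width, k the receptive field width, p the zero padding and s the stride. The helper names below are hypothetical, and the 7-pixel row is an arbitrary example.

```python
# Output size of a convolution (or pooling) along one dimension:
#   out = floor((n + 2 * padding - kernel) / stride) + 1

def output_size(n, kernel, stride=1, padding=0):
    return (n + 2 * padding - kernel) // stride + 1

def window_starts(n, kernel, stride=1, padding=0):
    """Start index (in the padded input) of each consecutive receptive field."""
    return [i * stride for i in range(output_size(n, kernel, stride, padding))]

# A 7-pixel-wide row with a 3-pixel receptive field:
print(output_size(7, 3, stride=1))             # 5
print(output_size(7, 3, stride=2))             # 3
print(window_starts(7, 3, stride=2))           # [0, 2, 4]
print(output_size(7, 3, stride=1, padding=1))  # 7
```

With stride 1, neighbouring 3-pixel receptive fields overlap by two pixels; with stride 2 they overlap by one (starting at 0, 2 and 4); a stride of 3 would make them touch without overlapping.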
Filters: A filter, also known as a convolution kernel, is a small matrix of weights used to extract features. It is applied to each receptive field in turn, and because the same filter is slid across the whole image, the neurons that use it all share the same weights. Below is an image of what the output looks like after applying filters.
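Here is a minimal sketch of one filter being applied to every receptive field of a toy image (stride 1, no padding). Like most deep-learning libraries, it computes cross-correlation, meaning the filter is not flipped. The vertical-edge filter is hand-picked for illustration; in a real CNN the filter weights are learned during training. `apply_filter` is a hypothetical name.

```python
# Applying one 3 x 3 filter to a small grayscale image.

def apply_filter(image, kernel, stride=1):
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over the receptive field whose top-left corner
            # is (i * stride, j * stride).
            total = sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        output.append(row)
    return output

# A 5 x 5 image: dark (0) on the left, bright (10) on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

# Hand-picked vertical-edge filter.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

for row in apply_filter(image, vertical_edge):
    print(row)  # [30, 30, 0] on each of the three rows
```

The output is large (30) where the receptive field straddles the dark-to-bright boundary and 0 over the uniformly bright area, which is the kind of feature this filter is meant to pick out.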
Feature map: A feature map is the output obtained by applying a filter to the previous layer (for the first convolutional layer, that is the input image). Think of it this way: we feed in the image, each neuron in the next layer looks at its receptive field, the filter is applied to those receptive fields, and the result is a feature map. A layer with several filters produces one feature map per filter.
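To make the link between filters and feature maps concrete, this sketch applies two hand-picked 2 x 2 filters to a toy image containing a bright square and gets one feature map per filter. It assumes stride 1, no padding, no bias and a single-channel input; `feature_map` and `conv_layer` are hypothetical helpers.

```python
# A convolutional layer with several filters produces one feature map per filter.

def feature_map(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

def conv_layer(image, kernels):
    """Return a list of feature maps, one per kernel."""
    return [feature_map(image, k) for k in kernels]

image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]

kernels = {
    "vertical edges": [[-1, 1], [-1, 1]],
    "horizontal edges": [[-1, -1], [1, 1]],
}

maps = conv_layer(image, list(kernels.values()))
for name, fmap in zip(kernels, maps):
    print(name, fmap)
```

The first map responds to the left and right sides of the square (positive and negative values), the second to its top and bottom.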
Pooling layer: A pooling layer is responsible for shrinking its input. As in a convolutional layer, each neuron in a pooling layer is connected to the outputs of a limited number of neurons in the previous layer, located within a receptive field. Unlike a convolutional layer, however, a pooling layer has no weights; it simply aggregates its inputs using an aggregation function such as max or mean.
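A minimal sketch of a pooling layer with a 2 x 2 window and stride 2, which are common choices but assumptions here. It holds no weights: it only applies an aggregation function (max or mean) to each window. `pool` and `mean` are hypothetical helpers.

```python
# Pooling sketch: aggregate each window with max or mean; no weights involved.

def pool(feature_map, size=2, stride=2, aggregate=max):
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            aggregate([
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def mean(values):
    return sum(values) / len(values)

fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [7, 2, 8, 4],
        [0, 1, 3, 9]]

print(pool(fmap))                  # [[6, 2], [7, 9]]  (max pooling)
print(pool(fmap, aggregate=mean))  # [[3.75, 1.25], [2.5, 6.0]]  (mean pooling)
```

The 4 x 4 feature map shrinks to 2 x 2, keeping either the strongest or the average response of each window.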
Now let's summarize the convolutional neural network using these terms. It is a neural network, except that its neurons are not fully connected to all the pixels of an image; instead, each neuron focuses on one small area, its receptive field. In other words, a receptive field is the particular part of the image that one neuron sees. That area does not necessarily contain everything the neuron needs to see, because a pattern may continue into a neighbouring area. The stride controls how far apart consecutive receptive fields are: when it is smaller than the receptive field, neighbouring fields overlap, so such patterns are not cut off. Even so, sliding receptive fields across the image does not automatically handle the corner cases; that's where zero padding comes into the picture, letting receptive fields cover the pixels along the edges and corners as well. The filters applied to the receptive fields are what extract the feature maps, and the pooling layers then aggregate neighbouring values to shrink them. The feature maps are what the network uses to find patterns.
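Putting the pieces together, the sketch below runs one toy grayscale image through the steps summarized above: zero padding, a 3 x 3 filter slid over the receptive fields with stride 1, and 2 x 2 max pooling, then flattens the result. It assumes square images, a single hand-picked filter and no learning at all; every name and size is illustrative rather than taken from a particular library.

```python
# Toy end-to-end pass: zero padding -> filter -> feature map -> max pooling -> flatten.
# Assumes square, single-channel images; all sizes and values are illustrative.

def zero_pad(img, p):
    """Surround a square 2-D list with p rings of zeros."""
    width = len(img[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in img]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom

def convolve(img, kernel, stride=1):
    """Slide a square kernel over img (cross-correlation, no flipping)."""
    k = len(kernel)
    size = (len(img) - k) // stride + 1
    return [
        [
            sum(img[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(size)
        ]
        for i in range(size)
    ]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (window = stride = size)."""
    n = len(fmap) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(n)
        ]
        for i in range(n)
    ]

# 6 x 6 image with a bright vertical line in column 2.
image = [[1 if c == 2 else 0 for c in range(6)] for _ in range(6)]

# Hand-picked vertical-line detector (a real CNN would learn these weights).
kernel = [[-1, 2, -1],
          [-1, 2, -1],
          [-1, 2, -1]]

padded = zero_pad(image, 1)                    # 8 x 8
fmap = convolve(padded, kernel)                # 6 x 6 feature map
pooled = max_pool(fmap)                        # 3 x 3
features = [v for row in pooled for v in row]  # flattened: 9 values

print(len(padded), len(fmap), len(pooled), len(features))  # 8 6 3 9
print(pooled)  # [[0, 6, 0], [0, 6, 0], [0, 6, 0]]
```

The vertical line survives as a column of 6s after pooling, while the 6 x 6 image has been reduced to 9 numbers that a fully connected layer could take as input.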
We will get into some code in the next section, but first let's conclude, in theory, why we use convolutional neural networks. It's quite simple if you have read everything above carefully: just as neural networks were inspired by how the brain works, convolutional neural networks are inspired by how the brain recognizes images. There is more to it, though; people have developed various other techniques to enhance the capabilities of convolutional neural networks, which we will get to know in our next set of blogs.
Instead of showing the code on this blog page, let's look at it on GitHub.
Note: It might seem like the convolutional neural network is the hero on its own, but the truth is that it wouldn't be without the power of GPUs. The evolution of hardware over the past years has made it practical: GPUs complement CNNs, and that is what makes CNNs the hero.