A Complete Guide To Recurrent Neural Network

Reading Time: 5 minutes

Recurrent neural networks (RNNs) are a type of deep learning algorithm that follows a sequential approach. Standard neural networks assume that each input and output is independent of every other; a recurrent neural network, by contrast, performs its computations in a sequence, carrying information from one step to the next.

Neural networks imitate the function of the human brain in the fields of AI, machine learning, and deep learning, allowing computer programs to recognize patterns and solve common problems. Recurrent neural networks are a type of neural network that can be used to model sequence data. RNNs, which are derived from feedforward networks, are similar to human brains in their behavior. Simply put, recurrent neural networks can anticipate sequential data in a way that other algorithms can’t.

All of the inputs and outputs in standard neural networks are independent of one another. In some circumstances, however, such as predicting the next word of a phrase, the prior words are necessary, so the previous words must be remembered. RNNs were created to overcome this problem by introducing a hidden layer. The most important component of an RNN is the hidden state, which remembers specific information about a sequence.

An RNN has a memory that stores information about the calculations it has performed so far. It uses the same parameters for every input, because it performs the same task on each element of the sequence and on each hidden state, which keeps the number of parameters small.

RNN architecture can vary depending on the problem you’re trying to solve, from networks with a single input and output to networks with many inputs and outputs (with variations in between).

Below are some examples of RNN architectures that can help you better understand this.

  • One To One: There is only one input-output pair here. A one-to-one architecture is used in traditional neural networks.
  • One To Many: A single input in a one-to-many network can result in numerous outputs. One-to-many networks are used in music generation, for example.
  • Many To One: In this scenario, a single output is produced by combining many inputs from distinct time steps, as in the sketch after this list. Sentiment analysis and emotion identification use such networks, where the class label is determined by a sequence of words.
  • Many To Many: For many-to-many, there are numerous options, since the input and output sequences can differ in length (two inputs might yield three outputs, for instance). Machine translation systems, such as English-to-French translation or vice versa, use many-to-many networks.
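Below is a minimal NumPy sketch of the many-to-one case (not from the original post; the sizes and random weights are purely illustrative). A sequence of inputs is folded into a single hidden state using the same weights at every step, and one output is produced at the end:

import numpy as np

rng = np.random.default_rng(0)
seq_len, n_input, n_hidden, n_classes = 5, 4, 8, 2

W_xh = rng.normal(size=(n_input, n_hidden))   # input-to-hidden weights
W_hh = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden weights
W_hy = rng.normal(size=(n_hidden, n_classes)) # hidden-to-output weights

x_seq = rng.normal(size=(seq_len, n_input))   # many inputs, one per time step
h = np.zeros(n_hidden)                        # hidden state carried across steps

for x_t in x_seq:
    h = np.tanh(x_t @ W_xh + h @ W_hh)        # same weights reused at every step

y = h @ W_hy                                  # a single output after the last step
print(y.shape)                                # (2,)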

Common Activation Functions

The activation function is a non-linear transformation that we apply to the input before sending it to the next layer of neurons or finalizing it as output. A neuron’s activation function dictates whether it should be turned on or off. Non-linear functions usually transform a neuron’s output to a number between 0 and 1 or between -1 and 1.
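As a quick illustration (a NumPy sketch, not from the original post), here are three common activation functions and the ranges they squash values into:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output in (0, 1)

def tanh(x):
    return np.tanh(x)                 # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # output in [0, inf)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # approximately [0.119 0.5   0.881]
print(tanh(x))      # approximately [-0.964  0.     0.964]
print(relu(x))      # [0. 0. 2.]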

Consider the following steps to train a recurrent neural network −

1 − Input a specific example from the dataset.

2 − The network takes the example and performs some calculations using randomly initialized variables.

3 − The predicted result is then computed.

4 − Comparing the result produced with the expected value yields an error.

5 − To trace the error, it is propagated back through the same path, and the variables are adjusted along the way.

6 − Steps 1 to 5 are repeated until we are confident that the variables are tuned well enough to produce the correct output.

7 − A systematic prediction is made by applying these variables to new, unseen input.
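The sketch below maps these seven steps onto code. It is a toy 1-D regression in plain NumPy, not from the original post; an RNN would use the same loop structure, with backpropagation through time doing the error propagation in step 5:

import numpy as np

rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=100)   # a toy dataset
targets = 3.0 * inputs + 0.5                # the relationship to be learned

w, b = rng.normal(), rng.normal()           # step 2: randomly initialized variables
lr = 0.1

for epoch in range(200):                    # step 6: repeat until the error is small
    for x, y_true in zip(inputs, targets):  # step 1: take a specific example
        y_pred = w * x + b                  # step 3: compute the predicted result
        error = y_pred - y_true             # step 4: compare with the expected value
        w -= lr * error * x                 # step 5: propagate the error and
        b -= lr * error                     #         adjust the variables

print(w, b)                                 # roughly 3.0 and 0.5
# step 7: the learned variables can now be applied to new, unseen input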


Advantages of Recurrent Neural Network
1. An RNN remembers information through time. This is what makes it useful for time series prediction: it can take previous inputs into account as well as the current one. Variants such as Long Short-Term Memory (LSTM) networks were designed to make this memory more reliable over long sequences.

2. Recurrent neural networks are also used together with convolutional layers to extend the effective pixel neighborhood.


Disadvantages of Recurrent Neural Network

1. Gradient vanishing and exploding problems (illustrated numerically after this list).
2. Training an RNN is a very difficult task.
3. It cannot process very long sequences when tanh or ReLU is used as the activation function.
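Here is a tiny numerical sketch (not from the original post) of why gradients vanish over long sequences with tanh: backpropagation through time multiplies one tanh derivative per step, and that derivative is at most 1, so the product shrinks toward zero.

import numpy as np

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2   # derivative of tanh, always <= 1

grad = 1.0
for step in range(50):             # backpropagating through 50 time steps
    grad *= tanh_grad(1.0)         # derivative at a typical pre-activation (~0.42)
print(grad)                        # on the order of 1e-19: the gradient has vanished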

Recurrent Neural Network Implementation using TensorFlow

In this section, we will learn how to implement a recurrent neural network with TensorFlow.

Step 1 − TensorFlow includes various libraries for the specific implementation of the recurrent neural network module; import them first. (The tensorflow.contrib and MNIST tutorial helpers used below are TensorFlow 1.x APIs and are not available in TensorFlow 2.x.)

#Import necessary modules
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)

Step 2 − Our main motivation is to treat each image row as a sequence of pixels and use a recurrent neural network to classify the images. MNIST images have a shape of 28 × 28 pixels, so each sample is handled as a sequence of 28 steps with 28 inputs per step. Define the parameters needed to build this sequential model.

# Training parameters (assumed typical values; they are required by the training loop below)
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network parameters
n_input = 28   # MNIST data input (img shape: 28*28), pixels per row
n_steps = 28   # time steps, one per image row
n_hidden = 128 # number of units in the LSTM cell
n_classes = 10 # MNIST classes (digits 0-9)

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
weights = {
   'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
   'out': tf.Variable(tf.random_normal([n_classes]))
}

Step 3 − Define an RNN function that computes the predictions. The input tensor is unstacked into a list of 28 time-step tensors so that its shape matches what the static RNN expects; the last LSTM output is then passed through a linear layer to produce the class scores.

def RNN(x, weights, biases):
   # Unstack x into a list of n_steps tensors of shape (batch_size, n_input)
   x = tf.unstack(x, n_steps, 1)

   # Define a lstm cell with tensorflow
   lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

   # Get lstm cell output
   outputs, states = rnn.static_rnn(lstm_cell, x, dtype = tf.float32)

   # Linear activation, using rnn inner loop last output
   return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

Step 4 − Launch the graph in a session to run the training loop and compute the accuracy on test data.

with tf.Session() as sess:
   sess.run(init)
   step = 1
   # Keep training until reach max iterations
   
   while step * batch_size < training_iters:
      batch_x, batch_y = mnist.train.next_batch(batch_size)
      batch_x = batch_x.reshape((batch_size, n_steps, n_input))
      sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
      
      if step % display_step == 0:
         # Calculate batch accuracy
         acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
         
         # Calculate batch loss
         loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
         
         print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
            "{:.6f}".format(loss) + ", Training Accuracy= " + \
            "{:.5f}".format(acc))
      step += 1
   print("Optimization Finished!")
      test_len = 128
   test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
   
   test_label = mnist.test.labels[:test_len]
   print("Testing Accuracy:", \
      sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Running the script prints the minibatch loss and training accuracy at each display step, followed by the testing accuracy on a held-out batch of MNIST images.
