##### Deep LearningTutorials

# Implement Long-short Term Memory (LSTM) with TensorFlow

*This article is an excerpt from the book,*

*Deep Learning Essentials*

*written by Wei Di, Anurag Bhardwaj, and Jianing Wei. This book will help you get started with the essentials of deep learning and neural network modeling.*

In today’s tutorial, we will look at an example of using LSTM in TensorFlow to perform sentiment classification.

The input to LSTM will be a sentence or sequence of words. The output of LSTM will be a binary value indicating a positive sentiment with 1 and a negative sentiment with 0. We will use a many-to-one LSTM architecture for this problem since it maps multiple inputs onto a single output.

Figure LSTM: Basic cell architecture shows this architecture in more detail. As shown here, the input takes a sequence of word tokens (in this case, a sequence of three words). Each word token is input at a new time step and is input to the hidden state for the corresponding time step.

For example, the word Book is input at time step t and is fed to the hidden state ht: Sentiment analysis:

To implement this model in TensorFlow, we need to first define a few variables as follows:

```
batch_size = 4
lstm_units = 16
num_classes = 2
max_sequence_length = 4
embedding_dimension = 64
num_iterations = 1000
```

As shown previously, batch_size dictates how many sequences of tokens we can input in one batch for training. lstm_units represents the total number of LSTM cells in the network. max_sequence_length represents the maximum possible length of a given sequence. Once defined, we now proceed to initialize TensorFlow-specific data structures for input data as follows:

```
import tensorflow as tf
labels = tf.placeholder(tf.float32, [batch_size, num_classes])
raw_data = tf.placeholder(tf.int32, [batch_size, max_sequence_length])
```

Given we are working with word tokens, we would like to represent them using a good feature representation technique. Let us assume the word embedding representation takes a word token and projects it onto an embedding space of dimension, embedding_dimension. The two-dimensional input data containing raw word tokens is now transformed into a three-dimensional word tensor with the added dimension representing the word embedding. We also use pre-computed word embedding, stored in a word_vectors data structure. We initialize the data structures as follows:

```
data = tf.Variable(tf.zeros([batch_size, max_sequence_length,
embedding_dimension]),dtype=tf.float32)
data = tf.nn.embedding_lookup(word_vectors,raw_data)
```

Now that the input data is ready, we look at defining the LSTM model. As shown previously, we need to create lstm_units of a basic LSTM cell. Since we need to perform a classification at the end, we wrap the LSTM unit with a dropout wrapper. To perform a full temporal pass of the data on the defined network, we unroll the LSTM using a dynamic_rnn routine of TensorFlow. We also initialize a random weight matrix and a constant value of 0.1 as the bias vector, as follows:

```
weight = tf.Variable(tf.truncated_normal([lstm_units, num_classes]))
bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
lstm_cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)
wrapped_lstm_cell = tf.contrib.rnn.DropoutWrapper(cell=lstm_cell,
output_keep_prob=0.8)
output, state = tf.nn.dynamic_rnn(wrapped_lstm_cell, data,
dtype=tf.float32)
```

Once the output is generated by the dynamic unrolled RNN, we transpose its shape, multiply it by the weight vector, and add a bias vector to it to compute the final prediction value:

```
output = tf.transpose(output, [1, 0, 2])
last = tf.gather(output, int(output.get_shape()[0]) - 1)
prediction = (tf.matmul(last, weight) + bias)
weight = tf.cast(weight, tf.float64)
last = tf.cast(last, tf.float64)
bias = tf.cast(bias, tf.float64)
```

Since the initial prediction needs to be refined, we define an objective function with crossentropy to minimize the loss as follows:

```
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits
(logits=prediction, labels=labels))
optimizer = tf.train.AdamOptimizer().minimize(loss)
```

After this sequence of steps, we have a trained, end-to-end LSTM network for sentiment classification of arbitrary length sentences.

To summarize, we saw how effectively we can implement LSTM network using TensorFlow.

*If you are interested to know more, check out this book **Deep Learning Essentials* which will *help you take first steps in training efficient deep learning models and apply them in various practical scenarios.*