Silicon Vanity | Tech lifestyle in Silicon Valley: LSTM, Recurrent Neural Network, GRU Review and Notes

Tuesday, March 5, 2019

LSTM, Recurrent Neural Network, GRU Review and Notes - Udacity Deep Learning Nanodegree

RNN: averaging noisy samples yields less noisy results.

RNN music input: each row is a new note, like a beat in the song, so the number of rows is the number of beats in the song. The columns in each row are all zero except for one entry, which marks the note: the note is one-hot encoded.
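To make the row-per-beat layout concrete, here is a minimal sketch in plain Python. The melody and the vocabulary of 5 notes are made up for illustration; they are not from the course.

```python
def one_hot_rows(note_indices, num_notes):
    """Return one row per beat; each row is all zeros except a 1 at the note's index."""
    rows = []
    for idx in note_indices:
        row = [0] * num_notes
        row[idx] = 1
        rows.append(row)
    return rows

# Hypothetical 4-beat melody over a vocabulary of 5 possible notes.
melody = [2, 0, 4, 2]
encoded = one_hot_rows(melody, num_notes=5)
for row in encoded:
    print(row)
```

The number of rows equals the number of beats, and each row sums to 1.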

The number of timesteps is the length of the song segment used for training. Fixing it ensures every training sequence has the same length.
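One simple way to get equal-length sequences is to cut the song into consecutive chunks and drop the leftover beats. This is only a sketch of the idea; the helper name `split_into_sequences` is hypothetical, and the linked notebook batches its data in its own way.

```python
def split_into_sequences(beats, seq_length):
    """Cut a list of beats into consecutive chunks of exactly seq_length items,
    dropping leftover beats at the end so every chunk has the same length."""
    n_full = len(beats) // seq_length
    return [beats[i * seq_length:(i + 1) * seq_length] for i in range(n_full)]

# A made-up "song" of 10 beats, split into sequences of 4 timesteps each.
song = list(range(10))
sequences = split_into_sequences(song, seq_length=4)
print(sequences)
```

The last two beats do not fill a whole sequence, so they are discarded.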

https://github.com/udacity/deep-learning-v2-pytorch/blob/master/recurrent-neural-networks/char-rnn/Character_Level_RNN_Solution.ipynb

LSTM mitigates the vanishing gradient problem of the plain RNN. Backpropagation through time multiplies the gradient at every step, which can make it too small to carry information from early steps. LSTM avoids this loss of information.
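The shrinking gradient can be seen with a toy scalar RNN, h_t = tanh(w * h_(t-1) + x_t). By the chain rule, each step multiplies the gradient by w * (1 - h_t^2). The weight 0.5 and the constant inputs below are arbitrary values chosen for illustration.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Track d h_t / d h_0 for a scalar tanh RNN, one value per time step."""
    h = h0
    grad = 1.0
    history = []
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: one more factor per step
        history.append(grad)
    return history

grads = gradient_through_time(w=0.5, inputs=[0.1] * 50)
print(grads[0], grads[9], grads[-1])
```

With |w| < 1 every factor is below 1 in magnitude, so after 50 steps the gradient with respect to the first hidden state is practically zero.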

LSTM allows learning across many time steps, on the order of 1,000 steps.
The cell is fully differentiable: all of its functions (sigmoid, hyperbolic tangent, multiplication, addition) have a derivative, so a gradient can be computed. This makes it easy to update the weights with backpropagation and SGD.

Sigmoid gates, which output values between 0 and 1, are the key to managing what goes into the cell, what is retained within the cell, and what passes to the output.
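Here is a sketch of one step of a scalar LSTM cell, using the standard textbook formulation (forget, input and output gates plus a tanh candidate). The parameter values are made up so that the forget gate is nearly 1 and the input gate nearly 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(params, x, h_prev):
    """Weighted sum of the current input and previous hidden state, plus bias."""
    w_x, w_h, b = params
    return w_x * x + w_h * h_prev + b

def lstm_step(x, h_prev, c_prev, p):
    f = sigmoid(affine(p["forget"], x, h_prev))       # what is retained in the cell
    i = sigmoid(affine(p["input"], x, h_prev))        # what goes into the cell
    o = sigmoid(affine(p["output"], x, h_prev))       # what passes to the output
    g = math.tanh(affine(p["candidate"], x, h_prev))  # candidate new content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights as (w_x, w_h, bias) per gate.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 1.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, p=params)
print(h, c)
```

Because the forget gate is almost fully open and the input gate almost fully closed, the cell state stays very close to its previous value of 0.7.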

If the hidden state passed to an RNN is None, the initial hidden state values (not the weights) default to all zeros.
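A quick check of this in PyTorch: passing None should give the same result as passing an explicit tensor of zeros. The layer sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=4, num_layers=1, batch_first=True)
x = torch.randn(2, 5, 3)  # (batch, seq_len, input_size)

# Hidden state left as None.
out_none, h_none = rnn(x, None)

# Hidden state given explicitly as zeros: (num_layers, batch, hidden_size).
h0 = torch.zeros(1, 2, 4)
out_zero, h_zero = rnn(x, h0)

same_output = torch.allclose(out_none, out_zero)
same_hidden = torch.allclose(h_none, h_zero)
print(same_output, same_hidden)
```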

At first the blue line is just flat; it hasn't learned anything yet. As it learns, it starts to track the red line well, and eventually it gets close. But suddenly, in this Udacity lecture, the graph looks like it flipped upside down?! It is the same graph, flipped for better visualization so that the two lines look like they track each other nicely on the new axis. The lecturer didn't point this out, so it looked surprising.

https://pytorch.org/docs/stable/nn.html#recurrent-layers

Detaching the hidden state, for example by assigning hidden.data to a new variable, means backpropagation does not run through the history of that detached variable.
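A minimal sketch of carrying the hidden state across batches. It uses `hidden.detach()`, which serves the same purpose as the `hidden.data` assignment; the random data, the loss and the layer sizes are placeholders, and the optimizer step is left out.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)

hidden = None
for step in range(3):
    x = torch.randn(4, 10, 1)  # (batch, seq_len, input_size), random stand-in data
    rnn.zero_grad()
    out, hidden = rnn(x, hidden)
    loss = out.pow(2).mean()   # placeholder loss
    loss.backward()
    # Keep the values but cut the graph, so the next backward() call
    # does not try to backpropagate through earlier batches.
    hidden = hidden.detach()

print(hidden.requires_grad, hidden.grad_fn)
```

Without the detach line, the second call to `loss.backward()` would try to go back through the first batch's graph, which has already been freed.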

GRU hidden state dimensions: (num_layers, batch_size, hidden_dim)
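The shapes can be checked directly with `nn.GRU`. The sizes below are arbitrary and the GRU is unidirectional.

```python
import torch
from torch import nn

num_layers, batch_size, hidden_dim = 2, 3, 16
gru = nn.GRU(input_size=5, hidden_size=hidden_dim,
             num_layers=num_layers, batch_first=True)

x = torch.randn(batch_size, 7, 5)  # (batch, seq_len, input_size)
out, hidden = gru(x)

print(out.shape)     # output for every time step of the last layer
print(hidden.shape)  # final hidden state of every layer
```

Note that `batch_first=True` only changes the layout of the input and output tensors. The hidden state keeps the layer dimension first.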

One nuance is that the tanh activation function may work better than sigmoid with an RNN.

Gated Recurrent Unit

Works well in practice. Has only one working memory, not two (LSTM has long-term and short-term memory). Has an UPDATE GATE (which combines the learn and forget gates), and its result runs through a COMBINE GATE.
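For comparison, here is a scalar sketch of the standard textbook GRU step. It uses the usual update and reset gates; how these map onto the lecture's "update gate" and "combine gate" names is my assumption, and all weights are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU step with a single memory h. Each entry of p is (w_x, w_h, bias)."""
    wz_x, wz_h, bz = p["update"]
    wr_x, wr_h, br = p["reset"]
    wc_x, wc_h, bc = p["candidate"]
    z = sigmoid(wz_x * x + wz_h * h_prev + bz)              # update gate
    r = sigmoid(wr_x * x + wr_h * h_prev + br)              # reset gate
    h_tilde = math.tanh(wc_x * x + wc_h * (r * h_prev) + bc)  # candidate memory
    # One common convention: z near 0 keeps the old memory, z near 1 replaces it.
    return (1.0 - z) * h_prev + z * h_tilde

base = {"reset": (0.0, 0.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
keep = dict(base, update=(0.0, 0.0, -10.0))     # update gate almost closed
replace = dict(base, update=(0.0, 0.0, 10.0))   # update gate almost open

kept = gru_step(x=1.0, h_prev=0.5, p=keep)
replaced = gru_step(x=1.0, h_prev=0.5, p=replace)
print(kept, replaced)
```

There is only one state, `h`, carried from step to step, in contrast to the separate cell state and hidden state of an LSTM.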

LSTM with peephole connections:

Long-term memory (LTM) also contributes to the decisions made from short-term memory (STM) and the current event (E). Previously, the gate's NN used only those two. Now the NN takes all three as input, plus a bias.
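A scalar sketch of a single gate with and without the peephole connection. The function names and weights are hypothetical; the only point is the extra LTM term inside the sigmoid.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_without_peephole(stm, event, w_stm, w_event, bias):
    """Ordinary gate: decides from short-term memory and the current event only."""
    return sigmoid(w_stm * stm + w_event * event + bias)

def gate_with_peephole(ltm, stm, event, w_ltm, w_stm, w_event, bias):
    """Peephole gate: long-term memory also feeds into the decision."""
    return sigmoid(w_ltm * ltm + w_stm * stm + w_event * event + bias)

plain = gate_without_peephole(stm=0.2, event=1.0, w_stm=0.5, w_event=0.5, bias=0.1)
peep = gate_with_peephole(ltm=2.0, stm=0.2, event=1.0,
                          w_ltm=1.0, w_stm=0.5, w_event=0.5, bias=0.1)
print(plain, peep)
```

Setting `w_ltm` to 0 recovers the ordinary gate.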
