LSTM overcomes the vanishing gradient problem of plain RNNs: backpropagation through time can shrink gradients until they are too small to carry a learning signal, so information from early time steps is lost. The LSTM cell is designed to avoid this loss of information.
As a result, LSTM can learn dependencies across many time steps, on the order of 1000 steps.
The cell is fully differentiable: every operation it uses (sigmoid, hyperbolic tangent, elementwise multiplication, addition) has a well-defined derivative, so a gradient can be computed through the whole cell. This makes it straightforward to train the weights with backpropagation and SGD.
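As a minimal sketch of why this works, the two nonlinearities have simple closed-form derivatives that backpropagation can evaluate directly; multiplication and addition are handled by the ordinary product and sum rules. The variable names below are illustrative, not from any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Closed-form derivatives of the LSTM cell's nonlinearities:
#   d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
#   d/dx tanh(x)    = 1 - tanh(x)**2
x = np.array([-2.0, 0.0, 2.0])
s = sigmoid(x)
t = np.tanh(x)
grad_sigmoid = s * (1.0 - s)   # derivative of the sigmoid at x
grad_tanh = 1.0 - t ** 2       # derivative of tanh at x
print(grad_sigmoid)            # approx. [0.105 0.25  0.105]
print(grad_tanh)               # approx. [0.071 1.    0.071]
```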
Sigmoid gates are the key control mechanism: they manage what goes into the cell (input gate), what is retained within the cell (forget gate), and what passes to the output (output gate).
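To make the three gates concrete, here is a minimal single-step LSTM sketch. The names (lstm_step, W, b), the dimensions, and the random initialization are all hypothetical and chosen for illustration; in practice the weights are learned by SGD.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes and randomly initialized weights, for illustration only.
n_in, n_hid = 4, 3
W = rng.normal(size=(4 * n_hid, n_in + n_hid))  # stacked weights for all four blocks
b = np.zeros(4 * n_hid)

def lstm_step(x, h_prev, c_prev):
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i = sigmoid(i)          # input gate: what goes into the cell
    f = sigmoid(f)          # forget gate: what is retained within the cell
    o = sigmoid(o)          # output gate: what passes to the output
    g = np.tanh(g)          # candidate cell update
    c = f * c_prev + i * g  # new cell state
    h = o * np.tanh(c)      # new hidden state / output
    return h, c

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid))
print(h, c)
```

Because each gate is a sigmoid, its value lies in (0, 1), so it acts as a soft, differentiable switch rather than a hard threshold.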