Hyperparameters in RNN
Yoshua Bengio: the learning rate is the most important hyperparameter. A good starting point is 0.01. Also frequently seen: 0.1, 0.01, 0.001, 0.00001, 0.000001
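Note that the values above are spaced roughly by powers of ten. A minimal sketch of generating such a log-scale candidate grid with NumPy (the exact range here is an assumption, not from the course):

```python
import numpy as np

# Candidate learning rates spaced evenly on a log scale, 1e-6 up to 1e-1.
candidate_lrs = np.logspace(-6, -1, num=6)
print(candidate_lrs)  # [1.e-06 1.e-05 1.e-04 1.e-03 1.e-02 1.e-01]
```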
Intuition for starting small (important): if the learning rate is perfect, i.e. the step multiplier is exactly right, then in that rare ideal scenario a single step lands at the optimal point. Any learning rate smaller than the perfect rate will still converge, just more slowly, and will never overshoot the optimum. If the learning rate is too large, training may never converge: more than twice the perfect rate diverges, while a rate between the perfect rate and twice it overshoots back and forth but can still converge. Hence the intuition that the learning rate must start small. Udacity Deep Learning Nanodegree Part 5 RNN, Hyperparameter No. 3 Learning Rate, gives a great visual illustration.
If the learning rate is too small, training may take too long to converge, wasting valuable computing resources.
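A minimal sketch of this intuition, using gradient descent on the toy quadratic f(x) = x^2 (my own example, not from the course). Here the gradient is f'(x) = 2x, so the perfect learning rate is 0.5: one step lands exactly at the minimum. Rates below 0.5 converge monotonically, rates between 0.5 and 1.0 (less than twice the perfect rate) oscillate but still converge, and rates above 1.0 diverge:

```python
def gradient_descent(lr, x0=10.0, steps=20):
    """Run plain gradient descent on f(x) = x**2 from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # update: x <- x - lr * f'(x), with f'(x) = 2x
    return x

for lr in [0.1, 0.5, 0.9, 1.1]:
    print(f"lr={lr}: x after 20 steps = {gradient_descent(lr):.6f}")
```

Running this prints roughly 0.115 for lr=0.1 (slow, monotonic), exactly 0.0 for lr=0.5 (perfect), roughly 0.115 again for lr=0.9 (oscillating but shrinking), and about 383 for lr=1.1 (diverging), matching the cases described above.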