What is the learning rate?

The learning rate in deep learning is a value that specifies what fraction of the slope (gradient) of the loss function is applied when updating the weight and bias parameters.

This is hard to convey in words alone, so I'll give an example.

An example to build intuition for the learning rate

Suppose that increasing the weight w11 by 0.001 reduces the value of the loss function (a measure of error; the closer it is to 0, the better) by 0.02. The slope of the loss function with respect to the weight w11 is then -20 (-0.02 / 0.001).

Suppose also that increasing the weight w21 by 0.001 increases the value of the loss function by 0.01. The slope of the loss function with respect to the weight w21 is then 10 (0.01 / 0.001).
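The measurement described above (nudge one weight by 0.001, see how the loss changes, divide) can be sketched in a few lines of Python. The function `toy_loss` and the starting weights are assumptions made up for illustration, picked only so that the slopes come out close to -20 and 10; a real loss function depends on the network and the training data.

```python
# Hypothetical toy loss, chosen only so that the slopes at (0, 0) are close to
# the -20 and 10 used in the text. It is not the loss of any real network.
def toy_loss(w11, w21):
    return 10.0 * (w11 - 1.0) ** 2 + 5.0 * (w21 + 1.0) ** 2


def forward_slope(loss, weights, index, step=0.001):
    """Estimate a slope by nudging one weight by `step` and measuring the loss change."""
    nudged = list(weights)
    nudged[index] += step
    return (loss(*nudged) - loss(*weights)) / step


weights = [0.0, 0.0]  # assumed current values of w11 and w21
slope_w11 = forward_slope(toy_loss, weights, 0)
slope_w21 = forward_slope(toy_loss, weights, 1)

print(f"slope for w11: {slope_w11:.3f}")  # negative: increasing w11 lowers the loss
print(f"slope for w21: {slope_w21:.3f}")  # positive: increasing w21 raises the loss
```

Because the nudge is finite, the estimates are only approximately -20 and 10.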

The weight update in deep learning does not use the calculated slope as is. If the slope were applied directly, the weight would usually overshoot the point where the loss is smallest.

Now that we know how much the loss function increases or decreases as a parameter changes, we move the parameter only a little in the direction that reduces the loss. The learning rate expresses how small this "moving a little" is.
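To see why the slope is scaled down, here is a small sketch that compares using the slope as is (a learning rate of 1.0) with a learning rate of 0.1. The one-parameter loss `2 * w**2` is a made-up example with its minimum at w = 0, and the starting value and number of steps are arbitrary assumptions.

```python
# Hypothetical one-parameter loss with its minimum at w = 0.
def loss(w):
    return 2.0 * w ** 2


def slope(w):
    return 4.0 * w  # derivative of 2 * w**2


def run(learning_rate, steps=5, w=1.0):
    """Apply `steps` updates of w = w - learning_rate * slope(w); return the final w."""
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w


w_raw = run(learning_rate=1.0)    # slope used as is
w_small = run(learning_rate=0.1)  # slope scaled down by the learning rate

print("slope as is:       w =", w_raw, " loss =", loss(w_raw))
print("learning rate 0.1: w =", w_small, " loss =", loss(w_small))
```

With the slope used as is, every step jumps past the minimum and lands farther away than before, so the loss grows. With a learning rate of 0.1, the weight moves steadily toward the minimum.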

Suppose you set the learning rate to 0.1.

The update amount for the weight w11 is "-20 * 0.1 = -2", and this is subtracted from the weight: "w11 = w11 - (-2)". In other words, w11 increases by 2.

The update amount for the weight w21 is "10 * 0.1 = 1", and this is subtracted from the weight: "w21 = w21 - 1".

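The reason for subtraction is that each parameter must move in the direction that decreases the loss function. When the slope is negative, increasing the weight lowers the loss; when the slope is positive, decreasing the weight lowers it. Subtracting the slope multiplied by the learning rate handles both cases.

Bias parameters are updated in exactly the same way as weights.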
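The update described above can be written as a short sketch. The slopes (-20 for w11, 10 for w21) and the learning rate of 0.1 come from the example; the starting weights of 0.5 are assumed values, since the example does not give any.

```python
learning_rate = 0.1

# Slopes of the loss function from the example in the text.
slopes = {"w11": -20.0, "w21": 10.0}

# Starting weights are assumed for illustration only.
weights = {"w11": 0.5, "w21": 0.5}


def update(weights, slopes, learning_rate):
    """Subtract (slope * learning rate) from every weight."""
    return {
        name: value - learning_rate * slopes[name]
        for name, value in weights.items()
    }


updated = update(weights, slopes, learning_rate)
for name in weights:
    print(name, weights[name], "->", updated[name])
```

The weight with the negative slope goes up by 2 and the weight with the positive slope goes down by 1, so both move in the direction that lowers the loss.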

The slopes of the loss function with respect to the weight and bias parameters can be obtained for all layers at once by using an algorithm called backpropagation (the error backpropagation method).
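As a rough sketch of that idea, the code below works through a tiny made-up network by hand: two layers with one weight and one bias each, a ReLU in between, and a squared-error loss. The network shape, the names `forward`, `backward`, and `loss_fn`, and all the numbers are assumptions for illustration. Real networks use vectors and matrices, but the pattern is the same: one pass backward from the loss yields the slope for every weight and bias, and each parameter is then updated with the same rule.

```python
def forward(p, x):
    """Run the tiny two-layer network and return the intermediate values."""
    h = p["w1"] * x + p["b1"]  # layer 1 output before activation
    a = max(0.0, h)            # ReLU activation
    y = p["w2"] * a + p["b2"]  # layer 2 output
    return h, a, y


def loss_fn(p, x, t):
    """Squared error between the network output and the target t."""
    return (forward(p, x)[2] - t) ** 2


def backward(p, x, t):
    """Apply the chain rule from the loss back to each parameter."""
    h, a, y = forward(p, x)
    dy = 2.0 * (y - t)                # slope of the loss w.r.t. the output y
    grads = {"w2": dy * a, "b2": dy}  # layer 2 slopes
    da = dy * p["w2"]                 # pass the slope back through layer 2
    dh = da if h > 0 else 0.0         # pass it back through the ReLU
    grads["w1"] = dh * x              # layer 1 slopes
    grads["b1"] = dh
    return grads


# Assumed parameter values, input, and target.
params = {"w1": 0.5, "b1": 0.1, "w2": -0.3, "b2": 0.2}
x, t = 2.0, 1.0
learning_rate = 0.1

grads = backward(params, x, t)

# Weights and biases are updated in exactly the same way.
new_params = {k: v - learning_rate * grads[k] for k, v in params.items()}

print("loss before:", loss_fn(params, x, t))
print("loss after: ", loss_fn(new_params, x, t))
```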

Associated Information