What is the parameter update optimization algorithm?

A parameter update optimization algorithm updates the weight and bias parameters. It is designed to reach a high accuracy (percentage of correct answers) quickly and also to achieve a high final accuracy.
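
As a concrete illustration, here is a minimal sketch of the basic update rule that such algorithms build on, assuming a hypothetical linear model with a squared-error loss (the data, model, and learning rate below are made up for illustration):

```python
import numpy as np

# A minimal sketch of the basic update rule, assuming a hypothetical
# linear model y = w * x + b and a squared-error loss.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0   # the weight and bias parameters to be optimized
lr = 0.1          # learning rate

for _ in range(100):
    error = (w * x + b) - y
    grad_w = np.mean(2 * error * x)   # dL/dw
    grad_b = np.mean(2 * error)       # dL/db
    # The core of every update: subtract the gradient scaled by the learning rate.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # approaches 3.0 and 1.0
```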

What is mini-batch SGD?

In the explanations on this site, stochastic gradient descent (SGD) includes mini-batch SGD. Strictly speaking, however, the method with a batch size of 1 is called SGD, and the method with a batch size greater than 1 is called mini-batch SGD.
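
The distinction comes down to how many samples feed each update. Here is a minimal sketch under the same hypothetical linear-model setup; the batch size below is an assumed value:

```python
import numpy as np

# batch_size = 1  -> plain SGD (one sample per update)
# batch_size > 1  -> mini-batch SGD (several samples per update)
# The data and model below are hypothetical.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

batch_size = 32   # set this to 1 to get plain SGD
w = np.zeros(5)
lr = 0.01

for start in range(0, len(X), batch_size):
    xb = X[start:start + batch_size]
    yb = y[start:start + batch_size]
    grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # gradient on this batch only
    w -= lr * grad                              # one update per batch
```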

What is the gradient descent method?

Gradient descent can be thought of as stochastic gradient descent without randomly shuffling the training data.

"Stochastic" means that the training data is randomly shuffled.
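
A minimal sketch of that "stochastic" part, assuming hypothetical data: the training set is shuffled before each epoch so that batches are drawn in a random order.

```python
import numpy as np

# Shuffle the training data before each epoch so that batches
# are drawn in a random order. The data here is hypothetical.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

for epoch in range(10):
    perm = rng.permutation(len(X))        # a random order of the samples
    X_shuffled, y_shuffled = X[perm], y[perm]
    # ...iterate over X_shuffled / y_shuffled in batches and update parameters
```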

What is the steepest descent method?

The steepest descent method is gradient descent in which the batch size equals the number of training samples, i.e. the entire training set is used for every update.

"Sudden descent" means that a parameter that causes a large change (large slope) to the loss function, which is an index of error, is updated by subtracting the value considering the learning rate from its magnitude.

How to understand

The following points help in understanding parameter update optimization algorithms.

Randomly shuffling the training data

This is what the word "stochastic" expresses.

Varying the batch size

If the batch size equals the number of training samples, the method is the steepest descent method; if the batch size is 1, it is stochastic gradient descent (SGD); and if the batch size is greater than 1 but smaller than the number of samples, it is mini-batch SGD.
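That mapping can be summarized in a small sketch; the function name and the sample counts below are hypothetical and only for illustration:

```python
# Map the batch size to the corresponding method name (illustrative helper).
def method_name(batch_size: int, n_samples: int) -> str:
    if batch_size == n_samples:
        return "steepest descent (full batch)"
    if batch_size == 1:
        return "stochastic gradient descent (SGD)"
    return "mini-batch SGD"

print(method_name(1, 1000))     # stochastic gradient descent (SGD)
print(method_name(32, 1000))    # mini-batch SGD
print(method_name(1000, 1000))  # steepest descent (full batch)
```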

Batch size and parallelization

The great thing about mini-batch SGD is that it can be parallelized, for example using threads, which can improve performance.

Since the computation for each sample is independent, its gradient (slope) can be calculated independently.
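
As a minimal sketch of this idea, the per-sample gradients within one mini-batch can be computed in parallel with a thread pool and then averaged; the linear model, loss, and batch below are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Parallelize the per-sample gradient computation inside one mini-batch,
# assuming a hypothetical linear model and squared-error loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))   # one mini-batch of 32 samples
y = rng.normal(size=32)
w = np.zeros(5)

def sample_gradient(i):
    # Each sample's gradient depends only on that sample, so it can be
    # computed independently of the others.
    return 2 * (X[i] @ w - y[i]) * X[i]

with ThreadPoolExecutor() as pool:
    grads = list(pool.map(sample_gradient, range(len(X))))

batch_grad = np.mean(grads, axis=0)   # average the independent gradients
w = w - 0.01 * batch_grad             # one mini-batch SGD update
```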

Parameter update optimization algorithms that further improve on mini-batch SGD

There are parameter update optimization algorithms that improve on mini-batch SGD.

Associated Information