What is a gradient?

In deep learning, the gradient is the collection of slopes of the loss function with respect to each parameter. For the biases, the slope of the loss function with respect to each bias parameter is computed, and the results are combined into a single vector. Likewise, for the weights, the slope with respect to each parameter in a weight column is computed and combined into a single vector.

For bias parameters

If the value of the loss function increases by 0.002 when b1 of layer n is increased by 0.001, the slope is "2 = 0.002 / 0.001".

If the value of the loss function increases by 0.005 when b2 of layer n is increased by 0.001, the slope is "5 = 0.005 / 0.001".

If the value of the loss function increases by 0.003 when b3 of layer n is increased by 0.001, the slope is "3 = 0.003 / 0.001".

Combining these three slopes into one vector gives the gradient "[2, 5, 3]".

When implementing this as an algorithm, it is enough to compute the individual slopes one at a time.
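As a concrete illustration, here is a minimal Python sketch of that finite-difference loop. The loss function is a made-up linear one, chosen only so that its slopes come out to [2, 5, 3] and match the numbers above; it is not any particular library's API.

```python
import numpy as np

def numerical_slope(loss_fn, params, i, h=0.001):
    # Nudge parameter i by h and measure how much the loss changes,
    # exactly as in the bias examples above: slope = delta_loss / h.
    nudged = params.copy()
    nudged[i] += h
    return (loss_fn(nudged) - loss_fn(params)) / h

# Hypothetical loss whose slopes at b = [0, 0, 0] are [2, 5, 3],
# matching the worked example in the text.
loss = lambda b: 2 * b[0] + 5 * b[1] + 3 * b[2]

b = np.zeros(3)
gradient = np.array([numerical_slope(loss, b, i) for i in range(3)])
print(gradient)  # [2. 5. 3.]
```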

For weight parameters

First column

If the value of the loss function increases by 0.002 when w11 of layer n is increased by 0.001, the slope is "2 = 0.002 / 0.001".

If the value of the loss function increases by 0.005 when w21 of layer n is increased by 0.001, the slope is "5 = 0.005 / 0.001".

If the value of the loss function increases by 0.003 when w31 of layer n is increased by 0.001, the slope is "3 = 0.003 / 0.001".

Combining these three slopes into one vector gives the gradient "[2, 5, 3]".

Second column

If the value of the loss function increases by 0.006 when w12 of layer n is increased by 0.001, the slope is "6 = 0.006 / 0.001".

If the value of the loss function increases by 0.004 when w22 of layer n is increased by 0.001, the slope is "4 = 0.004 / 0.001".

If the value of the loss function increases by 0.002 when w32 of layer n is increased by 0.001, the slope is "2 = 0.002 / 0.001".

Combining these three slopes into one vector gives the gradient "[6, 4, 2]".
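The same loop extends to a whole weight matrix: compute one slope per entry and store the results in an array of the same shape. Below is a minimal sketch, again with a made-up linear loss chosen only so the slopes reproduce the two columns above.

```python
import numpy as np

def numerical_gradient(loss_fn, W, h=0.001):
    # One finite-difference slope per weight entry, stored in an
    # array with the same shape as W.
    grad = np.zeros_like(W)
    base = loss_fn(W)
    for idx in np.ndindex(W.shape):
        nudged = W.copy()
        nudged[idx] += h
        grad[idx] = (loss_fn(nudged) - base) / h
    return grad

# Hypothetical loss whose slopes match the text's numbers:
# first column [2, 5, 3], second column [6, 4, 2].
C = np.array([[2.0, 6.0],
              [5.0, 4.0],
              [3.0, 2.0]])
loss = lambda W: np.sum(C * W)

print(numerical_gradient(loss, np.zeros((3, 2))))
# [[2. 6.]
#  [5. 4.]
#  [3. 2.]]
```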

As additional background, the matrix formed by writing each gradient as a vertical column and joining the columns together is called the Jacobian matrix (its determinant is the Jacobian determinant).

[ 2  6 ]
[ 5  4 ]
[ 3  2 ]

When implementing this as an algorithm, it is enough to compute the individual slopes; there is no need to think in terms of the gradient or the Jacobian matrix. In code it is simply an array.
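That "simply an array" view is literal: the two per-column gradients from the text are just stacked side by side into one 2-D array, for example with NumPy.

```python
import numpy as np

# The per-column gradients from the text, joined as the columns of
# one 3x2 array, the same arrangement shown above.
g_col1 = np.array([2.0, 5.0, 3.0])
g_col2 = np.array([6.0, 4.0, 2.0])
print(np.column_stack([g_col1, g_col2]))
# [[2. 6.]
#  [5. 4.]
#  [3. 2.]]
```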

Associated Information