Error backpropagation method (backpropagation)

I will explain the algorithm that is the hardest to understand in deep learning: the error backpropagation method (backpropagation).

What is the error backpropagation method?

The error backpropagation method is an algorithm for finding the slope (gradient) of the loss function with respect to the weight and bias parameters of each layer.

There are easier-to-understand algorithms than backpropagation for finding the slope of the loss function with respect to each layer's weight and bias parameters.

In deep learning, however, performance is important, so think of the sample code as using backpropagation from the beginning.

Let's break backpropagation down into several pieces and consider them one at a time.

Find the slope of the loss function with respect to the individual weight and bias parameters of the final layer

Backpropagation works from the back. The final output is the loss function, and the loss function is an indicator of the error.

First, consider the relationship between the bias parameters of the last layer and the loss function. Assume the weights and biases are already defined. Think of vec_add as element-wise vector addition, mat_mul as the matrix product, and vec_sub as the function that takes the difference between two vectors.

# Weights
my $weights = [..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ...];

# Biases
my $biases = [..., ..., ...];

# Expected outputs
my $desired_outputs = [1, 0, 0];

# Inputs
my $inputs = [0.3, 0.2, 0, 0.5];

# Outputs
my $outputs = vec_add(mat_mul($weights, $inputs), $biases);

# Activated outputs
my $activate_outputs = activate($outputs);

# Loss function result (error index)
my $cost = cost(vec_sub($desired_outputs, $activate_outputs));

Now suppose you want to see how the loss function moves when you change one bias value. For example, suppose that when you increase bias 1 by the small value 0.001, the loss function decreases by 0.002.

At this time, the slope of the loss function with respect to bias 1 is -2 (= -0.002 / 0.001). You then find this slope, in turn, for every bias and every weight. Backpropagation is an algorithm for finding these values quickly.
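To make the idea concrete, here is a rough sketch, using only the helper functions introduced above, of how that slope could be measured by actually nudging the bias and rerunning the forward pass. This is the slow, easy-to-understand way of getting the slope, not backpropagation itself.

# A sketch of measuring the slope numerically: move bias 1 a little and recompute the loss
my $delta = 0.001;

# Loss with the current bias values
my $cost_before = cost(vec_sub($desired_outputs, activate(vec_add(mat_mul($weights, $inputs), $biases))));

# Increase bias 1 by a small value
$biases->[1] += $delta;

# Loss after moving bias 1
my $cost_after = cost(vec_sub($desired_outputs, activate(vec_add(mat_mul($weights, $inputs), $biases))));

# Slope = change in the loss / change in the bias (e.g. -0.002 / 0.001 = -2)
my $bias1_slope = ($cost_after - $cost_before) / $delta;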

From here on, we won't think about the difficult theory anymore. As software engineers, let's simply accept that this is what we want and implement the method that AI researchers derived with their mathematical knowledge.

Slope of loss function with respect to bias

First, find the slope of the loss function with respect to the first bias.

# Slope of the loss function with respect to one bias
# = derivative of the activation function (one output) * derivative of the loss function (one expected output, one activated output)
my $bias_grads0 = activate_derivative($outputs->[0]) * cost_derivative($desired_outputs->[0], $activate_outputs->[0]);

Since we want the slope of the loss function with respect to every bias, we use a for loop.

my $bias_grads = [];
for (my $i = 0; $i < @$biases; $i++) {
  $bias_grads->[$i] = activate_derivative($outputs->[$i]) * cost_derivative($desired_outputs->[$i], $activate_outputs->[$i]);
}

Slope of loss function with respect to weight

Find the slope of the loss function with respect to the weight.

# Slope of the loss function with respect to the weight at row 0, column 0
# = bias slope 0 found above * input 0
my $weight_grads0 = $bias_grads->[0] * $inputs->[0];

The weights are represented as a column-major matrix, so the for loop looks like this:

my $inputs_length = @$inputs;
my $biases_length = @$biases;

my $weight_grads = [];
for (my $input_index = 0; $input_index < $inputs_length; $input_index++) {
  for (my $bias_index = 0; $bias_index < $biases_length; $bias_index++) {
    # Column-major index: rows correspond to biases, columns to inputs
    my $weight_grad_index = $bias_index + ($input_index * $biases_length);
    $weight_grads->[$weight_grad_index] = $bias_grads->[$bias_index] * $inputs->[$input_index];
  }
}

This can also be expressed as the matrix product of the bias slopes as a column vector and the inputs as a row vector.

# Bias slopes (column vector)
1
2
3

# Inputs (row vector)
4 5 6 7

# Slope of the loss function with respect to the weights (3 rows x 4 columns)
 4  5  6  7
 8 10 12 14
12 15 18 21
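As a small check, here is a sketch that runs the same outer-product loop on these example numbers. The concrete values 1, 2, 3 and 4, 5, 6, 7 are just the example values from this page.

# Example values from above
my $bias_grads = [1, 2, 3];
my $inputs     = [4, 5, 6, 7];

my $biases_length = @$bias_grads;
my $inputs_length = @$inputs;

# Outer product: every bias slope multiplied by every input (column-major layout)
my $weight_grads = [];
for (my $input_index = 0; $input_index < $inputs_length; $input_index++) {
  for (my $bias_index = 0; $bias_index < $biases_length; $bias_index++) {
    my $weight_grad_index = $bias_index + ($input_index * $biases_length);
    $weight_grads->[$weight_grad_index] = $bias_grads->[$bias_index] * $inputs->[$input_index];
  }
}

# Print row by row: "4 5 6 7", "8 10 12 14", "12 15 18 21"
for (my $bias_index = 0; $bias_index < $biases_length; $bias_index++) {
  my @row;
  for (my $input_index = 0; $input_index < $inputs_length; $input_index++) {
    push @row, $weight_grads->[$bias_index + ($input_index * $biases_length)];
  }
  print "@row\n";
}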

The calculations on this page are still a work in progress and are being refined by trial and error.

Find the slope of the loss function with respect to the individual weight and bias parameters of the middle layers

In backpropagation, the layers are traced backwards to find the slopes of the weights and biases, and the bias slopes found above for the next (later) layer are reused.

# Weights
my $weights = [..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ...];

# Biases
my $biases = [..., ..., ...];

# Inputs
my $inputs = [0.3, 0.2, 0, 0.5];

# Outputs
my $outputs = vec_add(mat_mul($weights, $inputs), $biases);

# Activated outputs
my $activate_outputs = activate($outputs);

Slope of loss function with respect to bias

Slope of the loss function with respect to bias 0 = dot product of "the next layer's bias slopes 0 to n" and "rows 0 to n of column 0 of the next layer's weights"
Slope of the loss function with respect to bias 1 = dot product of "the next layer's bias slopes 0 to n" and "rows 0 to n of column 1 of the next layer's weights"
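A sketch of this dot product in code might look as follows. Here $next_bias_grads and $next_weights are assumed to be the bias slopes and the column-major weights of the next layer, computed as in the previous section. Note that in the standard formulation of backpropagation this dot product is also multiplied by the derivative of this layer's activation function, so the last line is written that way here.

# Slopes of the loss function with respect to this layer's biases (a sketch)
# $next_bias_grads : bias slopes of the next layer (computed as in the previous section)
# $next_weights    : weights of the next layer, stored column-major
my $next_biases_length = @$next_bias_grads;

my $bias_grads = [];
for (my $bias_index = 0; $bias_index < @$biases; $bias_index++) {
  # Dot product of the next layer's bias slopes and column $bias_index of the next layer's weights
  my $dot = 0;
  for (my $next_bias_index = 0; $next_bias_index < $next_biases_length; $next_bias_index++) {
    my $weight_index = $next_bias_index + ($bias_index * $next_biases_length);
    $dot += $next_bias_grads->[$next_bias_index] * $next_weights->[$weight_index];
  }
  # In the standard formulation, also multiply by the derivative of this layer's activation
  $bias_grads->[$bias_index] = $dot * activate_derivative($outputs->[$bias_index]);
}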

Slope of loss function with respect to weight

Calculating the slope of the loss function with respect to the weights is the same as the method used for the last layer, as sketched below.
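For completeness, here is a sketch of the middle layer's weight slopes, assuming $bias_grads holds the bias slopes just computed above and $inputs is this layer's input.

# Same outer-product loop as in the last layer, using this layer's bias slopes and inputs
my $inputs_length = @$inputs;
my $biases_length = @$biases;

my $weight_grads = [];
for (my $input_index = 0; $input_index < $inputs_length; $input_index++) {
  for (my $bias_index = 0; $bias_index < $biases_length; $bias_index++) {
    my $weight_grad_index = $bias_index + ($input_index * $biases_length);
    $weight_grads->[$weight_grad_index] = $bias_grads->[$bias_index] * $inputs->[$input_index];
  }
}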

(The calculations and code in this article are trial and error.)

Related information