The calculation process for getting the final output from the initial input in deep learning

I will explain the calculation process for obtaining the final output from the initial input in deep learning.

As a prerequisite, please make sure you understand how to calculate the sum of two vectors and the product of two matrices.

Relationship between input layer, hidden layer, and output layer

In explanations of deep learning, diagrams of the input layer, hidden layers, and output layer are always shown. However, such a diagram is conceptual and does not adequately represent the data structure used in an actual program.

The information you need to know when writing a program is:

"Number of inputs" and "Number of outputs of hidden layer".

* * *       (Input: 3 values)

Hidden layer 0

* * * * *   (Output of hidden layer 0: 5 values)

Hidden layer 1

* * *       (Output of hidden layer 1: 3 values)

Hidden layer 2

* *         (Output of hidden layer 2: 2 values. This is the final output.)

Each individual value is represented as a 32-bit floating-point number, which is the float type in the C language.

Number of inputs

For a 28-pixel x 28-pixel monochrome image, there are 784 float-type inputs. Since each pixel's color depth is a value from 0 to 255, it can be stored as a float value. The float type is a floating-point type, but it can also represent such integers exactly.
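As a minimal sketch, the pixels can be copied into the input array like this. The $image_pixels array is hypothetical; here it is just filled with random 0-255 values for illustration.

# A hypothetical 28 x 28 monochrome image as 0-255 pixel values
my $image_pixels = [map { int(rand(256)) } 1 .. 28 * 28];

# Copy each pixel value into a float-type input
my $first_inputs = [];
for (my $i = 0; $i < @$image_pixels; $i++) {
  $first_inputs->[$i] = $image_pixels->[$i];
}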

Number of hidden layer outputs

You decide the number of outputs of each hidden layer yourself. For example, with 3 hidden layers, you might choose 100 for layer 0, 150 for layer 1, and 120 for layer 2.

The number of neurons in the neural network corresponds to this number.
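In a program, these numbers can simply be kept in an array. This is just a sketch using the sizes from the example above.

# Number of outputs (= number of neurons) of each hidden layer
my $hidden_layer_output_lengths = [100, 150, 120];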

Number of outputs

For example, in the case of pattern recognition, when judging among A, B, and C, the number of outputs is 3.

The number of outputs of the last hidden layer is the number of final outputs. In the example above, the last hidden layer's 120 outputs are the final outputs.

In other words, deciding the number of outputs of the last hidden layer determines the number of final outputs.

Information on each layer of the hidden layer

Next, I will write about the information held by each hidden layer. Each hidden layer has parameters called weights and biases, which are used to convert m inputs into n outputs.

The weights are expressed as a matrix, and the biases as a vector.
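As a sketch, here is how one layer converting 2 inputs into 3 outputs can be represented. This is the same layout used in the full program later in this article.

# One hidden layer: a 3-by-2 weight matrix and a 3-length bias vector
my $layer = {
  weights => [
    0.6, 0.2, 0.4,
    0.4, 0.3, 0.7
  ],
  weights_rows_length => 3,      # n: number of outputs
  weights_columns_length => 2,   # m: number of inputs
  biases => [0.5, 0.2, 0.8]
};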

Calculation to obtain output from input using weight and bias

Here is Perl code that converts two inputs into three outputs using weights and biases. Using a matrix, this is calculated succinctly. Think of add_vec as a vector-addition function and mul_mat as a matrix-multiplication function.

Think of the weights as a 3-by-2 column-major matrix.
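In column-major order, elements are stored column by column, so the 3-by-2 weight matrix is laid out in the array like this:

# Element (row, col) is stored at index row + col * 3
#
#   $weights->[0]   $weights->[3]
#   $weights->[1]   $weights->[4]
#   $weights->[2]   $weights->[5]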

# Details of actual processing
$outputs->[0] = $weights->[0] * $inputs->[0] + $weights->[3] * $inputs->[1] + $biases->[0];
$outputs->[1] = $weights->[1] * $inputs->[0] + $weights->[4] * $inputs->[1] + $biases->[1];
$outputs->[2] = $weights->[2] * $inputs->[0] + $weights->[5] * $inputs->[1] + $biases->[2];

# Matrix representation
$outputs = add_vec(mul_mat($weights, $inputs), $biases);

Mathematical formulas can look confusing, but it's easy if you think of them as simple multiplication, addition, and function calls.

How to determine the shape of the weight and bias parameters for each layer

Let me write about what determines the shapes of the weight and bias parameters for each layer.

It's simple: once you decide the number of inputs, the number of neurons in each hidden layer, and the number of outputs, the shapes are determined automatically.

In the above example, the input is 2 and the output is 3. Then the weight is a 3-by-2 matrix and the bias is a 3-length vector.

With 784 inputs and 100 outputs, the weight is a 100-by-784 matrix and the bias is a 100-length vector.
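As a small sketch, the shapes of all layers can be derived mechanically once the sizes of each stage are listed. The sizes below are just the example numbers used above.

# Sizes of each stage: the inputs, then the outputs of each hidden layer
my $lengths = [784, 100, 150, 120];

for (my $i = 0; $i < @$lengths - 1; $i++) {
  my $inputs_length = $lengths->[$i];
  my $outputs_length = $lengths->[$i + 1];
  print "Layer $i: weights ${outputs_length}x${inputs_length}, biases $outputs_length\n";
}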

Weights and biases are dynamic parameters that are updated through learning. For good initial values of the weights and biases, please refer to the related articles below.

The position where the activation function is applied

The activation function is applied to the output of each layer, and the activated output becomes the input of the next layer.

# Apply the activation function
my $new_inputs = [];
for (my $i = 0; $i < @$outputs; $i++) {
  $new_inputs->[$i] = activate_func($outputs->[$i]);
}
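The snippet above assumes an activate_func. As a minimal sketch, if ReLU is used, it can be defined like this (the full program below uses an equivalent relu function):

# One possible activate_func: ReLU returns x when x is positive, otherwise 0
sub activate_func {
  my ($x) = @_;
  return $x > 0.0 ? $x : 0.0;
}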

Get the final output from the initial input with deep learning

Now, let's write a program in Perl that gets the final output from the initial input in deep learning. Use ReLU for the activation function.

Weights and biases would normally be initialized programmatically, but for convenience they are hard-coded here.

The number of inputs is 2, and the numbers of outputs of the hidden layers are 3 and 2. The output of the last hidden layer is the final output.

use strict;
use warnings;

my $first_inputs = [0.1, 0.2];

# Hidden layer weights and biases
my $layers = [
  # Hidden layer 0 (2 inputs to 3 outputs)
  {
    weights => [
      0.6, 0.2, 0.4,
      0.4, 0.3, 0.7
    ],
    weights_rows_length => 3,
    weights_columns_length => 2,
    biases => [0.5, 0.2, 0.8]
  },
  # Hidden layer 1 (3 inputs to 2 outputs)
  {
    weights => [
      0.8, 0.2, 0.2,
      0.2, 0.1, 0.6
    ],
    weights_rows_length => 2,
    weights_columns_length => 3,
    biases => [0.5, 0.1]
  },
];

# Get the final output from the initial input
my $inputs = $first_inputs;
my $outputs;
for (my $i = 0; $i < @$layers; $i++) {
  my $layer = $layers->[$i];
  
  # Weights
  my $weights = $layer->{weights};
  my $weights_rows_length = $layer->{weights_rows_length};
  my $weights_columns_length = $layer->{weights_columns_length};
  
  # Biases
  my $biases = $layer->{biases};
  
  # Output = Weight matrix * Input + Bias
  my $mul_weight_inputs = mul_mat($weights, $weights_rows_length, $weights_columns_length, $inputs);
  $outputs = add_vec($mul_weight_inputs, $biases);
  
  # Apply the activation function
  my $activate_outputs = [];
  for (my $j = 0; $j < @$outputs; $j++) {
    $activate_outputs->[$j] = relu($outputs->[$j]);
  }
  
  # The activated output becomes the next input
  $inputs = $activate_outputs;
}

# Show the final output: 1.166 0.872
print "@$outputs\n";

# Activation function: ReLU
sub relu {
  my ($x) = @_;
  
  # ($x > 0.0) evaluates to 1 for positive $x and 0 otherwise
  my $relu = $x * ($x > 0.0);
  
  return $relu;
}

# Sum of vectors
sub add_vec {
  my ($v1, $v2) = @_;
  
  my $outputs = [];
  for (my $i = 0; $i < @$v1; $i++) {
    $outputs->[$i] = $v1->[$i] + $v2->[$i];
  }
  
  return $outputs;
}

# Matrix product (matrix and vector multiplication)
sub mul_mat {
  my ($weights, $weights_rows_length, $weights_columns_length, $inputs) = @_;
  
  my $inputs_rows_length = @$inputs;
  my $inputs_columns_length = 1;

  my $outputs = [];
  
  # Calculation of the matrix product (column-major order)
  for (my $row = 0; $row < $weights_rows_length; $row++) {
    for (my $col = 0; $col < $inputs_columns_length; $col++) {
      for (my $incol = 0; $incol < $weights_columns_length; $incol++) {
        $outputs->[$row + $col * $weights_rows_length]
          += $weights->[$row + $incol * $weights_rows_length] * $inputs->[$incol + $col * $inputs_rows_length];
      }
    }
  }
  
  return $outputs;
}
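You can check the printed values by hand. Hidden layer 0 turns the inputs [0.1, 0.2] into:

# Hidden layer 0 (2 inputs to 3 outputs)
0.6 * 0.1 + 0.4 * 0.2 + 0.5 = 0.64
0.2 * 0.1 + 0.3 * 0.2 + 0.2 = 0.28
0.4 * 0.1 + 0.7 * 0.2 + 0.8 = 0.98

ReLU leaves these positive values unchanged, so hidden layer 1 receives [0.64, 0.28, 0.98] and computes:

# Hidden layer 1 (3 inputs to 2 outputs)
0.8 * 0.64 + 0.2 * 0.28 + 0.1 * 0.98 + 0.5 = 1.166
0.2 * 0.64 + 0.2 * 0.28 + 0.6 * 0.98 + 0.1 = 0.872

This matches the printed final output.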

Associated Information