Initial value of Xavier

Let's find the initial value of Xavier. The initial value of Xavier is random number according to normal distribution, the average is 0, and the standard deviation is "sqrt(1 / number of inputs)". ..

The number of inputs is the value of m in the conversion from m to n.

Initial value of weight when sigmoid function is mainly used as activation function It seems to be used for. By choosing a good initial value, the value to which the activation function is applied after the conversion from m to n in each layer will vary moderately.

Get the initial value of #Xivier
sub xavier_init_value {
  my ($inputs_length) = @_;
  return randn (0, sqrt(1 / $inputs_length));

Initialize an array of weights using the initial values ​​of Xivier

use strict;
use warnings;

# Function to find random numbers that follow a normal distribution
# $m is mean, $sigma is standard deviation,
sub randn {
  my ($m, $sigma) = @_;
  my ($r1, $r2) = (rand(), rand());
  while ($r1 == 0) {$r1 = rand();}
  return($sigma * sqrt(-2 * log($r1)) * sin(2 * 3.14159265359 * $r2)) + $m;

#Create initial value of Xavier
sub create_xavier_init_value {
  my ($inputs_length) = @_;
  return randn (0, sqrt(1 / $inputs_length));

#Create an array with the default value of Xavier
sub array_create_xavier_init_value {
  my ($array_length, $inputs_length) = @_;
  my $nums_out = [];
  for (my $i = 0; $i <$array_length; $i ++) {
    $nums_out->[$i] = create_xavier_init_value ($inputs_length);
  return $nums_out;

# If the number of inputs is 728 and the number of outputs is 30, the length of the array of the matrix is ​​"728 * 30".
my $inputs_length = 728;
my $outputs_length = 30;
my $weights_mat = {
  rows_length => $outputs_length,
  columns_length => $inputs_length,
my $weights_values_length = $inputs_length * $outputs_length;
$weights_mat->{values} = array_create_xavier_init_value ($weights_values_length, $inputs_length);

use Data::Dumper;
print Dumper $weights_mat;

Initial values ​​other than the initial values ​​of Xavier

When using the ReLU function as an activation function, it is better to use Initial value of He.

Associated Information