Adam - A parameter update optimization algorithm that improves on SGD

Adam is a parameter update optimization algorithm that improves on SGD. One feature is that the effective learning rate is adjusted automatically at every training step, separately for each parameter. Another feature is that it takes the previous updates into account, like momentum.
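
For comparison, plain SGD updates a parameter w with a fixed learning rate \eta and only the current gradient g (this is the standard SGD rule, not something specific to this article):

\[ w \leftarrow w - \eta\, g \]

Adam instead keeps a running average of past gradients (the "moment") and of past squared gradients (the "velocity"), and divides the learning rate by the square root of the velocity, so each parameter effectively gets its own step size.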

Let's write Adam's update for a single parameter in Perl. The sample below updates a bias; weights are updated in exactly the same way (a sketch for multiple weights follows the code).

Here is Adam written in Perl.

#Adam

use strict;
use warnings;

# Hyperparameters
my $bias = 0.14;
my $learning_rate = 0.001;
my $much_small_value = 1e-8;
my $before_moment_weight = 0.9;
my $before_velocity_weight = 0.999;

# Moment value
my $moment = 0;

# Velocity value
my $velocity = 0;

for (my $i = 0; $i < 10; $i++) {
  my $t = $i + 1;
  my $grad = calc_grad();
  $moment = $before_moment_weight * $moment + (1 - $before_moment_weight) * $grad;
  $velocity = $before_velocity_weight * $velocity + (1 - $before_velocity_weight) * $grad * $grad;
  
  # Bias-corrected moment and velocity (divide by 1 - beta**t at step t)
  my $cur_moment = $moment / (1 - $before_moment_weight ** $t);
  my $cur_velocity = $velocity / (1 - $before_velocity_weight ** $t);
  
  $bias -= ($learning_rate / (sqrt($cur_velocity) + $much_small_value)) * $cur_moment;
}

# Compute the gradient (a dummy implementation for this sample)
sub calc_grad {
  
  # Return a random value as a stand-in for the real gradient
  my $grad = rand;
  
  return $grad;
}
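
The code above updates a single bias. As mentioned earlier, weights are handled the same way: each weight just needs its own moment and velocity. The following is a minimal sketch under that assumption; the weight values, the loop length, and the dummy gradients are made up for illustration.

use strict;
use warnings;

# Hypothetical weights (made-up values)
my @weights = (0.1, -0.2, 0.05);

# One moment and one velocity per weight
my @moments = map { 0 } @weights;
my @velocities = map { 0 } @weights;

my $learning_rate = 0.001;
my $much_small_value = 1e-8;
my $before_moment_weight = 0.9;
my $before_velocity_weight = 0.999;

for (my $t = 1; $t <= 10; $t++) {
  # Dummy gradients, one per weight (a real network would get these from backpropagation)
  my @grads = map { rand() - 0.5 } @weights;
  
  for (my $i = 0; $i < @weights; $i++) {
    $moments[$i] = $before_moment_weight * $moments[$i] + (1 - $before_moment_weight) * $grads[$i];
    $velocities[$i] = $before_velocity_weight * $velocities[$i] + (1 - $before_velocity_weight) * $grads[$i] * $grads[$i];
    
    my $cur_moment = $moments[$i] / (1 - $before_moment_weight ** $t);
    my $cur_velocity = $velocities[$i] / (1 - $before_velocity_weight ** $t);
    
    $weights[$i] -= ($learning_rate / (sqrt($cur_velocity) + $much_small_value)) * $cur_moment;
  }
}

Because each weight divides by the square root of its own velocity, a weight whose gradients have been large takes smaller steps than one whose gradients have been small.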

The so-called recommended default values are the ones below. They are only initial values, though; the hyperparameters need to be tuned so that the accuracy rises quickly and the final accuracy is high.

my $learning_rate = 0.001;
my $much_small_value = 1e-8;
my $before_moment_weight = 0.9;
my $before_velocity_weight = 0.999;

What does Adam's formula mean?

I can't explain it well myself, but I found the article "Optimization by Adam" easy to understand.
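
For reference, the update rule that the code above implements is usually written as follows, where β1 is $before_moment_weight, β2 is $before_velocity_weight, α is $learning_rate, ε is $much_small_value, and g_t is the gradient at step t:

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
w_t &= w_{t-1} - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
\]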

Related Information