4.10. Optimizers

4.10.1. Optimizer Base Class

class rlgraph.components.optimizers.optimizer.Optimizer(learning_rate=None, **kwargs)[source]

Bases: rlgraph.components.component.Component

A component that takes a tuple of variables as in-Sockets and optimizes them according to some loss function or other criterion.

get_optimizer_variables()[source]

Returns this optimizer’s variables. This extra utility function is necessary because some frameworks, such as TensorFlow, create optimizer variables late (e.g. Adam’s moment variables), so they cannot yet be fetched at graph-build time.

Returns:
list: List of variables.
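
As a minimal usage sketch: the constructor signature and get_optimizer_variables() call are documented on this page, but the learning rate value is illustrative and the build step normally happens inside an agent's graph rather than manually.

    # Hedged sketch: construct a concrete optimizer (documented below)
    # and fetch its variables once the graph has been built.
    from rlgraph.components.optimizers.local_optimizers import AdamOptimizer

    optimizer = AdamOptimizer(learning_rate=0.001)  # illustrative value
    # ... optimizer is normally built as part of an agent's graph here ...
    # Only after building can late-created variables (e.g. Adam's moment
    # accumulators) be retrieved:
    variables = optimizer.get_optimizer_variables()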

4.10.2. Local Optimizer

class rlgraph.components.optimizers.local_optimizers.AdadeltaOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

Adadelta optimizer, which adapts the learning rate over time:

https://arxiv.org/abs/1212.5701

class rlgraph.components.optimizers.local_optimizers.AdagradOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

Adaptive gradient (Adagrad) optimizer, which assigns small learning rates to frequently occurring features and large learning rates to rare ones:

http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
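
Concretely, the per-parameter update from the cited paper (standard notation, not specific to this class's kwargs; conventions differ on whether \epsilon sits inside or outside the square root):

    G_t = G_{t-1} + g_t \odot g_t
    \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon} \odot g_t

Since G_t accumulates squared gradients per parameter, frequently updated parameters receive progressively smaller steps.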

class rlgraph.components.optimizers.local_optimizers.AdamOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

Adaptive moment estimation (Adam) optimizer: https://arxiv.org/abs/1412.6980
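
For reference, the update rule from the cited paper (standard Adam notation; these are properties of the algorithm itself, not of this class's kwargs):

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)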

class rlgraph.components.optimizers.local_optimizers.GradientDescentOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

Classic gradient descent optimizer: "Stochastic Estimation of the Maximum of a Regression Function" (Kiefer and Wolfowitz, 1952).

class rlgraph.components.optimizers.local_optimizers.LocalOptimizer(learning_rate, clip_grad_norm=None, **kwargs)[source]

Bases: rlgraph.components.optimizers.optimizer.Optimizer

A local optimizer performs optimization irrespective of any distributed semantics, i.e. it has no knowledge of other machines and does not implement any communication with them.

get_optimizer_variables()[source]

Returns this optimizer’s variables. This extra utility function is necessary because some frameworks, such as TensorFlow, create optimizer variables late (e.g. Adam’s moment variables), so they cannot yet be fetched at graph-build time.

Returns:
list: List of variables.
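
A minimal sketch of the documented clip_grad_norm parameter; the threshold value is an arbitrary illustration, and the exact clipping scheme (per-tensor vs. global norm) is defined by the implementation.

    # Hedged sketch: any LocalOptimizer subclass accepts clip_grad_norm
    # (documented above), which clips gradients by norm before they are
    # applied.
    from rlgraph.components.optimizers.local_optimizers import GradientDescentOptimizer

    optimizer = GradientDescentOptimizer(
        learning_rate=0.01,
        clip_grad_norm=40.0  # illustrative threshold, not a recommendation
    )
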
class rlgraph.components.optimizers.local_optimizers.NadamOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

Nesterov-accelerated adaptive moment estimation (Nadam) optimizer, which applies Nesterov’s accelerated gradient to Adam:

http://cs229.stanford.edu/proj2015/054_report.pdf

class rlgraph.components.optimizers.local_optimizers.RMSPropOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

RMSProp optimizer, as introduced in Hinton’s lecture 6 slides:

https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

class rlgraph.components.optimizers.local_optimizers.SGDOptimizer(learning_rate, **kwargs)[source]

Bases: rlgraph.components.optimizers.local_optimizers.LocalOptimizer

Stochastic gradient descent optimizer from tf.keras, including support for momentum, learning-rate decay, and Nesterov momentum.
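
A hedged sketch, assuming the wrapped tf.keras SGD keyword names (momentum, nesterov) are forwarded through **kwargs; these keyword names are assumptions, so check the implementation before relying on them.

    # Hedged sketch: kwargs are assumed to be forwarded to tf.keras' SGD.
    from rlgraph.components.optimizers.local_optimizers import SGDOptimizer

    optimizer = SGDOptimizer(
        learning_rate=0.1,
        momentum=0.9,    # assumed tf.keras kwarg
        nesterov=True    # assumed tf.keras kwarg
    )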

4.10.3. Horovod Optimizer

class rlgraph.components.optimizers.horovod_optimizer.HorovodOptimizer(local_optimizer=None, **kwargs)[source]

Bases: rlgraph.components.optimizers.optimizer.Optimizer

This optimizer provides a wrapper around the Horovod distributed training package:

https://github.com/uber/horovod

Horovod is meant to be used as an alternative to distributed TensorFlow: it implements communication between machines differently (using ring-allreduce rather than parameter servers), as explained in the Horovod paper:

https://arxiv.org/abs/1802.05799

This HorovodOptimizer expects a LocalOptimizer spec (TensorFlow) as input.
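
A minimal sketch of wrapping a local optimizer: the local_optimizer parameter is documented above, but the spec-dict keys ("type", "learning_rate") follow RLgraph's usual spec convention and should be treated as illustrative assumptions.

    # Hedged sketch: HorovodOptimizer wraps a LocalOptimizer spec.
    from rlgraph.components.optimizers.horovod_optimizer import HorovodOptimizer

    optimizer = HorovodOptimizer(
        local_optimizer={"type": "adam", "learning_rate": 0.001}  # illustrative spec
    )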