Optimisation

Gradient

class mlstatpy.optim.sgd.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

Stochastic gradient descent optimizer with momentum.

Parameters:

coef – array, initial coefficients
learning_rate_init – float, the initial learning rate. It controls the step size used when updating the weights.
lr_schedule – {“constant”, “adaptive”, “invscaling”}, learning rate schedule for weight updates. “constant” keeps the learning rate at learning_rate_init. “invscaling” gradually decreases the learning rate learning_rate_ at each time step t using the inverse scaling exponent power_t: learning_rate_ = learning_rate_init / pow(t, power_t). “adaptive” keeps the learning rate at learning_rate_init as long as the training loss keeps decreasing; each time 2 consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if “early_stopping” is on, the current learning rate is divided by 5.
momentum – float, value of the momentum; must be greater than or equal to 0
power_t – float, the exponent for the inverse scaling learning rate
early_th – stops if the error goes below that threshold
min_threshold – lower bound for parameters (can be None)
max_threshold – upper bound for parameters (can be None)
l1 – L1 regularization
l2 – L2 regularization
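
The “invscaling” formula given for lr_schedule can be sketched in a few lines. This is a minimal illustration of learning_rate_init / pow(t, power_t) only. The helper invscaling_learning_rate is hypothetical and not part of mlstatpy, and how the optimizer counts the time step t is an assumption here.

```python
import numpy


def invscaling_learning_rate(learning_rate_init, t, power_t=0.5):
    """Learning rate at time step t (t >= 1) under inverse scaling."""
    return learning_rate_init / numpy.power(t, power_t)


# Defaults taken from the class signature: learning_rate_init=0.1, power_t=0.5.
for t in (1, 11, 21, 31):
    print(t, invscaling_learning_rate(0.1, t))
```

With power_t=0.5 the rate shrinks like one over the square root of t, so it falls quickly at first and then flattens out.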

The class holds the following attributes:

learning_rate: float, the current learning rate
velocity: array, velocities used to update the parameters
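
The role of the velocity can be illustrated with the textbook momentum update. This is a sketch under that assumption, not the library's code: momentum_step is a hypothetical helper, and the actual update in SGDOptimizer may also apply the thresholds and the l1/l2 penalties.

```python
import numpy


def momentum_step(coef, velocity, grad, learning_rate, momentum=0.9):
    """One classical momentum update; returns the new (coef, velocity)."""
    # The velocity keeps a decaying memory of past gradient steps.
    velocity = momentum * velocity - learning_rate * grad
    # The coefficients move along the velocity, not the raw gradient.
    return coef + velocity, velocity


coef = numpy.array([1.0, -2.0])
velocity = numpy.zeros_like(coef)
grad = numpy.array([0.5, -0.5])

coef, velocity = momentum_step(coef, velocity, grad, learning_rate=0.1)
print(coef, velocity)
```

Starting from a zero velocity, the first step reduces to a plain gradient step; the momentum term only matters from the second step on.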

Stochastic Gradient Descent applied to linear regression

The following example shows how to optimize a simple linear regression.

<<<

import numpy
from mlstatpy.optim import SGDOptimizer


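def fct_loss(c, X, y):
    # Squared error over the whole dataset.
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # Proportional to the gradient of the squared error for one
    # observation (x, y); the index i is not used here.
    return x * (x @ c - y) * 0.1


# Noise-free targets generated from known coefficients.
coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

# Start from random coefficients and train with max_iter=15.
sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print("optimized coefficients:", sgd.coef)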

>>>

    0/15: loss: 43.68 lr=0.1 max(coef): 1.2 l1=0/2.5 l2=0/2.3
    1/15: loss: 23.97 lr=0.0302 max(coef): 0.59 l1=0.39/1.7 l2=0.067/0.93
    2/15: loss: 4.844 lr=0.0218 max(coef): 0.58 l1=0.19/1.1 l2=0.017/0.53
    3/15: loss: 3.968 lr=0.018 max(coef): 0.8 l1=0.11/1.6 l2=0.0057/0.99
    4/15: loss: 4.019 lr=0.0156 max(coef): 0.9 l1=0.15/1.7 l2=0.013/1.1
    5/15: loss: 3.659 lr=0.014 max(coef): 0.91 l1=0.059/1.7 l2=0.0014/1.1
    6/15: loss: 3.355 lr=0.0128 max(coef): 0.9 l1=0.16/1.6 l2=0.018/1.1
    7/15: loss: 2.801 lr=0.0119 max(coef): 0.86 l1=0.14/1.6 l2=0.011/1
    8/15: loss: 2.309 lr=0.0111 max(coef): 0.81 l1=0.13/1.5 l2=0.0096/0.9
    9/15: loss: 2.003 lr=0.0105 max(coef): 0.79 l1=0.12/1.4 l2=0.0048/0.82
    10/15: loss: 1.718 lr=0.00995 max(coef): 0.75 l1=0.039/1.3 l2=0.00062/0.76
    11/15: loss: 1.514 lr=0.00949 max(coef): 0.72 l1=0.052/1.2 l2=0.0016/0.71
    12/15: loss: 1.384 lr=0.00909 max(coef): 0.71 l1=0.05/1.2 l2=0.0018/0.67
    13/15: loss: 1.281 lr=0.00874 max(coef): 0.69 l1=0.042/1.1 l2=0.00079/0.64
    14/15: loss: 1.176 lr=0.00842 max(coef): 0.69 l1=0.01/1.1 l2=4.6e-05/0.62
    15/15: loss: 1.107 lr=0.00814 max(coef): 0.68 l1=0.03/1.1 l2=0.00035/0.6
    optimized coefficients: [ 0.376 -0.021 -0.678]
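
In the run above the loss is still decreasing at iteration 15, and the optimized coefficients remain some distance from the generating values [0.5, 0.6, -0.7]. Because the targets are noise-free, a direct least-squares solve offers a reference point. The sketch below uses numpy.linalg.lstsq on its own seeded data (an assumption for reproducibility, not the data from the run above).

```python
import numpy

# Seeded generator so the sketch is reproducible; the original example is unseeded.
rng = numpy.random.default_rng(0)
coef = numpy.array([0.5, 0.6, -0.7])
X = rng.standard_normal((10, 3))
y = X @ coef

# Exact least-squares solution of X @ c = y.
solution, residuals, rank, _ = numpy.linalg.lstsq(X, y, rcond=None)
print("least-squares coefficients:", solution)
```

Comparing sgd.coef against such a solution is one way to judge whether more iterations or a different learning rate schedule are needed.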