Optimisation
Gradient
- class mlstatpy.optim.sgd.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)[source]
Stochastic gradient descent optimizer with momentum.
- Parameters:
coef – array, initial coefficient
learning_rate_init – float, the initial learning rate. It controls the step size used to update the weights
lr_schedule – {“constant”, “adaptive”, “invscaling”}, learning rate schedule for weight updates. “constant” keeps the learning rate fixed at learning_rate_init. “invscaling” gradually decreases the learning rate at each time step t using an inverse scaling exponent power_t: learning_rate_ = learning_rate_init / pow(t, power_t) (see the sketch after this parameter list). “adaptive” keeps the learning rate equal to learning_rate_init as long as the training loss keeps decreasing; each time two consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol when early stopping is on, the current learning rate is divided by 5.
momentum – float, value of momentum used, must be greater than or equal to 0
power_t – double, the exponent for the inverse scaling learning rate
early_th – stops training if the error goes below this threshold
min_threshold – lower bound for parameters (can be None)
max_threshold – upper bound for parameters (can be None)
l1 – L1 regularization
l2 – L2 regularization
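As a quick illustration of the “invscaling” schedule mentioned above, the snippet below simply evaluates learning_rate_init / pow(t, power_t) for the first few time steps, using the constructor defaults learning_rate_init=0.1 and power_t=0.5. It only illustrates the documented formula, not the internal code of the class.

<<<

learning_rate_init = 0.1
power_t = 0.5
for t in range(1, 6):
    # invscaling: the learning rate decays as t ** -power_t
    print(t, learning_rate_init / pow(t, power_t))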
The class holds the following attributes:
learning_rate: float, the current learning rate
velocity: array, velocities used to update the parameters
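The velocity attribute implements the momentum mechanism. The sketch below shows what one step of a classical momentum update typically looks like; it is an assumption about the usual rule, not an excerpt from the implementation, whose details may differ.

<<<

import numpy

momentum, lr = 0.9, 0.1
coef = numpy.array([0.5, -0.2])
velocity = numpy.zeros_like(coef)   # same shape as the coefficients
grad = numpy.array([0.3, 0.1])      # gradient at the current coefficients
velocity = momentum * velocity - lr * grad   # blend past direction with new gradient
coef = coef + velocity                       # apply the update
print(coef)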
Stochastic Gradient Descent applied to linear regression
The following example shows how to optimize a simple linear regression.
<<<
import numpy
from mlstatpy.optim import SGDOptimizer


def fct_loss(c, X, y):
    # squared error of the linear model
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # gradient for one observation (scaled down to keep the steps small)
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print("optimized coefficients:", sgd.coef)
>>>
0/15: loss: 13.54 lr=0.1 max(coef): 0.54 l1=0/1.1 l2=0/0.59
1/15: loss: 6.621 lr=0.0302 max(coef): 0.58 l1=0.16/1.4 l2=0.015/0.67
2/15: loss: 2.802 lr=0.0218 max(coef): 0.67 l1=0.13/1.4 l2=0.0089/0.8
3/15: loss: 1.808 lr=0.018 max(coef): 0.82 l1=0.17/1.5 l2=0.012/0.99
4/15: loss: 1.404 lr=0.0156 max(coef): 0.87 l1=0.073/1.5 l2=0.0031/1.1
5/15: loss: 1.124 lr=0.014 max(coef): 0.87 l1=0.089/1.5 l2=0.0032/1.1
6/15: loss: 0.9209 lr=0.0128 max(coef): 0.86 l1=0.041/1.6 l2=0.00057/1.1
7/15: loss: 0.7814 lr=0.0119 max(coef): 0.85 l1=0.069/1.6 l2=0.0023/1.1
8/15: loss: 0.6718 lr=0.0111 max(coef): 0.84 l1=0.061/1.6 l2=0.0017/1.1
9/15: loss: 0.6003 lr=0.0105 max(coef): 0.84 l1=0.011/1.7 l2=6e-05/1.1
10/15: loss: 0.5376 lr=0.00995 max(coef): 0.83 l1=0.043/1.7 l2=0.00076/1.1
11/15: loss: 0.4719 lr=0.00949 max(coef): 0.82 l1=0.037/1.7 l2=0.00083/1.1
12/15: loss: 0.4072 lr=0.00909 max(coef): 0.81 l1=0.065/1.7 l2=0.0014/1.1
13/15: loss: 0.35 lr=0.00874 max(coef): 0.79 l1=0.06/1.7 l2=0.0012/1.1
14/15: loss: 0.3061 lr=0.00842 max(coef): 0.78 l1=0.038/1.7 l2=0.00069/1.1
15/15: loss: 0.2722 lr=0.00814 max(coef): 0.78 l1=0.024/1.7 l2=0.00024/1.1
optimized coefficients: [ 0.284  0.629 -0.781]
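The regularization and bounding parameters documented above (l1, l2, min_threshold, max_threshold, early_th) can be used with the same training loop. The snippet below is a sketch with arbitrary parameter values; it reuses the same linear problem as the example above.

<<<

import numpy
from mlstatpy.optim import SGDOptimizer


def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

sgd = SGDOptimizer(
    numpy.random.randn(3),
    l1=0.01, l2=0.01,                        # L1/L2 regularization (arbitrary values)
    min_threshold=-1.0, max_threshold=1.0,   # keep coefficients within [-1, 1]
    early_th=1e-3,                           # stop once the error goes below this threshold
)
sgd.train(X, y, fct_loss, fct_grad, max_iter=15)
print(sgd.coef)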