gradient

gradient.grad_helper

DerivativeOptions

class experimental_experiment.gradient.grad_helper.DerivativeOptions(value)[source]

Options defining how to build the onnx graph of the gradients.

  • Zero: default option, all options are disabled

  • KeepYieldOp: keeps the operator YieldOp in the graph, see onnx_derivative

  • KeepOutputs: keeps the output of the original graph

  • FillGrad: does not add an input holding the gradient of the output but assumes it is filled with ones

  • Loss: the function assumes the loss was added to the graph
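
A minimal sketch of combining options, assuming the enumeration supports bitwise composition (IntFlag-like); this usage is an assumption, not taken from the documentation above:

from experimental_experiment.gradient.grad_helper import DerivativeOptions

# assumption: options combine with the | operator, IntFlag-style
options = DerivativeOptions.KeepOutputs | DerivativeOptions.FillGrad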

onnx_derivative

experimental_experiment.gradient.grad_helper.onnx_derivative(onx: ModelProto, weights: List[str] | None = None, inputs: List[str] | None = None, options: DerivativeOptions = DerivativeOptions.Zero, loss: str | None = None, label: str | None = None, path_name: str | None = None, verbose: int = 0) → ModelProto[source]

Builds the gradient for an onnx graph.

Parameters:
  • onx – onnx graph

  • weights – gradient against those weights, None for all real weights

  • inputs – gradient against inputs, None for all real inputs

  • options – options of type DerivativeOptions

  • loss – the loss output, in case a loss was added to the graph; options must then equal DerivativeOptions.Loss

  • label – if loss is specified, then the label must be specified as well

  • path_name – if options equal to DerivativeOptions.Loss, the gradient is saved to that path

  • verbose – verbosity

Returns:

onnx graph
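
The sketch below builds the small model shown in the following graphs and requests its gradient; it assumes onnxruntime-training is installed, and the constant name Ad_Addcst simply mirrors the first graph.

import numpy as np
from onnx import TensorProto
from onnx.helper import (
    make_graph, make_model, make_node, make_opsetid, make_tensor_value_info)
from onnx.numpy_helper import from_array

from experimental_experiment.gradient.grad_helper import (
    DerivativeOptions, onnx_derivative)

# Y = X + Ad_Addcst, the constant being stored as an initializer (a weight)
X = make_tensor_value_info("X", TensorProto.FLOAT, [None, 10])
Y = make_tensor_value_info("Y", TensorProto.FLOAT, [None, 10])
cst = from_array(np.array([1.0], dtype=np.float32), name="Ad_Addcst")
model = make_model(
    make_graph([make_node("Add", ["X", "Ad_Addcst"], ["Y"])],
               "example", [X], [Y], [cst]),
    opset_imports=[make_opsetid("", 18)])

# gradient against all weights and all inputs, default options
grad = onnx_derivative(model, options=DerivativeOptions.Zero)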

The function calls OrtModuleGraphBuilderConfiguration from onnxruntime-training. The resulting graph is meant to be used with OrtGradientForwardBackward and includes the operator YieldOp. The graph looks like this:

digraph{
  ranksep=0.25;
  orientation=portrait;
  nodesep=0.05;
  size=7;

  X [shape=box color=red label="X\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst [shape=box color=red label="Ad_Addcst\nTensorProto.FLOAT\nshape=[1]" fontsize=10];

  X_grad [shape=box color=green label="X_grad\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst_grad [shape=box color=green label="Ad_Addcst_grad\nTensorProto.FLOAT\nshape=[1]" fontsize=10];


  Y [shape=box label="Y" fontsize=10];
  Ad_Add [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  X -> Ad_Add;
  Ad_Addcst -> Ad_Add;
  Ad_Add -> Y;

  Y_grad [shape=box label="Y_grad" fontsize=10];
  YieldOp [shape=box style="filled,rounded" color=orange label="YieldOp\nfull_shape_outputs=[0]" fontsize=10];
  Y -> YieldOp;
  YieldOp -> Y_grad;

  Ad_Add_Grad_Shape_Ad_Addcst [shape=box label="Ad_Add_Grad_Shape_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_Shape_Ad_Addcst_rhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  Ad_Addcst -> Ad_Add_Grad_Shape_Ad_Addcst_rhs;
  Ad_Add_Grad_Shape_Ad_Addcst_rhs -> Ad_Add_Grad_Shape_Ad_Addcst;

  Ad_Add_Grad_Shape_X [shape=box label="Ad_Add_Grad_Shape_X" fontsize=10];
  Ad_Add_Grad_Shape_X_lhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  X -> Ad_Add_Grad_Shape_X_lhs;
  Ad_Add_Grad_Shape_X_lhs -> Ad_Add_Grad_Shape_X;

  Ad_Add_Grad_ReduceAxes_X [shape=box label="Ad_Add_Grad_ReduceAxes_X" fontsize=10];
  Ad_Add_Grad_ReduceAxes_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceAxes_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_BroadcastGradientArgs_2 [shape=box style="filled,rounded" color=orange label="BroadcastGradientArgs" fontsize=10];
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_X;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_Ad_Addcst;

  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_ReduceSum_5 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceAxes_Ad_Addcst -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceSum_5 -> Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst;

  Ad_Add_Grad_Reshape_6 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Reshape_6 -> Ad_Addcst_grad;

  Ad_Add_Grad_ReduceSum_Y_grad_for_X [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_X" fontsize=10];
  Ad_Add_Grad_ReduceSum_3 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceAxes_X -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceSum_3 -> Ad_Add_Grad_ReduceSum_Y_grad_for_X;

  Ad_Add_Grad_Reshape_4 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Reshape_4 -> X_grad;
}

The YieldOp operators mark the outputs of the initial graph; they must be replaced by the gradients of those outputs to compute the gradients of the weights and the inputs. After replacement, the graph looks like this:

digraph{
  ranksep=0.25;
  orientation=portrait;
  nodesep=0.05;
  size=7;

  X [shape=box color=red label="X\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst [shape=box color=red label="Ad_Addcst\nTensorProto.FLOAT\nshape=[1]" fontsize=10];
  Y_grad [shape=box color=red label="Y_grad\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];

  X_grad [shape=box color=green label="X_grad\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst_grad [shape=box color=green label="Ad_Addcst_grad\nTensorProto.FLOAT\nshape=[1]" fontsize=10];


  Ad_Add_Grad_Shape_Ad_Addcst [shape=box label="Ad_Add_Grad_Shape_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_Shape_Ad_Addcst_rhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  Ad_Addcst -> Ad_Add_Grad_Shape_Ad_Addcst_rhs;
  Ad_Add_Grad_Shape_Ad_Addcst_rhs -> Ad_Add_Grad_Shape_Ad_Addcst;

  Ad_Add_Grad_Shape_X [shape=box label="Ad_Add_Grad_Shape_X" fontsize=10];
  Ad_Add_Grad_Shape_X_lhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  X -> Ad_Add_Grad_Shape_X_lhs;
  Ad_Add_Grad_Shape_X_lhs -> Ad_Add_Grad_Shape_X;

  Ad_Add_Grad_ReduceAxes_X [shape=box label="Ad_Add_Grad_ReduceAxes_X" fontsize=10];
  Ad_Add_Grad_ReduceAxes_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceAxes_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_BroadcastGradientArgs_2 [shape=box style="filled,rounded" color=orange label="BroadcastGradientArgs" fontsize=10];
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_X;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_Ad_Addcst;

  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_ReduceSum_5 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceAxes_Ad_Addcst -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceSum_5 -> Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst;

  Ad_Add_Grad_Reshape_6 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Reshape_6 -> Ad_Addcst_grad;

  Ad_Add_Grad_ReduceSum_Y_grad_for_X [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_X" fontsize=10];
  Ad_Add_Grad_ReduceSum_3 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceAxes_X -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceSum_3 -> Ad_Add_Grad_ReduceSum_Y_grad_for_X;

  Ad_Add_Grad_Reshape_4 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Reshape_4 -> X_grad;
}

With DerivativeOptions.KeepOutputs, the user can still compute the original outputs, as in the sketch below; the resulting graph follows.
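
A sketch reusing the model built earlier:

# KeepOutputs preserves Y among the outputs of the gradient graph
grad_keep = onnx_derivative(model, options=DerivativeOptions.KeepOutputs)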

digraph{
  ranksep=0.25;
  orientation=portrait;
  nodesep=0.05;
  size=7;

  X [shape=box color=red label="X\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst [shape=box color=red label="Ad_Addcst\nTensorProto.FLOAT\nshape=[1]" fontsize=10];
  Y_grad [shape=box color=red label="Y_grad\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];

  X_grad [shape=box color=green label="X_grad\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst_grad [shape=box color=green label="Ad_Addcst_grad\nTensorProto.FLOAT\nshape=[1]" fontsize=10];
  Y [shape=box color=green label="Y\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];


  Ad_Add [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  X -> Ad_Add;
  Ad_Addcst -> Ad_Add;
  Ad_Add -> Y;

  Ad_Add_Grad_Shape_Ad_Addcst [shape=box label="Ad_Add_Grad_Shape_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_Shape_Ad_Addcst_rhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  Ad_Addcst -> Ad_Add_Grad_Shape_Ad_Addcst_rhs;
  Ad_Add_Grad_Shape_Ad_Addcst_rhs -> Ad_Add_Grad_Shape_Ad_Addcst;

  Ad_Add_Grad_Shape_X [shape=box label="Ad_Add_Grad_Shape_X" fontsize=10];
  Ad_Add_Grad_Shape_X_lhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  X -> Ad_Add_Grad_Shape_X_lhs;
  Ad_Add_Grad_Shape_X_lhs -> Ad_Add_Grad_Shape_X;

  Ad_Add_Grad_ReduceAxes_X [shape=box label="Ad_Add_Grad_ReduceAxes_X" fontsize=10];
  Ad_Add_Grad_ReduceAxes_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceAxes_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_BroadcastGradientArgs_2 [shape=box style="filled,rounded" color=orange label="BroadcastGradientArgs" fontsize=10];
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_X;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_Ad_Addcst;

  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_ReduceSum_5 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceAxes_Ad_Addcst -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceSum_5 -> Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst;

  Ad_Add_Grad_Reshape_6 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Reshape_6 -> Ad_Addcst_grad;

  Ad_Add_Grad_ReduceSum_Y_grad_for_X [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_X" fontsize=10];
  Ad_Add_Grad_ReduceSum_3 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceAxes_X -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceSum_3 -> Ad_Add_Grad_ReduceSum_Y_grad_for_X;

  Ad_Add_Grad_Reshape_4 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Reshape_4 -> X_grad;
}

With DerivativeOptions.FillGrad, the gradient of the output is no longer an input: it is filled with a constant tensor of ones with the expected shape, as in the sketch below; the resulting graph follows.
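
A sketch reusing the same model; combining FillGrad with KeepOutputs is an assumption about how the options interact:

# assumption: FillGrad is combined with KeepOutputs so that Y is kept
# while Y_grad is generated internally with ConstantOfShape
grad_fill = onnx_derivative(
    model,
    options=DerivativeOptions.FillGrad | DerivativeOptions.KeepOutputs)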

digraph{
  ranksep=0.25;
  orientation=portrait;
  nodesep=0.05;
  size=7;

  X [shape=box color=red label="X\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst [shape=box color=red label="Ad_Addcst\nTensorProto.FLOAT\nshape=[1]" fontsize=10];

  X_grad [shape=box color=green label="X_grad\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  Ad_Addcst_grad [shape=box color=green label="Ad_Addcst_grad\nTensorProto.FLOAT\nshape=[1]" fontsize=10];
  Y [shape=box color=green label="Y\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];


  Ad_Add [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  X -> Ad_Add;
  Ad_Addcst -> Ad_Add;
  Ad_Add -> Y;

  Y_shape [shape=box label="Y_shape" fontsize=10];
  Shape [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  Y -> Shape;
  Shape -> Y_shape;

  Y_grad [shape=box label="Y_grad" fontsize=10];
  ConstantOfShape [shape=box style="filled,rounded" color=orange label="ConstantOfShape\nvalue=[1.]" fontsize=10];
  Y_shape -> ConstantOfShape;
  ConstantOfShape -> Y_grad;

  Ad_Add_Grad_Shape_Ad_Addcst [shape=box label="Ad_Add_Grad_Shape_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_Shape_Ad_Addcst_rhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  Ad_Addcst -> Ad_Add_Grad_Shape_Ad_Addcst_rhs;
  Ad_Add_Grad_Shape_Ad_Addcst_rhs -> Ad_Add_Grad_Shape_Ad_Addcst;

  Ad_Add_Grad_Shape_X [shape=box label="Ad_Add_Grad_Shape_X" fontsize=10];
  Ad_Add_Grad_Shape_X_lhs [shape=box style="filled,rounded" color=orange label="Shape" fontsize=10];
  X -> Ad_Add_Grad_Shape_X_lhs;
  Ad_Add_Grad_Shape_X_lhs -> Ad_Add_Grad_Shape_X;

  Ad_Add_Grad_ReduceAxes_X [shape=box label="Ad_Add_Grad_ReduceAxes_X" fontsize=10];
  Ad_Add_Grad_ReduceAxes_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceAxes_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_BroadcastGradientArgs_2 [shape=box style="filled,rounded" color=orange label="BroadcastGradientArgs" fontsize=10];
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_BroadcastGradientArgs_2;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_X;
  Ad_Add_Grad_BroadcastGradientArgs_2 -> Ad_Add_Grad_ReduceAxes_Ad_Addcst;

  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst" fontsize=10];
  Ad_Add_Grad_ReduceSum_5 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceAxes_Ad_Addcst -> Ad_Add_Grad_ReduceSum_5;
  Ad_Add_Grad_ReduceSum_5 -> Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst;

  Ad_Add_Grad_Reshape_6 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Shape_Ad_Addcst -> Ad_Add_Grad_Reshape_6;
  Ad_Add_Grad_Reshape_6 -> Ad_Addcst_grad;

  Ad_Add_Grad_ReduceSum_Y_grad_for_X [shape=box label="Ad_Add_Grad_ReduceSum_Y_grad_for_X" fontsize=10];
  Ad_Add_Grad_ReduceSum_3 [shape=box style="filled,rounded" color=orange label="ReduceSum\nkeepdims=1\nnoop_with_empty_axes=1" fontsize=10];
  Y_grad -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceAxes_X -> Ad_Add_Grad_ReduceSum_3;
  Ad_Add_Grad_ReduceSum_3 -> Ad_Add_Grad_ReduceSum_Y_grad_for_X;

  Ad_Add_Grad_Reshape_4 [shape=box style="filled,rounded" color=orange label="Reshape\nallowzero=0" fontsize=10];
  Ad_Add_Grad_ReduceSum_Y_grad_for_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Shape_X -> Ad_Add_Grad_Reshape_4;
  Ad_Add_Grad_Reshape_4 -> X_grad;
}

gradient.loss_helper

add_loss_output

experimental_experiment.gradient.loss_helper.add_loss_output(onx: ModelProto, score_name: str = 'squared_error', loss_name: str = 'loss', label_name: str = 'label', weight_name: str | None = None, penalty: Dict[str, float] | None = None, output_index: int | None = None, **kwargs: Dict[str, Any] | None) → ModelProto[source]

Modifies an ONNX graph by adding operators which compute a score (loss) so the graph can be used for training.

Parameters:
  • onx – ONNX graph

  • score_name – name of the score

  • loss_name – name of the output loss

  • label_name – name of the label input

  • weight_name – None, or the name of a weight input to take into account when computing the loss

  • penalty – dictionary such as { weight_name: {‘l1’: alpha, ‘l2’: beta} } or { weight_name: beta }; it adds an L1 and/or L2 penalty to one input or initializer, penalty = |w| \alpha + w^2 \beta

  • output_index – the output used to compute the loss; if None, the function assumes there is only one output; it must be specified when the graph has more than one output and can be an integer or a string (the output name)

  • kwargs – additional arguments for losses (see below)

Returns:

modified graph

Possible values for score_name:

  • ‘squared_error’ or ‘l2’: \sum_i{(f(x_i)-y_i)^2} or \sum_i{w_i (f(x_i)-y_i)^2} if weight_name is not None

  • ‘absolute_error’ or ‘l1’: \sum_i{|f(x_i)-y_i|} or \sum_i{w_i |f(x_i)-y_i|} if weight_name is not None

  • ‘elastic’: mixture of both losses, kwargs must define l1_weight and l2_weight, which default to 0.5 when undefined

  • ‘log’: log loss -(1-yt)\log(1-yp) - yt\log(yp),

    this only works for binary classification, where yp is the predicted probability and yt the expected probability; yt is expected to be binary, and yp is a matrix with two columns whose rows each sum to 1.
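
A hedged sketch of a typical call; onx is assumed to be a regression model with a single output, and the values mirror the graph below (l1_name=0.1, l2_name=0.9):

from experimental_experiment.gradient.loss_helper import add_loss_output

# elastic loss mixing L1 (0.1) and L2 (0.9), weighted by the "weight" input
onx_loss = add_loss_output(
    onx, score_name="elastic", label_name="label", weight_name="weight",
    l1_weight=0.1, l2_weight=0.9)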

The next example shows a loss mixing L1 and L2 terms.

digraph{
  ranksep=0.25;
  orientation=portrait;
  nodesep=0.05;
  size=7;

  X [shape=box color=red label="X\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  label [shape=box color=red label="label\nTensorProto.FLOAT\nshape=['', 1]" fontsize=10];
  weight [shape=box color=red label="weight\nTensorProto.FLOAT\nshape=['']" fontsize=10];

  loss [shape=box color=green label="loss\nTensorProto.FLOAT\nshape=[1, 1]" fontsize=10];
  variable [shape=box color=green label="variable\nTensorProto.FLOAT\nshape=['', 1]" fontsize=10];

  coef [shape=box label="coef\nfloat32((10, 1))\n[[77.474815 ]\n [ 1.425148 ]\n [34.21035  ]\n [61.476..." fontsize=10];
  intercept [shape=box label="intercept\nfloat32((1,))\n[1.9999976]" fontsize=10];
  l1_name [shape=box label="l1_name\nfloat32((1,))\n[0.1]" fontsize=10];
  l2_name [shape=box label="l2_name\nfloat32((1,))\n[0.9]" fontsize=10];
  shape_tensor [shape=box label="shape_tensor\nint64((2,))\n[-1  1]" fontsize=10];

  multiplied [shape=box label="multiplied" fontsize=10];
  MatMul [shape=box style="filled,rounded" color=orange label="MatMul" fontsize=10];
  X -> MatMul;
  coef -> MatMul;
  MatMul -> multiplied;

  resh [shape=box label="resh" fontsize=10];
  Add [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  multiplied -> Add;
  intercept -> Add;
  Add -> resh;

  Reshape [shape=box style="filled,rounded" color=orange label="Reshape" fontsize=10];
  resh -> Reshape;
  shape_tensor -> Reshape;
  Reshape -> variable;

  loss_diff [shape=box label="loss_diff" fontsize=10];
  Sub [shape=box style="filled,rounded" color=orange label="Sub" fontsize=10];
  variable -> Sub;
  label -> Sub;
  Sub -> loss_diff;

  loss_l2 [shape=box label="loss_l2" fontsize=10];
  Mul [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  loss_diff -> Mul;
  loss_diff -> Mul;
  Mul -> loss_l2;

  loss_l1 [shape=box label="loss_l1" fontsize=10];
  Abs [shape=box style="filled,rounded" color=orange label="Abs" fontsize=10];
  loss_diff -> Abs;
  Abs -> loss_l1;

  loss_l1_2 [shape=box label="loss_l1_2" fontsize=10];
  Mul1 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  loss_l1 -> Mul1;
  l1_name -> Mul1;
  Mul1 -> loss_l1_2;

  loss_l2_2 [shape=box label="loss_l2_2" fontsize=10];
  Mul12 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  loss_l2 -> Mul12;
  l2_name -> Mul12;
  Mul12 -> loss_l2_2;

  final_loss [shape=box label="final_loss" fontsize=10];
  Add1 [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  loss_l1_2 -> Add1;
  loss_l2_2 -> Add1;
  Add1 -> final_loss;

  loss_diff_weight [shape=box label="loss_diff_weight" fontsize=10];
  Mul123 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  final_loss -> Mul123;
  weight -> Mul123;
  Mul123 -> loss_diff_weight;

  ReduceSum [shape=box style="filled,rounded" color=orange label="ReduceSum" fontsize=10];
  loss_diff_weight -> ReduceSum;
  ReduceSum -> loss;
}

The next example shows how to add an L2 loss with L1 and L2 penalties on the coefficients.
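
A sketch of a call that could produce the graph below; the initializer names coef and intercept are taken from that graph:

onx_pen = add_loss_output(
    onx, score_name="elastic", weight_name="weight",
    penalty={"coef": {"l1": 0.5, "l2": 0.5},
             "intercept": {"l1": 0.5, "l2": 0.5}})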

digraph{
  ranksep=0.25;
  orientation=portrait;
  nodesep=0.05;
  size=7;

  X [shape=box color=red label="X\nTensorProto.FLOAT\nshape=['', 10]" fontsize=10];
  label [shape=box color=red label="label\nTensorProto.FLOAT\nshape=['', 1]" fontsize=10];
  weight [shape=box color=red label="weight\nTensorProto.FLOAT\nshape=['']" fontsize=10];

  loss [shape=box color=green label="loss\nTensorProto.FLOAT\nshape=[1, 1]" fontsize=10];
  variable [shape=box color=green label="variable\nTensorProto.FLOAT\nshape=['', 1]" fontsize=10];

  coef [shape=box label="coef\nfloat32((10, 1))\n[[77.474754 ]\n [ 1.4251976]\n [34.2104   ]\n [61.477..." fontsize=10];
  intercept [shape=box label="intercept\nfloat32((1,))\n[2.0000095]" fontsize=10];
  l1_name [shape=box label="l1_name\nfloat32((1,))\n[0.5]" fontsize=10];
  l1_weight_coef [shape=box label="l1_weight_coef\nfloat32((1,))\n[0.5]" fontsize=10];
  l1_weight_intercept [shape=box label="l1_weight_intercept\nfloat32((1,))\n[0.5]" fontsize=10];
  l2_name [shape=box label="l2_name\nfloat32((1,))\n[0.5]" fontsize=10];
  l2_weight_coef [shape=box label="l2_weight_coef\nfloat32((1,))\n[0.5]" fontsize=10];
  l2_weight_intercept [shape=box label="l2_weight_intercept\nfloat32((1,))\n[0.5]" fontsize=10];
  shape_coef [shape=box label="shape_coef\nint64((1,))\n[-1]" fontsize=10];
  shape_intercept [shape=box label="shape_intercept\nint64((1,))\n[-1]" fontsize=10];
  shape_tensor [shape=box label="shape_tensor\nint64((2,))\n[-1  1]" fontsize=10];
  shapevect [shape=box label="shapevect\nint64((2,))\n[-1  1]" fontsize=10];

  multiplied [shape=box label="multiplied" fontsize=10];
  MatMul [shape=box style="filled,rounded" color=orange label="MatMul" fontsize=10];
  X -> MatMul;
  coef -> MatMul;
  MatMul -> multiplied;

  resh [shape=box label="resh" fontsize=10];
  Add [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  multiplied -> Add;
  intercept -> Add;
  Add -> resh;

  Reshape [shape=box style="filled,rounded" color=orange label="Reshape" fontsize=10];
  resh -> Reshape;
  shape_tensor -> Reshape;
  Reshape -> variable;

  loss_diff [shape=box label="loss_diff" fontsize=10];
  Sub [shape=box style="filled,rounded" color=orange label="Sub" fontsize=10];
  variable -> Sub;
  label -> Sub;
  Sub -> loss_diff;

  loss_l2 [shape=box label="loss_l2" fontsize=10];
  Mul [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  loss_diff -> Mul;
  loss_diff -> Mul;
  Mul -> loss_l2;

  loss_l1 [shape=box label="loss_l1" fontsize=10];
  Abs [shape=box style="filled,rounded" color=orange label="Abs" fontsize=10];
  loss_diff -> Abs;
  Abs -> loss_l1;

  loss_l1_2 [shape=box label="loss_l1_2" fontsize=10];
  Mul1 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  loss_l1 -> Mul1;
  l1_name -> Mul1;
  Mul1 -> loss_l1_2;

  loss_l2_2 [shape=box label="loss_l2_2" fontsize=10];
  Mul12 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  loss_l2 -> Mul12;
  l2_name -> Mul12;
  Mul12 -> loss_l2_2;

  final_loss [shape=box label="final_loss" fontsize=10];
  Add0 [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  loss_l1_2 -> Add0;
  loss_l2_2 -> Add0;
  Add0 -> final_loss;

  loss_diff_weight [shape=box label="loss_diff_weight" fontsize=10];
  Mul123 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  final_loss -> Mul123;
  weight -> Mul123;
  Mul123 -> loss_diff_weight;

  loss_diff_2 [shape=box label="loss_diff_2" fontsize=10];
  ReduceSum [shape=box style="filled,rounded" color=orange label="ReduceSum" fontsize=10];
  loss_diff_weight -> ReduceSum;
  ReduceSum -> loss_diff_2;

  reshaped_coef [shape=box label="reshaped_coef" fontsize=10];
  Reshape0 [shape=box style="filled,rounded" color=orange label="Reshape" fontsize=10];
  coef -> Reshape0;
  shape_coef -> Reshape0;
  Reshape0 -> reshaped_coef;

  reducedm_coef [shape=box label="reducedm_coef" fontsize=10];
  Mul1234 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  reshaped_coef -> Mul1234;
  reshaped_coef -> Mul1234;
  Mul1234 -> reducedm_coef;

  reduced2_coef [shape=box label="reduced2_coef" fontsize=10];
  ReduceSum1 [shape=box style="filled,rounded" color=orange label="ReduceSum" fontsize=10];
  reducedm_coef -> ReduceSum1;
  ReduceSum1 -> reduced2_coef;

  penalty2_coef [shape=box label="penalty2_coef" fontsize=10];
  Mul12345 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  reduced2_coef -> Mul12345;
  l2_weight_coef -> Mul12345;
  Mul12345 -> penalty2_coef;

  absolute_coef [shape=box label="absolute_coef" fontsize=10];
  Abs1 [shape=box style="filled,rounded" color=orange label="Abs" fontsize=10];
  reshaped_coef -> Abs1;
  Abs1 -> absolute_coef;

  reduced1_coef [shape=box label="reduced1_coef" fontsize=10];
  ReduceSum12 [shape=box style="filled,rounded" color=orange label="ReduceSum" fontsize=10];
  absolute_coef -> ReduceSum12;
  ReduceSum12 -> reduced1_coef;

  penalty1_coef [shape=box label="penalty1_coef" fontsize=10];
  Mul123456 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  reduced1_coef -> Mul123456;
  l1_weight_coef -> Mul123456;
  Mul123456 -> penalty1_coef;

  penalty_coef [shape=box label="penalty_coef" fontsize=10];
  Add1 [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  penalty1_coef -> Add1;
  penalty2_coef -> Add1;
  Add1 -> penalty_coef;

  reshaped_intercept [shape=box label="reshaped_intercept" fontsize=10];
  Reshape1 [shape=box style="filled,rounded" color=orange label="Reshape" fontsize=10];
  intercept -> Reshape1;
  shape_intercept -> Reshape1;
  Reshape1 -> reshaped_intercept;

  reducedm_intercept [shape=box label="reducedm_intercept" fontsize=10];
  Mul1234567 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  reshaped_intercept -> Mul1234567;
  reshaped_intercept -> Mul1234567;
  Mul1234567 -> reducedm_intercept;

  reduced2_intercept [shape=box label="reduced2_intercept" fontsize=10];
  ReduceSum123 [shape=box style="filled,rounded" color=orange label="ReduceSum" fontsize=10];
  reducedm_intercept -> ReduceSum123;
  ReduceSum123 -> reduced2_intercept;

  penalty2_intercept [shape=box label="penalty2_intercept" fontsize=10];
  Mul12345678 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  reduced2_intercept -> Mul12345678;
  l2_weight_intercept -> Mul12345678;
  Mul12345678 -> penalty2_intercept;

  absolute_intercept [shape=box label="absolute_intercept" fontsize=10];
  Abs12 [shape=box style="filled,rounded" color=orange label="Abs" fontsize=10];
  reshaped_intercept -> Abs12;
  Abs12 -> absolute_intercept;

  reduced1_intercept [shape=box label="reduced1_intercept" fontsize=10];
  ReduceSum1234 [shape=box style="filled,rounded" color=orange label="ReduceSum" fontsize=10];
  absolute_intercept -> ReduceSum1234;
  ReduceSum1234 -> reduced1_intercept;

  penalty1_intercept [shape=box label="penalty1_intercept" fontsize=10];
  Mul123456789 [shape=box style="filled,rounded" color=orange label="Mul" fontsize=10];
  reduced1_intercept -> Mul123456789;
  l1_weight_intercept -> Mul123456789;
  Mul123456789 -> penalty1_intercept;

  penalty_intercept [shape=box label="penalty_intercept" fontsize=10];
  Add12 [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  penalty1_intercept -> Add12;
  penalty2_intercept -> Add12;
  Add12 -> penalty_intercept;

  sumop [shape=box label="sumop" fontsize=10];
  Add123 [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  penalty_coef -> Add123;
  penalty_intercept -> Add123;
  Add123 -> sumop;

  penalty_reshape [shape=box label="penalty_reshape" fontsize=10];
  Reshape12 [shape=box style="filled,rounded" color=orange label="Reshape" fontsize=10];
  sumop -> Reshape12;
  shapevect -> Reshape12;
  Reshape12 -> penalty_reshape;

  loss_reshape [shape=box label="loss_reshape" fontsize=10];
  Reshape123 [shape=box style="filled,rounded" color=orange label="Reshape" fontsize=10];
  loss_diff_2 -> Reshape123;
  shapevect -> Reshape123;
  Reshape123 -> loss_reshape;

  Add1234 [shape=box style="filled,rounded" color=orange label="Add" fontsize=10];
  penalty_reshape -> Add1234;
  loss_reshape -> Add1234;
  Add1234 -> loss;
}

get_train_initializer

experimental_experiment.gradient.loss_helper.get_train_initializer(onx: ModelProto)[source]

Returns the list of initializers to train.

Returns:

dictionary {name: (value, tensor)}

The function walks through the list of initializers and returns all tensors whose element type is float or double.
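
A minimal sketch, assuming onx is the model from the previous examples:

from experimental_experiment.gradient.loss_helper import get_train_initializer

# maps each float/double initializer name to (value, tensor)
inits = get_train_initializer(onx)
print(list(inits))  # e.g. ['coef', 'intercept']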

penalty_loss_onnx

experimental_experiment.gradient.loss_helper.penalty_loss_onnx(name: str, dtype: Any, l1: float | None = None, l2: float | None = None, existing_names: List[str] | None = None)[source]

Returns onnx nodes to compute |w| \alpha + w^2 \beta where \alpha=l1 and \beta=l2.

Parameters:
  • name – name of the weight tensor

  • dtype – numpy dtype

  • l1 – coefficient for L1 norm

  • l2 – coefficient for L2 norm

  • existing_names – names already taken in the ONNX graph

Returns:

initializer, nodes
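
A hedged sketch, unpacking the result according to the return value documented above:

import numpy as np

from experimental_experiment.gradient.loss_helper import penalty_loss_onnx

# nodes computing |coef| * 0.1 + coef^2 * 0.9
inits, nodes = penalty_loss_onnx(
    "coef", np.float32, l1=0.1, l2=0.9, existing_names=[])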