.gradient.loss_helper

experimental_experiment.gradient.loss_helper.add_loss_output(onx: ModelProto, score_name: str = 'squared_error', loss_name: str = 'loss', label_name: str = 'label', weight_name: str | None = None, penalty: Dict[str, float] | None = None, output_index: int | None = None, **kwargs: Dict[str, Any] | None) → ModelProto

Modifies an ONNX graph to add operators to score and allow training.

Parameters:

onx – ONNX graph
score_name – name of the score
loss_name – name of the output loss
label_name – name of the label input
weight_name – None, or the name of the input holding the sample weights used when computing the loss
penalty – dictionary such as { name: {'l1': alpha, 'l2': beta} } or { name: beta }, where name is the input or initializer to penalize; it adds an L1 and/or L2 penalty to that input or initializer, penalty = $|w| \alpha + w^2 \beta$
output_index – the output used to compute the loss; if None, the function assumes there is only one output. It must be specified if there is more than one output, and it can be an integer or a string (output name)
kwargs – additional arguments for losses (see below)

Returns:

modified graph

Possible values for score_name:

‘squared_error’ or ‘l2’: $\sum_i{(f(x_i)-y_i)^2}$ or $\sum_i{w_i (f(x_i)-y_i)^2}$ if weight_name is not None
‘absolute_error’ or ‘l1’: $\sum_i{|f(x_i)-y_i|}$ or $\sum_i{w_i |f(x_i)-y_i|}$ if weight_name is not None
‘elastic’: mixture of the L1 and L2 losses; kwargs may define l1_weight and l2_weight, and each defaults to 0.5 if undefined
‘log’: log loss $-(1-yt)\log(1-yp) - yt\log(yp)$; this only works for binary classification, where yp is the predicted probability and yt is the expected probability. yt is expected to be binary, and yp is a matrix with two columns whose rows each sum to 1.
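
As a plain-Python illustration of the regression scores listed above, the sketch below evaluates ‘squared_error’, ‘absolute_error’ and ‘elastic’ on small lists. It is not the library's implementation, which builds ONNX nodes instead. The helper name score_value is hypothetical, and the elastic form (l1_weight times the absolute error plus l2_weight times the squared error, multiplied by the sample weight afterwards) is an assumption consistent with the node graph documented in this section.

```python
def score_value(name, preds, labels, weights=None, l1_weight=0.5, l2_weight=0.5):
    """Hypothetical reference helper: value of a documented score on Python lists."""
    if weights is None:
        # No weight input: every observation counts once.
        weights = [1.0] * len(labels)
    total = 0.0
    for pred, label, weight in zip(preds, labels, weights):
        diff = pred - label
        if name in ("squared_error", "l2"):
            term = diff * diff
        elif name in ("absolute_error", "l1"):
            term = abs(diff)
        elif name == "elastic":
            # Assumed mixture: weighted sum of the L1 and L2 terms.
            term = l1_weight * abs(diff) + l2_weight * diff * diff
        else:
            raise ValueError(f"unsupported score name: {name!r}")
        total += weight * term
    return total


preds = [1.0, 2.0, 4.0]
labels = [1.0, 0.0, 2.0]
print(score_value("squared_error", preds, labels))
print(score_value("absolute_error", preds, labels))
print(score_value("elastic", preds, labels))
print(score_value("squared_error", preds, labels, weights=[1.0, 2.0, 0.5]))
```

Such a reference value can be compared against the loss output of the modified graph on the same data when checking a training setup.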

The next example shows a loss that combines L1 and L2 losses.

The next example shows how to add an L2 loss with L1 and L2 penalties on the coefficients.

Graph of the modified model (summary of the rendered Graphviz figure):

Inputs: X (TensorProto.FLOAT, shape ['', 10]), label (TensorProto.FLOAT, shape ['', 1]), weight (TensorProto.FLOAT, shape ['']).
Outputs: loss (TensorProto.FLOAT, shape [1, 1]), variable (TensorProto.FLOAT, shape ['', 1]).
Initializers: coef (float32, (10, 1)), intercept (float32, (1,)); l1_name, l2_name, l1_weight_coef, l2_weight_coef, l1_weight_intercept, l2_weight_intercept (float32, (1,), value [0.5]); shape_coef, shape_intercept (int64, (1,), value [-1]); shape_tensor, shapevect (int64, (2,), value [-1 1]).

Prediction: MatMul(X, coef) → multiplied; Add(multiplied, intercept) → resh; Reshape(resh, shape_tensor) → variable.
Loss: Sub(variable, label) → loss_diff; Mul(loss_diff, loss_diff) → loss_l2; Abs(loss_diff) → loss_l1; Mul(loss_l1, l1_name) → loss_l1_2; Mul(loss_l2, l2_name) → loss_l2_2; Add(loss_l1_2, loss_l2_2) → final_loss; Mul(final_loss, weight) → loss_diff_weight; ReduceSum(loss_diff_weight) → loss_diff_2.
Penalty on coef: Reshape(coef, shape_coef) → reshaped_coef; Mul(reshaped_coef, reshaped_coef) → reducedm_coef; ReduceSum(reducedm_coef) → reduced2_coef; Mul(reduced2_coef, l2_weight_coef) → penalty2_coef; Abs(reshaped_coef) → absolute_coef; ReduceSum(absolute_coef) → reduced1_coef; Mul(reduced1_coef, l1_weight_coef) → penalty1_coef; Add(penalty1_coef, penalty2_coef) → penalty_coef.
Penalty on intercept: the same sequence applied to intercept with shape_intercept, l1_weight_intercept and l2_weight_intercept → penalty_intercept.
Final output: Add(penalty_coef, penalty_intercept) → sumop; Reshape(sumop, shapevect) → penalty_reshape; Reshape(loss_diff_2, shapevect) → loss_reshape; Add(penalty_reshape, loss_reshape) → loss.

experimental_experiment.gradient.loss_helper.get_train_initializer(onx: ModelProto)

Returns the list of initializers to train.

Returns:

dictionary {name: (value, tensor)}

The function walks through the list of initializers and returns all tensors whose elements are of type float or double.

experimental_experiment.gradient.loss_helper.penalty_loss_onnx(name: str, dtype: Any, l1: float | None = None, l2: float | None = None, existing_names: List[str] | None = None)

Returns ONNX nodes computing $|w| \alpha + w^2 \beta$, where $\alpha=l1$ and $\beta=l2$.

Parameters:

name – name of weights
dtype – numpy dtype
l1 – coefficient for L1 norm
l2 – coefficient for L2 norm
existing_names – names already taken in the ONNX graph

Returns:

initializer, nodes
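
To make the penalty formula concrete, the following plain-Python sketch computes the scalar value that the returned nodes are meant to produce for a flattened weight tensor. The function name penalty_value is hypothetical, and summing the penalty over all elements of the tensor, as well as skipping a term whose coefficient is None, are assumptions rather than documented behaviour of penalty_loss_onnx.

```python
def penalty_value(weights, l1=None, l2=None):
    """Hypothetical helper: sum over elements of l1 * |w| + l2 * w**2."""
    total = 0.0
    for w in weights:
        if l1 is not None:
            # L1 term: alpha * |w|
            total += l1 * abs(w)
        if l2 is not None:
            # L2 term: beta * w^2
            total += l2 * w * w
    return total


coef = [1.0, -2.0]
print(penalty_value(coef, l1=0.5, l2=0.5))
print(penalty_value([3.0], l2=0.25))
```

With both coefficients set, the two terms are added, so a value computed this way can serve as a quick numerical check on the output of the generated nodes.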