.gradient.grad_helper

class experimental_experiment.gradient.grad_helper.DerivativeOptions(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Options defining how to build the onnx graph of the gradients.

Zero: default option, all options are disabled
KeepYieldOp: keeps the operator YieldOp in the graph, see onnx_derivative
KeepOutputs: keeps the output of the original graph
FillGrad: does not add any output to specify the gradient of the output but assumes it is one
Loss: the function assumes the loss was added to the graph
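
The option names suggest bit flags that can be combined and tested with bitwise operators, although this page only states the numeric value of Zero (0). The sketch below uses a stand-in enum, DerivativeOptionsSketch, to illustrate that reading. The class, the helper enabled and every value other than Zero are assumptions made for the example, not the library's definitions.

```python
from enum import IntFlag


class DerivativeOptionsSketch(IntFlag):
    """Hypothetical stand-in; only Zero = 0 comes from the documentation."""

    Zero = 0
    KeepYieldOp = 1  # assumed value
    KeepOutputs = 2  # assumed value
    FillGrad = 4  # assumed value


def enabled(options, flag):
    """Returns True if `flag` is switched on in `options`."""
    return bool(options & flag)


# Combine two options, then test which ones are active.
options = DerivativeOptionsSketch.KeepOutputs | DerivativeOptionsSketch.FillGrad
keeps_outputs = enabled(options, DerivativeOptionsSketch.KeepOutputs)
keeps_yield = enabled(options, DerivativeOptionsSketch.KeepYieldOp)
```

If the real enum does not support combination, pass a single member instead.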

experimental_experiment.gradient.grad_helper.onnx_derivative(onx: ModelProto, weights: List[str] | None = None, inputs: List[str] | None = None, options: DerivativeOptions = <DerivativeOptions.Zero: 0>, loss: str | None = None, label: str | None = None, path_name: str | None = None, verbose: int = 0) → ModelProto

Builds the gradient for an onnx graph.

Parameters:

onx – onnx graph
weights – gradient against those weights, None for all real weights
inputs – gradient against inputs, None for all real inputs
options – options, of type DerivativeOptions
loss – loss output in case a loss was added in the graph, options must be equal to DerivativeOptions.Loss
label – if loss is specified, then the label must be specified as well
path_name – if options is equal to DerivativeOptions.Loss, the gradient is saved to that path
verbose – verbosity

Returns:

onnx graph
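
A call might look like the sketch below. It builds the toy model Y = X + Ad_Addcst used in the graphs on this page with onnx.helper, then asks for its gradient. The helper names build_add_model and try_onnx_derivative, the opset number and the initializer value are assumptions for the example. The call itself has not been checked against a particular onnxruntime-training version, so the function returns None when the optional packages cannot be imported.

```python
def build_add_model():
    """Builds Y = X + Ad_Addcst with Ad_Addcst stored as an initializer."""
    import numpy as np
    from onnx import TensorProto, helper, numpy_helper

    x = helper.make_tensor_value_info("X", TensorProto.FLOAT, [None, 10])
    y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [None, 10])
    cst = numpy_helper.from_array(
        np.array([1.0], dtype=np.float32), name="Ad_Addcst"
    )
    node = helper.make_node("Add", ["X", "Ad_Addcst"], ["Y"], name="Ad_Add")
    graph = helper.make_graph([node], "add_graph", [x], [y], [cst])
    # Opset 18 is an arbitrary choice for this sketch.
    return helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])


def try_onnx_derivative():
    """Returns the gradient graph, or None when optional packages are missing."""
    try:
        from experimental_experiment.gradient.grad_helper import (
            DerivativeOptions,
            onnx_derivative,
        )

        onx = build_add_model()
        # weights=None and inputs=None: differentiate against all of them.
        return onnx_derivative(onx, options=DerivativeOptions.KeepOutputs)
    except ImportError:
        return None


grad_onx = try_onnx_derivative()
if grad_onx is not None:
    print([o.name for o in grad_onx.graph.output])
```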

The function calls OrtModuleGraphBuilderConfiguration from onnxruntime-training. The resulting graph is meant to be used with OrtGradientForwardBackward and includes the operator YieldOp. The graph looks like this:

[Graph: inputs X (FLOAT, shape ['', 10]) and Ad_Addcst (FLOAT, shape [1]); outputs X_grad (shape ['', 10]) and Ad_Addcst_grad (shape [1]). Add(X, Ad_Addcst) produces Y, and YieldOp (full_shape_outputs=[0]) turns Y into Y_grad. Shape nodes on X and Ad_Addcst feed BroadcastGradientArgs, which yields the reduction axes of each operand. For each operand, ReduceSum (keepdims=1, noop_with_empty_axes=1) over Y_grad followed by Reshape (allowzero=0) to the operand's shape gives X_grad and Ad_Addcst_grad.]

The YieldOp operators consume the outputs of the initial graph. They must be replaced by the gradients of these outputs in order to compute the gradients of the weights and the inputs. After they are replaced, the graph looks like this:

[Graph: inputs X (FLOAT, shape ['', 10]), Ad_Addcst (FLOAT, shape [1]) and Y_grad (FLOAT, shape ['', 10]); outputs X_grad and Ad_Addcst_grad. The Add and YieldOp nodes are gone. Shape nodes on X and Ad_Addcst feed BroadcastGradientArgs, which yields the reduction axes of each operand. For each operand, ReduceSum (keepdims=1, noop_with_empty_axes=1) over Y_grad followed by Reshape (allowzero=0) to the operand's shape gives X_grad and Ad_Addcst_grad.]
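
The gradient subgraph above (Shape, BroadcastGradientArgs, ReduceSum, Reshape) can be read as: find the axes along which each operand of Add was broadcast, sum Y_grad over those axes, then reshape to the operand's shape. The pure-Python sketch below illustrates that reading on a small example. The helper broadcast_gradient_args is a simplified, hypothetical reimplementation written for this page, not onnxruntime's operator.

```python
def broadcast_gradient_args(shape_a, shape_b):
    """Axes of the broadcast result to sum over for each operand (simplified)."""
    rank = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with ones, as broadcasting does.
    pad_a = [1] * (rank - len(shape_a)) + list(shape_a)
    pad_b = [1] * (rank - len(shape_b)) + list(shape_b)
    axes_a = [i for i in range(rank) if pad_a[i] == 1 and pad_b[i] != 1]
    axes_b = [i for i in range(rank) if pad_b[i] == 1 and pad_a[i] != 1]
    return axes_a, axes_b


# Y = X + Ad_Addcst with X of shape (3, 10) and Ad_Addcst of shape (1,).
y_grad = [[float(i + j) for j in range(10)] for i in range(3)]
axes_x, axes_cst = broadcast_gradient_args((3, 10), (1,))
assert axes_x == [] and axes_cst == [0, 1]

# No axis to reduce for X: its gradient is Y_grad itself.
x_grad = [row[:] for row in y_grad]
# Both axes are reduced for Ad_Addcst, then the sum is reshaped to (1,).
cst_grad = [sum(sum(row) for row in y_grad)]
```

With an empty list of axes, ReduceSum leaves Y_grad untouched because noop_with_empty_axes=1, which is why X_grad equals Y_grad here.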

The user can still compute the outputs of the original graph:

[Graph: inputs X (FLOAT, shape ['', 10]), Ad_Addcst (FLOAT, shape [1]) and Y_grad (FLOAT, shape ['', 10]); outputs X_grad, Ad_Addcst_grad and Y (FLOAT, shape ['', 10]). Add(X, Ad_Addcst) is kept and produces the output Y. The gradient part is unchanged: Shape nodes on X and Ad_Addcst feed BroadcastGradientArgs, then ReduceSum (keepdims=1, noop_with_empty_axes=1) over Y_grad and Reshape (allowzero=0) give X_grad and Ad_Addcst_grad.]

The gradient of the output (Y_grad here) can also be replaced by a constant tensor filled with ones and with the expected shape, so that it is no longer an input:

[Graph: inputs X (FLOAT, shape ['', 10]) and Ad_Addcst (FLOAT, shape [1]); outputs X_grad, Ad_Addcst_grad and Y (FLOAT, shape ['', 10]). Add(X, Ad_Addcst) produces Y. Shape(Y) gives Y_shape and ConstantOfShape (value=[1.]) turns Y_shape into Y_grad. The gradient part is unchanged: Shape nodes on X and Ad_Addcst feed BroadcastGradientArgs, then ReduceSum (keepdims=1, noop_with_empty_axes=1) over Y_grad and Reshape (allowzero=0) give X_grad and Ad_Addcst_grad.]

experimental_experiment.gradient.grad_helper.random_feed(inputs, batch: int = 10, empty_dimension: int = 1) → Dict[str, ndarray]

Creates a dictionary of random inputs.

Parameters:

batch – dimension to use as batch dimension if unknown
empty_dimension – if a dimension is null, replaces it by this value

Returns:

dictionary
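
The two parameters can be illustrated with a dependency-free sketch. It assumes that an unknown first dimension is treated as the batch dimension and that any other null dimension is replaced by empty_dimension. That split is a guess from the parameter descriptions, not confirmed behaviour. The function random_feed_sketch, its shapes argument (a plain dictionary mapping names to shapes) and the seed parameter are hypothetical, and it returns Python lists rather than numpy arrays.

```python
import math
import random


def random_feed_sketch(shapes, batch=10, empty_dimension=1, seed=0):
    """Maps each input name to (resolved shape, flat list of random floats)."""
    rng = random.Random(seed)
    feed = {}
    for name, shape in shapes.items():
        resolved = []
        for axis, dim in enumerate(shape):
            if dim:  # known, strictly positive dimension
                resolved.append(dim)
            elif axis == 0:  # assumption: unknown first axis is the batch axis
                resolved.append(batch)
            else:  # assumption: any other null dimension
                resolved.append(empty_dimension)
        values = [rng.gauss(0.0, 1.0) for _ in range(math.prod(resolved))]
        feed[name] = (tuple(resolved), values)
    return feed


feed = random_feed_sketch({"X": (None, 10), "W": (5, 0)})
x_shape, x_values = feed["X"]
w_shape, w_values = feed["W"]
```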