yobx.sklearn.ensemble.hist_gradient_boosting#

Converter for sklearn.ensemble.HistGradientBoostingClassifier and sklearn.ensemble.HistGradientBoostingRegressor.

The ONNX graph mirrors the model’s prediction pipeline:

raw_prediction = sum(tree_values for all trees) + baseline_prediction
# regression:        raw_prediction  →  output (N, 1)
# binary cls:        Sigmoid(raw)    →  [1-p, p],  ArgMax → label
# multiclass:        Softmax(raw)    →  proba,     ArgMax → label
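The post-processing stages above can be sketched in plain NumPy (an illustrative sketch of the math, not the converter's code; `sigmoid` and `softmax` are local helpers standing in for the ONNX operators of the same names):

```python
import numpy as np

def sigmoid(raw):
    return 1.0 / (1.0 + np.exp(-raw))

def softmax(raw, axis=1):
    shifted = raw - raw.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Binary classification: one logit per sample -> [1-p, p], label via ArgMax.
raw_bin = np.array([[-2.0], [0.5]])
p = sigmoid(raw_bin)
proba_bin = np.concatenate([1.0 - p, p], axis=1)
label_bin = proba_bin.argmax(axis=1)          # -> [0, 1]

# Multiclass: one logit per class -> Softmax along axis 1, label via ArgMax.
raw_multi = np.array([[0.1, 2.0, -1.0]])
proba_multi = softmax(raw_multi)
label_multi = proba_multi.argmax(axis=1)      # -> [1]
```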

Two encoding paths are supported:

  • Legacy (ai.onnx.ml opset ≤ 4): TreeEnsembleRegressor with aggregate_function="SUM" and base_values.

  • Modern (ai.onnx.ml opset 5): TreeEnsemble with aggregate_function=1 (SUM) and base_values_as_tensor.
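The dispatch between the two paths and their attribute-level differences can be summarized as follows (`pick_tree_op` and the two dicts are illustrative names, not part of yobx; the attribute names restate the encoding described above):

```python
def pick_tree_op(ml_opset: int) -> str:
    """Pick the tree operator for a given ai.onnx.ml opset
    (hypothetical helper; the converter's real dispatch may differ)."""
    return "TreeEnsemble" if ml_opset >= 5 else "TreeEnsembleRegressor"

# Legacy TreeEnsembleRegressor: string-valued aggregate_function, float baselines.
LEGACY_ATTRS = {"aggregate_function": "SUM", "baseline_attr": "base_values"}

# Modern TreeEnsemble: integer enum (1 == SUM), baseline carried as a tensor.
MODERN_ATTRS = {"aggregate_function": 1, "baseline_attr": "base_values_as_tensor"}
```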

Both paths raise NotImplementedError when the model contains categorical splits (is_categorical == 1 in any tree node), as the ONNX ML operator set does not support bitset-based categorical splits.
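A check equivalent to the one described above can be sketched against the per-tree node arrays that a fitted model exposes (the `_predictors` attribute and the `is_categorical` node field are scikit-learn internals; the structured array below only mimics their layout for illustration):

```python
import numpy as np

def has_categorical_splits(node_arrays) -> bool:
    # Each tree stores a structured node array with an "is_categorical"
    # field; any nonzero entry means a bitset-based categorical split.
    return any(bool(nodes["is_categorical"].any()) for nodes in node_arrays)

# Mimic two tiny trees, keeping only the relevant field.
node_dtype = np.dtype([("is_categorical", np.uint8)])
numeric_tree = np.zeros(3, dtype=node_dtype)
categorical_tree = np.zeros(3, dtype=node_dtype)
categorical_tree["is_categorical"][1] = 1

# A converter would raise NotImplementedError when this returns True.
needs_rejection = has_categorical_splits([numeric_tree, categorical_tree])
```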

yobx.sklearn.ensemble.hist_gradient_boosting.sklearn_hgb_classifier(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: HistGradientBoostingClassifier, X: str, name: str = 'hgb_classifier') → Tuple[str, str][source]#

Converts a sklearn.ensemble.HistGradientBoostingClassifier to ONNX.

When ai.onnx.ml opset 5 (or later) is active, the unified TreeEnsemble operator is used; otherwise the legacy TreeEnsembleRegressor is emitted.

Binary classification: the raw sum (one logit per sample) passes through a Sigmoid; the resulting class-1 probability p is concatenated as [1-p, p] to match predict_proba.

Multiclass: the raw sums (one logit per class) pass through a Softmax along axis 1.

In both cases the predicted label is derived via ArgMax and a Gather into the classes_ array.
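The ArgMax-then-Gather label derivation is, in NumPy terms, a sketch like the following (the class labels and probabilities are made-up illustrative values):

```python
import numpy as np

classes_ = np.array(["cat", "dog", "fish"])   # fitted class labels (illustrative)
proba = np.array([[0.2, 0.7, 0.1],
                  [0.6, 0.3, 0.1]])

idx = proba.argmax(axis=1)   # ArgMax along the class axis
labels = classes_[idx]       # Gather into classes_  -> ["dog", "cat"]
```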

Parameters:
  • g – graph builder

  • sts – shapes provided by scikit-learn

  • outputs – desired output names (label, probabilities)

  • estimator – fitted HistGradientBoostingClassifier

  • X – input tensor name

  • name – node-name prefix

Returns:

tuple (label_name, proba_name)

Raises:

NotImplementedError – if the model contains categorical splits

yobx.sklearn.ensemble.hist_gradient_boosting.sklearn_hgb_regressor(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: HistGradientBoostingRegressor, X: str, name: str = 'hgb_regressor') → str[source]#

Converts a sklearn.ensemble.HistGradientBoostingRegressor to ONNX.

When ai.onnx.ml opset 5 (or later) is active, the unified TreeEnsemble operator is used; otherwise the legacy TreeEnsembleRegressor is emitted.

The prediction formula is:

raw = sum(tree.predict(X) for tree in _predictors) + _baseline_prediction
output = raw          # shape (N, 1)

When the input is float64, the output is cast back to float64, because both ONNX ML tree operators always emit float32.
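Under the stated formula, the SUM aggregation, the (N, 1) reshape, and the dtype round-trip can be sketched in NumPy (the per-tree contributions and baseline are illustrative values, not real tree outputs):

```python
import numpy as np

# Two trees' float32 contributions for N=2 samples (illustrative values).
per_tree = [np.array([0.5, -1.0], dtype=np.float32),
            np.array([0.25, 0.75], dtype=np.float32)]
baseline = np.float32(2.0)

raw = sum(per_tree) + baseline   # SUM aggregation plus baseline prediction
output = raw.reshape(-1, 1)      # shape (N, 1); tree operators emit float32

# If the graph input was float64, cast the float32 result back.
output64 = output.astype(np.float64)
```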

Parameters:
  • g – graph builder

  • sts – shapes provided by scikit-learn

  • outputs – desired output names

  • estimator – fitted HistGradientBoostingRegressor

  • X – input tensor name

  • name – node-name prefix

Returns:

output tensor name (shape [N, 1])

Raises:

NotImplementedError – if the model contains categorical splits