yobx.sklearn.lightgbm.lgbm#

ONNX converters for lightgbm.LGBMClassifier, lightgbm.LGBMRegressor, and lightgbm.LGBMRanker.

The trees are extracted from the fitted booster via booster_.dump_model() and encoded using the ONNX ML TreeEnsembleClassifier / TreeEnsembleRegressor operators (legacy ai.onnx.ml opset ≤ 4) or the unified TreeEnsemble operator (ai.onnx.ml opset ≥ 5).
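The extraction and flattening steps can be sketched in plain Python. The dict below is a hand-written stand-in for the shape of `booster_.dump_model()` output, and `flatten` is a hypothetical illustration, not the module's actual `_flatten_lgbm_tree()`:

```python
# Hand-written stand-in mimicking the nested dict that
# booster_.dump_model() returns for a single one-split tree.
dump = {
    "tree_info": [
        {
            "tree_structure": {
                "split_feature": 0,
                "decision_type": "<=",
                "threshold": 1.5,
                "left_child": {"leaf_value": -0.4},
                "right_child": {"leaf_value": 0.7},
            }
        }
    ]
}

def flatten(node, nodes):
    """Depth-first flattening into a flat node table, roughly the
    layout the TreeEnsemble* attributes need. Returns the flat id
    assigned to `node`. Tuples are
    (mode, feature, threshold_or_leaf_value, true_id, false_id)."""
    nid = len(nodes)
    if "leaf_value" in node:
        nodes.append(("LEAF", None, node["leaf_value"], None, None))
        return nid
    nodes.append(None)  # reserve this slot; children get later ids
    left = flatten(node["left_child"], nodes)
    right = flatten(node["right_child"], nodes)
    nodes[nid] = ("BRANCH_LEQ", node["split_feature"],
                  node["threshold"], left, right)
    return nid

nodes = []
flatten(dump["tree_info"][0]["tree_structure"], nodes)
```

After flattening, `nodes[0]` is the root split and ids 1 and 2 are its leaves.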

  • Binary classification — the raw per-sample margin is passed through a sigmoid function and assembled into a [N, 2] probability matrix.

  • Multi-class classification — per-class margins are passed through softmax to produce a [N, n_classes] probability matrix.

  • Regression — raw margin output with an objective-dependent output transform:

    • Identity objectives (regression, regression_l1, huber, quantile, mape, …): no transform; raw == prediction.

    • Exp objectives (poisson, tweedie): exp(margin); prediction is in positive-real space.

  • Ranking — raw margin output (identity transform); output shape [N, 1]. Supported objectives: lambdarank, rank_xendcg.
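The classification post-processing above can be sketched with hand-rolled sigmoid and softmax functions (an illustration of the math, not the module's actual code):

```python
import math

def binary_proba(margins):
    """Binary case: sigmoid of the raw per-sample margin,
    assembled into an [N, 2] probability matrix."""
    out = []
    for m in margins:
        p1 = 1.0 / (1.0 + math.exp(-m))
        out.append([1.0 - p1, p1])
    return out

def multiclass_proba(margin_rows):
    """Multi-class case: row-wise softmax over per-class margins,
    producing an [N, n_classes] probability matrix."""
    out = []
    for row in margin_rows:
        mx = max(row)                      # stabilise before exponentiating
        exps = [math.exp(m - mx) for m in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out
```

A margin of 0 maps to [0.5, 0.5] in the binary case; equal per-class margins map to a uniform row in the multi-class case.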

Numerical splits: LightGBM’s condition “go to left child when x ≤ split_condition” maps directly to BRANCH_LEQ under both ai.onnx.ml opset ≤ 4 and opset ≥ 5; the semantics match exactly, so no rewriting is needed.

Categorical splits: LightGBM encodes categorical splits as decision_type == '==' with a threshold like '0||1||2'. ONNX only supports single-value BRANCH_EQ comparisons, so each multi-value categorical node is expanded into a chain of single-value checks by _expand_categorical_splits() before flattening. The memoised DFS in _flatten_lgbm_tree() ensures shared subtree references (the left branch of every chain node) are assigned exactly one flat node ID.
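The expansion can be sketched as follows; `expand_categorical` is a hypothetical stand-in for `_expand_categorical_splits()`, turning one multi-value '==' node into a chain of single-value equality checks that all share the original left subtree:

```python
def expand_categorical(feature, threshold, left, right):
    """Expand a '0||1||2'-style categorical split into a chain of
    single-value BRANCH_EQ nodes. Each chain node sends a match to
    the shared `left` subtree; the final mismatch falls through to
    `right`. Returns a nested dict tree (illustrative layout only)."""
    values = [int(v) for v in threshold.split("||")]
    node = right                       # fall-through when no value matches
    for v in reversed(values):
        node = {
            "mode": "BRANCH_EQ",
            "feature": feature,
            "value": v,
            "true": left,              # same subtree object in every chain node
            "false": node,
        }
    return node

chain = expand_categorical(3, "0||1||2", {"leaf": -1.0}, {"leaf": 2.0})
```

Because every chain node's `"true"` branch references the same `left` object, a memoised flattening pass (as `_flatten_lgbm_tree()` does) can assign that shared subtree a single flat node id instead of duplicating it.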

yobx.sklearn.lightgbm.lgbm.sklearn_lgbm_classifier(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator, X: str, name: str = 'lgbm_classifier') Tuple[str, str][source]#

Convert a lightgbm.LGBMClassifier to ONNX.

The converter supports:

  • Binary classification (n_classes_ == 2) — one tree per boosting round; sigmoid post-processing; output shape [N, 2].

  • Multi-class classification (n_classes_ > 2) — n_classes trees per round; softmax post-processing; output shape [N, n_classes].

Both ai.onnx.ml legacy (opset ≤ 4) and modern (opset ≥ 5) encodings are emitted based on the active opset in g.

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes dict (passed through, not used internally)

  • outputs – desired output names [label, probabilities]

  • estimator – a fitted LGBMClassifier

  • X – input tensor name

  • name – prefix for node names added to the graph

Returns:

tuple (label_result_name, proba_result_name)

yobx.sklearn.lightgbm.lgbm.sklearn_lgbm_ranker(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator, X: str, name: str = 'lgbm_ranker') str[source]#

Convert a lightgbm.LGBMRanker to ONNX.

The raw margin (sum of all tree leaf values) is computed via a TreeEnsembleRegressor / TreeEnsemble node. Ranking objectives always use the identity link, so no output transform is applied.
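How a TreeEnsemble-style node table produces the raw ranking score can be sketched as below: traverse each tree to a leaf, sum the leaf values, and apply no transform (identity link). The node tuples use a hand-rolled `(mode, feature, threshold_or_value, true_id, false_id)` layout for illustration, not the actual ONNX attribute encoding:

```python
# Two tiny one-split trees, flattened into per-tree node tables.
trees = [
    [("BRANCH_LEQ", 0, 0.5, 1, 2),
     ("LEAF", None, 0.3, None, None),
     ("LEAF", None, -0.2, None, None)],
    [("BRANCH_LEQ", 1, 2.0, 1, 2),
     ("LEAF", None, 0.1, None, None),
     ("LEAF", None, 0.4, None, None)],
]

def score(x):
    """Raw margin for one sample: walk each tree to its leaf and
    sum the leaf values. Identity link: this IS the ranking score."""
    total = 0.0
    for nodes in trees:
        nid = 0
        while nodes[nid][0] != "LEAF":
            _mode, feat, thr, true_id, false_id = nodes[nid]
            nid = true_id if x[feat] <= thr else false_id
        total += nodes[nid][2]
    return total
```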

Supported objectives: lambdarank (default), rank_xendcg.

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes dict (passed through, not used internally)

  • outputs – desired output names [scores]

  • estimator – a fitted LGBMRanker

  • X – input tensor name

  • name – prefix for node names added to the graph

Returns:

output tensor name (shape [N, 1])

Raises:

NotImplementedError – if the model’s objective is not supported

yobx.sklearn.lightgbm.lgbm.sklearn_lgbm_regressor(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator, X: str, name: str = 'lgbm_regressor') str[source]#

Convert a lightgbm.LGBMRegressor to ONNX.

The raw margin (sum of all tree leaf values) is computed via a TreeEnsembleRegressor / TreeEnsemble node, and then an objective-dependent output transform is applied to match predict():

  • Identity (regression, regression_l1, huber, quantile, mape, …): no transform.

  • poisson, tweedie: exp(margin).

Unsupported objectives raise NotImplementedError.
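The objective-dependent transform described above can be sketched as a small dispatch (a hand-rolled illustration following the mapping documented here; objective names are LightGBM's):

```python
import math

# Objective -> output transform mapping, as described above.
_IDENTITY = {"regression", "regression_l1", "huber", "quantile", "mape"}
_EXP = {"poisson", "tweedie"}

def transform(objective, margin):
    """Apply the objective-dependent output transform to a raw margin."""
    if objective in _IDENTITY:
        return margin                 # raw margin is already the prediction
    if objective in _EXP:
        return math.exp(margin)      # prediction in positive-real space
    raise NotImplementedError(f"unsupported objective: {objective}")
```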

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes dict (passed through, not used internally)

  • outputs – desired output names [predictions]

  • estimator – a fitted LGBMRegressor

  • X – input tensor name

  • name – prefix for node names added to the graph

Returns:

output tensor name (shape [N, 1])

Raises:

NotImplementedError – if the model’s objective is not supported