yobx.sklearn.lightgbm.lgbm#

ONNX converters for lightgbm.LGBMClassifier, lightgbm.LGBMRegressor, and lightgbm.LGBMRanker.

The trees are extracted from the fitted booster via booster_.dump_model() and encoded using the ONNX ML TreeEnsembleClassifier / TreeEnsembleRegressor operators (legacy ai.onnx.ml opset ≤ 4) or the unified TreeEnsemble operator (ai.onnx.ml opset ≥ 5).
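The extraction and flattening steps can be sketched in plain Python. The dict below is a hand-written stand-in for the shape of `booster_.dump_model()` output, and `flatten` is a hypothetical illustration, not the module's actual `_flatten_lgbm_tree()`:

```python
# Hand-written stand-in mimicking the nested dict that
# booster_.dump_model() returns for a single one-split tree.
dump = {
    "tree_info": [
        {
            "tree_structure": {
                "split_feature": 0,
                "decision_type": "<=",
                "threshold": 1.5,
                "left_child": {"leaf_value": -0.4},
                "right_child": {"leaf_value": 0.7},
            }
        }
    ]
}

def flatten(node, nodes):
    """Depth-first flattening into a flat node table, roughly the
    layout the TreeEnsemble* attributes need. Returns the flat id
    assigned to `node`. Tuples are
    (mode, feature, threshold_or_leaf_value, true_id, false_id)."""
    nid = len(nodes)
    if "leaf_value" in node:
        nodes.append(("LEAF", None, node["leaf_value"], None, None))
        return nid
    nodes.append(None)  # reserve this slot; children get later ids
    left = flatten(node["left_child"], nodes)
    right = flatten(node["right_child"], nodes)
    nodes[nid] = ("BRANCH_LEQ", node["split_feature"],
                  node["threshold"], left, right)
    return nid

nodes = []
flatten(dump["tree_info"][0]["tree_structure"], nodes)
```

After flattening, `nodes[0]` is the root split and ids 1 and 2 are its leaves.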

  • Binary classification — the raw per-sample margin is passed through a sigmoid function and assembled into a [N, 2] probability matrix.

  • Multi-class classification — per-class margins are passed through softmax to produce a [N, n_classes] probability matrix.

  • Regression — raw margin output with an objective-dependent output transform:

    • Identity objectives (regression, regression_l1, huber, quantile, mape, …): no transform; raw == prediction.

    • Exp objectives (poisson, tweedie): exp(margin); prediction is in positive-real space.

  • Ranking — raw margin output (identity transform); output shape [N, 1]. Supported objectives: lambdarank, rank_xendcg.
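The classification post-processing above can be sketched with hand-rolled sigmoid and softmax functions (an illustration of the math, not the module's actual code):

```python
import math

def binary_proba(margins):
    """Binary case: sigmoid of the raw per-sample margin,
    assembled into an [N, 2] probability matrix."""
    out = []
    for m in margins:
        p1 = 1.0 / (1.0 + math.exp(-m))
        out.append([1.0 - p1, p1])
    return out

def multiclass_proba(margin_rows):
    """Multi-class case: row-wise softmax over per-class margins,
    producing an [N, n_classes] probability matrix."""
    out = []
    for row in margin_rows:
        mx = max(row)                      # stabilise before exponentiating
        exps = [math.exp(m - mx) for m in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out
```

A margin of 0 maps to [0.5, 0.5] in the binary case; equal per-class margins map to a uniform row in the multi-class case.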

Numerical splits: LightGBM’s condition “go to left child when x ≤ split_condition” maps directly to BRANCH_LEQ under both ai.onnx.ml opset ≤ 4 and opset ≥ 5; the semantics match exactly, so no rewriting is needed.

Categorical splits: LightGBM encodes categorical splits as decision_type == '==' with a threshold like '0||1||2'. ONNX only supports single-value BRANCH_EQ comparisons, so each multi-value categorical node is expanded into a chain of single-value checks by _expand_categorical_splits() before flattening. The memoised DFS in _flatten_lgbm_tree() ensures shared subtree references (the left branch of every chain node) are assigned exactly one flat node ID.
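The expansion can be sketched as follows; `expand_categorical` is a hypothetical stand-in for `_expand_categorical_splits()`, turning one multi-value '==' node into a chain of single-value equality checks that all share the original left subtree:

```python
def expand_categorical(feature, threshold, left, right):
    """Expand a '0||1||2'-style categorical split into a chain of
    single-value BRANCH_EQ nodes. Each chain node sends a match to
    the shared `left` subtree; the final mismatch falls through to
    `right`. Returns a nested dict tree (illustrative layout only)."""
    values = [int(v) for v in threshold.split("||")]
    node = right                       # fall-through when no value matches
    for v in reversed(values):
        node = {
            "mode": "BRANCH_EQ",
            "feature": feature,
            "value": v,
            "true": left,              # same subtree object in every chain node
            "false": node,
        }
    return node

chain = expand_categorical(3, "0||1||2", {"leaf": -1.0}, {"leaf": 2.0})
```

Because every chain node's `"true"` branch references the same `left` object, a memoised flattening pass (as `_flatten_lgbm_tree()` does) can assign that shared subtree a single flat node id instead of duplicating it.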

yobx.sklearn.lightgbm.lgbm.sklearn_lgbm_classifier(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator, X: str, name: str = 'lgbm_classifier') Tuple[str, str][source]#

Convert a lightgbm.LGBMClassifier to ONNX.

The converter supports:

  • Binary classification (n_classes_ == 2) — one tree per boosting round; sigmoid post-processing; output shape [N, 2].

  • Multi-class classification (n_classes_ > 2) — n_classes trees per round; softmax post-processing; output shape [N, n_classes].

Both ai.onnx.ml legacy (opset ≤ 4) and modern (opset ≥ 5) encodings are emitted based on the active opset in g.

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes dict (passed through, not used internally)

  • outputs – desired output names [label, probabilities]

  • estimator – a fitted LGBMClassifier

  • X – input tensor name

  • name – prefix for node names added to the graph

Returns:

tuple (label_result_name, proba_result_name)

yobx.sklearn.lightgbm.lgbm.sklearn_lgbm_ranker(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator, X: str, name: str = 'lgbm_ranker') str[source]#

Convert a lightgbm.LGBMRanker to ONNX.

The raw margin (sum of all tree leaf values) is computed via a TreeEnsembleRegressor / TreeEnsemble node. Ranking objectives always use the identity link, so no output transform is applied.
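How a TreeEnsemble-style node table produces the raw ranking score can be sketched as below: traverse each tree to a leaf, sum the leaf values, and apply no transform (identity link). The node tuples use a hand-rolled `(mode, feature, threshold_or_value, true_id, false_id)` layout for illustration, not the actual ONNX attribute encoding:

```python
# Two tiny one-split trees, flattened into per-tree node tables.
trees = [
    [("BRANCH_LEQ", 0, 0.5, 1, 2),
     ("LEAF", None, 0.3, None, None),
     ("LEAF", None, -0.2, None, None)],
    [("BRANCH_LEQ", 1, 2.0, 1, 2),
     ("LEAF", None, 0.1, None, None),
     ("LEAF", None, 0.4, None, None)],
]

def score(x):
    """Raw margin for one sample: walk each tree to its leaf and
    sum the leaf values. Identity link: this IS the ranking score."""
    total = 0.0
    for nodes in trees:
        nid = 0
        while nodes[nid][0] != "LEAF":
            _mode, feat, thr, true_id, false_id = nodes[nid]
            nid = true_id if x[feat] <= thr else false_id
        total += nodes[nid][2]
    return total
```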

Supported objectives: lambdarank (default), rank_xendcg.

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes dict (passed through, not used internally)

  • outputs – desired output names [scores]

  • estimator – a fitted LGBMRanker

  • X – input tensor name

  • name – prefix for node names added to the graph

Returns:

output tensor name (shape [N, 1])

Raises:

NotImplementedError – if the model’s objective is not supported

yobx.sklearn.lightgbm.lgbm.sklearn_lgbm_regressor(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator, X: str, name: str = 'lgbm_regressor') str[source]#

Convert a lightgbm.LGBMRegressor to ONNX.

The raw margin (sum of all tree leaf values) is computed via a TreeEnsembleRegressor / TreeEnsemble node, and then an objective-dependent output transform is applied to match predict():

  • Identity (regression, regression_l1, huber, quantile, mape, …): no transform.

  • poisson, tweedie: exp(margin).

Unsupported objectives raise NotImplementedError.
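The objective-dependent transform described above can be sketched as a small dispatch (a hand-rolled illustration following the mapping documented here; objective names are LightGBM's):

```python
import math

# Objective -> output transform mapping, as described above.
_IDENTITY = {"regression", "regression_l1", "huber", "quantile", "mape"}
_EXP = {"poisson", "tweedie"}

def transform(objective, margin):
    """Apply the objective-dependent output transform to a raw margin."""
    if objective in _IDENTITY:
        return margin                 # raw margin is already the prediction
    if objective in _EXP:
        return math.exp(margin)      # prediction in positive-real space
    raise NotImplementedError(f"unsupported objective: {objective}")
```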

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes dict (passed through, not used internally)

  • outputs – desired output names [predictions]

  • estimator – a fitted LGBMRegressor

  • X – input tensor name

  • name – prefix for node names added to the graph

Returns:

output tensor name (shape [N, 1])

Raises:

NotImplementedError – if the model’s objective is not supported