yobx.sklearn.preprocessing.kbins_discretizer

yobx.sklearn.preprocessing.kbins_discretizer.sklearn_kbins_discretizer(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: KBinsDiscretizer, X: str, name: str = 'kbins') → str

Converts a sklearn.preprocessing.KBinsDiscretizer into ONNX.

Supported values of the encode hyperparameter:

'ordinal' — each feature is replaced by its 0-based integer bin index, cast to the input floating-point dtype. Output shape: (N, F).
'onehot-dense' and 'onehot' — each feature is one-hot encoded into n_bins_[j] columns. The one-hot blocks are concatenated along axis 1. Output shape: (N, sum(n_bins_)).
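The two output layouts can be illustrated with a small NumPy sketch. It is not the converter's code: the helper name encode_bins and the sample bin indices are hypothetical, and the bin indices are assumed to be already computed and clipped.

```python
import numpy as np


def encode_bins(bin_indices, n_bins, encode="ordinal", dtype=np.float32):
    """Hypothetical helper: mimic the two output layouts in NumPy.

    bin_indices: (N, F) int64 array of 0-based bin indices.
    n_bins: number of bins per feature, like KBinsDiscretizer.n_bins_.
    """
    if encode == "ordinal":
        # Bin indices cast to the floating-point dtype, shape (N, F).
        return bin_indices.astype(dtype)
    # One block of n_bins[j] columns per feature, concatenated along axis 1.
    blocks = [
        np.eye(nb, dtype=dtype)[bin_indices[:, j]]
        for j, nb in enumerate(n_bins)
    ]
    return np.concatenate(blocks, axis=1)


# Illustrative data: 3 samples, 2 features with 3 and 2 bins.
bin_idx = np.array([[0, 1], [2, 0], [1, 1]], dtype=np.int64)
n_bins = [3, 2]

ordinal = encode_bins(bin_idx, n_bins, "ordinal")       # shape (3, 2)
onehot = encode_bins(bin_idx, n_bins, "onehot-dense")   # shape (3, 5)
```

Here the one-hot output has sum(n_bins) = 5 columns: the first three belong to feature 0 and the last two to feature 1.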

The bin index for feature j is computed by counting how many interior thresholds (bin_edges_[j][1:-1]) are less than or equal to the sample value:

X (N, F)
  │
  └─Unsqueeze(axis=2)──► X_exp (N, F, 1)
                              │
thresholds (1, F, T) ─────────┤
                              ▼
                      GreaterOrEqual ──► (N, F, T)  bool
                              │
                          Cast(int64)
                              │
                       ReduceSum(axis=2) ──► bin_indices (N, F)  int64
                              │
                         Min / Max clip
                              │
                     [ordinal]  Cast(float) ──► output (N, F)
               [onehot(-dense)] OneHot + Concat ──► output (N, sum(n_bins_))

Features with fewer bins than the maximum have fewer interior thresholds, so their rows in the threshold tensor are padded with +inf. For any finite input value the excess comparisons yield False and contribute 0 to the sum.
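The counting scheme and the +inf padding can be mimicked in NumPy as a minimal sketch. The helper names pad_thresholds and bin_indices and the sample data are hypothetical, and the clip bounds [0, n_bins - 1] are an assumption about what the Min / Max step does; the converter itself emits ONNX nodes rather than running NumPy.

```python
import numpy as np


def pad_thresholds(bin_edges):
    """Stack interior thresholds into a (1, F, T) array padded with +inf."""
    interior = [np.asarray(e, dtype=np.float64)[1:-1] for e in bin_edges]
    t_max = max((len(t) for t in interior), default=0)
    padded = np.full((len(interior), t_max), np.inf)
    for j, t in enumerate(interior):
        padded[j, : len(t)] = t
    return padded[np.newaxis, :, :]


def bin_indices(X, bin_edges):
    """Count, per feature, the interior thresholds <= each sample value."""
    thresholds = pad_thresholds(bin_edges)       # (1, F, T)
    X_exp = X[:, :, np.newaxis]                  # (N, F, 1), like Unsqueeze
    ge = X_exp >= thresholds                     # (N, F, T) bool
    idx = ge.astype(np.int64).sum(axis=2)        # (N, F), like ReduceSum
    # Assumed clip range: valid bin indices for each feature.
    n_bins = np.array([len(e) - 1 for e in bin_edges], dtype=np.int64)
    return np.clip(idx, 0, n_bins - 1)


# Illustrative edges: feature 0 has 3 bins, feature 1 has 2 bins.
bin_edges = [np.array([0.0, 1.0, 2.0, 3.0]), np.array([0.0, 5.0, 10.0])]
X = np.array([[-1.0, 4.9], [1.0, 5.0], [2.5, 7.0], [9.0, 20.0]])

indices = bin_indices(X, bin_edges)
# indices is [[0, 0], [1, 1], [2, 1], [2, 1]]
```

Feature 1 has a single interior threshold (5.0), so its second slot holds +inf and never adds to the count for these finite inputs.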

Parameters:

g – the graph builder to add nodes to
sts – shapes defined by scikit-learn
outputs – desired output names
estimator – a fitted KBinsDiscretizer
X – input tensor name
name – prefix name for the added nodes

Returns:

name of the output tensor