yobx.sklearn.category_encoders.one_hot_encoder

yobx.sklearn.category_encoders.one_hot_encoder.category_encoders_one_hot_encoder(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: OneHotEncoder, X: str, name: str = 'ce_one_hot_encoder') → str

Converts a category_encoders.OneHotEncoder into ONNX.

The encoder replaces each categorical column with a block of binary indicator columns (one per known category) and passes non-categorical columns through unchanged.

X  ──col_j (categorical, K cats)──►  Equal(c_1)?──Cast(float)──► ind_1 (N,1)
                                      Equal(c_2)?──Cast(float)──► ind_2 (N,1)
                                      ...
                                      Equal(c_K)?──Cast(float)──► ind_K (N,1)
                                      Concat(ind_1,...,ind_K, axis=1)──► block (N,K)

X  ──col_k (numerical)──►  unchanged (N,1)

Concat(all blocks and pass-through cols, axis=1)──► output (N, F_out)
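The data flow in the diagram can be emulated with NumPy to predict the converted model's output layout. This is a minimal sketch, not the converter itself: the `categorical` dict (column index to list of known categories) and the helper names `one_hot_block` and `encode` are hypothetical, and emitting blocks in input-column order is an assumption of the sketch.

```python
import numpy as np


def one_hot_block(col, categories):
    """Mimic Equal(c_k) -> Cast(float) per known category, then Concat(axis=1)."""
    col = col.reshape(-1, 1)  # (N, 1) slice of one categorical column
    indicators = [(col == c).astype(np.float32) for c in categories]  # K arrays of (N, 1)
    return np.concatenate(indicators, axis=1)  # block of shape (N, K)


def encode(X, categorical):
    """Hypothetical helper: `categorical` maps column index -> known categories.

    Assumption: blocks and pass-through columns are concatenated in input-column order.
    """
    parts = []
    for j in range(X.shape[1]):
        if j in categorical:
            parts.append(one_hot_block(X[:, j], categorical[j]))
        else:
            parts.append(X[:, j].reshape(-1, 1))  # numerical column passed through unchanged
    return np.concatenate(parts, axis=1)  # (N, F_out)


X = np.array([[0.0, 1.5], [2.0, -3.0], [1.0, 0.0], [5.0, 2.0]], dtype=np.float32)
out = encode(X, {0: [0.0, 1.0, 2.0]})
print(out.shape)  # (4, 4): 3 indicator columns + 1 pass-through column
print(out[3])     # 5.0 was never seen during fitting, so its block is all zeros
```

Row 3 illustrates the default handle_unknown='value' behaviour: no Equal comparison matches, so every indicator in its block is 0.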

The conversion reads the fitted ordinal_encoder and mapping attributes to determine the known category values and their one-hot positions.
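One way to picture this step is as two composed lookups: category value to ordinal code (from the ordinal encoder) and ordinal code to indicator position (from the one-hot mapping). The sketch below uses plain dicts; their layouts and the helper `categories_in_block_order` are hypothetical stand-ins, not the actual category_encoders attribute structures.

```python
# Hypothetical stand-ins for the fitted attributes; the real category_encoders
# objects (ordinal_encoder, mapping) are structured differently.
ordinal_map = {"red": 1, "green": 2, "blue": 3}  # category value -> ordinal code
onehot_position = {1: 2, 2: 0, 3: 1}             # ordinal code -> indicator column index


def categories_in_block_order(ordinal_map, onehot_position):
    """Return known categories sorted by the indicator column they map to.

    Codes that have no indicator column are skipped.
    """
    by_position = {
        onehot_position[code]: value
        for value, code in ordinal_map.items()
        if code in onehot_position
    }
    return [by_position[pos] for pos in sorted(by_position)]


print(categories_in_block_order(ordinal_map, onehot_position))  # ['green', 'blue', 'red']
```

In this picture, the resulting list supplies the constants c_1, ..., c_K compared by the Equal nodes, in the order their indicator columns appear in the block.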

Unknown categories (values not seen during training):

  • handle_unknown='value' (default): the row's block is all zeros, which the graph produces naturally because every Equal comparison returns False.

  • handle_unknown='return_nan': the row's entire block is NaN. The graph flags a value as unknown when no Equal node fired (the ReduceMax of the block's indicator values is 0) and, for floating-point inputs, the value is not itself NaN. A NaN input therefore always produces an all-zero block rather than a NaN block.
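The return_nan rule can be checked with a NumPy sketch that mirrors the described logic: a row-wise max over the indicator block stands in for ReduceMax, and np.isnan for the NaN test on floating-point inputs. The helper name `apply_return_nan` and the sample data are illustrative only.

```python
import numpy as np


def apply_return_nan(col, block):
    """Mimic handle_unknown='return_nan' on an (N, K) indicator block."""
    col = col.reshape(-1, 1)
    no_match = block.max(axis=1, keepdims=True) == 0  # ReduceMax == 0: no Equal node fired
    if np.issubdtype(col.dtype, np.floating):
        unknown = no_match & ~np.isnan(col)  # NaN inputs keep their all-zero block
    else:
        unknown = no_match
    return np.where(unknown, np.nan, block)  # (N, 1) mask broadcasts over the K columns


col = np.array([0.0, 7.0, np.nan], dtype=np.float32)
categories = [0.0, 1.0]
block = np.concatenate([(col.reshape(-1, 1) == c).astype(np.float32) for c in categories], axis=1)
result = apply_return_nan(col, block)
print(result)  # row 0 -> [1, 0]; row 1 (unknown 7.0) -> NaN; row 2 (NaN input) -> [0, 0]
```

The third row shows why the extra NaN check matters: NaN never compares equal to any category, so without it a NaN input would be indistinguishable from an unknown value.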

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes defined by scikit-learn

  • outputs – desired output tensor names

  • estimator – a fitted category_encoders.OneHotEncoder

  • X – name of the input tensor (shape (N, F))

  • name – prefix used for names of nodes added by this converter

Returns:

name of the output tensor

Raises:

AssertionError – if the estimator is not fitted or type information is missing from the graph