yobx.sklearn.category_encoders.binary_encoder

yobx.sklearn.category_encoders.binary_encoder.category_encoders_binary_encoder(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: BinaryEncoder, X: str, name: str = 'binary_encoder') → str

Converts a category_encoders.BinaryEncoder into ONNX.

Each categorical column is replaced by a block of binary indicator columns that encode the ordinal index of the category value in base 2 (MSB first). Non-categorical columns pass through unchanged.

X  ──col_j (categorical, K cats)──►  bit_0 (MSB)      (N, 1)
                                     bit_1            (N, 1)
                                     ...
                                     bit_{B-1} (LSB)  (N, 1)
                                     Concat(bit_0 ... bit_{B-1}, axis=1)──► block (N, B)

X  ──col_k (numerical)──►  unchanged  (N, 1)

Concat(all blocks and pass-through cols, axis=1)──► output (N, F_out)

where B is the number of bits required to represent the largest ordinal in binary (max_ordinal.bit_length(), e.g. 4 categories with ordinals 1–4 give B = 3) and F_out is the total number of output columns.
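The bit count and the MSB-first layout described above can be illustrated in plain Python. This is only a sketch of the arithmetic, not the ONNX graph the converter emits; the helper names `num_bits` and `to_bits_msb_first` are hypothetical and do not belong to yobx or category_encoders.

```python
def num_bits(max_ordinal: int) -> int:
    """Number of bits needed to represent max_ordinal in base 2."""
    return max_ordinal.bit_length()


def to_bits_msb_first(ordinal: int, n_bits: int) -> list:
    """Decompose an ordinal into n_bits binary digits, most significant bit first."""
    return [(ordinal >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


# Four categories with ordinals 1..4: the largest ordinal is 4 (binary 100).
B = num_bits(4)
rows = [to_bits_msb_first(ordinal, B) for ordinal in (1, 2, 3, 4)]
```

With ordinals 1 to 4, `B` is 3 and each category becomes one row of three 0/1 values, so a single categorical column expands into a block of shape (N, 3).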

The conversion reads the fitted ordinal_encoder attribute to determine the known category values and their ordinal assignments.

Unknown categories (values not seen during training):

  • handle_unknown='value' (default): all binary columns for that row are 0.

  • handle_unknown='return_nan': all binary columns for that row are NaN.

Missing values (NaN inputs):

  • handle_missing='value' (default): all binary columns for that row are 0.

  • handle_missing='return_nan': all binary columns for that row are NaN.
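The unknown and missing rules listed above can be mirrored by a small pure-Python reference function, which may be useful when checking converter output by hand. It is a sketch under assumptions: `encode_value` and the `ordinals` dictionary are hypothetical stand-ins for the mapping read from the fitted ordinal_encoder, and only the two option values documented here are handled.

```python
import math


def encode_value(value, ordinals, n_bits,
                 handle_unknown="value", handle_missing="value"):
    """Return the block of n_bits floats for one cell of a categorical column."""
    is_missing = value is None or (isinstance(value, float) and math.isnan(value))
    if is_missing:
        # Missing input: all zeros by default, all NaN with 'return_nan'.
        fill = math.nan if handle_missing == "return_nan" else 0.0
        return [fill] * n_bits
    if value not in ordinals:
        # Category not seen during training: same two choices.
        fill = math.nan if handle_unknown == "return_nan" else 0.0
        return [fill] * n_bits
    ordinal = ordinals[value]
    # Known category: binary digits of its ordinal, MSB first.
    return [float((ordinal >> shift) & 1) for shift in range(n_bits - 1, -1, -1)]


# Hypothetical mapping for a column with four known categories.
ordinals = {"a": 1, "b": 2, "c": 3, "d": 4}
known = encode_value("b", ordinals, 3)
unknown_zero = encode_value("z", ordinals, 3)
unknown_nan = encode_value("z", ordinals, 3, handle_unknown="return_nan")
```

Because NaN never compares equal to itself, check the NaN cases with `math.isnan` rather than `==`.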

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes defined by scikit-learn

  • outputs – desired output tensor names

  • estimator – a fitted BinaryEncoder

  • X – name of the input tensor (shape (N, F))

  • name – prefix used for names of nodes added by this converter

Returns:

name of the output tensor

Raises:

AssertionError – if estimator is not fitted or type info is missing from the graph