yobx.sklearn.cluster.birch

yobx.sklearn.cluster.birch.sklearn_birch(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: Birch, X: str, name: str = 'birch') → str | Tuple[str, str]

Converts a sklearn.cluster.Birch into ONNX.

The converter produces two outputs: the predicted cluster labels (equivalent to predict()) and the Euclidean distances from each sample to every subcluster centre (equivalent to transform()).

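After fitting, Birch exposes subcluster_centers_ (shape (K, F), for K subclusters and F features), the centroids used to assign new samples, and subcluster_labels_ (shape (K,)), which maps each subcluster to its final cluster label. Prediction is nearest-centroid assignment based on Euclidean distance: each sample takes the label of its closest subcluster centre.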
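The nearest-centroid rule above can be sketched in plain NumPy. This is only an illustration of the arithmetic, not the converter's code; the function name `nearest_subcluster_predict` and the small `centers` and `subcluster_labels` arrays are hypothetical stand-ins for the fitted attributes subcluster_centers_ and subcluster_labels_.

```python
import numpy as np


def nearest_subcluster_predict(X, centers, subcluster_labels):
    """Return (labels, distances) for samples X of shape (N, F)."""
    # Pairwise Euclidean distances, shape (N, K), computed by broadcasting.
    diff = X[:, None, :] - centers[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=2))
    # Index of the closest subcluster centre for each sample, shape (N,).
    subcluster_idx = distances.argmin(axis=1)
    # Map each subcluster index to its final cluster label.
    labels = subcluster_labels[subcluster_idx]
    return labels, distances


# Hypothetical fitted attributes: K=3 subclusters, F=2 features.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
subcluster_labels = np.array([0, 0, 1])

X = np.array([[0.9, 0.1], [9.0, 9.5]])
labels, distances = nearest_subcluster_predict(X, centers, subcluster_labels)
print(labels)           # one cluster label per sample
print(distances.shape)  # (N, K)
```

The two returned arrays correspond to the two converter outputs: `labels` plays the role of predict() and `distances` the role of transform().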

CDist path (com.microsoft domain available):

When the com.microsoft opset is registered in the graph builder, the pairwise Euclidean distances are computed by a single com.microsoft.CDist node, which ONNX Runtime executes via a fused C++ kernel.

X (N,F)    centers (K,F)
  │            │
  └────────────┴── CDist(metric="euclidean") ──► distances (N,K)
                                                   │
          ArgMin(axis=1) ──────────────────────► subcluster_idx (N,)
                                                   │
          Gather(subcluster_labels_) ──────────► labels (N,)

Standard ONNX path (fallback):

When the com.microsoft domain is absent, the distances are computed with standard ONNX operators via the squared-distance identity:

||x - c||² = ||x||² - 2·x·cᵀ + ||c||²

X (N,F)
  │
  ├──Mul──ReduceSum(axis=1, keepdims=1)──────────────────────────────► x_sq (N,1)
  │                                                                         │
  └──MatMul(centersᵀ)────────────────────────────────────────────────► cross (N,K)
                                                                            │
c_sq (1,K) ─────────────────────── Add(x_sq) ─── Sub(Mul(2,cross)) ──► sq_dists (N,K)
                                                                            │
                                           Sqrt ──────────────────────► distances (N,K)
                                                                            │
                  ArgMin(axis=1) ──────────────────────────────────► subcluster_idx (N,)
                                                                            │
                  Gather(subcluster_labels_) ──────────────────────► labels (N,)
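As a rough check of the identity used by the fallback path, the NumPy sketch below mirrors the node sequence in the diagram (Mul, ReduceSum, MatMul, Add, Sub, Sqrt) and compares the result with a direct computation. The function name `euclidean_via_identity` is hypothetical, and the clamp to zero before the square root is an assumption added here to guard against small negative values from rounding; the source does not say whether the converter does this.

```python
import numpy as np


def euclidean_via_identity(X, centers):
    """Pairwise Euclidean distances (N, K) via ||x||^2 - 2 x.c^T + ||c||^2."""
    x_sq = (X * X).sum(axis=1, keepdims=True)        # (N, 1)
    c_sq = (centers * centers).sum(axis=1)[None, :]  # (1, K)
    cross = X @ centers.T                            # (N, K)
    sq_dists = x_sq + c_sq - 2.0 * cross             # (N, K)
    # Assumption (not from the source): clamp rounding noise below zero.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.sqrt(sq_dists)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))        # N=5 samples, F=4 features
centers = rng.normal(size=(3, 4))  # K=3 hypothetical subcluster centres

# Direct computation by broadcasting, used only as a reference.
direct = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
via_identity = euclidean_via_identity(X, centers)
print(np.allclose(direct, via_identity))
```

The identity avoids materialising the (N, K, F) difference tensor, so the dominant cost is a single matrix product.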

Parameters:
  • g – the graph builder to add nodes to

  • sts – dictionary of shapes defined by scikit-learn

  • estimator – a fitted Birch

  • outputs – desired output names; outputs[0] receives the cluster labels and outputs[1] (if present) receives the distance matrix

  • X – input tensor name

  • name – prefix for the names of the added nodes

Returns:

the tuple of output names (labels, distances) when two outputs are requested, otherwise only the labels output name