yobx.sklearn.cluster.bisecting_kmeans#
- yobx.sklearn.cluster.bisecting_kmeans.sklearn_bisecting_kmeans(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: BisectingKMeans, X: str, name: str = 'bisecting_kmeans') str | Tuple[str, str][source]#
Converts a
sklearn.cluster.BisectingKMeansinto ONNX.The converter produces two outputs:
labels — predicted cluster index for each sample, replicating
predict()by traversing the internal bisection tree.distances — Euclidean distances from each sample to every cluster centre, replicating
transform().
Label assignment via tree traversal
BisectingKMeansassigns labels by walking down the hierarchical bisection tree rather than finding the globally nearest centre. At each internal node the sample is routed to the child whose (X_mean-adjusted) centre is closest. Reached at a leaf, the leaf’s label is assigned.Each internal node defines a linear decision boundary:
go_left iff X · (right_eff - left_eff) ≤ (||right_eff||² - ||left_eff||²) / 2
which requires only one
MatMulper internal node. A sample reaches a leaf when all conditions on the root-to-leaf path are satisfied. Since leaves are mutually exclusive and exhaustive, the final label is computed as:label = Σ_leaf leaf_label × Cast(leaf_mask, int64)
Distance computation (transform output):
When the
com.microsoftopset is available theCDistoperator is used directly (metric="euclidean"). Otherwise the distances are computed manually:||x - c||² = ||x||² - 2·x·cᵀ + ||c||² → Sqrt
Full graph structure (three-cluster example):
X (N,F) │ ├──Mul──ReduceSum(axis=1,keepdims=1)──────────────────────► x_sq (N,1) │ │ └──MatMul(centersᵀ)──────────────────────────────────────► cross (N,K) │ c_sq (1,K) ──────────────── Add(x_sq) ── Sub(2·cross) ──► sq_dists (N,K) │ Sqrt ──────────────────► distances (N,K) │ ├──MatMul(dir_0)──LessOrEqual(thr_0) ──► go_left_0 (N,) │ … └── label accumulation via And / Cast / Mul / Add ──► labels (N,)- Parameters:
g – the graph builder to add nodes to
sts – shapes defined by scikit-learn
estimator – a fitted
BisectingKMeansoutputs – desired output names;
outputs[0]receives the cluster labels andoutputs[1](if present) receives the distances matrixX – input tensor name
name – prefix names for the added nodes
- Returns:
tuple
(labels, distances)when two outputs are requested, otherwise justlabels