yobx.sklearn.neighbors.kneighbors_transformer

yobx.sklearn.neighbors.kneighbors_transformer.sklearn_kneighbors_transformer(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: KNeighborsTransformer, X: str, name: str = 'knn_transform') → str

Converts a sklearn.neighbors.KNeighborsTransformer into ONNX.

The converter produces a dense (N, M) output tensor where N is the number of query samples and M is the number of training samples.

mode='connectivity' — entry (i, j) is 1.0 when training sample j is among the n_neighbors nearest neighbours of query point i, and 0.0 otherwise.
mode='distance' — entry (i, j) is the distance from query point i to training sample j when j is among the n_neighbors nearest neighbours, and 0.0 otherwise.
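The two modes can be illustrated on a single output row. The sketch below is a plain NumPy illustration of the semantics described above, not the converter's code; the helper name dense_knn_row and the sample distances are hypothetical.

```python
import numpy as np


def dense_knn_row(dists_row, n_neighbors, mode):
    """Build one output row from the distances of a query to the M training samples.

    Hypothetical helper written for illustration only.
    """
    row = np.zeros_like(dists_row, dtype=np.float32)
    # Indices of the n_neighbors smallest distances (stable sort, so ties keep index order).
    nearest = np.argsort(dists_row, kind="stable")[:n_neighbors]
    for j in nearest:
        # connectivity stores a flag, distance stores the distance itself.
        row[j] = 1.0 if mode == "connectivity" else dists_row[j]
    return row


# Distances from one query point to four training samples (made-up values).
dists = np.array([0.5, 2.0, 1.0, 3.0], dtype=np.float32)

conn = dense_knn_row(dists, 2, "connectivity")  # non-zero at indices 0 and 2
dist = dense_knn_row(dists, 2, "distance")      # keeps 0.5 and 1.0, zeros elsewhere
```

With n_neighbors=2, training samples 0 and 2 are the nearest, so only those two columns are non-zero in either mode.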

Note

sklearn.neighbors.KNeighborsTransformer.transform() returns a sparse CSR matrix. The ONNX graph returns the equivalent dense matrix (i.e. what you would obtain by calling .toarray() on the sparse result).

Note

sklearn’s transform() uses n_neighbors + 1 neighbours internally for mode='distance' to account for self-connections when transforming the training set. This converter always uses exactly n_neighbors neighbours for both modes, so the output matches what the fitted estimator's kneighbors(X, n_neighbors) method returns for the query points. When the training set itself is transformed with mode='distance', one of the n_neighbors slots may be taken by the query point itself; its distance of 0.0 is scattered onto the diagonal, where it is indistinguishable from a non-neighbour entry.
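The self-neighbour effect on the training set can be reproduced with a few lines of NumPy. This is a sketch under simplifying assumptions (1-D points, absolute difference as the distance, no duplicate training points); it imitates the behaviour described in the note and does not call the converter.

```python
import numpy as np

# Three 1-D training points, used both as training set and as queries.
X_train = np.array([[0.0], [1.0], [3.0]])
n_neighbors = 2

# Pairwise absolute differences, shape (M, M); the diagonal is 0.0.
dists = np.abs(X_train - X_train.T)

# Keep exactly n_neighbors neighbours per row, as the converter is described to do.
indices = np.argsort(dists, axis=1, kind="stable")[:, :n_neighbors]
values = np.take_along_axis(dists, indices, axis=1)

out = np.zeros_like(dists)
np.put_along_axis(out, indices, values, axis=1)

# Each point is its own nearest neighbour at distance 0.0, so one slot is
# spent on the diagonal and only n_neighbors - 1 entries per row are non-zero.
nonzero_per_row = np.count_nonzero(out, axis=1)
```

Here every row ends up with a single non-zero entry even though n_neighbors is 2, because the 0.0 written on the diagonal looks the same as an empty cell.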

Supported metrics: "sqeuclidean", "euclidean", "cosine", "manhattan" (aliases: "cityblock", "l1"), "chebyshev", "minkowski". The "euclidean" and "sqeuclidean" metrics use com.microsoft.CDist when that domain is registered; otherwise, and for all other metrics, the standard-ONNX path is used.
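For reference, the listed metrics have the usual definitions, which can be written as a small NumPy function. This is an illustrative sketch only: the function name pairwise_distances_ref and the parameter p (the Minkowski exponent) are assumptions, and the converter builds these computations from ONNX operators rather than NumPy.

```python
import numpy as np


def pairwise_distances_ref(X, Y, metric="euclidean", p=2):
    """Reference (N, M) distance matrix between rows of X (N, F) and Y (M, F).

    Hypothetical helper; cosine is undefined for all-zero rows.
    """
    diff = X[:, None, :] - Y[None, :, :]  # shape (N, M, F)
    if metric == "sqeuclidean":
        return (diff ** 2).sum(axis=-1)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric in ("manhattan", "cityblock", "l1"):
        return np.abs(diff).sum(axis=-1)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=-1)
    if metric == "minkowski":
        return (np.abs(diff) ** p).sum(axis=-1) ** (1.0 / p)
    if metric == "cosine":
        # Cosine distance is 1 minus the cosine similarity of the two rows.
        xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
        return 1.0 - xn @ yn.T
    raise NotImplementedError(f"unsupported metric {metric!r}")


X = np.array([[1.0, 0.0]])
Y = np.array([[4.0, 4.0]])
d_euclid = pairwise_distances_ref(X, Y, "euclidean")   # difference is (-3, -4)
d_manhat = pairwise_distances_ref(X, Y, "cityblock")
d_cheby = pairwise_distances_ref(X, Y, "chebyshev")
```

Raising NotImplementedError for an unknown metric mirrors the behaviour documented for the converter, but the message text here is made up.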

Full graph structure (standard-ONNX path):

X (N, F)
  │
  └─── pairwise distances ─────────────────────────────────────► dists (N, M)
                                                                       │
                        TopK(k, axis=1, largest=0) ──► values (N, k),  indices (N, k)
                                                                       │
             zeros (1, M)  ──► Expand(N, M) ──► zeros_NM (N, M)        │
                                                      │                │
                  ScatterElements(axis=1) ─────────────────────► output (N, M)
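The graph above can be mirrored step by step in NumPy, which may help when checking the ONNX output against a reference. This is a sketch, not the converter's implementation: it assumes the Euclidean metric, uses a stable argsort in place of TopK (tie-breaking may differ from a runtime's TopK), and assumes that ones are scattered in connectivity mode. The function name knn_transform_dense is hypothetical.

```python
import numpy as np


def knn_transform_dense(X, X_train, n_neighbors, mode="distance"):
    """NumPy stand-in for the standard-ONNX path (illustration only)."""
    # pairwise distances: (N, F) x (M, F) -> dists (N, M), Euclidean assumed
    diff = X[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))

    # TopK(k, axis=1, largest=0): the k smallest distances and their column indices
    indices = np.argsort(dists, axis=1, kind="stable")[:, :n_neighbors]
    values = np.take_along_axis(dists, indices, axis=1)

    # zeros (1, M) -> Expand(N, M) -> zeros_NM (N, M)
    zeros_1m = np.zeros((1, X_train.shape[0]), dtype=dists.dtype)
    zeros_nm = np.broadcast_to(zeros_1m, dists.shape).copy()

    # ScatterElements(axis=1): write distances (or ones) at the neighbour columns
    updates = values if mode == "distance" else np.ones_like(values)
    np.put_along_axis(zeros_nm, indices, updates, axis=1)
    return zeros_nm


X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
X = np.array([[0.1, 0.0]])

out_dist = knn_transform_dense(X, X_train, 2, mode="distance")
out_conn = knn_transform_dense(X, X_train, 2, mode="connectivity")
```

For this query the two nearest training samples are the first and second, so out_dist holds their distances (about 0.1 and 0.9) in columns 0 and 1, and out_conn holds 1.0 in the same columns.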

Parameters:

g – graph builder
sts – shapes defined by scikit-learn
outputs – desired output names
estimator – a fitted KNeighborsTransformer
X – input tensor name
name – prefix for node names

Returns:

output tensor name — dense (N, M) matrix

Raises:

NotImplementedError – if opset < 13 or the metric is not supported