yobx.sklearn.neighbors.kernel_density#

yobx.sklearn.neighbors.kernel_density.sklearn_kernel_density(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: KernelDensity, X: str, name: str = 'kde') → str[source]#

Converts a fitted sklearn.neighbors.KernelDensity estimator into ONNX.

The converter implements score_samples(), which returns the log-density at each query point:

log_density(x) = log( (1/N) · Σᵢ k( ‖x − xᵢ‖ / h ) ) − log(norm_h · h^D)

where k is the unnormalized kernel, h the bandwidth, D the feature dimension, and norm_h the kernel-specific normalization constant, which does not depend on the query x.

Rearranging, the output equals:

output(x) = log(Σᵢ k( ‖x − xᵢ‖ / h )) − log(N · norm_h · h^D)

All normalization constants are precomputed at conversion time and stored as ONNX scalar initializers, so the resulting graph contains only arithmetic and reduction operations.
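As a concreteness check on the rearranged formula, the gaussian case can be sketched in numpy: the log-density is a log-sum-exp over pairwise squared distances minus one precomputed constant log(N · norm_h · h^D). This is an illustrative reimplementation under the assumption that norm_h = (2π)^(D/2) for the gaussian kernel, not the converter's actual code.

```python
import numpy as np

def gaussian_log_density(X, X_train, h):
    """Gaussian-KDE log-density, written the way the converter evaluates it:
    a log-sum-exp over squared distances minus a single precomputed constant.
    Sketch for illustration, not the emitted ONNX graph."""
    N, D = X_train.shape
    # pairwise squared euclidean distances, shape (n_query, N)
    sq = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    # precomputed at "conversion time": log(N * norm_h * h^D),
    # with norm_h = (2*pi)**(D/2) for the gaussian kernel (assumption)
    log_const = np.log(N) + 0.5 * D * np.log(2 * np.pi) + D * np.log(h)
    # numerically stable log(sum_i exp(-sq_i / (2 h^2)))
    z = -sq / (2 * h * h)
    m = z.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return log_sum - log_const
```

Because the constant is a single scalar, the runtime graph only needs the distance matrix, one reduction, and one subtraction.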

Supported kernels

'gaussian'

k(t) = exp(−t²/2)

'exponential'

k(t) = exp(−t)

'tophat'

k(t) = 1 for t < 1, else 0

'epanechnikov'

k(t) = 1 − t² for t < 1, else 0

'linear'

k(t) = 1 − t for t < 1, else 0

'cosine'

k(t) = (π/4)·cos(πt/2) for t < 1, else 0

where t = ‖x − xᵢ‖ / h.
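The six unnormalized profiles above can be written directly as vectorized numpy expressions; a small reference table, for illustration only:

```python
import numpy as np

# Unnormalized kernel profiles k(t) from the list above, with the
# compact kernels zeroed outside t < 1.  Illustrative sketch, not the
# converter's internal representation.
KERNELS = {
    "gaussian":     lambda t: np.exp(-t**2 / 2),
    "exponential":  lambda t: np.exp(-t),
    "tophat":       lambda t: np.where(t < 1, 1.0, 0.0),
    "epanechnikov": lambda t: np.where(t < 1, 1 - t**2, 0.0),
    "linear":       lambda t: np.where(t < 1, 1 - t, 0.0),
    "cosine":       lambda t: np.where(t < 1, np.pi / 4 * np.cos(np.pi * t / 2), 0.0),
}
```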

Graph structure (gaussian kernel, standard-ONNX path)

X (N, F)       X_train (M, F)
  │                 │
  └──sq_euclidean───┘  →  sq_dists (N, M)
                               │
                       Mul(−0.5/h²)
                               │
                       ReduceLogSumExp(axis=1)  →  log_sum (N,)
                               │
                       Sub(log_norm_const)  →  log_density (N,)

For compact kernels (tophat, epanechnikov, linear, cosine) the same squared-distance matrix is used but kernel values are summed directly, then the log is taken. When no training sample falls within the bandwidth the score is −∞ (matching sklearn behaviour for degenerate cases).
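The compact-kernel path and its −∞ degenerate case can be sketched in numpy for the tophat kernel, where the kernel sum is simply a neighbor count. The constant subtraction log(N · norm_h · h^D) is omitted here since it only shifts the result and cannot rescue a −∞; this is an assumption-laden illustration, not the converter's code.

```python
import numpy as np

def tophat_log_kernel_sum(X, X_train, h):
    """log(sum_i k(||x - x_i|| / h)) for the tophat kernel (sketch).
    A query with no training sample inside the bandwidth sums to 0,
    and log(0) = -inf, matching sklearn's score for that case."""
    sq = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    counts = (sq < h * h).sum(axis=1).astype(float)  # tophat sums are counts
    with np.errstate(divide="ignore"):               # allow log(0) -> -inf
        return np.log(counts)
```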

Computation paths

With com.microsoft opset (CDist path): Squared distances are delegated to com.microsoft.CDist (metric="sqeuclidean"), which is hardware-accelerated by ONNX Runtime.

Without com.microsoft opset (standard ONNX path): Squared distances are computed using the expansion identity ||x−c||² = ||x||² − 2·x·cᵀ + ||c||², which requires only MatMul and element-wise ops available since opset 13.
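The expansion identity maps directly onto numpy broadcasting; a minimal sketch of the math (not the emitted node sequence):

```python
import numpy as np

def sq_dists_expanded(X, C):
    """||x - c||^2 via the identity ||x||^2 - 2 x.c^T + ||c||^2,
    using only a matmul and element-wise ops, as on the standard-ONNX
    path.  Illustrative sketch."""
    x2 = (X * X).sum(axis=1, keepdims=True)  # ||x||^2, shape (N, 1)
    c2 = (C * C).sum(axis=1)                 # ||c||^2, shape (M,)
    return x2 - 2.0 * (X @ C.T) + c2         # broadcasts to (N, M)
```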

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes defined by scikit-learn

  • outputs – desired output names; outputs[0] receives the log-density vector of shape (N,)

  • estimator – a fitted KernelDensity

  • X – input tensor name

  • name – prefix for added node names

Returns:

output tensor name for the log-density (shape (N,))