yobx.sklearn.neighbors.kernel_density#

yobx.sklearn.neighbors.kernel_density.sklearn_kernel_density(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: KernelDensity, X: str, name: str = 'kde') → str[source]#

Converts a fitted sklearn.neighbors.KernelDensity estimator into ONNX.

The converter implements score_samples(), which returns the log-density at each query point:

log_density(x) = log( (1/N) · Σᵢ k( ‖x − xᵢ‖ / h ) ) − log(norm_h · h^D)

where k is the unnormalized kernel, h the bandwidth, D the feature dimension, and norm_h the kernel-specific normalization constant, which does not depend on the query x.

Rearranging, the output equals:

output(x) = log(Σᵢ k( ‖x − xᵢ‖ / h )) − log(N · norm_h · h^D)

All normalization constants are precomputed at conversion time and stored as ONNX scalar initializers, so the resulting graph contains only arithmetic and reduction operations.
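As a concreteness check on the rearranged formula, the gaussian case can be sketched in numpy: the log-density is a log-sum-exp over pairwise squared distances minus one precomputed constant log(N · norm_h · h^D). This is an illustrative reimplementation under the assumption that norm_h = (2π)^(D/2) for the gaussian kernel, not the converter's actual code.

```python
import numpy as np

def gaussian_log_density(X, X_train, h):
    """Gaussian-KDE log-density, written the way the converter evaluates it:
    a log-sum-exp over squared distances minus a single precomputed constant.
    Sketch for illustration, not the emitted ONNX graph."""
    N, D = X_train.shape
    # pairwise squared euclidean distances, shape (n_query, N)
    sq = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    # precomputed at "conversion time": log(N * norm_h * h^D),
    # with norm_h = (2*pi)**(D/2) for the gaussian kernel (assumption)
    log_const = np.log(N) + 0.5 * D * np.log(2 * np.pi) + D * np.log(h)
    # numerically stable log(sum_i exp(-sq_i / (2 h^2)))
    z = -sq / (2 * h * h)
    m = z.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return log_sum - log_const
```

Because the constant is a single scalar, the runtime graph only needs the distance matrix, one reduction, and one subtraction.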

Supported kernels

'gaussian'

k(t) = exp(−t²/2)

'exponential'

k(t) = exp(−t)

'tophat'

k(t) = 1 for t < 1, else 0

'epanechnikov'

k(t) = 1 − t² for t < 1, else 0

'linear'

k(t) = 1 − t for t < 1, else 0

'cosine'

k(t) = (π/4)·cos(πt/2) for t < 1, else 0

where t = ‖x − xᵢ‖ / h.
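The six unnormalized profiles above can be written directly as vectorized numpy expressions; a small reference table, for illustration only:

```python
import numpy as np

# Unnormalized kernel profiles k(t) from the list above, with the
# compact kernels zeroed outside t < 1.  Illustrative sketch, not the
# converter's internal representation.
KERNELS = {
    "gaussian":     lambda t: np.exp(-t**2 / 2),
    "exponential":  lambda t: np.exp(-t),
    "tophat":       lambda t: np.where(t < 1, 1.0, 0.0),
    "epanechnikov": lambda t: np.where(t < 1, 1 - t**2, 0.0),
    "linear":       lambda t: np.where(t < 1, 1 - t, 0.0),
    "cosine":       lambda t: np.where(t < 1, np.pi / 4 * np.cos(np.pi * t / 2), 0.0),
}
```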

Graph structure (gaussian kernel, standard-ONNX path)

X (N, F)       X_train (M, F)
  │                 │
  └──sq_euclidean───┘  →  sq_dists (N, M)
                               │
                       Mul(−0.5/h²)
                               │
                       ReduceLogSumExp(axis=1)  →  log_sum (N,)
                               │
                       Sub(log_norm_const)  →  log_density (N,)

For compact kernels (tophat, epanechnikov, linear, cosine) the same squared-distance matrix is used but kernel values are summed directly, then the log is taken. When no training sample falls within the bandwidth the score is −∞ (matching sklearn behaviour for degenerate cases).
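The compact-kernel path and its −∞ degenerate case can be sketched in numpy for the tophat kernel, where the kernel sum is simply a neighbor count. The constant subtraction log(N · norm_h · h^D) is omitted here since it only shifts the result and cannot rescue a −∞; this is an assumption-laden illustration, not the converter's code.

```python
import numpy as np

def tophat_log_kernel_sum(X, X_train, h):
    """log(sum_i k(||x - x_i|| / h)) for the tophat kernel (sketch).
    A query with no training sample inside the bandwidth sums to 0,
    and log(0) = -inf, matching sklearn's score for that case."""
    sq = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    counts = (sq < h * h).sum(axis=1).astype(float)  # tophat sums are counts
    with np.errstate(divide="ignore"):               # allow log(0) -> -inf
        return np.log(counts)
```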

Computation paths

With com.microsoft opset (CDist path): Squared distances are delegated to com.microsoft.CDist (metric="sqeuclidean"), which is hardware-accelerated by ONNX Runtime.

Without com.microsoft opset (standard ONNX path): Squared distances are computed using the expansion identity ||x−c||² = ||x||² − 2·x·cᵀ + ||c||², which requires only MatMul and element-wise ops available since opset 13.
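The expansion identity maps directly onto numpy broadcasting; a minimal sketch of the math (not the emitted node sequence):

```python
import numpy as np

def sq_dists_expanded(X, C):
    """||x - c||^2 via the identity ||x||^2 - 2 x.c^T + ||c||^2,
    using only a matmul and element-wise ops, as on the standard-ONNX
    path.  Illustrative sketch."""
    x2 = (X * X).sum(axis=1, keepdims=True)  # ||x||^2, shape (N, 1)
    c2 = (C * C).sum(axis=1)                 # ||c||^2, shape (M,)
    return x2 - 2.0 * (X @ C.T) + c2         # broadcasts to (N, M)
```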

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes defined by scikit-learn

  • outputs – desired output names; outputs[0] receives the log-density vector of shape (N,)

  • estimator – a fitted KernelDensity

  • X – input tensor name

  • name – prefix for added node names

Returns:

output tensor name for the log-density (shape (N,))