yobx.sklearn.neighbors.local_outlier_factor

yobx.sklearn.neighbors.local_outlier_factor.sklearn_local_outlier_factor(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: LocalOutlierFactor, X: str, name: str = 'lof') → str | Tuple[str, str]

Converts a fitted sklearn.neighbors.LocalOutlierFactor into an ONNX graph.

Only novelty=True (novelty-detection mode) is supported, as it is the mode that enables the predict() and score_samples() methods on data not seen at fit time.
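For context, a minimal scikit-learn usage sketch of the mode this converter targets (data and parameter values are illustrative):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 2))

# novelty=True is required by the converter: it enables predict()
# and score_samples() on points not seen at fit time.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)

X_new = np.array([[0.0, 0.0], [8.0, 8.0]])  # one central point, one far outlier
labels = lof.predict(X_new)        # 1 = inlier, -1 = outlier
scores = lof.score_samples(X_new)  # more negative = more anomalous
```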

Algorithm overview

For each query point x, the converter implements the exact LOF formula:

  1. Compute pairwise distances from each of the N query points to all M training points: dists (N, M).

  2. Find the k nearest training neighbours: topk_dists, topk_idx (both of shape (N, k)).

  3. Compute the reachability distance from x to each neighbour x_i:

    reach_dist(x, x_i) = max(dist(x, x_i), k_distance(x_i))
    

    where k_distance(x_i) is the distance from training point x_i to its own k-th nearest neighbour, precomputed as estimator._distances_fit_X_[:, n_neighbors_ - 1].

  4. Local Reachability Density of x:

    LRD(x) = 1 / (mean(reach_dist(x, x_i)) + 1e-10)
    
  5. LOF score:

    LOF(x) = mean(LRD(x_i)) / LRD(x)
    
  6. Anomaly score (score_samples):

    score_samples(x) = -LOF(x)
    
  7. Decision function and label:

    decision_function(x) = score_samples(x) - offset_
    predict(x) = 1  if decision_function(x) >= 0 else -1
    

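The seven steps above can be replayed in NumPy and checked against scikit-learn's own score_samples(). This is a sketch under two assumptions: the default Euclidean metric, and that the private attributes _distances_fit_X_ and _lrd (the one step 3 references) hold the training k-distances and precomputed LRDs:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))   # M = 50 training points
X_query = rng.normal(size=(5, 2))    # N = 5 query points

k = 10
est = LocalOutlierFactor(n_neighbors=k, novelty=True).fit(X_train)

# Constants baked into the graph at conversion time (private attributes):
k_distances_train = est._distances_fit_X_[:, est.n_neighbors_ - 1]  # (M,)
lrd_train = est._lrd                                                # (M,)

# 1. pairwise distances, query -> training: (N, M)
dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
# 2. k nearest training neighbours: (N, k)
topk_idx = np.argsort(dists, axis=1)[:, :k]
topk_dists = np.take_along_axis(dists, topk_idx, axis=1)
# 3. reachability distances: max(dist(x, x_i), k_distance(x_i))
reach = np.maximum(topk_dists, k_distances_train[topk_idx])
# 4. local reachability density of the queries
lrd_query = 1.0 / (reach.mean(axis=1) + 1e-10)
# 5./6. LOF and score_samples = -LOF
scores = -(lrd_train[topk_idx].mean(axis=1) / lrd_query)
# 7. decision function and label
decision = scores - est.offset_
labels = np.where(decision >= 0, 1, -1)
```

Up to floating-point error, scores matches est.score_samples(X_query) and labels matches est.predict(X_query).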
ONNX graph structure

X (N, F)
  │
  └─── pairwise distances ────────────────────────────────► dists (N, M)
                                                                  │
               TopK(k, axis=1, largest=0) ──────────────► topk_dists (N,k), topk_idx (N,k)
                                                                  │
    Gather(k_distances_train, topk_idx) ───────────────► k_dists_nbrs (N, k)
                                                                  │
    Max(topk_dists, k_dists_nbrs) ─────────────────────► reach_dists (N, k)
                                                                  │
    ReduceMean(axis=1) + 1e-10 ─────────────────────────► mean_reach (N,)
                                                                  │
    Div(1, mean_reach) ─────────────────────────────────► lrd_query (N,)
                                                                  │
    Gather(lrd_train, topk_idx) ────────────────────────► lrd_nbrs (N, k)
                                                                  │
    Div(lrd_nbrs, Unsqueeze(lrd_query,1)) ──────────────► lrd_ratios (N, k)
                                                                  │
    Neg(ReduceMean(lrd_ratios, axis=1)) ────────────────► score_samples (N,)
                                                                  │
    Sub(score_samples, offset_) ────────────────────────► decision (N,)
                                                                  │
    Where(decision >= 0, 1, -1) ────────────────────────► label (N,)
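The per-training-point constants appearing in the graph (k_distances_train, lrd_train, and offset_) are read from the fitted estimator at conversion time. A sketch of where they come from, noting that the first two are private scikit-learn attributes:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X_train = np.random.default_rng(1).normal(size=(60, 3))  # M = 60
est = LocalOutlierFactor(n_neighbors=15, novelty=True).fit(X_train)

# (M,) k-distance of each training point to its own k-th neighbour
k_distances_train = est._distances_fit_X_[:, est.n_neighbors_ - 1]
# (M,) precomputed local reachability density of each training point
lrd_train = est._lrd
# scalar threshold subtracted to form decision_function;
# -1.5 under the default contamination="auto"
offset = est.offset_
```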

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes and types defined by scikit-learn

  • outputs – desired output tensor names; two entries (label, scores) or one entry (label,)

  • estimator – a fitted LocalOutlierFactor with novelty=True

  • X – name of the input tensor

  • name – prefix used for names of nodes added by this converter

Returns:

the label tensor name when one output was requested, or the tuple (label, scores) when two were

Raises:
  • ValueError – if estimator.novelty is False

  • NotImplementedError – if the opset is below 18 (required for ReduceMean with axes provided as an input) or the distance metric is unsupported