yobx.sklearn.decomposition.latent_dirichlet_allocation#
- yobx.sklearn.decomposition.latent_dirichlet_allocation.sklearn_latent_dirichlet_allocation(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: LatentDirichletAllocation, X: str, name: str = 'lda') → str[source]#
Converts a sklearn.decomposition.LatentDirichletAllocation into ONNX.

The converter implements the variational E-step used by transform(). Starting from a uniform document-topic distribution, it iterates max_doc_update_iter times (no early-stopping tolerance check):

```
gamma  ← ones((N, K))
exp_dt ← exp(digamma(gamma) − digamma(rowsum(gamma)))
for _ in range(max_doc_update_iter):
    norm_phi ← exp_dt @ exp_W + ε                        (N, F)
    gamma    ← exp_dt * ((X / norm_phi) @ exp_Wᵀ) + α    (N, K)
    exp_dt   ← exp(digamma(gamma) − digamma(rowsum(gamma)))
output ← gamma / rowsum(gamma)                           (N, K)
```

where exp_W is exp_dirichlet_component_ (K × F), α is doc_topic_prior_, and ε is the floating-point machine epsilon.

Note
The digamma function is approximated via the asymptotic expansion

```
ψ(x) ≈ ln(x) − 1/(2x) − 1/(12x²) + 1/(120x⁴) − 1/(252x⁶)
```

applied after 8 steps of the recurrence ψ(x) = ψ(x + 1) − 1/x, which shifts the argument into the asymptotic regime. The approximation error is below 1e-9 for all positive inputs, comfortably within float32 precision.

Note
Unlike sklearn’s sparse implementation, this converter processes all word features densely. Zero word counts contribute nothing to the update, so the results are numerically identical; only the cost of the matrix products differs for documents with many zeros.
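The digamma approximation described in the note above can be sketched in NumPy. This is an illustrative model of the computation, not the converter's actual graph; approx_digamma is a hypothetical helper name:

```python
import numpy as np

def approx_digamma(x, steps=8):
    """Digamma via recurrence plus asymptotic expansion.

    Applies psi(x) = psi(x + 1) - 1/x `steps` times to push the argument
    into the asymptotic regime, then evaluates
    psi(x) ~ ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    x = np.asarray(x, dtype=np.float64)
    acc = np.zeros_like(x)
    for _ in range(steps):
        acc -= 1.0 / x       # accumulate the recurrence corrections
        x = x + 1.0          # shift argument upward
    inv2 = 1.0 / (x * x)
    # Horner-style evaluation of the even-power tail of the expansion.
    tail = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + np.log(x) - 0.5 / x - tail
```

With 8 shift steps the smallest argument fed to the expansion is x + 8, where the first neglected term, 1/(240 x⁸), is already below 1e-10, consistent with the 1e-9 bound quoted above.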
- Parameters:
g – the graph builder to add nodes to
sts – shapes defined by scikit-learn
outputs – desired output names (document-topic distribution)
estimator – a fitted LatentDirichletAllocation
X – input tensor name, the word-count matrix (N, n_features)
name – prefix name for the added nodes
- Returns:
output tensor name, shape (N, n_components)
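For reference, the E-step pseudocode above can be mirrored densely in NumPy. This is a sketch of the computation the generated graph performs, not the converter itself; lda_transform is a hypothetical name, and scipy.special.digamma stands in for the graph's digamma approximation:

```python
import numpy as np
from scipy.special import digamma

def lda_transform(X, exp_W, alpha, max_doc_update_iter=100):
    """Dense variational E-step, following the pseudocode above.

    X      : (N, F) word-count matrix
    exp_W  : (K, F) exp_dirichlet_component_ of the fitted estimator
    alpha  : scalar doc_topic_prior_
    """
    N, _ = X.shape
    K = exp_W.shape[0]
    eps = np.finfo(X.dtype).eps
    gamma = np.ones((N, K), dtype=X.dtype)  # uniform start
    exp_dt = np.exp(digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True)))
    for _ in range(max_doc_update_iter):    # no early-stopping check
        norm_phi = exp_dt @ exp_W + eps                       # (N, F)
        gamma = exp_dt * ((X / norm_phi) @ exp_W.T) + alpha   # (N, K)
        exp_dt = np.exp(digamma(gamma)
                        - digamma(gamma.sum(axis=1, keepdims=True)))
    return gamma / gamma.sum(axis=1, keepdims=True)           # (N, K)
```

Each output row is a document-topic distribution, so rows are positive and sum to 1.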