yobx.sklearn.decomposition.latent_dirichlet_allocation#

yobx.sklearn.decomposition.latent_dirichlet_allocation.sklearn_latent_dirichlet_allocation(g: GraphBuilderExtendedProtocol, sts: Dict, outputs: List[str], estimator: LatentDirichletAllocation, X: str, name: str = 'lda') str[source]#

Converts a sklearn.decomposition.LatentDirichletAllocation into ONNX.

The converter implements the variational E-step used by transform(). Starting from a uniform document-topic distribution, it iterates max_doc_update_iter times (no early-stopping tolerance check):

gamma  ←  ones((N, K))
exp_dt ←  exp(digamma(gamma) − digamma(rowsum(gamma)))

for _ in range(max_doc_update_iter):
    norm_phi ←  exp_dt @ exp_W + ε         (N, F)
    gamma    ←  exp_dt * ((X / norm_phi) @ exp_Wᵀ) + α   (N, K)
    exp_dt   ←  exp(digamma(gamma) − digamma(rowsum(gamma)))

output ←  gamma / rowsum(gamma)              (N, K)

where exp_W is exp_dirichlet_component_ (K × F), α is doc_topic_prior_, and ε is the floating-point machine epsilon.
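The iteration above can be sketched in plain NumPy. This is a minimal stand-alone sketch, not the converter itself: `N`, `K`, `F`, the random `exp_W` (a stand-in for `exp_dirichlet_component_`), `alpha` (a stand-in for `doc_topic_prior_`), and the iteration count are all illustrative assumptions.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
N, K, F = 4, 3, 6                                   # documents, topics, features
X = rng.poisson(2.0, size=(N, F)).astype(np.float64)  # word-count matrix (N, F)
exp_W = rng.random((K, F)) + 0.1                    # stand-in for exp_dirichlet_component_
alpha = 0.5                                         # stand-in for doc_topic_prior_
eps = np.finfo(np.float64).eps

# gamma starts uniform; exp_dt = exp(E[log theta]) under the variational posterior
gamma = np.ones((N, K))
exp_dt = np.exp(digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True)))

for _ in range(100):                                # max_doc_update_iter
    norm_phi = exp_dt @ exp_W + eps                 # (N, F)
    gamma = exp_dt * ((X / norm_phi) @ exp_W.T) + alpha  # (N, K)
    exp_dt = np.exp(digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True)))

doc_topic = gamma / gamma.sum(axis=1, keepdims=True)  # (N, K), rows sum to 1
```

Each row of `doc_topic` is a normalized document-topic distribution, matching the shape of the converter's output.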

Note

The digamma function is approximated via the asymptotic expansion ψ(x) ≈ ln(x) − 1/(2x) − 1/(12x²) + 1/(120x⁴) − 1/(252x⁶) after 8 recurrence steps. The approximation error is below 1e-9 for all positive inputs, comfortably within float32 precision.
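As a sanity check, the expansion described in the note can be reproduced directly: shift the argument up 8 times via the recurrence ψ(x) = ψ(x + 1) − 1/x, apply the asymptotic series, and compare against SciPy's reference implementation. The function name `approx_digamma` is illustrative, not part of the library.

```python
import numpy as np
from scipy.special import digamma as scipy_digamma

def approx_digamma(x):
    """Digamma via 8 recurrence steps followed by the asymptotic expansion:
    psi(x) = psi(x + 8) - sum_{k=0}^{7} 1/(x + k)."""
    x = np.asarray(x, dtype=np.float64)
    shift = sum(1.0 / (x + k) for k in range(8))        # recurrence correction
    y = x + 8.0                                         # shifted argument, y >= 8 for x > 0
    series = (np.log(y) - 1.0 / (2.0 * y) - 1.0 / (12.0 * y**2)
              + 1.0 / (120.0 * y**4) - 1.0 / (252.0 * y**6))
    return series - shift

xs = np.array([0.1, 0.5, 1.0, 3.0, 10.0, 100.0])
err = np.max(np.abs(approx_digamma(xs) - scipy_digamma(xs)))
```

With the argument shifted to at least 8, the first neglected series term is of order 1/(240·8⁸) ≈ 2.5e-10, consistent with the stated 1e-9 bound.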

Note

Unlike sklearn’s sparse implementation, this converter processes all word features densely. For documents with many zero counts the zero entries contribute nothing to the update, so the results match the sparse computation up to floating-point summation order.
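The dense/sparse equivalence can be verified on the inner term of the gamma update: accumulating only over nonzero entries of X gives the same result as the dense matrix product, because zero counts make `X / norm_phi` exactly zero. All array shapes and values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, F = 2, 3, 8
X = rng.poisson(0.5, size=(N, F)).astype(np.float64)  # sparse-ish word counts
exp_W = rng.random((K, F)) + 0.1
exp_dt = rng.random((N, K))
norm_phi = exp_dt @ exp_W + np.finfo(np.float64).eps

# dense form, as in the converter
dense = (X / norm_phi) @ exp_W.T                      # (N, K)

# sparse-style form: accumulate only over nonzero counts
sparse = np.zeros((N, K))
for d, f in zip(*np.nonzero(X)):
    sparse[d] += X[d, f] / norm_phi[d, f] * exp_W[:, f]
```

The two results agree to floating-point tolerance; only the summation order differs.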

Parameters:
  • g – the graph builder to add nodes to

  • sts – shapes defined by scikit-learn

  • outputs – desired output names (document-topic distribution)

  • estimator – a fitted LatentDirichletAllocation

  • X – input tensor name – word-count matrix (N, n_features)

  • name – prefix name for the added nodes

Returns:

output tensor name (N, n_components)