From-Scratch EM Algorithm for GMM Matches scikit-learn on UMAP-Reduced Text Data #31216
I implemented the EM algorithm for multivariate Gaussian Mixture Models from scratch and benchmarked it against `sklearn.mixture.GaussianMixture`. On a UMAP-reduced version of a high-dimensional text dataset, the results aligned almost perfectly:

- Matching mixing weights, means, and covariances
- Adjusted Rand Index = 1.0000
- Component assignments match after greedy alignment via L2 distance

The implementation is object-oriented, numerically stable (with covariance regularization), and tracks parameter convergence across iterations. A direct comparison to scikit-learn is included.

Notebook:
Core class:

Note: the convergence only matches this closely after dimensionality reduction with UMAP. On raw high-dimensional data, convergence is more sensitive to initialization.

Happy to share this as a learning tool or a discussion starter around reproducibility and clustering convergence diagnostics.
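For readers who want to see the core mechanics being described, here is a minimal sketch of EM for a multivariate GMM with the covariance regularization mentioned above. This is an illustrative stand-alone function, not the object-oriented class from the linked notebook; the function name `em_gmm`, the deterministic initialization, and the `reg` parameter are my own choices for the sketch.

```python
import numpy as np

def em_gmm(X, k, n_iter=200, reg=1e-6):
    """Minimal EM for a multivariate GMM (illustrative sketch only).

    reg adds a small ridge to each covariance for numerical stability,
    matching the 'covariance regularization' idea in the post.
    """
    n, d = X.shape
    # Simple deterministic init (evenly spaced points); k-means init is more robust.
    means = X[np.linspace(0, n - 1, k).astype(int)].copy()
    covs = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: log responsibilities from Gaussian log-densities.
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            _, logdet = np.linalg.slogdet(covs[j])
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[j]), diff)
            log_r[:, j] = (np.log(weights[j])
                           - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
        # Normalize in log space (log-sum-exp trick) for stability.
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weighted MLE updates for weights, means, covariances.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)

    return weights, means, covs, r.argmax(axis=1)
```

On well-separated data (as UMAP tends to produce), this converges to essentially the same parameters as `GaussianMixture`, up to a permutation of the components.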
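One detail worth noting: the Adjusted Rand Index is already invariant to label permutation, so the greedy L2 alignment is really needed when comparing parameters (weights, means, covariances) component-by-component. A hypothetical helper illustrating that alignment step might look like this; `greedy_align` is my own name, not from the posted code:

```python
import numpy as np

def greedy_align(means_a, means_b):
    """Greedily match each component in A to its nearest unmatched
    component in B by L2 distance between means.

    Returns a dict mapping component index in A -> component index in B.
    (Illustrative helper; the posted implementation may differ.)
    """
    k = len(means_a)
    unmatched = list(range(k))
    mapping = {}
    for i in range(k):
        dists = [np.linalg.norm(means_a[i] - means_b[j]) for j in unmatched]
        j = unmatched[int(np.argmin(dists))]
        mapping[i] = j
        unmatched.remove(j)
    return mapping
```

After aligning, you can permute one model's components and compare weights, means, and covariances entry-wise; greedy matching is not guaranteed optimal in general (the Hungarian algorithm via `scipy.optimize.linear_sum_assignment` is), but it works well when components are well separated.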