From-Scratch EM Algorithm for GMM Matches scikit-learn on UMAP-Reduced Text Data #31216

dimitris-markopoulos started this conversation in Show and tell
I implemented the EM algorithm for multivariate Gaussian Mixture Models from scratch and benchmarked it against `sklearn.mixture.GaussianMixture`. On a UMAP-reduced version of a high-dimensional text dataset, the results aligned almost perfectly:

- Matching mixing weights, means, and covariances
- Adjusted Rand Index = 1.0000
- Component assignments match after greedy alignment via L2 distance
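For anyone curious what the greedy alignment step looks like, here is a minimal sketch (my own illustration, not the code from `ml_utils.py`): each component of one model is paired with the nearest still-unmatched component of the other, by L2 distance between the component means, so that labels can be compared one-to-one.

```python
import numpy as np

def greedy_align(means_a, means_b):
    """Greedily pair each component of model A with its nearest
    still-unmatched component of model B (L2 distance between means)."""
    unmatched = list(range(len(means_b)))
    mapping = {}
    for i in range(len(means_a)):
        dists = [np.linalg.norm(means_a[i] - means_b[j]) for j in unmatched]
        j = unmatched[int(np.argmin(dists))]
        mapping[i] = j  # component i of A corresponds to component j of B
        unmatched.remove(j)
    return mapping
```

Once the mapping is applied to one model's hard assignments, label-invariant scores such as the Adjusted Rand Index can be read off directly.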

The implementation is object-oriented, numerically stable (with covariance regularization), and tracks parameter convergence across iterations. A direct comparison to scikit-learn is included.
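The core EM loop with covariance regularization can be sketched as follows. This is a compact functional illustration under my own assumptions (the actual class in `ml_utils.py` is object-oriented and also tracks per-iteration convergence); the key stability tricks are computing responsibilities from log-densities and adding `reg * I` to every covariance update so components stay invertible.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, reg=1e-6, init_means=None, seed=0):
    """Fit a full-covariance GMM by EM; reg * I is added to each
    covariance estimate for numerical stability."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = (np.asarray(init_means, dtype=float) if init_means is not None
             else X[rng.choice(n, size=k, replace=False)])
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from log-densities (shift-by-max for stability)
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            _, logdet = np.linalg.slogdet(covs[j])
            maha = np.einsum('ni,ni->n', diff, np.linalg.solve(covs[j], diff.T).T)
            log_r[:, j] = np.log(weights[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: regularized covariances keep every component positive definite
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs
```

On well-separated data this recovers the component parameters closely; `reg` plays the same role as scikit-learn's `reg_covar` parameter.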

Notebook: `06_em_algorithm_fit_gmm.ipynb`

Core class: `ml_utils.py`

Note: the fitted parameters only match this closely after dimensionality reduction with UMAP. On the raw high-dimensional data, convergence is more sensitive to initialization.

Happy to share this as a learning tool, or as a starting point for discussion on reproducibility and clustering convergence diagnostics.


If you are interested in seeing the entire project: here
