memgonzales/meta-learning-clustering

Presented at the 2022 IEEE Region 10 Conference (TENCON 2022). Our main contribution is twofold: (1) the construction of a meta-learning model for recommending a distance metric for k-means clustering and (2) a fine-grained analysis of the importance and effects of the meta-features on the model's output.

This work was accepted for paper presentation at the 2022 IEEE Region 10 Conference (TENCON 2022), held virtually and in-person in Hong Kong:

  • The final version of our paper (as published in the conference proceedings of TENCON 2022) can be accessed via this link.
    • Our preprint can be accessed via this link.
    • Our TENCON 2022 presentation slides can be accessed via this link.
  • Our dataset of datasets is publicly released for future researchers.
  • Kindly refer to `0. Directory.ipynb` for a guide on navigating through this repository.

If you find our work useful, please consider citing:

```bibtex
@INPROCEEDINGS{9978037,
  author={Gonzales, Mark Edward M. and Uy, Lorene C. and Sy, Jacob Adrianne L. and Cordel, Macario O.},
  booktitle={TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON)},
  title={Distance Metric Recommendation for k-Means Clustering: A Meta-Learning Approach},
  year={2022},
  pages={1-6},
  doi={10.1109/TENCON55691.2022.9978037}
}
```

This repository is also archived on Zenodo.

Description

ABSTRACT: The choice of distance metric impacts the clustering quality of centroid-based algorithms, such as $k$-means. Theoretical attempts to select the optimal metric entail deep domain knowledge, while experimental approaches are resource-intensive. This paper presents a meta-learning approach to automatically recommend a distance metric for $k$-means clustering that optimizes the Davies-Bouldin score. Three distance measures were considered: Chebyshev, Euclidean, and Manhattan. General, statistical, information-theoretic, structural, and complexity meta-features were extracted, and random forest was used to construct the meta-learning model; borderline SMOTE was applied to address class imbalance. The model registered an accuracy of 70.59%. Employing Shapley additive explanations, it was found that the mean of the sparsity of the attributes has the highest meta-feature importance. Feeding only the top 25 most important meta-features increased the accuracy to 71.57%. The main contribution of this paper is twofold: the construction of a meta-learning model for distance metric recommendation and a fine-grained analysis of the importance and effects of the meta-features on the model's output.
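The labelling step described in the abstract (cluster each dataset under every candidate metric and keep the metric with the lowest Davies-Bouldin score) can be sketched as follows. This is an illustrative sketch, not the authors' code: scikit-learn's `KMeans` supports only Euclidean distance, so the sketch uses plain Lloyd iterations over SciPy's `cdist`; the function names and parameters are assumptions.

```python
# Hypothetical sketch of the dataset-labelling step: run k-means-style
# clustering under each candidate metric and keep the metric that yields
# the lowest (best) Davies-Bouldin score. Not the authors' implementation.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import davies_bouldin_score

def lloyd_kmeans(X, k, metric, n_iter=100, seed=0):
    """Plain Lloyd iterations with an arbitrary SciPy distance metric."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center under the chosen metric.
        labels = cdist(X, centers, metric=metric).argmin(axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def best_metric(X, k=3):
    """Label a dataset with the metric that minimizes Davies-Bouldin."""
    scores = {}
    for metric in ("chebyshev", "euclidean", "cityblock"):  # cityblock = Manhattan
        labels = lloyd_kmeans(X, k, metric)
        scores[metric] = davies_bouldin_score(X, labels)
    return min(scores, key=scores.get)
```

Applied over a collection of datasets, `best_metric` produces the class labels of the meta-dataset; the meta-features extracted from each dataset form the inputs.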

INDEX TERMS: meta-learning, meta-features, $k$-means, clustering, distance metric, random forest


Authors

This is the major course output in a machine learning class for master's students under Dr. Macario O. Cordel, II of the Department of Computer Technology, De La Salle University. The task is to create a ten-week investigatory project that applies machine learning to a particular research area or offers a substantial theoretical or algorithmic contribution to existing machine learning techniques.

