memgonzales/meta-learning-clustering

Presented at the 2022 IEEE Region 10 Conference (TENCON 2022). Our main contribution is twofold: (1) the construction of a meta-learning model for recommending a distance metric for k-means clustering and (2) a fine-grained analysis of the importance and effects of the meta-features on the model's output.

This work was accepted for paper presentation at the 2022 IEEE Region 10 Conference (TENCON 2022), held virtually and in-person in Hong Kong:

  • The final version of our paper (as published in the conference proceedings of TENCON 2022) can be accessed via this link.
    • Our preprint can be accessed via this link.
    • Our TENCON 2022 presentation slides can be accessed via this link.
  • Our dataset of datasets is publicly released for future researchers.
  • Kindly refer to `0. Directory.ipynb` for a guide on navigating through this repository.

If you find our work useful, please consider citing:

```bibtex
@INPROCEEDINGS{9978037,
  author={Gonzales, Mark Edward M. and Uy, Lorene C. and Sy, Jacob Adrianne L. and Cordel, Macario O.},
  booktitle={TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON)},
  title={Distance Metric Recommendation for k-Means Clustering: A Meta-Learning Approach},
  year={2022},
  pages={1-6},
  doi={10.1109/TENCON55691.2022.9978037}
}
```

This repository is also archived on Zenodo.

Description

ABSTRACT: The choice of distance metric impacts the clustering quality of centroid-based algorithms, such as $k$-means. Theoretical attempts to select the optimal metric entail deep domain knowledge, while experimental approaches are resource-intensive. This paper presents a meta-learning approach to automatically recommend a distance metric for $k$-means clustering that optimizes the Davies-Bouldin score. Three distance measures were considered: Chebyshev, Euclidean, and Manhattan. General, statistical, information-theoretic, structural, and complexity meta-features were extracted, and random forest was used to construct the meta-learning model; borderline SMOTE was applied to address class imbalance. The model registered an accuracy of 70.59%. Employing Shapley additive explanations, it was found that the mean of the sparsity of the attributes has the highest meta-feature importance. Feeding only the top 25 most important meta-features increased the accuracy to 71.57%. The main contribution of this paper is twofold: the construction of a meta-learning model for distance metric recommendation and a fine-grained analysis of the importance and effects of the meta-features on the model's output.
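The labelling step described in the abstract (cluster each dataset under every candidate metric and keep the metric with the lowest Davies-Bouldin score) can be sketched as follows. This is an illustrative sketch, not the authors' code: scikit-learn's `KMeans` supports only Euclidean distance, so the sketch uses plain Lloyd iterations over SciPy's `cdist`; the function names and parameters are assumptions.

```python
# Hypothetical sketch of the dataset-labelling step: run k-means-style
# clustering under each candidate metric and keep the metric that yields
# the lowest (best) Davies-Bouldin score. Not the authors' implementation.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import davies_bouldin_score

def lloyd_kmeans(X, k, metric, n_iter=100, seed=0):
    """Plain Lloyd iterations with an arbitrary SciPy distance metric."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center under the chosen metric.
        labels = cdist(X, centers, metric=metric).argmin(axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def best_metric(X, k=3):
    """Label a dataset with the metric that minimizes Davies-Bouldin."""
    scores = {}
    for metric in ("chebyshev", "euclidean", "cityblock"):  # cityblock = Manhattan
        labels = lloyd_kmeans(X, k, metric)
        scores[metric] = davies_bouldin_score(X, labels)
    return min(scores, key=scores.get)
```

Applied over a collection of datasets, `best_metric` produces the class labels of the meta-dataset; the meta-features extracted from each dataset form the inputs.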

INDEX TERMS: meta-learning, meta-features, $k$-means, clustering, distance metric, random forest


Authors

This is the major course output in a machine learning class for master's students under Dr. Macario O. Cordel, II of the Department of Computer Technology, De La Salle University. The task is to create a ten-week investigatory project that applies machine learning to a particular research area or offers a substantial theoretical or algorithmic contribution to existing machine learning techniques.

