Abstract

One of the applications of center-based clustering algorithms such as K-means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can obtain a suitable feature space. Nevertheless, while K-means is one of the most efficient offline clustering algorithms, it is not equipped to estimate the number of clusters, which is useful in some practical cases. Other practical methods that do so are simply too complex, as they require at least one run of K-means for each possible K. To address this issue, we propose a K-means initialization similar to K-means++, which is able to estimate K based on the feature space while finding suitable initial centroids for K-means in a deterministic manner. We then compare the proposed method, DISCERN, with a few of the most practical K estimation methods, and also compare the clustering results of K-means when initialized randomly, with K-means++, and with DISCERN. The results show improvement in both the estimation and the final clustering performance.
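For context on the baseline the abstract refers to, the following is a minimal sketch of standard K-means++ seeding (Arthur and Vassilvitskii, 2007), not of DISCERN itself. The function names `sq_dist` and `kmeans_pp_init`, the `seed` parameter, and the toy data are illustrative assumptions and do not come from the article.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centroids from `points` with K-means++ seeding.

    The first centroid is drawn uniformly at random; each later one is drawn
    with probability proportional to its squared distance from the nearest
    centroid chosen so far.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen centroid
    d2 = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centroids


if __name__ == "__main__":
    # Hypothetical toy data: three loose groups in 2-D.
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
    print(kmeans_pp_init(data, k=3, seed=0))
```

Two properties of this baseline are worth noting against the abstract: `k` has to be supplied by the caller, and the selected centroids depend on the random seed. These are the two points the abstract says the proposed initialization is meant to address, by estimating K and selecting centroids deterministically.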

Acknowledgements

We would like to thank the anonymous reviewers for their valuable feedback and comments. We also thank Dr. Farid Saberi Movahed for his useful comments and discussions.

Author information

Authors and Affiliations

Department of Computer Science, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran
Ali Hassani & Amir Iranmanesh
Department of Applied Mathematics and Mahani Mathematical Research Center, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran
Abbas Salemi
Department of Computer Engineering, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran
Mahdi Eftekhari

Corresponding author

Correspondence to Abbas Salemi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hassani, A., Iranmanesh, A., Eftekhari, M. et al. DISCERN: diversity-based selection of centroids for k-estimation and rapid non-stochastic clustering. Int. J. Mach. Learn. & Cyber. 12, 635–649 (2021). https://doi.org/10.1007/s13042-020-01193-5

Received: 09 February 2020
Accepted: 28 August 2020
Published: 21 September 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s13042-020-01193-5
