Abstract
One of the applications of center-based clustering algorithms such as K-means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can obtain a suitable feature space. Nevertheless, while K-means is one of the most efficient offline clustering algorithms, it is not equipped to estimate the number of clusters, which is useful in some practical cases. Other practical methods which do are simply too complex, as they require at least one run of K-means for each possible K. In order to address this issue, we propose a K-means initialization similar to K-means++, which would be able to estimate K based on the feature space while finding suitable initial centroids for K-means in a deterministic manner. Then we compare the proposed method, DISCERN, with a few of the most practical K estimation methods, while also comparing clustering results of K-means when initialized randomly, using K-means++ and using DISCERN. The results show improvement in both the estimation and final clustering performance.
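For context, the K-means++ seeding that the abstract uses as a baseline can be sketched roughly as below. This is not DISCERN, whose details are not given in this abstract; it is the standard D² sampling idea from Arthur and Vassilvitskii, written in plain Python. The names `squared_distance` and `kmeans_pp_seeds` and the toy data are illustrative assumptions, and the sketch assumes that K does not exceed the number of distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centroids by D^2 sampling (K-means++ style seeding).

    Illustrative helper, not the DISCERN method.
    """
    # First centroid: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen centroid.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            raise ValueError("k exceeds the number of distinct points")
        # Next centroid: sampled with probability proportional to d2.
        index = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[index])
        # Refresh nearest-centroid distances with the new centroid.
        d2 = [min(old, squared_distance(p, points[index]))
              for p, old in zip(points, d2)]
    return centers


# Toy data (assumed for illustration): three well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
              (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
print(kmeans_pp_seeds(toy_points, 3, random.Random(0)))
```

Two properties of this baseline match the limitations the abstract raises: K has to be supplied by the caller, and the chosen seeds depend on the random number generator, so repeated runs can differ.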
Acknowledgements
We would like to thank the anonymous reviewers for their valuable feedback and comments. We also thank Dr. Farid Saberi Movahed for his useful comments and discussions.
Author information
Authors and Affiliations
Department of Computer Science, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran
Ali Hassani & Amir Iranmanesh
Department of Applied Mathematics and Mahani Mathematical Research Center, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran
Abbas Salemi
Department of Computer Engineering, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran
Mahdi Eftekhari
Corresponding author
Correspondence to Abbas Salemi.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hassani, A., Iranmanesh, A., Eftekhari, M. et al. DISCERN: diversity-based selection of centroids for k-estimation and rapid non-stochastic clustering. Int. J. Mach. Learn. & Cyber. 12, 635–649 (2021). https://doi.org/10.1007/s13042-020-01193-5