
DISCERN: diversity-based selection of centroids for k-estimation and rapid non-stochastic clustering

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

One of the applications of center-based clustering algorithms such as K-means is partitioning data points into K clusters. In some cases, the feature space relates to the underlying problem we are trying to solve, and sometimes we can obtain a suitable feature space. Nevertheless, while K-means is one of the most efficient offline clustering algorithms, it is not equipped to estimate the number of clusters, which is useful in many practical cases. Other practical methods that can estimate K are computationally expensive, as they require at least one run of K-means for each candidate K. To address this issue, we propose a K-means initialization similar to K-means++ that estimates K from the feature space while finding suitable initial centroids for K-means in a deterministic manner. We then compare the proposed method, DISCERN, with several of the most practical K-estimation methods, and also compare the clustering results of K-means when initialized randomly, with K-means++, and with DISCERN. The results show improvement in both the estimation and the final clustering performance.
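The full DISCERN criterion is given in the paper itself; as a rough, hypothetical sketch of the general idea the abstract describes (deterministic, diversity-based seeding in the spirit of K-means++ but without random sampling), a maximin farthest-point selection over the feature space can be written as follows. The function name and the choice of the first seed are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def maximin_seeds(X, k):
    """Pick k seed centroids deterministically by farthest-point (maximin) selection.

    Start from the point nearest the data mean, then repeatedly add the
    point whose minimum squared distance to the already-chosen seeds is
    largest, i.e. the most "diverse" remaining point.
    """
    # First seed: the point closest to the overall mean (a deterministic choice).
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    seeds = [first]
    # Squared distance of every point to its nearest chosen seed so far.
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d2))  # farthest point from all chosen seeds
        seeds.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))
    return X[seeds]

# Example: three well-separated groups of points.
X = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.], [20., 0.]])
centroids = maximin_seeds(X, 3)
```

For a fixed k, seeds chosen this way can serve as initial centroids for K-means; DISCERN additionally derives an estimate of K from the feature space during selection, which this sketch omits.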



Acknowledgements

We would like to thank the anonymous reviewers for their valuable feedback and comments. We also thank Dr. Farid Saberi Movahed for his useful comments and discussions.

Author information

Authors and Affiliations

  1. Department of Computer Science, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran

    Ali Hassani & Amir Iranmanesh

  2. Department of Applied Mathematics and Mahani Mathematical Research Center, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran

    Abbas Salemi

  3. Department of Computer Engineering, Shahid Bahonar University of Kerman, Pajoohesh Square, Kerman, 76169-14111, Islamic Republic of Iran

    Mahdi Eftekhari

Authors
  1. Ali Hassani
  2. Amir Iranmanesh
  3. Mahdi Eftekhari
  4. Abbas Salemi

Corresponding author

Correspondence to Abbas Salemi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Hassani, A., Iranmanesh, A., Eftekhari, M. et al. DISCERN: diversity-based selection of centroids for k-estimation and rapid non-stochastic clustering. Int J Mach Learn Cybern 12, 635–649 (2021). https://doi.org/10.1007/s13042-020-01193-5
