Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12468)
Abstract
In many knowledge discovery applications, finding outliers, i.e. objects that behave in an unexpected way or have abnormal properties, is more interesting than finding inliers in a dataset. Outlier detection is important for many applications, including intrusion detection, credit card fraud detection, and the detection of criminal activity in e-commerce. Several outlier detection methods have been proposed, many of them from the perspective of Rough Set Theory, but to date none of them is specifically intended for multi-label datasets. In this paper, we propose a method that measures the degree of anomaly of an object in a multi-label dataset. This score quantifies how irregular an object is with respect to the rest of the dataset. In addition, we propose a method for generating anomalies in this type of dataset. Using these synthetic datasets, we demonstrate the efficacy of the proposed method. The results show that our proposal outperforms other methods from the literature adapted to multi-label problems.
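The abstract does not give the rough-set formulation of the proposed anomaly score, so the following is only a minimal sketch of the general idea of scoring irregularity in a multi-label dataset. It is not the authors' method. It substitutes a simple k-nearest-neighbour heuristic: an object whose label vector disagrees (by normalized Hamming distance) with the labels of its nearest neighbours in feature space receives a high score. It assumes numeric, comparably scaled features and binary label vectors. The function names `knn_label_anomaly_scores` and `inject_label_anomalies` are hypothetical, and the anomaly generator (complementing the label vectors of a random subset of objects) is likewise an illustrative assumption, not the generation procedure described in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized Hamming distance between two binary label vectors of equal length."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_label_anomaly_scores(X: List[List[float]], Y: List[List[int]], k: int = 3) -> List[float]:
    """Hypothetical score in [0, 1]: mean label disagreement with the k nearest neighbours.

    Not the rough-set measure of the paper; a kNN/Hamming heuristic used for illustration.
    Features are assumed numeric and comparably scaled (Euclidean distance).
    """
    n = len(X)
    scores = []
    for i in range(n):
        # Sort the other objects by Euclidean distance to object i (ties broken by index).
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        # Average label disagreement between object i and its neighbours.
        scores.append(sum(hamming(Y[i], Y[j]) for j in neighbours) / len(neighbours))
    return scores


def inject_label_anomalies(Y: List[List[int]], fraction: float = 0.1, seed: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical generator: complement the label vectors of a random subset of objects."""
    rng = random.Random(seed)
    Y_new = [list(y) for y in Y]
    m = max(1, int(round(fraction * len(Y))))
    idx = rng.sample(range(len(Y)), m)
    for i in idx:
        Y_new[i] = [1 - v for v in Y_new[i]]
    return Y_new, sorted(idx)


if __name__ == "__main__":
    # Two well-separated clusters, each with a consistent label set.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    Y = [[1, 0, 0] for _ in range(4)] + [[0, 1, 1] for _ in range(4)]
    Y_noisy, injected = inject_label_anomalies(Y, fraction=0.125, seed=0)
    scores = knn_label_anomaly_scores(X, Y_noisy, k=3)
    top = max(range(len(scores)), key=scores.__getitem__)
    print("injected:", injected, "highest-scoring object:", top)
```

In this toy setting each object's three nearest neighbours are the other members of its cluster, so the object whose labels were complemented disagrees with all of them and receives the maximum score, while its cluster mates are only partly affected.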
Author information
Authors and Affiliations
Computer Science Department, Universidad Central de Las Villas, Santa Clara, Cuba
Marilyn Bello, Rafael Morera & Rafael Bello
Faculty of Business Economics, Hasselt University, Hasselt, Belgium
Marilyn Bello, Gonzalo Nápoles & Koen Vanhoof
Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
Gonzalo Nápoles
Corresponding author
Correspondence to Marilyn Bello.
Editor information
Editors and Affiliations
Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
Lourdes Martínez-Villaseñor
Universidad Autónoma Metropolitana, Mexico City, Mexico
Oscar Herrera-Alcántara
Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
Hiram Ponce
Universidad Autónoma del Estado de Hidalgo, Hidalgo, Mexico
Félix A. Castro-Espinoza
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bello, M., Nápoles, G., Morera, R., Vanhoof, K., Bello, R. (2020). Outliers Detection in Multi-label Datasets. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Soft Computing. MICAI 2020. Lecture Notes in Computer Science, vol 12468. Springer, Cham. https://doi.org/10.1007/978-3-030-60884-2_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60883-5
Online ISBN: 978-3-030-60884-2
eBook Packages: Computer Science, Computer Science (R0)