Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9437)
Abstract
In this paper we describe our submission to the IJCRS’15 Data Mining Competition, which concerns predicting dangerous methane concentrations in the longwalls of a Polish coal mine. We address the challenge of building robust classification models from time series data using support vector machines (SVMs). We also investigate how tuning SVM parameters with grid search affects classification performance and helps prevent over-fitting. Our results show that proper parameter tuning improves not only predictive performance but also the stability of the classification models, even when the test data comes from a different time period and has a different class distribution. With the proposed method we built a classification model that performed better on unseen test data than on the training data, which indicates that the model did not over-fit. The submitted solution was about 2% behind the winning solution.
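To illustrate the kind of grid search over SVM parameters that the abstract refers to, here is a minimal sketch using scikit-learn. It is not the authors' code. The synthetic data, the RBF kernel, the grid values for `C` and `gamma`, the ROC AUC scoring, and the helper name `tune_svm` are all assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X_train, y_train, random_state=0):
    """Hypothetical helper: cross-validated grid search over C and gamma."""
    # Scaling lives inside the pipeline so it is re-fitted on each training
    # fold and never sees the corresponding validation fold.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("svm", SVC(kernel="rbf")),
    ])
    # Assumed grid: a coarse logarithmic sweep, not the values from the paper.
    param_grid = {
        "svm__C": [0.1, 1, 10, 100],
        "svm__gamma": [1e-3, 1e-2, 1e-1, 1],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    return search


# Synthetic, imbalanced stand-in for features extracted from sensor time series.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

search = tune_svm(X_train, y_train)

# Compare the cross-validated score with the score on held-out data; a large
# gap between the two would be one sign of over-fitting.
test_auc = roc_auc_score(y_test, search.decision_function(X_test))
print("best parameters:", search.best_params_)
print("cross-validated AUC:", round(search.best_score_, 3))
print("held-out AUC:", round(test_auc, 3))
```

A random split like the one above does not reproduce the setting described in the abstract, where the test data comes from a different time period. For time-ordered data, a time-aware splitter such as scikit-learn's `TimeSeriesSplit` may be a closer match.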
P. Lameski: This work was partially financed by the Faculty of Computer Science and Engineering at the Ss. Cyril and Methodius University, Skopje, Macedonia.
Author information
Authors and Affiliations
Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, Skopje, Macedonia
Petre Lameski, Eftim Zdravevski & Andrea Kulakov
NI TEKNA - Intelligent Technologies, Negotino, Macedonia
Riste Mingov
Corresponding author
Correspondence to Eftim Zdravevski.
Editor information
Editors and Affiliations
University of Regina, Regina, SK, Canada
Yiyu Yao
Tianjin University, Tianjin, China
Qinghua Hu
Chongqing University of Posts and Telecommunications, Chongqing, China
Hong Yu
University of Kansas, Lawrence, KS, USA
Jerzy W. Grzymala-Busse
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lameski, P., Zdravevski, E., Mingov, R., Kulakov, A. (2015). SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-fitting. In: Yao, Y., Hu, Q., Yu, H., Grzymala-Busse, J.W. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. Lecture Notes in Computer Science, vol 9437. Springer, Cham. https://doi.org/10.1007/978-3-319-25783-9_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25782-2
Online ISBN: 978-3-319-25783-9
eBook Packages: Computer Science, Computer Science (R0)