Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10196)
Abstract
Feature construction is a pre-processing technique that creates new features with better discriminating ability from the original features. Genetic programming (GP) has been shown to be a prominent technique for this task. However, applying GP to high-dimensional data is still challenging due to the large search space. Feature clustering groups similar features into clusters, which can be used for dimensionality reduction by choosing representative features from each cluster to form the feature subset. Feature clustering has shown promise in feature selection, but it has not been investigated in feature construction for classification. This paper presents the first work on utilising feature clustering in this area. We propose a cluster-based GP feature construction method called CGPFC, which uses feature clustering to improve the performance of GP for feature construction on high-dimensional data. Results on eight high-dimensional datasets of varying difficulty show that the features constructed by CGPFC perform better than the original full feature set and than features constructed by the standard GP constructor based on the whole feature set.
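The general idea of feature clustering for dimensionality reduction can be illustrated with a minimal sketch. This is not the CGPFC method: the abstract does not specify the similarity measure or the clustering procedure, so the greedy grouping by absolute Pearson correlation, the 0.9 threshold, the label-correlation rule for choosing representatives, and the function names `cluster_features` and `pick_representatives` are all assumptions made purely for illustration.

```python
import numpy as np

def cluster_features(X, threshold=0.9):
    """Greedily group columns of X (illustrative assumption, not CGPFC).

    A feature joins the first cluster whose seed (first member) it is
    correlated with at |Pearson r| >= threshold; otherwise it seeds a
    new cluster.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    clusters = []  # each cluster is a list of column indices; index 0 is the seed
    for j in range(X.shape[1]):
        for members in clusters:
            if corr[j, members[0]] >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def pick_representatives(X, y, clusters):
    """Keep, from each cluster, the feature most correlated with the label y."""
    reps = []
    for members in clusters:
        scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in members]
        reps.append(members[int(np.argmax(scores))])
    return reps

# Synthetic data: three independent signals, each duplicated with small noise,
# giving six features in three highly redundant pairs.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=(n, 3))
X = np.repeat(base, 2, axis=1) + 0.01 * rng.normal(size=(n, 6))
y = (base[:, 0] + base[:, 1] > 0).astype(int)

clusters = cluster_features(X, threshold=0.9)
reps = pick_representatives(X, y, clusters)
print(clusters)  # groups of redundant feature indices
print(reps)      # one representative index per cluster
```

In this toy setting the six features collapse to three representatives, one per redundant pair. A feature construction method could then search over the reduced set instead of all original features, which is the kind of search-space reduction the abstract motivates.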
Author information
Authors and Affiliations
School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington, 6140, New Zealand
Binh Tran, Bing Xue & Mengjie Zhang
Editor information
Editors and Affiliations
University College Dublin, Dublin, Ireland
James McDermott
Universidade Nova de Lisboa, Lisbon, Portugal
Mauro Castelli
Brno University of Technology, Brno, Czech Republic
Lukas Sekanina
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Evert Haasdijk
University of Cádiz, Cádiz, Spain
Pablo García-Sánchez
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tran, B., Xue, B., Zhang, M. (2017). Using Feature Clustering for GP-Based Feature Construction on High-Dimensional Data. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2017. Lecture Notes in Computer Science, vol 10196. Springer, Cham. https://doi.org/10.1007/978-3-319-55696-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55695-6
Online ISBN: 978-3-319-55696-3
eBook Packages: Computer Science, Computer Science (R0)