- Naila Aslam,
- Ohoud Alzamzami,
- Kewen Xia,
- Saima Sadiq,
- Muhammad Umer,
- Carmen Bisogni (ORCID: orcid.org/0000-0003-1358-006X) &
- Imran Ashraf
This article has been updated
Abstract
Online reviews play an integral part in making mobile applications stand out from the large number of applications available on the Google Play store. Users rely predominantly on posted reviews when selecting an app. Manual categorization of such reviews is both inefficient and time-consuming. Automatic sentiment analysis of these reviews can therefore provide fast suggestions for new users and help them select an appropriate app. However, data imbalance is a major challenge for class prediction on such reviews, because their distribution is sparse, which often leads to low accuracy. This work proposes a framework to overcome this limitation. Extensive experiments are performed on the original data and on data balanced with the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN). Additionally, deep learning and machine learning models are evaluated using FastText, FastText Subword, global vectors (GloVe), and their combinations for word representation. Baseline machine learning models, including random forest, extra tree classifier, gradient boosting, Naive Bayes, logistic regression (LR), stochastic gradient descent (SGD), and a voting classifier (VC) that combines LR and SGD, are used for comparison. The results show that the convolutional neural network using a combination of word embedding techniques produces the most accurate results.
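To make the balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, not the authors' implementation. It assumes a binary problem with numeric feature vectors, and the function names `smote_oversample` and `balance`, the neighbour count `k`, and the toy data in the usage note are all hypothetical choices made for illustration. In practice a maintained library implementation would normally be preferred.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative sketch only: each synthetic point lies on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes exactly two classes (an assumption of this sketch).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote_oversample(minority, n_majority - len(minority), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

As a usage example, calling `balance(X, y, minority_label=1, k=2)` on a toy set with three samples of class 1 and five of class 0 returns ten samples, five per class, where the two added points fall between existing class-1 samples. ADASYN follows the same interpolation scheme but generates more synthetic points around minority samples that are harder to learn, which this sketch does not attempt.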
Data availability statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Change history
15 February 2023
The email address of author Kewen Xia has been corrected in the original version.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. U1813222, No. 42075129), the Hebei Province Natural Science Foundation (No. E2021202179), and the Key Research and Development Project from Hebei Province (No. 19210404D, No. 20351802D, No. 21351803D).
Author information
Authors and Affiliations
School of Electronics and Information Engineering, Hebei University of Technology, Tianjin, China
Naila Aslam & Kewen Xia
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Ohoud Alzamzami
Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Saima Sadiq
Department of Computer Science and Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
Muhammad Umer
Department of Computer Science, University of Salerno, Fisciano, Italy
Carmen Bisogni
Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of Korea
Imran Ashraf
Corresponding authors
Correspondence to Kewen Xia or Carmen Bisogni.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aslam, N., Alzamzami, O., Xia, K. et al. Improving the review classification of Google apps using combined feature embedding and deep convolutional neural network model. J Ambient Intell Human Comput 14, 4257–4272 (2023). https://doi.org/10.1007/s12652-023-04529-5