Abstract
This article examines deep learning models and the Fastai text classification technique for predicting 25 medical problems from medical speech utterances, their transcriptions, and intent. The experiments used a dataset of 6661 .wav files and one .csv file with 13 distinct categorization fields of medical speech utterances. Exploratory data analysis of each illness showed the phrase-length classes and the disease categorization based on patients' recorded speech for each disease. Preprocessing included generating a word cloud of all vocabulary words, sized according to the number of speech utterances in each category, eliminating NaN values, checking for duplicates, and computing the corpus and its term index. Features were then extracted to determine the number of words in each category, the length of phrases, and the number of words in each phrase, followed by lemmatization and tokenization. Deep learning models, namely GRU (Gated Recurrent Unit), LSTM (Long Short-Term Memory), bidirectional GRU, bidirectional LSTM, and the Fastai classifier, were used to extract the disease category from the medical speech utterances and their textual phrases. In the evaluation, Fastai achieved the highest precision, recall, and accuracy, at 96.89%, 95.8%, and 93.32%, respectively, and the lowest loss, at 0.169. Bidirectional LSTM achieved the highest F1 score, at 95.69%, for predicting the category of each medical speech utterance.
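The preprocessing and feature-extraction steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the sample rows, the field names `phrase` and `category`, and all helper functions are assumptions made for illustration, and lemmatization is omitted because it would require an external library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical rows standing in for the .csv file; the field names
# "phrase" and "category" are assumptions, not the dataset's real columns.
rows = [
    {"phrase": "My knee hurts when I walk", "category": "Knee pain"},
    {"phrase": "My knee hurts when I walk", "category": "Knee pain"},  # duplicate
    {"phrase": None, "category": "Cough"},                             # missing value
    {"phrase": "I cannot stop coughing at night", "category": "Cough"},
    {"phrase": "There is a sharp pain in my knee", "category": "Knee pain"},
]


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean(records):
    """Drop rows with missing fields, then drop exact duplicates (order kept)."""
    seen = set()
    kept = []
    for rec in records:
        if any(is_missing(v) for v in rec.values()):
            continue
        key = (rec["phrase"], rec["category"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def tokenize(text):
    """Lowercase the text and split it into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_term_index(records):
    """Map each vocabulary word to an integer id, in first-seen order."""
    index = {}
    for rec in records:
        for tok in tokenize(rec["phrase"]):
            index.setdefault(tok, len(index))
    return index


def category_features(records):
    """Per category: number of phrases, total words, mean words per phrase."""
    phrases = Counter()
    words = defaultdict(int)
    for rec in records:
        phrases[rec["category"]] += 1
        words[rec["category"]] += len(tokenize(rec["phrase"]))
    return {
        cat: {"phrases": n, "words": words[cat], "mean_words": words[cat] / n}
        for cat, n in phrases.items()
    }


cleaned = clean(rows)
term_index = build_term_index(cleaned)
features = category_features(cleaned)
```

In this sketch the per-category word counts play the role of the features that would size a word cloud, and `term_index` stands in for the corpus term index that a downstream classifier would consume.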
Data availability
Not applicable.
Funding
Not applicable.
Author information
Authors and Affiliations
Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Yogesh Kumar
Department of Computer Science and Engineering, Punjabi University, Patiala, India
Apeksha Koul
Department of Computer Engineering, Indus Institute of Technology & Engineering, Indus University, Rancharda, Shilaj, Ahmedabad, 382115, Gujarat, India
Seema Mahajan
Corresponding author
Correspondence to Yogesh Kumar.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kumar, Y., Koul, A. & Mahajan, S. A deep learning approaches and fastai text classification to predict 25 medical diseases from medical speech utterances, transcription and intent. Soft Comput 26, 8253–8272 (2022). https://doi.org/10.1007/s00500-022-07261-y