{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T00:18:14Z","timestamp":1773965894350,"version":"3.50.1"},"reference-count":42,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T00:00:00Z","timestamp":1714435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Early detection of infant pathologies by non-invasive means is a critical aspect of pediatric healthcare. Audio analysis of infant crying has emerged as a promising method to identify various health conditions without direct medical intervention. In this study, we present a cutting-edge machine learning model that employs audio spectrograms and transformer-based algorithms to classify infant crying into distinct pathological categories. Our innovative model bypasses the extensive preprocessing typically associated with audio data by exploiting the self-attention mechanisms of the transformer, thereby preserving the integrity of the audio\u2019s diagnostic features. When benchmarked against established machine learning and deep learning models, our approach demonstrated a remarkable 98.69% accuracy, 98.73% precision, 98.71% recall, and an F1 score of 98.71%, surpassing the performance of both traditional machine learning and convolutional neural network models. This research not only provides a novel diagnostic tool that is scalable and efficient but also opens avenues for improving pediatric care through early and accurate detection of pathologies.<\/jats:p>","DOI":"10.3390\/info15050253","type":"journal-article","created":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T08:14:31Z","timestamp":1714464871000},"page":"253","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Transformer-Based Approach to Pathology Diagnosis Using Audio Spectrogram"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-6634-157X","authenticated-orcid":false,"given":"Mohammad","family":"Tami","sequence":"first","affiliation":[{"name":"Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1592-2540","authenticated-orcid":false,"given":"Sari","family":"Masri","sequence":"additional","affiliation":[{"name":"Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmad","family":"Hasasneh","sequence":"additional","affiliation":[{"name":"Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3322-4882","authenticated-orcid":false,"given":"Chakib","family":"Tadj","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, \u00c9cole de Technologie Sup\u00e9rieur, Universit\u00e9 du Qu\u00e9bec, Montr\u00e9al, QC H3C 1K3, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,30]]},"reference":[{"key":"ref_1","unstructured":"World Health Organization (2024, January 02). Newborn Mortality. Available online: https:\/\/www.who.int\/news-room\/fact-sheets\/detail\/newborns-reducing-mortality."},{"key":"ref_2","unstructured":"National Heart, Lung, and Blood Institute (NHLBI) (2024, January 02). Respiratory Distress Syndrome (RDS), Available online: https:\/\/www.nhlbi.nih.gov\/health-topics\/respiratory-distress-syndrome."},{"key":"ref_3","unstructured":"World Health Organization (2024, January 02). Sepsis. Available online: https:\/\/www.who.int\/news-room\/fact-sheets\/detail\/sepsis."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"101986","DOI":"10.1016\/j.pupt.2020.101986","article-title":"Aerosolized Beractant in neonatal respiratory distress syndrome: A randomized fixed-dose parallel-arm phase II trial","volume":"66","author":"Sood","year":"2021","journal-title":"Pulm. Pharmacol. Ther."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"170","DOI":"10.5152\/TurkPediatriArs.2015.2627","article-title":"Factors which affect mortality in neonatal sepsis","volume":"50","author":"Turhan","year":"2015","journal-title":"T\u00fcrk. Pediatri. Ar\u015fivi"},{"key":"ref_6","unstructured":"(2024, January 02). Mayo Clinic. Available online: https:\/\/www.mayoclinic.org\/diseases-conditions\/ards\/diagnosis-treatment\/drc-20355581."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"179","DOI":"10.4161\/viru.27045","article-title":"Pediatric sepsis: Important considerations for diagnosing and managing severe infections in infants, children, and adolescents","volume":"5","author":"Randolph","year":"2014","journal-title":"Virulence"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Khalilzad, Z., Hasasneh, A., and Tadj, C. (2022). Newborn Cry-Based Diagnostic System to Distinguish between Sepsis and Respiratory Distress Syndrome Using Combined Acoustic Features. Diagnostics, 12.","DOI":"10.3390\/diagnostics12112802"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1994","DOI":"10.1016\/j.cub.2009.09.064","article-title":"Newborns\u2019 Cry Melody Is Shaped by Their Native Language","volume":"19","author":"Mampe","year":"2009","journal-title":"Curr. Biol."},{"key":"ref_10","unstructured":"(2024, January 02). The Cry of The Human Infant on JSTOR. Available online: https:\/\/www.jstor.org\/stable\/24950031."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Osmani, A., Hamidi, M., and Chibani, A. (2017, January 6\u20138). Machine Learning Approach for Infant Cry Interpretation. Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, USA.","DOI":"10.1109\/ICTAI.2017.00038"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wu, K., Zhang, C., Wu, X., Wu, D., and Niu, X. (2019, January 6\u20138). Research on Acoustic Feature Extraction of Crying for Early Screening of Children with Autism. Proceedings of the 2019 34rd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Jinzhou, China.","DOI":"10.1109\/YAC.2019.8787725"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1016\/j.cmpb.2011.07.010","article-title":"Normal and hypoacoustic infant cry signal classification using time\u2013frequency analysis and general regression neural network","volume":"108","author":"Hariharan","year":"2012","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Orlandi, S., Manfredi, C., Bocchi, L., and Scattoni, M.L. (September, January 28). Automatic newborn cry analysis: A Non-invasive tool to help autism early diagnosis. Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA.","DOI":"10.1109\/EMBC.2012.6346583"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zayed, Y., Hasasneh, A., and Tadj, C. (2023). Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features. Diagnostics, 13.","DOI":"10.3390\/diagnostics13122107"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/s13636-021-00197-5","article-title":"A review of infant cry analysis and classification","volume":"2021","author":"Ji","year":"2021","journal-title":"EURASIP J. Audio Speech Music Process."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1007\/s11517-008-0334-y","article-title":"Classification of cries of infants with cleft-palate using parallel hidden Markov models","volume":"46","author":"Lederman","year":"2008","journal-title":"Med. Biol. Eng. Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"819865","DOI":"10.3389\/fpubh.2022.819865","article-title":"A Multistage Heterogeneous Stacking Ensemble Model for Augmented Infant Cry Classification","volume":"10","author":"Joshi","year":"2022","journal-title":"Front. Public Health"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Patil, A.T., Kachhi, A., and Patil, H.A. (September, January 29). Subband Teager Energy Representations for Infant Cry Analysis and Classification. Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia.","DOI":"10.23919\/EUSIPCO55093.2022.9909974"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, L., Li, Y., and Kuo, K. (2018, January 23\u201325). Infant Cry Signal Detection, Pattern Extraction and Recognition. Proceedings of the 2018 International Conference on Information and Computer Technologies (ICICT), DeKalb, IL, USA.","DOI":"10.1109\/INFOCT.2018.8356861"},{"key":"ref_21","unstructured":"Cohen, R., Ruinskiy, D., Zickfeld, J., IJzerman, H., and Lavner, Y. (2020). Development and Analysis of Deep Learning Architectures, Springer."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"656","DOI":"10.1016\/j.jvoice.2015.08.007","article-title":"Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry","volume":"30","author":"Orlandi","year":"2016","journal-title":"J. Voice"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chang, C.-Y., and Li, J.-J. (2016, January 27\u201329). Application of Deep Learning for Recognizing Infant Cries. Proceedings of the 2016 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Nantou, Taiwan.","DOI":"10.1109\/ICCE-TW.2016.7520947"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"433","DOI":"10.18280\/isi.270309","article-title":"The Study of Learning System for Infant Cry Classification Using Discrete Wavelet Transform and Extreme Machine Learning","volume":"27","author":"Chaiwachiragompol","year":"2022","journal-title":"Ing\u00e9nierie Des. Syst\u00e8mes D Inf."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"670352","DOI":"10.3389\/fpubh.2021.670352","article-title":"Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models","volume":"9","author":"Vincent","year":"2021","journal-title":"Front. Public Health"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Felipe, G.Z., Aguiar, R.L., Costa, Y.M.G., Silla, C.N., Brahnam, S., Nanni, L., and McMurtrey, S. (2019, January 5\u20137). Identification of Infants\u2019 Cry Motivation Using Spectrograms. Proceedings of the 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), Osijek, Croatia.","DOI":"10.1109\/IWSSIP.2019.8787318"},{"key":"ref_27","unstructured":"Ji, C., Basodi, S., Xiao, X., and Pan, Y. (2020). International Conference on AI and Mobile Services, Springer International Publishing."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"118064","DOI":"10.1016\/j.eswa.2022.118064","article-title":"Classification of Asphyxia Infant Cry Using Hybrid Speech Features and Deep Learning Models","volume":"208","author":"Ting","year":"2022","journal-title":"Expert. Syst. Appl."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"111700","DOI":"10.1016\/j.chaos.2021.111700","article-title":"Deep learning systems for automatic diagnosis of infant cry signals","volume":"154","author":"Lahmiri","year":"2022","journal-title":"Chaos Solitons Fractals"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, Y., Tagliasacchi, M., Rybakov, O., Ungureanu, V., and Roblek, D. (2021, January 6\u201311). Real-Time Speech Frequency Bandwidth Extension. Proceedings of the ICASSP 2021\u20132021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413439"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1109\/MCOM.2006.1637953","article-title":"Challenges of 16 kHz in acoustic pre- and post-processing for terminals","volume":"44","author":"Beaugeant","year":"2006","journal-title":"IEEE Commun. Mag."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/TMM.2005.861292","article-title":"Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification","volume":"8","author":"Lie","year":"2006","journal-title":"IEEE Trans. Multimed."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lu, L., Liu, C., Li, J., and Gong, Y. (2020). Exploring Transformers for Large-Scale Speech Recognition. arXiv.","DOI":"10.21437\/Interspeech.2020-2638"},{"key":"ref_34","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_35","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language Models are Few-Shot Learners. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gong, Y., Chung, Y.A., and Glass, J. (2021). Ast: Audio spectrogram transformer. arXiv.","DOI":"10.21437\/Interspeech.2021-698"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, S., Loweimi, E., Bell, P., and Renals, S. (2021, January 19\u201322). On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT), Shenzhen, China.","DOI":"10.1109\/SLT48900.2021.9383521"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3495","DOI":"10.1109\/TMM.2022.3161851","article-title":"Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer","volume":"25","author":"Shih","year":"2023","journal-title":"IEEE Trans. Multimed."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3505244","article-title":"Transformers in Vision: A Survey","volume":"54","author":"Khan","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_40","unstructured":"Gong, Y., Lai, C.-I., Chung, Y.-A., and Glass, J. (March, January 22). SSAST: Self-Supervised Audio Spectrogram Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, Online."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"2438","DOI":"10.21437\/Interspeech.2022-10961","article-title":"MAE-AST: Masked Autoencoding Audio Spectrogram Transformer","volume":"2022","author":"Baade","year":"2022","journal-title":"Interspeech"},{"key":"ref_42","unstructured":"Gong, Y., Khurana, S., Rouditchenko, A., and Glass, J. (2022). Cmkd: Cnn\/transformer-based cross-model knowledge distillation for audio classification. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/5\/253\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:37:06Z","timestamp":1760107026000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/5\/253"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,30]]},"references-count":42,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,5]]}},"alternative-id":["info15050253"],"URL":"https:\/\/doi.org\/10.3390\/info15050253","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,30]]}}}