{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T15:11:29Z","timestamp":1778166689297,"version":"3.51.4"},"reference-count":42,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2019,4,11]],"date-time":"2019-04-11T00:00:00Z","timestamp":1554940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The Institute for Information & Communications Technology Planning & Promotion (IITP)","award":["R0126-15-1119"],"award-info":[{"award-number":["R0126-15-1119"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Automatic gender classification in speech is a challenging research field with a wide range of applications in HCI (human-computer interaction). A couple of decades of research have shown promising results, but there is still a need for improvement. Until now, gender classification has been made using differences in the spectral characteristics of males and females. We assumed that a neutral margin exists between the male and female spectral range. This margin causes misclassification of gender. To address this limitation, we studied three non-lexical speech features (fillers, overlapping, and lengthening). From the statistical analysis, we found that overlapping and lengthening are effective in gender classification. Next, we performed gender classification using overlapping, lengthening, and the baseline acoustic feature, Mel Frequency Cepstral Coefficient (MFCC). We have tried to achieve the best results by using various combinations of features at the same time or sequentially. We used two types of machine-learning methods, support vector machine (SVM) and recurrent neural networks (RNN), to classify the gender. 
We achieved 89.61% with RNN using a feature set including MFCC, overlapping, and lengthening at the same time. Also, we have reclassified using non-lexical features with only data belonging to the neutral margin which was empirically selected based on the result of gender classification with only MFCC. As a result, we determined that the accuracy of classification with RNN using lengthening was 1.83% better than when MFCC alone was used. We concluded that new speech features could be effective in improving gender classification through a behavioral approach, notably including emergency calls.<\/jats:p>","DOI":"10.3390\/sym11040525","type":"journal-article","created":{"date-parts":[[2019,4,12]],"date-time":"2019-04-12T03:46:37Z","timestamp":1555040797000},"page":"525","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Gender Classification Based on the Non-Lexical Cues of Emergency Calls with Recurrent Neural Networks (RNN)"],"prefix":"10.3390","volume":"11","author":[{"given":"Guiyoung","family":"Son","sequence":"first","affiliation":[{"name":"Department of Software, Sejong University, 209, Neung-dong-ro, Gwangjin-gu, Seoul 05006, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Soonil","family":"Kwon","sequence":"additional","affiliation":[{"name":"Department of Software, Sejong University, 209, Neung-dong-ro, Gwangjin-gu, Seoul 05006, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Neungsoo","family":"Park","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering Kunkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,4,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Groves, R.M., O\u2019Hare, B.C., Gould-Smith, D., Benki, J., and Maher, P. (2008). 
Telephone interviewer voice characteristics and the survey participation decision. Adv. Teleph. Surv. Methodol., 385\u2013400.","DOI":"10.1002\/9780470173404.ch18"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.csl.2012.01.008","article-title":"Automatic speaker age and gender recognition using acoustic and prosodic level information fusion","volume":"27","author":"Li","year":"2013","journal-title":"Comput. Speech Lang."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1959","DOI":"10.1109\/TNNLS.2016.2550532","article-title":"Adaptation to new microphones using artificial neural networks with trainable activation functions","volume":"28","author":"Siniscalchi","year":"2017","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_4","unstructured":"Naini, A.S., and Homayounpour, M. (2006, January 16\u201320). Speaker age interval and sex identification based on jitters, shimmers and mean mfcc using supervised and unsupervised discriminative classification methods. Proceedings of the 2006 8th international Conference on Signal Processing, Beijing, China."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zeng, Y.-M., Wu, Z.-Y., Falk, T., and Chan, W.-Y. (2006, January 13\u201316). Robust GMM based gender classification using pitch and RASTA-PLP parameters of speech. Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, Dalian, China.","DOI":"10.1109\/ICMLC.2006.258497"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Metze, F., Ajmera, J., Englert, R., Bub, U., Burkhardt, F., Stegmann, J., Muller, C., Huber, R., Andrassy, B., and Bauer, J.G. (2007, January 15\u201320). Comparison of four approaches to age and gender recognition for telephone applications. 
Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP\u201907, Honolulu, HI, USA.","DOI":"10.1109\/ICASSP.2007.367263"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Vergin, R., Farhat, A., and O\u2019Shaughnessy, D. (1996, January 3\u20136). Robust gender-dependent acoustic-phonetic modelling in continuous speech recognition based on a new automatic male\/female classification. Proceedings of the Fourth International Conference on Spoken Language Processing, Philadelphia, PA, USA.","DOI":"10.21437\/ICSLP.1996-284"},{"key":"ref_8","unstructured":"Ververidis, D., and Kotropoulos, C. (2004, January 6\u201310). Automatic speech classification to five emotional states based on gender information. Proceedings of the EUSIPCO, Vienna, Austria."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1002\/sec.308","article-title":"Pitch-based gender identification with two-stage classification","volume":"5","author":"Hu","year":"2012","journal-title":"Secur. Commun. Netw."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ting, H., Yingchun, Y., and Zhaohui, W. (2006, January 16\u201320). Combining MFCC and pitch to enhance the performance of the gender recognition. Proceedings of the 2006 8th international Conference on Signal Processing, Beijing, China.","DOI":"10.1109\/ICOSP.2006.345541"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kabil, S.H., Muckenhirn, H., and Doss, M.M. (2019, March 01). On Learning to Identify Genders from Raw Speech Signal using CNNs. 
Available online: http:\/\/publications.idiap.ch\/downloads\/papers\/2018\/Kabil_INTERSPEECH_2018.pdf.","DOI":"10.21437\/Interspeech.2018-1240"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1016\/j.apacoust.2015.04.013","article-title":"A new pitch-range based feature set for a speaker\u2019s age and gender classification","volume":"98","author":"Barkana","year":"2015","journal-title":"Appl. Acoust."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1016\/S0892-1997(97)80036-7","article-title":"Aerodynamic and acoustic characteristics of the adult African American voice","volume":"11","author":"Sapienza","year":"1997","journal-title":"J. Voice"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/S0892-1997(05)80247-4","article-title":"Phonational profiles of male trained singers and nonsingers","volume":"9","author":"Morris","year":"1995","journal-title":"J. Voice"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1064","DOI":"10.1121\/1.427116","article-title":"Glottal characteristics of male speakers: Acoustic correlates and comparison with female data","volume":"106","author":"Hanson","year":"1999","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_16","first-page":"53","article-title":"Gender Differences in Powerful\/Powerless Language Use in Adult and Higher Education Settings: A Meta-Analysis","volume":"6","author":"Jun","year":"2005","journal-title":"Asian J. Educ."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1111\/j.1749-818X.2008.00068.x","article-title":"Hesitation disfluencies in spontaneous speech: The meaning of um","volume":"2","author":"Corley","year":"2008","journal-title":"Lang. Linguist. 
Compass"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1590","DOI":"10.1016\/j.specom.2006.04.004","article-title":"Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation","volume":"48","author":"Stouten","year":"2006","journal-title":"Speech Commun."},{"key":"ref_19","first-page":"89","article-title":"A Corpus-analysis of Gender Effects in Private Speech: The Function of Discourse Markers in Spoken Korean","volume":"53","year":"2011","journal-title":"Lang. Linguist."},{"key":"ref_20","first-page":"93","article-title":"A Study on the Use of Korean Discourse Markers according to Gender","volume":"1","year":"2004","journal-title":"Korean Lang. Lit."},{"key":"ref_21","first-page":"61","article-title":"Special Feature-Korean Speech and Conversation Analysis: Characteristics found among Men and Women engaging in Interrupting the Turn of the Next Speaker","volume":"2","year":"2000","journal-title":"Speech Res."},{"key":"ref_22","first-page":"41","article-title":"The Difference between Men\u2019s and Women\u2019s Speeches: From the Aspect of Discourse Strategy and Discourse Context","volume":"19","author":"Cheon","year":"2007","journal-title":"Korean J. Russ. Lang. Lit."},{"key":"ref_23","first-page":"23","article-title":"Interventions in Talk Shows: Discourse Functions and Social Variables","volume":"6","year":"1999","journal-title":"Discourse Cogn."},{"key":"ref_24","first-page":"85","article-title":"Intonation Patterns of Korean Spontaneous Speech","volume":"1","author":"Kim","year":"2009","journal-title":"J. Korean Soc. Speech Sci."},{"key":"ref_25","first-page":"99","article-title":"Politeness Strategy in Intonation Based on Age: Through Analysis of Spontaneous Speech of Those in 10s, 20s, and 30s Women","volume":"45","year":"2014","journal-title":"Korean Semant."},{"key":"ref_26","unstructured":"Hagerer, G., Pandit, V., Eyben, F., and Schuller, B. (2017, January 22\u201324). 
Enhancing lstm rnn-based speech overlap detection by artificially mixed data. Proceedings of the Audio Engineering Society Conference: 2017 AES International Conference on Semantic Audio, Erlangen, Germany."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, Z.-Q., and Tashev, I. (2017, January 5\u20139). Learning utterance-level representations for speech emotion and age\/gender recognition using deep neural networks. Proceedings of the 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953138"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Buyukyilmaz, M., and Cibikdiken, A.O. (2016, January 18\u201319). Voice gender recognition using deep learning. Proceedings of the 2016 International Conference on Modeling, Simulation and Optimization Technologies and Applications (MSOTA2016), Xiamen, China.","DOI":"10.2991\/msota-16.2016.90"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1109\/TETC.2013.2274797","article-title":"Gender-driven emotion recognition through speech signals for ambient intelligence applications","volume":"1","author":"Bisio","year":"2013","journal-title":"IEEE Emerg. Top Com."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, L., Wang, L., Dang, J., Guo, L., and Yu, Q. (2018, January 4\u20137). Gender-Aware CNN-BLSTM for Speech Emotion Recognition. 
Proceedings of the International Conference on Artificial Neural Networks, Rhodes, Greece.","DOI":"10.1007\/978-3-030-01418-6_76"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"22524","DOI":"10.1109\/ACCESS.2018.2816163","article-title":"Age estimation in short speech utterances based on LSTM recurrent neural networks","volume":"6","author":"Zazo","year":"2018","journal-title":"IEEE Access"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.neucom.2012.11.008","article-title":"Exploiting deep neural networks for detection-based speech recognition","volume":"106","author":"Siniscalchi","year":"2013","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Katerenchuk, D. (2018). Age group classification with speech and metadata multimodality fusion. arXiv.","DOI":"10.18653\/v1\/E17-2030"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Abouelenien, M., P\u00e9rez-Rosas, V., Mihalcea, R., and Burzo, M. (2017, January 13\u201317). Multimodal gender detection. Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK.","DOI":"10.1145\/3136755.3136770"},{"key":"ref_35","unstructured":"McCowan, I., Carletta, J., Kraaij, W., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Kadlec, J., and Karaiskos, V. (2005, January 11\u201313). The AMI meeting corpus. Proceedings of the 5th International Conference on Methods and Techniques in Behavioral Research, Edinburgh, UK."},{"key":"ref_36","unstructured":"Batliner, A., Steidl, S., and N\u00f6th, E. (June, January 26). Releasing a thoroughly annotated and processed spontaneous emotional database: the FAU Aibo Emotion Corpus. Proceedings of the Satellite Workshop of LREC, Marrakech, Morocco."},{"key":"ref_37","unstructured":"Burkhardt, F., Eckert, M., Johannsen, W., and Stegmann, J. (2010, January 17\u201323). A Database of Age and Gender Annotated Telephone Speech. 
Proceedings of the LREC, Valletta, Malta."},{"key":"ref_38","unstructured":"(2019, March 01). IBM SPSS Decision Trees 21. Available online: http:\/\/www.sussex.ac.uk\/its\/pdfs\/SPSS_Decision_Trees_21.pdf."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support vector machine","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_40","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A., and Hinton, G. (2013, January 26\u201331). Speech recognition with deep recurrent neural networks. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Eyben, F., Weninger, F., Squartini, S., and Schuller, B. (2013, January 26\u201331). Real-life voice activity detection with lstm recurrent neural networks and an application to hollywood movies. 
Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6637694"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/4\/525\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:44:44Z","timestamp":1760186684000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/4\/525"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,11]]},"references-count":42,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2019,4]]}},"alternative-id":["sym11040525"],"URL":"https:\/\/doi.org\/10.3390\/sym11040525","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,11]]}}}