{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,3]],"date-time":"2024-09-03T05:56:36Z","timestamp":1725342996394},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2021,9]]},"DOI":"10.1007\/s10579-020-09527-z","type":"journal-article","created":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T09:05:19Z","timestamp":1611133519000},"page":"689-730","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system"],"prefix":"10.1007","volume":"55","author":[{"given":"Chuya","family":"China Bhanja","sequence":"first","affiliation":[]},{"given":"Mohammad Azharuddin","family":"Laskar","sequence":"additional","affiliation":[]},{"given":"Rabul Hussain","family":"Laskar","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,20]]},"reference":[{"key":"9527_CR2","doi-asserted-by":"crossref","unstructured":"Adami, A. G., Mihaescu, R., Reynolds, D. A., & Godfrey, J. J. (2003, April). Modeling prosodic dynamics for speaker recognition. In 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP'03). (Vol. 
4, pp. IV-788). IEEE.","DOI":"10.1109\/ICASSP.2003.1202761"},{"issue":"2","key":"9527_CR3","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1016\/S0095-4470(03)00039-1","volume":"32","author":"M Atterer","year":"2004","unstructured":"Atterer, M., & Ladd, D. R. (2004). On the phonetics and phonology of \u201csegmental anchoring\u201d of F0: Evidence from German. Journal of Phonetics, 32(2), 177\u2013197.","journal-title":"Journal of Phonetics"},{"key":"9527_CR4","unstructured":"Baby, A., Thomas, A. L., & Nishanthi, N. L. (2016). T. Consortium, \u201cResources for Indian languages,\u201d CBBLR-Community-Based Building of Language Resources. Brno, Czech Republic: Tribun EU, 37\u201343."},{"key":"9527_CR5","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1017\/S095267570000066X","volume":"3","author":"ME Beckman","year":"1986","unstructured":"Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology, 3, 255\u2013309.","journal-title":"Phonology"},{"key":"9527_CR6","unstructured":"Burgos, W. (2014). Gammatone and MFCC Features in Speaker Recognition (Doctoral dissertation)."},{"key":"9527_CR7","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1016\/j.csl.2005.06.003","volume":"20","author":"WM Campbell","year":"2006","unstructured":"Campbell, W. M., Campbell, J. P., Reynolds, D. A., Singer, E., & Torres-Carrasquillo, P. A. (2006). Support vector machines for speaker and language recognition. Computer Speech & Language, 20, 210\u2013229.","journal-title":"Computer Speech & Language"},{"key":"9527_CR8","doi-asserted-by":"crossref","unstructured":"Casale, S., Russo, A., Scebba, G., & Serrano, S. (2008, August). Speech emotion classification using machine learning algorithms. In The IEEE international conference on semantic computing (pp. 158\u2013165). 
IEEE.","DOI":"10.1109\/ICSC.2008.43"},{"key":"9527_CR9","doi-asserted-by":"publisher","DOI":"10.1007\/s00034-018-0962-x","author":"C China Bhanja","year":"2018","unstructured":"China Bhanja, C., Laskar, M. A., & Laskar, R. H. (2018 October). A pre-classification-based language identification for Northeast Indian Languages using prosody and spectral features. Circuits System and Signal Processing. https:\/\/doi.org\/10.1007\/s00034-018-0962-x.","journal-title":"Circuits System and Signal Processing"},{"issue":"26","key":"9527_CR10","doi-asserted-by":"publisher","first-page":"10944","DOI":"10.1073\/pnas.0610848104","volume":"104","author":"D Dediu","year":"2007","unstructured":"Dediu, D., & Ladd, D. R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. Proceedings of the National Academy of Sciences, 104(26), 10944\u201310949.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"9527_CR11","doi-asserted-by":"crossref","unstructured":"Dehak, N., Torres-Carrasquillo, P. A., Reynolds, D., & Dehak, R. (2011). Language recognition via i-vectors and dimensionality reduction. In Twelfth annual conference of the international speech communication association.","DOI":"10.21437\/Interspeech.2011-328"},{"key":"9527_CR12","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1016\/j.specom.2017.01.009","volume":"88","author":"S Dey","year":"2017","unstructured":"Dey, S., Motlicek, P., Madikeri, S., & Ferras, M. (2017). Template-matching for text-dependent speaker verification. Speech Communication, 88, 96\u2013105.","journal-title":"Speech Communication"},{"key":"9527_CR13","first-page":"39","volume":"33","author":"M Dorofki","year":"2012","unstructured":"Dorofki, M., Elshafie, A. H., Jaafar, O., Karim, O. A., & Mastura, S. (2012). Comparison of artificial neural network transfer functions abilities to simulate extreme runoff data. 
International Proceedings of Chemical, Biological and Environmental Engineering, 33, 39\u201344.","journal-title":"International Proceedings of Chemical, Biological and Environmental Engineering"},{"key":"9527_CR14","doi-asserted-by":"crossref","unstructured":"Dusan S, & Deng L. (1998). Recovering vocal tract shapes from MFCC parameters. In Fifth International Conference on Spoken Language Processing.","DOI":"10.21437\/ICSLP.1998-795"},{"key":"9527_CR15","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1016\/0024-3841(77)90054-7","volume":"41","author":"J Gandour","year":"1977","unstructured":"Gandour, J. (1977). Counterfeit tones in the speech of Southern Thai bidialectals. Lingua, 41, 125\u2013143.","journal-title":"Lingua"},{"key":"9527_CR16","doi-asserted-by":"crossref","unstructured":"Hatch, A. O., Kajarekar, S., & Stolcke, A. (2006). Within-class covariance normalization for SVM-based speaker recognition. In Ninth international conference on spoken language processing.","DOI":"10.21437\/Interspeech.2006-183"},{"key":"9527_CR1","unstructured":"https:\/\/www.iitm.ac.in\/donlab\/tt\/index.php"},{"issue":"3","key":"9527_CR17","doi-asserted-by":"publisher","first-page":"544","DOI":"10.1016\/j.dsp.2011.11.008","volume":"22","author":"S Jothilakshmi","year":"2012","unstructured":"Jothilakshmi, S., Ramalingam, V., & Palanivel, S. (2012). A hierarchical language identification system for Indian languages. Digital Signal Processing, 22(3), 544\u2013553.","journal-title":"Digital Signal Processing"},{"key":"9527_CR18","doi-asserted-by":"crossref","unstructured":"Le, P. N., Ambikairajah, E., & Choi, E. H. (2009, July). Improvement of Vietnamese tone classification using FM and MFCC features. In Computing and communication technologies, 2009. RIVF'09. International Conference on\u00a0(pp. 1\u20134). 
IEEE.","DOI":"10.1109\/RIVF.2009.5174644"},{"issue":"9\u201310","key":"9527_CR19","doi-asserted-by":"publisher","first-page":"1162","DOI":"10.1016\/j.specom.2011.06.004","volume":"53","author":"CC Lee","year":"2011","unstructured":"Lee, C. C., Mower, E., Busso, C., Lee, S., & Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53(9\u201310), 1162\u20131171.","journal-title":"Speech Communication"},{"key":"9527_CR20","doi-asserted-by":"publisher","first-page":"940","DOI":"10.1016\/j.csl.2014.02.004","volume":"28","author":"M Li","year":"2014","unstructured":"Li, M., & Narayanan, S. (2014). Simplified supervised i-vector modelling with application to robust and efficient language identification and speaker verification. Computer Speech and Language, 28, 940\u2013958.","journal-title":"Computer Speech and Language"},{"issue":"6","key":"9527_CR21","doi-asserted-by":"publisher","first-page":"1791","DOI":"10.1109\/TASL.2010.2101594","volume":"19","author":"Q Li","year":"2011","unstructured":"Li, Q., & Huang, Y. (2011). An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions. IEEE Transactions on Audio, Speech and Language Processing, 19(6), 1791\u20131801.","journal-title":"IEEE Transactions on Audio, Speech and Language Processing"},{"key":"9527_CR22","volume-title":"The world atlas of language structures online","author":"I Maddieson","year":"2013","unstructured":"Maddieson, I., Dryer, M. S., & Haspelmath, M. (2013). The world atlas of language structures online. Leipzig, Germany: Max Planck Institute for Evolutionary Anthropology."},{"key":"9527_CR23","doi-asserted-by":"crossref","unstructured":"Maity, S., Vuppala, A. K., Rao, K. S., & Nandi, D. (2012, February). IITKGP-MLILSC speech database for language identification. 
In National conference on communication.","DOI":"10.1109\/NCC.2012.6176831"},{"key":"9527_CR24","doi-asserted-by":"crossref","unstructured":"Martinez, D., Lleida, E., Ortega, A., & Miguel, A. (2013, May). Prosodic features and formant modelling for an ivector-based language recognition system. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (pp. 6847\u20136851). IEEE.","DOI":"10.1109\/ICASSP.2013.6638988"},{"key":"9527_CR25","unstructured":"Mary, L. (2006). Multilevel implicit features for language and speaker recognition.\u00a0Ph.D. dissertation. IIT Madras, India."},{"key":"9527_CR26","unstructured":"Mounika, K. V., Achanta, S., Lakshmi, H. R., Gangashetty, S. V., & Vuppala, A. K. (2016, June). An investigation of deep neural network architectures for language recognition in Indian languages. In INTERSPEECH (pp. 2930\u20132933)."},{"key":"9527_CR27","doi-asserted-by":"crossref","unstructured":"Muthusamy, Y. K., Cole, R. A., & Oshika, B. T. (1992). The OGI multi-language telephone speech corpus. In Second International Conference on Spoken Language Processing.","DOI":"10.21437\/ICSLP.1992-276"},{"key":"9527_CR28","doi-asserted-by":"crossref","unstructured":"Ng, R. W. M., Lee, T., Leung, C. C., Ma, B., Li, H. (2009). Analysis and selection of prosodic features for language identification. In Proc. Asian Language Processing, pp 123\u2013128.","DOI":"10.1109\/IALP.2009.34"},{"key":"9527_CR29","unstructured":"Patterson, R. D., Nimmo-Smith, I., Holdsworth, J., & Rice, P. (1987, December). An efficient auditory filterbank based on the gammatone function. In A meeting of the IOC Speech Group on Auditory Modelling at RSRE (Vol. 2, No. 7)."},{"issue":"4","key":"9527_CR30","doi-asserted-by":"publisher","first-page":"556","DOI":"10.1109\/TASL.2008.2010884","volume":"17","author":"SRM Prasanna","year":"2009","unstructured":"Prasanna, S. R. M., Reddy, B. S., & Krishnamoorthy, P. (2009). 
Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 556\u2013565.","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"9527_CR31","doi-asserted-by":"crossref","unstructured":"Prince, S. J., & Elder, J. H. (2007, October). Probabilistic linear discriminant analysis for inferences about identity. In IEEE 11th International Conference on Computer Vision, 2007. ICCV 2007. (pp. 1\u20138). IEEE.","DOI":"10.1109\/ICCV.2007.4409052"},{"key":"9527_CR32","volume-title":"The interaction of stress and tone in standard Chinese: Experimental findings and theoretical consequences","author":"C Qu","year":"2012","unstructured":"Qu, C., & Goad, H. (2012). The interaction of stress and tone in standard Chinese: Experimental findings and theoretical consequences. Tone: Theory and Practice, Max Planck Institute for Evolutionary Anthropology."},{"issue":"4","key":"9527_CR33","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1007\/s10772-013-9198-0","volume":"16","author":"VR Reddy","year":"2013","unstructured":"Reddy, V. R., Maity, S., & Rao, K. S. (2013). Identification of Indian languages using multi-level spectral and prosodic features. International Journal of Speech Technology, 16(4), 489\u2013511.","journal-title":"International Journal of Speech Technology"},{"key":"9527_CR34","doi-asserted-by":"crossref","unstructured":"Reynolds, D. (2015). Gaussian mixture models. Encyclopedia of biometrics, 827\u2013832.","DOI":"10.1007\/978-1-4899-7488-4_196"},{"key":"9527_CR36","doi-asserted-by":"crossref","unstructured":"Richardson, F., Reynolds, D., & Dehak, N. (2015a). A unified deep neural network for speaker and language recognition. 
In: proc of International Speech Communication Association.","DOI":"10.1109\/LSP.2015.2420092"},{"issue":"10","key":"9527_CR35","doi-asserted-by":"publisher","first-page":"1671","DOI":"10.1109\/LSP.2015.2420092","volume":"22","author":"F Richardson","year":"2015","unstructured":"Richardson, F., Reynolds, D., & Dehak, N. (2015b). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671\u20131675.","journal-title":"IEEE Signal Processing Letters"},{"key":"9527_CR37","doi-asserted-by":"crossref","unstructured":"Ryant, N., Yuan, J., & Liberman, M. (2014, May). Mandarin tone classification without pitch tracking. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), (pp. 4868\u20134872). IEEE.","DOI":"10.1109\/ICASSP.2014.6854527"},{"key":"9527_CR38","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1016\/j.specom.2015.04.005","volume":"72","author":"SO Sadjadi","year":"2015","unstructured":"Sadjadi, S. O., & Hansen, J. H. L. (2015). Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification. Speech Communication, 72, 138\u2013148.","journal-title":"Speech Communication"},{"issue":"3","key":"9527_CR39","first-page":"121","volume":"37","author":"P Sarmah","year":"2010","unstructured":"Sarmah, P., & Wiltshire, C. R. (2010). A preliminary acoustic study of Mizo vowels and tones. Journal of Acoustic Society of India, 37(3), 121\u2013129.","journal-title":"Journal of Acoustic Society of India"},{"key":"9527_CR40","unstructured":"Singh, A. K. (2006, October). A computational phonetic model for Indian language scripts. In Constraints on spelling changes: Fifth international workshop on writing systems."},{"issue":"4","key":"9527_CR41","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1109\/TASSP.1980.1163420","volume":"28","author":"D Steven","year":"1980","unstructured":"Steven, D., & Mermelstein, P. (August 1980). 
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics Speech and Signal Processing, 28(4), 357\u2013366.","journal-title":"IEEE Trans Acoustics Speech and Signal Proc."},{"key":"9527_CR42","volume-title":"Speech coding and synthesis","author":"D Talkin","year":"1995","unstructured":"Talkin, D. (1995). A robust algorithm for pitch tracking (RAPT). In W. B. Klein & K. K. Paliwal (Eds.), Speech coding and synthesis. New York: Elsevier."},{"key":"9527_CR43","unstructured":"Torres-Carrasquillo, P. A., Singer, E., Kohler, M. A., Greene, R. J., Reynolds, D. A., & Deller Jr, J. R. (2007). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In Seventh international conference on spoken language processing."},{"key":"9527_CR44","doi-asserted-by":"crossref","unstructured":"Wang, L., Ambikairajah, E., & Choi, E. H. (2007, September). Automatic language recognition with tonal and non-tonal language pre-classification. In Signal Processing Conference, 2007 15th European (pp. 2375\u20132379). IEEE.","DOI":"10.1109\/ICME.2007.4284659"},{"key":"9527_CR45","unstructured":"www.ciil-spokencorpus.net [Online, Retrieved January 20, 2009]."},{"key":"9527_CR46","doi-asserted-by":"crossref","unstructured":"Yin, B., Ambikairajah, E., & Chen, F. (2006). Combining cepstral and prosodic features in language identification. In 18th international conference on pattern recognition (ICPR'06) (Vol. 4, pp. 254\u2013257). IEEE.","DOI":"10.1109\/ICPR.2006.381"},{"key":"9527_CR47","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1002\/9781118584552.ch17","volume-title":"The handbook of Chinese linguistics","author":"J Zhang","year":"2014","unstructured":"Zhang, J. (2014). Tones, tonal phonology, and tone sandhi. In C.-T. James Huang, Y.-H. Audrey Li, & A. Simpson (Eds.), The handbook of Chinese linguistics (pp. 443\u2013464). 
Oxford: Wiley Blackwell."},{"key":"9527_CR48","doi-asserted-by":"crossref","unstructured":"Zhao, X., & Wang, D. (2013). Analyzing noise robustness of MFCC and GFCC features in speaker identification. In 2013 IEEE International conference on acoustics, speech and signal processing (ICASSP), (pp. 7204\u20137208). IEEE.","DOI":"10.1109\/ICASSP.2013.6639061"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-020-09527-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-020-09527-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-020-09527-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T18:10:22Z","timestamp":1724350222000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-020-09527-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,20]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["9527"],"URL":"https:\/\/doi.org\/10.1007\/s10579-020-09527-z","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"type":"print","value":"1574-020X"},{"type":"electronic","value":"1574-0218"}],"subject":[],"published":{"date-parts":[[2021,1,20]]},"assertion":[{"value":"30 December 2020","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}