{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:27:12Z","timestamp":1740122832858,"version":"3.37.3"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,1,25]],"date-time":"2021-01-25T00:00:00Z","timestamp":1611532800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2021,1,25]],"date-time":"2021-01-25T00:00:00Z","timestamp":1611532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Speech Technol"],"published-print":{"date-parts":[[2021,6]]},"DOI":"10.1007\/s10772-021-09796-1","type":"journal-article","created":{"date-parts":[[2021,1,25]],"date-time":"2021-01-25T08:05:19Z","timestamp":1611561919000},"page":"409-418","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["DNN and i-vector combined method for speaker recognition on multi-variability environments"],"prefix":"10.1007","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3358-3188","authenticated-orcid":false,"given":"Flavio J.","family":"Reyes-D\u00edaz","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gabriel","family":"Hern\u00e1ndez-Sierra","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 R. Calvo","family":"de Lara","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,1,25]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Al-Ali, A. K. H., Senadji, B., & Naik, G. R. (2017). 
Enhanced forensic speaker verification using multi-run ica in the presence of environmental noise and reverberation conditions. In: Proceedings of ICSIPA. IEEE, pp 174\u2013179.","key":"9796_CR1","DOI":"10.1109\/ICSIPA.2017.8120601"},{"doi-asserted-by":"crossref","unstructured":"Alam, M. J., Kenny, P., Bhattacharya, G., & Kockmann, M. (2017). Speaker verification under adverse conditions using i-vector adaptation and neural networks. In: Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20\u201324, 2017, pp 3732\u20133736.","key":"9796_CR2","DOI":"10.21437\/Interspeech.2017-1240"},{"unstructured":"Avila, A. R., Paja, M. O. S., & Fraga, F. J., et\u00a0al. (2014). Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systems. In: INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14\u201318, 2014, pp 1096\u20131100.","key":"9796_CR3"},{"issue":"4","key":"9796_CR4","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1109\/TASSP.1980.1163420","volume":"28","author":"S Davis","year":"1980","unstructured":"Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4), 357\u2013366.","journal-title":"IEEE Transactions on Acoustics, Speech and Signal Processing"},{"issue":"4","key":"9796_CR5","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1109\/TASL.2010.2064307","volume":"19","author":"N Dehak","year":"2011","unstructured":"Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. 
IEEE Trans Audio, Speech & Language Processing, 19(4), 788\u2013798.","journal-title":"IEEE Trans Audio, Speech & Language Processing"},{"doi-asserted-by":"crossref","unstructured":"Garcia-Romero, D., & Espy-Wilson, C. Y. (2011). Analysis of i-vector length normalization in speaker recognition systems. In: INTERSPEECH 2011, 12th Annual conference of the international speech communication association, Florence, Italy, August 27\u201331, 2011, pp 249\u2013252.","key":"9796_CR6","DOI":"10.21437\/Interspeech.2011-53"},{"doi-asserted-by":"crossref","unstructured":"Garcia-Romero, D., Zhou, X., & Espy-Wilson, C. Y. (2012). Multicondition training of gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition. In: 2012 IEEE international conference on acoustics, speech and signal processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pp 4257\u20134260.","key":"9796_CR7","DOI":"10.1109\/ICASSP.2012.6288859"},{"issue":"1","key":"9796_CR8","doi-asserted-by":"publisher","first-page":"007","DOI":"10.3989\/loquens.2014.007","volume":"1","author":"J Gonzalez-Rodriguez","year":"2014","unstructured":"Gonzalez-Rodriguez, J. (2014). Evaluating automatic speaker recognition systems: An overview of the nist speaker recognition evaluations (1996\u20132014). Loquens, 1(1), 007.","journal-title":"Loquens"},{"doi-asserted-by":"crossref","unstructured":"Greenberg, C. S., Stanford, V. M., Martin, A. F., Yadagiri, M., Doddington, G. R., Godfrey, J. J., & Hernandez-Cordero, J. (2013). The 2012 NIST speaker recognition evaluation. 
In: INTERSPEECH 2013, 14th annual conference of the international speech communication association, Lyon, France, August 25\u201329, 2013, pp 1971\u20131975.","key":"9796_CR9","DOI":"10.21437\/Interspeech.2013-469"},{"key":"9796_CR10","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1016\/j.specom.2018.10.004","volume":"105","author":"J Guo","year":"2018","unstructured":"Guo, J., Xu, N., Qian, K., Shi, Y., Xu, K., Wu, Y., et al. (2018). Deep neural network based i-vector mapping for speaker verification using short utterances. Speech Communication, 105, 92\u2013102.","journal-title":"Speech Communication"},{"issue":"6","key":"9796_CR11","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","volume":"29","author":"G Hinton","year":"2012","unstructured":"Hinton, G., Deng, L., Yu, D., et al. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82\u201397.","journal-title":"IEEE Signal Processing Magazine"},{"key":"9796_CR12","doi-asserted-by":"publisher","first-page":"599","DOI":"10.1007\/978-3-642-35289-8_32","volume-title":"Neural networks: Tricks of the trade","author":"GE Hinton","year":"2012","unstructured":"Hinton, G. E. (2012). A practical guide to training restricted boltzmann machines. Neural networks: Tricks of the trade (2nd ed., pp. 599\u2013619). Berlin, Heidelberg: Springer.","edition":"2"},{"issue":"5786","key":"9796_CR13","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","volume":"313","author":"GE Hinton","year":"2006","unstructured":"Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. 
Science, 313(5786), 504\u2013507.","journal-title":"Science"},{"issue":"7","key":"9796_CR14","doi-asserted-by":"publisher","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","volume":"18","author":"GE Hinton","year":"2006","unstructured":"Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527\u20131554.","journal-title":"Neural Computation"},{"unstructured":"Kenny, P. (2005). Joint factor analysis of speaker and session variability: Theory and algorithms. CRIM, Montreal (Report) CRIM-06\/08-13.","key":"9796_CR15"},{"unstructured":"Kenny, P. (2010). Bayesian speaker verification with heavy-tailed priors. In: Odyssey 2010: The speaker and language recognition workshop, Brno, Czech Republic, June 28\u2013July 1, 2010, p.\u00a014.","key":"9796_CR16"},{"doi-asserted-by":"crossref","unstructured":"Kenny, P., Boulianne, G., Ouellet, P., & Dumouchel, P. (2005). Factor analysis simplified. In: 2005 IEEE international conference on acoustics, speech, and signal processing, ICASSP \u201905, Philadelphia, Pennsylvania, USA, March 18\u201323, 2005, pp. 637\u2013640.","key":"9796_CR17","DOI":"10.1109\/ICASSP.2005.1415194"},{"unstructured":"Kenny, P., Stafylakis, T., Ouellet, P., Gupta, V., & Alam, MJ. (2014). Deep neural networks for extracting baum-welch statistics for speaker recognition. In: Odyssey 2014: The speaker and language recognition workshop, Joensuu, Finland, June 16\u201319, 2014.","key":"9796_CR18"},{"issue":"3","key":"9796_CR19","doi-asserted-by":"publisher","first-page":"633","DOI":"10.1109\/TASLP.2018.2789399","volume":"26","author":"WB Kheder","year":"2018","unstructured":"Kheder, W. B., Matrouf, D., Ajili, M., & Bonastre, J. F. (2018). A unified joint model to deal with nuisance variabilities in the i-vector space. 
IEEE\/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 633\u2013645.","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"issue":"7","key":"9796_CR20","doi-asserted-by":"publisher","first-page":"1315","DOI":"10.1109\/TASLP.2016.2545928","volume":"24","author":"C Kim","year":"2016","unstructured":"Kim, C., & Stern, R. M. (2016). Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE\/ACM Trans Audio, Speech & Language Processing, 24(7), 1315\u20131329.","journal-title":"IEEE\/ACM Trans Audio, Speech & Language Processing"},{"doi-asserted-by":"crossref","unstructured":"Kinoshita, K., Delcroix, M., Yoshioka, T., & Nakatani, T., et\u00a0al. (2013). The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech. In: IEEE workshop on applications of signal processing to audio and acoustics, WASPAA 2013, New Paltz, NY, USA, October 20\u201323, 2013, pp. 1\u20134.","key":"9796_CR21","DOI":"10.1109\/WASPAA.2013.6701894"},{"doi-asserted-by":"crossref","unstructured":"Kudashev, O., Novoselov, S., Pekhovsky, T., Simonchik, K., & Lavrentyeva, G. (2016). Usage of DNN in speaker recognition: Advantages and problems. In: Advances in Neural Networks-ISNN 2016, 13th International symposium on neural networks, ISNN 2016, St. Petersburg, Russia, July 6\u20138, 2016, Proceedings, pp 82\u201391.","key":"9796_CR22","DOI":"10.1007\/978-3-319-40663-3_10"},{"doi-asserted-by":"crossref","unstructured":"Lei, Y., Scheffer, N., Ferrer, L., & McLaren, M. (2014). A novel scheme for speaker recognition using a phonetically-aware deep neural network. 
In: IEEE international conference on acoustics, speech and signal processing, ICASSP 2014, Florence, Italy, May 4\u20139, 2014, pp 1695\u20131699.","key":"9796_CR23","DOI":"10.1109\/ICASSP.2014.6853887"},{"issue":"6","key":"9796_CR24","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1049\/el.2016.4629","volume":"53","author":"J Ma","year":"2017","unstructured":"Ma, J., Sethu, V., Ambikairajah, E., & Lee, K. A. (2017). Duration compensation of i-vectors for short duration speaker verification. Electronics Letters, 53(6), 405\u2013407.","journal-title":"Electronics Letters"},{"issue":"1","key":"9796_CR25","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1109\/TASL.2011.2109382","volume":"20","author":"A Mohamed","year":"2012","unstructured":"Mohamed, A., Dahl, G. E., & Hinton, G. E. (2012). Acoustic modeling using deep belief networks. IEEE Trans Audio, Speech & Language Processing, 20(1), 14\u201322.","journal-title":"IEEE Trans Audio, Speech & Language Processing"},{"doi-asserted-by":"crossref","unstructured":"Novotn\u00fd, O., Plchot, O., Matejka, P., Mosner, L., & Glembek, O. (2018). On the use of x-vectors for robust speaker recognition. Odyssey the speaker and language recognition workshop, 26\u201329 June 2018, (pp. 168\u2013175). Les Sables d\u2019Olonne.","key":"9796_CR26","DOI":"10.21437\/Odyssey.2018-24"},{"doi-asserted-by":"crossref","unstructured":"Pekhovsky, T., Novoselov, S., Sholohov, A., & Kudashev, O. (2016). On autoencoders in the i-vector space for speaker recognition. In: Odyssey 2016: The speaker and language recognition workshop, Bilbao, Spain, June 21\u201324, 2016, pp 217\u2013224.","key":"9796_CR27","DOI":"10.21437\/Odyssey.2016-31"},{"issue":"2","key":"9796_CR28","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1049\/iet-bmt.2017.0065","volume":"7","author":"A Poddar","year":"2017","unstructured":"Poddar, A., Sahidullah, M., & Saha, G. (2017). 
Speaker verification with short utterances: a review of challenges, trends and opportunities. IET Biometrics, 7(2), 91\u2013101.","journal-title":"IET Biometrics"},{"doi-asserted-by":"crossref","unstructured":"Rajan, P., Kinnunen, T., & Hautam\u00e4ki, V. (2013). Effect of multicondition training on i-vector PLDA configurations for speaker recognition. In: INTERSPEECH 2013, 14th annual conference of the international speech communication association, Lyon, France, August 25\u201329, 2013, pp 3694\u20133697.","key":"9796_CR29","DOI":"10.21437\/Interspeech.2013-693"},{"issue":"3","key":"9796_CR30","doi-asserted-by":"publisher","first-page":"475","DOI":"10.1007\/s10772-017-9414-4","volume":"20","author":"FJ Reyes-D\u00edaz","year":"2017","unstructured":"Reyes-D\u00edaz, F. J., Hern\u00e1ndez-Sierra, G., & Calvo-de Lara, J. R. (2017). Two-space variability compensation technique for speaker verification in short length and reverberant environments. International Journal of Speech Technology (IJST), 20(3), 475\u2013485.","journal-title":"International Journal of Speech Technology (IJST)"},{"issue":"3","key":"9796_CR31","first-page":"152","volume":"12","author":"FJ Reyes-D\u00edaz","year":"2018","unstructured":"Reyes-D\u00edaz, F. J., Roble-Guti\u00e9rres, A., Hern\u00e1ndez-Sierra, G., & Calvo-de Lara, J. R. (2018). Filtrado wiener para la reducci\u00f3n de ruido en la verificaci\u00f3n de locutores. Revista Cubana de Ciencias Inform\u00e1ticas (RCCI), 12(3), 152\u2013162.","journal-title":"Revista Cubana de Ciencias Inform\u00e1ticas (RCCI)"},{"doi-asserted-by":"crossref","unstructured":"Ribas, D., Vincent, E., & Calvo-de Lara, J. R. (2015). Full multicondition training for robust i-vector based speaker recognition. 
In: INTERSPEECH 2015, 16th annual conference of the international speech communication association, Dresden, Germany, September 6\u201310, 2015, pp 1057\u20131061.","key":"9796_CR32","DOI":"10.21437\/Interspeech.2015-284"},{"issue":"10","key":"9796_CR33","doi-asserted-by":"publisher","first-page":"1671","DOI":"10.1109\/LSP.2015.2420092","volume":"22","author":"F Richardson","year":"2015","unstructured":"Richardson, F., Reynolds, D., & Dehak, N. (2015). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671\u20131675.","journal-title":"IEEE Signal Processing Letters"},{"key":"9796_CR34","volume-title":"On-line learning in neural networks","author":"D Saad","year":"2010","unstructured":"Saad, D. (2010). On-line learning in neural networks. Cambridge: Cambridge University Press."},{"doi-asserted-by":"crossref","unstructured":"Scheffer, N., Ferrer, L., Lawson, A., Lei, Y., & McLaren, M. (2013). Recent developments in voice biometrics: Robustness and high accuracy. In: 2013 IEEE international conference on technologies for homeland security (HST), pp 447\u2013452.","key":"9796_CR35","DOI":"10.1109\/THS.2013.6699046"},{"doi-asserted-by":"crossref","unstructured":"Senior, A. W., Sak, H., & Shafran, I. (2015). Context dependent phone models for LSTM RNN acoustic modelling. In: 2015 IEEE international conference on acoustics, speech and signal processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19\u201324, 2015, pp 4585\u20134589.","key":"9796_CR36","DOI":"10.1109\/ICASSP.2015.7178839"},{"doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Povey, D., & Khudanpur, S. (2017). Deep neural network embeddings for text-independent speaker verification. 
In: Interspeech 2017, 18th annual conference of the international speech communication association, Stockholm, Sweden, August 20\u201324, 2017, pp 999\u20131003.","key":"9796_CR37","DOI":"10.21437\/Interspeech.2017-620"},{"doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., & Khudanpur, S. (2018). X-vectors: Robust DNN embeddings for speaker recognition. In: 2018 IEEE international conference on acoustics, speech and signal processing, ICASSP 2018, Calgary, AB, Canada, April 15\u201320, 2018, pp. 5329\u20135333.","key":"9796_CR38","DOI":"10.1109\/ICASSP.2018.8461375"},{"issue":"1","key":"9796_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/97.736233","volume":"6","author":"J Sohn","year":"1999","unstructured":"Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1\u20133.","journal-title":"IEEE Signal Processing Letters"},{"unstructured":"Solanas, A., & P\u00e9rez, A. (2004). Estad\u00edstica descriptiva en ciencias del comportamiento. Thomson, https:\/\/books.google.com.cu\/books?id=NOBYAAAACAAJ.","key":"9796_CR40"},{"doi-asserted-by":"crossref","unstructured":"Xu, L., Das, RK., Y\u0131lmaz, E., Yang, J., & Li, H. (2018). Generative x-vectors for text-independent speaker verification. arXiv preprint arXiv:1809.06798.","key":"9796_CR41","DOI":"10.1109\/SLT.2018.8639510"},{"doi-asserted-by":"crossref","unstructured":"Zhang, C., & Koishida, K. (2017). End-to-end text-independent speaker verification with triplet loss on short utterances. 
In: Interspeech 2017, 18th annual conference of the international speech communication association, Stockholm, Sweden, August 20\u201324, 2017, pp 1487\u20131491.","key":"9796_CR42","DOI":"10.21437\/Interspeech.2017-1608"}],"container-title":["International Journal of Speech Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10772-021-09796-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10772-021-09796-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10772-021-09796-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,13]],"date-time":"2022-12-13T04:42:39Z","timestamp":1670906559000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10772-021-09796-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,25]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,6]]}},"alternative-id":["9796"],"URL":"https:\/\/doi.org\/10.1007\/s10772-021-09796-1","relation":{},"ISSN":["1381-2416","1572-8110"],"issn-type":[{"type":"print","value":"1381-2416"},{"type":"electronic","value":"1572-8110"}],"subject":[],"published":{"date-parts":[[2021,1,25]]},"assertion":[{"value":"24 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}