{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T17:48:10Z","timestamp":1771350490658,"version":"3.50.1"},"reference-count":34,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2018,7,23]],"date-time":"2018-07-23T00:00:00Z","timestamp":1532304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61701046"],"award-info":[{"award-number":["61701046"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The convolutional neural network (CNN) has made great strides in the area of voiceprint recognition; but it needs a huge number of data samples to train a deep neural network. In practice, it is too difficult to get a large number of training samples, and it cannot achieve a better convergence state due to the limited dataset. In order to solve this question, a new method using a deep migration hybrid model is put forward, which makes it easier to realize voiceprint recognition for small samples. Firstly, it uses Transfer Learning to transfer the trained network from the big sample voiceprint dataset to our limited voiceprint dataset for the further training. Fully-connected layers of a pre-training model are replaced by restricted Boltzmann machine layers. Secondly, the approach of Data Augmentation is adopted to increase the number of voiceprint datasets. Finally, we introduce fast batch normalization algorithms to improve the speed of the network convergence and shorten the training time. Our new voiceprint recognition approach uses the TLCNN-RBM (convolutional neural network mixed restricted Boltzmann machine based on transfer learning) model, which is the deep migration hybrid model that is used to achieve an average accuracy of over 97%, which is higher than that when using either CNN or the TL-CNN network (convolutional neural network based on transfer learning). Thus, an effective method for a small sample of voiceprint recognition has been provided.<\/jats:p>","DOI":"10.3390\/s18072399","type":"journal-article","created":{"date-parts":[[2018,7,24]],"date-time":"2018-07-24T02:58:56Z","timestamp":1532401136000},"page":"2399","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":37,"title":["Voiceprint Identification for Limited Dataset Using the Deep Migration Hybrid Model Based on Transfer Learning"],"prefix":"10.3390","volume":"18","author":[{"given":"Cunwei","family":"Sun","sequence":"first","affiliation":[{"name":"School of Computer Science, Yangtze University, Jingzhou 434023, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuxin","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Yangtze University, Jingzhou 434023, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chang","family":"Wen","sequence":"additional","affiliation":[{"name":"School of Computer Science, Yangtze University, Jingzhou 434023, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kai","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Electronic and Information, Yangtze University, Jingzhou 434023, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fangqing","family":"Wen","sequence":"additional","affiliation":[{"name":"School of Electronic and Information, Yangtze University, Jingzhou 434023, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,7,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Islam, M.A. (2016, January 28\u201329). Frequency domain linear prediction-based robust text-dependent speaker identification. Proceedings of the International Conference on Innovations in Science, Engineering and Technology (ICISET), Dhaka, Bangladesh.","DOI":"10.1109\/ICISET.2016.7856508"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1109\/TASLP.2014.2339736","article-title":"Convolutional neural networks for speech recognition","volume":"22","author":"Mohamed","year":"2014","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Huang, J.T., Li, J., and Gong, Y. (2015, January 19\u201324). An analysis of convolutional neural networks for speech recognition. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia.","DOI":"10.1109\/ICASSP.2015.7178920"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lukic, Y., Vogt, C., and D\u00fcrr, O. (2016, January 13\u201316). Speaker identification and clustering using convolutional neural networks. Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Salerno, Italy.","DOI":"10.1109\/MLSP.2016.7738816"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"13487","DOI":"10.1016\/j.eswa.2011.04.069","article-title":"Speaker recognition under limited data condition by noise addition","volume":"38","author":"Krishnamoorthy","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Oquab, M., Bottou, L., and Laptev, I. (2014, January 23\u201328). Learning and transferring mid-level image representations using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.222"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Azmy, M.M. (2015, January 3\u20135). Classification of lung sounds based on linear prediction cepstral coefficients and support vector machine. Proceedings of the 2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), Amman, Jordan.","DOI":"10.1109\/AEECT.2015.7360527"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, Y., and Lawlor, B. (2017, January 20\u201321). Speaker recognition based on MFCC and BP neural networks. Proceedings of the Irish Signals and Systems Conference (ISSC), Killarney, Ireland.","DOI":"10.1109\/ISSC.2017.7983644"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Garcia-Romero, D., and Mccree, A. (2014, January 4\u20139). Supervised domain adaptation for I-vector based speaker recognition. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854362"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1448","DOI":"10.1109\/TASL.2007.894527","article-title":"Speaker and session variability in GMM-based speaker verification","volume":"15","author":"Kenny","year":"2007","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1016\/j.specom.2006.06.011","article-title":"A tree-based kernel selection approach to efficient Gaussian mixture model\u2013universal background model based speaker identification","volume":"48","author":"Xiong","year":"2006","journal-title":"Speech Commun."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1366","DOI":"10.1109\/TASL.2009.2034187","article-title":"Comparison of speaker adaptation methods as feature extraction for SVM-based speaker recognition","volume":"18","author":"Ferras","year":"2010","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: A review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Lecun","year":"2015","journal-title":"Nature"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.csl.2016.06.007","article-title":"Building DNN acoustic models for large vocabulary speech recognition","volume":"41","author":"Maas","year":"2017","journal-title":"Comput. Speech Lang."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"T\u00f3th, L., and Gr\u00f3sz, T. (2013, January 1\u20135). A comparison of deep neural network training methods for large vocabulary speech recognition. Proceedings of the International Conference on Text, Speech, and Dialogue, Pilsen, Czech Republic.","DOI":"10.1007\/978-3-642-40585-3_6"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chang, J., and Wang, D.L. (2017, January 5\u20139). Robust speaker recognition based on DNN\/i-Vectors and speech separation. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953191"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, C., and Woodland, P.C. (2016, January 20\u201325). DNN speaker adaptation using parameterized sigmoid and ReLU hidden activation functions. Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472689"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1109\/LSP.2017.2723507","article-title":"Low latency acoustic modeling using temporal convolution and LSTMs","volume":"25","author":"Peddinti","year":"2018","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1016\/j.specom.2017.05.004","article-title":"Transfer learning method for PLDA-based speaker verification","volume":"92","author":"Hong","year":"2017","journal-title":"Speech Commun."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"448","DOI":"10.1016\/j.neucom.2016.09.018","article-title":"A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition","volume":"218","author":"Huang","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lim, B.P., Wong, F., and Li, Y. (2016, January 8\u201312). Transfer learning with bottleneck feature networks for whispered speech recognition. Proceedings of the Interspeech 2016, San Francisco, CA, USA.","DOI":"10.21437\/Interspeech.2016-250"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.csl.2017.06.007","article-title":"Restricted Boltzmann machines for vector representation of speech in speaker recognition","volume":"47","author":"Ghahabi","year":"2018","journal-title":"Comput. Speech Lang."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhu, L.Z., Chen, L.M., and Zhao, D.H. (2017). Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17.","DOI":"10.3390\/s17071694"},{"key":"ref_25","unstructured":"Le, Q.V. (2011, January 22\u201327). Building high-level features using large scale unsupervised learning. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proceed. IEEE"},{"key":"ref_27","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference Machine Learning (ICML), Lille, France."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Dutta, T. (2008, January 27\u201330). Dynamic time warping based approach to text-dependent speaker identification using spectrograms. Proceedings of the 2008 IEEE Congress on Image and Signal Processing CISP\u201908, Hainan, China.","DOI":"10.1109\/CISP.2008.560"},{"key":"ref_29","unstructured":"Niu, Y.F., Zou, D.S., Niu, Y.D., He, Z.S., and Tan, H. A breakthrough in speech emotion recognition using deep retinal convolution neural networks, Comput. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training products of experts by minimizing contrastive divergence","volume":"14","author":"Hinton","year":"2002","journal-title":"Neural Comput."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.desal.2012.02.002","article-title":"Studies on prediction of separation percent in electrodialysis process via BP neural networks and improved BP algorithms","volume":"291","author":"Jing","year":"2012","journal-title":"Desalination"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Li, J., Qiu, T., Wen, C., Xie, K., and Wen, F.-Q. (2018). Robust Face Recognition Using the Deep C2D-CNN Model Based on Decision-Level Fusion. Sensors, 18.","DOI":"10.3390\/s18072080"},{"key":"ref_33","unstructured":"NIST Multimodal Information Group (2011). 2008 NIST Speaker Recognition Evaluation Training Set Part 1 LDC2011S05, Linguistic Data Consortium."},{"key":"ref_34","unstructured":"(2017, December 25). DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus. Available online: https:\/\/catalog.ldc.upenn.edu\/ldc93s1."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/7\/2399\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:13:54Z","timestamp":1760195634000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/7\/2399"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,23]]},"references-count":34,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2018,7]]}},"alternative-id":["s18072399"],"URL":"https:\/\/doi.org\/10.3390\/s18072399","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,7,23]]}}}