{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T17:40:56Z","timestamp":1778780456850,"version":"3.51.4"},"reference-count":37,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T00:00:00Z","timestamp":1624838400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100007834","name":"Natural Science Foundation of Ningbo","doi-asserted-by":"publisher","award":["202003N4089"],"award-info":[{"award-number":["202003N4089"]}],"id":[{"id":"10.13039\/100007834","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004731","name":"Natural Science Foundation of Zhejiang Province","doi-asserted-by":"publisher","award":["LY20F020010"],"award-info":[{"award-number":["LY20F020010"]}],"id":[{"id":"10.13039\/501100004731","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61300055"],"award-info":[{"award-number":["61300055"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100017549","name":"Science and Technology Innovation 2025 Major Project of Ningbo","doi-asserted-by":"publisher","award":["2018B10010"],"award-info":[{"award-number":["2018B10010"]}],"id":[{"id":"10.13039\/501100017549","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100017549","name":"Science and Technology Innovation 2025 Major Project of Ningbo","doi-asserted-by":"publisher","award":["2019B10075"],"award-info":[{"award-number":["2019B10075"]}],"id":[{"id":"10.13039\/501100017549","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The number of channels is one of the important criteria in regard to digital audio quality. Generally, stereo audio with two channels can provide better perceptual quality than mono audio. To seek illegal commercial benefit, one might convert a mono audio system to stereo with fake quality. Identifying stereo-faking audio is a lesser-investigated audio forensic issue. In this paper, a stereo faking corpus is first presented, which is created using the Haas effect technique. Two identification algorithms for fake stereo audio are proposed. One is based on Mel-frequency cepstral coefficient features and support vector machines. The other is based on a specially designed five-layer convolutional neural network. The experimental results on two datasets with five different cut-off frequencies show that the proposed algorithm can effectively detect stereo-faking audio and has good robustness.<\/jats:p>","DOI":"10.3390\/info12070263","type":"journal-article","created":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T13:39:22Z","timestamp":1624887562000},"page":"263","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["Identification of Fake Stereo Audio Using SVM and CNN"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0056-3447","authenticated-orcid":false,"given":"Tianyun","family":"Liu","sequence":"first","affiliation":[{"name":"College of Information Science and Engineering, Ningbo University, Ningbo 315211, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5241-7276","authenticated-orcid":false,"given":"Diqun","family":"Yan","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ningbo University, Ningbo 315211, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rangding","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ningbo University, Ningbo 315211, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nan","family":"Yan","sequence":"additional","affiliation":[{"name":"Ningbo Polytechnic, Ningbo 315800, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gang","family":"Chen","sequence":"additional","affiliation":[{"name":"Ningbo Polytechnic, Ningbo 315800, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,28]]},"reference":[{"key":"ref_1","first-page":"252","article-title":"Research Progress on Key Technologies of Audio Forensics","volume":"31","author":"Yongqiang","year":"2016","journal-title":"J. Data Acquis. Process."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"432","DOI":"10.1109\/TIFS.2016.2622012","article-title":"Detection of Double Compressed AMR Audio using Stacked Autoencoder","volume":"12","author":"Luo","year":"2017","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Luo, D., Yang, R., and Huang, J. (2014, January 4\u20139). Detecting Double Compressed AMR Audio using Deep Learning. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854084"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1109\/TIFS.2014.2301912","article-title":"Identification of Electronic Disguised Voices","volume":"9","author":"Wu","year":"2014","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wu, H., Wang, Y., and Huang, J. (2013, January 26\u201331). Blind detection of Electronic Disguised Voice. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638211"},{"key":"ref_6","first-page":"46","article-title":"Detection algorithm of Electronic Disguised Voice based on Convolutional Neural Network","volume":"34","author":"Xu","year":"2018","journal-title":"Telecommun. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2179","DOI":"10.1109\/TIFS.2018.2812185","article-title":"Band Energy difference for source attribution in Audio Forensics","volume":"13","author":"Luo","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zou, L., He, Q., and Feng, X. (2015, January 19\u201324). Cell Phone Verification from Speech Recordings using Sparse Representation. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, South Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178278"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Qi, S., Huang, Z., Li, Y., and Shi, S. (2016, January 13\u201315). Audio Recording Device Identification based on Deep Learning. Proceedings of the International Conference on Signal & Image Processing, Beijing, China.","DOI":"10.1109\/SIPROCESS.2016.7888298"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.specom.2014.12.003","article-title":"Playback Attack Detection for Text-Dependent Speaker Verification over Telephone Channels","volume":"67","author":"Grzywacz","year":"2015","journal-title":"Speech Commun."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/978-981-15-2756-2_3","article-title":"Detection of Operation Type and Order for Digital Speech","volume":"Volume 635","author":"Wu","year":"2020","journal-title":"Proceedings of the 7th Conference on Sound and MUSIC Technology (CSMT)"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yang, R., Shi, Y., and Huang, J. (2009, January 7\u20138). Defeating Fake-Quality MP3. Proceedings of the 11th ACM Workshop on Multimedia and Security, MM & Sec, Princeton, NJ, USA.","DOI":"10.1145\/1597817.1597838"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mascia, M., Canclini, A., Antonacci, F., Tagliasacchi, M., Sarti, A., and Tubaro, S. (September, January 31). Forensic and anti-forensic analysis of indoor\/outdoor classifiers based on acoustic clues. Proceedings of the 2015 23rd European Signal Processing Conference, Nice, France.","DOI":"10.1109\/EUSIPCO.2015.7362749"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Vijayasenan, D., Kalluri, S.B., K, S., and Issac, A. (2016, January 8\u201310). Study of Wireless Channel Effects on Audio Forensics. Proceedings of the 2016 22nd Annual International Conference on Advanced Computing and Communication, Bangalore, India.","DOI":"10.1109\/ADCOM.2016.15"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hadoltikar, V.A., Ratnaparkhe, V.R., and Kumar, R. (2019, January 12\u201314). Optimization of MFCC parameters for mobile phone recognition from audio recordings. Proceedings of the 2019 3rd International conference on Electronics, Communication and Aerospace Technology, Coimbatore, India.","DOI":"10.1109\/ICECA.2019.8822177"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhao, H., and Malik, H. (2019, January 12\u201314). Audio forensics using acoustic environment traces. Proceedings of the 2012 IEEE Statistical Signal Processing Workshop, Coimbatore, India.","DOI":"10.1109\/SSP.2012.6319707"},{"key":"ref_17","unstructured":"Jiang, Y., and Leung, F.H.F. (2016, January 23\u201326). Mobile phone identification from speech recordings using Weighted Support Vector Machine. Proceedings of the IECON 2016\u201442nd Annual Conference of the IEEE Industrial Electronics Society, Florence, Italy."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mitra, V., Franco, H., Graciarena, M., and Vergyri, D. (2014, January 4\u20139). Medium-duration modulation cepstral feature for robust speech recognition. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853898"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Athanaselis, T., Bakamidis, S., Giannopoulos, G., Dologlou, I., and Fotinea, E. (2008, January 10\u201312). Robust speech recognition in the presence of noise using medical data. Proceedings of the 2008 IEEE International Workshop on Imaging Systems and Techniques, Chania, Greece.","DOI":"10.1109\/IST.2008.4659999"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Subramanian, A.S., Weng, C., Yu, M., Zhang, S., Xu, Y., Watanabe, S., and Yu, D. (2020, January 4\u20138). Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053692"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1109\/TASLP.2020.3039600","article-title":"Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition","volume":"Volume 29","author":"Fan","year":"2021","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Toruk, M.M., and Gokay, R. (2019, January 21\u201324). Short Utterance Speaker Recognition Using Time-Delay Neural Network. Proceedings of the 2019 16th International Multi-Conference on Systems, Signals & Devices, Istanbul, Turkey.","DOI":"10.1109\/SSD.2019.8893188"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Huang, C. (2019, January 14\u201318). Exploring Effective Data Augmentation with TDNN-LSTM Neural Network Embedding for Speaker Recognition. Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop, Singapore.","DOI":"10.1109\/ASRU46091.2019.9003938"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jagiasi, R., Ghosalkar, S., Kulal, P., and Bharambe, A. (2019, January 12\u201314). CNN based speaker recognition in language and text-independent small scale system. Proceedings of the 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), Palladam, India.","DOI":"10.1109\/I-SMAC47947.2019.9032667"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Desai, N., and Tahilramani, N. (2016, January 22\u201323). Digital Speech Watermarking for Authenticity of Speaker in Speaker Recognition System. Proceedings of the 2016 International Conference on Micro-Electronics and Telecommunication Engineering, Ghaziabad, India.","DOI":"10.1109\/ICMETE.2016.13"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1109\/TIFS.2016.2636095","article-title":"ESPRIT-Hilbert-Based Audio Tampering Detection With SVM Classifier for Forensic Analysis via Electrical Network Frequency","volume":"12","author":"Reis","year":"2017","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1480","DOI":"10.1109\/TMM.2016.2571999","article-title":"Audio Recapture Detection with Convolutional Neural Networks","volume":"18","author":"Lin","year":"2016","journal-title":"IEEE Trans. Multimed."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ciaburro, G. (2020). Sound Event Detection in Underground Parking Garage Using Convolutional Neural Network. Big Data Cogn. Comput., 4.","DOI":"10.3390\/bdcc4030020"},{"key":"ref_29","first-page":"146","article-title":"The influence of a single echo on the audibility of speech","volume":"20","author":"Haas","year":"1972","journal-title":"J. Audio Eng. Soc."},{"key":"ref_30","unstructured":"(2021, June 28). IMDbTop250. Available online: https:\/\/www.imdb.com\/chart\/top."},{"key":"ref_31","unstructured":"(2021, June 28). QQ Music. Available online: https:\/\/music.qq.com."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1109\/TASSP.1980.1163420","article-title":"Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences","volume":"28","author":"Davis","year":"1980","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1109\/TIFS.2019.2941773","article-title":"Fusing MFCC and LPC Features Using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals","volume":"15","author":"Chowdhury","year":"2020","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1109\/LSP.2012.2235067","article-title":"A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition","volume":"20","author":"Sahidullah","year":"2013","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_35","unstructured":"Bishop, C. (2006). Pattern Recognition and Machine Learning, Springer Science+Business Media."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_37","unstructured":"Collobert, R., Koray, K., and Farabet, C. (2021, June 28). Torch7: A Matlab-like Environment for Machine Learning. Available online: http:\/\/publications.idiap.ch\/downloads\/papers\/2011\/Collobert_NIPSWORKSHOP_2011.pdf."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/12\/7\/263\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:26:14Z","timestamp":1760163974000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/12\/7\/263"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,28]]},"references-count":37,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["info12070263"],"URL":"https:\/\/doi.org\/10.3390\/info12070263","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,28]]}}}