{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:28:54Z","timestamp":1775230134567,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,14]]},"DOI":"10.1145\/3552466.3556525","type":"proceedings-article","created":{"date-parts":[[2022,10,1]],"date-time":"2022-10-01T12:27:26Z","timestamp":1664627246000},"page":"61-68","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio"],"prefix":"10.1145","author":[{"given":"Xinrui","family":"Yan","sequence":"first","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Jiangyan","family":"Yi","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Jianhua","family":"Tao","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Chenglong","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Haoxin","family":"Ma","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of 
Chinese Academy of Sciences, Beijing, China"}]},{"given":"Tao","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Shiming","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Ruibo","family":"Fu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","first-page":"7962","volume-title":"Statistical parametric speech synthesis using deep neural networks. In 2013 ieee international conference on acoustics, speech and signal processing","author":"Ze Heiga","year":"2013","unstructured":"Heiga Ze , Andrew Senior , and Mike Schuster . Statistical parametric speech synthesis using deep neural networks. In 2013 ieee international conference on acoustics, speech and signal processing , pages 7962 -- 7966 . IEEE , 2013 . Heiga Ze, Andrew Senior, and Mike Schuster. Statistical parametric speech synthesis using deep neural networks. In 2013 ieee international conference on acoustics, speech and signal processing, pages 7962--7966. IEEE, 2013."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854318"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178816"},{"key":"e_1_3_2_2_4_1","first-page":"2243","volume-title":"Interspeech","author":"Wang Wenfu","year":"2016","unstructured":"Wenfu Wang , Shuang Xu , Bo Xu , First step towards end-to-end parametric tts synthesis: Generating spectral parameters with neural attention . In Interspeech , pages 2243 -- 2247 , 2016 . Wenfu Wang, Shuang Xu, Bo Xu, et al. 
First step towards end-to-end parametric tts synthesis: Generating spectral parameters with neural attention. In Interspeech, pages 2243--2247, 2016."},{"key":"e_1_3_2_2_5_1","volume-title":"Emphasis: An emotional phonemebased acoustic model for speech synthesis system. arXiv preprint arXiv:1806.09276","author":"Li Hao","year":"2018","unstructured":"Hao Li , Yongguo Kang , and Zhenyu Wang . Emphasis: An emotional phonemebased acoustic model for speech synthesis system. arXiv preprint arXiv:1806.09276 , 2018 . Hao Li, Yongguo Kang, and Zhenyu Wang. Emphasis: An emotional phonemebased acoustic model for speech synthesis system. arXiv preprint arXiv:1806.09276, 2018."},{"key":"e_1_3_2_2_6_1","volume-title":"Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499","author":"van den Oord Aaron","year":"2016","unstructured":"Aaron van den Oord , Sander Dieleman , Heiga Zen , Karen Simonyan , Oriol Vinyals , Alex Graves , Nal Kalchbrenner , Andrew Senior , and Koray Kavukcuoglu . Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 , 2016 . Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016."},{"key":"e_1_3_2_2_7_1","volume-title":"Visualizing data using t-sne. Journal of machine learning research, 9(11)","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton . Visualizing data using t-sne. Journal of machine learning research, 9(11) , 2008 . Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008."},{"key":"e_1_3_2_2_8_1","volume-title":"Investigating self-supervised front ends for speech spoofing countermeasures. arXiv preprint arXiv:2111.07725","author":"Wang Xin","year":"2021","unstructured":"Xin Wang and Junichi Yamagishi . 
Investigating self-supervised front ends for speech spoofing countermeasures. arXiv preprint arXiv:2111.07725 , 2021 . Xin Wang and Junichi Yamagishi. Investigating self-supervised front ends for speech spoofing countermeasures. arXiv preprint arXiv:2111.07725, 2021."},{"key":"e_1_3_2_2_9_1","volume-title":"Half-truth: A partially fake audio detection dataset. arXiv preprint arXiv:2104.03617","author":"Yi Jiangyan","year":"2021","unstructured":"Jiangyan Yi , Ye Bai , Jianhua Tao , Zhengkun Tian , Chenglong Wang , Tao Wang , and Ruibo Fu . Half-truth: A partially fake audio detection dataset. arXiv preprint arXiv:2104.03617 , 2021 . Jiangyan Yi, Ye Bai, Jianhua Tao, Zhengkun Tian, Chenglong Wang, Tao Wang, and Ruibo Fu. Half-truth: A partially fake audio detection dataset. arXiv preprint arXiv:2104.03617, 2021."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414234"},{"key":"e_1_3_2_2_11_1","volume-title":"Continual learning for fake audio detection. arXiv preprint arXiv:2104.07286","author":"Ma Haoxin","year":"2021","unstructured":"Haoxin Ma , Jiangyan Yi , Jianhua Tao , Ye Bai , Zhengkun Tian , and Chenglong Wang . Continual learning for fake audio detection. arXiv preprint arXiv:2104.07286 , 2021 . Haoxin Ma, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, and Chenglong Wang. Continual learning for fake audio detection. arXiv preprint arXiv:2104.07286, 2021."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746939"},{"key":"e_1_3_2_2_13_1","first-page":"283","volume-title":"Odyssey","volume":"2016","author":"Todisco Massimiliano","year":"2016","unstructured":"Massimiliano Todisco , H\u00e9ctor Delgado , and Nicholas WD Evans . A new feature for automatic speaker verification anti-spoofing: Constant q cepstral coefficients . In Odyssey , volume 2016 , pages 283 -- 290 , 2016 . Massimiliano Todisco, H\u00e9ctor Delgado, and Nicholas WD Evans. 
A new feature for automatic speaker verification anti-spoofing: Constant q cepstral coefficients. In Odyssey, volume 2016, pages 283--290, 2016."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5947590"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2015-462"},{"key":"e_1_3_2_2_17_1","volume-title":"The asvspoof 2017 challenge: Assessing the limits of replay spoofing attack detection","author":"Kinnunen Tomi","year":"2017","unstructured":"Tomi Kinnunen , Md Sahidullah , H\u00e9ctor Delgado , Massimiliano Todisco , Nicholas Evans , Junichi Yamagishi , and Kong Aik Lee . The asvspoof 2017 challenge: Assessing the limits of replay spoofing attack detection . 2017 . Tomi Kinnunen, Md Sahidullah, H\u00e9ctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, and Kong Aik Lee. The asvspoof 2017 challenge: Assessing the limits of replay spoofing attack detection. 2017."},{"key":"e_1_3_2_2_18_1","volume-title":"Asvspoof 2019: Future horizons in spoofed and fake audio detection. arXiv preprint arXiv:1904.05441","author":"Todisco Massimiliano","year":"2019","unstructured":"Massimiliano Todisco , Xin Wang , Ville Vestman , Md Sahidullah , H\u00e9ctor Delgado , Andreas Nautsch , Junichi Yamagishi , Nicholas Evans , Tomi Kinnunen , and Kong Aik Lee . Asvspoof 2019: Future horizons in spoofed and fake audio detection. arXiv preprint arXiv:1904.05441 , 2019 . Massimiliano Todisco, Xin Wang, Ville Vestman, Md Sahidullah, H\u00e9ctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, and Kong Aik Lee. Asvspoof 2019: Future horizons in spoofed and fake audio detection. arXiv preprint arXiv:1904.05441, 2019."},{"key":"e_1_3_2_2_19_1","volume-title":"Tomi Kinnunen, Nicholas Evans, et al. Asvspoof 2021: accelerating progress in spoofed and deepfake speech detection. 
arXiv preprint arXiv:2109.00537","author":"Yamagishi Junichi","year":"2021","unstructured":"Junichi Yamagishi , Xin Wang , Massimiliano Todisco , Md Sahidullah , Jose Patino , Andreas Nautsch , Xuechen Liu , Kong Aik Lee , Tomi Kinnunen, Nicholas Evans, et al. Asvspoof 2021: accelerating progress in spoofed and deepfake speech detection. arXiv preprint arXiv:2109.00537 , 2021 . Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, et al. Asvspoof 2021: accelerating progress in spoofed and deepfake speech detection. arXiv preprint arXiv:2109.00537, 2021."},{"key":"e_1_3_2_2_20_1","first-page":"7120","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Bagherinezhad Hessam","year":"2017","unstructured":"Hessam Bagherinezhad , Mohammad Rastegari , and Ali Farhadi . Lcnn : Lookupbased convolutional neural network . In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 7120 -- 7129 , 2017 . Hessam Bagherinezhad, Mohammad Rastegari, and Ali Farhadi. Lcnn: Lookupbased convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7120--7129, 2017."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/29.21701"},{"key":"e_1_3_2_2_25_1","volume-title":"exploitation of the other aspect of vocoder: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology, 27(6):349--353","author":"Kawahara Hideki","year":"2006","unstructured":"Hideki Kawahara . 
Straight , exploitation of the other aspect of vocoder: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology, 27(6):349--353 , 2006 . Hideki Kawahara. Straight, exploitation of the other aspect of vocoder: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology, 27(6):349--353, 2006."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1587\/transinf.2015EDP7457"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682804"},{"key":"e_1_3_2_2_28_1","first-page":"3918","volume-title":"International conference on machine learning","author":"Oord Aaron","year":"2018","unstructured":"Aaron Oord , Yazhe Li , Igor Babuschkin , Karen Simonyan , Oriol Vinyals , Koray Kavukcuoglu , George Driessche , Edward Lockhart , Luis Cobo , Florian Stimberg , Parallel wavenet : Fast high-fidelity speech synthesis . In International conference on machine learning , pages 3918 -- 3926 . PMLR, 2018 . Aaron Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George Driessche, Edward Lockhart, Luis Cobo, Florian Stimberg, et al. Parallel wavenet: Fast high-fidelity speech synthesis. In International conference on machine learning, pages 3918--3926. PMLR, 2018."},{"key":"e_1_3_2_2_29_1","first-page":"2410","volume-title":"International Conference on Machine Learning","author":"Kalchbrenner Nal","year":"2018","unstructured":"Nal Kalchbrenner , Erich Elsen , Karen Simonyan , Seb Noury , Norman Casagrande , Edward Lockhart , Florian Stimberg , Aaron Oord , Sander Dieleman , and Koray Kavukcuoglu . Efficient neural audio synthesis . In International Conference on Machine Learning , pages 2410 -- 2419 . PMLR, 2018 . Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron Oord, Sander Dieleman, and Koray Kavukcuoglu. Efficient neural audio synthesis. 
In International Conference on Machine Learning, pages 2410--2419. PMLR, 2018."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053795"},{"key":"e_1_3_2_2_31_1","first-page":"17022","article-title":"Generative adversarial networks for efficient and high fidelity speech synthesis","volume":"33","author":"Kong Jungil","year":"2020","unstructured":"Jungil Kong , Jaehyeon Kim , and Jaekyoung Bae . Hifi-gan : Generative adversarial networks for efficient and high fidelity speech synthesis . Advances in Neural Information Processing Systems , 33 : 17022 -- 17033 , 2020 . Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems, 33:17022--17033, 2020.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT48900.2021.9383551"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413605"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1984.1164317"},{"key":"e_1_3_2_2_35_1","first-page":"1","article-title":"An overview on video forensics","author":"Milani Simone","year":"2012","unstructured":"Simone Milani , Marco Fontani , Paolo Bestagini , Mauro Barni , Alessandro Piva , Marco Tagliasacchi , and Stefano Tubaro . An overview on video forensics . APSIPA Transactions on Signal and Information Processing , 1 , 2012 . Simone Milani, Marco Fontani, Paolo Bestagini, Mauro Barni, Alessandro Piva, Marco Tagliasacchi, and Stefano Tubaro. An overview on video forensics. 
APSIPA Transactions on Signal and Information Processing, 1, 2012.","journal-title":"APSIPA Transactions on Signal and Information Processing"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2012.2205568"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00765"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-019-6413-7"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.21437\/Odyssey.2018-15"},{"key":"e_1_3_2_2_40_1","volume-title":"Stc antispoofing systems for the asvspoof2019 challenge. arXiv preprint arXiv:1904.05576","author":"Lavrentyeva Galina","year":"2019","unstructured":"Galina Lavrentyeva , Sergey Novoselov , Andzhukaev Tseren , Marina Volkova , Artem Gorlanov , and Alexandr Kozlov . Stc antispoofing systems for the asvspoof2019 challenge. arXiv preprint arXiv:1904.05576 , 2019 Galina Lavrentyeva, Sergey Novoselov, Andzhukaev Tseren, Marina Volkova, Artem Gorlanov, and Alexandr Kozlov. Stc antispoofing systems for the asvspoof2019 challenge. 
arXiv preprint arXiv:1904.05576, 2019"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3552466.3556525","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3552466.3556525","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:25Z","timestamp":1750182565000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3552466.3556525"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":40,"alternative-id":["10.1145\/3552466.3556525","10.1145\/3552466"],"URL":"https:\/\/doi.org\/10.1145\/3552466.3556525","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}