{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T06:14:59Z","timestamp":1773900899250,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T00:00:00Z","timestamp":1680739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62177022"],"award-info":[{"award-number":["62177022"]}]},{"name":"National Natural Science Foundation of China","award":["61901165"],"award-info":[{"award-number":["61901165"]}]},{"name":"National Natural Science Foundation of China","award":["61501199"],"award-info":[{"award-number":["61501199"]}]},{"name":"National Natural Science Foundation of China","award":["CCNUAI&FE2022-03-01"],"award-info":[{"award-number":["CCNUAI&FE2022-03-01"]}]},{"name":"National Natural Science Foundation of China","award":["xtzd2021-005"],"award-info":[{"award-number":["xtzd2021-005"]}]},{"name":"National Natural Science Foundation of China","award":["2022CFA007"],"award-info":[{"award-number":["2022CFA007"]}]},{"name":"AI and Faculty Empowerment Pilot Project","award":["62177022"],"award-info":[{"award-number":["62177022"]}]},{"name":"AI and Faculty Empowerment Pilot Project","award":["61901165"],"award-info":[{"award-number":["61901165"]}]},{"name":"AI and Faculty Empowerment Pilot Project","award":["61501199"],"award-info":[{"award-number":["61501199"]}]},{"name":"AI and Faculty Empowerment Pilot Project","award":["CCNUAI&FE2022-03-01"],"award-info":[{"award-number":["CCNUAI&FE2022-03-01"]}]},{"name":"AI and Faculty Empowerment Pilot Project","award":["xtzd2021-005"],"award-info":[{"award-number":["xtzd2021-005"]}]},{"name":"AI and Faculty Empowerment Pilot Project","award":["2022CFA007"],"award-info":[{"award-number":["2022CFA007"]}]},{"name":"the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by the MOE and Hubei Province","award":["62177022"],"award-info":[{"award-number":["62177022"]}]},{"name":"the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by the MOE and Hubei Province","award":["61901165"],"award-info":[{"award-number":["61901165"]}]},{"name":"the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by the MOE and Hubei Province","award":["61501199"],"award-info":[{"award-number":["61501199"]}]},{"name":"the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by the MOE and Hubei Province","award":["CCNUAI&FE2022-03-01"],"award-info":[{"award-number":["CCNUAI&FE2022-03-01"]}]},{"name":"the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by the MOE and Hubei Province","award":["xtzd2021-005"],"award-info":[{"award-number":["xtzd2021-005"]}]},{"name":"the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by the MOE and Hubei Province","award":["2022CFA007"],"award-info":[{"award-number":["2022CFA007"]}]},{"name":"Natural Science Foundation of Hubei Province","award":["62177022"],"award-info":[{"award-number":["62177022"]}]},{"name":"Natural Science Foundation of Hubei Province","award":["61901165"],"award-info":[{"award-number":["61901165"]}]},{"name":"Natural Science Foundation of Hubei Province","award":["61501199"],"award-info":[{"award-number":["61501199"]}]},{"name":"Natural Science Foundation of Hubei Province","award":["CCNUAI&FE2022-03-01"],"award-info":[{"award-number":["CCNUAI&FE2022-03-01"]}]},{"name":"Natural Science Foundation of Hubei Province","award":["xtzd2021-005"],"award-info":[{"award-number":["xtzd2021-005"]}]},{"name":"Natural Science Foundation of Hubei Province","award":["2022CFA007"],"award-info":[{"award-number":["2022CFA007"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Source acquisition device identification from recorded audio aims to identify the source recording device by analyzing the intrinsic characteristics of audio, which is a challenging problem in audio forensics. In this paper, we propose a spatiotemporal representation learning framework with multi-attention mechanisms to tackle this problem. In the deep feature extraction stage of recording devices, a two-branch network based on residual dense temporal convolution networks (RD-TCNs) and convolutional neural networks (CNNs) is constructed. The spatial probability distribution features of audio signals are employed as inputs to the branch of the CNN for spatial representation learning, and the temporal spectral features of audio signals are fed into the branch of the RD-TCN network for temporal representation learning. This achieves simultaneous learning of long-term and short-term features to obtain an accurate representation of device-related information. In the spatiotemporal feature fusion stage, three attention mechanisms\u2014temporal, spatial, and branch attention mechanisms\u2014are designed to capture spatiotemporal weights and achieve effective deep feature fusion. The proposed framework achieves state-of-the-art performance on the benchmark CCNU_Mobile dataset, reaching an accuracy of 97.6% for the identification of 45 recording devices, with a significant reduction in training time compared to other models.<\/jats:p>","DOI":"10.3390\/e25040626","type":"journal-article","created":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T08:41:52Z","timestamp":1680770512000},"page":"626","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Source Acquisition Device Identification from Recorded Audio Based on Spatiotemporal Representation Learning with Multi-Attention Mechanisms"],"prefix":"10.3390","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5799-6692","authenticated-orcid":false,"given":"Chunyan","family":"Zeng","sequence":"first","affiliation":[{"name":"Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shixiong","family":"Feng","sequence":"additional","affiliation":[{"name":"Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongliang","family":"Zhu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6960-509X","authenticated-orcid":false,"given":"Zhifeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Digital Media Technology, Central China Normal University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1108\/IJWIS-06-2020-0038","article-title":"An end-to-end deep source recording device identification system for Web media forensics","volume":"16","author":"Zeng","year":"2020","journal-title":"Int. J. Web Inf. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/MSP.2008.931080","article-title":"Audio forensic examination","volume":"26","author":"Maher","year":"2009","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1186\/s13634-022-00900-4","article-title":"Shallow and Deep Feature Fusion for Digital Audio Tampering Detection","volume":"2022","author":"Wang","year":"2022","journal-title":"EURASIP J. Adv. Signal Process."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"94","DOI":"10.4018\/IJDCF.302894","article-title":"Audio Tampering Forensics Based on Representation Learning of ENF Phase Sequence","volume":"14","author":"Zeng","year":"2022","journal-title":"Int. J. Digit. Crime Forensics"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2179","DOI":"10.1109\/TIFS.2018.2812185","article-title":"Band Energy Difference for Source Attribution in Audio Forensics","volume":"13","author":"Luo","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Cuccovillo, L., and Aichroth, P. (2016, January 20\u201325). Open-set microphone classification via blind channel analysis. Proceedings of the IEEE 2016 International Conference on Communications and Signal Processing (ICCSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472042"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.1109\/TIFS.2013.2278843","article-title":"Audio Recording Location Identification Using Acoustic Environment Signature","volume":"8","author":"Zhao","year":"2013","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1109\/TIFS.2011.2178403","article-title":"Recognition of Brand and Models of Cell-Phones from Recorded Speech Signals","volume":"7","author":"Hanilci","year":"2012","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hadoltikar, V.A., Ratnaparkhe, V.R., and Kumar, R. (2019, January 12\u201314). Optimization of MFCC parameters for mobile phone recognition from audio recordings. Proceedings of the IEEE 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India.","DOI":"10.1109\/ICECA.2019.8822177"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.dsp.2014.08.008","article-title":"Source cell-phone recognition from recorded speech using non-speech segments","volume":"35","author":"Hanilci","year":"2014","journal-title":"Digit. Signal Process."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Aggarwal, R., Singh, S., Roul, A.K., and Khanna, N. (2014, January 3\u20135). Cellphone identification using noise estimates from recorded audio. Proceedings of the IEEE 2014 International Conference on Communications and Signal Processing (ICCSP), Melmaruvathur, India.","DOI":"10.1109\/ICCSP.2014.6950045"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kotropoulos, C., and Samaras, S. (2014, January 20\u201323). Mobile phone identification using recorded speech signals. Proceedings of the IEEE 2014 International Conference on Digital Signal Processing (DSP), Hong Kong, China.","DOI":"10.1109\/ICDSP.2014.6900732"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2875","DOI":"10.1109\/TIFS.2019.2911175","article-title":"Source Microphone Recognition Aided by a Kernel-Based Projection Method","volume":"14","author":"Jiang","year":"2019","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Garcia-Romero, D., and Espy-Wilson, C.Y. (2010, January 14\u201319). Automatic acquisition device identification from speech recordings. Proceedings of the IEEE 2010 International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA.","DOI":"10.1109\/ICASSP.2010.5495407"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1109\/TIFS.2017.2774505","article-title":"Mobile Phone Clustering from Speech Recordings Using Deep Representation and Spectral Clustering","volume":"13","author":"Li","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Qin, T., Wang, R., Yan, D., and Lin, L. (2018). Source Cell-Phone Identification in the Presence of Additive Noise from CQT Domain. Information, 9.","DOI":"10.3390\/info9080205"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"6001504","DOI":"10.1109\/LSENS.2019.2923590","article-title":"Microphone Identification Using Convolutional Neural Networks","volume":"3","author":"Baldini","year":"2019","journal-title":"IEEE Sens. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3681","DOI":"10.1109\/TKDE.2020.3025580","article-title":"Deep Learning for Spatio-Temporal Data Mining: A Survey","volume":"34","author":"Wang","year":"2020","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lyu, L., Wang, Z., Yun, H., Yang, Z., and Li, Y. (2022). Deep Knowledge Tracing Based on Spatial and Temporal Representation Learning for Learning Performance Prediction. Appl. Sci., 12.","DOI":"10.3390\/app12147188"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wu, Y., Zhu, L., Yan, Y., and Yang, Y. (November, January 27). Dual Attention Matching for Audio-Visual Event Localization. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00639"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1109\/LSP.2006.870086","article-title":"Support vector machines using GMM supervectors for speaker verification","volume":"13","author":"Campbell","year":"2006","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"S109","DOI":"10.1121\/1.2027823","article-title":"A mixture modeling approach to text-independent speaker ID","volume":"87","author":"Reynolds","year":"1990","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Jin, C., Wang, R., Yan, D., Tao, B., Chen, Y., and Pei, A. (2016, January 17\u201319). Source Cell-Phone Identification Using Spectral Features of Device Self-noise. Proceedings of the Digital Forensics and Watermarking: 15th International Workshop (IWDW), Beijing, China.","DOI":"10.1007\/978-3-319-53465-7_3"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1108\/IJWIS-11-2020-0073","article-title":"SAE Based Unified Double JPEG Compression Detection System for Web Image Forensics","volume":"17","author":"Wang","year":"2021","journal-title":"Int. J. Web Inf. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/s11760-021-01955-w","article-title":"Cascade Neural Network-Based Joint Sampling and Reconstruction for Image Compressed Sensing","volume":"16","author":"Zeng","year":"2022","journal-title":"Signal Image Video Process."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1007\/s00034-022-02181-6","article-title":"High-Quality Image Compressed Sensing and Reconstruction with Multi-Scale Dilated Convolutional Neural Network","volume":"42","author":"Wang","year":"2023","journal-title":"Circuits Syst. Signal Process."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zeng, C., Yan, K., Wang, Z., Yu, Y., Xia, S., and Zhao, N. (2022). Abs-CAM: A Gradient Optimization Interpretable Approach for Explanation of Convolutional Neural Networks. Signal Image Video Process., 1\u20138.","DOI":"10.1007\/s11760-022-02313-0"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, X., Li, X., Feng, X., Yang, J., Chen, A., and He, Q. (2017, January 5\u20139). Mobile phone clustering from acquired speech recordings using deep Gaussian supervector and spectral clustering. Proceedings of the IEEE 2017 International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952534"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1109\/LSP.2020.2985594","article-title":"Subband Aware CNN for Cell-Phone Recognition","volume":"27","author":"Lin","year":"2020","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Qi, S., Huang, Z., Li, Y., and Shi, S. (2016, January 13\u201315). Audio recording device identification based on deep learning. Proceedings of the IEEE 2016 International Conference on Signal and Image Processing (ICSIP), Beijing, China.","DOI":"10.1109\/SIPROCESS.2016.7888298"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A., and Hinton, G. (2013, January 26\u201331). Speech recognition with deep recurrent neural networks. Proceedings of the IEEE 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"ref_32","unstructured":"Zaremba, W., Sutskever, I., and Vinyals, O. (2014). Recurrent neural network regularization. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Tian, Y., Kong, Y., Zhong, B., and Fu, Y. (2018, January 18\u201323). Residual dense network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00262"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1109\/TASL.2013.2243436","article-title":"Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning","volume":"21","author":"Rao","year":"2013","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., and Khudanpur, S. (2018, January 15\u201320). X-Vectors: Robust DNN Embeddings for Speaker Recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Peddinti, V., Povey, D., and Khudanpur, S. (2015, January 6\u201310). A time delay neural network architecture for efficient modeling of long temporal contexts. Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech 2015), Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-647"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1551","DOI":"10.1631\/FITEE.2100463","article-title":"Multiple knowledge representation for big data artificial intelligence: Framework, applications, and case studies","volume":"22","author":"Yang","year":"2021","journal-title":"Front. Inf. Technol. Electron. Eng."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/4\/626\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:11:12Z","timestamp":1760123472000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/4\/626"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,6]]},"references-count":37,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["e25040626"],"URL":"https:\/\/doi.org\/10.3390\/e25040626","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,6]]}}}