{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T20:32:35Z","timestamp":1770755555944,"version":"3.50.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,3,8]],"date-time":"2024-03-08T00:00:00Z","timestamp":1709856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2021ZD0112100"],"award-info":[{"award-number":["2021ZD0112100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National NSF of China","award":["U1936212, 62120106009, 62261160653"],"award-info":[{"award-number":["U1936212, 62120106009, 62261160653"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Over the past few years, deep generative models have significantly evolved, enabling the synthesis of realistic content and also bringing security concerns of illegal misuse. Therefore, active protection for generative models has been proposed recently, aiming to generate samples with hidden messages for future identification while preserving the original generating performance. However, existing active protection methods are specifically designed for generative adversarial networks (GANs), restricted to handling unconditional image generation. We observe that they get limited identification performance and visual quality when handling audio-driven video generation conditioned on target audio and source input to drive video generation with consistent context, e.g., identity and movement, between frame sequences. 
To address this issue, we introduce a simple yet effective active <jats:bold>P<\/jats:bold>rotection framework for <jats:bold>A<\/jats:bold>udio-<jats:bold>D<\/jats:bold>riven <jats:bold>V<\/jats:bold>ideo <jats:bold>G<\/jats:bold>eneration, named PADVG. To be specific, we present a novel frame-shared embedding module in which messages to hide are first transformed into frame-shared message coefficients. Then, these coefficients are assembled with the intermediate feature maps of video generators at multiple feature levels to generate the embedded video frames. Besides, PADVG further considers two visual consistent losses: (i) intra-frame loss is utilized to keep the visual consistency with different hidden messages; (ii) inter-frame loss is used to preserve the visual consistency across different video frames. Moreover, we also propose an auxiliary denoising training strategy through perturbing the assembled features by learnable pixel-level noise to improve identification performance, while enhancing robustness against real-world disturbances. 
Extensive experiments demonstrate that our proposed PADVG for audio-driven video generation can effectively identify the generated videos and achieve high visual quality.<\/jats:p>","DOI":"10.1145\/3638556","type":"journal-article","created":{"date-parts":[[2024,1,16]],"date-time":"2024-01-16T12:01:38Z","timestamp":1705406498000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["PADVG: A Simple Baseline of Active Protection for Audio-Driven Video Generation"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-6347-5060","authenticated-orcid":false,"given":"Huan","family":"Liu","sequence":"first","affiliation":[{"name":"Institute of Information Science, Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Haidian Qu, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3940-207X","authenticated-orcid":false,"given":"Xiaolong","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Haidian Qu, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8501-4123","authenticated-orcid":false,"given":"Zichang","family":"Tan","sequence":"additional","affiliation":[{"name":"Institute of Deep Learning and National Engineering Laboratory for Deep Learning Technology and Application, Baidu Research, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6181-6044","authenticated-orcid":false,"given":"Xiaolong","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Haidian Qu, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8581-9554","authenticated-orcid":false,"given":"Yao","family":"Zhao","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Haidian Qu, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,3,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Triantafyllos Afouras Joon Son Chung Andrew Senior Oriol Vinyals and Andrew Zisserman. 2018. Deep audiovisual speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 12 (2018) 8717\u20138727.","DOI":"10.1109\/TPAMI.2018.2889052"},{"key":"e_1_3_2_3_2","unstructured":"Triantafyllos Afouras Joon Son Chung and Andrew Zisserman. 2018. LRS3-TED: a large-scale dataset for visual speech recognition. arXiv:1809.00496. Retrieved from https:\/\/arxiv.org\/abs\/1809.00496"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAIE.2010.5735066"},{"key":"e_1_3_2_5_2","unstructured":"Miles Brundage Shahar Avin Jack Clark Helen Toner Peter Eckersley Ben Garfinkel Allan Dafoe Paul Scharre Thomas Zeitzoff Bobby Filar Hyrum S. Anderson Heather Roff Gregory C. Allen Jacob Steinhardt Carrick Flynn Se\u00e1n \u00d3 h\u00c9igeartaigh Simon Beard Haydn Belfield Sebastian Farquhar Clare Lyle Rebecca Crootof Owain Evans Michael Page Joanna Bryson Roman Yampolskiy and Dario Amodei. 2018. The malicious use of artificial intelligence: Forecasting prevention and mitigation. arXiv:1802.07228 (2018). Retrieved from https:\/\/arxiv.org\/abs\/1802.07228"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i2.16193"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Joon Son Chung and Andrew Zisserman. 2016. Lip reading in the wild. In ACCV (2) (Lecture Notes in Computer Science Vol. 
10112) Shang-Hong Lai Vincent Lepetit Ko Nishino and Yoichi Sato (Eds.). Springer 87\u2013103.","DOI":"10.1007\/978-3-319-54184-6_6"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746058"},{"key":"e_1_3_2_9_2","first-page":"754105","volume-title":"Proceedings of the SPIE","volume":"7541","author":"Filler Tom\u00e1s","year":"2010","unstructured":"Tom\u00e1s Filler, Jan Judas, and Jessica J. Fridrich. 2010. Minimizing embedding impact in steganography using trellis-coded quantization. In Proceedings of the SPIE. Vol. 7541, SPIE, 754105."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3536426"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/336"},{"key":"e_1_3_2_12_2","first-page":"2672","volume-title":"Proceedings of the NIPS","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the NIPS. 2672\u20132680."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00573"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_15_2","first-page":"6626","article-title":"GANs trained by a two time-scale update rule converge to a local nash equilibrium","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the NIPS. 
6626\u20136637.","journal-title":"Proceedings of the NIPS"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/102"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530745"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475324"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2970919"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"e_1_3_2_22_2","volume-title":"Proceedings of the ICLR (Poster)","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the ICLR (Poster)."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/79.879337"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00505"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00853"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018698"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Huan Liu Zichang Tan Qiang Chen Yunchao Wei Yao Zhao and Jingdong Wang. 2023. Unified frequency-assisted transformer framework for detecting and grounding multi-modal manipulation. arXiv:2309.09667. Retrieved from https:\/\/arxiv.org\/abs\/2309.09667","DOI":"10.1007\/s11263-024-02245-x"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3558004"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480484"},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"Arun Mallya Ting-Chun Wang Karan Sapra and Ming-Yu Liu. 2020. World-consistent video-to-video synthesis. In ECCV (8) (Lecture Notes in Computer Science Vol. 
12353) Andrea Vedaldi Horst Bischof Thomas Brox and Jan-Michael Frahm (Eds.). Springer 359\u2013378.","DOI":"10.1007\/978-3-030-58598-3_22"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","unstructured":"Iacopo Masi Aditya Killekar Royston Marian Mascarenhas Shenoy Pratik Gurudatt and Wael AbdAlmageed. 2020. Two-branch recurrent network for isolating deepfakes in videos. In ECCV (7) (Lecture Notes in Computer Science Vol. 12352) Andrea Vedaldi Horst Bischof Thomas Brox and Jan-Michael Frahm (Eds.). Springer 667\u2013684.","DOI":"10.1007\/978-3-030-58571-6_39"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2022.3233774"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2022.3198275"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-950"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00363"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413532"},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","unstructured":"Yuyang Qian Guojun Yin Lu Sheng Zixuan Chen and Jing Shao. 2020. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In ECCV (12) (Lecture Notes in Computer Science Vol. 12357) Andrea Vedaldi Horst Bischof Thomas Brox and Jan-Michael Frahm (Eds.). Springer 86\u2013103.","DOI":"10.1007\/978-3-030-58610-2_6"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01350"},{"key":"e_1_3_2_39_2","first-page":"1291","volume-title":"Proceedings of the USENIX Security Symposium","author":"Salem Ahmed","year":"2020","unstructured":"Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, and Yang Zhang. 2020. Updates-leak: Data set inference and reconstruction attacks in online learning. In Proceedings of the USENIX Security Symposium. USENIX Association, 1291\u20131308."},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","unstructured":"Toby Sharp. 
2001. An implementation of key-based digital signal steganography. In Information Hiding (Lecture Notes in Computer Science Vol. 2137) Ira S. Moskowitz (Ed.). Springer 13\u201326.","DOI":"10.1007\/3-540-45496-9_2"},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the ICLR."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00219"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2021.116593"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475518"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME51207.2021.9428410"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00991"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"Xintao Wang Ke Yu Shixiang Wu Jinjin Gu Yihao Liu Chao Dong Yu Qiao and Chen Change Loy. 2018. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In ECCV Workshops (5) (Lecture Notes in Computer Science Vol. 11133) Laura Leal-Taix\u00e9 and Stefan Roth (Eds.). Springer 63\u201379.","DOI":"10.1007\/978-3-030-11021-5_5"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.2100463"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01418"},{"key":"e_1_3_2_50_2","article-title":"Responsible disclosure of generative models using scalable fingerprinting","author":"Yu Ning","year":"2022","unstructured":"Ning Yu, Vladislav Skripniuk, Dingfan Chen, Larry Davis, and Mario Fritz. 2022. Responsible disclosure of generative models using scalable fingerprinting. 
In Proceedings of the ICLR.","journal-title":"Proceedings of the ICLR"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3499026"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00384"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00222"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00416"},{"key":"e_1_3_2_55_2","volume-title":"Digital Watermarking for Verification of Perception-Based Integrity of Audio Data","author":"Zmudzinski Sascha","year":"2017","unstructured":"Sascha Zmudzinski. 2017. Digital Watermarking for Verification of Perception-Based Integrity of Audio Data. Ph.D. Dissertation. Darmstadt University of Technology, Germany."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3638556","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3638556","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:06:13Z","timestamp":1750291573000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3638556"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,8]]},"references-count":54,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3638556"],"URL":"https:\/\/doi.org\/10.1145\/3638556","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,8]]},"assertion":[{"value":"2023-04-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-12-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}