{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T18:23:01Z","timestamp":1780510981276,"version":"3.54.1"},"reference-count":270,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T00:00:00Z","timestamp":1737504000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["III-2106758, CRII-2246067, POSE-2346158, ATD-2427915"],"award-info":[{"award-number":["III-2106758, CRII-2246067, POSE-2346158, ATD-2427915"]}]},{"DOI":"10.13039\/100008234","name":"Lehigh","doi-asserted-by":"crossref","award":["FRGS00011497"],"award-info":[{"award-number":["FRGS00011497"]}],"id":[{"id":"10.13039\/100008234","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            Recently, Artificial Intelligence Generated Content (AIGC) has gained significant attention from society, especially with the rise of Generative AI (GAI) techniques such as ChatGPT, GPT-4 [\n            <jats:xref ref-type=\"bibr\">165<\/jats:xref>\n            ], DALL-E-3 [\n            <jats:xref ref-type=\"bibr\">184<\/jats:xref>\n            ], and Sora [\n            <jats:xref ref-type=\"bibr\">137<\/jats:xref>\n            ]. AIGC involves using AI models to create digital content, such as images, music, and natural language, with the goal of making the content creation process more efficient and accessible. Large-scale models have become increasingly important in AIGC as they provide better intent extraction and generation results. 
This survey provides a comprehensive review of the history of generative models and recent advances in AIGC, focusing on both unimodal and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and relative models of text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, the survey discusses the existing open problems and future challenges in AIGC. Overall, this survey serves as a valuable resource for individuals interested in understanding the background and secrets behind the impressive performance of AIGC techniques.\n          <\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3704262","type":"journal-article","created":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T11:07:17Z","timestamp":1733483237000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":129,"title":["A Survey of AI-Generated Content (AIGC)"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4420-8252","authenticated-orcid":false,"given":"Yihan","family":"Cao","sequence":"first","affiliation":[{"name":"Premium AI, LinkedIn Corp, Mountain View, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-0561-1738","authenticated-orcid":false,"given":"Siyu","family":"Li","sequence":"additional","affiliation":[{"name":"Emory University, Atlanta, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3856-439X","authenticated-orcid":false,"given":"Yixin","family":"Liu","sequence":"additional","affiliation":[{"name":"Lehigh University, Bethlehem, United 
States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-0322-431X","authenticated-orcid":false,"given":"Zhiling","family":"Yan","sequence":"additional","affiliation":[{"name":"Lehigh University, Bethlehem, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4212-2017","authenticated-orcid":false,"given":"Yutong","family":"Dai","sequence":"additional","affiliation":[{"name":"Lehigh University, Bethlehem, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3491-5968","authenticated-orcid":false,"given":"Philip","family":"Yu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Chicago, Chicago, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1539-7939","authenticated-orcid":false,"given":"Lichao","family":"Sun","sequence":"additional","affiliation":[{"name":"Lehigh University, Bethlehem, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,1,22]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2022. ChatGPT: Optimizing Language Models for Dialogue. Retrieved from https:\/\/openai.com\/blog\/chatgpt\/"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1109\/SP40001.2021.00083","volume-title":"Proceedings of the 2021 IEEE Symposium on Security and Privacy","author":"Abdelnabi Sahar","year":"2021","unstructured":"Sahar Abdelnabi and Mario Fritz. 2021. Adversarial watermarking transformer: Towards tracing text provenance with data hiding. In Proceedings of the 2021 IEEE Symposium on Security and Privacy. IEEE, 121\u2013140."},{"key":"e_1_3_2_4_2","unstructured":"Armen Aghajanyan Dmytro Okhonko Mike Lewis Mandar Joshi Hu Xu Gargi Ghosh and Luke Zettlemoyer. 2021. 
HTLM: Hyper-Text Pre-Training and Prompting of Language Models. arXiv:2107.06955. Retrieved from https:\/\/arxiv.org\/abs\/2107.06955"},{"key":"e_1_3_2_5_2","unstructured":"Armen Aghajanyan Lili Yu Alexis Conneau Wei-Ning Hsu Karen Hambardzumyan Susan Zhang Stephen Roller Naman Goyal Omer Levy and Luke Zettlemoyer. 2023. Scaling laws for generative mixed-modal language models. arXiv:2301.03728. Retrieved from https:\/\/arxiv.org\/abs\/2301.03728"},{"key":"e_1_3_2_6_2","unstructured":"Jorge Agnese Jonathan Herrera Haicheng Tao and Xingquan Zhu. 2019. A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis. arXiv:1910.09399. Retrieved from https:\/\/arxiv.org\/abs\/1910.09399"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Wasi Uddin Ahmad Saikat Chakraborty Baishakhi Ray and Kai-Wei Chang. 2021. Unified pre-training for program understanding and generation. arXiv:2103.06333. Retrieved from https:\/\/arxiv.org\/abs\/2103.06333","DOI":"10.18653\/v1\/2021.naacl-main.211"},{"key":"e_1_3_2_8_2","unstructured":"Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katie Millican Malcolm Reynolds et\u00a0al. 2022. Flamingo: A visual language model for few-shot learning. arXiv:2204.14198. Retrieved from https:\/\/arxiv.org\/abs\/2204.14198"},{"key":"e_1_3_2_9_2","unstructured":"Miltiadis Allamanis Marc Brockschmidt and Mahmoud Khademi. 2017. Learning to represent programs with graphs. arXiv:1711.00740. Retrieved from https:\/\/arxiv.org\/abs\/1711.00740"},{"key":"e_1_3_2_10_2","unstructured":"Xavier Amatriain. 2023. Transformer models: An introduction and catalog. arXiv:2302.07730. 
Retrieved from https:\/\/arxiv.org\/abs\/2302.07730"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-021-10039-7"},{"key":"e_1_3_2_12_2","unstructured":"Vamsi Aribandi Yi Tay Tal Schuster Jinfeng Rao Huaixiu Steven Zheng Sanket Vaibhav Mehta Honglei Zhuang Vinh Q Tran Dara Bahri Jianmo Ni et\u00a0al. 2021. Ext5: Towards extreme multi-task scaling for transfer learning. arXiv:2111.10952. Retrieved from https:\/\/arxiv.org\/abs\/2111.10952"},{"key":"e_1_3_2_13_2","unstructured":"Jimmy Lei Ba Jamie Ryan Kiros and Geoffrey E. Hinton. 2016. Layer normalization. arXiv:1607.06450. Retrieved from https:\/\/arxiv.org\/abs\/1607.06450"},{"key":"e_1_3_2_14_2","unstructured":"Arpit Bansal Eitan Borgnia Hong-Min Chu Jie S Li Hamid Kazemi Furong Huang Micah Goldblum Jonas Geiping and Tom Goldstein. 2022. Cold diffusion: Inverting arbitrary image transforms without noise. arXiv:2208.09392. Retrieved from https:\/\/arxiv.org\/abs\/2208.09392"},{"key":"e_1_3_2_15_2","unstructured":"Fan Bao Chongxuan Li Jun Zhu and Bo Zhang. 2022. Analytic-dpm: An analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv:2201.06503. Retrieved from https:\/\/arxiv.org\/abs\/2201.06503"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_2_17_2","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio Yoshua","year":"2003","unstructured":"Yoshua Bengio, R\u00e9jean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. The Journal of Machine Learning Research 3 (Mar. 2003), 1137\u20131155.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_18_2","unstructured":"Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et\u00a0al. 2021. On the opportunities and risks of foundation models. 
arXiv:2108.07258. Retrieved from https:\/\/arxiv.org\/abs\/2108.07258"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Ali Borji. 2023. A Categorical Archive of ChatGPT Failures. arXiv:2302.03494. Retrieved from https:\/\/arxiv.org\/abs\/2302.03494","DOI":"10.21203\/rs.3.rs-2895792\/v1"},{"key":"e_1_3_2_20_2","unstructured":"Andrew Brock Jeff Donahue and Karen Simonyan. 2018. Large scale GAN training for high fidelity natural image synthesis. arXiv:1809.11096. Retrieved from https:\/\/arxiv.org\/abs\/1809.11096"},{"key":"e_1_3_2_21_2","first-page":"1877","volume-title":"Advances in Neural Information Processing Systems","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 1877\u20131901. Retrieved from https:\/\/papers.nips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-08-051581-6.50065-9"},{"key":"e_1_3_2_23_2","unstructured":"Yihan Cao Yanbin Kang and Lichao Sun. 2023. Instruction mining: High-quality instruction data selection for large language models. arXiv:2307.06290. Retrieved from https:\/\/arxiv.org\/abs\/2307.06290"},{"key":"e_1_3_2_24_2","unstructured":"Nicholas Carlini Jamie Hayes Milad Nasr Matthew Jagielski Vikash Sehwag Florian Tram\u00e8r Borja Balle Daphne Ippolito and Eric Wallace. 2023. Extracting training data from diffusion models. arXiv:2301.13188. Retrieved from https:\/\/arxiv.org\/abs\/2301.13188"},{"key":"e_1_3_2_25_2","unstructured":"Nicholas Carlini Daphne Ippolito Matthew Jagielski Katherine Lee Florian Tram\u00e8r and Chiyuan Zhang. 2022. Quantifying memorization across neural language models. arXiv:2202.07646. 
Retrieved from https:\/\/arxiv.org\/abs\/2202.07646"},{"key":"e_1_3_2_26_2","volume-title":"Proceedings of the 30th USENIX Security Symposium","author":"Carlini Nicholas","year":"2021","unstructured":"Nicholas Carlini, Yuezun Liu, Hal Daume III, Ulfar Erlingsson, Tadayoshi Kohno, and Dawn Song. 2021. Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium. Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity21\/presentation\/carlini-extracting"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Ilias Chalkidis Manos Fergadiotis Prodromos Malakasiotis Nikolaos Aletras and Ion Androutsopoulos. 2020. LEGAL-BERT: The muppets straight out of law school. arXiv:2010.02559. Retrieved from https:\/\/arxiv.org\/abs\/2010.02559","DOI":"10.18653\/v1\/2020.findings-emnlp.261"},{"key":"e_1_3_2_28_2","unstructured":"Tong Che Yanran Li Athul Paul Jacob Yoshua Bengio and Wenjie Li. 2016. Mode regularized generative adversarial networks. arXiv:1612.02136. Retrieved from https:\/\/arxiv.org\/abs\/1612.02136"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2024.103310"},{"key":"e_1_3_2_30_2","unstructured":"Mingjian Chen Xu Tan Bohan Li Yanqing Liu Tao Qin Sheng Zhao and Tie-Yan Liu. 2021. Adaspeech: Adaptive text to speech for custom voice. arXiv:2103.00993. Retrieved from https:\/\/arxiv.org\/abs\/2103.00993"},{"key":"e_1_3_2_31_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et\u00a0al. 2021. Evaluating large language models trained on code. arXiv:2107.03374. Retrieved from https:\/\/arxiv.org\/abs\/2107.03374"},{"key":"e_1_3_2_32_2","unstructured":"Nanxin Chen Yu Zhang Heiga Zen Ron J Weiss Mohammad Norouzi and William Chan. 2020. Wavegrad: Estimating gradients for waveform generation. arXiv:2009.00713. 
Retrieved from https:\/\/arxiv.org\/abs\/2009.00713"},{"key":"e_1_3_2_33_2","unstructured":"Ruoxi Chen Haibo Jin Yixin Liu Jinyin Chen Haohan Wang and Lichao Sun. 2023. EditShield: Protecting unauthorized image editing by instruction-guided diffusion models. arXiv:2311.12066. Retrieved from https:\/\/arxiv.org\/abs\/2311.12066"},{"key":"e_1_3_2_34_2","article-title":"Infogan: Interpretable representation learning by information maximizing generative adversarial nets","volume":"29","author":"Chen Xi","year":"2016","unstructured":"Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Advances in Neural Information Processing Systems 29 (2016).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_35_2","unstructured":"Keunwoo Choi George Fazekas Brian McFee Kyunghyun Cho and Mark Sandler. 2016. Towards music captioning: Generating music playlist descriptions. arXiv:1608.04868. Retrieved from https:\/\/arxiv.org\/abs\/1608.04868"},{"key":"e_1_3_2_36_2","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham Hyung Won Chung Charles Sutton Sebastian Gehrmann et\u00a0al. 2022. Palm: Scaling language modeling with pathways. arXiv:2204.02311. Retrieved from https:\/\/arxiv.org\/abs\/2204.02311"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Christiano Paul F.","year":"2017","unstructured":"Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. 
Retrieved from https:\/\/papers.nips.cc\/paper\/2017\/hash\/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html"},{"key":"e_1_3_2_38_2","article-title":"Deep reinforcement learning from human preferences","volume":"30","author":"Christiano Paul F.","year":"2017","unstructured":"Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_39_2","unstructured":"Hyungjin Chung Byeongsu Sim and Jong Chul Ye. 2021. Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction. arXiv:2112.05146. Retrieved from https:\/\/arxiv.org\/abs\/2112.05146"},{"key":"e_1_3_2_40_2","unstructured":"Kevin Clark Minh-Thang Luong Quoc V. Le and Christopher D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv:2003.10555. Retrieved from https:\/\/arxiv.org\/abs\/2003.10555"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1007\/978-3-540-87608-3_11","volume-title":"Proceedings of the Computers and Games: 6th International Conference, CG 2008. Proceedings 6","author":"Coulom R\u00e9mi","year":"2008","unstructured":"R\u00e9mi Coulom. 2008. Whole-history rating: A Bayesian rating system for players of time-varying strength. In Proceedings of the Computers and Games: 6th International Conference, CG 2008. Proceedings 6. Springer, 113\u2013124."},{"key":"e_1_3_2_42_2","unstructured":"Yingqian Cui Jie Ren Han Xu Pengfei He Hui Liu Lichao Sun Yue Xing and Jiliang Tang. 2023. Diffusionshield: A watermark for copyright protection against generative diffusion models. arXiv:2306.04642. 
Retrieved from https:\/\/arxiv.org\/abs\/2306.04642"},{"key":"e_1_3_2_43_2","article-title":"Semi-supervised sequence learning","volume":"28","author":"Dai Andrew M.","year":"2015","unstructured":"Andrew M. Dai and Quoc V. Le. 2015. Semi-supervised sequence learning. Advances in Neural Information Processing Systems 28 (2015).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_44_2","unstructured":"Haixing Dai Zhengliang Liu Wenxiong Liao Xiaoke Huang Yihan Cao Zihao Wu Lin Zhao Shaochen Xu Wei Liu Ninghao Liu et\u00a0al. 2023. Auggpt: Leveraging chatgpt for text data augmentation. arXiv:2302.13007. Retrieved from https:\/\/arxiv.org\/abs\/2302.13007"},{"key":"e_1_3_2_45_2","unstructured":"Hanjun Dai Yingtao Tian Bo Dai Steven Skiena and Le Song. 2018. Syntax-directed variational autoencoder for structured data. arXiv:1802.08786. Retrieved from https:\/\/arxiv.org\/abs\/1802.08786"},{"key":"e_1_3_2_46_2","first-page":"16344","article-title":"Flashattention: Fast and memory-efficient exact attention with io-awareness","volume":"35","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher R\u00e9. 2022. Flashattention: Fast and memory-efficient exact attention with io-awareness. Advances in Neural Information Processing Systems 35 (2022), 16344\u201316359.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_47_2","first-page":"1263","volume-title":"Proceedings of the Uncertainty in Artificial Intelligence","author":"Cao Nicola De","year":"2020","unstructured":"Nicola De Cao, Wilker Aziz, and Ivan Titov. 2020. Block neural autoregressive flow. In Proceedings of the Uncertainty in Artificial Intelligence. 
PMLR, 1263\u20131273."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_49_2","article-title":"Deep generative image models using a laplacian pyramid of adversarial networks","volume":"28","author":"Denton Emily L.","year":"2015","unstructured":"Emily L. Denton, Soumith Chintala, and Rob Fergus. 2015. Deep generative image models using a laplacian pyramid of adversarial networks. Advances in Neural Information Processing Systems 28 (2015).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_50_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/MWSCAS.2017.8053243"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445924"},{"key":"e_1_3_2_53_2","unstructured":"Prafulla Dhariwal Heewoo Jun Christine Payne Jong Wook Kim Alec Radford and Ilya Sutskever. 2020. Jukebox: A generative model for music. arXiv:2005.00341. Retrieved from https:\/\/arxiv.org\/abs\/2005.00341"},{"key":"e_1_3_2_54_2","unstructured":"Laurent Dinh David Krueger and Yoshua Bengio. 2014. Nice: Non-linear independent components estimation. arXiv:1410.8516. Retrieved from https:\/\/arxiv.org\/abs\/1410.8516"},{"key":"e_1_3_2_55_2","unstructured":"Laurent Dinh Jascha Sohl-Dickstein and Samy Bengio. 2016. Density estimation using real nvp. arXiv:1605.08803. Retrieved from https:\/\/arxiv.org\/abs\/1605.08803"},{"key":"e_1_3_2_56_2","first-page":"1627","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Distiawan Bayu","year":"2018","unstructured":"Bayu Distiawan, Jianzhong Qi, Rui Zhang, and Wei Wang. 2018. 
GTR-LSTM: A triple encoder for sentence generation from RDF data. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1627\u20131637."},{"key":"e_1_3_2_57_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et\u00a0al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_2_58_2","unstructured":"Jinhao Duan Fei Kong Shiqi Wang Xiaoshuang Shi and Kaidi Xu. 2023. Are diffusion models vulnerable to membership inference attacks? arXiv:2302.01316. Retrieved from https:\/\/arxiv.org\/abs\/2302.01316"},{"key":"e_1_3_2_59_2","unstructured":"Ishan Durugkar Ian Gemp and Sridhar Mahadevan. 2016. Generative multi-adversarial networks. arXiv:1611.01673. Retrieved from https:\/\/arxiv.org\/abs\/1611.01673"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.5555\/850924.851569"},{"key":"e_1_3_2_61_2","article-title":"A mathematical framework for transformer circuits","author":"Elhage Nelson","year":"2021","unstructured":"Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et\u00a0al. 2021. A mathematical framework for transformer circuits. Transformer Circuits Thread (2021). Retrieved from https:\/\/transformer-circuits.pub\/2021\/framework\/index.html","journal-title":"Transformer Circuits Thread"},{"key":"e_1_3_2_62_2","doi-asserted-by":"crossref","unstructured":"Benjamin Elizalde Soham Deshmukh Mahmoud Al Ismail and Huaming Wang. 2022. Clap: Learning audio concepts from natural language supervision. arXiv:2206.04769. 
Retrieved from https:\/\/arxiv.org\/abs\/2206.04769","DOI":"10.1109\/ICASSP49357.2023.10095889"},{"key":"e_1_3_2_63_2","unstructured":"Owain Evans Owen Cotton-Barratt Lukas Finnveden Adam Bales Avital Balwit Peter Wills Luca Righetti and William Saunders. 2021. Truthful AI: Developing and governing AI that does not lie. arXiv:2110.06674. Retrieved from https:\/\/arxiv.org\/abs\/2110.06674"},{"key":"e_1_3_2_64_2","unstructured":"Angela Fan Yacine Jernite Ethan Perez David Grangier Jason Weston and Michael Auli. 2019. ELI5: Long form question answering. arXiv:1907.09190. Retrieved from https:\/\/arxiv.org\/abs\/1907.09190"},{"key":"e_1_3_2_65_2","first-page":"1","article-title":"Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity","volume":"23","author":"Fedus William","year":"2021","unstructured":"William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. The Journal of Machine Learning Research 23 (2021), 1\u201340.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang et\u00a0al. 2020. Codebert: A pre-trained model for programming and natural languages. arXiv:2002.08155. Retrieved from https:\/\/arxiv.org\/abs\/2002.08155","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","unstructured":"Pierre Fernandez Guillaume Couairon Herv\u00e9 J\u00e9gou Matthijs Douze and Teddy Furon. 2023. The stable signature: Rooting watermarks in latent diffusion models. arXiv:2303.15435. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.15435","DOI":"10.1109\/ICCV51070.2023.02053"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2021.3071082"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-011-9236-8"},{"key":"e_1_3_2_70_2","doi-asserted-by":"crossref","unstructured":"Giulio Franzese Simone Rossi Lixuan Yang Alessandro Finamore Dario Rossi Maurizio Filippone and Pietro Michiardi. 2022. How Much is Enough? A Study on Diffusion Times in Score-based Generative Models. arXiv:2206.05173. Retrieved from https:\/\/arxiv.org\/abs\/2206.05173","DOI":"10.3390\/e25040633"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1561\/2000000004"},{"key":"e_1_3_2_72_2","unstructured":"Deep Ganguli Liane Lovitt Jackson Kernion Amanda Askell Yuntao Bai Saurav Kadavath Ben Mann Ethan Perez Nicholas Schiefer Kamal Ndousse et\u00a0al. 2022. Red teaming language models to reduce harms: Methods scaling behaviors and lessons learned. arXiv:2209.07858. Retrieved from https:\/\/arxiv.org\/abs\/2209.07858"},{"key":"e_1_3_2_73_2","unstructured":"Chujie Gao Dongping Chen Qihui Zhang Yue Huang Yao Wan and Lichao Sun. 2024. Llm-as-a-coauthor: The challenges of detecting llm-human mixcase. arXiv:2401.05952. Retrieved from https:\/\/arxiv.org\/abs\/2401.05952"},{"key":"e_1_3_2_74_2","unstructured":"Partha Ghosh Mehdi S. M. Sajjadi Antonio Vergari Michael Black and Bernhard Sch\u00f6lkopf. 2019. From variational to deterministic autoencoders. arXiv:1903.12436. Retrieved from https:\/\/arxiv.org\/abs\/1903.12436"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1259439"},{"key":"e_1_3_2_76_2","unstructured":"Amelia Glaese Nat McAleese Maja Tr\u0119bacz John Aslanides Vlad Firoiu Timo Ewalds Maribeth Rauh Laura Weidinger Martin Chadwick Phoebe Thacker et\u00a0al. 2022. Improving alignment of dialogue agents via targeted human judgements. arXiv:2209.14375. 
Retrieved from https:\/\/arxiv.org\/abs\/2209.14375"},{"key":"e_1_3_2_77_2","unstructured":"Aaron Gokaslan and Vanya Cohen. 2019. OpenWebText Corpus."},{"key":"e_1_3_2_78_2","article-title":"The reversible residual network: Backpropagation without storing activations","volume":"30","author":"Gomez Aidan N.","year":"2017","unstructured":"Aidan N. Gomez, Mengye Ren, Raquel Urtasun, and Roger B. Grosse. 2017. The reversible residual network: Backpropagation without storing activations. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_79_2","unstructured":"Ian J. Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative Adversarial Networks. arXiv:1406.2661. Retrieved from https:\/\/arxiv.org\/abs\/1406.2661"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330955"},{"key":"e_1_3_2_81_2","unstructured":"Roberto Gozalo-Brizuela and Eduardo C. Garrido-Merchan. 2023. ChatGPT is not all you need. A State of the Art Review of large Generative AI models. arXiv:2301.04655. Retrieved from https:\/\/arxiv.org\/abs\/2301.04655"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24797-2_4"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21918"},{"key":"e_1_3_2_84_2","unstructured":"Dirk Groeneveld Iz Beltagy Pete Walsh Akshita Bhagia Rodney Kinney Oyvind Tafjord Ananya Harsh Jha Hamish Ivison Ian Magnusson Yizhong Wang et\u00a0al. 2024. OLMo: Accelerating the Science of Language Models. arXiv:2402.00838. Retrieved from https:\/\/arxiv.org\/abs\/2402.00838"},{"key":"e_1_3_2_85_2","article-title":"Improved training of wasserstein gans","volume":"30","author":"Gulrajani Ishaan","year":"2017","unstructured":"Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. 
Improved training of wasserstein gans. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_86_2","unstructured":"Qipeng Guo Zhijing Jin Xipeng Qiu Weinan Zhang David Wipf and Zheng Zhang. 2020. CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training. arXiv:2006.04702. Retrieved from https:\/\/arxiv.org\/abs\/2006.04702"},{"key":"e_1_3_2_87_2","doi-asserted-by":"crossref","unstructured":"Suchin Gururangan Ana Marasovi\u0107 Swabha Swayamdipta Kyle Lo Iz Beltagy Doug Downey and Noah A. Smith. 2020. Don\u2019t stop pretraining: Adapt language models to domains and tasks. arXiv:2004.10964. Retrieved from https:\/\/arxiv.org\/abs\/2004.10964","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"e_1_3_2_88_2","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence","author":"Haber Eldad","year":"2018","unstructured":"Eldad Haber, Lars Ruthotto, Elliot Holtham, and Seong-Hwan Jun. 2018. Learning across scales\u2014multiscale methods for convolution neural networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.1986.276672"},{"key":"e_1_3_2_90_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. arXiv:2009.03300. Retrieved from https:\/\/arxiv.org\/abs\/2009.03300"},{"key":"e_1_3_2_91_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. 
Advances in Neural Information Processing Systems 33 (2020), 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_92_2","volume-title":"Proceedings of the Workshop in Advances in Approximate Bayesian Inference, NIPS","volume":"1","author":"Hoffman Matthew D.","year":"2016","unstructured":"Matthew D. Hoffman and Matthew J. Johnson. 2016. Elbo surgery: Yet another way to carve up the variational evidence lower bound. In Proceedings of the Workshop in Advances in Approximate Bayesian Inference, NIPS, Vol. 1."},{"key":"e_1_3_2_93_2","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de Las Casas Lisa Anne Hendricks Johannes Welbl Aidan Clark et\u00a0al. 2022. Training compute-optimal large language models. arXiv:2203.15556. Retrieved from https:\/\/arxiv.org\/abs\/2203.15556"},{"key":"e_1_3_2_94_2","first-page":"2771","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Hoogeboom Emiel","year":"2019","unstructured":"Emiel Hoogeboom, Rianne Van Den Berg, and Max Welling. 2019. Emerging convolutions for generative normalizing flows. In Proceedings of the International Conference on Machine Learning. PMLR, 2771\u20132780."},{"key":"e_1_3_2_95_2","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv:2106.09685. Retrieved from https:\/\/arxiv.org\/abs\/2106.09685"},{"key":"e_1_3_2_96_2","unstructured":"Hailong Hu and Jun Pang. 2023. Membership Inference of Diffusion Models. arXiv:2301.09956. 
Retrieved from https:\/\/arxiv.org\/abs\/2301.09956"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447867"},{"key":"e_1_3_2_98_2","first-page":"2078","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Huang Chin-Wei","year":"2018","unstructured":"Chin-Wei Huang, David Krueger, Alexandre Lacoste, and Aaron Courville. 2018. Neural autoregressive flows. In Proceedings of the International Conference on Machine Learning. PMLR, 2078\u20132087."},{"key":"e_1_3_2_99_2","unstructured":"Yue Huang and Lichao Sun. 2024. FakeGPT: Fake News Generation Explanation and Detection of Large Language Models. arXiv:2310.05046. Retrieved from https:\/\/arxiv.org\/abs\/2310.05046"},{"key":"e_1_3_2_100_2","unstructured":"Yue Huang Lichao Sun Haoran Wang Siyuan Wu Qihui Zhang Yuan Li Chujie Gao Yixin Huang Wenhan Lyu Yixuan Zhang et\u00a0al. 2024. Trustllm: Trustworthiness in large language models. arXiv:2401.05561. Retrieved from https:\/\/arxiv.org\/abs\/2401.05561"},{"key":"e_1_3_2_101_2","unstructured":"J\u00f6rn-Henrik Jacobsen Arnold Smeulders and Edouard Oyallon. 2018. i-revnet: Deep invertible networks. arXiv:1802.07088. Retrieved from https:\/\/arxiv.org\/abs\/1802.07088"},{"key":"e_1_3_2_102_2","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier et\u00a0al. 2023. Mistral 7B. arXiv:2310.06825. Retrieved from https:\/\/arxiv.org\/abs\/2310.06825"},{"key":"e_1_3_2_103_2","first-page":"2410","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Kalchbrenner Nal","year":"2018","unstructured":"Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron Oord, Sander Dieleman, and Koray Kavukcuoglu. 2018. Efficient neural audio synthesis. 
In Proceedings of the International Conference on Machine Learning. PMLR, 2410\u20132419."},{"key":"e_1_3_2_104_2","first-page":"5110","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Kanade Aditya","year":"2020","unstructured":"Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi. 2020. Learning and evaluating contextual embedding of source code. In Proceedings of the International Conference on Machine Learning. PMLR, 5110\u20135121."},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/FIE.2016.7757570"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_2_107_2","unstructured":"Zachary Kenton Tom Everitt Laura Weidinger Iason Gabriel Vladimir Mikulik and Geoffrey Irving. 2021. Alignment of language agents. arXiv:2103.14659. Retrieved from https:\/\/arxiv.org\/abs\/2103.14659"},{"key":"e_1_3_2_108_2","doi-asserted-by":"crossref","unstructured":"Urvashi Khandelwal He He Peng Qi and Dan Jurafsky. 2018. Sharp Nearby Fuzzy Far Away: How Neural Language Models Use Context. arXiv:1805.04623. Retrieved from https:\/\/arxiv.org\/abs\/1805.04623","DOI":"10.18653\/v1\/P18-1027"},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.2501\/JAR-2018-035"},{"key":"e_1_3_2_110_2","unstructured":"Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv:1312.6114. Retrieved from https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"e_1_3_2_111_2","unstructured":"John Kirchenbauer Jonas Geiping Yuxin Wen Jonathan Katz Ian Miers and Tom Goldstein. 2023. A watermark for large language models. arXiv:2301.10226. Retrieved from https:\/\/arxiv.org\/abs\/2301.10226"},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-017-1183-8_2"},{"key":"e_1_3_2_113_2","unstructured":"Jing Yu Koh Ruslan Salakhutdinov and Daniel Fried. 2023. Grounding language models to images for multimodal generation. arXiv:2301.13823. 
Retrieved from https:\/\/arxiv.org\/abs\/2301.13823"},{"key":"e_1_3_2_114_2","unstructured":"Rik Koncel-Kedziorski Dhanush Bekal Yi Luan Mirella Lapata and Hannaneh Hajishirzi. 2019. Text generation from knowledge graphs with graph transformers. arXiv:1904.02342. Retrieved from https:\/\/arxiv.org\/abs\/1904.02342"},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00849"},{"key":"e_1_3_2_116_2","unstructured":"Shuvendu K. Lahiri Aaditya Naik Georgios Sakkas Piali Choudhury Curtis von Veh Madanlal Musuvathi Jeevana Priya Inala Chenglong Wang and Jianfeng Gao. 2022. Interactive Code Generation via Test-Driven User-Intent Formalization. arXiv:2208.05950. Retrieved from https:\/\/arxiv.org\/abs\/2208.05950"},{"key":"e_1_3_2_117_2","unstructured":"Siddique Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Heriberto Cuay\u00e1huitl and Bj\u00f6rn W Schuller. 2023. Sparks of large audio models: A survey and outlook. arXiv:2308.12792. Retrieved from https:\/\/arxiv.org\/abs\/2308.12792"},{"key":"e_1_3_2_118_2","unstructured":"Nayeon Lee Wei Ping Peng Xu Mostofa Patwary Mohammad Shoeybi and Bryan Catanzaro. 2022. Factuality enhanced language models for open-ended text generation. arXiv:2206.04624. Retrieved from https:\/\/arxiv.org\/abs\/2206.04624"},{"key":"e_1_3_2_119_2","doi-asserted-by":"crossref","unstructured":"Mike Lewis Yinhan Liu Naman Goyal Marjan Ghazvininejad Abdelrahman Mohamed Omer Levy Ves Stoyanov and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation translation and comprehension. arXiv:1910.13461. Retrieved from https:\/\/arxiv.org\/abs\/1910.13461","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_2_120_2","unstructured":"Aitor Lewkowycz Anders Andreassen David Dohan Ethan Dyer Henryk Michalewski Vinay Ramasesh Ambrose Slone Cem Anil Imanol Schlag Theo Gutman-Solo et\u00a0al. 2022. Solving quantitative reasoning problems with language models. arXiv:2206.14858. 
Retrieved from https:\/\/arxiv.org\/abs\/2206.14858"},{"key":"e_1_3_2_121_2","first-page":"12888","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning. PMLR, 12888\u201312900."},{"key":"e_1_3_2_122_2","first-page":"9694","article-title":"Align before fuse: Vision and language representation learning with momentum distillation","volume":"34","author":"Li Junnan","year":"2021","unstructured":"Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu Hong Hoi. 2021. Align before fuse: Vision and language representation learning with momentum distillation. Advances in Neural Information Processing Systems 34 (2021), 9694\u20139705.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_123_2","unstructured":"Junyi Li Tianyi Tang Wayne Xin Zhao Jian-Yun Nie and Ji-Rong Wen. 2022. Pretrained Language Models for Text Generation: A Survey. arXiv:2201.05273. Retrieved from https:\/\/arxiv.org\/abs\/2201.05273"},{"key":"e_1_3_2_124_2","unstructured":"Liunian Harold Li Mark Yatskar Da Yin Cho-Jui Hsieh and Kai-Wei Chang. 2019. Visualbert: A simple and performant baseline for vision and language. arXiv:1908.03557. Retrieved from https:\/\/arxiv.org\/abs\/1908.03557"},{"key":"e_1_3_2_125_2","first-page":"1445","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Li Xiang","year":"2016","unstructured":"Xiang Li, Aynaz Taheri, Lifu Tu, and Kevin Gimpel. 2016. Commonsense knowledge base completion. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 
1445\u20131455."},{"key":"e_1_3_2_126_2","unstructured":"Yuan Li Yue Huang Hongyi Wang Xiangliang Zhang James Zou and Lichao Sun. 2024. Quantifying ai psychology: A psychometrics benchmark for large language models. arXiv:2406.17675. Retrieved from https:\/\/arxiv.org\/abs\/2406.17675"},{"key":"e_1_3_2_127_2","unstructured":"Yuan Li Yixuan Zhang and Lichao Sun. 2023. Metaagents: Simulating interactions of human behaviors for llm-based task-oriented coordination via collaborative generative agents. arXiv:2310.06500. Retrieved from https:\/\/arxiv.org\/abs\/2310.06500"},{"key":"e_1_3_2_128_2","unstructured":"Zheng Li Zijian Wang Ming Tan Ramesh Nallapati Parminder Bhatia Andrew Arnold Bing Xiang and Dan Roth. 2022. DQ-BART: Efficient sequence-to-sequence model via joint distillation and quantization. arXiv:2203.11239. Retrieved from https:\/\/arxiv.org\/abs\/2203.11239"},{"key":"e_1_3_2_129_2","unstructured":"Hongru Liang Haozheng Wang Jun Wang Shaodi You Zhe Sun Jin-Mao Wei and Zhenglu Yang. 2018. JTAV: Jointly learning social media content representation by fusing textual acoustic and visual features. arXiv:1806.01483. Retrieved from https:\/\/arxiv.org\/abs\/1806.01483"},{"key":"e_1_3_2_130_2","first-page":"6565","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Liang Paul Pu","year":"2021","unstructured":"Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2021. Towards understanding and mitigating social biases in language models. In Proceedings of the International Conference on Machine Learning. PMLR, 6565\u20136576."},{"key":"e_1_3_2_131_2","unstructured":"Paul Pu Liang Amir Zadeh and Louis-Philippe Morency. 2022. Foundations and Recent Trends in Multimodal Machine Learning: Principles Challenges and Open Questions. arXiv:2209.03430. 
Retrieved from https:\/\/arxiv.org\/abs\/2209.03430"},{"key":"e_1_3_2_132_2","first-page":"74","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. 74\u201381."},{"key":"e_1_3_2_133_2","unstructured":"Stephanie Lin Jacob Hilton and Owain Evans. 2021. Truthfulqa: Measuring how models mimic human falsehoods. arXiv:2109.07958. Retrieved from https:\/\/arxiv.org\/abs\/2109.07958"},{"key":"e_1_3_2_134_2","doi-asserted-by":"publisher","DOI":"10.1145\/3560815"},{"key":"e_1_3_2_135_2","first-page":"24219","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Liu Yixin","year":"2024","unstructured":"Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, and Lichao Sun. 2024. MetaCloak: Preventing unauthorized subject-driven text-to-image diffusion-based synthesis via meta-learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 24219\u201324228."},{"key":"e_1_3_2_136_2","unstructured":"Yixin Liu Hongsheng Hu Xuyun Zhang and Lichao Sun. 2023. Watermarking text data on large language models for dataset copyright protection. arXiv:2305.13257. Retrieved from https:\/\/arxiv.org\/abs\/2305.13257"},{"key":"e_1_3_2_137_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_138_2","unstructured":"Yixin Liu Kai Zhang Yuan Li Zhiling Yan Chujie Gao Ruoxi Chen Zhengqing Yuan Yue Huang Hanchi Sun Jianfeng Gao et\u00a0al. 2024. Sora: A review on background technology limitations and opportunities of large vision models. arXiv:2402.17177. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.17177"},{"key":"e_1_3_2_139_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_140_2","unstructured":"Zhengliang Liu Aoxiao Zhong Yiwei Li Longtao Yang Chao Ju Zihao Wu Chong Ma Peng Shu Cheng Chen Sekeun Kim et\u00a0al. 2024. Radiology-GPT: A Large Language Model for Radiology. arXiv:2306.08666. Retrieved from https:\/\/arxiv.org\/abs\/2306.08666"},{"key":"e_1_3_2_141_2","volume-title":"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Curran Associates Inc., Red Hook, NY, USA."},{"key":"e_1_3_2_142_2","first-page":"282","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Lu Yongyi","year":"2018","unstructured":"Yongyi Lu, Yu-Wing Tai, and Chi-Keung Tang. 2018. Attribute-guided face generation using conditional cyclegan. In Proceedings of the European Conference on Computer Vision. 282\u2013297."},{"key":"e_1_3_2_143_2","unstructured":"Zhaoyang Lyu Xudong Xu Ceyuan Yang Dahua Lin and Bo Dai. 2022. Accelerating diffusion models via early stop of the diffusion process. arXiv:2205.12524. Retrieved from https:\/\/arxiv.org\/abs\/2205.12524"},{"key":"e_1_3_2_144_2","article-title":"Biva: A very deep hierarchy of latent variables for generative modeling","volume":"32","author":"Maal\u00f8e Lars","year":"2019","unstructured":"Lars Maal\u00f8e, Marco Fraccaro, Valentin Li\u00e9vin, and Ole Winther. 2019. Biva: A very deep hierarchy of latent variables for generative modeling. 
Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_145_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i03.5684"},{"key":"e_1_3_2_146_2","first-page":"1","volume-title":"Proceedings of the 2021 International Joint Conference on Neural Networks","author":"Manco Ilaria","year":"2021","unstructured":"Ilaria Manco, Emmanouil Benetos, Elio Quinton, and Gy\u00f6rgy Fazekas. 2021. Muscaps: Generating captions for music audio. In Proceedings of the 2021 International Joint Conference on Neural Networks. IEEE, 1\u20138."},{"key":"e_1_3_2_147_2","first-page":"456","volume-title":"Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Manco Ilaria","year":"2022","unstructured":"Ilaria Manco, Emmanouil Benetos, Elio Quinton, and Gy\u00f6rgy Fazekas. 2022. Learning music audio representations via weak language supervision. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 456\u2013460."},{"key":"e_1_3_2_148_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00152"},{"key":"e_1_3_2_149_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.304"},{"key":"e_1_3_2_150_2","doi-asserted-by":"crossref","unstructured":"Tomoya Matsumoto Takayuki Miura and Naoto Yanai. 2023. Membership inference attacks against diffusion models. arXiv:2302.03262. Retrieved from https:\/\/arxiv.org\/abs\/2302.03262","DOI":"10.1109\/SPW59333.2023.00013"},{"key":"e_1_3_2_151_2","doi-asserted-by":"crossref","unstructured":"Igor Melnyk Pierre Dognin and Payel Das. 2022. Knowledge graph generation from text. arXiv:2211.10511. Retrieved from https:\/\/arxiv.org\/abs\/2211.10511","DOI":"10.18653\/v1\/2022.findings-emnlp.116"},{"key":"e_1_3_2_152_2","unstructured":"Luke Metz Ben Poole David Pfau and Jascha Sohl-Dickstein. 2016. 
Unrolled generative adversarial networks. arXiv:1611.02163. Retrieved from https:\/\/arxiv.org\/abs\/1611.02163"},{"key":"e_1_3_2_153_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-343"},{"key":"e_1_3_2_154_2","article-title":"Distributed representations of words and phrases and their compositionality","volume":"26","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26 (2013).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_155_2","unstructured":"Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv:1411.1784. Retrieved from https:\/\/arxiv.org\/abs\/1411.1784"},{"key":"e_1_3_2_156_2","unstructured":"Takeru Miyato Toshiki Kataoka Masanori Koyama and Yuichi Yoshida. 2018. Spectral normalization for generative adversarial networks. arXiv:1802.05957. Retrieved from https:\/\/arxiv.org\/abs\/1802.05957"},{"key":"e_1_3_2_157_2","unstructured":"Eliya Nachmani Robin San Roman and Lior Wolf. 2021. Non Gaussian Denoising Diffusion Models. arXiv:2106.07582. Retrieved from https:\/\/arxiv.org\/abs\/2106.07582"},{"key":"e_1_3_2_158_2","unstructured":"Moin Nadeem Anna Bethke and Siva Reddy. 2020. StereoSet: Measuring stereotypical bias in pretrained language models. arXiv:2004.09456. Retrieved from https:\/\/arxiv.org\/abs\/2004.09456"},{"key":"e_1_3_2_159_2","unstructured":"Reiichiro Nakano Jacob Hilton Suchir Balaji Jeff Wu Long Ouyang Christina Kim Christopher Hesse Shantanu Jain Vineet Kosaraju William Saunders et\u00a0al. 2021. Webgpt: Browser-assisted question-answering with human feedback. arXiv:2112.09332. 
Retrieved from https:\/\/arxiv.org\/abs\/2112.09332"},{"key":"e_1_3_2_160_2","article-title":"Dual discriminator generative adversarial nets","volume":"30","author":"Nguyen Tu","year":"2017","unstructured":"Tu Nguyen, Trung Le, Hung Vu, and Dinh Phung. 2017. Dual discriminator generative adversarial nets. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_161_2","first-page":"1","article-title":"Recent advances in deep learning based dialogue systems: A systematic survey","author":"Ni Jinjie","year":"2022","unstructured":"Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, and Erik Cambria. 2022. Recent advances in deep learning based dialogue systems: A systematic survey. Artificial Intelligence Review (2022), 1\u2013101.","journal-title":"Artificial Intelligence Review"},{"key":"e_1_3_2_162_2","unstructured":"Alex Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam Pamela Mishkin Bob McGrew Ilya Sutskever and Mark Chen. 2021. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arXiv:2112.10741. Retrieved from https:\/\/arxiv.org\/abs\/2112.10741"},{"key":"e_1_3_2_163_2","first-page":"8162","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Nichol Alexander Quinn","year":"2021","unstructured":"Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning. PMLR, 8162\u20138171."},{"key":"e_1_3_2_164_2","unstructured":"Erik Nijkamp Bo Pang Hiroaki Hayashi Lifu Tu Huan Wang Yingbo Zhou Silvio Savarese and Caiming Xiong. 2022. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. arXiv:2203.13474. 
Retrieved from https:\/\/arxiv.org\/abs\/2203.13474"},{"key":"e_1_3_2_165_2","article-title":"f-gan: Training generative neural samplers using variational divergence minimization","volume":"29","author":"Nowozin Sebastian","year":"2016","unstructured":"Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. 2016. f-gan: Training generative neural samplers using variational divergence minimization. Advances in Neural Information Processing Systems 29 (2016).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_166_2","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_167_2","first-page":"60","volume-title":"Proceedings of the Conference on Lifelong Learning Agents","author":"Ostapenko Oleksiy","year":"2022","unstructured":"Oleksiy Ostapenko, Timothee Lesort, Pau Rodr\u00edguez, Md Rifat Arefin, Arthur Douillard, Irina Rish, and Laurent Charlin. 2022. Continual learning with foundation models: An empirical study of latent replay. In Proceedings of the Conference on Lifelong Learning Agents. PMLR, 60\u201391."},{"key":"e_1_3_2_168_2","unstructured":"Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray et\u00a0al. 2022. Training Language Models to Follow Instructions with Human Feedback. arXiv:2203.02155. Retrieved from https:\/\/arxiv.org\/abs\/2203.02155"},{"key":"e_1_3_2_169_2","unstructured":"Kushagra Pandey Avideep Mukherjee Piyush Rai and Abhishek Kumar. 2022. DiffuseVAE: Efficient Controllable and High-Fidelity Generation from Low-Dimensional Latents. arXiv:2201.00308. Retrieved from https:\/\/arxiv.org\/abs\/2201.00308"},{"key":"e_1_3_2_170_2","article-title":"Masked autoregressive flow for density estimation","volume":"30","author":"Papamakarios George","year":"2017","unstructured":"George Papamakarios, Theo Pavlakou, and Iain Murray. 2017. 
Masked autoregressive flow for density estimation. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_171_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_2_172_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00209"},{"key":"e_1_3_2_173_2","doi-asserted-by":"crossref","unstructured":"Dipjyoti Paul Muhammed PV Shifas Yannis Pantazis and Yannis Stylianou. 2020. Enhancing speech intelligibility in text-to-speech synthesis using speaking style conversion. arXiv:2008.05809. Retrieved from https:\/\/arxiv.org\/abs\/2008.05809","DOI":"10.21437\/Interspeech.2020-2793"},{"key":"e_1_3_2_174_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01265-2"},{"key":"e_1_3_2_175_2","doi-asserted-by":"publisher","DOI":"10.1145\/3239550"},{"key":"e_1_3_2_176_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11431-020-1647-3"},{"key":"e_1_3_2_177_2","unstructured":"Alec Radford Jong Wook Kim Chris Hallacy Aditya Ramesh Gabriel Goh Sandhini Agarwal Girish Sastry Amanda Askell Pamela Mishkin Jack Clark et\u00a0al. 2021. Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020. Retrieved from https:\/\/arxiv.org\/abs\/2103.00020"},{"key":"e_1_3_2_178_2","unstructured":"Alec Radford Luke Metz and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434. Retrieved from https:\/\/arxiv.org\/abs\/1511.06434"},{"key":"e_1_3_2_179_2","unstructured":"Alec Radford and Karthik Narasimhan. 
2018. Improving language understanding by generative pre-training."},{"key":"e_1_3_2_180_2","unstructured":"Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language models are unsupervised multitask learners."},{"issue":"8","key":"e_1_3_2_181_2","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019). Retrieved from https:\/\/openai.com\/blog\/better-language-models\/","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_182_2","unstructured":"Jack W. Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et\u00a0al. 2021. Scaling language models: Methods analysis & insights from training gopher. arXiv:2112.11446. Retrieved from https:\/\/arxiv.org\/abs\/2112.11446"},{"key":"e_1_3_2_183_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_2_184_2","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125. Retrieved from https:\/\/arxiv.org\/abs\/2204.06125"},{"key":"e_1_3_2_185_2","first-page":"8821","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning. PMLR, 8821\u20138831."},{"key":"e_1_3_2_186_2","unstructured":"Aditya Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. 
arXiv:2102.12092. Retrieved from https:\/\/arxiv.org\/abs\/2102.12092"},{"key":"e_1_3_2_187_2","unstructured":"Vipula Rawte Amit Sheth and Amitava Das. 2023. A survey of hallucination in large foundation models. arXiv:2309.05922. Retrieved from https:\/\/arxiv.org\/abs\/2309.05922"},{"key":"e_1_3_2_188_2","article-title":"Generating diverse high-fidelity images with vq-vae-2","volume":"32","author":"Razavi Ali","year":"2019","unstructured":"Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2. Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_189_2","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocz192"},{"key":"e_1_3_2_190_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-73003-5_196"},{"key":"e_1_3_2_191_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00332"},{"key":"e_1_3_2_192_2","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752. Retrieved from https:\/\/arxiv.org\/abs\/2112.10752","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_193_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_194_2","unstructured":"Baptiste Roziere Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin et\u00a0al. 2023. Code llama: Open foundation models for code. arXiv:2308.12950. Retrieved from https:\/\/arxiv.org\/abs\/2308.12950"},{"key":"e_1_3_2_195_2","first-page":"36479","article-title":"Photorealistic text-to-image diffusion models with deep language understanding","volume":"35","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. 
Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems 35 (2022), 36479\u201336494.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_196_2","unstructured":"Tim Salimans and Jonathan Ho. 2022. Progressive distillation for fast sampling of diffusion models. arXiv:2202.00512. Retrieved from https:\/\/arxiv.org\/abs\/2202.00512"},{"key":"e_1_3_2_197_2","unstructured":"Robin San-Roman Eliya Nachmani and Lior Wolf. 2021. Noise estimation for generative diffusion models. arXiv:2104.02600. Retrieved from https:\/\/arxiv.org\/abs\/2104.02600"},{"key":"e_1_3_2_198_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: Smaller faster cheaper and lighter. arXiv:1910.01108. Retrieved from https:\/\/arxiv.org\/abs\/1910.01108"},{"key":"e_1_3_2_199_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ili\u0107 Daniel Hesslow Roman Castagn\u00e9 Alexandra Sasha Luccioni Fran\u00e7ois Yvon Matthias Gall\u00e9 et\u00a0al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv:2211.05100. Retrieved from https:\/\/arxiv.org\/abs\/2211.05100"},{"key":"e_1_3_2_200_2","unstructured":"Jiawen Shi Yixin Liu Pan Zhou and Lichao Sun. 2023. Badgpt: Exploring security vulnerabilities of chatgpt via backdoor attacks to instructgpt. arXiv:2304.12298. Retrieved from https:\/\/arxiv.org\/abs\/2304.12298"},{"key":"e_1_3_2_201_2","unstructured":"Jiawen Shi Zenghui Yuan Yinuo Liu Yue Huang Pan Zhou Lichao Sun and Neil Zhenqiang Gong. 2024. Optimization-based prompt injection attack to LLM-as-a-Judge. arXiv:2403.17710. 
Retrieved from https:\/\/arxiv.org\/abs\/2403.17710"},{"key":"e_1_3_2_202_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2019. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053. Retrieved from https:\/\/arxiv.org\/abs\/1909.08053"},{"key":"e_1_3_2_203_2","unstructured":"Shaden Smith Mostofa Patwary Brandon Norick Patrick LeGresley Samyam Rajbhandari Jared Casper Zhun Liu Shrimai Prabhumoye George Zerveas Vijay Korthikanti et\u00a0al. 2022. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B A Large-Scale Generative Language Model. arXiv:2201.11990. Retrieved from https:\/\/arxiv.org\/abs\/2201.11990"},{"key":"e_1_3_2_204_2","unstructured":"Irene Solaiman Miles Brundage Jack Clark Amanda Askell Ariel Herbert-Voss Jeff Wu Alec Radford Gretchen Krueger Jong Wook Kim Sarah Kreps et\u00a0al. 2019. Release strategies and the social impacts of language models. arXiv:1908.09203. Retrieved from https:\/\/arxiv.org\/abs\/1908.09203"},{"key":"e_1_3_2_205_2","unstructured":"Gowthami Somepalli Vasu Singla Micah Goldblum Jonas Geiping and Tom Goldstein. 2022. Diffusion art or digital forgery? Investigating data replication in diffusion models. arXiv:2212.03860. Retrieved from https:\/\/arxiv.org\/abs\/2212.03860"},{"key":"e_1_3_2_206_2","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv:2010.02502. Retrieved from https:\/\/arxiv.org\/abs\/2010.02502"},{"key":"e_1_3_2_207_2","doi-asserted-by":"crossref","unstructured":"Linfeng Song Yue Zhang Zhiguo Wang and Daniel Gildea. 2018. A Graph-to-Sequence Model for AMR-to-Text Generation. arXiv:1805.02473. 
Retrieved from https:\/\/arxiv.org\/abs\/1805.02473","DOI":"10.18653\/v1\/P18-1150"},{"key":"e_1_3_2_208_2","article-title":"Generative modeling by estimating gradients of the data distribution","volume":"32","author":"Song Yang","year":"2019","unstructured":"Yang Song and Stefano Ermon. 2019. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_209_2","doi-asserted-by":"crossref","unstructured":"Yang Song Jascha Sohl-Dickstein Diederik P. Kingma Abhishek Kumar Stefano Ermon and Ben Poole. 2020. Score-based generative modeling through stochastic differential equations. arXiv:2011.13456. Retrieved from https:\/\/arxiv.org\/abs\/2011.13456","DOI":"10.1155\/2011\/613695"},{"key":"e_1_3_2_210_2","unstructured":"Matteo Stefanini Marcella Cornia Lorenzo Baraldi Silvia Cascianelli Giuseppe Fiameni and Rita Cucchiara. 2021. From Show to Tell: A Survey on Deep Learning-based Image Captioning. arXiv:2107.06912. Retrieved from https:\/\/arxiv.org\/abs\/2107.06912"},{"key":"e_1_3_2_211_2","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3495977"},{"key":"e_1_3_2_212_2","unstructured":"Weijie Su Xizhou Zhu Yue Cao Bin Li Lewei Lu Furu Wei and Jifeng Dai. 2019. Vl-bert: Pre-training of generic visual-linguistic representations. arXiv:1908.08530. Retrieved from https:\/\/arxiv.org\/abs\/1908.08530"},{"key":"e_1_3_2_213_2","unstructured":"Weixiang Sun Xiaocao You Ruizhe Zheng Zhengqing Yuan Xiang Li Lifang He Quanzheng Li and Lichao Sun. 2024. Bora: Biomedical generalist video generation model. arXiv:2407.08944. Retrieved from https:\/\/arxiv.org\/abs\/2407.08944"},{"key":"e_1_3_2_214_2","doi-asserted-by":"publisher","DOI":"10.1080\/01691864.2022.2035253"},{"key":"e_1_3_2_215_2","doi-asserted-by":"crossref","unstructured":"Hao Tan and Mohit Bansal. 2019. 
Lxmert: Learning cross-modality encoder representations from transformers. arXiv:1908.07490. Retrieved from https:\/\/arxiv.org\/abs\/1908.07490","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_2_216_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA Model. Retrieved from https:\/\/github.com\/tatsu-lab\/stanford_alpaca"},{"key":"e_1_3_2_217_2","unstructured":"Romal Thoppilan Daniel De Freitas Jamie Hall Noam Shazeer Apoorv Kulshreshtha Heng-Tze Cheng Alicia Jin Taylor Bos Leslie Baker Yu Du et\u00a0al. 2022. Lamda: Language models for dialog applications. arXiv:2201.08239. Retrieved from https:\/\/arxiv.org\/abs\/2201.08239"},{"key":"e_1_3_2_218_2","first-page":"1214","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Tomczak Jakub","year":"2018","unstructured":"Jakub Tomczak and Max Welling. 2018. VAE with a VampPrior. In Proceedings of the International Conference on Artificial Intelligence and Statistics. PMLR, 1214\u20131223."},{"key":"e_1_3_2_219_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_220_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459838"},{"key":"e_1_3_2_221_2","article-title":"Multimodal few-shot learning with frozen language models","author":"Tsimpoukelli Maria","year":"2021","unstructured":"Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, SM Eslami, Oriol Vinyals, and Felix Hill. 2021. Multimodal few-shot learning with frozen language models. 
In Proceedings of the 35th International Conference on Neural Information Processing Systems (2021).","journal-title":"Proceedings of the 35th International Conference on Neural Information Processing Systems"},{"key":"e_1_3_2_222_2","first-page":"19667","article-title":"NVAE: A deep hierarchical variational autoencoder","volume":"33","author":"Vahdat Arash","year":"2020","unstructured":"Arash Vahdat and Jan Kautz. 2020. NVAE: A deep hierarchical variational autoencoder. Advances in Neural Information Processing Systems 33 (2020), 19667\u201319679.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_223_2","first-page":"11287","article-title":"Score-based generative modeling in latent space","volume":"34","author":"Vahdat Arash","year":"2021","unstructured":"Arash Vahdat, Karsten Kreis, and Jan Kautz. 2021. Score-based generative modeling in latent space. Advances in Neural Information Processing Systems 34 (2021), 11287\u201311302.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_224_2","article-title":"Neural discrete representation learning","volume":"30","author":"Oord Aaron Van Den","year":"2017","unstructured":"Aaron Van Den Oord and Oriol Vinyals. 2017. Neural discrete representation learning. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_225_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_2_226_2","unstructured":"Petar Veli\u010dkovi\u0107 Guillem Cucurull Arantxa Casanova Adriana Romero Pietro Lio and Yoshua Bengio. 2017. Graph attention networks. arXiv:1710.10903. Retrieved from https:\/\/arxiv.org\/abs\/1710.10903"},{"key":"e_1_3_2_227_2","doi-asserted-by":"publisher","DOI":"10.1145\/2480362.2480557"},{"key":"e_1_3_2_228_2","doi-asserted-by":"crossref","unstructured":"Yizhong Wang Yeganeh Kordi Swaroop Mishra Alisa Liu Noah A. Smith Daniel Khashabi and Hannaneh Hajishirzi. 2022. Self-instruct: Aligning language model with self generated instructions. arXiv:2212.10560. Retrieved from https:\/\/arxiv.org\/abs\/2212.10560","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"e_1_3_2_229_2","doi-asserted-by":"crossref","unstructured":"Yue Wang Weishi Wang Shafiq Joty and Steven CH Hoi. 2021. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv:2109.00859. Retrieved from https:\/\/arxiv.org\/abs\/2109.00859","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_2_230_2","unstructured":"Zirui Wang Jiahui Yu Adams Wei Yu Zihang Dai Yulia Tsvetkov and Yuan Cao. 2021. Simvlm: Simple visual language model pretraining with weak supervision. arXiv:2108.10904. Retrieved from https:\/\/arxiv.org\/abs\/2108.10904"},{"key":"e_1_3_2_231_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Watson Daniel","year":"2022","unstructured":"Daniel Watson, William Chan, Jonathan Ho, and Mohammad Norouzi. 2022. Learning fast samplers for diffusion models by differentiating through sample quality. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_232_2","unstructured":"Daniel Watson Jonathan Ho Mohammad Norouzi and William Chan. 2021. 
Learning to efficiently sample from diffusion probabilistic models. arXiv:2106.03802. Retrieved from https:\/\/arxiv.org\/abs\/2106.03802"},{"key":"e_1_3_2_233_2","unstructured":"Lai Wei Zihao Jiang Weiran Huang and Lichao Sun. 2023. Instructiongpt-4: A 200-instruction paradigm for fine-tuning minigpt-4. arXiv:2308.12067. Retrieved from https:\/\/arxiv.org\/abs\/2308.12067"},{"key":"e_1_3_2_234_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01754"},{"key":"e_1_3_2_235_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02156"},{"key":"e_1_3_2_236_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3167305"},{"key":"e_1_3_2_237_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00235"},{"key":"e_1_3_2_238_2","unstructured":"Siyuan Wu Yue Huang Chujie Gao Dongping Chen Qihui Zhang Yao Wan Tianyi Zhou Xiangliang Zhang Jianfeng Gao Chaowei Xiao et\u00a0al. 2024. Unigen: A unified framework for textual dataset generation using large language models. arXiv:2406.18966. Retrieved from https:\/\/arxiv.org\/abs\/2406.18966"},{"key":"e_1_3_2_239_2","unstructured":"Yixin Wu Ning Yu Zheng Li Michael Backes and Yang Zhang. 2022. Membership inference attacks against text-to-image generation models. arXiv:2210.00968. Retrieved from https:\/\/arxiv.org\/abs\/2210.00968"},{"key":"e_1_3_2_240_2","unstructured":"Zhisheng Xiao Karsten Kreis and Arash Vahdat. 2021. Tackling the generative learning trilemma with denoising diffusion GANs. arXiv:2112.07804. Retrieved from https:\/\/arxiv.org\/abs\/2112.07804"},{"key":"e_1_3_2_241_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_3_2_242_2","unstructured":"Zhiling Yan Weixiang Sun Rong Zhou Zhengqing Yuan Kai Zhang Yiwei Li Tianming Liu Quanzheng Li Xiang Li Lifang He et\u00a0al. 2024. Biomedical sam 2: Segment anything in biomedical images and videos. arXiv:2408.03286. 
Retrieved from https:\/\/arxiv.org\/abs\/2408.03286"},{"key":"e_1_3_2_243_2","unstructured":"Zhiling Yan Kai Zhang Rong Zhou Lifang He Xiang Li and Lichao Sun. 2023. Multimodal ChatGPT for medical applications: An experimental study of GPT-4V. arXiv:2310.19061. Retrieved from https:\/\/arxiv.org\/abs\/2310.19061"},{"key":"e_1_3_2_244_2","unstructured":"Shuoheng Yang Yuxin Wang and Xiaowen Chu. 2020. A survey of deep learning techniques for neural machine translation. arXiv:2002.07526. Retrieved from https:\/\/arxiv.org\/abs\/2002.07526"},{"key":"e_1_3_2_245_2","article-title":"Xlnet: Generalized autoregressive pretraining for language understanding","volume":"32","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems 32 (2019).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_246_2","unstructured":"Liang Yao Chengsheng Mao and Yuan Luo. 2019. KG-BERT: BERT for knowledge graph completion. arXiv:1909.03193. Retrieved from https:\/\/arxiv.org\/abs\/1909.03193"},{"key":"e_1_3_2_247_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.640"},{"key":"e_1_3_2_248_2","unstructured":"Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. arXiv:1704.01696. Retrieved from https:\/\/arxiv.org\/abs\/1704.01696"},{"key":"e_1_3_2_249_2","doi-asserted-by":"publisher","DOI":"10.1145\/3316767"},{"key":"e_1_3_2_250_2","unstructured":"Zhengqing Yuan Yunhong He Kun Wang Yanfang Ye and Lichao Sun. 2023. ArtGPT-4: Towards artistic-understanding large vision-language models with enhanced adapter. arXiv:2305.07490. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.07490"},{"key":"e_1_3_2_251_2","unstructured":"Zhengqing Yuan Yixin Liu Yihan Cao Weixiang Sun Haolong Jia Ruoxi Chen Zhaoxu Li Bin Lin Li Yuan Lifang He et\u00a0al. 2024. Mora: Enabling generalist video generation via a multi-agent framework. arXiv:2403.13248. Retrieved from https:\/\/arxiv.org\/abs\/2403.13248"},{"key":"e_1_3_2_252_2","unstructured":"Zenghui Yuan Yixin Liu Kai Zhang Pan Zhou and Lichao Sun. 2023. Backdoor attacks to pre-trained unified foundation models. arXiv:2302.09360. Retrieved from https:\/\/arxiv.org\/abs\/2302.09360"},{"key":"e_1_3_2_253_2","doi-asserted-by":"publisher","DOI":"10.1177\/21582440221082130"},{"key":"e_1_3_2_254_2","unstructured":"Wojciech Zaremba Ilya Sutskever and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv:1409.2329. Retrieved from https:\/\/arxiv.org\/abs\/1409.2329"},{"key":"e_1_3_2_255_2","unstructured":"Zhiyuan Zeng Jiatong Yu Tianyu Gao Yu Meng Tanya Goyal and Danqi Chen. 2023. Evaluating large language models at evaluating instruction following. arXiv:2310.07641. Retrieved from https:\/\/arxiv.org\/abs\/2310.07641"},{"key":"e_1_3_2_256_2","doi-asserted-by":"crossref","first-page":"2729","DOI":"10.1145\/3219819.3219977","volume-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","author":"Zhang Hengtong","year":"2018","unstructured":"Hengtong Zhang, Yaliang Li, Fenglong Ma, Jing Gao, and Lu Su. 2018. Texttruth: An unsupervised approach to discover trustworthy information from multi-sourced text data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 
2729\u20132737."},{"key":"e_1_3_2_257_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.629"},{"key":"e_1_3_2_258_2","first-page":"1","article-title":"A generalist vision\u2013language foundation model for diverse biomedical tasks","author":"Zhang Kai","year":"2024","unstructured":"Kai Zhang, Rong Zhou, Eashan Adhikarla, Zhiling Yan, Yixin Liu, Jun Yu, Zhengliang Liu, Xun Chen, Brian D. Davison, and Hui Ren. 2024. A generalist vision\u2013language foundation model for diverse biomedical tasks. Nature Medicine (2024), 1\u201313.","journal-title":"Nature Medicine"},{"key":"e_1_3_2_259_2","first-page":"16280","article-title":"Diffusion normalizing flow","volume":"34","author":"Zhang Qinsheng","year":"2021","unstructured":"Qinsheng Zhang and Yongxin Chen. 2021. Diffusion normalizing flow. Advances in Neural Information Processing Systems 34 (2021), 16280\u201316291.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_260_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona Diab Xian Li Xi Victoria Lin et\u00a0al. 2022. OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_2_261_2","doi-asserted-by":"crossref","unstructured":"Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu Zhifeng Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg and Bhuvana Ramabhadran. 2019. Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning. arXiv:1907.04448. Retrieved from https:\/\/arxiv.org\/abs\/1907.04448","DOI":"10.21437\/Interspeech.2019-2668"},{"key":"e_1_3_2_262_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.224"},{"key":"e_1_3_2_263_2","unstructured":"Guoqing Zheng Yiming Yang and Jaime Carbonell. 2017. Convolutional normalizing flows. arXiv:1711.02255. 
Retrieved from https:\/\/arxiv.org\/abs\/1711.02255"},{"key":"e_1_3_2_264_2","first-page":"7","article-title":"Truncated diffusion probabilistic models","volume":"1050","author":"Zheng Huangjie","year":"2022","unstructured":"Huangjie Zheng, Pengcheng He, Weizhu Chen, and Mingyuan Zhou. 2022. Truncated diffusion probabilistic models. Stat 1050 (2022), 7.","journal-title":"Stat"},{"key":"e_1_3_2_265_2","unstructured":"Lianmin Zheng Wei-Lin Chiang Ying Sheng Siyuan Zhuang Zhanghao Wu Yonghao Zhuang Zi Lin Zhuohan Li Dacheng Li Eric P. Xing et\u00a0al. 2023. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv:2306.05685. Retrieved from https:\/\/arxiv.org\/abs\/2306.05685"},{"key":"e_1_3_2_266_2","doi-asserted-by":"crossref","unstructured":"Ce Zhou Qian Li Chen Li Jun Yu Yixin Liu Guangjing Wang Kai Zhang Cheng Ji Qiben Yan Lifang He et\u00a0al. 2023. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. arXiv:2302.09419. Retrieved from https:\/\/arxiv.org\/abs\/2302.09419","DOI":"10.1007\/s13042-024-02443-6"},{"key":"e_1_3_2_267_2","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00368"},{"key":"e_1_3_2_268_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.7005"},{"key":"e_1_3_2_269_2","unstructured":"Fengbin Zhu Wenqiang Lei Chao Wang Jianming Zheng Soujanya Poria and Tat-Seng Chua. 2021. Retrieving and reading: A comprehensive survey on open-domain question answering. arXiv:2101.00774. Retrieved from https:\/\/arxiv.org\/abs\/2101.00774"},{"key":"e_1_3_2_270_2","first-page":"25146","article-title":"One model to edit them all: Free-form text-driven image manipulation with semantic modulations","volume":"35","author":"Zhu Yiming","year":"2022","unstructured":"Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, and Jue Wang. 2022. One model to edit them all: Free-form text-driven image manipulation with semantic modulations. 
Advances in Neural Information Processing Systems 35 (2022), 25146\u201325159.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_271_2","volume-title":"Proceedings of the 13th Annual Conference of the International Speech Communication Association","author":"Zorila Tudor-Catalin","year":"2012","unstructured":"Tudor-Catalin Zorila, Varvara Kandia, and Yannis Stylianou. 2012. Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression. In Proceedings of the 13th Annual Conference of the International Speech Communication Association."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3704262","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3704262","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:51Z","timestamp":1750295871000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3704262"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,22]]},"references-count":270,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3704262"],"URL":"https:\/\/doi.org\/10.1145\/3704262","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,22]]},"assertion":[{"value":"2023-03-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-10","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-01-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}