{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:45:59Z","timestamp":1777657559911,"version":"3.51.4"},"reference-count":109,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,12,19]]},"abstract":"<jats:p>\n            Reproducing realistic collective behaviors presents a captivating yet formidable challenge. Traditional rule-based methods rely on hand-crafted principles, limiting motion diversity and realism in generated collective behaviors. Recent imitation learning methods learn from data but often require ground-truth motion trajectories and struggle with authenticity, especially in high-density groups with erratic movements. In this paper, we present a scalable approach, Collective Behavior Imitation Learning (CBIL), for learning fish schooling behavior\n            <jats:italic>directly from videos<\/jats:italic>\n            , without relying on captured motion trajectories. Our method first leverages Video Representation Learning, in which a Masked Video AutoEncoder (MVAE) extracts implicit states from video inputs in a self-supervised manner. The MVAE effectively maps 2D observations to implicit states that are compact and expressive for following the imitation learning stage. Then, we propose a novel adversarial imitation learning method to effectively capture complex movements of the schools of fish, enabling efficient imitation of the distribution of motion patterns measured in the latent space. It also incorporates bio-inspired rewards alongside priors to regularize and stabilize training. Once trained, CBIL can be used for various animation tasks with the learned collective motion priors. We further show its effectiveness across different species. Finally, we demonstrate the application of our system in detecting abnormal fish behavior from in-the-wild videos.\n          <\/jats:p>","DOI":"10.1145\/3687904","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T15:46:04Z","timestamp":1732031164000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["CBIL: Collective Behavior Imitation Learning for Fish from Real Videos"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-7612-3297","authenticated-orcid":false,"given":"Yifan","family":"Wu","sequence":"first","affiliation":[{"name":"The University of Hong Kong, Pokfulam, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0186-8269","authenticated-orcid":false,"given":"Zhiyang","family":"Dou","sequence":"additional","affiliation":[{"name":"The University of Hong Kong (HKU), Philadelphia, United States of America"},{"name":"University of Pennsylvania, Philadelphia, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2243-3643","authenticated-orcid":false,"given":"Yuko","family":"Ishiwaka","sequence":"additional","affiliation":[{"name":"SoftBank Corp., Hakodate, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8278-4299","authenticated-orcid":false,"given":"Shun","family":"Ogawa","sequence":"additional","affiliation":[{"name":"SoftBank Corp., Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5165-6251","authenticated-orcid":false,"given":"Yuke","family":"Lou","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Pokfulam, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2284-3952","authenticated-orcid":false,"given":"Wenping","family":"Wang","sequence":"additional","affiliation":[{"name":"Texas A&amp;M University, College Station, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4301-1474","authenticated-orcid":false,"given":"Lingjie","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Philadelphia, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2729-5860","authenticated-orcid":false,"given":"Taku","family":"Komura","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Pokfulam, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.2331\/suisan.48.1081"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591487"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the national academy of sciences 105","author":"Ballerini Michele","year":"2008","unstructured":"Michele Ballerini, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Vivien Lecomte, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, et al. 2008. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proceedings of the national academy of sciences 105, 4 (2008), 1232--1237."},{"key":"e_1_2_1_4_1","volume-title":"Siddhartha Chaudhuri, Eitan Grinspun, Yi Zhou, and Alec Jacobson.","author":"Benchekroun Otman","year":"2023","unstructured":"Otman Benchekroun, Jiayi Eris Zhang, Siddhartha Chaudhuri, Eitan Grinspun, Yi Zhou, and Alec Jacobson. 2023. Fast complementary dynamics via skinning eigenmodes. arXiv preprint arXiv:2303.11886 (2023)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356536"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1118633109"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2014.1362"},{"key":"e_1_2_1_8_1","unstructured":"Zhe Cao Gines Hidalgo Tomas Simon Shih-En Wei and Yaser Sheikh. 2019. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv:1812.08008 [cs.CV]"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1005766107"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.physrep.2017.11.003"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592459"},{"key":"e_1_2_1_13_1","unstructured":"Chun-Tse Chien Rui-Yang Ju Kuang-Yi Chou Enkaer Xieerke and Jen-Shiun Chiang. 2024. YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection. arXiv:2402.09329 [cs.CV]"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Yong Shean Chong and Yong Haur Tay. 2017. Abnormal Event Detection in Videos using Spatiotemporal Autoencoder. arXiv:1701.01546 [cs.CV]","DOI":"10.1007\/978-3-319-59081-3_23"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2018.2857475"},{"key":"e_1_2_1_16_1","volume-title":"LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment. arXiv preprint arXiv:2403.13307","author":"Cong Peishan","year":"2024","unstructured":"Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, and Yuexin Ma. 2024. LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment. arXiv preprint arXiv:2403.13307 (2024)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature03236"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1006\/jtbi.2002.3065"},{"key":"e_1_2_1_19_1","volume-title":"Alfonso P\u00e9rez-Escudero, Pietro Perona, Andrew D Straw, Martin Wikelski, et al.","author":"Dell Anthony I","year":"2014","unstructured":"Anthony I Dell, John A Bender, Kristin Branson, Iain D Couzin, Gonzalo G de Polavieja, Lucas PJJ Noldus, Alfonso P\u00e9rez-Escudero, Pietro Perona, Andrew D Straw, Martin Wikelski, et al. 2014. Automated image-based tracking and its application in ecology. Trends in ecology & evolution 29, 7 (2014), 417--428."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610548.3618205"},{"key":"e_1_2_1_21_1","volume-title":"MuscleVAE: Model-Based Controllers of Muscle-Actuated Characters. In SIGGRAPH Asia 2023 Conference Papers. 1--11","author":"Feng Yusen","year":"2023","unstructured":"Yusen Feng, Xiyan Xu, and Libin Liu. 2023. MuscleVAE: Model-Based Controllers of Muscle-Actuated Characters. In SIGGRAPH Asia 2023 Conference Papers. 1--11."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.120.198101"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480527"},{"key":"e_1_2_1_24_1","unstructured":"Wayne M Getz. 2024. An Information Theoretic Treatment of Animal Movement Tracks. arXiv:2403.16290 [q-bio.PE]"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00509"},{"key":"e_1_2_1_26_1","volume-title":"Student close contact behavior and COVID-19 transmission in China's classrooms. PNAS nexus 2, 5","author":"Guo Yong","year":"2023","unstructured":"Yong Guo, Zhiyang Dou, Nan Zhang, Xiyue Liu, Boni Su, Yuguo Li, and Yinping Zhang. 2023. Student close contact behavior and COVID-19 transmission in China's classrooms. PNAS nexus 2, 5 (2023), pgad142."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00240"},{"key":"e_1_2_1_28_1","volume-title":"ACM SIGGRAPH 2016 Talks. 1--2.","author":"Gustafson Stephen","year":"2016","unstructured":"Stephen Gustafson, Hemagiri Arumugam, Paul Kanyuk, and Michael Lorenzen. 2016. Mure: fast agent based crowd simulation for vfx and animation. In ACM SIGGRAPH 2016 Talks. 1--2."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_2_1_32_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV]"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2320239121"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1109355108"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157608"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Hans A Hofmann Annaliese K Beery Daniel T Blumstein Iain D Couzin Ryan L Earley Loren D Hayes Peter L Hurd Eileen A Lacey Steven M Phelps Nancy G Solomon et al. 2014. An evolutionary framework for studying mechanisms of social behavior. Trends in ecology & evolution 29 10 (2014) 581--589.","DOI":"10.1016\/j.tree.2014.07.008"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(81)90024-2"},{"key":"e_1_2_1_38_1","volume-title":"Oh (Eds.)","volume":"35","author":"Ishiwaka Yuko","year":"2022","unstructured":"Yuko Ishiwaka, Xiao Zeng, Shun Ogawa, Donovan Westwater, Tadayuki Tone, and Masaki Nakada. 2022. DeepFoids: Adaptive Bio-Inspired Fish Simulation with Deep Reinforcement Learning. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 18377--18389. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/74fa9e6bc36aa567fe7cf002b733a30d-Paper-Conference.pdf"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480520"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.12852"},{"key":"e_1_2_1_41_1","volume-title":"Conference on Robot Learning. PMLR, 80--90","author":"Jena Rohit","year":"2021","unstructured":"Rohit Jena, Changliu Liu, and Katia Sycara. 2021. Augmenting gail with bc for sample efficient imitation learning. In Conference on Robot Learning. PMLR, 80--90."},{"key":"e_1_2_1_42_1","volume-title":"Text-Guided Synthesis of Crowd Animation. In ACM SIGGRAPH 2024 Conference Papers. 1--11","author":"Ji Xuebo","year":"2024","unstructured":"Xuebo Ji, Zherong Pan, Xifeng Gao, and Jia Pan. 2024. Text-Guided Synthesis of Crowd Animation. In ACM SIGGRAPH 2024 Conference Papers. 1--11."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.107.024411"},{"key":"e_1_2_1_44_1","volume-title":"Unity: A General Platform for Intelligent Agents. arXiv:1809.02627 [cs.LG]","author":"Juliani Arthur","year":"2020","unstructured":"Arthur Juliani, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. 2020. Unity: A General Platform for Intelligent Agents. arXiv:1809.02627 [cs.LG]"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2343045.2771822"},{"key":"e_1_2_1_46_1","unstructured":"Taekyung Ki Dongchan Min and Gyeongsu Chae. 2024. Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation. arXiv:2404.00636 [cs.CV]"},{"key":"e_1_2_1_47_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2017","unstructured":"Diederik P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs.LG]"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Alexander Kirillov Eric Mintun Nikhila Ravi Hanzi Mao Chloe Rolland Laura Gustafson Tete Xiao Spencer Whitehead Alexander C. Berg Wan-Yen Lo Piotr Doll\u00e1r and Ross Girshick. 2023. Segment Anything. arXiv:2304.02643 [cs.CV]","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-013-9349-9"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274247.3274510"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 2007 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation","author":"Lee Kang Hoon","year":"2007","unstructured":"Kang Hoon Lee, Myung Geol Choi, Qyoun Hong, and Jehee Lee. 2007. Group behavior from video: a data-driven approach to crowd simulation. In Proceedings of the 2007 ACM SIGGRAPH\/Eurographics Symposium on Computer Animation (San Diego, California) (SCA '07). Eurographics Association, Goslar, DEU, 109--118."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459774"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1833349.1781155"},{"key":"e_1_2_1_54_1","unstructured":"Dan Li Dacheng Chen Jonathan Goh and See kiong Ng. 2019. Anomaly Detection with Generative Adversarial Networks for Multivariate Time Series. arXiv:1809.04758 [cs.LG]"},{"key":"e_1_2_1_55_1","volume-title":"Lin","author":"Li Weizi","year":"2015","unstructured":"Weizi Li, David Wolinski, Julien Pettr\u00e9, and Ming C. Lin. 2015. Biologically-inspired visual simulation of insect swarms. In Computer Graphics Forum, Vol. 34. Wiley Online Library, 425--434."},{"key":"e_1_2_1_56_1","volume-title":"Proceedings, Part V 13","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6--12, 2014, Proceedings, Part V 13. Springer, 740--755."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2982424"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/1833349.1778865"},{"key":"e_1_2_1_59_1","doi-asserted-by":"crossref","first-page":"129233","DOI":"10.1016\/j.jhazmat.2022.129233","article-title":"Close contact behavior-based COVID-19 transmission and interventions in a subway system","volume":"436","author":"Liu Xiyue","year":"2022","unstructured":"Xiyue Liu, Zhiyang Dou, Lei Wang, Boni Su, Tianyi Jin, Yong Guo, Jianjian Wei, and Nan Zhang. 2022. Close contact behavior-based COVID-19 transmission and interventions in a subway system. Journal of Hazardous Materials 436 (2022), 129233.","journal-title":"Journal of Hazardous Materials"},{"key":"e_1_2_1_60_1","unstructured":"Qiujing Lu Yipeng Zhang Mingjian Lu and Vwani Roychowdhury. 2022. Action-conditioned On-demand Motion Generation. arXiv:2207.08164 [cs.CV]"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01000"},{"key":"e_1_2_1_62_1","volume-title":"Universal Humanoid Motion Representations for Physics-Based Control. arXiv preprint arXiv:2310.04582","author":"Luo Zhengyi","year":"2023","unstructured":"Zhengyi Luo, Jinkun Cao, Josh Merel, Alexander Winkler, Jing Huang, Kris Kitani, and Weipeng Xu. 2023b. Universal Humanoid Motion Representations for Physics-Based Control. arXiv preprint arXiv:2310.04582 (2023)."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISEMANTIC.2018.8549751"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cag.2017.12.004"},{"key":"e_1_2_1_65_1","doi-asserted-by":"crossref","first-page":"2419","DOI":"10.1073\/pnas.1816098116","article-title":"Flow interactions between uncoordinated flapping swimmers give rise to group cohesion","volume":"116","author":"Newbolt Joel W","year":"2019","unstructured":"Joel W Newbolt, Jun Zhang, and Leif Ristroph. 2019. Flow interactions between uncoordinated flapping swimmers give rise to group cohesion. Proceedings of the National Academy of Sciences 116, 7 (2019), 2419--2424.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1006\/jtbi.1996.0114"},{"key":"e_1_2_1_67_1","unstructured":"Maxime Oquab Timoth\u00e9e Darcet Th\u00e9o Moutakanni Huy Vo Marc Szafraniec Vasil Khalidov Pierre Fernandez Daniel Haziza Francisco Massa Alaaeldin El-Nouby et al. 2023. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)."},{"key":"e_1_2_1_68_1","volume-title":"Synthesizing physically plausible human motions in 3d scenes. arXiv preprint arXiv:2308.09036","author":"Pan Liang","year":"2023","unstructured":"Liang Pan, Jingbo Wang, Buzhen Huang, Junyu Zhang, Haofan Wang, Xu Tang, and Yangang Wang. 2023. Synthesizing physically plausible human motions in 3d scenes. arXiv preprint arXiv:2308.09036 (2023)."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356501"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356501"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275014"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530110"},{"key":"e_1_2_1_73_1","volume-title":"SFV: Reinforcement Learning of Physical Skills from Videos. arXiv:1810.03599 [cs.GR]","author":"Peng Xue Bin","year":"2018","unstructured":"Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, and Sergey Levine. 2018b. SFV: Reinforcement Learning of Physical Skills from Videos. arXiv:1810.03599 [cs.GR]"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459670"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3023368.3036845"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01322"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/37402.37406"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/1278780.1278859"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925977"},{"key":"e_1_2_1_81_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. arXiv:1707.06347 [cs.LG]"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658164"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591541"},{"key":"e_1_2_1_84_1","volume-title":"Human motion diffusion model. arXiv preprint arXiv:2209.14916","author":"Tevet Guy","year":"2022","unstructured":"Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, and Amit H Bermano. 2022. Human motion diffusion model. arXiv preprint arXiv:2209.14916 (2022)."},{"key":"e_1_2_1_85_1","unstructured":"Zhan Tong Yibing Song Jue Wang and Limin Wang. 2022. VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training. arXiv:2203.12602 [cs.CV]"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1800923115"},{"key":"e_1_2_1_87_1","volume-title":"Collective motion. Physics reports 517, 3--4","author":"Vicsek Tam\u00e1s","year":"2012","unstructured":"Tam\u00e1s Vicsek and Anna Zafeiris. 2012. Collective motion. Physics reports 517, 3--4 (2012), 71--140."},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185523"},{"key":"e_1_2_1_89_1","volume-title":"Tlcontrol: Trajectory and language control for human motion synthesis. arXiv preprint arXiv:2311.17135","author":"Wan Weilin","year":"2023","unstructured":"Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, and Lingjie Liu. 2023. Tlcontrol: Trajectory and language control for human motion synthesis. arXiv preprint arXiv:2311.17135 (2023)."},{"key":"e_1_2_1_90_1","unstructured":"Chien-Yao Wang I-Hau Yeh and Hong-Yuan Mark Liao. 2024b. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv:2402.13616 [cs.CV]"},{"key":"e_1_2_1_91_1","volume-title":"On-Demand Pedestrian Animation Controller in Driving Scenarios. arXiv preprint arXiv:2404.19722","author":"Wang Jingbo","year":"2024","unstructured":"Jingbo Wang, Zhengyi Luo, Ye Yuan, Yixuan Li, and Bo Dai. 2024a. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios. arXiv preprint arXiv:2404.19722 (2024)."},{"key":"e_1_2_1_92_1","doi-asserted-by":"crossref","unstructured":"Jingbo Wang Yu Rong Jingyuan Liu Sijie Yan Dahua Lin and Bo Dai. 2022. Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis. arXiv:2205.13001 [cs.CV]","DOI":"10.1109\/CVPR52688.2022.01981"},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 20796--20806","author":"Wang Jingbo","year":"2023","unstructured":"Jingbo Wang, Ye Yuan, Zhengyi Luo, Kevin Xie, Dahua Lin, Umar Iqbal, Sanja Fidler, and Sameh Khamis. 2023. Learning human dynamics in autonomous driving scenarios. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 20796--20806."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392381"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459761"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530067"},{"key":"e_1_2_1_97_1","first-page":"1","article-title":"Composite Motion Learning with Task Control","volume":"42","author":"Xu Pei","year":"2023","unstructured":"Pei Xu, Xiumin Shang, Victor Zordan, and Ioannis Karamouzas. 2023a. Composite Motion Learning with Task Control. ACM Transactions on Graphics (TOG) 42, 4 (2023), 1--16.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_98_1","first-page":"1","article-title":"AdaptNet: Policy adaptation for physics-based character control","volume":"42","author":"Xu Pei","year":"2023","unstructured":"Pei Xu, Kaixiang Xie, Sheldon Andrews, Paul G Kry, Michael Neff, Morgan McGuire, Ioannis Karamouzas, and Victor Zordan. 2023b. AdaptNet: Policy adaptation for physics-based character control. ACM Transactions on Graphics (TOG) 42, 6 (2023), 1--17.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555434"},{"key":"e_1_2_1_100_1","volume-title":"MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations. arXiv preprint arXiv:2310.10198","author":"Yao Heyuan","year":"2023","unstructured":"Heyuan Yao, Zhenhua Song, Yuyang Zhou, Tenglong Ao, Baoquan Chen, and Libin Liu. 2023. MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations. arXiv preprint arXiv:2310.10198 (2023)."},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480504"},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01467"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","unstructured":"Haotian Zhang Ye Yuan Viktor Makoviychuk Yunrong Guo Sanja Fidler Xue Bin Peng and Kayvon Fatahalian. [n. d.]. Learning Physically Simulated Tennis Skills from Broadcast Videos. ACM Trans. Graph. ([n. d.]) 14 pages. 10.1145\/3592408","DOI":"10.1145\/3592408"},{"key":"e_1_2_1_104_1","volume-title":"Motiondiffuse: Text-driven human motion generation with diffusion model","author":"Zhang Mingyuan","year":"2024","unstructured":"Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, and Ziwei Liu. 2024a. Motiondiffuse: Text-driven human motion generation with diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)."},{"key":"e_1_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.scitotenv.2024.170346"},{"key":"e_1_2_1_106_1","doi-asserted-by":"crossref","unstructured":"Yunbo Zhang Deepak Gopinath Yuting Ye Jessica Hodgins Greg Turk and Jungdam Won. 2023. Simulation and Retargeting of Complex Multi-Character Interactions. arXiv:2305.20041 [cs.GR]","DOI":"10.1145\/3588432.3591491"},{"key":"e_1_2_1_107_1","volume-title":"EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Motion Generation. arXiv preprint arXiv:2312.02256","author":"Zhou Wenyang","year":"2023","unstructured":"Wenyang Zhou, Zhiyang Dou, Zeyu Cao, Zhouyingcheng Liao, Jingbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wenping Wang, and Lingjie Liu. 2023. EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Motion Generation. arXiv preprint arXiv:2312.02256 (2023)."},{"key":"e_1_2_1_108_1","doi-asserted-by":"crossref","unstructured":"Xin Zhou Xiangyong Wen Zhepei Wang Yuman Gao Haojia Li Qianhao Wang Tiankai Yang Haojian Lu Yanjun Cao Chao Xu et al. 2022. Swarm of micro flying robots in the wild. Science Robotics 7 66 (2022) eabm5954.","DOI":"10.1126\/scirobotics.abm5954"},{"key":"e_1_2_1_109_1","unstructured":"Haosheng Zou Hang Su Shihong Song and Jun Zhu. 2018. Understanding human behaviors in crowds by imitating the decision-making process. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (New Orleans Louisiana USA) (AAAI'18\/IAAI'18\/EAAI'18). AAAI Press Article 937 8 pages."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687904","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687904","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:57Z","timestamp":1750295397000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687904"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":109,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,19]]}},"alternative-id":["10.1145\/3687904"],"URL":"https:\/\/doi.org\/10.1145\/3687904","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}