{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T20:14:43Z","timestamp":1762028083429,"version":"build-2065373602"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2020,11,27]],"date-time":"2020-11-27T00:00:00Z","timestamp":1606435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/M013774\/1"],"award-info":[{"award-number":["EP\/M013774\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>\n            We present a method for retiming people in an ordinary, natural video --- manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up\/slowing down, or entirely \"freezing\" people), or \"erase\" selected people from the video altogether. We achieve these effects computationally via a dedicated learning-based layered video representation, where each frame in the video is decomposed into separate RGBA layers, representing the appearance of different people in the video. A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person\n            <jats:italic>automatically<\/jats:italic>\n            with the scene changes they generate---e.g., shadows, reflections, and motion of loose clothing. 
The layers can be individually retimed and recombined into a new video, allowing us to achieve realistic, high-quality renderings of retiming effects for real-world videos depicting complex actions and involving multiple individuals, including dancing, trampoline jumping, or group running.\n          <\/jats:p>","DOI":"10.1145\/3414685.3417760","type":"journal-article","created":{"date-parts":[[2020,11,27]],"date-time":"2020-11-27T21:51:05Z","timestamp":1606513865000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":39,"title":["Layered neural rendering for retiming people in video"],"prefix":"10.1145","volume":"39","author":[{"given":"Erika","family":"Lu","sequence":"first","affiliation":[{"name":"University of Oxford, Google Research"}]},{"given":"Forrester","family":"Cole","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Tali","family":"Dekel","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Weidi","family":"Xie","sequence":"additional","affiliation":[{"name":"University of Oxford"}]},{"given":"Andrew","family":"Zisserman","sequence":"additional","affiliation":[{"name":"University of Oxford"}]},{"given":"David","family":"Salesin","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"William T.","family":"Freeman","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Michael","family":"Rubinstein","sequence":"additional","affiliation":[{"name":"Google Research"}]}],"member":"320","published-online":{"date-parts":[[2020,11,27]]},"reference":[{"volume-title":"Computer Graphics Forum","author":"Aberman Kfir","key":"e_1_2_2_1_1","unstructured":"
Kfir Aberman, Mingyi Shi, Jing Liao, Dani Lischinski, Baoquan Chen, and Daniel Cohen-Or. 2019. Deep Video-Based Performance Cloning. In Computer Graphics Forum, Vol. 38. Wiley Online Library, 219--233."},{"key":"e_1_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Aseem Agarwala Colin Zheng Chris Pal Maneesh Agrawala Michael Cohen Brian Curless David Salesin and Richard Szeliski. 2005. Panoramic Video Textures. In SIGGRAPH.","DOI":"10.1145\/1186822.1073268"},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Jean-Baptiste Alayrac Jo\u00e3o Carreira and Andrew Zisserman. 2019a. The Visual Centrifuge: Model-Free Layered Video Representations. In CVPR.","DOI":"10.1109\/CVPR.2019.00256"},{"key":"e_1_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Jean-Baptiste Alayrac Joao Carreira Relja Arandjelovic and Andrew Zisserman. 2019b. Controllable Attention for Structured Layered Video Decomposition. In ICCV.","DOI":"10.1109\/ICCV.2019.00583"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1576246.1531376"},{"key":"e_1_2_2_6_1","volume-title":"Video Tapestries with Continuous Temporal Zoom. SIGGRAPH","author":"Barnes Connelly","year":"2010","unstructured":"Connelly Barnes, Dan B Goldman, Eli Shechtman, and Adam Finkelstein. 2010. Video Tapestries with Continuous Temporal Zoom. 
SIGGRAPH (2010)."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1275808.1276505"},{"key":"e_1_2_2_8_1","unstructured":"Daniel Castro Steven Hickson Patsorn Sangkloy Bhavishya Mittal Sean Dai James Hays and Irfan Essa. 2018. Let's Dance: Learning From Online Dance Videos. In eprint arXiv:2139179."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00603"},{"key":"e_1_2_2_10_1","doi-asserted-by":"crossref","unstructured":"Yung-Yu Chuang Aseem Agarwala Brian Curless David Salesin and Richard Szeliski. 2002. Video matting of complex scenes. In SIGGRAPH.","DOI":"10.1145\/566570.566572"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201371"},{"key":"e_1_2_2_12_1","volume-title":"RMPE: Regional Multiperson Pose Estimation. In ICCV.","author":"Fang Hao-Shu","year":"2017","unstructured":"Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, and Cewu Lu. 2017. RMPE: Regional Multiperson Pose Estimation. In ICCV."},{"key":"e_1_2_2_13_1","unstructured":"Oran Gafni Lior Wolf and Yaniv Taigman. 2020. Vid2Game: Controllable Characters Extracted from Real-World Videos. In ICLR."},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Yossi Gandelsman Assaf Shocher and Michal Irani. 2019. 
\"Double-DIP\": Unsupervised Image Decomposition via Coupled Deep-Image-Priors. In CVPR.","DOI":"10.1109\/CVPR.2019.01128"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1449715.1449719"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995525"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00762"},{"key":"e_1_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Qiqi Hou and Feng Liu. 2019. Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation. In ICCV.","DOI":"10.1109\/ICCV.2019.00423"},{"key":"e_1_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Phillip Isola Jun-Yan Zhu Tinghui Zhou and Alexei A Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR.","DOI":"10.1109\/CVPR.2017.632"},{"key":"e_1_2_2_20_1","unstructured":"Nebojsa Jojic and B.J. Frey. 2001. Learning flexible sprites in video layers. In CVPR."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766954"},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"H. Kim P. Garrido A. Tewari W. Xu J. Thies M. Nie\u00dfner P. P\u00e9rez C. Richardt M. Zollh\u00f6fer and C. Theobalt. 2018. Deep Video Portraits. ACM Transactions on Graphics 2018 (TOG) (2018).","DOI":"10.1145\/3197517.3201283"},{"key":"e_1_2_2_23_1","volume-title":"Adam: A Method for Stochastic Optimization. 
ICLR","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. ICLR (2014)."},{"key":"e_1_2_2_24_1","volume-title":"Learning Layered Motion Segmentations of Video. IJCV","author":"Kumar M. Pawan","year":"2008","unstructured":"M. Pawan Kumar, Philip H. S. Torr, and Andrew Zisserman. 2008. Learning Layered Motion Segmentations of Video. IJCV (2008)."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00708"},{"key":"e_1_2_2_26_1","volume-title":"MetaPix: Few-Shot Video Retargeting. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJx1URNKwH","author":"Lee Jessica","year":"2020","unstructured":"Jessica Lee, Deva Ramanan, and Rohit Girdhar. 2020. MetaPix: Few-Shot Video Retargeting. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJx1URNKwH"},{"key":"e_1_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Yin Li Jian Sun and Heung-Yeung Shum. 2005. Video Object Cut and Paste. 
In SIGGRAPH.","DOI":"10.1145\/1186822.1073234"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3333002"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275099"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1218064.1218092"},{"key":"e_1_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Moustafa Meshry Dan B Goldman Sameh Khamis Hugues Hoppe Rohit Pandey Noah Snavely and Ricardo Martin-Brualla. 2019. Neural rerendering in the wild.","DOI":"10.1109\/CVPR.2019.00704"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6878--6887","author":"In","key":"e_1_2_2_33_1","unstructured":"In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6878--6887."},{"key":"e_1_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Ajay Nandoriya Elgharib Mohamed Changil Kim Mohamed Hefeeda and Wojciech Matusik. 2017. Video Reflection Removal Through Spatio-Temporal Optimization. 
In ICCV.","DOI":"10.1109\/ICCV.2017.264"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299109"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/964965.808606"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.29"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00253"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00254"},{"key":"e_1_2_2_41_1","doi-asserted-by":"crossref","unstructured":"Pratul P. Srinivasan Richard Tucker Jonathan T. Barron Ravi Ramamoorthi Ren Ng and Noah Snavely. 2019. Pushing the Boundaries of View Extrapolation with Multiplane Images. In CVPR.","DOI":"10.1109\/CVPR.2019.00026"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323035"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1002\/cem.859"},{"key":"e_1_2_2_44_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9446--9454","author":"Ulyanov Dmitry","year":"2018","unstructured":"Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2018. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9446--9454."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/83.334981"},{"key":"e_1_2_2_46_1","volume-title":"Cohen","author":"Wang Jue","year":"2005","unstructured":"Jue Wang, Pravin Bhat, R. Alex Colburn, Maneesh Agrawala, and Michael F. Cohen. 2005. 
Interactive Video Cutout. TOG (2005)."},{"key":"e_1_2_2_47_1","volume-title":"Space-Time Completion of Video. PAMI","author":"Wexler Yonatan","year":"2007","unstructured":"Yonatan Wexler, Eli Shechtman, and Michal Irani. 2007. Space-Time Completion of Video. PAMI (2007)."},{"key":"e_1_2_2_48_1","volume-title":"Pose Flow: Efficient Online Pose Tracking. In BMVC.","author":"Xiu Yuliang","year":"2018","unstructured":"Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. 2018. Pose Flow: Efficient Online Pose Tracking. In BMVC."},{"key":"e_1_2_2_49_1","unstructured":"Ning Xu Brian Price Scott Cohen and Thomas Huang. 2017. Deep Image Matting. In CVPR."},{"key":"e_1_2_2_50_1","volume-title":"Freeman","author":"Xue Tianfan","year":"2015","unstructured":"Tianfan Xue, Michael Rubinstein, Ce Liu, and William T. Freeman. 2015. A Computational Approach for Obstruction-Free Photography. ACM Transactions on Graphics (Proc. SIGGRAPH) 34, 4 (2015)."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cag.2014.11.001"},{"key":"e_1_2_2_52_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3358--3365","author":"Zhou Feng","year":"2014","unstructured":"Feng Zhou, Sing Bing Kang, and Michael F Cohen. 2014. Time-mapping using spacetime saliency. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3358--3365."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201323"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00153"},{"key":"e_1_2_2_55_1","volume-title":"Matthew Uyttendaele, Simon Winder, and Richard Szeliski.","author":"Zitnick C. Lawrence","year":"2004","unstructured":"C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and Richard Szeliski. 2004. High-quality video view interpolation using a layered representation. TOG."}],"container-title":["ACM Transactions on 
Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3414685.3417760","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3414685.3417760","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:03:13Z","timestamp":1750197793000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3414685.3417760"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,27]]},"references-count":55,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3414685.3417760"],"URL":"https:\/\/doi.org\/10.1145\/3414685.3417760","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2020,11,27]]},"assertion":[{"value":"2020-11-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}