{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T21:28:10Z","timestamp":1761514090506,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,23]],"date-time":"2017-10-23T00:00:00Z","timestamp":1508716800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Guangdong Science and Technology Project","award":["2014B010117007"],"award-info":[{"award-number":["2014B010117007"]}]},{"name":"Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality","award":["ZDSYS201703031405467"],"award-info":[{"award-number":["ZDSYS201703031405467"]}]},{"name":"Shenzhen Peacock Plan","award":["20130408-183003656"],"award-info":[{"award-number":["20130408-183003656"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,23]]},"DOI":"10.1145\/3126686.3126737","type":"proceedings-article","created":{"date-parts":[[2017,10,23]],"date-time":"2017-10-23T19:20:32Z","timestamp":1508786432000},"page":"358-366","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["Video Imagination from a Single Image with Transformation Generation"],"prefix":"10.1145","author":[{"given":"Baoyang","family":"Chen","sequence":"first","affiliation":[{"name":"Peking University, Shenzhen, China"}]},{"given":"Wenmin","family":"Wang","sequence":"additional","affiliation":[{"name":"Peking University, Shenzhen, China"}]},{"given":"Jinzhuo","family":"Wang","sequence":"additional","affiliation":[{"name":"Peking University, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2017,10,23]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Tensorflow: Large-scale 
machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Ashish Agarwal , Paul Barham , Eugene Brevdo , Zhifeng Chen , Craig Citro , Greg S Corrado , Andy Davis , Jeffrey Dean , Matthieu Devin , 2016 . Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016). Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)."},{"key":"e_1_3_2_1_2_1","volume-title":"Wasserstein gan. arXiv preprint arXiv:1701.07875","author":"Arjovsky Martin","year":"2017","unstructured":"Martin Arjovsky , Soumith Chintala , and L\u00e9on Bottou . 2017. Wasserstein gan. arXiv preprint arXiv:1701.07875 ( 2017 ). Martin Arjovsky, Soumith Chintala, and L\u00e9on Bottou. 2017. Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.1990.139486"},{"key":"e_1_3_2_1_4_1","first-page":"25","article-title":"High accuracy optical flow estimation based on a theory for warping","volume":"2004","author":"Brox Thomas","year":"2004","unstructured":"Thomas Brox , Andr\u00e9s Bruhn , Nils Papenberg , and Joachim Weickert . 2004 . High accuracy optical flow estimation based on a theory for warping . Computer Vision-ECCV 2004 (2004), 25 -- 36 . Thomas Brox, Andr\u00e9s Bruhn, Nils Papenberg, and Joachim Weickert. 2004. High accuracy optical flow estimation based on a theory for warping. Computer Vision-ECCV 2004 (2004), 25--36.","journal-title":"Computer Vision-ECCV"},{"key":"e_1_3_2_1_5_1","unstructured":"Bert De Brabandere Xu Jia Tinne Tuytelaars and Luc Van Gool. 2016. Dynamic filter networks. In Neural Information Processing Systems (NIPS).  
Bert De Brabandere Xu Jia Tinne Tuytelaars and Luc Van Gool. 2016. Dynamic filter networks. In Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_6_1","unstructured":"Emily L. Denton Soumith Chintala Rob Fergus et al. 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. In Advances in neural information processing systems. 1486--1494.   Emily L. Denton Soumith Chintala Rob Fergus et al. 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. In Advances in neural information processing systems. 1486--1494."},{"key":"e_1_3_2_1_7_1","unstructured":"Chelsea Finn Ian Goodfellow and Sergey Levine. 2016. Unsupervised learning for physical interaction through video prediction. In Advances In Neural Information Processing Systems. 64--72.  Chelsea Finn Ian Goodfellow and Sergey Levine. 2016. Unsupervised learning for physical interaction through video prediction. In Advances In Neural Information Processing Systems. 64--72."},{"key":"e_1_3_2_1_8_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672--2680.   Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672--2680."},{"key":"e_1_3_2_1_9_1","volume-title":"Danilo Jimenez Rezende, and Daan Wierstra","author":"Gregor Karol","year":"2015","unstructured":"Karol Gregor , Ivo Danihelka , Alex Graves , Danilo Jimenez Rezende, and Daan Wierstra . 2015 . DRAW : A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623 (2015). Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A recurrent neural network for image generation. 
arXiv preprint arXiv:1502.04623 (2015)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0683-3"},{"key":"e_1_3_2_1_12_1","unstructured":"Max Jaderberg Karen Simonyan Andrew Zisserman et al. 2015. Spatial transformer networks. In Advances in Neural Information Processing Systems. 2017--2025.   Max Jaderberg Karen Simonyan Andrew Zisserman et al. 2015. Spatial transformer networks. In Advances in Neural Information Processing Systems. 2017--2025."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"key":"e_1_3_2_1_14_1","volume-title":"Video pixel networks. arXiv preprint arXiv:1610.00527","author":"Kalchbrenner Nal","year":"2016","unstructured":"Nal Kalchbrenner , Aaron van den Oord , Karen Simonyan , Ivo Danihelka , Oriol Vinyals , Alex Graves , and Koray Kavukcuoglu . 2016. Video pixel networks. arXiv preprint arXiv:1610.00527 ( 2016 ). Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. 2016. Video pixel networks. arXiv preprint arXiv:1610.00527 (2016)."},{"key":"e_1_3_2_1_15_1","volume-title":"Kingma and Max Welling","author":"Diederik","year":"2013","unstructured":"Diederik P. Kingma and Max Welling . 2013 . Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013). Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33765-9_15"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10578-9_45"},{"key":"e_1_3_2_1_18_1","volume-title":"Video Frame Synthesis using Deep Voxel Flow. arXiv preprint arXiv:1702.02463","author":"Liu Ziwei","year":"2017","unstructured":"Ziwei Liu , Raymond Yeh , Xiaoou Tang , Yiming Liu , and Aseem Agarwala . 2017. 
Video Frame Synthesis using Deep Voxel Flow. arXiv preprint arXiv:1702.02463 ( 2017 ). Ziwei Liu, Raymond Yeh, Xiaoou Tang, Yiming Liu, and Aseem Agarwala. 2017. Video Frame Synthesis using Deep Voxel Flow. arXiv preprint arXiv:1702.02463 (2017)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531348"},{"key":"e_1_3_2_1_20_1","volume-title":"Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440","author":"Mathieu Michael","year":"2015","unstructured":"Michael Mathieu , Camille Couprie , and Yann LeCun . 2015. Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440 ( 2015 ). Michael Mathieu, Camille Couprie, and Yann LeCun. 2015. Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440 (2015)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2214050"},{"volume-title":"European Conference on Computer Vision. Springer, 172--187","author":"Pintea Silvia L","key":"e_1_3_2_1_22_1","unstructured":"Silvia L Pintea , Jan C. van Gemert , and Arnold W. M. Smeulders . 2014. D\u00e9ja vu . In European Conference on Computer Vision. Springer, 172--187 . Silvia L Pintea, Jan C. van Gemert, and Arnold W. M. Smeulders. 2014. D\u00e9ja vu. In European Conference on Computer Vision. Springer, 172--187."},{"key":"e_1_3_2_1_23_1","volume-title":"Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434","author":"Radford Alec","year":"2015","unstructured":"Alec Radford , Luke Metz , and Soumith Chintala . 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 ( 2015 ). Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. 
arXiv preprint arXiv:1511.06434 (2015)."},{"key":"e_1_3_2_1_24_1","volume-title":"a baseline for generative models of natural videos. arXiv preprint arXiv:1412.6604","author":"Ranzato MarcAurelio","year":"2014","unstructured":"MarcAurelio Ranzato , Arthur Szlam , Joan Bruna , Michael Mathieu , Ronan Collobert , and Sumit Chopra . 2014. Video (language) modeling : a baseline for generative models of natural videos. arXiv preprint arXiv:1412.6604 ( 2014 ). MarcAurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra. 2014. Video (language) modeling: a baseline for generative models of natural videos. arXiv preprint arXiv:1412.6604 (2014)."},{"key":"e_1_3_2_1_25_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro , Amir Roshan Zamir, and Mubarak Shah . 2012 . UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012). Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)."},{"key":"e_1_3_2_1_26_1","unstructured":"Nitish Srivastava Elman Mansimov and Ruslan Salakhutdinov. 2015. Unsupervised Learning of Video Representations using LSTMs.. In ICML. 843--852.   Nitish Srivastava Elman Mansimov and Ruslan Salakhutdinov. 2015. Unsupervised Learning of Video Representations using LSTMs.. In ICML. 843--852."},{"key":"e_1_3_2_1_27_1","unstructured":"Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.   Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 
3104--3112."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2006.312685"},{"key":"e_1_3_2_1_29_1","volume-title":"A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844","author":"Theis Lucas","year":"2015","unstructured":"Lucas Theis , A\u00e4ron van den Oord , and Matthias Bethge . 2015. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844 ( 2015 ). Lucas Theis, A\u00e4ron van den Oord, and Matthias Bethge. 2015. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844 (2015)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_31_1","volume-title":"Transformation-based models of video sequences. arXiv preprint arXiv:1701.08435","author":"van Amersfoort Joost","year":"2017","unstructured":"Joost van Amersfoort , Anitha Kannan , Marc'Aurelio Ranzato , Arthur Szlam , Du Tran , and Soumith Chintala . 2017. Transformation-based models of video sequences. arXiv preprint arXiv:1701.08435 ( 2017 ). Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, and Soumith Chintala. 2017. Transformation-based models of video sequences. arXiv preprint arXiv:1701.08435 (2017)."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390294"},{"key":"e_1_3_2_1_33_1","unstructured":"Carl Vondrick Hamed Pirsiavash and Antonio Torralba. 2016. Generating videos with scene dynamics. In Advances In Neural Information Processing Systems. 613-- 621.  Carl Vondrick Hamed Pirsiavash and Antonio Torralba. 2016. Generating videos with scene dynamics. In Advances In Neural Information Processing Systems. 613-- 621."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_51"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.416"},{"volume-title":"Computer Vision and Pattern Recognition, 1993. 
Proceedings CVPR'93, 1993 IEEE Computer Society Conference on. IEEE, 361--366","author":"John Y.","key":"e_1_3_2_1_36_1","unstructured":"John Y. A. Wang and Edward H. Adelson. 1993. Layered representation for motion analysis . In Computer Vision and Pattern Recognition, 1993. Proceedings CVPR'93, 1993 IEEE Computer Society Conference on. IEEE, 361--366 . John Y. A. Wang and Edward H. Adelson. 1993. Layered representation for motion analysis. In Computer Vision and Pattern Recognition, 1993. Proceedings CVPR'93, 1993 IEEE Computer Society Conference on. IEEE, 361--366."},{"key":"e_1_3_2_1_37_1","volume-title":"Bovik","author":"Wang Zhou","year":"2009","unstructured":"Zhou Wang and Alan C . Bovik . 2009 . Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE signal processing magazine 26, 1 (2009), 98--117. Zhou Wang and Alan C. Bovik. 2009. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE signal processing magazine 26, 1 (2009), 98--117."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_51"},{"key":"e_1_3_2_1_40_1","volume-title":"Synthesizing Dynamic Textures and Sounds by Spatial-Temporal Generative ConvNet. arXiv preprint arXiv:1606.00972","author":"Xie Jianwen","year":"2016","unstructured":"Jianwen Xie , Song-Chun Zhu , and Ying Nian Wu. 2016. Synthesizing Dynamic Textures and Sounds by Spatial-Temporal Generative ConvNet. arXiv preprint arXiv:1606.00972 ( 2016 ). Jianwen Xie, Song-Chun Zhu, and Ying Nian Wu. 2016. Synthesizing Dynamic Textures and Sounds by Spatial-Temporal Generative ConvNet. arXiv preprint arXiv:1606.00972 (2016)."},{"key":"e_1_3_2_1_41_1","unstructured":"Tianfan Xue Jiajun Wu Katherine Bouman and Bill Freeman. 2016. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. 
In Advances in Neural Information Processing Systems. 91--99.  Tianfan Xue Jiajun Wu Katherine Bouman and Bill Freeman. 2016. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. In Advances in Neural Information Processing Systems. 91--99."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_47"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5539957"},{"key":"e_1_3_2_1_44_1","volume-title":"Efros","author":"Zhou Tinghui","year":"2016","unstructured":"Tinghui Zhou , Shubham Tulsiani , Weilun Sun , Jitendra Malik , and Alexei A . Efros . 2016 . View synthesis by appearance flow. In European Conference on Computer Vision. Springer , 286--301. Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A. Efros. 2016. View synthesis by appearance flow. In European Conference on Computer Vision. Springer, 286--301."}],"event":{"name":"MM '17: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Mountain View California USA","acronym":"MM '17"},"container-title":["Proceedings of the on Thematic Workshops of ACM Multimedia 
2017"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126686.3126737","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3126686.3126737","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:10:54Z","timestamp":1750212654000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126686.3126737"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,23]]},"references-count":44,"alternative-id":["10.1145\/3126686.3126737","10.1145\/3126686"],"URL":"https:\/\/doi.org\/10.1145\/3126686.3126737","relation":{},"subject":[],"published":{"date-parts":[[2017,10,23]]},"assertion":[{"value":"2017-10-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}