{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T00:14:49Z","timestamp":1760228089463,"version":"build-2065373602"},"reference-count":31,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T00:00:00Z","timestamp":1651795200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Singapore Ministry of Education","award":["MOE2018-T2-2-161"],"award-info":[{"award-number":["MOE2018-T2-2-161"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we introduce an approach for future frames prediction based on a single input image. Our method is able to generate an entire video sequence based on the information contained in the input frame. We adopt an autoregressive approach in our generation process, i.e., the output from each time step is fed as the input to the next step. Unlike other video prediction methods that use \u201cone shot\u201d generation, our method is able to preserve much more details from the input image, while also capturing the critical pixel-level changes between the frames. We overcome the problem of generation quality degradation by introducing a \u201ccomplementary mask\u201d module in our architecture, and we show that this allows the model to only focus on the generation of the pixels that need to be changed, and to reuse those that should remain static from its previous frame. We empirically validate our methods against various video prediction models on the UT Dallas Dataset, and show that our approach is able to generate high quality realistic video sequences from one static input image. In addition, we also validate the robustness of our method by testing a pre-trained model on the unseen ADFES facial expression dataset. 
We also provide qualitative results of our model tested on a human action dataset: The Weizmann Action database.<\/jats:p>","DOI":"10.3390\/s22093533","type":"journal-article","created":{"date-parts":[[2022,5,8]],"date-time":"2022-05-08T23:27:25Z","timestamp":1652052445000},"page":"3533","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Single Image Video Prediction with Auto-Regressive GANs"],"prefix":"10.3390","volume":"22","author":[{"given":"Jiahui","family":"Huang","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yew Ken","family":"Chia","sequence":"additional","affiliation":[{"name":"Information Systems Technology and Design (ISTD), Singapore University of Technology and Design, Singapore 487372, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4647-8539","authenticated-orcid":false,"given":"Samson","family":"Yu","sequence":"additional","affiliation":[{"name":"Information Systems Technology and Design (ISTD), Singapore University of Technology and Design, Singapore 487372, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin","family":"Yee","sequence":"additional","affiliation":[{"name":"Information Systems Technology and Design (ISTD), Singapore University of Technology and Design, Singapore 487372, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8992-5648","authenticated-orcid":false,"given":"Dennis","family":"K\u00fcster","sequence":"additional","affiliation":[{"name":"Cognitive Systems Lab, Department of Mathematics and Computer Science, University of Bremen, 28359 Bremen, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1894-2517","authenticated-orcid":false,"given":"Eva G.","family":"Krumhuber","sequence":"additional","affiliation":[{"name":"Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8607-1640","authenticated-orcid":false,"given":"Dorien","family":"Herremans","sequence":"additional","affiliation":[{"name":"Information Systems Technology and Design (ISTD), Singapore University of Technology and Design, Singapore 487372, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6439-8076","authenticated-orcid":false,"given":"Gemma","family":"Roig","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Goethe University Frankfurt, Robert-Meyer-Str. 11-15, 60325 Frankfurt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,6]]},"reference":[{"key":"ref_1","unstructured":"Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative Adversarial Networks. Proceedings of the 28th Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., Zhou, T., and Efros, A.A. (2017, January 21\u201326). Image-to-Image Translation with Conditional Adversarial Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Karras, T., Laine, S., and Aila, T. (2019, January 15\u201320). A Style-Based Generator Architecture for Generative Adversarial Networks. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00453"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., and Efros, A.A. (2017, January 22\u201329). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_5","first-page":"613","article-title":"Generating videos with scene dynamics","volume":"29","author":"Vondrick","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_6","unstructured":"Shi, X., Chen, Z., Wang, H., Yeung, D., Wong, W., and chun Woo, W. (2015, January 7\u201312). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Proceedings of the Advances in Neural Information Processing Systems 28, Montreal, QC, Canada."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., and Yang, M.H. (2018, January 8\u201314). Flow-Grounded Spatial-Temporal Video Prediction from Still Images. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_37"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Cai, H., Bai, C., Tai, Y.-W., and Tang, C.-K. (2018, January 8\u201314). Deep Video Generation, Prediction and Completion of Human Action Sequences. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_23"},{"key":"ref_9","unstructured":"Villegas, R., Yang, J., Hong, S., Lin, X., and Lee, H. (2017). Decomposing Motion and Content for Natural Video Sequence Prediction. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Reda, F.A., Liu, G., Shih, K.J., Kirby, R., Barker, J., Tarjan, D., Tao, A., and Catanzaro, B. (2018, January 8\u201314). 
SDCNet: Video Prediction Using Spatially-Displaced Convolution. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_44"},{"key":"ref_11","unstructured":"Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. (2015, January 7\u201312). Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Proceedings of the Advances in Neural Information Processing Systems 28, Montreal, QC, Canada."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Shen, G., Huang, W., Gan, C., Tan, M., Huang, J., Zhu, W., and Gong, B. (2019, January 21\u201325). Facial Image-to-Video Translation by a Hidden Affine Transformation. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3350981"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Pumarola, A., Agudo, A., Mart\u00ednez, A., Sanfeliu, A., and Moreno-Noguer, F. (2018, January 8\u201314). GANimation: Anatomically-aware Facial Animation from a Single Image. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_50"},{"key":"ref_16","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. 
arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1109\/TPAMI.2005.90","article-title":"A video database of moving faces and people","volume":"27","author":"Harms","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2247","DOI":"10.1109\/TPAMI.2007.70711","article-title":"Actions as Space-Time Shapes","volume":"29","author":"Gorelick","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2015, January 7\u201312). GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Proceedings of the Advances in Neural Information Processing Systems 30, Montreal, QC, Canada."},{"key":"ref_20","unstructured":"Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., and Chen, X. (2016, January 6\u201310). Improved Techniques for Training GANs. Proceedings of the Advances in Neural Information Processing Systems 29, Barcelona, Spain."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1037\/a0023853","article-title":"Moving faces, looking places: Validation of the Amsterdam Dynamic Facial Expression Set (ADFES)","volume":"11","author":"Hawk","year":"2011","journal-title":"Emotion"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., and Choo, J. (2018, January 18\u201323). StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00916"},{"key":"ref_23","unstructured":"Kim, J., Kim, M., Kang, H., and Lee, K. (2019). U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. 
arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, Y.S., Wang, Y.C., Kao, M.H., and Chuang, Y.Y. (2018, January 18\u201323). Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00660"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Deng, Y., Loy, C.C., and Tang, X. (2018, January 18\u201323). Aesthetic-Driven Image Enhancement by Adversarial Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1145\/3240508.3240531"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Berg, T.L. (2016, January 8\u201316). Learning Temporal Transformations From Time-Lapse Videos. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_16"},{"key":"ref_27","unstructured":"Mirza, M., and Osindero, S. (2014). Conditional Generative Adversarial Nets. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1109\/TIP.2019.2930152","article-title":"Unsupervised online video object segmentation with motion property understanding","volume":"29","author":"Zhuo","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_29","unstructured":"Minkesh, A., Worranitta, K., and Taizo, M. (2019). Human extraction and scene transition utilizing Mask R-CNN. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Tulyakov, S., Liu, M.-Y., Yang, X., and Kautz, J. (2018, January 18\u201323). 
Mocogan: Decomposing motion and content for video generation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00165"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/9\/3533\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:06:58Z","timestamp":1760137618000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/9\/3533"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,6]]},"references-count":31,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["s22093533"],"URL":"https:\/\/doi.org\/10.3390\/s22093533","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,5,6]]}}}