{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:24:07Z","timestamp":1776108247752,"version":"3.50.1"},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,2,24]],"date-time":"2023-02-24T00:00:00Z","timestamp":1677196800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science KAKENHI","doi-asserted-by":"publisher","award":["JP19K12039"],"award-info":[{"award-number":["JP19K12039"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we propose a sequential variational autoencoder for video disentanglement, which is a representation learning method that can be used to separately extract static and dynamic features from videos. Building sequential variational autoencoders with a two-stream architecture induces inductive bias for video disentanglement. However, our preliminary experiment demonstrated that the two-stream architecture is insufficient for video disentanglement because static features frequently contain dynamic features. Additionally, we found that dynamic features are not discriminative in the latent space. To address these problems, we introduced an adversarial classifier using supervised learning into the two-stream architecture. The strong inductive bias through supervision separates dynamic features from static features and yields discriminative representations of the dynamic features. Through a comparison with other sequential variational autoencoders, we qualitatively and quantitatively demonstrate the effectiveness of the proposed method on the Sprites and MUG datasets.<\/jats:p>","DOI":"10.3390\/s23052515","type":"journal-article","created":{"date-parts":[[2023,2,24]],"date-time":"2023-02-24T03:01:01Z","timestamp":1677207661000},"page":"2515","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Sequential Variational Autoencoder with Adversarial Classifier for Video Disentanglement"],"prefix":"10.3390","volume":"23","author":[{"given":"Takeshi","family":"Haga","sequence":"first","affiliation":[{"name":"Department of Applied and Cognitive Informatics, Graduate School of Science and Engineering, Chiba University, Chiba 263-8522, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9830-0436","authenticated-orcid":false,"given":"Hiroshi","family":"Kera","sequence":"additional","affiliation":[{"name":"Graduate School of Engineering, Chiba University, Chiba 263-8522, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3701-1961","authenticated-orcid":false,"given":"Kazuhiko","family":"Kawamoto","sequence":"additional","affiliation":[{"name":"Graduate School of Engineering, Chiba University, Chiba 263-8522, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation Learning: A Review and New Perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","unstructured":"Kingma, D.P., and Welling, M. (2014, January 14\u201316). Auto-Encoding Variational Bayes. Proceedings of the ICLR, Banff, AB, Canada."},{"key":"ref_3","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative Adversarial Nets. Proceedings of the NeurIPS, Montreal, QC, Canada."},{"key":"ref_4","unstructured":"Mirza, M., and Osindero, S. (2014). Conditional Generative Adversarial Nets. arXiv."},{"key":"ref_5","unstructured":"Odena, A., Olah, C., and Shlens, J. (2017, January 6\u201311). Conditional Image Synthesis with Auxiliary Classifier GANs. Proceedings of the ICML, Sydney, Australia."},{"key":"ref_6","unstructured":"Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. (2016, January 5\u201310). Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Proceedings of the NeurIPS, Barcelona, Spain."},{"key":"ref_7","unstructured":"Lin, Z., Thekumparampil, K., Fanti, G., and Oh, S. (2020, January 13\u201318). Infogan-cr and modelcentrality: Self-supervised model training and selection for disentangling gans. Proceedings of the ICML, Virtual."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tulyakov, S., Liu, M.Y., Yang, X., and Kautz, J. (2018, January 18\u201322). Mocogan: Decomposing motion and content for video generation. Proceedings of the CVPR, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00165"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Skorokhodov, I., Tulyakov, S., and Elhoseiny, M. (2022, January 19\u201324). Stylegan-v: A continuous video generator with the price, image quality and perks of stylegan2. Proceedings of the CVPR, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00361"},{"key":"ref_10","unstructured":"Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017, January 24\u201326). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Proceedings of the ICLR, Toulon, France."},{"key":"ref_11","unstructured":"Kim, H., and Mnih, A. (2018, January 10\u201315). Disentangling by Factorising. Proceedings of the ICML, Stockholm, Sweden."},{"key":"ref_12","unstructured":"Chen, R.T.Q., Li, X., Grosse, R.B., and Duvenaud, D.K. (2018, January 3\u20138). Isolating Sources of Disentanglement in Variational Autoencoders. Proceedings of the NeurIPS, Montr\u00e9al, QC, Canada."},{"key":"ref_13","unstructured":"Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Sch\u00f6lkopf, B., and Bachem, O. (2019, January 9\u201315). Challenging common assumptions in the unsupervised learning of disentangled representations. Proceedings of the ICML, Long Beach, CA, USA."},{"key":"ref_14","unstructured":"Li, Y., and Mandt, S. (2018, January 10\u201315). Disentangled Sequential Autoencoder. Proceedings of the ICML, Stockholm, Sweden."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Min, M.R., Kadav, A., and Graf, H.P. (2020, January 13\u201319). S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation. Proceedings of the CVPR, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00657"},{"key":"ref_16","unstructured":"Han, J., Min, M.R., Han, L., Li, L.E., and Zhang, X. (2021, January 3\u20137). Disentangled Recurrent Wasserstein Autoencoder. Proceedings of the ICLR, Virtual Event, Vienna, Austria."},{"key":"ref_17","unstructured":"Bai, J., Wang, W., and Gomes, C.P. (2021, January 6\u201314). Contrastively Disentangled Sequential Variational Autoencoder. Proceedings of the NeurIPS, Virtual."},{"key":"ref_18","unstructured":"Qin, T., Wang, S., and Li, H. (2022, January 17\u201323). Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder. Proceedings of the ICML, Baltimore, MD, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lian, J., Zhang, C., and Yu, D. (2022, January 22\u201327). Robust disentangled variational speech representation learning for zero-shot voice conversion. Proceedings of the ICASSP, Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747272"},{"key":"ref_20","unstructured":"Tonekaboni, S., Li, C.L., Arik, S.O., Goldenberg, A., and Pfister, T. (2022, January 22\u201326). Decoupling Local and Global Representations of Time Series. Proceedings of the ICAIS, Qinghai, China."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Luo, Y.J., Ewert, S., and Dixon, S. (2022, January 23\u201329). Towards Robust Unsupervised Disentanglement of Sequential Data\u2013A Case Study Using Music Audio. Proceedings of the IJCAI2022, Vienna, Austria.","DOI":"10.24963\/ijcai.2022\/458"},{"key":"ref_22","unstructured":"Liu, A.H., Liu, Y.C., Yeh, Y.Y., and Wang, Y.C.F. (2018, January 3\u20138). A unified feature disentangler for multi-domain image translation and manipulation. Proceedings of the NeurIPS, Montr\u00e9al, QC, Canada."},{"key":"ref_23","unstructured":"Peng, X., Huang, Z., Sun, X., and Saenko, K. (2019, January 9\u201315). Domain agnostic learning with disentangled representations. Proceedings of the ICML, Long Beach, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wei, F., Shao, J., Sheng, L., Yan, J., and Wang, X. (2018, January 18\u201322). Exploring disentangled feature representation beyond face identification. Proceedings of the CVPR, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00222"},{"key":"ref_25","unstructured":"Zhou, H., Liu, Y., Liu, Z., Luo, P., and Wang, X. (February, January 27). Talking face generation by adversarially disentangled audio-visual representation. Proceedings of the AAAI, Honolulu, HI, USA."},{"key":"ref_26","unstructured":"Aifanti, N., Papachristou, C., and Delopoulos, A. (2010, January 12\u201314). The MUG facial expression database. Proceedings of the 11th International Workshop on Image Analysis for Multimedia Interactive Services WIAMIS 10, Garda, Italy."},{"key":"ref_27","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_28","unstructured":"Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A.C., and Bengio, Y. (2015, January 7\u201312). A Recurrent Latent Variable Model for Sequential Data. Proceedings of the NeurIPS, Montreal, QC, Canada."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_30","unstructured":"Hsu, W.N., Zhang, Y., and Glass, J. (2017, January 4\u20139). Unsupervised learning of disentangled and interpretable representations from sequential data. Proceedings of the NeurIPS, Long Beach, CA, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2017, January 21\u201326). Flownet 2.0: Evolution of optical flow estimation with deep networks. Proceedings of the CVPR, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.179"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dong, X., Yan, Y., Ouyang, W., and Yang, Y. (2018, January 18\u201322). Style aggregated network for facial landmark detection. Proceedings of the CVPR, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00047"},{"key":"ref_33","unstructured":"Mathieu, M.F., Zhao, J.J., Zhao, J., Ramesh, A., Sprechmann, P., and LeCun, Y. (2016, January 5\u201310). Disentangling factors of variation in deep representation using adversarial training. Proceedings of the NeurIPS, Barcelona, Spain."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ding, Z., Xu, Y., Xu, W., Parmar, G., Yang, Y., Welling, M., and Tu, Z. (2020, January 13\u201319). Guided variational autoencoder for disentanglement learning. Proceedings of the CVPR, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00794"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2515\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:41:14Z","timestamp":1760121674000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2515"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,24]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052515"],"URL":"https:\/\/doi.org\/10.3390\/s23052515","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,24]]}}}