{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:15:57Z","timestamp":1750220157357,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":72,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548213","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:35Z","timestamp":1665416555000},"page":"5446-5455","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Weakly-supervised Disentanglement Network for Video Fingerspelling Detection"],"prefix":"10.1145","author":[{"given":"Ziqi","family":"Jiang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Shengyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Siyuan","family":"Yao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Wenqiao","family":"Zhang","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"given":"Sihan","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}]},{"given":"Juncheng","family":"Li","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Zhou","family":"Zhao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Fei","family":"Wu","sequence":"additional","affiliation":[{"name":"Shanghai Institute for Advanced Study of Zhejiang University &amp; Shanghai AI Laboratory, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_3"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462986"},{"key":"e_1_3_2_2_3_1","volume-title":"Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT)","volume":"2","author":"Athitsos Vassilis","year":"2010","unstructured":"Vassilis Athitsos , Carol Neidle , Stan Sclaroff , Joan Nash , Alexandra Stefan , Ashwin Thangali , Haijing Wang , and Quan Yuan . 2010 . Large lexicon project: American sign language video corpus and sign language indexing\/retrieval algorithms . In Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT) , Vol. 2 . Vassilis Athitsos, Carol Neidle, Stan Sclaroff, Joan Nash, Alexandra Stefan, Ashwin Thangali, Haijing Wang, and Quan Yuan. 2010. Large lexicon project: American sign language video corpus and sign language indexing\/retrieval algorithms. In Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT), Vol. 2."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00437"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_6_1","volume-title":"Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)","author":"Forster Jens","year":"2012","unstructured":"Jens Forster , Christoph Schmidt , Thomas Hoyoux , Oscar Koller , Uwe Zelle , Justus Piater , and Hermann Ney . 2012 . Rwth-phoenix-weather: A large vocabulary sign language recognition and translation corpus . In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) . 3785--3789. Jens Forster, Christoph Schmidt, Thomas Hoyoux, Oscar Koller, Uwe Zelle, Justus Piater, and Hermann Ney. 2012. Rwth-phoenix-weather: A large vocabulary sign language recognition and translation corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). 3785--3789."},{"key":"e_1_3_2_2_7_1","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)","author":"Forster Jens","year":"2014","unstructured":"Jens Forster , Christoph Schmidt , Oscar Koller , Martin Bellgardt , and Hermann Ney . 2014 . Extensions of the sign language recognition and translation corpus RWTH-PHOENIX-Weather . In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) . 1911--1916. Jens Forster, Christoph Schmidt, Oscar Koller, Martin Bellgardt, and Hermann Ney. 2014. Extensions of the sign language recognition and translation corpus RWTH-PHOENIX-Weather. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). 1911--1916."},{"key":"e_1_3_2_2_8_1","volume-title":"Dynamic memory induction networks for few-shot text classification. arXiv preprint arXiv:2005.05727","author":"Geng Ruiying","year":"2020","unstructured":"Ruiying Geng , Binhua Li , Yongbin Li , Jian Sun , and Xiaodan Zhu . 2020. Dynamic memory induction networks for few-shot text classification. arXiv preprint arXiv:2005.05727 ( 2020 ). Ruiying Geng, Binhua Li, Yongbin Li, Jian Sun, and Xiaodan Zhu. 2020. Dynamic memory induction networks for few-shot text classification. arXiv preprint arXiv:2005.05727 (2020)."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Paul Goh and Eun-Jung Holden. 2006. Dynamic Fingerspelling Recognition using Geometric and Motion Features.  Paul Goh and Eun-Jung Holden. 2006. Dynamic Fingerspelling Recognition using Geometric and Motion Features.","DOI":"10.1109\/ICIP.2006.313114"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00800"},{"key":"e_1_3_2_2_12_1","volume-title":"Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2015. Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385 ( 2015 ). arXiv:1512.03385 http:\/\/arxiv.org\/abs\/1512.03385 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385 (2015). arXiv:1512.03385 http:\/\/arxiv.org\/abs\/1512.03385"},{"key":"e_1_3_2_2_13_1","volume-title":"Introvae: Introspective variational autoencoders for photographic image synthesis. Advances in neural information processing systems 31","author":"Huang Huaibo","year":"2018","unstructured":"Huaibo Huang , Ran He , Zhenan Sun , Tieniu Tan , 2018 . Introvae: Introspective variational autoencoders for photographic image synthesis. Advances in neural information processing systems 31 (2018). Huaibo Huang, Ran He, Zhenan Sun, Tieniu Tan, et al. 2018. Introvae: Introspective variational autoencoders for photographic image synthesis. Advances in neural information processing systems 31 (2018)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475437"},{"key":"e_1_3_2_2_15_1","volume-title":"Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao.","author":"Huang Rongjie","year":"2022","unstructured":"Rongjie Huang , Max WY Lam , Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. 2022 . FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis . arXiv preprint arXiv:2204.09934 (2022). Rongjie Huang, Max WY Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. 2022. FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. arXiv preprint arXiv:2204.09934 (2022)."},{"key":"e_1_3_2_2_16_1","volume-title":"TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. arXiv preprint arXiv:2205.12523","author":"Huang Rongjie","year":"2022","unstructured":"Rongjie Huang , Zhou Zhao , Jinglin Liu , Huadai Liu , Yi Ren , Lichao Zhang , and Jinzheng He. 2022. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. arXiv preprint arXiv:2205.12523 ( 2022 ). Rongjie Huang, Zhou Zhao, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, and Jinzheng He. 2022. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. arXiv preprint arXiv:2205.12523 (2022)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.575"},{"key":"e_1_3_2_2_18_1","unstructured":"Michelle Jay. 2021. American Sign Language. https:\/\/www.startasl.com\/american-sign-language\/. Accessed: 2021-03-05.  Michelle Jay. 2021. American Sign Language. https:\/\/www.startasl.com\/american-sign-language\/. Accessed: 2021-03-05."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463234"},{"key":"e_1_3_2_2_20_1","volume-title":"Ms-asl: A large-scale data set and benchmark for understanding american sign language. arXiv preprint arXiv:1812.01053","author":"Vaezi Joze Hamid Reza","year":"2018","unstructured":"Hamid Reza Vaezi Joze and Oscar Koller . 2018 . Ms-asl: A large-scale data set and benchmark for understanding american sign language. arXiv preprint arXiv:1812.01053 (2018). Hamid Reza Vaezi Joze and Oscar Koller. 2018. Ms-asl: A large-scale data set and benchmark for understanding american sign language. arXiv preprint arXiv:1812.01053 (2018)."},{"key":"e_1_3_2_2_21_1","volume-title":"Variational autoencoders with riemannian brownian motion priors. arXiv preprint arXiv:2002.05227","author":"Kalatzis Dimitris","year":"2020","unstructured":"Dimitris Kalatzis , David Eklund , Georgios Arvanitidis , and S\u00f8ren Hauberg . 2020. Variational autoencoders with riemannian brownian motion priors. arXiv preprint arXiv:2002.05227 ( 2020 ). Dimitris Kalatzis, David Eklund, Georgios Arvanitidis, and S\u00f8ren Hauberg. 2020. Variational autoencoders with riemannian brownian motion priors. arXiv preprint arXiv:2002.05227 (2020)."},{"key":"e_1_3_2_2_22_1","volume-title":"Towards an articulatory model of handshape: What fingerspelling tells us about the phonetics and phonology of handshape in American Sign Language. Ph. D. Dissertation","author":"Keane Jonathan","year":"2014","unstructured":"Jonathan Keane . 2014. Towards an articulatory model of handshape: What fingerspelling tells us about the phonetics and phonology of handshape in American Sign Language. Ph. D. Dissertation . University of Chicago . Doctoral dissertation, defended 22 August 2014 Advisors : Diane Brentari , Jason Riggle, and Karen Livescu. Jonathan Keane. 2014. Towards an articulatory model of handshape: What fingerspelling tells us about the phonetics and phonology of handshape in American Sign Language. Ph. D. Dissertation. University of Chicago. Doctoral dissertation, defended 22 August 2014 Advisors: Diane Brentari, Jason Riggle, and Karen Livescu."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2017.05.009"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2015.09.013"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.412"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.364"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-1121-3"},{"key":"e_1_3_2_2_28_1","volume-title":"International Conference on Learning Representations.","author":"Lee Yoonhyung","year":"2020","unstructured":"Yoonhyung Lee , Joongbo Shin , and Kyomin Jung . 2020 . Bidirectional variational inference for non-autoregressive text-to-speech . In International Conference on Learning Representations. Yoonhyung Lee, Joongbo Shin, and Kyomin Jung. 2020. Bidirectional variational inference for non-autoregressive text-to-speech. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093512"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413886"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01214"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00304"},{"key":"e_1_3_2_2_33_1","unstructured":"Mengze Li Kun Kuang Qiang Zhu Xiaohong Chen Qing Guo and Fei Wu. 2020. IB-M: A Flexible Framework to Align an Interpretable Model and a Black-box Model. In BIBM.  Mengze Li Kun Kuang Qiang Zhu Xiaohong Chen Qing Guo and Fei Wu. 2020. IB-M: A Flexible Framework to Align an Interpretable Model and a Black-box Model. In BIBM."},{"key":"e_1_3_2_2_34_1","unstructured":"Mengze Li Tianbao Wang Haoyu Zhang Shengyu Zhang Zhou Zhao Jiaxu Miao Wenqiao Zhang Wenming Tan Jin Wang Peng Wang Shiliang Pu and Fei Wu. 2022. End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding. In ACL.  Mengze Li Tianbao Wang Haoyu Zhang Shengyu Zhang Zhou Zhao Jiaxu Miao Wenqiao Zhang Wenming Tan Jin Wang Peng Wang Shiliang Pu and Fei Wu. 2022. End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding. In ACL."},{"key":"e_1_3_2_2_35_1","volume-title":"HERO: HiErarchical spatiotempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding. In ACM MM.","author":"Li Mengze","year":"2022","unstructured":"Mengze Li , Tianbao Wang , Haoyu Zhang , Shengyu Zhang , Zhou Zhao , Wenqiao Zhang , Jiaxu Miao , Shiliang Pu , and Fei Wu . 2022 . HERO: HiErarchical spatiotempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding. In ACM MM. Mengze Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, Wenqiao Zhang, Jiaxu Miao, Shiliang Pu, and Fei Wu. 2022. HERO: HiErarchical spatiotempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding. In ACM MM."},{"key":"e_1_3_2_2_36_1","volume-title":"Automatic Recognition of Fingerspelled Words in British Sign Language. In 2nd IEEE Workshop on CVPR for Human Communicative Behavior Analysis.","author":"Liwicki Stephan","year":"2009","unstructured":"Stephan Liwicki and Mark Everingham . 2009 . Automatic Recognition of Fingerspelled Words in British Sign Language. In 2nd IEEE Workshop on CVPR for Human Communicative Behavior Analysis. Stephan Liwicki and Mark Everingham. 2009. Automatic Recognition of Fingerspelled Words in British Sign Language. In 2nd IEEE Workshop on CVPR for Human Communicative Behavior Analysis."},{"key":"e_1_3_2_2_37_1","volume-title":"Summer 2016 and","author":"Looney Dennis","year":"2016","unstructured":"Dennis Looney and Natalia Lusin . 2018. Enrollments in Languages Other than English in United States Institutions of Higher Education , Summer 2016 and Fall 2016 : Preliminary Report. In Modern Language Association . ERIC. Dennis Looney and Natalia Lusin. 2018. Enrollments in Languages Other than English in United States Institutions of Higher Education, Summer 2016 and Fall 2016: Preliminary Report. In Modern Language Association. ERIC."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_2_2_39_1","volume-title":"Adversarial autoencoders. arXiv preprint arXiv:1511.05644","author":"Makhzani Alireza","year":"2015","unstructured":"Alireza Makhzani , Jonathon Shlens , Navdeep Jaitly , Ian Goodfellow , and Brendan Frey . 2015. Adversarial autoencoders. arXiv preprint arXiv:1511.05644 ( 2015 ). Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. 2015. Adversarial autoencoders. arXiv preprint arXiv:1511.05644 (2015)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-66096-3_17"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2011.6130290"},{"key":"e_1_3_2_2_42_1","volume-title":"Intelligent Request Strategy Design in Recommender System. CoRR abs\/2206.12296","author":"Qian Xufeng","year":"2022","unstructured":"Xufeng Qian , Yue Xu , Fuyu Lv , Shengyu Zhang , Ziwen Jiang , Qingwen Liu , Xiaoyi Zeng , Tat-Seng Chua , and Fei Wu. 2022. Intelligent Request Strategy Design in Recommender System. CoRR abs\/2206.12296 ( 2022 ). Xufeng Qian, Yue Xu, Fuyu Lv, Shengyu Zhang, Ziwen Jiang, Qingwen Liu, Xiaoyi Zeng, Tat-Seng Chua, and Fei Wu. 2022. Intelligent Request Strategy Design in Recommender System. CoRR abs\/2206.12296 (2022)."},{"key":"e_1_3_2_2_43_1","volume-title":"Aaron Van den Oord, and Oriol Vinyals","author":"Razavi Ali","year":"2019","unstructured":"Ali Razavi , Aaron Van den Oord, and Oriol Vinyals . 2019 . Generating diverse high-fidelity images with vq-vae-2. Advances in neural information processing systems 32 (2019). Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_2_44_1","volume-title":"Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren , Kaiming He , Ross Girshick , and Jian Sun . 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 ( 2015 ). Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015)."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"crossref","unstructured":"Susanna Ricco and Carlo Tomasi. 2009. Fingerspelling recognition through classification of letter-to-letter transitions.  Susanna Ricco and Carlo Tomasi. 2009. Fingerspelling recognition through classification of letter-to-letter transitions.","DOI":"10.1007\/978-3-642-12297-2_21"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00415"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2018.8639639"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2017.8268962"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00550"},{"key":"e_1_3_2_2_51_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_52_1","volume-title":"S\u00f8ren Kaae S\u00f8nderby, and Ole Winther","author":"S\u00f8nderby Casper Kaae","year":"2016","unstructured":"Casper Kaae S\u00f8nderby , Tapani Raiko , Lars Maal\u00f8e , S\u00f8ren Kaae S\u00f8nderby, and Ole Winther . 2016 . Ladder variational autoencoders. Advances in neural information processing systems 29 (2016). Casper Kaae S\u00f8nderby, Tapani Raiko, Lars Maal\u00f8e, S\u00f8ren Kaae S\u00f8nderby, and Ole Winther. 2016. Ladder variational autoencoders. Advances in neural information processing systems 29 (2016)."},{"key":"e_1_3_2_2_53_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"6114","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le . 2019 . EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks . In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research , Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 6105-- 6114 . https:\/\/proceedings.mlr.press\/v97\/tan19a.html Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 6105--6114. https:\/\/proceedings.mlr.press\/v97\/tan19a.html"},{"key":"e_1_3_2_2_54_1","volume-title":"International Conference on Artificial Intelligence and Statistics. PMLR, 1214--1223","author":"Tomczak Jakub","year":"2018","unstructured":"Jakub Tomczak and Max Welling . 2018 . VAE with a VampPrior . In International Conference on Artificial Intelligence and Statistics. PMLR, 1214--1223 . Jakub Tomczak and Max Welling. 2018. VAE with a VampPrior. In International Conference on Artificial Intelligence and Statistics. PMLR, 1214--1223."},{"key":"e_1_3_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2006.07.009"},{"key":"e_1_3_2_2_56_1","volume-title":"Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop. 155--161","author":"Tsechpenakis Gabriel","year":"2006","unstructured":"Gabriel Tsechpenakis , Dimitris N Metaxas , Carol Neidle , and Olympia Hadjiliadis . 2006 . Robust online change-point detection in video sequences . In Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop. 155--161 . Gabriel Tsechpenakis, Dimitris N Metaxas, Carol Neidle, and Olympia Hadjiliadis. 2006. Robust online change-point detection in video sequences. In Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop. 155--161."},{"key":"e_1_3_2_2_57_1","first-page":"492","article-title":"Deep hierarchical variational autoencoder","volume":"17","author":"Vahdat Arash","year":"2021","unstructured":"Arash Vahdat and Jan Kautz . 2021 . Deep hierarchical variational autoencoder . US Patent App. 17\/089 , 492 . Arash Vahdat and Jan Kautz. 2021. Deep hierarchical variational autoencoder. US Patent App. 17\/089,492.","journal-title":"US Patent App."},{"key":"e_1_3_2_2_58_1","volume-title":"International Gesture Workshop. Springer, 247--258","author":"Vogler Christian","year":"2003","unstructured":"Christian Vogler and Dimitris Metaxas . 2003 . Handshapes and movements: Multiple-channel american sign language recognition . In International Gesture Workshop. Springer, 247--258 . Christian Vogler and Dimitris Metaxas. 2003. Handshapes and movements: Multiple-channel american sign language recognition. In International Gesture Workshop. Springer, 247--258."},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00720"},{"key":"e_1_3_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.617"},{"key":"e_1_3_2_2_61_1","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)","author":"Yanovich Polina","year":"2016","unstructured":"Polina Yanovich , Carol Neidle , and Dimitris Metaxas . 2016 . Detection of major ASL sign types in continuous signing for ASL recognition . In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) . 3067--3073. Polina Yanovich, Carol Neidle, and Dimitris Metaxas. 2016. Detection of major ASL sign types in continuous signing for ASL recognition. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). 3067--3073."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/IWCIA.2015.7449458"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.2995959"},{"key":"e_1_3_2_2_64_1","volume-title":"Crossing variational autoencoders for answer retrieval. arXiv preprint arXiv:2005.02557","author":"Yu Wenhao","year":"2020","unstructured":"Wenhao Yu , Lingfei Wu , Qingkai Zeng , Shu Tao , Yu Deng , and Meng Jiang . 2020. Crossing variational autoencoders for answer retrieval. arXiv preprint arXiv:2005.02557 ( 2020 ). Wenhao Yu, Lingfei Wu, Qingkai Zeng, Shu Tao, Yu Deng, and Meng Jiang. 2020. Crossing variational autoencoders for answer retrieval. arXiv preprint arXiv:2005.02557 (2020)."},{"key":"e_1_3_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548059"},{"key":"e_1_3_2_2_66_1","volume-title":"DeVLBert: Learning Deconfounded Visio- Linguistic Representations. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 4373--4382","author":"Zhang Shengyu","year":"2020","unstructured":"Shengyu Zhang , Tan Jiang , Tan Wang , Kun Kuang , Zhou Zhao , Jianke Zhu , Jin Yu , Hongxia Yang , and Fei Wu . 2020 . DeVLBert: Learning Deconfounded Visio- Linguistic Representations. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 4373--4382 . Shengyu Zhang, Tan Jiang, Tan Wang, Kun Kuang, Zhou Zhao, Jianke Zhu, Jin Yu, Hongxia Yang, and Fei Wu. 2020. DeVLBert: Learning Deconfounded Visio- Linguistic Representations. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 4373--4382."},{"key":"e_1_3_2_2_67_1","volume-title":"Poet: Product-oriented Video Captioner for E-commerce. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 1292--1301","author":"Zhang Shengyu","year":"2020","unstructured":"Shengyu Zhang , Ziqi Tan , Jin Yu , Zhou Zhao , Kun Kuang , Jie Liu , Jingren Zhou , Hongxia Yang , and Fei Wu . 2020 . Poet: Product-oriented Video Captioner for E-commerce. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 1292--1301 . Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Jie Liu, Jingren Zhou, Hongxia Yang, and Fei Wu. 2020. Poet: Product-oriented Video Captioner for E-commerce. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 1292--1301."},{"key":"e_1_3_2_2_68_1","volume-title":"Comprehensive Information Integration Modeling Framework for Video Titling. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2744--2754","author":"Zhang Shengyu","year":"2020","unstructured":"Shengyu Zhang , Ziqi Tan , Zhou Zhao , Jin Yu , Kun Kuang , Tan Jiang , Jingren Zhou , Hongxia Yang , and Fei Wu . 2020 . Comprehensive Information Integration Modeling Framework for Video Titling. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2744--2754 . Shengyu Zhang, Ziqi Tan, Zhou Zhao, Jin Yu, Kun Kuang, Tan Jiang, Jingren Zhou, Hongxia Yang, and Fei Wu. 2020. Comprehensive Information Integration Modeling Framework for Video Titling. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2744--2754."},{"key":"e_1_3_2_2_69_1","volume-title":"Re-construct for Multi-interest Recommendation. In WWW '22: The ACM Web Conference","author":"Zhang Shengyu","year":"2022","unstructured":"Shengyu Zhang , Lingxiao Yang , Dong Yao , Yujie Lu , Fuli Feng , Zhou Zhao , Tat-Seng Chua , and Fei Wu . 2022 . Re4: Learning to Re-contrast, Re-attend , Re-construct for Multi-interest Recommendation. In WWW '22: The ACM Web Conference 2022. ACM, 2216--2226. Shengyu Zhang, Lingxiao Yang, Dong Yao, Yujie Lu, Fuli Feng, Zhou Zhao, Tat-Seng Chua, and Fei Wu. 2022. Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation. In WWW '22: The ACM Web Conference 2022. ACM, 2216--2226."},{"key":"e_1_3_2_2_70_1","volume-title":"MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning. arXiv preprint arXiv:2112.06558","author":"Zhang Wenqiao","year":"2021","unstructured":"Wenqiao Zhang , Haochen Shi , Jiannan Guo , Shengyu Zhang , Qingpeng Cai , Juncheng Li , Sihui Luo , and Yueting Zhuang . 2021 . MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning. arXiv preprint arXiv:2112.06558 (2021). Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, and Yueting Zhuang. 2021. MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning. arXiv preprint arXiv:2112.06558 (2021)."},{"key":"e_1_3_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413746"},{"key":"e_1_3_2_2_72_1","volume-title":"BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation. arXiv preprint arXiv:2203.02533","author":"Zhang Wenqiao","year":"2022","unstructured":"Wenqiao Zhang , Lei Zhu , James Hallinan , Andrew Makmur , Shengyu Zhang , Qingpeng Cai , and Beng Chin Ooi . 2022. BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation. arXiv preprint arXiv:2203.02533 ( 2022 ). Wenqiao Zhang, Lei Zhu, James Hallinan, Andrew Makmur, Shengyu Zhang, Qingpeng Cai, and Beng Chin Ooi. 2022. BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation. arXiv preprint arXiv:2203.02533 (2022)."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548213","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548213","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:20Z","timestamp":1750186820000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548213"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":72,"alternative-id":["10.1145\/3503161.3548213","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548213","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}