{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:12:52Z","timestamp":1750219972327,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":26,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3551567","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:12Z","timestamp":1665416592000},"page":"7003-7007","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Multi-Stream Approach for Video Understanding"],"prefix":"10.1145","author":[{"given":"Lutharsanen","family":"Kunam","sequence":"first","affiliation":[{"name":"University of Zurich, Zurich, Switzerland"}]},{"given":"Luca","family":"Rossetto","sequence":"additional","affiliation":[{"name":"University of Zurich, Zurich, Switzerland"}]},{"given":"Abraham","family":"Bernstein","sequence":"additional","affiliation":[{"name":"University of Zurich, Zurich, Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding. In MM '21: ACM Multimedia Conference","author":"Anand Vishal","year":"2021","unstructured":"Vishal Anand , Raksha Ramesh , Boshen Jin , Ziyin Wang , Xiaoxiao Lei , and Ching-Yung Lin . 2021 . MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding. 
In MM '21: ACM Multimedia Conference , Virtual Event, China, October 20 - 24 , 2021,, Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 4868--4872. https:\/\/doi.org\/10.1145\/3474085.3479220 Vishal Anand, Raksha Ramesh, Boshen Jin, Ziyin Wang, Xiaoxiao Lei, and Ching-Yung Lin. 2021. MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding. In MM '21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021,, Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 4868--4872. https:\/\/doi.org\/10.1145\/3474085.3479220"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3416305"},{"key":"e_1_3_2_1_3_1","volume-title":"Towards Using Semantic-Web Technologies for Multi-Modal Knowledge Graph Construction. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12--16","author":"Baumgartner Matthias","year":"2020","unstructured":"Matthias Baumgartner , Luca Rossetto , and Abraham Bernstein . 2020 . Towards Using Semantic-Web Technologies for Multi-Modal Knowledge Graph Construction. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12--16 , 2020,, Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, and Roger Zimmermann (Eds.). ACM, 4645--4649. https:\/\/doi.org\/10.1145\/3394171.3416292 Matthias Baumgartner, Luca Rossetto, and Abraham Bernstein. 2020. Towards Using Semantic-Web Technologies for Multi-Modal Knowledge Graph Construction. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12--16, 2020,, Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, and Roger Zimmermann (Eds.). ACM, 4645--4649. 
https:\/\/doi.org\/10.1145\/3394171.3416292"},{"key":"e_1_3_2_1_4_1","volume-title":"Pyannote. Audio: Neural Building Blocks for Speaker Diarization. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020","author":"Bredin Herv\u00e9","year":"2020","unstructured":"Herv\u00e9 Bredin , Ruiqing Yin , Juan Manuel Coria , Gregory Gelly , Pavel Korshunov , Marvin Lavechin , Diego Fustes , Hadrien Titeux , Wassim Bouaziz , and Marie-Philippe Gill . 2020 . Pyannote. Audio: Neural Building Blocks for Speaker Diarization. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020 , Barcelona, Spain, May 4--8 , 2020. IEEE, 7124--7128. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9052974 Herv\u00e9 Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, and Marie-Philippe Gill. 2020. Pyannote. Audio: Neural Building Blocks for Speaker Diarization. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4--8, 2020. IEEE, 7124--7128. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9052974"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390742"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_7_1","unstructured":"Haoqi Fan Yanghao Li Bo Xiong Wan-Yen Lo and Christoph Feichtenhofer. 2020. PySlowFast. https:\/\/github.com\/facebookresearch\/slowfast.  Haoqi Fan Yanghao Li Bo Xiong Wan-Yen Lo and Christoph Feichtenhofer. 2020. PySlowFast. https:\/\/github.com\/facebookresearch\/slowfast."},{"key":"e_1_3_2_1_8_1","volume-title":"SlowFast Networks for Video Recognition. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019","author":"Feichtenhofer Christoph","year":"2019","unstructured":"Christoph Feichtenhofer , Haoqi Fan , Jitendra Malik , and Kaiming He . 2019 . 
SlowFast Networks for Video Recognition. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019 , Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 6201--6210. https:\/\/doi.org\/10.1109\/ICCV.2019.00630 Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. SlowFast Networks for Video Recognition. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 6201--6210. https:\/\/doi.org\/10.1109\/ICCV.2019.00630"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/358669.358692"},{"key":"e_1_3_2_1_10_1","volume-title":"Greedy function approximation: a gradient boosting machine. Annals of statistics","author":"Friedman Jerome H","year":"2001","unstructured":"Jerome H Friedman . 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics ( 2001 ), 1189--1232. Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics (2001), 1189--1232."},{"volume-title":"Proceedings of the 7th Python in Science Conference, Ga\u00ebl Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA USA, 11 -- 15","author":"Hagberg Aric A.","key":"e_1_3_2_1_11_1","unstructured":"Aric A. Hagberg , Daniel A. Schult , and Pieter J. Swart . 2008. Exploring Network Structure, Dynamics, and Function using NetworkX . In Proceedings of the 7th Python in Science Conference, Ga\u00ebl Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA USA, 11 -- 15 . Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference, Ga\u00ebl Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA USA, 11 -- 15."},{"key":"e_1_3_2_1_12_1","volume-title":"Deep Residual Learning for Image Recognition. 
In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016 . Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 , Las Vegas, NV, USA , June 27-30, 2016. IEEE Computer Society, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_13_1","volume-title":"UK","volume":"727","author":"Huang Qingqiu","year":"2020","unstructured":"Qingqiu Huang , Yu Xiong , Anyi Rao , Jiaze Wang , and Dahua Lin . 2020 . MovieNet: A Holistic Dataset for Movie Understanding. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow , UK , August 23-28, 2020, Proceedings, Part IV (Lecture Notes in Computer Science , Vol. 12349), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer, 709-- 727 . https:\/\/doi.org\/10.1007\/978-3-030-58548-8_41 Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, and Dahua Lin. 2020. MovieNet: A Holistic Dataset for Movie Understanding. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IV (Lecture Notes in Computer Science, Vol. 12349), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer, 709--727. 
https:\/\/doi.org\/10.1007\/978-3-030-58548-8_41"},{"key":"e_1_3_2_1_14_1","volume-title":"Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and Andrew Zisserman.","author":"Kay Will","year":"2017","unstructured":"Will Kay , Jo\u00e3o Carreira , Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and Andrew Zisserman. 2017 . The Kinetics Human Action Video Dataset. CoRR , Vol. abs\/ 1705 .06950 (2017). showeprint[arXiv]1705.06950 http:\/\/arxiv.org\/abs\/1705.06950 Will Kay, Jo\u00e3o Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and Andrew Zisserman. 2017. The Kinetics Human Action Video Dataset. CoRR, Vol. abs\/1705.06950 (2017). showeprint[arXiv]1705.06950 http:\/\/arxiv.org\/abs\/1705.06950"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1167"},{"key":"e_1_3_2_1_16_1","unstructured":"Steven Loria et al. 2018. textblob Documentation. Release 0.15 Vol. 2 (2018) 269.  Steven Loria et al. 2018. textblob Documentation. Release 0.15 Vol. 2 (2018) 269."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00272"},{"key":"e_1_3_2_1_18_1","volume-title":"Large-Scale Image Retrieval with Attentive Deep Local Features. In IEEE International Conference on Computer Vision, ICCV 2017","author":"Noh Hyeonwoo","year":"2017","unstructured":"Hyeonwoo Noh , Andre Araujo , Jack Sim , Tobias Weyand , and Bohyung Han . 2017 . Large-Scale Image Retrieval with Attentive Deep Local Features. In IEEE International Conference on Computer Vision, ICCV 2017 , Venice, Italy , October 22-29, 2017. IEEE Computer Society, 3476--3485. https:\/\/doi.org\/10.1109\/ICCV.2017.374 Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, and Bohyung Han. 2017. 
Large-Scale Image Retrieval with Attentive Deep Local Features. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017. IEEE Computer Society, 3476--3485. https:\/\/doi.org\/10.1109\/ICCV.2017.374"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01016"},{"key":"e_1_3_2_1_20_1","volume-title":"Renato De Mori, and Yoshua Bengio","author":"Ravanelli Mirco","year":"2021","unstructured":"Mirco Ravanelli , Titouan Parcollet , Peter Plantinga , Aku Rouhe , Samuele Cornell , Loren Lugosch , Cem Subakan , Nauman Dawalatabad , Abdelwahab Heba , Jianyuan Zhong , Ju-Chieh Chou , Sung-Lin Yeh , Szu-Wei Fu , Chien-Feng Liao , Elena Rastorgueva , Fran\u00e7ois Grondin , William Aris , Hwidong Na , Yan Gao , Renato De Mori, and Yoshua Bengio . 2021 . SpeechBrain: A General-Purpose Speech Toolkit. CoRR , Vol. abs\/ 2106 .04624 (2021). showeprint[arXiv]2106.04624 https:\/\/arxiv.org\/abs\/2106.04624 Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, Fran\u00e7ois Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, and Yoshua Bengio. 2021. SpeechBrain: A General-Purpose Speech Toolkit. CoRR, Vol. abs\/2106.04624 (2021). showeprint[arXiv]2106.04624 https:\/\/arxiv.org\/abs\/2106.04624"},{"key":"e_1_3_2_1_21_1","volume-title":"LightFace: A Hybrid Deep Face Recognition Framework. In 2020 Innovations in Intelligent Systems and Applications Conference (ASYU). IEEE, 23--27","author":"Serengil Sefik Ilkin","year":"2020","unstructured":"Sefik Ilkin Serengil and Alper Ozpinar . 2020 . LightFace: A Hybrid Deep Face Recognition Framework. In 2020 Innovations in Intelligent Systems and Applications Conference (ASYU). IEEE, 23--27 . https:\/\/doi.org\/10.1109\/ASYU50717.2020.9259802 Sefik Ilkin Serengil and Alper Ozpinar. 
2020. LightFace: A Hybrid Deep Face Recognition Framework. In 2020 Innovations in Intelligent Systems and Applications Conference (ASYU). IEEE, 23--27. https:\/\/doi.org\/10.1109\/ASYU50717.2020.9259802"},{"key":"e_1_3_2_1_22_1","volume-title":"HyperExtended LightFace: A Facial Attribute Analysis Framework. In 2021 International Conference on Engineering and Emerging Technologies (ICEET). IEEE, 1--4. https:\/\/doi.org\/10","author":"Serengil Sefik Ilkin","year":"2021","unstructured":"Sefik Ilkin Serengil and Alper Ozpinar . 2021 . HyperExtended LightFace: A Facial Attribute Analysis Framework. In 2021 International Conference on Engineering and Emerging Technologies (ICEET). IEEE, 1--4. https:\/\/doi.org\/10 .1109\/ICEET53442.2021.9659697 Sefik Ilkin Serengil and Alper Ozpinar. 2021. HyperExtended LightFace: A Facial Attribute Analysis Framework. In 2021 International Conference on Engineering and Emerging Technologies (ICEET). IEEE, 1--4. https:\/\/doi.org\/10.1109\/ICEET53442.2021.9659697"},{"key":"e_1_3_2_1_23_1","volume-title":"MovieGraphs: Towards Understanding Human-Centric Situations From Videos. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Vicol Paul","year":"2018","unstructured":"Paul Vicol , Makarand Tapaswi , Llu\u00eds Castrej\u00f3n , and Sanja Fidler . 2018 . MovieGraphs: Towards Understanding Human-Centric Situations From Videos. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018 , Salt Lake City, UT, USA , June 18-22, 2018. Computer Vision Foundation \/ IEEE Computer Society, 8581--8590. https:\/\/doi.org\/10.1109\/CVPR.2018.00895 Paul Vicol, Makarand Tapaswi, Llu\u00eds Castrej\u00f3n, and Sanja Fidler. 2018. MovieGraphs: Towards Understanding Human-Centric Situations From Videos. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation \/ IEEE Computer Society, 8581--8590. 
https:\/\/doi.org\/10.1109\/CVPR.2018.00895"},{"key":"e_1_3_2_1_24_1","volume-title":"Deep Relationship Analysis in Video with Multimodal Feature Fusion. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA","author":"Yu Fan","year":"2020","unstructured":"Fan Yu , Dandan Wang , Beibei Zhang , and Tongwei Ren . 2020 . Deep Relationship Analysis in Video with Multimodal Feature Fusion. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA , October 12-16, 2020, Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, and Roger Zimmermann (Eds.). ACM, 4640--4644. https:\/\/doi.org\/10.1145\/3394171.3416303 Fan Yu, Dandan Wang, Beibei Zhang, and Tongwei Ren. 2020. Deep Relationship Analysis in Video with Multimodal Feature Fusion. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12-16, 2020, Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, and Roger Zimmermann (Eds.). ACM, 4640--4644. https:\/\/doi.org\/10.1145\/3394171.3416303"},{"key":"e_1_3_2_1_25_1","volume-title":"Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion. In MM '21: ACM Multimedia Conference","author":"Zhang Beibei","year":"2021","unstructured":"Beibei Zhang , Fan Yu , Yanxin Gao , Tongwei Ren , and Gangshan Wu . 2021 . Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion. In MM '21: ACM Multimedia Conference , Virtual Event, China, October 20 - 24 , 2021,, Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 4848--4852. https:\/\/doi.org\/10.1145\/3474085.3479214 Beibei Zhang, Fan Yu, Yanxin Gao, Tongwei Ren, and Gangshan Wu. 2021. Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion. 
In MM '21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021,, Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 4848--4852. https:\/\/doi.org\/10.1145\/3474085.3479214"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2723009"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3551567","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3551567","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:18Z","timestamp":1750182558000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3551567"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":26,"alternative-id":["10.1145\/3503161.3551567","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3551567","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}