{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T20:49:42Z","timestamp":1769633382661,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":61,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,14]],"date-time":"2022-06-14T00:00:00Z","timestamp":1655164800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,14]]},"DOI":"10.1145\/3524273.3528187","type":"proceedings-article","created":{"date-parts":[[2022,8,5]],"date-time":"2022-08-05T22:23:21Z","timestamp":1659738201000},"page":"136-149","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Unsupervised method for video action segmentation through spatio-temporal and positional-encoded embeddings"],"prefix":"10.1145","author":[{"given":"Guilherme","family":"de A. P. Marques","sequence":"first","affiliation":[{"name":"Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, RJ, Brazil"}]},{"given":"Antonio Jos\u00e9 G.","family":"Busson","sequence":"additional","affiliation":[{"name":"Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, RJ, Brazil"}]},{"given":"Alan L\u00edvio V.","family":"Guedes","sequence":"additional","affiliation":[{"name":"Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, RJ, Brazil"}]},{"given":"Julio Cesar","family":"Duarte","sequence":"additional","affiliation":[{"name":"Military Institute of Engineering (IME), Rio de Janeiro, RJ, Brazil"}]},{"given":"S\u00e9rgio","family":"Colcher","sequence":"additional","affiliation":[{"name":"Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, RJ, Brazil"}]}],"member":"320","published-online":{"date-parts":[[2022,8,5]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Aakur and Sudeep Sarkar","author":"Sathyanarayanan","year":"2019","unstructured":"Sathyanarayanan N. Aakur and Sudeep Sarkar . 2019 . A Perceptual Prediction Framework for Self Supervised Event Segmentation . arXiv:1811.04869 [cs] (April 2019). http:\/\/arxiv.org\/abs\/1811.04869 arXiv: 1811.04869. Sathyanarayanan N. Aakur and Sudeep Sarkar. 2019. A Perceptual Prediction Framework for Self Supervised Event Segmentation. arXiv:1811.04869 [cs] (April 2019). http:\/\/arxiv.org\/abs\/1811.04869 arXiv: 1811.04869."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.495"},{"key":"e_1_3_2_1_3_1","volume-title":"Weakly Supervised Action Labeling in Videos Under Ordering Constraints. arXiv:1407.1208 [cs] (July","author":"Bojanowski Piotr","year":"2014","unstructured":"Piotr Bojanowski , R\u00e9mi Lajugie , Francis Bach , Ivan Laptev , Jean Ponce , Cordelia Schmid , and Josef Sivic . 2014. Weakly Supervised Action Labeling in Videos Under Ordering Constraints. arXiv:1407.1208 [cs] (July 2014 ). http:\/\/arxiv.org\/abs\/1407.1208 arXiv: 1407.1208. Piotr Bojanowski, R\u00e9mi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, and Josef Sivic. 2014. Weakly Supervised Action Labeling in Videos Under Ordering Constraints. arXiv:1407.1208 [cs] (July 2014). http:\/\/arxiv.org\/abs\/1407.1208 arXiv: 1407.1208."},{"key":"e_1_3_2_1_4_1","volume-title":"A Short Note about Kinetics-600. CoRR abs\/1808.01340","author":"Carreira Jo\u00e3o","year":"2018","unstructured":"Jo\u00e3o Carreira , Eric Noland , Andras Banki-Horvath , Chloe Hillier , and Andrew Zisserman . 2018. A Short Note about Kinetics-600. CoRR abs\/1808.01340 ( 2018 ). arXiv:1808.01340 http:\/\/arxiv.org\/abs\/1808.01340 Jo\u00e3o Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, and Andrew Zisserman. 2018. A Short Note about Kinetics-600. CoRR abs\/1808.01340 (2018). arXiv:1808.01340 http:\/\/arxiv.org\/abs\/1808.01340"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Joao Carreira and Andrew Zisserman. 2017. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. 6299--6308. https:\/\/openaccess.thecvf.com\/content_cvpr_2017\/html\/Carreira_Quo_Vadis_Action_CVPR_2017_paper.html  Joao Carreira and Andrew Zisserman. 2017. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. 6299--6308. https:\/\/openaccess.thecvf.com\/content_cvpr_2017\/html\/Carreira_Quo_Vadis_Action_CVPR_2017_paper.html","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_6_1","volume-title":"D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation. arXiv:1901.02598 [cs] (April","author":"Chang Chien-Yi","year":"2019","unstructured":"Chien-Yi Chang , De-An Huang , Yanan Sui , Li Fei-Fei , and Juan Carlos Niebles . 2019. D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation. arXiv:1901.02598 [cs] (April 2019 ). http:\/\/arxiv.org\/abs\/1901.02598 arXiv: 1901.02598. Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, and Juan Carlos Niebles. 2019. D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation. arXiv:1901.02598 [cs] (April 2019). http:\/\/arxiv.org\/abs\/1901.02598 arXiv: 1901.02598."},{"key":"e_1_3_2_1_7_1","volume-title":"NIPS 2014 Workshop on Deep Learning","author":"Chung Junyoung","year":"2014","unstructured":"Junyoung Chung , Caglar Gulcehre , Kyunghyun Cho , and Yoshua Bengio . 2014 . Empirical evaluation of gated recurrent neural networks on sequence modeling . In NIPS 2014 Workshop on Deep Learning , December 2014. Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, December 2014."},{"key":"e_1_3_2_1_8_1","volume-title":"Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment. arXiv:1803.10699 [cs] (March","author":"Ding Li","year":"2018","unstructured":"Li Ding and Chenliang Xu. 2018. Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment. arXiv:1803.10699 [cs] (March 2018 ). http:\/\/arxiv.org\/abs\/1803.10699 arXiv: 1803.10699. Li Ding and Chenliang Xu. 2018. Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment. arXiv:1803.10699 [cs] (March 2018). http:\/\/arxiv.org\/abs\/1803.10699 arXiv: 1803.10699."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2599174"},{"key":"e_1_3_2_1_10_1","volume-title":"SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation. arXiv:2003.14266 [cs] (March","author":"Fayyaz Mohsen","year":"2020","unstructured":"Mohsen Fayyaz and Juergen Gall . 2020 . SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation. arXiv:2003.14266 [cs] (March 2020). http:\/\/arxiv.org\/abs\/2003.14266 arXiv: 2003.14266. Mohsen Fayyaz and Juergen Gall. 2020. SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation. arXiv:2003.14266 [cs] (March 2020). http:\/\/arxiv.org\/abs\/2003.14266 arXiv: 2003.14266."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00028"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Christoph Feichtenhofer Haoqi Fan Jitendra Malik and Kaiming He. 2019. SlowFast Networks for Video Recognition. 6202--6211. https:\/\/openaccess.thecvf.com\/content_ICCV_2019\/html\/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html  Christoph Feichtenhofer Haoqi Fan Jitendra Malik and Kaiming He. 2019. SlowFast Networks for Video Recognition. 6202--6211. https:\/\/openaccess.thecvf.com\/content_ICCV_2019\/html\/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html","DOI":"10.1109\/ICCV.2019.00630"},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning -","volume":"70","author":"Gehring Jonas","unstructured":"Jonas Gehring , Michael Auli , David Grangier , Denis Yarats , and Yann N. Dauphin . 2017. Convolutional Sequence to Sequence Learning . In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia) (ICML'17). JMLR.org, 1243--1252. Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional Sequence to Sequence Learning. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia) (ICML'17). JMLR.org, 1243--1252."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.622"},{"key":"e_1_3_2_1_15_1","volume-title":"Complex Action Segmentation in Compressed Videos. In 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1--6.","author":"Han Hongfeng","year":"2021","unstructured":"Hongfeng Han , Guoxing Yang , Yuqi Huo , Zhiwu Lu , and Ji-Rong Wen . 2021 . Complex Action Segmentation in Compressed Videos. In 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1--6. Hongfeng Han, Guoxing Yang, Yuqi Huo, Zhiwu Lu, and Ji-Rong Wen. 2021. Complex Action Segmentation in Compressed Videos. In 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1--6."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2017.01.010"},{"key":"e_1_3_2_1_17_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation 9, 8 ( 1997 ). Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(81)90024-2"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_20_1","volume-title":"The Kinetics Human Action Video Dataset. arXiv:1705.06950 [cs] (May","author":"Kay Will","year":"2017","unstructured":"Will Kay , Joao Carreira , Karen Simonyan , Brian Zhang , Chloe Hillier , Sudheendra Vijayanarasimhan , Fabio Viola , Tim Green , Trevor Back , Paul Natsev , Mustafa Suleyman , and Andrew Zisserman . 2017. The Kinetics Human Action Video Dataset. arXiv:1705.06950 [cs] (May 2017 ). http:\/\/arxiv.org\/abs\/1705.06950 arXiv: 1705.06950. Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and Andrew Zisserman. 2017. The Kinetics Human Action Video Dataset. arXiv:1705.06950 [cs] (May 2017). http:\/\/arxiv.org\/abs\/1705.06950 arXiv: 1705.06950."},{"key":"e_1_3_2_1_21_1","volume-title":"BMVC 2008 - 19th British Machine Vision Conference, Mark Everingham, Chris Needham, and Roberto Fraile (Eds.). British Machine Vision Association","author":"Klaser Alexander","year":"2008","unstructured":"Alexander Klaser , Marcin Marszalek , and Cordelia Schmid . 2008 . A SpatioTemporal Descriptor Based on 3D-Gradients . In BMVC 2008 - 19th British Machine Vision Conference, Mark Everingham, Chris Needham, and Roberto Fraile (Eds.). British Machine Vision Association , Leeds, United Kingdom, 275:1--10. https:\/\/hal.inria.fr\/inria-00514853 Alexander Klaser, Marcin Marszalek, and Cordelia Schmid. 2008. A SpatioTemporal Descriptor Based on 3D-Gradients. In BMVC 2008 - 19th British Machine Vision Conference, Mark Everingham, Chris Needham, and Roberto Fraile (Eds.). British Machine Vision Association, Leeds, United Kingdom, 275:1--10. https:\/\/hal.inria.fr\/inria-00514853"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.364"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01576"},{"key":"e_1_3_2_1_24_1","volume-title":"String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (Jan.","author":"Kruskal JB","year":"1983","unstructured":"JB Kruskal and Mark Liberman . 1983. The symmetric time-warping problem: From continuous to discrete. Time Warps , String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (Jan. 1983 ). JB Kruskal and Mark Liberman. 1983. The symmetric time-warping problem: From continuous to discrete. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (Jan. 1983)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.105"},{"key":"e_1_3_2_1_26_1","volume-title":"Weakly supervised learning of actions from transcripts. arXiv:1610.02237 [cs] (June","author":"Kuehne Hilde","year":"2017","unstructured":"Hilde Kuehne , Alexander Richard , and Juergen Gall . 2017. Weakly supervised learning of actions from transcripts. arXiv:1610.02237 [cs] (June 2017 ). http:\/\/arxiv.org\/abs\/1610.02237 arXiv: 1610.02237. Hilde Kuehne, Alexander Richard, and Juergen Gall. 2017. Weakly supervised learning of actions from transcripts. arXiv:1610.02237 [cs] (June 2017). http:\/\/arxiv.org\/abs\/1610.02237 arXiv: 1610.02237."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2884469"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800020109"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800030404"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Anna Kukleva Hilde Kuehne Fadime Sener and Jurgen Gall. 2019. Unsupervised Learning of Action Classes With Continuous Temporal Embedding. 12066--12074. https:\/\/openaccess.thecvf.com\/content_CVPR_2019\/html\/Kukleva_Unsupervised_Learning_of_Action_Classes_With_Continuous_Temporal_Embedding_CVPR_2019_paper.html  Anna Kukleva Hilde Kuehne Fadime Sener and Jurgen Gall. 2019. Unsupervised Learning of Action Classes With Continuous Temporal Embedding. 12066--12074. https:\/\/openaccess.thecvf.com\/content_CVPR_2019\/html\/Kukleva_Unsupervised_Learning_of_Action_Classes_With_Continuous_Temporal_Embedding_CVPR_2019_paper.html","DOI":"10.1109\/CVPR.2019.01234"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Heeseung Kwon Manjin Kim Suha Kwak and Minsu Cho. 2020. MotionSqueeze: Neural Motion Feature learning for Video Understanding. In ECCV.  Heeseung Kwon Manjin Kim Suha Kwak and Minsu Cho. 2020. MotionSqueeze: Neural Motion Feature learning for Video Understanding. In ECCV.","DOI":"10.1007\/978-3-030-58517-4_21"},{"key":"e_1_3_2_1_32_1","volume-title":"Weakly Supervised Energy-Based Learning for Action Segmentation. arXiv:1909.13155 [cs] (Sept","author":"Li Jun","year":"2019","unstructured":"Jun Li , Peng Lei , and Sinisa Todorovic . 2019. Weakly Supervised Energy-Based Learning for Action Segmentation. arXiv:1909.13155 [cs] (Sept . 2019 ). http:\/\/arxiv.org\/abs\/1909.13155 arXiv: 1909.13155. Jun Li, Peng Lei, and Sinisa Todorovic. 2019. Weakly Supervised Energy-Based Learning for Action Segmentation. arXiv:1909.13155 [cs] (Sept. 2019). http:\/\/arxiv.org\/abs\/1909.13155 arXiv: 1909.13155."},{"key":"e_1_3_2_1_33_1","volume-title":"Action Shuffle Alternating Learning for Unsupervised Action Segmentation. arXiv:2104.02116 [cs] (April","author":"Li Jun","year":"2021","unstructured":"Jun Li and Sinisa Todorovic . 2021. Action Shuffle Alternating Learning for Unsupervised Action Segmentation. arXiv:2104.02116 [cs] (April 2021 ). http:\/\/arxiv.org\/abs\/2104.02116 arXiv: 2104.02116. Jun Li and Sinisa Todorovic. 2021. Action Shuffle Alternating Learning for Unsupervised Action Segmentation. arXiv:2104.02116 [cs] (April 2021). http:\/\/arxiv.org\/abs\/2104.02116 arXiv: 2104.02116."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00718"},{"key":"e_1_3_2_1_35_1","unstructured":"J. Macqueen. 1967. Some methods for classification and analysis of multivariate observations. In In 5-th Berkeley Symposium on Mathematical Statistics and Probability. 281--297.  J. Macqueen. 1967. Some methods for classification and analysis of multivariate observations. In In 5-th Berkeley Symposium on Mathematical Statistics and Probability. 281--297."},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of the Brazilian Symposium on Multimedia and the Web. 97--104","author":"Mendes P.","unstructured":"P. Mendes , A. Busson , S. Colcher , D. Schwabe , A. Guedes , and C. Laufer . 2020. A Cluster-Matching-Based Method for Video Face Recognition . In Proceedings of the Brazilian Symposium on Multimedia and the Web. 97--104 . P. Mendes, A. Busson, S. Colcher, D. Schwabe, A. Guedes, and C. Laufer. 2020. A Cluster-Matching-Based Method for Video Face Recognition. In Proceedings of the Brazilian Symposium on Multimedia and the Web. 97--104."},{"key":"e_1_3_2_1_37_1","volume-title":"Shaping the Video Conferences of Tomorrow With AI. In Anais Estendidos do XXVI Simp\u00f3sio Brasileiro de Sistemas Multim\u00eddia e Web. SBC, 165--168","author":"Mendes Paulo Renato C","year":"2020","unstructured":"Paulo Renato C Mendes , Eduardo S Vieira , Pedro Vinicius A de Freitas , Antonio Jos\u00e9 G Busson , \u00c1lan L\u00edvio V Guedes , Carlos de Salles Soares Neto , and S\u00e9rgio Colcher . 2020 . Shaping the Video Conferences of Tomorrow With AI. In Anais Estendidos do XXVI Simp\u00f3sio Brasileiro de Sistemas Multim\u00eddia e Web. SBC, 165--168 . Paulo Renato C Mendes, Eduardo S Vieira, Pedro Vinicius A de Freitas, Antonio Jos\u00e9 G Busson, \u00c1lan L\u00edvio V Guedes, Carlos de Salles Soares Neto, and S\u00e9rgio Colcher. 2020. Shaping the Video Conferences of Tomorrow With AI. In Anais Estendidos do XXVI Simp\u00f3sio Brasileiro de Sistemas Multim\u00eddia e Web. SBC, 165--168."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1137\/0105003"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2009.11.014"},{"key":"e_1_3_2_1_40_1","volume-title":"Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling. arXiv:1703.08132 [cs] (Oct","author":"Richard Alexander","year":"2017","unstructured":"Alexander Richard , Hilde Kuehne , and Juergen Gall . 2017. Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling. arXiv:1703.08132 [cs] (Oct . 2017 ). http:\/\/arxiv.org\/abs\/1703.08132 arXiv: 1703.08132. Alexander Richard, Hilde Kuehne, and Juergen Gall. 2017. Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling. arXiv:1703.08132 [cs] (Oct. 2017). http:\/\/arxiv.org\/abs\/1703.08132 arXiv: 1703.08132."},{"key":"e_1_3_2_1_41_1","volume-title":"Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling. CoRR abs\/1703.08132","author":"Richard Alexander","year":"2017","unstructured":"Alexander Richard , Hilde Kuehne , and Juergen Gall . 2017. Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling. CoRR abs\/1703.08132 ( 2017 ). arXiv:1703.08132 http:\/\/arxiv.org\/abs\/1703.08132 Alexander Richard, Hilde Kuehne, and Juergen Gall. 2017. Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling. CoRR abs\/1703.08132 (2017). arXiv:1703.08132 http:\/\/arxiv.org\/abs\/1703.08132"},{"key":"e_1_3_2_1_42_1","volume-title":"NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning. arXiv:1805.06875 [cs] (May","author":"Richard Alexander","year":"2018","unstructured":"Alexander Richard , Hilde Kuehne , Ahsan Iqbal , and Juergen Gall . 2018. NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning. arXiv:1805.06875 [cs] (May 2018 ). http:\/\/arxiv.org\/abs\/1805.06875 arXiv: 1805.06875. Alexander Richard, Hilde Kuehne, Ahsan Iqbal, and Juergen Gall. 2018. NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning. arXiv:1805.06875 [cs] (May 2018). http:\/\/arxiv.org\/abs\/1805.06875 arXiv: 1805.06875."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323503.3345029"},{"key":"e_1_3_2_1_44_1","volume-title":"Efficient Parameter-free Clustering Using First Neighbor Relations. (Feb","author":"Sarfraz M. Saquib","year":"2019","unstructured":"M. Saquib Sarfraz , Vivek Sharma , and Rainer Stiefelhagen . 2019. Efficient Parameter-free Clustering Using First Neighbor Relations. (Feb . 2019 ). https:\/\/arxiv.org\/abs\/1902.11266v1 M. Saquib Sarfraz, Vivek Sharma, and Rainer Stiefelhagen. 2019. Efficient Parameter-free Clustering Using First Neighbor Relations. (Feb. 2019). https:\/\/arxiv.org\/abs\/1902.11266v1"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01107"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587730"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00873"},{"key":"e_1_3_2_1_48_1","volume-title":"Self-Attention with Relative Position Representations. CoRR abs\/1803.02155","author":"Shaw Peter","year":"2018","unstructured":"Peter Shaw , Jakob Uszkoreit , and Ashish Vaswani . 2018. Self-Attention with Relative Position Representations. CoRR abs\/1803.02155 ( 2018 ). arXiv:1803.02155 http:\/\/arxiv.org\/abs\/1803.02155 Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-Attention with Relative Position Representations. CoRR abs\/1803.02155 (2018). arXiv:1803.02155 http:\/\/arxiv.org\/abs\/1803.02155"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_31"},{"key":"e_1_3_2_1_50_1","volume-title":"Advances in Neural Information Processing Systems 27","author":"Simonyan Karen","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Two-Stream Convolutional Networks for Action Recognition in Videos . In Advances in Neural Information Processing Systems 27 , Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc. , 568--576. http:\/\/papers.nips.cc\/paper\/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf Karen Simonyan and Andrew Zisserman. 2014. Two-Stream Convolutional Networks for Action Recognition in Videos. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 568--576. http:\/\/papers.nips.cc\/paper\/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf"},{"key":"e_1_3_2_1_51_1","volume-title":"Fast Weakly Supervised Action Segmentation Using Mutual Consistency. arXiv:1904.03116 [cs] (March","author":"Souri Yaser","year":"2020","unstructured":"Yaser Souri , Mohsen Fayyaz , Luca Minciullo , Gianpiero Francesca , and Juergen Gall . 2020. Fast Weakly Supervised Action Segmentation Using Mutual Consistency. arXiv:1904.03116 [cs] (March 2020 ). http:\/\/arxiv.org\/abs\/1904.03116 arXiv: 1904.03116. Yaser Souri, Mohsen Fayyaz, Luca Minciullo, Gianpiero Francesca, and Juergen Gall. 2020. Fast Weakly Supervised Action Segmentation Using Mutual Consistency. arXiv:1904.03116 [cs] (March 2020). http:\/\/arxiv.org\/abs\/1904.03116 arXiv: 1904.03116."},{"key":"e_1_3_2_1_52_1","volume-title":"PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. CoRR abs\/1709.02371","author":"Sun Deqing","year":"2017","unstructured":"Deqing Sun , Xiaodong Yang , Ming-Yu Liu , and Jan Kautz . 2017. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. CoRR abs\/1709.02371 ( 2017 ). arXiv:1709.02371 http:\/\/arxiv.org\/abs\/1709.02371 Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. 2017. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. CoRR abs\/1709.02371 (2017). arXiv:1709.02371 http:\/\/arxiv.org\/abs\/1709.02371"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5201\/ipol.2013.26"},{"key":"e_1_3_2_1_54_1","volume-title":"arXiv:1706.03762 [cs] (Dec","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention Is All You Need. arXiv:1706.03762 [cs] (Dec . 2017 ). http:\/\/arxiv.org\/abs\/1706.03762 arXiv: 1706.03762. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. arXiv:1706.03762 [cs] (Dec. 2017). http:\/\/arxiv.org\/abs\/1706.03762 arXiv: 1706.03762."},{"key":"e_1_3_2_1_55_1","volume-title":"Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences. arXiv:2001.11122 [cs] (Sept","author":"VidalMata Rosaura G.","year":"2020","unstructured":"Rosaura G. VidalMata , Walter J. Scheirer , Anna Kukleva , David Cox , and Hilde Kuehne . 2020. Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences. arXiv:2001.11122 [cs] (Sept . 2020 ). http:\/\/arxiv.org\/abs\/2001.11122 arXiv: 2001.11122. Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, and Hilde Kuehne. 2020. Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences. arXiv:2001.11122 [cs] (Sept. 2020). http:\/\/arxiv.org\/abs\/2001.11122 arXiv: 2001.11122."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"crossref","unstructured":"Heng Wang and Cordelia Schmid. 2013. Action Recognition with Improved Trajectories. 3551--3558. https:\/\/openaccess.thecvf.com\/content_iccv_2013\/html\/Wang_Action_Recognition_with_2013_ICCV_paper.html  Heng Wang and Cordelia Schmid. 2013. Action Recognition with Improved Trajectories. 3551--3558. https:\/\/openaccess.thecvf.com\/content_iccv_2013\/html\/Wang_Action_Recognition_with_2013_ICCV_paper.html","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_3_2_1_58_1","volume-title":"Nonlocal Neural Networks. CoRR abs\/1711.07971","author":"Wang Xiaolong","year":"2017","unstructured":"Xiaolong Wang , Ross B. Girshick , Abhinav Gupta , and Kaiming He. 2017. Nonlocal Neural Networks. CoRR abs\/1711.07971 ( 2017 ). arXiv:1711.07971 http:\/\/arxiv.org\/abs\/1711.07971 Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. 2017. Nonlocal Neural Networks. CoRR abs\/1711.07971 (2017). arXiv:1711.07971 http:\/\/arxiv.org\/abs\/1711.07971"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00631"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2986861"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299101"}],"event":{"name":"MMSys '22: 13th ACM Multimedia Systems Conference","location":"Athlone Ireland","acronym":"MMSys '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia","SIGCOMM ACM Special Interest Group on Data Communication","SIGMOBILE ACM Special Interest Group on Mobility of Systems, Users, Data and Computing"]},"container-title":["Proceedings of the 13th ACM Multimedia Systems Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524273.3528187","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524273.3528187","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:06Z","timestamp":1750188666000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524273.3528187"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,14]]},"references-count":61,"alternative-id":["10.1145\/3524273.3528187","10.1145\/3524273"],"URL":"https:\/\/doi.org\/10.1145\/3524273.3528187","relation":{},"subject":[],"published":{"date-parts":[[2022,6,14]]},"assertion":[{"value":"2022-08-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}