{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T22:41:27Z","timestamp":1764715287477,"version":"build-2065373602"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>\n            Advances in computer vision and deep learning have made video-based Human Action Recognition (HAR) increasingly feasible. However, running HAR on live video streams encounters significant delays on embedded platforms due to computational demands. This work addresses real-time HAR performance challenges through four key contributions: (1) an experimental study identifying standard Optical Flow (OF) extraction as the primary latency bottleneck in a state-of-the-art HAR pipeline, (2) an analysis of the latency-accuracy trade-off between traditional and deep learning-based OF methods, underscoring the need for an efficient motion feature extractor with minimal impact on accuracy, (3) the design of\n            <jats:italic toggle=\"yes\">Integrated Motion Feature Extractor (IMFE)<\/jats:italic>\n            , a novel unified neural network architecture that substantially reduces motion feature extraction latency, and (4) the development of\n            <jats:bold>RT-HARE<\/jats:bold>\n            , a real-time HAR system optimized for embedded platforms. 
Experiments on three benchmark datasets of various characteristics using the Nvidia Jetson Xavier NX platform demonstrate that RT-HARE achieves real-time HAR with lower and more stable latency, reduced power consumption, and a smaller memory footprint while maintaining recognition accuracy comparable to more complex server-based HAR models.\n          <\/jats:p>","DOI":"10.1145\/3761795","type":"journal-article","created":{"date-parts":[[2025,8,16]],"date-time":"2025-08-16T11:07:12Z","timestamp":1755342432000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Real-Time Video-Based Human Action Recognition on Embedded Platforms"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3430-5411","authenticated-orcid":false,"given":"Ruiqi","family":"Wang","sequence":"first","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8767-4920","authenticated-orcid":false,"given":"Zichen","family":"Wang","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2755-104X","authenticated-orcid":false,"given":"Peiqi","family":"Gao","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. 
Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-8381-3216","authenticated-orcid":false,"given":"Mingzhen","family":"Li","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6251-816X","authenticated-orcid":false,"given":"Jaehwan","family":"Jeong","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3722-1260","authenticated-orcid":false,"given":"Yihang","family":"Xu","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5140-9952","authenticated-orcid":false,"given":"Yejin","family":"Lee","sequence":"additional","affiliation":[{"name":"Occupational Therapy, Washington University School of Medicine in Saint Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7972-1200","authenticated-orcid":false,"given":"Carolyn","family":"Baum","sequence":"additional","affiliation":[{"name":"Occupational Therapy, Washington University School of Medicine in Saint Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. 
Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9776-8328","authenticated-orcid":false,"given":"Lisa","family":"Connor","sequence":"additional","affiliation":[{"name":"Occupational Therapy, Washington University School of Medicine in Saint Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1709-6769","authenticated-orcid":false,"given":"Chenyang","family":"Lu","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering, Washington University in St Louis","place":["St. Louis, United States"]},{"name":"AI for Health Institute, Washington University in St Louis","place":["St. Louis, United States"]}]}],"member":"320","published-online":{"date-parts":[[2025,9,26]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1007\/978-3-642-25446-8_4","volume-title":"Proceedings of the Human Behavior Understanding: Second International Workshop, HBU 2011, Amsterdam, The Netherlands, November 16, 2011. Proceedings 2","author":"Baccouche Moez","year":"2011","unstructured":"Moez Baccouche, Franck Mamalet, Christian Wolf, Christophe Garcia, and Atilla Baskurt. 2011. Sequential deep learning for human action recognition. In Proceedings of the Human Behavior Understanding: Second International Workshop, HBU 2011, Amsterdam, The Netherlands, November 16, 2011. Proceedings 2. Springer, 29\u201339."},{"issue":"12","key":"e_1_3_3_3_2","doi-asserted-by":"crossref","first-page":"2799","DOI":"10.1109\/TPAMI.2017.2769085","article-title":"Action recognition with dynamic image networks","volume":"40","author":"Bilen Hakan","year":"2017","unstructured":"Hakan Bilen, Basura Fernando, Efstratios Gavves, and Andrea Vedaldi. 2017. Action recognition with dynamic image networks. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 12 (2017), 2799\u20132813.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_3_4_2","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1007\/978-3-642-33783-3_44","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12","author":"Butler Daniel J","year":"2012","unstructured":"Daniel J Butler, Jonas Wulff, Garrett B Stanley, and Michael J Black. 2012. A naturalistic open source movie for optical flow evaluation. In Proceedings of the Computer Vision\u2013ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12. Springer, 611\u2013625."},{"key":"e_1_3_3_5_2","unstructured":"Joao Carreira and Andrew Zisserman. 2018. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. arxiv:1705.07750. Retrieved from https:\/\/arxiv.org\/abs\/1705.07750"},{"key":"e_1_3_3_6_2","unstructured":"Guodong Ding Fadime Sener and Angela Yao. 2023. Temporal Action Segmentation: An Analysis of Modern Techniques. arxiv:2210.10352. Retrieved from https:\/\/arxiv.org\/abs\/2210.10352"},{"key":"e_1_3_3_7_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Dong Qiaole","year":"2024","unstructured":"Qiaole Dong and Yanwei Fu. 2024. MemFlow: Optical flow estimation and prediction with memory. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_3_8_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arxiv:2010.11929. 
Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00089"},{"key":"e_1_3_3_10_2","unstructured":"Yazan Abu Farha and Juergen Gall. 2019. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation. arxiv:1903.01945. Retrieved from https:\/\/arxiv.org\/abs\/1903.01945"},{"key":"e_1_3_3_11_2","doi-asserted-by":"crossref","first-page":"3281","DOI":"10.1109\/CVPR.2011.5995444","volume-title":"Proceedings of the CVPR 2011","author":"Fathi Alireza","year":"2011","unstructured":"Alireza Fathi, Xiaofeng Ren, and James M Rehg. 2011. Learning to recognize objects in egocentric activities. In Proceedings of the CVPR 2011. IEEE, 3281\u20133288."},{"key":"e_1_3_3_12_2","unstructured":"Edward Fish Jon Weinbren and Andrew Gilbert. 2022. Two-stream transformer architecture for long form video understanding. In BMVC."},{"key":"e_1_3_3_13_2","unstructured":"Jiyang Gao Zhenheng Yang and Ram Nevatia. 2017. RED: Reinforced encoder-decoder networks for action anticipation. arXiv:1707.04818. Retrieved from https:\/\/arxiv.org\/abs\/1707.04818. (2017)."},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354978"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"e_1_3_3_17_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network. arxiv:1503.02531. Retrieved from https:\/\/arxiv.org\/abs\/1503.02531"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_3_3_19_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arxiv:1704.04861. 
Retrieved from https:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC60832.2023.00046"},{"issue":"1","key":"e_1_3_3_21_2","doi-asserted-by":"crossref","first-page":"17996","DOI":"10.1038\/s41598-023-45149-5","article-title":"A lightweight hybrid vision transformer network for radar-based human activity recognition","volume":"13","author":"Huan Sha","year":"2023","unstructured":"Sha Huan, Zhaoyue Wang, Xiaoqiang Wang, Limei Wu, Xiaoxuan Yang, Hongming Huang, and Gan E Dai. 2023. A lightweight hybrid vision transformer network for radar-based human activity recognition. Scientific Reports 13, 1 (2023), 17996.","journal-title":"Scientific Reports"},{"key":"e_1_3_3_22_2","first-page":"668","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Huang Zhaoyang","year":"2022","unstructured":"Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, and Hongsheng Li. 2022. Flowformer: A transformer architecture for optical flow. In Proceedings of the European Conference on Computer Vision. Springer, 668\u2013685."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","unstructured":"Altaf Hussain Tanveer Hussain Waseem Ullah and Sung Wook Baik. 2022. Vision transformer and deep sequence learning for human activity recognition in surveillance videos. Computational Intelligence and Neuroscience 2022 1 (2022) 3454167. DOI:10.1155\/2022\/3454167","DOI":"10.1155\/2022\/3454167"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.179"},{"key":"e_1_3_3_25_2","first-page":"2321","article-title":"Alleviating Over-segmentation errors by detecting action boundaries","author":"Ishikawa Yuchi","year":"2020","unstructured":"Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, and Hirokatsu Kataoka. 2020. Alleviating Over-segmentation errors by detecting action boundaries. 
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) (2020), 2321\u20132330.","journal-title":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00033"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00963"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS52674.2021.00038"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","unstructured":"Jinsoo Kim and Jeongho Cho. 2021. Low-cost embedded system using convolutional neural networks-based spatiotemporal feature map for real-time human action recognition. Applied Sciences 11 11 (2021). DOI:10.3390\/app11114940","DOI":"10.3390\/app11114940"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2016.10"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","unstructured":"Xiangbo Kong Zelin Meng Naoto Nojiri Yuji Iwahori Lin Meng and Hiroyuki Tomiyama. 2019. A HOG-SVM based fall detection IoT system for elderly persons using deep sensor. Procedia Computer Science 147 (2019) 276\u2013282. DOI:10.1016\/j.procs.2019.01.264","DOI":"10.1016\/j.procs.2019.01.264"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995496"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.113"},{"key":"e_1_3_3_35_2","first-page":"36","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14","author":"Lea Colin","year":"2016","unstructured":"Colin Lea, Austin Reiter, Ren\u00e9 Vidal, and Gregory D Hager. 2016. Segmental spatiotemporal CNNs for fine-grained action segmentation. 
In Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14. Springer, 36\u201352."},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW59228.2023.00555"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS59052.2023.00023"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS59052.2023.00027"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00718"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3560905.3568520"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485730.3485938"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00034"},{"key":"e_1_3_3_43_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv:1711.05101. Retrieved from https:\/\/arxiv.org\/abs\/1711.05101. (2017)."},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.438"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i5.28224"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","unstructured":"Asanka G. Perera Yee Wei Law and Javaan Chahl. 2019. Drone-action: An outdoor recorded drone video dataset for action recognition. Drones 3 4 (2019). DOI:10.3390\/drones3040082","DOI":"10.3390\/drones3040082"},{"issue":"1","key":"e_1_3_3_47_2","first-page":"140","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. 
Journal of Machine Learning Research 21, 1, Article 140 (Jan. 2020), 67 pages.","journal-title":"Journal of Machine Learning Research"},{"issue":"6","key":"e_1_3_3_48_2","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1007\/s00138-023-01475-2","article-title":"Vision-based approach to assess performance levels while eating","volume":"34","author":"Raza Muhammad Ahmed","year":"2023","unstructured":"Muhammad Ahmed Raza and Robert B Fisher. 2023. Vision-based approach to assess performance levels while eating. Machine Vision and Applications 34, 6 (2023), 124.","journal-title":"Machine Vision and Applications"},{"key":"e_1_3_3_49_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2020. DistilBERT a distilled version of BERT: smaller faster cheaper and lighter. arxiv:1910.01108. Retrieved from https:\/\/arxiv.org\/abs\/1910.01108"},{"key":"e_1_3_3_50_2","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1007\/978-3-319-46448-0_31","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14","author":"Sigurdsson Gunnar A","year":"2016","unstructured":"Gunnar A Sigurdsson, G\u00fcl Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. 2016. Hollywood in homes: Crowdsourcing data collection for activity understanding. In Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14. Springer, 510\u2013526."},{"key":"e_1_3_3_51_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems. Curran Associates Inc. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2014\/file\/ca007296a63f7d1721a2399d56363022-Paper.pdf"},{"key":"e_1_3_3_52_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cogsys.2022.10.003"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-022-07883-1"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","unstructured":"Sebastian Stein and Stephen J. McKenna. 2013. Combining embedded accelerometers with computer vision for recognizing food preparation activities. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp\u201913). Association for Computing Machinery Zurich Switzerland 729\u2013738. DOI:10.1145\/2493432.2493482","DOI":"10.1145\/2493432.2493482"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01708"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.5201\/ipol.2013.26"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"e_1_3_3_59_2","first-page":"7565","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Wang Xiang","year":"2021","unstructured":"Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, and Nong Sang. 2021. OadTR: Online action detection with transformers. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 
7565\u20137575."},{"key":"e_1_3_3_60_2","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1007\/978-3-030-58595-2_3","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXV 16","author":"Wang Zhenzhi","year":"2020","unstructured":"Zhenzhi Wang, Ziteng Gao, Limin Wang, Zhifeng Li, and Gangshan Wu. 2020. Boundary-aware cascade networks for temporal action segmentation. In Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXV 16. Springer, 34\u201351."},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581791.3596870"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00563"},{"key":"e_1_3_3_63_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)","author":"Xu Mingze","year":"2021","unstructured":"Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, and Stefano Soatto. 2021. Long short-term transformer for online action detection. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01367"},{"key":"e_1_3_3_65_2","unstructured":"Fangqiu Yi Hongyu Wen and Tingting Jiang. 2021. ASFormer: Transformer for Action Segmentation. arxiv:2110.08568. Retrieved from https:\/\/arxiv.org\/abs\/2110.08568"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01707"},{"key":"e_1_3_3_67_2","first-page":"485","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Zhao Yue","year":"2022","unstructured":"Yue Zhao and Philipp Kr\u00e4henb\u00fchl. 2022. Real-time online video detection with temporal smoothing transformers. In Proceedings of the European Conference on Computer Vision. 
Springer, 485\u2013502."},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW59228.2023.00567"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3761795","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T14:06:13Z","timestamp":1759500373000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3761795"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,26]]},"references-count":67,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3761795"],"URL":"https:\/\/doi.org\/10.1145\/3761795","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2025,9,26]]},"assertion":[{"value":"2025-08-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}