{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T07:20:09Z","timestamp":1768634409570,"version":"3.49.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"crossref","award":["JP21H03519 and JP24H00733"],"award-info":[{"award-number":["JP21H03519 and JP24H00733"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Open-vocabulary Temporal Action Detection (Open-vocab TAD) extends the detection scope of Closed-vocabulary Temporal Action Detection (Closed-vocab TAD) to unseen action classes specified by vocabularies not included in the training data, within untrimmed video. Typical Open-vocab TAD methods adopt a two-stage approach that first proposes candidate action intervals and then identifies those actions. However, errors in the first stage can affect the subsequent stage and the final detection results. Moreover, conventional methods for temporal context analyses tend to focus solely on either global or local context. Focusing solely on the global context can lead to lack of momentary detail, making it difficult to distinguish one action from another. Conversely, focusing only on the local context makes it challenging to determine the start and end timings of action intervals. To address these challenges, we introduce a one-stage approach named Hierarchical Open-vocab TAD (HOTAD), consisting of two branches: Temporal Context Analysis (TCA) and Video\u2013Text Alignment (VTA). The former utilizes Hierarchical Encoder (HE) to fuse global and local temporal features, enabling a comprehensive capture of temporal actions, while the latter branch exploits the synergy between visual and textual modalities for precisely detecting unseen actions in the Open-vocab setting. Experiments and in-depth analysis using the widely recognized datasets THUMOS14 and ActivityNet-1.3 are performed to show the effectiveness of HOTAD. The results highlight remarkable accuracy in detecting a wide range of unseen actions. 
Furthermore, HOTAD significantly reduces incorrect labels and localizes action instances with high precision, showcasing its robustness in complex and dynamic video settings.<\/jats:p>","DOI":"10.1145\/3773986","type":"journal-article","created":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T14:55:10Z","timestamp":1761836110000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Hierarchical Global\u2013Local Fusion for One-stage Open-vocabulary Temporal Action Detection"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8976-2922","authenticated-orcid":false,"given":"Trung Thanh","family":"Nguyen","sequence":"first","affiliation":[{"name":"Nagoya University, Nagoya, Japan and RIKEN, Kyoto, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3799-4550","authenticated-orcid":false,"given":"Yasutomo","family":"Kawanishi","sequence":"additional","affiliation":[{"name":"RIKEN, Kyoto, Japan and Nagoya University, Nagoya, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3041-4330","authenticated-orcid":false,"given":"Takahiro","family":"Komamizu","sequence":"additional","affiliation":[{"name":"Nagoya University, Nagoya, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3942-9296","authenticated-orcid":false,"given":"Ichiro","family":"Ide","sequence":"additional","affiliation":[{"name":"Nagoya University, Nagoya, Japan"}]}],"member":"320","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"264","volume-title":"Proceedings of the 15th European Conference on Computer Vision","volume":"9","author":"Alwassel Humam","year":"2018","unstructured":"Humam Alwassel, Fabian Caba Heilbron, Victor Escorcia, and Bernard Ghanem. 2018. Diagnosing error in temporal action detectors. In Proceedings of the 15th European Conference on Computer Vision, Vol. 9, 264\u2013280."},{"key":"e_1_3_2_3_2","first-page":"2979","volume-title":"Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Bao Wentao","year":"2022","unstructured":"Wentao Bao, Qi Yu, and Yu Kong. 2022. OpenTAL: Towards open set temporal action localization. In Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2979\u20132989."},{"key":"e_1_3_2_4_2","first-page":"5561","volume-title":"Proceedings of the 16th IEEE International Conference on Computer Vision","author":"Bodla Navaneeth","year":"2017","unstructured":"Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S. Davis. 2017. Soft-NMS\u2014Improving object detection with one line of code. In Proceedings of the 16th IEEE International Conference on Computer Vision, 5561\u20135569."},{"key":"e_1_3_2_5_2","first-page":"213","volume-title":"Proceedings of the 16th European Conference on Computer Vision","volume":"1","author":"Carion Nicolas","year":"2020","unstructured":"Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In Proceedings of the 16th European Conference on Computer Vision, Vol. 1, 213\u2013229."},{"key":"e_1_3_2_6_2","first-page":"6299","volume-title":"Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition","author":"Carreira Joao","year":"2017","unstructured":"Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the Kinetics dataset. 
In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 6299\u20136308."},{"key":"e_1_3_2_7_2","first-page":"14741","volume-title":"Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen Mengyuan","year":"2023","unstructured":"Mengyuan Chen, Junyu Gao, and Changsheng Xu. 2023. Cascade evidential learning for open-world weakly-supervised temporal action localization. In Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14741\u201314750."},{"key":"e_1_3_2_8_2","first-page":"2793","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Dong Yihe","year":"2021","unstructured":"Yihe Dong, Jean-Baptiste Cordonnier, and Andreas Loukas. 2021. Attention is not all you need: Pure attention loses rank doubly exponentially with depth. In Proceedings of the 38th International Conference on Machine Learning, 2793\u20132803."},{"key":"e_1_3_2_9_2","first-page":"961","volume-title":"Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition","author":"Fabian Caba Heilbron","year":"2015","unstructured":"Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. ActivityNet: A large-scale video benchmark for human activity understanding. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, 961\u2013970."},{"key":"e_1_3_2_10_2","unstructured":"Xiuye Gu Tsung-Yi Lin Weicheng Kuo and Yin Cui. 2021. Open-vocabulary object detection via vision and language knowledge distillation. arXiv:2104.13921. Retrieved from https:\/\/arxiv.org\/abs\/2104.13921"},{"key":"e_1_3_2_11_2","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (GELUs). arXiv:1606.08415. Retrieved from https:\/\/arxiv.org\/abs\/1606.08415"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295748"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2016.10.018"},{"key":"e_1_3_2_14_2","first-page":"4904","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision\u2013language representation learning with noisy text supervision. In Proceedings of the 38th International Conference on Machine Learning, 4904\u20134916."},{"key":"e_1_3_2_15_2","first-page":"105","volume-title":"Proceedings of the 17th European Conference on Computer Vision","volume":"35","author":"Ju Chen","year":"2022","unstructured":"Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, and Weidi Xie. 2022. Prompting visual\u2013language models for efficient video understanding. In Proceedings of the 17th European Conference on Computer Vision, Vol. 35, 105\u2013124."},{"key":"e_1_3_2_16_2","unstructured":"Chen Ju Zeqian Li Peisen Zhao Ya Zhang Xiaopeng Zhang Qi Tian Yanfeng Wang and Weidi Xie. 2023. Multi-modal prompting for low-shot temporal action localization. arXiv:2303.11732. Retrieved from https:\/\/arxiv.org\/abs\/2303.11732"},{"key":"e_1_3_2_17_2","unstructured":"Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev et al. 2017. The Kinetics human action video dataset. arXiv:1705.06950. 
Retrieved from https:\/\/arxiv.org\/abs\/1705.06950"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3395778"},{"key":"e_1_3_2_19_2","first-page":"3320","volume-title":"Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lin Chuming","year":"2021","unstructured":"Chuming Lin, Chengming Xu, Donghao Luo, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, and Yanwei Fu. 2021. Learning salient boundary feature for anchor-free temporal action localization. In Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3320\u20133329."},{"key":"e_1_3_2_20_2","first-page":"3889","volume-title":"Proceedings of the 17th IEEE\/CVF International Conference on Computer Vision","author":"Lin Tianwei","year":"2019","unstructured":"Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, and Shilei Wen. 2019. BMN: Boundary-matching network for temporal action proposal generation. In Proceedings of the 17th IEEE\/CVF International Conference on Computer Vision, 3889\u20133898."},{"key":"e_1_3_2_21_2","first-page":"2980","volume-title":"Proceedings of the 16th IEEE International Conference on Computer Vision","author":"Lin Tsung-Yi","year":"2017","unstructured":"Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll\u00e1r. 2017. Focal loss for dense object detection. In Proceedings of the 16th IEEE International Conference on Computer Vision, 2980\u20132988."},{"key":"e_1_3_2_22_2","first-page":"11612","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"Liu Qinying","year":"2020","unstructured":"Qinying Liu and Zilei Wang. 2020. Progressive boundary refinement network for temporal action detection. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, 11612\u201311619."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3195321"},{"key":"e_1_3_2_24_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. arXiv:1711.05101. Retrieved from https:\/\/arxiv.org\/abs\/1711.05101"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.07.028"},{"key":"e_1_3_2_26_2","first-page":"681","volume-title":"Proceedings of the 17th European Conference on Computer Vision","volume":"3","author":"Nag Sauradip","year":"2022","unstructured":"Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, and Tao Xiang. 2022. Zero-shot temporal action detection via vision\u2013language prompting. In Proceedings of the 17th European Conference on Computer Vision, Vol. 3, 681\u2013697."},{"key":"e_1_3_2_27_2","first-page":"121168","article-title":"A survey on multimodal bidirectional machine learning translation of image and natural language processing","volume":"235","author":"Nam Wongyung","year":"2023","unstructured":"Wongyung Nam and Beakcheol Jang. 2023. A survey on multimodal bidirectional machine learning translation of image and natural language processing. Expert Systems with Applications 235, 121168 (2023), 1\u201314.","journal-title":"Expert Systems with Applications"},{"key":"e_1_3_2_28_2","first-page":"1","volume-title":"Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition","author":"Nguyen Trung Thanh","year":"2024","unstructured":"Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, and Ichiro Ide. 2024. One-stage open-vocabulary temporal action detection leveraging temporal multi-scale and action label features. 
In Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, 1\u201310."},{"key":"e_1_3_2_29_2","unstructured":"OpenAI. 2022. ChatGPT 3.5. Retrieved October 2025 from https:\/\/www.chat.openai.com\/"},{"key":"e_1_3_2_30_2","unstructured":"Hieu Pham Zihang Dai Golnaz Ghiasi Kenji Kawaguchi Hanxiao Liu Adams Wei Yu Jiahui Yu Yi-Ting Chen Minh-Thang Luong Yonghui Wu et al. 2021. Combined scaling for zero-shot transfer learning. arXiv:2111.10050. Retrieved from https:\/\/arxiv.org\/abs\/2111.10050"},{"key":"e_1_3_2_31_2","first-page":"15691","volume-title":"Proceedings of the 19th IEEE\/CVF International Conference on Computer Vision","author":"Pratt Sarah","year":"2023","unstructured":"Sarah Pratt, Ian Covert, Rosanne Liu, and Ali Farhadi. 2023. What does a platypus look like? Generating customized prompts for zero-shot image classification. In Proceedings of the 19th IEEE\/CVF International Conference on Computer Vision, 15691\u201315701."},{"key":"e_1_3_2_32_2","unstructured":"Rui Qian Yeqing Li Zheng Xu Ming-Hsuan Yang Serge Belongie and Yin Cui. 2022. Multimodal open-vocabulary video classification via pre-trained vision and language models. arXiv:2207.07646. Retrieved from https:\/\/arxiv.org\/abs\/2207.07646"},{"key":"e_1_3_2_33_2","first-page":"485","volume-title":"Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Qing Zhiwu","year":"2021","unstructured":"Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, and Nong Sang. 2021. Temporal context aggregation network for temporal action proposal refinement. In Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 485\u2013494."},{"key":"e_1_3_2_34_2","first-page":"8748","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, 8748\u20138763."},{"key":"e_1_3_2_35_2","unstructured":"Vivek Rathod Bryan Seybold Sudheendra Vijayanarasimhan Austin Myers Xiuye Gu Vighnesh Birodkar and David A. Ross. 2022. Open-vocabulary temporal action detection with off-the-shelf image-text features. arXiv:2212.10596. Retrieved from https:\/\/arxiv.org\/abs\/2212.10596"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2024.3414275"},{"issue":"104327","key":"e_1_3_2_37_2","first-page":"1","article-title":"A survey of methods, datasets and evaluation metrics for visual question answering","volume":"116","author":"Sharma Himanshu","year":"2021","unstructured":"Himanshu Sharma and Anand Singh Jalal. 2021. A survey of methods, datasets and evaluation metrics for visual question answering. Image and Vision Computing 116, 104327 (2021), 1\u201334.","journal-title":"Image and Vision Computing"},{"key":"e_1_3_2_38_2","first-page":"18857","volume-title":"Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Shi Dingfeng","year":"2023","unstructured":"Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, and Dacheng Tao. 2023. TriDet: Temporal action detection with relative boundary modeling. 
In Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 18857\u201318866."},{"key":"e_1_3_2_39_2","first-page":"105","volume-title":"Proceedings of the 17th European Conference on Computer Vision","volume":"10","author":"Shi Dingfeng","year":"2022","unstructured":"Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, and Dacheng Tao. 2022. ReAct: Temporal action detection with relational queries. In Proceedings of the 17th European Conference on Computer Vision, Vol. 10, 105\u2013121."},{"key":"e_1_3_2_40_2","unstructured":"Tuan N. Tang Kwonyoung Kim and Kwanghoon Sohn. 2023. TemporalMaxer: Maximize temporal context with only max pooling for temporal action localization. arXiv:2303.09055. Retrieved from https:\/\/arxiv.org\/abs\/2303.09055"},{"key":"e_1_3_2_41_2","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30, 5998\u20136008.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_42_2","unstructured":"Chenhao Wang Hongxiang Cai Yuxin Zou and Yichao Xiong. 2021. RGB stream is enough for temporal action detection. arXiv:2107.04362. Retrieved from https:\/\/arxiv.org\/abs\/2107.04362"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2868668"},{"key":"e_1_3_2_44_2","unstructured":"Mengmeng Wang Jiazheng Xing and Yong Liu. 2021. ActionCLIP: A new paradigm for video action recognition. arXiv:2109.08472. Retrieved from https:\/\/arxiv.org\/abs\/2109.08472"},{"key":"e_1_3_2_45_2","unstructured":"Zejia Weng Xitong Yang Ang Li Zuxuan Wu and Yu-Gang Jiang. 2023. Transforming CLIP to an open-vocabulary video model via interpolated weight optimization. arXiv:2302.00624. Retrieved from https:\/\/arxiv.org\/abs\/2302.00624"},{"key":"e_1_3_2_46_2","first-page":"3","volume-title":"Proceedings of the 15th European Conference on Computer Vision","volume":"19","author":"Wu Yuxin","year":"2018","unstructured":"Yuxin Wu and Kaiming He. 2018. Group normalization. In Proceedings of the 15th European Conference on Computer Vision, Vol. 19, 3\u201319."},{"key":"e_1_3_2_47_2","first-page":"10156","volume-title":"Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Xu Mengmeng","year":"2020","unstructured":"Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, and Bernard Ghanem. 2020. G-TAD: Sub-graph localization for temporal action detection. In Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10156\u201310165."},{"key":"e_1_3_2_48_2","first-page":"14393","volume-title":"Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zareian Alireza","year":"2021","unstructured":"Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, and Shih-Fu Chang. 2021. Open-vocabulary object detection using captions. 
In Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14393\u201314402."},{"key":"e_1_3_2_49_2","first-page":"7094","volume-title":"Proceedings of the 17th IEEE\/CVF International Conference on Computer Vision","author":"Zeng Runhao","year":"2019","unstructured":"Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, and Chuang Gan. 2019. Graph convolutional networks for temporal action localization. In Proceedings of the 17th IEEE\/CVF International Conference on Computer Vision, 7094\u20137103."},{"key":"e_1_3_2_50_2","first-page":"492","volume-title":"Proceedings of the 17th European Conference on Computer Vision","volume":"4","author":"Zhang Chen-Lin","year":"2022","unstructured":"Chen-Lin Zhang, Jianxin Wu, and Yin Li. 2022. ActionFormer: Localizing moments of actions with transformers. In Proceedings of the 17th European Conference on Computer Vision, Vol. 4, 492\u2013510."},{"key":"e_1_3_2_51_2","first-page":"13658","volume-title":"Proceedings of the 18th IEEE\/CVF International Conference on Computer Vision","author":"Zhao Chen","year":"2021","unstructured":"Chen Zhao, Ali K. Thabet, and Bernard Ghanem. 2021. Video self-stitching graph network for temporal action localization. In Proceedings of the 18th IEEE\/CVF International Conference on Computer Vision, 13658\u201313667."},{"key":"e_1_3_2_52_2","first-page":"539","volume-title":"Proceedings of the 16th European Conference on Computer Vision","volume":"8","author":"Zhao Peisen","year":"2020","unstructured":"Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yanfeng Wang, and Qi Tian. 2020. Bottom-up temporal action localization with mutual regularization. In Proceedings of the 16th European Conference on Computer Vision, Vol. 8, 539\u2013555."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2021.3132287"},{"key":"e_1_3_2_54_2","first-page":"12993","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"Zheng Zhaohui","year":"2020","unstructured":"Zhaohui Zheng, Ping Wang, Wei Liu, Jinze Li, Rongguang Ye, and Dongwei Ren. 2020. Distance-IoU loss: Faster and better learning for bounding box regression. 
In Proceedings of the 34th AAAI Conference on Artificial Intelligence, 12993\u201313000."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3773986","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T14:20:07Z","timestamp":1768314007000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3773986"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,13]]},"references-count":53,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3773986"],"URL":"https:\/\/doi.org\/10.1145\/3773986","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,13]]},"assertion":[{"value":"2024-06-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
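
Note: the record above is a Crossref REST API "work" response (the envelope with "status", "message-type", and "message"). As a minimal sketch (not part of the record itself), the Python snippet below shows one way such a record can be retrieved and unpacked; it assumes the public api.crossref.org endpoint and the envelope layout shown above, and the mailto contact address is a placeholder, not a real contact.

import requests  # third-party HTTP client; install with `pip install requests`

DOI = "10.1145/3773986"  # DOI of the work described by the record above

# Crossref etiquette: identify yourself via the `mailto` parameter so the
# request is routed to the "polite" pool. The address here is a placeholder.
response = requests.get(
    f"https://api.crossref.org/works/{DOI}",
    params={"mailto": "you@example.org"},
    timeout=30,
)
response.raise_for_status()
envelope = response.json()

# The envelope mirrors the record above: status / message-type / message.
assert envelope["status"] == "ok"
assert envelope["message-type"] == "work"
work = envelope["message"]

# Pull out a few of the fields present in the record above.
print(work["title"][0])
print(", ".join(
    f"{a.get('given', '')} {a.get('family', '')}".strip()
    for a in work.get("author", [])
))
print("References:", work.get("reference-count"))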