{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T04:20:25Z","timestamp":1778127625673,"version":"3.51.4"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"10","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62002005"],"award-info":[{"award-number":["62002005"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,10,31]]},"abstract":"<jats:p>\n            Current vision-language trackers often struggle to fuse multimodal information comprehensively and effectively, leading to suboptimal performance in multimodal tasks. This study introduces LGTrack, a novel language-guided visual tracking framework designed to achieve a more comprehensive and efficient fusion of vision and language information. In the encoding stage, an Enhanced Multimodal Interaction Module is proposed to achieve full multimodal fusion, and it is used to construct Early Language Multilevel-guided Multimodal Encoding, which leverages deep semantic information for early and multilevel guidance of vision encoding. In the decoding stage, a multimodal decoding based on Joint Query is proposed, utilizing global features from both vision and language modalities, guiding the efficient operation of the decoding layers. These innovations achieve a more comprehensive fusion of multimodal information. Additionally, a contrastive learning strategy is introduced to align vision-language features in the semantic space, further enhancing the fusion effectiveness. Extensive experiments on multiple benchmarks such as LaSOT,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\rm{LaSOT_{ext}}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , TNL2K, and OTB99-Lang demonstrate that our approach outperforms existing state-of-the-art trackers.\n          <\/jats:p>","DOI":"10.1145\/3757322","type":"journal-article","created":{"date-parts":[[2025,8,5]],"date-time":"2025-08-05T15:22:40Z","timestamp":1754407360000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Language-guided Visual Tracking: Comprehensive and Effective Multimodal Information Fusion"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5847-9630","authenticated-orcid":false,"given":"Jianbo","family":"Song","sequence":"first","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1282-3755","authenticated-orcid":false,"given":"Hong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5891-8674","authenticated-orcid":false,"given":"Yachun","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering and Automation, Beihang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8956-1245","authenticated-orcid":false,"given":"Hanyang","family":"Liu","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4237-5874","authenticated-orcid":false,"given":"Yifan","family":"Yang","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,14]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00879"},{"key":"e_1_3_1_4_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV)","author":"Cao Ziang","year":"2021","unstructured":"Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, and Yiming Li. 2021. HiFT: Hierarchical feature transformer for aerial tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1\u201310."},{"key":"e_1_3_1_5_2","first-page":"14798","volume-title":"Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cao Ziang","year":"2022","unstructured":"Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, and Changhong Fu. 2022. TCTrack: Temporal contexts for aerial tracking. In Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14798\u201314808."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3307174"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Boyu Chen Peixia Li Lei Bai Lei Qiao Qiuhong Shen Bo Li Weihao Gan Wei Wu and Wanli Ouyang. 2022. Backbone is all your need: A simplified architecture for visual object tracking. arXiv:2203.05328. Retrieved from https:\/\/arxiv.org\/abs\/2203.05328","DOI":"10.1007\/978-3-031-20047-2_22"},{"key":"e_1_3_1_9_2","first-page":"1597","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning. PMLR, 1597\u20131607."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01400"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3557896"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01324"},{"key":"e_1_3_1_14_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_1_15_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-020-01387-y"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00552"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00579"},{"key":"e_1_3_1_19_2","first-page":"1","article-title":"Onboard Real-Time aerial tracking with efficient siamese anchor proposal network","author":"Fu Changhong","year":"2021","unstructured":"Changhong Fu, Ziang Cao, Yiming Li, Junjie Ye, and Chen Feng. 2021. Onboard Real-Time aerial tracking with efficient siamese anchor proposal network. IEEE Transactions on Geoscience and Remote Sensing 60 (2021), 1\u201313.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_3_1_20_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)","author":"Fu Changhong","year":"2021","unstructured":"Changhong Fu, Ziang Cao, Yiming Li, Junjie Ye, and Chen Feng. 2021. Siamese anchor proposal network for high-speed aerial tracking. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1\u20137."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20047-2_9"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01792"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360308"},{"key":"e_1_3_1_24_2","first-page":"4446","article-title":"Divert more attention to vision-language tracking","volume":"35","author":"Guo Mingzhe","year":"2022","unstructured":"Mingzhe Guo, Zhipeng Zhang, Heng Fan, and Liping Jing. 2022. Divert more attention to vision-language tracking. Advances in Neural Information Processing Systems 35 (2022), 4446\u20134460.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_3_1_26_2","first-page":"4904","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 4904\u20134916."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3246792"},{"key":"e_1_3_1_28_2","first-page":"1051","volume-title":"Proceedings of the 37th AAAI Conference on Artificial Intelligence","author":"Jiang Zutao","year":"2023","unstructured":"Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, and Hang Xu. 2023. 3d-Togo: Towards text-guided cross-category 3d object generation. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, 1051\u20131059."},{"key":"e_1_3_1_29_2","first-page":"5583","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Kim Wonjae","year":"2021","unstructured":"Wonjae Kim, Bokyung Son, and Ildoo Kim. 2021. Vilt: Vision-and-language transformer without convolution or region supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 5583\u20135594."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00935"},{"key":"e_1_3_1_31_2","first-page":"9694","article-title":"Align before fuse: Vision and language representation learning with momentum distillation","volume":"34","author":"Li Junnan","year":"2021","unstructured":"Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu Hong Hoi. 2021. Align before fuse: Vision and language representation learning with momentum distillation. Advances in Neural Information Processing Systems 34 (2021), 9694\u20139705.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_32_2","first-page":"4140","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence","author":"Li Siyi","year":"2017","unstructured":"Siyi Li and Dit-Yan Yeung. 2017. Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, 4140\u20134146."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01194"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.777"},{"key":"e_1_3_1_35_2","first-page":"3441","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence","author":"Lin Luoyang","year":"2024","unstructured":"Luoyang Lin, Zutao Jiang, Xiaodan Liang, Liqian Ma, Michael C. Kampffmeyer, and Xiaochun Cao. 2024. PTUS: Photo-realistic talking upper-body synthesis via 3D-aware motion decomposition warping. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 3441\u20133449."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02484"},{"key":"e_1_3_1_37_2","unstructured":"Haotian Liu Chunyuan Li Yuheng Li Bo Li Yuanhan Zhang Sheng Shen and Yong Jae Lee. 2024. LLaVA-NeXT: Improved reasoning OCR and world knowledge. Retrieved from https:\/\/llava-vl.github.io\/blog\/2024-01-30-llava-next\/"},{"key":"e_1_3_1_38_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv:1711.05101. Retrieved from https:\/\/arxiv.org\/abs\/1711.05101"},{"key":"e_1_3_1_39_2","unstructured":"Yinchao Ma Yuyang Tang Wenfei Yang Tianzhu Zhang Jinpeng Zhang and Mengxue Kang. 2024. Unifying visual and vision-language tracking via contrastive learning. arXiv:2401.11228. Retrieved from https:\/\/arxiv.org\/abs\/2401.11228"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.9"},{"key":"e_1_3_1_41_2","first-page":"920","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Miao Bo","year":"2023","unstructured":"Bo Miao, Mohammed Bennamoun, Yongsheng Gao, and Ajmal Mian. 2023. Spectrum-guided multi-granularity referring video object segmentation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), 920\u2013930."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_19"},{"key":"e_1_3_1_43_2","first-page":"8748","volume-title":"International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_1_44_2","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017), 5998\u20136008.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00162"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01355"},{"key":"e_1_3_1_47_2","first-page":"01","volume-title":"Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME)","author":"Wang Xucheng","year":"2022","unstructured":"Xucheng Wang, Dan Zeng, Qijun Zhao, and Shuiwang Li. 2022. Rank-based filter pruning for real-time uav tracking. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 01\u201306."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00935"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01406"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3497746"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00492"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2388226"},{"key":"e_1_3_1_53_2","first-page":"22826","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Xie Fei","year":"2023","unstructured":"Fei Xie, Lei Chu, Jiahao Li, Yan Lu, and Chao Ma. 2023. VideoTrack: Learning to track objects via video transformer. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 22826\u201322835."},{"key":"e_1_3_1_54_2","first-page":"12549","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"Xu Yinda","year":"2020","unstructured":"Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, and Gang Yu. 2020. Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, 12549\u201312556."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01028"},{"key":"e_1_3_1_56_2","first-page":"15180","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yan Bin","year":"2021","unstructured":"Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, and Huchuan Lu. 2021. Lighttrack: Finding lightweight neural networks for object tracking via one-shot architecture search. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 15180\u201315189."},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i6.28465"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00928"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01522"},{"key":"e_1_3_1_60_2","unstructured":"Xiaoyu Yang Lijian Xu Hao Sun Hongsheng Li and Shaoting Zhang. 2024. Enhancing visual grounding and generalization: A multi-task cycle training approach for vision-language models. arXiv:2311.12327. Retrieved from https:\/\/arxiv.org\/abs\/2311.12327"},{"key":"e_1_3_1_61_2","first-page":"3353","volume-title":"Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA)","author":"Yao Liangliang","year":"2023","unstructured":"Liangliang Yao, Changhong Fu, Sihang Li, Guangze Zheng, and Junjie Ye. 2023. SGDViT: Saliency-guided dynamic vision transformer for UAV tracking. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 3353\u20133359."},{"key":"e_1_3_1_62_2","first-page":"341","volume-title":"Proceedings of the 17th European Conference on Computer Vision (ECCV \u201922)","author":"Ye Botao","year":"2022","unstructured":"Botao Ye, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. 2022. Joint feature learning and relation modeling for tracking: A one-stream framework. In Proceedings of the 17th European Conference on Computer Vision (ECCV \u201922), Part XXII. Springer, 341\u2013357."},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3486678"},{"issue":"7","key":"e_1_3_1_64_2","doi-asserted-by":"crossref","first-page":"9186","DOI":"10.1109\/TPAMI.2022.3232854","article-title":"WebUAV-3M: A benchmark for unveiling the power of million-scale deep UAV tracking","volume":"45","author":"Zhang Chunhui","year":"2022","unstructured":"Chunhui Zhang, Guanjie Huang, Li Liu, Shan Huang, Yinan Yang, Xiang Wan, Shiming Ge, and Dacheng Tao. 2022. WebUAV-3M: A benchmark for unveiling the power of million-scale deep UAV tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 7 (2022), 9186\u20139205.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_65_2","unstructured":"Haotian Zhang Haoxuan You Philipp Dufter Bowen Zhang Chen Chen Hong-You Chen Tsu-Jui Fu William Yang Wang Shih Fu Chang Zhe Gan and Yinfei Yang. 2024. Ferret-v2: An improved baseline for referring and grounding with large language models. arXiv:2404.07973. Retrieved from https:\/\/arxiv.org\/abs\/2404.07973"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01309"},{"key":"e_1_3_1_67_2","first-page":"771","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV \u201920)","author":"Zhang Zhipeng","year":"2020","unstructured":"Zhipeng Zhang, Houwen Peng, Jianlong Fu, Bing Li, and Weiming Hu. 2020. Ocean: Object-aware anchor-free tracking. In Proceedings of the 16th European Conference on Computer Vision (ECCV \u201920), Part XXI 16. Springer, 771\u2013787."},{"issue":"4","key":"e_1_3_1_68_2","doi-asserted-by":"crossref","first-page":"2125","DOI":"10.1109\/TCSVT.2023.3301933","article-title":"Towards unified token learning for vision-language tracking","volume":"34","author":"Zheng Yaozong","year":"2023","unstructured":"Yaozong Zheng, Bineng Zhong, Qihua Liang, Guorong Li, Rongrong Ji, and Xianxian Li. 2023. Towards unified token learning for vision-language tracking. IEEE Transactions on Circuits and Systems for Video Technology 34, 4 (2023), 2125\u20132135.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02217"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3757322","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T21:25:39Z","timestamp":1760477139000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757322"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,14]]},"references-count":68,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,31]]}},"alternative-id":["10.1145\/3757322"],"URL":"https:\/\/doi.org\/10.1145\/3757322","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,14]]},"assertion":[{"value":"2024-05-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}