{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T14:07:15Z","timestamp":1780063635161,"version":"3.54.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T00:00:00Z","timestamp":1730246400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U20A20196, 61906168, and 62201400"],"award-info":[{"award-number":["U20A20196, 61906168, and 62201400"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Zhejiang Provincial Natural Science Foundation of China","award":["LY23F020023 and LZ23F020001"],"award-info":[{"award-number":["LY23F020023 and LZ23F020001"]}]},{"name":"Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects","award":["2022SDSJ01"],"award-info":[{"award-number":["2022SDSJ01"]}]},{"DOI":"10.13039\/501100018553","name":"Project of Science and Technology Plans of Wenzhou City","doi-asserted-by":"crossref","award":["H20210001"],"award-info":[{"award-number":["H20210001"]}],"id":[{"id":"10.13039\/501100018553","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Hangzhou AI Major Scientific and Technological Innovation Project","award":["2022AIZD0061"],"award-info":[{"award-number":["2022AIZD0061"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>\n            In recent years, one-stage HOI (Human\u2013Object Interaction) detection methods tend to divide the original task into multiple sub-tasks by using a multi-branch network structure. However, there is no sufficient attention to information communication between these branches. The inference approach in the cascaded structure is singular, while fully parallel methods will disrupt the associations between different pieces of information. Besides, noise interference may occur during the fusion of different features and thus affect the detection performance. To address these issues, this article proposes a one-stage three-branch parallel HOI detection method, which treats HOI as three separate sub-tasks (human detection, object detection, and interaction detection) and leverages three distinct reasoning relationships to generate richer relational information.\n            <jats:italic>Firstly<\/jats:italic>\n            , an auxiliary feature fusion (AFF) module is introduced, which integrates features originally extracted independently to form fused features enriched with supplementary information. This approach strengthens communication between branches in the network while handling the three sub-tasks concurrently, thereby facilitating the exchange of more contextual information.\n            <jats:italic>Secondly<\/jats:italic>\n            , to mitigate noise interference generated during the fusion process, a fusion noise suppression (FNS) module is introduced, which effectively suppresses noise and enhances the model\u2019s performance in interaction detection tasks.\n            <jats:italic>Finally<\/jats:italic>\n            , experiments are conducted on two major benchmark datasets, and experimental results show that our HOI detection method is superior to previous methods. Also, ablation studies confirm the effectiveness of all the components in our proposed method.\n          <\/jats:p>","DOI":"10.1145\/3674980","type":"journal-article","created":{"date-parts":[[2024,6,27]],"date-time":"2024-06-27T19:53:12Z","timestamp":1719517992000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Auxiliary Feature Fusion and Noise Suppression for HOI Detection"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8916-1174","authenticated-orcid":false,"given":"Sixian","family":"Chan","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China and Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, College of Computer and Information, China Three Gorges University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0529-3593","authenticated-orcid":false,"given":"Xianpeng","family":"Zeng","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1188-900X","authenticated-orcid":false,"given":"Xinhua","family":"Wang","sequence":"additional","affiliation":[{"name":"Hangzhou GANX Science &amp; Technology Co., Ltd, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3296-5459","authenticated-orcid":false,"given":"Jie","family":"Hu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Informatics for Safety &amp; Emergency of Zhejiang Province, Wenzhou University, Wenzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6177-3862","authenticated-orcid":false,"given":"Cong","family":"Bai","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,10,30]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160329"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00048"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551626.3564944"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00889"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.110021"},{"key":"e_1_3_1_9_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3307896"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611735"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58610-2_41"},{"key":"e_1_3_1_13_2","first-page":"41","volume-title":"Proceedings of the British Machine Vision Conference 2018 (BMVC\u201918)","author":"Gao Chen","year":"2018","unstructured":"Chen Gao, Yuliang Zou, and Jia-Bin Huang. 2018. iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection. In Proceedings of the British Machine Vision Conference 2018 (BMVC\u201918). BMVA Press, 41. Retrieved from http:\/\/bmvc2018.org\/contents\/papers\/0017.pdf"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00872"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-44223-0_35"},{"key":"e_1_3_1_16_2","unstructured":"Saurabh Gupta and Jitendra Malik. 2015. Visual Semantic Role Labeling. arXiv:1505.04474. Retrieved from http:\/\/arxiv.org\/abs\/1505.04474"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01568"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00528"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00014"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01897"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00370"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3617596"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00056"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01949"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3409388"},{"key":"e_1_3_1_27_2","first-page":"13","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS\u201919)","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS\u201919). Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.), Curran Associates, Inc, 13\u201323. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c74d97b01eae257e44aa9d5bade97baf-Abstract.html"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02251"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00109"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01645"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01240-3_25"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01895"},{"key":"e_1_3_1_33_2","first-page":"8748","volume-title":"Proceedings of the 38th International Conference on Machine Learning (ICML\u201921)","volume":"139","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML\u201921), Vol. 139. Marina Meila and Tong Zhang (Eds.), PMLR, 8748\u20138763. DOI: http:\/\/proceedings.mlr.press\/v139\/radford21a.html"},{"key":"e_1_3_1_34_2","first-page":"91","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2015a. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015. Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.), IEEE Computer Society. 91\u201399. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2015\/hash\/14bfa6bb14875e45bba028a21ed38046-Abstract.html"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00075"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00015"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00181"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3615868"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01027"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19772-7_6"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01363"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00956"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_15"},{"key":"e_1_3_1_44_2","unstructured":"Hangjie Yuan Jianwen Jiang Samuel Albanie Tao Feng Ziyuan Huang Dong Ni and Mingqian Tang. 2022. RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection. In NeurIPS. Retrieved from http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/f37347375d8b54e3203e5d24aeb6c58c-Abstract-Conference.html"},{"key":"e_1_3_1_45_2","first-page":"17209","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS\u201921)","author":"Zhang Aixi","year":"2021","unstructured":"Aixi Zhang, Yue Liao, Si Liu, Miao Lu, Yongliang Wang, Chen Gao, and Xiaobo Li. 2021b. Mining the Benefits of Two-stage and One-stage HOI Detection. In Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 (NeurIPS\u201921). Marc\u2019Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.), Curran Associates, Inc. 17209\u201317220. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10096029"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01307"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01947"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01894"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01858"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19812-0_26"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01896"},{"key":"e_1_3_1_53_2","volume-title":"IEEE Transactions on Neural Networks and Learning Systems","author":"Zong Daoming","year":"2023","unstructured":"Daoming Zong and Shiliang Sun. 2023. Zero-Shot Human\u2013Object Interaction Detection via Similarity Propagation. IEEE Transactions on Neural Networks and Learning Systems (2023)."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01165"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674980","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674980","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:56Z","timestamp":1750291556000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674980"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,30]]},"references-count":53,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3674980"],"URL":"https:\/\/doi.org\/10.1145\/3674980","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,30]]},"assertion":[{"value":"2023-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-20","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}