{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T11:52:38Z","timestamp":1775217158636,"version":"3.50.1"},"reference-count":44,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T00:00:00Z","timestamp":1725494400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Journal of Field Robotics"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>ABSTRACT<\/jats:title><jats:p>Underwater object detection serves as a crucial means for autonomous underwater vehicles (AUVs) to gain awareness of their surroundings. Currently, AUVs predominantly depend on underwater optical cameras or sonar sensing techniques to furnish vital information sources for subsequent tasks such as underwater rescue and mining exploration. However, the influence of underwater light attenuation or significant background noise often leads to the failure of either the acoustic or optical sensor. Consequently, the traditional single\u2010modal object detection network, which relies exclusively on either the optical or acoustic modality, struggles to adapt to the varying complexities of underwater environments. To address this challenge, this paper proposes a novel underwater acoustic\u2010optical fusion\u2010based underwater multi\u2010modal object detection paradigm termed\u00a0UAMFDet, which fuses highly misaligned acoustic\u2010optical features in the spatial dimension at both the fine\u2010grained level and the instance level. 
First, we propose a multi\u2010modal deformable self\u2010aligned feature fusion module to adaptively capture feature dependencies between multi\u2010modal targets, and perform self\u2010aligned multi\u2010modal fine\u2010grained feature fusion by differential fusion. Then a multi\u2010modal instance\u2010level feature matching network is designed. It matches multi\u2010modal instance features by a lightweight cross\u2010attention mechanism and performs differential fusion to achieve instance\u2010level feature fusion. In addition, we establish a data set dedicated to underwater acoustic\u2010optical fusion object detection tasks called UAOF, and conduct a large number of experiments on the UAOF data set to verify the effectiveness of UAMFDet.<\/jats:p>","DOI":"10.1002\/rob.22432","type":"journal-article","created":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T13:27:45Z","timestamp":1725542865000},"page":"970-983","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["UAMFDet: Acoustic\u2010Optical Fusion for Underwater Multi\u2010Modal Object Detection"],"prefix":"10.1002","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-0489-0651","authenticated-orcid":false,"given":"Haojie","family":"Chen","sequence":"first","affiliation":[{"name":"National Key Laboratory of Autonomous Marine Vehicle Technology Harbin Engineering University Harbin Heilongjiang China"}]},{"given":"Zhuo","family":"Wang","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Autonomous Marine Vehicle Technology Harbin Engineering University Harbin Heilongjiang China"}]},{"given":"Hongde","family":"Qin","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Autonomous Marine Vehicle Technology Harbin Engineering University Harbin Heilongjiang China"}]},{"given":"Xiaokai","family":"Mu","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Autonomous 
Marine Vehicle Technology Harbin Engineering University Harbin Heilongjiang China"},{"name":"Qingdao Innovation and Development Center Harbin Engineering University Qingdao Shandong China"}]}],"member":"311","published-online":{"date-parts":[[2024,9,5]]},"reference":[{"key":"e_1_2_8_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01656"},{"key":"e_1_2_8_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00644"},{"key":"e_1_2_8_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_2_8_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2022.3158668"},{"key":"e_1_2_8_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108926"},{"key":"e_1_2_8_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_33"},{"key":"e_1_2_8_8_1","first-page":"628","volume-title":"European Conference on Computer Vision","author":"Chen Z.","year":"2022"},{"key":"e_1_2_8_9_1","first-page":"2988","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Dai X.","year":"2021"},{"key":"e_1_2_8_10_1","first-page":"1601","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Dai Z.","year":"2021"},{"key":"e_1_2_8_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2023.3318629"},{"key":"e_1_2_8_12_1","unstructured":"Ge Z. S. Liu F. Wang Z. Li and J. Sun. 2021. 
\u201cYolox: Exceeding Yolo Series in 2021.\u201d arXiv preprint arXiv:2107.08430."},{"key":"e_1_2_8_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_2_8_14_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TIM.2023.3267529","article-title":"SFAF\u2010MA: Spatial Feature Aggregation and Fusion With Modality Adaptation for RGB\u2010Thermal Semantic Segmentation","volume":"72","author":"He X.","year":"2023","journal-title":"IEEE Transactions on Instrumentation and Measurement"},{"key":"e_1_2_8_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"e_1_2_8_16_1","unstructured":"Jiang J. L. Zheng F. Luo and Z. Zhang. 2018. \u201cRedNet: Residual Encoder\u2010Decoder Network for Indoor RGB\u2010D Semantic Segmentation.\u201d arXiv preprint arXiv:1806.01054."},{"key":"e_1_2_8_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01667"},{"key":"e_1_2_8_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3272269"},{"key":"e_1_2_8_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_2_8_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01247-4"},{"issue":"7","key":"e_1_2_8_21_1","first-page":"8324","article-title":"EPNet++: Cascade Bi\u2010Directional Fusion for Multi\u2010Modal 3D Object Detection","volume":"45","author":"Liu Z.","year":"2023","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_8_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3349072"},{"key":"e_1_2_8_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00363"},{"key":"e_1_2_8_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-021-10025-z"},{"key":"e_1_2_8_25_1","unstructured":"Redmon J. and A. Farhadi. 2018. 
\u201cYolov3: An Incremental Improvement.\u201d arXiv preprint arXiv:1804.02767."},{"key":"e_1_2_8_26_1","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21999"},{"key":"e_1_2_8_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_2_8_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.01.088"},{"key":"e_1_2_8_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/JOE.2019.2950974"},{"key":"e_1_2_8_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3176540"},{"key":"e_1_2_8_31_1","first-page":"13029","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang C.\u2010Y.","year":"2021"},{"key":"e_1_2_8_32_1","first-page":"7464","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang C.\u2010Y.","year":"2023"},{"key":"e_1_2_8_33_1","first-page":"135","volume-title":"Proceedings of the European conference on computer vision (ECCV)","author":"Wang W.","year":"2018"},{"key":"e_1_2_8_34_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2022.3224815","article-title":"MLFFNet: Multilevel Feature Fusion Network for Object Detection in Sonar Images","volume":"60","author":"Wang Z.","year":"2022","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_2_8_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.01.056"},{"key":"e_1_2_8_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3072414"},{"key":"e_1_2_8_37_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs13183555"},{"key":"e_1_2_8_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2022.3210839"},{"key":"e_1_2_8_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58555-6_16"},{"key":"e_1_2_8_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2023.3300537"},{"key":"e_1_2_8_41_1","first-page":"1","article-title":"Weakly Aligned Feature Fusion for 
Multimodal Object Detection","author":"Zhang L.","year":"2021","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_2_8_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00266"},{"key":"e_1_2_8_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIV.2023.3268051"},{"key":"e_1_2_8_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3074738"},{"key":"e_1_2_8_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3109518"}],"container-title":["Journal of Field Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/rob.22432","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T13:02:11Z","timestamp":1747314131000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/rob.22432"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,5]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.1002\/rob.22432"],"URL":"https:\/\/doi.org\/10.1002\/rob.22432","archive":["Portico"],"relation":{},"ISSN":["1556-4959","1556-4967"],"issn-type":[{"value":"1556-4959","type":"print"},{"value":"1556-4967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,5]]},"assertion":[{"value":"2024-06-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}