{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,1]],"date-time":"2026-07-01T21:19:48Z","timestamp":1782940788691,"version":"3.54.5"},"reference-count":87,"publisher":"SAGE Publications","issue":"14","license":[{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"},{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"Kakenhi Grant Aid B of the Japan Society Promotion of Science","award":["22H01695"],"award-info":[{"award-number":["22H01695"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>Seafloor surveys often gather multiple modes of remote sensed mapping and sampling data to infer kilo- to mega-hectare scale seafloor habitat distributions. However, efforts to extract information from multimodal data are complicated by inconsistencies between measurement modes (e.g., resolution, positional offsets, geometric distortions) and different acquisition periods for dynamically changing environments. In this study, we investigate the use of location information during multimodal feature learning and its impact on habitat classification. Experiments on multimodal datasets gathered from three Marine Protected Areas (MPAs) showed improved robustness and performance when using location-based regularisation terms compared to equivalent autoencoder-based and contrastive self-supervised feature learners. 
Location-guiding improved F1 scores by 7.7% for autoencoder-based and 28.8% for contrastive feature learners, averaged across 78 experiments on datasets spanning three distinct sites and 18 data modes. Location-guiding enhances performance when combining multimodal data, increasing F1 scores by an average of 8.8% and 37.8% over the best-performing individual mode being combined, for autoencoder-based and contrastive self-supervised models, respectively. Performance gains are maintained over a large range of location-guiding distance hyperparameters: average improvements of 5.3% and 29.4% are achieved over an order-of-magnitude range of hyperparameters for the autoencoder and contrastive learners, respectively, both comparing favourably with optimally tuned conditions. Location-guiding also exhibits robustness to position inconsistencies between combined data modes, still achieving average performance increases of 3.0% and 30.4% compared to equivalent feature learners without location regularisation when position offsets of up to 10\u00a0m are artificially introduced to the remote sensed data. 
Our results show that the classifier used to delineate the learned feature spaces has less impact on performance than the feature learner, with probabilistic classifiers averaging 3.4% higher F1 scores than non-probabilistic classifiers.<\/jats:p>","DOI":"10.1177\/02783649251343640","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T12:58:31Z","timestamp":1748350711000},"page":"2340-2364","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference"],"prefix":"10.1177","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8691-836X","authenticated-orcid":false,"given":"Cailei","family":"Liang","sequence":"first","affiliation":[{"name":"Centre for In Situ and Remote Intelligent Sensing, University of Southampton, Southampton, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8891-6915","authenticated-orcid":false,"given":"Jose","family":"Cappelletto","sequence":"additional","affiliation":[{"name":"Centre for In Situ and Remote Intelligent Sensing, University of Southampton, Southampton, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Miquel","family":"Massot-Campos","sequence":"additional","affiliation":[{"name":"Centre for In Situ and Remote Intelligent Sensing, University of Southampton, Southampton, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3195-0602","authenticated-orcid":false,"given":"Adrian","family":"Bodenmann","sequence":"additional","affiliation":[{"name":"Centre for In Situ and Remote Intelligent Sensing, University of Southampton, Southampton, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7135-6360","authenticated-orcid":false,"given":"Veerle 
AI","family":"Huvenne","sequence":"additional","affiliation":[{"name":"National Oceanography Centre"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Catherine","family":"Wardell","sequence":"additional","affiliation":[{"name":"National Oceanography Centre"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Brian J.","family":"Bett","sequence":"additional","affiliation":[{"name":"National Oceanography Centre"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Darryl","family":"Newborough","sequence":"additional","affiliation":[{"name":"Sonardyne International Ltd., Yateley, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Blair","family":"Thornton","sequence":"additional","affiliation":[{"name":"Centre for In Situ and Remote Intelligent Sensing, University of Southampton, Southampton, UK"},{"name":"Institute of Industrial Science, The University of Tokyo, Tokyo, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2025,5,27]]},"reference":[{"key":"e_1_3_5_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/11744023_32"},{"key":"e_1_3_5_3_1","doi-asserted-by":"publisher","DOI":"10.1111\/cobi.13312"},{"key":"e_1_3_5_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01007"},{"key":"e_1_3_5_5_1","unstructured":"Bijjahalli S Pizarro O Williams SB (2023) A semi-supervised object detection algorithm for underwater imagery. arXiv preprint arXiv:2306.04834."},{"key":"e_1_3_5_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21682"},{"key":"e_1_3_5_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00138-021-01249-8"},{"key":"e_1_3_5_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecss.2011.02.007"},{"key":"e_1_3_5_9_1","doi-asserted-by":"crossref","unstructured":"Chaganti SY Nanda I Pandi KR et al. (2020) Image classification using SVM and CNN. 
In: 2020 international conference on computer science engineering and applications (ICCSEA) Gunupur India 13\u201314 March 2020 pp. 1\u20135. IEEE.","DOI":"10.1109\/ICCSEA49143.2020.9132851"},{"key":"e_1_3_5_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-39935-3_12"},{"key":"e_1_3_5_11_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_3_5_12_1","unstructured":"Chen T Kornblith S Norouzi M et al. (2020a) A simple framework for contrastive learning of visual representations. In: International conference on machine learning Austria Vienna July 12-18 pp. 1597\u20131607. PMLR."},{"key":"e_1_3_5_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2009.2014161"},{"key":"e_1_3_5_14_1","unstructured":"Chen X Fan H Girshick R et al. (2020b) Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297."},{"key":"e_1_3_5_15_1","doi-asserted-by":"publisher","DOI":"10.3723\/ut.29.117"},{"key":"e_1_3_5_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apacoust.2020.107728"},{"key":"e_1_3_5_17_1","doi-asserted-by":"crossref","unstructured":"Deng J Dong W Socher R et al. (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition Miami FL USA 20\u201325 June 2009 pp. 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_5_18_1","unstructured":"Dogget M Northen K (2023) Studland Bay Marine Conservation Zone (MCZ): Subtidal Seagrass Monitoring Survey 2021. 
Technical Report NECR449 Natural England Commissioned Report."},{"key":"e_1_3_5_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/LGRS.2013.2261796"},{"key":"e_1_3_5_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0275-4"},{"issue":"1","key":"e_1_3_5_21_1","first-page":"71","article-title":"Seafloor classification based on combined multibeam bathymetry and backscatter using deep convolution neural network","volume":"50","author":"Fanlin Y","year":"2021","unstructured":"Fanlin Y, Zhengren Z, Jiabiao L, et al. (2021) Seafloor classification based on combined multibeam bathymetry and backscatter using deep convolution neural network. Acta Geodaetica et Cartographica Sinica 50(1): 71.","journal-title":"Acta Geodaetica et Cartographica Sinica"},{"key":"e_1_3_5_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.2972974"},{"key":"e_1_3_5_23_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11192"},{"key":"e_1_3_5_24_1","doi-asserted-by":"crossref","unstructured":"Gadzicki K Khamsehashari R Zetzsche C (2020) Early vs late fusion in multimodal convolutional neural networks. In: 2020 IEEE 23rd international conference on information fusion (FUSION) Rustenburg South Africa 06\u201309 July 2020 pp. 1\u20136. IEEE.","DOI":"10.23919\/FUSION45008.2020.9190246"},{"key":"e_1_3_5_25_1","article-title":"Aerial images in studland bay","author":"Google","year":"2021","unstructured":"Google (2021) Aerial images in studland bay. Data in July 2021 is available in Google Earth Pro. https:\/\/earth.google.com\/web\/@50.65522539,-1.93786417,-0.00458829a,6187.85720405d,35y,-0.00020058h,37.26266728t,0.00089703r\/data=OgMKATA","journal-title":"Data in July 2021 is available in Google Earth Pro"},{"key":"e_1_3_5_26_1","unstructured":"Grandini M Bagli E Visani G (2020) Metrics for multi-class classification: an overview. 
arXiv preprint arXiv:2008.05756."},{"key":"e_1_3_5_27_1","doi-asserted-by":"publisher","DOI":"10.1029\/2023EA003220"},{"key":"e_1_3_5_28_1","doi-asserted-by":"crossref","unstructured":"Gunes H Piccardi M (2005) Affect recognition from face and body: early fusion vs. late fusion. In: 2005 IEEE international conference on systems man and cybernetics Waikoloa HI USA 12 October 2005 Vol. 4 pp. 3437\u20133443. IEEE.","DOI":"10.1109\/ICSMC.2005.1571679"},{"key":"e_1_3_5_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.12.009"},{"key":"e_1_3_5_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3152247"},{"key":"e_1_3_5_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1973.4309314"},{"key":"e_1_3_5_32_1","doi-asserted-by":"crossref","unstructured":"Hasib KM Iqbal MS Shah FM et al. (2020) A survey of methods for managing the classification and solution of data imbalance problem. arXiv preprint arXiv:2012.11870.","DOI":"10.3844\/jcssp.2020.1546.1557"},{"key":"e_1_3_5_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"e_1_3_5_34_1","doi-asserted-by":"crossref","unstructured":"He K Zhang X Ren S et al. (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition Las Vegas NV USA 27\u201330 June 2016 pp. 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"issue":"1106","key":"e_1_3_5_35_1","first-page":"1","article-title":"ImageNet classification with deep convolutional neural networks","volume":"25","author":"Hinton GE","year":"2012","unstructured":"Hinton GE, Krizhevsky A, Sutskever I (2012) ImageNet classification with deep convolutional neural networks. 
Advances in Neural Information Processing Systems 25(1106\u20131114): 1.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_5_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2020.3016820"},{"key":"e_1_3_5_37_1","unstructured":"Howard AG Zhu M Chen B et al. (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861."},{"key":"e_1_3_5_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2011.01.064"},{"key":"e_1_3_5_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.biocon.2016.05.030"},{"key":"e_1_3_5_40_1","unstructured":"Jain U Wilson A Gulshan V (2022) Multimodal contrastive learning for remote sensing tasks."},{"key":"e_1_3_5_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecoinf.2023.102185"},{"key":"e_1_3_5_42_1","doi-asserted-by":"crossref","unstructured":"Karpathy A Toderici G Shetty S et al. (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition Columbus OH USA 23\u201328 June 2014 pp. 1725\u20131732.","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_5_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-5347(98)01533-X"},{"key":"e_1_3_5_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2725580"},{"key":"e_1_3_5_45_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-018-0151-6"},{"key":"e_1_3_5_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_5_47_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_5_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/JOE.2017.2786878"},{"key":"e_1_3_5_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dsr.2011.05.006"},{"key":"#cr-split#-e_1_3_5_50_1.1","doi-asserted-by":"crossref","unstructured":"Massot-Campos M Yamada T Walker-Rouse B et al. 
(2023) Shallow water seagrass survey at studland bay with the AUV Smarty200. In: 2023 IEEE Underwater Technology","DOI":"10.1109\/UT49729.2023.10103389"},{"key":"#cr-split#-e_1_3_5_50_1.2","unstructured":"(UT) Tokyo Japan 6-9 March 2023 pp. 1-5. IEEE."},{"key":"e_1_3_5_51_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129065722500496"},{"key":"e_1_3_5_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/JOE.2020.2978967"},{"key":"e_1_3_5_53_1","doi-asserted-by":"publisher","DOI":"10.1117\/12.898652"},{"key":"e_1_3_5_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2002.1017623"},{"key":"e_1_3_5_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/JOE.2013.2278891"},{"key":"e_1_3_5_56_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs12183054"},{"key":"e_1_3_5_57_1","doi-asserted-by":"publisher","DOI":"10.5194\/amt-8-4699-2015"},{"key":"e_1_3_5_58_1","doi-asserted-by":"crossref","unstructured":"Preciado-Grijalva A Wehbe B Firvida MB et al. (2022) Self-supervised learning for sonar image classification. Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. June 19-23 New Orleans Louisiana USA 1499\u20131508.","DOI":"10.1109\/CVPRW56347.2022.00156"},{"key":"e_1_3_5_59_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs14030480"},{"key":"e_1_3_5_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00338-019-01802-y"},{"key":"e_1_3_5_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2009.09.019"},{"key":"e_1_3_5_62_1","unstructured":"Ramachandram D Lisicki M Shields TJ et al. (2017) Structure optimization for deep multimodal fusion networks using graph-induced kernels. arXiv preprint arXiv:1707.00750."},{"key":"e_1_3_5_63_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364916679892"},{"key":"e_1_3_5_64_1","unstructured":"Samuli L Timo A (2017) Temporal ensembling for semi-supervised learning. International conference on learning representations (ICLR). April 24-26 Toulon France. 
4: 6."},{"key":"e_1_3_5_65_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs10040523"},{"key":"e_1_3_5_66_1","doi-asserted-by":"crossref","unstructured":"Shields J Pizarro O Williams SB (2020) Towards adaptive benthic habitat mapping. In: 2020 IEEE international conference on robotics and automation (ICRA) Paris France 31 May 2020\u201331 August 2020 pp. 9263\u20139270.","DOI":"10.1109\/ICRA40945.2020.9196811"},{"key":"e_1_3_5_67_1","unstructured":"Simonyan K Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556."},{"key":"e_1_3_5_68_1","doi-asserted-by":"publisher","DOI":"10.5120\/ijca2015906677"},{"key":"e_1_3_5_69_1","doi-asserted-by":"publisher","DOI":"10.12928\/telkomnika.v14i4.3956"},{"key":"e_1_3_5_70_1","doi-asserted-by":"publisher","DOI":"10.1364\/OE.470878"},{"key":"e_1_3_5_71_1","unstructured":"Tan M Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Chaudhuri K Salakhutdinov R (eds). Proceedings of the 36th International Conference on Machine Learning Proceedings of Machine Learning Research. Long Beach California USA: PMLR Vol. 
97 6105\u20136114."},{"key":"e_1_3_5_72_1","doi-asserted-by":"publisher","DOI":"10.5670\/oceanog.2021.supplement.02-34"},{"key":"e_1_3_5_73_1","doi-asserted-by":"publisher","DOI":"10.2307\/143141"},{"key":"e_1_3_5_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.oregeorev.2016.09.032"},{"key":"e_1_3_5_75_1","unstructured":"Verfaillie E Van Lancker V (2008) Mapping european seabed habitats the mesh project as a case study."},{"key":"e_1_3_5_76_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.margeo.2015.12.001"},{"key":"e_1_3_5_77_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-016-0043-6"},{"key":"e_1_3_5_78_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(90)90086-O"},{"key":"e_1_3_5_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2017.2776228"},{"key":"e_1_3_5_80_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.margeo.2014.03.012"},{"key":"e_1_3_5_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3101881"},{"key":"e_1_3_5_82_1","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21961"},{"key":"e_1_3_5_83_1","doi-asserted-by":"publisher","DOI":"10.55417\/fr.2022037"},{"key":"e_1_3_5_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3140060"},{"key":"e_1_3_5_85_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11119-022-09954-8"},{"key":"e_1_3_5_86_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs12101572"},{"key":"e_1_3_5_87_1","unstructured":"Zhou J Wei C Wang H et al. (2021) ibot: image bert pre-training with online tokenizer. 
arXiv preprint arXiv:2111.07832."}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649251343640","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649251343640","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649251343640","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:17:29Z","timestamp":1777457849000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649251343640"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,27]]},"references-count":87,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1177\/02783649251343640"],"URL":"https:\/\/doi.org\/10.1177\/02783649251343640","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,27]]}}}