{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T11:09:11Z","timestamp":1775819351337,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2021,12,20]],"date-time":"2021-12-20T00:00:00Z","timestamp":1639958400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["2018XKQYMS27"],"award-info":[{"award-number":["2018XKQYMS27"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>\n            With the rapid development of sensor technology, lots of remote sensing data have been collected. It effectively obtains good semantic segmentation performance by extracting feature maps based on multi-modal remote sensing images since extra modal data provides more information. How to make full use of multi-modal remote sensing data for semantic segmentation is challenging. Toward this end, we propose a new network called\n            <jats:bold>\n              Multi-Stage Fusion and Multi-Source Attention Network ((MS)\n              <jats:sup>2<\/jats:sup>\n              -Net)\n            <\/jats:bold>\n            for multi-modal remote sensing data segmentation. The multi-stage fusion module fuses complementary information after calibrating the deviation information by filtering the noise from the multi-modal data. 
Besides, similar feature points are aggregated by the proposed multi-source attention for enhancing the discriminability of features with different modalities. The proposed model is evaluated on publicly available multi-modal remote sensing data sets, and results demonstrate the effectiveness of the proposed method.\n          <\/jats:p>","DOI":"10.1145\/3484440","type":"journal-article","created":{"date-parts":[[2021,12,20]],"date-time":"2021-12-20T17:29:58Z","timestamp":1640021398000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation"],"prefix":"10.1145","volume":"12","author":[{"given":"Jiaqi","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, China University of Mining and Technology, Engineering Research Center of Mine Digitization, Ministry of Education of the Peoples Republic of China, Xuzhou, Jiangsu, China"}]},{"given":"Yong","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, China University of Mining and Technology, Engineering Research Center of Mine Digitization, Ministry of Education of the Peoples Republic of China, Xuzhou, Jiangsu, China"}]},{"given":"Boyu","family":"Shi","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, China University of Mining and Technology, Engineering Research Center of Mine Digitization, Ministry of Education of the Peoples Republic of China, Xuzhou, Jiangsu, China"}]},{"given":"Jingsong","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, China University of Mining and Technology, Engineering Research Center of Mine Digitization, Ministry of Education of the Peoples Republic of China, Xuzhou, Jiangsu, 
China"}]},{"given":"Di","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, China University of Mining and Technology, Engineering Research Center of Mine Digitization, Ministry of Education of the Peoples Republic of China, Xuzhou, Jiangsu, China"}]},{"given":"Rui","family":"Yao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, China University of Mining and Technology, Engineering Research Center of Mine Digitization, Ministry of Education of the Peoples Republic of China, Xuzhou, Jiangsu, China"}]}],"member":"320","published-online":{"date-parts":[[2021,12,20]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-018-3395-3"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.312"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2959253"},{"key":"e_1_3_2_5_2","article-title":"Employing bilinear fusion and saliency prior information for RGB-D salient object detection","author":"Huang Nianchang","year":"2021","unstructured":"Nianchang Huang, Yang Yang, Dingwen Zhang, Qiang Zhang, and Jungong Han. 2021. Employing bilinear fusion and saliency prior information for RGB-D salient object detection. IEEE Transactions on Multimedia (2021).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_6_2","article-title":"Joint cross-modal and unimodal features for RGB-D salient object detection","author":"Huang Nianchang","year":"2020","unstructured":"Nianchang Huang, Yi Liu, Qiang Zhang, and Jungong Han. 2020. Joint cross-modal and unimodal features for RGB-D salient object detection. 
IEEE Transactions on Multimedia (2020).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2018.2791721"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1002\/9781119616016.ch22"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_10_2","article-title":"Indoor scene segmentation algorithm based on full convolutional neural network","author":"Zhu Zijiang","year":"2020","unstructured":"Zijiang Zhu, Deming Li, Yi Hu, Junshan Li, Dong Liu, and Jianjun Li. 2020. Indoor scene segmentation algorithm based on full convolutional neural network. Neural Computing and Applications (2020).","journal-title":"Neural Computing and Applications"},{"key":"e_1_3_2_11_2","first-page":"1","article-title":"A new spatio-temporal background\u2013foreground bimodal for motion segmentation and detection in urban traffic scenes","author":"Abdulrahim Khairi","year":"2019","unstructured":"Khairi Abdulrahim, Kamaruzzaman Seman, Rosalina Abdul Salam, et al. 2019. A new spatio-temporal background\u2013foreground bimodal for motion segmentation and detection in urban traffic scenes. Neural Computing and Applications (2019), 1\u201317.","journal-title":"Neural Computing and Applications"},{"key":"e_1_3_2_12_2","first-page":"1","article-title":"Multi-deep features fusion for high-resolution remote sensing image scene classification","author":"Yuan Baohua","year":"2020","unstructured":"Baohua Yuan, Lixin Han, Xiangping Gu, and Hong Yan. 2020. Multi-deep features fusion for high-resolution remote sensing image scene classification. Neural Computing and Applications (2020), 1\u201317.","journal-title":"Neural Computing and Applications"},{"key":"e_1_3_2_13_2","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Zhou Hao","year":"2020","unstructured":"Hao Zhou, Lu Qi, Zhaoliang Wan, Hai Huang, and Xu Yang. 2020. 
RGB-D Co-attention network for semantic segmentation. In Proceedings of the Asian Conference on Computer Vision."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2020.101131"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.161"},{"key":"e_1_3_2_16_2","article-title":"Indoor semantic segmentation using depth information","author":"Couprie Camille","year":"2013","unstructured":"Camille Couprie, Cl\u00e9ment Farabet, Laurent Najman, and Yann LeCun. 2013. Indoor semantic segmentation using depth information. arXiv preprint arXiv:1301.3572 (2013).","journal-title":"arXiv preprint arXiv:1301.3572"},{"key":"e_1_3_2_17_2","first-page":"213","volume-title":"Asian Conference on Computer Vision","author":"Hazirbas Caner","year":"2016","unstructured":"Caner Hazirbas, Lingni Ma, Csaba Domokos, and Daniel Cremers. 2016. Fusenet: Incorporating depth into semantic segmentation via fusion-based CNN architecture. In Asian Conference on Computer Vision. Springer, 213\u2013228."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3384675"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3361741"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3003914"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2016.2598228"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCISP51026.2020.9273497"},{"key":"e_1_3_2_23_2","first-page":"1","article-title":"A structured support vector machine for hyperspectral satellite image segmentation and classification based on modified swarm optimization approach","author":"Manju S.","year":"2019","unstructured":"S. Manju and K. Helenprabha. 2019. A structured support vector machine for hyperspectral satellite image segmentation and classification based on modified swarm optimization approach. 
Journal of Ambient Intelligence and Humanized Computing (2019), 1\u201310.","journal-title":"Journal of Ambient Intelligence and Humanized Computing"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.35940\/ijitee.K1596.129219"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2644615"},{"key":"e_1_3_2_28_2","article-title":"Semantic image segmentation with deep convolutional nets and fully connected CRFs","author":"Chen Liang-Chieh","year":"2014","unstructured":"Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. 2014. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062 (2014).","journal-title":"arXiv preprint arXiv:1412.7062"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_2_30_2","article-title":"Rethinking atrous convolution for semantic image segmentation","author":"Chen Liang-Chieh","year":"2017","unstructured":"Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).","journal-title":"arXiv preprint arXiv:1706.05587"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs11010083"},{"key":"e_1_3_2_34_2","first-page":"2595","volume-title":"IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium","author":"Liu Siyu","year":"2020","unstructured":"Siyu Liu, Changtao He, Haiwei Bai, Yijie Zhang, and Jian Cheng. 2020. 
Light-weight attention semantic segmentation network for high-resolution remote sensing images. In IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2595\u20132598."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2016.90"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.304"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2016.2532927"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045167"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2019.100981"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00464"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00167"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.189"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/LGRS.2021.3079925"},{"key":"e_1_3_2_45_2","first-page":"4980","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Park Seong-Jin","year":"2017","unstructured":"Seong-Jin Park, Ki-Sang Hong, and Seungyong Lee. 2017. RDFnet: RGB-D multi-level residual feature fusion for indoor semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision. 4980\u20134989."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"e_1_3_2_47_2","article-title":"Efficient RGB-D semantic segmentation for indoor scene analysis","author":"Seichter Daniel","year":"2020","unstructured":"Daniel Seichter, Mona K\u00f6hler, Benjamin Lewandowski, Tim Wengefeld, and Horst-Michael Gross. 2020. Efficient RGB-D semantic segmentation for indoor scene analysis. 
arXiv preprint arXiv:2011.06961 (2020).","journal-title":"arXiv preprint arXiv:2011.06961"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3484440","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3484440","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:14Z","timestamp":1750191434000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3484440"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,20]]},"references-count":46,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3484440"],"URL":"https:\/\/doi.org\/10.1145\/3484440","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,20]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}