{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T13:54:32Z","timestamp":1770990872092,"version":"3.50.1"},"reference-count":50,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2022,7,13]],"date-time":"2022-07-13T00:00:00Z","timestamp":1657670400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62001063"],"award-info":[{"award-number":["62001063"]}]},{"name":"National Natural Science Foundation of China","award":["U20A20157"],"award-info":[{"award-number":["U20A20157"]}]},{"name":"National Natural Science Foundation of China","award":["U2133211"],"award-info":[{"award-number":["U2133211"]}]},{"name":"National Natural Science Foundation of China","award":["2020M673135"],"award-info":[{"award-number":["2020M673135"]}]},{"name":"National Natural Science Foundation of China","award":["XmT2020050"],"award-info":[{"award-number":["XmT2020050"]}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["62001063"],"award-info":[{"award-number":["62001063"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["U20A20157"],"award-info":[{"award-number":["U20A20157"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["U2133211"],"award-info":[{"award-number":["U2133211"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science 
Foundation","doi-asserted-by":"publisher","award":["2020M673135"],"award-info":[{"award-number":["2020M673135"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["XmT2020050"],"award-info":[{"award-number":["XmT2020050"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Chongqing Postdoctoral Research Program","award":["62001063"],"award-info":[{"award-number":["62001063"]}]},{"name":"Chongqing Postdoctoral Research Program","award":["U20A20157"],"award-info":[{"award-number":["U20A20157"]}]},{"name":"Chongqing Postdoctoral Research Program","award":["U2133211"],"award-info":[{"award-number":["U2133211"]}]},{"name":"Chongqing Postdoctoral Research Program","award":["2020M673135"],"award-info":[{"award-number":["2020M673135"]}]},{"name":"Chongqing Postdoctoral Research Program","award":["XmT2020050"],"award-info":[{"award-number":["XmT2020050"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Multi-spectral semantic segmentation has shown great advantages under poor illumination conditions, especially for remote scene understanding of autonomous vehicles, since the thermal image can provide complementary information for RGB image. However, methods to fuse the information from RGB image and thermal image are still under-explored. In this paper, we propose a simple but effective module, add\u2013multiply fusion (AMFuse) for RGB and thermal information fusion, consisting of two simple math operations\u2014addition and multiplication. The addition operation focuses on extracting cross-modal complementary features, while the multiplication operation concentrates on the cross-modal common features. 
Moreover, the attention module and atrous spatial pyramid pooling (ASPP) modules are also incorporated into our proposed AMFuse modules, to enhance the multi-scale context information. Finally, in the UNet-style encoder\u2013decoder framework, the ResNet model is adopted as the encoder. As for the decoder part, the multi-scale information obtained from our proposed AMFuse modules is hierarchically merged layer-by-layer to restore the feature map resolution for semantic segmentation. The experiments of RGBT multi-spectral semantic segmentation and salient object detection demonstrate the effectiveness of our proposed AMFuse module for fusing the RGB and thermal information.<\/jats:p>","DOI":"10.3390\/rs14143368","type":"journal-article","created":{"date-parts":[[2022,7,14]],"date-time":"2022-07-14T00:12:40Z","timestamp":1657757560000},"page":"3368","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["AMFuse: Add\u2013Multiply-Based Cross-Modal Fusion Network for Multi-Spectral Semantic Segmentation"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5782-4543","authenticated-orcid":false,"given":"Haijun","family":"Liu","sequence":"first","affiliation":[{"name":"School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China"}]},{"given":"Fenglei","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China"}]},{"given":"Zhihong","family":"Zeng","sequence":"additional","affiliation":[{"name":"School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China"}]},{"given":"Xiaoheng","family":"Tan","sequence":"additional","affiliation":[{"name":"School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chen, B., Xia, M., and Huang, J. (2021). MFANet: A Multi-Level Feature Aggregation Network for Semantic Segmentation of Land Cover. Remote Sens., 13.","DOI":"10.3390\/rs13040731"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chen, F., Liu, H., Zeng, Z., Zhou, X., and Tan, X. (2022). BES-Net: Boundary Enhancing Semantic Context Network for High-Resolution Image Semantic Segmentation. Remote Sens., 14.","DOI":"10.3390\/rs14071638"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liu, L., Cao, J., Liu, M., Guo, Y., Chen, Q., and Tan, M. (2020, January 12\u201316). Dynamic Extension Nets for Few-shot Semantic Segmentation. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413915"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2930","DOI":"10.1109\/TMM.2019.2914870","article-title":"Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation","volume":"21","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Gu, Z., Zhou, S., Niu, L., Zhao, Z., and Zhang, L. (2020, January 12\u201316). Context-aware Feature Generation For Zero-shot Semantic Segmentation. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413593"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"115764","DOI":"10.1016\/j.image.2019.115764","article-title":"Convolutional neural networks for multispectral pedestrian detection","volume":"82","author":"Ding","year":"2020","journal-title":"Signal Process. Image Commun."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., and Liu, Z. (2019, January 27\u201328). Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00523"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4414","DOI":"10.1109\/TMM.2020.3042080","article-title":"Parameter Sharing Exploration and Hetero-center Triplet Loss for Visible-Thermal Person Re-Identification","volume":"23","author":"Liu","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1109\/LSP.2021.3065903","article-title":"Strong but Simple Baseline with Dual-Granularity Triplet Loss for Visible-Thermal Person Re-Identification","volume":"28","author":"Liu","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_11","first-page":"3523","article-title":"Image Segmentation Using Deep Learning: A Survey","volume":"44","author":"Minaee","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ha, Q., Watanabe, K., Karasawa, T., Ushiku, Y., and Harada, T. (2017, January 24\u201328). MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. 
Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206396"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2576","DOI":"10.1109\/LRA.2019.2904733","article-title":"RTFNet: RGB-Thermal Fusion Network for Semantic Segmentation of Urban Scenes","volume":"4","author":"Sun","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/j.patrec.2021.03.015","article-title":"Attention fusion network for multi-spectral semantic segmentation","volume":"146","author":"Xu","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1016\/j.inffus.2018.06.005","article-title":"Pedestrian detection with unsupervised multispectral feature learning using deep neural networks","volume":"46","author":"Cao","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_16","unstructured":"Wolpert, A., Teutsch, M., Sarfraz, M.S., and Stiefelhagen, R. (2020). Anchor-free Small-scale Multispectral Pedestrian Detection. arXiv."},{"key":"ref_17","unstructured":"Wagner, J., Fischer, V., Herman, S., and Behnke, S. (2016, January 27\u201329). Multispectral Pedestrian Detection using Deep Fusion Convolutional Neural Networks. Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"4204","DOI":"10.1109\/TIP.2017.2711277","article-title":"Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning","volume":"26","author":"Song","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"K\u00f6nig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., and Teutsch, M. (2017, January 21\u201326). 
Fully Convolutional Region Proposal Networks for Multispectral Person Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.36"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fu, K., Fan, D.P., Ji, G.P., and Zhao, Q. (2020, January 13\u201319). JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00312"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully Convolutional Networks for Semantic Segmentation","volume":"39","author":"Shelhamer","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. 
arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_29","unstructured":"Fu, J., Liu, J., Wang, Y., and Lu, H. (2019). Stacked Deconvolutional Network for Semantic Segmentation. IEEE Trans. Image Process."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Milletari, F., Navab, N., and Ahmadi, S.A. (2016, January 25\u201328). V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Proceedings of the Fourth International Conference on 3D Vision, Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.79"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The Pascal Visual Object Classes Challenge: A Retrospective","volume":"111","author":"Everingham","year":"2014","journal-title":"Int. J. Comput. Vis."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. 
(2016, January 27\u201330). The Cityscapes Dataset for Semantic Urban Scene Understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (2019, January 27\u201328). Gcnet: Non-local networks meet squeeze-excitation networks and beyond. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_34","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 20\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.neucom.2020.01.089","article-title":"Enhancing the discriminative feature learning for visible-thermal cross-modality person re-identification","volume":"398","author":"Liu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, H., Tan, R.J., Han, L., Fan, H., Wang, Z., Du, B., Liu, S., and Liu, Q. (2022). DSSM: A Deep Neural Network with Spectrum Separable Module for Multi-Spectral Remote Sensing Image Segmentation. Remote Sens., 14.","DOI":"10.3390\/rs14040818"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, G., Li, C., Ma, Y., Zheng, A., Tang, J., and Luo, B. (2018). 
RGB-T saliency detection benchmark: Dataset, baselines, analysis and a novel approach. Proceedings of the Chinese Conference on Image and Graphics Technologies, Springer.","DOI":"10.1007\/978-981-13-1702-6_36"},{"key":"ref_39","unstructured":"Tu, Z., Ma, Y., Li, Z., Li, C., Xu, J., and Liu, Y. (2020). RGBT salient object detection: A large-scale dataset and benchmark. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tu, Z., Li, Z., Li, C., Lang, Y., and Tang, J. (2020). Multi-interactive siamese decoder for RGBT salient object detection. arXiv.","DOI":"10.1109\/TIP.2021.3087412"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1109\/TIP.2020.3037470","article-title":"Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection","volume":"30","author":"Wang","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"3376","DOI":"10.1109\/TIP.2021.3060167","article-title":"CDNet: Complementary Depth Network for RGB-D Salient Object Detection","volume":"30","author":"Jin","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016, January 20\u201324). FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture. Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan.","DOI":"10.1007\/978-3-319-54181-5_14"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1109\/TITS.2017.2750080","article-title":"ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation","volume":"19","author":"Romera","year":"2018","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., and Cottrell, G. (2018, January 12\u201315). Understanding Convolution for Semantic Segmentation. Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00163"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1109\/TMM.2019.2924578","article-title":"RGB-T image saliency detection via collaborative graph learning","volume":"22","author":"Tu","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Tu, Z., Xia, T., Li, C., Lu, Y., and Tang, J. (2019, January 28\u201330). M3S-NIR: Multi-modal multi-scale noise-insensitive ranking for RGB-T saliency detection. Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA.","DOI":"10.1109\/MIPR.2019.00032"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Piao, Y., Ji, W., Li, J., Zhang, M., and Lu, H. (2019, January 27\u201328). Depth-induced multi-scale recurrent attention network for saliency detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00735"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Liu, N., Zhang, N., and Han, J. (2020, January 13\u201319). Learning selective self-mutual attention for RGB-D saliency detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01377"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/14\/3368\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:49:39Z","timestamp":1760140179000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/14\/3368"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,13]]},"references-count":50,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["rs14143368"],"URL":"https:\/\/doi.org\/10.3390\/rs14143368","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,13]]}}}