{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T16:03:07Z","timestamp":1765382587355,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Major Project of Changsha","award":["KH2401024"],"award-info":[{"award-number":["KH2401024"]}]},{"name":"Research and Development Plan of Key Areas in Hunan Province","award":["2024AQ2017"],"award-info":[{"award-number":["2024AQ2017"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Crack segmentation in images plays a pivotal role in the monitoring of structural surfaces, serving as a fundamental technique for assessing structural integrity. However, existing methods that rely solely on RGB images exhibit high sensitivity to light conditions, which significantly restricts their adaptability in complex environmental scenarios. To address this, we propose a structure-aware progressive multi-modal fusion network (SPMFNet) for RGB-thermal (RGB-T) crack segmentation. The main idea is to integrate complementary information from RGB and thermal images and incorporate structural priors (edge information) to achieve accurate segmentation. Here, to better fuse multi-layer features from different modalities, a progressive multi-modal fusion strategy is designed. In the shallow encoder layers, two gate control attention (GCA) modules are introduced to dynamically regulate the fusion process through a gating mechanism, allowing the network to adaptively integrate modality-specific structural details based on the input. 
In the deeper layers, two attention feature fusion (AFF) modules are employed to enhance semantic consistency by leveraging both local and global attention, thereby facilitating the effective interaction and complementarity of high-level multi-modal features. In addition, edge prior information is introduced to encourage the predicted crack regions to preserve structural integrity, which is constrained by a joint loss of edge-guided loss, multi-scale focal loss, and adaptive fusion loss. Experimental results on publicly available RGB-T crack detection datasets demonstrate that the proposed method outperforms both classical and advanced approaches, verifying the effectiveness of the progressive fusion strategy and the utilization of the structural prior.<\/jats:p>","DOI":"10.3390\/jimaging11110384","type":"journal-article","created":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T14:40:48Z","timestamp":1762180848000},"page":"384","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Structure-Aware Progressive Multi-Modal Fusion Network for RGB-T Crack Segmentation"],"prefix":"10.3390","volume":"11","author":[{"given":"Zhengrong","family":"Yuan","sequence":"first","affiliation":[{"name":"Hunan Architectural Design Institute Group Co., Ltd., Changsha 410208, China"}]},{"given":"Xin","family":"Ding","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Robotics, Hunan University, Changsha 410012, China"}]},{"given":"Xinhong","family":"Xia","sequence":"additional","affiliation":[{"name":"Hunan Architectural Design Institute Group Co., Ltd., Changsha 410208, China"}]},{"given":"Yibin","family":"He","sequence":"additional","affiliation":[{"name":"Hunan Architectural Design Institute Group Co., Ltd., Changsha 410208, China"}]},{"given":"Hui","family":"Fang","sequence":"additional","affiliation":[{"name":"Hunan Architectural Design Institute Group Co., Ltd., Changsha 
410208, China"}]},{"given":"Bo","family":"Yang","sequence":"additional","affiliation":[{"name":"Hunan Architectural Design Institute Group Co., Ltd., Changsha 410208, China"}]},{"given":"Wei","family":"Fu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wang, H., Wu, G., and Liu, Y. (2025). Efficient generative-adversarial U-Net for multi-organ medical image segmentation. J. Imaging, 11.","DOI":"10.3390\/jimaging11010019"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"103606","DOI":"10.1016\/j.autcon.2021.103606","article-title":"Automatic crack classification and segmentation on masonry surfaces using convolutional neural networks and transfer learning","volume":"125","author":"Dais","year":"2021","journal-title":"Autom. Constr."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"132839","DOI":"10.1016\/j.conbuildmat.2023.132839","article-title":"Advanced crack detection and segmentation on bridge decks using deep learning","volume":"400","author":"Tran","year":"2023","journal-title":"Constr. Build. Mater."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"103545","DOI":"10.1016\/j.compind.2021.103545","article-title":"Pixel-level tunnel crack segmentation using a weakly supervised annotation approach","volume":"133","author":"Wang","year":"2021","journal-title":"Comput. Ind."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"105332","DOI":"10.1016\/j.autcon.2024.105332","article-title":"Crackdiffusion: A two-stage semantic segmentation framework for pavement crack combining unsupervised and supervised processes","volume":"160","author":"Han","year":"2024","journal-title":"Autom. Constr."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. 
(2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2015, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"24","DOI":"10.3141\/2645-03","article-title":"Detection of crack growth in asphalt pavement through use of infrared imaging","volume":"2645","author":"Du","year":"2017","journal-title":"Transp. Res. Rec."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"114159","DOI":"10.1016\/j.measurement.2024.114159","article-title":"An attention-based progressive fusion network for pixelwise pavement crack detection","volume":"226","author":"Ma","year":"2024","journal-title":"Measurement"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"9240","DOI":"10.1109\/TITS.2023.3266776","article-title":"CrackFormer network for pavement crack segmentation","volume":"24","author":"Liu","year":"2023","journal-title":"IEEE Trans. Intell. Transp. 
Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"117367","DOI":"10.1016\/j.conbuildmat.2019.117367","article-title":"Image-based concrete crack detection in tunnels using deep fully convolutional networks","volume":"234","author":"Ren","year":"2020","journal-title":"Constr. Build. Mater."},{"key":"ref_13","first-page":"423","article-title":"DeepLabV3+ Based Mask R-CNN for Crack Detection and Segmentation in Concrete Structures","volume":"16","author":"Liu","year":"2025","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"103249","DOI":"10.1016\/j.aei.2025.103249","article-title":"Segmentation refinement of thin cracks with minimum strip cuts","volume":"65","author":"Hou","year":"2025","journal-title":"Adv. Eng. Inf."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, S., Gou, S., Yao, Y., Chen, Y., and Wang, X. (2024, January 18\u201320). Physically informed prior and cross-correlation constraint for fine-grained road crack segmentation. Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China.","DOI":"10.1007\/978-981-97-8502-5_32"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yoon, H., Kim, H.K., and Kim, S. (2025). PPDD: Egocentric crack segmentation in the port pavement with deep learning-based methods. Appl. Sci., 15.","DOI":"10.3390\/app15105446"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sun, W., Liu, X., and Lei, Z. (2025). Research on tunnel crack identification localization and segmentation method based on improved YOLOX and UNETR++. 
Sensors, 25.","DOI":"10.3390\/s25113417"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"117215","DOI":"10.1016\/j.measurement.2025.117215","article-title":"ISTD-CrackNet: Hybrid CNN-transformer models focusing on fine-grained segmentation of multi-scale pavement cracks","volume":"251","author":"Zhang","year":"2025","journal-title":"Measurement"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/s12145-024-01511-3","article-title":"Segmentation of crack disaster images based on feature extraction enhancement and multi-scale fusion","volume":"18","author":"Wang","year":"2025","journal-title":"Earth Sci. Inform."},{"key":"ref_20","first-page":"1117","article-title":"An FCN-based segmentation network for fine linear crack detection and measurement in metals","volume":"16","author":"Si","year":"2025","journal-title":"Int. J. Struct. Integr."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2508919","DOI":"10.1080\/10298436.2025.2508919","article-title":"A U-Net-like full convolutional pavement crack segmentation network based on multi-layer feature fusion","volume":"26","author":"Wang","year":"2025","journal-title":"Int. J. Pavement Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"04025009","DOI":"10.1061\/JCCEE5.CPENG-5926","article-title":"DCNCrack: Pavement crack segmentation based on large-scaled deformable convolutional network","volume":"39","author":"Wang","year":"2025","journal-title":"J. Comput. Civ. Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"8219","DOI":"10.1109\/TITS.2025.3558782","article-title":"MorFormer: Morphology-aware transformer for generalized pavement crack segmentation","volume":"26","author":"Guo","year":"2025","journal-title":"IEEE Trans. Intell. Transp. 
Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"065011","DOI":"10.1088\/1361-6501\/addc0a","article-title":"Deep crack segmentation: A semi-supervised approach with coordinate attention and adaptive loss","volume":"36","author":"Zeng","year":"2025","journal-title":"Meas. Sci. Technol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liang, F., Li, Q., Yu, H., and Wang, W. (2025). CrackCLIP: Adapting vision-language models for weakly supervised crack segmentation. Entropy, 27.","DOI":"10.3390\/e27020127"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"K\u00fct\u00fck, Z., and Algan, G. (2022, January 18\u201323). Semantic segmentation for thermal images: A comparative survey. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00043"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2286461","DOI":"10.1080\/10298436.2023.2286461","article-title":"A complex scene pavement crack semantic segmentation method based on dual-stream framework","volume":"24","author":"Wang","year":"2023","journal-title":"Int. J. Pavement Eng."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ha, Q., Watanabe, K., Karasawa, T., Ushiku, Y., and Harada, T. (2017, January 24\u201328). MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206396"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Zhao, S., Luo, Y., Zhang, D., Huang, N., and Han, J. (2021, January 20\u201325). ABMDRNet: Adaptive-weighted bi-directional modality difference reduction network for RGB-T semantic segmentation. 
Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00266"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"7790","DOI":"10.1109\/TIP.2021.3109518","article-title":"GMNet: Graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation","volume":"30","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhou, W., Dong, S., Xu, C., and Yaguan, Q. (2022, January 22\u201330). Edge-aware guidance fusion network for RGB-thermal scene parsing. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.","DOI":"10.1609\/aaai.v36i3.20269"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"7096","DOI":"10.1109\/TCSVT.2023.3275314","article-title":"MMSMCNet: Modal memory sharing and morphological complementary networks for RGB-T urban scene semantic segmentation","volume":"33","author":"Zhou","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2223","DOI":"10.1109\/TIV.2023.3296219","article-title":"On exploring shape and semantic enhancements for RGB-X semantic segmentation","volume":"9","author":"Yang","year":"2024","journal-title":"IEEE Trans. Intell. Veh."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"7001205","DOI":"10.1109\/LGRS.2023.3322452","article-title":"UTFNet: Uncertainty-guided trustworthy fusion network for RGB-Thermal semantic segmentation","volume":"20","author":"Wang","year":"2023","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"127594","DOI":"10.1016\/j.neucom.2024.127594","article-title":"DHFNet: Decoupled hierarchical fusion network for RGB-T dense prediction tasks","volume":"583","author":"Chen","year":"2024","journal-title":"Neurocomputing"},{"key":"ref_36","unstructured":"Zhao, G., Huang, J., and Peng, T. (October, January 29). Open-vocabulary RGB-Thermal semantic segmentation. Proceedings of the European Conference on Computer Vision, Milan, Italy."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"111951","DOI":"10.1016\/j.patcog.2025.111951","article-title":"Implicit alignment and query refinement for RGB-T semantic segmentation","volume":"169","author":"Liu","year":"2026","journal-title":"Pattern Recognit."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1686","DOI":"10.1109\/TIP.2025.3544484","article-title":"MiLNet: Multiplex interactive learning network for RGB-T semantic segmentation","volume":"34","author":"Liu","year":"2025","journal-title":"IEEE Trans. Image Process."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.inffus.2021.12.004","article-title":"Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and RGB image fusion network","volume":"82","author":"Tang","year":"2022","journal-title":"Inform. Fusion"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 13\u201318). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Dai, Y., Gieseke, F., Oehmcke, S., and Wu, Y. (2021, January 5\u20139). Attentional feature fusion. 
Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Virtual.","DOI":"10.1109\/WACV48630.2021.00360"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"22145","DOI":"10.1109\/TITS.2022.3142393","article-title":"Asphalt pavement crack detection based on convolutional neural network and infrared thermography","volume":"23","author":"Liu","year":"2022","journal-title":"IEEE Trans. Intell. Transport. Syst."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"105213","DOI":"10.1016\/j.autcon.2023.105213","article-title":"Crack detection of masonry structure based on thermal and visible image fusion and semantic segmentation","volume":"158","author":"Huang","year":"2024","journal-title":"Autom. Constr."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"6348","DOI":"10.1109\/TMM.2023.3349072","article-title":"Context-aware interaction network for RGB-T semantic segmentation","volume":"26","author":"Lv","year":"2024","journal-title":"IEEE Trans. Multimedia"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"12474","DOI":"10.1109\/TITS.2025.3555617","article-title":"Transferring prior thermal knowledge for snowy urban scene semantic segmentation","volume":"26","author":"Guo","year":"2025","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_46","first-page":"1","article-title":"SFAF-MA: Spatial feature aggregation and fusion with modality adaptation for RGB-thermal semantic segmentation","volume":"72","author":"He","year":"2023","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"7737","DOI":"10.1109\/TCSVT.2023.3281419","article-title":"SGFNet: Semantic-guided fusion network for RGB-thermal semantic segmentation","volume":"33","author":"Wang","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_48","unstructured":"Xiao, R., and Chen, X. (2024). 
IRFusionFormer: Enhancing Pavement Crack Segmentation with RGB-T Fusion and Topological-Based Loss. arXiv."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/11\/384\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T15:29:29Z","timestamp":1762183769000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/11\/384"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,1]]},"references-count":48,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["jimaging11110384"],"URL":"https:\/\/doi.org\/10.3390\/jimaging11110384","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,1]]}}}