{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T15:32:03Z","timestamp":1776785523213,"version":"3.51.2"},"reference-count":53,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T00:00:00Z","timestamp":1744156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China (NSFC)","doi-asserted-by":"publisher","award":["61873274"],"award-info":[{"award-number":["61873274"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>In open and dynamic environments, object detection is affected by rain, fog, snow, and complex lighting conditions, leading to decreased accuracy and posing a threat to driving safety. Infrared images can provide clear images at nighttime or in adverse weather conditions. Combined with the mature development of existing cross-modality object detection technologies, both of them offer support for addressing object detection issues in adverse weather scenarios. This paper establishes a novel dataset named Adverse Weather and Illumination Dataset (AWID) to simulate intricate real-world scenarios and proposes a cross-modal object detection algorithm for adverse weather scenarios in autonomous driving, named CME-YOLO, which is based on RGB and infrared images. It integrates the Cross-Perception Transformer Fusion algorithm, CPTFusion, and the Adaptive upsampling technique, AdSample, to enhance the extraction of detailed information and supplement effective information. CPTFusion fuses features from different modalities through multi-scale feature extraction and optimal fusion strategy computation. AdSample adaptively improves the utilization of key features and the quality of the resulting feature tensor. Experiments on two public datasets and AWID show that CME-YOLO performs optimally, with an mAP50 value on the FLIR dataset 6.8% higher than the state-of-the-art MPFT algorithm, verifying its excellent performance in autonomous driving object detection tasks.<\/jats:p>","DOI":"10.3390\/bdcc9040092","type":"journal-article","created":{"date-parts":[[2025,4,10]],"date-time":"2025-04-10T11:26:41Z","timestamp":1744284401000},"page":"92","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["CME-YOLO: A Cross-Modal Enhanced YOLO Algorithm for Adverse Weather Object Detection in Autonomous Driving"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-7894-177X","authenticated-orcid":false,"given":"Yifei","family":"Yuan","sequence":"first","affiliation":[{"name":"Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Deya Road, Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4568-551X","authenticated-orcid":false,"given":"Yingmei","family":"Wei","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Deya Road, Changsha 410073, China"}]},{"given":"Yanming","family":"Guo","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Deya Road, Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9844-2370","authenticated-orcid":false,"given":"Jiangming","family":"Chen","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Deya Road, Changsha 410073, China"}]},{"given":"Tingshuai","family":"Jiang","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Deya Road, Changsha 410073, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Patil, P.W., Gupta, S., Rana, S., Venkatesh, S., and Murala, S. (2023, January 1\u20136). Multi-weather image restoration via domain translation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01983"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hwang, S., Park, J., Kim, N., Choi, Y., and So Kweon, I. (2015, January 7\u201312). Multispectral pedestrian detection: Benchmark dataset and baseline. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298706"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1007\/s00445-006-0107-0","article-title":"Strombolian explosive styles and source conditions: Insights from thermal (FLIR) video","volume":"69","author":"Patrick","year":"2007","journal-title":"Bull. Volcanol."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Jia, X., Zhu, C., Li, M., Tang, W., and Zhou, W. (2021, January 11\u201317). LLVIP: A visible-infrared paired dataset for low-light vision. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00389"},{"key":"ref_5","unstructured":"Qingyun, F., Dapeng, H., and Zhaokui, W. (2021). Cross-modality fusion transformer for multispectral object detection. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"9984","DOI":"10.1109\/TITS.2023.3266487","article-title":"Multi-modal feature pyramid transformer for rgb-infrared object detection","volume":"24","author":"Zhu","year":"2023","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fromont, E., Lefevre, S., and Avignon, B. (2020, January 25\u201328). Multispectral fusion for object detection with cyclic fuse-and-refine blocks. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Virtual Conference.","DOI":"10.1109\/ICIP40778.2020.9191080"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fromont, E., Lef\u00e8vre, S., and Avignon, B. (2021, January 5\u20139). Guided attentive feature fusion for multispectral pedestrian detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Virtual Conference.","DOI":"10.1109\/WACV48630.2021.00012"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Marathe, A., Ramanan, D., Walambe, R., and Kotecha, K. (2023, January 17\u201324). Wedge: A multi-weather autonomous driving dataset built from generative vision-language models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPRW59228.2023.00334"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"16","DOI":"10.58496\/MJCSC\/2023\/003","article-title":"The ethical implications of DALL-E: Opportunities and challenges","volume":"2023","author":"Zhou","year":"2023","journal-title":"Mesopotamian J. Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. (2020, January 13\u201319). Bdd100k: A diverse driving dataset for heterogeneous multitask learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13640-024-00633-4","article-title":"Impact of LiDAR point cloud compression on 3D object detection evaluated on the KITTI dataset","volume":"2024","author":"Martins","year":"2024","journal-title":"J. Image Video Process."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, J., Deng, W., Peng, B., Liu, T., Wei, Y., and Liu, L. (2023, January 10\u201314). Variational information bottleneck for cross domain object detection. Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia.","DOI":"10.1109\/ICME55011.2023.00381"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pu, N., Chen, W., Liu, Y., Bakker, E.M., and Lew, M.S. (2020, January 12\u201316). Dual gaussian-based variational subspace disentanglement for visible-infrared person re-identification. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413673"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"8249","DOI":"10.1007\/s40747-024-01571-4","article-title":"Hybrid attentive prototypical network for few-shot action recognition","volume":"10","author":"Ruan","year":"2024","journal-title":"Complex Intell. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Kumar, S., and Sharma, S. (2024). An improved deep learning framework for multimodal medical data analysis. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8100125"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4733","DOI":"10.1109\/TIP.2020.2975984","article-title":"MDLatLRR: A novel decomposition method for infrared and visible image fusion","volume":"29","author":"Li","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.inffus.2021.02.008","article-title":"An infrared and visible image fusion method based on multi-scale transformation and norm optimization","volume":"71","author":"Li","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TIM.2020.3022438","article-title":"SEDRFuse: A symmetric encoder\u2013decoder with residual block network for infrared and visible image fusion","volume":"70","author":"Jian","year":"2020","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"4980","DOI":"10.1109\/TIP.2020.2977573","article-title":"DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion","volume":"29","author":"Ma","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_21","first-page":"1","article-title":"DRF: Disentangled representation for visible and infrared image fusion","volume":"70","author":"Xu","year":"2021","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3360","DOI":"10.1109\/TCSVT.2021.3109895","article-title":"UNFusion: A unified multi-scale densely connected network for infrared and visible image fusion","volume":"32","author":"Wang","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.inffus.2021.02.023","article-title":"RFN-Nest: An end-to-end residual fusion network for infrared and visible images","volume":"73","author":"Li","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1109\/TCI.2021.3100986","article-title":"Classification saliency-based rule for visible and infrared image fusion","volume":"7","author":"Xu","year":"2021","journal-title":"IEEE Trans. Comput. Imaging"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1109\/TCI.2021.3119954","article-title":"GAN-FM: Infrared and visible image fusion using GAN with full-scale skip connection and dual Markovian discriminators","volume":"7","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Comput. Imaging"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liu, J., Fan, X., Huang, Z., Wu, G., Liu, R., Zhong, W., and Luo, Z. (2022, January 18\u201324). Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00571"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, D., Liu, J., Fan, X., and Liu, R. (2022). Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration. arXiv.","DOI":"10.24963\/ijcai.2022\/487"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5413","DOI":"10.1109\/TMM.2022.3192661","article-title":"YDTR: Infrared and visible image fusion via Y-shape dynamic transformer","volume":"25","author":"Tang","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Huang, Z., Liu, J., Fan, X., Liu, R., Zhong, W., and Luo, Z. (2022). Reconet: Recurrent correction network for fast and efficient multi-modality image fusion. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-031-19797-0_31"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"770","DOI":"10.1109\/TCSVT.2023.3289170","article-title":"Cross-modal transformers for infrared and visible image fusion","volume":"34","author":"Park","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Althoupety, A., Wang, L.Y., Feng, W.C., and Rekabdar, B. (2024, January 16\u201322). DaFF: Dual Attentive Feature Fusion for Multispectral Pedestrian Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00305"},{"key":"ref_32","unstructured":"Tian, Y., Carballo, A., Li, R., and Takeda, K. (2020). Road scene graph: A semantic graph-based scene representation dataset for intelligent vehicles. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.inffus.2022.03.007","article-title":"PIAFusion: A progressive infrared and visible image fusion network based on illumination aware","volume":"83","author":"Tang","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhang, P., Zhao, J., Wang, D., Lu, H., and Ruan, X. (2022, January 18\u201324). Visible-thermal UAV tracking: A large-scale benchmark and new baseline. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00868"},{"key":"ref_35","unstructured":"Chen, Z., Qian, Y., Yang, X., Wang, C., and Yang, M. (2024). AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection. arXiv."},{"key":"ref_36","unstructured":"Chu, C., Zhmoginov, A., and Sandler, M. (2017). Cyclegan, a master of steganography. arXiv."},{"key":"ref_37","unstructured":"Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). Yolov10: Real-time end-to-end object detection. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1109\/TCSVT.2021.3056725","article-title":"Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion","volume":"32","author":"Liu","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1186","DOI":"10.1109\/TCSVT.2021.3075745","article-title":"Efficient and model-based infrared and visible image fusion via algorithm unrolling","volume":"32","author":"Zhao","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"13463","DOI":"10.1109\/TCSVT.2024.3449638","article-title":"Adjustable Visible and Infrared Image Fusion","volume":"34","author":"Wu","year":"2024","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"126117","DOI":"10.1109\/ACCESS.2022.3226564","article-title":"Multiscale progressive fusion of infrared and visible images","volume":"10","author":"Park","year":"2022","journal-title":"IEEE Access"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TIM.2022.3218574","article-title":"CGTF: Convolution-guided transformer for infrared and visible image fusion","volume":"71","author":"Li","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TIM.2022.3216413","article-title":"SwinFuse: A residual swin transformer fusion network for infrared and visible images","volume":"71","author":"Wang","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1109\/JAS.2022.105686","article-title":"SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer","volume":"9","author":"Ma","year":"2022","journal-title":"IEEE CAA J. Autom. Sin."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ma, C., Wang, X., and Deng, B. (2024, January 1\u20133). MdcFormer: Transformers based on dynamic weights and multi-scale for medical image segmentation. Proceedings of the International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024), Guangzhou, China.","DOI":"10.1117\/12.3033531"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"10270","DOI":"10.1109\/TPAMI.2021.3134200","article-title":"Non-local graph neural networks","volume":"44","author":"Liu","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Park, S., Vien, A.G., and Lee, C. (2022, January 16\u201319). Infrared and visible image fusion using bimodal transformers. Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France.","DOI":"10.1109\/ICIP46576.2022.9897993"},{"key":"ref_48","first-page":"28522","article-title":"Vitae: Vision transformer advanced by exploring intrinsic inductive bias","volume":"34","author":"Xu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"3159","DOI":"10.1109\/TCSVT.2023.3234340","article-title":"DATFuse: Infrared and visible image fusion via dual attention transformer","volume":"33","author":"Tang","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_50","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_51","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv."},{"key":"ref_52","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_53","unstructured":"Jocher, G., Stoken, A., Borovec, J., Changyu, L., Hogan, A., Diaconu, L., Poznanski, J., Yu, L., Rai, P., and Ferriday, R. (2020). ultralytics\/yolov5: v3. 0, Zenodo."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/4\/92\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:13:05Z","timestamp":1760029985000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/4\/92"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,9]]},"references-count":53,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["bdcc9040092"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9040092","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,9]]}}}