{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T03:11:24Z","timestamp":1778037084267,"version":"3.51.4"},"reference-count":72,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2024,9,24]],"date-time":"2024-09-24T00:00:00Z","timestamp":1727136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the rapid growth in demand for security surveillance, assisted driving, and remote sensing, object detection networks with robust environmental perception and high detection accuracy have become a research focus. However, single-modality image detection technologies face limitations in environmental adaptability, often affected by factors such as lighting conditions, fog, rain, and obstacles like vegetation, leading to information loss and reduced detection accuracy. We propose an object detection network that integrates features from visible light and infrared images\u2014IV-YOLO\u2014to address these challenges. This network is based on YOLOv8 (You Only Look Once v8) and employs a dual-branch fusion structure that leverages the complementary features of infrared and visible light images for target detection. We designed a Bidirectional Pyramid Feature Fusion structure (Bi-Fusion) to effectively integrate multimodal features, reducing errors from feature redundancy and extracting fine-grained features for small object detection. Additionally, we developed a Shuffle-SPP structure that combines channel and spatial attention to enhance the focus on deep features and extract richer information through upsampling. Regarding model optimization, we designed a loss function tailored for multi-scale object detection, accelerating the convergence speed of the network during training. Compared with the current state-of-the-art Dual-YOLO model, IV-YOLO achieves mAP improvements of 2.8%, 1.1%, and 2.2% on the Drone Vehicle, FLIR, and KAIST datasets, respectively. On the Drone Vehicle and FLIR datasets, IV-YOLO has a parameter count of 4.31 M and achieves a frame rate of 203.2 fps, significantly outperforming YOLOv8n (5.92 M parameters, 188.6 fps on the Drone Vehicle dataset) and YOLO-FIR (7.1 M parameters, 83.3 fps on the FLIR dataset), which had previously achieved the best performance on these datasets. This demonstrates that IV-YOLO achieves higher real-time detection performance while maintaining lower parameter complexity, making it highly promising for applications in autonomous driving, public safety, and beyond.<\/jats:p>","DOI":"10.3390\/s24196181","type":"journal-article","created":{"date-parts":[[2024,9,24]],"date-time":"2024-09-24T10:41:47Z","timestamp":1727174507000},"page":"6181","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["IV-YOLO: A Lightweight Dual-Branch Object Detection Network"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2383-6367","authenticated-orcid":false,"given":"Dan","family":"Tian","sequence":"first","affiliation":[{"name":"Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"Yan","sequence":"additional","affiliation":[{"name":"Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dong","family":"Zhou","sequence":"additional","affiliation":[{"name":"Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8667-0315","authenticated-orcid":false,"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenshuai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhang, H., Xu, H., Xiao, Y., Guo, X., and Ma, J. (2020, January 7\u201312). Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6975"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Huang, X., Liu, M.Y., Belongie, S., and Kautz, J. (2018, January 8\u201314). Multimodal unsupervised image-to-image translation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01219-9_11"},{"key":"ref_4","unstructured":"Qu, L., Liu, S., Wang, M., and Song, Z. (March, January 22). Transmef: A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning. Proceedings of the AAAI Conference on Artificial Intelligence, Online."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Munsif, M., Khan, N., Hussain, A., Kim, M.J., and Baik, S.W. (IEEE Trans. Ind. Inform., 2024). Darkness-Adaptive Action Recognition: Leveraging Efficient Tubelet Slow-Fast Network for Industrial Applications, IEEE Trans. Ind. Inform., early access.","DOI":"10.1109\/TII.2024.3431070"},{"key":"ref_6","first-page":"1","article-title":"Attention-based deep learning framework for action recognition in a dark environment","volume":"14","author":"Munsif","year":"2024","journal-title":"Hum. Centric Comput. Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wen, X., Wang, F., Feng, Z., Lin, J., and Shi, C. (2023, January 26\u201328). MDFN: Multi-scale Dense Fusion Network for RGB-D Salient Object Detection. Proceedings of the 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China.","DOI":"10.1109\/ICIBA56860.2023.10164875"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1016\/j.inffus.2021.10.006","article-title":"Multi-exposure image fusion via deep perceptual enhancement","volume":"79","author":"Han","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hou, J., Zhang, D., Wu, W., Ma, J., and Zhou, H. (2021). A generative adversarial network for infrared and visible image fusion based on semantic segmentation. Entropy, 23.","DOI":"10.3390\/e23030376"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ding, J., Xue, N., Long, Y., Xia, G.S., and Lu, Q. (2019, January 16\u201317). Learning RoI transformer for oriented object detection in aerial images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00296"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"French, G., Finlayson, G., and Mackiewicz, M. (2018). Multi-spectral pedestrian detection via image fusion and deep neural networks. J. Imaging Sci. Technol., 176\u2013181.","DOI":"10.2352\/J.lmagingSci.Technol.2018.62.5.050406"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Bao, C., Cao, J., Hao, Q., Cheng, Y., Ning, Y., and Zhao, T. (2023). Dual-YOLO architecture from infrared and visible images for object detection. Sensors, 23.","DOI":"10.3390\/s23062934"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.inffus.2018.11.017","article-title":"Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection","volume":"50","author":"Guan","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1109\/TCSVT.2021.3076466","article-title":"Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection","volume":"32","author":"Kim","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhu, J., Zhang, X., Dong, F., Yan, S., Meng, X., Li, Y., and Tan, P. (2022, January 15\u201317). Transformer-based Adaptive Interactive Promotion Network for RGB-T Salient Object Detection. Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China.","DOI":"10.1109\/CCDC55256.2022.10034159"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ye, Z., Peng, Y., Han, B., Hao, H., and Liu, W. (2024, January 10\u201312). Unmanned Aerial Vehicle Target Detection Algorithm Based on Infrared Visible Light Feature Level Fusion. Proceedings of the 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), Guangzhou, China.","DOI":"10.1109\/CISCE62493.2024.10653241"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gao, Y., Cheng, Z., Su, H., Ji, Z., Hu, J., and Peng, Z. (2023, January 20\u201322). Infrared and Visible Image Fusion Method based on Residual Network. Proceedings of the 2023 4th International Conference on Computer Engineering and Intelligent Control (ICCEIC), Guangzhou, China.","DOI":"10.1109\/ICCEIC60201.2023.10426703"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ataman, F.C., and Akar, G.B. (2021, January 19\u201322). Visible and infrared image fusion using encoder-decoder network. Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA.","DOI":"10.1109\/ICIP42928.2021.9506740"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1186","DOI":"10.1109\/TCSVT.2021.3075745","article-title":"Efficient and model-based infrared and visible image fusion via algorithm unrolling","volume":"32","author":"Zhao","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wang, S., Li, X., Huo, W., and You, J. (2022, January 22\u201324). Fusion of infrared and visible images based on improved generative adversarial networks. Proceedings of the 2022 3rd International Conference on Information Science, Parallel and Distributed Systems (ISPDS), Guangzhou, China.","DOI":"10.1109\/ISPDS56360.2022.9874034"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"4050","DOI":"10.1109\/TIP.2022.3180210","article-title":"Fine-grained multilevel fusion for anti-occlusion monocular 3d object detection","volume":"31","author":"Liu","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"8586","DOI":"10.1109\/JSTARS.2022.3208928","article-title":"Detecting Fine-Grained Airplanes in SAR Images with Sparse Attention-Guided Pyramid and Class-Balanced Data Augmentation","volume":"15","author":"Bao","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"28208","DOI":"10.1109\/ACCESS.2023.3258400","article-title":"Accelerating deep neural networks for efficient scene understanding in multi-modal automotive applications","volume":"11","author":"Nousias","year":"2023","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Poeppel, A., Eym\u00fcller, C., and Reif, W. (2023, January 10\u201312). SensorClouds: A Framework for Real-Time Processing of Multi-modal Sensor Data for Human-Robot-Collaboration. Proceedings of the 2023 9th International Conference on Automation, Robotics and Applications (ICARA), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICARA56516.2023.10125740"},{"key":"ref_25","unstructured":"(2023, November 15). Ultralytics. YOLOv8. Available online: https:\/\/github.com\/ultralytics\/ultralytics."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hwang, S., Park, J., Kim, N., Choi, Y., and So Kweon, I. (2015, January 7\u201312). Multispectral pedestrian detection: Benchmark dataset and baseline. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298706"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"6700","DOI":"10.1109\/TCSVT.2022.3168279","article-title":"Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning","volume":"32","author":"Sun","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_28","unstructured":"(2022, January 19). FLIR ADAS Dataset. Available online: https:\/\/www.flir.com\/oem\/adas\/adas-dataset-form."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings, Part I 14.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_33","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_34","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_35","unstructured":"Jocher, G. (2024, September 20). YOLOv5 by Ultralytics. Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_36","unstructured":"Li, C.Y., Wang, N., Mu, Y.Q., Wang, J., and Liao, H.Y.M. (2022). YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv.","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"ref_38","unstructured":"Wang, C.Y., Yeh, I.H., and Liao, H.Y.M. (2024). Yolov9: Learning what you want to learn using programmable gradient information. arXiv."},{"key":"ref_39","unstructured":"Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). Yolov10: Real-time end-to-end object detection. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Farahnakian, F., and Heikkonen, J. (2020). Deep learning based multi-modal fusion architectures for maritime vessel detection. Remote Sens., 12.","DOI":"10.3390\/rs12162509"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Li, R., Peng, Y., and Yang, Q. (2023, January 15\u201317). Fusion enhancement: UAV target detection based on multi-modal GAN. Proceedings of the 2023 IEEE 7th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China.","DOI":"10.1109\/ITOEC57671.2023.10291920"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Pahde, F., Puscas, M., Klein, T., and Nabi, M. (2021, January 5\u20139). Multimodal prototypical networks for few-shot learning. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Virtual Event.","DOI":"10.1109\/WACV48630.2021.00269"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Liang, P., Jiang, J., Liu, X., and Ma, J. (2022, January 23\u201327). Fusion from decomposition: A self-supervised decomposition approach for image fusion. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19797-0_41"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1244","DOI":"10.1007\/s10489-020-01882-2","article-title":"TIRNet: Object detection in thermal infrared images for autonomous driving","volume":"51","author":"Dai","year":"2021","journal-title":"Appl. Intell."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Wang, Q., Chi, Y., Shen, T., Song, J., Zhang, Z., and Zhu, Y. (2022). Improving RGB-infrared object detection by reducing cross-modality redundancy. Remote Sens., 14.","DOI":"10.3390\/rs14092020"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"15808","DOI":"10.1109\/TITS.2022.3145476","article-title":"Thermal infrared image colorization for nighttime driving scenes with top-down guided attention","volume":"23","author":"Luo","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"104517","DOI":"10.1016\/j.dsp.2024.104517","article-title":"NLNet: A narrow-channel lightweight network for finger multimodal recognition","volume":"150","author":"Guo","year":"2024","journal-title":"Digit. Signal Process."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (2019, January 27\u201328). Gcnet: Non-local networks meet squeeze-excitation networks and beyond. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Gao, Z., Xie, J., Wang, Q., and Li, P. (2019, January 15\u201320). Global second-order pooling convolutional networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00314"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient channel attention for deep convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Zhang, Q.L., and Yang, Y.B. (2021, January 6\u201311). Sa-net: Shuffle attention for deep convolutional neural networks. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414568"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Qin, Z., Zhang, P., Wu, F., and Li, X. (2021, January 11\u201317). Fcanet: Frequency channel attention networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00082"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Salman, H., Parks, C., Swan, M., and Gauch, J. (2023, January 15\u201318). Orthonets: Orthogonal channel attention networks. Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy.","DOI":"10.1109\/BigData59044.2023.10386646"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201323). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Li, W., Chen, Y., Hu, K., and Zhu, J. (2022, January 18\u201324). Oriented reppoints for aerial object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00187"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"15269","DOI":"10.1109\/JSTARS.2024.3447649","article-title":"YOLOFIV: Object detection algorithm for around-the-clock aerial remote sensing images by fusing infrared and visible features","volume":"17","author":"Wang","year":"2024","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019). FCOS: Fully convolutional one-stage object detection. arXiv.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Zhang, S., Wen, L., Bian, X., Lei, Z., and Li, S.Z. (2018, January 18\u201323). Single-shot refinement neural network for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00442"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"141861","DOI":"10.1109\/ACCESS.2021.3120870","article-title":"Yolo-firi: Improved yolov5 for infrared image object detection","volume":"9","author":"Li","year":"2021","journal-title":"IEEE Access"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"2289","DOI":"10.1007\/s13369-021-06181-7","article-title":"IARet: A lightweight multiscale infrared aerocraft recognition algorithm","volume":"47","author":"Jiang","year":"2022","journal-title":"Arab. J. Sci. Eng."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"3420","DOI":"10.1109\/TMM.2022.3160589","article-title":"Confidence-aware fusion using dempster-shafer theory for multispectral pedestrian detection","volume":"25","author":"Li","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., and Sun, J. (2021, January 20\u201325). You only look one-level feature. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01284"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wu, Y., Han, X., and Shi, J. (2020, January 23\u201328). Forkgan: Seeing into the rainy night. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. Proceedings, Part III 16.","DOI":"10.1007\/978-3-030-58580-8_10"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Anoosheh, A., Sattler, T., Timofte, R., Pollefeys, M., and Van Gool, L. (2019, January 20\u201324). Night-to-day image translation for retrieval-based localization. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, BC, Canada.","DOI":"10.1109\/ICRA.2019.8794387"},{"key":"ref_72","unstructured":"Liu, M.Y., Breuel, T., and Kautz, J. (2017). Unsupervised image-to-image translation networks. Adv. Neural Inf. Process. Syst., 30."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/19\/6181\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:01:51Z","timestamp":1760112111000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/19\/6181"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,24]]},"references-count":72,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["s24196181"],"URL":"https:\/\/doi.org\/10.3390\/s24196181","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,24]]}}}