{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T18:20:56Z","timestamp":1780424456372,"version":"3.54.1"},"reference-count":39,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2021,6,18]],"date-time":"2021-06-18T00:00:00Z","timestamp":1623974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018AAA0102600"],"award-info":[{"award-number":["2018AAA0102600"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61906050"],"award-info":[{"award-number":["61906050"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Multispectral pedestrian detection, which combines a color stream and a thermal stream, is essential under insufficient illumination because fusing the two streams can provide complementary information for pedestrian detectors based on deep convolutional neural networks (CNNs). In this paper, we adapted the simple and efficient one-stage detector YOLOv4 to replace the current state-of-the-art two-stage Fast R-CNN for multispectral pedestrian detection, predicting bounding boxes with confidence scores directly. To further improve detection performance, we analyzed existing multispectral fusion methods and proposed a novel multispectral channel feature fusion (MCFF) module that integrates the features from the color and thermal streams according to the illumination conditions. 
Moreover, several fusion architectures built on the MCFF, including Early Fusion, Halfway Fusion, Late Fusion, and Direct Fusion, were designed to transfer feature information from the bottom to the top of the network at different stages. Finally, experimental results on the KAIST and Utokyo pedestrian benchmarks showed that Halfway Fusion achieved the best performance among all architectures and that the MCFF could adaptively fuse features from the two modalities. The log-average miss rates (MR) for the two modalities under the reasonable setting were 4.91% and 23.14%, respectively.<\/jats:p>","DOI":"10.3390\/s21124184","type":"journal-article","created":{"date-parts":[[2021,6,18]],"date-time":"2021-06-18T11:19:20Z","timestamp":1624015160000},"page":"4184","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":64,"title":["Attention Fusion for One-Stage Multispectral Pedestrian Detection"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8409-8089","authenticated-orcid":false,"given":"Zhiwei","family":"Cao","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6334-4044","authenticated-orcid":false,"given":"Huihua","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Juan","family":"Zhao","sequence":"additional","affiliation":[{"name":"China Mobile Research Institute, Beijing 100053, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4541-1706","authenticated-orcid":false,"given":"Shuhong","family":"Guo","sequence":"additional","affiliation":[{"name":"School 
of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9402-0421","authenticated-orcid":false,"given":"Lingqiao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, Z., Wang, K., Li, L., and Wang, F. (2006, January 13\u201315). A Review on Vision-Based Pedestrian Detection for Intelligent Vehicles. Proceedings of the 2006 IEEE International Conference on Vehicular Electronics and Safety, Shanghai, China.","DOI":"10.1109\/ICVES.2006.371554"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast Feature Pyramids for Object Detection","volume":"36","author":"Appel","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Girshick, R.B. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hosang, J.H., Benenson, R., and Schiele, B. (2014, January 1\u20135). How good are detection proposals, really?. 
Proceedings of the British Machine Vision Conference, Nottingham, UK.","DOI":"10.5244\/C.28.24"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhang, L., Lin, L., Liang, X., and He, K. (2016, January 11\u201314). Is Faster R-CNN Doing Well for Pedestrian Detection?. Proceedings of the Computer Vision-ECCV 2016-14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_28"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, S., Benenson, R., Omran, M., Hosang, J.H., and Schiele, B. (2016, January 27\u201330). How Far are We from Solving Pedestrian Detection?. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.141"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Brazil, G., and Liu, X. (2019, January 16\u201320). Pedestrian Detection With Autoregressive Network Phases. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00740"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.neucom.2019.12.110","article-title":"Hybrid channel based pedestrian detection","volume":"389","author":"Tesema","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, W., Liao, S., Ren, W., Hu, W., and Yu, Y. (2019, January 16\u201320). High-Level Semantic Feature Detection: A New Perspective for Pedestrian Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00533"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1109\/TPAMI.2009.122","article-title":"Survey of Pedestrian Detection for Advanced Driver Assistance Systems","volume":"32","author":"Sappa","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","unstructured":"Wiegersma, A.J. 
(2006). Real-Time Pedestrian Detection in FIR and Grayscale Images, Bochum University."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1244","DOI":"10.1007\/s10489-020-01882-2","article-title":"TIRNet: Object detection in thermal infrared images for autonomous driving","volume":"51","author":"Dai","year":"2021","journal-title":"Appl. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hwang, S., Park, J., Kim, N., Choi, Y., and Kweon, I.S. (2015, January 7\u201312). Multispectral pedestrian detection: Benchmark dataset and baseline. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298706"},{"key":"ref_15","unstructured":"Karasawa, T., Watanabe, K., Ha, Q., Tejero-de-Pablos, A., Ushiku, Y., and Harada, T. (2017, January 23\u201327). Multispectral Object Detection for Autonomous Vehicles. Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Brazil, G., Yin, X., and Liu, X. (2017, January 22\u201329). Illuminating Pedestrians via Simultaneous Detection and Segmentation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.530"},{"key":"ref_17","unstructured":"Li, C., Song, D., Tong, R., and Tang, M. (2018, January 3\u20136). Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation. Proceedings of the British Machine Vision Conference 2018, Newcastle, UK."},{"key":"ref_18","unstructured":"Wolpert, A., Teutsch, M., Sarfraz, M.S., and Stiefelhagen, R. (2020, January 7\u201310). Anchor-free Small-scale Multispectral Pedestrian Detection. Proceedings of the 31st British Machine Vision Conference 2020, Online, UK."},{"key":"ref_19","unstructured":"Zheng, Y., Izzat, I.H., and Ziaee, S. (2019). GFD-SSD: Gated Fusion Double SSD for Multispectral Pedestrian Detection. 
arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Nataprawira, J., Gu, Y., Goncharenko, I., and Kamijo, S. (2021). Pedestrian Detection Using Multispectral Images and a Deep Neural Network. Sensors, 21.","DOI":"10.3390\/s21072536"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"103178","DOI":"10.1016\/j.infrared.2019.103178","article-title":"A fast RetinaNet fusion framework for multi-spectral pedestrian detection","volume":"105","author":"Pei","year":"2020","journal-title":"Infrared Phys. Technol."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, J., Zhang, S., Wang, S., and Metaxas, D.N. (2016, January 19\u201322). Multispectral Deep Neural Networks for Pedestrian Detection. Proceedings of the British Machine Vision Conference 2016, York, UK.","DOI":"10.5244\/C.30.73"},{"key":"ref_23","unstructured":"Bochkovskiy, A., Wang, C., and Liao, H.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_25","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.inffus.2018.11.017","article-title":"Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection","volume":"50","author":"Guan","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.patcog.2018.08.005","article-title":"Illumination-aware faster R-CNN for robust multispectral pedestrian detection","volume":"85","author":"Li","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_29","unstructured":"Wagner, J., Fischer, V., Herman, M., and Behnke, S. (2016, January 27\u201329). Multispectral Pedestrian Detection using Deep Fusion Convolutional Neural Networks. Proceedings of the 24th European Symposium on Artificial Neural Networks, Bruges, Belgium."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"K\u00f6nig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., and Teutsch, M. (2017, January 21\u201326). Fully Convolutional Region Proposal Networks for Multispectral Person Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.36"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hou, Y., Song, Y., Hao, X., Shen, Y., and Qian, M. (2017, January 22\u201325). Multispectral pedestrian detection based on deep convolutional neural networks. 
Proceedings of the 2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xiamen, China.","DOI":"10.1109\/ICSPCC.2017.8242507"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"115764","DOI":"10.1016\/j.image.2019.115764","article-title":"Convolutional neural networks for multispectral pedestrian detection","volume":"82","author":"Ding","year":"2020","journal-title":"Signal Process. Image Commun."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1049\/iet-cvi.2018.5315","article-title":"Multi-layer fusion techniques using a CNN for multispectral pedestrian detection","volume":"12","author":"Chen","year":"2018","journal-title":"IET Comput. Vision"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Gani, M.O., Kuiry, S., Das, A., Nasipuri, M., and Das, N. (2021). Multispectral Object Detection with Deep Learning. arXiv.","DOI":"10.1007\/978-3-030-75529-4_9"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","article-title":"Pedestrian Detection: An Evaluation of the State of the Art","volume":"34","author":"Wojek","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.inffus.2018.09.015","article-title":"Cross-modality interactive attention network for multispectral pedestrian detection","volume":"50","author":"Zhang","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., and Liu, Z. 
(2019, October 27\u2013November 2). Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00523"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhou, K., Chen, L., and Cao, X. (2020, January 23\u201328). Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems. Proceedings of the Computer Vision-ECCV 2020-16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58523-5_46"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/12\/4184\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:18:17Z","timestamp":1760163497000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/12\/4184"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,18]]},"references-count":39,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["s21124184"],"URL":"https:\/\/doi.org\/10.3390\/s21124184","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,18]]}}}