{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T01:37:11Z","timestamp":1773106631946,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,4,12]],"date-time":"2023-04-12T00:00:00Z","timestamp":1681257600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Multispectral pedestrian detection via visible and thermal image pairs has received widespread attention in recent years. It provides a promising multi-modality solution to address the challenges of pedestrian detection in low-light environments and occlusion situations. Most existing methods directly blend the results of the two modalities or combine the visible and thermal features via a linear interpolation. However, such fusion strategies tend to extract coarser features corresponding to the positions of different modalities, which may lead to degraded detection performance. To mitigate this, this paper proposes a novel and adaptive cross-modality fusion framework, named Hierarchical Attentive Fusion Network (HAFNet), which fully exploits the multispectral attention knowledge to inspire pedestrian detection in the decision-making process. Concretely, we introduce a Hierarchical Content-dependent Attentive Fusion (HCAF) module to extract top-level features as a guide to pixel-wise blending features of two modalities to enhance the quality of the feature representation and a plug-in multi-modality feature alignment (MFA) block to fine-tune the feature alignment of two modalities. Experiments on the challenging KAIST and CVC-14 datasets demonstrate the superior performance of our method with satisfactory speed.<\/jats:p>","DOI":"10.3390\/rs15082041","type":"journal-article","created":{"date-parts":[[2023,4,13]],"date-time":"2023-04-13T01:35:00Z","timestamp":1681349700000},"page":"2041","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["HAFNet: Hierarchical Attentive Fusion Network for Multispectral Pedestrian Detection"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2552-0902","authenticated-orcid":false,"given":"Peiran","family":"Peng","sequence":"first","affiliation":[{"name":"The School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Tingfa","family":"Xu","sequence":"additional","affiliation":[{"name":"The School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China"},{"name":"Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, Beijing 100081, China"},{"name":"Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401135, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6734-5247","authenticated-orcid":false,"given":"Bo","family":"Huang","sequence":"additional","affiliation":[{"name":"College of Optoelectronic Engineering, Chongqing University, Chongqing 400044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6936-9485","authenticated-orcid":false,"given":"Jianan","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, Beijing 100081, China"},{"name":"Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401135, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kuras, A., Brell, M., Liland, K.H., and Burud, I. (2023). Multitemporal Feature-Level Fusion on Hyperspectral and LiDAR Data in the Urban Environment. Remote Sens., 15.","DOI":"10.3390\/rs15030632"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"You, Y., Cao, J., and Zhou, W. (2020). A survey of change detection methods based on remote sensing images for multi-source and multi-objective scenarios. Remote Sens., 12.","DOI":"10.3390\/rs12152460"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wu, B., Iandola, F., Jin, P.H., and Keutzer, K. (2017, January 21\u201326). Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.60"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"24041","DOI":"10.1007\/s11042-018-5728-8","article-title":"Pedestrian tracking in surveillance video based on modified CNN","volume":"77","author":"Luo","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1109\/TITS.2016.2567418","article-title":"A unified framework for concurrent pedestrian and cyclist detection","volume":"18","author":"Li","year":"2016","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.patcog.2018.08.005","article-title":"Illumination-aware faster R-CNN for robust multispectral pedestrian detection","volume":"85","author":"Li","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fromont, E., Lefevre, S., and Avignon, B. (2020, January 25\u201328). Multispectral fusion for object detection with cyclic fuse-and-refine blocks. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICIP40778.2020.9191080"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.inffus.2018.09.015","article-title":"Cross-modality interactive attention network for multispectral pedestrian detection","volume":"50","author":"Zhang","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1109\/TCSVT.2021.3076466","article-title":"Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection","volume":"32","author":"Kim","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast feature pyramids for object detection","volume":"36","author":"Appel","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhou, K., Chen, L., and Cao, X. (2020, January 23\u201328). Improving multispectral pedestrian detection by addressing modality imbalance problems. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58523-5_46"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, J., Zhang, S., Wang, S., and Metaxas, D.N. (2016). Multispectral deep neural networks for pedestrian detection. arXiv.","DOI":"10.5244\/C.30.73"},{"key":"ref_13","unstructured":"Qingyun, F., Dapeng, H., and Zhaokui, W. (2021). Cross-modality fusion transformer for multispectral object detection. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hwang, S., Park, J., Kim, N., Choi, Y., and So Kweon, I. (2015, January 7\u201312). Multispectral pedestrian detection: Benchmark dataset and baseline. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298706"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Gonz\u00e1lez, A., Fang, Z., Socarras, Y., Serrat, J., V\u00e1zquez, D., Xu, J., and L\u00f3pez, A.M. (2016). Pedestrian detection at day\/night time with visible and FIR cameras: A comparison. Sensors, 16.","DOI":"10.3390\/s16060820"},{"key":"ref_16","unstructured":"Wagner, J., Fischer, V., Herman, M., and Behnke, S. (2016, January 27\u201329). Multispectral pedestrian detection using deep fusion convolutional neural networks. Proceedings of the ESANN, Bruges, Belgium."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Konig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., and Teutsch, M. (2017, January 21\u201326). Fully convolutional region proposal networks for multispectral person detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.36"},{"key":"ref_18","unstructured":"Li, C., Song, D., Tong, R., and Tang, M. (2018). Multispectral pedestrian detection via simultaneous detection and segmentation. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.inffus.2018.11.017","article-title":"Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection","volume":"50","author":"Guan","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, L., Liu, Z., Zhu, X., Song, Z., Yang, X., Lei, Z., and Qiao, H. (2021). Weakly aligned feature fusion for multimodal object detection. arXiv.","DOI":"10.1109\/TNNLS.2021.3105143"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"7846","DOI":"10.1109\/LRA.2021.3099870","article-title":"MLPD: Multi-label pedestrian detector in multispectral domain","volume":"6","author":"Kim","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_25","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1882","DOI":"10.1109\/LSP.2016.2618776","article-title":"Image fusion with convolutional sparse representation","volume":"23","author":"Liu","year":"2016","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_27","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, X., You, A., Zhu, Z., Zhao, H., Yang, M., Yang, K., Tan, S., and Tong, Y. (2020, January 23\u201328). Semantic flow for fast and accurate scene parsing. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_45"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_30","unstructured":"Glorot, X., and Bengio, Y. (2010, January 13\u201315). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy."},{"key":"ref_31","unstructured":"Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., and Liu, Z. (November, January 27). Weakly aligned cross-modal learning for multispectral pedestrian detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Venice, Italy."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yang, X., Qiang, Y., Zhu, H., Wang, C., and Yang, M. (2021). BAANet: Learning bi-directional adaptive attention gates for multispectral pedestrian detection. arXiv.","DOI":"10.1109\/ICRA46639.2022.9811999"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, Q., Chi, Y., Shen, T., Song, J., Zhang, Z., and Zhu, Y. (2022). Improving RGB-infrared object detection by reducing cross-modality redundancy. Remote Sens., 14.","DOI":"10.3390\/rs14092020"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.patcog.2018.03.007","article-title":"Unified multi-spectral pedestrian detection based on probabilistic fusion networks","volume":"80","author":"Park","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_35","unstructured":"Choi, H., Kim, S., Park, K., and Sohn, K. (2016, January 4\u20138). Multi-spectral pedestrian detection based on accumulated object proposal with fully convolutional networks. Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/8\/2041\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:14:50Z","timestamp":1760123690000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/8\/2041"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,12]]},"references-count":35,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["rs15082041"],"URL":"https:\/\/doi.org\/10.3390\/rs15082041","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,12]]}}}