{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T22:11:21Z","timestamp":1773699081577,"version":"3.50.1"},"reference-count":53,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2021,4,26]],"date-time":"2021-04-26T00:00:00Z","timestamp":1619395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["Grant Nos. 51775082, 61976039"],"award-info":[{"award-number":["Grant Nos. 51775082, 61976039"]}]},{"name":"China Fundamental Research Funds for the Central Universities","award":["Grant Nos. DUT19LAB36, DUT20GJ207"],"award-info":[{"award-number":["Grant Nos. DUT19LAB36, DUT20GJ207"]}]},{"name":"Science and Technology Innovation Fund of Dalian","award":["2018J12GX061"],"award-info":[{"award-number":["2018J12GX061"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>There are many small objects in traffic scenes, but due to their low resolution and limited information, their detection is still a challenge. Small object detection is very important for the understanding of traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method in traffic scenes based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB is used to replace the linear fusion module in the object detection network and obtain the final network structure. 
The experimental results show that, compared to the benchmark model YOLOv5s, this method has achieved a higher mean Average Precision (mAP) under the premise of ensuring real-time performance. It increases the mAP of all objects by 0.9 percentage points on the validation set of the traffic scene dataset BDD100K, and at the same time, increases the mAP of small objects by 3.5%.<\/jats:p>","DOI":"10.3390\/s21093031","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T06:19:11Z","timestamp":1619504351000},"page":"3031","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":63,"title":["Small Object Detection in Traffic Scenes Based on Attention Feature Fusion"],"prefix":"10.3390","volume":"21","author":[{"given":"Jing","family":"Lian","sequence":"first","affiliation":[{"name":"Faculty of Vehicle Engineering and Mechanics, School of Automotive Engineering, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuhang","family":"Yin","sequence":"additional","affiliation":[{"name":"Faculty of Vehicle Engineering and Mechanics, School of Automotive Engineering, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2667-8800","authenticated-orcid":false,"given":"Linhui","family":"Li","sequence":"additional","affiliation":[{"name":"Faculty of Vehicle Engineering and Mechanics, School of Automotive Engineering, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenghao","family":"Wang","sequence":"additional","affiliation":[{"name":"Faculty of Vehicle Engineering and Mechanics, School of Automotive Engineering, Dalian University of Technology, Dalian 116024, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yafu","family":"Zhou","sequence":"additional","affiliation":[{"name":"Faculty of Vehicle Engineering and Mechanics, School of Automotive Engineering, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_3","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). SSD: Single shot multibox detector. Computer Vision-ECCV 2016, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_5","unstructured":"Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., and Sun, J. (2017). Light-Head R-CNN: In defense of two-stage object detector. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Qin, Z., Li, Z., Zhang, Z., Bao, Y., Yu, G., Peng, Y., and Sun, J. (2019). ThunderNet: Towards real-time generic object detection. arXiv.","DOI":"10.1109\/ICCV.2019.00682"},{"key":"ref_7","unstructured":"Jocher, G., Stoken, A., Borovec, J., Changyu, L., Hogan, A., Diaconu, L., Ingham, F., Poznanski, J., Fang, J., and Yu, L. (2020, November 16). 
YOLOv5. Available online: http:\/\/doi.org\/10.5281\/zenodo.4154370."},{"key":"ref_8","unstructured":"Fu, C., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). DSSD: Deconvolutional single shot detector. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Singh, B., and Davis, L.S. (2018, January 18\u201323). An Analysis of Scale Invariance in Object Detection\u2014SNIP. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00377"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, Z., Gao, G., Sun, L., and Fang, Z. (2020). HRDNet: High-resolution detection network for small Objects. arXiv.","DOI":"10.1109\/ICME51207.2021.9428241"},{"key":"ref_11","unstructured":"Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, W., Zhao, S., Shen, J., Hoi, S.C.H., and Borji, A. (2019, January 15\u201320). Salient Object Detection with Pyramid Attention and Salient Edges. Proceedings of the 2019 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00154"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. Computer Vision-ECCV 2014, Springer.","DOI":"10.1007\/978-3-319-10578-9_23"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. 
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhu, C., He, Y., and Savvides, M. (2019, January 15\u201320). Feature selective anchor-free module for single-shot object detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00093"},{"key":"ref_20","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as Points. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 13\u201319). EfficientDet: Scalable and Efficient Object Detection. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fan, D., Wang, W., Cheng, M., and Shen, J. (2019, January 15\u201320). Shifting more attention to video salient object detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00875"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J., and Kweon, I.S. (2018). CBAM: Convolutional Block Attention Module. Computer Vision-ECCV 2018, Springer.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Hu, X., and Yang, J. (2019, January 15\u201320). Selective Kernel Networks. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00060"},{"key":"ref_27","first-page":"421","article-title":"Concurrent Spatial and Channel \u2018Squeeze & Excitation\u2019 in Fully Convolutional Networks","volume":"11070","author":"Roy","year":"2018","journal-title":"Med Image Comput. Comput. Assist. 
Interv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (2019, January 27\u201328). GCNet: Non-local networks meet squeeze-excitation networks and beyond. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Wei, Y., Huang, L., Shi, H., Liu, W., and Huang, T.S. (November, January 27). CCNet: Criss-cross attention for semantic segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_32","first-page":"234","article-title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","volume":"9351","author":"Ronneberger","year":"2015","journal-title":"Med Image Comput. Comput. Assist. Interv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.S., and Vasconcelos, N. (2016). A Unified multi-scale deep convolutional neural network for fast object detection. Computer Vision-ECCV 2016, Springer.","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"ref_34","unstructured":"Li, Z., and Zhou, F. (2018). 
FSSD: Feature Fusion Single Shot Multibox Detector. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"4775","DOI":"10.1109\/TGRS.2017.2700322","article-title":"Deep Feature Fusion for VHR Remote Sensing Scene Classification","volume":"55","author":"Chaib","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_36","unstructured":"Lim, J., and Astrid, M. (2019). Small object detection using context and attention. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., and Lin, D. (2019, January 15\u201320). Libra R-CNN: Towards balanced learning for object detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00091"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T., and Le, Q.V. (2019, January 15\u201320). NAS-FPN: Learning scalable feature pyramid architecture for object detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_39","unstructured":"Liu, S., Huang, D., and Wang, Y. (2019). Learning spatial fusion for single-shot object detection. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"194457","DOI":"10.1109\/ACCESS.2020.3031005","article-title":"A combined object detection method with application to pedestrian detection","volume":"8","author":"Gao","year":"2020","journal-title":"IEEE Access"},{"key":"ref_41","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_42","unstructured":"Liu, W., Rabinovich, A., and Berg, A.C. (2015). ParseNet: Looking wider to see better. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dai, Y., Gieseke, F., Oehmcke, S., Wu, Y., and Barnard, K. 
(2021, January 5\u20139). Attentional Feature Fusion. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV).","DOI":"10.1109\/WACV48630.2021.00360"},{"key":"ref_44","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning (ICML)."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (November, January 27). Searching for MobileNetV3. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00140"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., and Yeh, I.H. (2020, January 14\u201319). CSPNet: A new backbone that can enhance learning capability of CNN. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. (2020, January 13\u201319). BDD100K: A diverse driving dataset for heterogeneous multitask learning. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). 
Distance-IoU Loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_50","unstructured":"Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv."},{"key":"ref_51","unstructured":"Goyal, P., Doll\u00e1r, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. (2017). Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv."},{"key":"ref_52","unstructured":"Loshchilov, I., and Hutter, F. (2016). SGDR: Stochastic gradient descent with warm restarts. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. Computer Vision-ECCV 2014, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/3031\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:52:51Z","timestamp":1760161971000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/3031"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,26]]},"references-count":53,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["s21093031"],"URL":"https:\/\/doi.org\/10.3390\/s21093031","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,26]]}}}