{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T10:27:56Z","timestamp":1781519276060,"version":"3.54.1"},"reference-count":28,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2021,11,28]],"date-time":"2021-11-28T00:00:00Z","timestamp":1638057600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Zhejiang Provincial Natural Science Foundation of China","award":["No. LZ20F010002"],"award-info":[{"award-number":["No. LZ20F010002"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 61703131"],"award-info":[{"award-number":["No. 61703131"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Key Research and Development Program of Zhejiang Province","award":["No. 2021C03029"],"award-info":[{"award-number":["No. 2021C03029"]}]},{"name":"National\u2002Defense\u2002Basic\u2002Scientific\u2002Research\u2002program\u2002of\u2002China","award":["No. JCKY2018415C004"],"award-info":[{"award-number":["No. JCKY2018415C004"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Attention mechanisms have demonstrated great potential in improving the performance of deep convolutional neural networks (CNNs). However, many existing methods dedicate to developing channel or spatial attention modules for CNNs with lots of parameters, and complex attention modules inevitably affect the performance of CNNs. During our experiments of embedding Convolutional Block Attention Module (CBAM) in light-weight model YOLOv5s, CBAM does influence the speed and increase model complexity while reduce the average precision, but Squeeze-and-Excitation (SE) has a positive impact in the model as part of CBAM. To replace the spatial attention module in CBAM and offer a suitable scheme of channel and spatial attention modules, this paper proposes one Spatio-temporal Sharpening Attention Mechanism (SSAM), which sequentially infers intermediate maps along channel attention module and Sharpening Spatial Attention (SSA) module. By introducing sharpening filter in spatial attention module, we propose SSA module with low complexity. We try to find a scheme to combine our SSA module with SE module or Efficient Channel Attention (ECA) module and show best improvement in models such as YOLOv5s and YOLOv3-tiny. Therefore, we perform various replacement experiments and offer one best scheme that is to embed channel attention modules in backbone and neck of the model and integrate SSAM into YOLO head. We verify the positive effect of our SSAM on two general object detection datasets VOC2012 and MS COCO2017. One for obtaining a suitable scheme and the other for proving the versatility of our method in complex scenes. Experimental results on the two datasets show obvious promotion in terms of average precision and detection performance, which demonstrates the usefulness of our SSAM in light-weight YOLO models. Furthermore, visualization results also show the advantage of enhancing positioning ability with our SSAM.<\/jats:p>","DOI":"10.3390\/s21237949","type":"journal-article","created":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T01:45:02Z","timestamp":1638323102000},"page":"7949","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["One Spatio-Temporal Sharpening Attention Mechanism for Light-Weight YOLO Models Based on Sharpening Spatial Attention"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1739-9201","authenticated-orcid":false,"given":"Mengfan","family":"Xue","sequence":"first","affiliation":[{"name":"School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Minghao","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China"},{"name":"HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dongliang","family":"Peng","sequence":"additional","affiliation":[{"name":"School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yunfei","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huajie","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,28]]},"reference":[{"key":"ref_1","first-page":"12","article-title":"Survey of object detection algorithms for deep convolutional neural networks","volume":"56","author":"Huang","year":"2020","journal-title":"Comput. Eng. Appl."},{"key":"ref_2","first-page":"44","article-title":"Review on Single-Stage Object Detection Algorithm Based on Deep Learning","volume":"27","author":"Liu","year":"2020","journal-title":"Aero Weapon."},{"key":"ref_3","first-page":"56","article-title":"A survey of target detection based on deep learning","volume":"27","author":"Lu","year":"2020","journal-title":"Electron. Opt. Control."},{"key":"ref_4","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Duan, K.W., Bai, S., Xie, L.X., Qi, H., Huang, Q., and Tian, Q. (2019, January 27\u201328). CenterNet: Keypoint triplets for object detection. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017). Yolo9000: Better, faster, stronger. arXiv.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single shot multibox detector. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K.M., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_10","unstructured":"Tian, Z., Shen, C.H., Chen, H., and He, T. (November, January 27). FCOS: Fully convolutional one-stage object detection. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","unstructured":"Dai, J.F., Li, Y., He, K.M., and Sun, J. (2016, January 4\u20139). R-FCN: Object detection via region-based fully convolutional networks. Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, Q.L., Wu, B.G., Zhu, P.F., Li, P., Zuo, W., and Hu, Q. (2020, January 14\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Woo, S.H., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional block attention module. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_17","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv."},{"key":"ref_18","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H.F., Shi, J., and Jia, J. (2018, January 18\u201322). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.-Y., Hsieh, J.-W., and Yeh, I.-H. (2020, January 14\u201319). CSPNet: A New Backbone that can Enhance Learning Capability of CNN. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_22","unstructured":"Wang, Q.L., Gao, Z.L., Xie, J.T., Zuo, W.M., and Li, P.H. (2018, January 3\u20138). Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. Proceedings of the 32nd Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_23","unstructured":"Hu, J., Shen, L., Albanie, S., Sun, G., and Vedaldi, A. (2018, January 3\u20138). Gather-excite: Exploiting feature context in convolutional neural networks. Proceedings of the 32nd Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J.R., Lin, S., Wei, F., and Hu, H. (2019, January 27\u201328). GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_25","unstructured":"Ding, X.H., Guo, Y.C., Ding, G.G., and Han, J. (November, January 29). ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H.J., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Xie, S.N., Girshick, R., Dollar, P., Tu, Z.W., and He, K.M. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/7949\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:37:04Z","timestamp":1760168224000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/7949"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,28]]},"references-count":28,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["s21237949"],"URL":"https:\/\/doi.org\/10.3390\/s21237949","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,28]]}}}