{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,24]],"date-time":"2026-06-24T06:31:39Z","timestamp":1782282699643,"version":"3.54.5"},"reference-count":48,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T00:00:00Z","timestamp":1697500800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020YFB1713300"],"award-info":[{"award-number":["2020YFB1713300"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2021YFB2601000"],"award-info":[{"award-number":["2021YFB2601000"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2023-JC-QN-0664"],"award-info":[{"award-number":["2023-JC-QN-0664"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2023JBGS-13"],"award-info":[{"award-number":["2023JBGS-13"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Shaanxi Province","award":["2020YFB1713300"],"award-info":[{"award-number":["2020YFB1713300"]}]},{"name":"Natural Science Foundation of Shaanxi Province","award":["2021YFB2601000"],"award-info":[{"award-number":["2021YFB2601000"]}]},{"name":"Natural Science Foundation of Shaanxi Province","award":["2023-JC-QN-0664"],"award-info":[{"award-number":["2023-JC-QN-0664"]}]},{"name":"Natural Science Foundation of Shaanxi Province","award":["2023JBGS-13"],"award-info":[{"award-number":["2023JBGS-13"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["2020YFB1713300"],"award-info":[{"award-number":["2020YFB1713300"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["2021YFB2601000"],"award-info":[{"award-number":["2021YFB2601000"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["2023-JC-QN-0664"],"award-info":[{"award-number":["2023-JC-QN-0664"]}]},{"name":"Key Research and Development Program of Shaanxi Province","award":["2023JBGS-13"],"award-info":[{"award-number":["2023JBGS-13"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection is one of the fundamental tasks in computer vision, holding immense significance in the realm of intelligent mobile scenes. This paper proposes a hybrid cross-feature interaction (HCFI) attention module for object detection in intelligent mobile scenes. Firstly, the paper introduces multiple kernel (MK) spatial pyramid pooling (SPP) based on SPP and improves the channel attention using its structure. This results in a hybrid cross-channel interaction (HCCI) attention module with better cross-channel interaction performance. Additionally, we bolster spatial attention by incorporating dilated convolutions, leading to the creation of the cross-spatial interaction (CSI) attention module with superior cross-spatial interaction performance. By seamlessly combining the above two modules, we achieve an improved HCFI attention module without resorting to computationally expensive operations. Through a series of experiments involving various detectors and datasets, our proposed method consistently demonstrates superior performance. This results in a performance improvement of 1.53% for YOLOX on COCO and a performance boost of 2.05% for YOLOv5 on BDD100K. Furthermore, we propose a solution that combines HCCI and HCFI to address the challenge of extremely small output feature layers in detectors, such as SSD. The experimental results indicate that the proposed method significantly improves the attention capability of object detection in intelligent mobile scenes.<\/jats:p>","DOI":"10.3390\/rs15204991","type":"journal-article","created":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T08:10:19Z","timestamp":1697530219000},"page":"4991","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Hybrid Cross-Feature Interaction Attention Module for Object Detection in Intelligent Mobile Scenes"],"prefix":"10.3390","volume":"15","author":[{"given":"Di","family":"Tian","sequence":"first","affiliation":[{"name":"Mechanical Engineering College, Xi\u2019an Shiyou University, Xi\u2019an 710065, China"},{"name":"School of Automobile, Chang\u2019an University, Xi\u2019an 710064, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yi","family":"Han","sequence":"additional","affiliation":[{"name":"School of Automobile, Chang\u2019an University, Xi\u2019an 710064, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3643-4276","authenticated-orcid":false,"given":"Yongtao","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Automobile, Chang\u2019an University, Xi\u2019an 710064, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiabo","family":"Li","sequence":"additional","affiliation":[{"name":"Mechanical Engineering College, Xi\u2019an Shiyou University, Xi\u2019an 710065, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ping","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Automobile, Chang\u2019an University, Xi\u2019an 710064, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ming","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,10,17]]},"reference":[{"key":"ref_1","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/JPROC.2023.3238524","article-title":"Object Detection in 20 Years: A Survey","volume":"111","author":"Zou","year":"2019","journal-title":"Proc. IEEE"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s11263-019-01247-4","article-title":"Deep Learning for Generic Object Detection: A Survey","volume":"128","author":"Liu","year":"2018","journal-title":"Int. J. Comput. Vis."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3212","DOI":"10.1109\/TNNLS.2018.2876865","article-title":"Object Detection with Deep Learning: A Review","volume":"30","author":"Zhao","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"103910","DOI":"10.1016\/j.imavis.2020.103910","article-title":"Recent advances in small object detection based on deep learning: A review","volume":"97","author":"Tong","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"15898","DOI":"10.1109\/TITS.2022.3146271","article-title":"ID-YOLO: Real-Time Salient Object Detection Based on the Driver\u2019s Fixation Region","volume":"23","author":"Qin","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5410049","DOI":"10.1155\/2021\/5410049","article-title":"A Review of Intelligent Driving Pedestrian Detection Based on Deep Learning","volume":"2021","author":"Tian","year":"2021","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"25345","DOI":"10.1109\/TITS.2022.3158253","article-title":"Edge YOLO: Real-Time Intelligent Object Detection System Based on Edge-Cloud Cooperation in Autonomous Vehicles","volume":"23","author":"Liang","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, X., Ban, Y., Guo, H., and Hong, L. (2019\u20132, January 28). Deep Learning Model for Target Detection in Remote Sensing Images Fusing Multilevel Features. Proceedings of the IGARSS 2019\u20142019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8898759"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Han, X., Zhong, Y., and Zhang, L. (2017). An Efficient and Robust Integrated Geospatial Object Detection Framework for High Spatial Resolution Remote Sensing Imagery. Remote Sens., 9.","DOI":"10.3390\/rs9070666"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Guo, W., Yang, W., Zhang, H., and Hua, G. (2018). Geospatial Object Detection in High Resolution Satellite Images Based on Multi-Scale Convolutional Neural Network. Remote Sens., 10.","DOI":"10.3390\/rs10010131"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"638182","DOI":"10.3389\/fonc.2021.638182","article-title":"Artificial Convolutional Neural Network in Object Detection and Semantic Segmentation for Medical Imaging Analysis","volume":"11","author":"Yang","year":"2021","journal-title":"Front. Oncol."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Rezaei, M., Yang, H., and Meinel, C. (2018, January 8\u201313). Instance Tumor Segmentation using Multitask Convolutional Neural Network. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil.","DOI":"10.1109\/IJCNN.2018.8489105"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1097\/BRS.0000000000003749","article-title":"Automated Detection of Spinal Schwannomas Utilizing Deep Learning Based on Object Detection from MRI","volume":"46","author":"Ito","year":"2020","journal-title":"Spine"},{"key":"ref_17","unstructured":"Sande, K.E., Uijlings, J.R., Gevers, T., and Smeulders, A. (2011, January 6\u201313). Segmentation as selective search for object recognition. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.sigpro.2014.08.004","article-title":"Flexible sliding windows with adaptive pixel strides","volume":"110","author":"Jiang","year":"2015","journal-title":"Signal Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","article-title":"PCT: Point cloud transformer","volume":"7","author":"Guo","year":"2020","journal-title":"Comput. Vis. Media"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Tay, F., Feng, J., and Yan, S. (2021, January 10\u201317). Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N.C., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 10\u201317). CvT: Introducing Convolutions to Vision Transformers. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P.F., Li, P., Zuo, W., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Hu","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","unstructured":"Woo, S., Park, J., Lee, J., and Koeon, I. (2018). European Conference on Computer Vision, Springer."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","unstructured":"Viola, P.A., and Jones, M.J. (2001, January 8\u201314). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, Kauai, HI, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Patle, A., and Chouhan, D.S. (2013, January 23\u201325). SVM kernel functions for classification. Proceedings of the 2013 International Conference on Advances in Technology and Engineering (ICATE), Mumbai, India.","DOI":"10.1109\/ICAdTE.2013.6524743"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Girshick, R.B., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Girshick, R.B. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., and Berg, A. (2016). European Conference on Computer Vision, Springer."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_34","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_35","unstructured":"Bochkovskiy, A., Wang, C., and Liao, H.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_36","unstructured":"Jocher, G. (2023, June 05). YOLOv5. Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"642","DOI":"10.1007\/s11263-019-01204-1","article-title":"CornerNet: Detecting Objects as Paired Keypoints","volume":"128","author":"Law","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: Exceeding YOLO Series in 2021. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1159","DOI":"10.1177\/0278364917726587","article-title":"Survey of recent advances in 3D visual attention for robotics","volume":"36","author":"Potapova","year":"2017","journal-title":"Int. J. Robot. Res."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1007\/s11263-017-1042-6","article-title":"Attentive Systems: A Survey","volume":"126","author":"Nguyen","year":"2018","journal-title":"Int. J. Comput. Vis."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"5411","DOI":"10.1007\/s00500-020-05539-7","article-title":"Cross-modality Co-attention Networks for Visual Question Answering","volume":"25","author":"Han","year":"2021","journal-title":"Soft Comput."},{"key":"ref_42","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015, January 7\u201312). Spatial Transformer Networks. Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Montreal, QC Canada."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R.B., Gupta, A.K., and He, K. (2018, January 18\u201323). Non-local Neural Networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Gao, Z., Xie, J., Wang, Q., and Li, P. (2019, January 15\u201320). Global Second-Order Pooling Convolutional Networks. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00314"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual Attention Network for Scene Segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_47","unstructured":"Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). European Conference on Computer Vision, Springer."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. (2020, January 13\u201319). BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00271"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/20\/4991\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:08:15Z","timestamp":1760130495000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/20\/4991"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,17]]},"references-count":48,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["rs15204991"],"URL":"https:\/\/doi.org\/10.3390\/rs15204991","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,17]]}}}