{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T21:24:57Z","timestamp":1775683497593,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,2,15]],"date-time":"2024-02-15T00:00:00Z","timestamp":1707955200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,15]],"date-time":"2024-02-15T00:00:00Z","timestamp":1707955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U1804147"],"award-info":[{"award-number":["U1804147"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Innovative Scientists and Technicians Team of Henan Provincial High Education","award":["20IRTSTHN019"],"award-info":[{"award-number":["20IRTSTHN019"]}]},{"name":"Science and Technology Project of Henan Province","award":["212102210508"],"award-info":[{"award-number":["212102210508"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To solve the problem of insufficient feature fusion between the deep and shallow feature layers of the original YOLOX algorithm, which results in a loss of object semantic information, this paper proposes a YOLOX object detection algorithm based on attention and bidirectional cross-scale path aggregation. 
First, an efficient channel attention module is embedded in the YOLOX backbone network to reinforce the key features in the object region by distinguishing the importance of the different channels in the feature layer, thus enhancing the detection accuracy of the network. Second, a bidirectional cross-scale path aggregation network is designed to change the information fusion circulation path while increasing the cross-scale connections. Weighted feature fusion is used to learn the importance of the different path input features for differentiated fusion, thereby improving the feature information fusion capability between the deep and shallow layers. Finally, the SIoU loss function is introduced to improve the detection performance of the network. The experimental results show that on the PASCAL VOC2007 and MS COCO2017 datasets, the algorithm in this paper improves mAP by 2.32% and 1.53%, respectively, compared with the original YOLOX algorithm, and offers comprehensive performance advantages over other algorithms. 
The mAP reaches 99.44% on the self-built iron ore metal foreign matter dataset, with a recognition speed of 56.90 frames\/s.<\/jats:p>","DOI":"10.1007\/s11063-024-11536-w","type":"journal-article","created":{"date-parts":[[2024,2,15]],"date-time":"2024-02-15T07:02:39Z","timestamp":1707980559000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A YOLOX Object Detection Algorithm Based on Bidirectional Cross-scale Path Aggregation"],"prefix":"10.1007","volume":"56","author":[{"given":"Qunpo","family":"Liu","sequence":"first","affiliation":[]},{"given":"Jingwen","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yi","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Xuhui","family":"Bu","sequence":"additional","affiliation":[]},{"given":"Naohiko","family":"Hanajima","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,15]]},"reference":[{"key":"11536_CR1","unstructured":"Zhang H (2020) Research on tunnel microseismic signal processing and intelligent rock burst early warning based on deep learning. Dissertation, Chengdu University of Technology"},{"key":"11536_CR2","unstructured":"Sun X L (2022) Research on generative target tracking method under deep learning framework. Dissertation, University of Chinese Academy of Sciences (Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences)"},{"key":"11536_CR3","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580\u2013587","DOI":"10.1109\/CVPR.2014.81"},{"key":"11536_CR4","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast R-CNN. 
In: Proceedings of the IEEE international conference on computer vision, pp 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"key":"11536_CR5","first-page":"589","volume":"28","author":"S Ren","year":"2015","unstructured":"Ren S, He K, Girshick R et al (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:589\u2013598","journal-title":"Adv Neural Inf Process Syst"},{"key":"11536_CR6","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D et al (2016) SSD: single shot multibox detector. In: Computer vision-ECCV 2016: 14th European conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14. Springer, pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"11536_CR7","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R et al (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"11536_CR8","doi-asserted-by":"crossref","unstructured":"Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263\u20137271","DOI":"10.1109\/CVPR.2017.690"},{"key":"11536_CR9","unstructured":"Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"11536_CR10","unstructured":"Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934"},{"key":"11536_CR11","unstructured":"Ge Z, Liu S, Wang F et al (2021) YOLOX: exceeding yolo series in 2021. 
arXiv preprint arXiv:2107.08430"},{"issue":"2","key":"11536_CR12","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1049\/ipr2.12643","volume":"17","author":"Q Liu","year":"2023","unstructured":"Liu Q, Wang M, Wang H et al (2023) MPGI-terminal defect detection based on M-FRCNN. IET Image Process 17(2):428\u2013438","journal-title":"IET Image Process"},{"key":"11536_CR13","doi-asserted-by":"crossref","unstructured":"Liu Q, Bi J, Zhang J et al (2022) B-FPN SSD: an SSD algorithm based on a bidirectional feature fusion pyramid. Vis Comput 1\u201313","DOI":"10.1007\/s00371-022-02727-4"},{"key":"11536_CR14","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1016\/j.compag.2019.01.012","volume":"157","author":"Y Tian","year":"2019","unstructured":"Tian Y, Yang G, Wang Z et al (2019) Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput Electron Agric 157:417\u2013426","journal-title":"Comput Electron Agric"},{"issue":"11","key":"11536_CR15","first-page":"2707","volume":"36","author":"CY Liu","year":"2021","unstructured":"Liu CY, Wang Q, Bi XJ (2021) Multi-target small-scale vehicle target detection method. Control Decis Mak 36(11):2707\u20132712","journal-title":"Control Decis Mak"},{"key":"11536_CR16","first-page":"1","volume":"70","author":"Y Cai","year":"2021","unstructured":"Cai Y, Luan T, Gao H et al (2021) YOLOv4-5D: an effective and efficient object detector for autonomous driving. IEEE Trans Instrum Meas 70:1\u201313","journal-title":"IEEE Trans Instrum Meas"},{"issue":"11","key":"11536_CR17","first-page":"2156","volume":"56","author":"F Li","year":"2022","unstructured":"Li F, Hu K, Zhang Daniel, Wang WS, Jiang H (2022) Multi-dimensional detection of longitudinal tear of conveyor belt based on mixed domain attention YOLOv4. 
J Zhejiang Univ (Eng Sci) 56(11):2156\u20132167","journal-title":"J Zhejiang Univ (Eng Sci)"},{"key":"11536_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.compag.2022.107345","volume":"202","author":"J Li","year":"2022","unstructured":"Li J, Qiao Y, Liu S et al (2022) An improved YOLOv5-based vegetable disease detection method. Comput Electron Agric 202:107345","journal-title":"Comput Electron Agric"},{"issue":"11","key":"11536_CR19","first-page":"4147","volume":"47","author":"S Hao","year":"2022","unstructured":"Hao S, Zhang X, Ma X, Sun SY, Wen H, Wang JL (2022) Foreign body detection of coal mine conveyor belt based on CBAM-YOLOv5. J China Coal Soc 47(11):4147\u20134156","journal-title":"J China Coal Soc"},{"issue":"9","key":"11536_CR20","doi-asserted-by":"publisher","first-page":"3059","DOI":"10.1007\/s00371-022-02561-8","volume":"38","author":"C Xia","year":"2022","unstructured":"Xia C, Sun Y, Gao X et al (2022) DMINet: dense multi-scale inference network for salient object detection. Vis Comput 38(9):3059\u20133072","journal-title":"Vis Comput"},{"key":"11536_CR21","first-page":"1","volume":"2022","author":"P Wang","year":"2022","unstructured":"Wang P, Wang M, He D (2022) Multi-scale feature pyramid and multi-branch neural network for person re-identification. Vis Comput 2022:1\u201313","journal-title":"Vis Comput"},{"key":"11536_CR22","doi-asserted-by":"crossref","unstructured":"Tian Z, Shen C, Chen H et al (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 9627\u20139636","DOI":"10.1109\/ICCV.2019.00972"},{"key":"11536_CR23","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der Maaten L et al (2017) Densely connected convolutional networks. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700\u20134708","DOI":"10.1109\/CVPR.2017.243"},{"key":"11536_CR24","unstructured":"Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516"},{"key":"11536_CR25","doi-asserted-by":"crossref","unstructured":"Wang CY, Bochkovskiy A, Liao HYM (2023) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 7464\u20137475","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"11536_CR26","doi-asserted-by":"crossref","unstructured":"Wang Q, Wu B, Zhu P et al (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 11534\u201311542","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"11536_CR27","unstructured":"Gevorgyan Z (2022) SIoU loss: more powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740"},{"key":"11536_CR28","doi-asserted-by":"crossref","unstructured":"Rezatofighi H, Tsoi N, Gwak JY et al (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 658\u2013666","DOI":"10.1109\/CVPR.2019.00075"},{"key":"11536_CR29","doi-asserted-by":"crossref","unstructured":"Liu S, Qi L, Qin H et al (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759\u20138768","DOI":"10.1109\/CVPR.2018.00913"},{"key":"11536_CR30","doi-asserted-by":"crossref","unstructured":"Jiang B, Luo R, Mao J et al (2018) Acquisition of localization confidence for accurate object detection. 
In: Proceedings of the European conference on computer vision (ECCV), pp 784\u2013799","DOI":"10.1007\/978-3-030-01264-9_48"},{"key":"11536_CR31","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"11536_CR32","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY et al (2018) CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"11536_CR33","doi-asserted-by":"crossref","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13713\u201313722","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"11536_CR34","doi-asserted-by":"crossref","unstructured":"Selvaraju RR, Cogswell M, Das A et al (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618\u2013626","DOI":"10.1109\/ICCV.2017.74"},{"key":"11536_CR35","doi-asserted-by":"crossref","unstructured":"Yu J, Jiang Y, Wang Z et al (2016) UnitBox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on Multimedia, pp 516\u2013520","DOI":"10.1145\/2964284.2967274"},{"key":"11536_CR36","doi-asserted-by":"crossref","unstructured":"Zheng Z, Wang P, Liu W et al (2020) Distance-IoU loss: faster and better learning for bounding box regression. 
In: Proceedings of the AAAI conference on artificial intelligence, vol 34(07), pp 12993\u201313000","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"11536_CR37","doi-asserted-by":"crossref","unstructured":"Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. In: European conference on computer vision, pp 213\u2013229","DOI":"10.1007\/978-3-030-58452-8_13"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11536-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11536-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11536-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,29]],"date-time":"2024-02-29T20:16:30Z","timestamp":1709237790000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11536-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,15]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["11536"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11536-w","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,15]]},"assertion":[{"value":"8 January 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 February 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"The article does not involve research of humans and\/or animals.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Approval"}},{"value":"Informed consent was obtained from all individual participants included in the study.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Participate"}},{"value":"All authors approved the final manuscript and submission to this journal.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for Publication"}}],"article-number":"35"}}