{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T16:08:46Z","timestamp":1777910926310,"version":"3.51.4"},"reference-count":38,"publisher":"SAGE Publications","issue":"8","license":[{"start":{"date-parts":[[2024,4,22]],"date-time":"2024-04-22T00:00:00Z","timestamp":1713744000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"Intelligent Logistics Interdisciplinary Team Project of BUPT"},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFC3302200"],"award-info":[{"award-number":["2022YFC3302200"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Transactions of the Institute of Measurement and Control"],"published-print":{"date-parts":[[2025,5]]},"abstract":"<jats:p>In the domain of autonomous driving, object detection presents several complex challenges, particularly concerning the accurate identification of small and salient objects. This paper introduces DL-YOLOX (Dilated Enhancement YOLOX), which flexibly uses dilated convolution to enhance features and thereby improve the detection of small and salient objects. A large receptive field covers a larger area and captures richer contextual information, which is advantageous for detecting large targets, whereas a small receptive field captures local details and is better suited to detecting small targets. To bolster the representation of objects across various scales, we propose Dilated Adaptive Feature Fusion (DAFF), which adaptively fuses features with different receptive fields. 
This fusion mechanism allows a more comprehensive understanding of objects, improving detection accuracy for objects of varying sizes. In addition, we tackle the loss of small objects during feature propagation by introducing the Stack Dilated Module (SDM), which mitigates this loss and contributes to better detection performance. Moreover, we further enhance small object detection by replacing the conventional Intersection over Union (IoU) metric with the Normalized Gaussian Wasserstein Distance (NWD), a distance metric that is more effective for measuring the similarity between small objects, thus improving the precision of our algorithm. To thoroughly evaluate the robustness and generalization capabilities of the proposed method, we conduct extensive experiments on two benchmark datasets, MS COCO 2017 and BDD100K. The results not only confirm significant improvements in multi-scale object detection but also highlight the real-time capability of our approach. 
The impressive performance across these datasets demonstrates the promising potential of DL-YOLOX in revolutionizing object detection techniques in the context of autonomous driving.<\/jats:p>","DOI":"10.1177\/01423312241239020","type":"journal-article","created":{"date-parts":[[2024,4,22]],"date-time":"2024-04-22T05:57:41Z","timestamp":1713765461000},"page":"1556-1569","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["DL-YOLOX: Real-time object detection via adjustable dilated enhancement for autonomous driving scene"],"prefix":"10.1177","volume":"47","author":[{"given":"Qing","family":"Song","sequence":"first","affiliation":[{"name":"Beijing University of Posts and Telecommunications, China"}]},{"given":"Boyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, China"}]},{"given":"Yuandong","family":"Ma","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, China"}]},{"given":"Mengjie","family":"Hu","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2834-9461","authenticated-orcid":false,"given":"Chun","family":"Liu","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, China"}]}],"member":"179","published-online":{"date-parts":[[2024,4,22]]},"reference":[{"key":"e_1_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00644"},{"key":"e_1_3_2_3_1","first-page":"213","article-title":"End-to-end object detection with transformers","volume":"2020","author":"Carion N","year":"2020","unstructured":"Carion N, Massa F, Synnaeve G, et al. (2020) End-to-end object detection with transformers. 
Computer Vision\u2014ECCV 2020: 213\u2013229.","journal-title":"Computer Vision\u2014ECCV"},{"key":"e_1_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_2_5_1","unstructured":"Chen LC Papandreou G Schroff F et al. (2017) Rethinking atrous convolution for semantic image segmentation. Available at: https:\/\/arxiv.org\/abs\/1706.05587"},{"key":"e_1_3_2_6_1","unstructured":"Dosovitskiy A Beyer L Kolesnikov A et al. (2020) An image is worth 16x16 words: Transformers for image recognition at scale. Available at: https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00667"},{"key":"e_1_3_2_8_1","unstructured":"Ge Z Liu S Wang F et al. (2021) YOLOX: Exceeding YOLO series in 2021. Available at: https:\/\/arxiv.org\/abs\/2107.08430"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00162"},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_13_1","unstructured":"Kalchbrenner N Oord A Simonyan K et al. (2017) Video pixel networks. Available at: https:\/\/arxiv.org\/abs\/1610.00527"},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01204-1"},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00060"},{"key":"e_1_3_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2858826"},{"key":"e_1_3_2_18_1","doi-asserted-by":"crossref","unstructured":"Lin TY Maire M Belongie S et al. (2014) Microsoft COCO: Common objects in context. Computer Vision\u2014ECCV 20148693: 740\u2013755. 
Available at: https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00913"},{"key":"e_1_3_2_20_1","doi-asserted-by":"crossref","unstructured":"Liu W Anguelov D Erhan D et al. (2016) SSD: Single shot multibox detector. Computer Vision\u2014ECCV 20169905: 21\u201337. Available at: https:\/\/doi.org\/10.1007\/978-3-319-46448-0_2","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_2_23_1","unstructured":"Oord A Dieleman S Zen H et al. (2016) WaveNet: A generative model for raw audio. Available at: https:\/\/arxiv.org\/abs\/1609.03499"},{"key":"e_1_3_2_24_1","unstructured":"Purkait P Zhao C Zach C (2017) SPP-Net: Deep absolute pose regression with synthetic views. Available at: https:\/\/arxiv.org\/abs\/1712.03452"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_26_1","first-page":"1440","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren S","year":"2015","unstructured":"Ren S, He K, Girshick R, et al. (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28: 1440\u20131448.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_28_1","first-page":"6105","article-title":"EfficientNet: Rethinking model scaling for convolutional neural networks","volume":"97","author":"Tan M","year":"2019","unstructured":"Tan M, Le Q (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. 
International Conference on Machine Learning 97: 6105\u20136114.","journal-title":"International Conference on Machine Learning"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"e_1_3_2_30_1","unstructured":"Wang J Xu C Yang W et al. (2021) A normalized Gaussian Wasserstein Distance for tiny object detection. Available at: https:\/\/arxiv.org\/abs\/2110.13389"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00163"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00388"},{"key":"e_1_3_2_33_1","unstructured":"Yao Z Ai J Li B et al. (2021) Efficient DETR: Improving end-to-end object detector with dense prior. Available at: https:\/\/arxiv.org\/abs\/2104.01318"},{"key":"e_1_3_2_34_1","unstructured":"Yu F Koltun V (2016) Multi-scale context aggregation by dilated convolutions. Available at: https:\/\/arxiv.org\/abs\/1511.07122"},{"key":"e_1_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"e_1_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.75"},{"key":"e_1_3_2_37_1","doi-asserted-by":"crossref","unstructured":"Zheng D Dong W Hu H et al. (2023) Less is more: Focus attention for efficient DETR. In: Proceedings of the IEEE\/CVF international conference on computer vision pp. 6674\u20136683. Available at: https:\/\/openaccess.thecvf.com\/content\/ICCV2023\/papers\/Zheng_Less_is_More_Focus_Attention_for_Efficient_DETR_ICCV_2023_paper.pdf","DOI":"10.1109\/ICCV51070.2023.00614"},{"key":"e_1_3_2_38_1","unstructured":"Zhu X Su W Lu L et al. (2021) Deformable detr: Deformable transformers for end-to-end object detection. 
Available at: https:\/\/arxiv.org\/abs\/2010.04159"},{"key":"e_1_3_2_39_1","doi-asserted-by":"publisher","DOI":"10.1155\/2021\/7386280"}],"container-title":["Transactions of the Institute of Measurement and Control"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01423312241239020","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01423312241239020","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01423312241239020","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T15:10:14Z","timestamp":1777648214000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01423312241239020"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,22]]},"references-count":38,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["10.1177\/01423312241239020"],"URL":"https:\/\/doi.org\/10.1177\/01423312241239020","relation":{},"ISSN":["0142-3312","1477-0369"],"issn-type":[{"value":"0142-3312","type":"print"},{"value":"1477-0369","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,22]]}}}