{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T12:09:19Z","timestamp":1775218159656,"version":"3.50.1"},"reference-count":47,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T00:00:00Z","timestamp":1757376000000},"content-version":"vor","delay-in-days":251,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Image Processing"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n                  <jats:p>Advanced driver assistance systems (ADAS) consist mainly of three components: environmental perception, decision planning, and motion control. As a fundamental component of the ADAS environmental perception system, 3D object detection can help vehicles avoid obstacles and drive safely only if it predicts and localizes three\u2010dimensional targets such as vehicles and pedestrians in road scenes accurately and in real time. Therefore, to improve both the real\u2010time performance and the accuracy of 3D object detection, we propose SPWS\u2010Transformer, a lightweight depth prediction\u2010based 3D object detection model with multi\u2010scale fusion. First, to enhance the model's accuracy, we propose a feature extraction network that incorporates multi\u2010scale feature fusion and depth prediction. 
We design a multi\u2010scale feature fusion module that combines semantic and fine\u2010grained information from feature maps of different scales, strengthening the network's feature extraction capability. To capture spatial information, we apply convolution, group normalization, and nonlinear activation to the fused feature maps to generate depth feature maps. Both the fused feature maps and the depth feature maps serve as inputs to subsequent network stages. To further improve accuracy, we exploit the Transformer's ability to model long\u2010range dependencies by designing a feature enhancement encoder that strengthens the representation of the depth feature maps. A dilated encoder performs positional encoding on the depth feature maps, and multi\u2010head self\u2010attention captures contextual relationships within the input scene, improving the detection capability of the 3D object detection network. Then, to improve real\u2010time performance, we design a decoder with scale\u2010aware attention: using predefined masks of different scales, it adaptively learns a scale\u2010aware filter from depth and visual features to enhance the object queries. Finally, on the KITTI dataset, the improved algorithm achieves an AP of 24.66% for the car category, with the accuracy gains being more pronounced at the \u2018hard\u2019 difficulty level. 
The model achieves an inference time of 24 ms.<\/jats:p>","DOI":"10.1049\/ipr2.70204","type":"journal-article","created":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T07:12:49Z","timestamp":1757401969000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SPWS\u2010Transformer: A Study of 3D Target Detection Method Based on Lightweight Depth Prediction With Multi\u2010Scale Fusion"],"prefix":"10.1049","volume":"19","author":[{"given":"Chang'an","family":"Zhang","sequence":"first","affiliation":[{"name":"Gansu Provincial Map Institute, Lanzhou, China"},{"name":"College of Geomatics, Xi'an University of Science and Technology, Xi'an, China"}]},{"given":"Yian","family":"Wang","sequence":"additional","affiliation":[{"name":"East China Normal University Software Engineering Institute, Shanghai, China"}]},{"given":"Ke","family":"Xu","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, University of New South Wales, Sydney, Australia"}]},{"given":"ChunHong","family":"Yuan","sequence":"additional","affiliation":[{"name":"Department of Pre\u2010Engineering, Kazan (Volga region) Federal University, Kazan, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9142-697X","authenticated-orcid":false,"given":"Fusen","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Systems and Computing, University of New South Wales, Canberra, Australia"}]}],"member":"265","published-online":{"date-parts":[[2025,9,9]]},"reference":[{"key":"e_1_2_10_2_1","first-page":"1711","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Shi W.","year":"2020"},{"key":"e_1_2_10_3_1","first-page":"1907","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Cao X. 
Z.","year":"2017"},{"key":"e_1_2_10_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2018.8594049"},{"key":"e_1_2_10_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00798"},{"key":"e_1_2_10_6_1","first-page":"146","volume-title":"Proceedings of the Conference on Robot Learning","author":"Yang B.","year":"2018"},{"key":"e_1_2_10_7_1","first-page":"923","volume-title":"Proceedings of the Conference on Robot Learning","author":"Zhou Y.","year":"2020"},{"key":"e_1_2_10_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989161"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8205955"},{"key":"e_1_2_10_10_1","first-page":"4490","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhou Y.","year":"2018"},{"key":"e_1_2_10_11_1","unstructured":"B. Graham and L. van der Maaten, \u201cSubmanifold sparse convolutional networks,\u201d arXiv preprint arXiv:1706.01307 (2017)."},{"key":"e_1_2_10_12_1","doi-asserted-by":"publisher","DOI":"10.3390\/s18103337"},{"key":"e_1_2_10_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01298"},{"key":"e_1_2_10_14_1","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi C. 
R.","year":"2017"},{"key":"e_1_2_10_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00102"},{"key":"e_1_2_10_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.597"},{"key":"e_1_2_10_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00086"},{"key":"e_1_2_10_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00285"},{"key":"e_1_2_10_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01255"},{"key":"e_1_2_10_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2706685"},{"key":"e_1_2_10_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_2_10_23_1","first-page":"7644","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li P.","year":"2019"},{"key":"e_1_2_10_24_1","first-page":"8445","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Y.","year":"2019"},{"key":"e_1_2_10_25_1","first-page":"8383","volume-title":"2020 IEEE International Conference on Robotics and Automation","author":"Pon A. D.","year":"2020"},{"key":"e_1_2_10_26_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Peng W.","year":"2020"},{"key":"e_1_2_10_27_1","unstructured":"Y. You, Y. Wang, W. L. Chao, et\u00a0al. 
\u201cPseudo\u2010LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving,\u201d arXiv preprint arXiv:1906.06310 (2019)."},{"key":"e_1_2_10_28_1","first-page":"2069","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Manhardt F.","year":"2019"},{"key":"e_1_2_10_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00249"},{"key":"e_1_2_10_30_1","first-page":"2480","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ren H.","year":"2019"},{"key":"e_1_2_10_31_1","first-page":"11826","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Facil J. M.","year":"2019"},{"key":"e_1_2_10_32_1","first-page":"424","volume-title":"Advances in Neural Information Processing Systems","author":"Chen X.","year":"2015"},{"key":"e_1_2_10_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.198"},{"key":"e_1_2_10_34_1","first-page":"1546","volume-title":"Proceedings of the IEEE 19th International Conference on Intelligent Transportation Systems","author":"Braun M.","year":"2016"},{"key":"e_1_2_10_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_2_10_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00506"},{"key":"e_1_2_10_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00667"},{"key":"e_1_2_10_38_1","first-page":"9287","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Brazil 
G.","year":"2019"},{"key":"e_1_2_10_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01211"},{"key":"e_1_2_10_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58592-1_9"},{"key":"e_1_2_10_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58580-8_38"},{"key":"e_1_2_10_42_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018851"},{"key":"e_1_2_10_43_1","first-page":"1000","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Ding M.","year":"2020"},{"key":"e_1_2_10_44_1","first-page":"3225","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Simonelli A.","year":"2021"},{"key":"e_1_2_10_45_1","doi-asserted-by":"crossref","unstructured":"X. Weng and K. Kitani, \u201cMonocular 3D Object Detection With Pseudo\u2010LiDAR Point Cloud,\u201d in Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops (2019) 1540\u20131610.","DOI":"10.1109\/ICCVW.2019.00114"},{"key":"e_1_2_10_46_1","first-page":"6851","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Ma X.","year":"2019"},{"key":"e_1_2_10_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00330"},{"key":"e_1_2_10_48_1","unstructured":"Y. Zhang, X. Ma, S. Yi, et al. 
\u201cLearning Geometry\u2010guided Depth via Projective Modeling for Monocular 3D Object Detection,\u201d arXiv preprint arXiv:2107.13931 (2021)."}],"container-title":["IET Image Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ipr2.70204","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/full-xml\/10.1049\/ipr2.70204","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ipr2.70204","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T11:34:59Z","timestamp":1775216099000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/ipr2.70204"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.1049\/ipr2.70204"],"URL":"https:\/\/doi.org\/10.1049\/ipr2.70204","archive":["Portico"],"relation":{},"ISSN":["1751-9659","1751-9667"],"issn-type":[{"value":"1751-9659","type":"print"},{"value":"1751-9667","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value":"2025-02-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-31","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e70204"}}