{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:08:37Z","timestamp":1775066917199,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T00:00:00Z","timestamp":1746662400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T00:00:00Z","timestamp":1746662400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Vision-based environmental perception has demonstrated significant promise for autonomous driving applications. However, the traditional unidirectional feature flow in many perception networks often leads to inadequate information propagation, which hinders the system\u2019s ability to comprehensively perceive complex driving environments. Issues such as similar objects, illumination variations, and scale differences aggravate this limitation, introducing noise and reducing the reliability of the perception system. To address these challenges, we propose a novel Attention-Aware Upsampling-Downsampling Network (AUDNet). AUDNet utilizes a bidirectional feature fusion structure, incorporating a multi-scale attention upsampling module (MAU) to enhance the fine details in high-level features by guiding the selection of feature information. Additionally, the multi-scale attention downsampling module (MAD) is designed to reinforce the semantic understanding of low-level features by emphasizing relevant spatial details. 
Extensive experiments on a large-scale, real-world driving dataset demonstrate the superior performance of AUDNet, particularly in multi-task environment perception in complex and dynamic driving scenarios.<\/jats:p>","DOI":"10.1007\/s40747-025-01870-4","type":"journal-article","created":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T07:10:36Z","timestamp":1746688236000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Attention-aware upsampling-downsampling network for autonomous vehicle vision-based multitask perception"],"prefix":"10.1007","volume":"11","author":[{"given":"Chongjun","family":"Liu","sequence":"first","affiliation":[]},{"given":"Haobo","family":"Zuo","sequence":"additional","affiliation":[]},{"given":"Jianjun","family":"Yao","sequence":"additional","affiliation":[]},{"given":"Yuchen","family":"Li","sequence":"additional","affiliation":[]},{"given":"Frank","family":"Jiang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,8]]},"reference":[{"key":"1870_CR1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1604.07316","author":"M Bojarski","year":"2016","unstructured":"Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel LD, Monfort M, Muller U, Zhang J, Zhang X, Zhao J, Zieba K (2016) End to end learning for self-driving cars. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1604.07316","journal-title":"arXiv"},{"key":"1870_CR2","doi-asserted-by":"crossref","unstructured":"Yogamani S et al. (2019) Woodscape: a multi-task, multi-camera fisheye dataset for autonomous driving. In: Proc. IEEE Int. Conf. Comput. Vis., 2019, pp 9308\u20139318","DOI":"10.1109\/ICCV.2019.00940"},{"key":"1870_CR3","doi-asserted-by":"crossref","unstructured":"Ishihara K, Kanervisto A, Miura J, Ishihara VH (2021) Multi-task learning with attention for end-to-end autonomous driving. 
In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 2902\u20132911","DOI":"10.1109\/CVPRW53098.2021.00325"},{"key":"1870_CR4","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast R-CNN. In: Proc IEEE Int Conf Comput Vis. pp 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"key":"1870_CR5","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137\u20131149","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1870_CR6","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2004.10934","author":"A Bochkovskiy","year":"2020","unstructured":"Bochkovskiy A, Wang C, Liao HM (2020) Yolov4: optimal speed and accuracy of object detection. ArXiv. https:\/\/doi.org\/10.48550\/arXiv.2004.10934","journal-title":"ArXiv"},{"key":"1870_CR7","doi-asserted-by":"crossref","unstructured":"Lin T, Goyal P, Girshick R, He K, Dollar P (2017) Focal loss for dense object detection. In: IEEE Transactions on Pattern Analysis & Machine Intelligence. pp 2999\u20133007","DOI":"10.1109\/ICCV.2017.324"},{"key":"1870_CR8","doi-asserted-by":"crossref","unstructured":"Liu W et al. (2016) SSD: single shot multibox detector. In: Proc Eur Conf Comput Vis. pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1870_CR9","doi-asserted-by":"crossref","unstructured":"Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proc Eur Conf Comput Vis. 
pp 801\u2013818","DOI":"10.1007\/978-3-030-01234-2_49"},{"issue":"1","key":"1870_CR10","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1109\/TITS.2017.2750080","volume":"19","author":"E Romera","year":"2017","unstructured":"Romera E, \u00c1lvarez JM, Bergasa LM, Arroyo R (2017) Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans Intell Transp Syst 19(1):263\u2013272","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"1870_CR11","doi-asserted-by":"crossref","unstructured":"Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Bisenet: bilateral segmentation network for real-time semantic segmentation. In: Proc Eur Conf Comput Vis. pp 325\u2013341","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"1870_CR12","doi-asserted-by":"crossref","unstructured":"Teichmann M, Weber M, Zoellner M, Cipolla R, Urtasun R (2018) Multinet: real-time joint semantic reasoning for autonomous driving. In: Proc IEEE Intelligent Vehicles Symp. pp 1013\u20131020","DOI":"10.1109\/IVS.2018.8500504"},{"issue":"11","key":"1870_CR13","doi-asserted-by":"publisher","first-page":"4670","DOI":"10.1109\/TITS.2019.2943777","volume":"21","author":"Y Qian","year":"2019","unstructured":"Qian Y, Dolan JM, Yang M (2019) DLT-Net: joint detection of drivable areas, lane lines, and traffic objects. IEEE Trans Intell Transport Syst 21(11):4670\u20134679","journal-title":"IEEE Trans Intell Transport Syst"},{"issue":"12","key":"1870_CR14","doi-asserted-by":"publisher","first-page":"5323","DOI":"10.1109\/TNNLS.2021.3056383","volume":"32","author":"X Chang","year":"2021","unstructured":"Chang X, Pan H, Sun W, Gao H (2021) Yoltrack: multitask learning based real-time multiobject tracking and segmentation for autonomous vehicles. 
IEEE Trans Neural Netw Learn Syst 32(12):5323\u20135333","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1870_CR15","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1007\/s11633-022-1339-y","volume":"19","author":"D Wu","year":"2022","unstructured":"Wu D, Liao MW, Zhang WT, Wang XG, Bai X, Cheng WQ, Liu WY (2022) Yolop: you only look once for panoptic driving perception. Mach Intell Res 19:550","journal-title":"Mach Intell Res"},{"key":"1870_CR16","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1409.1556","author":"K Simonyan","year":"2014","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1409.1556","journal-title":"arXiv"},{"key":"1870_CR17","doi-asserted-by":"crossref","unstructured":"Ma N, Zhang X, Zheng HT, Sun J (2018) Shufflenet v2: practical guidelines for efficient cnn architecture design. In: Proc Eur Conf Comput Vis. pp 116\u2013131","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"1870_CR18","doi-asserted-by":"crossref","unstructured":"Lin TY, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 2117\u20132125","DOI":"10.1109\/CVPR.2017.106"},{"key":"1870_CR19","doi-asserted-by":"crossref","unstructured":"Guo C, Fan B, Zhang Q, Xiang S, Pan C (2020) Augfpn: improving multi-scale feature learning for object detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 12595\u201312604","DOI":"10.1109\/CVPR42600.2020.01261"},{"key":"1870_CR20","doi-asserted-by":"crossref","unstructured":"Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proc IEEE Conf Comput Vis Pattern Recognit. 
pp 8759\u20138768","DOI":"10.1109\/CVPR.2018.00913"},{"key":"1870_CR21","doi-asserted-by":"crossref","unstructured":"Ghiasi G, Lin TY, Le QV (2019) Nas-fpn: learning scalable feature pyramid architecture for object detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 7036\u20137045","DOI":"10.1109\/CVPR.2019.00720"},{"key":"1870_CR22","doi-asserted-by":"crossref","unstructured":"Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 10781\u201310790","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"1870_CR23","doi-asserted-by":"crossref","unstructured":"Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 2636\u20132645","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"1870_CR24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1807.01726","author":"Z Wang","year":"2018","unstructured":"Wang Z, Ren W, Qiu Q (2018) Lanenet: real-time lane detection networks for autonomous driving. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1807.01726","journal-title":"arXiv"},{"key":"1870_CR25","doi-asserted-by":"crossref","unstructured":"Parashar A et al. (2017) SCNN: an accelerator for compressed-sparse convolutional neural networks. In: Proc 44th Annu Int Symp Comput Archit. pp 27\u201340","DOI":"10.1145\/3079856.3080254"},{"key":"1870_CR26","doi-asserted-by":"crossref","unstructured":"Hou Y, Ma Z, Liu C, Loy CC (2019) Learning lightweight lane detection cnns by self attention distillation. In: Proc IEEE Int Conf Comput Vis. pp 1013\u20131021","DOI":"10.1109\/ICCV.2019.00110"},{"key":"1870_CR27","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proc IEEE Conf Comput Vis Pattern Recognit. 
pp 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1870_CR28","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 2881\u20132890","DOI":"10.1109\/CVPR.2017.660"},{"key":"1870_CR29","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1911.09516","author":"S Liu","year":"2019","unstructured":"Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1911.09516","journal-title":"arXiv"},{"key":"1870_CR30","doi-asserted-by":"crossref","unstructured":"Wang C, Bochkovskiy A, Liao HM (2021) Scaled-yolov4: scaling cross stage partial network. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 13029\u201313038","DOI":"10.1109\/CVPR46437.2021.01283"},{"issue":"9","key":"1870_CR31","doi-asserted-by":"publisher","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","volume":"37","author":"X Zhang","year":"2015","unstructured":"Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904\u20131916","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"5","key":"1870_CR32","first-page":"21","volume":"5","author":"AK Aggarwal","year":"2015","unstructured":"Aggarwal AK (2015) On the use of artificial intelligence techniques in transportation systems. Int J Soft Comput Eng 5(5):21\u201324","journal-title":"Int J Soft Comput Eng"},{"issue":"10","key":"1870_CR33","first-page":"8210","volume":"4","author":"AK Aggarwal","year":"2015","unstructured":"Aggarwal AK (2015) Image based methods for navigation of intelligent vehicles. 
Int J Adv Res Elect Electron Instrum Eng 4(10):8210\u20138215","journal-title":"Int J Adv Res Elect Electron Instrum Eng"},{"issue":"3","key":"1870_CR34","first-page":"210","volume":"6","author":"SC Sreeharsha","year":"2024","unstructured":"Sreeharsha SC, Nikhitha KS (2024) VocalVision: smart wheelchair maintenance with pressure sensors and machine learning. J Electr Eng Autom 6(3):210\u2013221","journal-title":"J Electr Eng Autom"},{"issue":"4","key":"1870_CR35","first-page":"379","volume":"6","author":"S Srivastava","year":"2024","unstructured":"Srivastava S, Matura R, Sharma S et al (2024) Deep learning for CAD prediction: X-ray angiography insights. J Artif Intell 6(4):379\u2013392","journal-title":"J Artif Intell"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-01870-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-025-01870-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-01870-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T11:23:29Z","timestamp":1747481009000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-025-01870-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,8]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["1870"],"URL":"https:\/\/doi.org\/10.1007\/s40747-025-01870-4","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-p
arts":[[2025,5,8]]},"assertion":[{"value":"18 September 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare they have no financial interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"The work described has not been published before; that it is not under consideration for publication anywhere else; that its publication has been approved by all co-authors.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"279"}}