{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,10]],"date-time":"2026-07-10T02:33:06Z","timestamp":1783650786387,"version":"3.55.0"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T00:00:00Z","timestamp":1722816000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T00:00:00Z","timestamp":1722816000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"The National Key R&D Program of China","award":["2022YFB3603703"],"award-info":[{"award-number":["2022YFB3603703"]}]},{"name":"The Qinchuangyuan High-level Talent Project of Shaanxi","award":["No. QCYRCXM-2022-219"],"award-info":[{"award-number":["No. QCYRCXM-2022-219"]}]},{"name":"The Fundamental Research Funds for the Central Universities, Northwestern Polytechnical University","award":["No.D5000210825"],"award-info":[{"award-number":["No.D5000210825"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Semantic segmentation of urban street scenes has attracted much attention in the field of autonomous driving, which not only helps vehicles perceive the environment in real time, but also significantly improves the decision-making ability of autonomous driving systems. However, most of the current methods based on Convolutional Neural Network (CNN) mainly use coding the input image to a low resolution and then try to recover the high resolution, which leads to problems such as loss of spatial information, accumulation of errors, and difficulty in dealing with large-scale changes. 
To address these problems, we propose HRDLNet, a new semantic segmentation network for urban street scene images that improves segmentation accuracy by maintaining a high-resolution representation of the image throughout the network. Specifically, we propose a feature extraction module with high-resolution representation (FHR), which handles multi-scale targets and high-resolution image information by efficiently fusing high-resolution information with multi-scale features. Second, we design a multi-scale feature extraction enhancement (MFE) module, which significantly expands the receptive field of the network, thus enhancing its ability to capture correlations between image details and global contextual information. In addition, we introduce a dual-attention mechanism module (CSD), which dynamically adjusts the network to more accurately capture subtle features and rich semantic information in images. We trained and evaluated HRDLNet on the Cityscapes Dataset and the PASCAL VOC 2012 Augmented Dataset, and verified the model\u2019s excellent performance in the field of urban streetscape image segmentation. 
The unique advantages of our proposed HRDLNet in the field of semantic segmentation of urban streetscapes are also verified by comparing it with the state-of-the-art methods.<\/jats:p>","DOI":"10.1007\/s40747-024-01582-1","type":"journal-article","created":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T12:03:52Z","timestamp":1722859432000},"page":"7825-7844","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["HRDLNet: a semantic segmentation network with high resolution representation for urban street view images"],"prefix":"10.1007","volume":"10","author":[{"given":"Wenyi","family":"Chen","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zongcheng","family":"Miao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yang","family":"Qu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guokai","family":"Shi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,8,5]]},"reference":[{"issue":"10","key":"1582_CR1","doi-asserted-by":"publisher","first-page":"2425","DOI":"10.1007\/s11263-022-01657-x","volume":"130","author":"\u00c9 Zablocki","year":"2022","unstructured":"Zablocki \u00c9, Ben-Younes H, P\u00e9rez P et al (2022) Explain ability of deep vision-based autonomous driving systems: review and challenges. Int J Comput Vision 130(10):2425\u20132452","journal-title":"Int J Comput Vision"},{"issue":"1","key":"1582_CR2","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1111\/cgf.13803","volume":"39","author":"Q Chao","year":"2020","unstructured":"Chao Q, Bi H, Li W et al (2020) A survey on visual traffic simulation: models, evaluations, and applications in autonomous driving. 
Comput Graphics Forum 39(1):287\u2013308","journal-title":"Comput Graphics Forum"},{"key":"1582_CR3","doi-asserted-by":"crossref","unstructured":"Set\u00e4l\u00e4 OE, Prest MJ, Stefanov KD et al (2023) CMOS Image Sensor for Broad Spectral Range with >\u200990% Quantum Efficiency. Small 2304001","DOI":"10.1002\/smll.202304001"},{"issue":"19","key":"1582_CR4","doi-asserted-by":"publisher","first-page":"16535","DOI":"10.1364\/OE.17.016535","volume":"17","author":"DA Roberts","year":"2009","unstructured":"Roberts DA, Kundtz N, Smith DR (2009) Optical lens compression via transformation optics. Opt Express 17(19):16535\u201316542","journal-title":"Opt Express"},{"key":"1582_CR5","doi-asserted-by":"crossref","unstructured":"Huang L, Barth M (2009) Tightly-coupled LIDAR and computer vision integration for vehicle detection. IEEE intelligent vehicles symposium 604\u2013609","DOI":"10.1109\/IVS.2009.5164346"},{"key":"1582_CR6","unstructured":"Garcia-Garcia A, Orts-Escolano S, Oprea S et al (2017) A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv 1704.06857"},{"issue":"7","key":"1582_CR7","first-page":"3523","volume":"44","author":"S Minaee","year":"2021","unstructured":"Minaee S, Boykov Y, Porikli F et al (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell 44(7):3523\u20133542","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1582_CR8","first-page":"405","volume":"2018","author":"H Zhao","year":"2018","unstructured":"Zhao H, Qi X, Shen X et al (2018) Icnet for real-time semantic segmentation on high-resolution images. Proc Eur Conf Comput Vis (ECCV) 2018:405\u2013420","journal-title":"Proc Eur Conf Comput Vis (ECCV)"},{"key":"1582_CR9","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. 
Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015: 18th International Conference, Munich, Germany, October 5\u20139, 2015, Proceedings, Part III 18. Springer International Publishing 2015: 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"1582_CR10","doi-asserted-by":"crossref","unstructured":"Xiao Z, Xing H, Zhao B et al (2023) Deep contrastive representation learning with self-distillation. IEEE Transactions on Emerging Topics in Computational Intelligence","DOI":"10.1109\/TETCI.2023.3304948"},{"key":"1582_CR11","first-page":"1","volume":"71","author":"H Xing","year":"2022","unstructured":"Xing H, Xiao Z, Qu R et al (2022) An efficient federated distillation learning system for multitask time series classification. IEEE Trans Instrum Meas 71:1\u201312","journal-title":"IEEE Trans Instrum Meas"},{"issue":"1","key":"1582_CR12","doi-asserted-by":"publisher","first-page":"7600","DOI":"10.1038\/s41598-023-34379-2","volume":"13","author":"X Wang","year":"2023","unstructured":"Wang X, Hu Z, Shi S et al (2023) A deep learning method for optimizing semantic segmentation accuracy of remote sensing images based on improved UNet. Sci Rep 13(1):7600","journal-title":"Sci Rep"},{"key":"1582_CR13","doi-asserted-by":"crossref","unstructured":"Tian Z, He T, Shen C et al (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 3126\u20133135","DOI":"10.1109\/CVPR.2019.00324"},{"key":"1582_CR14","doi-asserted-by":"publisher","first-page":"106682","DOI":"10.1016\/j.asoc.2020.106682","volume":"96","author":"Q Zhou","year":"2020","unstructured":"Zhou Q, Wang Y, Fan Y et al (2020) AGLNet: towards real-time semantic segmentation of self-driving images via attention-guided lightweight network. 
Appl Soft Comput 96:106682","journal-title":"Appl Soft Comput"},{"issue":"12","key":"1582_CR15","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481\u20132495","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1582_CR16","doi-asserted-by":"crossref","unstructured":"Chen L-C, Zhu Y, Papandreou G et al (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer 833\u2013851","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"1582_CR17","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY et al (2018) CBAM: convolutional block attention module. Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"1582_CR18","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L (2018) Sun G. Squeeze-and-excitation networks. Proceedings of the IEEE conference on computer vision and pattern recognition 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"1582_CR19","doi-asserted-by":"crossref","unstructured":"Cordts M, Omran M, Ramos S et al (2016) The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE conference on computer vision and pattern recognition 3213\u20133223","DOI":"10.1109\/CVPR.2016.350"},{"key":"1582_CR20","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","volume":"111","author":"M Everingham","year":"2015","unstructured":"Everingham M, Eslami SMA, Van Gool L et al (2015) The pascal visual object classes challenge: a retrospective. 
Int J Comput Vision 111:98\u2013136","journal-title":"Int J Comput Vision"},{"key":"1582_CR21","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1582_CR22","doi-asserted-by":"crossref","unstructured":"Fu J, Liu J, Tian H et al (2019) Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 3146\u20133154","DOI":"10.1109\/CVPR.2019.00326"},{"issue":"6","key":"1582_CR23","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2017","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84\u201390","journal-title":"Commun ACM"},{"key":"1582_CR24","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE 1\u20139","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"1582_CR25","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der Maaten L et al (2017) Densely connected convolutional networks. Proceedings of the IEEE conference on computer vision and pattern recognition 4700\u20134708","DOI":"10.1109\/CVPR.2017.243"},{"key":"1582_CR26","doi-asserted-by":"crossref","unstructured":"Zhang H, Wu C, Zhang Z et al (2022) Resnest: Split-attention networks. 
Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 2736\u20132746","DOI":"10.1109\/CVPRW56347.2022.00309"},{"key":"1582_CR27","doi-asserted-by":"crossref","unstructured":"Li Z, Pan H, Zhu Y et al (2020) PGD-UNet: A position-guided deformable network for simultaneous segmentation of organs and tumors. 2020 International Joint Conference on Neural Networks (IJCNN). IEEE 1\u20138","DOI":"10.1109\/IJCNN48605.2020.9206944"},{"issue":"7","key":"1582_CR28","doi-asserted-by":"publisher","first-page":"6169","DOI":"10.1109\/TGRS.2020.3026051","volume":"59","author":"Q Zhu","year":"2020","unstructured":"Zhu Q, Liao C, Hu H et al (2020) MAP-Net: multiple attending path neural network for building footprint extraction from remote sensed imagery. IEEE Trans Geosci Remote Sens 59(7):6169\u20136181","journal-title":"IEEE Trans Geosci Remote Sens"},{"issue":"12","key":"1582_CR29","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2015","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2015) SegNet: a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. IEEE Trans Pattern Anal Mach Intell 39(12):2481\u20132495","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1582_CR30","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. Proceedings of the 2015 International Conference on Medical Image Computing and Computer Assisted Intervention. Cham: Springer 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"1582_CR31","doi-asserted-by":"crossref","unstructured":"Lin G, Milan A, Shen C RefineNet: multi-path refinement networks for high-resolution semantic segmentation. 
Proceedings of the 2017 IEEE Conference on Computer Vision and, Recognition P et al (2017) Washington, DC: IEEE Computer Society 5168\u20135177","DOI":"10.1109\/CVPR.2017.549"},{"issue":"18","key":"1582_CR32","doi-asserted-by":"publisher","first-page":"3617","DOI":"10.3390\/rs13183617","volume":"13","author":"X Yao","year":"2021","unstructured":"Yao X, Guo Q, Li A (2021) Light-weight cloud detection network for optical remote sensing images with attention-based deeplabv3\u2009+\u2009architecture. Remote Sens 13(18):3617","journal-title":"Remote Sens"},{"key":"1582_CR33","unstructured":"Nie Z, Xu J, Zhang S (2020) Analysis on DeepLabV3\u2009+\u2009performance for automatic steel defects detection. arXiv preprint arXiv 2004.04822"},{"key":"1582_CR34","doi-asserted-by":"publisher","first-page":"121060","DOI":"10.1109\/ACCESS.2021.3107353","volume":"9","author":"S Das","year":"2021","unstructured":"Das S, Fime AA, Siddique N et al (2021) Estimation of road boundary for intelligent vehicles based on deeplabv3\u2009+\u2009architecture. IEEE Access 9:121060\u2013121075","journal-title":"IEEE Access"},{"key":"1582_CR35","doi-asserted-by":"publisher","first-page":"107622","DOI":"10.1016\/j.patcog.2020.107622","volume":"110","author":"X Lian","year":"2021","unstructured":"Lian X, Pang Y, Han J et al (2021) Cascaded hierarchical atrous spatial pyramid pooling module for semantic segmentation. Pattern Recogn 110:107622","journal-title":"Pattern Recogn"},{"key":"1582_CR36","doi-asserted-by":"publisher","first-page":"119364","DOI":"10.1016\/j.ins.2023.119364","volume":"646","author":"X Sun","year":"2023","unstructured":"Sun X, Zhang Y, Chen C et al (2023) High-order paired-ASPP for deep semantic segmentation networks. 
Inf Sci 646:119364","journal-title":"Inf Sci"},{"key":"1582_CR37","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1016\/j.neucom.2022.06.103","volume":"504","author":"Z Li","year":"2022","unstructured":"Li Z, Jiang J, Chen X et al (2022) Superdense-scale network for semantic segmentation. Neurocomputing 504:30\u201341","journal-title":"Neurocomputing"},{"key":"1582_CR38","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X et al (2017) Pyramid scene parsing network. Proceedings of the IEEE conference on computer vision and pattern recognition 2881\u20132890","DOI":"10.1109\/CVPR.2017.660"},{"key":"1582_CR39","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1582_CR40","unstructured":"Chen LC, Papandreou G, Schroff F et al (2017) Rethinking atrous convolution for semantic image segmentation. arXiv Preprint arXiv 1706.05587."},{"key":"1582_CR41","doi-asserted-by":"crossref","unstructured":"Chen LC, Zhu Y, Papandreou G et al (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European conference on computer vision (ECCV) 801\u2013818","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"1582_CR42","doi-asserted-by":"crossref","unstructured":"Daliparthi VSSA (2022) The Ikshana Hypothesis of Human Scene Understanding. Proceedings of the Satellite Workshops of ICVGIP 2021. Singapore: Springer Nature Singapore 161\u2013181","DOI":"10.1007\/978-981-19-4136-8_12"},{"key":"1582_CR43","doi-asserted-by":"crossref","unstructured":"Wang Y, Qi L, Chen YC et al (2021) Image synthesis via semantic composition. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision 13749\u201313758","DOI":"10.1109\/ICCV48922.2021.01349"},{"issue":"9","key":"1582_CR44","first-page":"4852","volume":"44","author":"Z Tan","year":"2021","unstructured":"Tan Z, Chen D, Chu Q et al (2021) Efficient semantic image synthesis via class-adaptive normalization. IEEE Trans Pattern Anal Mach Intell 44(9):4852\u20134866","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1582_CR45","doi-asserted-by":"crossref","unstructured":"Tan Z, Chai M, Chen D et al (2021) Diverse semantic image synthesis via probability distribution modeling. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 7962\u20137971","DOI":"10.1109\/CVPR46437.2021.00787"},{"key":"1582_CR46","unstructured":"Sushko V, Sch\u00f6nfeld E, Zhang D et al (2020) You only need adversarial supervision for semantic image synthesis. arXiv preprint arXiv 2012.04781."},{"key":"1582_CR47","doi-asserted-by":"crossref","unstructured":"Zbinden L, Doorenbos L, Pissas T et al (2023) Stochastic segmentation with conditional categorical diffusion models. Proceedings of the IEEE\/CVF International Conference on Computer Vision 1119\u20131129","DOI":"10.1109\/ICCV51070.2023.00109"},{"key":"1582_CR48","doi-asserted-by":"crossref","unstructured":"Xu J, Xiong Z, Bhattacharyya SP (2023) PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 19529\u201319539","DOI":"10.1109\/CVPR52729.2023.01871"},{"issue":"12","key":"1582_CR49","doi-asserted-by":"publisher","first-page":"25259","DOI":"10.1109\/TITS.2022.3194213","volume":"23","author":"Q Zhou","year":"2022","unstructured":"Zhou Q, Qiang Y, Mo Y et al (2022) Banet: Boundary-assistant encoder-decoder network for semantic segmentation. 
IEEE Trans Intell Transp Syst 23(12):25259\u201325270","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"1582_CR50","doi-asserted-by":"crossref","unstructured":"Mohammadzadeh A, Zhang C, Alattas KA et al (2023) Fourier-based type-2 fuzzy neural network: simple and effective for high dimensional problems. Neurocomputing 126316","DOI":"10.1016\/j.neucom.2023.126316"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01582-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01582-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01582-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T22:12:57Z","timestamp":1729116777000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01582-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,5]]},"references-count":50,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["1582"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01582-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,5]]},"assertion":[{"value":"3 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 August 
2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}