{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T11:10:45Z","timestamp":1762254645710,"version":"3.37.3"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,10,6]],"date-time":"2023-10-06T00:00:00Z","timestamp":1696550400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,6]],"date-time":"2023-10-06T00:00:00Z","timestamp":1696550400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"The National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61602157"],"award-info":[{"award-number":["61602157"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100018979","name":"Henan Office of Philosophy and Social Science","doi-asserted-by":"publisher","award":["202102210167"],"award-info":[{"award-number":["202102210167"]}],"id":[{"id":"10.13039\/100018979","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recently, owing to the requirements of inference speed, most real-time semantic segmentation networks often have shallow network depth, which limits the receptive field size of the model, leading to the limited acquisition of semantic information and resulting in intraclass inconsistency and ultimately a decrease in segmentation accuracy. Additionally, the shallow network depth also restricts the feature extraction capability of the network, reducing its robustness and ability to adapt to complex scenes. 
To address these issues, a bilateral network with a rich semantic extractor (RSE), termed BRSeNet, is presented for real-time semantic segmentation. First, to solve the problem of insufficient semantic feature information extraction, an RSE is proposed, which includes a multiscale global semantic extraction module (MGSEM) and a semantic fusion module (SFM). The MGSEM extracts rich global semantics and expands the effective receptive field. Simultaneously, the SFM efficiently integrates multiscale local semantics with multiscale global semantics, providing the network with more comprehensive semantic information. Finally, based on the characteristics of the detail and semantic branches, a bilateral reconstruction aggregation module is designed to reconstruct the contextual information of detail features, model the interdependencies among semantic feature channels, and enhance feature representation. Comprehensive experiments are conducted on the challenging Cityscapes and ADE20K datasets. 
The experimental results show that the proposed BRSeNet achieves mean intersection over union of 74.9% and 35.7% at inference speeds of 74 and 65 frames per second, respectively, and ensures a favorable balance between segmentation accuracy and inference speed.<\/jats:p>","DOI":"10.1007\/s40747-023-01242-w","type":"journal-article","created":{"date-parts":[[2023,10,6]],"date-time":"2023-10-06T07:01:59Z","timestamp":1696575719000},"page":"1899-1916","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Bilateral network with rich semantic extractor for real-time semantic segmentation"],"prefix":"10.1007","volume":"10","author":[{"given":"Shan","family":"Zhao","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1640-6043","authenticated-orcid":false,"given":"Xuan","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Kaiwen","family":"Tian","sequence":"additional","affiliation":[]},{"given":"Yang","family":"Yuan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,6]]},"reference":[{"key":"1242_CR1","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1242_CR2","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556"},{"key":"1242_CR3","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881\u20132890","DOI":"10.1109\/CVPR.2017.660"},{"key":"1242_CR4","doi-asserted-by":"crossref","unstructured":"Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925\u20131934","DOI":"10.1109\/CVPR.2017.549"},{"issue":"4","key":"1242_CR5","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"L-C Chen","year":"2017","unstructured":"Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834\u2013848","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1242_CR6","doi-asserted-by":"crossref","unstructured":"Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251\u20131258","DOI":"10.1109\/CVPR.2017.195"},{"key":"1242_CR7","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1242_CR8","doi-asserted-by":"crossref","unstructured":"Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510\u20134520","DOI":"10.1109\/CVPR.2018.00474"},{"key":"1242_CR9","doi-asserted-by":"crossref","unstructured":"Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848\u20136856","DOI":"10.1109\/CVPR.2018.00716"},{"key":"1242_CR10","doi-asserted-by":"crossref","unstructured":"Zhao H, Qi X, Shen X, Shi J, Jia J (2018) Icnet for real-time semantic segmentation on high-resolution images. In: Proceedings of the European conference on computer vision (ECCV), pp 405\u2013420","DOI":"10.1007\/978-3-030-01219-9_25"},{"key":"1242_CR11","doi-asserted-by":"crossref","unstructured":"Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Bisenet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 325\u2013341","DOI":"10.1007\/978-3-030-01261-8_20"},{"issue":"11","key":"1242_CR12","doi-asserted-by":"publisher","first-page":"3051","DOI":"10.1007\/s11263-021-01515-2","volume":"129","author":"C Yu","year":"2021","unstructured":"Yu C, Gao C, Wang J, Yu G, Shen C, Sang N (2021) Bisenet v2: bilateral network with guided aggregation for real-time semantic segmentation. Int J Comput Vis 129(11):3051\u20133068","journal-title":"Int J Comput Vis"},{"key":"1242_CR13","unstructured":"Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587"},{"key":"1242_CR14","doi-asserted-by":"crossref","unstructured":"Chen L-C, Yang Y, Wang J, Xu W, Yuille AL (2016) Attention to scale: scale-aware semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3640\u20133649","DOI":"10.1109\/CVPR.2016.396"},{"key":"1242_CR15","doi-asserted-by":"crossref","unstructured":"Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder\u2013decoder with atrous separable convolution for semantic image segmentation. 
In: Proceedings of the European conference on computer vision (ECCV), pp 801\u2013818","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"1242_CR16","doi-asserted-by":"crossref","unstructured":"Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7151\u20137160","DOI":"10.1109\/CVPR.2018.00747"},{"key":"1242_CR17","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"issue":"12","key":"1242_CR18","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481\u20132495","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1242_CR19","unstructured":"Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv:1412.7062"},{"key":"1242_CR20","unstructured":"Tao A, Sapra K, Catanzaro B (2020) Hierarchical multi-scale attention for semantic segmentation. arXiv:2005.10821"},{"key":"1242_CR21","unstructured":"Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. Adv Neural Inf Process Syst 28:1135\u20131143"},{"key":"1242_CR22","unstructured":"Chen X, Wang Y, Zhang Y, Du P, Xu C, Xu C (2020) Multi-task pruning for semantic segmentation networks. 
arXiv:2007.08386"},{"key":"1242_CR23","unstructured":"Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861"},{"key":"1242_CR24","doi-asserted-by":"crossref","unstructured":"Howard A, Sandler M, Chu G, Chen L-C, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, et al (2019) Searching for mobilenetv3. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1314\u20131324","DOI":"10.1109\/ICCV.2019.00140"},{"key":"1242_CR25","unstructured":"Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, pp 6105\u20136114"},{"key":"1242_CR26","unstructured":"Zhang Y, Yao T, Qiu Z, Mei T (2022) Lightweight and progressively-scalable networks for semantic segmentation. arXiv:2207.13600"},{"key":"1242_CR27","doi-asserted-by":"crossref","unstructured":"Fan M, Lai S, Huang J, Wei X, Chai Z, Luo J, Wei X (2021) Rethinking bisenet for real-time semantic segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9716\u20139725","DOI":"10.1109\/CVPR46437.2021.00959"},{"key":"1242_CR28","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929"},{"key":"1242_CR29","doi-asserted-by":"crossref","unstructured":"Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2021) Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. 
In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 568\u2013578","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"1242_CR30","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"1242_CR31","first-page":"9355","volume":"34","author":"X Chu","year":"2021","unstructured":"Chu X, Tian Z, Wang Y, Zhang B, Ren H, Wei X, Xia H, Shen C (2021) Twins: Revisiting the design of spatial attention in vision transformers. Adv Neural Inf Process Syst 34:9355\u20139366","journal-title":"Adv Neural Inf Process Syst"},{"key":"1242_CR32","doi-asserted-by":"crossref","unstructured":"Wu H, Xiao B, Codella N, Liu M, Dai X, Yuan L, Zhang L (2021) Cvt: introducing convolutions to vision transformers. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 22\u201331","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"1242_CR33","first-page":"12077","volume":"34","author":"E Xie","year":"2021","unstructured":"Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) Segformer: simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077\u201312090","journal-title":"Adv Neural Inf Process Syst"},{"key":"1242_CR34","doi-asserted-by":"crossref","unstructured":"Graham B, El-Nouby A, Touvron H, Stock P, Joulin A, J\u00e9gou H, Douze M (2021) Levit: a vision transformer in convnet\u2019s clothing for faster inference. 
In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 12259\u201312269","DOI":"10.1109\/ICCV48922.2021.01204"},{"key":"1242_CR35","doi-asserted-by":"crossref","unstructured":"Zhang W, Huang Z, Luo G, Chen T, Wang X, Liu W, Yu G, Shen C (2022) Topformer: token pyramid transformer for mobile semantic segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 12083\u201312093","DOI":"10.1109\/CVPR52688.2022.01177"},{"key":"1242_CR36","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"1242_CR37","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:6000\u20136010"},{"key":"1242_CR38","doi-asserted-by":"crossref","unstructured":"Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3146\u20133154","DOI":"10.1109\/CVPR.2019.00326"},{"key":"1242_CR39","doi-asserted-by":"crossref","unstructured":"Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213\u20133223","DOI":"10.1109\/CVPR.2016.350"},{"key":"1242_CR40","doi-asserted-by":"crossref","unstructured":"Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ade20k dataset. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 633\u2013641","DOI":"10.1109\/CVPR.2017.544"},{"key":"1242_CR41","doi-asserted-by":"crossref","unstructured":"Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT\u20192010. Springer, pp 177\u2013186","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"1242_CR42","doi-asserted-by":"crossref","unstructured":"Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 761\u2013769","DOI":"10.1109\/CVPR.2016.89"},{"key":"1242_CR43","doi-asserted-by":"crossref","unstructured":"Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7151\u20137160","DOI":"10.1109\/CVPR.2018.00747"}],"container-title":["Complex &amp; Intelligent 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01242-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01242-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01242-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,30]],"date-time":"2024-03-30T15:20:55Z","timestamp":1711812055000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01242-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,6]]},"references-count":43,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["1242"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01242-w","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2023,10,6]]},"assertion":[{"value":"19 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this 
paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"All authors agreed to participate in this paper.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}}]}}