{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,4,5]],"date-time":"2024-04-05T13:33:36Z","timestamp":1712324016805},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T00:00:00Z","timestamp":1683849600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T00:00:00Z","timestamp":1683849600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The two-branch network architecture has proved effective for real-time semantic segmentation. However, existing methods still cannot obtain sufficient context information and sufficient detailed information, which limits the accuracy of existing two-branch methods. In this paper, we proposed a real-time, high-precision semantic segmentation network based on a novel multi-resolution feature fusion module, an auxiliary feature extraction module, an upsampling module, and a multi-ASPP (atrous spatial pyramid pooling) module. We designed a feature fusion module that integrates features of different resolutions to help the network obtain both sufficient semantic information and sufficient detailed information. 
We also studied the effect of the side-branch architecture on the network and found that the side branch does more than regularize: it may either slow down or accelerate convergence by influencing the gradients of different layers of the network, depending on the network parameters and the input data. Based on these findings, we used a side-branch auxiliary feature extraction layer in the network to improve its performance. We also designed an upsampling module that recovers detailed information better than the original upsampling module. In addition, we reconsidered the locations and number of atrous spatial pyramid pooling (ASPP) modules and modified the network architecture according to the experimental results to further improve performance. We named the resulting network Deep Multiple-resolution Bilateral Network for Real-time, referred to as DMBR-Net. 
The network proposed in the paper achieved 81.3% mIoU(Mean Intersection over Union) at 110FPS on the Cityscapes validation dataset, 80.7% mIoU at 104FPS on the CamVid test dataset, 32.2% mIoU at 78FPS on the COCO-Stuff test dataset.<\/jats:p>","DOI":"10.1007\/s40747-023-01046-y","type":"journal-article","created":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T09:02:26Z","timestamp":1683882146000},"page":"6427-6436","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["DMBR-Net: deep multiple-resolution bilateral network for real-time and accurate semantic segmentation"],"prefix":"10.1007","volume":"9","author":[{"given":"Pengfei","family":"Meng","sequence":"first","affiliation":[]},{"given":"Shuangcheng","family":"Jia","sequence":"additional","affiliation":[]},{"given":"Qian","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,12]]},"reference":[{"key":"1046_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder\u2013decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39:1","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"2","key":"1046_CR2","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1016\/j.patrec.2008.04.005","volume":"30","author":"GJ Brostow","year":"2009","unstructured":"Brostow GJ, Fauqueur J, Cipolla R (2009) Semantic object classes in video: a high-definition ground truth database. Pattern Recogn Lett 30(2):88\u201397","journal-title":"Pattern Recogn Lett"},{"key":"1046_CR3","unstructured":"Caesar H, Uijlings JRR, Ferrari V (2016) Coco-stuff: Thing and stuff classes in context. 
CoRR, arXiv:1612.03716"},{"key":"1046_CR4","first-page":"357","volume":"4","author":"LC Chen","year":"2014","unstructured":"Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. Comput Sci 4:357\u2013361","journal-title":"Comput Sci"},{"issue":"4","key":"1046_CR5","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"LC Chen","year":"2018","unstructured":"Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834\u2013848","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1046_CR6","unstructured":"Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation"},{"key":"1046_CR7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_49","volume-title":"Encoder-decoder with atrous separable convolution for semantic image segmentation","author":"LC Chen","year":"2018","unstructured":"Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. Springer, Cham"},{"key":"1046_CR8","doi-asserted-by":"crossref","unstructured":"Cordts M, Omran M, Ramos S, Rehfeld T, Schiele B (2016)The cityscapes dataset for semantic urban scene understanding. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR.2016.350"},{"key":"1046_CR9","doi-asserted-by":"crossref","unstructured":"Fan M, Lai S, Huang J, Wei X, Chai Z, Luo J, Wei X (2021) Rethinking bisenet for real-time semantic segmentation. 
CoRR, arXiv:2104.13188","DOI":"10.1109\/CVPR46437.2021.00959"},{"key":"1046_CR10","doi-asserted-by":"crossref","unstructured":"Fu J, Liu J, Tian H, Fang Z, Lu H (2018) Dual attention network for scene segmentation","DOI":"10.1109\/CVPR.2019.00326"},{"key":"1046_CR11","doi-asserted-by":"crossref","unstructured":"Gamal M, Siam M, Abdel-Razek M (2018) Shuffleseg: real-time semantic segmentation network","DOI":"10.1109\/ICIP.2018.8451495"},{"key":"1046_CR12","unstructured":"Hong Y, Pan H, Sun W (2021) Senior Member, IEEE, and Yisong Jia. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes"},{"key":"1046_CR13","doi-asserted-by":"crossref","unstructured":"Hu P, Heilbron FC, Wang O, Lin ZL, Sclaroff S, Perazzi F (2020) Temporally distributed networks for fast video semantic segmentation. CoRR, arXiv:2004.01800","DOI":"10.1109\/CVPR42600.2020.00884"},{"key":"1046_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TPAMI.2020.2977911","volume":"99","author":"Z Huang","year":"2020","unstructured":"Huang Z, Wang X, Wei Y, Huang L, Huang TS (2020) Ccnet: Criss-cross attention for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 99:1","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1046_CR15","doi-asserted-by":"crossref","unstructured":"Kumaar S, Lyu Y, Nex F, Michael YY (2020) Efficient context aggregation network for low-latency semantic segmentation, Cabinet","DOI":"10.1109\/ICRA48506.2021.9560977"},{"key":"1046_CR16","doi-asserted-by":"crossref","unstructured":"Li H, Xiong P, Fan H, Sun J (2020) Dfanet: Deep feature aggregation for real-time semantic segmentation. In 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR.2019.00975"},{"key":"1046_CR17","doi-asserted-by":"crossref","unstructured":"Li X, Zhou Y, Pan Z, Feng J (2019) Partial order pruning: for best speed\/accuracy trade-off in neural architecture search. 
IEEE","DOI":"10.1109\/CVPR.2019.00936"},{"key":"1046_CR18","doi-asserted-by":"crossref","unstructured":"Lin G, Milan A, Shen C, Reid I (2017) Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR.2017.549"},{"key":"1046_CR19","doi-asserted-by":"crossref","unstructured":"Lin P, Sun P, Cheng G, Xie S, Shi J (2020) Graph-guided architecture search for real-time semantic segmentation. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR42600.2020.00426"},{"issue":"4","key":"1046_CR20","first-page":"640","volume":"39","author":"J Long","year":"2015","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640\u2013651","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1046_CR21","doi-asserted-by":"crossref","unstructured":"Nirkin Y, Wolf L, Hassner T (2021) Hyperseg: Patch-wise hypernetwork for real-time semantic segmentation. In Computer Vision and Pattern Recognition","DOI":"10.1109\/CVPR46437.2021.00405"},{"key":"1046_CR22","doi-asserted-by":"crossref","unstructured":"Orsic M, Kreso I, Bevandic P, Segvic S (2019) In defense of pre-trained imagenet architectures for real-time semantic segmentation of road-driving images. CoRR, arXiv:1903.08469","DOI":"10.1109\/CVPR.2019.01289"},{"key":"1046_CR23","unstructured":"Paszke A, Chaurasia Ak, Kim S, Culurciello E (2016) Enet: A deep neural network architecture for real-time semantic segmentation. 
CoRR, arXiv:1606.02147"},{"key":"1046_CR24","unstructured":"Peng J, Liu Y, Tang S, Hao Y, Chu L, Chen G, Wu Z, Chen Z, Yu Z, Du Y (2022) Pp-liteseg: A superior real-time semantic segmentation model"},{"key":"1046_CR25","unstructured":"Poudel RPK, Bonde U, Liwicki S, Zach C (2018) Contextnet: Exploring context and detail for semantic segmentation in real-time. CoRR, arXiv:1805.04554"},{"key":"1046_CR26","unstructured":"Poudel RPK, Liwicki S, Cipolla R (2019) Fast-scnn: Fast semantic segmentation network. CoRR, arXiv:1902.04502"},{"key":"1046_CR27","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. Springer International Publishing, Berlin","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"1046_CR28","doi-asserted-by":"crossref","unstructured":"Sandler M, Howard AG, Zhu M, Zhmoginov A, Chen LC (2018) Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR, arXiv:1801.04381","DOI":"10.1109\/CVPR.2018.00474"},{"key":"1046_CR29","unstructured":"Shi W, Caballero J, Theis L, Huszar F, Wang Z (2016) Is the deconvolution layer the same as a convolutional layer?"},{"key":"1046_CR30","unstructured":"Si H, Zhang Z, Lv F, Yu G, Lu F (2019) Real-time semantic segmentation via multiply spatial fusion network"},{"key":"1046_CR31","doi-asserted-by":"crossref","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818\u20132826","DOI":"10.1109\/CVPR.2016.308"},{"key":"1046_CR32","unstructured":"Tao A, Sapra K, Catanzaro B (2020) Hierarchical multi-scale attention for semantic segmentation. CoRR, arXiv:2005.10821"},{"key":"1046_CR33","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. 
CoRR, arXiv:1706.03762"},{"key":"1046_CR34","unstructured":"Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, Liu W, Xiao B (2019) Deep high-resolution representation learning for visual recognition. CoRR, arXiv:1908.07919"},{"key":"1046_CR35","doi-asserted-by":"crossref","unstructured":"Wang X, Girshick RB, Gupta A, He K (2017) Non-local neural networks. CoRR, arXiv:1711.07971","DOI":"10.1109\/CVPR.2018.00813"},{"key":"1046_CR36","unstructured":"Xu K, Guan K, Peng J, Luo Y, Wang S (2019) Deepmask: an algorithm for cloud and cloud shadow detection in optical satellite remote sensing images using deep residual network"},{"key":"1046_CR37","doi-asserted-by":"crossref","unstructured":"Yu C, Gao C, Wang J, Yu G, Shen C, Sang N(2020) Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation","DOI":"10.1007\/s11263-021-01515-2"},{"key":"1046_CR38","doi-asserted-by":"crossref","unstructured":"Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Bisenet: Bilateral segmentation network for real-time semantic segmentation. European Conference on Computer Vision,","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"1046_CR39","doi-asserted-by":"crossref","unstructured":"Yuan Y, Chen X, Wang J (2020) Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation. In European Conference on Computer Vision","DOI":"10.1007\/978-3-030-58539-6_11"},{"key":"1046_CR40","doi-asserted-by":"crossref","unstructured":"Zhang X, Zhou X, Lin M, Jian S (2017) An extremely efficient convolutional neural network for mobile devices, Shufflenet","DOI":"10.1109\/CVPR.2018.00716"},{"key":"1046_CR41","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, Wang X, Jia J (2016) Pyramid scene parsing network. 
In IEEE Computer Society,","DOI":"10.1109\/CVPR.2017.660"},{"key":"1046_CR42","doi-asserted-by":"crossref","unstructured":"Zhao H, Qi X, Shen X, Shi J, Jia J (2017) Icnet for real-time semantic segmentation on high-resolution images. CoRR, arXiv:1704.08545","DOI":"10.1007\/978-3-030-01219-9_25"},{"key":"1046_CR43","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, Wang X, Jia J (2016) Pyramid scene parsing network. CoRR, arXiv:1612.01105","DOI":"10.1109\/CVPR.2017.660"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01046-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01046-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01046-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T19:17:05Z","timestamp":1698434225000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01046-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,12]]},"references-count":43,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["1046"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01046-y","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,12]]},"assertion":[{"value":"15 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declared that they have no conflicts of interest to this work.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}