{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T17:57:28Z","timestamp":1769277448142,"version":"3.49.0"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T00:00:00Z","timestamp":1681948800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T00:00:00Z","timestamp":1681948800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61472220"],"award-info":[{"award-number":["61472220"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61572286"],"award-info":[{"award-number":["61572286"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The ability to capture pixels' long-distance interdependence is beneficial to semantic segmentation. In addition, semantic segmentation requires the effective use of pixel-to-pixel similarity in the channel direction to enhance pixel regions. Asymmetric Non-local Neural Networks (ANNet) combine multi-scale spatial pyramidal pooling modules and Non-local blocks to reduce model parameters without sacrificing performance. However, ANNet does not consider pixel similarity in the channel direction in the feature map, so its segmentation effect is not ideal. This article proposes a Mutually Reinforcing Non-local Neural Networks (MRNNet) to improve ANNet. MRNNet consists specifically of the channel enhancement regions module (CERM), and the position-enhanced pixels module (PEPM). In contrast to Asymmetric Fusion Non-local Block (AFNB) in ANNet, CERM does not combine the feature maps of the high and low stages, but rather utilizes the auxiliary loss function of ANNet. Calculating the similarity between feature maps in channel direction improves the category representation of feature maps in the channel aspect and reduces matrix multiplication computation. PEPM enhances pixels in the spatial direction of the feature map by calculating the similarity between pixels in the spatial direction of the feature map. Experiments reveal that our segmentation accuracy for cityscapes test data reaches 81.9%. Compared to ANNet, the model's parameters are reduced by 11.35\u00a0(M). Given ten different pictures with a size of 2048\u2009\u00d7\u20091024, the average reasoning time of MRNNet is 0.103(s) faster than that of the ANNet model.<\/jats:p>","DOI":"10.1007\/s40747-023-01056-w","type":"journal-article","created":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T08:22:31Z","timestamp":1681978951000},"page":"6037-6049","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Mutually reinforcing non-local neural networks for semantic segmentation"],"prefix":"10.1007","volume":"9","author":[{"given":"Tianping","family":"Li","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanjun","family":"Wei","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhaotong","family":"Cui","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guanxing","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meng","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hua","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,4,20]]},"reference":[{"key":"1056_CR1","doi-asserted-by":"publisher","unstructured":"Zhou B, Zhao H, Puig X, et al (2017) Scene parsing through ADE20K dataset. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Honolulu, HI, pp 5122\u20135130.https:\/\/doi.org\/10.1109\/CVPR.2017.544","DOI":"10.1109\/CVPR.2017.544"},{"issue":"4","key":"1056_CR2","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1109\/TSMC.2016.2623683","volume":"47","author":"Y Li","year":"2016","unstructured":"Li Y, Guo Y, Kao Y, He R (2016) Image piece learning for weakly supervised semantic segmentation. IEEE Trans Systems Man Cybern Syst 47(4):648\u2013659. https:\/\/doi.org\/10.1109\/TSMC.2016.2623683","journal-title":"IEEE Trans Systems Man Cybern Syst"},{"issue":"12","key":"1056_CR3","doi-asserted-by":"publisher","first-page":"25489","DOI":"10.1109\/TITS.2021.3098355","volume":"23","author":"G Gao","year":"2021","unstructured":"Gao G, Xu G, Yu Y et al (2021) MSCFNet: a lightweight network with multi-scale context fusion for real-time semantic segmentation. IEEE Trans Intell Transp Syst 23(12):25489\u201325499. https:\/\/doi.org\/10.1109\/TITS.2021.3098355","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"1056_CR4","doi-asserted-by":"publisher","unstructured":"Teichmann M, Weber M, Zollner M, et al (2018) MultiNet: real-time joint semantic reasoning for autonomous driving. In: 2018 IEEE intelligent vehicles symposium (IV). IEEE, Changshu, June 2018, pp 1013\u20131020. https:\/\/doi.org\/10.1109\/IVS.2018.8500504","DOI":"10.1109\/IVS.2018.8500504"},{"key":"1056_CR5","doi-asserted-by":"publisher","unstructured":"Siam M, Elkerdawy S, Jagersand M, Yogamani S (2017) Deep semantic segmentation for automated driving: taxonomy, roadmap and challenges. In: 2017 IEEE 20th international conference on intelligent transportation systems (ITSC). IEEE, Yokohama, pp 1\u20138. https:\/\/doi.org\/10.1109\/ITSC.2017.8317714","DOI":"10.1109\/ITSC.2017.8317714"},{"key":"1056_CR6","doi-asserted-by":"publisher","first-page":"1430","DOI":"10.1109\/JPROC.2003.817125","volume":"91","author":"M Hardens","year":"2003","unstructured":"Hardens M, Szekely G (2003) Enhancing human-computer interaction in medical segmentation. Proc IEEE 91:1430\u20131442. https:\/\/doi.org\/10.1109\/JPROC.2003.817125","journal-title":"Proc IEEE"},{"key":"1056_CR7","unstructured":"Alhaija H A, Mustikovela S K, Mescheder L, et al (2017) Augmented reality meets deep learning for car instance segmentation in urban scenes. In: British Machine Vision Conference, vol 1, p 2"},{"key":"1056_CR8","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-030-00889-5_1","volume-title":"Deep learning in medical image analysis and multimodal learning for clinical decision support","author":"Z Zhou","year":"2018","unstructured":"Zhou Z, Rahman Siddiquee MM, Tajbakhsh N, Liang J (2018) UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In: Stoyanov D, Taylor Z, Carneiro G et al (eds) Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer International Publishing, Cham, pp 3\u201311"},{"key":"1056_CR9","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y Lecun","year":"1998","unstructured":"Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278\u20132324. https:\/\/doi.org\/10.1109\/5.726791","journal-title":"Proc IEEE"},{"key":"1056_CR10","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp 1\u20139","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"1056_CR11","doi-asserted-by":"publisher","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. In: arXiv preprint arXiv:1409.1556. https:\/\/doi.org\/10.48550\/ARXIV.1409.1556","DOI":"10.48550\/ARXIV.1409.1556"},{"key":"1056_CR12","doi-asserted-by":"publisher","first-page":"112045","DOI":"10.1016\/j.rse.2020.112045","volume":"250","author":"Y Li","year":"2020","unstructured":"Li Y, Chen W, Zhang Y et al (2020) Accurate cloud detection in high-resolution remote sensing imagery by weakly supervised deep learning. Remote Sens Environ 250:112045. https:\/\/doi.org\/10.1016\/j.rse.2020.112045","journal-title":"Remote Sens Environ"},{"key":"1056_CR13","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1016\/j.isprsjprs.2019.10.001","volume":"158","author":"C Tao","year":"2019","unstructured":"Tao C, Qi J, Li Y et al (2019) Spatial information inference net: Road extraction using road-specific contextual information. ISPRS J Photogramm Remote Sens 158:155\u2013166. https:\/\/doi.org\/10.1016\/j.isprsjprs.2019.10.001","journal-title":"ISPRS J Photogramm Remote Sens"},{"issue":"4","key":"1056_CR14","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","volume":"39","author":"J Long","year":"2015","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640\u2013651. https:\/\/doi.org\/10.1109\/TPAMI.2016.2572683","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1056_CR15","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770\u2013778. https:\/\/doi.org\/10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"1056_CR16","doi-asserted-by":"crossref","unstructured":"Yuan Y, Chen X, Wang J (2020) Object-contextual representations for semantic segmentation. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision\u2014ECCV 2020. Springer International Publishing, Cham, pp 173\u2013190","DOI":"10.1007\/978-3-030-58539-6_11"},{"key":"1056_CR17","doi-asserted-by":"publisher","unstructured":"Chen L-C, Zhu Y, Papandreou G, et al (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on computer vision (ECCV), pp 801\u2013818. https:\/\/doi.org\/10.48550\/ARXIV.1802.02611","DOI":"10.48550\/ARXIV.1802.02611"},{"key":"1056_CR18","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"L-C Chen","year":"2018","unstructured":"Chen L-C, Papandreou G, Kokkinos I et al (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40:834\u2013848. https:\/\/doi.org\/10.1109\/TPAMI.2017.2699184","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1056_CR19","unstructured":"Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. In: arXiv preprint arXiv:1706.05587"},{"key":"1056_CR20","unstructured":"Badrinarayanan V, Handa A, Cipolla R (2015) SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. In: arXiv preprint arXiv:1505.07293"},{"key":"1056_CR21","doi-asserted-by":"publisher","unstructured":"Zhao H, Shi J, Qi X, et al (2017) Pyramid scene parsing network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, pp 6230\u20136239. https:\/\/doi.org\/10.1109\/CVPR.2017.660","DOI":"10.1109\/CVPR.2017.660"},{"key":"1056_CR22","doi-asserted-by":"publisher","unstructured":"Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp 7794\u20137803. https:\/\/doi.org\/10.1109\/CVPR.2018.00813","DOI":"10.1109\/CVPR.2018.00813"},{"key":"1056_CR23","doi-asserted-by":"publisher","unstructured":"Zhu Z, Xu M, Bai S et al (2019) Asymmetric non-local neural networks for semantic segmentation. In: 2019 IEEE\/CVF international conference on computer vision (ICCV), Seoul, Korea (South), pp 593\u2013602. https:\/\/doi.org\/10.1109\/ICCV.2019.00068","DOI":"10.1109\/ICCV.2019.00068"},{"key":"1056_CR24","doi-asserted-by":"publisher","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","volume":"37","author":"K He","year":"2015","unstructured":"He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37:1904\u20131916. https:\/\/doi.org\/10.1109\/TPAMI.2015.2389824","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1056_CR25","doi-asserted-by":"publisher","unstructured":"Lazebnik S, Schmid C, Ponce J (2006) Beyond Bags of features: spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE Computer Society Conference on computer vision and pattern recognition (CVPR\u201906), IEEE, New York, NY, USA, pp 2169\u20132178. https:\/\/doi.org\/10.1109\/CVPR.2006.68","DOI":"10.1109\/CVPR.2006.68"},{"key":"1056_CR26","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, pp 770\u2013778. https:\/\/doi.org\/10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"1056_CR27","doi-asserted-by":"publisher","unstructured":"Fu J, Liu J, Tian H et al (2019) Dual attention network for scene segmentation. In: 2019 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), Long Beach, CA, USA, pp 3146\u20133154. https:\/\/doi.org\/10.1109\/CVPR.2019.00326","DOI":"10.1109\/CVPR.2019.00326"},{"key":"1056_CR28","doi-asserted-by":"publisher","unstructured":"Yu C, Wang J, Peng C et al (2018) Learning a discriminative feature network for semantic segmentation. In: 2018 IEEE\/CVF conference on computer vision and pattern recognition, Salt Lake City, UT, USA, pp 1857\u20131866. https:\/\/doi.org\/10.1109\/CVPR.2018.00199","DOI":"10.1109\/CVPR.2018.00199"},{"key":"1056_CR29","doi-asserted-by":"publisher","unstructured":"Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on computer vision (ECCV), pp 3\u201319. https:\/\/doi.org\/10.48550\/arXiv.1807.06521","DOI":"10.48550\/arXiv.1807.06521"},{"key":"1056_CR30","doi-asserted-by":"publisher","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: 2021 IEEE\/CVF conference on computer vision and pattern recognition, Nashville, TN, USA, pp 13713\u201313722. https:\/\/doi.org\/10.1109\/CVPR46437.2021.01350","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"1056_CR31","doi-asserted-by":"publisher","unstructured":"Mottaghi R, Chen X, Liu X et al (2014) The role of context for object detection and semantic segmentation in the wild. In: 2014 IEEE conference on computer vision and pattern recognition, Columbus, OH, USA, pp 891\u2013898. https:\/\/doi.org\/10.1109\/CVPR.2014.119","DOI":"10.1109\/CVPR.2014.119"},{"key":"1056_CR32","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/j.neucom.2019.07.078","volume":"365","author":"B Zhao","year":"2019","unstructured":"Zhao B, Zhang X, Li Z, Hu X (2019) A multi-scale strategy for deep semantic segmentation with convolutional neural networks. Neurocomputing 365:273\u2013284. https:\/\/doi.org\/10.1016\/j.neucom.2019.07.078","journal-title":"Neurocomputing"},{"key":"1056_CR33","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical image computing and computer-assisted intervention\u2014MICCAI 2015. Springer International Publishing, Cham, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"1056_CR34","doi-asserted-by":"publisher","unstructured":"Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: 2015 IEEE International Conference on computer vision (ICCV), Santiago, Chile, pp 1520\u20131528. https:\/\/doi.org\/10.1109\/ICCV.2015.178","DOI":"10.1109\/ICCV.2015.178"},{"key":"1056_CR35","doi-asserted-by":"crossref","unstructured":"Juraska J, Walker M (2021) Attention is indeed all you need: semantically attention-guided decoding for data-to-text NLG. In: arXiv preprint arXiv:2109.07043","DOI":"10.18653\/v1\/2021.inlg-1.45"},{"key":"1056_CR36","doi-asserted-by":"publisher","unstructured":"Chaurasia A, Culurciello E (2017) LinkNet: exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE visual communications and image processing (VCIP), St. Petersburg, FL, USA, pp 1\u20134. https:\/\/doi.org\/10.1109\/VCIP.2017.8305148","DOI":"10.1109\/VCIP.2017.8305148"},{"key":"1056_CR37","doi-asserted-by":"publisher","first-page":"3349","DOI":"10.1109\/TPAMI.2020.2983686","volume":"43","author":"J Wang","year":"2021","unstructured":"Wang J, Sun K, Cheng T et al (2021) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell 43:3349\u20133364. https:\/\/doi.org\/10.1109\/TPAMI.2020.2983686","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1056_CR38","doi-asserted-by":"publisher","unstructured":"Zheng S, Lu J, Zhao H et al (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: 2021 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), Nashville, TN, USA, pp 6881\u20136890. https:\/\/doi.org\/10.1109\/CVPR46437.2021.00681","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"1056_CR39","doi-asserted-by":"publisher","unstructured":"Xie E, Wang W, Yu Z et al (2021) SegFormer: simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077\u201312090. https:\/\/doi.org\/10.48550\/arXiv.2105.15203","DOI":"10.48550\/arXiv.2105.15203"},{"key":"1056_CR40","doi-asserted-by":"publisher","unstructured":"YUAN Y, Fu R, Huang L et al (2021) HRFormer: high-resolution vision transformer for dense predict. Adv Neural Inf Process Syst 34:7281\u20137293. https:\/\/doi.org\/10.48550\/arXiv.2110.09408","DOI":"10.48550\/arXiv.2110.09408"},{"key":"1056_CR41","doi-asserted-by":"publisher","unstructured":"Zhang H, Wu C, Zhang Z et al (2022) ResNeSt: split-attention networks. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2736\u20132746. https:\/\/doi.org\/10.48550\/arXiv.2004.08955","DOI":"10.48550\/arXiv.2004.08955"},{"key":"1056_CR42","doi-asserted-by":"publisher","unstructured":"Zhen M, Wang J, Zhou L et al (2020) Joint semantic segmentation and boundary detection using iterative pyramid contexts. In: Proceedings of the IEEE\/CVF Conference on computer vision and pattern recognition, pp 13666\u201313675. https:\/\/doi.org\/10.48550\/arXiv.2004.07684","DOI":"10.48550\/arXiv.2004.07684"},{"key":"1056_CR43","doi-asserted-by":"crossref","unstructured":"Li X, Li X, Zhang L, et al (2020) Improving semantic segmentation via decoupled body and edge supervision. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision\u2014ECCV 2020. Springer International Publishing, Cham, pp 435\u2013452","DOI":"10.1007\/978-3-030-58520-4_26"},{"key":"1056_CR44","doi-asserted-by":"crossref","unstructured":"Yuan Y, Xie J, Chen X, Wang J (2020) SegFix: model-agnostic boundary refinement for segmentation. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision\u2014ECCV 2020. Springer International Publishing, Cham, pp 489\u2013506","DOI":"10.1007\/978-3-030-58610-2_29"},{"key":"1056_CR45","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3211006","author":"M-H Guo","year":"2022","unstructured":"Guo M-H, Liu Z-N, Mu T-J, Hu S-M (2022) Beyond self-attention: external attention using two linear layers for visual tasks. IEEE Trans Pattern Anal Mach Intell. https:\/\/doi.org\/10.1109\/TPAMI.2022.3211006","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1056_CR46","doi-asserted-by":"publisher","unstructured":"Chen C-F (Richard), Fan Q, Panda R (2021) CrossViT: cross-attention multi-scale vision transformer for image classification. In: Proceedings of the IEEE\/CVF International conference on computer vision, pp 357\u2013366. https:\/\/doi.org\/10.48550\/arXiv.2103.14899","DOI":"10.48550\/arXiv.2103.14899"},{"key":"1056_CR47","doi-asserted-by":"publisher","unstructured":"Liu Z, Lin Y, Cao Y, et al (2021) Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 10012\u201310022. https:\/\/doi.org\/10.48550\/arXiv.2103.14030","DOI":"10.48550\/arXiv.2103.14030"},{"key":"1056_CR48","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","volume":"7","author":"M-H Guo","year":"2021","unstructured":"Guo M-H, Cai J-X, Liu Z-N et al (2021) PCT: Point cloud transformer. Comp Vis Media 7:187\u2013199. https:\/\/doi.org\/10.1007\/s41095-021-0229-5","journal-title":"Comp Vis Media"},{"key":"1056_CR49","doi-asserted-by":"publisher","unstructured":"Chen L, Zhang H, Xiao J et al (2017) SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, pp 6298\u20136306. https:\/\/doi.org\/10.1109\/CVPR.2017.667","DOI":"10.1109\/CVPR.2017.667"},{"key":"1056_CR50","doi-asserted-by":"publisher","unstructured":"Wang Q, Wu B, Zhu P, et al (2020) Supplementary Material for \u201cECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks\u201d. In: 2020 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, pp 11531\u201311539. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01155","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"1056_CR51","doi-asserted-by":"crossref","unstructured":"Huang Z, Shi X, Zhang C, et al (2022) FlowFormer: a transformer architecture for optical flow. In: arXiv preprint arXiv:2203.16194","DOI":"10.1007\/978-3-031-19790-1_40"},{"key":"1056_CR52","unstructured":"Yuan Y, Huang L, Guo J, et al (2018) OCNet: object context network for scene parsing. In: arXiv preprint arXiv:1809.00916."},{"key":"1056_CR53","doi-asserted-by":"publisher","unstructured":"Zhang H, Goodfellow I, Metaxas D, Odena A (2019) Self-attention generative adversarial networks. In: International conference on machine learning, PMLR, pp 7354\u20137363. https:\/\/doi.org\/10.48550\/arXiv.1805.08318","DOI":"10.48550\/arXiv.1805.08318"},{"issue":"2","key":"1056_CR54","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Van Gool L, Williams C et al (2010) The PASCAL visual object classes challenge 2012 (VOC2012) development kit. Int J Comput Vision 88(2):303\u2013338","journal-title":"Int J Comput Vision"},{"key":"1056_CR55","doi-asserted-by":"publisher","unstructured":"Cordts M, Omran M, Ramos S et al (2016) The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, pp 3213\u20133223. https:\/\/doi.org\/10.1109\/CVPR.2016.350","DOI":"10.1109\/CVPR.2016.350"},{"issue":"3","key":"1056_CR56","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1007\/s11263-018-1140-0","volume":"127","author":"B Zhou","year":"2019","unstructured":"Zhou B, Zhao H, Puig X et al (2019) Semantic understanding of scenes through the ADE20K dataset. Int J Comput Vision 127(3):302\u2013321. https:\/\/doi.org\/10.1007\/s11263-018-1140-0","journal-title":"Int J Comput Vision"},{"issue":"6","key":"1056_CR57","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2017","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84\u201390. https:\/\/doi.org\/10.1145\/3065386","journal-title":"Commun ACM"},{"key":"1056_CR58","doi-asserted-by":"publisher","unstructured":"Deng J, Dong W, Socher R et al (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition, Miami, FL, USA, pp 248\u2013255. https:\/\/doi.org\/10.1109\/CVPR.2009.5206848","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01056-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01056-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01056-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T02:11:18Z","timestamp":1702260678000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01056-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,20]]},"references-count":58,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["1056"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01056-w","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,20]]},"assertion":[{"value":"25 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest in the publication of this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}