{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T08:11:45Z","timestamp":1776413505202,"version":"3.51.2"},"reference-count":77,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T00:00:00Z","timestamp":1709510400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T00:00:00Z","timestamp":1709510400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Semantic Segmentation has been widely used in a variety of clinical images, which greatly assists medical diagnosis and other work. To address the challenge of reduced semantic inference accuracy caused by feature weakening, a pioneering network called FTUNet (Feature-enhanced Transformer UNet) was introduced, leveraging the classical Encoder-Decoder architecture. Firstly, a dual-branch Encoder is proposed based on the U-shaped structure. In addition to employing convolution for feature extraction, a Layer Transformer structure (LTrans) is established to capture long-range dependencies and global context information. Then, an Inception structural module focusing on local features is proposed at the Bottleneck, which adopts the dilated convolution to amplify the receptive field to achieve deeper semantic mining based on the comprehensive information brought by the dual Encoder. Finally, in order to amplify feature differences, a lightweight attention mechanism of feature polarization is proposed at Skip Connection, which can strengthen or suppress feature channels by reallocating weights. 
The experiment is conducted on 3 different medical datasets. A comprehensive and detailed comparison was conducted with 6 non-U-shaped models, 5 U-shaped models, and 3 Transformer models in 8 categories of indicators. Meanwhile, 9 kinds of layer-by-layer ablation and 4 kinds of other embedding attempts are implemented to demonstrate the optimal structure of the current FTUNet.<\/jats:p>","DOI":"10.1007\/s11063-024-11533-z","type":"journal-article","created":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T06:02:20Z","timestamp":1709532140000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["FTUNet: A Feature-Enhanced Network for Medical Image Segmentation Based on the Combination of U-Shaped Network and Vision Transformer"],"prefix":"10.1007","volume":"56","author":[{"given":"Yuefei","family":"Wang","sequence":"first","affiliation":[]},{"given":"Xi","family":"Yu","sequence":"additional","affiliation":[]},{"given":"Yixi","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Shijie","family":"Zeng","sequence":"additional","affiliation":[]},{"given":"Yuquan","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Ronghui","family":"Feng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,4]]},"reference":[{"key":"11533_CR1","doi-asserted-by":"crossref","unstructured":"Voulodimos A, Doulamis N, Doulamis A, et al (2018) Deep learning for computer vision: a brief review. Comput Intell Neurosci 1\u201313","DOI":"10.1155\/2018\/7068349"},{"key":"11533_CR2","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia A, Orts-Escolano S, Oprea S, et al (2017) A review on deep learning techniques applied to semantic segmentation. 
arXiv preprint arXiv:1704.06857","DOI":"10.1016\/j.asoc.2018.05.018"},{"key":"11533_CR3","doi-asserted-by":"publisher","first-page":"626","DOI":"10.1016\/j.neucom.2022.01.005","volume":"493","author":"Y Mo","year":"2022","unstructured":"Mo Y, Wu Y, Yang X et al (2022) Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 493:626\u2013646","journal-title":"Neurocomputing"},{"key":"11533_CR4","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1016\/j.neucom.2019.11.118","volume":"406","author":"S Hao","year":"2020","unstructured":"Hao S, Zhou Y, Guo Y (2020) A brief survey on semantic segmentation with deep learning. Neurocomputing 406:302\u2013321","journal-title":"Neurocomputing"},{"issue":"5786","key":"11533_CR5","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","volume":"313","author":"GE Hinton","year":"2006","unstructured":"Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504\u2013507","journal-title":"Science"},{"key":"11533_CR6","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.media.2017.07.005","volume":"42","author":"G Litjens","year":"2017","unstructured":"Litjens G, Kooi T, Bejnordi BE et al (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60\u201388","journal-title":"Med Image Anal"},{"issue":"5","key":"11533_CR7","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1007\/s00521-017-3158-6","volume":"29","author":"F Jiang","year":"2018","unstructured":"Jiang F, Grigorev A, Rho S et al (2018) Medical image semantic segmentation based on deep learning. 
Neural Comput Appl 29(5):1257\u20131265","journal-title":"Neural Comput Appl"},{"issue":"1","key":"11533_CR8","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1007\/s10462-020-09854-1","volume":"54","author":"S Asgari Taghanaki","year":"2021","unstructured":"Asgari Taghanaki S, Abhishek K, Cohen JP et al (2021) Deep semantic segmentation of natural and medical images: a review. Artif Intell Rev 54(1):137\u2013178","journal-title":"Artif Intell Rev"},{"key":"11533_CR9","doi-asserted-by":"crossref","unstructured":"Shamshad F, Khan S, Zamir SW, et al (2022) Transformers in medical imaging: a survey. arXiv preprint arXiv:2201.09873","DOI":"10.1016\/j.media.2023.102802"},{"key":"11533_CR10","volume-title":"Computer and robot vision","author":"RM Haralick","year":"1992","unstructured":"Haralick RM, Shapiro LG (1992) Computer and robot vision. Addison-wesley, Reading"},{"issue":"6","key":"11533_CR11","doi-asserted-by":"publisher","first-page":"e314","DOI":"10.1016\/S2589-7500(20)30085-6","volume":"2","author":"M Monteiro","year":"2020","unstructured":"Monteiro M, Newcombe VFJ, Mathieu F et al (2020) Multiclass semantic segmentation and quantification of traumatic brain injury lesions on head CT using deep learning: an algorithm development and multicentre validation study. Lancet Digital Health 2(6):e314\u2013e322","journal-title":"Lancet Digital Health"},{"issue":"5","key":"11533_CR12","doi-asserted-by":"publisher","first-page":"2019","DOI":"10.1109\/TIP.2014.2311377","volume":"23","author":"J Yu","year":"2014","unstructured":"Yu J, Rui Y, Tao D (2014) Click prediction for web image reranking using multimodal sparse coding. 
IEEE Trans Image Process 23(5):2019\u20132032","journal-title":"IEEE Trans Image Process"},{"key":"11533_CR13","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1016\/j.cmpb.2019.07.005","volume":"178","author":"P Tang","year":"2019","unstructured":"Tang P, Liang Q, Yan X et al (2019) Efficient skin lesion segmentation using separable-Unet with stochastic weight averaging. Comput Methods Programs Biomed 178:289\u2013301","journal-title":"Comput Methods Programs Biomed"},{"key":"11533_CR14","doi-asserted-by":"publisher","first-page":"103738","DOI":"10.1016\/j.compbiomed.2020.103738","volume":"120","author":"MK Hasan","year":"2020","unstructured":"Hasan MK, Dahal L, Samarakoon PN et al (2020) DSNet: automatic dermoscopic skin lesion segmentation. Comput Biol Med 120:103738","journal-title":"Comput Biol Med"},{"key":"11533_CR15","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1016\/j.neucom.2022.05.023","volume":"500","author":"Z Huang","year":"2022","unstructured":"Huang Z, Miao J, Song H et al (2022) A novel tongue segmentation method based on improved U-Net. Neurocomputing 500:73\u201389","journal-title":"Neurocomputing"},{"key":"11533_CR16","doi-asserted-by":"crossref","unstructured":"Kaganami H G, Beiji Z (2009) Region-based segmentation versus edge detection. In: 2009 fifth international conference on intelligent information hiding and multimedia signal processing. IEEE, pp 1217\u20131221","DOI":"10.1109\/IIH-MSP.2009.13"},{"issue":"6","key":"11533_CR17","doi-asserted-by":"publisher","first-page":"4259","DOI":"10.1007\/s10462-019-09792-7","volume":"53","author":"M Zhang","year":"2020","unstructured":"Zhang M, Zhou Y, Zhao J et al (2020) A survey of semi-and weakly supervised semantic segmentation of images. 
Artif Intell Rev 53(6):4259\u20134288","journal-title":"Artif Intell Rev"},{"issue":"5","key":"11533_CR18","doi-asserted-by":"publisher","first-page":"3117","DOI":"10.1002\/int.22814","volume":"37","author":"J Zhang","year":"2022","unstructured":"Zhang J, Yang J, Yu J et al (2022) Semisupervised image classification by mutual learning of multiple self-supervised models. Int J Intell Syst 37(5):3117\u20133141","journal-title":"Int J Intell Syst"},{"key":"11533_CR19","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE Press, NJ, pp 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"issue":"12","key":"11533_CR20","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans on Pattern Anal Mach Intell 39(12):2481\u20132495","journal-title":"IEEE Trans on Pattern Anal Mach Intell"},{"key":"11533_CR21","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, Cham, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"issue":"2","key":"11533_CR22","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1109\/TPAMI.2019.2932058","volume":"44","author":"J Yu","year":"2019","unstructured":"Yu J, Tan M, Zhang H et al (2019) Hierarchical deep click feature prediction for fine-grained image recognition. 
IEEE Trans Pattern Anal Mach Intell 44(2):563\u2013578","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11533_CR23","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1109\/TPAMI.2022.3152247","volume":"45","author":"K Han","year":"2022","unstructured":"Han K, Wang Y, Chen H et al (2022) A survey on vision transformer. IEEE Tran Pattern Anal Mach Intell 45:87\u2013110","journal-title":"IEEE Tran Pattern Anal Mach Intell"},{"key":"11533_CR24","unstructured":"Zhou D, Kang B, Jin X, et al (2021) Deepvit: towards deeper vision transformer. arXiv preprint arXiv:2103.11886"},{"key":"11533_CR25","unstructured":"Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. Adv Neural Inf Process Syst 30"},{"key":"11533_CR26","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929"},{"key":"11533_CR27","doi-asserted-by":"crossref","unstructured":"Carion N, Massa F, Synnaeve G, et al (2020) End-to-end object detection with transformers. In: European conference on computer vision, Springer, Cham, pp 213\u2013229","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"11533_CR28","doi-asserted-by":"crossref","unstructured":"Zhou L, Zhou Y, Corso JJ, et al (2018) End-to-end dense video captioning with masked transformer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8739\u20138748","DOI":"10.1109\/CVPR.2018.00911"},{"key":"11533_CR29","doi-asserted-by":"publisher","first-page":"102327","DOI":"10.1016\/j.media.2021.102327","volume":"76","author":"H Wu","year":"2022","unstructured":"Wu H, Chen S, Chen G et al (2022) FAT-Net: feature adaptive transformers for automated skin lesion segmentation. 
Med Image Anal 76:102327","journal-title":"Med Image Anal"},{"key":"11533_CR30","unstructured":"Touvron H, Cord M, Douze M, et al (2021) Training data-efficient image transformers & distillation through attention. In: International conference on machine learning. PMLR, pp 10347\u201310357"},{"key":"11533_CR31","unstructured":"Cao H, Wang Y, Chen J, et al (2021) Swin-unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537"},{"key":"11533_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2352\/J.ImagingSci.Technol.2020.64.2.020508","volume":"64","author":"G Du","year":"2020","unstructured":"Du G, Cao X, Liang J et al (2020) Medical image segmentation based on u-net: a review. J Imaging Sci Technol 64:1\u201312","journal-title":"J Imaging Sci Technol"},{"key":"11533_CR33","doi-asserted-by":"publisher","first-page":"107952","DOI":"10.1016\/j.patcog.2021.107952","volume":"116","author":"J Zhang","year":"2021","unstructured":"Zhang J, Cao Y, Wu Q (2021) Vector of locally and adaptively aggregated descriptors for image feature representation. Pattern Recogn 116:107952","journal-title":"Pattern Recogn"},{"key":"11533_CR34","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, et al (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881\u20132890","DOI":"10.1109\/CVPR.2017.660"},{"key":"11533_CR35","unstructured":"Chen LC, Papandreou G, Kokkinos I, et al (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062"},{"issue":"4","key":"11533_CR36","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"LC Chen","year":"2017","unstructured":"Chen LC, Papandreou G, Kokkinos I et al (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. 
IEEE Trans Pattern Anal Mach Intell 40(4):834\u2013848","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11533_CR37","unstructured":"Chen LC, Papandreou G, Schroff F, et al (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587"},{"key":"11533_CR38","doi-asserted-by":"crossref","unstructured":"Chen L C, Zhu Y, Papandreou G, et al (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801\u2013818","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"11533_CR39","doi-asserted-by":"crossref","unstructured":"Azad R, Asadi-Aghbolaghi M, Fathy M, et al (2020) Attention deeplabv3+: multi-level context attention mechanism for skin lesion segmentation. In: European conference on computer vision, Springer, Cham, pp 251\u2013266","DOI":"10.1007\/978-3-030-66415-2_16"},{"key":"11533_CR40","doi-asserted-by":"crossref","unstructured":"Lin G, Milan A, Shen C, et al (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925\u20131934","DOI":"10.1109\/CVPR.2017.549"},{"key":"11533_CR41","unstructured":"Xia X, Kulis B (2017) W-net: a deep model for fully unsupervised image segmentation. arXiv preprint arXiv:1711.08506"},{"key":"11533_CR42","doi-asserted-by":"crossref","unstructured":"Qi K, Yang H, Li C, et al (2019) X-net: brain stroke lesion segmentation based on depthwise separable convolution and long-range dependencies. 
In: International conference on medical image computing and computer-assisted intervention, Springer, Cham, pp 247\u2013255","DOI":"10.1007\/978-3-030-32248-9_28"},{"issue":"10","key":"11533_CR43","doi-asserted-by":"publisher","first-page":"2281","DOI":"10.1109\/TMI.2019.2903562","volume":"38","author":"Z Gu","year":"2019","unstructured":"Gu Z, Cheng J, Fu H et al (2019) Ce-net: context encoder network for 2d medical image segmentation. IEEE Trans Med Imaging 38(10):2281\u20132292","journal-title":"IEEE Trans Med Imaging"},{"key":"11533_CR44","doi-asserted-by":"publisher","first-page":"104038","DOI":"10.1016\/j.bspc.2022.104038","volume":"79","author":"H Song","year":"2023","unstructured":"Song H, Wang Y, Zeng S et al (2023) OAU-net: outlined attention U-net for biomedical image segmentation. Biomed Signal Process Control 79:104038","journal-title":"Biomed Signal Process Control"},{"key":"11533_CR45","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1016\/j.patrec.2021.01.036","volume":"145","author":"K Trebing","year":"2021","unstructured":"Trebing K, Sta\u0144czyk T, Mehrkanoon S (2021) SmaAt-UNet: precipitation nowcasting using a small attention-UNet architecture. Pattern Recogn Lett 145:178\u2013186","journal-title":"Pattern Recogn Lett"},{"key":"11533_CR46","doi-asserted-by":"crossref","unstructured":"Lou A, Guan S, Loew M (2021) DC-UNet: rethinking the U-Net architecture with dual channel efficient CNN for medical image segmentation. In: Medical imaging 2021: image processing. SPIE, vol 11596, pp 758\u2013768","DOI":"10.1117\/12.2582338"},{"key":"11533_CR47","doi-asserted-by":"crossref","unstructured":"Huang L, Tan J, Liu J, et al (2020) Hand-transformer: non-autoregressive structured modeling for 3d hand pose estimation. 
In: European conference on computer vision, Springer, Cham, pp 17\u201333","DOI":"10.1007\/978-3-030-58595-2_2"},{"key":"11533_CR48","doi-asserted-by":"crossref","unstructured":"Huang L, Tan J, Meng J, et al (2020) Hot-net: non-autoregressive transformer for 3d hand-object pose estimation. In: Proceedings of the 28th ACM international conference on multimedia, pp 3136\u20133145","DOI":"10.1145\/3394171.3413775"},{"key":"11533_CR49","doi-asserted-by":"crossref","unstructured":"Lin K, Wang L, Liu Z (2021) End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1954\u20131963","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"11533_CR50","doi-asserted-by":"crossref","unstructured":"Dai Z, Cai B, Lin Y, et al (2021) Up-detr: unsupervised pre-training for object detection with transformers. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1601\u20131610","DOI":"10.1109\/CVPR46437.2021.00165"},{"key":"11533_CR51","unstructured":"Zhu X, Su W, Lu L, et al (2020) Deformable detr: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159"},{"key":"11533_CR52","unstructured":"Radford A, Kim JW, Hallacy C, et al (2021) Learning transferable visual models from natural language supervision. In: International conference on machine learning. PMLR, pp 8748\u20138763"},{"key":"11533_CR53","unstructured":"Devlin J, Chang MW, Lee K, et al (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805"},{"key":"11533_CR54","doi-asserted-by":"crossref","unstructured":"He K, Chen X, Xie S, et al (2022) Masked autoencoders are scalable vision learners. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 16000\u201316009","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"11533_CR55","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, et al (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"11533_CR56","doi-asserted-by":"publisher","first-page":"847","DOI":"10.1109\/JSTARS.2020.2971763","volume":"13","author":"Z Li","year":"2020","unstructured":"Li Z, Chen G, Zhang T (2020) A CNN-transformer hybrid approach for crop classification using multitemporal multisensor images. IEEE J Selected Topics Appl Earth Obs Remote Sens 13:847\u2013858","journal-title":"IEEE J Selected Topics Appl Earth Obs Remote Sens"},{"issue":"4","key":"11533_CR57","doi-asserted-by":"publisher","first-page":"984","DOI":"10.3390\/rs14040984","volume":"14","author":"Q Li","year":"2022","unstructured":"Li Q, Chen Y, Zeng Y (2022) Transformer with transfer CNN for remote-sensing-image object detection. Remote Sens 14(4):984","journal-title":"Remote Sens"},{"key":"11533_CR58","unstructured":"Liu Y, Sun G, Qiu Y, et al (2021) Transformer in convolutional neural networks. arXiv preprint arXiv:2106.03180"},{"key":"11533_CR59","doi-asserted-by":"crossref","unstructured":"Azad R, Heidari M, Shariatnia M, et al (2022) TransDeepLab: convolution-free transformer-based DeepLab v3+ for medical image segmentation. arXiv preprint arXiv:2208.00713","DOI":"10.1007\/978-3-031-16919-9_9"},{"key":"11533_CR60","doi-asserted-by":"crossref","unstructured":"Kim D, Xie J, Wang H, et al (2022) TubeFormer-DeepLab: video mask transformer. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13914\u201313924","DOI":"10.1109\/CVPR52688.2022.01354"},{"key":"11533_CR61","doi-asserted-by":"crossref","unstructured":"Sanderson E, Matuszewski BJ (2022) FCN-transformer feature fusion for polyp segmentation. In: Annual conference on medical image understanding and analysis, Springer, Cham, pp 892\u2013907","DOI":"10.1007\/978-3-031-12053-4_65"},{"key":"11533_CR62","doi-asserted-by":"publisher","first-page":"102357","DOI":"10.1016\/j.media.2022.102357","volume":"77","author":"X He","year":"2022","unstructured":"He X, Tan EL, Bi H et al (2022) Fully transformer network for skin lesion analysis. Med Image Anal 77:102357","journal-title":"Med Image Anal"},{"key":"11533_CR63","doi-asserted-by":"crossref","unstructured":"Xie Y, Zhang J, Shen C, et al (2021) Cotr: efficiently bridging cnn and transformer for 3d medical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, Cham, pp 171\u2013180","DOI":"10.1007\/978-3-030-87199-4_16"},{"key":"11533_CR64","doi-asserted-by":"crossref","unstructured":"Wang H, Zhu Y, Adam H, et al (2021) Max-deeplab: end-to-end panoptic segmentation with mask transformers. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5463\u20135474","DOI":"10.1109\/CVPR46437.2021.00542"},{"key":"11533_CR65","doi-asserted-by":"crossref","unstructured":"Yu Q, Wang H, Kim D, et al (2022) CMT-DeepLab: clustering mask transformers for panoptic segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2560\u20132570","DOI":"10.1109\/CVPR52688.2022.00259"},{"key":"11533_CR66","doi-asserted-by":"crossref","unstructured":"Hatamizadeh A, Tang Y, Nath V, et al (2022) Unetr: transformers for 3d medical image segmentation. 
In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision, pp 574\u2013584","DOI":"10.1109\/WACV51458.2022.00181"},{"key":"11533_CR67","doi-asserted-by":"crossref","unstructured":"Fan CM, Liu TJ, Liu KH (2022) SUNet: swin transformer unet for image denoising. arXiv preprint arXiv:2202.14009","DOI":"10.1109\/ISCAS48785.2022.9937486"},{"key":"11533_CR68","doi-asserted-by":"crossref","unstructured":"Wang H, Xie S, Lin L, et al (2022) Mixed transformer u-net for medical image segmentation. In: ICASSP 2022\u20132022 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2390\u20132394","DOI":"10.1109\/ICASSP43922.2022.9746172"},{"key":"11533_CR69","doi-asserted-by":"crossref","unstructured":"Gao Y, Zhou M, Metaxas DN (2021) UTNet: a hybrid transformer architecture for medical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, Cham, pp 61\u201371","DOI":"10.1007\/978-3-030-87199-4_6"},{"key":"11533_CR70","doi-asserted-by":"crossref","unstructured":"Valanarasu JMJ, Oza P, Hacihaliloglu I, et al (2021) Medical transformer: gated axial-attention for medical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, Cham, pp 36\u201346","DOI":"10.1007\/978-3-030-87193-2_4"},{"key":"11533_CR71","unstructured":"Chen J, Lu Y, Yu Q, et al (2021) Transunet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306"},{"key":"11533_CR72","doi-asserted-by":"crossref","unstructured":"Xu G, Wu X, Zhang X, et al (2021) Levit-unet: make faster encoders with transformer for medical image segmentation. arXiv preprint arXiv:2107.08623","DOI":"10.2139\/ssrn.4116174"},{"key":"11533_CR73","doi-asserted-by":"crossref","unstructured":"Petit O, Thome N, Rambour C, et al (2021) U-net transformer: self and cross attention for medical image segmentation. 
In: International workshop on machine learning in medical imaging, Springer, Cham, pp 267\u2013276","DOI":"10.1007\/978-3-030-87589-3_28"},{"key":"11533_CR74","doi-asserted-by":"publisher","first-page":"107914","DOI":"10.1016\/j.cmpb.2023.107914","volume":"243","author":"Y Wang","year":"2023","unstructured":"Wang Y, Yu X, Yang Y et al (2023) A multi-branched semantic segmentation network based on twisted information sharing pattern for medical images. Comput Methods Programs Biomed 243:107914","journal-title":"Comput Methods Programs Biomed"},{"key":"11533_CR75","doi-asserted-by":"publisher","first-page":"103856","DOI":"10.1016\/j.jvcir.2023.103856","volume":"95","author":"Y Wang","year":"2023","unstructured":"Wang Y, Yu X, Guo X et al (2023) A dual-decoding branch U-shaped semantic segmentation network combining transformer attention with decoder: DBUNet. J Visual Commun Image Represent 95:103856","journal-title":"J Visual Commun Image Represent"},{"issue":"9","key":"11533_CR76","doi-asserted-by":"publisher","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","volume":"37","author":"K He","year":"2015","unstructured":"He K, Zhang X, Ren S et al (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904\u20131916","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11533_CR77","doi-asserted-by":"crossref","unstructured":"Lee HJ, Kim HE, Nam H (2019) Srm: a style-based recalibration module for convolutional neural networks. 
In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1854\u20131862","DOI":"10.1109\/ICCV.2019.00194"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11533-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11533-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11533-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T20:30:42Z","timestamp":1715891442000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11533-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,4]]},"references-count":77,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["11533"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11533-z","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,4]]},"assertion":[{"value":"8 January 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"No potential conflict of interest was reported by the authors.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"83"}}