{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T06:30:07Z","timestamp":1770273007087,"version":"3.49.0"},"reference-count":55,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T00:00:00Z","timestamp":1759449600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009377","name":"Education Department of Hunan Province","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100009377","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Vision Transformers (ViTs) show promise for image recognition but struggle with medical image segmentation due to a lack of inductive biases for local structures and an inability to adapt to diverse modalities like CT, endoscopy, and dermatology. Effectively combining multi-scale features from CNNs and ViTs remains a critical, unsolved challenge.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We propose a Pyramid Feature Fusion Network (PFF-Net) that integrates hierarchical features from pre-trained CNN and Transformer backbones. Its dual-branch architecture includes: (1) a region-aware branch for global-to-local contextual understanding via pyramid fusion, and (2) a boundary-aware branch that employs orthogonal Sobel operators and low-level features to generate precise, semantic boundaries. These boundary predictions are iteratively fed back to enhance the region branch, creating a mutually reinforcing loop between segmenting anatomical regions and delineating their boundaries.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>PFF-Net achieved state-of-the-art performance across three clinical segmentation tasks. On polyp segmentation, PFF-Net attained a Dice score of 91.87%, surpassing the TransUNet baseline (86.96%) by 5.6% and reducing the HD95 metric from 22.25 to 11.68 (a 47.5% reduction). For spleen CT segmentation, it reached a Dice score of 95.33%, outperforming ESFPNet-S (94.92%) by 4.3% while reducing the HD95 from 6.99 to 3.35 (a 52.1% reduction). In skin lesion segmentation, our model achieved a Dice score of 90.29%, which represents a 7.3% improvement over the ESFPNet-S baseline (89.64%).<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>The results validate the effectiveness of our pyramid fusion strategy and dual-branch design in bridging the domain gap between natural and medical images. The framework demonstrates strong generalization on small-scale datasets, proving its robustness and potential for accurate segmentation across highly heterogeneous medical imaging modalities.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fcomp.2025.1677905","type":"journal-article","created":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T05:29:55Z","timestamp":1759469395000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Enhancing medical image segmentation via complementary CNN-transformer fusion and boundary perception"],"prefix":"10.3389","volume":"7","author":[{"given":"Xiaowei","family":"Liu","sequence":"first","affiliation":[]},{"given":"Juanxiu","family":"Tian","sequence":"additional","affiliation":[]},{"given":"Shangrong","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Shen","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,10,3]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.10847","article-title":"Enhancing medical image segmentation with transception: A multi-scale feature fusion approach","author":"Azad","year":"2023","journal-title":"arXiv preprint arXiv:2301.10847"},{"key":"B2","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1016\/j.compmedimag.2015.02.007","article-title":"Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians","volume":"43","author":"Bernal","year":"2015","journal-title":"Comput. Med. Imaging Graph"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2105.05537","article-title":"Swin-unet: Unet-like pure transformer for medical image segmentation","author":"Cao","year":"2021","journal-title":"arXiv preprint arXiv:2105.05537"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1246803","DOI":"10.1117\/12.2647897","article-title":"\u201cEsfpnet: efficient deep learning architecture for real-time lesion segmentation in autofluorescence bronchoscopic video,\u201d","author":"Chang","year":"2023"},{"key":"B5","doi-asserted-by":"publisher","first-page":"103280","DOI":"10.1016\/j.media.2024.103280","article-title":"Transunet: rethinking the u-net architecture design for medical image segmentation through the lens of transformers","volume":"97","author":"Chen","year":"2024","journal-title":"Med. Image Anal"},{"key":"B6","doi-asserted-by":"publisher","first-page":"175027","DOI":"10.1088\/1361-6560\/acede8","article-title":"Cotrfuse: a novel framework by fusing cnn and transformer for medical image segmentation","volume":"68","author":"Chen","year":"2023","journal-title":"Phys. Med. Biol"},{"key":"B7","first-page":"11963","article-title":"\u201cScaling up your kernels to 31 \u00d7 31: Revisiting large kernel design in CNNS,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ding","year":"2022"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2108.06932","article-title":"Polyp-pvt: Polyp segmentation with pyramid vision transformers","author":"Dong","year":"2021","journal-title":"arXiv preprint arXiv:2108.06932"},{"key":"B9","first-page":"3146","article-title":"\u201cDual attention network for scene segmentation,\u201d","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Fu","year":"2019"},{"key":"B10","doi-asserted-by":"publisher","first-page":"112841","DOI":"10.1016\/j.asoc.2025.112841","article-title":"Collaborative transformer u-shaped network for medical image segmentation","volume":"173","author":"Gao","year":"2025","journal-title":"Appl. Soft Comput"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2203.00131","article-title":"A multi-scale transformer for medical image segmentation: architectures, model efficiency, and benchmarks","author":"Gao","year":"2022","journal-title":"CoRR,abs"},{"key":"B12","first-page":"574","article-title":"\u201cUnetr: Transformers for 3d medical image segmentation,\u201d","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Hatamizadeh","year":"2022"},{"key":"B13","first-page":"770","article-title":"\u201cDeep residual learning for image recognition,\u201d","author":"He","year":"2016"},{"key":"B14","first-page":"6202","article-title":"\u201cHiformer: Hierarchical multi-scale representations using transformers for medical image segmentation,\u201d","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Heidari","year":"2023"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1484","DOI":"10.1109\/TMI.2022.3230943","article-title":"Missformer: an effective transformer for 2D medical image segmentation","volume":"42","author":"Huang","year":"2023","journal-title":"IEEE Trans. Med. Imaging"},{"key":"B16","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1007\/978-3-030-37734-2_37","article-title":"\u201cKVASIR-SEG: a segmented polyp dataset,\u201d","author":"Jha","year":"2020"},{"key":"B17","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1109\/4.996","article-title":"Design of an image edge detection filter using the sobel operator","volume":"23","author":"Kanopoulos","year":"1988","journal-title":"IEEE J. Solid-State Circuits"},{"key":"B18","doi-asserted-by":"publisher","first-page":"110281","DOI":"10.1016\/j.mri.2024.110281","article-title":"A lightweight adaptive spatial channel attention efficient net B3 based generative adversarial network approach for mr image reconstruction from under sampled data","volume":"117","author":"Kumar","year":"2025","journal-title":"Magn. Reson. Imaging"},{"key":"B19","doi-asserted-by":"publisher","first-page":"110099","DOI":"10.1016\/j.compeleceng.2025.110099","article-title":"Advancements in medical image segmentation: a review of transformer models","volume":"123","author":"Kumar","year":"2025","journal-title":"Comput. Electr. Eng"},{"key":"B20","doi-asserted-by":"publisher","first-page":"103370","DOI":"10.1016\/j.media.2024.103370","article-title":"Medlsam: localize and segment anything model for 3D CT images","volume":"99","author":"Lei","year":"2025","journal-title":"Med. Image Anal"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIM.2022.3178991","article-title":"Ds-transunet: dual swin transformer u-net for medical image segmentation","volume":"71","author":"Lin","year":"2022","journal-title":"IEEE Trans. Instrum. Meas"},{"key":"B22","doi-asserted-by":"publisher","first-page":"105331","DOI":"10.1016\/j.bspc.2023.105331","article-title":"Hybrid cnn-transformer model for medical image segmentation with pyramid convolution and multi-layer perceptron","volume":"86","author":"Liu","year":"2023","journal-title":"Biomed. Signal Process. Control"},{"key":"B23","doi-asserted-by":"publisher","first-page":"103165","DOI":"10.1016\/j.bspc.2021.103165","article-title":"Region-to-boundary deep learning model with multi-scale feature fusion for medical image segmentation","volume":"71","author":"Liu","year":"2022","journal-title":"Biomed. Signal Process. Control"},{"key":"B24","first-page":"10012","article-title":"\u201cSwin transformer: hierarchical vision transformer using shifted windows,\u201d","volume-title":"IEEE\/CVF International Conference on Computer Vision","author":"Liu","year":"2021"},{"key":"B25","first-page":"128","article-title":"\u201cOverlock: an overview-first-look-closely-next convnet with context-mixing dynamic kernels,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Lou","year":"2025"},{"key":"B26","doi-asserted-by":"publisher","first-page":"e210315","DOI":"10.1148\/ryai.210315","article-title":"Radimagenet: An open radiologic deep learning research dataset for effective transfer learning","volume":"4","author":"Mei","year":"2022","journal-title":"Radiol. Artif. Intell"},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1804.03999","article-title":"Attention u-net: Learning where to look for the pancreas","author":"Oktay","year":"2018","journal-title":"arXiv preprint arXiv:1804.03999"},{"key":"B28","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1093\/jcde\/qwac018","article-title":"Swine-net: hybrid deep learning approach to novel polyp segmentation using convolutional neural network and swin transformer","volume":"9","author":"Park","year":"2022","journal-title":"J. Comput. Des. Eng"},{"key":"B29","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1007\/s10462-025-11173-2","article-title":"Polyp segmentation in medical imaging: challenges, approaches and future directions","volume":"58","author":"Qayoom","year":"2025","journal-title":"Artif. Intell. Rev"},{"key":"B30","doi-asserted-by":"publisher","first-page":"6222","DOI":"10.1109\/WACV56688.2023.00616","article-title":"\u201cMedical image segmentation via cascaded attention decoding,\u201d","author":"Rahman","year":"2023"},{"key":"B31","doi-asserted-by":"publisher","first-page":"1607","DOI":"10.1007\/s12530-024-09581-w","article-title":"Self-supervised learning for medical image analysis: a comprehensive review","volume":"15","author":"Rani","year":"2024","journal-title":"Evol. Syst"},{"key":"B32","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1007\/978-3-319-24574-4_28","article-title":"\u201cU-net: convolutional networks for biomedical image segmentation,\u201d","author":"Ronneberger","year":"2015"},{"key":"B33","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s42979-025-03929-y","article-title":"Transformer-based innovations in medical image segmentation: a mini review","volume":"6","author":"Shah","year":"2025","journal-title":"SN Comput. Sci"},{"key":"B34","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1007\/s11548-013-0926-3","article-title":"Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer","volume":"9","author":"Silva","year":"2014","journal-title":"Int. J. Comput. Assist. Radiol. Surg"},{"key":"B35","doi-asserted-by":"publisher","first-page":"7262","DOI":"10.1109\/ICCV48922.2021.00717","article-title":"\u201cSegmenter: transformer for semantic segmentation,\u201d","author":"Strudel","year":"2021"},{"key":"B36","doi-asserted-by":"publisher","first-page":"630","DOI":"10.1109\/TMI.2015.2487997","article-title":"Automated polyp detection in colonoscopy videos using shape and context information","volume":"35","author":"Tajbakhsh","year":"2015","journal-title":"IEEE Trans. Med. Imaging"},{"key":"B37","doi-asserted-by":"publisher","first-page":"6000","DOI":"10.5555\/3295222.3295349","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"B38","doi-asserted-by":"publisher","first-page":"4037190","DOI":"10.1155\/2017\/4037190","article-title":"A benchmark for endoluminal scene segmentation of colonoscopy images","volume":"2017","author":"V\u00e1zquez","year":"2017","journal-title":"J. Healthc. Eng"},{"key":"B39","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1007\/978-3-030-87193-2_11","article-title":"\u201cTransbts: multimodal brain tumor segmentation using transformer,\u201d","author":"Wang","year":""},{"key":"B40","doi-asserted-by":"publisher","first-page":"568","DOI":"10.1109\/ICCV48922.2021.00061","article-title":"\u201cPyramid vision transformer: A versatile backbone for dense prediction without convolutions,\u201d","author":"Wang","year":""},{"key":"B41","doi-asserted-by":"publisher","first-page":"128740","DOI":"10.1016\/j.neucom.2024.128740","article-title":"A comprehensive review of deep learning for medical image segmentation","volume":"613","author":"Xia","year":"2025","journal-title":"Neurocomputing"},{"key":"B42","doi-asserted-by":"publisher","first-page":"12077","DOI":"10.5555\/3540261.3541185","article-title":"Segformer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"B43","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/978-3-030-87199-4_16","article-title":"\u201cCOTR: Efficiently bridging cnn and transformer for 3d medical image segmentation,\u201d","author":"Xie","year":""},{"key":"B44","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2025.3587430","article-title":"Visual foundation models boost cross-modal unsupervised domain adaptation for 3D semantic segmentation","author":"Xu","year":"2025","journal-title":"IEEE Trans. Intell. Transp. Syst"},{"key":"B45","doi-asserted-by":"publisher","first-page":"103019","DOI":"10.1016\/j.inffus.2025.103019","article-title":"Multi-scale convolutional attention frequency-enhanced transformer network for medical image segmentation","volume":"119","author":"Yan","year":"2025","journal-title":"Inf. Fusion"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10278-024-00981-7","article-title":"From CNN to transformer: a review of medical image segmentation models","volume":"37","author":"Yao","year":"2024","journal-title":"J. Imaging Inform. Med"},{"key":"B47","doi-asserted-by":"publisher","first-page":"109228","DOI":"10.1016\/j.patcog.2022.109228","article-title":"An effective CNN and transformer complementary network for medical image segmentation","volume":"136","author":"Yuan","year":"2023","journal-title":"Pattern Recognit"},{"key":"B48","doi-asserted-by":"publisher","first-page":"100721","DOI":"10.1016\/j.cosrev.2024.100721","article-title":"Advances in attention mechanisms for medical image segmentation","volume":"56","author":"Zhang","year":"2025","journal-title":"Comput. Sci. Rev"},{"key":"B49","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11263-022-01739-w","article-title":"Vitae V2: Vision transformer advanced by exploring inductive bias for image recognition and beyond","volume":"131","author":"Zhang","year":"2023","journal-title":"Int. J. Comput. Vis"},{"key":"B50","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1007\/978-3-030-87193-2_2","article-title":"\u201cTransfuse: fusing transformers and CNNS for medical image segmentation,\u201d","author":"Zhang","year":"2021"},{"key":"B51","doi-asserted-by":"publisher","first-page":"6881","DOI":"10.1109\/CVPR46437.2021.00681","article-title":"\u201cRethinking semantic segmentation from a sequence-to-sequence perspective with transformers,\u201d","author":"Zheng","year":"2021"},{"key":"B52","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2025.3557259","article-title":"Reinforcement learning-based edge server placement in the intelligent internet of vehicles environment","author":"Zhou","year":"2025","journal-title":"IEEE Trans. Intell. Transp. Syst"},{"key":"B53","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-030-00889-5_1","article-title":"\u201cU-net++: a nested u-net architecture for medical image segmentation,\u201d","author":"Zhou","year":"2018"},{"key":"B54","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1109\/TGCN.2021.3121961","article-title":"ECMS: an edge intelligent energy efficient model in mobile edge computing","volume":"6","author":"Zhou","year":"","journal-title":"IEEE Trans. Green Commun. Netw"},{"key":"B55","doi-asserted-by":"publisher","first-page":"8967","DOI":"10.1109\/TII.2022.3165085","article-title":"IECL: an intelligent energy consumption model for cloud manufacturing","volume":"18","author":"Zhou","year":"","journal-title":"IEEE Trans. Ind. Informat"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2025.1677905\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T05:29:58Z","timestamp":1759469398000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2025.1677905\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,3]]},"references-count":55,"alternative-id":["10.3389\/fcomp.2025.1677905"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2025.1677905","relation":{},"ISSN":["2624-9898"],"issn-type":[{"value":"2624-9898","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,3]]},"article-number":"1677905"}}