{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T13:56:57Z","timestamp":1772632617496,"version":"3.50.1"},"reference-count":41,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T00:00:00Z","timestamp":1772582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>Medical image segmentation is fundamental to quantitative disease analysis and therapeutic decision-making. However, constrained by limited computational resources, existing deep learning methods often struggle to simultaneously model long-range dependencies and preserve boundary precision, particularly when delineating structures with complex morphology or blurred edges.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Method<\/jats:title>\n                    <jats:p>\n                      To overcome these challenges, we propose\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m1\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ViM, a 
recursion-enhanced visual state space model designed for medical image segmentation.\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m2\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ViM augments the Vision Mamba framework with a Zigzag Recursive Reinforced (\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m3\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ) Block that incorporates Stacked State Redistribution (SSR) and a Nested Recursive Connection (NRC). The NRC employs dual inner and outer pathways to iteratively fuse local details with global context while preserving 2D spatial adjacency. 
Furthermore, a Cross-directional Zigzag WKV (CZ-WKV) module executes multi-step recursive updates along multiple zigzag trajectories, injecting spatial directional priors via Quad-Directional Token Shift (Q-Shift). Collectively, these mechanisms mitigate serialization-induced banding artifacts and enhance the representation of fine, elongated, and low-contrast structures, all while maintaining near-linear computational complexity.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      Comprehensive evaluations across four medical imaging domains\u2014spanning dermatoscopic images, breast ultrasound, colorectal polyps, and abdominal multi-organ CT\u2014on five public datasets demonstrate that\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m4\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ViM consistently outperforms representative convolutional, attention-based, and visual state space architectures in region consistency and boundary localization. 
Notably,\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m5\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ViM achieves a 2.15 mm reduction in the HD95 on the Synapse multi-organ CT dataset relative to the CC-ViM baseline, substantiating its superior capability for precise, clinically relevant boundary delineation.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>\n                      The\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m6\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ViM framework delivers accurate, boundary-preserving segmentation across diverse imaging modalities and anatomically complex structures, achieving these gains with near-linear 
computational complexity. These findings demonstrate that\n                      <jats:inline-formula>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m7\">\n                          <mml:mrow>\n                            <mml:msup>\n                              <mml:mrow>\n                                <mml:mtext>ZR<\/mml:mtext>\n                              <\/mml:mrow>\n                              <mml:mrow>\n                                <mml:mn>2<\/mml:mn>\n                              <\/mml:mrow>\n                            <\/mml:msup>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:inline-formula>\n                      ViM offers a robust and efficient solution for medical image analysis, establishing a promising foundation for advanced clinical and research applications.\n                    <\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fbinf.2026.1768786","type":"journal-article","created":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T07:02:38Z","timestamp":1772607758000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ZR2ViM: a recursive vision Mamba model for boundary-preserving medical image segmentation"],"prefix":"10.3389","volume":"6","author":[{"given":"Caijian","family":"Hua","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Sichuan University of Science and Engineering","place":["Yibin, China"]}]},{"given":"Caorong","family":"Xiang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Sichuan University of Science and Engineering","place":["Yibin, China"]}]},{"given":"Liuying","family":"Li","sequence":"additional","affiliation":[{"name":"Traditional Chinese Medicine Department, Zigong First People\u2019s Hospital","place":["Zigong, 
China"]}]},{"given":"Xia","family":"Zhou","sequence":"additional","affiliation":[{"name":"Traditional Chinese Medicine Department, Zigong First People\u2019s Hospital","place":["Zigong, China"]}]}],"member":"1965","published-online":{"date-parts":[[2026,3,4]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"104863","DOI":"10.1016\/j.dib.2019.104863","article-title":"Dataset of breast ultrasound images","volume":"28","author":"Al-Dhabyani","year":"2020","journal-title":"Data Brief"},{"key":"B2","doi-asserted-by":"publisher","first-page":"1580502","DOI":"10.3389\/fbioe.2025.1580502","article-title":"Deep ensemble learning-driven fully automated multi-structure segmentation for precision craniomaxillofacial surgery","volume":"13","author":"Bao","year":"2025","journal-title":"Front. Bioeng. Biotechnol."},{"key":"B3","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1016\/j.compmedimag.2015.02.007","article-title":"Wm-dova maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians","volume":"43","author":"Bernal","year":"2015","journal-title":"Comput. Medical Imaging Graphics"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1703.00523","article-title":"Isic 2017-skin lesion analysis towards melanoma detection","author":"Berseth","year":"2017"},{"key":"B5","first-page":"205","article-title":"Swin-unet: unet-like pure transformer for medical image segmentation","author":"Cao","year":"2022"},{"key":"B6","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-030-33128-3_1","article-title":"Deep learning in medical image analysis","volume":"1213","author":"Chan","year":"2020","journal-title":"Adv. Exp. Med. 
Biol."},{"key":"B7","doi-asserted-by":"publisher","first-page":"103280","DOI":"10.1016\/j.media.2024.103280","article-title":"Transunet: rethinking the u-net architecture design for medical image segmentation through the lens of transformers","volume":"97","author":"Chen","year":"2024","journal-title":"Med. Image Anal."},{"key":"B8","doi-asserted-by":"publisher","first-page":"3245","DOI":"10.1109\/TMI.2025.3561797","article-title":"Zig-rir: Zigzag rwkv-in-rwkv for efficient medical image segmentation","volume":"44","author":"Chen","year":"2025","journal-title":"IEEE Trans. Med. Imaging"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1902.03368","article-title":"Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the international skin imaging collaboration (isic)","author":"Codella","year":"2019"},{"key":"B10","doi-asserted-by":"publisher","first-page":"e0277578","DOI":"10.1371\/journal.pone.0277578","article-title":"Tc-net: dual coding network of transformer and cnn for skin lesion segmentation","volume":"17","author":"Dong","year":"2022","journal-title":"Plos One"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/JBHI.2025.3572088","article-title":"AEmmamba: an efficient medical segmentation model with edge enhancement","author":"Dong","year":"2025","journal-title":"IEEE J. Biomed. Health Inf."},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.02308","article-title":"Vision-rwkv: efficient and scalable visual perception with rwkv-like architectures","author":"Duan","year":"2024"},{"key":"B13","doi-asserted-by":"publisher","first-page":"7446","DOI":"10.1109\/JBHI.2025.3564381","article-title":"Slicemamba with neural architecture search for medical image segmentation","volume":"29","author":"Fan","year":"2025","journal-title":"IEEE J. Biomed. 
Health Inf."},{"key":"B14","article-title":"Mamba: linear-time sequence modeling with selective state spaces","volume-title":"First conference on language modeling","author":"Gu","year":"2024"},{"key":"B15","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1109\/TPAMI.2022.3152247","article-title":"A survey on vision transformer","volume":"45","author":"Han","year":"2023","journal-title":"IEEE Transactions Pattern Analysis Machine Intelligence"},{"key":"B16","first-page":"1748","article-title":"Unetr: transformers for 3d medical image segmentation","author":"Hatamizadeh","year":"2022"},{"key":"B17","first-page":"770","article-title":"Deep residual learning for image recognition","author":"He","year":"2016"},{"key":"B18","doi-asserted-by":"publisher","first-page":"1484","DOI":"10.1109\/TMI.2022.3230943","article-title":"Missformer: an effective transformer for 2d medical image segmentation","volume":"42","author":"Huang","year":"2022","journal-title":"IEEE Transactions Medical Imaging"},{"key":"B19","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1038\/s41592-020-01008-z","article-title":"nnu-net: a self-configuring method for deep learning-based biomedical image segmentation","volume":"18","author":"Isensee","year":"2021","journal-title":"Nat. Methods"},{"key":"B20","first-page":"451","article-title":"Kvasir-seg: a segmented polyp dataset","author":"Jha","year":"2019"},{"key":"B21","first-page":"12","article-title":"Miccai multi-atlas labeling beyond the cranial vault\u2013workshop and challenge","author":"Landman","year":"2015"},{"key":"B22","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.media.2017.07.005","article-title":"A survey on deep learning in medical image analysis","volume":"42","author":"Litjens","year":"2017","journal-title":"Med. 
Image Analysis"},{"key":"B23","first-page":"10012","article-title":"Swin transformer: hierarchical vision transformer using shifted windows","author":"Liu","year":"2021"},{"key":"B24","first-page":"615","article-title":"Swin-umamba: Mamba-based unet with imagenet-based pretraining","author":"Liu","year":"2024"},{"key":"B25","doi-asserted-by":"publisher","first-page":"103031","DOI":"10.5555\/3737916.3741189","article-title":"Vmamba: visual state space model","volume":"37","author":"Liu","year":"2024","journal-title":"Adv. Neural Information Processing Systems"},{"key":"B26","doi-asserted-by":"publisher","first-page":"e0325899","DOI":"10.1371\/journal.pone.0325899","article-title":"Sa-umamba: spatial attention convolutional neural networks for medical image segmentation","volume":"20","author":"Liu","year":"2025","journal-title":"PLoS One"},{"key":"B27","unstructured":"Attention u-net: learning where to look for the pancreas. Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich, M., Misawa, K. (2018)"},{"key":"B28","first-page":"724","article-title":"A benchmark dataset and evaluation methodology for video object segmentation","author":"Perazzi","year":"2016"},{"key":"B29","first-page":"234","article-title":"U-net: convolutional networks for biomedical image segmentation","volume-title":"Int. Conf. Med. 
Image Computing Computer-Assisted Intervention","author":"Ronneberger","year":"2015"},{"key":"B30","volume-title":"Vm-unet: vision mamba unet for medical image segmentation","author":"Ruan","year":"2024"},{"key":"B31","doi-asserted-by":"publisher","first-page":"102802","DOI":"10.1016\/j.media.2023.102802","article-title":"Transformers in medical imaging: a survey","volume":"88","author":"Shamshad","year":"2023","journal-title":"Med. Image Analysis"},{"key":"B32","doi-asserted-by":"publisher","first-page":"5915","DOI":"10.1038\/s41467-021-26216-9","article-title":"Annotation-efficient deep learning for automatic medical image segmentation","volume":"12","author":"Wang","year":"2021","journal-title":"Nat. Communications"},{"key":"B33","doi-asserted-by":"publisher","first-page":"101298","DOI":"10.1016\/j.patter.2025.101298","article-title":"Ultralight vm-unet: parallel vision mamba significantly reduces parameters for skin lesion segmentation","volume":"6","author":"Wu","year":"2024","journal-title":"Patterns"},{"key":"B34","first-page":"327","article-title":"Weighted res-unet for high-quality retina vessel segmentation","author":"Xiao","year":"2018"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2504.06205","article-title":"Her-seg: holistically efficient segmentation for high-resolution medical images","author":"Xu","year":"2025"},{"key":"B36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/JBHI.2025.3588555","article-title":"Restore-rwkv: efficient and effective medical image restoration with rwkv","volume":"30","author":"Yang","year":"2025","journal-title":"IEEE J. Biomed. 
Health Inf."},{"key":"B37","first-page":"14","article-title":"Transfuse: fusing transformers and cnns for medical image segmentation","author":"Zhang","year":"2021"},{"key":"B38","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-030-00889-5_1","article-title":"Unet++: a nested u-net architecture for medical image segmentation","volume-title":"Deep Learn. Med. Image Anal. Multimodal Learn. Clin. Decis. Support","author":"Zhou","year":"2018"},{"key":"B39","first-page":"425","article-title":"Rwkv-based encoder-decoder model for code completion","author":"Zhou","year":"2023"},{"key":"B40","doi-asserted-by":"publisher","first-page":"6486","DOI":"10.1109\/TPAMI.2024.3382294","article-title":"Towards understanding convergence and generalization of adamw","volume":"46","author":"Zhou","year":"2024","journal-title":"IEEE Transactions Pattern Analysis Machine Intelligence"},{"key":"B41","doi-asserted-by":"publisher","first-page":"2131","DOI":"10.1109\/TMI.2025.3525673","article-title":"Merging context clustering with visual state space models for medical image segmentation","volume":"44","author":"Zhu","year":"2025","journal-title":"IEEE Trans. Med. 
Imaging"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2026.1768786\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T07:02:45Z","timestamp":1772607765000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2026.1768786\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,4]]},"references-count":41,"alternative-id":["10.3389\/fbinf.2026.1768786"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2026.1768786","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,4]]},"article-number":"1768786"}}