{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:27:23Z","timestamp":1766068043021,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2024,10,14]],"date-time":"2024-10-14T00:00:00Z","timestamp":1728864000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Bingtuan Science and Technology Program","award":["2022DB005"],"award-info":[{"award-number":["2022DB005"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Interactive segmentation methods utilize user-provided positive and negative clicks to guide the model in accurately segmenting target objects. Compared to fully automatic medical image segmentation, these methods can achieve higher segmentation accuracy with limited image data, demonstrating significant potential in clinical applications. Typically, for each new click provided by the user, conventional interactive segmentation methods reprocess the entire network by re-inputting the click into the segmentation model, which greatly increases the user\u2019s interaction burden and deviates from the intended goal of interactive segmentation tasks. To address this issue, we propose an efficient segmentation network, ESM-Net, for interactive medical image segmentation. It obtains high-quality segmentation masks based on the user\u2019s initial clicks, reducing the complexity of subsequent refinement steps. Recent studies have demonstrated the strong performance of the Mamba model in various vision tasks; however, its application in interactive segmentation remains unexplored. 
In our study, we incorporate the Mamba module into our framework for the first time and enhance its spatial representation capabilities by developing a Spatial Augmented Convolution (SAC) module. These components are combined as the fundamental building blocks of our network. Furthermore, we designed a novel and efficient segmentation head to fuse multi-scale features extracted from the encoder, optimizing the generation of the predicted segmentation masks. Through comprehensive experiments, our method achieved state-of-the-art performance on three medical image datasets. Specifically, we achieved 1.43 NoC@90 on the Kvasir-SEG dataset, 1.57 NoC@90 on the CVC-ClinicDB polyp segmentation dataset, and 1.03 NoC@90 on the ADAM retinal disk segmentation dataset. The assessments on these three medical image datasets highlight the effectiveness of our approach in interactive medical image segmentation.<\/jats:p>","DOI":"10.3390\/info15100633","type":"journal-article","created":{"date-parts":[[2024,10,21]],"date-time":"2024-10-21T07:28:35Z","timestamp":1729495715000},"page":"633","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Interactive Segmentation for Medical Images Using Spatial Modeling Mamba"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-3930-7279","authenticated-orcid":false,"given":"Yuxin","family":"Tang","sequence":"first","affiliation":[{"name":"School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan 430077, China"}]},{"given":"Yu","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan 430077, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3641-2686","authenticated-orcid":false,"given":"Hua","family":"Zou","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan 430072, 
China"}]},{"given":"Xuedong","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Tarim University, Alaer 843300, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,10,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1243","DOI":"10.1049\/ipr2.12419","article-title":"Medical image segmentation using deep learning: A survey","volume":"16","author":"Wang","year":"2022","journal-title":"IET Image Process."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1146\/annurev-bioeng-071516-044442","article-title":"Deep Learning in Medical Image Analysis","volume":"19","author":"Shen","year":"2017","journal-title":"Annu. Rev. Biomed. Eng."},{"key":"ref_3","unstructured":"Qiu, P., Yang, J., Kumar, S., Ghosh, S.S., and Sotiras, A. (2024). AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation. arXiv."},{"key":"ref_4","unstructured":"Wang, H., Cao, P., Wang, J., and Zaiane, O.R. (March, January 22). Uctransnet: Rethinking the skip connections in u-net from a channel-wise perspective with transformer. Proceedings of the AAAI Conference on Artificial Intelligence, online."},{"key":"ref_5","unstructured":"Fitzgerald, K., and Matuszewski, B. (2023). FCB-SwinV2 transformer for polyp segmentation. arXiv."},{"key":"ref_6","unstructured":"Jha, D., Tomar, N.K., Sharma, V., and Bagci, U. (2023, January 10\u201312). TransNetR: Transformer-based residual network for polyp segmentation with multi-center out-of-distribution testing. Proceedings of the Medical Imaging with Deep Learning, Nashville, TN, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Xu, N., Price, B., Cohen, S., Yang, J., and Huang, T. (2016, January 27\u201330). Deep Interactive Object Selection. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.47"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wu, J., Zhao, Y., Zhu, J.-Y., Luo, S., and Tu, Z. (2014, January 23\u201328). MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.40"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lempitsky, V., Kohli, P., Rother, C., and Sharp, T. (October, January 27). Image segmentation with a bounding box prior. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459262"},{"key":"ref_10","first-page":"309","article-title":"\u201cGrabCut\u201d: Interactive foreground extraction using iterated graph cuts","volume":"23","author":"Rother","year":"2004","journal-title":"ACM J."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Bai, J., and Wu, X. (2014, January 23\u201328). Error-Tolerant Scribbles Based Interactive Image Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.57"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1768","DOI":"10.1109\/TPAMI.2006.233","article-title":"Random Walks for Image Segmentation","volume":"28","author":"Grady","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","first-page":"303","article-title":"Lazy snapping","volume":"23","author":"Li","year":"2004","journal-title":"ACM J."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Jang, W.-D., and Kim, C.-S. (2019, January 15\u201320). Interactive Image Segmentation via Backpropagating Refinement Scheme. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00544"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lin, Z., Zhang, Z., Chen, L.-Z., Cheng, M.-M., and Lu, S.-P. (2020, January 13\u201319). Interactive Image Segmentation With First Click Attention. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01335"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sofiiuk, K., Petrov, I., Barinova, O., and Konushin, A. (2020, January 13\u201319). f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00865"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sofiiuk, K., Petrov, I.A., and Konushin, A. (2022, January 16\u201319). Reviving Iterative Training with Mask Guidance for Interactive Segmentation. Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France.","DOI":"10.1109\/ICIP46576.2022.9897365"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lin, Z., Duan, Z.-P., Zhang, Z., Guo, C.-L., and Cheng, M.-M. (2022, January 18\u201324). Focuscut: Diving into a focus view in interactive segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00266"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, Q. (2021). iSegFormer: Interactive Segmentation via Transformers with Application to 3D Knee MR Images. arXiv.","DOI":"10.1007\/978-3-031-16443-9_45"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, Q., Xu, Z., Bertasius, G., and Niethammer, M. (2023, January 2\u20136). 
Simpleclick: Interactive image segmentation with simple vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.02037"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, X., Zhao, Z., Zhang, Y., Duan, M., Qi, D., and Zhao, H. (2022). FocalClick: Towards Practical Interactive Image Segmentation. arXiv.","DOI":"10.1109\/CVPR52688.2022.00136"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, Q., Zheng, M., Planche, B., Karanam, S., Chen, T., Niethammer, M., and Wu, Z. (2022). PseudoClick: Interactive Image Segmentation with Click Imitation. arXiv.","DOI":"10.1007\/978-3-031-20068-7_42"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Du, F., Yuan, J., Wang, Z., and Wang, F. (2023, January 17\u201324). Efficient mask correction for click-based interactive image segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.02181"},{"key":"ref_24","unstructured":"Gu, A., and Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","article-title":"A new approach to linear filtering and prediction problems","volume":"82","author":"Kalman","year":"1960","journal-title":"J. Basic Eng."},{"key":"ref_26","unstructured":"Zhu, L., Liao, B., Zhang, Q., Wang, X., Liu, W., and Wang, X. (2024). Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv."},{"key":"ref_27","unstructured":"Liu, Y., Tian, Y., Zhao, Y., Yu, H., Xie, L., Wang, Y., Ye, Q., and Liu, Y.J.A. (2024). VMamba: Visual State Space Model. 
arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41592-020-01008-z","article-title":"nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation","volume":"18","author":"Isensee","year":"2021","journal-title":"Nat. Methods"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., and Xu, D. (2021, January 27). Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. Proceedings of the International MICCAI Brainlesion Workshop, Online.","DOI":"10.1007\/978-3-031-08999-2_22"},{"key":"ref_30","unstructured":"Ma, J., Li, F., and Wang, B. (2024). U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Xing, Z., Ye, T., Yang, Y., Liu, G., and Zhu, L. (2024). Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. arXiv.","DOI":"10.1007\/978-3-031-72111-3_54"},{"key":"ref_32","unstructured":"Boykov, Y.Y., and Jolly, M.-P. (2001, January 7\u201314). Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Gulshan, V., Rother, C., Criminisi, A., Blake, A., and Zisserman, A. (2010, January 13\u201318). Geodesic star convexity for interactive image segmentation. Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540073"},{"key":"ref_34","unstructured":"Mahadevan, S., Voigtlaender, P., and Leibe, B. (2018). Iteratively Trained Interactive Segmentation. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Li, K., Vosselman, G., and Yang, M.Y. (2023, January 2\u20136). 
Interactive image segmentation with cross-modality vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCVW60793.2023.00084"},{"key":"ref_36","unstructured":"Zeng, H., Wang, W., Tao, X., Xiong, Z., Tai, Y.-W., and Pei, W. (November, January 29). Feature decoupling-recycling network for fast interactive segmentation. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada."},{"key":"ref_37","unstructured":"Xu, L., Li, S., Chen, Y., Chen, J., Huang, R., and Wu, F. (2024). ClickAttention: Click Region Similarity Guided Interactive Segmentation. arXiv."},{"key":"ref_38","unstructured":"Xu, L., Li, S., Chen, Y., and Luo, J. (2024). MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhou, M., Wang, H., Zhao, Q., Li, Y., Huang, Y., Meng, D., and Zheng, Y. (2023). Interactive Segmentation as Gaussian Process Classification. arXiv.","DOI":"10.1109\/CVPR52729.2023.01867"},{"key":"ref_40","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_41","unstructured":"Tan, M., and Le, Q. (2021). EfficientNetV2: Smaller Models and Faster Training. arXiv."},{"key":"ref_42","unstructured":"Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Solja\u010di\u0107, M., Hou, T.Y., and Tegmark, M. (2024). Kan: Kolmogorov-arnold networks. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Graves, A. (2012). Long short-term memory. Supervised Sequence Labelling with Recurrent Neural Networks, Springer.","DOI":"10.1007\/978-3-642-24797-2"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Sofiiuk, K., Barinova, O., and Konushin, A. (November, January 27). 
AdaptIS: Adaptive Instance Selection Network. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00745"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Jha, D., Smedsrud, P.H., Riegler, M.A., Halvorsen, P., De Lange, T., Johansen, D., and Johansen, H.D. (2020, January 5\u20138). Kvasir-seg: A segmented polyp dataset. Proceedings of the MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, Republic of Korea.","DOI":"10.1007\/978-3-030-37734-2_37"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.compmedimag.2015.02.007","article-title":"WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians","volume":"43","author":"Bernal","year":"2015","journal-title":"Comput. Med. Imaging Graph."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"2828","DOI":"10.1109\/TMI.2022.3172773","article-title":"Adam challenge: Detecting age-related macular degeneration from fundus images","volume":"41","author":"Fang","year":"2022","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_48","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Chen, X., Zhao, Z., Yu, F., Zhang, Y., and Duan, M. (2021, January 11\u201317). Conditional diffusion for interactive segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00725"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Lin, J., Chen, J., Yang, K., Roitberg, A., Li, S., Li, Z., and Li, S. (2024). AdaptiveClick: Click-Aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation. 
arXiv.","DOI":"10.1109\/TNNLS.2024.3378295"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep High-Resolution Representation Learning for Human Pose Estimation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_52","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/10\/633\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:12:46Z","timestamp":1760112766000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/10\/633"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,14]]},"references-count":52,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["info15100633"],"URL":"https:\/\/doi.org\/10.3390\/info15100633","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2024,10,14]]}}}