{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T18:57:16Z","timestamp":1774378636228,"version":"3.50.1"},"reference-count":50,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T00:00:00Z","timestamp":1761696000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T00:00:00Z","timestamp":1761696000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"NIH","award":["R01 CA240808"],"award-info":[{"award-number":["R01 CA240808"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,12,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The segment anything model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a depthwise-dilated adapter to enhance multi-scale feature extraction with minimal parameter overhead. 
This design enables effective fine-tuning of SAM2 on medical videos with limited training data. Unlike existing adapter-based methods focused solely on static images, DD-SAM2 fully exploits SAM2\u2019s streaming memory for medical video object tracking and segmentation. Comprehensive evaluations on the TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking) datasets demonstrate superior performance, achieving Dice scores of 0.93 \u00b1 0.04 and 0.97 \u00b1 0.01, respectively. To the best of our knowledge, this work provides an initial attempt at systematically exploring adapter-based fine-tuning strategies for SAM2 applied to medical video segmentation and tracking. Code, datasets, and models will be made publicly available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/apple1986\/DD-SAM2\">https:\/\/github.com\/apple1986\/DD-SAM2<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1088\/2632-2153\/ae13d1","type":"journal-article","created":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T22:49:46Z","timestamp":1760568586000},"page":"045026","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Depthwise-dilated convolutional adapters for medical object tracking and segmentation using the segment anything model 
2"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1431-7191","authenticated-orcid":true,"given":"Guoping","family":"Xu","sequence":"first","affiliation":[]},{"given":"Christopher","family":"Kabat","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8033-2755","authenticated-orcid":true,"given":"You","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,10,29]]},"reference":[{"key":"mlstae13d1bib1","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","type":"journal-article","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"mlstae13d1bib2","first-page":"234","type":"conference-proceedings","article-title":"U-net: convolutional networks for biomedical image segmentation","author":"Ronneberger","year":"2015"},{"key":"mlstae13d1bib3","doi-asserted-by":"publisher","first-page":"1856","DOI":"10.1109\/TMI.2019.2959609","type":"journal-article","article-title":"UNet++: redesigning skip connections to exploit multiscale features in image segmentation","volume":"39","author":"Zhou","year":"2020","journal-title":"IEEE Trans. Med. Imaging"},{"key":"mlstae13d1bib4","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2024.103280","type":"journal-article","article-title":"TransUNet: rethinking the U-Net architecture design for medical image segmentation through the lens of transformers","volume":"97","author":"Chen","year":"2024","journal-title":"Med. 
Image Anal."},{"key":"mlstae13d1bib5","first-page":"565","type":"conference-proceedings","article-title":"V-net: fully convolutional neural networks for volumetric medical image segmentation","author":"Milletari","year":"2016"},{"key":"mlstae13d1bib6","first-page":"272","type":"conference-proceedings","article-title":"Swin unetr: swin transformers for semantic segmentation of brain tumors in mri images","author":"Hatamizadeh","year":"2021"},{"key":"mlstae13d1bib7","doi-asserted-by":"publisher","first-page":"1095","DOI":"10.1109\/TMI.2022.3224067","type":"journal-article","article-title":"Causality-inspired single-source domain generalization for medical image segmentation","volume":"42","author":"Ouyang","year":"2022","journal-title":"IEEE Trans. Med. Imaging"},{"key":"mlstae13d1bib8","doi-asserted-by":"publisher","DOI":"10.1016\/j.radonc.2023.109970","type":"journal-article","article-title":"Real-time motion management in MRI-guided radiotherapy: current status and AI-enabled prospects","volume":"190","author":"Lombardo","year":"2024","journal-title":"Radiother. Oncol."},{"key":"mlstae13d1bib9","first-page":"4015","type":"conference-proceedings","article-title":"Segment anything","author":"Kirillov","year":"2023"},{"key":"mlstae13d1bib10","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1038\/s41467-024-44824-z","type":"journal-article","article-title":"Segment anything in medical images","volume":"15","author":"Ma","year":"2024","journal-title":"Nat. 
Commun."},{"key":"mlstae13d1bib11","article-title":"Sam-med3d: towards general-purpose segmentation models for volumetric medical images","author":"Wang","year":"2023","type":"preprint"},{"key":"mlstae13d1bib12","article-title":"Sam 2: segment anything in images and videos","author":"Ravi","year":"2024","type":"preprint"},{"key":"mlstae13d1bib13","article-title":"Medical sam 2: segment medical images as video via segment anything model 2","author":"Zhu","year":"2024","type":"preprint"},{"key":"mlstae13d1bib14","article-title":"MedSAM2: segment anything in 3D Medical Images and Videos","author":"Ma","year":"2025","type":"preprint"},{"key":"mlstae13d1bib15","article-title":"Neural transfer learning for natural language processing","author":"Ruder","year":"2019","type":"other"},{"key":"mlstae13d1bib16","doi-asserted-by":"publisher","first-page":"2337","DOI":"10.1007\/s11263-022-01653-1","type":"journal-article","article-title":"Learning to prompt for vision-language models","volume":"130","author":"Zhou","year":"2022","journal-title":"Int. J. Comput. Vis."},{"key":"mlstae13d1bib17","article-title":"Towards general purpose medical ai: continual learning medical foundation model","author":"Yi","year":"2023","type":"preprint"},{"key":"mlstae13d1bib18","first-page":"2790","type":"conference-proceedings","article-title":"Parameter-efficient transfer learning for NLP","author":"Houlsby","year":"2019"},{"key":"mlstae13d1bib19","first-page":"3","type":"journal-article","article-title":"Lora: low-rank adaptation of large language models","volume":"1","author":"Hu","year":"2022","journal-title":"Iclr"},{"key":"mlstae13d1bib20","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2025.103547","type":"journal-article","article-title":"Medical SAM adapter: adapting segment anything model for medical image segmentation","volume":"102","author":"Wu","year":"2025","journal-title":"Med. 
Image Anal."},{"key":"mlstae13d1bib21","first-page":"3367","type":"conference-proceedings","article-title":"Sam-adapter: adapting segment anything in underperformed scenes","author":"Chen","year":"2023"},{"key":"mlstae13d1bib22","article-title":"Sam-med2d","author":"Cheng","year":"2023","type":"preprint"},{"key":"mlstae13d1bib23","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2024.103310","type":"journal-article","article-title":"MA-SAM: modality-agnostic SAM adaptation for 3D medical image segmentation","volume":"98","author":"Chen","year":"2024","journal-title":"Med. Image Anal."},{"key":"mlstae13d1bib24","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2024.103324","type":"journal-article","article-title":"3DSAM-adapter: holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation","volume":"98","author":"Gong","year":"2024","journal-title":"Med. Image Anal."},{"key":"mlstae13d1bib25","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2023.103061","type":"journal-article","article-title":"Segment anything model for medical images?","volume":"92","author":"Huang","year":"2024","journal-title":"Med. Image Anal."},{"key":"mlstae13d1bib26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3391743","type":"journal-article","article-title":"Video object segmentation and tracking: a survey","volume":"11","author":"Yao","year":"2020","journal-title":"ACM Trans. Intell. Syst. 
Technol."},{"key":"mlstae13d1bib27","first-page":"640","type":"conference-proceedings","article-title":"Xmem: long-term video object segmentation with an atkinson-shiffrin memory model","author":"Cheng","year":"2022"},{"key":"mlstae13d1bib28","article-title":"Track anything: segment anything meets videos","author":"Yang","year":"2023","type":"preprint"},{"key":"mlstae13d1bib29","article-title":"Biomedical sam 2: segment anything in biomedical images and videos","author":"Yan","year":"2024","type":"preprint"},{"key":"mlstae13d1bib30","article-title":"Surgical sam 2: real-time segment anything in surgical video by efficient frame pruning","author":"Liu","year":"2024","type":"preprint"},{"key":"mlstae13d1bib31","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1038\/s42256-023-00626-4","type":"journal-article","article-title":"Parameter-efficient fine-tuning of large-scale pre-trained language models","volume":"5","author":"Ding","year":"2023","journal-title":"Nat. Mach. Intell."},{"key":"mlstae13d1bib32","first-page":"12799","type":"conference-proceedings","article-title":"On the effectiveness of parameter-efficient fine-tuning","volume":"vol 37","author":"Fu","year":"2023"},{"key":"mlstae13d1bib33","article-title":"Sam2-unet: segment anything 2 makes strong encoder for natural and medical image segmentation","author":"Xiong","year":"2024","type":"preprint"},{"key":"mlstae13d1bib34","article-title":"Customized segment anything model for medical image segmentation","author":"Zhang","year":"2023","type":"preprint"},{"key":"mlstae13d1bib35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TMI.2025.3532084","type":"journal-article","article-title":"Stitching, Fine-tuning, Re-training: a SAM-enabled framework for semi-supervised 3D medical image segmentation","author":"Li","year":"2025","journal-title":"IEEE Trans. Med. 
Imaging"},{"key":"mlstae13d1bib36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/JBHI.2025.3540306","type":"journal-article","article-title":"MediViSTA: medical video segmentation via temporal fusion SAM adaptation for echocardiography","author":"Kim","year":"2025","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"mlstae13d1bib37","first-page":"542","author":"Shen","year":"2024","type":"conference-proceedings"},{"key":"mlstae13d1bib38","article-title":"Sam2-adapter: evaluating & adapting segment anything 2 in downstream tasks: camouflage, shadow, medical image segmentation, and more","author":"Chen","year":"2024","type":"preprint"},{"key":"mlstae13d1bib39","first-page":"29441","type":"book","article-title":"Hiera: a hierarchical vision transformer without the bells-and-whistles","author":"Ryali","year":"2023"},{"key":"mlstae13d1bib40","first-page":"16000","type":"conference-proceedings","article-title":"Masked autoencoders are scalable vision learners","author":"He","year":"2022"},{"key":"mlstae13d1bib41","article-title":"Gaussian error linear units (gelus)","author":"Hendrycks","year":"2016","type":"preprint"},{"key":"mlstae13d1bib42","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1038\/s41586-020-2145-8","type":"journal-article","article-title":"Video-based AI for beat-to-beat assessment of cardiac function","volume":"580","author":"Ouyang","year":"2020","journal-title":"Nature"},{"key":"mlstae13d1bib43","doi-asserted-by":"publisher","first-page":"2198","DOI":"10.1109\/TMI.2019.2900516","type":"journal-article","article-title":"Deep learning for segmentation using an open large-scale dataset in 2D echocardiography","volume":"38","author":"Leclerc","year":"2019","journal-title":"IEEE Trans. Med. 
Imaging"},{"key":"mlstae13d1bib44","doi-asserted-by":"publisher","DOI":"10.1002\/mp.17964","type":"journal-article","article-title":"TrackRAD2025 challenge dataset: real-time tumor tracking for MRI-guided radiotherapy","volume":"52","author":"Wang","year":"2025","journal-title":"Med. Phys."},{"key":"mlstae13d1bib45","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1016\/j.media.2007.06.004","type":"journal-article","article-title":"Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain","volume":"12","author":"Avants","year":"2008","journal-title":"Med. Image Anal."},{"key":"mlstae13d1bib46","doi-asserted-by":"publisher","first-page":"1788","DOI":"10.1109\/TMI.2019.2897538","type":"journal-article","article-title":"VoxelMorph: a learning framework for deformable medical image registration","volume":"38","author":"Balakrishnan","year":"2019","journal-title":"IEEE Trans. Med. Imaging"},{"key":"mlstae13d1bib47","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2022.102615","type":"journal-article","article-title":"TransMorph: transformer for unsupervised medical image registration","volume":"82","author":"Chen","year":"2022","journal-title":"Med. Image Anal."},{"key":"mlstae13d1bib48","article-title":"Samurai: adapting segment anything model for zero-shot visual tracking with motion-aware memory","author":"Yang","year":"2024","type":"preprint"},{"key":"mlstae13d1bib49","first-page":"3395","type":"conference-proceedings","article-title":"SAMWISE: infusing wisdom in SAM2 for text-driven video segmentation","author":"Cuttano","year":"2025"},{"key":"mlstae13d1bib50","doi-asserted-by":"publisher","first-page":"4513","DOI":"10.1002\/mp.17785","type":"journal-article","article-title":"A segment anything model-guided and match-based semi-supervised segmentation framework for medical imaging","volume":"52","author":"Xu","year":"2025","journal-title":"Med. 
Phys."}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T08:48:09Z","timestamp":1761727689000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae13d1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,29]]},"references-count":50,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,10,29]]},"published-print":{"date-parts":[[2025,12,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ae13d1","relation":{},"I
SSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,29]]},"assertion":[{"value":"Depthwise-dilated convolutional adapters for medical object tracking and segmentation using the segment anything model 2","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2025-07-21","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-10-15","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-10-29","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}