{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T21:56:12Z","timestamp":1780610172426,"version":"3.54.1"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T00:00:00Z","timestamp":1743552000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T00:00:00Z","timestamp":1743552000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Grant 62272363"],"award-info":[{"award-number":["Grant 62272363"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Joint Laboratory for Innovation in Satellite-Borne Computers and Electronics Technology Open Fund 2023","award":["Grant 2024KFKT001-1"],"award-info":[{"award-number":["Grant 2024KFKT001-1"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Vis. Intell."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Infrared small target detection (IRSTD) plays a crucial role in applications such as traffic monitoring systems and maritime rescue. However, existing IRSTD methods face challenges due to their reliance on a single type of data, making them susceptible to noise and deficient in contextual understanding. Additionally, small and limited datasets hinder model generalization and performance in complex scenarios. Previous methods are mostly based on U-Net architectures that are optimized for small-scale data and involve intricate design. 
These designs often perform well in specific scenarios, but they struggle to generalize in real-world applications. Inspired by leading vision-language models, we propose MIRSAM (Multimodal Vision-Language Segment Anything Model for Infrared Small Target Detection), the first framework to integrate the text modality with the image modality for IRSTD. Given the differences in noise and structural information between infrared and natural images, we fine-tune the segment anything model (SAM) by designing a contourlet denoising adapter module (CDAM). Integrated into SAM\u2019s image encoder, this module suppresses noise during feature extraction and encoding, enabling efficient adaptation to the infrared domain. To incorporate textual information, we use the text encoder of contrastive language-image pre-training (CLIP) to convert text into high-dimensional feature vectors, which then serve as prompts to extract relevant details from the features. In addition, we build the first multimodal IRSTD dataset, IR-TXPair, containing image-text pairs. 
Experiments on the newly constructed IR-TXPair dataset demonstrate that the proposed MIRSAM outperforms state-of-the-art methods.<\/jats:p>","DOI":"10.1007\/s44267-025-00075-0","type":"journal-article","created":{"date-parts":[[2025,4,4]],"date-time":"2025-04-04T08:35:56Z","timestamp":1743755756000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["MIRSAM: multimodal vision-language segment anything model for infrared small target detection"],"prefix":"10.1007","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1473-9784","authenticated-orcid":false,"given":"Mingjin","family":"Zhang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qian","family":"Xu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuchun","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6025-377X","authenticated-orcid":false,"given":"Xi","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haojuan","family":"Yuan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,4,2]]},"reference":[{"issue":"7","key":"75_CR1","doi-asserted-by":"publisher","first-page":"4204","DOI":"10.1109\/TGRS.2016.2538295","volume":"54","author":"H. Deng","year":"2016","unstructured":"Deng, H., Sun, X., Liu, M., Ye, C., & Zhou, X. (2016). Small infrared target detection based on weighted local difference measure. 
IEEE Transactions on Geoscience and Remote Sensing, 54(7), 4204\u20134214.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"75_CR2","first-page":"1","volume-title":"Proceedings of the 2010 international waterside security conference","author":"M. Teutsch","year":"2010","unstructured":"Teutsch, M., & Kr\u00fcger, W. (2010). Classification of small boats in infrared images for maritime surveillance. In Proceedings of the 2010 international waterside security conference (pp.\u00a01\u20137). Red Hook: Curran Associates."},{"key":"75_CR3","first-page":"15528","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"X. Ying","year":"2023","unstructured":"Ying, X., Liu, L., Wang, Y., Li, R., Chen, N., Lin, Z., Sheng, W., & Zhou, S. (2023). Mapping degeneration meets label evolution: learning infrared small target detection with single point supervision. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp.\u00a015528\u201315538). Piscataway: IEEE."},{"key":"75_CR4","first-page":"877","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"M. Zhang","year":"2022","unstructured":"Zhang, M., Zhang, R., Yang, Y., Bai, H., Zhang, J., & Guo, J. (2022). ISNet: shape matters for infrared small target detection. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp.\u00a0877\u2013886). Piscataway: IEEE."},{"key":"75_CR5","doi-asserted-by":"publisher","first-page":"109788","DOI":"10.1016\/j.patcog.2023.109788","volume":"143","author":"R. Kou","year":"2023","unstructured":"Kou, R., Wang, C., Peng, Z., Zhao, Z., Chen, Y., Han, J., Huang, F., Yu, Y., & Fu, Q. (2023). Infrared small target segmentation networks: a survey. 
Pattern Recognition, 143, 109788\u2013109803.","journal-title":"Pattern Recognition"},{"key":"75_CR6","first-page":"2","volume-title":"Proceedings of the signal and data processing of small targets","author":"V.T. Tom","year":"1993","unstructured":"Tom, V.T., Peli, T., Leung, M., & Bondaryk, J. E. (1993). Morphology-based algorithm for point target detection in infrared backgrounds. In Proceedings of the signal and data processing of small targets (pp.\u00a02\u201311). Bellingham: SPIE."},{"key":"75_CR7","first-page":"74","volume-title":"Proceedings of the signal and data processing of small targets","author":"S. D. Deshpande","year":"1999","unstructured":"Deshpande, S. D., Er, M. H., Venkateswarlu, R., & Chan, P. (1999). Max-mean and max-median filters for detection of small targets. In Proceedings of the signal and data processing of small targets (pp.\u00a074\u201383). Bellingham: SPIE."},{"key":"75_CR8","doi-asserted-by":"publisher","first-page":"2145","DOI":"10.1016\/j.patcog.2009.12.023","volume":"43","author":"X. Bai","year":"2010","unstructured":"Bai, X., & Zhou, F. (2010). Analysis of new top-hat transformation and the application for infrared dim small target detection. Pattern Recognition, 43, 2145\u20132156.","journal-title":"Pattern Recognition"},{"issue":"4","key":"75_CR9","doi-asserted-by":"publisher","first-page":"612","DOI":"10.1109\/LGRS.2018.2790909","volume":"15","author":"J. Han","year":"2018","unstructured":"Han, J., Liang, K., Zhou, B., Zhu, X., Zhao, J., & Zhao, L. (2018). Infrared small target detection utilizing the multiscale relative local contrast measure. IEEE Geoscience and Remote Sensing Letters, 15(4), 612\u2013616.","journal-title":"IEEE Geoscience and Remote Sensing Letters"},{"issue":"11","key":"75_CR10","doi-asserted-by":"publisher","DOI":"10.3390\/rs10111821","volume":"10","author":"L. Zhang","year":"2018","unstructured":"Zhang, L., Peng, L., Zhang, T., Cao, S., & Peng, Z. (2018). 
Infrared small target detection via non-convex rank approximation minimization joint $l_{2, 1}$ norm. Remote Sensing, 10(11), 1821.","journal-title":"Remote Sensing"},{"issue":"8","key":"75_CR11","doi-asserted-by":"publisher","first-page":"3752","DOI":"10.1109\/JSTARS.2017.2700023","volume":"10","author":"Y. Dai","year":"2017","unstructured":"Dai, Y., & Wu, Y. (2017). Reweighted infrared patch-tensor model with both nonlocal and local priors for single-frame small target detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(8), 3752\u20133767.","journal-title":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing"},{"issue":"4","key":"75_CR12","doi-asserted-by":"publisher","first-page":"382","DOI":"10.3390\/rs11040382","volume":"11","author":"L. Zhang","year":"2019","unstructured":"Zhang, L., & Peng, Z. (2019). Infrared small target detection based on partial sum of the tensor nuclear norm. Remote Sensing, 11(4), 382\u2013416.","journal-title":"Remote Sensing"},{"issue":"10","key":"75_CR13","doi-asserted-by":"publisher","first-page":"3109","DOI":"10.1109\/TNNLS.2018.2890017","volume":"30","author":"M. Zhang","year":"2019","unstructured":"Zhang, M., Wang, N., Li, Y., & Gao, X. (2019). Deep latent low-rank representation for face sketch synthesis. IEEE Transactions on Neural Networks and Learning Systems, 30(10), 3109\u20133123.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"75_CR14","first-page":"8509","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"H. Wang","year":"2019","unstructured":"Wang, H., Zhou, L., & Wang, L. (2019). Miss detection vs. false alarm: adversarial learning for small object segmentation in infrared images. In Proceedings of the IEEE\/CVF international conference on computer vision (pp.\u00a08509\u20138518). 
Piscataway: IEEE."},{"issue":"5","key":"75_CR15","doi-asserted-by":"publisher","first-page":"4481","DOI":"10.1109\/TGRS.2020.3012981","volume":"59","author":"B. Zhao","year":"2021","unstructured":"Zhao, B., Wang, C., Fu, Q., & Han, Z. (2021). A novel pattern for infrared small target detection with generative adversarial network. IEEE Transactions on Geoscience and Remote Sensing, 59(5), 4481\u20134492.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"issue":"7","key":"75_CR16","doi-asserted-by":"publisher","first-page":"2623","DOI":"10.1109\/TNNLS.2019.2933590","volume":"31","author":"M. Zhang","year":"2020","unstructured":"Zhang, M., Wang, N., Li, Y., & Gao, X. (2020). Neural probabilistic graphical model for face sketch synthesis. IEEE Transactions on Neural Networks and Learning Systems, 31(7), 2623\u20132637.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"75_CR17","first-page":"950","volume-title":"Proceedings of the IEEE\/CVF winter conference on applications of computer vision","author":"Y. Dai","year":"2021","unstructured":"Dai, Y., Wu, Y., Zhou, F., & Barnard, K. (2021). Asymmetric contextual modulation for infrared small target detection. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp.\u00a0950\u2013959). Piscataway: IEEE."},{"issue":"11","key":"75_CR18","doi-asserted-by":"publisher","first-page":"9813","DOI":"10.1109\/TGRS.2020.3044958","volume":"59","author":"Y. Dai","year":"2021","unstructured":"Dai, Y., Wu, Y., Zhou, F., & Barnard, K. (2021). Attentional local contrast networks for infrared small target detection. IEEE Transactions on Geoscience and Remote Sensing, 59(11), 9813\u20139824.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"75_CR19","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1109\/TIP.2022.3228497","volume":"32","author":"X. 
Wu","year":"2022","unstructured":"Wu, X., Hong, D., & Chanussot, J. (2022). UIU-Net: U-Net in U-Net for infrared small object detection. IEEE Transactions on Image Processing, 32, 364\u2013376.","journal-title":"IEEE Transactions on Image Processing"},{"key":"75_CR20","doi-asserted-by":"publisher","first-page":"1730","DOI":"10.1145\/3503161.3547817","volume-title":"Proceedings of the 30th ACM international conference on multimedia","author":"M. Zhang","year":"2022","unstructured":"Zhang, M., Bai, H., Zhang, J., Zhang, R., Wang, C., Guo, J., & Gao, X. (2022). Rkformer: Runge-Kutta transformer with random-connection attention for infrared small target detection. In Proceedings of the 30th ACM international conference on multimedia (pp.\u00a01730\u20131738). New York: ACM."},{"key":"75_CR21","first-page":"2607","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"B. Cheng","year":"2022","unstructured":"Cheng, B., Parkhi, O., & Kirillov, A. (2022). Pointly-supervised instance segmentation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp.\u00a02607\u20132616). Piscataway: IEEE."},{"key":"75_CR22","first-page":"214","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"Y. Li","year":"2021","unstructured":"Li, Y., Zhao, H., Qi, X., Wang, L., Li, Z., Sun, J., & Jia, J. (2021). Fully convolutional networks for panoptic segmentation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp.\u00a0214\u2013223). Piscataway: IEEE."},{"key":"75_CR23","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et\u00a0al. Segment anything. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 3992\u20134003). 
Piscataway: IEEE."},{"key":"75_CR24","volume-title":"Proceedings of the 38th international conference on machine learning","author":"A. Radford","year":"2021","unstructured":"Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th international conference on machine learning. Retrieved March 12, 2025, from http:\/\/proceedings.mlr.press\/v139\/radford21a.html."},{"key":"75_CR25","unstructured":"OpenAI (2022, November 30). Introducing ChatGPT. Retrieved March 12, 2025, from https:\/\/openai.com\/blog\/chatgpt\/."},{"issue":"9","key":"75_CR26","doi-asserted-by":"publisher","first-page":"1670","DOI":"10.1109\/LGRS.2020.3004978","volume":"18","author":"J. Han","year":"2020","unstructured":"Han, J., Moradi, S., Faramarzi, I., Zhang, H., Zhao, Q., Zhang, X., & Li, N. (2020). Infrared small target detection based on the weighted strengthened local contrast measure. IEEE Geoscience and Remote Sensing Letters, 18(9), 1670\u20131674.","journal-title":"IEEE Geoscience and Remote Sensing Letters"},{"issue":"1","key":"75_CR27","doi-asserted-by":"publisher","first-page":"574","DOI":"10.1109\/TGRS.2013.2242477","volume":"52","author":"L. P. Chen","year":"2014","unstructured":"Chen, L. P., Li, H., Wei, Y., Xia, T., & Tang, Y. Y. (2014). A local contrast method for small infrared target detection. IEEE Transactions on Geoscience and Remote Sensing, 52(1), 574\u2013581.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"issue":"12","key":"75_CR28","doi-asserted-by":"publisher","first-page":"4996","DOI":"10.1109\/TIP.2013.2281420","volume":"22","author":"C. Gao","year":"2013","unstructured":"Gao, C., Meng, D., Yang, Y., Wang, Y., Zhou, X., & Hauptmann, A. G. (2013). Infrared patch-image model for small target detection in a single image. 
IEEE Transactions on Image Processing, 22(12), 4996\u20135009.","journal-title":"IEEE Transactions on Image Processing"},{"issue":"5","key":"75_CR29","doi-asserted-by":"publisher","first-page":"3737","DOI":"10.1109\/TGRS.2020.3022069","volume":"59","author":"Y. Sun","year":"2020","unstructured":"Sun, Y., Yang, J., & An, W. (2020). Infrared dim and small target detection via multiple subspace learning and spatial-temporal patch-tensor model. IEEE Transactions on Geoscience and Remote Sensing, 59(5), 3737\u20133752.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"75_CR30","first-page":"211","volume":"1","author":"M. Liu","year":"2017","unstructured":"Liu, M., Du, H., Zhao, Y., Dong, L., Hui, M., & Wang, S. X. (2017). Image small target detection based on deep learning with SNR controlled sample generation. Current Trends in Computer Science and Mechanical Automation, 1, 211\u2013220.","journal-title":"Current Trends in Computer Science and Mechanical Automation"},{"issue":"1","key":"75_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1049\/ipr2.12001","volume":"15","author":"J. Du","year":"2021","unstructured":"Du, J., Lu, H., Hu, M., Zhang, L., & Shen, X. (2021). CNN-based infrared dim small target detection algorithm using target-oriented shallow-deep features and effective small anchor. IET Image Processing, 15(1), 1\u201315.","journal-title":"IET Image Processing"},{"key":"75_CR32","unstructured":"Zhang, T., Cao, S., Pu, T., & Peng, Z. (2021). AGPCNet: attention-guided pyramid context networks for infrared small target detection. arXiv preprint. arXiv:2111.03580."},{"issue":"14","key":"75_CR33","doi-asserted-by":"publisher","first-page":"3412","DOI":"10.3390\/rs14143412","volume":"14","author":"Z. Zuo","year":"2022","unstructured":"Zuo, Z., Tong, X., Wei, J., Su, S., Wu, P., Guo, R., & Sun, B. (2022). AFFPN: attention fusion feature pyramid network for small infrared target detection. 
Remote Sensing, 14(14), 3412\u20133434.","journal-title":"Remote Sensing"},{"key":"75_CR34","first-page":"1","volume":"19","author":"T. Ma","year":"2022","unstructured":"Ma, T., Yang, Z., Wang, J., Sun, S., Ren, X., & Usman, A. (2022). Infrared small target detection network with generate label and feature mapping. IEEE Geoscience and Remote Sensing Letters, 19, 1\u20135.","journal-title":"IEEE Geoscience and Remote Sensing Letters"},{"key":"75_CR35","first-page":"1","volume":"19","author":"Y. Bai","year":"2022","unstructured":"Bai, Y., Li, R., Gou, S., Zhang, C., Chen, Y., & Zheng, Z. (2022). Cross-connected bidirectional pyramid network for infrared small-dim target detection. IEEE Geoscience and Remote Sensing Letters, 19, 1\u20135.","journal-title":"IEEE Geoscience and Remote Sensing Letters"},{"key":"75_CR36","first-page":"1","volume":"71","author":"H. Fang","year":"2022","unstructured":"Fang, H., Ding, L., Wang, L., Chang, Y., Yan, L., & Han, J. (2022). Infrared small UAV target detection based on depthwise separable residual dense network and multiscale feature fusion. IEEE Transactions on Instrumentation and Measurement, 71, 1\u201320.","journal-title":"IEEE Transactions on Instrumentation and Measurement"},{"key":"75_CR37","doi-asserted-by":"publisher","first-page":"1745","DOI":"10.1109\/TIP.2022.3199107","volume":"32","author":"B. Li","year":"2022","unstructured":"Li, B., Xiao, C., Wang, L., Wang, Y., Lin, Z., Li, M., An, W., & Guo, Y. (2022). Dense nested attention network for infrared small target detection. IEEE Transactions on Image Processing, 32, 1745\u20131758.","journal-title":"IEEE Transactions on Image Processing"},{"key":"75_CR38","first-page":"7352","volume-title":"Proceedings of the IEEE international geoscience and remote sensing symposium","author":"G. Li","year":"2023","unstructured":"Li, G., Ye, Z., Jia, H., & Wang, H. (2023). Multiscale interactive attention network for infrared small target detection. 
In Proceedings of the IEEE international geoscience and remote sensing symposium (pp.\u00a07352\u20137355). Piscataway: IEEE."},{"issue":"4","key":"75_CR39","doi-asserted-by":"publisher","DOI":"10.3390\/rs16040643","volume":"16","author":"X. Wang","year":"2024","unstructured":"Wang, X., Han, C., Li, J., Nie, T., Li, M., Wang, X., & Huang, L. (2024). Multiscale feature extraction U-Net for infrared dim- and small-target detection. Remote Sensing, 16(4), 643.","journal-title":"Remote Sensing"},{"key":"75_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TGRS.2024.3510781","volume":"62","author":"M. Zhang","year":"2024","unstructured":"Zhang, M., Yue, K., Li, B., Guo, J., Li, Y., & Gao, X. (2024). Single-frame infrared small target detection via Gaussian curvature inspired network. IEEE Transactions on Geoscience and Remote Sensing, 62, 1\u201313.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"75_CR41","first-page":"1","volume":"62","author":"T. Chen","year":"2024","unstructured":"Chen, T., Ye, Z., Tan, Z., Gong, T., Wu, Y., Chu, Q., Liu, B., Yu, N., & Ye, J. (2024). MIM-ISTD: Mamba-in-Mamba for efficient infrared small-target detection. IEEE Transactions on Geoscience and Remote Sensing, 62, 1\u201313.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"75_CR42","first-page":"1","volume":"62","author":"Y. Li","year":"2024","unstructured":"Li, Y., Li, Z., Guo, Z., Siddique, A., Liu, Y., & Yu, K. (2024). Infrared small target detection based on adaptive region growing algorithm with iterative threshold analysis. IEEE Transactions on Geoscience and Remote Sensing, 62, 1\u201315.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"75_CR43","first-page":"1009","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"B. Li","year":"2023","unstructured":"Li, B., Wang, Y., Wang, L., Zhang, F., Liu, T., Lin, Z., An, W., & Guo, Y. 
(2023). Monte Carlo linear clustering with single-point supervision is enough for infrared small target detection. In Proceedings of the IEEE\/CVF international conference on computer vision (pp.\u00a01009\u20131019). Piscataway: IEEE."},{"key":"75_CR44","first-page":"6000","volume-title":"Proceedings of the 31st international conference on neural information processing systems","author":"A. Vaswani","year":"2017","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, \u0141., & Polosukhin, I. (2017). Attention is all you need. In Proceedings of the 31st international conference on neural information processing systems (pp.\u00a06000\u20136010). Red Hook: Curran Associates."},{"key":"75_CR45","volume-title":"International conference on learning representations","author":"A. Dosovitskiy","year":"2021","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2021). An image is worth 16x16 words: transformers for image recognition at scale. In International conference on learning representations. Retrieved March 12, 2025, from https:\/\/openreview.net\/forum?id=YicbFdNTTy."},{"key":"75_CR46","first-page":"1","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"L. Ke","year":"2023","unstructured":"Ke, L., Ye, M., Danelljan, M., Liu, Y., Tai, Y.-W., Tang, C.-K., & Yu, F. (2023). Segment anything in high quality. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Proceedings of the 37th International Conference on Neural Information Processing Systems (pp.\u00a01\u201321). Red Hook: Curran Associates."},{"key":"75_CR47","first-page":"8355","volume-title":"Proceedings of the IEEE\/CVF winter conference on applications of computer vision","author":"S. 
Ren","year":"2024","unstructured":"Ren, S., Luzi, F., Lahrichi, S., Kassaw, K., Collins, L. M., Bradbury, K., & Malof, J.\u00a0M. (2024). Segment anything, from space? In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp.\u00a08355\u20138365). Piscataway: IEEE."},{"issue":"1","key":"75_CR48","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1038\/s41467-024-44824-z","volume":"15","author":"J. Ma","year":"2024","unstructured":"Ma, J., He, Y., Li, F., Han, L., You, C., & Wang, B. (2024). Segment anything in medical images. Nature Communications, 15(1), 654\u2013674.","journal-title":"Nature Communications"},{"key":"75_CR49","first-page":"1","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"D. Wang","year":"2023","unstructured":"Wang, D., Zhang, J., Du, B., Xu, M., Liu, L., Tao, D., & Zhang, L. (2023). SAMRS: Scaling-up remote sensing segmentation dataset with segment anything model. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Proceedings of the 37th International Conference on Neural Information Processing Systems (pp.\u00a01\u201313). Red Hook: Curran Associates."},{"key":"75_CR50","first-page":"368","volume-title":"Proceedings of the 26th international conference on medical image computing and computer-assisted intervention workshops","author":"G. Deng","year":"2023","unstructured":"Deng, G., Zou, K., Ren, K., Wang, M., Yuan, X., Ying, S., & Fu, H. (2023). SAM-U: multi-box prompts triggered uncertainty estimation for reliable SAM in medical image. In Proceedings of the 26th international conference on medical image computing and computer-assisted intervention workshops (pp.\u00a0368\u2013377). Cham: Springer."},{"key":"75_CR51","doi-asserted-by":"crossref","unstructured":"Zhang, K., & Liu, D. (2023). Customized segment anything model for medical image segmentation. arXiv preprint. 
arXiv:2304.13785.","DOI":"10.2139\/ssrn.4495221"},{"key":"75_CR52","unstructured":"Hu, X., Xu, X., & Shi, Y. (2023). How to efficiently adapt large segmentation model (SAM) to medical images. arXiv preprint. arXiv:2306.13731."},{"key":"75_CR53","unstructured":"Wu, J., Fu, R., Fang, H., Liu, Y., Wang, Z., Xu, Y., Jin, Y., & Arbel, T. (2023). Medical SAM adapter: adapting segment anything model for medical image segmentation. arXiv preprint. arXiv:2304.12620."},{"key":"75_CR54","first-page":"5184","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshops","author":"S. Aleem","year":"2024","unstructured":"Aleem, S., Wang, F., Maniparambil, M., Arazo, E., Dietlmeier, J., Curran, K., Connor, N. E. O., & Little, S. (2024). Test-time adaptation with SaLIP: a cascade of SAM and CLIP for zero-shot medical image segmentation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshops (pp.\u00a05184\u20135193). Piscataway: IEEE."},{"key":"75_CR55","first-page":"3635","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshops","author":"H. Wang","year":"2024","unstructured":"Wang, H., Vasu, P. K. A., Faghri, F., Vemulapalli, R., Farajtabar, M., Mehta, S., Rastegari, M., Tuzel, O., & Pouransari, H. (2024). SAM-CLIP: merging vision foundation models towards semantic and spatial understanding. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshops (pp.\u00a03635\u20133647). Piscataway: IEEE."},{"key":"75_CR56","unstructured":"Gundavarapu, S. K., Arora, A., & Agarwal, S. (2024). Zero shot context-based object segmentation using SLIP (SAM+CLIP). arXiv preprint. arXiv:2405.07284."},{"key":"75_CR57","unstructured":"Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. Retrieved March 12, 2025, from. 
https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf."},{"key":"75_CR58","first-page":"1","volume":"61","author":"M. Zhang","year":"2023","unstructured":"Zhang, M., Zhang, R., Zhang, J., Guo, J., Li, Y., & Gao, X. (2023). Dim2Clear network for infrared small target detection. IEEE Transactions on Geoscience and Remote Sensing, 61, 1\u201314.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"}],"container-title":["Visual Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44267-025-00075-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44267-025-00075-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44267-025-00075-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,5]],"date-time":"2025-04-05T00:39:19Z","timestamp":1743813559000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44267-025-00075-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,2]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["75"],"URL":"https:\/\/doi.org\/10.1007\/s44267-025-00075-0","relation":{},"ISSN":["2097-3330","2731-9008"],"issn-type":[{"value":"2097-3330","type":"print"},{"value":"2731-9008","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,2]]},"assertion":[{"value":"29 September 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 March 
2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 April 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"4"}}