{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,27]],"date-time":"2025-12-27T07:29:39Z","timestamp":1766820579763,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T00:00:00Z","timestamp":1740355200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>We introduce a weakly supervised segmentation approach that leverages class activation maps and the Segment Anything Model to generate high-quality masks using only classification data. A pre-trained classifier produces class activation maps that, once thresholded, yield bounding boxes encapsulating the regions of interest. These boxes prompt the SAM to generate detailed segmentation masks, which are then refined by selecting the best overlap with automatically generated masks from the foundational model using the intersection over union metric. In a polyp segmentation case study, our approach outperforms existing zero-shot and weakly supervised methods, achieving a mean intersection over union of 0.63. This method offers an efficient and general solution for image segmentation tasks where segmentation data are scarce.<\/jats:p>","DOI":"10.3390\/make7010022","type":"journal-article","created":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T06:47:17Z","timestamp":1740379637000},"page":"22","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Automatic Prompt Generation Using Class Activation Maps for Foundational Models: A Polyp Segmentation Case Study"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9925-6134","authenticated-orcid":false,"given":"Hanna","family":"Borgli","sequence":"first","affiliation":[{"name":"Department of High-Performance Computing, Simula Research Laboratory, 0164 Oslo, Norway"},{"name":"Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, 0373 Oslo, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1085-8540","authenticated-orcid":false,"given":"H\u00e5kon Kvale","family":"Stensland","sequence":"additional","affiliation":[{"name":"Department of High-Performance Computing, Simula Research Laboratory, 0164 Oslo, Norway"},{"name":"Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, 0373 Oslo, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2073-7029","authenticated-orcid":false,"given":"P\u00e5l","family":"Halvorsen","sequence":"additional","affiliation":[{"name":"Department of Holistic Systems, SimulaMet, 0170 Oslo, Norway"},{"name":"Department of Computer Science, Faculty of Technology, Art and Design, OsloMet, 0166 Oslo, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,24]]},"reference":[{"key":"ref_1","first-page":"3523","article-title":"Image Segmentation Using Deep Learning: A Survey","volume":"44","author":"Minaee","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1038\/s41467-024-44824-z","article-title":"Segment anything in medical images","volume":"15","author":"Ma","year":"2024","journal-title":"Nat. Commun."},{"key":"ref_3","unstructured":"Zhou, T., Zhang, F., Chang, B., Wang, W., Yuan, Y., Konukoglu, E., and Cremers, D. (2024). Image segmentation in foundation model era: A survey. arXiv."},{"key":"ref_4","unstructured":"Jha, D., Tomar, N.K., Sharma, V., and Bagci, U. (2024, January 3\u20135). TransNetR: Transformer-based residual network for polyp segmentation with multi-center out-of-distribution testing. Proceedings of the Medical Imaging with Deep Learning, PMLR, Paris, France."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., and Lo, W.Y. (2023, January 1\u20136). Segment anything. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Xiao, B., Wu, H., Xu, W., Dai, X., Hu, H., Lu, Y., Zeng, M., Liu, C., and Yuan, L. (2023). Florence-2: Advancing a unified representation for a variety of vision tasks. arXiv.","DOI":"10.1109\/CVPR52733.2024.00461"},{"key":"ref_7","unstructured":"Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., R\u00e4dle, R., Rolland, C., and Gustafson, L. (2024). SAM 2: Segment anything in images and videos. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"103061","DOI":"10.1016\/j.media.2023.103061","article-title":"Segment anything model for medical images?","volume":"92","author":"Huang","year":"2024","journal-title":"Med. Image Anal."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"102802","DOI":"10.1016\/j.media.2023.102802","article-title":"Transformers in medical imaging: A survey","volume":"88","author":"Shamshad","year":"2023","journal-title":"Med. Image Anal."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Fan, D.P., Ji, G.P., Zhou, T., Chen, G., Fu, H., Shen, J., and Shao, L. (2020, January 4\u20138). PraNet: Parallel reverse attention network for polyp segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2020, Lima, Peru.","DOI":"10.1007\/978-3-030-59725-2_26"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Jha, D., Smedsrud, P.H., Riegler, M.A., Johansen, D., de Lange, T., Halvorsen, P., and Johansen, H.D. (2019, January 9\u201311). ResUNet++: An advanced architecture for medical image segmentation. Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA.","DOI":"10.1109\/ISM46123.2019.00049"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2252","DOI":"10.1109\/JBHI.2021.3138024","article-title":"MSRF-Net: A multi-scale residual fusion network for biomedical image segmentation","volume":"26","author":"Srivastava","year":"2021","journal-title":"IEEE J. Biomed. Health Informat."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wu, J., and Xu, M. (2024, January 16\u201322). One-prompt to segment all medical images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01074"},{"key":"ref_14","unstructured":"Zhang, C., Puspitasari, F.D., Zheng, S., Li, C., Qiao, Y., Kang, T., Shan, X., Zhang, C., Qin, C., and Rameau, F. (2023). A survey on segment anything model (sam): Vision foundation model meets prompt engineering. arXiv."},{"key":"ref_15","unstructured":"Shaharabany, T., Dahan, A., Giryes, R., and Wolf, L. (2023). Autosam: Adapting sam to medical images by overloading the prompt encoder. arXiv."},{"key":"ref_16","unstructured":"Shen, Y., Wei, Z., Liu, C., Wei, S., Zhao, Q., Zeng, K., and Li, G. (2024). Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Borgli, H., Stensland, H.K., and Halvorsen, P. (2025, January 8\u201310). Better Image Segmentation with Classification: Guiding Zero-Shot Models Using Class Activation Maps. Proceedings of the International Conference on Multimedia Modeling (MMM), Nara, Japan.","DOI":"10.1007\/978-981-96-2074-6_10"},{"key":"ref_18","unstructured":"Zhang, C., Han, D., Qiao, Y., Kim, J.U., Bae, S.H., Lee, S., and Hong, C.S. (2023). Faster segment anything: Towards lightweight SAM for mobile applications. arXiv."},{"key":"ref_19","unstructured":"Zhao, X., Ding, W., An, Y., Du, Y., Yu, T., Li, M., Tang, M., and Wang, J. (2023). Fast Segment Anything. arXiv."},{"key":"ref_20","first-page":"29914","article-title":"Segment anything in high quality","volume":"36","author":"Ke","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_21","first-page":"02783649241281508","article-title":"Foundation models in robotics: Applications, challenges, and the future","volume":"2023","author":"Firoozi","year":"2023","journal-title":"Int. J. Robot. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"102918","DOI":"10.1016\/j.media.2023.102918","article-title":"Segment anything model for medical image analysis: An experimental study","volume":"89","author":"Mazurowski","year":"2023","journal-title":"Med. Image Anal."},{"key":"ref_23","first-page":"759","article-title":"Polyp-SAM: Transfer SAM for polyp segmentation","volume":"Volume 12927","author":"Li","year":"2024","journal-title":"Proceedings of the Medical Imaging 2024: Computer-Aided Diagnosis"},{"key":"ref_24","unstructured":"Huang, J., Jiang, K., Zhang, J., Qiu, H., Lu, L., Lu, S., and Xing, E. (2024). Learning to Prompt Segment Anything Models. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kweon, H., and Yoon, K.J. (2024, January 16\u201322). From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01844"},{"key":"ref_26","unstructured":"Biswas, R. (2023). Polyp-SAM++: Can a text-guided SAM perform better for polyp segmentation?. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, H., Vasu, P.K.A., Faghri, F., Vemulapalli, R., Farajtabar, M., Mehta, S., Rastegari, M., Tuzel, O., and Pouransari, H. (2024, January 16\u201322). SAM-CLIP: Merging vision foundation models towards semantic and spatial understanding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00367"},{"key":"ref_28","unstructured":"Kellener, E., Nath, I., Ngo, A., Nguyen, T., Schuman, J., Adler, C., and Kartikeya, A. (2023). Utilizing segment anything model for assessing localization of GRAD-CAM in medical imaging. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Shen, Z., and Jiao, R. (2024). Segment anything model for medical image segmentation: Current applications and future directions. Comput. Biol. Med., 171.","DOI":"10.1016\/j.compbiomed.2024.108238"},{"key":"ref_30","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021). Learning transferable visual models from natural language supervision. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, S., Zeng, Z., Ren, T., Li, F., Zhang, H., Yang, J., Li, C., Yang, J., Su, H., and Zhu, J. (2023). Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv.","DOI":"10.1007\/978-3-031-72970-6_3"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Jha, D., Sharma, V., Dasu, N., Tomar, N.K., Hicks, S., Bhuyan, M., Das, P.K., Riegler, M.A., Halvorsen, P., and de Lange, T. (2023, January 29). GastroVision: A multi-class endoscopy image dataset for computer-aided gastrointestinal disease detection. Proceedings of the ICML Workshop on Machine Learning for Multimodal Healthcare Data (ML4MHD 2023), Honolulu, HI, USA.","DOI":"10.1007\/978-3-031-47679-2_10"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Jha, D., Smedsrud, P.H., Riegler, M.A., Halvorsen, P., de Lange, T., Johansen, D., and Johansen, H.D. (2020, January 5\u20138). Kvasir-SEG: A segmented polyp dataset. Proceedings of the 26th International Conference on Multimedia Modeling (MMM), Daejeon, Republic of Korea.","DOI":"10.1007\/978-3-030-37734-2_37"},{"key":"ref_34","unstructured":"Gildenblat, J., and Contributors (2025, February 17). PyTorch Library for CAM Methods. Available online: https:\/\/github.com\/jacobgil\/pytorch-grad-cam."},{"key":"ref_35","first-page":"8355","article-title":"Learning debiased and disentangled representations for semantic segmentation","volume":"34","author":"Chu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/s41597-020-00622-y","article-title":"HyperKvasir: A comprehensive multi-class image and video dataset for gastrointestinal endoscopy","volume":"7","author":"Borgli","year":"2020","journal-title":"Sci. Data"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"108743","DOI":"10.1016\/j.patcog.2022.108743","article-title":"Believe the HiPe: Hierarchical perturbation for fast, robust, and model-agnostic saliency mapping","volume":"129","author":"Cooper","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_39","unstructured":"Petsiuk, V. (2018). Rise: Randomized Input Sampling for Explanation of black-box models. arXiv."},{"key":"ref_40","unstructured":"Fong, R., Patrick, M., and Vedaldi, A. (November, January 27). Understanding deep networks via extremal perturbations and smooth masks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_41","unstructured":"Springenberg, J.T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1084","DOI":"10.1007\/s11263-017-1059-x","article-title":"Top-down neural attention by excitation backprop","volume":"126","author":"Zhang","year":"2018","journal-title":"Int. J. Comput. Vis."},{"key":"ref_43","unstructured":"Zeiler, M.D., and Fergus, R. (2014, January 6\u201312). Visualizing and understanding convolutional networks. Proceedings of the Computer Vision\u2014ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part I 13."},{"key":"ref_44","first-page":"2951","article-title":"Practical Bayesian optimization of machine learning algorithms","volume":"25","author":"Snoek","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., and Hu, X. (2020, January 14\u201319). Score-CAM: Score-weighted visual explanations for convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00020"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/1\/22\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:41:22Z","timestamp":1760028082000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/1\/22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,24]]},"references-count":45,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["make7010022"],"URL":"https:\/\/doi.org\/10.3390\/make7010022","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,2,24]]}}}