{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T15:23:24Z","timestamp":1774538604118,"version":"3.50.1"},"reference-count":63,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T00:00:00Z","timestamp":1724976000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union","award":["101095359"],"award-info":[{"award-number":["101095359"]}]},{"name":"European Union","award":["10058099"],"award-info":[{"award-number":["10058099"]}]},{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","award":["101095359"],"award-info":[{"award-number":["101095359"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","award":["10058099"],"award-info":[{"award-number":["10058099"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Diagnostics"],"abstract":"<jats:p>The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, which is a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia, and ultimately cancer. Early detection through endoscopic regular surveillance is essential for better outcomes. Foundation models (FMs), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising solution to enhance the accuracy of endoscopy and its subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FMs in endoscopy and pathology imaging. We started by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in developing their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. 
This review aims to provide a roadmap for researchers and practitioners in navigating the complexities of incorporating FMs into clinical practice for the prevention\/management of GC cases, thereby improving patient outcomes.<\/jats:p>","DOI":"10.3390\/diagnostics14171912","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T06:46:29Z","timestamp":1725000389000},"page":"1912","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1478-6687","authenticated-orcid":false,"given":"Hamideh","family":"Kerdegari","sequence":"first","affiliation":[{"name":"Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0265-1395","authenticated-orcid":false,"given":"Kyle","family":"Higgins","sequence":"additional","affiliation":[{"name":"Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK"},{"name":"Department of Neurobiology, Boston Children\u2019s Hospital, Harvard Medical School, Boston, MA 02115, USA"}]},{"given":"Dennis","family":"Veselkov","sequence":"additional","affiliation":[{"name":"Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK"}]},{"given":"Ivan","family":"Laponogov","sequence":"additional","affiliation":[{"name":"Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9892-7765","authenticated-orcid":false,"given":"Inese","family":"Polaka","sequence":"additional","affiliation":[{"name":"Faculty of Medicine, Institute of Clinical and Preventive Medicine, University of Latvia, LV-1586 Riga, Latvia"}]},{"given":"Miguel","family":"Coimbra","sequence":"additional","affiliation":[{"name":"Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ci\u00eancia, 3200-465 Porto, Portugal"},{"name":"Faculdade de Ci\u00eancias, Universidade do Porto, 4169-007 Porto, Portugal"}]},{"given":"Junior Andrea","family":"Pescino","sequence":"additional","affiliation":[{"name":"StratejAI, Avenue Louise 209, 1050 Brussels, Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0319-8855","authenticated-orcid":false,"given":"M\u0101rcis","family":"Leja","sequence":"additional","affiliation":[{"name":"Faculty of Medicine, Institute of Clinical and Preventive Medicine, University of Latvia, LV-1586 Riga, Latvia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0121-6850","authenticated-orcid":false,"given":"M\u00e1rio","family":"Dinis-Ribeiro","sequence":"additional","affiliation":[{"name":"IRISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto), 4200-072 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2789-9082","authenticated-orcid":false,"given":"Tania","family":"Fleitas Kanonnikoff","sequence":"additional","affiliation":[{"name":"Instituto Investigaci\u00f3n Sanitaria INCLIVA, Medical Oncology Department, Hospital Cl\u00ednico Universitario de Valencia, 46010 Valencia, 
Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1164-0359","authenticated-orcid":false,"given":"Kirill","family":"Veselkov","sequence":"additional","affiliation":[{"name":"Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK"},{"name":"Department of Environmental Health Sciences, Yale University, New Haven, CT 06520, USA"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.5009\/gnl14118","article-title":"Diagnosis and management of high risk group for gastric cancer","volume":"9","author":"Yoon","year":"2015","journal-title":"Gut Liver"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1055\/a-0859-1883","article-title":"Management of epithelial precancerous conditions and lesions in the stomach (maps II): European Society of gastrointestinal endoscopy (ESGE), European Helicobacter and microbiota Study Group (EHMSG), European Society of pathology (ESP), and Sociedade Portuguesa de Endoscopia Digestiva (SPED) guideline update 2019","volume":"51","author":"Areia","year":"2019","journal-title":"Endoscopy"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1899","DOI":"10.1007\/s10620-020-06272-9","article-title":"Recent guidelines on the management of patients with gastric atrophy: Common points and controversies","volume":"65","author":"Camargo","year":"2020","journal-title":"Dig. Dis. Sci."},{"key":"ref_4","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 13\u201318). A simple framework for contrastive learning of visual representations. Proceedings of the 37th International Conference on Machine Learning, Virtual."},{"key":"ref_5","unstructured":"Radford, A., Kim, J., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning, Virtual."},{"key":"ref_6","unstructured":"Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., and Duerig, T. (2021, January 18\u201324). Scaling up visual and vision-language representation learning with noisy text supervision. Proceedings of the 38th International Conference on Machine Learning, Virtual."},{"key":"ref_7","unstructured":"Li, J., Selvaraju, R., Gotmare, A., Joty, S., Xiong, C., and Hoi, S.C.H. (2021, January 18\u201324). Align before fuse: Vision and language representation learning with momentum distillation. Proceedings of the 38th International Conference on Machine Learning, Virtual."},{"key":"ref_8","unstructured":"Yao, L., Huang, R., Hou, L., Lu, G., Niu, M., Xu, H., Liang, X., Li, Z., Jiang, X., and Xu, C. (2021). Filip: Fine-grained interactive language-image pre-training. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Wang, Z., Lu, Y., Li, Q., Tao, X., Guo, Y., Gong, M., and Liu, T. (2022, January 18\u201324). Cris: Clip-driven referring image segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01139"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, L.H., Zhang, P., Zhang, H., Yang, J., Li, C., Zhong, Y., Wang, L., Yuan, L., Zhang, L., and Hwang, J.-N. (2022, January 18\u201324). Grounded language-image pre-training. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01069"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xu, J., De Mello, S., Liu, S., Byeon, W., Breuel, T., Kautz, J., and Wang, X. (2022, January 18\u201324). Groupvit: Semantic segmentation emerges from text supervision. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01760"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yang, J., Li, C., Zhang, P., Xiao, B., Liu, C., Yuan, L., and Gao, J. (2022, January 18\u201324). Unified contrastive learning in image-text-label space. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01857"},{"key":"ref_13","first-page":"36067","article-title":"Glipv2: Unifying localization and vision-language understanding","volume":"35","author":"Zhang","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_14","unstructured":"Bao, H., Dong, L., and Wei, F. (2021). Beit: Bert pre-training of image transformers. arXiv."},{"key":"ref_15","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zeng, Y., Zhang, J., and Li, H. (2023). Toward building general foundation models for language, vision, and vision-language understanding tasks. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.40"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Singh, A., Hu, R., Goswami, V., Couairon, G., Galuba, W., Rohrbach, M., and Kiela, D. (2022, January 18\u201324). Flava: A foundational language and vision alignment model. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01519"},{"key":"ref_18","unstructured":"Hao, Y., Song, H., Dong, L., Huang, S., Chi, Z., Wang, W., Ma, S., and Wei, F. (2022). Language models are general-purpose interfaces. arXiv."},{"key":"ref_19","unstructured":"Li, J., Li, D., Savarese, S., and Hoi, S. (2023). Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv."},{"key":"ref_20","unstructured":"Tschannen, M., Kumar, M., Steiner, A., Zhai, X., Houlsby, N., and Beyer, L. (2023). Image captioners are scalable vision learners too. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, Y.-C., Li, L., Yu, L., El Kholy, A., Ahmed, F., Gan, Z., Cheng, Y., and Liu, J. (2020). Uniter: Universal image-text representation learning. Computer Vision\u2014ECCV 2020: 16th European Conference, Part XXX, Springer.","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"ref_22","first-page":"200","article-title":"Multimodal few-shot learning with frozen language models","volume":"34","author":"Tsimpoukelli","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xu, H., Zhang, J., Cai, J., Rezatofighi, H., Yu, F., Tao, D., and Geiger, A. (2022). Unifying flow, stereo and depth estimation. 
arXiv.","DOI":"10.1109\/TPAMI.2023.3298645"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., and Lo, W.-Y. (2023, January 2\u20136). Segment anything. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"ref_25","unstructured":"Deng, R., Cui, C., Liu, Q., Yao, T., Remedios, L.W., Bao, S., Landman, B.A., Wheless, L.E., Coburn, L.A., and Wilson, K.T. (2023). Segment anything model (sam) for digital pathology: Assess zero-shot segmentation on whole slide imaging. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cui, C., Deng, R., Liu, Q., Yao, T., Bao, S., Remedios, L.W., Tang, Y., and Huo, Y. (2023). All-in-sam: From weak annotation to pixel-wise nuclei segmentation with prompt-based finetuning. arXiv.","DOI":"10.1088\/1742-6596\/2722\/1\/012012"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, J., Ma, K., Kapse, S., Saltz, J., Vakalopoulou, M., Prasanna, P., and Samaras, D. (2023). Sam-path: A segment anything model for semantic segmentation in digital pathology. arXiv.","DOI":"10.1007\/978-3-031-47401-9_16"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Israel, U., Marks, M., Dilip, R., Li, Q., Yu, C., Laubscher, E., Li, S., Schwartz, M., Pradhan, E., and Ates, A. (2023). A foundation model for cell segmentation. bioRxiv.","DOI":"10.1101\/2023.11.17.567630"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Archit, A., Nair, S., Khalid, N., Hilt, P., Rajashekar, V., Freitag, M., Gupta, S., Dengel, A., Ahmed, S., and Pape, C. (2023). Segment anything for microscopy. bioRxiv.","DOI":"10.1101\/2023.08.21.554208"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, X., Deng, R., Tang, Y., Bao, S., and Yang, H. (2023). and Huo, Y. Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning. arXiv.","DOI":"10.1117\/12.3006577"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chen, R.J., Chen, C., Li, Y., Chen, T.Y., Trister, A.D., Krishnan, R.G., and Mahmood, F. (2022, January 18\u201324). Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01567"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"102559","DOI":"10.1016\/j.media.2022.102559","article-title":"Transformer-based unsupervised contrastive learning for histopathological image classification","volume":"81","author":"Wang","year":"2022","journal-title":"Med. Image Anal."},{"key":"ref_33","first-page":"100198","article-title":"Self supervised contrastive learning for digital histopathology","volume":"7","author":"Ciga","year":"2022","journal-title":"Mach. Learn. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"756","DOI":"10.1038\/s41551-023-01049-7","article-title":"Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging","volume":"7","author":"Azizi","year":"2023","journal-title":"Nat. Biomed. Eng."},{"key":"ref_35","unstructured":"Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., and El-Nouby, A. (2023). Dinov2: Learning robust visual features without supervision. 
arXiv."},{"key":"ref_36","unstructured":"Vorontsov, E., Bozkurt, A., Casson, A., Shaikovski, G., Zelechowski, M., Liu, S., Severson, K., Zimmermann, E., Hall, J., and Tenenholtz, N. (2023). Virchow: A million-slide digital pathology foundation model. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Roth, B., Koch, V., Wagner, S.J., Schnabel, J.A., Marr, C., and Peng, T. (2024). Low-resource finetuning of foundation models beats state-of-the-art in histopathology. arXiv.","DOI":"10.1109\/ISBI56570.2024.10635695"},{"key":"ref_38","unstructured":"Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F.K., Jaume, G., Chen, B., Zhang, A., Shao, D., Song, A.H., and Shaban, M. (2023). A general-purpose self-supervised model for computational pathology. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Filiot, A., Ghermi, R., Olivier, A., Jacob, P., Fidon, L., Mac Kain, A., Saillard, C., and Schiratti, J.-B. (2023). Scaling self-supervised learning for histopathology with masked image modeling. medRxiv, 2023-07.","DOI":"10.1101\/2023.07.21.23292757"},{"key":"ref_40","unstructured":"Campanella, G., Kwan, R., Fluder, E., Zeng, J., Stock, A., Veremis, B., Polydorides, A.D., Hedvat, C., Schoenfeld, A., and Vanderbilt, C. (2023). Computational pathology at health system scale\u2013self-supervised foundation models from three billion images. arXiv."},{"key":"ref_41","unstructured":"Dippel, J., Feulner, B., Winterhoff, T., Schallenberg, S., Dernbach, G., Kunft, A., Tietz, S., Jurmeister, P., Horst, D., and Ruff, L. (2024). RudolfV: A Foundation Model by Pathologists for Pathologists. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/s41586-024-07441-w","article-title":"A whole-slide foundation model for digital pathology from real-world data","volume":"630","author":"Xu","year":"2024","journal-title":"Nature"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1681","DOI":"10.1109\/JBHI.2022.3163751","article-title":"Vision-language transformer for interpretable pathology visual question answering","volume":"27","author":"Naseem","year":"2022","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"He, X., Zhang, Y., Mou, L., Xing, E., and Xie, P. (2020). Pathvqa: 30,000+ questions for medical visual question answering. arXiv.","DOI":"10.36227\/techrxiv.13127537"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T., and Zou, J. (2023). Leveraging medical twitter to build a visual\u2013language foundation model for pathology ai. bioRxiv, 2023-03.","DOI":"10.1101\/2023.03.29.534834"},{"key":"ref_46","unstructured":"Sun, Y., Zhu, C., Zheng, S., Zhang, K., Shui, Z., Yu, X., Zhao, Y., Li, H., Zhang, Y., and Zhao, R. (2023). Pathasst: Redefining pathology through generative foundation ai assistant for pathology. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lu, M.Y., Chen, B., Zhang, A., Williamson, D.F., Chen, R.J., Ding, T., Le, L.P., Chuang, Y.S., and Mahmood, F. (2023, January 17\u201324). Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01893"},{"key":"ref_48","unstructured":"Lu, M.Y., Chen, B., Williamson, D.F., Chen, R.J., Liang, I., Ding, T., Jaume, G., Odintsov, I., Zhang, A., and Le, L.P. (2023). 
Towards a visual-language foundation model for computational pathology. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Gao, J., Zhou, M., Wang, X., Qiao, Y., Zhang, S., and Wang, D. (2023). Text-guided foundation model adaptation for pathological image classification. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.","DOI":"10.1007\/978-3-031-43904-9_27"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The cancer genome atlas pan-cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat. Genet."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Caron, M., Touvron, H., Misra, I., J\u00e9gou, H., Mairal, J., Bojanowski, P., and Joulin, A. (2021, January 11\u201317). Emerging properties in self-supervised vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"101854","DOI":"10.1016\/j.media.2020.101854","article-title":"Paip 2019: Liver cancer segmentation challenge","volume":"67","author":"Kim","year":"2021","journal-title":"Med. Image Anal."},{"key":"ref_53","unstructured":"Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., and Kong, T. (2021). ibot: Image bert pre-training with online tokenizer. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"102645","DOI":"10.1016\/j.media.2022.102645","article-title":"Retccl: Clustering-guided contrastive learning for whole-slide image retrieval","volume":"83","author":"Wang","year":"2023","journal-title":"Med. Image Anal."},{"key":"ref_55","unstructured":"Yu, J., Wang, Z., Vasudevan, V., Yeung, L., Seyedhosseini, M., and Wu, Y. (2022). Coca: Contrastive captioners are image-text foundation models. arXiv."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Wang, Z., Liu, C., Zhang, S., and Dou, Q. (2023). Foundation model for endoscopy video analysis via large-scale self-supervised pre-train. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.","DOI":"10.1007\/978-3-031-43996-4_10"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Cui, B., Mobarakol, I., Bai, L., and Ren, H. (2024). Surgical-DINO: Adapter Learning of Foundation Model for Depth Estimation in Endoscopic Surgery. arXiv.","DOI":"10.1007\/s11548-024-03083-5"},{"key":"ref_58","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). Lora: Low-rank adaptation of large language models. arXiv."},{"key":"ref_59","unstructured":"Cheng, Y., Li, L., Xu, Y., Li, X., Yang, Z., Wang, W., and Yang, Y. (2023). Segment and track anything. arXiv."},{"key":"ref_60","unstructured":"Song, Y., Yang, M., Wu, W., He, D., Li, F., and Wang, J. (2022). It takes two: Masked appearance-motion modeling for self-supervised video transformer pre-training. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Hoelscher-Obermaier, J., Persson, J., Kran, E., Konstas, I., and Barez, F. (2023). Detecting edit failures in large language models: An improved specificity benchmark. 
arXiv.","DOI":"10.18653\/v1\/2023.findings-acl.733"},{"key":"ref_63","unstructured":"Lekadir, K., Feragen, A., Fofanah, A.J., Frangi, A.F., Buyx, A., Emelie, A., Lara, A., Porras, A.R., Chan, A., and Navarro, A. (2023). FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. arXiv."}],"container-title":["Diagnostics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2075-4418\/14\/17\/1912\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:45:40Z","timestamp":1760111140000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2075-4418\/14\/17\/1912"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,30]]},"references-count":63,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["diagnostics14171912"],"URL":"https:\/\/doi.org\/10.3390\/diagnostics14171912","relation":{},"ISSN":["2075-4418"],"issn-type":[{"value":"2075-4418","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,30]]}}}