{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T16:27:49Z","timestamp":1779899269068,"version":"3.53.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T00:00:00Z","timestamp":1749600000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T00:00:00Z","timestamp":1749600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Ostbayerische Technische Hochschule Regensburg"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J CARS"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Purpose<\/jats:title>\n            <jats:p>Recognizing previously unseen classes with neural networks is a significant challenge due to their limited generalization capabilities.  This issue is particularly critical in safety-critical domains such as medical applications, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by utilizing additional semantic data, with their performance relying heavily on the quality of the generated embeddings.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations, compared to simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9\u00a0% to 64.9\u00a0%, and the multi-label accuracy from 26.1\u00a0% to 79.5\u00a0%. Overall performance measured across both seen and unseen classes improves from 49.3\u00a0% to 64.9\u00a0% in AUROC and from 37.3\u00a0% to 65.1\u00a0% in multi-label accuracy, highlighting the effectiveness of our approach.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1007\/s11548-025-03439-5","type":"journal-article","created":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T01:13:57Z","timestamp":1749604437000},"page":"1577-1587","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Enhancing generalization in zero-shot multi-label endoscopic instrument classification"],"prefix":"10.1007","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-0870-6654","authenticated-orcid":false,"given":"Raphaela","family":"Maerkl","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0628-1979","authenticated-orcid":false,"given":"Tobias","family":"Rueckert","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1782-5035","authenticated-orcid":false,"given":"David","family":"Rauber","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1451-8361","authenticated-orcid":false,"given":"Max","family":"Gutbrod","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0401-0842","authenticated-orcid":false,"given":"Danilo","family":"Weber Nunes","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9468-2871","authenticated-orcid":false,"given":"Christoph","family":"Palm","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,6,11]]},"reference":[{"issue":"4","key":"3439_CR1","doi-asserted-by":"publisher","first-page":"812","DOI":"10.1007\/s41315-024-00341-2","volume":"8","author":"G Dagnino","year":"2024","unstructured":"Dagnino G, Kundrat D (2024) Robot-assistive minimally invasive surgery: trends and future directions. Int J Intell Robot Appl 8(4):812\u2013826. https:\/\/doi.org\/10.1007\/s41315-024-00341-2","journal-title":"Int J Intell Robot Appl"},{"issue":"8","key":"3439_CR2","doi-asserted-by":"publisher","first-page":"E598","DOI":"10.1001\/amajethics.2023.598","volume":"25","author":"A Chuchulo","year":"2023","unstructured":"Chuchulo A, Ali A (2023) Is robotic-assisted surgery better? AMA J Ethics 25(8):E598-604. https:\/\/doi.org\/10.1001\/amajethics.2023.598","journal-title":"AMA J Ethics"},{"key":"3439_CR3","doi-asserted-by":"publisher","first-page":"107929","DOI":"10.1016\/J.COMPBIOMED.2024.107929","volume":"169","author":"T Rueckert","year":"2024","unstructured":"Rueckert T, Rueckert D, Palm C (2024) Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art. Comput Biol Med 169:107929. https:\/\/doi.org\/10.1016\/J.COMPBIOMED.2024.107929","journal-title":"Comput Biol Med"},{"issue":"1","key":"3439_CR4","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1007\/s13755-023-00238-7","volume":"11","author":"S Casas-Yrurzum","year":"2023","unstructured":"Casas-Yrurzum S, Gimeno J, Casanova-Salas P, Garc\u00eda-Pereira I, Garc\u00eda Del Olmo E, Salvador A, Guijarro R, Zaragoza C, Fern\u00e1ndez M (2023) A new mixed reality tool for training in minimally invasive robotic-assisted surgery. Health Inf Sci Syst 11(1):34. https:\/\/doi.org\/10.1007\/s13755-023-00238-7","journal-title":"Health Inf Sci Syst"},{"key":"3439_CR5","doi-asserted-by":"crossref","unstructured":"Bai L, Islam M, Ren H (2023) CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp 397\u2013407. https:\/\/doi.org\/10.48550\/arXiv.2307.05182","DOI":"10.1007\/978-3-031-43996-4_38"},{"key":"3439_CR6","doi-asserted-by":"publisher","first-page":"633","DOI":"10.1016\/J.MEDIA.2016.09.003","volume":"35","author":"D Bouget","year":"2017","unstructured":"Bouget D, Allan M, Stoyanov D, Jannin P (2017) Vision-based and marker-less surgical tool detection and tracking: a review of the literature. Medical Image Anal 35:633\u2013654. https:\/\/doi.org\/10.1016\/J.MEDIA.2016.09.003","journal-title":"Medical Image Anal"},{"issue":"4","key":"3439_CR7","doi-asserted-by":"publisher","first-page":"4051","DOI":"10.1109\/TPAMI.2022.3191696","volume":"45","author":"F Pourpanah","year":"2023","unstructured":"Pourpanah F, Abdar M, Luo Y, Zhou X, Wang R, Lim CP, Wang X, Wu QMJ (2023) A review of generalized zero-shot learning methods. IEEE Trans Pattern Anal Mach Intell 45(4):4051\u20134070. https:\/\/doi.org\/10.1109\/TPAMI.2022.3191696","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"3439_CR8","unstructured":"Mikolov T, Chen K, Corrado GS, Dean J (2013) Efficient estimation of word representations in vector space. In: International Conference on Learning Representations"},{"key":"3439_CR9","doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning C (2014) Glove: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), p 1532\u20131543","DOI":"10.3115\/v1\/D14-1162"},{"key":"3439_CR10","doi-asserted-by":"crossref","unstructured":"Reimers N, Gurevych I (2019) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, p 3980\u20133990","DOI":"10.18653\/v1\/D19-1410"},{"key":"3439_CR11","doi-asserted-by":"publisher","first-page":"6943","DOI":"10.1109\/TIP.2021.3100552","volume":"30","author":"B Liu","year":"2021","unstructured":"Liu B, Hu L, Dong Q, Hu Z (2021) An iterative co-training transductive framework for zero shot learning. IEEE Trans Image Process 30:6943\u20136956","journal-title":"IEEE Trans Image Process"},{"key":"3439_CR12","doi-asserted-by":"crossref","unstructured":"Yin Y, Ma J, You Y, Hou R, Qin N, Huang D (2023) Generalized Zero-Shot Learning Fault Diagnosis for Bogies of High-speed Trains based on Improved Generative Adversarial Networks. In: 2023 CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS) p 1\u20136","DOI":"10.1109\/SAFEPROCESS58597.2023.10295675"},{"key":"3439_CR13","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2024.3425437","author":"N Qin","year":"2024","unstructured":"Qin N, Yin Y, Huang D, You Y, Hou R (2024) Generalized zero-shot learning for fault diagnosis in high-speed train bogies based on enhanced diffusion generative models. IEEE Trans Reliability. https:\/\/doi.org\/10.1109\/TR.2024.3425437","journal-title":"IEEE Trans Reliability"},{"key":"3439_CR14","doi-asserted-by":"publisher","first-page":"107352","DOI":"10.1016\/j.asoc.2021.107352","volume":"107","author":"Y Luo","year":"2021","unstructured":"Luo Y, Wang X, Pourpanah F (2021) Dual VAEGAN: a generative model for generalized zero-shot learning. Appl Soft Comput 107:107352. https:\/\/doi.org\/10.1016\/j.asoc.2021.107352","journal-title":"Appl Soft Comput"},{"key":"3439_CR15","unstructured":"Hayat N, Lashen H, Shamout FE (2021) Multi-Label Generalized Zero Shot Learning for the Classification of Disease in Chest Radiographs. arXiv:abs\/2107.06563"},{"key":"3439_CR16","unstructured":"Xu W, Xian Y, Wang J, Schiele B, Akata Z (2020) Attribute Prototype Network for Zero-Shot Learning. Adv Neural Inform Process Syst (NeurIPS 2020) 33:21969\u201321980. https:\/\/doi.org\/10.48550\/arXiv.2008.08290"},{"key":"3439_CR17","doi-asserted-by":"crossref","unstructured":"Shubho FH, Chowdhury TF, Cheraghian A, Saberi M, Mohammed N, Rahman S (2023) ChatGPT-guided Semantics for Zero-shot Learning. In: 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA) pp 418\u2013425","DOI":"10.1109\/DICTA60407.2023.00064"},{"key":"3439_CR18","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1, pp 4171\u20134186"},{"key":"3439_CR19","unstructured":"Sajjad H, Alam F, Dalvi F, Durrani N (2021) Effect of Post-processing on Contextualized Word Representations. In: Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, p 3127\u20133142"},{"key":"3439_CR20","first-page":"2487","volume":"11","author":"M Radovanovic","year":"2010","unstructured":"Radovanovic M, Nanopoulos A, Ivanovic M (2010) Hubs in space: popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487\u20132531 (https:\/\/api.semanticscholar.org\/CorpusID:12182489)","journal-title":"J Mach Learn Res"},{"key":"3439_CR21","doi-asserted-by":"crossref","unstructured":"Fei N, Gao Y, Lu Z, Xiang T (2021) Z-Score Normalization, Hubness, and Few-Shot Learning. In: 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), p 142\u2013151","DOI":"10.1109\/ICCV48922.2021.00021"},{"key":"3439_CR22","unstructured":"MICCAI (2024) MICCAI 2024 - 27. International Conference On Medical Image Computing & Computer Assisted Intervention. https:\/\/conferences.miccai.org\/2024\/en\/default.asp"},{"key":"3439_CR23","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der\u00a0Maaten L, Weinberger KQ (2017) Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Honolulu, HI, p 2261\u20132269","DOI":"10.1109\/CVPR.2017.243"},{"issue":"4","key":"3439_CR24","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234\u20131240. https:\/\/doi.org\/10.1093\/bioinformatics\/btz682","journal-title":"Bioinformatics"},{"key":"3439_CR25","doi-asserted-by":"crossref","unstructured":"Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D, Naumann T, McDermott MBA (2019) Publicly Available Clinical BERT Embeddings","DOI":"10.18653\/v1\/W19-1909"},{"key":"3439_CR26","doi-asserted-by":"publisher","unstructured":"Johnson AEW, Pollard TJ, Shen L, Lehman LWH, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG (2016) MIMIC-III, a freely accessible critical care database. Sci Data 3:160035. https:\/\/doi.org\/10.1038\/sdata.2016.35","DOI":"10.1038\/sdata.2016.35"},{"key":"3439_CR27","unstructured":"Kingma DP, Ba J (2014) Adam: A Method for Stochastic Optimization. CoRR abs\/1412.6980. https:\/\/api.semanticscholar.org\/CorpusID:6628106"},{"key":"3439_CR28","doi-asserted-by":"crossref","unstructured":"Smith LN, Topin N (2019) Super-convergence: very fast training of neural networks using large learning rates. In: Proc. SPIE 11006, artificial intelligence and machine learning for multi-domain operations applications, vol 1100612","DOI":"10.1117\/12.2520589"},{"key":"3439_CR29","unstructured":"Wu XZ, Zhou ZH (2017) A Unified View of Multi-Label Performance Measures. In: Proceedings of the 34th International Conference on Machine Learning. PMLR, p 3780\u20133788"}],"container-title":["International Journal of Computer Assisted Radiology and Surgery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-025-03439-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11548-025-03439-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-025-03439-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T19:06:00Z","timestamp":1757185560000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11548-025-03439-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,11]]},"references-count":29,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["3439"],"URL":"https:\/\/doi.org\/10.1007\/s11548-025-03439-5","relation":{},"ISSN":["1861-6429"],"issn-type":[{"value":"1861-6429","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,11]]},"assertion":[{"value":"10 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 June 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"In our work, we use the already published PhaKIR dataset, which provides exclusively anonymized data collected in an ethically approved research project approved by the local ethics committee of the Technical University of Munich (approval code 337\/21\u00a0S-EB). For our work, no new patient information was collected.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}}]}}