{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T17:35:51Z","timestamp":1781717751048,"version":"3.54.5"},"reference-count":98,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T00:00:00Z","timestamp":1747440000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T00:00:00Z","timestamp":1747440000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"ScaDS.AI Leipzig"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Zero-shot recognition is centered around learning representations to transfer knowledge from seen to unseen classes. Where foundational approaches perform the transfer with semantic embedding spaces, <jats:italic>e.g.,<\/jats:italic> from attributes or word vectors, the current state-of-the-art relies on prompting pre-trained vision-language models to obtain class embeddings. Whether zero-shot learning is performed with attributes, CLIP, or something else, current approaches <jats:italic>de facto<\/jats:italic> assume that there is a pre-defined embedding space in which seen and unseen classes can be positioned. Our work is concerned with real-world zero-shot settings where a pre-defined embedding space can no longer be assumed. This is natural in domains such as biology and medicine, where class names are not common English words, rendering vision-language models useless; or neuroscience, where class relations are only given with non-semantic human comparison scores. 
We find that there is one data structure enabling zero-shot learning in both standard and non-standard settings: a similarity matrix spanning the seen and unseen classes. We introduce four <jats:italic>similarity-based zero-shot learning<\/jats:italic> challenges, tackling open-ended scenarios such as learning with uncommon class names, learning from multiple partial sources, and learning with missing knowledge. As the first step for zero-shot learning beyond a pre-defined semantic embedding space, we propose <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\kappa $$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03ba<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula>-MDS, a general approach that obtains a prototype for each class on any manifold from similarities alone, even when part of the similarities are missing. Our approach can be plugged into any standard, hyperspherical, or hyperbolic zero-shot learner. 
Experiments on existing datasets and the new benchmarks show the promise and challenges of similarity-based zero-shot learning.<\/jats:p>","DOI":"10.1007\/s11263-025-02422-6","type":"journal-article","created":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T12:27:08Z","timestamp":1747484828000},"page":"5161-5177","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["SimZSL: Zero-Shot Learning Beyond a Pre-defined Semantic Embedding Space"],"prefix":"10.1007","volume":"133","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-8377-270X","authenticated-orcid":false,"given":"Mina Ghadimi","family":"Atigh","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stephanie","family":"Nargang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Martin","family":"Keller-Ressel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pascal","family":"Mettes","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,5,17]]},"reference":[{"key":"2422_CR1","doi-asserted-by":"crossref","unstructured":"Agarwal, A., Phillips, J.M., & Venkatasubramanian, S. (2010). Universal multi-dimensional scaling. In Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining pp. 1149\u2013 1158.","DOI":"10.1145\/1835804.1835948"},{"issue":"7","key":"2422_CR2","doi-asserted-by":"publisher","first-page":"1425","DOI":"10.1109\/TPAMI.2015.2487986","volume":"38","author":"Z Akata","year":"2015","unstructured":"Akata, Z., Perronnin, F., Harchaoui, Z., & Schmid, C. (2015). Label-embedding for image classification. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7), 1425\u20131438.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2422_CR3","first-page":"23716","volume":"35","author":"J-B Alayrac","year":"2022","unstructured":"Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al. (2022). Flamingo: A visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35, 23716\u201323736.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2422_CR4","doi-asserted-by":"crossref","unstructured":"Al-Halah, Z., & Stiefelhagen, R. (2015). How to transfer? zero-shot object recognition via hierarchical transfer of semantic attributes. In 2015 IEEE Winter Conference on Applications of Computer Vision pp. 837\u2013 843. IEEE.","DOI":"10.1109\/WACV.2015.116"},{"key":"2422_CR5","doi-asserted-by":"crossref","unstructured":"Ali, M., & Khan, S. (2023). Clip-decoder: Zeroshot multilabel classification using multimodal clip aligned representations. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 4675\u2013 4679.","DOI":"10.1109\/ICCVW60793.2023.00505"},{"key":"2422_CR6","doi-asserted-by":"crossref","unstructured":"Atigh, M.G., Schoep, J., Acar, E., Van\u00a0Noord, N., & Mettes, P. (2022). Hyperbolic image segmentation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 4453\u2013 4462.","DOI":"10.1109\/CVPR52688.2022.00441"},{"key":"2422_CR7","doi-asserted-by":"crossref","unstructured":"Beery, S., Liu, Y., Morris, D., Piavis, J., Kapoor, A., Joshi, N., Meister, M., & Perona, P. ( 2020) Synthetic examples improve generalization for rare classes. In Proceedings of the Ieee\/cvf winter conference on applications of computer vision pp. 
863\u2013 873.","DOI":"10.1109\/WACV45572.2020.9093570"},{"key":"2422_CR8","doi-asserted-by":"crossref","unstructured":"Beery, S., Wu, G., Edwards, T., Pavetic, F., Majewski, B., Mukherjee, S., Chan, S., Morgan, J., Rathod, V., & Huang, J. ( 2022). The auto arborist dataset: a large-scale benchmark for multiview urban forest monitoring under domain shift. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 21294\u2013 21307","DOI":"10.1109\/CVPR52688.2022.02061"},{"key":"2422_CR9","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135\u2013146.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2422_CR10","volume-title":"Modern multidimensional scaling: Theory and applications","author":"I Borg","year":"2005","unstructured":"Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer."},{"key":"2422_CR11","doi-asserted-by":"crossref","unstructured":"Braytee, A., Naji, M., Anaissi, A., Chaturvedi, K., & Prasad, M. ( 2021). Zero-shot learning with missing attributes using semantic correlations. In 2021 international joint conference on neural networks (IJCNN) pp. 1\u2013 7. IEEE.","DOI":"10.1109\/IJCNN52387.2021.9533591"},{"key":"2422_CR12","doi-asserted-by":"crossref","unstructured":"Bretti, C., & Mettes, P. ( 2021). Zero-shot action recognition from diverse object-scene compositions.","DOI":"10.5244\/C.35.213"},{"issue":"6","key":"2422_CR13","doi-asserted-by":"publisher","first-page":"925","DOI":"10.1109\/JPROC.2009.2035722","volume":"98","author":"EJ Candes","year":"2010","unstructured":"Candes, E. J., & Plan, Y. (2010). Matrix completion with noise. 
Proceedings of the IEEE, 98(6), 925\u2013936.","journal-title":"Proceedings of the IEEE"},{"key":"2422_CR14","doi-asserted-by":"crossref","unstructured":"Carroll, J.D., Arabie, P. (1998) Multidimensional scaling. Measurement, judgment and decision making, 179\u2013250","DOI":"10.1016\/B978-012099975-0.50005-1"},{"key":"2422_CR15","doi-asserted-by":"crossref","unstructured":"Changpinyo, S., Chao, W.-L., & Sha, F. ( 2017). Predicting visual exemplars of unseen classes for zero-shot learning. In Proceedings of the IEEE international conference on computer vision pp. 3476\u2013 3485.","DOI":"10.1109\/ICCV.2017.376"},{"key":"2422_CR16","doi-asserted-by":"crossref","unstructured":"Chen, S., Hong, Z., Liu, Y., Xie, G.-S., Sun, B., Li, H., Peng, Q., Lu, K., & You, X. (2022a). Transzero: Attribute-guided transformer for zero-shot learning. In Proceedings of the AAAI conference on artificial intelligence vol. 36, pp. 330\u2013 338.","DOI":"10.1609\/aaai.v36i1.19909"},{"key":"2422_CR17","doi-asserted-by":"crossref","unstructured":"Chen, S., Hong, Z., Xie, G.-S., Yang, W., Peng, Q., Wang, K., Zhao, J., & You, X. (2022b). Msdn: Mutually semantic distillation network for zero-shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 7612\u2013 7621.","DOI":"10.1109\/CVPR52688.2022.00746"},{"key":"2422_CR18","doi-asserted-by":"crossref","unstructured":"Chen, Z., Luo, Y., Qiu, R., Wang, S., Huang, Z., Li, J., & Zhang, Z. (2021a). Semantics disentangling for generalized zero-shot learning. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 8712\u2013 8720.","DOI":"10.1109\/ICCV48922.2021.00859"},{"key":"2422_CR19","doi-asserted-by":"crossref","unstructured":"Chen, S., Wang, W., Xia, B., Peng, Q., You, X., Zheng, F., & Shao, L. (2021b). Free: Feature refinement for generalized zero-shot learning. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 
122\u2013 131.","DOI":"10.1109\/ICCV48922.2021.00019"},{"key":"2422_CR20","doi-asserted-by":"crossref","unstructured":"Chen, T., Wu, W., Gao, Y., Dong, L., Luo, X., Lin, L. (2018a) Fine-grained representation learning and recognition by exploiting hierarchical semantic embedding. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 2023\u2013 2031","DOI":"10.1145\/3240508.3240523"},{"key":"2422_CR21","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhang, H., Xiao, J., Liu, W., & Chang, S.-F. (2018b). Zero-shot visual recognition using semantics-preserving adversarial embedding networks. In Proceedings of the IEEE conference on computer vision and pattern recognition pp. 1043\u2013 1052.","DOI":"10.1109\/CVPR.2018.00115"},{"issue":"4","key":"2422_CR22","doi-asserted-by":"publisher","first-page":"2026","DOI":"10.1109\/JBHI.2023.3240136","volume":"27","author":"K Chen","year":"2023","unstructured":"Chen, K., Lei, W., Zhao, S., Zheng, W.-S., & Wang, R. (2023). Pcct: Progressive class-center triplet loss for imbalanced medical image classification. IEEE Journal of Biomedical and Health Informatics, 27(4), 2026\u20132036.","journal-title":"IEEE Journal of Biomedical and Health Informatics"},{"key":"2422_CR23","first-page":"16622","volume":"34","author":"S Chen","year":"2021","unstructured":"Chen, S., Xie, G., Liu, Y., Peng, Q., Sun, B., Li, H., You, X., & Shao, L. (2021). Hsva: Hierarchical semantic-visual adaptation for zero-shot learning. Advances in Neural Information Processing Systems, 34, 16622\u201316634.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2422_CR24","doi-asserted-by":"crossref","unstructured":"Christensen, A., Mancini, M., Koepke, A., Winther, O., & Akata, Z. (2023). Image-free classifier injection for zero-shot classification. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 
19072\u2013 19081.","DOI":"10.1109\/ICCV51070.2023.01748"},{"key":"2422_CR25","unstructured":"De\u00a0Silva, V., & Tenenbaum, J.B. (2004). Sparse multidimensional scaling using landmark points. Technical report, Stanford University."},{"key":"2422_CR26","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition pp. 248\u2013 255. IEEE.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"2422_CR27","unstructured":"Desai, K., Nickel, M., Rajpurohit, T., Johnson, J., & Vedantam, S.R. (2023). Hyperbolic image-text representations. In International conference on machine learning pp. 7694\u2013 7731. PMLR."},{"key":"2422_CR28","doi-asserted-by":"publisher","first-page":"45","DOI":"10.3389\/fncom.2012.00045","volume":"6","author":"S Edelman","year":"2012","unstructured":"Edelman, S., & Shahbazi, R. (2012). Renewing the respect for similarity. Frontiers in Computational Neuroscience, 6, 45.","journal-title":"Frontiers in Computational Neuroscience"},{"key":"2422_CR29","unstructured":"Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Ranzato, M., & Mikolov, T. (2013). Devise: A deep visual-semantic embedding model. Advances in Neural Information Processing Systems Vol. 26."},{"key":"2422_CR30","first-page":"103","volume":"34","author":"M Ghadimi Atigh","year":"2021","unstructured":"Ghadimi Atigh, M., Keller-Ressel, M., & Mettes, P. (2021). Hyperbolic busemann learning with ideal prototypes. Advances in Neural Information Processing Systems, 34, 103\u2013115.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2422_CR31","unstructured":"Gui, Z., Sun, S., Li, R., Yuan, J., An, Z., Roth, K., Prabhu, A., & Torr, P. (2024). knn-clip: Retrieval enables training-free segmentation on continually expanding large vocabularies. 
arXiv preprint arXiv:2404.09447"},{"key":"2422_CR32","doi-asserted-by":"crossref","unstructured":"Han, Z., Fu, Z., Chen, S., & Yang, J. (2021). Contrastive embedding for generalized zero-shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 2371\u2013 2381.","DOI":"10.1109\/CVPR46437.2021.00240"},{"key":"2422_CR33","doi-asserted-by":"crossref","unstructured":"Han, H., Miao, K., Zheng, Q., & Luo, M. (2023). Noisy correspondence learning with meta similarity correction. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 7517\u2013 7526.","DOI":"10.1109\/CVPR52729.2023.00726"},{"issue":"1","key":"2422_CR34","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1002\/wrna.1143","volume":"4","author":"MC Hout","year":"2013","unstructured":"Hout, M. C., Papesh, M. H., & Goldinger, S. D. (2013). Multidimensional scaling. Wiley Interdisciplinary Reviews: Cognitive Science, 4(1), 93\u2013103.","journal-title":"Wiley Interdisciplinary Reviews: Cognitive Science"},{"key":"2422_CR35","doi-asserted-by":"crossref","unstructured":"Hsia, H.-A., Lin, C.-H., Kung, B.-H., Chen, J.-T., Tan, D.S., Chen, J.-C., & Hua, K.-L. (2022). Clipcam: A simple baseline for zero-shot text-guided object and action localization. In ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) pp. 4453\u2013 4457. IEEE.","DOI":"10.1109\/ICASSP43922.2022.9747841"},{"key":"2422_CR36","doi-asserted-by":"crossref","unstructured":"Huynh, D., & Elhamifar, E. (2020). Fine-grained generalized zero-shot learning via dense attribute-based attention. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 4483\u2013 4493.","DOI":"10.1109\/CVPR42600.2020.00454"},{"key":"2422_CR37","doi-asserted-by":"crossref","unstructured":"Jain, P., Netrapalli, P., & Sanghavi, S. (2013). Low-rank matrix completion using alternating minimization. 
In Proceedings of the forty-fifth annual ACM symposium on theory of computing pp. 665\u2013 674.","DOI":"10.1145\/2488608.2488693"},{"issue":"1","key":"2422_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.20982\/tqmp.05.1.p001","volume":"5","author":"N Jaworska","year":"2009","unstructured":"Jaworska, N., & Chupetlovska-Anastasova, A. (2009). A review of multidimensional scaling (mds) and its utility in various psychological domains. Tutorials in Quantitative Methods for Psychology, 5(1), 1\u201310.","journal-title":"Tutorials in Quantitative Methods for Psychology"},{"key":"2422_CR39","unstructured":"Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., & Duerig, T. (2021). Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning pp. 4904\u2013 4916. PMLR."},{"key":"2422_CR40","doi-asserted-by":"crossref","unstructured":"Jiang, H., Wang, R., Shan, S., Chen, X.( 2019) Transferable contrastive network for generalized zero-shot learning. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 9765\u2013 9774","DOI":"10.1109\/ICCV.2019.00986"},{"issue":"5","key":"2422_CR41","doi-asserted-by":"publisher","first-page":"579","DOI":"10.1109\/LSP.2017.2685518","volume":"24","author":"X Jiang","year":"2017","unstructured":"Jiang, X., Zhong, Z., Liu, X., & So, H. C. (2017). Robust matrix completion via alternating projection. IEEE Signal Processing Letters, 24(5), 579\u2013583.","journal-title":"IEEE Signal Processing Letters"},{"key":"2422_CR42","first-page":"35631","volume":"36","author":"S Jiao","year":"2023","unstructured":"Jiao, S., Wei, Y., Wang, Y., Zhao, Y., & Shi, H. (2023). Learning mask-aware clip representations for zero-shot segmentation. 
Advances in Neural Information Processing Systems, 36, 35631\u201335653.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2422_CR43","unstructured":"Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., & Natsev, P., et al. (2017) The kinetics human action video dataset. arXiv preprint arXiv:1705.06950"},{"key":"2422_CR44","doi-asserted-by":"crossref","unstructured":"Keller-Ressel, M., Nargang, S.(2022) Strain-minimizing hyperbolic network embeddings with landmarks. arXiv preprint arXiv:2207.06775","DOI":"10.1093\/comnet\/cnad002"},{"issue":"1","key":"2422_CR45","doi-asserted-by":"publisher","first-page":"002","DOI":"10.1093\/comnet\/cnaa002","volume":"8","author":"M Keller-Ressel","year":"2020","unstructured":"Keller-Ressel, M., & Nargang, S. (2020). Hydra: a method for strain-minimizing hyperbolic embedding of network-and distance-based data. Journal of Complex Networks, 8(1), 002.","journal-title":"Journal of Complex Networks"},{"key":"2422_CR46","doi-asserted-by":"crossref","unstructured":"Khan, F.F., Li, X., Temple, A.J., & Elhoseiny, M. (2023). Fishnet: A large-scale dataset and benchmark for fish recognition, detection, and functional trait prediction. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 20496\u2013 20506.","DOI":"10.1109\/ICCV51070.2023.01874"},{"key":"2422_CR47","unstructured":"Krizhevsky, A., & Hinton, G., et al. (2009). Learning multiple layers of features from tiny images."},{"key":"2422_CR48","doi-asserted-by":"crossref","unstructured":"Lampert, C.H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In 2009 IEEE conference on computer vision and pattern recognition pp. 951\u2013 958. 
IEEE.","DOI":"10.1109\/CVPR.2009.5206594"},{"issue":"3","key":"2422_CR49","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1109\/TPAMI.2013.140","volume":"36","author":"CH Lampert","year":"2013","unstructured":"Lampert, C. H., Nickisch, H., & Harmeling, S. (2013). Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3), 453\u2013465.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2422_CR50","unstructured":"Larochelle, H., Erhan, D., Bengio, Y.( 2008) Zero-data learning of new tasks. In: AAAI, vol. 1, p. 3"},{"key":"2422_CR51","unstructured":"Li, Y., Li, Z., Zeng, Q., Hou, Q., Cheng, M.-M.(2024) Cascade-clip: Cascaded vision-language embeddings alignment for zero-shot semantic segmentation. arXiv preprint arXiv:2406.00670"},{"key":"2422_CR52","doi-asserted-by":"crossref","unstructured":"Li, A., Luo, T., Lu, Z., Xiang, T., & Wang, L. (2019). Large-scale few-shot learning: Knowledge transfer with class hierarchy. In Proceedings of the Ieee\/cvf conference on computer vision and pattern recognition pp. 7212\u2013 7220.","DOI":"10.1109\/CVPR.2019.00738"},{"key":"2422_CR53","doi-asserted-by":"publisher","first-page":"2810","DOI":"10.1007\/s11263-020-01342-x","volume":"128","author":"A Li","year":"2020","unstructured":"Li, A., Lu, Z., Guan, J., Xiang, T., Wang, L., & Wen, J.-R. (2020). Transferrable feature and projection learning with class hierarchy for zero-shot learning. International Journal of Computer Vision, 128, 2810\u20132827.","journal-title":"International Journal of Computer Vision"},{"key":"2422_CR54","doi-asserted-by":"crossref","unstructured":"Liu, S., Chen, J., Pan, L., Ngo, C.-W., Chua, T.-S., & Jiang, Y.-G. (2020). Hyperbolic visual embedding learning for zero-shot recognition. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 
9273\u2013 9281.","DOI":"10.1109\/CVPR42600.2020.00929"},{"key":"2422_CR55","unstructured":"Liu, S., Long, M., Wang, J., & Jordan, M.I. (2018). Generalized zero-shot learning with deep calibration network. Advances in Neural Information Processing Systems Vol. 31"},{"key":"2422_CR56","doi-asserted-by":"crossref","unstructured":"Long, T., Mettes, P., Shen, H.T., Snoek, C.G.( 2020) Searching for actions on the hyperbole. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 1141\u2013 1150","DOI":"10.1109\/CVPR42600.2020.00122"},{"key":"2422_CR57","doi-asserted-by":"crossref","unstructured":"Ma, Y., Xu, G., Sun, X., Yan, M., Zhang, J., & Ji, R (2022). X-clip: End-to-end multi-grained contrastive learning for video-text retrieval. In Proceedings of the 30th ACM international conference on multimedia pp. 638\u2013 647.","DOI":"10.1145\/3503161.3547910"},{"key":"2422_CR58","doi-asserted-by":"crossref","unstructured":"Mensink, T., Gavves, E., & Snoek, C.G. (2014). Costa: Co-occurrence statistics for zero-shot classification. In Proceedings of the IEEE conference on computer vision and pattern recognition pp. 2441\u2013 2448.","DOI":"10.1109\/CVPR.2014.313"},{"key":"2422_CR59","doi-asserted-by":"crossref","unstructured":"Moreira, G., Marques, M., Costeira, J.P., & Hauptmann, A. (2024) Hyperbolic vs euclidean embeddings in few-shot learning: Two sides of the same coin. In Proceedings of the IEEE\/CVF Winter conference on applications of computer vision pp. 2082\u2013 2090.","DOI":"10.1109\/WACV57701.2024.00208"},{"key":"2422_CR60","doi-asserted-by":"crossref","unstructured":"Narayan, S., Gupta, A., Khan, F.S., Snoek, C.G., & Shao, L. (2020). Latent embedding feedback and discriminative features for zero-shot classification. In: Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXII 16, pp. 479\u2013 495 . 
Springer","DOI":"10.1007\/978-3-030-58542-6_29"},{"key":"2422_CR61","doi-asserted-by":"publisher","first-page":"94215","DOI":"10.1109\/ACCESS.2019.2928130","volume":"7","author":"LT Nguyen","year":"2019","unstructured":"Nguyen, L. T., Kim, J., & Shim, B. (2019). Low-rank matrix completion: a contemporary survey. IEEE Access, 7, 94215\u201394237.","journal-title":"IEEE Access"},{"key":"2422_CR62","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1007\/s11263-013-0695-z","volume":"108","author":"G Patterson","year":"2014","unstructured":"Patterson, G., Xu, C., Su, H., & Hays, J. (2014). The sun attribute database: Beyond categories for deeper scene understanding. International Journal of Computer Vision, 108, 59\u201381.","journal-title":"International Journal of Computer Vision"},{"key":"2422_CR63","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J., et al. (2021).Learning transferable visual models from natural language supervision. In International conference on machine learning pp. 8748\u2013 8763. PmLR."},{"key":"2422_CR64","doi-asserted-by":"crossref","unstructured":"Ratcliffe, J.G., Axler, S., & Ribet, K.A. (1994). Foundations of Hyperbolic Manifolds vol. 149. Springer.","DOI":"10.1007\/978-1-4757-4013-4"},{"key":"2422_CR65","doi-asserted-by":"crossref","unstructured":"Reed, S., Akata, Z., Lee, H., & Schiele, B. (2016). Learning deep representations of fine-grained visual descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition pp. 49\u2013 58.","DOI":"10.1109\/CVPR.2016.13"},{"key":"2422_CR66","doi-asserted-by":"crossref","unstructured":"Rohrbach, M., Stark, M., Schiele, B.( 2011) Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In: CVPR 2011, pp. 1641\u2013 1648 . IEEE","DOI":"10.1109\/CVPR.2011.5995627"},{"key":"2422_CR67","unstructured":"Romera-Paredes, B., & Torr, P. 
(2015) An embarrassingly simple approach to zero-shot learning. In International conference on machine learning pp. 2152\u2013 2161. PMLR."},{"key":"2422_CR68","doi-asserted-by":"crossref","unstructured":"Schonfeld, E., Ebrahimi, S., Sinha, S., Darrell, T., & Akata, Z. (2019). Generalized zero-and few-shot learning via aligned variational autoencoders. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 8247\u2013 8255.","DOI":"10.1109\/CVPR.2019.00844"},{"key":"2422_CR69","doi-asserted-by":"crossref","unstructured":"Sharma, P., Ding, N., Goodman, S., & Soricut, R. (2018). Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 2556\u2013 2565.","DOI":"10.18653\/v1\/P18-1238"},{"key":"2422_CR70","doi-asserted-by":"crossref","unstructured":"Shen, Y., Qin, J., Huang, L., Liu, L., Zhu, F., & Shao, L. (2020). Invertible zero-shot recognition flows. In: Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XVI 16, pp. 614\u2013 631. Springer","DOI":"10.1007\/978-3-030-58517-4_36"},{"issue":"2","key":"2422_CR71","doi-asserted-by":"publisher","first-page":"634","DOI":"10.1109\/TCSVT.2021.3067067","volume":"32","author":"J Shen","year":"2021","unstructured":"Shen, J., Xiao, Z., Zhen, X., & Zhang, L. (2021). Spherical zero-shot learning. IEEE Transactions on Circuits and Systems for Video Technology, 32(2), 634\u2013645.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"2422_CR72","unstructured":"Socher, R., Ganjoo, M., Manning, C.D., & Ng, A. (2013). Zero-shot learning through cross-modal transfer. Advances in Neural Information Processing Systems Vol. 
26."},{"key":"2422_CR73","doi-asserted-by":"crossref","unstructured":"Subramanian, S., Merrill, W., Darrell, T., Gardner, M., Singh, S., & Rohrbach, A. (2022). Reclip: A strong zero-shot baseline for referring expression comprehension. arXiv preprint arXiv:2204.05991","DOI":"10.18653\/v1\/2022.acl-long.357"},{"key":"2422_CR74","doi-asserted-by":"crossref","unstructured":"Tabaghi, P., & Dokmani\u0107, I. (2020). Hyperbolic distance matrices. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining pp. 1728\u2013 1738.","DOI":"10.1145\/3394486.3403224"},{"key":"2422_CR75","doi-asserted-by":"publisher","first-page":"5108","DOI":"10.1609\/aaai.v38i6.28316","volume":"38","author":"B Tang","year":"2024","unstructured":"Tang, B., Zhang, J., Yan, L., Yu, Q., Sheng, L., & Xu, D. (2024). Data-free generalized zero-shot learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38, 5108\u20135117.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"issue":"1","key":"2422_CR76","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2018.161","volume":"5","author":"P Tschandl","year":"2018","unstructured":"Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5(1), 1\u20139.","journal-title":"Scientific Data"},{"key":"2422_CR77","doi-asserted-by":"crossref","unstructured":"Veiga, R.J., & Rodrigues, J.M. (2024). Fine-grained fish classification from small to large datasets with vision transformers. IEEE Access.","DOI":"10.1109\/ACCESS.2024.3443654"},{"key":"2422_CR78","doi-asserted-by":"crossref","unstructured":"Verma, V.K., Arora, G., Mishra, A., & Rai, P. (2018). Generalized zero-shot learning via synthesized examples. In Proceedings of the IEEE conference on computer vision and pattern recognition pp. 
4281\u2013 4289.","DOI":"10.1109\/CVPR.2018.00450"},{"key":"2422_CR79","doi-asserted-by":"crossref","unstructured":"Vyas, M.R., Venkateswara, H., & Panchanathan, S. (2020). Leveraging seen and unseen semantic relationships for generative zero-shot learning. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXX 16, pp. 70\u2013 86. Springer.","DOI":"10.1007\/978-3-030-58577-8_5"},{"key":"2422_CR80","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The caltech-ucsd birds-200-2011 dataset."},{"key":"2422_CR81","doi-asserted-by":"crossref","unstructured":"Walker, J.L., & Orenstein, E.C. (2021). Improving rare-class recognition of marine plankton with hard negative mining. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 3672\u2013 3682.","DOI":"10.1109\/ICCVW54120.2021.00410"},{"key":"2422_CR82","doi-asserted-by":"crossref","unstructured":"Wang, Y., Kwok, J.T., Yao, Q., & Ni, L.M. (2017). Zero-shot learning with a partial set of observed attributes. In 2017 international joint conference on neural networks (IJCNN), pp. 3777\u2013 3784. IEEE.","DOI":"10.1109\/IJCNN.2017.7966332"},{"key":"2422_CR83","doi-asserted-by":"crossref","unstructured":"Wang, H., Li, Y., Yao, H., & Li, X. (2023b). Clipn for zero-shot ood detection: Teaching clip to say no. In Proceedings of the IEEE\/CVF international conference on computer vision pp. 1802\u2013 1812.","DOI":"10.1109\/ICCV51070.2023.00173"},{"key":"2422_CR84","unstructured":"Wang, Z., Liang, J., He, R., Xu, N., Wang, Z., & Tan, T. (2023a). Improving zero-shot generalization for clip with synthesized prompts. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) pp. 3032\u2013 3042."},{"key":"2422_CR85","unstructured":"Wang, M., Xing, J., & Liu, Y. (2021). Actionclip: A new paradigm for video action recognition. 
arXiv preprint arXiv:2109.08472"},{"key":"2422_CR86","doi-asserted-by":"publisher","first-page":"356","DOI":"10.1007\/s11263-017-1027-5","volume":"124","author":"Q Wang","year":"2017","unstructured":"Wang, Q., & Chen, K. (2017). Zero-shot visual recognition via bidirectional latent embedding. International Journal of Computer Vision, 124, 356\u2013383.","journal-title":"International Journal of Computer Vision"},{"key":"2422_CR87","doi-asserted-by":"crossref","unstructured":"Wu, T.-Y., Morgado, P., Wang, P., Ho, C.-H., & Vasconcelos, N. (2020). Solving long-tailed recognition with deep realistic taxonomic classifier. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part VIII 16, pp. 171\u2013 189. Springer.","DOI":"10.1007\/978-3-030-58598-3_11"},{"key":"2422_CR88","doi-asserted-by":"crossref","unstructured":"Xian, Y., Lorenz, T., Schiele, B., & Akata, Z. (2018b). Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition pp. 5542\u2013 5551.","DOI":"10.1109\/CVPR.2018.00581"},{"issue":"9","key":"2422_CR89","doi-asserted-by":"publisher","first-page":"2251","DOI":"10.1109\/TPAMI.2018.2857768","volume":"41","author":"Y Xian","year":"2018","unstructured":"Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2018). Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251\u20132265.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2422_CR90","doi-asserted-by":"crossref","unstructured":"Xie, G.-S., Liu, L., Jin, X., Zhu, F., Zhang, Z., Qin, J., Yao, Y., & Shao, L. (2019). Attentive region embedding network for zero-shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 
9384\u2013 9393.","DOI":"10.1109\/CVPR.2019.00961"},{"key":"2422_CR91","doi-asserted-by":"crossref","unstructured":"Xie, G.-S., Liu, L., Zhu, F., Zhao, F., Zhang, Z., Yao, Y., Qin, J., & Shao, L. (2020). Region graph embedding network for zero-shot learning. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part IV 16, pp. 562\u2013 580. Springer.","DOI":"10.1007\/978-3-030-58548-8_33"},{"key":"2422_CR92","doi-asserted-by":"crossref","unstructured":"Xu, W., Xian, Y., Wang, J., Schiele, B., & Akata, Z. (2022). Vgse: Visually-grounded semantic embeddings for zero-shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 9316\u2013 9325.","DOI":"10.1109\/CVPR52688.2022.00910"},{"key":"2422_CR93","unstructured":"Xu, W., Xian, Y., Wang, J., Schiele, B., & Akata, Z. (2020). Attribute prototype network for zero-shot learning. Advances in Neural Information Processing Systems, 33, 21969\u201321980."},{"key":"2422_CR94","doi-asserted-by":"publisher","first-page":"107852","DOI":"10.1016\/j.compag.2023.107852","volume":"209","author":"T Yang","year":"2023","unstructured":"Yang, T., Zhou, S., Huang, Z., Xu, A., Ye, J., & Yin, J. (2023). Urban street tree dataset for image classification and instance segmentation. Computers and Electronics in Agriculture, 209, 107852.","journal-title":"Computers and Electronics in Agriculture"},{"key":"2422_CR95","doi-asserted-by":"crossref","unstructured":"Yu, Y., Ji, Z., Han, J., & Zhang, Z. (2020). Episode-based prototype generating network for zero-shot learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 14035\u2013 14044.","DOI":"10.1109\/CVPR42600.2020.01405"},{"key":"2422_CR96","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Lei, Y., Zhang, B., Liu, L., & Liu, Y. (2023). Zegclip: Towards adapting clip for zero-shot semantic segmentation. 
In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition pp. 11175\u2013 11185.","DOI":"10.1109\/CVPR52729.2023.01075"},{"key":"2422_CR97","doi-asserted-by":"crossref","unstructured":"Zhou, C., Loy, C.C., & Dai, B. (2022). Extract free dense labels from clip. In European conference on computer vision, pp. 696\u2013 712. Springer.","DOI":"10.1007\/978-3-031-19815-1_40"},{"key":"2422_CR98","doi-asserted-by":"crossref","unstructured":"Zhu, P., Wang, H., & Saligrama, V. (2019). Generalized zero-shot recognition based on visually semantic embedding. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 2995\u2013 3003.","DOI":"10.1109\/CVPR.2019.00311"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02422-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02422-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02422-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T14:46:25Z","timestamp":1757169985000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02422-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,17]]},"references-count":98,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["2422"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02422-6","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-p
arts":[[2025,5,17]]},"assertion":[{"value":"22 July 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}