{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T05:20:14Z","timestamp":1763011214217,"version":"3.45.0"},"reference-count":81,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T00:00:00Z","timestamp":1762819200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004329","name":"The Slovenian Research and Innovation Agency","doi-asserted-by":"publisher","award":["GC-0002","P2-0103","PR-12394","L2\u201150070"],"award-info":[{"award-number":["GC-0002","P2-0103","PR-12394","L2\u201150070"]}],"id":[{"id":"10.13039\/501100004329","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Amid ongoing efforts to develop extremely large, multimodal models, there is increasing interest in efficient Small Language Models (SLMs) that can operate without reliance on large data-centre infrastructure. However, recent SLMs (e.g., LLaMA or Phi) with up to three billion parameters are predominantly trained in high-resource languages, such as English, which limits their applicability to industries that require robust NLP solutions for less-represented languages and low-resource settings, particularly those requiring low latency and adaptability to evolving label spaces. This paper examines a retrieval-based approach to multi-label text classification (MLC) for a media monitoring dataset, with a particular focus on less-represented languages, such as Slovene. This dataset presents an extreme MLC challenge, with instances labelled using up to twelve thousand categories. 
The proposed method, which combines retrieval with computationally efficient prediction, effectively addresses challenges related to multilinguality, resource constraints, and frequent label changes. We adopt a model-agnostic approach that does not rely on a specific model architecture or language selection. Our results demonstrate that techniques from the extreme multi-label text classification (XMC) domain outperform traditional Transformer-based encoder models, particularly in handling dynamic label spaces without requiring continuous fine-tuning. Additionally, we highlight the effectiveness of this approach in scenarios involving rare labels, where baseline models struggle with generalisation.<\/jats:p>","DOI":"10.3390\/make7040142","type":"journal-article","created":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T12:44:55Z","timestamp":1762865095000},"page":"142","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Extreme Multi-Label Text Classification for Less-Represented Languages and Low-Resource Environments: Advances and Lessons Learned"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-8016-9530","authenticated-orcid":false,"given":"Nikola","family":"Iva\u010di\u010d","sequence":"first","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia"},{"name":"Jo\u017eef Stefan International Postgraduate School, Jamova Cesta 39, 1000 Ljubljana, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9916-8756","authenticated-orcid":false,"given":"Bla\u017e","family":"\u0160krlj","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, 
Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7330-0579","authenticated-orcid":false,"given":"Boshko","family":"Koloski","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia"},{"name":"Jo\u017eef Stefan International Postgraduate School, Jamova Cesta 39, 1000 Ljubljana, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4380-0863","authenticated-orcid":false,"given":"Senja","family":"Pollak","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia"},{"name":"Jo\u017eef Stefan International Postgraduate School, Jamova Cesta 39, 1000 Ljubljana, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9995-7093","authenticated-orcid":false,"given":"Nada","family":"Lavra\u010d","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia"},{"name":"Faculty of Computer and Information Science, University of Ljubljana, Ve\u010dna pot 113, 1000 Ljubljana, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2297-1273","authenticated-orcid":false,"given":"Matthew","family":"Purver","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia"},{"name":"School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1613\/jair.4780","article-title":"News across languages-cross-lingual document similarity and event tracking","volume":"55","author":"Rupnik","year":"2016","journal-title":"J. Artif. Intell. Res."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Guo, F., Shen, J., and Han, J. (2022, January 14\u201318). Unsupervised Key Event Detection from Massive Text Corpora. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD \u201922, Washington, DC, USA.","DOI":"10.1145\/3534678.3539395"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Yoon, S., Lee, D., Zhang, Y., and Han, J. (2023, January 23\u201327). Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR \u201923, Taipei, Taiwan.","DOI":"10.1145\/3539618.3591782"},{"key":"ref_4","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_5","unstructured":"Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P.J. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv."},{"key":"ref_6","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020, January 6\u201312). 
Language Models are Few-Shot Learners. Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual."},{"key":"ref_7","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Bhosale, S. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv."},{"key":"ref_8","unstructured":"Tarekegn, A.N., Ullah, M., and Cheikh, F.A. (2024). Deep Learning for Multi-Label Learning: A Comprehensive Survey. arXiv."},{"key":"ref_9","first-page":"1","article-title":"A Survey on Text Classification: From Traditional to Deep Learning","volume":"13","author":"Li","year":"2022","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_10","unstructured":"Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., and Fan, A. (2024). The Llama 3 Herd of Models. arXiv."},{"key":"ref_11","unstructured":"OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., and Altman, S. (2023). GPT-4 Technical Report. arXiv."},{"key":"ref_12","unstructured":"Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., and Millican, K. (2023). Gemini: A Family of Highly Capable Multimodal Models. arXiv."},{"key":"ref_13","unstructured":"Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R.J., Javaheripi, M., and Kauffmann, P. (2024). Phi-4 Technical Report. arXiv."},{"key":"ref_14","unstructured":"Team, G., Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ram\u00e9, A., and Rivi\u00e8re, M. (2025). Gemma 3 technical report. arXiv."},{"key":"ref_15","unstructured":"Subramanian, S., Elango, V., and Gungor, M. (2025). Small Language Models (SLMs) Can Still Pack a Punch: A survey. 
arXiv."},{"key":"ref_16","unstructured":"Vajjala, S., and Shimangaud, S. (2025). Text Classification in the LLM Era\u2014Where do we stand?. arXiv."},{"key":"ref_17","unstructured":"Muralidharan, S., Sreenivas, S.T., Joshi, R., Chochowski, M., Patwary, M., Shoeybi, M., Catanzaro, B., Kautz, J., and Molchanov, P. (2024). Compact Language Models via Pruning and Knowledge Distillation. arXiv."},{"key":"ref_18","unstructured":"Gu, Y., Dong, L., Wei, F., and Huang, M. (2023). MiniLLM: Knowledge Distillation of Large Language Models. arXiv."},{"key":"ref_19","unstructured":"Malinovskii, V., Mazur, D., Ilin, I., Kuznedelev, D., Burlachenko, K., Yi, K., Alistarh, D., and Richtarik, P. (2024). PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"35621","DOI":"10.1109\/ACCESS.2025.3544814","article-title":"LLM Teacher-Student Framework for Text Classification with No Manually Annotated Data: A Case Study in IPTC News Topic Classification","volume":"13","author":"Kuzman","year":"2025","journal-title":"IEEE Access"},{"key":"ref_21","unstructured":"Dasgupta, A., Lamba, P., Kushwaha, A., Ravish, K., Katyan, S., Das, S., and Kumar, P. (2023). Review of Extreme Multilabel Classification. arXiv."},{"key":"ref_22","unstructured":"Wang, Y.S., Chang, W.C., Jiang, J.Y., Zhang, J., Yu, H.F., and Vishwanathan, S.V.N. (2025). Retrieval-augmented Encoders for Extreme Multi-label Text Classification. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Dai, X., Chalkidis, I., Darkner, S., and Elliott, D. (2022, January 7\u201311). Revisiting Transformer-based Models for Long Document Classification. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.findings-emnlp.534"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Duan, L., You, Q., Wu, X., and Sun, J. (2022). 
Multilabel Text Classification Algorithm Based on Fusion of Two-Stream Transformer. Electronics, 11.","DOI":"10.3390\/electronics11142138"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.neucom.2021.10.099","article-title":"Co-attention network with label embedding for text classification","volume":"471","author":"Liu","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"van der Aalst, W.M.P., Batagelj, V., Ignatov, D.I., Khachay, M., Koltsova, O., Kutuzov, A., Kuznetsov, S.O., Lomazova, I.A., Loukachevitch, N., and Napoli, A. (2021). BERT for Sequence-to-Sequence Multi-label Text Classification. Analysis of Images, Social Networks and Texts, Springer.","DOI":"10.1007\/978-3-030-72610-2"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Fallah, H., Bruno, E., Bellot, P., and Murisasco, E. (2023, January 22\u201325). Exploiting Label Dependencies for Multi-Label Document Classification Using Transformers. Proceedings of the ACM Symposium on Document Engineering 2023, DocEng \u201923, Limerick, Ireland.","DOI":"10.1145\/3573128.3609356"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, B., Chen, Y., and Zeng, L. (2024, January 14\u201319). Kenet:Knowledge-Enhanced DOC-Label Attention Network for Multi-Label Text Classification. Proceedings of the ICASSP 2024\u20142024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea.","DOI":"10.1109\/ICASSP48485.2024.10447643"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., and Birch, A. (2016, January 7\u201312). Neural Machine Translation of Rare Words with Subword Units. 
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany.","DOI":"10.18653\/v1\/P16-1162"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kudo, T., and Richardson, J. (November, January 31). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-2012"},{"key":"ref_31","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Park, H., Vyas, Y., and Shah, K. (2022, January 22\u201327). Efficient Classification of Long Documents Using Transformers. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-short.79"},{"key":"ref_33","unstructured":"Beltagy, I., Peters, M.E., and Cohan, A. (2020). Longformer: The Long-Document Transformer. arXiv."},{"key":"ref_34","unstructured":"Zaheer, M., Guruganesh, G., Dubey, A., Ainslie, J., Alberti, C., Ontanon, S., Pham, P., Ravula, A., Wang, Q., and Yang, L. (2020). Big Bird: Transformers for Longer Sequences. arXiv."},{"key":"ref_35","unstructured":"Ding, M., Zhou, C., Yang, H., and Tang, J. (2020, January 6\u201312). CogLTX: Applying BERT to Long Texts. 
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Pappagari, R., Zelasko, P., Villalba, J., Carmiel, Y., and Dehak, N. (2019, January 14\u201318). Hierarchical Transformers for Long Document Classification. Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore.","DOI":"10.1109\/ASRU46091.2019.9003958"},{"key":"ref_37","unstructured":"Jaiswal, A., and Milios, E. (2023). Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT. arXiv."},{"key":"ref_38","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R.B., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Cui, Y., Jia, M., Lin, T., Song, Y., and Belongie, S.J. (2019, January 16\u201320). Class-Balanced Loss Based on Effective Number of Samples. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00949"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.M. (2020, January 23\u201328). Distribution-Balanced Loss for Multi-label Classification in Long-Tailed Datasets. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58542-6"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Huang, Y., Giledereli, B., K\u00f6ksal, A., \u00d6zg\u00fcr, A., and Ozkirimli, E. (2021, January 7\u201311). 
Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.643"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Piskorski, J., Stefanovitch, N., Da San Martino, G., and Nakov, P. (2023, January 13\u201314). SemEval-2023 task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup. Proceedings of the 17th International Workshop on Semantic Evaluation, SemEval\u201923, Toronto, Canada.","DOI":"10.18653\/v1\/2023.semeval-1.317"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Liao, Q., Lai, M., and Nakov, P. (2023). MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning. arXiv.","DOI":"10.18653\/v1\/2023.semeval-1.10"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Reiter-Haas, M., Ertl, A., Innerhofer, K., and Lex, E. (2023). mCPT at SemEval-2023 Task 3: Multilingual Label-Aware Contrastive Pre-Training of Transformers for Few- and Zero-shot Framing Detection. arXiv.","DOI":"10.18653\/v1\/2023.semeval-1.130"},{"key":"ref_46","unstructured":"Tunstall, L., Reimers, N., Jo, U.E.S., Bates, L., Korat, D., Wasserblat, M., and Pereg, O. (2022). Efficient Few-Shot Learning Without Prompts. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zheng, L., Xiong, J., Zhu, Y., and He, J. (2022, January 14\u201318). Contrastive Learning with Complex Heterogeneity. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD \u201922, Washington, DC, USA.","DOI":"10.1145\/3534678.3539311"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019, January 3\u20137). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhang, Y., Long, D., Xie, W., Dai, Z., Tang, J., Lin, H., Yang, B., Xie, P., and Huang, F. (2024, January 12\u201316). mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-industry.103"},{"key":"ref_50","unstructured":"Sturua, S., Mohr, I., Akram, M.K., G\u00fcnther, M., Wang, B., Krimmel, M., Wang, F., Mastrapas, G., Koukounas, A., and Koukounas, A. (2024). jina-embeddings-v3: Multilingual Embeddings with Task LoRA. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2024, January 14\u201316). M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.137"},{"key":"ref_52","unstructured":"Dahiya, K., Gupta, N., Saini, D., Soni, A., Wang, Y., Dave, K., Jiao, J., K, G., Dey, P., and Singh, A. (March, January 27). NGAME: Negative Mining-aware Mini-batching for Extreme Classification. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM \u201923, Singapore."},{"key":"ref_53","unstructured":"You, R., Zhang, Z., Wang, Z., Dai, S., Mamitsuka, H., and Zhu, S. (2019, January 8\u201314). AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification. 
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Chang, W., Yu, H., Zhong, K., Yang, Y., and Dhillon, I.S. (2020, January 23\u201327). Taming Pretrained Transformers for Extreme Multi-label Text Classification. Proceedings of the KDD \u201920: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual.","DOI":"10.1145\/3394486.3403368"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Jiang, T., Wang, D., Sun, L., Yang, H., Zhao, Z., and Zhuang, F. (2021). LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification. arXiv.","DOI":"10.1609\/aaai.v35i9.16974"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Zhang, R., Wang, Y.S., Yang, Y., Yu, D., Vu, T., and Lei, L. (2023, January 2\u20136). Long-tailed Extreme Multi-label Text Classification by the Retrieval of Generated Pseudo Label Descriptions. Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, Dubrovnik, Croatia.","DOI":"10.18653\/v1\/2023.findings-eacl.81"},{"key":"ref_57","unstructured":"van den Oord, A., Li, Y., and Vinyals, O. (2018). Representation Learning with Contrastive Predictive Coding. arXiv."},{"key":"ref_58","unstructured":"Gupta, N., Khatri, D., Rawat, A.S., Bhojanapalli, S., Jain, P., and Dhillon, I. (2024, January 7\u201311). Dual-encoders for Extreme Multi-label Classification. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_59","unstructured":"Magueresse, A., Carles, V., and Heetderks, E. (2020). Low-resource Languages: A Review of Past Work and Future Challenges. 
arXiv."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1017\/nlp.2024.33","article-title":"Natural language processing applications for low-resource languages","volume":"31","author":"Pakray","year":"2025","journal-title":"Nat. Lang. Process."},{"key":"ref_61","unstructured":"Chalkidis, I., Fergadiotis, E., Malakasiotis, P., and Androutsopoulos, I. (August, January 28). Large-Scale Multi-Label Text Classification on EU Legislation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_62","unstructured":"Xiong, L., Xiong, C., Li, Y., Tang, K.F., Liu, J., Bennett, P.N., Ahmed, J., and Overwijk, A. (2021, January 3\u20137). Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. Proceedings of the International Conference on Learning Representations, Virtual."},{"key":"ref_63","unstructured":"Teredesai, A., Kumar, V., Li, Y., Rosales, R., Terzi, E., and Karypis, G. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, 4\u20138 August 2019, ACM."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","article-title":"The Regression Analysis of Binary Sequences (with Discussion)","volume":"20","author":"Cox","year":"1958","journal-title":"J. R. Stat. Soc. B"},{"key":"ref_66","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020, January 5\u201310). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"2038","DOI":"10.1016\/j.patcog.2006.12.019","article-title":"ML-KNN: A lazy learning approach to multi-label learning","volume":"40","author":"Zhang","year":"2007","journal-title":"Pattern Recognit."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1109\/TBDATA.2019.2921572","article-title":"Billion-scale similarity search with GPUs","volume":"7","author":"Johnson","year":"2019","journal-title":"IEEE Trans. Big Data"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazar\u00e9, P.E., Lomeli, M., Hosseini, L., and J\u00e9gou, H. (2024). The Faiss library. arXiv.","DOI":"10.1109\/TBDATA.2025.3618474"},{"key":"ref_71","unstructured":"Lu, Z., Li, X., Cai, D., Yi, R., Liu, F., Zhang, X., Lane, N.D., and Xu, M. (2024). Small Language Models: Survey, Measurements, and Insights. arXiv."},{"key":"ref_72","unstructured":"Enevoldsen, K., Chung, I., Kerboua, I., Kardos, M., Mathur, A., Stap, D., Gala, J., Siblini, W., Krzemi\u0144ski, D., and Winata, G.I. (2025). MMTEB: Massive Multilingual Text Embedding Benchmark. arXiv."},{"key":"ref_73","unstructured":"Jang, S., and Morabito, R. (2025). Edge-First Language Model Inference: Models, Metrics, and Tradeoffs. arXiv."},{"key":"ref_74","unstructured":"Bucher, M.J.J., and Martini, M. (2024). Fine-Tuned \u2019Small\u2019 LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification. 
arXiv."},{"key":"ref_75","unstructured":"Galke, L., Scherp, A., Diera, A., Karl, F., Lin, B.X., Khera, B., Meuser, T., and Singhal, T. (2022). Are We Really Making Much Progress in Text Classification? A Comparative Review. arXiv."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Chang, T.A., Arnett, C., Tu, Z., and Bergen, B.K. (2023). When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-main.236"},{"key":"ref_77","unstructured":"Gupta, V., Chowdhury, S.P., Zouhar, V., Rooein, D., and Sachan, M. (2025). Multilingual Performance Biases of Large Language Models in Education. arXiv."},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"107965","DOI":"10.1016\/j.patcog.2021.107965","article-title":"A review of methods for imbalanced multi-label classification","volume":"118","author":"Tarekegn","year":"2021","journal-title":"Pattern Recognit."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"53","DOI":"10.21528\/LNLM-vol12-no1-art4","article-title":"Cardinality and Density Measures and Their Influence to Multi-Label Learning Methods","volume":"12","author":"Bernardini","year":"2014","journal-title":"Learn. Nonlinear Model."},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Raschka, S., Patterson, J., and Nolet, C. (2020). Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence. arXiv.","DOI":"10.3390\/info11040193"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-Art Natural Language Processing. 
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/142\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T05:17:01Z","timestamp":1763011021000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/142"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,11]]},"references-count":81,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040142"],"URL":"https:\/\/doi.org\/10.3390\/make7040142","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,11,11]]}}}