{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:53:13Z","timestamp":1778082793942,"version":"3.51.4"},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T00:00:00Z","timestamp":1745366400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T00:00:00Z","timestamp":1745366400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2025,11]]},"DOI":"10.1007\/s41060-025-00774-3","type":"journal-article","created":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T16:11:38Z","timestamp":1745424698000},"page":"5377-5398","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs"],"prefix":"10.1007","volume":"20","author":[{"given":"Shan","family":"Zhong","sequence":"first","affiliation":[]},{"given":"Jiahao","family":"Zeng","sequence":"additional","affiliation":[]},{"given":"Yongxin","family":"Yu","sequence":"additional","affiliation":[]},{"given":"Bohong","family":"Lin","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,23]]},"reference":[{"key":"774_CR1","doi-asserted-by":"crossref","unstructured":"Yang, Weiyi, Zhang, Richong, Chen, Junfan, Wang, Lihong, Kim, Jaein: Prototype-guided pseudo labeling for semi-supervised text classification. 
In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16369\u201316382, Toronto, Canada, (July 2023) Association for Computational Linguistics","DOI":"10.18653\/v1\/2023.acl-long.904"},{"issue":"9","key":"774_CR2","doi-asserted-by":"publisher","first-page":"9401","DOI":"10.1007\/s10462-023-10393-8","volume":"56","author":"JM Duarte","year":"2023","unstructured":"Duarte, J.M., Berton, L.: A review of semi-supervised learning for text classification. Artif. Intell. Rev. 56(9), 9401\u20139469 (2023)","journal-title":"Artif. Intell. Rev."},{"key":"774_CR3","unstructured":"Brown, Tom, Mann, Benjamin, Ryder, Nick, Subbiah, Melanie, Kaplan, Jared\u00a0D., Dhariwal, Prafulla, Neelakantan, Arvind, Shyam, Pranav, Sastry, Girish, Askell, Amanda, Agarwal, Sandhini, Herbert-Voss, Ariel, Krueger, Gretchen, Henighan, Tom, Child, Rewon, Ramesh, Aditya, Ziegler, Daniel, Wu, Jeffrey, Winter, Clemens, Hesse, Chris, Chen, Mark, Sigler, Eric, Litwin, Mateusz, Gray, Scott, Chess, Benjamin, Clark, Jack, Berner, Christopher, McCandlish, Sam, Radford, Alec, Sutskever, Ilya, Amodei, Dario: Language models are few-shot learners. In H.\u00a0Larochelle, M.\u00a0Ranzato, R.\u00a0Hadsell, M.F. Balcan, and H.\u00a0Lin, editors, Advances in Neural Information Processing Systems, volume\u00a033, pages 1877\u20131901. Curran Associates, Inc., (2020)"},{"key":"774_CR4","unstructured":"Wei, Jason, Wang, Xuezhi, Schuurmans, Dale, Bosma, Maarten, Ichter, Brian, Xia, Fei, Chi, Ed, Le, Quoc\u00a0V, Zhou, Denny: Chain-of-thought prompting elicits reasoning in large language models. In S.\u00a0Koyejo, S.\u00a0Mohamed, A.\u00a0Agarwal, D.\u00a0Belgrave, K.\u00a0Cho, and A.\u00a0Oh, editors, Advances in Neural Information Processing Systems, volume\u00a035, pages 24824\u201324837. 
Curran Associates, Inc., (2022)"},{"key":"774_CR5","unstructured":"Yao, Shunyu, Yu, Dian, Zhao, Jeffrey, Shafran, Izhak, Griffiths, Tom, Cao, Yuan, Narasimhan, Karthik: Tree of thoughts: Deliberate problem solving with large language models. In A.\u00a0Oh, T.\u00a0Naumann, A.\u00a0Globerson, K.\u00a0Saenko, M.\u00a0Hardt, and S.\u00a0Levine, editors, Advances in Neural Information Processing Systems, volume\u00a036, pages 11809\u201311822. Curran Associates, Inc., (2023)"},{"key":"774_CR6","unstructured":"Ahn, Michael, Brohan, Anthony, Brown, Noah, Chebotar, Yevgen, Cortes, Omar, David, Byron, Finn, Chelsea, Fu, Chuyuan, Gopalakrishnan, Keerthana, Hausman, Karol, Herzog, Alex, Ho, Daniel, Hsu, Jasmine, Ibarz, Julian, Ichter, Brian, Irpan, Alex, Jang, Eric, Ruano, Rosario\u00a0Jauregui, Jeffrey, Kyle, Jesmonth, Sally, Joshi, Nikhil\u00a0J, Julian, Ryan, Kalashnikov, Dmitry, Kuang, Yuheng, Lee, Kuang-Huei, Levine, Sergey, Lu, Yao, Luu, Linda, Parada, Carolina, Pastor, Peter, Quiambao, Jornell, Rao, Kanishka, Rettinghouse, Jarek, Reyes, Diego, Sermanet, Pierre, Sievers, Nicolas, Tan, Clayton, Toshev, Alexander, Vanhoucke, Vincent, Xia, Fei, Xiao, Ted, Xu, Peng, Xu, Sichun, Yan, Mengyuan, Zeng, Andy: Do as i can, not as i say: Grounding language in robotic affordances, (2022)"},{"key":"774_CR7","unstructured":"Talmor, Alon, Yoran, Ori, Bras, Ronan Le, Bhagavatula, Chandra, Goldberg, Yoav, Choi, Yejin, Berant, Jonathan: CommonsenseQA 2.0: Exposing the Limits of AI through Gamification. arXiv e-prints, page arXiv:2201.05320, (January 2022)"},{"key":"774_CR8","unstructured":"Wang, Xuezhi, Wei, Jason, Schuurmans, Dale, Le, Quoc, Chi, Ed, Narang, Sharan, Chowdhery, Aakanksha, Zhou, Denny: Self-consistency improves chain of thought reasoning in language models, (2023)"},{"key":"774_CR9","unstructured":"Meng, Yu, Huang, Jiaxin, Zhang, Yu, Han, Jiawei: Generating training data with language models: Towards zero-shot language understanding. 
In S.\u00a0Koyejo, S.\u00a0Mohamed, A.\u00a0Agarwal, D.\u00a0Belgrave, K.\u00a0Cho, and A.\u00a0Oh, editors, Advances in Neural Information Processing Systems, volume\u00a035, pages 462\u2013477. Curran Associates, Inc., (2022)"},{"key":"774_CR10","doi-asserted-by":"crossref","unstructured":"Dai, Haixing, Liu, Zhengliang, Liao, Wenxiong, Huang, Xiaoke, Cao, Yihan, Wu, Zihao, Zhao, Lin, Xu, Shaochen, Zeng, Fang, Liu, Wei, Liu, Ninghao, Li, Sheng, Zhu, Dajiang, Cai, Hongmin, Sun, Lichao, Li, Quanzheng, Shen, Dinggang, Liu, Tianming, Li, Xiang: Auggpt: Leveraging chatgpt for text data augmentation. IEEE Transactions on Big Data, pages 1\u201312, (2025)","DOI":"10.1109\/TBDATA.2025.3536934"},{"key":"774_CR11","unstructured":"Lewis, Patrick, Perez, Ethan, Piktus, Aleksandra, Petroni, Fabio, Karpukhin, Vladimir, Goyal, Naman, K\u00fcttler, Heinrich, Lewis, Mike, Yih, Wen-tau, Rockt\u00e4schel, Tim, Riedel, Sebastian, Kiela, Douwe: Retrieval-augmented generation for knowledge-intensive nlp tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS \u201920, Red Hook, NY, USA, (2020) Curran Associates Inc"},{"key":"774_CR12","doi-asserted-by":"crossref","unstructured":"Fan, Wenqi, Ding, Yujuan, Ning, Liangbo, Wang, Shijie, Li, Hengyun, Yin, Dawei, Chua, Tat-Seng, Li, Qing: A survey on rag meeting llms: Towards retrieval-augmented large language models. 
In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD \u201924, pages 6491\u20136501, New York, NY, USA, (2024) Association for Computing Machinery","DOI":"10.1145\/3637528.3671470"},{"key":"774_CR13","unstructured":"Balaguer, Angels, Benara, Vinamra, de\u00a0Freitas\u00a0Cunha, Renato\u00a0Luiz, Filho, Roberto de\u00a0M.\u00a0Estev\u00e3o, Hendry, Todd, Holstein, Daniel, Marsman, Jennifer, Mecklenburg, Nick, Malvar, Sara, Nunes, Leonardo\u00a0O., Padilha, Rafael, Sharp, Morris, Silva, Bruno, Sharma, Swati, Aski, Vijay, Chandra, Ranveer: RAG vs fine-tuning: Pipelines, tradeoffs, and a case study on agriculture, (2024)"},{"key":"774_CR14","volume-title":"Retrieval-augmented generation for large language models: A survey","author":"Y Gao","year":"2024","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-augmented generation for large language models: A survey (2024)"},{"key":"774_CR15","doi-asserted-by":"crossref","unstructured":"Huang, Jiaxin, Gu, Shixiang, Hou, Le, Wu, Yuexin, Wang, Xuezhi, Yu, Hongkun, Han, Jiawei: Large language models can self-improve. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 1051\u20131068, Singapore, (December 2023) Association for Computational Linguistics","DOI":"10.18653\/v1\/2023.emnlp-main.67"},{"key":"774_CR16","doi-asserted-by":"crossref","unstructured":"Magister, Lucie\u00a0Charlotte, Mallinson, Jonathan, Adamek, Jakub, Malmi, Eric, Severyn, Aliaksei: Teaching small language models to reason. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1773\u20131781, Toronto, Canada, (July 2023). 
Association for Computational Linguistics","DOI":"10.18653\/v1\/2023.acl-short.151"},{"issue":"4","key":"774_CR17","doi-asserted-by":"publisher","first-page":"2925","DOI":"10.1109\/TPWRS.2019.2892619","volume":"34","author":"M Sun","year":"2019","unstructured":"Sun, M., Teng, F., Zhang, X., Strbac, G., Pudjianto, D.: Data-driven representative day selection for investment decisions: A cost-oriented approach. IEEE Trans. Power Syst. 34(4), 2925\u20132936 (2019)","journal-title":"IEEE Trans. Power Syst."},{"key":"774_CR18","doi-asserted-by":"crossref","unstructured":"Wang, Yanhao, Fabbri, Francesco, Mathioudakis, Michael: Fair and representative subset selection from data streams. In Proceedings of the Web Conference 2021, WWW \u201921, pages 1340\u20131350, New York, NY, USA, (2021) Association for Computing Machinery","DOI":"10.1145\/3442381.3449799"},{"key":"774_CR19","unstructured":"Coleman, C., Yeh, C., Mussmann, S., Mirzasoleiman, B., Bailis, P., Liang, P., Leskovec, J., Zaharia, M.: Selection via proxy: Efficient data selection for deep learning. International Conference on Learning Representations (ICLR)"},{"key":"774_CR20","doi-asserted-by":"crossref","unstructured":"Prabhu, Ameya, Dognin, Charles, Singh, Maneesh: Sampling bias in deep active classification: An empirical study. 
In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4058\u20134068, Hong Kong, China, (November 2019) Association for Computational Linguistics","DOI":"10.18653\/v1\/D19-1417"},{"key":"774_CR21","doi-asserted-by":"crossref","unstructured":"Kobayashi, Sosuke: Contextual augmentation: Data augmentation by words with paradigmatic relations, (2018)","DOI":"10.18653\/v1\/N18-2072"},{"issue":"1","key":"774_CR22","doi-asserted-by":"publisher","first-page":"15","DOI":"10.14513\/actatechjaur.00628","volume":"15","author":"G Cs\u00e1nyi","year":"2021","unstructured":"Cs\u00e1nyi, G., Orosz, T.: Comparison of data augmentation methods for legal document classification. Acta Technica Jaurinensis 15(1), 15\u201321 (2021)","journal-title":"Acta Technica Jaurinensis"},{"key":"774_CR23","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.105918","volume":"197","author":"S Liu","year":"2020","unstructured":"Liu, S., Lee, K., Lee, I.: Document-level multi-topic sentiment classification of email data with bilstm and data augmentation. Knowl.-Based Syst. 197, 105918 (2020)","journal-title":"Knowl.-Based Syst."},{"issue":"1","key":"774_CR24","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1186\/s40537-021-00492-0","volume":"8","author":"C Shorten","year":"2021","unstructured":"Shorten, C., Khoshgoftaar, T.M., Furht, B.: Text data augmentation for deep learning. 
Journal of Big Data 8(1), 101 (2021)","journal-title":"Journal of Big Data"},{"key":"774_CR25","unstructured":"Nvidia, Bo\u00a0Adler, Agarwal, Niket, Aithal, Ashwath, Anh, Dong\u00a0H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya\u00a0Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de\u00a0Melo, Maer\u00a0Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Rao\u00a0Naik, Vasanth, Sabavat, Satheesh, Sanjeev, Scowcroft, Jane\u00a0Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh\u00a0Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen: Nemotron-4 340b technical report, (2024)"},{"key":"774_CR26","volume-title":"Robert Osazuwa Ness, and Jonathan Larson","author":"D Edge","year":"2025","unstructured":"Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D.: Robert Osazuwa Ness, and Jonathan Larson. 
From local to global: A graph RAG approach to query-focused summarization (2025)"},{"key":"774_CR27","doi-asserted-by":"crossref","unstructured":"Yu, Yue, Zhuang, Yuchen, Zhang, Rongzhi, Meng, Yu, Shen, Jiaming, Zhang, Chao: ReGen: Zero-shot text classification via training data generation with progressive dense retrieval. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 11782\u201311805, Toronto, Canada, (July 2023). Association for Computational Linguistics","DOI":"10.18653\/v1\/2023.findings-acl.748"},{"key":"774_CR28","doi-asserted-by":"crossref","unstructured":"Li, Rongsheng, Li, Yangning, Li, Yinghui, Luoyiching, Chaiyut, Zhou, Nannan, Su, Hanjing, Zheng, Hai-Tao: Retrieval-augmented meta learning for low-resource text classification. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1\u20138, (2024)","DOI":"10.1109\/IJCNN60899.2024.10651119"},{"key":"774_CR29","doi-asserted-by":"crossref","unstructured":"Min, Sewon, Lyu, Xinxi, Holtzman, Ari, Artetxe, Mikel, Lewis, Mike, Hajishirzi, Hannaneh, Zettlemoyer, Luke: Rethinking the role of demonstrations: What makes in-context learning work? In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11048\u201311064, Abu Dhabi, United Arab Emirates, (December 2022). 
Association for Computational Linguistics","DOI":"10.18653\/v1\/2022.emnlp-main.759"},{"key":"774_CR30","unstructured":"Yu, Qingchen, Zheng, Zifan, Song, Shichao, Li, Zhiyu, Xiong, Feiyu, Tang, Bo, Chen, Ding: xfinder: Large language models as automated evaluators for reliable evaluation, (2025)"},{"key":"774_CR31","doi-asserted-by":"crossref","unstructured":"Suzgun, Mirac, Scales, Nathan, Sch\u00e4rli, Nathanael, Gehrmann, Sebastian, Tay, Yi, Chung, Hyung\u00a0Won, Chowdhery, Aakanksha, Le, Quoc, Chi, Ed, Zhou, Denny, Wei, Jason: Challenging BIG-bench tasks and whether chain-of-thought can solve them. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 13003\u201313051, Toronto, Canada, (July 2023) Association for Computational Linguistics","DOI":"10.18653\/v1\/2023.findings-acl.824"},{"key":"774_CR32","doi-asserted-by":"crossref","unstructured":"Min, Bonan, Ross, Hayley, Sulem, Elior, Veyseh, Amir Pouran\u00a0Ben, Nguyen, Thien\u00a0Huu, Sainz, Oscar, Agirre, Eneko, Heintz, Ilana, Roth, Dan: Recent advances in natural language processing via large pre-trained language models: A survey. ACM Comput. Surv., 56(2), (September 2023)","DOI":"10.1145\/3605943"},{"key":"774_CR33","unstructured":"Rafailov, Rafael, Sharma, Archit, Mitchell, Eric, Manning, Christopher\u00a0D, Ermon, Stefano, Finn, Chelsea: Direct preference optimization: Your language model is secretly a reward model. In A.\u00a0Oh, T.\u00a0Naumann, A.\u00a0Globerson, K.\u00a0Saenko, M.\u00a0Hardt, and S.\u00a0Levine, editors, Advances in Neural Information Processing Systems, volume\u00a036, pages 53728\u201353741. 
Curran Associates, Inc., (2023)"},{"issue":"70","key":"774_CR34","first-page":"1","volume":"25","author":"HW Chung","year":"2024","unstructured":"Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S.S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., Valter, D., Narang, S., Mishra, G., Yu, A., Zhao, V., Huang, Y., Dai, A., Yu, H., Petrov, S., Chi, E.H., Dean, J., Devlin, J., Roberts, A., Zhou, D., Le, Q.V., Wei, J.: Scaling instruction-finetuned language models. J. Mach. Learn. Res. 25(70), 1\u201353 (2024)","journal-title":"J. Mach. Learn. Res."},{"key":"774_CR35","volume-title":"Exploring the impact of instruction data scaling on large language models: An empirical study on real-world use cases","author":"Y Ji","year":"2023","unstructured":"Ji, Y., Deng, Y., Gong, Y., Peng, Y., Niu, Q., Zhang, L., Ma, B., Li, X.: Exploring the impact of instruction data scaling on large language models: An empirical study on real-world use cases (2023)"},{"key":"774_CR36","unstructured":"Zhou, Chunting, Liu, Pengfei, Xu, Puxin, Iyer, Srinivasan, Sun, Jiao, Mao, Yuning, Ma, Xuezhe, Efrat, Avia, Yu, Ping, Yu, Lili, Zhang, Susan, Ghosh, Gargi, Lewis, Mike, Zettlemoyer, Luke, Levy, Omer: Lima: Less is more for alignment. In A.\u00a0Oh, T.\u00a0Naumann, A.\u00a0Globerson, K.\u00a0Saenko, M.\u00a0Hardt, and S.\u00a0Levine, editors, Advances in Neural Information Processing Systems, volume\u00a036, pages 55006\u201355021. Curran Associates, Inc., (2023)"},{"issue":"3","key":"774_CR37","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1038\/s42256-023-00626-4","volume":"5","author":"N Ding","year":"2023","unstructured":"Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C.-M., Chen, W., Yi, J., Zhao, W., Wang, X., Liu, Z., Zheng, H.-T., Chen, J., Liu, Y., Tang, J., Li, J., Sun, M.: Parameter-efficient fine-tuning of large-scale pre-trained language models. 
Nature Machine Intelligence 5(3), 220\u2013235 (2023)","journal-title":"Nature Machine Intelligence"},{"key":"774_CR38","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1016\/j.eswa.2017.03.020","volume":"80","author":"M Pavlinek","year":"2017","unstructured":"Pavlinek, M., Podgorelec, V.: Text classification method based on self-training and lda topic models. Expert Syst. Appl. 80, 83\u201393 (2017)","journal-title":"Expert Syst. Appl."},{"issue":"2","key":"774_CR39","doi-asserted-by":"publisher","first-page":"462","DOI":"10.1109\/TCYB.2015.2403573","volume":"46","author":"C-L Liu","year":"2016","unstructured":"Liu, C.-L., Hsaio, W.-H., Lee, C.-H., Chang, T.-H., Kuo, T.-H.: Semi-supervised text classification with universum learning. IEEE Transactions on Cybernetics 46(2), 462\u2013473 (2016)","journal-title":"IEEE Transactions on Cybernetics"},{"key":"774_CR40","doi-asserted-by":"crossref","unstructured":"Karisani, Payam, Karisani, Negin: Semi-supervised text classification via self-pretraining. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, WSDM \u201921, pages 40\u201348, New York, NY, USA, (2021). Association for Computing Machinery","DOI":"10.1145\/3437963.3441814"},{"key":"774_CR41","unstructured":"Wang, Jiahao, Zhang, Bolin, Du, Qianlong, Zhang, Jiajun, Chu, Dianhui: A survey on data selection for llm instruction tuning, (2024)"},{"key":"774_CR42","doi-asserted-by":"crossref","unstructured":"Pease, Adam, Fellbaum, Christiane, Huang, Chu-ren, Calzolari, Nicoletta, Gangemi, Aldo, Lenci, Alessandro, Oltramari, Alessandro, Prevot, Laurent: Formal ontology as interlingua: the SUMO and WordNet linking project and global WordNet, pages 25\u201335. Studies in Natural Language Processing. 
Cambridge University Press, (2010)","DOI":"10.1017\/CBO9780511676536.003"},{"key":"774_CR43","doi-asserted-by":"crossref","unstructured":"Zheng, Yaowei, Zhang, Richong, Zhang, Junhao, Ye, Yanhan, Luo, Zheyan, Feng, Zhangchi, Ma, Yongqiang: Llamafactory: Unified efficient fine-tuning of 100+ language models, (2024)","DOI":"10.18653\/v1\/2024.acl-demos.38"},{"key":"774_CR44","doi-asserted-by":"crossref","unstructured":"Kowsari, Kamran, Brown, Donald\u00a0E., Heidarysafa, Mojtaba, Jafari\u00a0Meimandi, Kiana, Gerber, Matthew\u00a0S., Barnes, Laura\u00a0E.: Hdltex: Hierarchical deep learning for text classification. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 364\u2013371, (2017)","DOI":"10.1109\/ICMLA.2017.0-134"},{"key":"774_CR45","unstructured":"Lewis, David\u00a0D.: Reuters-21578 text categorization test collection, distribution 1.0. (1997)"},{"key":"774_CR46","unstructured":"Rosenberg, Andrew, Hirschberg, Julia: V-measure: A conditional entropy-based external cluster evaluation measure. In Jason Eisner, editor, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 410\u2013420, Prague, Czech Republic, (June 2007). Association for Computational Linguistics"},{"key":"774_CR47","doi-asserted-by":"crossref","unstructured":"Chen, Jianlv, Xiao, Shitao, Zhang, Peitian, Luo, Kun, Lian, Defu, Liu, Zheng: Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, (2024)","DOI":"10.18653\/v1\/2024.findings-acl.137"},{"key":"774_CR48","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.ins.2018.10.006","volume":"477","author":"D Kim","year":"2019","unstructured":"Kim, D., Seo, D., Cho, S., Kang, P.: Multi-co-training for document classification using various document representations: Tf-idf, lda, and doc2vec. Inf. Sci. 
477, 15\u201329 (2019)","journal-title":"Inf. Sci."},{"key":"774_CR49","unstructured":"Mikolov, Tomas, Chen, Kai, Corrado, Gregory\u00a0S., Dean, Jeffrey: Efficient estimation of word representations in vector space. In International Conference on Learning Representations, (2013)"},{"issue":"1","key":"774_CR50","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1109\/TPAMI.2018.2877660","volume":"42","author":"C Zhang","year":"2020","unstructured":"Zhang, C., Fu, H., Hu, Q., Cao, X., Xie, Y., Tao, D., Xu, D.: Generalized latent multi-view subspace clustering. IEEE Trans. Pattern Anal. Mach. Intell. 42(1), 86\u201399 (2020)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"774_CR51","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1016\/j.future.2017.12.005","volume":"82","author":"N Kushwaha","year":"2018","unstructured":"Kushwaha, N., Pant, M.: Link based bpso for feature selection in big data text clustering. Futur. Gener. Comput. Syst. 82, 190\u2013199 (2018)","journal-title":"Futur. Gener. Comput. Syst."},{"key":"774_CR52","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1016\/j.ins.2020.08.052","volume":"547","author":"S Laohakiat","year":"2021","unstructured":"Laohakiat, S., Sa-ing, V.: An incremental density-based clustering framework using fuzzy local clustering. Inf. Sci. 547, 404\u2013426 (2021)","journal-title":"Inf. Sci."},{"key":"774_CR53","doi-asserted-by":"crossref","unstructured":"Zhang, Tian, Ramakrishnan, Raghu, Livny, Miron: Birch: an efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, SIGMOD \u201996, pages 103\u2013114, New York, NY, USA, (1996). 
Association for Computing Machinery","DOI":"10.1145\/233269.233324"}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-025-00774-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-025-00774-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-025-00774-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T12:18:37Z","timestamp":1758975517000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-025-00774-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,23]]},"references-count":53,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["774"],"URL":"https:\/\/doi.org\/10.1007\/s41060-025-00774-3","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,23]]},"assertion":[{"value":"24 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}