{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,19]],"date-time":"2025-10-19T00:23:38Z","timestamp":1760833418246,"version":"build-2065373602"},"reference-count":84,"publisher":"Association for Computing Machinery (ACM)","issue":"9","funder":[{"name":"National Key Research and Development Program of China","award":["2022YFB3102904"],"award-info":[{"award-number":["2022YFB3102904"]}]},{"name":"Open Fund of the Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Website owner identification aims to recognize the organization or individual who owns a given website that is served on the web. It is a crucial step for cyberspace surveying and mapping, playing a significant role in cyberspace administration and governance. Existing widely employed solutions for website owner identification mainly fall into two paradigms: (1) querying the public information databases such as WHOIS, which store the Internet resource\u2019s registered users or assignees; and (2) directly extracting the organization or individual name of the website owner from the webpage using the technique of named entity recognition. However, the former is less reliable due to the incomplete, encrypted, and outdated records in the public information databases. 
Meanwhile, the latter requires that the webpages explicitly and precisely present their owner names without ambiguity, which is often hard to guarantee in practice.<\/jats:p>\n          <jats:p>\n            To address these limitations, we propose to formulate website owner identification as a problem of webpage representation learning, thereby introducing a novel representation learning framework empowered by large language model-based text Rewriting and Multi-level contrastive learning, named ReMon. First, we devise a prompt to rewrite the webpages using large language models, which effectively filters out noise from the original webpages. Second, we model website\u2013website, website\u2013owner, and owner\u2013owner interactions through multi-level contrastive learning, fully utilizing the self-supervision signals on long-tail items to learn the multi-level constraints. Third, we design a retrieval-based prediction framework and a clustering-based framework to apply websites\u2019 and owners\u2019 representations for different scenarios of the website owner identification task. To evaluate ReMon under our formulation, we construct two datasets based on real-world data. Compared to existing approaches, our ReMon can address the challenging scenarios when valid information cannot be found in public information databases and the owner\u2019s name does not appear on the webpage. Meanwhile, the experimental results show that ReMon outperforms all representation learning-based baselines and significantly enhances training efficiency. 
The code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/tuchen9\/ReMon\">https:\/\/github.com\/tuchen9\/ReMon<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3767155","type":"journal-article","created":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T22:22:12Z","timestamp":1757629332000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Website Owner Identification through Multi-level Contrastive Representation Learning"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4223-7298","authenticated-orcid":false,"given":"Cheng","family":"Tu","sequence":"first","affiliation":[{"name":"College of Electronic Engineering, National University of Defense Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3038-5389","authenticated-orcid":false,"given":"Yunshan","family":"Ma","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9585-3472","authenticated-orcid":false,"given":"Yang","family":"Li","sequence":"additional","affiliation":[{"name":"College of Electronic Engineering, National University of Defense Technology, Hefei, China and Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6654-7610","authenticated-orcid":false,"given":"Min","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Electronic Engineering, National University of Defense Technology, Hefei, China and Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and 
Evaluation, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9691-4011","authenticated-orcid":false,"given":"Miao","family":"Hu","sequence":"additional","affiliation":[{"name":"College of Electronic Engineering, National University of Defense Technology, Hefei, China and Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4533-2706","authenticated-orcid":false,"given":"Fan","family":"Shi","sequence":"additional","affiliation":[{"name":"College of Electronic Engineering, National University of Defense Technology, Hefei, China and Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6148-6329","authenticated-orcid":false,"given":"Xiang","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,18]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et al. 2023. Gpt-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_3_2","unstructured":"Jinze Bai Shuai Bai Yunfei Chu Zeyu Cui Kai Dang Xiaodong Deng Yang Fan Wenbin Ge Yu Han Fei Huang et al. 2023. Qwen technical report. arXiv:2309.16609. 
Retrieved from https:\/\/arxiv.org\/abs\/2309.16609"},{"key":"e_1_3_2_4_2","first-page":"32897","article-title":"Vlmo: Unified vision-language pre-training with mixture-of-modality-experts","volume":"35","author":"Bao Hangbo","year":"2022","unstructured":"Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Songhao Piao, and Furu Wei. 2022. Vlmo: Unified vision-language pre-training with mixture-of-modality-experts. In Advances in Neural Information Processing Systems, Vol. 35, 32897\u201332912.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.121629"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2024.112739"},{"key":"e_1_3_2_7_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, Vol. 33, 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_8_2","first-page":"9912","article-title":"Unsupervised learning of visual features by contrasting cluster assignments","volume":"33","author":"Caron Mathilde","year":"2020","unstructured":"Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. In Advances in Neural Information Processing Systems, Vol. 33, 9912\u20139924.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Hou Pong Chan Qi Zeng and Heng Ji. 2023. 
Interpretable automatic fine-grained inconsistency detection in text summarization. arXiv:2305.14548. Retrieved from https:\/\/arxiv.org\/abs\/2305.14548","DOI":"10.18653\/v1\/2023.findings-acl.402"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Wen Yu Chang and Yun-Nung Chen. 2024. Injecting salesperson\u2019s dialogue strategies in large language models with chain-of-thought reasoning. arXiv:2404.18564. Retrieved from https:\/\/arxiv.org\/abs\/2404.18564","DOI":"10.18653\/v1\/2024.findings-acl.228"},{"key":"e_1_3_2_11_2","first-page":"1597","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning. PMLR, 1597\u20131607."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01549"},{"key":"e_1_3_2_13_2","unstructured":"Xiao Chen Sihang Zhou Ke Liang and Xinwang Liu. 2024. Distilling reasoning ability from large language models with adaptive thinking. arXiv:2404.09170. Retrieved from https:\/\/arxiv.org\/abs\/2404.09170"},{"key":"e_1_3_2_14_2","unstructured":"Wei Lin Chiang Zhuohan Li Zi Lin Ying Sheng Zhanghao Wu Hao Zhang Lianmin Zheng Siyuan Zhuang Yonghao Zhuang Joseph E. Gonzalez et al. 2023. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. Retrieved from https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/"},{"issue":"240","key":"e_1_3_2_15_2","first-page":"1","article-title":"Palm: Scaling language modeling with pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2023. 
Palm: Scaling language modeling with pathways. Journal of Machine Learning Research 24, 240 (2023), 1\u2013113.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.65"},{"key":"e_1_3_2_17_2","unstructured":"Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay William Fedus Eric Li Xuezhi Wang Mostafa Dehghani Siddhartha Brahma et al. 2022. Scaling instruction-finetuned language models. arXiv:2210.11416. Retrieved from https:\/\/arxiv.org\/abs\/2210.11416"},{"key":"e_1_3_2_18_2","first-page":"23182","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Cohn Clayton","year":"2024","unstructured":"Clayton Cohn, Nicole Hutchins, Tuan Le, and Gautam Biswas. 2024. A chain-of-thought prompting approach with LLMs for evaluating students\u2019 formative assessment responses in science. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 23182\u201323190."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.58"},{"key":"e_1_3_2_20_2","unstructured":"Yiming Cui Wanxiang Che Shijin Wang and Ting Liu. 2022. LERT: A linguistically-motivated pre-trained language model. arXiv:2211.05344. Retrieved from https:\/\/arxiv.org\/abs\/2211.05344"},{"key":"e_1_3_2_21_2","unstructured":"Yiming Cui Ziqing Yang and Ting Liu. 2022. PERT: Pre-training BERT with permuted language model. arXiv:2203.06906. Retrieved from https:\/\/arxiv.org\/abs\/2203.06906"},{"key":"e_1_3_2_22_2","unstructured":"Debarati Das David Ma and Dongyeop Kang. 2023. Balancing effect of training dataset distribution of multiple styles for multi-style text transfer. arXiv:2305.15582. Retrieved from https:\/\/arxiv.org\/abs\/2305.15582"},{"key":"e_1_3_2_23_2","unstructured":"Qingxiu Dong Lei Li Damai Dai Ce Zheng Zhiyong Wu Baobao Chang Xu Sun Jingjing Xu and Zhifang Sui. 2022. A survey on in-context learning. 
arXiv:2301.00234. Retrieved from https:\/\/arxiv.org\/abs\/2301.00234"},{"key":"e_1_3_2_24_2","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan et al. 2024. The llama 3 herd of models. arXiv:2407.21783. Retrieved from https:\/\/arxiv.org\/abs\/2407.21783"},{"key":"e_1_3_2_25_2","unstructured":"Chris Dyer. 2014. Notes on noise contrastive estimation and negative sampling. arXiv:1410.8251. Retrieved from https:\/\/arxiv.org\/abs\/1410.8251"},{"key":"e_1_3_2_26_2","volume-title":"Proceedings of the","author":"Fachkha Claude","year":"2017","unstructured":"Claude Fachkha, Elias Bou-Harb, Anastasis Keliris, Nasir D. Memon, and Mustaque Ahamad. 2017. Internet-scale probing of CPS: Inference, characterization and orchestration analysis. In Proceedings of the Network and Distributed System Security."},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Hongchao Fang Sicheng Wang Meng Zhou Jiayuan Ding and Pengtao Xie. 2020. CERT: Contrastive self-supervised learning for language understanding. arXiv:2005.12766. Retrieved from https:\/\/arxiv.org\/abs\/2005.12766","DOI":"10.36227\/techrxiv.12308378.v1"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1983.10478008"},{"key":"e_1_3_2_29_2","unstructured":"Tianyu Gao Xingcheng Yao and Danqi Chen. 2021. Simcse: Simple contrastive learning of sentence embeddings. arXiv:2104.08821. Retrieved from https:\/\/arxiv.org\/abs\/2104.08821"},{"key":"e_1_3_2_30_2","unstructured":"Team GLM Aohan Zeng Bin Xu Bowen Wang Chenhui Zhang Da Yin Dan Zhang Diego Rojas Guanyu Feng Hanlin Zhao et al. 2024. Chatglm: A family of large language models from glm-130b to glm-4 all tools. arXiv:2406.12793. 
Retrieved from https:\/\/arxiv.org\/abs\/2406.12793"},{"key":"e_1_3_2_31_2","first-page":"21271","article-title":"Bootstrap your own latent-a new approach to self-supervised learning","volume":"33","author":"Grill Jean-Bastien","year":"2020","unstructured":"Jean-Bastien Grill, Florian Strub, Florent Altch\u00e9, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. 2020. Bootstrap your own latent-a new approach to self-supervised learning. In Advances in Neural Information Processing Systems, Vol. 33, 21271\u201321284.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_32_2","unstructured":"Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv:2501.12948. Retrieved from https:\/\/arxiv.org\/abs\/2501.12948"},{"key":"e_1_3_2_33_2","article-title":"Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects","author":"Hadi Muhammad Usman","year":"2023","unstructured":"Muhammad Usman Hadi, Rizwan Qureshi, Abbas Shah, Muhammad Irfan, Anas Zafar, Muhammad Bilal Shaikh, Naveed Akhtar, Jia Wu, Seyedali Mirjalili, Mubarak Shah, et al. 2023. Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints. Retrieved from https:\/\/www.techrxiv.org\/doi\/full\/10.36227\/techrxiv.23589741.v3","journal-title":"Authorea Preprints"},{"key":"e_1_3_2_34_2","first-page":"4116","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Hassani Kaveh","year":"2020","unstructured":"Kaveh Hassani and Amir Hosein Khasahmadi. 2020. Contrastive multi-view representation learning on graphs. In Proceedings of the International Conference on Machine Learning. 
PMLR, 4116\u20134126."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578338.3593542"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01908075"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i16.29794"},{"key":"e_1_3_2_39_2","first-page":"4904","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 4904\u20134916."},{"key":"e_1_3_2_40_2","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. arXiv:2001.08361. Retrieved from https:\/\/arxiv.org\/abs\/2001.08361"},{"key":"e_1_3_2_41_2","first-page":"2","volume-title":"Proceedings of the NAACL-HLT","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT, Vol. 1, 2."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICAI55435.2022.9773725"},{"key":"e_1_3_2_43_2","volume-title":"Learning Scrapy","author":"Kouzis-Loukas Dimitrios","year":"2016","unstructured":"Dimitrios Kouzis-Loukas. 2016. Learning Scrapy. 
Packt Publishing Livery Place."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-emnlp.38"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","unstructured":"Mike Lewis Yinhan Liu Naman Goyal Marjan Ghazvininejad Abdelrahman Mohamed Omer Levy Ves Stoyanov and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation translation and comprehension. arXiv:1910.13461. Retrieved from https:\/\/arxiv.org\/abs\/1910.13461","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_2_46_2","first-page":"12888","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning. PMLR, 12888\u201312900."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3393356"},{"key":"e_1_3_2_48_2","first-page":"9694","article-title":"Align before fuse: Vision and language representation learning with momentum distillation","volume":"34","author":"Li Junnan","year":"2021","unstructured":"Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu Hong Hoi. 2021. Align before fuse: Vision and language representation learning with momentum distillation. In Advances in Neural Information Processing Systems, Vol. 34, 9694\u20139705.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3560815"},{"key":"e_1_3_2_50_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640810"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681349"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-5010"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"Zhen Qin Rolf Jagerman Kai Hui Honglei Zhuang Junru Wu Jiaming Shen Tianqi Liu Jialu Liu Donald Metzler Xuanhui Wang et al. 2023. Large language models are effective text rankers with pairwise ranking prompting. arXiv:2306.17563. Retrieved from https:\/\/arxiv.org\/abs\/2306.17563","DOI":"10.18653\/v1\/2024.findings-naacl.97"},{"key":"e_1_3_2_56_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 8748\u20138763."},{"issue":"8","key":"e_1_3_2_57_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.120014"},{"key":"e_1_3_2_60_2","unstructured":"Leonard Richardson. 2025. Beautiful soup documentation. 
Retrieved from https:\/\/www.crummy.com\/software\/BeautifulSoup\/"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671772"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627106.3627190"},{"key":"e_1_3_2_63_2","unstructured":"Mahsa Shamsabadi Jennifer D\u2019Souza and S\u00f6ren Auer. 2024. Large language models for scientific information extraction: An empirical study for virology. arXiv:2401.10040. Retrieved from https:\/\/arxiv.org\/abs\/2401.10040"},{"key":"e_1_3_2_64_2","unstructured":"Dinghan Shen Mingzhi Zheng Yelong Shen Yanru Qu and Weizhu Chen. 2020. A simple but tough-to-beat data augmentation approach for natural language understanding and generation. arXiv:2009.13818. Retrieved from https:\/\/arxiv.org\/abs\/2009.13818"},{"key":"e_1_3_2_65_2","unstructured":"Lei Shu Liangchen Luo Jayakumar Hoskere Yun Zhu Canoee Liu Simon Tong Jindong Chen and Lei Meng. 2023. RewriteLM: An instruction-tuned large language model for text rewriting. arXiv:2305.15685. Retrieved from https:\/\/arxiv.org\/abs\/2305.15685"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.65"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.121236"},{"key":"e_1_3_2_68_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_2_69_2","volume-title":"International Conference on Learning Representations","author":"Sanh Victor","year":"2022","unstructured":"Victor Sanh, Albert Webson, Colin Raffel, Stephen Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, et al. 2022. Multitask prompted training enables zero-shot task generalization. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2023.111026"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-23597-0_28"},{"key":"e_1_3_2_72_2","unstructured":"Jason Wei Yi Tay Rishi Bommasani Colin Raffel Barret Zoph Sebastian Borgeaud Dani Yogatama Maarten Bosma Denny Zhou Donald Metzler et al. 2022. Emergent abilities of large language models. arXiv:2206.07682. Retrieved from https:\/\/arxiv.org\/abs\/2206.07682"},{"key":"e_1_3_2_73_2","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, Vol. 35, 24824\u201324837.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_74_2","unstructured":"Xing Wu Chaochen Gao Liangjun Zang Jizhong Han Zhongyuan Wang and Songlin Hu. 2021. Esimcse: Enhanced sample building method for contrastive learning of unsupervised sentence embedding. arXiv:2109.04380. Retrieved from https:\/\/arxiv.org\/abs\/2109.04380"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00393"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-024-40555-y"},{"key":"e_1_3_2_77_2","unstructured":"Ran Xu Hejie Cui Yue Yu Xuan Kan Wenqi Shi Yuchen Zhuang Wei Jin Joyce Ho and Carl Yang. 2023. Knowledge-infused prompting: Assessing and advancing clinical text data generation with large language models. arXiv:2311.00287. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.00287"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOMW.2019.8845226"},{"key":"e_1_3_2_79_2","unstructured":"Jiahui Yu Zirui Wang Vijay Vasudevan Legg Yeung Mojtaba Seyedhosseini and Yonghui Wu. 2022. Coca: Contrastive captioners are image-text foundation models. arXiv:2205.01917. Retrieved from https:\/\/arxiv.org\/abs\/2205.01917"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29920"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/3617680"},{"key":"e_1_3_2_82_2","doi-asserted-by":"crossref","unstructured":"Haopeng Zhang Philip S. Yu and Jiawei Zhang. 2024. A systematic survey of text summarization: From statistical methods to large language models. arXiv:2406.11289. Retrieved from https:\/\/arxiv.org\/abs\/2406.11289","DOI":"10.1145\/3731445"},{"key":"e_1_3_2_83_2","unstructured":"Jiaxing Zhang Ruyi Gan Junjie Wang Yuxiang Zhang Lin Zhang Ping Yang Xinyu Gao Ziwei Wu Xiaoqun Dong Junqing He et al. 2022. Fengshenbang 1.0: Being the foundation of chinese cognitive intelligence. arXiv:2209.02970. Retrieved from https:\/\/arxiv.org\/abs\/2209.02970"},{"key":"e_1_3_2_84_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et al. 2023. A survey of large language models. arXiv:2303.18223. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00530-023-01170-2"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3767155","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T09:09:23Z","timestamp":1760778563000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3767155"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,18]]},"references-count":84,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3767155"],"URL":"https:\/\/doi.org\/10.1145\/3767155","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2025,10,18]]},"assertion":[{"value":"2024-10-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}