{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T00:21:42Z","timestamp":1774138902761,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":25,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,25]],"date-time":"2024-10-25T00:00:00Z","timestamp":1729814400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,25]]},"DOI":"10.1145\/3704323.3704357","type":"proceedings-article","created":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T08:25:22Z","timestamp":1736238322000},"page":"295-302","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Qwen-IG: A Qwen-based Instruction Generation Model for LLM Fine-tuning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-7846-1018","authenticated-orcid":false,"given":"Lu","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5097-0527","authenticated-orcid":false,"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-0336-389X","authenticated-orcid":false,"given":"Yitian","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2396-1360","authenticated-orcid":false,"given":"Feng","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8823-8480","authenticated-orcid":false,"given":"Jinguang","family":"Gu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China"}]}],"member":"320","published-online":{"date-parts":[[2025,1,7]]},"reference":[{"key":"e_1_3_3_1_2_2","unstructured":"Yue Wang Xinrui Wang Juntao Li Jinxiong Chang Qishen Zhang Zhongyi Liu Guannan Zhang and Min Zhang. Harnessing the power of david against goliath: Exploring instruction data generation without using closed-source models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.12711 2023."},{"key":"e_1_3_3_1_3_2","unstructured":"Chunting Zhou Pengfei Liu Puxin Xu Srinivasan Iyer Jiao Sun Yuning Mao Xuezhe Ma Avia Efrat Ping Yu Lili Yu et\u00a0al. Lima: Less is more for alignment. Advances in Neural Information Processing Systems 36 2024."},{"key":"e_1_3_3_1_4_2","unstructured":"Hao Chen Yiming Zhang Qi\u00a0Zhang Hantao Yang Xiaomeng Hu Xuetao Ma Yifan Yanggong and Junbo Zhao. Maybe only 0.5% data is needed: A preliminary exploration of low training data instruction tuning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.09246 2023."},{"key":"e_1_3_3_1_5_2","unstructured":"Beiming Liu Kunhao Huang Lihua Jiao Yuchen He Ruiqin Zhang Yuan Liang and Yingshan Wang. chat-dataset-baseline. https:\/\/github.com\/hikariming\/alpaca_chinese_dataset 2023."},{"key":"e_1_3_3_1_6_2","unstructured":"Baolin Peng Chunyuan Li Pengcheng He Michel Galley and Jianfeng Gao. Instruction tuning with gpt-4. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2304.03277 2023."},{"key":"e_1_3_3_1_7_2","unstructured":"Ye\u00a0Chen Wei Cai Liangmin Wu Xiaowei Li Zhanxuan Xin and Cong Fu. Tigerbot: An open multilingual multitask llm. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.08688 2023."},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Raul Puri Ryan Spring Mostofa Patwary Mohammad Shoeybi and Bryan Catanzaro. Training question answering models from synthetic data. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2002.09599 2020.","DOI":"10.18653\/v1\/2020.emnlp-main.468"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Siamak Shakeri Cicero dos Santos Henghui Zhu Patrick Ng Feng Nan Zhiguo Wang Ramesh Nallapati and Bing Xiang. End-to-end synthetic data generation for domain adaptation of question answering systems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) pages 5445\u20135460 2020.","DOI":"10.18653\/v1\/2020.emnlp-main.439"},{"key":"e_1_3_3_1_10_2","unstructured":"Dong\u00a0Bok Lee Seanie Lee Woo\u00a0Tae Jeong Donghwan Kim and Sung\u00a0Ju Hwang. Generating diverse and consistent qa pairs from contexts with information-maximizing hierarchical conditional vaes. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2005.13837 2020."},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"crossref","unstructured":"Patrick Lewis Yuxiang Wu Linqing Liu Pasquale Minervini Heinrich K\u00fcttler Aleksandra Piktus Pontus Stenetorp and Sebastian Riedel. Paq: 65 million probably-asked questions and what you can do with them. Transactions of the Association for Computational Linguistics 9:1098\u20131115 2021.","DOI":"10.1162\/tacl_a_00415"},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Asahi Ushio Fernando Alva-Manchego and Jose Camacho-Collados. A practical toolkit for multilingual question and answer generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.17416 2023.","DOI":"10.18653\/v1\/2023.acl-demo.8"},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Yizhong Wang Yeganeh Kordi Swaroop Mishra Alisa Liu Noah\u00a0A Smith Daniel Khashabi and Hannaneh Hajishirzi. Self-instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.10560 2022.","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"e_1_3_3_1_14_2","unstructured":"Xuanyu Zhang and Qing Yang. Self-qa: Unsupervised knowledge guided language model alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.11952 2023."},{"key":"e_1_3_3_1_15_2","unstructured":"Xian Li Ping Yu Chunting Zhou Timo Schick Luke Zettlemoyer Omer Levy Jason Weston and Mike Lewis. Self-alignment with instruction backtranslation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.06259 2023."},{"key":"e_1_3_3_1_16_2","unstructured":"Zhijie Bao Wei Chen Shengze Xiao Kuang Ren Jiaao Wu Cheng Zhong Jiajie Peng Xuanjing Huang and Zhongyu Wei. Disc-medllm: Bridging general large language models and real-world medical consultation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.14346 2023."},{"key":"e_1_3_3_1_17_2","unstructured":"Ningyu Zhang Jintian Zhang Xiaohan Wang Honghao Gui Kangwei Liu Yinuo Jiang Xiang Chen Shengyu Mao Shuofei Qiao Yuqi Zhu Zhen Bi Jing Chen Xiaozhuan Liang Yixin Ou Runnan Fang Zekun Xi Xin Xu Lei Li Peng Wang Mengru Wang Yunzhi Yao Bozhong Tian Yin Fang Guozhou Zheng and Huajun Chen. Knowlm technical report 2023."},{"key":"e_1_3_3_1_18_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori\u00a0B Hashimoto. Stanford alpaca: an instruction-following llama model (2023). URL https:\/\/github. com\/tatsu-lab\/stanford_alpaca 1(9) 2023."},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"crossref","unstructured":"Daoyuan Chen Yilun Huang Zhijian Ma Hesen Chen Xuchen Pan Ce\u00a0Ge Dawei Gao Yuexiang Xie Zhaoyang Liu Jinyang Gao et\u00a0al. Data-juicer: A one-stop data processing system for large language models. In Companion of the 2024 International Conference on Management of Data pages 120\u2013134 2024.","DOI":"10.1145\/3626246.3653385"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Patrick Lewis Ludovic Denoyer and Sebastian Riedel. Unsupervised question answering by cloze translation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1906.04980 2019.","DOI":"10.18653\/v1\/P19-1484"},{"key":"e_1_3_3_1_21_2","unstructured":"Wei Liu Weihao Zeng Keqing He Yong Jiang and Junxian He. What makes good data for alignment? a comprehensive study of automatic data selection in instruction tuning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.15685 2023."},{"key":"e_1_3_3_1_22_2","unstructured":"huggingface. Perplexity of fixed-length models."},{"key":"e_1_3_3_1_23_2","unstructured":"Shahul Es Jithin James Luis Espinosa-Anke and Steven Schockaert. Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.15217 2023."},{"key":"e_1_3_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Philip\u00a0M McCarthy and Scott Jarvis. Mtld vocd-d and hd-d: A validation study of sophisticated approaches to lexical diversity assessment. Behavior research methods 42(2):381\u2013392 2010.","DOI":"10.3758\/BRM.42.2.381"},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Ming Zhong Yang Liu Da\u00a0Yin Yuning Mao Yizhu Jiao Pengfei Liu Chenguang Zhu Heng Ji and Jiawei Han. Towards a unified multi-dimensional evaluator for text generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.07197 2022.","DOI":"10.18653\/v1\/2022.emnlp-main.131"},{"key":"e_1_3_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Mary\u00a0L McHugh. Interrater reliability: the kappa statistic. Biochemia medica 22(3):276\u2013282 2012.","DOI":"10.11613\/BM.2012.031"}],"event":{"name":"ICCPR 2024: 2024 13th International Conference on Computing and Pattern Recognition","location":"Tianjin China","acronym":"ICCPR 2024"},"container-title":["Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3704323.3704357","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3704323.3704357","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:04Z","timestamp":1750295884000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3704323.3704357"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,25]]},"references-count":25,"alternative-id":["10.1145\/3704323.3704357","10.1145\/3704323"],"URL":"https:\/\/doi.org\/10.1145\/3704323.3704357","relation":{},"subject":[],"published":{"date-parts":[[2024,10,25]]},"assertion":[{"value":"2025-01-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}