{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T15:30:41Z","timestamp":1781019041343,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T00:00:00Z","timestamp":1774224000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"name":"FAPESP","award":["2022\/03176-1"],"award-info":[{"award-number":["2022\/03176-1"]}]},{"name":"FAPESP","award":["019\/07665-4"],"award-info":[{"award-number":["019\/07665-4"]}]},{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","award":["001"],"award-info":[{"award-number":["001"]}],"id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,3,23]]},"DOI":"10.1145\/3748522.3779928","type":"proceedings-article","created":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T14:17:49Z","timestamp":1781014669000},"page":"926-933","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["NEAGE: NER Data Augmentation via Small LLMs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-4296-0937","authenticated-orcid":false,"given":"Jo\u00e3o Lucas","family":"Luz Lima Sarcinelli","sequence":"first","affiliation":[{"name":"Universidade de S\u00e3o Paulo, S\u00e3o Carlos, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5184-9413","authenticated-orcid":false,"given":"Diego","family":"Furtado Silva","sequence":"additional","affiliation":[{"name":"Universidade de S\u00e3o Paulo, S\u00e3o Carlos, Brazil"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Local large language models for complex structured medical tasks, (Aug","author":"Cody Bumgardner V. K.","year":"2023","unstructured":"V. K. Cody Bumgardner, Aaron Mullen, Sam Armstrong, Caylin Hickey, and Jeff Talbert. 2023. Local large language models for complex structured medical tasks, (Aug. 2023). http:\/\/arxiv.org\/abs\/2308.01727."},{"key":"e_1_3_2_1_2_1","volume-title":"Livio Pompianu, and Sandro Gabriele Tiddia.","author":"Carta Salvatore","year":"2023","unstructured":"Salvatore Carta, Alessandro Giuliani, Leonardo Piano, Alessandro Sebastian Podda, Livio Pompianu, and Sandro Gabriele Tiddia. 2023. Iterative zero-shot llm prompting for knowledge graph construction. (2023). https:\/\/arxiv.org\/abs\/2307.01128 arXiv: 2307.01128 [cs.CL]."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.343"},{"key":"e_1_3_2_1_4_1","volume-title":"Shafiq Joty, Luo Si, and Chunyan Miao.","author":"Ding Bosheng","year":"2020","unstructured":"Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, and Chunyan Miao. 2020. Daga: data augmentation with a generation approach for low-resource tagging tasks, (Nov. 2020). http:\/\/arxiv.org\/abs\/2011.01549."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Bosheng Ding et al. 2024. Data augmentation using llms: data perspectives learning paradigms and challenges (Mar. 2024). http:\/\/arxiv.org\/abs\/2403.02990.","DOI":"10.18653\/v1\/2024.findings-acl.97"},{"key":"e_1_3_2_1_6_1","unstructured":"Steven Y. Feng Varun Gangal Jason Wei Sarath Chandar Soroush Vosoughi Teruko Mitamura and Eduard Hovy. 2021. A survey of data augmentation approaches for nlp. (2021). https:\/\/arxiv.org\/abs\/2105.03075 arXiv: 2105.03075 [cs.CL]."},{"key":"e_1_3_2_1_7_1","unstructured":"Gabriel Lino Garcia et al. 2024. Introducing bode: a fine-tuned large language model for portuguese prompt-based task (Jan. 2024). http:\/\/arxiv.org\/abs\/2401.02909."},{"key":"e_1_3_2_1_8_1","unstructured":"Aaron Grattafiori et al. 2024. The llama 3 herd of models. (2024). https:\/\/arxiv.org\/abs\/2407.21783 arXiv: 2407.21783 [cs.AI]."},{"key":"e_1_3_2_1_9_1","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: low-rank adaptation of large language models. (2021). https:\/\/arxiv.org\/abs\/2106.09685 arXiv: 2106.09685 [cs.CL]."},{"key":"e_1_3_2_1_10_1","volume-title":"Yu","author":"Hu Xuming","year":"2023","unstructured":"Xuming Hu, Yong Jiang, Aiwei Liu, Zhongqiang Huang, Pengjun Xie, Fei Huang, Lijie Wen, and Philip S. Yu. 2023. Entity-to-text based data augmentation for various named entity recognition tasks. (2023). https:\/\/arxiv.org\/abs\/2210.10343 arXiv: 2210.10343 [cs.CL]."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1363"},{"key":"e_1_3_2_1_12_1","volume-title":"Synthetic data generation with large language models for text classification: potential and limitations, (Oct","author":"Li Zhuoyan","year":"2023","unstructured":"Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, and Ming Yin. 2023. Synthetic data generation with large language models for text classification: potential and limitations, (Oct. 2023). http:\/\/arxiv.org\/abs\/2310.07849."},{"key":"e_1_3_2_1_13_1","unstructured":"Yinhan Liu et al. 2019. Roberta: a robustly optimized bert pretraining approach. (2019). https:\/\/arxiv.org\/abs\/1907.11692 arXiv: 1907.11692 [cs.CL]."},{"key":"e_1_3_2_1_14_1","volume-title":"Gl\u00f3ria - a generative and open large language model for portuguese, (Feb","author":"Lopes Ricardo","year":"2024","unstructured":"Ricardo Lopes, Jo\u00e3o Magalh\u00e3es, and David Semedo. 2024. Gl\u00f3ria - a generative and open large language model for portuguese, (Feb. 2024). http:\/\/arxiv.org\/abs\/2402.12969."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-99722-3_32"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5753\/lasdigov.2025.9471"},{"key":"e_1_3_2_1_17_1","unstructured":"Stephen Mayhew et al. 2024. Universal ner: a gold-standard multilingual named entity recognition benchmark. (2024). https:\/\/arxiv.org\/abs\/2311.09122 arXiv: 2311.09122 [cs.CL]."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-3709"},{"key":"e_1_3_2_1_19_1","volume-title":"Harem: an advanced ner evaluation contest for portuguese. In quot","author":"Santos Diana","year":"2006","unstructured":"Diana Santos, Nuno Seco, Nuno Cardoso, and Rui Vilela. 2006. Harem: an advanced ner evaluation contest for portuguese. In quot; In Nicoletta Calzolari; Khalid Choukri; Aldo Gangemi; Bente Maegaard; Joseph Mariani; Jan Odjik; Daniel Tapias (ed) Proceedings of the 5 th International Conference on Language Resources and Evaluation (LREC'2006)(Genoa Italy 22\u201328 May 2006)."},{"key":"e_1_3_2_1_20_1","volume-title":"Does synthetic data generation of llms help clinical text mining? (Mar","author":"Tang Ruixiang","year":"2023","unstructured":"Ruixiang Tang, Xiaotian Han, Xiaoqian Jiang, and Xia Hu. 2023. Does synthetic data generation of llms help clinical text mining? (Mar. 2023). http:\/\/arxiv.org\/abs\/2303.04360."},{"key":"e_1_3_2_1_21_1","volume-title":"COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)","author":"Erik","year":"2024","unstructured":"Erik F. Tjong Kim Sang. 2002. Introduction to the CoNLL-2002 shared task: language-independent named entity recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002). https:\/\/aclanthology.org\/W02-2024\/."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL","author":"Erik","year":"2003","unstructured":"Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 142\u2013147. https:\/\/aclanthology.org\/W03-0419\/."},{"key":"e_1_3_2_1_23_1","unstructured":"Shuhe Wang Xiaofei Sun Xiaoya Li Rongbin Ouyang Fei Wu Tianwei Zhang Jiwei Li and Guoyin Wang. 2023. Gpt-ner: named entity recognition via large language models. (2023). https:\/\/arxiv.org\/abs\/2304.10428 arXiv: 2304.10428 [cs.CL]."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-022-10144-1"},{"key":"e_1_3_2_1_25_1","volume-title":"Advances in Neural Information Processing Systems","author":"Xie Qizhe","year":"2020","unstructured":"Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc Le. 2020. Unsupervised data augmentation for consistency training. In Advances in Neural Information Processing Systems. H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, (Eds.) Vol. 33. Curran Associates, Inc., 6256\u20136268. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/44feb0096faa8326192570788b38c1d1-Paper.pdf."},{"key":"e_1_3_2_1_26_1","unstructured":"Derong Xu et al. 2023. Large language models for generative information extraction: a survey (Dec. 2023). http:\/\/arxiv.org\/abs\/2312.17617."},{"key":"e_1_3_2_1_27_1","volume-title":"Llm-da: data augmentation via large language models for few-shot named entity recognition, (Feb","author":"Ye Junjie","year":"2024","unstructured":"Junjie Ye, Nuo Xu, Yikun Wang, Jie Zhou, Qi Zhang, Tao Gui, and Xuanjing Huang. 2024. Llm-da: data augmentation via large language models for few-shot named entity recognition, (Feb. 2024). http:\/\/arxiv.org\/abs\/2402.14568."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.160"}],"event":{"name":"SAC '26: 41st ACM\/SIGAPP Symposium on Applied Computing","location":"Grand Hotel Palace Thessaloniki Greece","acronym":"SAC '26","sponsor":["SIGAPP ACM Special Interest Group on Applied Computing"]},"container-title":["Proceedings of the 41st ACM\/SIGAPP Symposium on Applied Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748522.3779928","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T14:50:28Z","timestamp":1781016628000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748522.3779928"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,23]]},"references-count":28,"alternative-id":["10.1145\/3748522.3779928","10.1145\/3748522"],"URL":"https:\/\/doi.org\/10.1145\/3748522.3779928","relation":{},"subject":[],"published":{"date-parts":[[2026,3,23]]},"assertion":[{"value":"2026-06-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}