{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:03:17Z","timestamp":1756425797114,"version":"3.44.0"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686165","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,8,26]],"date-time":"2025-08-26T00:00:00Z","timestamp":1756166400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,26]]},"abstract":"<jats:p>Automatic term extraction (ATE) identifies domain-specific concepts from specialized corpora, but suffers from limited annotated training data across diverse domains. We propose three novel LLM-based data augmentation schemes for ATE: context-level augmentation (generating diverse sentences using existing terms), term-level augmentation (replacing terms with domain-relevant alternatives), and combined augmentation (creating novel sentences with new terminology). Our approach leverages both ChatGPT-4o and Wikipedia-derived domain lexicons to generate synthetic training data. Experiments across four domains in the ACTER dataset demonstrate consistent improvements over state-of-the-art XLM-RoBERTa baselines, with gains of up to 28% F1-score in few-shot scenarios (5-10 samples) and 1-2% improvements in larger datasets (100-500 samples). Context-level and term-level augmentation consistently outperform combined augmentation, while LLM-based methods surpass Wikipedia-based augmentation. 
Our findings establish the effectiveness of targeted data augmentation for ATE across varying data availability scenarios, with performance gains extending beyond few-shot settings to practical dataset sizes.<\/jats:p>","DOI":"10.3233\/ssw250013","type":"book-chapter","created":{"date-parts":[[2025,8,28]],"date-time":"2025-08-28T08:05:39Z","timestamp":1756368339000},"source":"Crossref","is-referenced-by-count":0,"title":["DA-ATE: Data Augmentation for Automatic Term Extraction"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3969-5183","authenticated-orcid":false,"given":"Shubhanker","family":"Banerjee","sequence":"first","affiliation":[{"name":"Research Ireland ADAPT Centre, University of Galway, Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4575-7934","authenticated-orcid":false,"given":"Bharathi Raja","family":"Chakravarthi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Galway, Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7227-1331","authenticated-orcid":false,"given":"John P.","family":"McCrae","sequence":"additional","affiliation":[{"name":"Research Ireland ADAPT Centre, University of Galway, Ireland"}]}],"member":"7437","container-title":["Studies on the Semantic Web","Linking Meaning: Semantic Technologies Shaping the Future of 
AI"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SSW250013","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,28]],"date-time":"2025-08-28T08:05:40Z","timestamp":1756368340000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SSW250013"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,26]]},"ISBN":["9781643686165"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/ssw250013","relation":{},"ISSN":["1868-1158","2215-0870"],"issn-type":[{"value":"1868-1158","type":"print"},{"value":"2215-0870","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,26]]}}}