{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,22]],"date-time":"2026-06-22T12:42:17Z","timestamp":1782132137565,"version":"3.54.5"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Deep learning models produce impressive results in any natural language processing applications when given a better learning strategy and trained with large labeled datasets. However, the annotation of massive training data is far too expensive, especially in the legal domain, due to the need for trained legal professionals. Data augmentation solves the problem of learning without labeled big data. In this paper, we employ pre-trained language models and prompt engineering to generate large-scale pseudo-labeled data for the legal overruling task using 100 data samples. We train small recurrent and convolutional deep-learning models using this data and fine-tune a few other transformer models. We then evaluate the effectiveness of the models, both with and without data augmentation, using the benchmark dataset and analyze the results. We also test the performance of these models with the state-of-the-art GPT-3 model under few-shot setting. 
Our experimental findings demonstrate that data augmentation results in better model performance in the legal overruling task than models trained without augmentation. Furthermore, our best-performing deep learning model trained on augmented data outperforms the few-shot GPT-3 by 18% in the F1-score. Additionally, our results highlight that the small neural networks trained with augmented data achieve outcomes comparable to those of other large language models.<\/jats:p>","DOI":"10.1007\/s11063-024-11574-4","type":"journal-article","created":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T07:03:04Z","timestamp":1711177384000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Neural Data Augmentation for Legal Overruling Task: Small Deep Learning Models vs. Large Language Models"],"prefix":"10.1007","volume":"56","author":[{"given":"Reshma","family":"Sheik","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"K. P.","family":"Siva Sundara","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"S. Jaya","family":"Nirmala","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,3,23]]},"reference":[{"key":"11574_CR1","doi-asserted-by":"crossref","unstructured":"Feng SY, Gangal V, Wei J, Chandar S, Vosoughi S, Mitamura T, Hovy E (2021) A survey of data augmentation approaches for nlp. In: Findings of the association for computational linguistics: ACL-IJCNLP 2021, pp 968\u2013988","DOI":"10.18653\/v1\/2021.findings-acl.84"},{"key":"11574_CR2","unstructured":"Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. 
Adv Neural Inf Process Syst 28"},{"key":"11574_CR3","doi-asserted-by":"crossref","unstructured":"Wang WY, Yang D (2015) That\u2019s so annoying!!!: A lexical and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using #petpeeve tweets. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp. 2557\u20132563. Association for Computational Linguistics, Lisbon, Portugal (2015)","DOI":"10.18653\/v1\/D15-1306"},{"key":"11574_CR4","doi-asserted-by":"crossref","unstructured":"Fadaee M, Bisazza A, Monz C (2017) Data augmentation for low-resource neural machine translation. In: Proceedings of the 55th annual meeting of the association for computational linguistics, vol 2, pp 567\u2013573. Association for Computational Linguistics, Vancouver, Canada","DOI":"10.18653\/v1\/P17-2090"},{"key":"11574_CR5","doi-asserted-by":"crossref","unstructured":"Kobayashi S (2018) Contextual augmentation: Data augmentation by words with paradigmatic relations. In: Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 2, pp 452\u2013457. Association for Computational Linguistics, New Orleans, Louisiana","DOI":"10.18653\/v1\/N18-2072"},{"key":"11574_CR6","doi-asserted-by":"crossref","unstructured":"Sennrich R, Haddow B, Birch A (2016) Improving neural machine translation models with monolingual data. In: Proceedings of the 54th annual meeting of the association for computational linguistics, vol 1, pp 86\u201396. Association for Computational Linguistics, Berlin, Germany","DOI":"10.18653\/v1\/P16-1009"},{"key":"11574_CR7","doi-asserted-by":"crossref","unstructured":"Kafle K, Yousefhussien M, Kanan C (2017) Data augmentation for visual question answering. In: Proceedings of the 10th international conference on natural language generation, pp 198\u2013202. 
Association for Computational Linguistics, Santiago de Compostela, Spain","DOI":"10.18653\/v1\/W17-3529"},{"issue":"1","key":"11574_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-016-0043-6","volume":"3","author":"K Weiss","year":"2016","unstructured":"Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):1\u201340","journal-title":"J Big Data"},{"key":"11574_CR9","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877\u20131901","journal-title":"Adv Neural Inf Process Syst"},{"key":"11574_CR10","doi-asserted-by":"crossref","unstructured":"Zheng L, Guha N, Anderson BR, Henderson P, Ho DE (2021) When does pretraining help? assessing self-supervised learning for law and the casehold dataset of 53,000+ legal holdings. In: Proceedings of the eighteenth international conference on artificial intelligence and law, pp 159\u2013168","DOI":"10.1145\/3462757.3466088"},{"key":"11574_CR11","first-page":"2","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford A, Wu J, Amodei D, Amodei D, Clark J, Brundage M, Sutskever I (2019) Better language models and their implications. OpenAI blog 1:2","journal-title":"OpenAI blog"},{"key":"11574_CR12","unstructured":"Kenton JDMWC, Toutanova LK (2019) Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp 4171\u20134186"},{"key":"11574_CR13","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T (2019) Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. 
arXiv preprint arXiv:1910.01108"},{"key":"11574_CR14","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692"},{"key":"11574_CR15","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781"},{"key":"11574_CR16","doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532\u20131543","DOI":"10.3115\/v1\/D14-1162"},{"key":"11574_CR17","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135\u2013146","journal-title":"Trans Assoc Comput Linguist"},{"issue":"2","key":"11574_CR18","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1016\/j.ipm.2010.07.003","volume":"47","author":"E Chen","year":"2011","unstructured":"Chen E, Lin Y, Xiong H, Luo Q, Ma H (2011) Exploiting probabilistic topic models to improve text categorization under class imbalance. Inf Process Manag 47(2):202\u2013214","journal-title":"Inf Process Manag"},{"key":"11574_CR19","unstructured":"Ratner AJ, De\u00a0Sa CM, Wu S, Selsam D, R\u00e9 C (2016) Data programming: Creating large training sets, quickly. Adv Neural Inf Process Syst 29"},{"key":"11574_CR20","doi-asserted-by":"crossref","unstructured":"Wei J, Zou K (2019) EDA: Easy data augmentation techniques for boosting performance on text classification tasks. 
In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 6382\u20136388. Association for Computational Linguistics, Hong Kong, China (2019)","DOI":"10.18653\/v1\/D19-1670"},{"key":"11574_CR21","doi-asserted-by":"crossref","unstructured":"Kafle K, Yousefhussien M, Kanan C (2017) Data augmentation for visual question answering. In: Proceedings of the 10th international conference on natural language generation, pp 198\u2013202","DOI":"10.18653\/v1\/W17-3529"},{"issue":"8","key":"11574_CR22","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u20131780","journal-title":"Neural Comput"},{"key":"11574_CR23","unstructured":"Guo H, Mao Y, Zhang R (2019) Augmenting data with mixup for sentence classification: an empirical study. arXiv"},{"key":"11574_CR24","unstructured":"Kaushik D, Hovy E, Lipton ZC (2019) Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434 (2019)"},{"key":"11574_CR25","doi-asserted-by":"crossref","unstructured":"Wu X, Lv S, Zang L, Han J, Hu S (2019) Conditional bert contextual augmentation. In: International conference on computational science, pp 84\u201395. Springer","DOI":"10.1007\/978-3-030-22747-0_7"},{"key":"11574_CR26","doi-asserted-by":"crossref","unstructured":"Elsahar H, Gravier C, Laforest F (2018) Zero-shot question generation from knowledge graphs for unseen predicates and entity types. arXiv preprint arXiv:1802.06842","DOI":"10.18653\/v1\/N18-1020"},{"key":"11574_CR27","unstructured":"Papanikolaou Y, Pierleoni A (2020) Dare: data augmented relation extraction with gpt-2. 
arXiv preprint arXiv:2004.13845"},{"key":"11574_CR28","unstructured":"Zhang D, Li T, Zhang H, Yin B (2020) On data augmentation for extreme multi-label classification. arXiv preprint arXiv:2009.10778"},{"key":"11574_CR29","unstructured":"Moradi M, Blagec K, Haberl F, Samwald M (2021) Gpt-3 models are poor few-shot learners in the biomedical domain. arXiv preprint arXiv:2109.02555"},{"issue":"4","key":"11574_CR30","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) Biobert: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234\u20131240","journal-title":"Bioinformatics"},{"key":"11574_CR31","doi-asserted-by":"crossref","unstructured":"Chen Z, Eavani H, Chen W, Liu Y, Wang WY (2019) Few-shot nlg with pre-trained language model. arXiv preprint arXiv:1904.09521","DOI":"10.18653\/v1\/2020.acl-main.18"},{"key":"11574_CR32","unstructured":"Edwards A, Ushio A, Camacho-Collados J, de Ribaupierre H, Preece A (2021) Guiding generative language models for data augmentation in few-shot text classification. arXiv preprint arXiv:2111.09064"},{"key":"11574_CR33","doi-asserted-by":"crossref","unstructured":"Anaby-Tavor A, Carmeli B, Goldbraich E, Kantor A, Kour G, Shlomov S, Tepper N, Zwerdling N (2020) Do not have enough data? Deep learning to the rescue! In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 7383\u20137390","DOI":"10.1609\/aaai.v34i05.6233"},{"key":"11574_CR34","doi-asserted-by":"crossref","unstructured":"Yoo KM, Park D, Kang J, Lee SW, Park W (2021) Gpt3mix: Leveraging large-scale language models for text augmentation. 
In: Findings of the association for computational linguistics: EMNLP 2021, pp 2225\u20132239","DOI":"10.18653\/v1\/2021.findings-emnlp.192"},{"key":"11574_CR35","unstructured":"Kumar V, Choudhary A, Cho E (2020) Data augmentation using pre-trained transformer models. In: Proceedings of the 2nd workshop on life-long learning for spoken language systems, pp 18\u201326. Association for Computational Linguistics, Suzhou, China (2020)"},{"key":"11574_CR36","doi-asserted-by":"crossref","unstructured":"Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2019) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"11574_CR37","unstructured":"Chen Y, Liu Y (2022) Rethinking data augmentation in text-to-text paradigm. In: Proceedings of the 29th international conference on computational linguistics, pp 1157\u20131162. International Committee on Computational Linguistics, Gyeongju, Republic of Korea"},{"key":"11574_CR38","unstructured":"Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer"},{"key":"11574_CR39","doi-asserted-by":"crossref","unstructured":"Okimura I, Reid M, Kawano M, Matsuo Y (2022) On the impact of data augmentation on downstream performance in natural language processing. In: Proceedings of the third workshop on insights from negative results in NLP, pp 88\u201393","DOI":"10.18653\/v1\/2022.insights-1.12"},{"issue":"1","key":"11574_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-021-00492-0","volume":"8","author":"C Shorten","year":"2021","unstructured":"Shorten C, Khoshgoftaar TM, Furht B (2021) Text data augmentation for deep learning. 
J Big Data 8(1):1\u201334","journal-title":"J Big Data"},{"key":"11574_CR41","doi-asserted-by":"crossref","unstructured":"Bayer M, Kaufhold MA, Reuter C (2021) A survey on data augmentation for text classification. ACM Comput Surv (2021)","DOI":"10.1145\/3544558"},{"issue":"1","key":"11574_CR42","doi-asserted-by":"publisher","first-page":"15","DOI":"10.14513\/actatechjaur.00628","volume":"15","author":"G Cs\u00e1nyi","year":"2022","unstructured":"Cs\u00e1nyi G, Orosz T (2022) Comparison of data augmentation methods for legal document classification. Acta Technica Jaurinensis 15(1):15\u201321","journal-title":"Acta Technica Jaurinensis"},{"key":"11574_CR43","doi-asserted-by":"crossref","unstructured":"Yan G, Li Y, Zhang S, Chen Z (2019) Data augmentation for deep learning of judgment documents. In: International conference on intelligent science and big data engineering, Springer, pp 232\u2013242","DOI":"10.1007\/978-3-030-36204-1_19"},{"key":"11574_CR44","doi-asserted-by":"crossref","unstructured":"Guo Z, Liu J, He T, Li Z, Zhangzhu P (2020) Taujud: test augmentation of machine learning in judicial documents. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 549\u2013552","DOI":"10.1145\/3395363.3404364"},{"key":"11574_CR45","unstructured":"Peric L, Mijic S, Stammbach D, Ash E (2020) Legal language modeling with transformers. In: Proceedings of the fourth workshop on automated semantic analysis of information in legal text (ASAIL 2020) held online in conjunction with the 33rd international conference on legal knowledge and information systems (JURIX 2020) December 9, 2020, vol 2764 (2020). CEUR-WS"},{"key":"11574_CR46","unstructured":"Nguyen HT, Nguyen LM (2021) Sublanguage: A serious issue affects pretrained models in legal domain. 
arXiv preprint arXiv:2104.07782"},{"key":"11574_CR47","doi-asserted-by":"crossref","unstructured":"Bonthu S, Dayal A, Lakshmi M, Rama\u00a0Sree S (2022) Effective text augmentation strategy for nlp models. In: Proceedings of third international conference on sustainable computing, pp 521\u2013531. Springer","DOI":"10.1007\/978-981-16-4538-9_51"},{"key":"11574_CR48","unstructured":"Agarap AF (2018) Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375"},{"key":"11574_CR49","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings (2015)"},{"key":"11574_CR50","unstructured":"Hsu H, Lachenbruch PA (2014) Paired t test. Wiley StatsRef: statistics reference online"},{"key":"11574_CR51","doi-asserted-by":"crossref","unstructured":"Yang Y, Malaviya C, Fernandez J, Swayamdipta S, Le\u00a0Bras R, Wang JP, Bhagavatula C, Choi Y, Downey D (2020) Generative data augmentation for commonsense reasoning. In: Findings of the association for computational linguistics: EMNLP 2020, pp 1008\u20131025. Association for Computational Linguistics","DOI":"10.18653\/v1\/2020.findings-emnlp.90"},{"key":"11574_CR52","doi-asserted-by":"crossref","unstructured":"Chalkidis I, Jana A, Hartung D, Bommarito M, Androutsopoulos I, Katz DM, Aletras N (2021) Lexglue: a benchmark dataset for legal language understanding in english. 
arXiv preprint arXiv:2110.00976","DOI":"10.2139\/ssrn.3936759"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11574-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11574-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11574-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T16:42:04Z","timestamp":1715877724000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11574-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,23]]},"references-count":52,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["11574"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11574-4","relation":{"is-referenced-by":[{"id-type":"doi","id":"10.1007\/s44163-026-00898-w","asserted-by":"object"}]},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,23]]},"assertion":[{"value":"15 February 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors certify that they have no involvement in any firm or entity with any financial or non-financial interest in the materials covered in this 
document.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"121"}}