{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T21:14:21Z","timestamp":1775164461908,"version":"3.50.1"},"reference-count":107,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T00:00:00Z","timestamp":1648598400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Word embeddings have become important building blocks that are used profoundly in natural language processing (NLP). Despite their several advantages, word embeddings can unintentionally accommodate some gender- and ethnicity-based biases that are present within the corpora they are trained on. Therefore, ethical concerns have been raised since word embeddings are extensively used in several high-level algorithms. Studying such biases and debiasing them have recently become an important research endeavor. Various studies have been conducted to measure the extent of bias that word embeddings capture and to eradicate them. Concurrently, as another subfield that has started to gain traction recently, the applications of NLP in the field of law have started to increase and develop rapidly. As law has a direct and utmost effect on people\u2019s lives, the issues of bias for NLP applications in legal domain are certainly important. However, to the best of our knowledge, bias issues have not yet been studied in the context of legal corpora. In this article, we approach the gender bias problem from the scope of legal text processing domain. 
Word embedding models that are trained on corpora composed by legal documents and legislation from different countries have been utilized to measure and eliminate gender bias in legal documents. Several methods have been employed to reveal the degree of gender bias and observe its variations over countries. Moreover, a debiasing method has been used to neutralize unwanted bias. The preservation of semantic coherence of the debiased vector space has also been demonstrated by using high-level tasks. Finally, overall results and their implications have been discussed in the scope of NLP in legal domain.<\/jats:p>","DOI":"10.1017\/s1351324922000122","type":"journal-article","created":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T08:55:52Z","timestamp":1648630552000},"page":"449-482","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":11,"title":["Gender bias in legal corpora and debiasing it"],"prefix":"10.1017","volume":"29","author":[{"given":"Nurullah","family":"Sevim","sequence":"first","affiliation":[]},{"given":"Furkan","family":"\u015eahinu\u00e7","sequence":"additional","affiliation":[]},{"given":"Aykut","family":"Ko\u00e7","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,3,30]]},"reference":[{"key":"S1351324922000122_ref78","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"S1351324922000122_ref2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(03)00105-X"},{"key":"S1351324922000122_ref77","author":"Perez","year":"2019"},{"key":"S1351324922000122_ref44","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1141"},{"key":"S1351324922000122_ref89","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-2208"},{"key":"S1351324922000122_ref82","doi-asserted-by":"crossref","unstructured":"Rudinger, R. , Naradowsky, J. , Leonard, B. and Van Durme, B. (2018). Gender bias in coreference resolution. 
In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana. Association for Computational Linguistics.","DOI":"10.18653\/v1\/N18-2002"},{"key":"S1351324922000122_ref30","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287572"},{"key":"S1351324922000122_ref40","unstructured":"Galgani, F. , Compton, P. and Hoffmann, A. (2012). Combining different summarization techniques for legal text. In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data (HYBRID), USA. Association for Computational Linguistics, pp. 115\u2013123."},{"key":"S1351324922000122_ref29","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324918000475"},{"key":"S1351324922000122_ref9","unstructured":"Bartl, M. , Nissim, M. and Gatt, A. (2020). Unmasking contextual stereotypes: Measuring and mitigating BERT\u2019s gender bias. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, Spain (Online). Barcelona: Association for Computational Linguistics, pp. 1\u201316."},{"key":"S1351324922000122_ref58","first-page":"558","author":"Long","year":"2019"},{"key":"S1351324922000122_ref13","unstructured":"Bolukbasi, T. , Chang, K.-W. , Zou, J. , Saligrama, V. and Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), Red Hook, NY, USA. Curran Associates Inc., pp. 4356\u20134364."},{"key":"S1351324922000122_ref22","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2041"},{"key":"S1351324922000122_ref28","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1061"},{"key":"S1351324922000122_ref49","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0174698"},{"key":"S1351324922000122_ref69","unstructured":"Murphy, B. , Talukdar, P. and Mitchell, T. (2012). 
Learning effective and interpretable semantic models using non-negative sparse embedding. In Proceedings of COLING, Mumbai, India. The COLING 2012 Organizing Committee, pp. 1933\u20131950."},{"key":"S1351324922000122_ref36","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490600427X"},{"key":"S1351324922000122_ref64","unstructured":"Mikolov, T. , Chen, K. , Corrado, G. and Dean, J. (2013a). Efficient estimation of word representations in vector space. In Bengio Y. and LeCun Y. (eds), 1st International Conference on Learning Representations (ICLR), Workshop Track Proceedings, Scottsdale, Arizona, USA."},{"key":"S1351324922000122_ref37","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1144"},{"key":"S1351324922000122_ref96","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324919000111"},{"key":"S1351324922000122_ref21","doi-asserted-by":"publisher","DOI":"10.1145\/3086512.3086515"},{"key":"S1351324922000122_ref4","doi-asserted-by":"publisher","DOI":"10.1016\/0020-7373(91)90011-U"},{"key":"S1351324922000122_ref5","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114920"},{"key":"S1351324922000122_ref17","doi-asserted-by":"publisher","DOI":"10.1126\/science.aal4230"},{"key":"S1351324922000122_ref73","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-018-9225-1"},{"key":"S1351324922000122_ref54","doi-asserted-by":"crossref","unstructured":"Lai, S. , Xu, L. , Liu, K. and Zhao, J. (2015). Recurrent convolutional neural networks for text classification. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI Press, pp. 
2267\u20132273.","DOI":"10.1609\/aaai.v29i1.9513"},{"key":"S1351324922000122_ref32","doi-asserted-by":"publisher","DOI":"10.1145\/3278721.3278729"},{"key":"S1351324922000122_ref46","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1588"},{"key":"S1351324922000122_ref14","first-page":"465","author":"Branting","year":"2018"},{"key":"S1351324922000122_ref60","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1062"},{"key":"S1351324922000122_ref68","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2021.102684"},{"key":"S1351324922000122_ref62","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1063"},{"key":"S1351324922000122_ref51","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S18-2005"},{"key":"S1351324922000122_ref38","volume-title":"Lecture Notes in Computer Science","volume":"6036","author":"Francesconi","year":"2010"},{"key":"S1351324922000122_ref106","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1521"},{"key":"S1351324922000122_ref107","doi-asserted-by":"publisher","DOI":"10.1038\/d41586-018-05707-8"},{"key":"S1351324922000122_ref85","doi-asserted-by":"crossref","unstructured":"Sartor, G. and Rotolo, A. (2013). Agreement Technologies, Chapter AI and Law. New York: Springer, pp. 
199\u2013207.","DOI":"10.1007\/978-94-007-5583-3_13"},{"key":"S1351324922000122_ref27","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324916000334"},{"key":"S1351324922000122_ref39","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1113"},{"key":"S1351324922000122_ref55","first-page":"272","volume-title":"International Conference on Semantic Systems","author":"Leitner","year":"2019"},{"key":"S1351324922000122_ref66","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S18-1001"},{"key":"S1351324922000122_ref23","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-2209"},{"key":"S1351324922000122_ref48","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1160"},{"key":"S1351324922000122_ref45","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"S1351324922000122_ref95","volume-title":"IBM China Research","author":"Tang","year":"2016"},{"key":"S1351324922000122_ref11","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-012-9131-x"},{"key":"S1351324922000122_ref74","doi-asserted-by":"crossref","unstructured":"O\u2019Neill, J. , Buitelaar, P. , Robin, C. and O\u2019Brien, L. (2017). Classifying sentential modality in legal language: A use case in financial regulations, acts and directives. In Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law (ICAIL), New York, NY, USA. Association for Computing Machinery, pp. 159\u2013168.","DOI":"10.1145\/3086512.3086528"},{"key":"S1351324922000122_ref98","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000406"},{"key":"S1351324922000122_ref99","doi-asserted-by":"crossref","unstructured":"Vardhan, H. , Surana, N. and Tripathy, B. (2020). Named-entity recognition for legal documents. In International Conference on Advanced Machine Learning Technologies and Applications. Springer, pp. 
469\u2013479.","DOI":"10.1007\/978-981-15-3383-9_43"},{"key":"S1351324922000122_ref47","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2068"},{"key":"S1351324922000122_ref3","unstructured":"Ashley, K.D. (1988). Modelling Legal Argument: Reasoning with Cases and Hypotheticals. PhD thesis, University of Massachusetts, USA. Order No: GAX88-13198."},{"key":"S1351324922000122_ref19","volume-title":"Lecture Notes in Computer Science","volume":"8929","author":"Casanovas","year":"2013"},{"key":"S1351324922000122_ref65","unstructured":"Mikolov, T. , Sutskever, I. , Chen, K. , Corrado, G. and Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS) - Volume 2, Red Hook, NY, USA. Curran Associates Inc., pp. 3111\u20133119."},{"key":"S1351324922000122_ref43","doi-asserted-by":"publisher","DOI":"10.1023\/A:1019516031847"},{"key":"S1351324922000122_ref15","unstructured":"Brunet, M.-E. , Alkalay-Houlihan, C. , Anderson, A. and Zemel, R. (2019). Understanding the origins of bias in word embeddings. In Chaudhuri, K. and Salakhutdinov, R. (eds), Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97. PMLR, pp. 803\u2013811."},{"key":"S1351324922000122_ref12","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-021-09881-2"},{"key":"S1351324922000122_ref1","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.93"},{"key":"S1351324922000122_ref31","unstructured":"Devlin, J. , Chang, M.-W. , Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computational Linguistics, pp. 
4171\u20134186."},{"key":"S1351324922000122_ref81","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"S1351324922000122_ref67","unstructured":"Morimoto, A. , Kubo, D. , Sato, M. , Shindo, H. and Matsumoto, Y. (2017). Legal question answering system using neural attention. In Satoh K., Kim M., Kano Y., Goebel R. and Oliveira T. (eds), 4th Competition on Legal Information Extraction and Entailment (COLIEE), held in conjunction with the 16th International Conference on Artificial Intelligence and Law (ICAIL) in King\u2019s College London, UK, EPiC Series in Computing, vol. 47. EasyChair, pp. 79\u201389."},{"key":"S1351324922000122_ref86","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000315"},{"key":"S1351324922000122_ref61","doi-asserted-by":"publisher","DOI":"10.1017\/S1537592704040502"},{"key":"S1351324922000122_ref25","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.3936759"},{"key":"S1351324922000122_ref35","doi-asserted-by":"publisher","DOI":"10.1145\/3299819.3299846"},{"key":"S1351324922000122_ref53","first-page":"4066","volume-title":"Advances in Neural Information Processing Systems 
30","author":"Kusner","year":"2017"},{"key":"S1351324922000122_ref6","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-009-9077-9"},{"key":"S1351324922000122_ref41","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1720347115"},{"key":"S1351324922000122_ref63","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-019-09255-y"},{"key":"S1351324922000122_ref79","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000170"},{"key":"S1351324922000122_ref84","doi-asserted-by":"publisher","DOI":"10.1109\/ICoAC.2017.7951772"},{"key":"S1351324922000122_ref91","doi-asserted-by":"publisher","DOI":"10.26615\/978-954-452-049-6_092"},{"key":"S1351324922000122_ref80","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-3810"},{"key":"S1351324922000122_ref93","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324905004080"},{"key":"S1351324922000122_ref16","doi-asserted-by":"publisher","DOI":"10.2307\/1227753"},{"key":"S1351324922000122_ref52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-3823"},{"key":"S1351324922000122_ref75","unstructured":"O\u2019Sullivan, C. and Beel, J. (2019). Predicting the outcome of judicial decisions made by the European Court of Human Rights. In Proceedings of the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland."},{"key":"S1351324922000122_ref50","first-page":"282","author":"Kim","year":"2017"},{"key":"S1351324922000122_ref90","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1164"},{"key":"S1351324922000122_ref94","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1146"},{"key":"S1351324922000122_ref18","doi-asserted-by":"publisher","DOI":"10.1145\/3086512.3086514"},{"key":"S1351324922000122_ref42","doi-asserted-by":"crossref","unstructured":"Gonen, H. and Goldberg, Y. (2019). Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. Computing Research Repository, arXiv:1903.03862. 
version 2.","DOI":"10.18653\/v1\/N19-1061"},{"key":"S1351324922000122_ref59","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-99722-3_32"},{"key":"S1351324922000122_ref83","doi-asserted-by":"publisher","DOI":"10.2307\/4099370"},{"key":"S1351324922000122_ref97","doi-asserted-by":"publisher","DOI":"10.3115\/1119176.1119195"},{"key":"S1351324922000122_ref88","doi-asserted-by":"publisher","DOI":"10.1109\/RE.2018.00022"},{"key":"S1351324922000122_ref20","unstructured":"Chalkidis, I. and Androutsopoulos, I. (2017). A deep learning approach to contract element extraction. In Wyner A.Z. and Casini, G. (eds), Legal Knowledge and Information Systems - (JURIX): The Thirtieth Annual Conference, Frontiers in Artificial Intelligence and Applications, vol. 302, Luxembourg. IOS Press, pp. 155\u2013164."},{"key":"S1351324922000122_ref102","doi-asserted-by":"publisher","DOI":"10.1145\/3278721.3278779"},{"key":"S1351324922000122_ref105","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1323"},{"key":"S1351324922000122_ref26","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-018-9238-9"},{"key":"S1351324922000122_ref8","doi-asserted-by":"publisher","DOI":"10.1145\/2425327.2425330"},{"key":"S1351324922000122_ref33","unstructured":"Do, P.-K. , Nguyen, H.-T. , Tran, C.-X. , Nguyen, M.-T. and Nguyen, M.-L. (2017). Legal question answering using ranking svm and deep convolutional neural network. arXiv preprint arXiv:1703.05320."},{"key":"S1351324922000122_ref57","doi-asserted-by":"publisher","DOI":"10.1145\/3372124.3372128"},{"key":"S1351324922000122_ref71","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324919000305"},{"key":"S1351324922000122_ref76","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"S1351324922000122_ref56","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.488"},{"key":"S1351324922000122_ref72","unstructured":"Nejadgholi, I. , Bougueng, R. and Witherspoon, S. (2017). 
A semi-supervised training method for semantic search of legal facts in Canadian immigration cases. In Wyner, A.Z. and Casini G. (eds), Legal Knowledge and Information Systems - (JURIX): The Thirtieth Annual Conference, Luxembourg, 13\u201315 December 2017, Frontiers in Artificial Intelligence and Applications, vol. 302. IOS Press, pp. 125\u2013134."},{"key":"S1351324922000122_ref34","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12837-0_2"},{"key":"S1351324922000122_ref70","unstructured":"Nanda, R. , John, A.K. , Caro, L.D. , Boella, G. and Robaldo, L. (2017). Legal information retrieval using topic clustering and neural networks. In Satoh K., Kim M.-Y., Kano Y., Goebel R. and Oliveira T. (eds), 4th Competition on Legal Information Extraction and Entailment (COLIEE), EPiC Series in Computing, vol. 47. EasyChair, pp. 68\u201378."},{"key":"S1351324922000122_ref24","unstructured":"Chalkidis, I. , Fergadiotis, M. , Malakasiotis, P. , Aletras, N. and Androutsopoulos, I. (2020). Legal-bert: \u201cPreparing the muppets for court\u201d. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 2898\u20132904."},{"key":"S1351324922000122_ref7","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000029"},{"key":"S1351324922000122_ref92","first-page":"13230","author":"Tan","year":"2019"},{"key":"S1351324922000122_ref100","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2018.10348"},{"key":"S1351324922000122_ref101","doi-asserted-by":"publisher","DOI":"10.1145\/3086512.3086531"},{"key":"S1351324922000122_ref103","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1064"},{"key":"S1351324922000122_ref87","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-017-9197-6"},{"key":"S1351324922000122_ref10","unstructured":"Baziotis, C. and Jafari, B. 2018. ntua-slp-semeval2018. 
https:\/\/github.com\/cbaziotis\/ntua-slp-semeval2018."},{"key":"S1351324922000122_ref104","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2003"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324922000122","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,13]],"date-time":"2023-03-13T04:19:52Z","timestamp":1678681192000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324922000122\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,30]]},"references-count":107,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["S1351324922000122"],"URL":"https:\/\/doi.org\/10.1017\/s1351324922000122","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,30]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}