{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T06:07:08Z","timestamp":1770271628862,"version":"3.49.0"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2021,8,4]],"date-time":"2021-08-04T00:00:00Z","timestamp":1628035200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,4]],"date-time":"2021-08-04T00:00:00Z","timestamp":1628035200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003407","name":"ministero dell\u2019istruzione, dell\u2019universit\u00e0 e della ricerca","doi-asserted-by":"publisher","award":["AIM 1852414-1-1"],"award-info":[{"award-number":["AIM 1852414-1-1"]}],"id":[{"id":"10.13039\/501100003407","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Universit\u00e0 degli Studi di Bari Aldo Moro"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Law"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In an era characterized by fast technological progress that introduces new unpredictable scenarios every day, working in the law field may appear very difficult, if not supported by the right tools. In this respect, some systems based on Artificial Intelligence methods have been proposed in the literature, to support several tasks in the legal sector. Following this line of research, in this paper we propose a novel method, called PRILJ, that identifies paragraph regularities in legal case judgments, to support legal experts during the redaction of legal documents. Methodologically, PRILJ adopts a two-step approach that first groups documents into clusters, according to their semantic content, and then identifies regularities in the paragraphs for each cluster. Embedding-based methods are adopted to properly represent documents and paragraphs into a semantic numerical feature space, and an Approximated Nearest Neighbor Search method is adopted to efficiently retrieve the most similar paragraphs with respect to the paragraphs of a document under preparation. Our extensive experimental evaluation, performed on a real-world dataset provided by EUR-Lex, proves the effectiveness and the efficiency of the proposed method. In particular, its ability of modeling different topics of legal documents, as well as of capturing the semantics of the textual content, appear very beneficial for the considered task, and make PRILJ very robust to the possible presence of noise in the data.<\/jats:p>","DOI":"10.1007\/s10506-021-09297-1","type":"journal-article","created":{"date-parts":[[2021,8,4]],"date-time":"2021-08-04T03:25:30Z","timestamp":1628047530000},"page":"359-390","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["PRILJ: an efficient two-step method based on embedding and clustering for the identification of regularities in legal case judgments"],"prefix":"10.1007","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3492-6317","authenticated-orcid":false,"given":"Graziella","family":"De Martino","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2520-3616","authenticated-orcid":false,"given":"Gianvito","family":"Pio","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6690-7583","authenticated-orcid":false,"given":"Michelangelo","family":"Ceci","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,8,4]]},"reference":[{"key":"9297_CR1","unstructured":"Berkhin P (2002) Survey of clustering data mining techniques. A Survey of Clustering Data Mining Techniques Grouping Multidimensional Data: Recent Advances in Clustering, vol 10"},{"key":"9297_CR2","unstructured":"Bernhardsson E (2015) Annoy at github. https:\/\/github.com\/spotify\/annoy"},{"key":"9297_CR3","doi-asserted-by":"crossref","unstructured":"Biagioli C, Francesconi E, Passerini A, Montemagni S, Soria C (2005) Automatic semantics extraction in law documents. In: The tenth international conference on artificial intelligence and law, proceedings of the conference, June 6-11, 2005, Bologna, Italy, ACM, pp 133\u2013140","DOI":"10.1145\/1165485.1165506"},{"key":"9297_CR4","doi-asserted-by":"crossref","unstructured":"Br\u00fcninghaus S, Ashley K (2001) Improving the representation of legal case texts with information extraction methods. In: Proceedings of the international conference on artificial intelligence and law, pp 42\u201351","DOI":"10.1145\/383535.383540"},{"key":"9297_CR5","doi-asserted-by":"publisher","first-page":"156053","DOI":"10.1109\/ACCESS.2020.3019095","volume":"8","author":"M Ceci","year":"2020","unstructured":"Ceci M, Corizzo R, Japkowicz N, Mignone P, Pio G (2020) ECHAD: embedding-based change detection from multivariate time series in smart grids. IEEE Access 8:156053\u2013156066","journal-title":"IEEE Access"},{"key":"9297_CR6","doi-asserted-by":"crossref","unstructured":"Chalkidis I, Fergadiotis M, Malakasiotis P, Aletras N, Androutsopoulos I (2020) LEGAL-BERT: The muppets straight out of law school. In: Findings of the association for computational linguistics: EMNLP 2020, Association for Computational Linguistics, Online, pp 2898\u20132904","DOI":"10.18653\/v1\/2020.findings-emnlp.261"},{"key":"#cr-split#-9297_CR7.1","doi-asserted-by":"crossref","unstructured":"Conrad JG, Al-Kofahi K, Zhao Y, Karypis G (2005) Effective document clustering for large heterogeneous law firm collections. In: Sartor G","DOI":"10.1145\/1165485.1165513"},{"key":"#cr-split#-9297_CR7.2","doi-asserted-by":"crossref","unstructured":"(ed) The tenth international conference on artificial intelligence and law, proceedings of the conference, June 6-11, 2005, Bologna, Italy, ACM, pp 177-187, 10.1145\/1165485.1165513, https:\/\/doi.org\/10.1145\/1165485.1165513","DOI":"10.1145\/1165485.1165513"},{"key":"9297_CR8","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1186\/s40537-019-0207-2","volume":"6","author":"R Corizzo","year":"2019","unstructured":"Corizzo R, Pio G, Ceci M, Malerba D (2019) DENCAST: distributed density-based clustering for multi-target regression. J Big Data 6:43","journal-title":"J Big Data"},{"key":"9297_CR9","doi-asserted-by":"publisher","first-page":"113378","DOI":"10.1016\/j.eswa.2020.113378","volume":"151","author":"R Corizzo","year":"2020","unstructured":"Corizzo R, Ceci M, Zdravevski E, Japkowicz N (2020) Scalable auto-encoders for gravitational waves detection from time series data. Expert Syst Appl 151:113378","journal-title":"Expert Syst Appl"},{"key":"9297_CR10","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171\u20134186"},{"key":"9297_CR11","doi-asserted-by":"crossref","unstructured":"Donghwa K, Seo D, Cho S, Kang P (2018) Multi-co-training for document classification using various document representations: Tf\u2013idf, lda, and doc2vec. Information Sciences, vol 477","DOI":"10.1016\/j.ins.2018.10.006"},{"key":"9297_CR12","unstructured":"Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining, AAAI Press, KDD\u201996, pp 226\u2013231"},{"key":"9297_CR13","doi-asserted-by":"crossref","unstructured":"Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Krishnapuram B, Shah M, Smola AJ, Aggarwal CC, Shen D, Rastogi R (eds) Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 13-17, 2016, ACM, pp 855\u2013864","DOI":"10.1145\/2939672.2939754"},{"key":"9297_CR14","doi-asserted-by":"crossref","unstructured":"Jin L, Schuler W (2015) A comparison of word similarity performance using explanatory and non-explanatory texts. In: Mihalcea R, Chai JY, Sarkar A (eds) NAACL HLT 2015, The 2015 conference of the north american chapter of the association for computational linguistics: human language technologies, Denver, Colorado, USA, May 31 - June 5, 2015, The Association for Computational Linguistics, pp 990\u2013994","DOI":"10.3115\/v1\/N15-1101"},{"key":"9297_CR15","doi-asserted-by":"publisher","first-page":"855","DOI":"10.14419\/ijet.v7i2.9657","volume":"7","author":"D Kachappilly","year":"2018","unstructured":"Kachappilly D, Wagh R (2018) Similarity analysis of court judgments using clustering of case citation data: a study. Int J Eng Technol 7:855","journal-title":"Int J Eng Technol"},{"key":"9297_CR16","doi-asserted-by":"crossref","unstructured":"Kumar A, Makhija P, Gupta A (2020) Noisy text data: Achilles\u2019 heel of bert. In: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pp 16\u201321","DOI":"10.18653\/v1\/2020.wnut-1.3"},{"key":"9297_CR17","doi-asserted-by":"crossref","unstructured":"Kumar S, Reddy PK, Reddy VB, Singh A (2011) Similarity analysis of legal judgments. In: Proceedings of the 4th Bangalore Annual Compute Conference, Compute 2011, Bangalore, India, March 25-26, 2011, ACM, p\u00a017","DOI":"10.1145\/1980422.1980439"},{"key":"9297_CR18","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1007\/978-3-642-37134-9_9","volume-title":"Databases in networked information systems","author":"S Kumar","year":"2013","unstructured":"Kumar S, Reddy PK, Reddy VB, Suri M (2013) Finding similar legal judgements under common law system. In: Madaan A, Kikuchi S, Bhalla S (eds) Databases in networked information systems. Springer, Berlin Heidelberg, pp 103\u2013116"},{"key":"9297_CR19","unstructured":"Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: 31st International conference on machine learning, ICML 2014 4"},{"key":"9297_CR20","unstructured":"Li W, Zhang Y, Sun Y, Wang W, Zhang W, Lin X (2016) Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0). CoRR"},{"key":"9297_CR21","doi-asserted-by":"crossref","unstructured":"Lu Q, Conrad JG, Al-Kofahi K, Keenan W (2011) Legal document clustering with built-in topic segmentation. In: Proceedings of the 20th ACM conference on information and knowledge management, CIKM 2011, Glasgow, United Kingdom, October 24-28, 2011, ACM, pp 383\u2013392","DOI":"10.1145\/2063576.2063636"},{"key":"9297_CR22","doi-asserted-by":"crossref","unstructured":"Mandal A, Chaki R, Saha S, Ghosh K, Pal A, Ghosh S (2017) Measuring similarity among legal court case documents. In: Proceedings of the 10th Annual ACM India Compute Conference, Association for Computing Machinery, Compute \u201917, pp 1\u20139","DOI":"10.1145\/3140107.3140119"},{"key":"9297_CR23","unstructured":"Maxwell KT, Schafer B (2008) Concept and context in legal information retrieval. In: Francesconi E, Sartor G, Tiscornia D (eds) Legal knowledge and information systems - JURIX 2008: the twenty-first annual conference on legal knowledge and information systems, Florence, Italy, 10-13 December 2008, IOS Press, Frontiers in Artificial Intelligence and Applications, vol 189, pp 63\u201372"},{"key":"9297_CR24","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1007\/s10506-019-09255-y","volume":"28","author":"M Medvedeva","year":"2020","unstructured":"Medvedeva M, Vols M, Wieling M (2020) Using machine learning to predict decisions of the european court of human rights. Artif Intell Law 28:237\u2013266","journal-title":"Artif Intell Law"},{"key":"9297_CR25","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26:3111\u20133119"},{"key":"9297_CR26","unstructured":"Mi\u00f1arro-Gim\u00e9nez JA, Mar\u00edn-Alonso O, Samwald M (2015) Applying deep learning techniques on medical corpora from the world wide web: a prototypical system and evaluation. CoRR"},{"key":"9297_CR27","doi-asserted-by":"crossref","unstructured":"Minocha A, Singh N, Srivastava A (2015) Finding relevant indian judgments using dispersion of citation network. In: Proceedings of the 24th International Conference on World Wide Web, Association for Computing Machinery, pp 1085\u20131088","DOI":"10.1145\/2740908.2744717"},{"key":"9297_CR28","unstructured":"Pio G, Ceci M, Loglisci C, D\u2019Elia D, Malerba D (2012) Hierarchical and overlapping co-clustering of mrna: mirna interactions. In: Raedt LD, Bessiere C, Dubois D, Doherty P, Frasconi P, Heintz F, Lucas PJF (eds) ECAI 2012 - 20th European conference on artificial intelligence. Including prestigious applications of artificial intelligence (PAIS-2012) system demonstrations track, Montpellier, France, August 27-31 , 2012, IOS Press, Frontiers in Artificial Intelligence and Applications, vol 242, pp 654\u2013659"},{"issue":"6","key":"9297_CR29","doi-asserted-by":"publisher","first-page":"1231","DOI":"10.1007\/s10994-019-05861-8","volume":"109","author":"G Pio","year":"2020","unstructured":"Pio G, Ceci M, Prisciandaro F, Malerba D (2020) Exploiting causality in gene network reconstruction based on graph embedding. Mach Learn 109(6):1231\u20131279","journal-title":"Mach Learn"},{"key":"9297_CR30","doi-asserted-by":"crossref","unstructured":"Raghav K, Reddy P, Reddy V, Krishna\u00a0RP (2015) Text and citations based cluster analysis of legal judgments. In: Mining Intelligence and Knowledge Exploration, Springer International Publishing, pp 449\u2013459","DOI":"10.1007\/978-3-319-26832-3_42"},{"key":"9297_CR31","doi-asserted-by":"crossref","unstructured":"Shao Y, Mao J, Liu Y, Ma W, Satoh K, Zhang M, Ma S (2020) Bert-pli: Modeling paragraph-level interactions for legal case retrieval. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI-20, pp 3501\u20133507","DOI":"10.24963\/ijcai.2020\/484"},{"issue":"1","key":"9297_CR32","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1007\/s10506-017-9197-6","volume":"25","author":"O Shulayeva","year":"2017","unstructured":"Shulayeva O, Siddharthan A, Wyner A (2017) Recognizing cited facts and principles in legal judgements. Artif Intell Law 25(1):107\u2013126","journal-title":"Artif Intell Law"},{"key":"9297_CR33","doi-asserted-by":"publisher","first-page":"791","DOI":"10.1016\/j.ipm.2004.04.015","volume":"40","author":"M Silveira","year":"2004","unstructured":"Silveira M, Ribeiro-neto B (2004) Concept-based ranking: A case study in the juridical domain. Inf Process Manage 40:791\u2013805","journal-title":"Inf Process Manage"},{"key":"9297_CR34","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1561\/2200000013","volume":"4","author":"C Sutton","year":"2012","unstructured":"Sutton C, McCallum A (2012) An introduction to conditional random fields. Found Trends Mach Learn 4:267\u2013373","journal-title":"Found Trends Mach Learn"},{"key":"9297_CR35","unstructured":"Thenmozhi D, Kannan K, Aravindan C (2017) A text similarity approach for precedence retrieval from legal documents. In: Working notes of FIRE 2017 - Forum for Information Retrieval Evaluation, Bangalore, India, December 8-10, 2017, CEUR-WS.org, CEUR Workshop Proceedings, vol 2036, pp 90\u201391"},{"key":"9297_CR36","unstructured":"Tomlinson S, Oard DW, Baron JR, Thompson P (2007) Overview of the TREC 2007 legal track. In: Proceedings of The Sixteenth Text REtrieval Conference, TREC 2007, Gaithersburg, Maryland, USA, November 5-9, 2007, National Institute of Standards and Technology (NIST), NIST Special Publication, vol 500-274"},{"key":"9297_CR37","unstructured":"Trompper M, Winkels R (2016) Automatic assignment of section structure to texts of dutch court judgments. In: Legal Knowledge and Information Systems - JURIX 2016: The Twenty-Ninth Annual Conference, IOS Press, Frontiers in Artificial Intelligence and Applications, vol 294, pp 167\u2013172"},{"key":"9297_CR38","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1007\/s10618-005-0361-3","volume":"10","author":"Y Zhao","year":"2005","unstructured":"Zhao Y, Karypis G, Fayyad U (2005) Hierarchical clustering algorithms for document datasets. Data Min Knowl Discov 10:141\u2013168","journal-title":"Data Min Knowl Discov"},{"key":"9297_CR39","doi-asserted-by":"crossref","unstructured":"Zhong H, Xiao C, Tu C, Zhang T, Liu Z, Sun M (2020) How does NLP benefit legal system: A summary of legal artificial intelligence. CoRR arXiv:2004.12158","DOI":"10.18653\/v1\/2020.acl-main.466"}],"container-title":["Artificial Intelligence and Law"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10506-021-09297-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10506-021-09297-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10506-021-09297-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,26]],"date-time":"2022-08-26T06:10:20Z","timestamp":1661494220000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10506-021-09297-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,4]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["9297"],"URL":"https:\/\/doi.org\/10.1007\/s10506-021-09297-1","relation":{},"ISSN":["0924-8463","1572-8382"],"issn-type":[{"value":"0924-8463","type":"print"},{"value":"1572-8382","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,4]]},"assertion":[{"value":"10 July 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}