{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T05:01:49Z","timestamp":1772773309504,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T00:00:00Z","timestamp":1645056000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T00:00:00Z","timestamp":1645056000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Legal professionals strongly demand an automatic and convenient legal document recommendation system (LDRS) to identify similar judgments for preparing the advantageous and strategic arguments in the Court. Doc2Vec excellently learns semantically rich embedding (i.e., vector) space from the textual information of judgment corpus. During Doc2Vec learning, the practice of prior domain-specific knowledge can potentially enhance the embedding representation. This research thus proposes a pre-learned word embedding based LDRS (P-LDRS) that learns the Doc2Vec embedding using Legal domain-specific pre-learned word embedding possessing the Legal semantic knowledge. However, learning the judgment embedding from existing substantial Legal documents turns out to be a scalability issue for Doc2Vec. The proposed P-LDRS also provides additional functionality to learn the judgment embedding distributedly over the cluster of computing nodes using frameworks like MapReduce and Spark to address the scalability issue. The empirical analysis is performed with a non-distributed and a distributed variant of the proposed P-LDRS to validate the effectiveness and scalability. Experiment results showcase that proposed non-distributed P-LDRS perform significantly better than traditional Doc2Vec based LDRS with an Accuracy of 0.88, F1-Score of 0.82 and MCC Score of 0.73. They also demonstrate that the proposed distributed P-LDRS improves the time efficiency and achieves stable Accuracy of <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\approx $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u2248<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>0.88, F1-Score of <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\approx $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u2248<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>0.83 and MCC Score of <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\approx $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u2248<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>0.72, with an increasing number of nodes.<\/jats:p>","DOI":"10.1007\/s40747-022-00673-1","type":"journal-article","created":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T03:02:33Z","timestamp":1645066953000},"page":"3199-3213","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Effective and scalable legal judgment recommendation using pre-learned word embedding"],"prefix":"10.1007","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6783-4435","authenticated-orcid":false,"given":"Jenish","family":"Dhanani","sequence":"first","affiliation":[]},{"given":"Rupa","family":"Mehta","sequence":"additional","affiliation":[]},{"given":"Dipti","family":"Rana","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,2,17]]},"reference":[{"issue":"Jan","key":"673_CR1","first-page":"993","volume":"3","author":"D Blei","year":"2003","unstructured":"Blei D, Ng A, Jordan M (2003) Latent dirichlet allocation. J Mach Learn Res 3(Jan):993\u20131022","journal-title":"J Mach Learn Res"},{"key":"673_CR2","doi-asserted-by":"crossref","unstructured":"Chakrabarti D, Patodia N, Bhattacharya U, Mitra I, Roy S, Mandi J, Roy N, Nandy P (2018) Use of artificial intelligence to analyse risk in legal documents for a better decision support. In: TENCON 2018-2018 IEEE region 10 conference, IEEE, pp 683\u2013688","DOI":"10.1109\/TENCON.2018.8650382"},{"issue":"2","key":"673_CR3","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s10506-018-9238-9","volume":"27","author":"I Chalkidis","year":"2019","unstructured":"Chalkidis I, Kampas D (2019) Deep learning in law: early adaptation and legal word embeddings trained on large corpora. Artificial Intell Law 27(2):171\u2013198","journal-title":"Artificial Intell Law"},{"key":"673_CR4","doi-asserted-by":"publisher","first-page":"132027","DOI":"10.1109\/ACCESS.2019.2937220","volume":"7","author":"LLH Chang","year":"2019","unstructured":"Chang LLH, Phoa FKH, Nakano J (2019) A new metric for the analysis of the scientific article citation network. IEEE Access 7:132027\u2013132032","journal-title":"IEEE Access"},{"issue":"35","key":"673_CR5","first-page":"1","volume":"10","author":"D Chicco","year":"2017","unstructured":"Chicco D (2017) Ten quick tips for machine learning in computational biology. BioData Min 10(35):1\u201317","journal-title":"BioData Min"},{"issue":"1","key":"673_CR6","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107\u2013113","journal-title":"Commun ACM"},{"key":"673_CR7","doi-asserted-by":"crossref","unstructured":"Dhanani J, Mehta R, Rana D, Tidke B (2018) Sentiment analysis using novel distributed word embedding for movie reviews. In: proceedings of 10th International Conference on Advanced Computing (ICoAC), IEEE, pp 138\u2013145","DOI":"10.1109\/ICoAC44903.2018.8939104"},{"issue":"5","key":"673_CR8","doi-asserted-by":"publisher","first-page":"5497","DOI":"10.3233\/JIFS-189871","volume":"41","author":"J Dhanani","year":"2021","unstructured":"Dhanani J, Mehta R, Rana D (2021) Legal document recommendation system: a cluster based pairwise similarity computation. J Intell Fuzzy Syst 41(5):5497\u20135509","journal-title":"J Intell Fuzzy Syst"},{"key":"673_CR9","unstructured":"Farhangi A (2018) Legal domain-specific pre-trained word vectors. https:\/\/github.com\/ashkonf\/LeGloVe"},{"key":"673_CR10","doi-asserted-by":"crossref","unstructured":"Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 855\u2013864","DOI":"10.1145\/2939672.2939754"},{"key":"673_CR11","doi-asserted-by":"crossref","unstructured":"Guo C, Lu M, Wei W (2019) An improved lda topic modeling method based on partition for medium and long texts. Ann Data Sci pp 1\u201314","DOI":"10.1007\/s40745-019-00218-3"},{"key":"673_CR12","unstructured":"Ji S, Satish N, Li S, Dubey P (2016) Parallelizing word2vec in shared and distributed memory. arXiv preprint arXiv:1604.04661"},{"issue":"2","key":"673_CR13","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1093\/comnet\/cnx029","volume":"6","author":"M Koniaris","year":"2017","unstructured":"Koniaris M, Anagnostopoulos I, Vassiliou Y (2017) Network analysis in the legal domain: a complex model for European Union legal sources. J Complex Netw 6(2):243\u2013268","journal-title":"J Complex Netw"},{"key":"673_CR14","doi-asserted-by":"crossref","unstructured":"Kumar S, Reddy PK, Reddy VB, Singh A (2011) Similarity analysis of legal judgments. In: Proceedings of the fourth annual ACM Bangalore conference, pp 1\u20134","DOI":"10.1145\/1980422.1980439"},{"key":"673_CR15","doi-asserted-by":"crossref","unstructured":"Kumar S, Reddy PK, Reddy VB, Suri M (2013) Finding similar legal judgements under common law system. In: International Workshop on Databases in Networked Information Systems, Springer, pp 103\u2013116","DOI":"10.1007\/978-3-642-37134-9_9"},{"key":"673_CR16","doi-asserted-by":"crossref","unstructured":"Lau JH, Baldwin T (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv preprint arXiv:1607.05368","DOI":"10.18653\/v1\/W16-1609"},{"key":"673_CR17","unstructured":"Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188\u20131196"},{"issue":"2","key":"673_CR18","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1007\/s10506-018-9224-2","volume":"26","author":"G Leibon","year":"2018","unstructured":"Leibon G, Livermore M, Harder R, Riddell A, Rockmore D (2018) Bending the law: geometric tools for quantifying influence in the multinetwork of legal opinions. Artificial Intell Law 26(2):145\u2013167","journal-title":"Artificial Intell Law"},{"issue":"10","key":"673_CR19","first-page":"883","volume":"13","author":"S Lodha","year":"2019","unstructured":"Lodha S, Wagh R (2019) Exploratory analysis of legal case citation data using node embedding. ICIC Express Lett 13(10):883\u2013889","journal-title":"ICIC Express Lett"},{"key":"673_CR20","doi-asserted-by":"crossref","unstructured":"Mandal A, Chaki R, Saha S, Ghosh K, Pal A, Ghosh S (2017) Measuring similarity among legal court case documents. In: Proceedings of the 10th annual ACM India compute conference, ACM, pp 1\u20139","DOI":"10.1145\/3140107.3140119"},{"issue":"4","key":"673_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/app9040743","volume":"9","author":"S Martin\u010di\u0107-Ip\u0161i\u0107","year":"2019","unstructured":"Martin\u010di\u0107-Ip\u0161i\u0107 S, Mili\u010di\u0107 T, Todorovski L (2019) The influence of feature representation of text on the performance of document classification. Appl Sci 9(4):1\u201327","journal-title":"Appl Sci"},{"key":"673_CR22","unstructured":"Mihalcea R, Tarau P (2004) Textrank: Bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing, pp 404\u2013411"},{"key":"673_CR23","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781"},{"key":"673_CR24","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111\u20133119"},{"key":"673_CR25","doi-asserted-by":"crossref","unstructured":"Mou L, Meng Z, Yan R, Li G, Xu Y, Zhang L, Jin Z (2016) How transferable are neural networks in nlp applications? In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 479\u2013489","DOI":"10.18653\/v1\/D16-1046"},{"key":"673_CR26","unstructured":"Nanda R, Adebayo KJ, Di\u00a0Caro L, Boella G, Robaldo L (2017) Legal information retrieval using topic clustering and neural networks. In: COLIEE@ ICAIL, pp 68\u201378"},{"key":"673_CR27","doi-asserted-by":"crossref","unstructured":"Ordentlich E, Yang L, Feng A, Cnudde P, Grbovic M, Djuric N, Radosavljevic V, Owens G (2016) Network-efficient distributed word2vec training system for large vocabularies. In: Proceedings of the 25th ACM international on conference on information and knowledge management, pp 1139\u20131148","DOI":"10.1145\/2983323.2983361"},{"key":"673_CR28","first-page":"302","volume":"2017","author":"K Patel","year":"2017","unstructured":"Patel K, Patel D, Golakiya M, Bhattacharyya P, Birari N (2017) Adapting pre-trained word embeddings for use in medical coding. BioNLP 2017:302\u2013306","journal-title":"BioNLP"},{"key":"673_CR29","doi-asserted-by":"crossref","unstructured":"Raghav K, Reddy PB, Reddy VB, Reddy PK (2015) Text and citations based analysis of legal judgments. In: International Conference on Mining Intelligence and Knowledge Exploration, Springer, pp 449\u2013459","DOI":"10.1007\/978-3-319-26832-3_42"},{"key":"673_CR30","doi-asserted-by":"crossref","unstructured":"Sugathadasa K, Ayesha\u00a0et al B (2017) Synergistic union of word2vec and lexicon for domain specific semantic similarity. In: 2017 IEEE International conference on industrial and information systems (ICIIS), IEEE, pp 1\u20136","DOI":"10.1109\/ICIINFS.2017.8300343"},{"key":"673_CR31","doi-asserted-by":"crossref","unstructured":"Sugathadasa K, Ayesha B, de\u00a0Silva N, Perera AS, Jayawardana V, Lakmal D, Perera M (2018) Legal document retrieval using document vector embeddings and deep learning. In: Science and Information Conference, Springer, pp 160\u2013175","DOI":"10.1007\/978-3-030-01177-2_12"},{"issue":"10\u201310","key":"673_CR32","first-page":"95","volume":"10","author":"M Zaharia","year":"2010","unstructured":"Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I et al (2010) Spark: Cluster computing with working sets. HotCloud 10(10\u201310):95","journal-title":"HotCloud"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00673-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00673-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00673-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,3]],"date-time":"2022-08-03T10:26:09Z","timestamp":1659522369000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00673-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,17]]},"references-count":32,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["673"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00673-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,17]]},"assertion":[{"value":"28 March 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 January 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 February 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}