{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T05:14:42Z","timestamp":1765775682761,"version":"3.48.0"},"reference-count":43,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T00:00:00Z","timestamp":1764892800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>This paper presents an in-production Rhetorical Role Labeling (RRL) classifier developed for Hungarian judicial decisions. RRL is a sequential classification problem in Natural Language Processing, aiming to assign functional roles (such as facts, arguments, decision, etc.) to every segment or sentence in a legal document. The study was conducted on a human-annotated sentence-level RRL corpus and compares multiple neural architectures, including BiLSTM, attention-based networks, and a support vector machine as baseline. It further investigates the impact of late chunking during vectorization, in contrast to classical approaches. Results from tests on the labeled dataset and annotator agreement statistics are reported, and performance is analyzed across architecture types and embedding strategies. Contrary to recent findings in retrieval tasks, late chunking does not show consistent improvements for sentence-level RRL, suggesting that contextualization through chunk embeddings may introduce noise rather than useful context in Hungarian legal judgments. 
The work also discusses the unique structure and labeling challenges of Hungarian cases compared to international datasets and provides empirical insights for future legal NLP research in non-English court decisions.<\/jats:p>","DOI":"10.3390\/bdcc9120315","type":"journal-article","created":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T15:11:24Z","timestamp":1764947484000},"page":"315","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Sentence-Level Rhetorical Role Labeling in Judicial Decisions"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8475-5969","authenticated-orcid":false,"given":"Gergely M\u00e1rk","family":"Cs\u00e1nyi","sequence":"first","affiliation":[{"name":"MONTANA Knowledge Management Ltd., H-1029 Budapest, Hungary"},{"name":"Department of Electric Power Engineering, Budapest University of Technology and Economics, H-1111 Budapest, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5897-9379","authenticated-orcid":false,"given":"Istv\u00e1n","family":"\u00dcveges","sequence":"additional","affiliation":[{"name":"MONTANA Knowledge Management Ltd., H-1029 Budapest, Hungary"},{"name":"Political and Legal Text Mining & Artificial Intelligence Laboratory (poltextLAB), ELTE Centre for Social Sciences, H-1097 Budapest, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4644-9486","authenticated-orcid":false,"given":"Dorina","family":"Lakatos","sequence":"additional","affiliation":[{"name":"MONTANA Knowledge Management Ltd., H-1029 Budapest, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9948-4852","authenticated-orcid":false,"given":"D\u00f3ra","family":"Ripsz\u00e1m","sequence":"additional","affiliation":[{"name":"UNESCO Chair on Digital Platforms for Learning Societies, Institute of the Information Society, Ludovika University of Public Service, H-1083 Budapest, 
Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7576-3517","authenticated-orcid":false,"given":"Korn\u00e9lia","family":"Koz\u00e1k","sequence":"additional","affiliation":[{"name":"Department of European Public and Private Law, Faculty of Public Governance and International Studies, Ludovika University of Public Service, H-1083 Budapest, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0467-1410","authenticated-orcid":false,"given":"D\u00e1niel","family":"Nagy","sequence":"additional","affiliation":[{"name":"MONTANA Knowledge Management Ltd., H-1029 Budapest, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3848-6096","authenticated-orcid":false,"given":"J\u00e1nos P\u00e1l","family":"Vad\u00e1sz","sequence":"additional","affiliation":[{"name":"MONTANA Knowledge Management Ltd., H-1029 Budapest, Hungary"},{"name":"UNESCO Chair on Digital Platforms for Learning Societies, Institute of the Information Society, Ludovika University of Public Service, H-1083 Budapest, Hungary"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Cs\u00e1nyi, G.M., Lakatos, D., \u00dcveges, I., Megyeri, A., Vad\u00e1sz, J.P., Nagy, D., and V\u00e1gi, R. (2024). From Fact Drafts to Operational Systems: Semantic Search in Legal Decisions Using Fact Drafts. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8120185"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, H., He, T., Zou, Z., Shen, S., and Li, Y. (2019, January 22\u201326). Using case facts to predict accusation based on deep learning. Proceedings of the 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), Sofia, Bulgaria.","DOI":"10.1109\/QRS-C.2019.00038"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Muhammed, A., Muslihuddeen, H., Sankar, S., and Kumar, M.A. (2024, January 15\u201316). 
Impact of Rhetorical Roles in Abstractive Legal Document Summarization. Proceedings of the 2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT), Kottayam, India.","DOI":"10.1109\/ICITIIT61487.2024.10580671"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bhattacharya, P., Paul, S., Ghosh, K., Ghosh, S., and Wyner, A. (2019). Identification of rhetorical roles of sentences in Indian legal judgments. Legal Knowledge and Information Systems, IOS Press.","DOI":"10.3233\/FAIA190301"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Malik, V., Sanjay, R., Guha, S.K., Hazarika, A., Nigam, S.K., Bhattacharya, A., and Modi, A. (2022, January 8). Semantic segmentation of legal documents via rhetorical roles. Proceedings of the Natural Legal Language Processing Workshop 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.nllp-1.13"},{"key":"ref_6","unstructured":"Santosh, T., Isaia, A., Hong, S., and Grabmair, M. (2024, January 12\u201316). HiCuLR: Hierarchical Curriculum Learning for Rhetorical Role Labeling of Legal Documents. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bambroo, P., Adhikary, S., Bhattacharya, P., Chakraborty, A., Ghosh, S., and Ghosh, K. (2025). MARRO: Multi-headed attention for rhetorical role labeling in legal documents. Artificial Intelligence and Law, Springer.","DOI":"10.1007\/s10506-025-09449-7"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Nigam, S.K., Dubey, T., Sharma, G., Shallum, N., Ghosh, K., and Bhattacharya, A. (May, January 29). LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification. 
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, NM, USA.","DOI":"10.18653\/v1\/2025.findings-naacl.63"},{"key":"ref_9","unstructured":"Marino, G., Licari, D., Bushipaka, P., Comand\u00e9, G., and Cucinotta, T. (2023, January 15\u201317). Automatic rhetorical roles classification for legal documents using legal-transformer over BERT. Proceedings of the CEUR WORKSHOP PROCEEDINGS. CEUR-WS, \u00d6rebro, Sweden."},{"key":"ref_10","unstructured":"Aragy, R., Fernandes, E.R., and Caceres, E.N. (December, January 29). Rhetorical role identification for Portuguese legal documents. Proceedings of the Brazilian Conference on Intelligent Systems, Virtual."},{"key":"ref_11","unstructured":"G\u00fcnther, M., Mohr, I., Williams, D.J., Wang, B., and Xiao, H. (2024). Late chunking: Contextual chunk embeddings using long-context embedding models. arXiv."},{"key":"ref_12","first-page":"243","article-title":"Rhetorical Structure Theory: Toward a functional theory of text organization","volume":"8","author":"MANN","year":"1988","journal-title":"Text-Interdiscip. J. 
Study Discourse"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1177\/1461445606061881","article-title":"Rhetorical Structure Theory: Looking back and moving ahead","volume":"8","author":"Taboada","year":"2006","journal-title":"Discourse Stud."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1177\/1461445606064836","article-title":"Applications of Rhetorical Structure Theory","volume":"8","author":"Taboada","year":"2006","journal-title":"Discourse Stud."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1080\/01638539409544883","article-title":"Using linguistic phenomena to motivate a set of coherence relations","volume":"18","author":"Knott","year":"1994","journal-title":"Discourse Process"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/S0378-2166(98)00023-X","article-title":"The classification of coherence relations and their linguistic markers: An exploration of two languages","volume":"30","author":"Knott","year":"1998","journal-title":"J. Pragmat."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1162\/089120102762671936","article-title":"Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status","volume":"28","author":"Teufel","year":"2002","journal-title":"Comput. Linguist."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1007\/s10506-007-9039-z","article-title":"Extractive summarisation of legal texts","volume":"14","author":"Hachey","year":"2007","journal-title":"Artif. Intell. Law"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/s10506-010-9087-7","article-title":"Identification of Rhetorical Roles for Segmentation and Summarization of a Legal Judgment","volume":"18","author":"Saravanan","year":"2010","journal-title":"Artif. Intell. 
Law"},{"key":"ref_20","first-page":"1","article-title":"Automatic Classification of Rhetorical Roles for Sentences: Comparing Rule-Based Scripts with Machine Learning","volume":"2385","author":"Walker","year":"2019","journal-title":"ASAIL@ ICAIL"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sanchez, G. (2019, January 6\u20137). Sentence Boundary Detection in Legal Text. Proceedings of the Natural Legal Language Processing Workshop 2019. Association for Computational Linguistics, Minneapolis, MN, USA.","DOI":"10.18653\/v1\/W19-2204"},{"key":"ref_22","unstructured":"Kalamkar, P., Tiwari, A., Agarwal, A., Karn, S., Gupta, S., Raghavan, V., and Modi, A. (2022, January 20\u201325). Corpus for Automatic Structuring of Legal Documents. Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Aumiller, D., Almasian, S., Lackner, S., and Gertz, M. (2021, January 21\u201325). Structural text segmentation of legal documents. Proceedings of the 18th International Conference on Artificial Intelligence and Law, S\u00e3o Paulo, Brazil. ICAIL \u201921.","DOI":"10.1145\/3462757.3466085"},{"key":"ref_24","first-page":"123","article-title":"Legal Document Segmentation and Labeling Through Named Entity Recognition Approaches","volume":"15","author":"Lisboa","year":"2024","journal-title":"J. Inf. Data Manag."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2024, January 11\u201316). M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. Proceedings of the Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.137"},{"key":"ref_26","unstructured":"Niklaus, J. (2024). 
Decoding Legalese Without Borders: Multilingual Evaluation of Language Models on Long Legal Texts. [Ph.D. Thesis, University of Bern]."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sturua, S., Mohr, I., Kalim Akram, M., G\u00fcnther, M., Wang, B., Krimmel, M., Wang, F., Mastrapas, G., Koukounas, A., and Wang, N. (2025, January 6\u201310). Jina Embeddings V3: Multilingual Text Encoder with Low-Rank Adaptations. Proceedings of the European Conference on Information Retrieval, Lucca, Italy.","DOI":"10.1007\/978-3-031-88720-8_21"},{"key":"ref_28","unstructured":"Zhong, Z., Liu, H., Cui, X., Zhang, X., and Qin, Z. (2025, January 19\u201324). Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation. Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates."},{"key":"ref_29","unstructured":"Setty, S., Thakkar, H., Lee, A., Chung, E., and Vidra, N. (2024). Improving Retrieval for RAG based Question Answering Models on Financial Documents. arXiv."},{"key":"ref_30","unstructured":"National Office for the Judiciary (2025, November 24). Anonymized Hungarian Court Documents. Available online: https:\/\/eakta.birosag.hu\/anonimizalt-hatarozatok."},{"key":"ref_31","unstructured":"Cs\u00e1nyi, G.M., Lakatos, D., \u00dcveges, I., V\u00e1gi, R., Megyeri, A., F\u00fcl\u00f6p, A., Nagy, D., and Vad\u00e1sz, J.P. (2024, January 25\u201326). Evaluating the Effectiveness of Automatic Sentence Segmentation for Judicial Decisions, (B\u00edr\u00f3s\u00e1gi hat\u00e1rozatok automatikus mondatszegment\u00e1l\u00e1s\u00e1nak hat\u00e9konys\u00e1gm\u00e9r\u00e9se). Proceedings of the XX. Magyar Sz\u00e1m\u00edt\u00f3g\u00e9pes Nyelv\u00e9szeti Konferencia (MSZNY2024), Szegedi Tudom\u00e1nyegyetem, Informatikai Int\u00e9zet, Szeged, Hungary."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Orosz, G., Szab\u00f3, G., Berkecz, P., Sz\u00e1nt\u00f3, Z., and Farkas, R. 
(2023, January 4\u20136). Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines. Proceedings of the Text, Speech, and Dialogue, Pilsen, Czech Republic.","DOI":"10.1007\/978-3-031-40498-6_6"},{"key":"ref_33","unstructured":"Krippendorff, K. (2025, November 24). Computing Krippendorff\u2019s Alpha-Reliability. Available online: https:\/\/repository.upenn.edu\/entities\/publication\/034a6030-c584-4d14-9d3d-7b7e8d16df20."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Orosz, T., V\u00e1gi, R., Cs\u00e1nyi, G.M., Nagy, D., \u00dcveges, I., Vad\u00e1sz, J.P., and Megyeri, A. (2021). Evaluating Human versus Machine Learning Performance in a LegalTech Problem. Appl. Sci., 12.","DOI":"10.3390\/app12010297"},{"key":"ref_35","unstructured":"Vatsal, S., Meyers, A., and Ortega, J.E. (2023, January 4\u20136). Classification of US Supreme Court Cases Using BERT-Based Techniques. Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, Varna, Bulgaria."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"G\u00fcnther, M., Sturua, S., Akram, M.K., Mohr, I., Ungureanu, A., Wang, B., Eslami, S., Martens, S., Werk, M., and Wang, N. (2025, January 8\u20139). jina-embeddings-v4: Universal embeddings for multimodal multilingual retrieval. Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), Suzhou, China.","DOI":"10.18653\/v1\/2025.mrl-main.36"},{"key":"ref_37","unstructured":"Jina AI (2025, November 24). Jina Embedding v3 HuggingFace. Available online: https:\/\/huggingface.co\/jinaai\/jina-embeddings-v3."},{"key":"ref_38","unstructured":"Nemeskey, D.M. (2021, January 28\u201329). Introducing huBERT. Proceedings of the XVII. 
Magyar Sz\u00e1m\u00edt\u00f3g\u00e9pes Nyelv\u00e9szeti Konferencia (MSZNY2021), Szeged, Hungary."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_40","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1007\/s10506-021-09304-5","article-title":"DeepRhole: Deep learning for rhetorical role labeling of sentences in legal case documents","volume":"31","author":"Bhattacharya","year":"2021","journal-title":"Artif. Intell. Law"},{"key":"ref_42","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Merola, C., and Singh, J. (2025, January 10). Reconstructing context: Evaluating advanced chunking strategies for retrieval-augmented generation. 
Proceedings of the International Workshop on Knowledge-Enhanced Information Retrieval, Lucca, Italy.","DOI":"10.1007\/978-3-032-02899-0_1"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/12\/315\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T05:10:20Z","timestamp":1765775420000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/12\/315"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,5]]},"references-count":43,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["bdcc9120315"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9120315","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,12,5]]}}}