{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,18]],"date-time":"2025-05-18T15:40:06Z","timestamp":1747582806631,"version":"3.40.5"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T00:00:00Z","timestamp":1721260800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T00:00:00Z","timestamp":1721260800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001807","name":"Funda\u00e7\u00e3o de Amparo \u00e0 Pesquisa do Estado de S\u00e3o Paulo","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001807","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Brazilian Chamber of Deputies"},{"name":"Court of Justice of S\u00e3o Paulo"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources & 
Evaluation"],"published-print":{"date-parts":[[2025,6]]},"DOI":"10.1007\/s10579-024-09762-8","type":"journal-article","created":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T14:02:31Z","timestamp":1721311351000},"page":"1685-1704","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Ulysses Tesem\u00f5: a new large corpus for Brazilian legal and governmental domain"],"prefix":"10.1007","volume":"59","author":[{"given":"Felipe A.","family":"Siqueira","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Douglas","family":"Vit\u00f3rio","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ellen","family":"Souza","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 A. P.","family":"Santos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hidelberg O.","family":"Albuquerque","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M\u00e1rcio S.","family":"Dias","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"N\u00e1dia F. F.","family":"Silva","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andr\u00e9 C. P. L. F.","family":"de Carvalho","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adriano L. 
I.","family":"Oliveira","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carmelo","family":"Bastos-Filho","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,7,18]]},"reference":[{"key":"9762_CR1","doi-asserted-by":"crossref","unstructured":"Agrawal, A., Singh, S., Schneider, L., et\u00a0al. (2021). On the role of corpus ordering in language modeling. In Proceedings of the second workshop on simple and efficient natural language processing (pp. 142\u2013154).","DOI":"10.18653\/v1\/2021.sustainlp-1.15"},{"key":"9762_CR2","doi-asserted-by":"publisher","unstructured":"Albuquerque, H. O., Costa, R., Silvestre, G., et\u00a0al. (2022). UlyssesNER-Br: a corpus of brazilian legislative documents for named entity recognition. In Computational processing of the Portuguese language (pp. 3\u201314). Springer. https:\/\/doi.org\/10.1007\/978-3-030-98305-5_1","DOI":"10.1007\/978-3-030-98305-5_1"},{"key":"9762_CR6","unstructured":"Berber\u00a0Sardinha, T., Moreira\u00a0Filho, J. L., & Alambert, \u00c9. (2009a). The Brazilian corpus. In AACL 2009\u2014American Association for Corpus Linguistics."},{"key":"9762_CR7","unstructured":"Berber\u00a0Sardinha, T., Moreira\u00a0Filho, J. L., & Alambert, \u00c9. (2009b). The Brazilian corpus: a one-billion word online resource. In 5th corpus linguistics conference."},{"key":"9762_CR8","first-page":"25","volume-title":"The parsing system palavras. Automatic grammatical analysis of Portuguese in a constraint grammar framework","author":"E Bick","year":"2000","unstructured":"Bick, E. (2000). The parsing system palavras. Automatic grammatical analysis of Portuguese in a constraint grammar framework (p. 25). Aarhus Universitetsforlag."},{"key":"9762_CR9","unstructured":"Bird, S., Loper, E., & Klein, E. (2009). Natural language processing with python. 
In Proceedings of the 2009 conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics."},{"key":"9762_CR10","doi-asserted-by":"publisher","unstructured":"Brito, M., Pinheiro, V., Furtado, V., et\u00a0al. (2023). Cdjur-br\u2014uma cole\u00e7\u00e3o dourada do judici\u00e1rio brasileiro com entidades nomeadas refinadas. In Anais do XIV Simp\u00f3sio Brasileiro de Tecnologia da Informa\u00e7\u00e3o e da Linguagem Humana (pp. 177\u2013186). SBC. https:\/\/doi.org\/10.5753\/stil.2023.234217, https:\/\/sol.sbc.org.br\/index.php\/stil\/article\/view\/25449","DOI":"10.5753\/stil.2023.234217"},{"key":"9762_CR11","unstructured":"Brown, T., Mann, B., Ryder, N., et\u00a0al. (2020). Language models are few-shot learners. In H. Larochelle, M. Ranzato, & R. Hadsell, et\u00a0al. (Eds.), Advances in neural information processing systems (Vol.\u00a033, pp. 1877\u20131901). Curran Associates, Inc."},{"key":"9762_CR12","doi-asserted-by":"crossref","unstructured":"Cantador, I., & S\u00e1nchez, L. Q. (2020). Semantic annotation and retrieval of parliamentary content: a case study on the Spanish congress of deputies. In Proceedings of the first joint conference of the information retrieval communities in Europe (CIRCLE 2020), CEUR workshop proceedings (Vol. 2621). CEUR-WS.org","DOI":"10.1145\/3483382.3483394"},{"key":"9762_CR13","doi-asserted-by":"publisher","unstructured":"Chalkidis, I., Androutsopoulos, I., & Aletras, N. (2019). Neural legal judgment prediction in English. In Proceedings of the 57th annual meeting of the association for computational linguistics. Association for Computational Linguistics (pp. 4317\u20134323). https:\/\/doi.org\/10.18653\/v1\/P19-1424","DOI":"10.18653\/v1\/P19-1424"},{"key":"9762_CR14","doi-asserted-by":"publisher","unstructured":"Chalkidis, I., Fergadiotis, M., Manginas, N., et\u00a0al. (2021). 
Regulatory compliance through Doc2Doc information retrieval: a case study in EU\/UK legislation where text similarity has limitations. In Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume (pp. 3498\u20133511). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2021.eacl-main.305","DOI":"10.18653\/v1\/2021.eacl-main.305"},{"key":"9762_CR15","doi-asserted-by":"publisher","unstructured":"Costa, R., Albuquerque, H. O., Silvestre, G., et\u00a0al. (2022). Expanding UlyssesNER-Br named entity recognition corpus with informal user-generated text. In Progress in artificial intelligence (pp. 767\u2013779). Cham. https:\/\/doi.org\/10.1007\/978-3-031-16474-3_62","DOI":"10.1007\/978-3-031-16474-3_62"},{"key":"9762_CR29","doi-asserted-by":"crossref","unstructured":"de\u00a0Vargas\u00a0Feij\u00f3, D., & Moreira, V. P. (2018). Rulingbr: a summarization dataset for legal texts. In Computational processing of the Portuguese language: 13th international conference, PROPOR 2018, September 24\u201326, 2018, Proceedings 13 (pp. 255\u2013264). Springer.","DOI":"10.1007\/978-3-319-99722-3_26"},{"key":"9762_CR16","unstructured":"European Union, T. (2005). Thesaurus Eurovoc (Vol. 2). Subject-oriented version: Publications Office."},{"key":"9762_CR17","doi-asserted-by":"publisher","unstructured":"Feng, F., Yang, Y., Cer, D., et\u00a0al. (2022). Language-agnostic BERT sentence embedding. In S. Muresan, P. Nakov & A. Villavicencio (Eds.), Proceedings of the 60th annual meeting of the association for computational linguistics (Vol. 1: Long Papers, pp. 878\u2013891). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.62","DOI":"10.18653\/v1\/2022.acl-long.62"},{"key":"9762_CR18","doi-asserted-by":"publisher","unstructured":"Jayakumar, T., Farooqui, F., & Farooqui, L. (2023). 
Large language models are legal but they are not: Making the case for a powerful LegalLLM. In D. Preo\u021biuc-Pietro, C. Goanta & I. Chalkidis et\u00a0al. (Eds.), Proceedings of the natural legal language processing workshop 2023 (pp. 223\u2013229). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2023.nllp-1.22","DOI":"10.18653\/v1\/2023.nllp-1.22"},{"key":"9762_CR19","doi-asserted-by":"publisher","unstructured":"Kapoor, A., Dhawan, M., Goel, A., et\u00a0al. (2022). HLDC: Hindi legal documents corpus. In Findings of the association for computational linguistics: ACL 2022 (pp. 3521\u20133536). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2022.findings-acl.278","DOI":"10.18653\/v1\/2022.findings-acl.278"},{"key":"9762_CR20","doi-asserted-by":"publisher","unstructured":"Kornilova, A., & Eidelman, V. (2019). BillSum: a corpus for automatic summarization of US legislation. In Proceedings of the 2nd workshop on new frontiers in summarization (pp. 48\u201356). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/D19-5406","DOI":"10.18653\/v1\/D19-5406"},{"key":"9762_CR21","doi-asserted-by":"publisher","unstructured":"Lima, M., Silva, R., Lopes\u00a0de Souza\u00a0Mendes, F., et\u00a0al. (2020). Inferring about fraudulent collusion risk on Brazilian public works contracts in official texts using a Bi-LSTM approach. In T. Cohn, Y. He & Y. Liu (Eds.), Findings of the association for computational linguistics: EMNLP 2020 (pp. 1580\u20131588). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.143","DOI":"10.18653\/v1\/2020.findings-emnlp.143"},{"issue":"1","key":"9762_CR5","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s10032-022-00406-7","volume":"26","author":"PH Luz de Araujo","year":"2023","unstructured":"Luz de Araujo, P. H., de Almeida, A. P. G. S., Ataides Braz, F., et al. (2023). 
Sequence-aware multimodal page classification of Brazilian legal documents. International Journal on Document Analysis and Recognition (IJDAR), 26(1), 33\u201349. https:\/\/doi.org\/10.1007\/s10032-022-00406-7","journal-title":"International Journal on Document Analysis and Recognition (IJDAR)"},{"key":"9762_CR3","doi-asserted-by":"publisher","unstructured":"Luz\u00a0de Araujo, P. H., de\u00a0Campos, T. E., de\u00a0Oliveira, R. R. R., et\u00a0al. (2018). LeNER-Br: a dataset for named entity recognition in Brazilian legal text. In Computational processing of the Portuguese language (pp. 313\u2013323). Springer. https:\/\/doi.org\/10.1007\/978-3-319-99722-3_32","DOI":"10.1007\/978-3-319-99722-3_32"},{"key":"9762_CR4","unstructured":"Luz\u00a0de Araujo, P. H., de\u00a0Campos, T. E., Ataides\u00a0Braz, F., et\u00a0al. (2020). VICTOR: a dataset for Brazilian legal documents classification. In N. Calzolari, F. B\u00e9chet & P. Blache et\u00a0al. (Eds.) Proceedings of the twelfth language resources and evaluation conference (pp. 1449\u20131458). European Language Resources Association. https:\/\/aclanthology.org\/2020.lrec-1.181"},{"key":"9762_CR22","doi-asserted-by":"publisher","unstructured":"Malik, V., Sanjay, R., Nigam, S. K., et\u00a0al. (2021). ILDC for CJPE: Indian legal documents corpus for court judgment prediction and explanation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Vol. 1: Long Papers, pp. 4046\u20134062). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2021.acl-long.313","DOI":"10.18653\/v1\/2021.acl-long.313"},{"key":"9762_CR23","volume-title":"Dicion\u00e1rio Tupi Antigo: A L\u00edngua Ind\u00edgena Cl\u00e1ssica do Brasil","author":"EDA Navarro","year":"2015","unstructured":"Navarro, E. D. A. (2015). Dicion\u00e1rio Tupi Antigo: A L\u00edngua Ind\u00edgena Cl\u00e1ssica do Brasil. 
Global."},{"key":"9762_CR24","unstructured":"Samy, D., Arenas-Garc\u00eda, J., & P\u00e9rez-Fern\u00e1ndez, D. (2020). Legal-ES: a set of large scale resources for Spanish legal text processing. In Proceedings of the 1st workshop on language technologies for government and public administration (LT4Gov) (pp. 32\u201336). European Language Resources Association."},{"key":"9762_CR25","doi-asserted-by":"publisher","unstructured":"Silva, M., Paula, A., Oliveira, G., et\u00a0al. (2022). Lipset: Um conjunto de dados com documentos rotulados de licita\u00e7\u00f5es p\u00fablicas. In Anais do IV dataset showcase workshop (pp. 13\u201324). SBC. https:\/\/doi.org\/10.5753\/dsw.2022.224925","DOI":"10.5753\/dsw.2022.224925"},{"key":"9762_CR26","doi-asserted-by":"publisher","unstructured":"Souza, E., Vit\u00f3rio, D., Moriyama, G., et\u00a0al. (2021). An information retrieval pipeline for legislative documents from the Brazilian chamber of deputies. In Legal knowledge and information systems (pp. 119\u2013126). IOS Press. https:\/\/doi.org\/10.3233\/FAIA210326","DOI":"10.3233\/FAIA210326"},{"key":"9762_CR27","unstructured":"Steinberger, R., Pouliquen, B., Widiger, A., et\u00a0al. (2006). The JRC-Acquis: a multilingual aligned parallel corpus with 20+ languages. In Proceedings of the fifth international conference on language resources and evaluation (LREC\u201906). European Language Resources Association (ELRA)."},{"key":"9762_CR28","unstructured":"V\u00e1radi, T., Koeva, S., Yamalov, M., et\u00a0al. (2020). The MARCELL legislative corpus. In Proceedings of the twelfth language resources and evaluation conference (pp. 3761\u20133768). European Language Resources Association."},{"key":"9762_CR30","doi-asserted-by":"publisher","unstructured":"Zhong, H., Xiao, C., Tu, C., et\u00a0al. (2020). How does NLP benefit legal system: a summary of legal artificial intelligence. In Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 5218\u20135230). 
Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.466","DOI":"10.18653\/v1\/2020.acl-main.466"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09762-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-024-09762-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09762-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,18]],"date-time":"2025-05-18T15:04:02Z","timestamp":1747580642000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-024-09762-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,18]]},"references-count":30,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["9762"],"URL":"https:\/\/doi.org\/10.1007\/s10579-024-09762-8","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"type":"print","value":"1574-020X"},{"type":"electronic","value":"1574-0218"}],"subject":[],"published":{"date-parts":[[2024,7,18]]},"assertion":[{"value":"1 July 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 July 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}