{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:50:41Z","timestamp":1760147441463,"version":"build-2065373602"},"reference-count":31,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T00:00:00Z","timestamp":1675209600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Netherlands Organization for Scientific Research (NWO)","award":["CISC.CC.016"],"award-info":[{"award-number":["CISC.CC.016"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>The task of coreference resolution concerns the clustering of words and phrases referring to the same entity in text, either in the same document or across multiple documents. The task is challenging, as it concerns elements of named entity recognition and reading comprehension, as well as others. In this paper, we introduce DutchParliament, a new Dutch coreference resolution dataset obtained through the manual annotation of 74 government debates, expanded with a domain-specific class. In contrast to existing datasets, which are often composed of news articles, blogs or other documents, the debates in DutchParliament are transcriptions of speech, and therefore offer a unique structure and way of referencing compared to other datasets. By constructing and releasing this dataset, we hope to facilitate the research on coreference resolution in niche domains, with different characteristics than traditional datasets. The DutchParliament dataset was compared to SoNaR-1 and RiddleCoref, two other existing Dutch coreference resolution corpora, to highlight its particularities and differences from existing datasets. 
Furthermore, two coreference resolution models for Dutch, the rule-based DutchCoref model and the neural e2eDutch model, were evaluated on the DutchParliament dataset. It was found that the characteristics of the DutchParliament dataset are quite different from those of the other two datasets, although the performance of the e2eDutch model does not seem to be significantly affected by this. In addition, experiments were conducted by utilizing the metadata present in the DutchParliament corpus to improve the performance of the e2eDutch model. The results indicate that the addition of available metadata about speakers has a beneficial effect on the performance of the model, although the addition of the gender of speakers seems to have a limited effect.<\/jats:p>","DOI":"10.3390\/data8020034","type":"journal-article","created":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T05:57:56Z","timestamp":1675231076000},"page":"34","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Neural Coreference Resolution for Dutch Parliamentary Documents with the DutchParliament Dataset"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9204-9220","authenticated-orcid":false,"given":"Ruben","family":"van Heusden","sequence":"first","affiliation":[{"name":"Information Retrieval Lab, University of Amsterdam, 1098 XH Amsterdam, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6614-0087","authenticated-orcid":false,"given":"Jaap","family":"Kamps","sequence":"additional","affiliation":[{"name":"Faculty of Humanities, University of Amsterdam, 1012 GC Amsterdam, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3255-3729","authenticated-orcid":false,"given":"Maarten","family":"Marx","sequence":"additional","affiliation":[{"name":"Information Retrieval Lab, University of Amsterdam, 1098 XH Amsterdam, The 
Netherlands"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/0024-3841(78)90006-2","article-title":"Resolving Pronoun References","volume":"44","author":"Hobbs","year":"1978","journal-title":"Lingua"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kameyama, M. (1986, January 10\u201313). A Property-Sharing Constraint in Centering. Proceedings of the ACL\u201986: 24th Annual Meeting on Association for Computational Linguistics, New York, NY, USA.","DOI":"10.3115\/981131.981159"},{"key":"ref_3","first-page":"535","article-title":"An Algorithm for Pronominal Anaphora Resolution","volume":"20","author":"Lappin","year":"1994","journal-title":"Comput. Linguist."},{"key":"ref_4","unstructured":"Connolly, D., Burger, J.D., and Day, D.S. (1997, January 11\u201317). A Machine Learning Approach to Anaphoric Reference. Proceedings of the New Methods in Language Processing, Sydney, Australia."},{"key":"ref_5","unstructured":"Cardie, C., and Wagstaff, K. (1999, January 21\u201322). Noun Phrase Coreference as Clustering. Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, College Park, MD, USA."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ng, V. (2005, January 25\u201330). Machine Learning for Coreference Resolution: From Local Classification to Global Ranking. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, USA.","DOI":"10.3115\/1219840.1219860"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lee, K., He, L., Lewis, M., and Zettlemoyer, L. (2017, January 7\u201311). End-to-end Neural Coreference Resolution. 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1018"},{"key":"ref_8","unstructured":"Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., and Xue, N. (2011, January 23\u201324). Conll-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes. Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, Portland, OR, USA."},{"key":"ref_9","unstructured":"Oostdijk, N., Reynaert, M., Hoste, V., and Schuurman, I. (2013). Essential Speech and Language Technology for Dutch, Springer."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1007\/s10579-009-9108-x","article-title":"AnCora-CO: Coreferentially Annotated Corpora for Spanish and Catalan","volume":"44","author":"Recasens","year":"2010","journal-title":"Lang. Resour. Eval."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Recasens, M., M\u00e0rquez, L., Sapena, E., Mart\u00ed, M.A., Taul\u00e9, M., Hoste, V., Poesio, M., and Versley, Y. (2010, January 15\u201316). Semeval-2010 Task 1: Coreference Resolution in Multiple Languages. Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden.","DOI":"10.3115\/1621969.1621982"},{"key":"ref_12","first-page":"27","article-title":"A Dutch Coreference Resolution System with an Evaluation on Literary Fiction","volume":"9","year":"2019","journal-title":"Comput. Linguist. Neth. J."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Brennan, S.E., Friedman, M.W., and Pollard, C. (1987, January 6\u20139). A Centering Approach to Pronouns. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford CA, USA.","DOI":"10.3115\/981175.981197"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Strube, M., and Hahn, U. (1996). Functional Centering. 
arXiv.","DOI":"10.3115\/981863.981899"},{"key":"ref_15","unstructured":"Iida, R., Inui, K., Takamura, H., and Matsumoto, Y. (2003, January 14). Incorporating contextual cues in trainable models for coreference resolution. Proceedings of the EACL Workshop on the Computational Treatment of Anaphora, Budapest, Hungary."},{"key":"ref_16","first-page":"21","article-title":"Automatic Pronominal Anaphora Resolution in English Texts","volume":"9","author":"Liang","year":"2001","journal-title":"Int. J. Comput. Linguist. Chin. Lang. Process."},{"key":"ref_17","unstructured":"van Kuppevelt, D., and Attema, J. (2023, January 29). e2e-Dutch. Available online: https:\/\/github.com\/Filter-Bubble\/e2e-Dutch."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1162\/089120101753342653","article-title":"A machine learning approach to coreference resolution of noun phrases","volume":"27","author":"Soon","year":"2001","journal-title":"Comput. Linguist."},{"key":"ref_19","unstructured":"Kobdani, H., and Sch\u00fctze, H. (2010, January 15\u201316). Sucre: A modular system for coreference resolution. Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden."},{"key":"ref_20","unstructured":"Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.M., Van Der Vloet, J., and Verschelde, J.L. (2008, January 28\u201330). A Coreference Corpus and Resolution System for Dutch. Proceedings of the LREC, Citeseer, Marrakech, Morocco."},{"key":"ref_21","unstructured":"Poot, C., and van Cranenburgh, A. (2020, January 12). A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News. Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, Barcelona, Spain."},{"key":"ref_22","unstructured":"Erjavec, T., Ogrodniczuk, M., Osenova, P., Ljube\u0161i\u0107, N., Simov, K., Grigorova, V., Rudolf, M., Pan\u010dur, A., Kopp, M., and Barkarson, S. 
(2023, January 29). Linguistically Annotated Multilingual Comparable Corpora of Parliamentary Debates ParlaMint.ana 2.1, 2021. Slovenian Language Resource Repository CLARIN. SI. Online Resource. Available online: https:\/\/link.springer.com\/article\/10.1007\/s10579-021-09574-0."},{"key":"ref_23","unstructured":"Schoen, A., van Son, C., van Erp, M., and van Vliet, H. (2014). NewsReader Document-Level Annotation Guidelines-Dutch TechReport 2014-8, VU University. Technical Report."},{"key":"ref_24","unstructured":"Reiter, N. (2018, January 7\u20139). CorefAnnotator\u2014A New Annotation Tool for Entity References. Proceedings of the Abstracts of EADH: Data in the Digital Humanities, Galway, Ireland."},{"key":"ref_25","unstructured":"Hendrickx, I., Hoste, V., and Daelemans, W. (2008). Lecture Notes in Computer Science, Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, Haifa, Israel, 17\u201323 February 2008, Springer."},{"key":"ref_26","unstructured":"Honnibal, M., and Montani, I. (2023, January 29). spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing. Online Resource. Available online: https:\/\/sentometrics-research.com\/publication\/72\/."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1177\/003368829402500202","article-title":"The Lexical Profile of Second Language Writing: Does It Change Over Time?","volume":"25","author":"Laufer","year":"1994","journal-title":"RELC J."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Vanmassenhove, E., Shterionov, D., and Gwilliam, M. (2021). Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation. 
arXiv.","DOI":"10.18653\/v1\/2021.eacl-main.188"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"643","DOI":"10.3758\/BRM.42.3.643","article-title":"SUBTLEX-NL: A New Measure for Dutch Word Frequency Based on Film Subtitles","volume":"42","author":"Keuleers","year":"2010","journal-title":"Behav. Res. Methods"},{"key":"ref_30","unstructured":"Beek, L.V.D., Bouma, G., Malouf, R., and van Noord, G. (2001, January 30). The Alpino dependency treebank. Proceedings of the Computational linguistics in the Netherlands, Twente, The Netherlands."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Moosavi, N.S., and Strube, M. (2016, January 7\u201312). Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-Based Entity Aware Metric. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1060"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/2\/34\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:21:08Z","timestamp":1760120468000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/2\/34"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,1]]},"references-count":31,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["data8020034"],"URL":"https:\/\/doi.org\/10.3390\/data8020034","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2023,2,1]]}}}