{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T17:46:17Z","timestamp":1767894377006,"version":"3.49.0"},"reference-count":56,"publisher":"MIT Press - Journals","issue":"3","license":[{"start":{"date-parts":[[2021,6,30]],"date-time":"2021-06-30T00:00:00Z","timestamp":1625011200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,11,3]]},"abstract":"<jats:p>Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multidocument applications, but despite recent progress on corpora and system development, downstream improvements from applying CDCR have not been shown yet. We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus. This raises strong concerns on their generalizability\u2014a must-have for downstream applications where the magnitude of domains or event mentions is likely to exceed those found in a curated corpus. To investigate this assumption, we define a uniform evaluation setup involving three CDCR corpora: ECB+, the Gun Violence Corpus, and the Football Coreference Corpus (which we reannotate on token level to make our analysis possible). We compare a corpus-independent, feature-based system against a recent neural system developed for ECB+. Although being inferior in absolute numbers, the feature-based system shows more consistent performance across all corpora whereas the neural system is hit-or-miss. 
Via model introspection, we find that the importance of event actions, event time, and so forth, for resolving coreference in practice varies greatly between the corpora. Additional analysis shows that several systems overfit on the structure of the ECB+ corpus. We conclude with recommendations on how to achieve generally applicable CDCR systems in the future\u2014the most important being that evaluation on multiple CDCR corpora is strongly necessary. To facilitate future research, we release our dataset, annotation guidelines, and system implementation to the public.<\/jats:p>","DOI":"10.1162\/coli_a_00407","type":"journal-article","created":{"date-parts":[[2021,6,30]],"date-time":"2021-06-30T19:10:44Z","timestamp":1625080244000},"page":"575-614","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":4,"title":["Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora"],"prefix":"10.1162","volume":"47","author":[{"given":"Michael","family":"Bugert","sequence":"first","affiliation":[{"name":"UKP Lab, Department of Computer Science, Technical University of Darmstadt. 
https:\/\/www.ukp.tu-darmstadt.de\/"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nils","family":"Reimers","sequence":"additional","affiliation":[{"name":"UKP Lab"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Iryna","family":"Gurevych","sequence":"additional","affiliation":[{"name":"UKP Lab"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2021,11,3]]},"reference":[{"key":"2021111022500994200_bib1","doi-asserted-by":"crossref","first-page":"2623","DOI":"10.1145\/3292500.3330701","article-title":"Optuna: A next-generation hyperparameter optimization framework","volume-title":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Akiba","year":"2019"},{"issue":"4","key":"2021111022500994200_bib2","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1162\/coli.07-034-R2","article-title":"Inter-Coder Agreement for Computational Linguistics","volume":"34","author":"Artstein","year":"2008","journal-title":"Computational Linguistics"},{"key":"2021111022500994200_bib3","first-page":"79","article-title":"Entity-based cross-document coreferencing using the vector space model","volume-title":"36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1","author":"Bagga","year":"1998"},{"key":"2021111022500994200_bib4","doi-asserted-by":"crossref","first-page":"4179","DOI":"10.18653\/v1\/P19-1409","article-title":"Revisiting joint modeling of cross-document entity and event coreference resolution","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Barhom","year":"2019"},{"key":"2021111022500994200_bib5","first-page":"1412","article-title":"Unsupervised event coreference resolution with rich linguistic features","volume-title":"Proceedings of the 48th Annual Meeting of the 
Association for Computational Linguistics","author":"Bejan","year":"2010"},{"issue":"2","key":"2021111022500994200_bib6","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1162\/COLI_a_00174","article-title":"Unsupervised event coreference resolution","volume":"40","author":"Bejan","year":"2014","journal-title":"Computational Linguistics"},{"key":"2021111022500994200_bib7","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021111022500994200_bib8","first-page":"23","article-title":"Breaking the subtopic barrier in cross-document event coreference resolution","volume-title":"Text2Story@ ECIR","author":"Bugert","year":"2020"},{"issue":"2","key":"2021111022500994200_bib9","first-page":"249","article-title":"Assessing agreement on classification tasks: The kappa statistic","volume":"22","author":"Carletta","year":"1996","journal-title":"Computational Linguistics"},{"key":"2021111022500994200_bib10","first-page":"3735","article-title":"SUTime: A library for recognizing and normalizing time expressions","volume-title":"Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC\u201912)","author":"Chang","year":"2012"},{"key":"2021111022500994200_bib11","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","article-title":"XGBoost: A scalable tree boosting system","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chen","year":"2016"},{"key":"2021111022500994200_bib12","first-page":"2114","article-title":"Event coreference resolution by iteratively unfolding inter-dependencies among events","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language 
Processing","author":"Choubey","year":"2017"},{"key":"2021111022500994200_bib13","doi-asserted-by":"crossref","first-page":"485","DOI":"10.18653\/v1\/P18-1045","article-title":"Improving event coreference resolution by modeling correlations between event coreference chains and document topic structures","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Choubey","year":"2018"},{"key":"2021111022500994200_bib14","first-page":"340","article-title":"Identifying the most dominant event in a news article by mining event coreference relations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Choubey","year":"2018"},{"key":"2021111022500994200_bib15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/2020.nuse-1.1","article-title":"New insights into cross-document event coreference: Systematic comparison and a simplified approach","volume-title":"Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events","author":"Cremisini","year":"2020"},{"key":"2021111022500994200_bib16","article-title":"Guidelines for ECB+ annotation of events and their coreference","volume-title":"Technical Report","author":"Cybulska","year":"2014"},{"key":"2021111022500994200_bib17","first-page":"4545","article-title":"Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Cybulska","year":"2014"},{"issue":"2","key":"2021111022500994200_bib18","first-page":"11","article-title":"\u201cBag of events\u201d approach to event coreference resolution. 
Supervised classification of event templates","volume":"6","author":"Cybulska","year":"2015","journal-title":"International Journal of Computational Linguistics and Applications"},{"key":"2021111022500994200_bib19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/D19-5801","article-title":"MRQA 2019 shared task: Evaluating generalization in reading comprehension","volume-title":"Proceedings of the 2nd Workshop on Machine Reading for Question Answering","author":"Fisch","year":"2019"},{"key":"2021111022500994200_bib20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/W18-2501","article-title":"AllenNLP: A deep semantic natural language processing platform","volume-title":"Proceedings of Workshop for NLP Open Source Software (NLP-OSS)","author":"Gardner","year":"2018"},{"key":"2021111022500994200_bib21","first-page":"94","article-title":"MultiReQA: A cross-domain evaluation for retrieval question answering models","volume-title":"Proceedings of the Second Workshop on Domain Adaptation for NLP","author":"Guo","year":"2021"},{"key":"2021111022500994200_bib22","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Machine Learning"},{"key":"2021111022500994200_bib23","article-title":"In defense of the triplet loss for person re-identification","author":"Hermans","year":"2017","journal-title":"arXiv preprint"},{"key":"2021111022500994200_bib24","first-page":"21","article-title":"Events are not simple: Identity, non-identity, and quasi-identity","volume-title":"Workshop on Events: Definition, Detection, Coreference, and Representation","author":"Hovy","year":"2013"},{"key":"2021111022500994200_bib25","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1162\/tacl_a_00300","article-title":"SpanBERT: Improving pre-training by representing and predicting 
spans","volume":"8","author":"Joshi","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021111022500994200_bib26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/S18-2001","article-title":"Resolving event coreference with supervised representation learning and clustering-oriented regularization","volume-title":"Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics","author":"Kenyon-Dean","year":"2018"},{"key":"2021111022500994200_bib27","first-page":"5","article-title":"The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations","author":"Klie","year":"2018"},{"key":"2021111022500994200_bib28","doi-asserted-by":"crossref","first-page":"47","DOI":"10.2307\/271061","article-title":"On the reliability of unitizing continuous data","volume":"25","author":"Krippendorff","year":"1995","journal-title":"Sociological Methodology"},{"key":"2021111022500994200_bib29","first-page":"489","article-title":"Joint entity and event coreference resolution across documents","volume-title":"Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Lee","year":"2012"},{"key":"2021111022500994200_bib30","article-title":"PyTorch-BigGraph: A large-scale graph embedding system","volume-title":"Proceedings of the 2nd SysML Conference","author":"Lerer","year":"2019"},{"key":"2021111022500994200_bib31","first-page":"5479","article-title":"Event coreference resolution: A survey of two decades of research","volume-title":"Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18","author":"Lu","year":"2018"},{"key":"2021111022500994200_bib32","first-page":"25","article-title":"On coreference resolution performance 
metrics","volume-title":"Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing","author":"Luo","year":"2005"},{"key":"2021111022500994200_bib33","first-page":"55","article-title":"The Stanford CoreNLP natural language processing toolkit","volume-title":"Association for Computational Linguistics (ACL) System Demonstrations","author":"Manning","year":"2014"},{"key":"2021111022500994200_bib34","doi-asserted-by":"crossref","first-page":"4897","DOI":"10.18653\/v1\/2020.findings-emnlp.440","article-title":"Paraphrasing vs coreferring: Two sides of the same coin","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Meged","year":"2020"},{"key":"2021111022500994200_bib35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2063518.2063519","article-title":"DBpedia spotlight: Shedding light on the web of documents","volume-title":"Proceedings of the 7th International Conference on Semantic Systems","author":"Mendes","year":"2011"},{"key":"2021111022500994200_bib36","doi-asserted-by":"crossref","first-page":"81","DOI":"10.18653\/v1\/S18-1010","article-title":"KOI at SemEval-2018 task 5: Building knowledge graph of incidents","volume-title":"Proceedings of the 12th International Workshop on Semantic Evaluation","author":"Mirza","year":"2018"},{"key":"2021111022500994200_bib37","doi-asserted-by":"crossref","first-page":"632","DOI":"10.18653\/v1\/P16-1060","article-title":"Which coreference evaluation metric do you trust? 
A proposal for a link-based entity aware metric","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Moosavi","year":"2016"},{"key":"2021111022500994200_bib38","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"2021111022500994200_bib39","doi-asserted-by":"crossref","first-page":"70","DOI":"10.18653\/v1\/S18-1009","article-title":"SemEval-2018 task 5: Counting events and participants in the long tail","volume-title":"Proceedings of the 12th International Workshop on Semantic Evaluation","author":"Postma","year":"2018"},{"key":"2021111022500994200_bib40","doi-asserted-by":"crossref","first-page":"30","DOI":"10.3115\/v1\/P14-2006","article-title":"Scoring coreference partitions of predicted mentions: A reference implementation","volume-title":"Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Pradhan","year":"2014"},{"key":"2021111022500994200_bib41","doi-asserted-by":"crossref","first-page":"998","DOI":"10.3115\/v1\/P14-1094","article-title":"Cross-narrative temporal ordering of medical events","volume-title":"Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Raghavan","year":"2014"},{"key":"2021111022500994200_bib42","doi-asserted-by":"crossref","first-page":"3982","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-BERT: Sentence embeddings using Siamese BERT-networks","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing 
(EMNLP-IJCNLP)","author":"Reimers","year":"2019"},{"key":"2021111022500994200_bib43","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"Journal of Computational and Applied Mathematics"},{"key":"2021111022500994200_bib44","first-page":"2","article-title":"Using product similarity for adding business value and returning customers","volume":"10","author":"Shannaq","year":"2010","journal-title":"Global Journal of Computer Science and Technology"},{"key":"2021111022500994200_bib45","unstructured":"Shi, Peng and Jimmy Lin. 2019. Simple BERT models for relation extraction and semantic role labeling. arXiv preprint, arXiv:1904.05255."},{"key":"2021111022500994200_bib46","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1613\/jair.2088","article-title":"Combination strategies for semantic role labeling","volume":"29","author":"Surdeanu","year":"2007","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2021111022500994200_bib47","doi-asserted-by":"crossref","first-page":"4911","DOI":"10.18653\/v1\/P19-1485","article-title":"MultiQA: An empirical investigation of generalization and transfer in reading comprehension","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Talmor","year":"2019"},{"key":"2021111022500994200_bib48","first-page":"1949","article-title":"Revisiting the evaluation for cross document event coreference","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers","author":"Upadhyay","year":"2016"},{"issue":"4","key":"2021111022500994200_bib49","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0055814","article-title":"Large-scale event extraction from literature with 
multi-level gene normalization","volume":"8","author":"Van Landeghem","year":"2013","journal-title":"PLOS ONE"},{"key":"2021111022500994200_bib50","doi-asserted-by":"crossref","first-page":"45","DOI":"10.3115\/1072399.1072405","article-title":"A model-theoretic coreference scoring scheme","volume-title":"Sixth Message Understanding Conference (MUC-6): Proceedings","author":"Vilain","year":"1995"},{"key":"2021111022500994200_bib51","doi-asserted-by":"crossref","first-page":"660","DOI":"10.18653\/v1\/S18-1108","article-title":"NewsReader at SemEval-2018 task 5: Counting events by reasoning over event-centric-knowledge-graphs","volume-title":"Proceedings of the 12th International Workshop on Semantic Evaluation","author":"Vossen","year":"2018"},{"key":"2021111022500994200_bib52","first-page":"501","article-title":"Identity and Granularity of Events in Text","volume-title":"Computational Linguistics and Intelligent Text Processing","author":"Vossen","year":"2016"},{"key":"2021111022500994200_bib53","first-page":"3034","article-title":"Don\u2019t annotate, but validate: A data-to-text method for capturing event data","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Vossen","year":"2018"},{"key":"2021111022500994200_bib54","article-title":"ACE 2005 multilingual training corpus","author":"Walker","year":"2006"},{"key":"2021111022500994200_bib55","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/D19-6201","article-title":"Cross-document coreference: An approach to capturing coreference without context","volume-title":"Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)","author":"Wright-Bettner","year":"2019"},{"key":"2021111022500994200_bib56","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1162\/tacl_a_00155","article-title":"A hierarchical distance-dependent Bayesian model for event coreference 
resolution","volume":"3","author":"Yang","year":"2015","journal-title":"Transactions of the Association for Computational Linguistics"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/47\/3\/575\/1971857\/coli_a_00407.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/47\/3\/575\/1971857\/coli_a_00407.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,11]],"date-time":"2021-11-11T00:53:47Z","timestamp":1636592027000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/47\/3\/575\/102774\/Generalizing-Cross-Document-Event-Coreference"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11]]},"references-count":56,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,11,3]]},"published-print":{"date-parts":[[2021,11,3]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00407","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,11]]}}}