{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T03:14:28Z","timestamp":1770520468123,"version":"3.49.0"},"reference-count":53,"publisher":"China Science Publishing & Media Ltd.","issue":"2","license":[{"start":{"date-parts":[[2021,4,30]],"date-time":"2021-04-30T00:00:00Z","timestamp":1619740800000},"content-version":"vor","delay-in-days":119,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We describe a gold-standard corpus of protest events that comprises local and international English-language news sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, which in turn supports the construction of knowledge bases that enable comparative social and political science studies. For each news source, the annotation starts with random samples of news articles and continues with samples drawn using active learning. Each batch of samples is annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus has the variety and quality necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. 
This corpus and the reported results will establish a common foundation, currently lacking in the literature, for automated protest event collection studies.<\/jats:p>","DOI":"10.1162\/dint_a_00092","type":"journal-article","created":{"date-parts":[[2021,4,30]],"date-time":"2021-04-30T16:04:42Z","timestamp":1619798682000},"page":"308-335","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":10,"title":["Cross-Context News Corpus for Protest Event-Related Knowledge Base\n                    Construction"],"prefix":"10.3724","volume":"3","author":[{"given":"Ali","family":"H\u00fcrriyeto\u011flu","sequence":"first","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, Turkey"}]},{"given":"Erdem","family":"Y\u00f6r\u00fck","sequence":"additional","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, Turkey"}]},{"given":"Osman","family":"Mutlu","sequence":"additional","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, Turkey"}]},{"given":"F\u0131rat","family":"Duru\u015fan","sequence":"additional","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, Turkey"}]},{"given":"\u00c7a\u011fr\u0131","family":"Yoltar","sequence":"additional","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, Turkey"}]},{"given":"Deniz","family":"Y\u00fcret","sequence":"additional","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, Turkey"}]},{"given":"Burak","family":"G\u00fcrel","sequence":"additional","affiliation":[{"name":"Ko\u00e7 University, Rumelifeneri yolu, Sariyer, Istanbul 34450, 
Turkey"}]}],"member":"2026","published-online":{"date-parts":[[2021,6,2]]},"reference":[{"key":"2021091617271747200_ref1","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1146\/annurev.soc.24.1.371","article-title":"Was it worth the effort? The outcomes and consequences of\n                        social movements","volume":"24","author":"Giugni","year":"1998","journal-title":"Annual Review of\n                        Sociology"},{"key":"2021091617271747200_ref2","volume-title":"Power in movement: Social movements, collective action and\n                        politics","author":"Tarrow","year":"1994"},{"issue":"3","key":"2021091617271747200_ref3","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1177\/0022343312471551","article-title":"Unpacking nonviolent campaigns: Introducing the NAVCO 2.0\n                        dataset","volume":"50","author":"Chenoweth","year":"2013","journal-title":"Journal of Peace Research"},{"key":"2021091617271747200_ref4","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780190918309.001.0001","volume-title":"The Internet and political protest in autocracies, chapter coding\n                        protest events in autocracies","author":"Weidmann","year":"2019"},{"issue":"5","key":"2021091617271747200_ref5","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1177\/0022343310378914","article-title":"Introducing ACLED: An armed conflict location and event\n                        dataset: Special data feature","volume":"47","author":"Raleigh","year":"2010","journal-title":"Journal of Peace\n                        Research"},{"key":"2021091617271747200_ref6","volume-title":"The politics of the Turkish welfare system transformation in the\n                        neoliberal era: Welfare as mobilization and containment","author":"Y\u00f6r\u00fck","year":"2012"},{"issue":"1","key":"2021091617271747200_ref7","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1177\/0081175015581378","article-title":"A 
progressive supervised-learning approach to generating rich\n                        civil strife data","volume":"45","author":"Nardulli","year":"2015","journal-title":"Sociological Methodology"},{"key":"2021091617271747200_ref8","first-page":"1","article-title":"GDELT: Global data on events, location, and tone,\n                        1979\u20132012","volume-title":"Annual Meeting of the\n                        International Studies Association","author":"Leetaru","year":"2013"},{"key":"2021091617271747200_ref9","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1007\/978-1-4614-5311-6_3","article-title":"Automatic extraction of events from open source text for\n                        predictive forecasting","volume-title":"Handbook of Computational Approaches to Counterterrorism","author":"Boschee","year":"2013"},{"key":"2021091617271747200_ref10","first-page":"1","article-title":"Three's a charm?: Open event data coding with EL:\n                        Diablo, Petrarch, and the open event data alliance","volume-title":"Annual Meeting of the International Studies Association","author":"Schrodt","year":"2014"},{"key":"2021091617271747200_ref11","doi-asserted-by":"crossref","first-page":"106","DOI":"10.18653\/v1\/W16-2113","article-title":"Towards building a political protest database to explain\n                        changes in the welfare state","volume-title":"Proceedings of the\n                        10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social\n                        Sciences, and Humanities","author":"S\u00f6nmez","year":"2016"},{"issue":"6307","key":"2021091617271747200_ref12","doi-asserted-by":"crossref","first-page":"1502","DOI":"10.1126\/science.aaf6758","article-title":"Growing pains for global monitoring of societal\n                        events","volume":"353","author":"Wang","year":"2016","journal-title":"Science"},{"key":"2021091617271747200_ref13","first-page":"1","article-title":"Towards linguistically 
generalizable NLP systems: A workshop\n                        and shared task","volume-title":"Proceedings of the First\n                        Workshop on Building Linguistically Generalizable NLP Systems","author":"Ettinger","year":"2017"},{"issue":"1","key":"2021091617271747200_ref14","first-page":"267","article-title":"Comparing GDELT and ICEWS event data","volume":"21","author":"Ward","year":"2013","journal-title":"Event Data Analysis"},{"key":"2021091617271747200_ref15","volume-title":"Towards a dataset of automatically coded protest events from\n                        English-language newswire documents","author":"Lorenzini","year":"2016"},{"key":"2021091617271747200_ref16","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1007\/978-3-030-15719-7_42","article-title":"A task set proposal for automatic protest information\n                        collection across multiple countries","volume-title":"Advances in Information Retrieval","author":"H\u00fcrriyeto\u011flu","year":"2019"},{"key":"2021091617271747200_ref17","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1007\/978-3-030-28577-7_32","article-title":"Overview of CLEF 2019 Lab ProtestNews: Extracting protests\n                        from news in a cross-context setting","volume-title":"Experimental IR Meets Multilinguality, Multimodality, and\n                        Interaction","author":"H\u00fcrriyeto\u011flu","year":"2019"},{"key":"2021091617271747200_ref18","first-page":"1","article-title":"Automated extraction of socio-political events from news\n                        (AESPEN): Workshop and shared task report","volume-title":"Proceedings of the Workshop on Automated Extraction of\n                        Socio-political Events from News","author":"H\u00fcrriyeto\u011flu","year":"2020"},{"issue":"4","key":"2021091617271747200_ref19","doi-asserted-by":"crossref","DOI":"10.1177\/2053168015615596","article-title":"Improving the selection of news reports for event coding\n        
                using ensemble classification","volume":"2","author":"Croicu","year":"2015","journal-title":"Research &\n                        Politics"},{"key":"2021091617271747200_ref20","volume-title":"MPEDS: Automating the generation of protest event data","author":"Hanna"},{"key":"2021091617271747200_ref21","first-page":"1","article-title":"Towards automated protest event analysis","volume-title":"New Frontiers of Automated Content Analysis in the Social\n                        Sciences","author":"Makarov","year":"2015"},{"key":"2021091617271747200_ref22","first-page":"837","article-title":"The automatic content extraction (ACE) program \u2013\n                        tasks, data, and evaluation","volume-title":"Proceedings of the\n                        Fourth International Conference on Language Resources and Evaluation\n                        (LREC'04)","author":"Doddington","year":"2004"},{"key":"2021091617271747200_ref23","article-title":"Overview of TAC-KBP 2015 event nugget track","volume-title":"Proceedings of the 2015 Text Analysis Conference, TAC 2015","author":"Mitamura"},{"key":"2021091617271747200_ref24","volume-title":"Conflict and mediation event observations (CAMEO): A new event data\n                        framework for the analysis of foreign policy interactions","author":"Gerner","year":"2002"},{"key":"2021091617271747200_ref25","doi-asserted-by":"crossref","first-page":"44","DOI":"10.18653\/v1\/D16-1005","article-title":"Distinguishing past, on-going, and future events: The\n                        EventStatus corpus","volume-title":"Proceedings of the 2016\n                        Conference on Empirical Methods in Natural Language Processing","author":"Huang","year":"2016"},{"key":"2021091617271747200_ref26","doi-asserted-by":"crossref","first-page":"102","DOI":"10.18653\/v1\/W16-5613","article-title":"Constructing an annotated corpus for protest event\n                        mining","volume-title":"Proceedings of the First 
Workshop on NLP\n                        and Computational Social Science","author":"Makarov","year":"2016"},{"issue":"4","key":"2021091617271747200_ref27","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1162\/coli_a_00331","article-title":"What can be accomplished with the state of the art in\n                        information extraction? A personal view","volume":"44","author":"Weischedel","year":"2018","journal-title":"Computational Linguistics"},{"key":"2021091617271747200_ref28","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1007\/978-3-540-69858-6_21","article-title":"Real-time news event extraction for global crisis\n                        monitoring","volume-title":"Natural Language and Information Systems","author":"Tanev","year":"2008"},{"key":"2021091617271747200_ref29","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1108\/S0163-786X20160000040020","article-title":"The effect of New York Times event coding techniques on\n                        social movement analyses of protest data","volume-title":"Narratives of Identity in Social Movements, Conflicts and\n                        Change","author":"Johnson","year":"2016"},{"key":"2021091617271747200_ref30","volume-title":"English Gigaword","author":"Parker","year":"2011","edition":"5th Ed."},{"issue":"1","key":"2021091617271747200_ref31","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1080\/14742830903442535","article-title":"Protest records, data validity, and the Mexican media:\n                        Development and assessment of a keyword search protocol","volume":"9","author":"Strawn","year":"2010","journal-title":"Social Movement Studies"},{"issue":"2","key":"2021091617271747200_ref32","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1177\/0049124101030002001","article-title":"Finding collective events: Sources, searches,\n                        timing","volume":"30","author":"Maney","year":"2001","journal-title":"Sociological Methods &\n      
                  Research"},{"key":"2021091617271747200_ref33","volume-title":"Active learning literature survey","author":"Settles","year":"2009"},{"key":"2021091617271747200_ref34","first-page":"361","article-title":"RCV1: A new benchmark collection for text categorization\n                        research","volume":"5","author":"Lewis","year":"2004","journal-title":"The Journal of Machine Learning\n                        Research"},{"key":"2021091617271747200_ref35","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1086\/378340","article-title":"Political opportunities and African-American protest,\n                        1948\u20131997","volume":"109","author":"Jenkins","year":"2003","journal-title":"American Journal of\n                        Sociology"},{"issue":"6","key":"2021091617271747200_ref36","doi-asserted-by":"crossref","first-page":"2347","DOI":"10.1007\/s11135-015-0266-1","article-title":"On the reliability of unitizing textual continua: Further\n                        developments","volume":"50","author":"Krippendorff","year":"2016","journal-title":"Quality and Quantity"},{"key":"2021091617271747200_ref37","first-page":"87","article-title":"Global joint models for coreference resolution and named\n                        entity classification","volume":"42","author":"Denis","year":"2009","journal-title":"Procesamiento Del Lenguaje\n                        Natural"},{"key":"2021091617271747200_ref38","first-page":"1","article-title":"CoNLL-2012 shared task: Modeling multilingual unrestricted\n                        coreference in OntoNotes","volume-title":"Joint Conference on\n                        EMNLP and CoNLL - Shared Task, CoNLL '12","author":"Pradhan","year":"2012"},{"key":"2021091617271747200_ref39","first-page":"63","article-title":"FoLiA: A practical XML format for linguistic annotation - a\n                        descriptive and comparative study","volume":"3","author":"van\n                                
Gompel","year":"2013","journal-title":"Computational\n                        Linguistics in the Netherlands Journal"},{"key":"2021091617271747200_ref40","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for\n                        language understanding","volume-title":"Proceedings of the 2019\n                        Conference of the North American Chapter of the Association for\n                        Computational Linguistics: Human Language Technologies","author":"Devlin","year":"2019"},{"key":"2021091617271747200_ref41","first-page":"1638","article-title":"Contextual string embeddings for sequence\n                        labeling","volume-title":"Proceedings of the 27th International\n                        Conference on Computational Linguistics","author":"Akbik","year":"2018"},{"key":"2021091617271747200_ref42","volume-title":"Introduction to the CoNLL-2003 shared task: Language-independent\n                        named entity recognition","author":"Tjong","year":"2003"},{"key":"2021091617271747200_ref43","volume-title":"ProTestA: Identifying and extracting protest events in news notebook\n                        for ProtestNews Lab at CLEF 2019","author":"Basile","year":"2019"},{"key":"2021091617271747200_ref44","volume-title":"A\n                        comparative study on generalizability of information extraction models on\n                        protest news","author":"Basar","year":"2019"},{"key":"2021091617271747200_ref45","first-page":"63","article-title":"Event clustering within news articles","volume-title":"Proceedings of the Workshop on Automated Extraction of\n                        Socio-political Events from News 2020","author":"\u00d6rs","year":"2020"},{"key":"2021091617271747200_ref46","first-page":"1337","article-title":"Mixture regression for covariate shift","volume-title":"Proceedings of the 19th International Conference on Neural\n                        Information Processing Systems 
(NIPS'06)","author":"Storkey","year":"2006"},{"key":"2021091617271747200_ref47","article-title":"Learning to generalize: Meta-learning for domain\n                        generalization","volume-title":"AAAI Conference on Artificial\n                        Intelligence 2018","author":"Li"},{"key":"2021091617271747200_ref48","doi-asserted-by":"crossref","first-page":"3467","DOI":"10.18653\/v1\/D18-1383","article-title":"Adaptive semi-supervised learning for cross-domain sentiment\n                        classification","volume-title":"Proceedings of the 2018\n                        Conference on Empirical Methods in Natural Language Processing","author":"He","year":"2018"},{"key":"2021091617271747200_ref49","volume-title":"Random sampling in corpus design: Cross-context generalizability in\n                        automated cross-national protest event collection","author":"Y\u00f6r\u00fck","year":"2021"},{"key":"2021091617271747200_ref50","first-page":"45","article-title":"SemEval-2010 Task 10: Linking events and their participants\n                        in discourse","volume-title":"Proceedings of the 5th\n                        International Workshop on Semantic Evaluation (SemEval'10)","author":"Ruppenhofer","year":"2010"},{"key":"2021091617271747200_ref51","first-page":"288","article-title":"Coreference for learning to extract relations: Yes Virginia,\n                        coreference matters","volume-title":"Proceedings of the 49th\n                        Annual Meeting of the Association for Computational Linguistics: Human\n                        Language Technologies","author":"Gabbard","year":"2011"},{"key":"2021091617271747200_ref52","first-page":"5479","article-title":"Event coreference resolution: A survey of two decades of\n                        research","volume-title":"Proceedings of the Twenty-Seventh\n                        International Joint Conference on Artificial Intelligence\n                        
(IJCAI-18)","author":"Lu","year":"2018"},{"key":"2021091617271747200_ref53","first-page":"7057","article-title":"Cross-lingual language model pretraining","volume":"32","author":"Conneau","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["Data Intelligence"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/dint\/article-pdf\/3\/2\/308\/1963469\/dint_a_00092.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/dint\/article-pdf\/3\/2\/308\/1963469\/dint_a_00092.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T23:06:48Z","timestamp":1741129608000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/dint\/article\/3\/2\/308\/100736\/Cross-Context-News-Corpus-for-Protest-Event"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":53,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,6,2]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00092","relation":{},"ISSN":["2641-435X"],"issn-type":[{"value":"2641-435X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}