{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T00:14:02Z","timestamp":1770509642722,"version":"3.49.0"},"reference-count":69,"publisher":"MIT Press","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2014,12]]},"abstract":"<jats:p>The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically-grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but also has spawned the annotation of comparable corpora in other languages and genres.<\/jats:p><jats:p>Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. 
The paper draws on work done by ourselves and others since the corpus was released.<\/jats:p>","DOI":"10.1162\/coli_a_00204","type":"journal-article","created":{"date-parts":[[2014,6,18]],"date-time":"2014-06-18T15:12:36Z","timestamp":1403104356000},"page":"921-950","source":"Crossref","is-referenced-by-count":25,"title":["Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation"],"prefix":"10.1162","volume":"40","author":[{"given":"Rashmi","family":"Prasad","sequence":"first","affiliation":[{"name":"University of Wisconsin\u2013Milwaukee"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bonnie","family":"Webber","sequence":"additional","affiliation":[{"name":"University of Edinburgh"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aravind","family":"Joshi","sequence":"additional","affiliation":[{"name":"University of Pennsylvania"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","reference":[{"key":"R1","unstructured":"Agarwal, M., R. Shah, and P. Mannem. 2011. Automatic question generation using discourse cues. In Proceedings of the ACL HLT 2011 Workshop on Innovative Use of NLP for Building Educational Applications, pages 1\u20139, Portland, OR."},{"key":"R2","doi-asserted-by":"publisher","DOI":"10.1016\/j.pragma.2004.05.005"},{"key":"R3","unstructured":"Akta\u015f, B., C. Boz\u015fahin, and D. Zeyrek. 2010. Discourse relation configurations in Turkish and an annotation environment. In Proceedings of the 4th Linguistic Annotation Workshop, pages 202\u2013206, Uppsala."},{"key":"R4","unstructured":"Al-Saif, A. 2012. Human and automatic annotation of discourse relations for Arabic. Ph.D. thesis, University of Leeds."},{"key":"R5","unstructured":"Al-Saif, A. and K. Markert. 2010. The Leeds Arabic Discourse Treebank: Annotating discourse connectives for Arabic. 
In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC-2010), pages 2,046\u20132,053, Valletta."},{"key":"R6","unstructured":"Al-Saif, A. and K. Markert. 2011. Modelling discourse relations for Arabic. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 736\u2013747, Edinburgh."},{"key":"R7","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(84)90008-0"},{"key":"R10","unstructured":"Asr, F. T. and V. Demberg. 2012a. Implicitness of discourse relations. In Proceedings of COLING, pages 2,669\u20132,684, Mumbai."},{"key":"R11","unstructured":"Asr, F. T. and V. Demberg. 2012b. Measuring the strength of linguistic cues for discourse relations. In Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects (ADACA), pages 33\u201342, Mumbai."},{"key":"R12","unstructured":"Asr, F. T. and V. Demberg. 2013. On the information conveyed by discourse markers. In Proceedings of the 4th Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pages 84\u201393, Sofia."},{"key":"R13","doi-asserted-by":"publisher","DOI":"10.1515\/ZFS.2007.018"},{"key":"R15","unstructured":"Bunt, H., R. Prasad, and A. Joshi. 2012. First steps towards an ISO standard for annotating discourse relations. In Proceedings of the Joint ISA-7, SRSL-3, and I2MRT Workshop on Semantic Annotation and the Integration and Interoperability of Multimodal Resources and Tools, pages 60\u201369, Istanbul."},{"key":"R16","doi-asserted-by":"crossref","unstructured":"Carlson, L., D. Marcu, and M. E. Okurowski. 2001. Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Proceedings of the 2nd SIGDIAL Workshop on Discourse and Dialogue, Eurospeech 2001, pages 1\u201310, Aalborg.","DOI":"10.3115\/1118078.1118083"},{"key":"R17","unstructured":"Danlos, L., D. Antolinos-Basso, C. Braud, and C. Roze. 2012. Vers le FDTB: French Discourse Tree Bank. 
In Proceedings of the Joint Conference JEP-TALN-RECITAL, pages 471\u2013479, Grenoble."},{"key":"R18","unstructured":"Demirsahin, I., A. Ozturel, C. Bozsahin, and D. Zeyrek. 2013. Applicative structures and immediate discourse in the Turkish Discourse Bank. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 122\u2013130, Sofia."},{"key":"R19","doi-asserted-by":"crossref","unstructured":"Dinesh, N., A. Lee, E. Miltsakaki, R. Prasad, A. Joshi, and B. Webber. 2005. Attribution and the (non)-alignment of syntactic and discourse arguments of connectives. In Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 29\u201336, Ann Arbor, MI.","DOI":"10.3115\/1608829.1608834"},{"key":"R20","doi-asserted-by":"crossref","unstructured":"Elwell, R. and J. Baldridge. 2008. Discourse connective argument identification with connective specific rankers. In Proceedings of ICSC-2008, pages 198\u2013205, Santa Clara, CA.","DOI":"10.1109\/ICSC.2008.50"},{"key":"R21","doi-asserted-by":"publisher","DOI":"10.1093\/jos\/ffh032"},{"key":"R22","unstructured":"Ghosh, S., R. Johansson, G. Riccardi, and S. Tonelli. 2011a. Shallow discourse parsing with conditional random fields. In Proceedings of the International Joint Conference on Natural Language Processing, pages 1,071\u20131,079, Chiang Mai."},{"key":"R23","unstructured":"Ghosh, S., R. Johansson, G. Riccardi, and S. Tonelli. 2012. Improving the recall of a discourse parser by constraint-based postprocessing. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pages 2,791\u20132,794, Istanbul."},{"key":"R24","doi-asserted-by":"publisher","DOI":"10.1109\/ICSC.2011.40"},{"key":"R26","unstructured":"Hirschberg, J. and D. Litman. 1993. Empirical studies on the disambiguation of cue phrases. Computational Linguistics, 19(3):501\u2013530."},{"key":"R28","unstructured":"Jiang, X. 2013. 
Predicting the use and interpretation of implicit and explicit discourse connectives. M.Sc. Thesis, School of Psychology, Philosophy and Language Sciences (PPLS), University of Edinburgh."},{"key":"R29","unstructured":"J\u00ednov\u00e1, P., J. M\u00edrovsk\u00fd, and L. Pol\u00e1kov\u00e1. 2012. Semi-automatic annotation of intra-sentential discourse relations in PDT. In Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects (ADACA), pages 43\u201358, Mumbai."},{"key":"R31","unstructured":"Knott, A. 1996. A Data-Driven Methodology for Motivating a Set of Coherence Relations. Ph.D. thesis, University of Edinburgh."},{"key":"R33","unstructured":"Kolachina, S., R. Prasad, D. M. Sharma, and A. Joshi. 2012. Evaluation of discourse relation annotation in the Hindi Discourse Relation Bank. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pages 823\u2013828, Istanbul."},{"key":"R34","unstructured":"Lakoff, R. 1971. Ifs, ands and buts about conjunction. Studies in Linguistic Semantics, 3:114\u2013149."},{"key":"R35","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324912000307"},{"key":"R36","doi-asserted-by":"publisher","DOI":"10.1515\/text.1.1988.8.3.243"},{"key":"R37","unstructured":"Marcus, M. P., B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313\u2013330."},{"key":"R39","unstructured":"Meyer, T. 2011. Disambiguating temporal-contrastive connectives for machine translation. In Proceedings of the ACL 2011 Student Session, pages 46\u201351, Portland, OR."},{"key":"R40","unstructured":"Meyer, T. and A. Popescu-Belis. 2012. Using sense-labeled discourse connectives for statistical machine translation. In Proceedings of the Workshop on Hybrid Approaches to Machine Translation (HyTra), pages 129\u2013138, Avignon."},{"key":"R41","unstructured":"Meyer, T. and B. Webber. 2013. 
Implicitation of discourse connectives in (machine) translation. In Proceedings of the ACL Workshop on Discourse in Machine Translation, pages 19\u201326, Sofia."},{"key":"R42","unstructured":"Miltsakaki, E., N. Dinesh, R. Prasad, A. Joshi, and B. Webber. 2005. Experiments on sense annotation and sense disambiguation of discourse connectives. In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT), pages 1\u201312, Barcelona."},{"key":"R43","unstructured":"Miltsakaki, E., R. Prasad, A. Joshi, and B. Webber. 2004. Annotating discourse connectives and their arguments. In Proceedings of the Workshop on Frontiers in Corpus Annotation (Human Language Technology Conference and the Conference of the North American Association of Computational Linguistics), pages 9\u201316, Boston, MA."},{"key":"R44","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78135-6_23"},{"key":"R45","unstructured":"Mladov\u00e1, L., \u0160. Zik\u00e1nov\u00e1, and E. Haji\u010dov\u00e1. 2008. From sentence to discourse: Building an annotation scheme for discourse based on Prague Dependency Treebank. In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), pages 2,564\u20132,570, Marrakech."},{"key":"R46","unstructured":"Moens, M. and M. Steedman. 1988. Temporal ontology and temporal reference. Computational Linguistics, 14(2):15\u201328."},{"key":"R48","doi-asserted-by":"crossref","unstructured":"Oza, U., R. Prasad, S. Kolachina, S. Meena, D. M. Sharma, and A. Joshi. 2009. Experiments with annotating discourse relations in the Hindi Discourse Relation Bank. In Proceedings of the 7th International Conference on Natural Language Processing (ICON), pages 1\u201310, Hyderabad.","DOI":"10.3115\/1698381.1698410"},{"key":"R49","doi-asserted-by":"publisher","DOI":"10.1162\/0891201053630264"},{"key":"R50","unstructured":"Pareti, S. 2012. A database of attribution relations. 
In Proceedings of the 8th Conference on International Language Resources and Evaluation (LREC12), pages 3,213\u20133,217, Istanbul."},{"key":"R51","doi-asserted-by":"crossref","unstructured":"Patterson, G. and A. Kehler. 2013. Predicting the presence of discourse connectives. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 914\u2013923, Seattle, WA.","DOI":"10.18653\/v1\/D13-1094"},{"key":"R53","doi-asserted-by":"crossref","unstructured":"Pitler, E. and A. Nenkova. 2009. Using syntax to disambiguate explicit discourse connectives in text. In Proceedings of the Joint Conference of the 47th Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing, pages 13\u201316, Singapore.","DOI":"10.3115\/1667583.1667589"},{"key":"R54","unstructured":"Pitler, E., M. Raghupathy, H. Mehta, A. Nenkova, A. Lee, and A. Joshi. 2008. Easily identifiable discourse relations. In Proceedings of COLING, pages 87\u201390, Manchester."},{"key":"R56","unstructured":"Pol\u00e1kov\u00e1, L., J. M\u00edrovsk\u00fd, A. Nedoluzhko, P. J\u00ednov\u00e1, \u0160. Zik\u00e1nov\u00e1, and E. Haji\u010dov\u00e1. 2013. Introducing the Prague Discourse Treebank 1.0. In Proceedings of the 6th International Joint Conference on Natural Language Processing, pages 91\u201399, Nagoya."},{"key":"R57","doi-asserted-by":"publisher","DOI":"10.3115\/1608938.1608949"},{"key":"R58","unstructured":"Prasad, R., N. Dinesh, A. Lee, A. Joshi, and B. Webber. 2007. Attribution and its annotation in the Penn Discourse TreeBank. Traitement Automatique des Langues, Special Issue on Computational Approaches to Document and Discourse, 47(2):43\u201364."},{"key":"R59","unstructured":"Prasad, R., N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. 2008. The Penn Discourse TreeBank 2.0. In Proceedings of LREC, pages 2,961\u20132,968, Marrakesh."},{"key":"R60","unstructured":"Prasad, R. and A. Joshi. 
2008. A discourse-based approach to generating why-questions from texts. In Proceedings of the Workshop on the Question Generation Shared Task and Evaluation Challenge, pages 1\u20133, Arlington, VA."},{"key":"R61","unstructured":"Prasad, R., A. Joshi, and B. Webber. 2010a. Exploiting scope for shallow discourse parsing. In Proceedings of the Seventh International Conference on Language Resources and their Evaluation, pages 2,076\u20132,083, Valletta."},{"key":"R62","unstructured":"Prasad, R., A. Joshi, and B. Webber. 2010b. Realization of discourse relations by other means: Alternative lexicalizations. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1,023\u20131,031, Beijing."},{"key":"R63","doi-asserted-by":"crossref","unstructured":"Prasad, R., S. McRoy, N. Frid, A. Joshi, and H. Yu. 2011. The Biomedical Discourse Relation Bank. BMC Bioinformatics, 12(188):1\u201318.","DOI":"10.1186\/1471-2105-12-188"},{"key":"R64","unstructured":"Pustejovsky, J., P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, and D. Radev. 2003a. The Timebank corpus. In Proceedings of the Corpus Linguistics Meeting, pages 647\u2013656, Lancaster."},{"key":"R65","doi-asserted-by":"publisher","DOI":"10.3115\/1608829.1608831"},{"key":"R66","unstructured":"Pustejovsky, J., J. Casta\u00f1o, R. Ingria, R. Sauri, R. Gaizauskas, A. Setzer, and G. Katz. 2003b. TimeML: Robust specification of event and temporal expressions in text. New Directions in Question Answering, 3:28\u201334."},{"key":"R67","doi-asserted-by":"publisher","DOI":"10.1136\/amiajnl-2011-000775"},{"key":"R68","unstructured":"Rysov\u00e1, M. 2012. Alternative lexicalizations of discourse connectives in Czech. In Proceedings of the 8th International Conference on Language Resources and Evaluation, pages 2,800\u20132,807, Istanbul."},{"key":"R73","unstructured":"Versley, Y. 2010. Discovery of ambiguous and unambiguous discourse connectives via annotation projection. 
In Proceedings of the Workshop on the Annotation and Exploitation of Parallel Corpora (AEPC), pages 83\u201392, Tartu."},{"key":"R74","unstructured":"Webber, B. 2013. What excludes an alternative in coherence relations? In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), pages 276\u2013287, Potsdam."},{"key":"R75","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324911000337"},{"key":"R77","unstructured":"Wellner, B. 2009. Sequence Models and Re-ranking Methods for Discourse Parsing. Ph.D. thesis, Brandeis University, Boston, MA."},{"key":"R78","unstructured":"Wellner, B. and J. Pustejovsky. 2007. Automatically identifying the arguments of discourse connectives. In Proceedings of EMNLP-CoNLL, pages 92\u2013101."},{"key":"R79","doi-asserted-by":"crossref","unstructured":"Xue, N. 2005. Annotating discourse connectives in the Chinese Treebank. In Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 84\u201391, Ann Arbor, MI.","DOI":"10.3115\/1608829.1608841"},{"key":"R81","doi-asserted-by":"crossref","unstructured":"Zeyrek, D., \u00dc. D. Turan, C. Boz\u015fahin, R. \u00c7ak\u0131c\u0131, A. Sevdik-\u00c7all\u0131, I. Demir\u015fahin, B. Akta\u015f, \u0130. Yal\u00e7\u0131nkaya, and H. \u00d6gel. 2009. Annotating subordinators in the Turkish Discourse Bank. In Proceedings of the Third Linguistic Annotation Workshop (LAW III), ACL-IJCNLP-2009, pages 44\u201348, Singapore.","DOI":"10.3115\/1698381.1698387"},{"key":"R82","doi-asserted-by":"crossref","unstructured":"Zeyrek, D., I. Demir\u015fahin, A. Sevdik-\u00c7all\u0131, H. \u00d6gel, \u0130. Yal\u00e7\u0131nkaya, and \u00dc. D. Turan. 2010. The annotation scheme of the Turkish Discourse Bank and an evaluation of inconsistent annotations. 
In Proceedings of the Fourth Linguistic Annotation Workshop (LAW-IV), ACL 2010, pages 282\u2013289, Uppsala.","DOI":"10.3115\/1698381.1698387"},{"key":"R83","doi-asserted-by":"publisher","DOI":"10.5087\/dad.2013.208"},{"key":"R84","unstructured":"Zhou, Y. and N. Xue. (in press). The Chinese Discourse TreeBank: A Chinese corpus annotated with discourse relations. Journal of Language Resources and Evaluation."},{"key":"R85","unstructured":"Zhou, Y. and N. Xue. 2012. PDTB-style discourse annotation of Chinese text. In Proceedings of the 50th Annual Meeting of the ACL, pages 69\u201377, Jeju Island."},{"key":"R86","unstructured":"Zhou, Z.M., M. Lan, Y. Xu, Z.Y. Niu, J. Su, and C. L. Tan. 2010. Predicting discourse connectives for implicit discourse relation recognition. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pages 1,507\u20131,514, Beijing."}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/COLI_a_00204","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T11:14:50Z","timestamp":1746270890000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/40\/4\/921-950\/1485"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,12]]}},"alternative-id":["10.1162\/COLI_a_00204"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00204","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,12]]}}}