{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T22:03:45Z","timestamp":1766268225124},"reference-count":56,"publisher":"MIT Press - Journals","license":[{"start":{"date-parts":[[2021,4,28]],"date-time":"2021-04-28T00:00:00Z","timestamp":1619568000000},"content-version":"vor","delay-in-days":117,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The metrics standardly used to evaluate Natural Language Generation (NLG) models, such as BLEU or METEOR, fail to provide information on which linguistic factors impact performance. Focusing on Surface Realization (SR), the task of converting an unordered dependency tree into a well-formed sentence, we propose a framework for error analysis which permits identifying which features of the input affect the models\u2019 results. This framework consists of two main components: (i) correlation analyses between a wide range of syntactic metrics and standard performance metrics and (ii) a set of techniques to automatically identify syntactic constructs that often co-occur with low performance scores. We demonstrate the advantages of our framework by performing error analysis on the results of 174 system runs submitted to the Multilingual SR shared tasks; we show that dependency edge accuracy correlate with automatic metrics thereby providing a more interpretable basis for evaluation; and we suggest ways in which our framework could be used to improve models and data. 
The framework is available in the form of a toolkit which can be used both by campaign organizers to provide detailed, linguistically interpretable feedback on the state of the art in multilingual SR, and by individual researchers to improve models and datasets.<\/jats:p>","DOI":"10.1162\/tacl_a_00376","type":"journal-article","created":{"date-parts":[[2021,4,28]],"date-time":"2021-04-28T23:51:04Z","timestamp":1619653864000},"page":"429-446","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":1,"title":["An Error Analysis Framework for Shallow Surface Realization"],"prefix":"10.1162","volume":"9","author":[{"given":"Anastasia","family":"Shimorina","sequence":"first","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LORIA, F-54000 Nancy, France. anastasia.shimorina@loria.fr"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yannick","family":"Parmentier","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LORIA, F-54000 Nancy, France. yannick.parmentier@loria.fr"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Claire","family":"Gardent","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LORIA, F-54000 Nancy, France. 
claire.gardent@loria.fr"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2021,4,26]]},"reference":[{"key":"2021060823405980000_bib1","first-page":"217","article-title":"The first surface realisation shared task: Overview and evaluation results","volume-title":"Proceedings of the 13th European Workshop on Natural Language Generation","author":"Belz","year":"2011"},{"key":"2021060823405980000_bib2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.3115\/1667583.1667615","article-title":"Correlating human and automatic evaluation of a German surface realiser","volume-title":"Proceedings of the ACL-IJCNLP 2009 Conference Short Papers","author":"Cahill","year":"2009"},{"key":"2021060823405980000_bib3","doi-asserted-by":"crossref","first-page":"552","DOI":"10.18653\/v1\/D19-1052","article-title":"Neural data-to-text generation: A comparison between pipeline and end-to-end architectures","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ferreira","year":"2019"},{"key":"2021060823405980000_bib4","doi-asserted-by":"crossref","first-page":"362","DOI":"10.3115\/v1\/W14-3346","article-title":"A systematic comparison of smoothing techniques for sentence-level BLEU","volume-title":"Proceedings of the Ninth Workshop on Statistical Machine Translation","author":"Chen","year":"2014"},{"key":"2021060823405980000_bib5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/K17-2001","article-title":"CoNLL-SIGMORPHON 2017 shared task: Universal morphological reinflection in 52 languages","volume-title":"Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection","author":"Cotterell","year":"2017"},{"key":"2021060823405980000_bib6","article-title":"Overview of the TAC 2008 update summarization task","volume-title":"Proceedings of 
the First Text Analysis Conference, TAC 2008","author":"Dang","year":"2008"},{"key":"2021060823405980000_bib7","doi-asserted-by":"crossref","first-page":"45","DOI":"10.18653\/v1\/P16-2008","article-title":"Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Du\u0161ek","year":"2016"},{"key":"2021060823405980000_bib8","doi-asserted-by":"crossref","first-page":"61","DOI":"10.18653\/v1\/W19-7807","article-title":"Weighted posets: Learning surface order from dependency trees","volume-title":"Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)","author":"Dyer","year":"2019"},{"key":"2021060823405980000_bib9","doi-asserted-by":"crossref","first-page":"65","DOI":"10.18653\/v1\/W19-2308","article-title":"Designing a symbolic intermediate representation for neural surface realization","volume-title":"Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation","author":"Elder","year":"2019"},{"key":"2021060823405980000_bib10","doi-asserted-by":"crossref","first-page":"49","DOI":"10.18653\/v1\/W18-3606","article-title":"Generating high-quality surface realizations using data augmentation and factored sequence models","volume-title":"Proceedings of the First Workshop on Multilingual Surface Realisation","author":"Elder","year":"2018"},{"key":"2021060823405980000_bib11","first-page":"225","article-title":"Tree linearization in English: Improving language model based approaches","volume-title":"Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers","author":"Filippova","year":"2009"},{"key":"2021060823405980000_bib12","first-page":"91","article-title":"Quantifying word order freedom in 
dependency corpora","volume-title":"Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)","author":"Futrell","year":"2015"},{"key":"2021060823405980000_bib13","first-page":"592","article-title":"Error mining on dependency trees","volume-title":"Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Gardent","year":"2012"},{"key":"2021060823405980000_bib14","doi-asserted-by":"crossref","first-page":"250","DOI":"10.3115\/1626431.1626479","article-title":"On the robustness of syntactic and semantic features for automatic MT evaluation","volume-title":"Proceedings of the Fourth Workshop on Statistical Machine Translation","author":"Gim\u00e9nez","year":"2009"},{"key":"2021060823405980000_bib15","doi-asserted-by":"crossref","first-page":"140","DOI":"10.18653\/v1\/K19-1014","article-title":"Weird inflects but OK: Making sense of morphological generation errors","volume-title":"Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)","author":"Gorman","year":"2019"},{"key":"2021060823405980000_bib16","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1162\/tacl_a_00103","article-title":"Multi-lingual dependency parsing evaluation: A large-scale analysis of word order properties using artificial data","volume":"4","author":"Gulordava","year":"2016","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021060823405980000_bib17","article-title":"Evaluating DUC 2005 using basic elements","volume-title":"Proceedings of the 5th Document Understanding Conference (DUC)","author":"Hovy","year":"2005"},{"key":"2021060823405980000_bib18","first-page":"73","article-title":"What are the limitations on the flux of syntactic dependencies? 
Evidence from UD treebanks","volume-title":"Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)","author":"Kahane","year":"2017"},{"key":"2021060823405980000_bib19","article-title":"On alternative automated content evaluation measures","volume-title":"Proceedings of the Second Text Analysis Conference","author":"Katragadda","year":"2009"},{"key":"2021060823405980000_bib20","doi-asserted-by":"crossref","first-page":"39","DOI":"10.18653\/v1\/W18-3605","article-title":"The OSU realizer for SRST \u201818: Neural sequence-to-sequence inflection and incremental locality-based linearization","volume-title":"Proceedings of the First Workshop on Multilingual Surface Realisation","author":"King","year":"2018"},{"key":"2021060823405980000_bib21","doi-asserted-by":"crossref","first-page":"146","DOI":"10.18653\/v1\/P17-1014","article-title":"Neural AMR: Sequence-to-sequence models for parsing and generation","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Konstas","year":"2017"},{"key":"2021060823405980000_bib22","doi-asserted-by":"crossref","first-page":"35","DOI":"10.18653\/v1\/D19-6304","article-title":"BME-UW at SRST-2019: Surface realization with interpreted regular tree grammars","volume-title":"Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)","author":"Kov\u00e1cs","year":"2019"},{"key":"2021060823405980000_bib23","doi-asserted-by":"crossref","first-page":"1908","DOI":"10.18653\/v1\/D15-1219","article-title":"Abstractive multi-document summarization with semantic information extraction","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing","author":"Li","year":"2015"},{"key":"2021060823405980000_bib24","first-page":"25","article-title":"Syntactic features for evaluation of machine translation","volume-title":"Proceedings of the ACL Workshop on Intrinsic and 
Extrinsic Evaluation Measures for Machine Translation and\/or Summarization","author":"Liu","year":"2005"},{"issue":"2","key":"2021060823405980000_bib25","doi-asserted-by":"crossref","first-page":"159","DOI":"10.17791\/jcs.2008.9.2.159","article-title":"Dependency distance as a metric of language comprehension difficulty","volume":"9","author":"Liu","year":"2008","journal-title":"Journal of Cognitive Science"},{"issue":"6","key":"2021060823405980000_bib26","doi-asserted-by":"crossref","first-page":"1567","DOI":"10.1016\/j.lingua.2009.10.001","article-title":"Dependency direction as a means of word-order typology: A method based on dependency treebanks","volume":"120","author":"Liu","year":"2010","journal-title":"Lingua"},{"key":"2021060823405980000_bib27","first-page":"243","article-title":"Fully automatic semantic MT evaluation","volume-title":"Proceedings of the Seventh Workshop on Statistical Machine Translation","author":"Lo","year":"2012"},{"key":"2021060823405980000_bib28","first-page":"85","article-title":"The order of prenominal adjectives in natural language generation","volume-title":"Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics","author":"Malouf","year":"2000"},{"key":"2021060823405980000_bib29","first-page":"122","article-title":"Bleu\u00e2tre: Flattening syntactic dependencies for MT evaluation","volume-title":"Proceedings of the 11th International Conference on Theoretical and Methodological Issues in Machine Translation","author":"Mehay","year":"2007"},{"key":"2021060823405980000_bib30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/W18-3601","article-title":"The first multilingual surface realisation shared task (SR\u201918): Overview and evaluation results","volume-title":"Proceedings of the First Workshop on Multilingual Surface 
Realisation","author":"Mille","year":"2018"},{"key":"2021060823405980000_bib31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/D19-6301","article-title":"The second multilingual surface realisation shared task (SR\u201919): Overview and evaluation results","volume-title":"Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)","author":"Mille","year":"2019"},{"issue":"2","key":"2021060823405980000_bib32","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1037\/h0043158","article-title":"The magical number seven plus or minus two: Some limits on our capacity for processing information.","volume":"63","author":"Miller","year":"1956","journal-title":"Psychological Review"},{"key":"2021060823405980000_bib33","first-page":"2267","article-title":"Step-by-step: Separating planning from realization in neural data-to-text generation","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Moryossef","year":"2019"},{"key":"2021060823405980000_bib34","first-page":"2011","article-title":"Error mining with suspicion trees: Seeing the forest for the trees","volume-title":"Proceedings of COLING 2012","author":"Narayan","year":"2012"},{"key":"2021060823405980000_bib35","article-title":"Universal dependencies 2.0","author":"Nivre","year":"2017"},{"key":"2021060823405980000_bib36","first-page":"2241","article-title":"Why we need new evaluation metrics for NLG","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Novikova","year":"2017"},{"key":"2021060823405980000_bib37","first-page":"190","article-title":"DEPEVAL(summ): Dependency-based evaluation for automatic summaries","volume-title":"Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language 
Processing of the AFNLP","author":"Owczarzak","year":"2009"},{"key":"2021060823405980000_bib38","first-page":"80","article-title":"Dependency-based automatic evaluation for machine translation","volume-title":"Proceedings of SSST, NAACL-HLT 2007 \/ AMTA Workshop on Syntax and Structure in Statistical Translation","author":"Owczarzak","year":"2007"},{"key":"2021060823405980000_bib39","first-page":"488","article-title":"Transition-based syntactic linearization with lookahead features","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Puduppully","year":"2016"},{"key":"2021060823405980000_bib40","doi-asserted-by":"crossref","first-page":"268","DOI":"10.18653\/v1\/W19-8635","article-title":"Revisiting the binary linearization technique for surface realization","volume-title":"Proceedings of the 12th International Conference on Natural Language Generation","author":"Puzikov","year":"2019"},{"key":"2021060823405980000_bib41","first-page":"160","article-title":"Universal Dependency parsing from scratch","volume-title":"Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies","author":"Qi","year":"2018"},{"issue":"1","key":"2021060823405980000_bib42","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/BF00116251","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Machine Learning"},{"issue":"3","key":"2021060823405980000_bib43","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1162\/coli_a_00322","article-title":"A structured review of the validity of BLEU","volume":"44","author":"Reiter","year":"2018","journal-title":"Computational Linguistics"},{"key":"2021060823405980000_bib44","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.18653\/v1\/D16-1255","article-title":"Word ordering without syntax","volume-title":"Proceedings 
of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Schmaltz","year":"2016"},{"key":"2021060823405980000_bib45","doi-asserted-by":"crossref","first-page":"3086","DOI":"10.18653\/v1\/D19-1305","article-title":"Surface realisation using full delexicalisation","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Shimorina","year":"2019"},{"key":"2021060823405980000_bib46","doi-asserted-by":"crossref","first-page":"431","DOI":"10.18653\/v1\/W18-6553","article-title":"Neural transition-based syntactic linearization","volume-title":"Proceedings of the 11th International Conference on Natural Language Generation","author":"Song","year":"2018"},{"key":"2021060823405980000_bib47","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1007\/978-3-540-30586-6_38","article-title":"Evaluating evaluation methods for generation in the presence of variation","volume-title":"Computational Linguistics and Intelligent Text Processing, 6th International Conference, CICLing 2005, Mexico City, Mexico, February 13\u201319, 2005, Proceedings","author":"Stent","year":"2005"},{"key":"2021060823405980000_bib48","article-title":"BEwT-E for TAC 2009\u2019s AESOP task","volume-title":"Proceedings of the Second Text Analysis Conference","author":"Tratz","year":"2009"},{"key":"2021060823405980000_bib49","first-page":"244","article-title":"Minimal dependency length in realization ranking","volume-title":"Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"White","year":"2012"},{"key":"2021060823405980000_bib50","first-page":"2042","article-title":"RED: A reference dependency based MT evaluation metric","volume-title":"Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical 
Papers","author":"Hui","year":"2014"},{"key":"2021060823405980000_bib51","first-page":"50","article-title":"IMSurReal: IMS at the surface realization shared task 2019","volume-title":"Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)","author":"Xiang","year":"2019"},{"key":"2021060823405980000_bib52","first-page":"1451","article-title":"Fast and accurate non-projective dependency tree linearization","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Xiang","year":"2020"},{"key":"2021060823405980000_bib53","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1515\/pralin-2016-0007","article-title":"Universal annotation of Slavic verb forms","volume":"105","author":"Zeman","year":"2016","journal-title":"The Prague Bulletin of Mathematical Linguistics"},{"key":"2021060823405980000_bib54","first-page":"2232","article-title":"Partial-tree linearization: Generalized word ordering for text synthesis","volume-title":"Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence","author":"Zhang","year":"2013"},{"key":"2021060823405980000_bib55","first-page":"736","article-title":"Syntax-based word ordering incorporating a large-scale language model","volume-title":"Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics","author":"Zhang","year":"2012"},{"issue":"3","key":"2021060823405980000_bib56","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1162\/COLI_a_00229","article-title":"Discriminative syntax-based word ordering for text generation","volume":"41","author":"Zhang","year":"2015","journal-title":"Computational Linguistics"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00376\/1924221\/tacl_a_00376.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00376\/1924221\/tacl_a_00376.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,9]],"date-time":"2021-06-09T09:54:37Z","timestamp":1623232477000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00376\/100683\/An-Error-Analysis-Framework-for-Shallow-Surface"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":56,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00376","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}