{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T18:46:03Z","timestamp":1774291563969,"version":"3.50.1"},"reference-count":66,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2015,9,16]],"date-time":"2015-09-16T00:00:00Z","timestamp":1442361600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2017,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of Workshop on Statistical Machine Translation shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrate that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.<\/jats:p>","DOI":"10.1017\/s1351324915000339","type":"journal-article","created":{"date-parts":[[2015,9,15]],"date-time":"2015-09-15T23:53:52Z","timestamp":1442361232000},"page":"3-30","source":"Crossref","is-referenced-by-count":35,"title":["Can machine translation systems be evaluated by the crowd alone"],"prefix":"10.1017","volume":"23","author":[{"given":"YVETTE","family":"GRAHAM","sequence":"first","affiliation":[]},{"given":"TIMOTHY","family":"BALDWIN","sequence":"additional","affiliation":[]},{"given":"ALISTAIR","family":"MOFFAT","sequence":"additional","affiliation":[]},{"given":"JUSTIN","family":"ZOBEL","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2015,9,16]]},"reference":[{"key":"S1351324915000339_ref029","first-page":"1183","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics and Human Language Technologies","author":"Graham","year":"2015"},{"key":"S1351324915000339_ref022","first-page":"162","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Dreyer","year":"2012"},{"key":"S1351324915000339_ref019","first-page":"71","volume-title":"Proceedings of the 9th Machine Translation Summit","author":"Culy","year":"2003"},{"key":"S1351324915000339_ref047","first-page":"809","volume-title":"Proceedings of the 24th International Conference on Computational Linguistics: Posters","author":"Mathet","year":"2012"},{"key":"S1351324915000339_ref032","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3333"},{"key":"S1351324915000339_ref001","doi-asserted-by":"publisher","DOI":"10.1162\/coli.07-034-R2"},{"key":"S1351324915000339_ref023","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00057"},{"key":"S1351324915000339_ref037","first-page":"179","volume-title":"Proceedings of the 9th International Workshop on Spoken Language Translation","author":"Koehn","year":"2012"},{"key":"S1351324915000339_ref060","first-page":"73","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence Workshop on Grammatical Inference Applications","author":"Smith","year":"2005"},{"key":"S1351324915000339_ref057","unstructured":"Schwartz L. , Aikawa T. , and Quirk C. 2003. Disambiguation of English PP attachment using multilingual aligned data. In Proceedings of the 9th Machine Translation Summit, New Orleans, LA."},{"key":"S1351324915000339_ref002","first-page":"157","volume-title":"Proceedings of the 1994 Human Language Technology Workshop","author":"Berger","year":"1994"},{"key":"S1351324915000339_ref013","first-page":"17","volume-title":"Proceedings of the Joint 5th Workshop on Statistical Machine Translation and Metrics MATR","author":"Callison-Burch","year":"2010"},{"key":"S1351324915000339_ref043","doi-asserted-by":"publisher","DOI":"10.1145\/1380584.1380586"},{"key":"S1351324915000339_ref018","doi-asserted-by":"publisher","DOI":"10.1177\/001316446002000104"},{"key":"S1351324915000339_ref066","first-page":"52","volume-title":"Proceedings of the 7th Conference on Computational Natural Language Learning","author":"Yuan","year":"2013"},{"key":"S1351324915000339_ref044","first-page":"1","volume-title":"Proceedings of the 7th Workshop on Statistical Machine Translation","author":"Lopez","year":"2012"},{"key":"S1351324915000339_ref039","first-page":"169","volume-title":"Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics Annual Meeting","author":"Kumar","year":"2004"},{"key":"S1351324915000339_ref034","first-page":"1416","volume-title":"Proceedings of the Fifty-First Annual Meeting of the Association for Computational Linguistics","author":"Hopkins","year":"2013"},{"key":"S1351324915000339_ref020","unstructured":"Denkowski M. , and Lavie A. 2010. Choosing the right evaluation for machine translation: an examination of annotator and automatic metric performance on human judgement tasks. In Proceedings of the 12th Conference of the Association of Machine Translation in the Americas, Denver, CO."},{"key":"S1351324915000339_ref036","first-page":"145","volume-title":"Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics","author":"Izumi","year":"2003"},{"key":"S1351324915000339_ref017","first-page":"249","volume-title":"Proceedings of the 11th European Chapter of the Association for Computational Linguistics","author":"Callison-Burch","year":"2006"},{"key":"S1351324915000339_ref006","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3302"},{"key":"S1351324915000339_ref061","first-page":"386","volume-title":"Proceedings of the 9th Machine Translation Summit","author":"Turian","year":"2003"},{"key":"S1351324915000339_ref056","first-page":"1","volume-title":"Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics","author":"Sakaguchi","year":"2014"},{"key":"S1351324915000339_ref038","doi-asserted-by":"publisher","DOI":"10.3115\/1654650.1654666"},{"key":"S1351324915000339_ref054","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-009-9065-6"},{"key":"S1351324915000339_ref012","doi-asserted-by":"crossref","first-page":"70","DOI":"10.3115\/1626394.1626403","volume-title":"Proceedings of the 3rd Workshop on Statistical Machine Translation","author":"Callison-Burch","year":"2008"},{"key":"S1351324915000339_ref010","first-page":"286","volume-title":"Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing","author":"Callison-Burch","year":"2009"},{"key":"S1351324915000339_ref058","unstructured":"Sj\u00f6bergh J. , and Knutsson O. 2005. Faking errors to avoid making errors: very weakly supervised learning for error detection in writing. In Proceedings of Recent Advances in Natural Language Processing, Borovets, Bulgaria."},{"key":"S1351324915000339_ref063","first-page":"474","article-title":"Judging grammaticality: experiments in sentence classification","volume":"26","author":"Wagner","year":"2009","journal-title":"Computer-Assisted Language Instruction Consortium Journal"},{"key":"S1351324915000339_ref062","first-page":"112","volume-title":"Proceedings of the Joint Meeting of the Conference on Empirical Methods in Natural Language Processing and the Conference on Computational Natural Language Learning","author":"Wagner","year":"2007"},{"key":"S1351324915000339_ref005","first-page":"1","volume-title":"Proceedings of the Eighth Workshop on Statistical Machine Translation","author":"Bojar","year":"2013"},{"key":"S1351324915000339_ref050","doi-asserted-by":"publisher","DOI":"10.1162\/089120102317341756"},{"key":"S1351324915000339_ref065","first-page":"163","volume-title":"Proceedings of the 1994 Human Language Technology Workshop","author":"Yamron","year":"1994"},{"key":"S1351324915000339_ref014","first-page":"10","volume-title":"Proceedings of the 7th Workshop on Statistical Machine Translation","author":"Callison-Burch","year":"2012"},{"key":"S1351324915000339_ref008","first-page":"32","volume-title":"Proceedings of the Second Paclic Association for Computational Linguistics Conference: PACLING-95","author":"Bond","year":"1995"},{"key":"S1351324915000339_ref040","unstructured":"LDC. 2005. Linguistic data annotation specification: assessment of fluency and adequacy in translations. Linguistic Data Consortium. Revision 1.5."},{"key":"S1351324915000339_ref042","first-page":"375","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Lo","year":"2013"},{"key":"S1351324915000339_ref055","first-page":"154","volume-title":"Proceedings of Human Language Technologies: The Eleventh Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Rozovskaya","year":"2010"},{"key":"S1351324915000339_ref041","first-page":"174","volume-title":"Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"Lee","year":"2008"},{"key":"S1351324915000339_ref003","first-page":"1633","volume-title":"Proceedings of the 4th International Conference on Language Resources and Evaluation","author":"Bigert","year":"2004"},{"key":"S1351324915000339_ref007","first-page":"1","volume-title":"Proceedings of the 6th Workshop on Statistical Machine Translation","author":"Bojar","year":"2011"},{"key":"S1351324915000339_ref048","first-page":"160","volume-title":"Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics","author":"Och","year":"2003"},{"key":"S1351324915000339_ref015","doi-asserted-by":"publisher","DOI":"10.3115\/1626431"},{"key":"S1351324915000339_ref028","first-page":"70","volume-title":"Proceedings of the Australasian Language Technology Workshop","author":"Graham","year":"2012"},{"key":"S1351324915000339_ref021","first-page":"259","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics","author":"Dickinson","year":"2010"},{"key":"S1351324915000339_ref030","first-page":"33","volume-title":"Proceedings Seventh Linguistic Annotation Workshop and Interoperability with Discourse","author":"Graham","year":"2013"},{"key":"S1351324915000339_ref031","doi-asserted-by":"crossref","unstructured":"Graham Y. , Baldwin T. , Moffat A. , and Zobel J. 2014. Is machine translation getting better over time? In Proceedings of the 14th European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, pp. 443\u201351.","DOI":"10.3115\/v1\/E14-1047"},{"key":"S1351324915000339_ref052","first-page":"76","volume-title":"Proceedings of Frontiers in Corpus Annotations II: Pie in the Sky","author":"Poesio","year":"2005"},{"key":"S1351324915000339_ref033","first-page":"1","volume-title":"Proceedings of the 18th International Conference on Supporting Group Work","author":"Gupta","year":"2014"},{"key":"S1351324915000339_ref049","first-page":"73","volume-title":"Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics","author":"Okanohara","year":"2007"},{"key":"S1351324915000339_ref051","unstructured":"Pierce J. R. , Carroll John B. , Hamp E. P. , Hays David G. , Hockett C. F. , Oettinger A. G. and Perlis A. 1966. Languages and machines: Computers in translation and linguistics. A report by the Automatic Language Processing Advisory Committee, Division of Behavioral Sciences, National Academy of Sciences, National Research Council."},{"key":"S1351324915000339_ref064","first-page":"193","volume-title":"Proceedings of the First Conference of the Association of Machine Translation in the Americas","author":"White","year":"1994"},{"key":"S1351324915000339_ref027","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1020"},{"key":"S1351324915000339_ref035","first-page":"569","article-title":"Criteria for evaluating the linguistic quality of Japanese to English machine translations","volume":"9","author":"Ikehara","year":"1994","journal-title":"Journal of the Japanese Society for Artificial Intelligence"},{"key":"S1351324915000339_ref053","unstructured":"Pierce J. R. , Carroll J. B. , Hamp E. P. , Hays D. G. , Hockett C. F. , Oettinger A. G. , and Perlis A. 1966. ALPAC: Languages and machines: Computers in translation and linguistics. A report by the Automatic Language Processing Advisory Committee, Division of Behavioral Sciences, Washington, DC: National Academy of Sciences, National Research Council."},{"key":"S1351324915000339_ref024","doi-asserted-by":"crossref","unstructured":"Foster J. 2007. Treebanks gone bad: International Journal of Document Analysis and Recognition (IJDAR) 10(3\u20134): 129\u201345.","DOI":"10.1007\/s10032-007-0059-8"},{"key":"S1351324915000339_ref045","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3336"},{"key":"S1351324915000339_ref009","first-page":"249","volume-title":"Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics","author":"Brockett","year":"2006"},{"key":"S1351324915000339_ref004","doi-asserted-by":"crossref","unstructured":"Bigert J. , Sj\u00f6bergh J. , Knutsson O. , and Sahlgren M. 2005. Unsupervised evaluation of parser robustness. In Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing, pp. 142\u201354. Mexico City, Mexico: Springer.","DOI":"10.1007\/978-3-540-30586-6_14"},{"key":"S1351324915000339_ref026","first-page":"73","volume-title":"Proceedings of the 1st Conference of the Association of Machine Translation in the Americas","author":"Frederking","year":"1994"},{"key":"S1351324915000339_ref011","doi-asserted-by":"publisher","DOI":"10.3115\/1626355.1626373"},{"key":"S1351324915000339_ref046","unstructured":"Madnani N. , Resnik P. , Dorr B. J. , and Schwartz R. 2008. Are multiple reference translations necessary? Investigating the value of paraphrased reference translations in parameter optimization. In Proceedings of the 10th Conference of the Association of Machine Translation in the Americas, Waikiki, HI."},{"key":"S1351324915000339_ref059","first-page":"354","volume-title":"Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics","author":"Smith","year":"2005"},{"key":"S1351324915000339_ref025","first-page":"82","volume-title":"Proceedings of the 4th Workshop on Innovative Use of Natural Language Processing for Building Educational Applications","author":"Foster","year":"2009"},{"key":"S1351324915000339_ref016","first-page":"22","volume-title":"Proceedings of the 6th Workshop on Statistical Machine Translation","author":"Callison-Burch","year":"2011"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324915000339","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,18]],"date-time":"2019-04-18T01:03:37Z","timestamp":1555549417000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324915000339\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,9,16]]},"references-count":66,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2017,1]]}},"alternative-id":["S1351324915000339"],"URL":"https:\/\/doi.org\/10.1017\/s1351324915000339","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,9,16]]}}}