{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,5]],"date-time":"2026-05-05T19:46:04Z","timestamp":1778010364645,"version":"3.51.4"},"reference-count":77,"publisher":"MIT Press - Journals","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Transactions of the Association for Computational Linguistics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:p> Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA\u2014a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology\u2014the set of linguistic features each language expresses\u2014such that we expect models performing well on this set to generalize across a large number of the world\u2019s languages. We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer, but don\u2019t know the answer yet, and the data is collected directly in each language without the use of translation. 
<\/jats:p>","DOI":"10.1162\/tacl_a_00317","type":"journal-article","created":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T19:48:33Z","timestamp":1595533713000},"page":"454-470","source":"Crossref","is-referenced-by-count":160,"title":["T<scp>y<\/scp>D<scp>i<\/scp> QA: A Benchmark for Information-Seeking Question Answering in <i>Ty<\/i>pologically <i>Di<\/i>verse Languages"],"prefix":"10.1162","volume":"8","author":[{"given":"Jonathan H.","family":"Clark","sequence":"first","affiliation":[{"name":"Google Research"}]},{"given":"Eunsol","family":"Choi","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Michael","family":"Collins","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Dan","family":"Garrette","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Tom","family":"Kwiatkowski","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Vitaly","family":"Nikolaev","sequence":"additional","affiliation":[{"name":"Google Research"}]},{"given":"Jennimaria","family":"Palomaki","sequence":"additional","affiliation":[{"name":"Google Research"}]}],"member":"281","reference":[{"key":"bib1","author":"Alberti Chris","year":"2019","journal-title":"arXiv preprint arXiv:1901.08634"},{"key":"bib2","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1609\/aimag.v36i1.2564","volume":"36","author":"Aroyo Lora","year":"2015","journal-title":"AI Magazine"},{"key":"bib3","author":"Artetxe Mikel","year":"2019","journal-title":"arXiv preprint arXiv:1910.11856"},{"key":"bib4","author":"Asai Akari","year":"2018","journal-title":"arXiv preprint arXiv:1809.03275"},{"key":"bib5","volume-title":"Swahili Grammar","author":"Ashton Ethel O.","year":"1947","edition":"2"},{"key":"bib6","first-page":"65","volume-title":"Proceedings of the 2007 workshop on Computational Approaches to Semitic Languages: Common Issues and Resources","author":"Attia Mohammed 
A.","year":"2007"},{"issue":"1","key":"bib7","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1093\/llc\/fqu047","volume":"31","author":"Avner Ehud Alexander","year":"2014","journal-title":"Digital Scholarship in the Humanities"},{"key":"bib8","volume-title":"Element Order","volume":"7","author":"Bivon Roy","year":"1971"},{"key":"bib9","first-page":"335","volume":"49","author":"Bizzarri Camilla","year":"2015","journal-title":"Annali di Ca\u2019 Foscari. Serie occidentale"},{"key":"bib10","doi-asserted-by":"crossref","first-page":"632","DOI":"10.18653\/v1\/D15-1075","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing","author":"Bowman Samuel R.","year":"2015"},{"key":"bib11","author":"Boyd-Graber Jordan","year":"2019","journal-title":"arXiv preprint arXiv:1910.14464"},{"key":"bib12","author":"Chen Danqi","year":"2017","journal-title":"arXiv preprint arXiv:1704.00051"},{"key":"bib13","author":"Choi Eunsol","year":"2018","journal-title":"arXiv preprint arXiv:1808.07036"},{"key":"bib14","first-page":"2924","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)","author":"Clark Christopher","year":"2019"},{"key":"bib15","volume-title":"Language Universals and Linguistic Typology: Syntax and Morphology","author":"Comrie Bernard","year":"1989"},{"key":"bib16","volume-title":"The World Atlas of Language Structures","author":"Comrie Bernard","year":"2005"},{"key":"bib17","author":"Conneau Alexis","year":"2018","journal-title":"arXiv preprint arXiv:1809.05053"},{"key":"bib18","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF03545848","author":"Corbett Greville G.","year":"1982","journal-title":"Russian Linguistics"},{"key":"bib19","series-title":"Cambridge Textbooks in Linguistics","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139166119","volume-title":"Gender","author":"Corbett Greville 
G.","year":"1991"},{"key":"bib20","first-page":"389","volume-title":"XVIIth International Conference of the Italian Association for Artificial Intelligence","author":"Croce Danilo","year":"2018"},{"key":"bib21","volume-title":"The Thai Writing System","volume":"39","author":"D\u0101nwiwat Nanthan\u0101","year":"1987"},{"key":"bib22","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)","author":"Devlin Jacob","year":"2019"},{"key":"bib23","volume-title":"WALS Online","author":"Dryer Matthew S.","year":"2013"},{"key":"bib24","volume-title":"Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL)","author":"Dua Dheeru","year":"2019"},{"key":"bib25","author":"Dunn Matthew","year":"2017","journal-title":"arXiv preprint arXiv:1704.05179"},{"key":"bib26","doi-asserted-by":"crossref","first-page":"159","DOI":"10.3115\/v1\/D14-1018","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Eetemadi Sauleh","year":"2014"},{"issue":"3","key":"bib27","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1609\/aimag.v31i3.2303","volume":"31","author":"Ferrucci David","year":"2010","journal-title":"AI Magazine"},{"issue":"15","key":"bib28","first-page":"2296","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems","author":"Gao Haoyuan","year":"2015"},{"key":"bib29","author":"Gardner Matt","year":"2019","journal-title":"arXiv preprint arXiv:1909.11291"},{"key":"bib30","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Gupta Deepak","year":"2018"},{"key":"bib31","volume-title":"Iso suomen kielioppi","author":"Hakulinen Auli","year":"2004"},{"key":"bib32","first-page":"7","author":"Han 
Na-Rae","year":"2005","journal-title":"IRCS Technical Reports Series"},{"key":"bib33","author":"He Wei","year":"2017","journal-title":"arXiv preprint arXiv:1711.05073"},{"key":"bib34","author":"Joshi Mandar","year":"2017","journal-title":"arXiv preprint arXiv:1705.03551"},{"key":"bib35","doi-asserted-by":"crossref","DOI":"10.4324\/9780203085196","volume-title":"Japanese: A Comprehensive Grammar","author":"Kaiser Stefan","year":"2013"},{"key":"bib36","doi-asserted-by":"crossref","DOI":"10.4324\/9780203008713","volume-title":"Finnish: An Essential Grammar","author":"Karlsson Fred","year":"2013"},{"key":"bib37","volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)","author":"Kenter Tom","year":"2018"},{"key":"bib38","first-page":"202","volume-title":"The Dravidian Languages","author":"Krishnamurti Bhadriraju","year":"1998"},{"key":"bib39","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511486876","volume-title":"The Dravidian Languages","author":"Krishnamurti Bhadriraju","year":"2003"},{"key":"bib40","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00276"},{"key":"bib41","first-page":"785","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Lai Guokun","year":"2017"},{"key":"bib42","author":"Lee Kenton","year":"2019","journal-title":"arXiv preprint arXiv:1906.00300"},{"key":"bib43","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Lee Kyungjae","year":"2018"},{"key":"bib44","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00111"},{"key":"bib45","author":"Lewis Patrick","year":"2019","journal-title":"arXiv preprint arXiv:1910.07475"},{"key":"bib46","author":"Lim Seungyoung","year":"2019","journal-title":"arXiv preprint arXiv:1909.07005"},{"key":"bib47","volume-title":"Introduction to Spoken Telugu","author":"Lisker 
Leigh","year":"1963"},{"key":"bib48","first-page":"2358","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Liu Jiahua","year":"2019"},{"key":"bib49","first-page":"552","author":"Liu Pengyuan","year":"2019","journal-title":"Lecture Notes in Computer Science"},{"key":"bib50","author":"Mitra Rajarshee","year":"2017","journal-title":"arXiv preprint arXiv:1711.06238"},{"key":"bib51","volume-title":"Modern Swahili Grammar","author":"Mohamed Mohamed Abdulla","year":"2001"},{"key":"bib52","author":"Mozannar Hussein","year":"2019","journal-title":"Proceedings of the Fourth Arabic Natural Language Processing Workshop"},{"key":"bib53","first-page":"2340","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Naik Aakanksha","year":"2018"},{"key":"bib54","author":"Nguyen Tri","year":"2016","journal-title":"arXiv preprint arXiv:1611.09268"},{"key":"bib55","first-page":"1659","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Nivre Joakim","year":"2016"},{"key":"bib56","volume-title":"Conference of the International Speech Communication Association","author":"Peskov Denis","year":"2019"},{"key":"bib57","doi-asserted-by":"crossref","first-page":"180","DOI":"10.18653\/v1\/S18-2023","volume-title":"Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics","author":"Poliak Adam","year":"2018"},{"key":"bib58","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00148"},{"key":"bib59","first-page":"784","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics","author":"Rajpurkar Pranav","year":"2018"},{"key":"bib60","author":"Rajpurkar Pranav","year":"2016","journal-title":"arXiv preprint 
arXiv:1606.05250"},{"key":"bib61","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00266"},{"key":"bib62","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511486975","volume-title":"A Reference Grammar of Modern Standard Arabic","author":"Ryding Karin C.","year":"2005"},{"key":"bib63","first-page":"17","volume":"33","author":"Seidl Amanda","year":"1997","journal-title":"Chicago Linguistic Society (CLS)"},{"key":"bib64","author":"Shao Chih Chieh","year":"2018","journal-title":"arXiv preprint arXiv:1806.00920"},{"key":"bib65","doi-asserted-by":"crossref","DOI":"10.4324\/9780203720882","volume-title":"Indonesian: A Comprehensive Grammar","author":"Sneddon James Neil","year":"2012"},{"key":"bib66","volume-title":"The Korean Language","author":"Sohn Ho-Min","year":"2001"},{"key":"bib67","volume-title":"Bengali: A Comprehensive Grammar","author":"Thompson Hanne-Ruth","year":"2010"},{"key":"bib68","author":"Trischler Adam","year":"2017","journal-title":"Proceedings of the 2nd Workshop on Representation Learning for NLP"},{"key":"bib69","author":"Vania Clara","year":"2017","journal-title":"arXiv preprint arXiv:1704.08352"},{"issue":"1","key":"bib70","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1093\/llc\/fqt031","volume":"30","author":"Volansky Vered","year":"2013","journal-title":"Digital Scholarship in the Humanities"},{"key":"bib71","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1145\/345508.345577","volume-title":"Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Voorhees Ellen M.","year":"2000"},{"key":"bib72","first-page":"991","author":"Wald Benji","year":"1987","journal-title":"The World\u2019s Major Languages"},{"key":"bib73","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00021"},{"key":"bib74","first-page":"1112","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human 
Language Technologies, Volume 1 (Long Papers)","author":"Williams Adina","year":"2018"},{"key":"bib75","first-page":"18","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts","author":"Wintner Shuly","year":"2016"},{"key":"bib76","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.18653\/v1\/D15-1237","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing","author":"Yi Yang","year":"2015"},{"key":"bib77","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Yang Zhilin","year":"2018"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00317","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:39:38Z","timestamp":1615585178000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/96451"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":77,"alternative-id":["10.1162\/tacl_a_00317"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00317","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]}}}