{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T10:51:04Z","timestamp":1772794264242,"version":"3.50.1"},"reference-count":1,"publisher":"MIT Press - Journals","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["TACL"],"published-print":{"date-parts":[[2014,12]]},"abstract":"<jats:p> We present a large scale study of the languages spoken by bilingual workers on Mechanical Turk (MTurk). We establish a methodology for determining the language skills of anonymous crowd workers that is more robust than simple surveying. We validate workers\u2019 self-reported language skill claims by measuring their ability to correctly translate words, and by geolocating workers to see if they reside in countries where the languages are likely to be spoken. Rather than posting a one-off survey, we posted paid tasks consisting of 1,000 assignments to translate a total of 10,000 words in each of 100 languages. Our study ran for several months, and was highly visible on the MTurk crowdsourcing platform, increasing the chances that bilingual workers would complete it. Our study was useful both to create bilingual dictionaries and to act as census of the bilingual speakers on MTurk. We use this data to recommend languages with the largest speaker populations as good candidates for other researchers who want to develop crowdsourced, multilingual technologies. To further demonstrate the value of creating data via crowdsourcing, we hire workers to create bilingual parallel corpora in six Indian languages, and use them to train statistical machine translation systems. 
<\/jats:p>","DOI":"10.1162\/tacl_a_00167","type":"journal-article","created":{"date-parts":[[2018,12,28]],"date-time":"2018-12-28T15:43:26Z","timestamp":1546011806000},"page":"79-92","source":"Crossref","is-referenced-by-count":43,"title":["The Language Demographics of Amazon Mechanical Turk"],"prefix":"10.1162","volume":"2","author":[{"given":"Ellie","family":"Pavlick","sequence":"first","affiliation":[{"name":"Computer and Information Science Department, University of Pennsylvania"}]},{"given":"Matt","family":"Post","sequence":"additional","affiliation":[{"name":"Human Language Technology Center of Excellence, Johns Hopkins University"}]},{"given":"Ann","family":"Irvine","sequence":"additional","affiliation":[{"name":"Human Language Technology Center of Excellence, Johns Hopkins University"}]},{"given":"Dmitry","family":"Kachaev","sequence":"additional","affiliation":[{"name":"Human Language Technology Center of Excellence, Johns Hopkins University"}]},{"given":"Chris","family":"Callison-Burch","sequence":"additional","affiliation":[{"name":"Computer and Information Science Department, University of Pennsylvania"},{"name":"Human Language Technology Center of Excellence, Johns Hopkins University"}]}],"member":"281","reference":[{"issue":"11","key":"p_25","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/219717.219748","volume":"38","author":"George A.","year":"1995","journal-title":"Communications of the ACM"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00167","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:38:51Z","timestamp":1615585131000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/43317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12]]},"references-count":1,"alternative-id":["10.1162\/tacl_a_00167"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00167","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,12]]}}}