{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T20:05:42Z","timestamp":1769630742149,"version":"3.49.0"},"reference-count":125,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T00:00:00Z","timestamp":1725408000000},"content-version":"vor","delay-in-days":247,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,9,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly resourced languages imply a significant potential for transfer learning, this potential is hampered by this lack of annotated data. In this work, we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of novel development datasets for reading comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. 
Ultimately, we see CreoleVal as an opportunity to empower research on Creoles in NLP and computational linguistics, and in general, a step towards more equitable language technology around the globe.<\/jats:p>","DOI":"10.1162\/tacl_a_00682","type":"journal-article","created":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T13:34:51Z","timestamp":1725456891000},"page":"950-978","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":5,"title":["CreoleVal: Multilingual Multitask Benchmarks for Creoles"],"prefix":"10.1162","volume":"12","author":[{"given":"Heather","family":"Lent","sequence":"first","affiliation":[{"name":"Aalborg University, Denmark. hcle@cs.aau.dk"}]},{"given":"Kushal","family":"Tatariya","sequence":"additional","affiliation":[{"name":"KU Leuven, Belgium"}]},{"given":"Raj","family":"Dabre","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Japan"}]},{"given":"Yiyi","family":"Chen","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]},{"given":"Marcell","family":"Fekete","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]},{"given":"Esther","family":"Ploeger","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]},{"given":"Li","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Copenhagen, Denmark"},{"name":"University of Electronic Science and Technology of China, China"}]},{"given":"Ruth-Ann","family":"Armstrong","sequence":"additional","affiliation":[{"name":"Meta, USA"}]},{"given":"Abee","family":"Eijansantos","sequence":"additional","affiliation":[{"name":"Zamboanga State College of Marine Sciences and Technology, Philippines"}]},{"given":"Catriona","family":"Malau","sequence":"additional","affiliation":[{"name":"University of Newcastle, Australia"}]},{"given":"Hans 
Erik","family":"Heje","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]},{"given":"Ernests","family":"Lavrinovics","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]},{"given":"Diptesh","family":"Kanojia","sequence":"additional","affiliation":[{"name":"University of Surrey, UK"}]},{"given":"Paul","family":"Belony","sequence":"additional","affiliation":[{"name":"Kean University, USA"}]},{"given":"Marcel","family":"Bollmann","sequence":"additional","affiliation":[{"name":"Link\u00f6ping University, Sweden"}]},{"given":"Lo\u00efc","family":"Grobol","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris Nanterre, France"}]},{"given":"Miryam","family":"de Lhoneux","sequence":"additional","affiliation":[{"name":"KU Leuven, Belgium"}]},{"given":"Daniel","family":"Hershcovich","sequence":"additional","affiliation":[{"name":"University of Copenhagen, Denmark"}]},{"given":"Michel","family":"DeGraff","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Anders","family":"S\u00f8gaard","sequence":"additional","affiliation":[{"name":"University of Copenhagen, Denmark"}]},{"given":"Johannes","family":"Bjerva","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]}],"member":"281","published-online":{"date-parts":[[2024,9,4]]},"reference":[{"issue":"2","key":"2024090413342139400_bib1","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1075\/jpcl.31.2.07abo","article-title":"Creole distinctiveness: A dead end","volume":"31","author":"Aboh","year":"2016","journal-title":"Journal of Pidgin and Creole Languages"},{"key":"2024090413342139400_bib2","article-title":"A null theory of creole formation based on universal grammar","author":"Aboh","year":"2016","journal-title":"The Oxford Handbook of Universal 
Grammar"},{"key":"2024090413342139400_bib3","doi-asserted-by":"publisher","first-page":"1116","DOI":"10.1162\/tacl_a_00416","article-title":"MasakhaNER: Named entity recognition for African languages","volume":"9","author":"Adelani","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024090413342139400_bib4","doi-asserted-by":"publisher","first-page":"268","DOI":"10.3115\/v1\/P15-2044","article-title":"If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)","author":"Agi\u0107","year":"2015"},{"key":"2024090413342139400_bib5","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1162\/tacl_a_00100","article-title":"Multilingual projection for parsing truly low-resource languages","volume":"4","author":"Agi\u0107","year":"2016","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024090413342139400_bib6","doi-asserted-by":"publisher","first-page":"3204","DOI":"10.18653\/v1\/P19-1310","article-title":"JW300: A wide-coverage parallel corpus for low-resource languages","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Agi\u0107","year":"2019"},{"key":"2024090413342139400_bib7","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511845192","volume-title":"Romance Languages: A Historical Introduction","author":"Alkire","year":"2010"},{"key":"2024090413342139400_bib8","first-page":"169","article-title":"Acculturation and the cultural matrix of creolization","volume":"1971","author":"Alleyne","year":"1971","journal-title":"Pidginization and Creolization of 
Languages"},{"key":"2024090413342139400_bib9","doi-asserted-by":"publisher","first-page":"5307","DOI":"10.18653\/v1\/2022.findings-emnlp.389","article-title":"JamPatoisNLI: A Jamaican patois natural language inference dataset","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2022","author":"Armstrong","year":"2022"},{"key":"2024090413342139400_bib10","doi-asserted-by":"publisher","first-page":"597","DOI":"10.1162\/tacl_a_00288","article-title":"Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond","volume":"7","author":"Artetxe","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024090413342139400_bib11","volume-title":"Jamaican Creole Syntax","author":"Bailey","year":"1966"},{"key":"2024090413342139400_bib12","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1515\/9783111339801.65","article-title":"Creativity in creole genesis","author":"Baker","year":"1994","journal-title":"Creolization and Language Change"},{"issue":"1","key":"2024090413342139400_bib13","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1075\/jpcl.26.1.02bak","article-title":"Creoles are typologically distinct from non-creoles","volume":"26","author":"Bakker","year":"2011","journal-title":"Journal of Pidgin and Creole Languages"},{"key":"2024090413342139400_bib14","first-page":"229","article-title":"Language identification: The long and the short of the matter","volume-title":"Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Baldwin","year":"2010"},{"issue":"1","key":"2024090413342139400_bib15","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1038\/scientificamerican0783-116","article-title":"Creole languages","volume":"249","author":"Bickerton","year":"1983","journal-title":"Scientific 
American"},{"key":"2024090413342139400_bib16","article-title":"\u201cLT4All!? Rethinking the agenda\u201d keynote","author":"Bird","year":"2021"},{"key":"2024090413342139400_bib17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/2020.sigtyp-1.1","article-title":"SIGTYP 2020 shared task: Prediction of typological features","volume-title":"Proceedings of the Second Workshop on Computational Research in Linguistic Typology","author":"Bjerva","year":"2020"},{"key":"2024090413342139400_bib18","first-page":"22","article-title":"Findings of the 2011 workshop on statistical machine translation","volume-title":"Proceedings of the Sixth Workshop on Statistical Machine Translation","author":"Callison-Burch","year":"2011"},{"key":"2024090413342139400_bib19","doi-asserted-by":"publisher","first-page":"13","DOI":"10.18653\/v1\/W19-7803","article-title":"A surface-syntactic UD treebank for Naija","volume-title":"Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)","author":"Caron","year":"2019"},{"key":"2024090413342139400_bib20","doi-asserted-by":"publisher","first-page":"3470","DOI":"10.18653\/v1\/2021.naacl-main.272","article-title":"ZS-BERT: Towards zero-shot relation extraction with attribute representation learning","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Chen","year":"2021"},{"key":"2024090413342139400_bib21","doi-asserted-by":"publisher","first-page":"1059","DOI":"10.18653\/v1\/2022.emnlp-main.69","article-title":"Multilingual relation classification via efficient and effective prompting","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language 
Processing","author":"Chen","year":"2022"},{"key":"2024090413342139400_bib22","doi-asserted-by":"publisher","first-page":"59","DOI":"10.18653\/v1\/2023.loresmt-1.5","article-title":"Language-family adapters for low-resource multilingual neural machine translation","volume-title":"Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)","author":"Chronopoulou","year":"2023"},{"key":"2024090413342139400_bib23","doi-asserted-by":"crossref","first-page":"385","DOI":"10.18653\/v1\/P17-2061","article-title":"An empirical comparison of domain adaptation methods for neural machine translation","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Chu","year":"2017"},{"key":"2024090413342139400_bib24","doi-asserted-by":"publisher","first-page":"8440","DOI":"10.18653\/v1\/2020.acl-main.747","article-title":"Unsupervised cross-lingual representation learning at scale","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Conneau","year":"2020"},{"key":"2024090413342139400_bib25","volume-title":"Explaining Language Change: An Evolutionary Approach","author":"Croft","year":"2000"},{"key":"2024090413342139400_bib26","volume-title":"Bislama Reference Grammar","author":"Crowley","year":"2004"},{"key":"2024090413342139400_bib27","first-page":"22","article-title":"KreolMorisienMT: A dataset for Mauritian Creole machine translation","volume-title":"Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022","author":"Dabre","year":"2022"},{"key":"2024090413342139400_bib28","article-title":"YANMTT: Yet another neural machine translation toolkit","author":"Dabre","year":"2021","journal-title":"CoRR"},{"issue":"2\/3","key":"2024090413342139400_bib29","first-page":"213","article-title":"On the origin of creoles: A Cartesian critique of neo-Darwinian 
linguistics","volume":"5","author":"DeGraff","year":"2001","journal-title":"Linguistic Typology"},{"issue":"2","key":"2024090413342139400_bib30","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1353\/lan.2003.0114","article-title":"Against creole exceptionalism","volume":"79","author":"DeGraff","year":"2003","journal-title":"Language"},{"issue":"4","key":"2024090413342139400_bib31","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1017\/S0047404505050207","article-title":"Linguists\u2019 most dangerous myth: The fallacy of creole exceptionalism","volume":"34","author":"DeGraff","year":"2005","journal-title":"Language in Society"},{"key":"2024090413342139400_bib32","article-title":"OpenTapioca: Lightweight entity linking for Wikidata","author":"Delpeuch","year":"2019","journal-title":"arXiv preprint arXiv:1904.09131"},{"key":"2024090413342139400_bib33","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2024090413342139400_bib34","doi-asserted-by":"publisher","first-page":"4901","DOI":"10.18653\/v1\/2021.findings-acl.433","article-title":"Adapting monolingual models: Data can be scarce when language similarity is high","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"de Vries","year":"2021"},{"key":"2024090413342139400_bib35","doi-asserted-by":"publisher","first-page":"7676","DOI":"10.18653\/v1\/2022.acl-long.529","article-title":"Make the best of cross-lingual transfer: Evidence from POS tagging with over 100 languages","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"de 
Vries","year":"2022"},{"key":"2024090413342139400_bib36","doi-asserted-by":"publisher","first-page":"2112","DOI":"10.18653\/v1\/2021.eacl-main.181","article-title":"Word alignment by fine-tuning embeddings on parallel corpora","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Dou","year":"2021"},{"key":"2024090413342139400_bib37","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.7385533","volume-title":"WALS Online (v2020.3)","author":"Dryer","year":"2013"},{"key":"2024090413342139400_bib38","doi-asserted-by":"crossref","DOI":"10.1075\/la.127","volume-title":"The Syntax of Jamaican Creole","author":"Durrleman","year":"2008"},{"key":"2024090413342139400_bib39","unstructured":"Martin Eberl. 2019. Innovation and Grammaticalization in the Emergence of Tok Pisin. Ph.D. thesis, LMU."},{"key":"2024090413342139400_bib40","doi-asserted-by":"publisher","first-page":"6279","DOI":"10.18653\/v1\/2022.acl-long.435","article-title":"AmericasNLI: Evaluating zero-shot natural language understanding of pretrained multilingual models in truly low-resource languages","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ebrahimi","year":"2022"},{"key":"2024090413342139400_bib41","doi-asserted-by":"publisher","first-page":"206","DOI":"10.18653\/v1\/2023.americasnlp-1.23","article-title":"Findings of the AmericasNLP 2023 shared task on machine translation into indigenous languages","volume-title":"Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)","author":"Ebrahimi","year":"2023"},{"key":"2024090413342139400_bib42","first-page":"723","article-title":"Zamboanga Chavacano verbal aspects: Superstrate and substrate influences in morphosyntactic behavior","volume-title":"Proceedings of the 36th Pacific 
Asia Conference on Language, Information and Computation","author":"Eijansantos","year":"2022"},{"key":"2024090413342139400_bib43","doi-asserted-by":"publisher","first-page":"3046","DOI":"10.18653\/v1\/2022.findings-acl.240","article-title":"Factual consistency of multilingual pretrained language models","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Fierro","year":"2022"},{"key":"2024090413342139400_bib44","article-title":"Unsupervised alignment of embeddings with Wasserstein Procrustes","author":"Grave","year":"2018","journal-title":"CoRR"},{"key":"2024090413342139400_bib45","doi-asserted-by":"publisher","first-page":"4803","DOI":"10.18653\/v1\/D18-1514","article-title":"FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Han","year":"2018"},{"key":"2024090413342139400_bib46","article-title":"Teaching machines to read and comprehend","volume":"28","author":"Hermann","year":"2015","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024090413342139400_bib47","doi-asserted-by":"publisher","first-page":"6997","DOI":"10.18653\/v1\/2022.acl-long.482","article-title":"Challenges and strategies in cross-cultural NLP","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Hershcovich","year":"2022"},{"key":"2024090413342139400_bib48","first-page":"399","article-title":"The value of monolingual crowdsourcing in a real-world translation scenario: Simulation using Haitian Creole emergency SMS messages","volume-title":"Proceedings of the Sixth Workshop on Statistical Machine Translation","author":"Hu","year":"2011"},{"key":"2024090413342139400_bib49","first-page":"128","article-title":"Exploiting out-of-domain parallel data through multilingual 
transfer learning for low-resource neural machine translation","volume-title":"Proceedings of Machine Translation Summit XVII: Research Track","author":"Imankulova","year":"2019"},{"key":"2024090413342139400_bib50","doi-asserted-by":"publisher","first-page":"49","DOI":"10.18653\/v1\/2021.sigmorphon-1.6","article-title":"A study of morphological robustness of neural machine translation","volume-title":"Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology","author":"Jayanthi","year":"2021"},{"key":"2024090413342139400_bib51","doi-asserted-by":"publisher","first-page":"5943","DOI":"10.18653\/v1\/2020.emnlp-main.479","article-title":"X-FACTR: Multilingual factual knowledge retrieval from pretrained language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Jiang","year":"2020"},{"key":"2024090413342139400_bib52","doi-asserted-by":"publisher","first-page":"6155","DOI":"10.18653\/v1\/2023.findings-emnlp.410","article-title":"GlotLID: Language identification for low-resource languages","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Kargaran","year":"2023"},{"key":"2024090413342139400_bib53","doi-asserted-by":"publisher","first-page":"3336","DOI":"10.18653\/v1\/D19-1328","article-title":"Lost in evaluation: Misleading benchmarks for bilingual dictionary induction","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Kementchedjhieva","year":"2019"},{"key":"2024090413342139400_bib54","article-title":"Adam: A method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"2024090413342139400_bib55","first-page":"1459","article-title":"Inducing crosslingual distributed representations 
of words","volume-title":"Proceedings of COLING 2012","author":"Klementiev","year":"2012"},{"key":"2024090413342139400_bib56","doi-asserted-by":"publisher","first-page":"340","DOI":"10.18653\/v1\/2020.findings-emnlp.32","article-title":"The RELX dataset and matching the multilingual blanks for cross-lingual relation classification","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"K\u00f6ksal","year":"2020"},{"key":"2024090413342139400_bib57","doi-asserted-by":"publisher","DOI":"10.1002\/9781444305982","volume-title":"The Handbook of Pidgin and Creole Studies","author":"Kouwenberg","year":"2009"},{"key":"2024090413342139400_bib58","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1162\/tacl_a_00447","article-title":"Quality at a glance: An audit of web-crawled multilingual datasets","volume":"10","author":"Kreutzer","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024090413342139400_bib59","doi-asserted-by":"publisher","first-page":"66","DOI":"10.18653\/v1\/D18-2012","article-title":"SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Kudo","year":"2018"},{"key":"2024090413342139400_bib60","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1075\/cll.23.02lef","article-title":"Relexification in creole genesis and its effects on the development of the creole","author":"Lefebvre","year":"2001","journal-title":"Creolization and Contact"},{"key":"2024090413342139400_bib61","doi-asserted-by":"publisher","first-page":"58","DOI":"10.18653\/v1\/2021.conll-1.5","article-title":"On language models for creoles","volume-title":"Proceedings of the 25th Conference on Computational Natural Language 
Learning","author":"Lent","year":"2021"},{"key":"2024090413342139400_bib62","doi-asserted-by":"publisher","first-page":"68","DOI":"10.18653\/v1\/2022.insights-1.9","article-title":"Ancestor-to-creole transfer is not a walk in the park","volume-title":"Proceedings of the Third Workshop on Insights from Negative Results in NLP","author":"Lent","year":"2022"},{"key":"2024090413342139400_bib63","first-page":"6439","article-title":"What a creole wants, what a creole needs","volume-title":"Proceedings of the Thirteenth Language Resources and Evaluation Conference","author":"Lent","year":"2022"},{"key":"2024090413342139400_bib64","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v29i1.9491","article-title":"Learning entity and relation embeddings for knowledge graph completion","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Lin","year":"2015"},{"key":"2024090413342139400_bib65","first-page":"373","article-title":"Zamboangue\u00f1o Creole Spanish","author":"Lipski","year":"2007","journal-title":"Comparative Creole Syntax"},{"key":"2024090413342139400_bib66","doi-asserted-by":"publisher","first-page":"726","DOI":"10.1162\/tacl_a_00343","article-title":"Multilingual denoising pre-training for neural machine translation","volume":"8","author":"Liu","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024090413342139400_bib67","first-page":"3924","article-title":"Singlish message paraphrasing: A joint task of creole translation and text normalization","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics","author":"Liu","year":"2022"},{"key":"2024090413342139400_bib68","doi-asserted-by":"publisher","first-page":"4871","DOI":"10.18653\/v1\/2023.acl-long.268","article-title":"Ethical considerations for machine translation of indigenous languages: Giving a voice to the speakers","volume-title":"Proceedings of the 61st Annual Meeting of the 
Association for Computational Linguistics (Volume 1: Long Papers)","author":"Mager","year":"2023"},{"key":"2024090413342139400_bib69","doi-asserted-by":"publisher","first-page":"202","DOI":"10.18653\/v1\/2021.americasnlp-1.23","article-title":"Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas","volume-title":"Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas","author":"Mager","year":"2021"},{"key":"2024090413342139400_bib70","doi-asserted-by":"publisher","first-page":"4810","DOI":"10.18653\/v1\/2020.coling-main.423","article-title":"Manual clustering and spatial arrangement of verbs for multilingual evaluation and typology analysis","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Majewska","year":"2020"},{"key":"2024090413342139400_bib71","first-page":"3158","article-title":"Creating a massively parallel Bible corpus","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Mayer","year":"2014"},{"key":"2024090413342139400_bib72","doi-asserted-by":"publisher","DOI":"10.1093\/oso\/9780195166699.001.0001","volume-title":"Defining Creole","author":"McWhorter","year":"2005"},{"key":"2024090413342139400_bib73","volume-title":"APiCS Online","author":"Michaelis","year":"2013"},{"key":"2024090413342139400_bib74","doi-asserted-by":"publisher","first-page":"4975","DOI":"10.18653\/v1\/P19-1491","article-title":"What kind of language is hard to language-model?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Mielke","year":"2019"},{"key":"2024090413342139400_bib75","first-page":"4397","article-title":"How to parse a creole: When Martinican Creole meets French","volume-title":"Proceedings of the 29th International Conference on Computational 
Linguistics","author":"Mompelat","year":"2022"},{"issue":"1","key":"2024090413342139400_bib76","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1075\/dia.13.1.05muf","article-title":"The founder principle in creole genesis","volume":"13","author":"Mufwene","year":"1996","journal-title":"Diachronica"},{"key":"2024090413342139400_bib77","doi-asserted-by":"publisher","DOI":"10.1002\/9781444302851.ch54","volume-title":"What Do Creoles and Pidgins Tell Us About the Evolution of Language?","author":"Mufwene","year":"2008"},{"key":"2024090413342139400_bib78","first-page":"1","article-title":"The evolution of language: Hints from creoles and pidgins","author":"Mufwene","year":"2009","journal-title":"Language Evolution and the Brain"},{"key":"2024090413342139400_bib79","first-page":"348","article-title":"The emergence of creoles and language change","volume-title":"The Routledge Handbook of Linguistic Anthropology","author":"Mufwene","year":"2015"},{"key":"2024090413342139400_bib80","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.18653\/v1\/2023.semeval-1.315","article-title":"SemEval-2023 task 12: Sentiment analysis for African languages (AfriSenti-SemEval)","volume-title":"Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)","author":"Muhammad","year":"2023"},{"key":"2024090413342139400_bib81","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2201.08277","article-title":"NaijaSenti: A Nigerian Twitter sentiment corpus for multilingual sentiment analysis","author":"Muhammad","year":"2022"},{"key":"2024090413342139400_bib82","doi-asserted-by":"publisher","DOI":"10.1515\/9781501501418","volume-title":"Pitkern-Norf\u2019k: The Language of Pitcairn Island and Norfolk Island","author":"M\u00fchlh\u00e4usler","year":"2020"},{"key":"2024090413342139400_bib83","doi-asserted-by":"publisher","first-page":"575","DOI":"10.18653\/v1\/2021.conll-1.45","article-title":"A data bootstrapping recipe for low-resource multilingual 
relation classification","volume-title":"Proceedings of the 25th Conference on Computational Natural Language Learning","author":"Nag","year":"2021"},{"issue":"2","key":"2024090413342139400_bib84","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1080\/00437956.1945.11659254","article-title":"Linguistics and ethnology in translation-problems","volume":"1","author":"Nida","year":"1945","journal-title":"Word"},{"key":"2024090413342139400_bib85","doi-asserted-by":"publisher","first-page":"4547","DOI":"10.18653\/v1\/2020.emnlp-main.368","article-title":"Zero-shot cross-lingual transfer with meta learning","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Nooralahzadeh","year":"2020"},{"key":"2024090413342139400_bib86","article-title":"PidginUNMT: Unsupervised neural machine translation from West African Pidgin to English","author":"Ogueji","year":"2019","journal-title":"arXiv"},{"key":"2024090413342139400_bib87","article-title":"Semantic enrichment of Nigerian Pidgin English for contextual sentiment classification","author":"Oyewusi","year":"2020","journal-title":"arXiv"},{"key":"2024090413342139400_bib88","article-title":"Cross-lingual annotation projection for semantic roles","author":"Pad\u00f3","year":"2014","journal-title":"CoRR"},{"key":"2024090413342139400_bib89","doi-asserted-by":"publisher","first-page":"682","DOI":"10.18653\/v1\/2023.wmt-1.56","article-title":"Findings of the WMT 2023 shared task on low-resource Indic language translation","volume-title":"Proceedings of the Eighth Conference on Machine Translation","author":"Pal","year":"2023"},{"key":"2024090413342139400_bib90","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.18653\/v1\/P17-1178","article-title":"Cross-lingual name tagging and linking for 282 languages","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long 
Papers)","author":"Pan","year":"2017"},{"key":"2024090413342139400_bib91","first-page":"407","article-title":"Jamaican Creole: Morphology and syntax","volume":"2","author":"Patrick","year":"2004","journal-title":"A Handbook of Varieties of English"},{"key":"2024090413342139400_bib92","first-page":"126","article-title":"Jamaican Creole","author":"Patrick","year":"2014","journal-title":"Languages and Dialects in the US: Focus on Diversity and Linguistics"},{"key":"2024090413342139400_bib93","doi-asserted-by":"publisher","first-page":"7428","DOI":"10.18653\/v1\/2022.emnlp-main.503","article-title":"Subword evenness (SuE) as a predictor of cross-lingual transfer to low-resource languages","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Pelloni","year":"2022"},{"key":"2024090413342139400_bib94","doi-asserted-by":"publisher","first-page":"4996","DOI":"10.18653\/v1\/P19-1493","article-title":"How multilingual is multilingual BERT?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Pires","year":"2019"},{"key":"2024090413342139400_bib95","article-title":"What is \u2018typological diversity\u2019 in NLP?","author":"Ploeger","year":"2024"},{"key":"2024090413342139400_bib96","first-page":"140:1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2019","journal-title":"Journal of Machine Learning Research"},{"key":"2024090413342139400_bib97","first-page":"975","article-title":"How good are typological distances for determining genealogical relationships among languages?","volume-title":"Proceedings of COLING 2012: Posters","author":"Rama","year":"2012"},{"key":"2024090413342139400_bib98","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-BERT: Sentence embeddings using Siamese BERT-networks","volume-title":"Proceedings of the 
2019 Conference on Empirical Methods in Natural Language Processing","author":"Reimers","year":"2019"},{"key":"2024090413342139400_bib99","first-page":"193","article-title":"MCTest: A challenge dataset for the open-domain machine comprehension of text","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Richardson","year":"2013"},{"key":"2024090413342139400_bib100","article-title":"Choice of plausible alternatives: An evaluation of commonsense causal reasoning","volume-title":"2011 AAAI Spring Symposium Series","author":"Roemmele","year":"2011"},{"key":"2024090413342139400_bib101","article-title":"Bloom: A 176b-parameter open-access multilingual language model","author":"Scao","year":"2023"},{"issue":"4","key":"2024090413342139400_bib102","doi-asserted-by":"publisher","first-page":"701","DOI":"10.2307\/3587883","article-title":"Stigmatized and standardized varieties in the classroom: Interference or separation?","volume":"33","author":"Siegel","year":"1999","journal-title":"Tesol Quarterly"},{"key":"2024090413342139400_bib103","doi-asserted-by":"publisher","DOI":"10.1145\/3551624.3555285","article-title":"Participation is not a design fix for machine learning","volume-title":"Equity and Access in Algorithms, Mechanisms, and Optimization","author":"Sloane","year":"2022"},{"key":"2024090413342139400_bib104","first-page":"728","article-title":"Transfer to a low-resource language via close relatives: The case study on Faroese","volume-title":"Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)","author":"Sn\u00e6bjarnarson","year":"2023"},{"key":"2024090413342139400_bib105","doi-asserted-by":"publisher","first-page":"1784","DOI":"10.18653\/v1\/D17-1188","article-title":"Context-aware representations for knowledge base relation extraction","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language 
Processing","author":"Sorokin","year":"2017"},{"key":"2024090413342139400_bib106","volume-title":"Zamboanga chabacano structure dataset","author":"Steinkr\u00fcger","year":"2013"},{"key":"2024090413342139400_bib107","article-title":"Multilingual translation with extensible multilingual pretraining and finetuning","author":"Tang","year":"2020","journal-title":"CoRR"},{"key":"2024090413342139400_bib108","first-page":"479","article-title":"OPUS-MT \u2013 building open translation services for the world","volume-title":"Proceedings of the 22nd Annual Conference of the European Association for Machine Translation","author":"Tiedemann","year":"2020"},{"key":"2024090413342139400_bib109","doi-asserted-by":"publisher","first-page":"29","DOI":"10.18653\/v1\/2020.sigtyp-1.4","article-title":"Predicting typological features in WALS using language embeddings and conditional probabilities: \u00daFAL submission to the SIGTYP 2020 shared task","volume-title":"Proceedings of the Second Workshop on Computational Research in Linguistic Typology","author":"Vastl","year":"2020"},{"key":"2024090413342139400_bib110","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1002\/9781444305982.ch9","article-title":"Creole genesis: The impact of the language bioprogram hypothesis","author":"Veenstra","year":"2008","journal-title":"The Handbook of Pidgin and Creole Studies"},{"key":"2024090413342139400_bib111","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/2020.sigmorphon-1.1","article-title":"SIGMORPHON 2020 shared task 0: Typologically diverse morphological inflection","volume-title":"Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology","author":"Vylomova","year":"2020"},{"key":"2024090413342139400_bib112","doi-asserted-by":"publisher","first-page":"1732","DOI":"10.18653\/v1\/P17-1159","article-title":"Universal Dependencies parsing for colloquial Singaporean English","volume-title":"Proceedings of the 55th 
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Wang","year":"2017"},{"key":"2024090413342139400_bib113","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.18653\/v1\/N18-1101","article-title":"A broad-coverage challenge corpus for sentence understanding through inference","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Williams","year":"2018"},{"key":"2024090413342139400_bib114","first-page":"777","article-title":"Cross-lingual few-shot learning on unseen languages","volume-title":"Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Winata","year":"2022"},{"key":"2024090413342139400_bib115","doi-asserted-by":"publisher","first-page":"833","DOI":"10.18653\/v1\/D19-1077","article-title":"Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Shijie","year":"2019"},{"key":"2024090413342139400_bib116","article-title":"Question answering on freebase via relation extraction and textual evidence","author":"Kun","year":"2016","journal-title":"arXiv preprint arXiv:1603.00957"},{"key":"2024090413342139400_bib117","first-page":"8073","article-title":"Cross-linguistic syntactic difference in multilingual BERT: How good is it and how does it affect transfer?","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Ningyu","year":"2022"},{"key":"2024090413342139400_bib118","first-page":"483","article-title":"mT5: A massively multilingual 
pre-trained text-to-text transformer","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Xue","year":"2021"},{"key":"2024090413342139400_bib119","doi-asserted-by":"publisher","DOI":"10.3115\/1072133.1072187","article-title":"Inducing multilingual text analysis tools via robust projection across aligned corpora","volume-title":"Proceedings of the First International Conference on Human Language Technology Research","author":"Yarowsky","year":"2001"},{"key":"2024090413342139400_bib120","doi-asserted-by":"crossref","first-page":"11682","DOI":"10.18653\/v1\/2023.acl-long.653","article-title":"BLOOM+1: Adding language support to BLOOM for zero-shot prompting","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Yong","year":"2023"},{"key":"2024090413342139400_bib121","first-page":"7210","article-title":"Language embeddings for typology and cross-lingual transfer learning","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Dian","year":"2021"},{"key":"2024090413342139400_bib122","doi-asserted-by":"publisher","first-page":"229","DOI":"10.18653\/v1\/2021.starsem-1.22","article-title":"Inducing language-agnostic multilingual representations","volume-title":"Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics","author":"Zhao","year":"2021"},{"key":"2024090413342139400_bib123","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1109\/JPROC.2020.3004555","article-title":"A comprehensive survey on transfer learning","volume":"109","author":"Zhuang","year":"2019","journal-title":"Proceedings of the IEEE"},{"key":"2024090413342139400_bib124","article-title":"Towards a 
general purpose machine translation system for sranantongo","author":"Zwennicker","year":"2022"},{"issue":"1","key":"2024090413342139400_bib125","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1162\/coli_a_00425","article-title":"To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP","volume":"48","author":"\u015eahin","year":"2022","journal-title":"Computational Linguistics"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00682\/2468651\/tacl_a_00682.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00682\/2468651\/tacl_a_00682.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T13:35:36Z","timestamp":1725456936000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00682\/124256\/CreoleVal-Multilingual-Multitask-Benchmarks-for"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":125,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00682","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}