{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T13:52:41Z","timestamp":1762264361049,"version":"build-2065373602"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,4]]},"DOI":"10.1145\/3757232.3757340","type":"proceedings-article","created":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T12:13:01Z","timestamp":1762258381000},"page":"454-458","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Capturing Linguistic Diversity in Data Annotation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-9153-0157","authenticated-orcid":false,"given":"Wangui","family":"Kamande","sequence":"first","affiliation":[{"name":"Sama, Nairobi, Kenya"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4675-5808","authenticated-orcid":false,"given":"Claudel","family":"Rheault","sequence":"additional","affiliation":[{"name":"Sama, Montreal, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7737-0793","authenticated-orcid":false,"given":"Margret","family":"Gatwiri","sequence":"additional","affiliation":[{"name":"Sama, Nairobi, Kenya"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1398-9175","authenticated-orcid":false,"given":"Nicolas","family":"Duch\u00eane","sequence":"additional","affiliation":[{"name":"Sama, Montreal, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6348-2885","authenticated-orcid":false,"given":"Bryan","family":"Gachambi","sequence":"additional","affiliation":[{"name":"Sama, Nairobi, Kenya"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3419-2950","authenticated-orcid":false,"given":"Pascal","family":"Jauffret","sequence":"additional","affiliation":[{"name":"Sama, Montreal, Canada"}]}],"member":"320","published-online":{"date-parts":[[2025,11,4]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"publisher","unstructured":"Sabreena Ahmed and Ratnawati\u00a0Mohd. Asraf. 2018. The Workshop as a Qualitative Research Approach: Lessons Learnt from a \u201cCritical Thinking Through Writing\u201d Workshop. The Turkish Online Journal of Design Art and Communication - TOJDAC 8 (September 2018) 1504\u20131510. 10.7456\/1080SSE\/201Special Edition.","DOI":"10.7456\/1080SSE\/201"},{"key":"e_1_3_3_1_3_2","volume-title":"The Digital Factory: The Human Labor of Automation","author":"Altenried Moritz","year":"2020","unstructured":"Moritz Altenried. 2020. The Digital Factory: The Human Labor of Automation. Pluto Press."},{"key":"e_1_3_3_1_4_2","unstructured":"Cynthia\u00a0Jayne Amol Everlyn\u00a0Asiko Chimoto Rose\u00a0Delilah Gesicho Antony\u00a0M Gitau Naome\u00a0A Etori Caringtone Kinyanjui Steven Ndung\u2019u Lawrence Moruye Samson\u00a0Otieno Ooko Kavengi Kitonga et\u00a0al. 2024. State of NLP in Kenya: A Survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.09948 (2024)."},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Zahra Ashktorab Michael Desmond Josh Andres Michael Muller Narendra\u00a0Nath Joshi Michelle Brachman Aabhas Sharma Kristina Brimijoin Qian Pan Christine\u00a0T Wolf et\u00a0al. 2021. Ai-assisted human labeling: Batching for efficiency without overreliance. Proceedings of the ACM on Human-Computer Interaction 5 CSCW1 (2021) 1\u201327.","DOI":"10.1145\/3449163"},{"key":"e_1_3_3_1_6_2","unstructured":"Alessio Buscemi C\u00e9dric Lothritz Sergio Morales Marcos Gomez-Vazquez Robert Claris\u00f3 Jordi Cabot and German Castignani. 2025. Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High-and Low-Resource Languages. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2504.18560 (2025)."},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3026044"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-70549-6_13"},{"key":"e_1_3_3_1_9_2","unstructured":"David Dwyer. 2021. The Analysis of Tone in Acholi Luo and Lango."},{"key":"e_1_3_3_1_10_2","doi-asserted-by":"crossref","unstructured":"Roopal Garg Andrea Burns Burcu\u00a0Karagol Ayan Yonatan Bitton Ceslee Montgomery Yasumasa Onoe Andrew Bunner Ranjay Krishna Jason Baldridge and Radu Soricut. 2024. Imageinwords: Unlocking hyper-detailed image descriptions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.02793 (2024).","DOI":"10.18653\/v1\/2024.emnlp-main.6"},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"publisher","unstructured":"Rebecca Gill Joshua Barbour and Marleah Dean. 2014. Shadowing in\/as work: ten recommendations for shadowing fieldwork practice. Qualitative Research in Organizations and Management 9 1 (2014) 69\u201389. 10.1108\/QROM-09-2012-1100","DOI":"10.1108\/QROM-09-2012-1100"},{"key":"e_1_3_3_1_12_2","unstructured":"Google Research. 2023. Amplify Initiative: Localized Data for Globalized AI. https:\/\/research.google\/blog\/amplify-initiative-localized-data-for-globalized-ai\/ Accessed: 2025-04-01."},{"key":"e_1_3_3_1_13_2","volume-title":"Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass","author":"Gray Mary\u00a0L.","year":"2019","unstructured":"Mary\u00a0L. Gray and Siddharth Suri. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Houghton Mifflin Harcourt."},{"key":"e_1_3_3_1_14_2","volume-title":"Labor in the Global Digital Economy: The Cybertariat Comes of Age","author":"Huws Ursula","year":"2014","unstructured":"Ursula Huws (Ed.). 2014. Labor in the Global Digital Economy: The Cybertariat Comes of Age. Monthly Review Press."},{"key":"e_1_3_3_1_15_2","volume-title":"Digital Labor","author":"Jarrett Kylie","year":"2016","unstructured":"Kylie Jarrett. 2016. Digital Labor. Routledge."},{"key":"e_1_3_3_1_16_2","unstructured":"Shabina Khan. 2014. Hofstede\u2019s Individualism-Collectivism Cultural Dimension: Its Relevance to Foreign Language Teaching. The International Journal of Humanities & Social Studies 2 3 (March 2014) 87\u201392. https:\/\/faculty.ksu.edu.sa\/sites\/default\/files\/18.hs1403-038_2.pdf"},{"key":"e_1_3_3_1_17_2","unstructured":"Hannah Kim Kushan Mitra Rafael\u00a0Li Chen Sajjadur Rahman and Dan Zhang. 2024. Meganno+: A human-llm collaborative annotation system. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.18050 (2024)."},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"crossref","unstructured":"Cheng Li Mengzhuo Chen Jindong Wang Sunayana Sitaram and Xing Xie. 2024. Culturellm: Incorporating cultural differences into large language models. Advances in Neural Information Processing Systems 37 (2024) 84799\u201384838.","DOI":"10.52202\/079017-2693"},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642089"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1017\/9781108283977.007"},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.23"},{"key":"e_1_3_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3711542.3711596"},{"key":"e_1_3_3_1_23_2","unstructured":"Milagros Miceli and Julian Posada. 2021. Wisdom for the crowd: discoursive power in annotation instructions for computer vision. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2105.10990 (2021)."},{"key":"e_1_3_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Milagros Miceli and Julian Posada. 2022. The data-production dispositif. Proceedings of the ACM on human-computer interaction 6 CSCW2 (2022) 1\u201337.","DOI":"10.1145\/3555561"},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Milagros Miceli Martin Schuessler and Tianling Yang. 2020. Between subjectivity and imposition: Power dynamics in data annotation for computer vision. Proceedings of the ACM on Human-Computer Interaction 4 CSCW2 (2020) 1\u201325.","DOI":"10.1145\/3415186"},{"key":"e_1_3_3_1_26_2","unstructured":"Mozilla. 2023. Common Voice. https:\/\/commonvoice.mozilla.org\/en Accessed: 2025-04-01."},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Yasumasa Onoe Sunayana Rane Zachary Berger Yonatan Bitton Jaemin Cho Roopal Garg Alexander Ku Zarana Parekh Jordi Pont-Tuset Garrett Tanzer et\u00a0al. 2024. Docci: Descriptions of Connected and Contrasting Images. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.19753 (2024).","DOI":"10.1007\/978-3-031-73027-6_17"},{"key":"e_1_3_3_1_28_2","unstructured":"Juan\u00a0N Pava Caroline Meinhardt Haifa Badi\u00a0Uz Zaman Toni Friedman Sang\u00a0T Truong Daniel Zhang Vukosi Marivate and Sanmi Koyejo. 2023. Mind the Language Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts. https:\/\/hai.stanford.edu\/policy\/mind-the-language-gap-mapping-the-challenges-of-llm-development-in-low-resource-language-contexts Accessed: 2025-04-13."},{"key":"e_1_3_3_1_29_2","unstructured":"David Romero Chenyang Lyu Haryo\u00a0Akbarianto Wibowo Teresa Lynn Injy Hamed Aditya\u00a0Nanda Kishore Aishik Mandal Alina Dragonetti Artem Abzaliev Atnafu\u00a0Lambebo Tonja et\u00a0al. 2024. Cvqa: Culturally-diverse multilingual visual question answering benchmark. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.05967 (2024)."},{"key":"e_1_3_3_1_30_2","volume-title":"Digital Labor: The Internet as Playground and Factory","author":"Scholz Trebor","year":"2013","unstructured":"Trebor Scholz (Ed.). 2013. Digital Labor: The Internet as Playground and Factory. Routledge. https:\/\/www.google.ca\/books\/edition\/Digital_Labor\/y050EAAAQBAJ Accessed: 2025-01-29."},{"key":"e_1_3_3_1_31_2","unstructured":"Godfrey\u00a0Steven Semwaiko Kang-Ming Chang and Kai-Chun Hou. 2023. Colors Preferences in Tanzanian Culture. International Multilingual Journal of Science and Technology 8 2 (February 2023) 5834\u20135846. https:\/\/www.imjst.org\/wp-content\/uploads\/2023\/02\/IMJSTP29120805.pdf"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654777.3676450"},{"key":"e_1_3_3_1_33_2","first-page":"121","volume-title":"Collaborative Research Design: Working with Business for Meaningful Results","author":"Storvang Pia","year":"2021","unstructured":"Pia Storvang, Ann\u00a0H\u00f8jbjerg Clarke, and Bo Mortensen. 2021. Workshop as a Research Method in Business Research. In Collaborative Research Design: Working with Business for Meaningful Results, Per\u00a0Vagn Freytag, Louise Young, and Majbritt\u00a0Rostgaard Evald (Eds.). Springer, Cham, Chapter\u00a06, 121\u2013138. https:\/\/books.google.co.ke\/books?id=3SI6EQAAQBAJ"},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Zhen Tan Dawei Li Song Wang Alimohammad Beigi Bohan Jiang Amrita Bhattacharjee Mansooreh Karami Jundong Li Lu Cheng and Huan Liu. 2024. Large language models for data annotation and synthesis: A survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.13446 (2024).","DOI":"10.18653\/v1\/2024.emnlp-main.54"},{"key":"e_1_3_3_1_35_2","unstructured":"Ashmal Vayani Dinura Dissanayake Hasindri Watawana Noor Ahsan Nevasini Sasikumar Omkar Thawakar Henok\u00a0Biadglign Ademtew Yahya Hmaiti Amandeep Kumar Kartik Kuckreja et\u00a0al. 2024. All languages matter: Evaluating lmms on culturally diverse 100 languages. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2411.16508 (2024)."},{"key":"e_1_3_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502121"},{"key":"e_1_3_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3641960"},{"key":"e_1_3_3_1_38_2","unstructured":"Rikke \u00d8rngreen and Karin\u00a0Tweddell Levinsen. 2017. Workshops as a Research Methodology. Electronic Journal of e-Learning 15 1 (2017) 70\u201381. https:\/\/vbn.aau.dk\/files\/257686207\/_rngreen_Levinsen_Workshop_as_a_Research_methodology_ejel_volume15_issue1_article569.pdf"}],"event":{"name":"Africhi 2025: The 5th Biennial African Human Computer Interaction Conference","acronym":"Africhi 2025","location":"Cairo Egypt"},"container-title":["Proceedings of the Fifth Biennial African Human-Computer Interaction Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3757232.3757340","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T13:48:10Z","timestamp":1762264090000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757232.3757340"}},"subtitle":["Learnings From Reviewing AI Generated Content in Swahili and English"],"short-title":[],"issued":{"date-parts":[[2025,11,4]]},"references-count":37,"alternative-id":["10.1145\/3757232.3757340","10.1145\/3757232"],"URL":"https:\/\/doi.org\/10.1145\/3757232.3757340","relation":{},"subject":[],"published":{"date-parts":[[2025,11,4]]},"assertion":[{"value":"2025-11-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}