{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T14:35:22Z","timestamp":1776090922564,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"CSCW1","license":[{"start":{"date-parts":[[2024,4,17]],"date-time":"2024-04-17T00:00:00Z","timestamp":1713312000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"name":"Institute of Information Communications Technology Planning Evaluation","award":["2021-0-01347"],"award-info":[{"award-number":["2021-0-01347"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2024,4,17]]},"abstract":"<jats:p>Research suggests that automatic speech recognition (ASR) systems, which automatically convert speech to text, show different performances according to various input classes (e.g., accent, age), requiring attention to building fairer AI systems that would perform similarly across various input classes. However, would an AI system with the same performance regardless of input classes really be perceived as fair enough? To this end, we investigate how listeners perceive the ASR system of the same result differently according to whether the speaker is a native speaker (NS) or a non-native speaker (NNS), which may lead to unfair situations. We conducted a study (n = 420), where participants were given one of the ten speech recordings with various accents of the same script along with the same captions. We found that even with the same ASR output, listeners perceive the ASR results differently. They found captions to be more useful for NNS's speech and blamed NNS more for the errors than NS. 
Based on the findings, we present design implications suggesting that we should take a step further than just achieving the same performance across various input classes to build a fair ASR system.<\/jats:p>","DOI":"10.1145\/3641008","type":"journal-article","created":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T10:05:31Z","timestamp":1714385131000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Is the Same Performance Really the Same?: Understanding How Listeners Perceive ASR Results Differently According to the Speaker's Accent"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0680-0856","authenticated-orcid":false,"given":"Seoyoung","family":"Kim","sequence":"first","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3071-6664","authenticated-orcid":false,"given":"Yeon Su","family":"Park","sequence":"additional","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-9139-1605","authenticated-orcid":false,"given":"Dakyeom","family":"Ahn","sequence":"additional","affiliation":[{"name":"College of Education, SNU, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3292-8839","authenticated-orcid":false,"given":"Jin Myung","family":"Kwak","sequence":"additional","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6348-4127","authenticated-orcid":false,"given":"Juho","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,4,26]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1080\/15348458.2019.1635022"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1080\/01434632.2016.1242595"},{"key":"e_1_2_2_3_1","first-page":"150","article-title":"The impact of language anxiety and language proficiency on WTC in EFL context","volume":"7","author":"Alemi Minoo","year":"2011","unstructured":"Minoo Alemi, Parisa Daftarifard, and Roya Pashmforoosh. 2011. The impact of language anxiety and language proficiency on WTC in EFL context. Cross-Cultural Communication, Vol. 7, 3 (2011), 150--166.","journal-title":"Cross-Cultural Communication"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.5594\/JMI.2016.2614919"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1177\/1350508419855714"},{"key":"e_1_2_2_6_1","volume-title":"Big data's disparate impact. California law review","author":"Barocas Solon","year":"2016","unstructured":"Solon Barocas and Andrew D Selbst. 2016. Big data's disparate impact. California law review (2016), 671--732."},{"key":"e_1_2_2_7_1","volume-title":"English with an accent: Language, ideology, and discrimination in the United States","author":"Barrett Rusty","unstructured":"Rusty Barrett, Jennifer Cramer, and Kevin B McGowan. 2022. English with an accent: Language, ideology, and discrimination in the United States. Taylor & Francis."},{"key":"e_1_2_2_8_1","volume-title":"Racial disparity in natural language processing: A case study of social media african-american english. arXiv preprint arXiv:1707.00061","author":"Blodgett Su Lin","year":"2017","unstructured":"Su Lin Blodgett and Brendan O'Connor. 2017. Racial disparity in natural language processing: A case study of social media african-american english. 
arXiv preprint arXiv:1707.00061 (2017)."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1080\/01434632.2014.909443"},{"key":"e_1_2_2_10_1","volume-title":"Conference on fairness, accountability and transparency. PMLR, 77--91","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency. PMLR, 77--91."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1958824.1958848"},{"key":"e_1_2_2_12_1","volume-title":"Toward fairness in speech recognition: Discovery and mitigation of performance disparities. arXiv preprint arXiv:2207.11345","author":"Dheram Pranav","year":"2022","unstructured":"Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, and Andreas Stolcke. 2022. Toward fairness in speech recognition: Discovery and mitigation of performance disparities. arXiv preprint arXiv:2207.11345 (2022)."},{"key":"e_1_2_2_13_1","volume-title":"Performance disparities between accents in automatic speech recognition. arXiv preprint arXiv:2208.01157","author":"DiChristofano Alex","year":"2022","unstructured":"Alex DiChristofano, Henry Shuster, Shefali Chandra, and Neal Patwari. 2022. Performance disparities between accents in automatic speech recognition. arXiv preprint arXiv:2208.01157 (2022)."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1600-0587.2012.07348.x"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0047404515000743"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2631488.2631497"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472749.3474784"},{"key":"e_1_2_2_18_1","volume-title":"False impressions? The effect of language proficiency on cues, perceptions, and lie detection. 
Canadian Journal of Behavioural Science\/Revue canadienne des sciences du comportement","author":"Elliott Elizabeth","year":"2022","unstructured":"Elizabeth Elliott and Amy-May Leach. 2022. False impressions? The effect of language proficiency on cues, perceptions, and lie detection. Canadian Journal of Behavioural Science\/Revue canadienne des sciences du comportement (2022)."},{"key":"e_1_2_2_19_1","unstructured":"ETS. 2023. TOEFL iBT Listening Section. https:\/\/www.ets.org\/toefl\/test-takers\/ibt\/about\/content\/listening.html"},{"key":"e_1_2_2_20_1","volume-title":"Bence Mark Halpern, and Odette Scharenborg","author":"Feng Siyuan","year":"2021","unstructured":"Siyuan Feng, Olya Kudina, Bence Mark Halpern, and Odette Scharenborg. 2021. Quantifying bias in automatic speech recognition. arXiv preprint arXiv:2103.15122 (2021)."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1002\/ejsp.862"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2557303"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvoice.2006.07.004"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2631488.2631495"},{"key":"e_1_2_2_25_1","volume-title":"Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 297--309","author":"He Helen Ai","year":"2017","unstructured":"Helen Ai He, Naomi Yamashita, Ari Hautasaari, Xun Cao, and Elaine M Huang. 2017. Why did they do that? Exploring attribution mismatches between native and non-native speakers using videoconferencing. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 
297--309."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0261444800006583"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1177\/0146167208326477"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1915768117"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLHCC.2013.6645235"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jesp.2010.05.025"},{"key":"e_1_2_2_31_1","volume-title":"Analysis and modeling of non-native speech for automatic speech recognition. Ph.,D. Dissertation","author":"Livescu Karen","unstructured":"Karen Livescu. 1999. Analysis and modeling of non-native speech for automatic speech recognition. Ph.,D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_2_2_32_1","volume-title":"The effects of nonnative accents on listening comprehension: Implications for ESL assessment. TESOL quarterly","author":"Major Roy C","year":"2002","unstructured":"Roy C Major, Susan F Fitzmaurice, Ferenc Bunta, and Chandrika Balasubramanian. 2002. The effects of nonnative accents on listening comprehension: Implications for ESL assessment. TESOL quarterly, Vol. 36, 2 (2002), 173--190."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.2167\/jmmd565.0"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2011-364"},{"key":"e_1_2_2_35_1","volume-title":"A Survey on Bias and Fairness in Machine Learning. arXiv","author":"Fred Morstatter FMNSKL","year":"1908","unstructured":"FMNSKL a AG NINAREH MEHRABI and Fred Morstatter. 2019. A Survey on Bias and Fairness in Machine Learning. arXiv 1908.09635 (2019)."},{"key":"e_1_2_2_36_1","unstructured":"Poppy Noor. 2021. 'I had to change who I am': 'bison' reporter Deion Broxton on his TV accent struggle. 
https:\/\/www.theguardian.com\/us-news\/2021\/apr\/02\/deion-broxton-bison-montana-journalist-accent"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2603599"},{"key":"e_1_2_2_38_1","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems , Vol. 35 (2022), 27730--27744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2998181.2998304"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1518701.1519061"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1753326.1753584"},{"key":"e_1_2_2_42_1","volume-title":"How model accuracy and explanation fidelity influence user trust. arXiv preprint arXiv:1907.12652","author":"Papenmeier Andrea","year":"2019","unstructured":"Andrea Papenmeier, Gwenn Englebienne, and Christin Seifert. 2019. How model accuracy and explanation fidelity influence user trust. arXiv preprint arXiv:1907.12652 (2019)."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501915"},{"key":"e_1_2_2_44_1","volume-title":"Machine Bias: Risk Assessments in Criminal Sentencing. https:\/\/www.propublica.org\/article\/machine-bias-risk-assessments-in-criminal-sentencing","year":"2016","unstructured":"ProPublica. 2016. Machine Bias: Risk Assessments in Criminal Sentencing. 
https:\/\/www.propublica.org\/article\/machine-bias-risk-assessments-in-criminal-sentencing"},{"key":"e_1_2_2_45_1","volume-title":"Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever.","author":"Radford Alec","year":"2022","unstructured":"Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356 (2022)."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1841853.1841865"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.21437\/interspeech.2019--1427"},{"key":"e_1_2_2_48_1","first-page":"3008","article-title":"Learning to summarize with human feedback","volume":"33","author":"Stiennon Nisan","year":"2020","unstructured":"Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. 2020. Learning to summarize with human feedback. Advances in Neural Information Processing Systems , Vol. 33 (2020), 3008--3021.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017--1746"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1940761.1940772"},{"key":"e_1_2_2_51_1","unstructured":"Kentaro Toyama. 2015. Geek heresy: Rescuing social change from the cult of technology. PublicAffairs."},{"key":"e_1_2_2_52_1","volume-title":"The social differentiation of English in Norwich","author":"Trudgill Peter","unstructured":"Peter Trudgill. 1997. The social differentiation of English in Norwich. Springer."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/223355.223446"},{"key":"e_1_2_2_54_1","volume-title":"ICASSP 2023--2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Veliche Irina-Elena","unstructured":"Irina-Elena Veliche and Pascale Fung. 2023. 
Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering. In ICASSP 2023--2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1--5."},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.3390\/sym11081018"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1978963"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543829.3543839"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379503.3403563"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23774-4_19"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300509"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmedinf.2004.05.008"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIG.2018.8490433"},{"key":"e_1_2_2_63_1","volume-title":"Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593","author":"Ziegler Daniel M","year":"2019","unstructured":"Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593 (2019). 
"}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3641008","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3641008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T17:24:28Z","timestamp":1755883468000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3641008"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,17]]},"references-count":63,"journal-issue":{"issue":"CSCW1","published-print":{"date-parts":[[2024,4,17]]}},"alternative-id":["10.1145\/3641008"],"URL":"https:\/\/doi.org\/10.1145\/3641008","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,17]]},"assertion":[{"value":"2024-04-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}