{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T15:43:44Z","timestamp":1779119024000,"version":"3.51.4"},"reference-count":37,"publisher":"Proceedings of the National Academy of Sciences","issue":"14","license":[{"start":{"date-parts":[[2020,3,23]],"date-time":"2020-03-23T00:00:00Z","timestamp":1584921600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["www.pnas.org"],"crossmark-restriction":true},"short-container-title":["Proc. Natl. Acad. Sci. U.S.A."],"published-print":{"date-parts":[[2020,4,7]]},"abstract":"<jats:p>Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems\u2014developed by Amazon, Apple, Google, IBM, and Microsoft\u2014to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. 
We trace these disparities to the underlying acoustic models used by the ASR systems as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies\u2014such as using more diverse training datasets that include African American Vernacular English\u2014to reduce these performance differences and ensure speech recognition technology is inclusive.<\/jats:p>","DOI":"10.1073\/pnas.1915768117","type":"journal-article","created":{"date-parts":[[2020,3,24]],"date-time":"2020-03-24T00:20:59Z","timestamp":1585009259000},"page":"7684-7689","update-policy":"https:\/\/doi.org\/10.1073\/pnas.cm10313","source":"Crossref","is-referenced-by-count":466,"title":["Racial disparities in automated speech recognition"],"prefix":"10.1073","volume":"117","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6233-8256","authenticated-orcid":false,"given":"Allison","family":"Koenecke","sequence":"first","affiliation":[{"name":"Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Nam","sequence":"additional","affiliation":[{"name":"Department of Psychology, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emily","family":"Lake","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joe","family":"Nudell","sequence":"additional","affiliation":[{"name":"Department of Management Science & Engineering, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Minnie","family":"Quartey","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Georgetown University, Washington, DC 
20057;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zion","family":"Mengesha","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Connor","family":"Toups","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John R.","family":"Rickford","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dan","family":"Jurafsky","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Stanford University, Stanford, CA 94305;"},{"name":"Department of Computer Science, Stanford University, Stanford, CA 94305"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6103-9318","authenticated-orcid":false,"given":"Sharad","family":"Goel","sequence":"additional","affiliation":[{"name":"Department of Management Science & Engineering, Stanford University, Stanford, CA 94305;"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"341","published-online":{"date-parts":[[2020,3,23]]},"reference":[{"key":"e_1_3_4_1_2","doi-asserted-by":"crossref","first-page":"53","DOI":"10.18653\/v1\/W17-1606","volume-title":"Proceedings of the First ACL Workshop on Ethics in Natural Language Processing","author":"Tatman R.","year":"2017","unstructured":"R. Tatman, \u201cGender and dialect bias in YouTube\u2019s automatic captions\u201d in Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, D. Hovy et al., Eds. (Association for Computational Linguistics, 2017), pp. 
53\u201359."},{"key":"e_1_3_4_2_2","doi-asserted-by":"crossref","first-page":"934","DOI":"10.21437\/Interspeech.2017-1746","volume-title":"INTERSPEECH","author":"Tatman R.","year":"2017","unstructured":"R. Tatman, C. Kasten, \u201cEffects of talker dialect, gender & race on accuracy of Bing speech and YouTube automatic captions\u201d in INTERSPEECH, F. Lacerda et al., Eds. (International Speech Communication Association, 2017), pp. 934\u2013938."},{"key":"e_1_3_4_3_2","unstructured":"D. Harwell B. Mayes M. Walls S. Hashemi The accent gap. The Washington Post 19 July 2018. https:\/\/www.washingtonpost.com\/graphics\/2018\/business\/alexa-does-not-understand-your-accent\/. Accessed 28 February 2020."},{"key":"e_1_3_4_4_2","unstructured":"F. Kitashov E. Svitanko D. Dutta Foreign English accent adjustment by learning phonetic patterns. arXiv:1807.03625 (9 July 2018)."},{"key":"e_1_3_4_5_2","first-page":"77","volume-title":"Proceedings of the Conference on Fairness, Accountability and Transparency","author":"Buolamwini J.","year":"2018","unstructured":"J. Buolamwini, T. Gebru, \u201cGender shades: Intersectional accuracy disparities in commercial gender classification\u201d in Proceedings of the Conference on Fairness, Accountability and Transparency, S. A. Friedler, C. Wilson, Eds. (Association for Computing Machinery, New York, NY, 2018), pp. 77\u201391."},{"key":"e_1_3_4_6_2","first-page":"429","volume-title":"AAAI\/ACM Conference on AI Ethics and Society","author":"Raji I. D.","year":"2019","unstructured":"I. D. Raji, J. Buolamwini, \u201cActionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products\u201d in AAAI\/ACM Conference on AI Ethics and Society (Association for Computing Machinery, 2019), Vol. 1, pp. 
429\u2013435."},{"key":"e_1_3_4_7_2","doi-asserted-by":"crossref","first-page":"1119","DOI":"10.18653\/v1\/D16-1120","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Blodgett S. L.","year":"2016","unstructured":"S. L. Blodgett, L. Green, B. O\u2019Connor, \u201cDemographic dialectal variation in social media: A case study of African-American English\u201d in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, J. Su, K. Duh, X. Carreras, Eds. (Association for Computational Linguistics, 2016), pp. 1119\u20131130."},{"key":"e_1_3_4_8_2","unstructured":"S. L. Blodgett B. O\u2019Connor Racial disparity in natural language processing: A case study of social media African-American English. arXiv:1707.00061 (30 June 2017)."},{"key":"e_1_3_4_9_2","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1126\/science.aal4230","article-title":"Semantics derived automatically from language corpora contain human-like biases","volume":"356","author":"Caliskan A.","year":"2017","unstructured":"A. Caliskan, J. J. Bryson, A. Narayanan, Semantics derived automatically from language corpora contain human-like biases. Science 356, 183\u2013186 (2017).","journal-title":"Science"},{"key":"e_1_3_4_10_2","doi-asserted-by":"crossref","first-page":"1668","DOI":"10.18653\/v1\/P19-1163","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Sap M.","year":"2019","unstructured":"M. Sap, D. Card, S. Gabriel, Y. Choi, N. A. Smith, \u201cThe risk of racial bias in hate speech detection\u201d in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, A. Korhonen, D. Traum, L. M\u00e0rquez, Eds. (Association for Computational Linguistics, 2019), pp. 
1668\u20131678."},{"key":"e_1_3_4_11_2","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1145\/3287560.3287572","volume-title":"Proceedings of the Conference on Fairness, Accountability, and Transparency","author":"De-Arteaga M.","year":"2019","unstructured":"M. De-Arteaga et al., \u201cBias in bios: A case study of semantic representation bias in a high-stakes setting\u201d in Proceedings of the Conference on Fairness, Accountability, and Transparency (ACM, 2019), pp. 120\u2013128."},{"key":"e_1_3_4_12_2","doi-asserted-by":"crossref","unstructured":"M. Ali Discrimination through optimization: How Facebook\u2019s ad delivery can lead to skewed outcomes. arXiv:1904.02095 (12 September 2019).","DOI":"10.1145\/3359301"},{"key":"e_1_3_4_13_2","first-page":"20","volume-title":"Proceedings of the Conference on Fairness, Accountability and Transparency","author":"Datta A.","year":"2018","unstructured":"A. Datta, A. Datta, J. Makagon, D. K. Mulligan, M. C. Tschantz, \u201cDiscrimination in online advertising: A multidisciplinary inquiry\u201d in Proceedings of the Conference on Fairness, Accountability and Transparency, S. A. Friedler, C. Wilson, Eds. (Association for Computing Machinery, New York, NY, 2018), pp. 20\u201334."},{"key":"e_1_3_4_14_2","doi-asserted-by":"crossref","first-page":"797","DOI":"10.1145\/3097983.3098095","volume-title":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Corbett-Davies S.","year":"2017","unstructured":"S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, A. Huq, \u201cAlgorithmic decision making and the cost of fairness\u201d in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2017), pp. 797\u2013806."},{"key":"e_1_3_4_15_2","unstructured":"S. Corbett-Davies S. Goel The measure and mismeasure of fairness: A critical review of fair machine learning. 
arXiv:1808.00023 (14 August 2018)."},{"key":"e_1_3_4_16_2","volume-title":"Proceedings of Innovations in Theoretical Computer Science","author":"Kleinberg J.","year":"2017","unstructured":"J. Kleinberg, S. Mullainathan, M. Raghavan, \u201cInherent trade-offs in the fair determination of risk scores\u201d in Proceedings of Innovations in Theoretical Computer Science, C. H. Papadimitriou, Ed. (ITCS, 2017)."},{"key":"e_1_3_4_17_2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1089\/big.2016.0047","article-title":"Fair prediction with disparate impact: A study of bias in recidivism prediction instruments","volume":"5","author":"Chouldechova A.","year":"2017","unstructured":"A. Chouldechova, Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5, 153\u2013163 (2017).","journal-title":"Big Data"},{"key":"e_1_3_4_18_2","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1126\/science.aax2342","article-title":"Dissecting racial bias in an algorithm used to manage the health of populations","volume":"366","author":"Obermeyer Z.","year":"2019","unstructured":"Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447\u2013453 (2019).","journal-title":"Science"},{"key":"e_1_3_4_19_2","doi-asserted-by":"crossref","first-page":"883","DOI":"10.7326\/M18-3297","article-title":"Machine learning, health disparities, and causal reasoning","volume":"169","author":"Goodman S. N.","year":"2018","unstructured":"S. N. Goodman, S. Goel, M. R. Cullen, Machine learning, health disparities, and causal reasoning. Ann. Intern. Med. 169, 883 (2018).","journal-title":"Ann. Intern. Med."},{"key":"e_1_3_4_20_2","first-page":"134","volume-title":"Proceedings of the Conference on Fairness, Accountability and Transparency","author":"Chouldechova A.","year":"2018","unstructured":"A. Chouldechova, D. Benavides-Prado, O. Fialko, R. 
Vaithianathan, \u201cA case study of algorithm-assisted decision making in child maltreatment hotline screening decisions\u201d in Proceedings of the Conference on Fairness, Accountability and Transparency, S. A. Friedler, C. Wilson, Eds. (Association for Computing Machinery, New York, NY, 2018), pp. 134\u2013148."},{"key":"e_1_3_4_21_2","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1089\/big.2016.0052","article-title":"Predictive analytics for city agencies: Lessons from children\u2019s services","volume":"5","author":"Shroff R.","year":"2017","unstructured":"R. Shroff, Predictive analytics for city agencies: Lessons from children\u2019s services. Big Data 5, 189\u2013196 (2017).","journal-title":"Big Data"},{"key":"e_1_3_4_22_2","unstructured":"T. Kendall C. Farrington The corpus of regional African American language (2018). https:\/\/oraal.uoregon.edu\/coraal. Accessed 28 February 2020."},{"key":"e_1_3_4_23_2","volume-title":"Phonological and Grammatical Features of African American Vernacular English (AAVE)","author":"Rickford J. R.","year":"1999","unstructured":"J. R. Rickford, Phonological and Grammatical Features of African American Vernacular English (AAVE) (Blackwell Publishers, 1999)."},{"key":"e_1_3_4_24_2","doi-asserted-by":"crossref","first-page":"450","DOI":"10.1111\/j.1749-818X.2007.00029.x","article-title":"Phonological and phonetic characteristics of African American Vernacular English","volume":"1","author":"Thomas E. R.","year":"2007","unstructured":"E. R. Thomas, Phonological and phonetic characteristics of African American Vernacular English. Lang. Linguist. Compass 1, 450\u2013475 (2007).","journal-title":"Lang. Linguist. Compass"},{"key":"e_1_3_4_25_2","first-page":"85","volume-title":"African-American English: Structure, History and Use","author":"Bailey G.","year":"1998","unstructured":"G. Bailey, E. Thomas, \u201cSome aspects of African-American Vernacular English phonology\u201d in African-American English: Structure, History and Use, S. 
Mufwene, J. R. Rickford, G. Bailey, J. Baugh, Eds. (Routledge, New York, NY, 1998), pp. 85\u2013109."},{"key":"e_1_3_4_26_2","unstructured":"Stanford Linguistics Voices of California. http:\/\/web.stanford.edu\/dept\/linguistics\/VoCal\/. Accessed 28 February 2020."},{"key":"e_1_3_4_27_2","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/S0167-6393(01)00041-3","article-title":"Testing the correlation of word error rate and perplexity","volume":"38","author":"Klakow D.","year":"2002","unstructured":"D. Klakow, J. Peters, Testing the correlation of word error rate and perplexity. Speech Commun. 38, 19\u201328 (2002).","journal-title":"Speech Commun."},{"key":"e_1_3_4_28_2","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.specom.2009.10.001","article-title":"Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates","volume":"52","author":"Goldwater S.","year":"2010","unstructured":"S. Goldwater, D. Jurafsky, C. D. Manning, Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Commun. 52, 181\u2013200 (2010).","journal-title":"Speech Commun."},{"key":"e_1_3_4_29_2","doi-asserted-by":"crossref","first-page":"2205","DOI":"10.21437\/Interspeech.2005-699","volume-title":"Proceedings of INTERSPEECH-2005","author":"Adda-Decker M.","year":"2005","unstructured":"M. Adda-Decker, L. Lamel, \u201cDo speech recognizers prefer female speakers?\u201d in Proceedings of INTERSPEECH-2005 (International Speech Communication Association, 2005), pp. 2205\u20132208."},{"key":"e_1_3_4_30_2","doi-asserted-by":"crossref","first-page":"2978","DOI":"10.18653\/v1\/P19-1285","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Dai Z.","year":"2019","unstructured":"Z. 
Dai et al., \u201cTransformer-XL: Attentive language models beyond a fixed-length context\u201d in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, A. Korhonen, D. Traum, L. M\u00e0rquez, Eds. (Association for Computational Linguistics, 2019), pp. 2978\u20132988."},{"key":"e_1_3_4_31_2","unstructured":"A. Radford K. Narasimhan T. Salimans I. Sutskever Improving language understanding by generative pre-training. https:\/\/s3-us-west-2.amazonaws.com\/openai-assets\/research-covers\/language-unsupervised\/language_understanding_paper.pdf. Accessed 28 February 2020."},{"key":"e_1_3_4_32_2","unstructured":"A. Radford Language models are unsupervised multitask learners. https:\/\/openai.com\/blog\/better-language-models\/. Accessed 28 February 2020."},{"key":"e_1_3_4_33_2","first-page":"1","article-title":"MatchIt: Nonparametric preprocessing for parametric causal inference","volume":"42","author":"Ho D. E.","year":"2011","unstructured":"D. E. Ho, K. Imai, G. King, E. A. Stuart, MatchIt: Nonparametric preprocessing for parametric causal inference. J. Stat. Software 42, 1\u201328 (2011).","journal-title":"J. Stat. Software"},{"key":"e_1_3_4_34_2","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1044\/jslhr.4302.366","article-title":"An assessment battery for identifying language impairments in African American children","volume":"43","author":"Craig H. K.","year":"2000","unstructured":"H. K. Craig, J. A. Washington, An assessment battery for identifying language impairments in African American children. J. Speech Lang. Hear. Res. 43, 366\u2013379 (2000).","journal-title":"J. Speech Lang. Hear. Res."},{"key":"e_1_3_4_35_2","doi-asserted-by":"crossref","DOI":"10.1044\/1092-4388(2002\/040)","article-title":"Methods for characterizing participants\u2019 nonmainstream dialect use in child language research","author":"Oetting J. B.","year":"2002","unstructured":"J. B. Oetting, J. L. 
McDonald, Methods for characterizing participants\u2019 nonmainstream dialect use in child language research. J. Speech Lang. Hear. Res. (2002).","journal-title":"J. Speech Lang. Hear. Res."},{"key":"e_1_3_4_36_2","first-page":"381","volume-title":"The Oxford Handbook of Language and Society","author":"Hudley A. H. C.","year":"2016","unstructured":"A. H. C. Hudley, \u201cLanguage and racialization\u201d in The Oxford Handbook of Language and Society, O. Garc\u00eda, N. Flores, M. Spotti, Eds. (Oxford University Press, 2016), pp. 381\u2013402."},{"key":"e_1_3_4_37_2","doi-asserted-by":"crossref","first-page":"281","DOI":"10.4324\/9781315535258-53","volume-title":"Data Collection in Sociolinguistics: Methods and Applications","author":"Green L.","year":"2017","unstructured":"L. Green, \u201cBeyond lists of differences to accurate descriptions\u201d in Data Collection in Sociolinguistics: Methods and Applications, C. Mallinson, B. Childs, G. Van Herk, Eds. (Routledge, 2017), pp. 281\u2013285."}],"container-title":["Proceedings of the National Academy of 
Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/pnas.org\/doi\/pdf\/10.1073\/pnas.1915768117","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T20:52:54Z","timestamp":1654635174000},"score":1,"resource":{"primary":{"URL":"https:\/\/pnas.org\/doi\/full\/10.1073\/pnas.1915768117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,23]]},"references-count":37,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2020,4,7]]}},"alternative-id":["10.1073\/pnas.1915768117"],"URL":"https:\/\/doi.org\/10.1073\/pnas.1915768117","relation":{},"ISSN":["0027-8424","1091-6490"],"issn-type":[{"value":"0027-8424","type":"print"},{"value":"1091-6490","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,23]]},"assertion":[{"value":"2020-03-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}