{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T02:14:59Z","timestamp":1768011299335,"version":"3.49.0"},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"122","license":[{"start":{"date-parts":[[2020,3,3]],"date-time":"2020-03-03T00:00:00Z","timestamp":1583193600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGACCESS Access. Comput."],"published-print":{"date-parts":[[2020,3,3]]},"abstract":"<jats:p>Many Deaf and Hard-of-Hearing (DHH) individuals across the world benefit from various captioning services for accessing information that exists in the form of speech. Today, Automatic Speech Recognition (ASR) technology has the potential to replace existing human-provided captioning services due to its lower cost of operation and ever-increasing accuracy. However, as with most automatic systems, ASR technology is still imperfect, which leads to issues of trust and acceptance when building a human-free communication service for these users. Thus, there is a need to evaluate the usability of these systems with users before deploying them in the real world. Yet, most researchers lack access to sufficient DHH users for extrinsic, empirical studies of these automatic captioning systems.
This article presents our work on the development of an automatic caption quality evaluation metric, which we design and validate through studies and real-world observations with DHH users.<\/jats:p>","DOI":"10.1145\/3386410.3386411","type":"journal-article","created":{"date-parts":[[2020,3,3]],"date-time":"2020-03-03T22:48:45Z","timestamp":1583275725000},"page":"1-1","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Usability evaluation of captions for people who are deaf or hard of hearing"],"prefix":"10.1145","author":[{"given":"Sushant","family":"Kafle","sequence":"first","affiliation":[{"name":"Rochester Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matt","family":"Huenerfauth","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,3,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/638249.638284"},{"key":"e_1_2_1_2_1","volume-title":"Frequency and predictability effects in eye fixations for skilled and less-skilled deaf readers. Visual cognition 21, 4","author":"B\u00e9langer Nathalie N","year":"2013","unstructured":"Nathalie N B\u00e9langer and Keith Rayner. 2013. Frequency and predictability effects in eye fixations for skilled and less-skilled deaf readers.
Visual cognition 21, 4 (2013), 477--497."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/deafed\/enp033"},{"key":"e_1_2_1_4_1","volume-title":"INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association","author":"Favre Beno\u00eet","year":"2013","unstructured":"Beno\u00eet Favre, Kyla Cheung, Siavash Kazemian, Adam Lee, Yang Liu, Cosmin Munteanu, Ani Nenkova, Dennis Ochei, Gerald Penn, Stephen Tratz, Clare R. Voss, and Frauke Zeller. 2013. Automatic human utility evaluation of ASR systems: does WER really predict performance? In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. 3463--3467. http:\/\/www.isca-speech.org\/archive\/interspeech_2013\/i13_3463.html"},{"key":"e_1_2_1_5_1","volume-title":"6th International Conference, College de France","author":"Garofolo John S.","year":"2000","unstructured":"John S. Garofolo, Cedric G. P. Auzanne, and Ellen M. Voorhees. 2000. The TREC Spoken Document Retrieval Track: A Success Story.
In Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications) - RIAO 2000, 6th International Conference, College de France, France, April 12-14, 2000. Proceedings. 1--20."},{"key":"e_1_2_1_7_1","volume-title":"the 4th Workshop on Child, Computer and Interaction, WOCCI 2014","author":"Gray Sharmistha S.","year":"2014","unstructured":"Sharmistha S. Gray, Daniel Willett, Jianhua Lu, Joel Pinto, Paul Maergner, and Nathan Bodenstab. 2014. Child automatic speech recognition for US English: child interaction with living-room-electronic-devices. In the 4th Workshop on Child, Computer and Interaction, WOCCI 2014, Singapore, September 19, 2014. 21--26. http:\/\/www.isca-speech.org\/archive\/wocci_2014\/wc14_021.html"},{"key":"e_1_2_1_8_1","volume-title":"Prior knowledge and reading comprehension ability of deaf adolescents. Journal of Deaf Studies and Deaf Education","author":"Jackson Dorothy W","year":"1997","unstructured":"Dorothy W Jackson, Peter V Paul, and Jonathan C Smith. 1997. Prior knowledge and reading comprehension ability of deaf adolescents. Journal of Deaf Studies and Deaf Education (1997), 172--184."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018","author":"Kafle Sushant","year":"2018","unstructured":"Sushant Kafle, Matt Huenerfauth. 2018.
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts. In Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7 -- May 12, 2018."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.21437\/SLPAT.2016-4"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132525.3132542"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2543578"},{"key":"e_1_2_1_13_1","volume-title":"INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association","author":"Lei Xin","year":"2013","unstructured":"Xin Lei, Andrew W. Senior, Alexander Gruenstein, and Jeffrey Sorensen. 2013. Accurate and compact large vocabulary speech recognition on mobile devices. In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. 662--665. http:\/\/www.isca-speech.org\/archive\/interspeech_2013\/i13_0662.html"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2304637"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1353\/aad.0.0006"},{"key":"e_1_2_1_17_1","volume-title":"Predicting Human Perceived Accuracy of ASR Systems.
In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association","author":"Mishra Taniya","year":"2011","unstructured":"Taniya Mishra, Andrej Ljolje, and Mazin Gilbert. 2011. Predicting Human Perceived Accuracy of ASR Systems. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011. 1945--1948. http:\/\/www.isca-speech.org\/archive\/interspeech_2011\/i11_1945.html"},{"key":"e_1_2_1_18_1","volume-title":"INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing","author":"Morris Andrew Cameron","year":"2004","unstructured":"Andrew Cameron Morris, Viktoria Maier, and Phil D. Green. 2004. From WER and RIL to MER and WIL: Improved evaluation measures for connected speech recognition. In INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004. http:\/\/www.isca-speech.org\/archive\/interspeech_2004\/i04_2765.html"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2005.1415298"},{"key":"e_1_2_1_20_1","volume-title":"Eye movements in reading and information processing: 20 years of research.
Psychological bulletin 124, 3","author":"Rayner Keith","year":"1998","unstructured":"Keith Rayner. 1998. Eye movements in reading and information processing: 20 years of research. Psychological bulletin 124, 3 (1998), 372."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661334.2661337"},{"key":"e_1_2_1_22_1","volume-title":"Automatic Speech Recognition and Understanding, 2003. ASRU'03. 2003 IEEE Workshop on. IEEE, 577--582","author":"Wang Ye-Yi","year":"2003","unstructured":"Ye-Yi Wang, Alex Acero, and Ciprian Chelba. 2003. Is word error rate a good indicator for spoken language understanding accuracy. In Automatic Speech Recognition and Understanding, 2003. ASRU'03. 2003 IEEE Workshop on. IEEE, 577--582."},{"key":"e_1_2_1_23_1","volume-title":"Achieving Human Parity in Conversational Speech Recognition. CoRR abs\/1610.05256","author":"Xiong Wayne","year":"2016","unstructured":"Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. 2016. Achieving Human Parity in Conversational Speech Recognition. CoRR abs\/1610.05256 (2016).
arXiv:1610.05256 http:\/\/arxiv.org\/abs\/1610.05256"}],"container-title":["ACM SIGACCESS Accessibility and Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3386410.3386411","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3386410.3386411","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:45Z","timestamp":1750200105000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3386410.3386411"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,3]]},"references-count":21,"journal-issue":{"issue":"122","published-print":{"date-parts":[[2020,3,3]]}},"alternative-id":["10.1145\/3386410.3386411"],"URL":"https:\/\/doi.org\/10.1145\/3386410.3386411","relation":{},"ISSN":["1558-2337","1558-1187"],"issn-type":[{"value":"1558-2337","type":"print"},{"value":"1558-1187","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,3]]},"assertion":[{"value":"2020-03-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}