{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,30]],"date-time":"2025-08-30T17:17:43Z","timestamp":1756574263043,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T00:00:00Z","timestamp":1624233600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,21]]},"DOI":"10.1145\/3452918.3458792","type":"proceedings-article","created":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T21:29:18Z","timestamp":1624483758000},"page":"96-107","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Evaluating AI assisted subtitling"],"prefix":"10.1145","author":[{"given":"Than Htut","family":"Soe","sequence":"first","affiliation":[{"name":"University of Bergen, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frode","family":"Guribye","sequence":"additional","affiliation":[{"name":"University of Bergen, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marija","family":"Slavkovik","sequence":"additional","affiliation":[{"name":"University of Bergen, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,23]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300233"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0142716412000434"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Julie Brousseau Jean-Francois Beaumont Gilles Boulianne Patrick Cardinal Claude Chapdelaine Michel Comeau Frederic Osterrath and Pierre Ouellet. 2003. Automated Closed-Captioning of Live TV Broadcast News in French. (2003) 5.  Julie Brousseau Jean-Francois Beaumont Gilles Boulianne Patrick Cardinal Claude Chapdelaine Michel Comeau Frederic Osterrath and Pierre Ouellet. 2003. Automated Closed-Captioning of Live TV Broadcast News in French. (2003) 5.","DOI":"10.21437\/Eurospeech.2003-398"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2745197.2745204"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3084289.3089915"},{"key":"e_1_3_2_1_6_1","volume-title":"State-of-the-art Speech Recognition With Sequence-to-Sequence Models. arXiv:1712.01769 [cs, eess, stat] (Dec","author":"Chiu Chung-Cheng","year":"2017","unstructured":"Chung-Cheng Chiu , Tara\u00a0 N. Sainath , Yonghui Wu , Rohit Prabhavalkar , Patrick Nguyen , Zhifeng Chen , Anjuli Kannan , Ron\u00a0 J. Weiss , Kanishka Rao , Ekaterina Gonina , Navdeep Jaitly , Bo Li , Jan Chorowski , and Michiel Bacchiani . 2017. State-of-the-art Speech Recognition With Sequence-to-Sequence Models. arXiv:1712.01769 [cs, eess, stat] (Dec . 2017 ). http:\/\/arxiv.org\/abs\/1712.01769 arXiv:1712.01769. Chung-Cheng Chiu, Tara\u00a0N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron\u00a0J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, and Michiel Bacchiani. 2017. State-of-the-art Speech Recognition With Sequence-to-Sequence Models. arXiv:1712.01769 [cs, eess, stat] (Dec. 2017). http:\/\/arxiv.org\/abs\/1712.01769 arXiv:1712.01769."},{"key":"e_1_3_2_1_7_1","unstructured":"Christopher Cieri David Miller and Kevin Walker. [n.d.]. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text. ([n.\u00a0d.]) 3.  Christopher Cieri David Miller and Kevin Walker. [n.d.]. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text. ([n.\u00a0d.]) 3."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3172944.3172983"},{"key":"e_1_3_2_1_9_1","unstructured":"British\u00a0Broadcasting Corporation.2019. Subtitle Guidelines Version 1.1.8. https:\/\/bbc.github.io\/subtitle-guidelines\/  British\u00a0Broadcasting Corporation.2019. Subtitle Guidelines Version 1.1.8. https:\/\/bbc.github.io\/subtitle-guidelines\/"},{"key":"e_1_3_2_1_10_1","unstructured":"British\u00a0Broadcasting Corporation.2020. How do I create subtitles?http:\/\/www.bbc.co.uk\/guides\/zmgnng8  British\u00a0Broadcasting Corporation.2020. How do I create subtitles?http:\/\/www.bbc.co.uk\/guides\/zmgnng8"},{"key":"e_1_3_2_1_11_1","first-page":"5","article-title":"Machine Learning Paradigms for Speech Recognition","volume":"21","author":"Deng Li","year":"2013","unstructured":"Li Deng and Xiao Li . 2013 . Machine Learning Paradigms for Speech Recognition : An Overview. IEEE Transactions on Audio, Speech, and Language Processing 21 , 5 (May 2013), 1060\u20131089. https:\/\/doi.org\/10.1109\/TASL.2013.2244083 Li Deng and Xiao Li. 2013. Machine Learning Paradigms for Speech Recognition: An Overview. IEEE Transactions on Audio, Speech, and Language Processing 21, 5 (May 2013), 1060\u20131089. https:\/\/doi.org\/10.1109\/TASL.2013.2244083","journal-title":"An Overview. IEEE Transactions on Audio, Speech, and Language Processing"},{"volume-title":"UX Design Innovation: Challenges for Working with Machine Learning as a Design Material","author":"Dove Graham","key":"e_1_3_2_1_12_1","unstructured":"Graham Dove , Kim Halskov , Jodi Forlizzi , and John Zimmerman . 2017. UX Design Innovation: Challenges for Working with Machine Learning as a Design Material . ACM Press , 278\u2013288. https:\/\/doi.org\/10.1145\/3025453.3025739 Graham Dove, Kim Halskov, Jodi Forlizzi, and John Zimmerman. 2017. UX Design Innovation: Challenges for Working with Machine Learning as a Design Material. ACM Press, 278\u2013288. https:\/\/doi.org\/10.1145\/3025453.3025739"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1027\/1016-9040.12.3.196"},{"key":"e_1_3_2_1_14_1","volume-title":"2014. SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling. (May","author":"Pozo","year":"2014","unstructured":"Pozo et al. 2014. SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling. (May 2014 ). Pozo et al.2014. SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling. (May 2014)."},{"key":"e_1_3_2_1_15_1","unstructured":"Jerry\u00a0Alan Fails and Dan\u00a0R Olsen. [n.d.]. Interactive Machine Learning. ([n.\u00a0d.]) 7.  Jerry\u00a0Alan Fails and Dan\u00a0R Olsen. [n.d.]. Interactive Machine Learning. ([n.\u00a0d.]) 7."},{"key":"e_1_3_2_1_16_1","volume-title":"Sequence Transduction with Recurrent Neural Networks. arXiv:1211.3711 [cs, stat] (Nov","author":"Graves Alex","year":"2012","unstructured":"Alex Graves . 2012. Sequence Transduction with Recurrent Neural Networks. arXiv:1211.3711 [cs, stat] (Nov . 2012 ). http:\/\/arxiv.org\/abs\/1211.3711 arXiv:1211.3711. Alex Graves. 2012. Sequence Transduction with Recurrent Neural Networks. arXiv:1211.3711 [cs, stat] (Nov. 2012). http:\/\/arxiv.org\/abs\/1211.3711 arXiv:1211.3711."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/302979.303030"},{"key":"e_1_3_2_1_18_1","volume-title":"On How Users Edit Computer-Generated Visual Stories. arXiv:1902.08327 [cs] (Feb","author":"Hsu Ting-Yao","year":"2019","unstructured":"Ting-Yao Hsu , Yen-Chia Hsu , and Ting- Hao\u00a0\u2019Kenneth\u2019 Huang . 2019. On How Users Edit Computer-Generated Visual Stories. arXiv:1902.08327 [cs] (Feb . 2019 ). http:\/\/arxiv.org\/abs\/1902.08327 arXiv:1902.08327. Ting-Yao Hsu, Yen-Chia Hsu, and Ting-Hao\u00a0\u2019Kenneth\u2019 Huang. 2019. On How Users Edit Computer-Generated Visual Stories. arXiv:1902.08327 [cs] (Feb. 2019). http:\/\/arxiv.org\/abs\/1902.08327 arXiv:1902.08327."},{"key":"e_1_3_2_1_19_1","unstructured":"Chih-wei Huang. [n.d.]. Automatic Closed Caption Alignment Based on Speech Recognition Transcripts. ([n.\u00a0d.]) 14.  Chih-wei Huang. [n.d.]. Automatic Closed Caption Alignment Based on Speech Recognition Transcripts. ([n.\u00a0d.]) 14."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075434.1075480"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858402"},{"key":"e_1_3_2_1_22_1","volume-title":"Transfer Learning for Speech Recognition on a Budget. arXiv:1706.00290 [cs, stat] (June","author":"Kunze Julius","year":"2017","unstructured":"Julius Kunze , Louis Kirsch , Ilia Kurenkov , Andreas Krug , Jens Johannsmeier , and Sebastian Stober . 2017. Transfer Learning for Speech Recognition on a Budget. arXiv:1706.00290 [cs, stat] (June 2017 ). http:\/\/arxiv.org\/abs\/1706.00290 arXiv:1706.00290. Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, and Sebastian Stober. 2017. Transfer Learning for Speech Recognition on a Budget. arXiv:1706.00290 [cs, stat] (June 2017). http:\/\/arxiv.org\/abs\/1706.00290 arXiv:1706.00290."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.9790\/9622-0703022024"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073653"},{"key":"e_1_3_2_1_25_1","unstructured":"Kevin Lenzo. 2014. The CMU pronouncing dictionary. (2014). http:\/\/www.speech.cs.cmu.edu\/cgi-bin\/cmudict  Kevin Lenzo. 2014. The CMU pronouncing dictionary. (2014). http:\/\/www.speech.cs.cmu.edu\/cgi-bin\/cmudict"},{"key":"e_1_3_2_1_27_1","unstructured":"Andrew\u00a0C Morris Viktoria Maier and Phil Green. [n.d.]. From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. ([n.\u00a0d.]) 5.  Andrew\u00a0C Morris Viktoria Maier and Phil Green. [n.d.]. From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. ([n.\u00a0d.]) 5."},{"key":"e_1_3_2_1_28_1","unstructured":"Obach. [n.d.]. Automatic Speech Recognition for Live TV Subtitling for Hearing-Impaired People. http:\/\/ebooks.iospress.nl\/publication\/641  Obach. [n.d.]. Automatic Speech Recognition for Live TV Subtitling for Hearing-Impaired People. http:\/\/ebooks.iospress.nl\/publication\/641"},{"key":"e_1_3_2_1_29_1","volume-title":"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Interspeech 2019 (Sept","author":"Park S.","year":"2019","unstructured":"Daniel\u00a0 S. Park , William Chan , Yu Zhang , Chung-Cheng Chiu , Barret Zoph , Ekin\u00a0 D. Cubuk , and Quoc\u00a0 V. Le. 2019. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Interspeech 2019 (Sept . 2019 ), 2613\u20132617. https:\/\/doi.org\/10.21437\/Interspeech.2019-2680 arXiv:1904.08779. Daniel\u00a0S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin\u00a0D. Cubuk, and Quoc\u00a0V. Le. 2019. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Interspeech 2019 (Sept. 2019), 2613\u20132617. https:\/\/doi.org\/10.21437\/Interspeech.2019-2680 arXiv:1904.08779."},{"key":"e_1_3_2_1_30_1","volume-title":"Very Deep Self-Attention Networks for End-to-End Speech Recognition. arXiv:1904.13377 [cs, eess] (May","author":"Pham Ngoc-Quan","year":"2019","unstructured":"Ngoc-Quan Pham , Thai-Son Nguyen , Jan Niehues , Markus M\u00fcller , Sebastian St\u00fcker , and Alexander Waibel . 2019. Very Deep Self-Attention Networks for End-to-End Speech Recognition. arXiv:1904.13377 [cs, eess] (May 2019 ). http:\/\/arxiv.org\/abs\/1904.13377 arXiv:1904.13377. Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus M\u00fcller, Sebastian St\u00fcker, and Alexander Waibel. 2019. Very Deep Self-Attention Networks for End-to-End Speech Recognition. arXiv:1904.13377 [cs, eess] (May 2019). http:\/\/arxiv.org\/abs\/1904.13377 arXiv:1904.13377."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/0907676X.2012.722651"},{"key":"e_1_3_2_1_32_1","volume-title":"Why Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. arXiv:1602.04938 [cs, stat] (Feb","author":"Ribeiro Marco\u00a0Tulio","year":"2016","unstructured":"Marco\u00a0Tulio Ribeiro , Sameer Singh , and Carlos Guestrin . 2016. \u201d Why Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. arXiv:1602.04938 [cs, stat] (Feb . 2016 ). http:\/\/arxiv.org\/abs\/1602.04938 arXiv:1602.04938. Marco\u00a0Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. \u201dWhy Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. arXiv:1602.04938 [cs, stat] (Feb. 2016). http:\/\/arxiv.org\/abs\/1602.04938 arXiv:1602.04938."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298824"},{"key":"e_1_3_2_1_34_1","unstructured":"Burr Settles. [n.d.]. Active Learning Literature Survey. ([n.\u00a0d.]) 47.  Burr Settles. [n.d.]. Active Learning Literature Survey. ([n.\u00a0d.]) 47."},{"key":"e_1_3_2_1_35_1","unstructured":"Aaron Springer Victoria Hollis and Steve Whittaker. [n.d.]. Dice in the Black Box: User Experiences with an Inscrutable Algorithm. ([n.\u00a0d.]) 4.  Aaron Springer Victoria Hollis and Steve Whittaker. [n.d.]. Dice in the Black Box: User Experiences with an Inscrutable Algorithm. ([n.\u00a0d.]) 4."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/371127.371166"},{"key":"e_1_3_2_1_37_1","unstructured":"Vincent Vandeghinste and Yi Pan. [n.d.]. Sentence Compression for Automated Subtitling: A Hybrid Approach. ([n.\u00a0d.]) 7.  Vincent Vandeghinste and Yi Pan. [n.d.]. Sentence Compression for Automated Subtitling: A Hybrid Approach. ([n.\u00a0d.]) 7."},{"volume-title":"Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems - CHI \u201908","author":"Vertanen Keith","key":"e_1_3_2_1_38_1","unstructured":"Keith Vertanen and Per\u00a0Ola Kristensson . 2008. On the benefits of confidence visualization in speech recognition . In Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems - CHI \u201908 . ACM Press , Florence, Italy , 1497. https:\/\/doi.org\/10.1145\/1357054.1357288 Keith Vertanen and Per\u00a0Ola Kristensson. 2008. On the benefits of confidence visualization in speech recognition. In Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems - CHI \u201908. ACM Press, Florence, Italy, 1497. https:\/\/doi.org\/10.1145\/1357054.1357288"},{"key":"e_1_3_2_1_39_1","volume-title":"Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai.","author":"Watanabe Shinji","year":"2018","unstructured":"Shinji Watanabe , Takaaki Hori , Shigeki Karita , Tomoki Hayashi , Jiro Nishitoba , Yuya Unno , Nelson Enrique\u00a0Yalta Soplin , Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai. 2018 . ESPnet: End-to- End Speech Processing Toolkit . arXiv:1804.00015 [cs] (March 2018). http:\/\/arxiv.org\/abs\/1804.00015 arXiv:1804.00015. Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique\u00a0Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai. 2018. ESPnet: End-to-End Speech Processing Toolkit. arXiv:1804.00015 [cs] (March 2018). http:\/\/arxiv.org\/abs\/1804.00015 arXiv:1804.00015."},{"key":"e_1_3_2_1_40_1","volume-title":"The Human Kernel. arXiv:1510.07389 [cs, stat] (Oct","author":"Wilson Andrew\u00a0Gordon","year":"2015","unstructured":"Andrew\u00a0Gordon Wilson , Christoph Dann , Christopher\u00a0 G. Lucas , and Eric\u00a0 P. Xing . 2015. The Human Kernel. arXiv:1510.07389 [cs, stat] (Oct . 2015 ). http:\/\/arxiv.org\/abs\/1510.07389 arXiv:1510.07389. Andrew\u00a0Gordon Wilson, Christoph Dann, Christopher\u00a0G. Lucas, and Eric\u00a0P. Xing. 2015. The Human Kernel. arXiv:1510.07389 [cs, stat] (Oct. 2015). http:\/\/arxiv.org\/abs\/1510.07389 arXiv:1510.07389."},{"key":"e_1_3_2_1_41_1","unstructured":"Qian Yang. [n.d.]. The Role of Design in Creating Machine-Learning-Enhanced User Experience. ([n.\u00a0d.]) 6.  Qian Yang. [n.d.]. The Role of Design in Creating Machine-Learning-Enhanced User Experience. ([n.\u00a0d.]) 6."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-015-2794-z"}],"event":{"name":"IMX '21: ACM International Conference on Interactive Media Experiences","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGMM ACM Special Interest Group on Multimedia","SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Virtual Event USA","acronym":"IMX '21"},"container-title":["ACM International Conference on Interactive Media Experiences"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452918.3458792","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452918.3458792","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:06Z","timestamp":1750193226000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452918.3458792"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,21]]},"references-count":41,"alternative-id":["10.1145\/3452918.3458792","10.1145\/3452918"],"URL":"https:\/\/doi.org\/10.1145\/3452918.3458792","relation":{},"subject":[],"published":{"date-parts":[[2021,6,21]]},"assertion":[{"value":"2021-06-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}