{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T02:24:13Z","timestamp":1772591053210,"version":"3.50.1"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"CSCW2","license":[{"start":{"date-parts":[[2022,11,7]],"date-time":"2022-11-07T00:00:00Z","timestamp":1667779200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Hong Kong Research Grants Council - ECS","award":["CityU 21209419"],"award-info":[{"award-number":["CityU 21209419"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2022,11,7]]},"abstract":"<jats:p>Voice dictation is increasingly used for text entry, especially in mobile scenarios. However, the speech-based experience gets disrupted when users must go back to a screen and keyboard to review and edit the text. While existing dictation systems focus on improving transcription and error correction, little is known about how to support speech input for the entire text creation process, including composition, reviewing and editing. We conducted an experiment in which ten pairs of participants took on the roles of authors and typists to work on a text authoring task. 
By analysing the natural language patterns of both authors and typists, we identified new challenges and opportunities for the design of future dictation interfaces, including the ambiguity of human dictation, the differences between audio-only and with screen, and various passive and active assistance that can potentially be provided by future systems.<\/jats:p>","DOI":"10.1145\/3555758","type":"journal-article","created":{"date-parts":[[2022,11,11]],"date-time":"2022-11-11T22:59:06Z","timestamp":1668207546000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring"],"prefix":"10.1145","volume":"6","author":[{"given":"Can","family":"Liu","sequence":"first","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siying","family":"Hu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li","family":"Feng","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingming","family":"Fan","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,11]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1216295.1216339"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.429"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2513383.2513440"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1037\/0894-4105.8.4.485"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/632716.632859"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of Participatory Design Conference (PDC","author":"Brandt Eva","year":"2000","unstructured":"Eva Brandt and Camilla Grunnet. 2000. Evoking the future: Drama and props in user centered design. In Proceedings of Participatory Design Conference (PDC 2000). 11--20."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2994310.2994315"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1097\/NNA.0b013e3181ae94f8"},{"key":"e_1_2_2_9_1","volume-title":"State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 4774--4778","author":"Chiu C.","year":"2018","unstructured":"C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. 
Kannan, R. J. Weiss, K. Rao, E. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani. 2018. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 4774--4778. https:\/\/doi.org\/10.1109\/ICASSP.2018.8462105"},{"key":"e_1_2_2_10_1","volume-title":"Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics","author":"Mark","unstructured":"Mark G. Core and Lenhart K. Schubert. 1999. A Syntactic Framework for Speech Repairs and Other Disruptions. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (College Park, Maryland) (ACL '99). Association for Computational Linguistics, USA, 413--420. 
https:\/\/doi.org\/10.3115\/1034678.1034742"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.2307\/1511284"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472749.3474795"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376861"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/358916.358947"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173977"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376173"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376173"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3390889"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3390889"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2205597"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/572020.572022"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242611"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/302979.303160"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/302979.303160"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICKS.2007.13"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053893"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208386"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300727"},{"key":"e_1_2_2_29_1","volume-title":"Steering the conversation: a linguistic exploration of natural language interactions with a digital assistant during simulated driving. Applied ergonomics 63","author":"Large David R","year":"2017","unstructured":"David R Large, Leigh Clark, Annie Quandt, Gary Burnett, and Lee Skrypchuk. 2017. 
Steering the conversation: a linguistic exploration of natural language interactions with a digital assistant during simulated driving. Applied ergonomics 63 (2017), 53--61."},{"key":"e_1_2_2_30_1","volume-title":"Producing spoken language. The neurocognition of language","author":"Levelt W","year":"1999","unstructured":"W Levelt. 1999. Producing spoken language. The neurocognition of language (1999), 83--122."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.878255"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPC.2003.819640"},{"key":"e_1_2_2_33_1","volume-title":"Use of role-play and gamification in a software project course. In 2017 IEEE frontiers in education conference (FIE)","author":"Maxim Bruce R","unstructured":"Bruce R Maxim, Stein Brunvand, and Adrienne Decker. 2017. Use of role-play and gamification in a software project course. In 2017 IEEE frontiers in education conference (FIE). IEEE, 1--5."},{"key":"e_1_2_2_34_1","volume-title":"Role play in HCI studies. HCI Educators 2009-playing with our education","author":"Moroz-Lapin Kristina","year":"2009","unstructured":"Kristina Moroz-Lapin. 2009. Role play in HCI studies. 
HCI Educators 2009-playing with our education (2009), 64--67."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.3115\/981574.981581"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858169"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-86"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10278-007-9039-2"},{"key":"e_1_2_2_39_1","volume-title":"Proceedings of the Hawaii International Conference on System Sciences","volume":"25","author":"Posner Ilona R","year":"1993","unstructured":"Ilona R Posner, Ronald M Baecker, and M Mantei. 1993. How people write together. In Proceedings of the Hawaii International Conference on System Sciences, Vol. 25. IEEE INSTITUTE OF ELECTRICAL AND ELECTRONICS, 127--127."},{"key":"e_1_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Mark Riedl, Jonathan Rowe, and David K Elson. 2008. Toward intelligent support of authoring machinima media content: story and visualization. (2008).","DOI":"10.4108\/ICST.INTETAIN2008.2473"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11023-010-9210-2"},{"key":"e_1_2_2_42_1","doi-asserted-by":"crossref","unstructured":"Hasim Sak, Andrew Senior, and Fran\u00e7oise Beaufays. 2014. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition. 
arXiv:1402.1128 [cs.NE]","DOI":"10.21437\/Interspeech.2014-80"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1093\/iwc\/iwu016"},{"key":"e_1_2_2_44_1","volume-title":"speech-based navigation during dictation: Difficulties, consequences, and solutions. Human-computer interaction 18, 3","author":"Sears Andrew","year":"2003","unstructured":"Andrew Sears, Jinhuan Feng, Kwesi Oseitutu, and Claire-Marie Karat. 2003. Hands-free, speech-based navigation during dictation: Difficulties, consequences, and solutions. Human-computer interaction 18, 3 (2003), 229--257."},{"key":"e_1_2_2_45_1","volume-title":"satisfaction, and interaction strategies of individuals with spinal cord injuries and traditional users interacting with speech recognition software. Universal Access in the information Society 1, 1","author":"Sears Andrew","year":"2001","unstructured":"Andrew Sears, Clare-Marie Karat, Kwesi Oseitutu, Azfar Karimullah, and Jinjuan Feng. 2001. Productivity, satisfaction, and interaction strategies of individuals with spinal cord injuries and traditional users interacting with speech recognition software. 
Universal Access in the information Society 1, 1 (2001), 4--15."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1182475.1182499"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/765891.766123"},{"key":"e_1_2_2_48_1","volume-title":"Applied conversation analysis","author":"Stokoe Elizabeth","unstructured":"Elizabeth Stokoe. 2011. Simulated interaction and communication skills training: The 'conversation-analytic role-play method'. In Applied conversation analysis. Springer, 119--139."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/371127.371166"},{"key":"e_1_2_2_50_1","volume-title":"Multimodal error correction for speech user interfaces. ACM transactions on computer-human interaction (TOCHI) 8, 1","author":"Suhm Bernhard","year":"2001","unstructured":"Bernhard Suhm, Brad Myers, and Alex Waibel. 2001. Multimodal error correction for speech user interfaces. 
ACM transactions on computer-human interaction (TOCHI) 8, 1 (2001), 60--98."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.21437\/Eurospeech.1997-473"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/985692.985753"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858108"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0007125000121415"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1177\/1460458209337429"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-73105-4_29"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3134742"},{"key":"e_1_2_2_58_1","volume-title":"Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones. In The 34th Annual ACM Symposium on User Interface Software and Technology. 162--178","author":"Zhao Maozheng","year":"2021","unstructured":"Maozheng Zhao, Wenzhe Cui, IV Ramakrishnan, Shumin Zhai, and Xiaojun Bi. 2021. Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones. In The 34th Annual ACM Symposium on User Interface Software and Technology. 
162--178."}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555758","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3555758","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:19Z","timestamp":1750182559000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555758"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,7]]},"references-count":58,"journal-issue":{"issue":"CSCW2","published-print":{"date-parts":[[2022,11,7]]}},"alternative-id":["10.1145\/3555758"],"URL":"https:\/\/doi.org\/10.1145\/3555758","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,7]]},"assertion":[{"value":"2022-11-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}