{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T23:16:05Z","timestamp":1780960565216,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":50,"publisher":"ACM","funder":[{"name":"the Research Council of Norway","award":["326907"],"award-info":[{"award-number":["326907"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,3,23]]},"DOI":"10.1145\/3742413.3789074","type":"proceedings-article","created":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T11:32:24Z","timestamp":1772537544000},"page":"1742-1759","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["\"Same Voice, Different Language\": An Exploration of Voice-Cloned Translation to Support Non-Native Speakers in Online Meetings"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8398-4118","authenticated-orcid":false,"given":"Yong","family":"Ma","sequence":"first","affiliation":[{"name":"University of Bergen, Bergen, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1804-6296","authenticated-orcid":false,"given":"Yuchong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Division of Robotics, Perception and Learning, KTH Royal Institute of Technology, Stockholm, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2853-2569","authenticated-orcid":false,"given":"Peter","family":"Andrews","sequence":"additional","affiliation":[{"name":"MediaFutures, t2i lab, University of Bergen, Bergen, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5300-047X","authenticated-orcid":false,"given":"Zhikun","family":"Wu","sequence":"additional","affiliation":[{"name":"Division of Media and Information Technology, Link\u00f6ping University, Norrk\u00f6ping, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2459-1949","authenticated-orcid":false,"given":"Stephanie","family":"Zubicueta Portales","sequence":"additional","affiliation":[{"name":"Norwegian University of Science and Technology, Trondheim, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9562-5147","authenticated-orcid":false,"given":"Morten","family":"Fjeld","sequence":"additional","affiliation":[{"name":"MediaFutures, t2i Lab, University of Bergen, Bergen, Norway and t2i lab, CSE, Chalmers University of Technology, Gothenburg, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,3,22]]},"reference":[{"key":"e_1_3_3_3_2_2","doi-asserted-by":"crossref","unstructured":"2025. Understanding voice naturalness. Trends in Cognitive Sciences 29 5 (2025) 467\u2013480.","DOI":"10.1016\/j.tics.2025.01.010"},{"key":"e_1_3_3_3_3_2","volume-title":"Advances in Neural Information Processing Systems","author":"Arik Sercan","year":"2018","unstructured":"Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, and Yanqi Zhou. 2018. Neural Voice Cloning with a Few Samples. In Advances in Neural Information Processing Systems , S.\u00a0Bengio, H.\u00a0Wallach, H.\u00a0Larochelle, K.\u00a0Grauman, N.\u00a0Cesa-Bianchi, and R.\u00a0Garnett (Eds.), Vol.\u00a031. Curran Associates, Inc."},{"key":"e_1_3_3_3_4_2","unstructured":"Lo\u00efc Barrault Yu-An Chung Mariano\u00a0Coria Meglioli David Dale Ning Dong Mark Duppenthaler Paul-Ambroise Duquenne Brian Ellis Hady Elsahar Justin Haaheim et\u00a0al. 2023. Seamless: Multilingual Expressive and Streaming Speech Translation."},{"key":"e_1_3_3_3_5_2","doi-asserted-by":"crossref","unstructured":"Dennis Becker Lukas Braach Lennart Clasmeier Teresa Kaufmann Oskar Ong Kyra Ahrens Connor G\u00e4de Erik Strahl Di Fu and Stefan Wermter. 2025. Influence of Robots\u2019 Voice Naturalness on Trust and Compliance. J. Hum.-Robot Interact. 14 2 Article 29 (Jan. 2025) 25\u00a0pages.","DOI":"10.1145\/3706066"},{"key":"e_1_3_3_3_6_2","doi-asserted-by":"crossref","unstructured":"Nancy\u00a0H. Brinson Steven Holiday and Jessica\u00a0L. George. 2024. Response to Advertising Delivered by Voice Assistants: The Mediating Role of Persuasion Knowledge Perceived Control Social Presence and Privacy Concerns. Journal of Interactive Advertising 24 4 (2024) 344\u2013367.","DOI":"10.1080\/15252019.2024.2391381"},{"key":"e_1_3_3_3_7_2","unstructured":"Leo Chadburn. 2023. Captions characters self-portraits: compositional approaches to the disembodied speaking voice and the voice-text-music relationship. Ph.\u00a0D. Dissertation. City University of London."},{"key":"e_1_3_3_3_8_2","unstructured":"Sanyuan Chen Shujie LIU Long Zhou Eric Liu Xu Tan Jinyu Li sheng zhao Yao Qian and Furu Wei. 2025. VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers. https:\/\/openreview.net\/forum?id=0bcRCD7YUx"},{"key":"e_1_3_3_3_9_2","unstructured":"Sanyuan Chen Shujie Liu Long Zhou Yanqing Liu Xu Tan Jinyu Li Sheng Zhao Yao Qian and Furu Wei. 2024. Vall-e 2: Neural codec language models are human parity zero-shot text to speech synthesizers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.05370 (2024)."},{"key":"e_1_3_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i16.29733"},{"key":"e_1_3_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2531602.2531702"},{"key":"e_1_3_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2675133.2675197"},{"key":"e_1_3_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702498"},{"key":"e_1_3_3_3_14_2","unstructured":"Genesis\u00a0Gregorious Genelza. 2024. A systematic literature review on AI voice cloning generator: A game-changer or a threat?Journal of Emerging Technologies 4 2 (2024) 54\u201361."},{"key":"e_1_3_3_3_15_2","doi-asserted-by":"crossref","unstructured":"Morton\u00a0Ann Gernsbacher. 2015. Video captions benefit everyone. Policy insights from the behavioral and brain sciences 2 1 (2015) 195\u2013202.","DOI":"10.1177\/2372732215602130"},{"key":"e_1_3_3_3_16_2","doi-asserted-by":"crossref","unstructured":"Marion Hersh Barbara Leporini and Marina Buzzi. 2024. A comparative study of disabled people\u2019s experiences with the video conferencing tools Zoom MS Teams Google Meet and Skype. Behaviour & Information Technology 43 15 (2024) 3777\u20133796.","DOI":"10.1080\/0144929X.2023.2286533"},{"key":"e_1_3_3_3_17_2","first-page":"10120","volume-title":"International conference on machine learning","author":"Jia Ye","year":"2022","unstructured":"Ye Jia, Michelle\u00a0Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. 2022. Translatotron 2: High-quality direct speech-to-speech translation with voice preservation. In International conference on machine learning. PMLR, 10120\u201310134."},{"key":"e_1_3_3_3_18_2","unstructured":"Ye Jia Yu Zhang Ron Weiss Quan Wang Jonathan Shen Fei Ren Patrick Nguyen Ruoming Pang Ignacio Lopez\u00a0Moreno Yonghui Wu et\u00a0al. 2018. Transfer learning from speaker verification to multispeaker text-to-speech synthesis. Advances in neural information processing systems 31 (2018)."},{"key":"e_1_3_3_3_19_2","first-page":"2348","volume-title":"Interspeech 2022","author":"Cotescu Kamil Deja and Ariadna Sanchez and Julian Roth and Marius","year":"2022","unstructured":"Kamil Deja and Ariadna Sanchez and Julian Roth and Marius Cotescu. 2022. Automatic Evaluation of Speaker Similarity. In Interspeech 2022. International Speech Communication Association (ISCA), 2348\u20132352."},{"key":"e_1_3_3_3_20_2","doi-asserted-by":"crossref","unstructured":"Victor Kenji Anthony\u00a0J. Lee Daria Altenburg David\u00a0R. Feinberg and Benedict\u00a0C. Jones. 2022. The Role of Valence Dominance and Pitch in Perceptions of Artificial Intelligence (AI) Conversational Agents\u2019 Voices. Scientific Reports 12 1 (2022) 22479.","DOI":"10.1038\/s41598-022-27124-8"},{"key":"e_1_3_3_3_21_2","doi-asserted-by":"crossref","unstructured":"Katharina K\u00fchne Martin\u00a0H. Fischer and Yuefang Zhou. 2020. The Human Takes It All: Humanlike Synthesized Voices Are Perceived as Less Eerie and More Likable. Evidence From a Subjective Ratings Study. Frontiers in Neurorobotics 14 (2020).","DOI":"10.3389\/fnbot.2020.593732"},{"key":"e_1_3_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2461121.2461142"},{"key":"e_1_3_3_3_23_2","doi-asserted-by":"crossref","unstructured":"Eun-Ju Lee. 2003. Effects of \u201cgender\u201d of the computer on informational social influence: the moderating role of task type. International Journal of Human-Computer Studies 58 4 (2003) 347\u2013362.","DOI":"10.1016\/S1071-5819(03)00009-0"},{"key":"e_1_3_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/633292.633461"},{"key":"e_1_3_3_3_25_2","doi-asserted-by":"crossref","unstructured":"Sheng Li Zhiqiang Tao Kang Li and Yun Fu. 2019. Visual to text: Survey of image and video captioning. IEEE Transactions on Emerging Topics in Computational Intelligence 3 4 (2019) 297\u2013312.","DOI":"10.1109\/TETCI.2019.2892755"},{"key":"e_1_3_3_3_26_2","doi-asserted-by":"crossref","unstructured":"Marco Matassoni Seraphina Fong and Alessio Brutti. 2024. Speaker Anonymization: Disentangling Speaker Features from Pre-Trained Speech Embeddings for Voice Conversion. Applied Sciences 14 9 (2024).","DOI":"10.3390\/app14093876"},{"key":"e_1_3_3_3_27_2","doi-asserted-by":"crossref","unstructured":"Liz McCarron. 2021. Creating accessible videos: Captions and transcripts. Communications of the Association for Information Systems 48 1 (2021) 19.","DOI":"10.17705\/1CAIS.04819"},{"key":"e_1_3_3_3_28_2","doi-asserted-by":"crossref","unstructured":"Oksana Novytska Hlib Romanchuk Oleksii Vorobets Uliana Zhornokui Liubov Slyvka and Valerii Bohdan. 2025. Translation of Subtitles: Neurolinguistic and Cognitive Aspects. BRAIN. Broad Research in Artificial Intelligence and Neuroscience 16 1 (2025) 229\u2013242.","DOI":"10.70594\/brain\/16.1\/17"},{"key":"e_1_3_3_3_29_2","doi-asserted-by":"crossref","unstructured":"Gary\u00a0M Olson Judith\u00a0S Olson Mark\u00a0R Carter and Marianne Storrosten. 1992. Small group design meetings: An analysis of collaboration. Human\u2013Computer Interaction 7 4 (1992) 347\u2013374.","DOI":"10.1207\/s15327051hci0704_1"},{"key":"e_1_3_3_3_30_2","doi-asserted-by":"crossref","unstructured":"Laura Orynbay Bibigul Razakhova Peter Peer Bla\u017e Meden and \u017diga Emer\u0161i\u010d. 2024. Recent advances in synthesis and interaction of speech text and vision. Electronics 13 9 (2024) 1726.","DOI":"10.3390\/electronics13091726"},{"key":"e_1_3_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.745"},{"key":"e_1_3_3_3_32_2","first-page":"349","volume-title":"International Conference on Information Technology","author":"Patel Abhijeet\u00a0Kumar","year":"2024","unstructured":"Abhijeet\u00a0Kumar Patel, Hardik Madnani, Sambhav Tripathi, Purushottam Sharma, and Vinod\u00a0Kumar Shukla. 2024. Real-Time Voice Cloning: Artificial Intelligence to Clone and Generate Human Voice. In International Conference on Information Technology. Springer, 349\u2013364."},{"key":"e_1_3_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640794.3665545"},{"key":"e_1_3_3_3_34_2","doi-asserted-by":"crossref","unstructured":"Victor Rosi Emma Soopramanien and Carolyn McGettigan. 2025. Perception and social evaluation of cloned and recorded voices: Effects of familiarity and self-relevance. Computers in Human Behavior: Artificial Humans 4 (2025) 100143.","DOI":"10.1016\/j.chbah.2025.100143"},{"key":"e_1_3_3_3_35_2","unstructured":"Mohammad Sarim Saim Shakeel Laeeba Javed Mohammad Nadeem et\u00a0al. 2025. Direct Speech to Speech Translation: A Review. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.04799 (2025)."},{"key":"e_1_3_3_3_36_2","doi-asserted-by":"crossref","unstructured":"Dave Sayers Rui Sousa-Silva Sviatlana H\u00f6hn Lule Ahmedi Kais Allkivi-Metsoja Dimitra Anastasiou \u0160tefan Be\u0148u\u0161 Lynne Bowker Eliot Byty\u00e7i Alejandro Catala et\u00a0al. 2021. The Dawn of the Human-Machine Era: A forecast of new and emerging language technologies. (2021).","DOI":"10.17011\/jyx\/reports\/20210518\/1"},{"key":"e_1_3_3_3_37_2","unstructured":"Scott Schanke Gordon Burtch and Gautam Ray. 2024. Digital lyrebirds: Experimental evidence that voice-based deep fakes influence trust. Management Science (2024)."},{"key":"e_1_3_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501843"},{"key":"e_1_3_3_3_39_2","doi-asserted-by":"crossref","unstructured":"Clay Spinuzzi. 2012. Working alone together: Coworking as emergent collaborative activity. Journal of business and technical communication 26 4 (2012) 399\u2013441.","DOI":"10.1177\/1050651912444070"},{"key":"e_1_3_3_3_40_2","unstructured":"Giselle Spiteri\u00a0Miggiani. 2024. Quality assessment tools for studio and AI-generated dubs and voice-overs. (2024)."},{"key":"e_1_3_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1958824.1958860"},{"key":"e_1_3_3_3_42_2","unstructured":"Sunday\u00a0David Ubur. 2025. Augmenting Captions with Emotional Cues: An AR Interface for Real-Time Accessible Communication. arxiv:https:\/\/arXiv.org\/abs\/2504.17171\u00a0[cs.HC] https:\/\/arxiv.org\/abs\/2504.17171"},{"key":"e_1_3_3_3_43_2","unstructured":"Vimal\u00a0Kumar Vishwakarma. 2023. Translating cultural nuances: Challenges and strategies. ELT Voices 13 2 (2023) 8268531."},{"key":"e_1_3_3_3_44_2","unstructured":"Haldun Vural. 2025. TRANSLATION-FOCUSED TECHNOLOGICAL COMPETENCE: TRADITION AND INNOVATION. Cumhuriyet \u00dcniversitesi Fen-Edebiyat Fak\u00fcltesi Sosyal Bilimler Dergisi 49 1 (2025) 85\u201395."},{"key":"e_1_3_3_3_45_2","unstructured":"Chengyi Wang Sanyuan Chen Yu Wu Ziqiang Zhang Long Zhou Shujie Liu Zhuo Chen Yanqing Liu Huaming Wang Jinyu Li et\u00a0al. 2023. Neural codec language models are zero-shot text to speech synthesizers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2301.02111 (2023)."},{"key":"e_1_3_3_3_46_2","unstructured":"Mark West Rebecca Kraut and Han Ei\u00a0Chew. 2019. I\u2019d blush if I could: closing gender divides in digital skills through education. (2019) 1\u2013145."},{"key":"e_1_3_3_3_47_2","doi-asserted-by":"crossref","unstructured":"Kevin\u00a0JP Woods Max\u00a0H Siegel James Traer and Josh\u00a0H McDermott. 2017. Headphone screening to facilitate web-based auditory experiments. Attention Perception & Psychophysics 79 7 (2017) 2064\u20132072.","DOI":"10.3758\/s13414-017-1361-2"},{"key":"e_1_3_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/1180875.1180955"},{"key":"e_1_3_3_3_49_2","doi-asserted-by":"crossref","unstructured":"Taojie Yin. 2025. Has the use of AI-translated live captions in simultaneous interpreting changed the role of the interpreter? A study based on professional interpreters\u2019 perceptions. The Translator 31 2 (2025) 214\u2013231.","DOI":"10.1080\/13556509.2024.2412923"},{"key":"e_1_3_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411763.3451837"},{"key":"e_1_3_3_3_51_2","unstructured":"Ziqiang Zhang Long Zhou Chengyi Wang Yu Wu Shujie Liu Zhuo Chen Yanqing Liu Huaming Wang Jinyu Li Lei He Sheng Zhao and Furu Wei. 2023. Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling."}],"event":{"name":"IUI '26: 31st International Conference on Intelligent User Interfaces","location":"Paphos Cyprus","acronym":"IUI '26","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction","SIGAI ACM Special Interest Group on Artificial Intelligence"]},"container-title":["Proceedings of the 31st International Conference on Intelligent User Interfaces"],"original-title":[],"deposited":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T13:00:29Z","timestamp":1773493229000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3742413.3789074"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,22]]},"references-count":50,"alternative-id":["10.1145\/3742413.3789074","10.1145\/3742413"],"URL":"https:\/\/doi.org\/10.1145\/3742413.3789074","relation":{},"subject":[],"published":{"date-parts":[[2026,3,22]]},"assertion":[{"value":"2026-03-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}