{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T22:47:54Z","timestamp":1779317274704,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T00:00:00Z","timestamp":1624233600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,21]]},"DOI":"10.1145\/3452918.3458806","type":"proceedings-article","created":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T21:29:18Z","timestamp":1624483758000},"page":"144-155","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality"],"prefix":"10.1145","author":[{"given":"Daniele","family":"Giunchi","sequence":"first","affiliation":[{"name":"Computer Science University College London, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alejandro","family":"Sztrajman","sequence":"additional","affiliation":[{"name":"Computer Science University College London, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stuart","family":"James","sequence":"additional","affiliation":[{"name":"Visual Geometry and Modelling Lab Istituto Italiano di Tecnologia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anthony","family":"Steed","sequence":"additional","affiliation":[{"name":"Department of Computer Science University College London, United 
Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,23]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the 4th Eurographics Workshop on Sketch-based Interfaces and Modeling","author":"Adler A.","unstructured":"A. Adler and R. Davis. 2007. Speech and Sketching: An Empirical Study of Multimodal Interaction. In Proceedings of the 4th Eurographics Workshop on Sketch-based Interfaces and Modeling (Riverside, California) (SBIM \u201907). ACM, New York, NY, USA, 83\u201390. https:\/\/doi.org\/10.1145\/1384429.1384449"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281500.1281525"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2009.4960723"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3145534"},{"key":"e_1_3_2_1_5_1","volume-title":"Industry use of virtual reality in product design and manufacturing: a survey. Virtual Reality 21 (09","author":"Berg Leif","year":"2016","unstructured":"Leif Berg and Judy Vance. 2016. Industry use of virtual reality in product design and manufacturing: a survey. Virtual Reality 21 (09 2016). https:\/\/doi.org\/10.1007\/s10055-016-0293-9"},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the International Workshop on Information Presentation and Natural Multimodal Dialogue","author":"Bernsen Niels\u00a0Ole","year":"2001","unstructured":"Niels\u00a0Ole Bernsen and Laila Dybkjaer. 
2001. Exploring Natural Interaction in the Car. In Proceedings of the International Workshop on Information Presentation and Natural Multimodal Dialogue. University of Southern Denmark, Denmark, 75\u201379."},{"key":"e_1_3_2_1_7_1","volume-title":"Retrieved","author":"GRAPHISOFT","year":"2020","unstructured":"GRAPHISOFT (2020). Retrieved October 2020 from https:\/\/graphisoft.com\/solutions\/products\/bimx. GRAPHISOFT."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the 21st International Joint Conference on Artificial Intelligence","author":"Bischel David","year":"2009","unstructured":"David Bischel, Thomas Stahovich, Eric Peterson, Randall Davis, and Aaron Adler. 2009. Combining Speech and Sketch to Interpret Unconstrained Descriptions of Mechanical Devices. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (Pasadena, California, USA) (IJCAI\u201909). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1401\u20131406. http:\/\/dl.acm.org\/citation.cfm?id=1661445.1661670"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/800250.807503"},{"key":"e_1_3_2_1_10_1","unstructured":"LWJ Boves and EA Den\u00a0Os. 2002. 
MUST-Multimodal and multilingual services for small mobile terminals."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cag.2017.12.006"},{"key":"e_1_3_2_1_12_1","volume-title":"SI (March","author":"Cohen R.","year":"1989","unstructured":"P.\u00a0R. Cohen, M. Dalrymple, D.\u00a0B. Moran, F.\u00a0C. Pereira, and J.\u00a0W. Sullivan. 1989. Synergistic Use of Direct Manipulation and Natural Language. SIGCHI Bull. 20, SI (March 1989), 227\u2013233. https:\/\/doi.org\/10.1145\/67450.67494"},{"key":"e_1_3_2_1_13_1","volume-title":"QuickSet: Multimodal Interaction for Distributed Applications. In Fifth ACM International Conference on Multimedia","author":"Cohen R.","year":"1997","unstructured":"Philip\u00a0R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen, and Josh Clow. 1997. QuickSet: Multimodal Interaction for Distributed Applications. In Fifth ACM International Conference on Multimedia (Seattle, Washington, USA) (MULTIMEDIA \u201997). ACM, New York, NY, USA, 31\u201340. 
https:\/\/doi.org\/10.1145\/266180.266328"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376628"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185540"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385956.3418953"},{"key":"e_1_3_2_1_17_1","volume-title":"Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI EA \u201919)","author":"Gasques Danilo","unstructured":"Danilo Gasques, Janet\u00a0G. Johnson, Tommy Sharkey, and Nadir Weibel. 2019. What You Sketch Is What You Get: Quick and Easy Augmented Reality Prototyping with PintAR. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI EA \u201919). Association for Computing Machinery, New York, NY, USA, 1\u20136. https:\/\/doi.org\/10.1145\/3290607.3312847"},{"key":"e_1_3_2_1_18_1","volume-title":"Mixing Realities for Sketch Retrieval in Virtual Reality. In The 17th International Conference on Virtual-Reality Continuum and Its Applications in Industry (Brisbane, QLD, Australia) (VRCAI \u201919)","author":"Giunchi Daniele","year":"2019","unstructured":"Daniele Giunchi, Stuart James, Donald Degraen, and Anthony Steed. 2019. Mixing Realities for Sketch Retrieval in Virtual Reality. In The 17th International Conference on Virtual-Reality Continuum and Its Applications in Industry (Brisbane, QLD, Australia) (VRCAI \u201919). ACM, New York, NY, USA, Article 50, 2\u00a0pages. 
https:\/\/doi.org\/10.1145\/3359997.3365751"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3229147.3229166"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3229147.3229166"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"D. Giunchi S. James and A. Steed. 2018. Model Retrieval by 3D Sketching in Immersive Virtual Reality. In 2018 IEEE VR. IEEE Tuebingen\/Reutlingen Germany 559\u2013560.","DOI":"10.1109\/VR.2018.8446609"},{"key":"e_1_3_2_1_22_1","volume-title":"Retrieved","author":"Google","year":"2020","unstructured":"Google (2020). Retrieved October 2020 from https:\/\/www.tiltbrush.com\/. Google."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2007-183"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the ACL-08: HLT Workshop on Mobile Language Processing. Association for Computational Linguistics","author":"Gruenstein Alexander","year":"2008","unstructured":"Alexander Gruenstein, Bo-June\u00a0Paul Hsu, James Glass, Stephanie Seneff, Lee Hetherington, Scott Cyphers, Ibrahim Badr, Chao Wang, and Sean Liu. 2008. A Multimodal Home Entertainment Interface via a Mobile Device. In Proceedings of the ACL-08: HLT Workshop on Mobile Language Processing. Association for Computational Linguistics, Columbus, Ohio, 1\u20139. 
https:\/\/www.aclweb.org\/anthology\/W08-0801"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the 6th Nordic Signal Processing Symposium, 2004. NORSIG 2004.IEEE","author":"Hansen L.","year":"2004","unstructured":"J.\u00a0H.\u00a0L. Hansen, Rongqing Huang, P. Mangalath, Bowen Zhou, M. Seadle, and J.\u00a0R. Deller. 2004. SPEECHFIND: spoken document retrieval for a national gallery of the spoken word. In Proceedings of the 6th Nordic Signal Processing Symposium, 2004. NORSIG 2004.IEEE, Espoo, Finland, 1\u20134."},{"key":"e_1_3_2_1_26_1","volume-title":"Retrieved","author":"HTC","year":"2020","unstructured":"HTC (2020). Retrieved October 2020 from https:\/\/www.vive.com\/. HTC."},{"key":"e_1_3_2_1_27_1","volume-title":"Score normalization in multimodal biometric systems. Pattern recognition 38, 12","author":"Jain Anil","year":"2005","unstructured":"Anil Jain, Karthik Nandakumar, and Arun Ross. 2005. Score normalization in multimodal biometric systems. 
Pattern recognition 38, 12 (2005), 2270\u20132285."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2668904.2668940"},{"key":"e_1_3_2_1_29_1","volume-title":"Cooperative Interfaces to Information Systems","author":"Janas M.","unstructured":"J\u00fcrgen\u00a0M. Janas. 1986. The Semantics-Based Natural Language Interface to Relational Databases. In Cooperative Interfaces to Information Systems, Leonard Bolc and Matthias Jarke (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 143\u2013188."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1006\/jmca.1993.1022"},{"key":"e_1_3_2_1_31_1","volume-title":"Olivier Pietquin, Stefanos Zafeiriou, and Bj\u00f6rn Schuller.","author":"Keren Gil","year":"2018","unstructured":"Gil Keren, Amr El-Desoky Mousa, Olivier Pietquin, Stefanos Zafeiriou, and Bj\u00f6rn Schuller. 2018. Deep Learning for Multisensorial and Multimodal Interaction. Association for Computing Machinery and Morgan & Claypool, NY, USA, 99\u2013128. https:\/\/doi.org\/10.1145\/3107990.3107996"},{"key":"e_1_3_2_1_32_1","unstructured":"Shinya Kikuchi and Partha Chakroborty. 1992. Car-following model based on fuzzy inference system. 
82\u201382\u00a0pages."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964922"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376160"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3056462.3056474"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2594519"},{"key":"e_1_3_2_1_37_1","volume-title":"Advances in Human Factors and Systems Interaction, Isabel\u00a0L","author":"L\u00f3pez Gustavo","unstructured":"Gustavo L\u00f3pez, Luis Quesada, and Luis\u00a0A. Guerrero. 2018. Alexa vs. Siri vs. Cortana vs. Google Assistant: A Comparison of Speech-Based Natural User Interfaces. In Advances in Human Factors and Systems Interaction, Isabel\u00a0L. Nunes (Ed.). Springer International Publishing, Cham, 241\u2013250."},{"key":"e_1_3_2_1_38_1","unstructured":"Scott McGlashan. 1995. Speech interfaces to virtual reality."},{"key":"e_1_3_2_1_39_1","unstructured":"Scott McGlashan and Tomas Axling. 1996. A speech interface to virtual environments."},{"key":"e_1_3_2_1_40_1","volume-title":"Retrieved","author":"Facebook","year":"2020","unstructured":"Facebook (2020). Retrieved October 2020 from https:\/\/www.oculus.com\/. Facebook."},{"key":"e_1_3_2_1_41_1","volume-title":"Retrieved","author":"Oneirosvr","year":"2020","unstructured":"
Oneirosvr (2020). Retrieved October 2020 from https:\/\/oneirosvr.com. Oneirosvr."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.1998.711845"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343055.3359718"},{"key":"e_1_3_2_1_44_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). open publishing, San Diego, USA, arXiv:1409.1556. http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.114"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379337.3415892"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2006.55"},{"key":"e_1_3_2_1_48_1","volume-title":"Tulshan and Sudhir\u00a0Namdeorao Dhage","author":"S.","year":"2019","unstructured":"Amrita\u00a0S. Tulshan and Sudhir\u00a0Namdeorao Dhage. 2019. 
Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa. In Advances in Signal Processing and Intelligent Recognition Systems, Sabu\u00a0M. Thampi, Oge Marques, Sri Krishnan, Kuan-Ching Li, Domenico Ciuonzo, and Maheshkumar\u00a0H. Kolekar (Eds.). Springer Singapore, Singapore, 190\u2013201."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324908004920"},{"key":"e_1_3_2_1_50_1","volume-title":"Retrieved","author":"Legend Pixel","year":"2020","unstructured":"Pixel Legend (2020). Retrieved October 2020 from https:\/\/virtualist.app\/. Pixel Legend."},{"key":"e_1_3_2_1_51_1","volume-title":"1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.\u00a06. IEEE, IEEE","author":"Vo Minh\u00a0Tue","year":"1996","unstructured":"Minh\u00a0Tue Vo and Cindy Wood. 1996. Building an application framework for speech and pen input integration in multimodal learning interfaces. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.\u00a06. IEEE, IEEE, Atlanta, USA, 3545\u20133548."},{"key":"e_1_3_2_1_52_1","volume-title":"SmartKom: foundations of multimodal dialogue systems. Vol.\u00a012","author":"Wahlster Wolfgang","unstructured":"Wolfgang Wahlster. 2006. SmartKom: foundations of multimodal dialogue systems. Vol.\u00a012. Springer, Germany."},{"key":"e_1_3_2_1_53_1","volume-title":"CVPR. 
IEEE Computer Society","author":"Wu Zhirong","year":"2015","unstructured":"Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In CVPR. IEEE Computer Society, Boston, USA, 1912\u20131920."},{"key":"e_1_3_2_1_54_1","volume-title":"PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019","author":"Yu Fenggen","year":"2019","unstructured":"Fenggen Yu, Kun Liu, Yan Zhang, Chenyang Zhu, and Kai Xu. 2019. PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation \/ IEEE, Long Beach, USA, 9491\u20139500. https:\/\/doi.org\/10.1109\/CVPR.2019.00972"},{"key":"e_1_3_2_1_55_1","volume-title":"Article arXiv:2010.10504 (Oct.","author":"Zhang Yu","year":"2020","unstructured":"Yu Zhang, James Qin, Daniel\u00a0S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc\u00a0V. Le, and Yonghui Wu. 2020. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition. 
arXiv e-prints arXiv:2010.10504, arXiv:2010.10504, Article arXiv:2010.10504 (Oct. 2020), arXiv::2010.10504\u00a0pages. arxiv:2010.10504\u00a0[eess.AS]"},{"key":"e_1_3_2_1_56_1","volume-title":"Intelligent Information Processing II","author":"Zhou Lina","unstructured":"Lina Zhou, Mohammedammar Shaikh, and Dongsong Zhang. 2005. Natural Language Interface to Mobile Devices. In Intelligent Information Processing II, Zhongzhi Shi and Qing He (Eds.). 
Springer US, Boston, MA, 283\u2013286."}],"event":{"name":"IMX '21: ACM International Conference on Interactive Media Experiences","location":"Virtual Event USA","acronym":"IMX '21","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGMM ACM Special Interest Group on Multimedia","SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["ACM International Conference on Interactive Media Experiences"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452918.3458806","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452918.3458806","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:06Z","timestamp":1750193226000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452918.3458806"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,21]]},"references-count":56,"alternative-id":["10.1145\/3452918.3458806","10.1145\/3452918"],"URL":"https:\/\/doi.org\/10.1145\/3452918.3458806","relation":{},"subject":[],"published":{"date-parts":[[2021,6,21]]},"assertion":[{"value":"2021-06-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}