{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T21:37:54Z","timestamp":1773092274626,"version":"3.50.1"},"reference-count":83,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100006374","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62473356,62373061"],"award-info":[{"award-number":["62473356,62373061"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Beijing Natural Science Foundation","award":["L232028"],"award-info":[{"award-number":["L232028"]}]},{"DOI":"10.13039\/501100006374","name":"National Science and Technology Major Project","doi-asserted-by":"publisher","award":["2022ZD0117904"],"award-info":[{"award-number":["2022ZD0117904"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2025,6,9]]},"abstract":"<jats:p>In the realm of VR\/MR interactions, gestures serve as a critical bridge between users and interfaces, with custom gestures enhancing creativity and providing a personalized immersive experience. We introduce a novel gesture definition and recognition framework that allows users to customize a wide array of gestures by demonstrating them just three times. A major challenge lies in effectively representing gestures computationally. To address this, we have pre-trained a hand posture representation model using a Vector Quantized Variational Autoencoder (VQ-VAE) with a codebook of adaptive size, allowing hand postures defined by 23 joint positions of the hand to be projected into a latent space. In this space, different postures are formed into clusters, and a testing posture can be assigned to a cluster by a specific distance metric. The dynamic gestures are then represented as sequences of discrete hand postures and wrist positions. Employing a straightforward sequence matching algorithm, our framework achieves highly efficient recognition with minimal computational demands. We evaluated this system through a user study that includes 16 pre-defined gestures and 106 user-defined gestures. The results confirm that our system can provide robust real-time gesture recognition and effectively supports the customization of gestures according to user preferences. Our approach surpasses previous methods by enhancing gesture diversity and reducing constraints on gesture customization. Project page: https:\/\/iscas3dv.github.io\/GestureBuilder\/.<\/jats:p>","DOI":"10.1145\/3729484","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:21:56Z","timestamp":1750281716000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Gesture Builder: Flexible Gesture Customization and Efficient Recognition on VR Devices"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4221-5820","authenticated-orcid":false,"given":"Yang","family":"Zou","sequence":"first","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0160-1329","authenticated-orcid":false,"given":"Yanguang","family":"Wan","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5854-3724","authenticated-orcid":false,"given":"Yonghao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8812-910X","authenticated-orcid":false,"given":"Xiao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2567-1236","authenticated-orcid":false,"given":"Jian","family":"Cheng","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Beijing, China, School of Computer Science and Technology and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3999-7429","authenticated-orcid":false,"given":"Cuixia","family":"Ma","sequence":"additional","affiliation":[{"name":"Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2591-7993","authenticated-orcid":false,"given":"Chun","family":"Yu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8576-1201","authenticated-orcid":false,"given":"Xiaoming","family":"Deng","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9320-4920","authenticated-orcid":false,"given":"Hongan","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2023. Sign Language Alphabets From Around The World. https:\/\/www.ai-media.tv\/knowledge-hub\/insights\/sign-language-alphabets\/."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01916"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2470654.2466158"},{"key":"e_1_2_2_4_1","volume-title":"Proceedings of Graphics Interface","author":"Anthony Lisa","year":"2010","unstructured":"Lisa Anthony and Jacob O Wobbrock. 2010. A lightweight multistroke recognizer for user interface prototypes. In Proceedings of Graphics Interface 2010. 245--252."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347942"},{"key":"e_1_2_2_6_1","first-page":"1027","volume-title":"SODA","volume":"7","author":"Arthur David","year":"2007","unstructured":"David Arthur, Sergei Vassilvitskii, et al. 2007. k-means++: The advantages of careful seeding. In SODA, Vol. 7.1027-1035."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3131277.3132185"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1080\/10447310802205776"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1449715.1449724"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1936652.1936657"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2166966.2166981"},{"key":"e_1_2_2_12_1","volume-title":"International Scientific-Technical conference MANUFACTURING. Springer, 317--325","author":"Bu\u0144 Pawe\u0142","year":"2022","unstructured":"Pawe\u0142 Bu\u0144, Jozef Hus\u00e1r, and Jakub Ka\u0161\u010dak. 2022. Hand tracking in extended reality educational applications. In International Scientific-Technical conference MANUFACTURING. Springer, 317--325."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35467-0_17"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/VR.2018.8446320"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858589"},{"key":"e_1_2_2_16_1","volume-title":"Computer Graphics Forum","author":"Chen Minchan","unstructured":"Minchan Chen and Manfred Lau. 2021. A Motion-guided Interface for Modeling 3D Multi-functional Furniture. In Computer Graphics Forum, Vol. 40. Wiley Online Library, 229--240."},{"key":"e_1_2_2_17_1","volume-title":"Construct dynamic graphs for hand gesture recognition via spatial-temporal attention. arXiv preprint arXiv:1907.08871","author":"Chen Yuxiao","year":"2019","unstructured":"Yuxiao Chen, Long Zhao, Xi Peng, Jianbo Yuan, and Dimitris N Metaxas. 2019. Construct dynamic graphs for hand gesture recognition via spatial-temporal attention. arXiv preprint arXiv:1907.08871 (2019)."},{"key":"e_1_2_2_18_1","volume-title":"Proceedings of Workshop on Interacting with Smart Objects. 10--15","author":"Cremonesi Paolo","year":"2015","unstructured":"Paolo Cremonesi, AD Rienzo, and Franca Garzotto. 2015. Personalized interactive public screens. In Proceedings of Workshop on Interacting with Smart Objects. 10--15."},{"key":"e_1_2_2_19_1","unstructured":"Kalyan Das Jiming Jiang and JNK Rao. 2004. Mean squared error of empirical predictor. (2004)."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2016.153"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3159725"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2018.00025"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/985692.985697"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2015.2432679"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196709.3196737"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/IHCI.2012.6481816"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2669485.2669511"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347916"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2307798.2307804"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3463914.3463921"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2534329.2534366"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983310.2989209"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3131277.3134354"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compind.2013.04.012"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380116.2380139"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3196158"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2557122"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/VR.2011.5759479"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807442.2807489"},{"key":"e_1_2_2_40_1","volume-title":"User-centred process for the definition of free-hand gestures applied to controlling music playback. Multimedia systems 18","author":"L\u00f6cken Andreas","year":"2012","unstructured":"Andreas L\u00f6cken, Tobias Hesselmann, Martin Pielot, Niels Henze, and Susanne Boll. 2012. User-centred process for the definition of free-hand gestures applied to controlling music playback. Multimedia systems 18 (2012), 15--31."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2557263"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208693"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2016.2590470"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00779-015-0844-1"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3012709.3017602"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174121"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126594.3126604"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2003.09.007"},{"key":"e_1_2_2_49_1","volume-title":"An easily customized gesture recognizer for assisted living using commodity mobile devices. Journal of Healthcare Engineering 2018","author":"Mezari Antigoni","year":"2018","unstructured":"Antigoni Mezari and Ilias Maglogiannis. 2018. An easily customized gesture recognizer for assisted living using commodity mobile devices. Journal of Healthcare Engineering 2018 (2018)."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445766"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.456"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2470654.2466142"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.10.033"},{"key":"e_1_2_2_54_1","volume-title":"Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, 889--894","author":"Ong Eng-Jon","year":"2004","unstructured":"Eng-Jon Ong and Richard Bowden. 2004. A boosted classifier tree for hand shape detection. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, 889--894."},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208695"},{"key":"e_1_2_2_56_1","volume-title":"A user-developed 3-D hand gesture set for human-computer interaction. Human factors 57, 4","author":"Pereira Anna","year":"2015","unstructured":"Anna Pereira, Juan P Wachs, Kunwoo Park, and David Rempel. 2015. A user-developed 3-D hand gesture set for human-computer interaction. Human factors 57, 4 (2015), 607--621."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2468356.2468527"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8967649"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2019.8673145"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00963"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1978971"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/2658779.2658787"},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2022.2078464"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208585"},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298683"},{"key":"e_1_2_2_66_1","volume-title":"Chuo-Ling Chang, and Matthias Grundmann.","author":"Sung George","year":"2021","unstructured":"George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, and Matthias Grundmann. 2021. On-device real-time hand gesture recognition. arXiv preprint arXiv:2111.00038 (2021)."},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300426"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/2735952"},{"key":"e_1_2_2_69_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems 30","author":"Den Oord Aaron Van","year":"2017","unstructured":"Aaron Van Den Oord, Oriol Vinyals, et al. 2017. Neural discrete representation learning. Proceedings of Advances in Neural Information Processing Systems 30 (2017)."},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025941"},{"key":"e_1_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2388676.2388732"},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.82"},{"key":"e_1_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357236.3395511"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijhcs.2021.102609"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047269"},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-04221-9_22"},{"key":"e_1_2_2_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9412548"},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501904"},{"key":"e_1_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2016.03.030"},{"key":"e_1_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380116.2380137"},{"key":"e_1_2_2_81_1","volume-title":"Mediapipe hands: On-device real-time hand tracking. arXiv preprint arXiv:2006.10214","author":"Zhang Fan","year":"2020","unstructured":"Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, and Matthias Grundmann. 2020. Mediapipe hands: On-device real-time hand tracking. arXiv preprint arXiv:2006.10214 (2020)."},{"key":"e_1_2_2_82_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-020-01955-w"},{"key":"e_1_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2017.2684186"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729484","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T12:23:07Z","timestamp":1755865387000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729484"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,9]]},"references-count":83,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,9]]}},"alternative-id":["10.1145\/3729484"],"URL":"https:\/\/doi.org\/10.1145\/3729484","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,9]]},"assertion":[{"value":"2025-06-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}