{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:32:09Z","timestamp":1767339129660,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":62,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413775","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T12:27:38Z","timestamp":1602505658000},"page":"3136-3145","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":51,"title":["HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation"],"prefix":"10.1145","author":[{"given":"Lin","family":"Huang","sequence":"first","affiliation":[{"name":"University at Buffalo, SUNY, Buffalo, NY, USA"}]},{"given":"Jianchao","family":"Tan","sequence":"additional","affiliation":[{"name":"Kwai Inc., Seattle, WA, USA"}]},{"given":"Jingjing","family":"Meng","sequence":"additional","affiliation":[{"name":"University at Buffalo, SUNY, Buffalo, NY, USA"}]},{"given":"Ji","family":"Liu","sequence":"additional","affiliation":[{"name":"Kwai Inc., Seattle, WA, USA"}]},{"given":"Junsong","family":"Yuan","sequence":"additional","affiliation":[{"name":"University at Buffalo, SUNY, Buffalo, NY, USA"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Neural machine translation by jointly learning to align and translate. 
arXiv preprint arXiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_46"},{"key":"e_1_3_2_2_3_1","volume-title":"Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203","author":"Bruna Joan","year":"2013","unstructured":"Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_41"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00236"},{"key":"e_1_3_2_2_6_1","volume-title":"et al.","author":"Chang Angel X","year":"2015","unstructured":"Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. 2015. Shapenet: An information-rich 3d model repository. 
arXiv preprint arXiv:1512.03012 (2015)."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.339"},{"key":"e_1_3_2_2_8_1","unstructured":"Micha\u00ebl Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems. 3844--3852."},{"key":"e_1_3_2_2_9_1","volume-title":"HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation. arXiv preprint arXiv:2004.00060","author":"Doosti Bardia","year":"2020","unstructured":"Bardia Doosti, Shujon Naha, Majid Mirbagheri, and David Crandall. 2020. HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation. arXiv preprint arXiv:2004.00060 (2020)."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00050"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00878"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.602"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01109"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_29"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2017.58"},{"key":"e_1_3_2_2_17_1","volume-title":"Victor OK Li, and Richard Socher","author":"Gu Jiatao","year":"2017","unstructured":"Jiatao Gu, James Bradbury, Caiming Xiong, Victor OK Li, and Richard Socher. 2017. 
Non-autoregressive neural machine translation. arXiv preprint arXiv:1711.02281 (2017)."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013723"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459282"},{"key":"e_1_3_2_2_20_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","volume":"3","author":"Hampali Shreyas","year":"2019","unstructured":"Shreyas Hampali, Markus Oberweger, Mahdi Rad, and Vincent Lepetit. 2019. Honnotate: A method for 3d annotation of hand and objects poses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 3. 6."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01208"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_23_1","unstructured":"Yiming He, Wei Hu, Siyuan Yang, Xiaochao Qu, Pengfei Wan, and Zongming Guo. 2019. 3D Hand Pose Estimation in the Wild via Graph Refinement under Adversarial Learning. arXiv."},{"key":"e_1_3_2_2_24_1","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Iqbal Umar","year":"2018","unstructured":"Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, and Jan Kautz. 2018. Hand pose estimation via latent 2.5D heatmap regression. 
In Proceedings of the European Conference on Computer Vision."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.169"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.336"},{"key":"e_1_3_2_2_27_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_28_1","volume-title":"Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907","author":"Kipf Thomas N","year":"2016","unstructured":"Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01790-3_1"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00013"},{"key":"e_1_3_2_2_31_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision Workshops","author":"Mueller Franziska","year":"2017","unstructured":"
Franziska Mueller, Dushyant Mehta, Oleksandr Sotnychenko, Srinath Sridhar, Dan Casas, and Christian Theobalt. 2017. Real-time hand tracking under occlusion from an egocentric rgb-d sensor. In Proceedings of the IEEE International Conference on Computer Vision Workshops."},{"key":"e_1_3_2_2_32_1","unstructured":"Markus Oberweger and Vincent Lepetit. 2017. DeepPrior"},{"volume-title":"Proceedings of the IEEE international conference on computer vision Workshops","key":"e_1_3_2_2_33_1","unstructured":": Improving Fast and Accurate 3D Hand Pose Estimation. In Proceedings of the IEEE international conference on computer vision Workshops."},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.379"},{"key":"e_1_3_2_2_35_1","volume-title":"Generalized feedback loop for joint hand-object pose estimation","author":"Oberweger Markus","year":"2019","unstructured":"Markus Oberweger, Paul Wohlhart, and Vincent Lepetit. 2019. Generalized feedback loop for joint hand-object pose estimation. 
IEEE transactions on pattern analysis and machine intelligence (2019)."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126483"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.29.123"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00469"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.413"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509753"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130883"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.494"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00017"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_19"},{"key":"e_1_3_2_2_45_1","unstructured":"Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, and Zhihong Deng. 2019. Fast Structured Decoding for Sequence Models. In Advances in Neural Information Processing Systems. 3011--3020."},{"key":"e_1_3_2_2_46_1","unstructured":"Bugra Tekin, Federica Bogo, and Marc Pollefeys. 2019. H"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"O","key":"e_1_3_2_2_47_1","unstructured":"O: Unified egocentric recognition of 3D hand-object poses and interactions. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00038"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629500"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_30"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0895-4"},{"key":"e_1_3_2_2_52_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00540"},{"key":"e_1_3_2_2_54_1","volume-title":"2019 a. 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints. arXiv preprint arXiv:1910.10750","author":"Wang Chen","year":"2019","unstructured":"Chen Wang, Roberto Mart\u00edn-Mart\u00edn, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, and Yuke Zhu. 2019a. 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints. arXiv preprint arXiv:1910.10750 (2019)."},{"key":"e_1_3_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00275"},{"key":"e_1_3_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33015377"},{"key":"e_1_3_2_2_57_1","volume-title":"Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. 
arXiv preprint arXiv:1711.00199","author":"Xiang Yu","year":"2017","unstructured":"Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. 2017. Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017)."},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01011"},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00244"},{"key":"e_1_3_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49409-8_17"},{"key":"e_1_3_2_2_61_1","unstructured":"Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, and Yichen Wei. 2016b. Model-based Deep Hand Pose Estimation. 
In arXiv preprint arXiv:1606.06854 ."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.525"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seattle WA USA","acronym":"MM '20"},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413775","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413775","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:17Z","timestamp":1750197677000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413775"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":62,"alternative-id":["10.1145\/3394171.3413775","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3413775","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}