{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:09:16Z","timestamp":1750219756045,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T00:00:00Z","timestamp":1701907200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62302254"],"award-info":[{"award-number":["62302254"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Sen. Netw."],"published-print":{"date-parts":[[2024,1,31]]},"abstract":"<jats:p>\n            Most existing multi-user Augmented Reality (AR) systems only support multiple co-located users to view a common set of virtual objects but lack the ability to enable each user to directly interact with other users appearing in his\/her view. Such multi-user AR systems should be able to detect the human keypoints and estimate device poses (for identifying different users) in the meantime. However, due to the stringent low latency requirements and the intensive computation of the preceding two capabilities, previous research only enables either of the two capabilities for mobile devices even with the aid of the edge server. Integrating the two capabilities is promising but non-trivial in terms of latency, accuracy, and matching. 
To fill this gap, we propose\n            <jats:italic>DiTing<\/jats:italic>\n            to achieve real-time ID-aware multi-device visual interaction for multi-user AR applications, which contains three key innovations:\n            <jats:italic>Shared On-device Tracking<\/jats:italic>\n            to merge the similar computation for optimized latency,\n            <jats:italic>Tightly Coupled Dual Pipeline<\/jats:italic>\n            to enhance the accuracy of each task through mutual assistance, and\n            <jats:italic>Body Affinity Particle Filter<\/jats:italic>\n            to precisely match device poses with human bodies. We implement\n            <jats:italic>DiTing<\/jats:italic>\n            on four types of mobile AR devices and develop a multi-user AR game as a case study. Extensive experiments show that\n            <jats:italic>DiTing<\/jats:italic>\n            can provide high-quality human keypoint detection and pose estimation in real time (30fps) for ID-aware multi-device interaction and outperform the state-of-the-art baseline approaches.\n          <\/jats:p>","DOI":"10.1145\/3623638","type":"journal-article","created":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T14:57:56Z","timestamp":1697122676000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Multi-User Mobile Augmented Reality with ID-Aware Visual Interaction"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3468-9403","authenticated-orcid":false,"given":"Xinjun","family":"Cai","sequence":"first","affiliation":[{"name":"Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4048-2684","authenticated-orcid":false,"given":"Zheng","family":"Yang","sequence":"additional","affiliation":[{"name":"Tsinghua University, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4748-9491","authenticated-orcid":false,"given":"Liang","family":"Dong","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4196-4434","authenticated-orcid":false,"given":"Qiang","family":"Ma","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5442-7486","authenticated-orcid":false,"given":"Xin","family":"Miao","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7970-1827","authenticated-orcid":false,"given":"Zhuo","family":"Liu","sequence":"additional","affiliation":[{"name":"Remarkable Universal Network Company, China"}]}],"member":"320","published-online":{"date-parts":[[2023,12,7]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"5167","article-title":"PoseTrack: A benchmark for human pose estimation and tracking","author":"Andriluka Mykhaylo","year":"2018","unstructured":"Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, and Bernt Schiele. 2018. PoseTrack: A benchmark for human pose estimation and tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5167\u20135176.","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"e_1_3_1_3_2","first-page":"325","article-title":"Edge-SLAM: Edge-assisted visual simultaneous localization and mapping","author":"Ali Ali J. Ben","year":"2020","unstructured":"Ali J. Ben Ali, Zakieh Sadat Hashemifar, and Karthik Dantu. 2020. Edge-SLAM: Edge-assisted visual simultaneous localization and mapping. In Proceedings of the 18th International Conference on Mobile Systems, Applications, and Services. 
325\u2013337.","journal-title":"Proceedings of the 18th International Conference on Mobile Systems, Applications, and Services"},{"key":"e_1_3_1_4_2","first-page":"1","article-title":"Brick: A synchronous multiplayer augmented reality game for mobile phones","author":"Bhattacharyya Po","year":"2019","unstructured":"Po Bhattacharyya, Yein Jo, Ketki Jadhav, Radha Nath, and Jessica Hammer. 2019. Brick: A synchronous multiplayer augmented reality game for mobile phones. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. 1\u20134.","journal-title":"Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1088\/1757-899X\/966\/1\/012121"},{"key":"e_1_3_1_6_2","first-page":"82","article-title":"A distributed multi robot SLAM system for environment learning","author":"Chellali Ryad","year":"2013","unstructured":"Ryad Chellali and Ryad Chellali. 2013. A distributed multi robot SLAM system for environment learning. In Proceedings of the 2013 IEEE Workshop on Robotic Intelligence in Informationally Structured Space (RiiSS \u201913). IEEE, Los Alamitos, CA, 82\u201388.","journal-title":"Proceedings of the 2013 IEEE Workshop on Robotic Intelligence in Informationally Structured Space (RiiSS \u201913)"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3274783.3274834"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2809695.2809711"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs14133010"},{"key":"e_1_3_1_10_2","article-title":"AdaScale: Towards real-time video object detection using adaptive scaling","author":"Chin Ting-Wu","year":"2019","unstructured":"Ting-Wu Chin, Ruizhou Ding, and Diana Marculescu. 2019. AdaScale: Towards real-time video object detection using adaptive scaling. 
arXiv preprint arXiv:1902.02910 (2019).","journal-title":"arXiv preprint arXiv:1902.02910"},{"key":"e_1_3_1_11_2","unstructured":"Chrispolo. 2019. Keypoints-of-humanpose-with-Mask-R-CNN. Retrieved October 21 2023 from https:\/\/github.com\/chrispolo\/keypoints-of-humanpose-with-Mask-R-CNN"},{"key":"e_1_3_1_12_2","first-page":"1004","article-title":"An online multi-robot SLAM system for 3D LiDARs","author":"Dub\u00e9 Renaud","year":"2017","unstructured":"Renaud Dub\u00e9, Abel Gawel, Hannes Sommer, Juan Nieto, Roland Siegwart, and Cesar Cadena. 2017. An online multi-robot SLAM system for 3D LiDARs. In Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS \u201917). IEEE, Los Alamitos, CA, 1004\u20131011.","journal-title":"Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS \u201917)"},{"key":"e_1_3_1_13_2","unstructured":"Glenn Ekaputra Charles Lim and Kho I. Eng. 2013. Minecraft: A game as an education and scientific learning tool. In Proceedings of the Information Systems International Conference (ISICO \u201913)."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3349614.3356023"},{"key":"e_1_3_1_15_2","unstructured":"Jakob Engel Vladlen Koltun and Daniel Cremers. 2016. Direct sparse odometry. arXiv:1607.02565[cs.CV] (2016)."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2014.6906584"},{"key":"e_1_3_1_17_2","article-title":"Are we ready for autonomous driving? The KITTI Vision benchmark suite","author":"Geiger Andreas","year":"2012","unstructured":"Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? The KITTI Vision benchmark suite. 
In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR \u201912).","journal-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR \u201912)"},{"key":"e_1_3_1_18_2","first-page":"1","article-title":"DeepFaceAR: Deep face recognition and displaying personal information via augmented reality","author":"Golnari Amin","year":"2020","unstructured":"Amin Golnari, Hossein Khosravi, and Saeid Sanei. 2020. DeepFaceAR: Deep face recognition and displaying personal information via augmented reality. In Proceedings of the 2020 International Conference on Machine Vision and Image Processing (MVIP \u201920). IEEE, Los Alamitos, CA, 1\u20137.","journal-title":"Proceedings of the 2020 International Conference on Machine Vision and Image Processing (MVIP \u201920)"},{"key":"e_1_3_1_19_2","first-page":"1423","article-title":"Dynamic adaptive DNN surgery for inference acceleration on the edge","author":"Hu Chuang","year":"2019","unstructured":"Chuang Hu, Wei Bao, Dan Wang, and Fengming Liu. 2019. Dynamic adaptive DNN surgery for inference acceleration on the edge. In Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM \u201919). IEEE, Los Alamitos, CA, 1423\u20131431.","journal-title":"Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM \u201919)"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","unstructured":"Si Ying Diana Hu and Niniane Wang. 2018. Multiplayer augmented reality: The future is social presented by Niantic. In Proceedings of the ACM SIGGRAPH 2018 Conference on Virtual Augmented and Mixed Reality (SIGGRAPH \u201918). ACM New York NY Article 21 1 page. 
10.1145\/3226552.3226585","DOI":"10.1145\/3226552.3226585"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM42981.2021.9488846"},{"key":"e_1_3_1_22_2","first-page":"1","article-title":"Towards video streaming analysis and sharing for multi-device interaction with lightweight DNNs","author":"Huang Yakun","year":"2021","unstructured":"Yakun Huang, Hongru Zhao, Xiuquan Qiao, Jian Tang, and Ling Liu. 2021. Towards video streaming analysis and sharing for multi-device interaction with lightweight DNNs. In Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM \u201921). IEEE, Los Alamitos, CA, 1\u201310.","journal-title":"Proceedings of the IEEE Conference on Computer Communications"},{"key":"e_1_3_1_23_2","first-page":"V02BT02A010","article-title":"Digital Twin: Collaborative virtual reality environment for multi-purpose industrial applications","volume":"84492","author":"Kuts Vladimir","year":"2020","unstructured":"Vladimir Kuts, Tauno Otto, Yevhen Bondarenko, and Fei Yu. 2020. Digital Twin: Collaborative virtual reality environment for multi-purpose industrial applications. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, Vol. 84492. V02BT02A010.","journal-title":"Proceedings of the ASME International Mechanical Engineering Congress and Exposition"},{"key":"e_1_3_1_24_2","first-page":"508","article-title":"Exploring an augmented reality social learning game for elementary school students","author":"Li Jingya","year":"2020","unstructured":"Jingya Li, Erik D. Van der Spek, Xiaoyu Yu, Jun Hu, and Loe Feijs. 2020. Exploring an augmented reality social learning game for elementary school students. In Proceedings of the Interaction Design and Children Conference. 
508\u2013518.","journal-title":"Proceedings of the Interaction Design and Children Conference"},{"key":"e_1_3_1_25_2","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie Lubomir Bourdev Ross Girshick James Hays Pietro Perona Deva Ramanan C. Lawrence Zitnick and Piotr Doll\u00e1r. 2015. Microsoft COCO: Common objects in context. arxiv:1405.0312 [cs.CV] (2015)."},{"key":"e_1_3_1_26_2","first-page":"1","article-title":"Edge assisted real-time object detection for mobile augmented reality","author":"Liu Luyang","year":"2019","unstructured":"Luyang Liu, Hongyu Li, and Marco Gruteser. 2019. Edge assisted real-time object detection for mobile augmented reality. In Proceedings of the 25th Annual International Conference on Mobile Computing and Networking. 1\u201316.","journal-title":"Proceedings of the 25th Annual International Conference on Mobile Computing and Networking"},{"key":"e_1_3_1_27_2","first-page":"1","article-title":"DARE: Dynamic adaptive mobile augmented reality with edge computing","author":"Liu Qiang","year":"2018","unstructured":"Qiang Liu and Tao Han. 2018. DARE: Dynamic adaptive mobile augmented reality with edge computing. In Proceedings of the 2018 IEEE 26th International Conference on Network Protocols (ICNP \u201918). IEEE, Los Alamitos, CA, 1\u201311.","journal-title":"Proceedings of the 2018 IEEE 26th International Conference on Network Protocols (ICNP \u201918)"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2015.2463671"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-02762-9_10"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025871"},{"key":"e_1_3_1_31_2","first-page":"1421","article-title":"DeepDecision: A mobile deep learning framework for edge video analytics","author":"Ran Xukan","year":"2018","unstructured":"Xukan Ran, Haolianz Chen, Xiaodan Zhu, Zhenming Liu, and Jiasi Chen. 2018. 
DeepDecision: A mobile deep learning framework for edge video analytics. In Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM \u201918). IEEE, Los Alamitos, CA, 1421\u20131429.","journal-title":"Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM \u201918)"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3386367.3431312"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989445"},{"key":"e_1_3_1_34_2","first-page":"4104","article-title":"Structure-from-motion revisited","author":"Schonberger Johannes L.","year":"2016","unstructured":"Johannes L. Schonberger and Jan-Michael Frahm. 2016. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4104\u20134113.","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3356250.3361944"},{"key":"e_1_3_1_36_2","first-page":"702","article-title":"Realtime edge-based visual odometry for a monocular camera","author":"Tarrio Juan Jose","year":"2015","unstructured":"Juan Jose Tarrio and Sol Pedre. 2015. Realtime edge-based visual odometry for a monocular camera. In Proceedings of the IEEE International Conference on Computer Vision. 702\u2013710.","journal-title":"Proceedings of the IEEE International Conference on Computer Vision"},{"key":"e_1_3_1_37_2","first-page":"257","article-title":"Joint configuration adaptation and bandwidth allocation for edge-based real-time video analytics","author":"Wang Can","year":"2020","unstructured":"Can Wang, Sheng Zhang, Yu Chen, Zhuzhong Qian, Jie Wu, and Mingjun Xiao. 2020. Joint configuration adaptation and bandwidth allocation for edge-based real-time video analytics. In Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM \u201920). 
IEEE, Los Alamitos, CA, 257\u2013266.","journal-title":"Proceedings of the IEEE Conference on Computer Communications"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/581043"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155438"},{"key":"e_1_3_1_40_2","first-page":"977","article-title":"SwarmMap: Scaling up real-time collaborative visual SLAM at the edge","author":"Xu Jingao","year":"2022","unstructured":"Jingao Xu, Hao Cao, Zheng Yang, Longfei Shangguan, Jialin Zhang, Xiaowu He, and Yunhao Liu. 2022. SwarmMap: Scaling up real-time collaborative visual SLAM at the edge. In Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201922). 977\u2013993.","journal-title":"Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201922)"},{"key":"e_1_3_1_41_2","first-page":"67","article-title":"Rendering multi-party mobile augmented reality from edge","author":"Zhang Lei","year":"2019","unstructured":"Lei Zhang, Andy Sun, Ryan Shea, Jiangchuan Liu, and Miao Zhang. 2019. Rendering multi-party mobile augmented reality from edge. In Proceedings of the 29th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video. 
67\u201372.","journal-title":"Proceedings of the 29th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3177102.3177107"}],"container-title":["ACM Transactions on Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623638","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3623638","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:00Z","timestamp":1750178220000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623638"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,7]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,31]]}},"alternative-id":["10.1145\/3623638"],"URL":"https:\/\/doi.org\/10.1145\/3623638","relation":{},"ISSN":["1550-4859","1550-4867"],"issn-type":[{"type":"print","value":"1550-4859"},{"type":"electronic","value":"1550-4867"}],"subject":[],"published":{"date-parts":[[2023,12,7]]},"assertion":[{"value":"2023-03-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}