{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:52Z","timestamp":1750309552869,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T00:00:00Z","timestamp":1741737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,4,30]]},"abstract":"<jats:p>With the rapid development of AR\/VR technologies, achieving natural and seamless human-scene interactions has emerged as a critical challenge in computer vision. Existing methods suffer from low model placement accuracy and unnatural scene interactions. Therefore, we propose a framework called human-scene interaction with geometric and physical constraints (GP-HSI), which places a given pose of a 3D human model in an appropriate position within a 3D scene by establishing geometric and physical constraints, while ensuring interactive fidelity between the human and the scene. Specifically, first, we propose a pose-guided human contact semantic generation method, which generates human semantic labels by classifying the given human poses. Second, we propose a geometrically and semantically constrained human model placement method, which determines the optimal position of the human model in the scene by constraining the geometric proximity and semantic consistency between models. Third, we propose an inverse kinematics-based pose adjustment method, which finds the target human-scene interaction points by constructing a heterogeneous kinematic tree and solves the rotation matrix of human joints to obtain a physically plausible optimal human pose. 
Finally, we develop an interactive system to visualize the generated human-scene interaction. The results of qualitative and quantitative experiments show that our approach is able to place human models at appropriate locations in the scene and generate plausible interactions.<\/jats:p>","DOI":"10.1145\/3716137","type":"journal-article","created":{"date-parts":[[2025,2,3]],"date-time":"2025-02-03T13:21:41Z","timestamp":1738588901000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["GP-HSI: Human-Scene Interaction with Geometric and Physical Constraints"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-2799-2480","authenticated-orcid":false,"given":"Nianzi","family":"Li","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, Shandong Normal University, Jinan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1555-8385","authenticated-orcid":false,"given":"Guijuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Shandong Normal University, Jinan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5839-5618","authenticated-orcid":false,"given":"Ping","family":"Du","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Shandong Normal University, Jinan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5435-5307","authenticated-orcid":false,"given":"Dianjie","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Shandong Normal University, Jinan, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,12]]},"reference":[{"issue":"2","key":"e_1_3_1_2_2","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1177\/027836498500400203","article-title":"On the numerical solution of the inverse kinematic problem","volume":"4","author":"Jorge Angeles.","year":"1985","unstructured":"Jorge Angeles. 1985. On the numerical solution of the inverse kinematic problem. The International Journal of Robotics Research 4, 2 (1985), 21\u201337.","journal-title":"The International Journal of Robotics Research"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.gmod.2011.05.003"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13310"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-015-1898-8"},{"issue":"1","key":"e_1_3_1_6_2","first-page":"1","article-title":"Representation, analysis, and recognition of 3D humans: A survey","volume":"14","author":"Berretti Stefano","year":"2018","unstructured":"Stefano Berretti, Mohamed Daoudi, Pavan Turaga, and Anup Basu. 2018. Representation, analysis, and recognition of 3D humans: A survey. ACM Transactions on Multimedia Computing, Communications, and Applications 14, 1s (2018), 1\u201336.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3674980"},{"issue":"1","key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1007\/s42452-019-1791-7","article-title":"Simulation based calculation of the inverse kinematics solution of 7-DOF robot manipulator using artificial bee colony algorithm","volume":"2","author":"Dereli Serkan","year":"2020","unstructured":"Serkan Dereli and Ra\u015fit K\u00f6ker. 2020. Simulation based calculation of the inverse kinematics solution of 7-DOF robot manipulator using artificial bee colony algorithm. 
SN Applied Sciences 2, 1 (2020), 27.","journal-title":"SN Applied Sciences"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.4324\/9781315740218"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00237"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01447"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01292"},{"key":"e_1_3_1_13_2","first-page":"16750","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Huang Siyuan","year":"2023","unstructured":"Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, and Song-Chun Zhu. 2023. Diffusion-based generation, optimization, and planning in 3D scenes. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 16750\u201316761."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2614217.2630591"},{"key":"e_1_3_1_16_2","unstructured":"Ben Kenwright. 2022. Real-time character inverse kinematics using the Gauss-Seidel iterative approximation method. arXiv:2211.00330. Retrieved from https:\/\/arxiv.org\/abs\/2211.00330"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i1.25195"},{"issue":"4","key":"e_1_3_1_18_2","first-page":"1","article-title":"Shape2Pose: Human-centric shape analysis","volume":"33","author":"Kim Vladimir G.","year":"2014","unstructured":"Vladimir G. Kim, Siddhartha Chaudhuri, Leonidas Guibas, and Thomas Funkhouser. 2014. Shape2Pose: Human-centric shape analysis. 
ACM Transactions on Graphics 33, 4 (2014), 1\u201312.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/2945.675649"},{"key":"e_1_3_1_20_2","first-page":"9663","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Lee Jiye","year":"2023","unstructured":"Jiye Lee and Hanbyul Joo. 2023. Locomotion-action-manipulation: Synthesizing human-scene interactions in complex 3D environments. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 9663\u20139674."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3592096"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00339"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01265"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-022-02707-8"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1167\/9.5.1"},{"issue":"2","key":"e_1_3_1_26_2","first-page":"1","article-title":"Special issue on deep learning for intelligent human computer interaction","volume":"20","author":"Lv Zhihan","year":"2023","unstructured":"Zhihan Lv, Fabio Poiesi, Qi Dong, Jaime Lloret, and Houbing Song. 2023. Special issue on deep learning for intelligent human computer interaction. ACM Transactions on Multimedia Computing, Communications and Applications 20, 2 (2023), 1\u20135.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1177\/0268396220915917"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3350840"},{"key":"e_1_3_1_29_2","first-page":"300","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Mullen James F.","year":"2023","unstructured":"James F. 
Mullen, Divya Kothandaraman, Aniket Bera, and Dinesh Manocha. 2023. Placing human animations into 3D scenes by learning interaction- and geometry-driven keyframes. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, 300\u2013310."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.3390\/encyclopedia2010031"},{"issue":"8","key":"e_1_3_1_31_2","first-page":"6762","article-title":"Third-person view attention prediction in natural scenarios with weak information dependency and human-scene interaction mechanism","volume":"24","author":"Nan Zhixiong","year":"2023","unstructured":"Zhixiong Nan and Tao Xiang. 2023. Third-person view attention prediction in natural scenarios with weak information dependency and human-scene interaction mechanism. IEEE Transactions on Circuits and Systems for Video Technology 24, 8 (2023), 6762\u20136773.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_32_2","first-page":"13468","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Patel Priyanka","year":"2021","unstructured":"Priyanka Patel, Chun-Hao P. Huang, Joachim Tesch, David T. Hoffmann, Shashank Tripathi, and Michael J. Black. 2021. AGORA: Avatars in geography optimized for regression analysis. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 13468\u201313478."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01123"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Madhusudan Raghavan and Bernard Roth. 1993. Inverse kinematics of the general 6R manipulator and related linkages. 
ASME Journal of Mechanical Design 115, 3 (1993), 502\u2013508.","DOI":"10.1115\/1.2919218"},{"issue":"2","key":"e_1_3_1_35_2","first-page":"1","article-title":"VRVul-Discovery: BiLSTM-based vulnerability discovery for virtual reality devices in metaverse","volume":"21","author":"Sha Letian","year":"2024","unstructured":"Letian Sha, Xiao Chen, Fu Xiao, Zhong Wang, Zhangbo Long, Qianyu Fan, and Jiankuo Dong. 2024. VRVul-Discovery: BiLSTM-based vulnerability discovery for virtual reality devices in metaverse. ACM Transactions on Multimedia Computing, Communications and Applications 21, 2 (2024), 1\u201319.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_36_2","first-page":"8001","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Tripathi Shashank","year":"2023","unstructured":"Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, and Michael J. Black. 2023. DECO: Dense estimation of 3D human-scene contact in the wild. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 8001\u20138013."},{"key":"e_1_3_1_37_2","first-page":"9401","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Jiashun","year":"2021","unstructured":"Jiashun Wang, Huazhe Xu, Jingwei Xu, Sifei Liu, and Xiaolong Wang. 2021. Synthesizing long-term 3D human motion and interaction in 3D scenes. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 9401\u20139411."},{"key":"e_1_3_1_38_2","first-page":"2596","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Xiaolong","year":"2017","unstructured":"Xiaolong Wang, Rohit Girdhar, and Abhinav Gupta. 2017. Binge watching: Scaling affordance learning from sitcoms. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2596\u20132605."},{"key":"e_1_3_1_39_2","first-page":"4927","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yadav Karmesh","year":"2023","unstructured":"Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, et al. 2023. Habitat-Matterport 3D semantics dataset. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 4927\u20134936."},{"key":"e_1_3_1_40_2","first-page":"12965","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yi Hongwei","year":"2023","unstructured":"Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, and Michael J. Black. 2023. MIME: Human-aware 3D scene generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 12965\u201312976."},{"key":"e_1_3_1_41_2","doi-asserted-by":"crossref","first-page":"642","DOI":"10.1109\/3DV50981.2020.00074","volume-title":"Proceedings of the 2020 International Conference on 3D Vision (3DV \u201920)","author":"Zhang Siwei","year":"2020","unstructured":"Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black, and Siyu Tang. 2020. PLACE: Proximity learning of articulation and contact in 3D environments. In Proceedings of the 2020 International Conference on 3D Vision (3DV \u201920). 
IEEE, 642\u2013651."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00623"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3603618"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716137","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716137","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:49Z","timestamp":1750295929000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716137"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,12]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,4,30]]}},"alternative-id":["10.1145\/3716137"],"URL":"https:\/\/doi.org\/10.1145\/3716137","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2025,3,12]]},"assertion":[{"value":"2024-07-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-25","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}