{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T11:14:35Z","timestamp":1762341275661,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T00:00:00Z","timestamp":1561420800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>We propose a method to automatically detect 3D poses of closely interactive humans from sparse multi-view images at one time instance. It is a challenging problem due to the strong partial occlusion and truncation between humans and no tracking process to provide priori poses information. To solve this problem, we first obtain 2D joints in every image using OpenPose and human semantic segmentation results from Mask R-CNN. With the 3D joints triangulated from multi-view 2D joints, a two-stage assembling method is proposed to select the correct 3D pose from thousands of pose seeds combined by joint semantic meanings. We further present a novel approach to minimize the interpenetration between human shapes with close interactions. Finally, we test our method on multi-view human-human interaction (MHHI) datasets. Experimental results demonstrate that our method achieves high visualized correct rate and outperforms the existing method in accuracy and real-time capability.<\/jats:p>","DOI":"10.3390\/s19122831","type":"journal-article","created":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T10:52:31Z","timestamp":1561459951000},"page":"2831","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["3D Pose Detection of Closely Interactive Humans Using Multi-View Cameras"],"prefix":"10.3390","volume":"19","author":[{"given":"Xiu","family":"Li","sequence":"first","affiliation":[{"name":"Graduate school at Shenzhen, Tsinghua University, Shenzhen 518055, China"},{"name":"Department of Automation, Tsinghua University, Beijing 100091, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhen","family":"Fan","sequence":"additional","affiliation":[{"name":"Graduate school at Shenzhen, Tsinghua University, Shenzhen 518055, China"},{"name":"Department of Automation, Tsinghua University, Beijing 100091, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yebin","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100091, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yipeng","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100091, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qionghai","family":"Dai","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100091, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,6,25]]},"reference":[{"key":"ref_1","unstructured":"Tompson, J.J., Jain, A., LeCun, Y., and Bregler, C. (2014, January 8\u201313). Joint training of a convolutional network and a graphical model for human pose estimation. Proceedings of the Advances in Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Qiang, B., Zhang, S., Zhan, Y., Xie, W., and Zhao, T. (2019). Improved Convolutional Pose Machines for Human Pose Estimation Using Image Sensor Data. Sensors, 19.","DOI":"10.3390\/s19030718"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Martinez, J., Hossain, R., Romero, J., and Little, J.J. (2017, January 22\u201329). A simple yet effective baseline for 3d human pose estimation. Proceedings of the 2017 International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.288"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, C., Wang, Y., Lin, Z., Yuille, A.L., and Gao, W. (2014, January 23\u201328). Robust estimation of 3d human poses from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.303"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2017, January 21\u201326). Realtime multi-person 2d pose estimation using part affinity fields. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Fang, H., Xie, S., Tai, Y.W., and Lu, C. (2017, January 22\u201329). Rmpe: Regional multi-person pose estimation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.256"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, Y., Stoll, C., Gall, J., Seidel, H.P., and Theobalt, C. (2011, January 20\u201325). Markerless motion capture of interacting characters using multi-view image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995424"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Alp G\u00fcler, R., Neverova, N., and Kokkinos, I. (2018, January 18\u201322). Densepose: Dense human pose estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00762"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., and Sheikh, Y. (2015, January 7\u201313). Panoptic studio: A massively multiview system for social motion capture. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.381"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1111\/cgf.13574","article-title":"Shape and Pose Estimation for Closely Interacting Persons Using Multi-view Images","volume":"Volume 37","author":"Li","year":"2018","journal-title":"Computer Graphics Forum"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, X., Li, H., Joo, H., Liu, Y., and Sheikh, Y. (2018, January 28\u201322). Structure from Recurrent Motion: From Rigidity to Recurrency. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00320"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016). Stacked hourglass networks for human pose estimation. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_13","unstructured":"Wei, S.E., Ramakrishna, V., Kanade, T., and Sheikh, Y. (July, January 26). Convolutional pose machines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_15","unstructured":"Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P.V., and Schiele, B. (July, January 26). Deepcut: Joint subset partition and labeling for multi person pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., and Schiele, B. (2016). Deepercut: A deeper, stronger, and faster multi-person pose estimation model. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46466-4_3"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Iqbal, U., and Gall, J. (2016). Multi-person pose estimation with local joint-to-person associations. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-48881-3_44"},{"key":"ref_18","unstructured":"Newell, A., Huang, Z., and Deng, J. (2017, January 4\u20139). Associative embedding: End-to-end learning for joint detection and grouping. Proceedings of the Advances in Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhou, X., Huang, Q., Sun, X., Xue, X., and Wei, Y. (2017, January 22\u201329). Towards 3d human pose estimation in the wild: A weakly-supervised approach. Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.51"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Moreno-Noguer, F. (2017, January 21\u201326). 3d human pose estimation from a single image via distance matrix regression. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.170"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1145\/3072959.3073596","article-title":"Vnect: Real-time 3d human pose estimation with a single rgb camera","volume":"36","author":"Mehta","year":"2017","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Tome, D., Russell, C., and Agapito, L. (2017, January 21\u201326). Lifting from the deep: Convolutional 3d pose estimation from a single image. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.603"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Rhodin, H., Sp\u00f6rri, J., Katircioglu, I., Constantin, V., Meyer, F., M\u00fcller, E., Salzmann, M., and Fua, P. (2018, January 18\u201322). Learning Monocular 3D Human Pose Estimation from Multi-view Images. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00880"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., and Black, M.J. (2016). Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46454-1_34"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1145\/2816795.2818013","article-title":"SMPL: A skinned multi-person linear model","volume":"34","author":"Loper","year":"2015","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kanazawa, A., Black, M.J., Jacobs, D.W., and Malik, J. (2018, January 18\u201322). End-to-end recovery of human shape and pose. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00744"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Rogez, G., Weinzaepfel, P., and Schmid, C. (2017, January 21\u201326). Lcr-net: Localization-classification-regression for human pose. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.134"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Rogez, G., Weinzaepfel, P., and Schmid, C. (2019). Lcr-net++: Multi-person 2d and 3d pose detection in natural images. IEEE Trans. Pattern Anal. Mach. Intell., in press.","DOI":"10.1109\/TPAMI.2019.2892985"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Sridhar, S., Pons-Moll, G., and Theobalt, C. (2018, January 5\u20138). Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00024"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yin, K., Huang, H., Ho, E.S., Wang, H., Komura, T., Cohen-Or, D., and Zhang, R. (2018). A Sampling Approach to Generating Closely Interacting 3D Pose-pairs from 2D Annotations. IEEE Trans. Vis. Comput. Graph.","DOI":"10.1109\/TVCG.2018.2832097"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Belagiannis, V., Amin, S., Andriluka, M., Schiele, B., Navab, N., and Ilic, S. (2014, January 23\u201328). 3D pictorial structures for multiple human pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.216"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dong, J., Jiang, W., Huang, Q., Bao, H., and Zhou, X. (2019). Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views. arXiv.","DOI":"10.1109\/CVPR.2019.00798"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1145\/2508363.2508384","article-title":"Sphere-meshes: Shape approximation using spherical quadric error metrics","volume":"32","author":"Thiery","year":"2013","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Ericson, C. (2004). Real-Time Collision Detection, CRC Press.","DOI":"10.1201\/b14581"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1145\/1276377.1276407","article-title":"Direct visibility of point sets","volume":"26","author":"Katz","year":"2007","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Huang, C., Gao, F., Pan, J., Yang, Z., Qiu, W., Chen, P., Yang, X., Shen, S., and Cheng, K.T.T. (2018, January 21\u201325). Act: An autonomous drone cinematography system for action scenes. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8460703"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"N\u00e4geli, T., Oberholzer, S., Pl\u00fcss, S., Alonso-Mora, J., and Hilliges, O. (2018). Flycon: Real-Time Environment-Independent Multi-View Human Pose Estimation with Aerial Vehicles, ACM. SIGGRAPH Asia 2018 Technical Papers.","DOI":"10.1145\/3272127.3275022"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/12\/2831\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:01:06Z","timestamp":1760187666000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/12\/2831"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,25]]},"references-count":38,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["s19122831"],"URL":"https:\/\/doi.org\/10.3390\/s19122831","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,6,25]]}}}