{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T15:49:47Z","timestamp":1772207387785,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T00:00:00Z","timestamp":1672876800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Shanghai Municipal Science and Technology Major Project","award":["2021SHZDZX0103"],"award-info":[{"award-number":["2021SHZDZX0103"]}]},{"name":"STCSM Project","award":["19ZR1471800"],"award-info":[{"award-number":["19ZR1471800"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2023,1,31]]},"abstract":"<jats:p>Three-dimensional (3D) human-like body reconstruction via a single RGB image has attracted significant research attention recently. Most of the existing methods rely on the Skinned Multi-Person Linear model and thus can only predict unified human bodies. Moreover, meshes reconstructed by current methods sometimes perform well from a canonical view but not from other views, as the reconstruction process is commonly supervised by only a single view. To address these limitations, this article proposes a multi-view shape generation network for a 3D human-like body. Particularly, we propose a coarse-to-fine learning model that gradually deforms a template body toward the ground truth body. Our model utilizes the information of multi-view renderings and corresponding 3D vertex transformation as supervision. Such supervision will help to generate 3D bodies well aligned to all views. 
To accurately operate mesh deformation, a graph convolutional network structure is introduced to support the shape generation from 3D vertex representation. Additionally, a graph up-pooling operation is designed over the intermediate representations of the graph convolutional network, and thus our model can generate 3D shapes with higher resolution. Novel loss functions are employed to help optimize the whole multi-view generation model, resulting in smoother surfaces. In addition, two multi-view human body datasets are produced and contributed to the community. Extensive experiments conducted on the benchmark datasets demonstrate the efficacy of our model over the competitors.<\/jats:p>","DOI":"10.1145\/3514248","type":"journal-article","created":{"date-parts":[[2022,2,18]],"date-time":"2022-02-18T19:42:08Z","timestamp":1645213328000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Multi-view Shape Generation for a 3D Human-like Body"],"prefix":"10.1145","volume":"19","author":[{"given":"Hang","family":"Yu","sequence":"first","affiliation":[{"name":"Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chilam","family":"Cheang","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanwei","family":"Fu","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China and Zhejiang Normal University, Jinhua, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangyang","family":"Xue","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,1,5]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"1175","volume-title":"Proceedings of the IEEE Conference on Computer 
Vision and Pattern Recognition (CVPR\u201919)","author":"Alldieck Thiemo","year":"2019","unstructured":"Thiemo Alldieck, Marcus Magnor, Bharat Lal Bhatnagar, Christian Theobalt, and Gerard Pons-Moll. 2019. Learning to reconstruct people in clothing from a single RGB camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919). 1175\u20131186."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00762"},{"key":"e_1_3_1_4_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914)","author":"Andriluka Mykhaylo","year":"2014","unstructured":"Mykhaylo Andriluka, Leonid Pishchulin, Peter Gehler, and Bernt Schiele. 2014. 2D human pose estimation: New benchmark and state of the art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914)."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00552"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46454-1_34"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","first-page":"3794","DOI":"10.1109\/CVPR.2014.491","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914)","author":"Bogo Federica","year":"2014","unstructured":"Federica Bogo, Javier Romero, Matthew Loper, and Michael J. Black. 2014. FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914). 3794\u20133801."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/1462123"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2693418"},{"key":"e_1_3_1_10_2","article-title":"Spectral networks and locally connected networks on graphs","author":"Bruna Joan","year":"2013","unstructured":"Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. 
Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).","journal-title":"arXiv preprint arXiv:1312.6203"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2929257"},{"key":"e_1_3_1_12_2","article-title":"ShapeNet: An information-rich 3D model repository","author":"Chang Angel X.","year":"2015","unstructured":"Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, et\u00a0al. 2015. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015).","journal-title":"arXiv preprint arXiv:1512.03012"},{"key":"e_1_3_1_13_2","volume-title":"Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)","author":"Wen Chao","year":"2019","unstructured":"Chao Wen, Yinda Zhang, Zhuwen Li, and Yanwei Fu. 2019. Pixel2Mesh++: Multi-View 3D mesh generation via deformation. In Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)."},{"key":"e_1_3_1_14_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201920)","author":"Choi Hongsuk","year":"2020","unstructured":"Hongsuk Choi, Gyeongsik Moon, and Kyoung Mu Lee. 2020. Pose2Mesh: Graph convolutional network for 3D human pose and mesh recovery from a 2D human pose. In Proceedings of the European Conference on Computer Vision (ECCV\u201920)."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_38"},{"key":"e_1_3_1_16_2","volume-title":"Blender\u2014A 3D Modelling and Rendering Package","author":"Community Blender Online","year":"2018","unstructured":"Blender Online Community. 2018. Blender\u2014A 3D Modelling and Rendering Package. Stichting Blender Foundation, Amsterdam. 
http:\/\/www.blender.org."},{"key":"e_1_3_1_17_2","first-page":"11875","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Corona Enric","year":"2021","unstructured":"Enric Corona, Albert Pumarola, Guillem Alenya, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2021. SMPLicit: Topology-aware generative model for clothed people. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 11875\u201311885."},{"key":"e_1_3_1_18_2","first-page":"3844","volume-title":"Advances in Neural Information Processing Systems","author":"Defferrard Micha\u00ebl","year":"2016","unstructured":"Micha\u00ebl Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844\u20133852."},{"key":"e_1_3_1_19_2","first-page":"605","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Fan Haoqiang","year":"2017","unstructured":"Haoqiang Fan, Hao Su, and Leonidas J. Guibas. 2017. A point set generation network for 3D object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 605\u2013613."},{"key":"e_1_3_1_20_2","unstructured":"The SAE Foundation. n.d. Civilian American and European Surface Anthropometry Resource Project\u2014CAESAR. 
Retrieved February 25 2022 from http:\/\/store.sae.org\/caesar\/."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.122653799"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02291478"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_3"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.5244\/C.24.12"},{"key":"e_1_3_1_27_2","first-page":"7122","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Kanazawa Angjoo","year":"2018","unstructured":"Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik. 2018. End-to-end recovery of human shape and pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). 7122\u20137131."},{"key":"e_1_3_1_28_2","article-title":"Semi-supervised classification with graph convolutional networks","author":"Kipf Thomas N.","year":"2016","unstructured":"Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).","journal-title":"arXiv preprint arXiv:1609.02907"},{"key":"e_1_3_1_29_2","first-page":"4501","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Kolotouros Nikos","year":"2019","unstructured":"Nikos Kolotouros, Georgios Pavlakos, and Kostas Daniilidis. 2019. Convolutional mesh regression for single-image human shape reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919). 
4501\u20134510."},{"key":"e_1_3_1_30_2","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097\u20131105."},{"key":"e_1_3_1_31_2","first-page":"6050","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Lassner Christoph","year":"2017","unstructured":"Christoph Lassner, Javier Romero, Martin Kiefel, Federica Bogo, Michael J. Black, and Peter V. Gehler. 2017. Unite the people: Closing the loop between 3D and 2D human representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 6050\u20136059."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1080\/10867651.2003.10487582"},{"key":"e_1_3_1_33_2","volume-title":"Proceedings of the International Conference on Computer Vision (ICCV\u201919)","author":"Liang Junbang","year":"2019","unstructured":"Junbang Liang and Ming C. Lin. 2019. Shape-aware human pose and shape reconstruction using multi-view images. In Proceedings of the International Conference on Computer Vision (ICCV\u201919)."},{"issue":"6","key":"e_1_3_1_34_2","first-page":"248","article-title":"SMPL: A skinned multi-person linear model","volume":"34","author":"Loper Matthew","year":"2015","unstructured":"Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A skinned multi-person linear model. 
ACM Transactions on Graphics 34, 6 (2015), 248.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/37402.37422"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"e_1_3_1_37_2","first-page":"2014","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Niepert Mathias","year":"2016","unstructured":"Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In Proceedings of the International Conference on Machine Learning. 2014\u20132023."},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1109\/3DV.2018.00062","volume-title":"Proceedings of the 2018 International Conference on 3D Vision (3DV\u201918)","author":"Omran Mohamed","year":"2018","unstructured":"Mohamed Omran, Christoph Lassner, Gerard Pons-Moll, Peter Gehler, and Bernt Schiele. 2018. Neural body fitting: Unifying deep learning and model based human pose and shape estimation. In Proceedings of the 2018 International Conference on 3D Vision (3DV\u201918). IEEE, Los Alamitos, CA, 484\u2013494."},{"key":"e_1_3_1_39_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Pavlakos Georgios","year":"2019","unstructured":"Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. 2019. Expressive body capture: 3D hands, face, and body from a single image. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919)."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00055"},{"key":"e_1_3_1_41_2","first-page":"9054","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Peng Sida","year":"2021","unstructured":"Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9054\u20139063."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.533"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00233"},{"key":"e_1_3_1_44_2","article-title":"PIFu: Pixel-Aligned implicit function for high-resolution clothed human digitization","author":"Saito Shunsuke","year":"2019","unstructured":"Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. 2019. PIFu: Pixel-Aligned implicit function for high-resolution clothed human digitization. arXiv preprint arXiv:1905.05172 (2019).","journal-title":"arXiv preprint arXiv:1905.05172"},{"key":"e_1_3_1_45_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201920)","author":"Saito Shunsuke","year":"2020","unstructured":"Shunsuke Saito, Tomas Simon, Jason Saragih, and Hanbyul Joo. 2020. PIFuHD: Multi-level pixel-aligned implicit function for high-resolution 3D human digitization. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201920)."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0273-6"},{"key":"e_1_3_1_47_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).","journal-title":"arXiv preprint arXiv:1409.1556"},{"key":"e_1_3_1_48_2","first-page":"5236","volume-title":"Advances in Neural Information Processing Systems","author":"Tung Hsiao-Yu","year":"2017","unstructured":"Hsiao-Yu Tung, Hsiao-Wei Tung, Ersin Yumer, and Katerina Fragkiadaki. 2017. Self-supervised learning of motion capture. In Advances in Neural Information Processing Systems. 5236\u20135246."},{"key":"e_1_3_1_49_2","first-page":"20","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Varol Gul","year":"2018","unstructured":"Gul Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, and Cordelia Schmid. 2018. BodyNet: Volumetric inference of 3D human body shapes. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). 20\u201336."},{"key":"e_1_3_1_50_2","first-page":"52","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Wang Nanyang","year":"2018","unstructured":"Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. 2018. Pixel2Mesh: Generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). 
52\u201367."},{"issue":"1","key":"e_1_3_1_51_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3408317","article-title":"Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion","volume":"17","author":"Wang Yang","year":"2021","unstructured":"Yang Wang. 2021. Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 1s (2021), 1\u201325.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01427149"},{"key":"e_1_3_1_53_2","first-page":"3425","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhao Long","year":"2019","unstructured":"Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, and Dimitris N. Metaxas. 2019. Semantic graph convolutional networks for 3D human pose regression. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 3425\u20133435."},{"key":"e_1_3_1_54_2","article-title":"PaMIR: Parametric model-conditioned implicit representation for image-based human reconstruction","author":"Zheng Zerong","year":"2021","unstructured":"Zerong Zheng, Tao Yu, Yebin Liu, and Qionghai Dai. 2021. PaMIR: Parametric model-conditioned implicit representation for image-based human reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence PP, 99 (2021), 1.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_55_2","first-page":"4491","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Zhu Hao","year":"2019","unstructured":"Hao Zhu, Xinxin Zuo, Sen Wang, Xun Cao, and Ruigang Yang. 2019. Detailed human shape estimation from a single image by hierarchical mesh deformation. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201919). 4491\u20134500."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514248","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3514248","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:14Z","timestamp":1750183814000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514248"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,5]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,31]]}},"alternative-id":["10.1145\/3514248"],"URL":"https:\/\/doi.org\/10.1145\/3514248","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,5]]},"assertion":[{"value":"2021-07-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-28","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}