{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T09:37:51Z","timestamp":1762508271050,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key R&D Program of China","award":["2018YFB0505400"],"award-info":[{"award-number":["2018YFB0505400"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413889","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T13:10:18Z","timestamp":1602508218000},"page":"1018-1027","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":30,"title":["ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences"],"prefix":"10.1145","author":[{"given":"Zhizhong","family":"Han","sequence":"first","affiliation":[{"name":"Tsinghua University & University of Maryland, College Park, MD, USA"}]},{"given":"Chao","family":"Chen","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Yu-Shen","family":"Liu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Matthias","family":"Zwicker","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park, MD, USA"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[
{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Kevin Chen, Christopher B Choy, Manolis Savva, Angel X Chang, Thomas Funkhouser, and Silvio Savarese. 2018. Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings. In ACCV.","DOI":"10.1007\/978-3-030-20893-6_7"},
{"key":"e_1_3_2_2_2_1","unstructured":"Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. In SSST@EMNLP. 103--111."},
{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3348"},
{"key":"e_1_3_2_2_4_1","volume-title":"Semantic Compositional Networks for Visual Captioning. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Gan Zhe","year":"2017","unstructured":"Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, and Li Deng. 2017. Semantic Compositional Networks for Visual Captioning. In IEEE Conference on Computer Vision and Pattern Recognition."},
{"key":"e_1_3_2_2_5_1","volume-title":"Fast R-CNN. In IEEE International Conference on Computer Vision. 1440--1448","author":"Girshick Ross","year":"2015","unstructured":"Ross Girshick. 2015. Fast R-CNN. In IEEE International Conference on Computer Vision. 1440--1448."},
{"key":"e_1_3_2_2_6_1","unstructured":"Zhizhong Han, Chao Chen, Yu-Shen Liu, and Matthias Zwicker. 2020a. DRWR: A Differentiable Renderer without Rendering for Unsupervised 3D Structure Learning from Silhouette Images. In ICML."},
{"key":"e_1_3_2_2_7_1","unstructured":"Zhizhong Han, Xinhai Liu, Yu-Shen Liu, and Matthias Zwicker. 2019b. Parts4Feature: Learning 3D Global Features from Generally Semantic Parts in Multiple Views. In IJCAI."},
{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2582532"},
{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2017.2778764"},
{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2605920"},
{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2704426"},
{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2816821"},
{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2904460"},
{"key":"e_1_3_2_2_14_1","volume-title":"SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates. ArXiv","author":"Han Zhizhong","year":"2020","unstructured":"Zhizhong Han, Guanhui Qiao, Yu-Shen Liu, and Matthias Zwicker. 2020b. SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates. ArXiv, Vol. abs\/2003.05559 (2020)."},
{"key":"e_1_3_2_2_15_1","first-page":"658","article-title":"SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention","volume":"28","author":"Han Zhizhong","year":"2019","unstructured":"Zhizhong Han, Mingyang Shang, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, Junwei Han, and C.L. Philip Chen. 2019d. SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention. IEEE Transactions on Image Processing, Vol. 28, 2 (2019), 658--672.","journal-title":"IEEE Transactions on Image Processing"},
{"key":"e_1_3_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Zhizhong Han, Mingyang Shang, Xiyang Wang, Yu-Shen Liu, and Matthias Zwicker. 2019e. Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences. In AAAI. 126--133.","DOI":"10.1609\/aaai.v33i01.3301126"},
{"key":"e_1_3_2_2_17_1","unstructured":"Zhizhong Han, Xiyang Wang, Yu-Shen Liu, and Matthias Zwicker. 2019f. Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds from Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction. In ICCV."},
{"key":"e_1_3_2_2_18_1","unstructured":"Zhizhong Han, Xiyang Wang, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, and C.L. Philip Chen. 2019g. 3DViewGraph: Learning Global Features for 3D Shapes from A Graph of Unordered Views with Attention. In IJCAI."},
{"key":"e_1_3_2_2_19_1","volume-title":"Render4Completion: Synthesizing Multi-view Depth Maps for 3D Shape Completion. ArXiv","author":"Hu Tao","year":"2019","unstructured":"Tao Hu, Zhizhong Han, Abhinav Shrivastava, and Matthias Zwicker. 2019a. Render4Completion: Synthesizing Multi-view Depth Maps for 3D Shape Completion. ArXiv, Vol. abs\/1904.08366 (2019)."},
{"key":"e_1_3_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Tao Hu, Zhizhong Han, and Matthias Zwicker. 2020. 3D Shape Completion with Multi-view Consistent Inference. In AAAI.","DOI":"10.1609\/aaai.v34i07.6734"},
{"key":"e_1_3_2_2_21_1","volume-title":"Learning to Generate Dense Point Clouds with Textures on Multiple Categories. ArXiv","author":"Hu Tao","year":"2019","unstructured":"Tao Hu, Geng Lin, Zhizhong Han, and Matthias Zwicker. 2019b. Learning to Generate Dense Point Clouds with Textures on Multiple Categories. ArXiv, Vol. abs\/1912.10545 (2019)."},
{"key":"e_1_3_2_2_22_1","unstructured":"Qiuyuan Huang, Pengchuan Zhang, Dapeng Wu, and Lei Zhang. 2018. Turbo Learning for CaptionBot and DrawingBot. In Advances in Neural Information Processing Systems. 6455--6465."},
{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00133"},
{"key":"e_1_3_2_2_24_1","volume-title":"DenseCap: Fully Convolutional Localization Networks for Dense Captioning. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Johnson Justin","year":"2016","unstructured":"Justin Johnson, Andrej Karpathy, and Li Fei-Fei. 2016. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. In IEEE Conference on Computer Vision and Pattern Recognition."},
{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.702"},
{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2598339"},
{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},
{"key":"e_1_3_2_2_28_1","volume-title":"Proceedings of the ACL Workshop on Text Summarization Branches Out.","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the ACL Workshop on Text Summarization Branches Out."},
{"key":"e_1_3_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Xinhai Liu, Zhizhong Han, Yu-Shen Liu, and Matthias Zwicker. 2019a. Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network. In AAAI. 8778--8785.","DOI":"10.1609\/aaai.v33i01.33018778"},
{"key":"e_1_3_2_2_30_1","unstructured":"Xinhai Liu, Zhizhong Han, Wen Xin, Yu-Shen Liu, and Matthias Zwicker. 2019b. L2G Auto-encoder: Understanding Point Clouds by Local-to-Global Reconstruction with Hierarchical Self-Attention. In ACMMM."},
{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00459"},
{"key":"e_1_3_2_2_32_1","volume-title":"BLEU: A Method for Automatic Evaluation of Machine Translation. In Annual Meeting of the Association for Computational Linguistics. 311--318","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Annual Meeting of the Association for Computational Linguistics. 311--318."},
{"key":"e_1_3_2_2_33_1","volume-title":"DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Park Jeong Joon","year":"2019","unstructured":"Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In IEEE Conference on Computer Vision and Pattern Recognition."},
{"key":"e_1_3_2_2_34_1","volume-title":"PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems. 5105--5114","author":"Qi Charles Ruizhongtai","year":"2017","unstructured":"Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems. 5105--5114."},
{"key":"e_1_3_2_2_35_1","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems. 91--99."},
{"key":"e_1_3_2_2_36_1","volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence","author":"Shen Xu","year":"2018","unstructured":"Xu Shen, Xinmei Tian, Jun Xing, Yong Rui, and Dacheng Tao. 2018. Sequence-to-Sequence Learning via Shared Latent Representation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2--7, 2018. https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/16071"},
{"key":"e_1_3_2_2_37_1","unstructured":"K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, Vol. abs\/1409.1556 (2014)."},
{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.434"},
{"key":"e_1_3_2_2_39_1","volume-title":"Cross-Modal Retrieval with Implicit Concept Association. CoRR","author":"Song Yale","year":"2018","unstructured":"Yale Song and Mohammad Soleymani. 2018. Cross-Modal Retrieval with Implicit Concept Association. CoRR, Vol. abs\/1804.04318 (2018). arxiv: 1804.04318 http:\/\/arxiv.org\/abs\/1804.04318"},
{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},
{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},
{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},
{"key":"e_1_3_2_2_43_1","volume-title":"Object Counts! Bringing Explicit Detections Back into Image Captioning. In NAACL-HLT. 2180--2193","author":"Wang Josiah","year":"2018","unstructured":"Josiah Wang, Pranava Swaroop Madhyastha, and Lucia Specia. 2018. Object Counts! Bringing Explicit Detections Back into Image Captioning. In NAACL-HLT. 2180--2193."},
{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413829"},
{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00201"},
{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.29"},
{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2698200"},
{"key":"e_1_3_2_2_48_1","unstructured":"Xuwang Yin and Vicente Ordonez. 2017. Obj2Text: Generating Visually Descriptive Language from Object Layouts. In EMNLP. 177--187."},
{"key":"e_1_3_2_2_49_1","volume-title":"Image Captioning with Semantic Attention. In IEEE Conference on Computer Vision and Pattern Recognition. 4651--4659","author":"You Quanzeng","year":"2016","unstructured":"Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image Captioning with Semantic Attention. In IEEE Conference on Computer Vision and Pattern Recognition. 4651--4659."}
],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seattle WA USA","acronym":"MM '20"},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413889","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413889","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:32:06Z","timestamp":1750195926000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413889"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":49,"alternative-id":["10.1145\/3394171.3413889","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3413889","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}