{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T10:48:08Z","timestamp":1770720488409,"version":"3.49.0"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272016 and 62372018"],"award-info":[{"award-number":["62272016 and 62372018"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Institute of Artificial Intelligence, and Special Academic Collaborative Research Projects between BJUT and NTUT","award":["NTUT-BJUT-114-03"],"award-info":[{"award-number":["NTUT-BJUT-114-03"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>Recent advancements in 3D scanning technology have spurred extensive research on indoor scene point clouds. However, point clouds captured by 3D sensors are often sparse, noisy, and irregular, posing significant challenges for downstream tasks. While learning-based upsampling methods have achieved promising results, they often struggle with large-scale indoor scenes due to the high computational cost of K-Nearest Neighbor (KNN) search and insufficient reconstruction of fine-grained structures. To address these issues, we propose S<jats:sup>2<\/jats:sup>PU-Net, a sparse semantic-guided progressive point cloud upsampling network for indoor scenes. Specifically, we introduce a progressive upsampling mechanism based on sparse tensors, where a progressive Point Generation Block is designed to generate and prune intermediate point clouds iteratively. 
By employing multiple generative deconvolutions and adaptive pruning with different coefficients, our method progressively reconstructs the target point cloud while preserving fine-grained local structures. Furthermore, we integrate semantic information into each layer of the hourglass encoder through a Sparse Semantic Embedding (SSE) module, enhancing point feature extraction. To further strengthen feature representation, we propose a Multi-Scale SSE (MSSE) module, which refines the fused features across spatial and channel dimensions. Extensive experiments on multiple indoor scene datasets demonstrate that our method consistently outperforms state-of-the-art approaches across most evaluation metrics.<\/jats:p>","DOI":"10.1145\/3778045","type":"journal-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T14:49:34Z","timestamp":1765205374000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["S<sup>2<\/sup>PU-Net: Sparse Semantic-Guided Progressive Point Cloud Upsampling for Indoor Scenes"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-2475-2982","authenticated-orcid":false,"given":"Yilin","family":"Hou","sequence":"first","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5437-3150","authenticated-orcid":false,"given":"Jin","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0703-3632","authenticated-orcid":false,"given":"Jiade","family":"Chen","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8480-0536","authenticated-orcid":false,"given":"Yunhui","family":"Shi","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5741-7937","authenticated-orcid":false,"given":"Nam","family":"Ling","sequence":"additional","affiliation":[{"name":"Santa Clara University, Santa Clara, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3121-1823","authenticated-orcid":false,"given":"Baocai","family":"Yin","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2026,2,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs13193844"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3180904"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2003.1175093"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.23919\/IConAC.2017.8082057"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.170"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.buildenv.2021.108675"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681562"},{"key":"e_1_3_1_9_2","unstructured":"Jaesung Choe Byeongin Joung Francois Rameau Jaesik Park and In So Kweon. 2021. Deep point cloud reconstruction. arXiv:2111.11704. Retrieved from https:\/\/arxiv.org\/abs\/2111.11704"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00319"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.261"},{"key":"e_1_3_1_12_2","first-page":"586","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Du Hang","year":"2022","unstructured":"Hang Du, Xuejun Yan, Jingjing Wang, Di Xie, and Shiliang Pu. 2022. Point cloud upsampling via cascaded refinement network. 
In Proceedings of the Asian Conference on Computer Vision, 586\u2013601."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Dipesh Gyawali Jian Zhang and Bijaya B. Karki. 2024. Region-transformer: Self-attention region based class-agnostic point cloud segmentation. arXiv:2403.01407. Retrieved from https:\/\/arxiv.org\/abs\/2403.01407","DOI":"10.5220\/0012424500003660"},{"key":"e_1_3_1_14_2","unstructured":"Qingdong He Jinlong Peng Zhengkai Jiang Xiaobin Hu Jiangning Zhang Qiang Nie Yabiao Wang and Chengjie Wang. 2024. PointSeg: A training-free paradigm for 3D scene segmentation via foundation models. arXiv:2403.06403. Retrieved from https:\/\/arxiv.org\/abs\/2403.06403"},{"key":"e_1_3_1_15_2","first-page":"349","volume-title":"Proceedings of the European Conference on Computer Vision","author":"He Shuting","year":"2024","unstructured":"Shuting He, Henghui Ding, Xudong Jiang, and Bihan Wen. 2024. SegPoint: Segment any point cloud via large language model. In Proceedings of the European Conference on Computer Vision. Springer, 349\u2013367."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00518"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01112"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618522"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2421636.2421645"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3139131.3139135"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01979"},{"key":"e_1_3_1_22_2","unstructured":"Chade Li Pengju Zhang and Yihong Wu. 2024. Density-aware global-local attention network for point cloud segmentation. arXiv:2412.00489. 
Retrieved from https:\/\/arxiv.org\/abs\/2412.00489"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00730"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00041"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3160604"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276405"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746373"},{"key":"e_1_3_1_28_2","article-title":"PU-Mask: 3D point cloud upsampling via an implicit virtual mask","author":"Liu Hao","year":"2024","unstructured":"Hao Liu, Hui Yuan, Raouf Hamzaoui, Qi Liu, and Shuai Li. 2024. PU-Mask: 3D point cloud upsampling via an implicit virtual mask. IEEE Transactions on Circuits and Systems for Video Technology.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-022-3586-y"},{"key":"e_1_3_1_30_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 652\u2013660."},{"key":"e_1_3_1_31_2","first-page":"5099","article-title":"PointNet++: Deep hierarchical feature learning on point sets in a metric space","volume":"30","author":"Ruizhongtai Qi Charles","year":"2017","unstructured":"Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. 
In Advances in Neural Information Processing Systems 30, 5099\u20135108.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01151"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58529-7_44"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3137794"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00180"},{"key":"e_1_3_1_36_2","first-page":"2475","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Qiu Shi","year":"2022","unstructured":"Shi Qiu, Saeed Anwar, and Nick Barnes. 2022. PU-Transformer: Point cloud upsampling transformer. In Proceedings of the Asian Conference on Computer Vision, 2475\u20132493."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01951"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01989"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2022.104422"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2017.8296925"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3613807"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/DCC50243.2021.00015"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2025.3568001"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00070"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3326362"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2021.06.006"},{"key":"e_1_3_1_47_2","unstructured":"Tong Wu Liang Pan Junzhe Zhang Tai Wang Ziwei Liu and Dahua Lin. 2021. Density-aware chamfer distance as a comprehensive metric for point cloud completion. arXiv:2111.12702. 
Retrieved from https:\/\/arxiv.org\/abs\/2111.12702"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00463"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2025.3550011"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2021.3058311"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00008"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00611"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_24"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00295"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2015.2391296"},{"key":"e_1_3_1_56_2","unstructured":"Yuchen Zhou Jiayuan Gu Tung Yen Chiang Fanbo Xiang and Hao Su. 2024. Point-SAM: Promptable 3D segmentation model for point clouds. arXiv:2406.17741. Retrieved from https:\/\/arxiv.org\/abs\/2406.17741"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2013.2294314"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW56347.2022.00131"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3778045","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,9]],"date-time":"2026-02-09T14:56:36Z","timestamp":1770648996000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3778045"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,9]]},"references-count":57,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3778045"],"URL":"https:\/\/doi.org\/10.1145\/3778045","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,9]]},"assertion":[{"value":"2025-05-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}