{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T09:23:27Z","timestamp":1780392207718,"version":"3.54.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Apart from coarse semantic class prediction and bounding box regression as in traditional 3D object detection, 3D dense captioning aims at producing a further and finer instance-level label of natural language description on visual appearance and spatial relations for each scene object of interest. To detect and describe objects in a scene, following the spirit of neural machine translation, we propose a transformer-based encoder-decoder architecture, namely SpaCap3D, to transform objects into descriptions, where we especially investigate the relative spatiality of objects in 3D scenes and design a spatiality-guided encoder via a token-to-token spatial relation learning objective and an object-centric decoder for precise and spatiality-enhanced object caption generation. Evaluated on two benchmark datasets, ScanRefer and ReferIt3D, our proposed SpaCap3D outperforms the baseline method Scan2Cap by 4.94% and 9.61% in CIDEr@0.5IoU, respectively. Our project page with source code and supplementary files is available at https:\/\/SpaCap3D.github.io\/.<\/jats:p>","DOI":"10.24963\/ijcai.2022\/194","type":"proceedings-article","created":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T22:55:56Z","timestamp":1657925756000},"page":"1393-1400","source":"Crossref","is-referenced-by-count":33,"title":["Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds"],"prefix":"10.24963","author":[{"given":"Heng","family":"Wang","sequence":"first","affiliation":[{"name":"University of Sydney"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chaoyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Sydney"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianhui","family":"Yu","sequence":"additional","affiliation":[{"name":"University of Sydney"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Weidong","family":"Cai","sequence":"additional","affiliation":[{"name":"University of Sydney"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"10584","event":{"name":"Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}","theme":"Artificial Intelligence","location":"Vienna, Austria","acronym":"IJCAI-2022","number":"31","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2022,7,23]]},"end":{"date-parts":[[2022,7,29]]}},"container-title":["Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T07:08:16Z","timestamp":1658128096000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2022\/194"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2022\/194","relation":{},"subject":[],"published":{"date-parts":[[2022,7]]}}}