{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T04:29:57Z","timestamp":1774326597779,"version":"3.50.1"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2024,10,29]],"date-time":"2024-10-29T00:00:00Z","timestamp":1730160000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272461, 62172417, 62276266, 62277046"],"award-info":[{"award-number":["62272461, 62172417, 62276266, 62277046"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"crossref","award":["BK20201346"],"award-info":[{"award-number":["BK20201346"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"crossref"}]},{"name":"\u201cDouble First-Class\u201d Project of China University of Mining and Technology for Independent Innovation and Social Service","award":["2022ZZCX06"],"award-info":[{"award-number":["2022ZZCX06"]}]},{"DOI":"10.13039\/501100010014","name":"Six Talent Peaks Project in Jiangsu Province","doi-asserted-by":"crossref","award":["2015-DZXX-010, 2018-XYDXX-044"],"award-info":[{"award-number":["2015-DZXX-010, 2018-XYDXX-044"]}],"id":[{"id":"10.13039\/501100010014","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>Point cloud segmentation is essential for scene understanding, which provides advanced information for many applications, such as autonomous driving, robots, and virtual reality. To improve the accuracy and robustness of point cloud segmentation, many researchers have attempted to fuze camera images to complement the color and texture information. The common fusion strategy is the combination of convolutional operations with concatenation, element-wise addition or element-wise multiplication. However, conventional convolutional operators tend to confine the fusion of modal features within their receptive fields, which can be incomplete and limited. In addition, the inability of encoder\u2013decoder segmentation networks to explicitly perceive segmentation boundary information results in semantic ambiguity and classification errors at object edges. These errors are further amplified in point cloud segmentation tasks, significantly affecting the accuracy of point cloud segmentation. To address the above issues, we propose a novel self-attention multi-modal fusion semantic segmentation network for point cloud semantic segmentation. Firstly, to effectively fuze different modal features, we propose a self-cross fusion module (SCF), which models long-range modality dependencies and transfers complementary image information to the point cloud to fully leverage the modality-specific advantages. Secondly, we design the salience refinement module (SR), which calculates the importance of channels in the feature maps and global descriptors to enhance the representation capability of salient modal features. Finally, we propose the local-aware anisotropy loss measure the element-level importance in the data and explicitly provide boundary information for the model, which alleviates the inherent semantic ambiguity problem in segmentation networks. Extensive experiments on two benchmark datasets demonstrate that our proposed method surpasses current state-of-the-art methods.<\/jats:p>","DOI":"10.1145\/3674979","type":"journal-article","created":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T13:51:58Z","timestamp":1719841918000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Multi-Modal LiDAR Point Cloud Semantic Segmentation with Salience Refinement and Boundary Perception"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6207-0299","authenticated-orcid":false,"given":"Yong","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5366-5631","authenticated-orcid":false,"given":"Zeming","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3564-5090","authenticated-orcid":false,"given":"Jiaqi","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9234-0912","authenticated-orcid":false,"given":"Wenliang","family":"Du","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2734-915X","authenticated-orcid":false,"given":"Rui","family":"Yao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7690-8547","authenticated-orcid":false,"given":"Abdulmotaleb","family":"El Saddik","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada"}]}],"member":"320","published-online":{"date-parts":[[2024,10,29]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2644615"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00939"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00464"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00584"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_1_8_2","unstructured":"Liang-Chieh Chen George Papandreou Florian Schroff and Hartwig Adam. 2017b. Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587. Retrieved from https:\/\/arxiv.org\/abs\/1706.05587."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-64559-5_16"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2019.8917447"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3476514"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00326"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3005434"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.70"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01112"},{"key":"e_1_3_1_20_2","unstructured":"Ye Huang Wenjing Jia Xiangjian He Liu Liu Yuxin Li and Dacheng Tao. 2021. CAA: Channelized axial attention for semantic segmentation. arXiv:2101.07434. Retrieved from https:\/\/arxiv.org\/abs\/2101.07434"},{"key":"e_1_3_1_21_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Retrieved from https:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093584"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2953639"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.3015992"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.549"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8967762"},{"key":"e_1_3_1_29_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017a. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652\u2013660."},{"key":"e_1_3_1_30_2","first-page":"5105","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Qi Charles Ruizhongtai","year":"2017","unstructured":"Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. 2017b. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Proceedings of the Advances in Neural Information Processing Systems, Vol. 30. 5105\u20135114."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58604-1_41"},{"key":"e_1_3_1_34_2","first-page":"6000","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30. 6000\u20136010."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3462219"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_1_37_2","first-page":"12077","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"34","author":"Xie Enze","year":"2021","unstructured":"Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. 2021. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 34. 12077\u201312090."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-03243-2_712-1"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19815-1_39"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3321512"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2018.04.022"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00962"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2022.3194213"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00981"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01597"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674979","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674979","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:56Z","timestamp":1750291556000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674979"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,29]]},"references-count":45,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3674979"],"URL":"https:\/\/doi.org\/10.1145\/3674979","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,29]]},"assertion":[{"value":"2023-09-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-04","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}