{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T16:33:34Z","timestamp":1776357214329,"version":"3.51.2"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2019YFB1707503"],"award-info":[{"award-number":["2019YFB1707503"]}]},{"name":"Aeronautical Science Foundation of China","award":["2019ZE052008"],"award-info":[{"award-number":["2019ZE052008"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61772267"],"award-info":[{"award-number":["61772267"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"crossref","award":["BK20190016"],"award-info":[{"award-number":["BK20190016"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,1,31]]},"abstract":"<jats:p>In this article, we propose a Multi-feature Fusion VoteNet (MFFVoteNet) framework for improving the 3D object detection performance in cluttered and heavily occluded scenes. Our method takes the point cloud and the synchronized RGB image as inputs to provide object detection results in 3D space. Our detection architecture is built on VoteNet with three key designs. First, we augment the VoteNet input with point color information to enhance the difference of various instances in a scene. Next, we integrate an image feature module into the VoteNet to provide a strong object class signal that can facilitate deterministic detections in occlusion. Moreover, we propose a Projection Non-Maximum Suppression (PNMS) method in 3D object detection to eliminate redundant proposals and hence provide more accurate positioning of 3D objects. We evaluate the proposed MFFVoteNet on two challenging 3D object detection datasets, i.e., ScanNetv2 and SUN RGB-D. Extensive experiments show that our framework can effectively improve the performance of 3D object detection.<\/jats:p>","DOI":"10.1145\/3462219","type":"journal-article","created":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T19:44:21Z","timestamp":1643312661000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["Multi-feature Fusion VoteNet for 3D Object Detection"],"prefix":"10.1145","volume":"18","author":[{"given":"Zhoutao","family":"Wang","sequence":"first","affiliation":[{"name":"Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qian","family":"Xie","sequence":"additional","affiliation":[{"name":"Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingqiang","family":"Wei","sequence":"additional","affiliation":[{"name":"Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Long","sequence":"additional","affiliation":[{"name":"Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,1,27]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.593"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00047"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.236"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969287"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.691"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DIMPVT.2012.53"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509682"},{"key":"e_1_3_1_10_2","article-title":"Synthesizing training data for object detection in indoor scenes","author":"Georgakis Georgios","year":"2017","unstructured":"Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, and Jana Kosecka. 2017. Synthesizing training data for object detection in indoor scenes. arXiv preprint arXiv:1702.07836 (2017).","journal-title":"arXiv preprint arXiv:1702.07836"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2017.01.017"},{"key":"e_1_3_1_12_2","article-title":"Seq-NMS for video object detection","author":"Han Wei","year":"2016","unstructured":"Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, and Thomas S. Huang. 2016. Seq-NMS for video object detection. arXiv preprint arXiv:1602.08465 (2016).","journal-title":"arXiv preprint arXiv:1602.08465"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2487860"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00455"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12665-019-8516-5"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.495"},{"key":"e_1_3_1_20_2","article-title":"Vehicle detection from 3D lidar using fully convolutional network","author":"Li Bo","year":"2016","unstructured":"Bo Li, Tianlei Zhang, and Tian Xia. 2016. Vehicle detection from 3D lidar using fully convolutional network. arXiv preprint arXiv:1608.07916 (2016).","journal-title":"arXiv preprint arXiv:1608.07916"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.179"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.3390\/s19194188"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.3390\/s20020532"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/QoMEX48832.2020.9123147"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7899697"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.25"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00446"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00937"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00102"},{"key":"e_1_3_1_32_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652\u2013660."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295263"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.169"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/932479"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00086"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.94"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2012.07.009"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cagd.2016.02.012"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.12187"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cad.2014.01.003"},{"key":"e_1_3_1_43_2","first-page":"164","volume-title":"Computer Graphics Forum","author":"Wang Jun","year":"2013","unstructured":"Jun Wang, Z. Yu, W Zhu, and J. Cao. 2013. Feature-preserving surface reconstruction from unoriented, noisy point data. In Computer Graphics Forum, Vol. 32. Wiley Online Library, 164\u2013176."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ECMR.2019.8870914"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2014.6836101"},{"key":"e_1_3_1_47_2","first-page":"10447","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Xie Qian","year":"2020","unstructured":"Qian Xie, Yu-Kun Lai, Jing Wu, Zhoutao Wang, Yiming Zhang, Kai Xu, and Jun Wang. 2020. MLVCNet: Multi-level context votenet for 3D object detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10447\u201310456."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2751965"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2234731"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cad.2017.07.005"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00407"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2284755"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2014.2311377"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2014.2336697"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cag.2013.05.008"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00472"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462219","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3462219","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:54Z","timestamp":1750193334000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462219"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,27]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,31]]}},"alternative-id":["10.1145\/3462219"],"URL":"https:\/\/doi.org\/10.1145\/3462219","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,27]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}