{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T05:24:57Z","timestamp":1761110697768,"version":"3.41.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"12","license":[{"start":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T00:00:00Z","timestamp":1732579200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272343"],"award-info":[{"award-number":["62272343"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shuguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission","award":["21SG23"],"award-info":[{"award-number":["21SG23"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>\n            Estimating the relative pose between a camera and a LiDAR holds paramount importance in facilitating complex task execution within multi-agent systems. Nonetheless, current methodologies encounter two primary limitations. First, amid the cross-modal feature extraction, they typically employ separate modal branches to extract cross-modal features from images and point clouds. This approach results in the feature spaces of images and point clouds being misaligned, thereby reducing the robustness of establishing correspondences. Second, due to the scale differences between images and point clouds, one-to-many pixel-point correspondences are inevitably encountered, which will mislead the pose optimization. 
To address these challenges, we propose a framework named\n            <jats:bold>I<\/jats:bold>\n            mage-to-\n            <jats:bold>P<\/jats:bold>\n            oint cloud registration by learning the underlying alignment feature space from\n            <jats:bold>P<\/jats:bold>\n            ixel-to-\n            <jats:bold>P<\/jats:bold>\n            oint\n            <jats:bold>SIM<\/jats:bold>\n            ilarities\n            <jats:bold>\n              (I2P\n              <jats:inline-formula content-type=\"math\/tex\">\n                <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({}_{\\mathbf{ppsim}}\\)<\/jats:tex-math>\n              <\/jats:inline-formula>\n              )\n            <\/jats:bold>\n            . Central to\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\text{I2P}_{\\text{ppsim}}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            is a Shared Feature Alignment Module (SFAM). It is designed on a coarse-to-fine architecture and uses a weight-sharing network to construct an alignment feature space. Benefiting from SFAM,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\text{I2P}_{\\text{ppsim}}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            can effectively identify the co-view regions between images and point clouds and establish high-reliability 2D-3D correspondences. Moreover, to mitigate the one-to-many correspondence issue, we introduce a similarity maximization strategy termed point-max. This strategy effectively filters out outliers, thereby establishing accurate 2D-3D correspondences. To evaluate the efficacy of our framework, we conduct extensive experiments on KITTI Odometry and Oxford RobotCar. 
The results corroborate the effectiveness of our framework in improving image-to-point cloud registration. To make our results reproducible, the source code has been released at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"url\" xlink:href=\"https:\/\/cslinzhang.github.io\/I2P\">https:\/\/cslinzhang.github.io\/I2P<\/jats:ext-link>\n          <\/jats:p>","DOI":"10.1145\/3697839","type":"journal-article","created":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T13:28:12Z","timestamp":1727443692000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["I2P Registration by Learning the Underlying Alignment Feature Space from Pixel-to-Point Similarities"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3926-540X","authenticated-orcid":false,"given":"Yunda","family":"Sun","sequence":"first","affiliation":[{"name":"School of Software Engineering, Tongji University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4360-5523","authenticated-orcid":false,"given":"Lin","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Software Engineering, Tongji University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6206-526X","authenticated-orcid":false,"given":"Zhong","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Automation, Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8187-2000","authenticated-orcid":false,"given":"Yang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Software Engineering, Tongji University, Shanghai, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4301-394X","authenticated-orcid":false,"given":"Shengjie","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Software Engineering, Tongji University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4487-6384","authenticated-orcid":false,"given":"Yicong","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Science, University of Macau, Macau, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,26]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00733"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01560"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.16"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00624"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19824-3_2"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01287"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00905"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00060"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00034"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794415"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00878"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2011.2163983"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI
":"10.1109\/CVPR46437.2021.00425"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2021.3058502"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3183899"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2837226"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01968"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2013.2248594"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-008-0152-6"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00979"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01570"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364916679498"},{"key":"e_1_3_1_27_2","first-page":"8024","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, 8024\u20138035."},{"key":"e_1_3_1_28_2","first-page":"5105","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. 
In Proceedings of the Advances in Neural Information Processing Systems, 5105\u20135114."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3259038"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/83.506761"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3208859"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21620"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00499"},{"key":"e_1_3_1_34_2","unstructured":"Vinit Sarode Xueqian Li Hunter Goforth Yasuhiro Aoki Rangaprasad Arun Srivatsan Simon Lucey and Howie Choset. 2019. PCRNet: Point cloud registration network using pointnet encoding. arXiv:1908.07906. Retrieved from https:\/\/arxiv.org\/abs\/1908.07906"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00881"},{"key":"e_1_3_1_36_2","first-page":"6000","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, 6000\u20136010."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00362"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3237328"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3284591"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2023.3304144"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3213592"},{"key":"e_1_3_1_42_2","first-page":"1","volume-title":"IEEE Trans. Neural Netw. Learn. Syst.","author":"Wu Yue","year":"2023","unstructured":"Yue Wu, Yue Zhang, Wenping Ma, Maoguo Gong, Xiaolong Fan, Mingyang Zhang, A. K. 
Qin, and Qiguang Miao. 2023. RORNet: Partial-to-partial registration network with reliable overlapping representations. IEEE Trans. Neural Netw. Learn. Syst. (2023), 1\u201314."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3132375"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs14246301"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3062811"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2023.3329578"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2014.X.007"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3505252"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01702"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_47"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3538648"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/1870121.1870125"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.104"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3697839","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3697839","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:30Z","timestamp":1750295850000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3697839"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,26]]},"references-count":52,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3697839"],"URL":"https:\/\/doi.org\/10.1145\/3697839","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2024,11,26]]},"assertion":[{"value":"2024-04-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-20","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}