{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T10:06:41Z","timestamp":1778753201188,"version":"3.51.4"},"reference-count":42,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T00:00:00Z","timestamp":1672185600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["62171451"],"award-info":[{"award-number":["62171451"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["2020JJ5671"],"award-info":[{"award-number":["2020JJ5671"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Hunan Province of China","award":["62171451"],"award-info":[{"award-number":["62171451"]}]},{"name":"Natural Science Foundation of Hunan Province of China","award":["2020JJ5671"],"award-info":[{"award-number":["2020JJ5671"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Cross-spectral local feature matching between visual and thermal images benefits many vision tasks in low-light environments, including image-to-image fusion and camera re-localization. An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of visible\u2013thermal matching is the availability of large-scale and high-quality annotated datasets. 
However, publicly available datasets are either relatively small in scale or have limited pose annotations because data acquisition and annotation are expensive, which severely hinders the development of this field. In this paper, we propose a multi-view thermal\u2013visible image dataset for large-scale cross-spectral matching. We first recover a 3D reference model from a group of collected RGB images, in which a certain image (the bridge) shares almost the same pose as the thermal query. We then register the thermal image to the model by manually annotating a 2D-2D tie point between the bridge and the thermal image. In this way, by annotating a single same-viewpoint image pair, numerous overlapping thermal\u2013visible image pairs become available. We also propose a semi-automatic approach for generating accurate supervision for training multi-view cross-spectral matching. Specifically, our dataset consists of 40,644 well-supervised cross-modal pairs, covering multiple complex scenes. In addition, we provide the camera metadata, the 3D reference model, depth maps of the visible images, and the 6-DoF poses of all images. We extensively evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results. 
We will publish our dataset and pre-processing code.<\/jats:p>","DOI":"10.3390\/rs15010174","type":"journal-article","created":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T02:52:21Z","timestamp":1672282341000},"page":"174","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["A Multi-View Thermal\u2013Visible Image Dataset for Cross-Spectral Matching"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5565-8297","authenticated-orcid":false,"given":"Yuxiang","family":"Liu","sequence":"first","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shen","family":"Yan","sequence":"additional","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jikun","family":"Zhong","sequence":"additional","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Peng","sequence":"additional","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maojun","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Systems and Engineering, National University of Defense Technology, Changsha 437100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1109\/ICCV.1999.790410","article-title":"Object recognition from local scale-invariant features","volume":"Volume 2","author":"Lowe","year":"1999","journal-title":"Proceedings of the Seventh IEEE International Conference on Computer Vision"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, Z., and Snavely, N. (2018, January 18\u201323). Megadepth: Learning single-view depth prediction from internet photos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00218"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., and Nie\u00dfner, M. (2017, January 21\u201326). Scannet: Richly-annotated 3d reconstructions of indoor scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.261"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, X., Ye, P., and Xiao, G. (2020, January 14\u201319). VIFB: A visible and infrared image fusion benchmark. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00060"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.dib.2017.09.038","article-title":"The TNO multiband image data collection","volume":"15","author":"Toet","year":"2017","journal-title":"Data Brief"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jia, X., Zhu, C., Li, M., Tang, W., and Zhou, W. (2021, January 10\u201317). LLVIP: A Visible-infrared Paired Dataset for Low-light Vision. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00389"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-motion revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.cviu.2006.06.010","article-title":"Background-subtraction using contour-based fusion of thermal and visible imagery","volume":"106","author":"Davis","year":"2007","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1007\/s11045-017-0548-y","article-title":"A visible-light and infrared video database for performance evaluation of video\/image fusion methods","volume":"30","author":"Ellmauthaler","year":"2019","journal-title":"Multidimens. Syst. Signal Process."},{"key":"ref_10","unstructured":"INO (2022, October 17). Available online: https:\/\/www.ino.ca\/en\/technologies\/video-analytics-dataset\/."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Aguilera, C.A., Sappa, A.D., and Toledo, R. (2015, January 27\u201330). LGHD: A feature descriptor for matching across non-linear intensity variations. 
Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7350783"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1109\/JSTSP.2012.2204036","article-title":"Multimodal stereo vision system: 3D data extraction and algorithm evaluation","volume":"6","author":"Campo","year":"2012","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1016\/j.patrec.2012.08.009","article-title":"Multispectral piecewise planar stereo using Manhattan-world assumption","volume":"34","author":"Barrera","year":"2013","journal-title":"Pattern Recognit. Lett."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"12661","DOI":"10.3390\/s120912661","article-title":"Multispectral image feature points","volume":"12","author":"Aguilera","year":"2012","journal-title":"Sensors"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/LGRS.2018.2867635","article-title":"A local feature descriptor based on combination of structure and texture information for multispectral image matching","volume":"16","author":"Fu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Fu, Z., Qin, Q., Luo, B., Sun, H., and Wu, C. (2018). HOMPC: A local feature descriptor based on the combination of magnitude and phase congruency information for multi-sensor remote sensing images. 
Remote Sens., 10.","DOI":"10.3390\/rs10081234"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"105326","DOI":"10.1016\/j.dib.2020.105326","article-title":"Dataset of thermal and visible aerial images for multi-modal and multi-spectral image registration and fusion","volume":"29","year":"2020","journal-title":"Data Brief"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chen, D.M., Baatz, G., K\u00f6ser, K., Tsai, S.S., Vedantham, R., Pylv\u00e4n\u00e4inen, T., Roimela, K., Chen, X., Bach, J., and Pollefeys, M. (2011, January 20\u201325). City-scale landmark identification on mobile devices. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition CVPR 2011, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995610"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Irschara, A., Zach, C., Frahm, J.M., and Bischof, H. (2009, January 20\u201325). From structure-from-motion point clouds to fast location recognition. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206587"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sattler, T., Torii, A., Sivic, J., Pollefeys, M., Taira, H., Okutomi, M., and Pajdla, T. (2017, January 21\u201326). Are large-scale 3d models really necessary for accurate visual localization?. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.654"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1177\/0278364915614638","article-title":"University of Michigan North Campus long-term vision and lidar dataset","volume":"35","author":"Ushani","year":"2016","journal-title":"Int. J. Robot. Res."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sattler, T., Maddern, W., Toft, C., Torii, A., Hammarstrand, L., Stenborg, E., Safari, D., Okutomi, M., Pollefeys, M., and Sivic, J. 
(2018, January 18\u201323). Benchmarking 6dof outdoor visual localization in changing conditions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00897"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Firmenichy, D., Brown, M., and S\u00fcsstrunk, S. (2011, January 11\u201314). Multispectral interest points for RGB-NIR image registration. Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium.","DOI":"10.1109\/ICIP.2011.6115818"},{"key":"ref_24","unstructured":"Maddern, W., and Vidas, S. (2012). Towards robust night and day place recognition using visible and thermal imaging. RSS 2012 Workshop: Beyond Laser and Vision: Alternative Sensing Techniques for Robotic Perception, University of Sydney."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3690","DOI":"10.3390\/s140203690","article-title":"Feature point descriptors: Infrared and visible spectra","volume":"14","author":"Ricaurte","year":"2014","journal-title":"Sensors"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Sappa, A.D., Carvajal, J.A., Aguilera, C.A., Oliveira, M., Romero, D., and Vintimilla, B.X. (2016). Wavelet-based visible and infrared image fusion: A comparative study. Sensors, 16.","DOI":"10.3390\/s16060861"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Johansson, J., Solli, M., and Maki, A. (2016, January 11\u201314). An evaluation of local feature detectors and descriptors for infrared images. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-49409-8_59"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Bonardi, F., Ainouz, S., Boutteau, R., Dupuis, Y., Savatier, X., and Vasseur, P. (2017). PHROG: A multimodal feature for place recognition. 
Sensors, 17.","DOI":"10.3390\/s17051167"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Vidas, S., and Sridharan, S. (2012, January 5\u20137). Hand-held monocular slam in thermal-infrared. Proceedings of the 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV), Guangzhou, China.","DOI":"10.1109\/ICARCV.2012.6485270"},{"key":"ref_30","unstructured":"Maddern, W., Stewart, A., McManus, C., Upcroft, B., Churchill, W., and Newman, P. (June, January 31). Illumination invariant imaging: Applications in robust vision-based localisation, mapping and classification for autonomous vehicles. Proceedings of the Visual Place Recognition in Changing Environments Workshop, IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2205","DOI":"10.1109\/TITS.2016.2515625","article-title":"Practical infrared visual odometry","volume":"17","author":"Borges","year":"2016","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., and Sattler, T. (2019, January 15\u201320). D2-net: A trainable cnn for joint description and detection of local features. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00828"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sun, J., Shen, Z., Wang, Y., Bao, H., and Zhou, X. (2021, January 20\u201325). LoFTR: Detector-free local feature matching with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00881"},{"key":"ref_34","unstructured":"Tang, S., Zhang, J., Zhu, S., and Tan, P. (2022). QuadTree Attention for Vision Transformers. 
arXiv."},{"key":"ref_35","unstructured":"Aguilera, C.A., Aguilera, F.J., Sappa, A.D., Aguilera, C., and Toledo, R. (July, January 26). Learning cross-spectral similarity measures with deep convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_36","first-page":"1","article-title":"Cross-modality image matching network with modality-invariant feature representation for airborne-ground thermal infrared and visible datasets","volume":"60","author":"Cui","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Unagar, A., Larsson, M., Germain, H., Toft, C., Larsson, V., Pollefeys, M., Lepetit, V., Hammarstrand, L., and Kahl, F. (2021, January 20\u201325). Back to the feature: Learning robust camera localization from pixels to pose. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00326"},{"key":"ref_38","first-page":"726","article-title":"Random Sample Consensus: A paradigm for model fitting with application to image analysis and automated cartography","volume":"24","author":"Mach","year":"1981","journal-title":"Readings Comput. Vis."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Kneip, L., Scaramuzza, D., and Siegwart, R. (2011, January 20\u201325). A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition CVPR 2011, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995464"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/s11263-012-0601-0","article-title":"Rotation averaging","volume":"103","author":"Hartley","year":"2013","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Cadena, C., Siegwart, R., and Dymczyk, M. (2019, January 15\u201320). From Coarse to Fine: Robust Hierarchical Localization at Large Scale. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition CVPR, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01300"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., DeTone, D., Malisiewicz, T., and Rabinovich, A. (2020, January 13\u201319). SuperGlue: Learning Feature Matching with Graph Neural Networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition CVPR, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00499"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/1\/174\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:54:36Z","timestamp":1760147676000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/1\/174"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,28]]},"references-count":42,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["rs15010174"],"URL":"https:\/\/doi.org\/10.3390\/rs15010174","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,28]]}}}