{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T04:59:12Z","timestamp":1773118752162,"version":"3.50.1"},"reference-count":41,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key R&D Program of China","award":["2022YFF1302700"],"award-info":[{"award-number":["2022YFF1302700"]}]},{"name":"National Key R&D Program of China","award":["202303"],"award-info":[{"award-number":["202303"]}]},{"name":"National Key R&D Program of China","award":["QNTD202308"],"award-info":[{"award-number":["QNTD202308"]}]},{"name":"Emergency Open Competition Project of National Forestry and Grassland Administration","award":["2022YFF1302700"],"award-info":[{"award-number":["2022YFF1302700"]}]},{"name":"Emergency Open Competition Project of National Forestry and Grassland Administration","award":["202303"],"award-info":[{"award-number":["202303"]}]},{"name":"Emergency Open Competition Project of National Forestry and Grassland Administration","award":["QNTD202308"],"award-info":[{"award-number":["QNTD202308"]}]},{"name":"Outstanding Youth Team Project of Central Universities","award":["2022YFF1302700"],"award-info":[{"award-number":["2022YFF1302700"]}]},{"name":"Outstanding Youth Team Project of Central Universities","award":["202303"],"award-info":[{"award-number":["202303"]}]},{"name":"Outstanding Youth Team Project of Central Universities","award":["QNTD202308"],"award-info":[{"award-number":["QNTD202308"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Automatic recognition of species is important for the conservation and management of biodiversity. 
However, since closely related species are visually similar, it is difficult to distinguish them by images alone. In addition, traditional species-recognition models are limited by the size of the dataset and face the problem of poor generalization ability. Visual-language models such as Contrastive Language-Image Pretraining (CLIP), obtained by training on large-scale datasets, have excellent visual representation learning ability and demonstrated promising few-shot transfer ability in a variety of few-shot species recognition tasks. However, limited by the dataset on which CLIP is trained, the performance of CLIP is poor when used directly for few-shot species recognition. To improve the performance of CLIP for few-shot species recognition, we proposed a few-shot species-recognition method incorporating geolocation information. First, we utilized the powerful feature extraction capability of CLIP to extract image features and text features. Second, a geographic feature extraction module was constructed to provide additional contextual information by converting structured geographic location information into geographic feature representations. Then, a multimodal feature fusion module was constructed to deeply interact geographic features with image features to obtain enhanced image features through residual connection. Finally, the similarity between the enhanced image features and text features was calculated and the species recognition results were obtained. Extensive experiments on the iNaturalist 2021 dataset show that our proposed method can significantly improve the performance of CLIP\u2019s few-shot species recognition. Under ViT-L\/14 and 16-shot training species samples, compared to Linear probe CLIP, our method achieved a performance improvement of 6.22% (mammals), 13.77% (reptiles), and 16.82% (amphibians). 
Our work provides powerful evidence for integrating geolocation information into species-recognition models based on visual-language models.<\/jats:p>","DOI":"10.3390\/rs16122238","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T03:46:29Z","timestamp":1718855189000},"page":"2238","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["CLIP-Driven Few-Shot Species-Recognition Method for Integrating Geographic Information"],"prefix":"10.3390","volume":"16","author":[{"given":"Lei","family":"Liu","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China"}]},{"given":"Linzhe","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China"}]},{"given":"Feng","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China"},{"name":"Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China"},{"name":"State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1000-8455","authenticated-orcid":false,"given":"Feixiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China"},{"name":"Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China"},{"name":"State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China"}]},{"given":"Fu","family":"Xu","sequence":"additional","affiliation":[{"name":"School of 
Information Science and Technology, Beijing Forestry University, Beijing 100083, China"},{"name":"Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China"},{"name":"State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1111\/ele.13936","article-title":"Biodiversity Promotes Ecosystem Functioning despite Environmental Change","volume":"25","author":"Hong","year":"2022","journal-title":"Ecol. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"e2117297119","DOI":"10.1073\/pnas.2117297119","article-title":"Biodiversity Impacts and Conservation Implications of Urban Land Expansion Projected to 2050","volume":"119","author":"Simkin","year":"2022","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1098\/rstb.2003.1442","article-title":"Automated Species Identification: Why Not?","volume":"359","author":"Gaston","year":"2004","journal-title":"Philos. Trans. R. Soc. B Biol. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1007\/s41870-019-00379-7","article-title":"A Study on the Machine Learning Techniques for Automated Plant Species Identification: Current Trends and Challenges","volume":"13","author":"Bojamma","year":"2021","journal-title":"Int. J. Inf. Tecnol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1038\/s41467-022-27980-y","article-title":"Perspectives in Machine Learning for Wildlife Conservation","volume":"13","author":"Tuia","year":"2022","journal-title":"Nat. 
Commun."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"9453","DOI":"10.1002\/ece3.5410","article-title":"Wildlife Surveillance Using Deep Learning Methods","volume":"9","author":"Chen","year":"2019","journal-title":"Ecol. Evol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"12051","DOI":"10.1002\/ece3.7970","article-title":"An Approach to Rapid Processing of Camera Trap Images with Minimal Human Input","volume":"11","author":"Duggan","year":"2021","journal-title":"Ecol. Evol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.ecoinf.2017.07.004","article-title":"Towards Automatic Wild Animal Monitoring: Identification of Animal Species in Camera-Trap Images Using Very Deep Convolutional Neural Networks","volume":"41","author":"Salazar","year":"2017","journal-title":"Ecol. Inform."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1111\/1749-4877.12667","article-title":"Recognition of Big Mammal Species in Airborne Thermal Imaging Based on YOLO V5 Algorithm","volume":"18","author":"Xie","year":"2023","journal-title":"Integr. Zool."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Huang, S., Xu, Z., Tao, D., and Zhang, Y. (2016, January 27\u201330). Part-Stacked CNN for Fine-Grained Visual Categorization. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.132"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lv, X., Xia, H., Li, N., Li, X., and Lan, R. (2022). MFVT: Multilevel Feature Fusion Vision Transformer and RAMix Data Augmentation for Fine-Grained Visual Categorization. 
Electronics, 11.","DOI":"10.21203\/rs.3.rs-1800078\/v1"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"108419","DOI":"10.1016\/j.asoc.2022.108419","article-title":"Multi-Scale Sparse Network with Cross-Attention Mechanism for Image-Based Butterflies Fine-Grained Classification","volume":"117","author":"Li","year":"2022","journal-title":"Appl. Soft Comput."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"He, J., Kortylewski, A., and Yuille, A. (2023, January 2\u20137). CORL: Compositional Representation Learning for Few-Shot Classification. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00388"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"102237","DOI":"10.1016\/j.ecoinf.2023.102237","article-title":"A Few-Shot Rare Wildlife Image Classification Method Based on Style Migration Data Augmentation","volume":"77","author":"Zhang","year":"2023","journal-title":"Ecol. Informatics"},{"key":"ref_15","unstructured":"Snell, J., Swersky, K., and Zemel, R. (2017). Prototypical Networks for Few-Shot Learning. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Guo, Z., Zhang, L., Jiang, Y., Niu, W., Gu, Z., Zheng, H., Wang, G., and Zheng, B. (2020, January 5\u201314). Few-Shot Fish Image Generation and Classification. Proceedings of the Global Oceans 2020: Singapore\u2014U.S. Gulf Coast, Biloxi, MS, USA.","DOI":"10.1109\/IEEECONF38699.2020.9389005"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1149186","DOI":"10.3389\/fmars.2023.1149186","article-title":"Few-Shot Fine-Grained Fish Species Classification via Sandwich Attention CovaMNet","volume":"10","author":"Zhai","year":"2023","journal-title":"Front. Mar. Sci."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lu, J., Zhang, S., Zhao, S., Li, D., and Zhao, R. (2024). 
A Metric-Based Few-Shot Learning Method for Fish Species Identification with Limited Samples. Animals, 14.","DOI":"10.3390\/ani14050755"},{"key":"ref_19","unstructured":"Xu, S.-L., Zhang, F., Wei, X.-S., and Wang, J. (March, January 22). Dual Attention Networks for Few-Shot Fine-Grained Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA."},{"key":"ref_20","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1007\/s11263-023-01891-x","article-title":"CLIP-Adapter: Better Vision-Language Models with Feature Adapters","volume":"132","author":"Gao","year":"2023","journal-title":"Int. J. Comput. Vis."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2337","DOI":"10.1007\/s11263-022-01653-1","article-title":"Learning to Prompt for Vision-Language Models","volume":"130","author":"Zhou","year":"2022","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Avidan, S., Brostow, G., Ciss\u00e9, M., Farinella, G.M., and Hassner, T. (2022). Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification. Computer Vision\u2014ECCV 2022, Springer.","DOI":"10.1007\/978-3-031-20050-2"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Guo, Z., Zhang, R., Qiu, L., Ma, X., Miao, X., He, X., and Cui, B. (2023, January 7\u201314). CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i1.25152"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Parashar, S., Lin, Z., Li, Y., and Kong, S. 
(2023, January 6\u201310). Prompting Scientific Names for Zero-Shot Species Recognition. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.610"},{"key":"ref_26","unstructured":"Menon, S., and Vondrick, C. (2022, January 29). Visual Classification via Description from Large Language Models. Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Mac Aodha, O., Cole, E., and Perona, P. (2019, January 27). Presence-Only Geographical Priors for Fine-Grained Image Classification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00969"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1111\/2041-210X.13335","article-title":"Thinking like a Naturalist: Enhancing Computer Vision of Citizen Science Images by Harnessing Contextual Data","volume":"11","author":"Terry","year":"2020","journal-title":"Methods Ecol. Evol."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1016\/j.isprsjprs.2021.10.002","article-title":"Digital Taxonomist: Identifying Plant Species in Community Scientists\u2019 Photographs","volume":"182","author":"She","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yang, L., Li, X., Song, R., Zhao, B., Tao, J., Zhou, S., Liang, J., and Yang, J. (2022, January 18). Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01067"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, L., Han, B., Chen, F., Mou, C., and Xu, F. (2024). 
Utilizing Geographical Distribution Statistical Data to Improve Zero-Shot Species Recognition. Animals, 14.","DOI":"10.3390\/ani14121716"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_33","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2024, January 26). An Image Is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Available online: https:\/\/arxiv.org\/abs\/2010.11929v2."},{"key":"ref_34","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhang, R., Guo, Z., Zhang, W., Li, K., Miao, X., Cui, B., Qiao, Y., Gao, P., and Li, H. (2022, January 18\u201324). PointCLIP: Point Cloud Understanding by CLIP. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00836"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Maniparambil, M., Vorster, C., Molloy, D., Murphy, N., McGuinness, K., and O\u2019Connor, N.E. (2023, January 2\u20136). Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCVW60793.2023.00034"},{"key":"ref_37","unstructured":"Deng, H., Zhang, Z., Bao, J., and Li, X. (2023). AnoVL: Adapting Vision-Language Models for Unified Zero-Shot Anomaly Localization. 
arXiv."},{"key":"ref_38","unstructured":"Van Horn, G. (2024, January 18). Oisin Mac Aodha. iNat Challenge 2021-FGVC8. Available online: https:\/\/kaggle.com\/competitions\/inaturalist-2021."},{"key":"ref_39","first-page":"2579","article-title":"Visualizing Data Using T-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Chu, G., Potetz, B., Wang, W., Howard, A., Song, Y., Brucher, F., Leung, T., and Adam, H. (2019, January 27). Geo-Aware Networks for Fine-Grained Recognition. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00033"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Tang, K., Paluri, M., Fei-Fei, L., Fergus, R., and Bourdev, L. (2015, January 7). Improving Image Classification with Location Context. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.121"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/12\/2238\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:01:27Z","timestamp":1760108487000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/12\/2238"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":41,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["rs16122238"],"URL":"https:\/\/doi.org\/10.3390\/rs16122238","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]}}}