{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T16:40:18Z","timestamp":1778085618577,"version":"3.51.4"},"reference-count":225,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,6,30]],"date-time":"2024-06-30T00:00:00Z","timestamp":1719705600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004063","name":"Knut and Alice Wallenberg Foundation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Science Foundation\u2013funded AI Institute","award":["2112606"],"award-info":[{"award-number":["2112606"]}]},{"name":"Intelligent Cyberinfrastructure with Computational Learning in the Environment"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Spatial Algorithms Syst."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>\n            Large pre-trained models, also known as\n            <jats:italic>foundation models<\/jats:italic>\n            (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have not yet seen an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performances on seven tasks across multiple geospatial domains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. 
Our results indicate that on several geospatial tasks that only involve text modality, such as toponym recognition, location description recognition, and US state-level\/county-level dementia time series forecasting, the task-agnostic large language models (LLMs) can outperform task-specific fully supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image\u2013based urban noise intensity classification, and remote sensing image scene classification), existing FMs still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing an FM for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal FM that can reason over various types of geospatial data through geospatial alignments. 
We conclude this article by discussing the unique risks and challenges to developing such a model for GeoAI.\n          <\/jats:p>","DOI":"10.1145\/3653070","type":"journal-article","created":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T12:08:54Z","timestamp":1710936534000},"page":"1-46","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":91,"title":["On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper)"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7818-7309","authenticated-orcid":false,"given":"Gengchen","family":"Mai","sequence":"first","affiliation":[{"name":"SEAI Lab, Department of Geography, University of Georgia, Athens, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3208-4208","authenticated-orcid":false,"given":"Weiming","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2926-4023","authenticated-orcid":false,"given":"Jin","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computing, University of Georgia, Athens, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1934-632X","authenticated-orcid":false,"given":"Suhang","family":"Song","sequence":"additional","affiliation":[{"name":"College of Public Health, University of Georgia, Athens, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8192-7681","authenticated-orcid":false,"given":"Deepak","family":"Mishra","sequence":"additional","affiliation":[{"name":"Department of Geography, University of Georgia, Athens, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9170-2424","authenticated-orcid":false,"given":"Ninghao","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computing, University of Georgia, Athens, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4359-6302","authenticated-orcid":false,"given":"Song","family":"Gao","sequence":"additional","affiliation":[{"name":"Geospatial Data Science Lab, Department of Geography, University of Wisconsin-Madison, Madison, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8132-9048","authenticated-orcid":false,"given":"Tianming","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computing, University of Georgia, Athens, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4430-6373","authenticated-orcid":false,"given":"Gao","family":"Cong","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5515-4125","authenticated-orcid":false,"given":"Yingjie","family":"Hu","sequence":"additional","affiliation":[{"name":"GeoAI Lab, Department of Geography, University at Buffalo, Buffalo, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4608-4110","authenticated-orcid":false,"given":"Chris","family":"Cundy","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1835-3336","authenticated-orcid":false,"given":"Ziyuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Business, University of Connecticut, Storrs, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8910-9445","authenticated-orcid":false,"given":"Rui","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Geographical Sciences, University of Bristol, Bristol, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4034-7784","authenticated-orcid":false,"given":"Ni","family":"Lao","sequence":"additional","affiliation":[{"name":"Google, Mountain View, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2024,7]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.278"},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2533888.2533938"},{"key":"e_1_3_4_4_2","first-page":"24206","article-title":"VATT: Transformers for multimodal self-supervised learning from raw video, audio and text","volume":"34","author":"Akbari Hassan","year":"2021","unstructured":"Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. Advances in Neural Information Processing Systems 34 (2021), 24206\u201324221.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","DOI":"10.1111\/jgs.17215"},{"key":"e_1_3_4_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSEC.2018.2888775"},{"key":"e_1_3_4_7_2","article-title":"Flamingo: A visual language model for few-shot learning","volume":"2204","author":"Alayrac Jean-Baptiste","year":"2022","unstructured":"Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andy Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, and Karen Simonyan. 2022. Flamingo: A visual language model for few-shot learning. ArXiv abs\/2204.14198 (2022).","journal-title":"ArXiv"},{"key":"e_1_3_4_8_2","doi-asserted-by":"publisher","DOI":"10.3366\/ijhac.2015.0136"},{"key":"e_1_3_4_9_2","unstructured":"Alzheimer\u2019s Association et\u00a0al. 2021. Changing the trajectory of Alzheimer\u2019s disease: How a treatment by 2025 saves lives and dollars. 2015. 
Retrieved July 18 2018 from https:\/\/www.alz.org\/media\/Documents\/changing-the-trajectory-r.pdf (2021)."},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","unstructured":"Alzheimer\u2019s Association et\u00a0al. 2022. Alzheimer\u2019s Disease Facts and Figures. More Than Normal Aging: Understanding Mild Cognitive Impairment. Alzheimer\u2019s Association. (2022). 10.1002\/alz.13089","DOI":"10.1002\/alz.13089"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04930-9_46"},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","unstructured":"Anas Awadalla Irena Gao Joshua Gardner Jack Hessel Yusuf Hanafy Wanrong Zhu Kalyani Marathe Yonatan Bitton Samir Gadre Jenia Jitsev Simon Kornblith Pang Wei Koh Gabriel Ilharco Mitchell Wortsman and Ludwig Schmidt. 2023. OpenFlamingo. (March 2023). 10.5281\/zenodo.7733589","DOI":"10.5281\/zenodo.7733589"},{"key":"e_1_3_4_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01002"},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.259"},{"key":"e_1_3_4_15_2","article-title":"On the opportunities and risks of foundation models","author":"Bommasani Rishi","year":"2021","unstructured":"Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et\u00a0al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).","journal-title":"arXiv preprint arXiv:2108.07258"},{"key":"e_1_3_4_16_2","volume-title":"Workshop on Deep Learning for Knowledge Graphs (DL4KG@ ISWC\u201922)","author":"Brate Ryan","year":"2022","unstructured":"Ryan Brate, Minh-Hoang Dang, Fabian Hoppe, Yuan He, Albert Mero\u00f1o-Pe\u00f1uela, and Vijay Sadashivaiah. 2022. Improving language model predictions via prompts enriched with knowledge graphs. 
In Workshop on Deep Learning for Knowledge Graphs (DL4KG@ ISWC\u201922)."},{"key":"e_1_3_4_17_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"3","key":"e_1_3_4_18_2","first-page":"431","article-title":"Geographically weighted regression","volume":"47","author":"Brunsdon Chris","year":"1998","unstructured":"Chris Brunsdon, Stewart Fotheringham, and Martin Charlton. 1998. Geographically weighted regression. Journal of the Royal Statistical Society: Series D (The Statistician) 47, 3 (1998), 431\u2013443.","journal-title":"Journal of the Royal Statistical Society: Series D (The Statistician)"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.abe8628"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12644"},{"key":"e_1_3_4_21_2","first-page":"1","article-title":"HyperQuaternionE: A hyperbolic embedding model for qualitative spatial and temporal reasoning","author":"Cai Ling","year":"2022","unstructured":"Ling Cai, Krzysztof Janowicz, Rui Zhu, Gengchen Mai, Bo Yan, and Zhangyu Wang. 2022. HyperQuaternionE: A hyperbolic embedding model for qualitative spatial and temporal reasoning. 
GeoInformatica (2022), 1\u201339.","journal-title":"GeoInformatica"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-2923-3"},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSC.2014.44"},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compenvurbsys.2021.101706"},{"key":"e_1_3_4_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482293"},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00290"},{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.trc.2021.103091"},{"key":"e_1_3_4_28_2","volume-title":"International Conference on Machine Learning","author":"Cole Elijah","year":"2023","unstructured":"Elijah Cole, Grant Van Horn, Christian Lange, Alexander Shepard, Patrick Leary, Pietro Perona, Scott Loarie, and Oisin Mac Aodha. 2023. Spatial implicit neural representations for global-scale species mapping. In International Conference on Machine Learning. PMLR."},{"key":"e_1_3_4_29_2","first-page":"197","article-title":"SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery","volume":"35","author":"Cong Yezhen","year":"2022","unstructured":"Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. 2022. SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. Advances in Neural Information Processing Systems 35 (2022), 197\u2013211.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449857"},{"key":"e_1_3_4_31_2","first-page":"159","article-title":"Change of support and the modifiable areal unit problem","volume":"3","author":"Cressie Noel A.","year":"1996","unstructured":"Noel A. Cressie. 1996. Change of support and the modifiable areal unit problem. 
Geographical Systems 3 (1996), 159\u2013180.","journal-title":"Geographical Systems"},{"key":"e_1_3_4_32_2","article-title":"AD-AutoGPT: An autonomous GPT for Alzheimer\u2019s disease infodemiology","author":"Dai Haixing","year":"2023","unstructured":"Haixing Dai, Yiwei Li, Zhengliang Liu, Lin Zhao, Zihao Wu, Suhang Song, Ye Shen, Dajiang Zhu, Xiang Li, Sheng Li, et\u00a0al. 2023. AD-AutoGPT: An autonomous GPT for Alzheimer\u2019s disease infodemiology. arXiv preprint arXiv:2306.10095 (2023).","journal-title":"arXiv preprint arXiv:2306.10095"},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-1721"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_4_35_2","article-title":"An image is worth 16x16 words: Transformers for image recognition at scale","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR (2021).","journal-title":"ICLR"},{"key":"e_1_3_4_36_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs11161902"},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.1080\/15481603.2020.1724707"},{"key":"e_1_3_4_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3557915.3561025"},{"key":"e_1_3_4_39_2","article-title":"Geographic and geopolitical biases of language models","author":"Faisal Fahim","year":"2022","unstructured":"Fahim Faisal and Antonios Anastasopoulos. 2022. Geographic and geopolitical biases of language models. 
arXiv preprint arXiv:2212.10408 (2022).","journal-title":"arXiv preprint arXiv:2212.10408"},{"key":"e_1_3_4_40_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1428"},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219885"},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.57020\/ject.1297961"},{"key":"e_1_3_4_43_2","doi-asserted-by":"publisher","DOI":"10.1080\/24694452.2017.1352480"},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2015759118"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1119"},{"key":"e_1_3_4_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-017-9385-8"},{"key":"e_1_3_4_48_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jag.2024.103743"},{"key":"e_1_3_4_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3511933"},{"key":"e_1_3_4_50_2","first-page":"1","article-title":"A spectral\u2013spatial jointed spectral super-resolution and its application to HJ-1A satellite images","volume":"19","author":"Han Xiaolin","year":"2021","unstructured":"Xiaolin Han, Huan Zhang, Jing-Hao Xue, and Weidong Sun. 2021. A spectral\u2013spatial jointed spectral super-resolution and its application to HJ-1A satellite images. 
IEEE Geoscience and Remote Sensing Letters 19 (2021), 1\u20135.","journal-title":"IEEE Geoscience and Remote Sensing Letters"},{"key":"e_1_3_4_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/2642918.2647403"},{"key":"e_1_3_4_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_4_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.384"},{"key":"e_1_3_4_54_2","first-page":"27903","article-title":"Spatial-temporal super-resolution of satellite imagery via conditional pixel synthesis","volume":"34","author":"He Yutong","year":"2021","unstructured":"Yutong He, Dingjie Wang, Nicholas Lai, William Zhang, Chenlin Meng, Marshall Burke, David Lobell, and Stefano Ermon. 2021. Spatial-temporal super-resolution of satellite imagery via conditional pixel synthesis. Advances in Neural Information Processing Systems 34 (2021), 27903\u201327915.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_55_2","unstructured":"Danny Hernandez Jared Kaplan Tom Henighan and Sam McCandlish. 2021. Scaling Laws for Transfer. (2021). arxiv:cs.LG\/2102.01293"},{"key":"e_1_3_4_56_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. 
Advances in Neural Information Processing Systems 33 (2020), 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_57_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2012.06.001"},{"key":"e_1_3_4_58_2","article-title":"Training compute-optimal large language models","volume":"2203","author":"Hoffmann Jordan","year":"2022","unstructured":"Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. 2022. Training compute-optimal large language models. CoRR abs\/2203.15556 (2022).","journal-title":"CoRR"},{"issue":"1","key":"e_1_3_4_59_2","first-page":"411","article-title":"spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing","volume":"7","author":"Honnibal Matthew","year":"2017","unstructured":"Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear 7, 1 (2017), 411\u2013420.","journal-title":"To appear"},{"key":"e_1_3_4_60_2","doi-asserted-by":"publisher","DOI":"10.1111\/gec3.12404"},{"key":"e_1_3_4_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/2675354.2675356"},{"key":"e_1_3_4_62_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2023.2266495"},{"key":"e_1_3_4_63_2","volume-title":"11th International Conference on Geographic Information Science (GIScience 2021)-Part I","author":"Hu Yingjie","year":"2020","unstructured":"Yingjie Hu and Jimin Wang. 2020. How do people describe locations during a natural disaster: An analysis of tweets from Hurricane Harvey. 
In 11th International Conference on Geographic Information Science (GIScience 2021)-Part I. Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik."},{"key":"e_1_3_4_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_4_65_2","article-title":"Language is not all you need: Aligning perception with language models","author":"Huang Shaohan","year":"2023","unstructured":"Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Qiang Liu, et\u00a0al. 2023. Language is not all you need: Aligning perception with language models. arXiv preprint arXiv:2302.14045 (2023).","journal-title":"arXiv preprint arXiv:2302.14045"},{"key":"e_1_3_4_66_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2022.2040510"},{"key":"e_1_3_4_67_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2022.11.021"},{"key":"e_1_3_4_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTARS.2021.3076630"},{"key":"e_1_3_4_69_2","doi-asserted-by":"publisher","unstructured":"Gabriel Ilharco Mitchell Wortsman Ross Wightman Cade Gordon Nicholas Carlini Rohan Taori Achal Dave Vaishaal Shankar Hongseok Namkoong John Miller Hannaneh Hajishirzi Ali Farhadi and Ludwig Schmidt. 2021. OpenCLIP. (July 2021). 10.5281\/zenodo.5143773","DOI":"10.5281\/zenodo.5143773"},{"key":"e_1_3_4_70_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2019.1684500"},{"key":"e_1_3_4_71_2","doi-asserted-by":"publisher","DOI":"10.1002\/aaai.12043"},{"key":"e_1_3_4_72_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-45738-3_18"},{"key":"e_1_3_4_73_2","doi-asserted-by":"publisher","DOI":"10.5555\/2590208.2590209"},{"key":"e_1_3_4_74_2","volume-title":"Time Series Analysis: Forecasting and Control","author":"Jenkins Gwilym M.","year":"2011","unstructured":"Gwilym M. Jenkins, George E. P. Box, and Gregory C. Reinsel. 2011. Time Series Analysis: Forecasting and Control. Vol. 734. 
John Wiley & Sons."},{"issue":"1","key":"e_1_3_4_75_2","first-page":"276","article-title":"DeepCrowd: A deep model for large-scale citywide crowd density and flow prediction","volume":"35","author":"Jiang Renhe","year":"2021","unstructured":"Renhe Jiang, Zekun Cai, Zhaonan Wang, Chuang Yang, Zipei Fan, Quanjun Chen, Kota Tsubouchi, Xuan Song, and Ryosuke Shibasaki. 2021. DeepCrowd: A deep model for large-scale citywide crowd density and flow prediction. IEEE Transactions on Knowledge and Data Engineering 35, 1 (2021), 276\u2013290.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_4_76_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658810701626343"},{"key":"e_1_3_4_77_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49004-5_23"},{"key":"e_1_3_4_78_2","volume-title":"Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.)","author":"Jurafsky Dan","year":"2009","unstructured":"Dan Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Prentice Hall, Pearson Education International."},{"key":"e_1_3_4_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00180"},{"key":"e_1_3_4_80_2","article-title":"Scaling up GANs for text-to-image synthesis","author":"Kang Minguk","year":"2023","unstructured":"Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, and Taesung Park. 2023. Scaling up GANs for text-to-image synthesis. 
arXiv preprint arXiv:2303.05511 (2023).","journal-title":"arXiv preprint arXiv:2303.05511"},{"key":"e_1_3_4_81_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.landusepol.2020.104919"},{"key":"e_1_3_4_82_2","article-title":"Scaling laws for neural language models","volume":"2001","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. CoRR abs\/2001.08361 (2020).","journal-title":"CoRR"},{"key":"e_1_3_4_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_4_84_2","doi-asserted-by":"publisher","DOI":"10.1191\/0309132502ph389oa"},{"key":"e_1_3_4_85_2","first-page":"4171","volume-title":"NAACL-HLT 2019","author":"Kenton Jacob Devlin, Ming-Wei Chang,","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT 2019. 4171\u20134186."},{"key":"e_1_3_4_86_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12305"},{"key":"e_1_3_4_87_2","article-title":"DiffusionSat: A generative foundation model for satellite imagery","author":"Khanna Samar","year":"2023","unstructured":"Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, and Stefano Ermon. 2023. DiffusionSat: A generative foundation model for satellite imagery. arXiv preprint arXiv:2312.03606 (2023).","journal-title":"arXiv preprint arXiv:2312.03606"},{"key":"e_1_3_4_88_2","volume-title":"2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14\u201316, 2014, Conference Track Proceedings","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. 
In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14\u201316, 2014, Conference Track Proceedings. arXiv:http:\/\/arxiv.org\/abs\/1312.6114v10"},{"key":"e_1_3_4_89_2","article-title":"Segment anything","author":"Kirillov Alexander","year":"2023","unstructured":"Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, et\u00a0al. 2023. Segment anything. arXiv preprint arXiv:2304.02643 (2023).","journal-title":"arXiv preprint arXiv:2304.02643"},{"key":"e_1_3_4_90_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i4.20375"},{"issue":"2","key":"e_1_3_4_91_2","first-page":"1","article-title":"Dementia mortality in the United States, 2000-2017.","volume":"68","author":"Kramarow Ellen A.","year":"2019","unstructured":"Ellen A. Kramarow and Betzaida Tejada-Vera. 2019. Dementia mortality in the United States, 2000-2017. National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System 68, 2 (2019), 1\u201329.","journal-title":"National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System"},{"key":"e_1_3_4_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_4_93_2","doi-asserted-by":"publisher","DOI":"10.1007\/11496168_1"},{"key":"e_1_3_4_94_2","doi-asserted-by":"publisher","DOI":"10.5311\/JOSIS.2021.23.161"},{"key":"e_1_3_4_95_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.splurobonlp-1.9"},{"key":"e_1_3_4_96_2","article-title":"xView: Objects in context in overhead imagery","author":"Lam Darius","year":"2018","unstructured":"Darius Lam, Richard Kuzma, Kevin McGee, Samuel Dooley, Michael Laielli, Matthew Klaric, Yaroslav Bulatov, and Brendan McCord. 2018. xView: Objects in context in overhead imagery. 
arXiv preprint arXiv:1802.07856 (2018).","journal-title":"arXiv preprint arXiv:1802.07856"},{"key":"e_1_3_4_97_2","article-title":"Neural architectures for named entity recognition","author":"Lample Guillaume","year":"2016","unstructured":"Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360 (2016).","journal-title":"arXiv preprint arXiv:1603.01360"},{"key":"e_1_3_4_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.19"},{"key":"e_1_3_4_99_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2018863118"},{"key":"e_1_3_4_100_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i1.16101"},{"key":"e_1_3_4_101_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589132.3625598"},{"key":"e_1_3_4_102_2","article-title":"BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023).","journal-title":"arXiv preprint arXiv:2301.12597"},{"key":"e_1_3_4_103_2","first-page":"12888","volume-title":"International Conference on Machine Learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning. PMLR, 12888\u201312900."},{"key":"e_1_3_4_104_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2021.1912347"},{"key":"e_1_3_4_105_2","article-title":"Autonomous GIS: The next-generation AI-powered GIS","author":"Li Zhenlong","year":"2023","unstructured":"Zhenlong Li and Huan Ning. 2023. Autonomous GIS: The next-generation AI-powered GIS. 
arXiv preprint arXiv:2305.06453 (2023).","journal-title":"arXiv preprint arXiv:2305.06453"},{"key":"e_1_3_4_106_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.317"},{"key":"e_1_3_4_107_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1003"},{"key":"e_1_3_4_108_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"e_1_3_4_109_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i5.16548"},{"key":"e_1_3_4_110_2","doi-asserted-by":"publisher","DOI":"10.3390\/ijgi6110321"},{"key":"e_1_3_4_111_2","doi-asserted-by":"publisher","DOI":"10.5194\/agile-giss-3-9-2022"},{"key":"e_1_3_4_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_4_113_2","article-title":"A ConvNet for the 2020s","author":"Liu Zhuang","year":"2022","unstructured":"Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. 2022. A ConvNet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"e_1_3_4_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485125"},{"key":"e_1_3_4_115_2","article-title":"TransFlower: An explainable transformer-based model with flow-to-flow attention for commuting flow prediction","author":"Luo Yan","year":"2024","unstructured":"Yan Luo, Zhuoyue Wan, Yuzhong Chen, Gengchen Mai, Fu-lai Chung, and Kent Larson. 2024. TransFlower: An explainable transformer-based model with flow-to-flow attention for commuting flow prediction. 
arXiv preprint arXiv:2402.15398 (2024).","journal-title":"arXiv preprint arXiv:2402.15398"},{"key":"e_1_3_4_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00969"},{"key":"e_1_3_4_117_2","doi-asserted-by":"publisher","DOI":"10.5555\/AAI28548341"},{"key":"e_1_3_4_118_2","article-title":"Geo-foundation model","author":"Mai Gengchen","year":"2024","unstructured":"Gengchen Mai. 2024. Geo-foundation model. International Encyclopedia of Geography: People, the Earth, Environment and Technology. Wiley.","journal-title":"International Encyclopedia of Geography: People, the Earth, Environment and Technology"},{"key":"e_1_3_4_119_2","doi-asserted-by":"publisher","DOI":"10.1145\/3557915.3561043"},{"key":"e_1_3_4_120_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.13012"},{"key":"e_1_3_4_121_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12629"},{"key":"e_1_3_4_122_2","doi-asserted-by":"publisher","DOI":"10.1145\/3281354.3281359"},{"key":"e_1_3_4_123_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2021.2004602"},{"key":"e_1_3_4_124_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360901.3364432"},{"key":"e_1_3_4_125_2","volume-title":"ICLR 2020","author":"Mai Gengchen","year":"2020","unstructured":"Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, and Ni Lao. 2020. Multi-scale representation learning for spatial feature distributions using grid cells. In ICLR 2020. openreview."},{"key":"e_1_3_4_126_2","doi-asserted-by":"publisher","DOI":"10.5194\/agile-giss-2-8-2021"},{"key":"e_1_3_4_127_2","first-page":"1","article-title":"Towards general-purpose representation learning of polygonal geometries","author":"Mai Gengchen","year":"2022","unstructured":"Gengchen Mai, Chiyu Jiang, Weiwei Sun, Rui Zhu, Yao Xuan, Ling Cai, Krzysztof Janowicz, Stefano Ermon, and Ni Lao. 2022. Towards general-purpose representation learning of polygonal geometries. 
GeoInformatica (2022), 1\u201352.","journal-title":"GeoInformatica"},{"key":"e_1_3_4_128_2","volume-title":"International Conference on Machine Learning","author":"Mai Gengchen","year":"2023","unstructured":"Gengchen Mai, Ni Lao, Yutong He, Jiaming Song, and Stefano Ermon. 2023. CSP: Self-supervised contrastive spatial pre-training for geospatial-visual representations. In International Conference on Machine Learning. PMLR."},{"key":"e_1_3_4_129_2","article-title":"SSIF: Learning continuous image representation for spatial-spectral super-resolution","author":"Mai Gengchen","year":"2023","unstructured":"Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, and Stefano Ermon. 2023. SSIF: Learning continuous image representation for spatial-spectral super-resolution. arXiv preprint arXiv:2310.00413 (2023).","journal-title":"arXiv preprint arXiv:2310.00413"},{"key":"e_1_3_4_130_2","doi-asserted-by":"publisher","DOI":"10.1201\/9781003308423-6"},{"key":"e_1_3_4_131_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2023.06.016"},{"key":"e_1_3_4_132_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-14745-7_2"},{"key":"e_1_3_4_133_2","article-title":"Large language models are geographically biased","author":"Manvi Rohin","year":"2024","unstructured":"Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, and Stefano Ermon. 2024. Large language models are geographically biased. arXiv preprint arXiv:2402.02680 (2024).","journal-title":"arXiv preprint arXiv:2402.02680"},{"key":"e_1_3_4_134_2","volume-title":"the 12th International Conference on Learning Representations (ICLR\u201924)","author":"Manvi Rohin","year":"2024","unstructured":"Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, and Stefano Ermon. 2024. GeoLLM: Extracting geospatial knowledge from large language models. 
In the 12th International Conference on Learning Representations (ICLR\u201924)."},{"key":"e_1_3_4_135_2","doi-asserted-by":"publisher","DOI":"10.1145\/2063518.2063519"},{"key":"e_1_3_4_136_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rse.2011.11.007"},{"key":"e_1_3_4_137_2","doi-asserted-by":"publisher","DOI":"10.1145\/3557915.3560972"},{"key":"e_1_3_4_138_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.416"},{"key":"e_1_3_4_139_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.154"},{"key":"e_1_3_4_140_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compenvurbsys.2021.101651"},{"key":"e_1_3_4_141_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-812959-3.00003-4"},{"key":"e_1_3_4_142_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. (2022). Retrieved from https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_3_4_143_2","article-title":"GPT-4 technical report","year":"2023","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).","journal-title":"arXiv preprint arXiv:2303.08774"},{"key":"e_1_3_4_144_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et\u00a0al. 2022. Training language models to follow instructions with human feedback. 
Advances in Neural Information Processing Systems 35 (2022), 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_145_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-817772-3.00009-4"},{"key":"e_1_3_4_146_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41583-019-0202-9"},{"key":"e_1_3_4_147_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_4_148_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1005"},{"key":"e_1_3_4_149_2","doi-asserted-by":"publisher","DOI":"10.1145\/3281354.3281362"},{"key":"e_1_3_4_150_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.13064"},{"key":"e_1_3_4_151_2","first-page":"8748","volume-title":"International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_4_152_2","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_4_153_2","unstructured":"Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. (2019)."},{"issue":"140","key":"e_1_3_4_154_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. 
Journal of Machine Learning Research 21, 140 (2020), 1\u201367.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_4_155_2","article-title":"Hierarchical text-conditional image generation with clip latents","author":"Ramesh Aditya","year":"2022","unstructured":"Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).","journal-title":"arXiv preprint arXiv:2204.06125"},{"key":"e_1_3_4_156_2","volume-title":"11th International Conference on Geographic Information Science (GIScience 2021)-Part I","author":"Rao Jinmeng","year":"2020","unstructured":"Jinmeng Rao, Song Gao, Yuhao Kang, and Qunying Huang. 2020. LSTM-TrajGAN: A deep learning approach to trajectory privacy protection. In 11th International Conference on Geographic Information Science (GIScience 2021)-Part I. Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik."},{"key":"e_1_3_4_157_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12769"},{"key":"e_1_3_4_158_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589132.3625611"},{"key":"e_1_3_4_159_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2023.2262550"},{"key":"e_1_3_4_160_2","unstructured":"Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm, Lora Aroyo, Michael Collins, Dipanjan Das, Slav Petrov, Gaurav Singh Tomar, Iulia Turc, and David Reitter. 2021. Measuring attribution in natural language generation models. (2021)."},{"key":"e_1_3_4_161_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_4_162_2","article-title":"A generalist agent","author":"Reed Scott","year":"2022","unstructured":"Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, et\u00a0al. 2022. A generalist agent. 
arXiv preprint arXiv:2205.06175 (2022).","journal-title":"arXiv preprint arXiv:2205.06175"},{"key":"e_1_3_4_163_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93417-4_34"},{"key":"e_1_3_4_164_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-24638-z"},{"key":"e_1_3_4_165_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_4_166_2","doi-asserted-by":"publisher","DOI":"10.1177\/0309132513498339"},{"key":"e_1_3_4_167_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2002"},{"key":"e_1_3_4_168_2","first-page":"36479","article-title":"Photorealistic text-to-image diffusion models with deep language understanding","volume":"35","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems 35 (2022), 36479\u201336494.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_169_2","doi-asserted-by":"publisher","DOI":"10.1080\/17538947.2020.1738568"},{"key":"e_1_3_4_170_2","volume-title":"36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade W. Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa R. Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. 2022. LAION-5B: An open large-scale dataset for training next generation image-text models. In 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 
Retrieved from https:\/\/openreview.net\/forum?id=M3Y74vmsMcY"},{"key":"e_1_3_4_171_2","doi-asserted-by":"publisher","DOI":"10.5194\/agile-giss-4-42-2023"},{"key":"e_1_3_4_172_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW60847.2023.00073"},{"key":"e_1_3_4_173_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2015.1100731"},{"key":"e_1_3_4_174_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-26752-4"},{"key":"e_1_3_4_175_2","doi-asserted-by":"publisher","DOI":"10.1109\/IGARSS.2019.8900532"},{"key":"e_1_3_4_176_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.121"},{"key":"e_1_3_4_177_2","doi-asserted-by":"publisher","DOI":"10.2307\/143141"},{"key":"e_1_3_4_178_2","article-title":"MLP-Mixer: An all-MLP architecture for vision","author":"Tolstikhin Ilya","year":"2021","unstructured":"Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. 2021. MLP-Mixer: An all-MLP architecture for vision. arXiv preprint arXiv:2105.01601 (2021).","journal-title":"arXiv preprint arXiv:2105.01601"},{"key":"e_1_3_4_179_2","article-title":"Llama: Open and efficient foundation language models","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et\u00a0al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).","journal-title":"arXiv preprint arXiv:2302.13971"},{"key":"e_1_3_4_180_2","article-title":"SpaceNet: A remote sensing dataset and challenge series","author":"Etten Adam Van","year":"2018","unstructured":"Adam Van Etten, Dave Lindenbaum, and Todd M. Bacastow. 2018. SpaceNet: A remote sensing dataset and challenge series. 
arXiv preprint arXiv:1807.01232 (2018).","journal-title":"arXiv preprint arXiv:1807.01232"},{"key":"e_1_3_4_181_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_182_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2017.1368523"},{"key":"e_1_3_4_183_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12579"},{"key":"e_1_3_4_184_2","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12627"},{"key":"e_1_3_4_185_2","doi-asserted-by":"publisher","DOI":"10.1145\/3440207"},{"key":"e_1_3_4_186_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2018.1431838"},{"key":"e_1_3_4_187_2","article-title":"Image as a foreign language: BEiT pretraining for all vision and vision-language tasks","author":"Wang Wenhui","year":"2022","unstructured":"Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, et\u00a0al. 2022. Image as a foreign language: BEiT pretraining for all vision and vision-language tasks. arXiv preprint arXiv:2208.10442 (2022).","journal-title":"arXiv preprint arXiv:2208.10442"},{"key":"e_1_3_4_188_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S19-2156"},{"key":"e_1_3_4_189_2","article-title":"Chain of thought prompting elicits reasoning in large language models","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models. 
arXiv preprint arXiv:2201.11903 (2022).","journal-title":"arXiv preprint arXiv:2201.11903"},{"key":"e_1_3_4_190_2","volume-title":"7th International Conference on Learning Representations, ICLR 2019","author":"Wu Chien-sheng","year":"2019","unstructured":"Chien-sheng Wu, Richard Socher, and Caiming Xiong. 2019. Global-to-local memory pointer networks for task-oriented dialogue. In 7th International Conference on Learning Representations, ICLR 2019."},{"key":"e_1_3_4_191_2","article-title":"A survey of graph prompting methods: Techniques, applications, and challenges","author":"Wu Xuansheng","year":"2023","unstructured":"Xuansheng Wu, Kaixiong Zhou, Mingchen Sun, Xin Wang, and Ninghao Liu. 2023. A survey of graph prompting methods: Techniques, applications, and challenges. arXiv preprint arXiv:2303.07275 (2023).","journal-title":"arXiv preprint arXiv:2303.07275"},{"key":"e_1_3_4_192_2","unstructured":"Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. 2019. Detectron2. Retrieved from https:\/\/github.com\/facebookresearch\/detectron2. 
(2019)."},{"key":"e_1_3_4_193_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2017.2685945"},{"key":"e_1_3_4_194_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM51629.2021.00088"},{"key":"e_1_3_4_195_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589132.3625616"},{"key":"e_1_3_4_196_2","doi-asserted-by":"publisher","DOI":"10.1145\/3139958.3140054"},{"key":"e_1_3_4_197_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2020.1768260"},{"key":"e_1_3_4_198_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01067"},{"key":"e_1_3_4_199_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3003310"},{"key":"e_1_3_4_200_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2016.1244608"},{"key":"e_1_3_4_201_2","first-page":"37309","article-title":"Deep bidirectional language-knowledge graph pretraining","volume":"35","author":"Yasunaga Michihiro","year":"2022","unstructured":"Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D. Manning, Percy S. Liang, and Jure Leskovec. 2022. Deep bidirectional language-knowledge graph pretraining. Advances in Neural Information Processing Systems 35 (2022), 37309\u201337323.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_202_2","doi-asserted-by":"publisher","DOI":"10.1145\/3347146.3359067"},{"key":"e_1_3_4_203_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2022.2055036"},{"key":"e_1_3_4_204_2","doi-asserted-by":"publisher","DOI":"10.1145\/3512467"},{"key":"e_1_3_4_205_2","article-title":"Florence: A new foundation model for computer vision","author":"Yuan Lu","year":"2021","unstructured":"Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, et\u00a0al. 2021. Florence: A new foundation model for computer vision. 
arXiv preprint arXiv:2111.11432 (2021).","journal-title":"arXiv preprint arXiv:2111.11432"},{"key":"e_1_3_4_206_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compenvurbsys.2018.11.008"},{"key":"e_1_3_4_207_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01179"},{"key":"e_1_3_4_208_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.landurbplan.2020.104003"},{"key":"e_1_3_4_209_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.landurbplan.2018.08.020"},{"key":"e_1_3_4_210_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2020.1726923"},{"key":"e_1_3_4_211_2","article-title":"Text2Seg: Remote sensing image semantic segmentation via text-guided visual foundation models","author":"Zhang Jielu","year":"2023","unstructured":"Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Lan Mu, Mengxuan Hu, and Sheng Li. 2023. Text2Seg: Remote sensing image semantic segmentation via text-guided visual foundation models. arXiv preprint arXiv:2304.10597 (2023).","journal-title":"arXiv preprint arXiv:2304.10597"},{"key":"e_1_3_4_212_2","article-title":"Adding conditional control to text-to-image diffusion models","author":"Zhang Lvmin","year":"2023","unstructured":"Lvmin Zhang and Maneesh Agrawala. 2023. Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543 (2023).","journal-title":"arXiv preprint arXiv:2302.05543"},{"key":"e_1_3_4_213_2","doi-asserted-by":"publisher","DOI":"10.1177\/0361198105190200109"},{"key":"e_1_3_4_214_2","article-title":"OPT: Open pre-trained transformer language models","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et\u00a0al. 2022. OPT: Open pre-trained transformer language models. 
arXiv preprint arXiv:2205.01068 (2022).","journal-title":"arXiv preprint arXiv:2205.01068"},{"key":"e_1_3_4_215_2","volume-title":"International Conference on Learning Representations (ICLR\u201922)","author":"Zhang X.","year":"2022","unstructured":"X. Zhang, A. Bosselut, M. Yasunaga, H. Ren, P. Liang, C. Manning, and J. Leskovec. 2022. GreaseLM: Graph REASoning enhanced language models for question answering. In International Conference on Learning Representations (ICLR\u201922)."},{"key":"e_1_3_4_216_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2017.09.007"},{"key":"e_1_3_4_217_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rse.2018.05.006"},{"key":"e_1_3_4_218_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1139"},{"key":"e_1_3_4_219_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compenvurbsys.2022.101915"},{"key":"e_1_3_4_220_2","article-title":"A survey of large language models","author":"Zhao Wayne Xin","year":"2023","unstructured":"Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et\u00a0al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).","journal-title":"arXiv preprint arXiv:2303.18223"},{"key":"e_1_3_4_221_2","article-title":"Places: A 10 million image database for scene recognition","author":"Zhou Bolei","year":"2017","unstructured":"Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million image database for scene recognition. 
IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_4_222_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00472"},{"key":"e_1_3_4_223_2","first-page":"1","article-title":"Spatial regression graph convolutional neural networks: A deep learning paradigm for spatial multivariate distributions","author":"Zhu Di","year":"2021","unstructured":"Di Zhu, Yu Liu, Xin Yao, and Manfred M. Fischer. 2021. Spatial regression graph convolutional neural networks: A deep learning paradigm for spatial multivariate distributions. GeoInformatica (2021), 1\u201332.","journal-title":"GeoInformatica"},{"key":"e_1_3_4_224_2","doi-asserted-by":"publisher","DOI":"10.1080\/24694452.2019.1694403"},{"key":"e_1_3_4_225_2","doi-asserted-by":"publisher","DOI":"10.1080\/13658816.2022.2092115"},{"key":"e_1_3_4_226_2","doi-asserted-by":"publisher","DOI":"10.1093\/geronb\/gbx147"}],"container-title":["ACM Transactions on Spatial Algorithms and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653070","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3653070","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:56:55Z","timestamp":1750291015000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653070"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,30]]},"references-count":225,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3653070"],"URL":"https:\/\/doi.org\/10.1145\/3653070","relation":{},"ISSN":["2374-0353","2374-0361"],"issn-type":[{"value":"2374-0353","type":"print"},{"value":"2374-0361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,30]]},"assertion":[{"value":"2023-04-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}