{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T17:03:44Z","timestamp":1770743024776,"version":"3.49.0"},"reference-count":36,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2024,4,26]],"date-time":"2024-04-26T00:00:00Z","timestamp":1714089600000},"content-version":"vor","delay-in-days":116,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,4,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM). The Overpass Query Language (OverpassQL) allows users to formulate complex database queries and is widely adopted in the OSM ecosystem. Generating Overpass queries from natural language input serves multiple use cases. It enables novice users to utilize OverpassQL without prior knowledge, assists experienced users with crafting advanced queries, and enables tool-augmented large language models to access information stored in the OSM database. In order to assess the performance of current sequence generation models on this task, we propose OverpassNL, a dataset of 8,352 queries with corresponding natural language inputs. We further introduce task-specific evaluation metrics and ground the evaluation of the Text-to-OverpassQL task by executing the queries against the OSM database. We establish strong baselines by finetuning sequence-to-sequence models and adapting large language models with in-context examples. 
The detailed evaluation reveals strengths and weaknesses of the considered learning strategies, laying the foundations for further research into the Text-to-OverpassQL task.<\/jats:p>","DOI":"10.1162\/tacl_a_00654","type":"journal-article","created":{"date-parts":[[2024,4,26]],"date-time":"2024-04-26T18:14:10Z","timestamp":1714155250000},"page":"562-575","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":8,"title":["Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap"],"prefix":"10.1162","volume":"12","author":[{"given":"Michael","family":"Staniek","sequence":"first","affiliation":[{"name":"Computational Linguistics, Heidelberg University, Germany. zuefle@cl.uni-heidelberg.de"}]},{"given":"Raphael","family":"Schumann","sequence":"additional","affiliation":[{"name":"Computational Linguistics, Heidelberg University, Germany. rschuman@cl.uni-heidelberg.de"}]},{"given":"Maike","family":"Z\u00fcfle","sequence":"additional","affiliation":[{"name":"Computational Linguistics, Heidelberg University, Germany. zuefle@cl.uni-heidelberg.de"},{"name":"School of Informatics, University of Edinburgh, UK"}]},{"given":"Stefan","family":"Riezler","sequence":"additional","affiliation":[{"name":"Computational Linguistics, Heidelberg University, Germany. 
riezler@cl.uni-heidelberg.de"},{"name":"IWR, Heidelberg University, Germany"}]}],"member":"281","published-online":{"date-parts":[[2024,4,30]]},"reference":[{"key":"2024042618135983600_bib1","first-page":"1877","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems","author":"Brown","year":"2020"},{"key":"2024042618135983600_bib2","article-title":"Teaching large language models to self-debug","author":"Chen","year":"2023","journal-title":"arXiv preprint arXiv:2304.05128"},{"key":"2024042618135983600_bib3","doi-asserted-by":"publisher","first-page":"740","DOI":"10.18653\/v1\/N16-1088","article-title":"A corpus and semantic parser for multilingual natural language querying of OpenStreetMap","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Haas","year":"2016"},{"key":"2024042618135983600_bib4","doi-asserted-by":"publisher","first-page":"963","DOI":"10.18653\/v1\/P17-1089","article-title":"Learning a neural semantic parser from user feedback","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Iyer","year":"2017"},{"key":"2024042618135983600_bib5","first-page":"1062","article-title":"Learning to transform natural to formal languages","volume-title":"Proceedings of the 20th National Conference on Artificial Intelligence - Volume 3","author":"Kate","year":"2005"},{"key":"2024042618135983600_bib6","first-page":"1062","article-title":"Learning to transform natural to formal languages","volume-title":"Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05)","author":"Kate","year":"2005"},{"key":"2024042618135983600_bib7","article-title":"Adam: A method for stochastic optimization","volume-title":"Proceedings of the International Conference on Learning Representations 
(ICLR)","author":"Kingma","year":"2015"},{"key":"2024042618135983600_bib8","first-page":"6","article-title":"NLmaps: A natural language interface to query OpenStreetMap","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations","author":"Lawrence","year":"2016"},{"key":"2024042618135983600_bib9","doi-asserted-by":"publisher","first-page":"1820","DOI":"10.18653\/v1\/P18-1169","article-title":"Improving a neural semantic parser by counterfactual learning from human bandit feedback","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Lawrence","year":"2018"},{"issue":"1","key":"2024042618135983600_bib10","doi-asserted-by":"publisher","first-page":"73","DOI":"10.14778\/2735461.2735468","article-title":"Constructing an interactive natural language interface for relational databases","volume":"8","author":"Li","year":"2014","journal-title":"Proceedings of the VLDB Endowment"},{"key":"2024042618135983600_bib11","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2012.12627","article-title":"Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing","author":"Lin","year":"2020","journal-title":"CoRR"},{"key":"2024042618135983600_bib12","first-page":"100","article-title":"What makes good in-context examples for GPT-3?","volume-title":"Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures","author":"Liu","year":"2022"},{"key":"2024042618135983600_bib13","article-title":"Self-refine: Iterative refinement with self-feedback","author":"Madaan","year":"2023","journal-title":"arXiv preprint arXiv:2303.17651"},{"key":"2024042618135983600_bib14","unstructured":"OpenAI. 2023. GPT-4 technical report. 
ArXiv, abs\/2303.08774."},{"key":"2024042618135983600_bib15","unstructured":"OpenStreetMap Foundation. 2019. Who uses OpenStreetMap? \u2014 OpenStreetMap. [Online; accessed 24-July-2023]."},{"key":"2024042618135983600_bib16","unstructured":"OpenStreetMap Wiki. 2022. Stats\u2014OpenStreetMap Wiki. [Online; accessed 24-July-2023]."},{"key":"2024042618135983600_bib17","unstructured":"OpenStreetMap Wiki. 2023. Overpass API\/Overpass QL\u2014OpenStreetMap Wiki. [Online; accessed 24-July-2023]."},{"key":"2024042618135983600_bib18","doi-asserted-by":"publisher","first-page":"311","DOI":"10.3115\/1073083.1073135","article-title":"BLEU: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2024042618135983600_bib19","doi-asserted-by":"publisher","first-page":"392","DOI":"10.18653\/v1\/W15-3049","article-title":"chrF: Character n-gram F-score for automatic MT evaluation","volume-title":"Proceedings of the Tenth Workshop on Statistical Machine Translation","author":"Popovi\u0107","year":"2015"},{"key":"2024042618135983600_bib20","article-title":"DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction","volume-title":"Thirty-seventh Conference on Neural Information Processing Systems","author":"Pourreza","year":"2023"},{"key":"2024042618135983600_bib21","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","author":"Raffel","year":"2019","journal-title":"ArXiv"},{"key":"2024042618135983600_bib22","doi-asserted-by":"publisher","first-page":"3982","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-BERT: Sentence embeddings using Siamese BERT-networks","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing 
(EMNLP-IJCNLP)","author":"Reimers","year":"2019"},{"key":"2024042618135983600_bib23","article-title":"Toolformer: Language models can teach themselves to use tools","volume-title":"Thirty-seventh Conference on Neural Information Processing Systems","author":"Schick","year":"2023"},{"key":"2024042618135983600_bib24","doi-asserted-by":"publisher","first-page":"9895","DOI":"10.18653\/v1\/2021.emnlp-main.779","article-title":"PICARD: Parsing incrementally for constrained auto-regressive decoding from language models","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Scholak","year":"2021"},{"key":"2024042618135983600_bib25","article-title":"SQL-PaLM: Improved large language model adaptation for text-to-SQL","author":"Sun","year":"2023","journal-title":"arXiv preprint arXiv:2306.00739"},{"key":"2024042618135983600_bib26","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1007\/3-540-44795-4_40","article-title":"Using multiple clause constructors in inductive logic programming for semantic parsing","volume-title":"Machine Learning: ECML 2001","author":"Tang","year":"2001"},{"key":"2024042618135983600_bib27","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.13971","article-title":"LLaMA: Open and efficient foundation language models","author":"Touvron","year":"2023"},{"key":"2024042618135983600_bib28","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685","article-title":"CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021","author":"Wang","year":"2021"},{"key":"2024042618135983600_bib29","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1162\/tacl_a_00461","article-title":"ByT5: Towards a token-free future with pre-trained byte-to-byte 
models","volume":"10","author":"Xue","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"issue":"OOPSLA","key":"2024042618135983600_bib30","doi-asserted-by":"publisher","DOI":"10.1145\/3133887","article-title":"SQLizer: Query synthesis from natural language","volume":"1","author":"Yaghmazadeh","year":"2017","journal-title":"Proceedings of the ACM on Programming Languages"},{"key":"2024042618135983600_bib31","article-title":"GraPPa: Grammar-augmented pre-training for table semantic parsing","volume-title":"International Conference on Learning Representations","author":"Yu","year":"2021"},{"key":"2024042618135983600_bib32","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1425","article-title":"Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Yu","year":"2018"},{"key":"2024042618135983600_bib33","first-page":"1050","article-title":"Learning to parse database queries using inductive logic programming","volume-title":"Proceedings of AAAI\/IAAI","author":"Zelle","year":"1996"},{"key":"2024042618135983600_bib34","first-page":"658","article-title":"Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars","volume-title":"Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence","author":"Zettlemoyer","year":"2005"},{"key":"2024042618135983600_bib35","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1537","article-title":"Editing-based SQL query generation for cross-domain context-dependent questions","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing 
(EMNLP-IJCNLP)","author":"Zhang","year":"2019"},{"key":"2024042618135983600_bib36","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1709.00103","article-title":"Seq2SQL: Generating structured queries from natural language using reinforcement learning","author":"Zhong","year":"2017","journal-title":"CoRR"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00654\/2367419\/tacl_a_00654.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00654\/2367419\/tacl_a_00654.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,26]],"date-time":"2024-04-26T18:14:25Z","timestamp":1714155265000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00654\/120832\/Text-to-OverpassQL-A-Natural-Language-Interface"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":36,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00654","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}