{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,28]],"date-time":"2025-08-28T00:06:01Z","timestamp":1756339561220,"version":"3.44.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:p>\n            Geographic Information Retrieval (GIR) systems process text queries with geographic location to identify relevant geographic objects for users. Although recent advancements have leveraged Pre-trained Language Models (PLMs) for their robust semantic comprehension, these models typically depend on extensive\n            <jats:italic toggle=\"yes\">labeled queries<\/jats:italic>\n            and require considerable computational resources. Deviating from this prevailing trend, we propose GeoBloom, a lightweight framework that surpasses the effectiveness of PLMs with fewer or no labeled queries, with remarkable efficiency in both time and space.\n          <\/jats:p>\n          <jats:p>GeoBloom tackles critical challenges such as the lack of labeled queries, low data (labeled) efficiency, and high computational demands. At its core, it employs Bloom filters to encode text at a fine-grained term level and uses intersecting bits to create a robust unsupervised text similarity metric. A specialized Bloom Filter Evaluator is proposed to assess the importance of each intersecting bit, focusing on those associated with ground truth, improving effectiveness with fewer training labels. For enhanced search efficiency, the evaluator exploits the inherent sparsity of Bloom filters, achieving remarkably low time and space complexities. This efficiency is further boosted by a tree-based index that partitions the search space while preserving effectiveness. Extensive experiments show that GeoBloom surpasses state-of-the-art baselines in both unsupervised (up to 15.66% improvement) and supervised settings (up to 10.94% improvement) on real datasets in terms of NDCG@5. Furthermore, GeoBloom operates up to 80x faster and saves up to 74.72% memory and 87.64% disk space over PLM-based alternatives, rendering it highly potent for real-world applications.<\/jats:p>","DOI":"10.14778\/3718057.3718064","type":"journal-article","created":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T18:11:49Z","timestamp":1756318309000},"page":"1348-1361","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["GeoBloom: Revisiting Lightweight Models for Geographic Information Retrieval"],"prefix":"10.14778","volume":"18","author":[{"given":"Yi","family":"Li","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"given":"Gao","family":"Cong","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2025,8,27]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-021-00419-9"},{"key":"e_1_2_1_2_1","volume-title":"In Workshop on Geographic Information Retrieval, SIGIR '06","author":"Andrade Leonardo","year":"2006","unstructured":"Leonardo Andrade and M\u00e1rio J Silva. 2006. Relevance ranking for geographic IR.. In In Workshop on Geographic Information Retrieval, SIGIR '06."},{"key":"e_1_2_1_3_1","unstructured":"Sanjeev Arora Yingyu Liang and Tengyu Ma. 2017. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. (2017)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_5_1","volume-title":"Advances in Neural Information Processing Systems","author":"Burges Christopher","unstructured":"Christopher Burges, Robert Ragno, and Quoc Le. 2006. Learning to Rank with Nonsmooth Cost Functions. In Advances in Neural Information Processing Systems, Vol. 19. MIT Press."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data. 277\u2013288","author":"Chen Yen-Yu","year":"2006","unstructured":"Yen-Yu Chen, Torsten Suel, and Alexander Markowetz. 2006. Efficient query processing in geographic web search engines. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data. 277\u2013288."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 20th ACM International Conference on Information and Knowledge Management (Glasgow","author":"Christoforaki Maria","year":"2011","unstructured":"Maria Christoforaki, Jinru He, Constantinos Dimopoulos, Alexander Markowetz, and Torsten Suel. 2011. Text vs. space: efficient geo-search query processing. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (Glasgow, Scotland, UK) (CIKM '11). Association for Computing Machinery, New York, NY, USA, 423\u2013432."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687666"},{"key":"e_1_2_1_9_1","volume-title":"2008 IEEE 24th International conference on data engineering. IEEE, 656\u2013665","author":"Felipe Ian De","year":"2008","unstructured":"Ian De Felipe, Vagelis Hristidis, and Naphtali Rishe. 2008. Keyword search on spatial databases. In 2008 IEEE 24th International conference on data engineering. IEEE, 656\u2013665."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171\u20134186."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Ding Ruixue","year":"2023","unstructured":"Ruixue Ding, Boli Chen, Pengjun Xie, Fei Huang, Xin Li, Qiang Zhang, and Yao Xu. 2023. MGeo: A Multi-Modal Geographic Pre-Training Method. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)."},{"key":"e_1_2_1_12_1","unstructured":"Matthijs Douze Alexandr Guzhva Chengqi Deng Jeff Johnson Gergely Szilvasy Pierre-Emmanuel Mazar\u00e9 Maria Lomeli Lucas Hosseini and Herv\u00e9 J\u00e9gou. 2024. The Faiss library. (2024). arXiv:2401.08281 [cs.LG]"},{"key":"e_1_2_1_13_1","first-page":"4","article-title":"Summary cache: a scalable wide-area Web cache sharing protocol","volume":"28","author":"Fan Li","year":"1998","unstructured":"Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder. 1998. Summary cache: a scalable wide-area Web cache sharing protocol. SIGCOMM Comput. Commun. Rev. 28, 4 (oct 1998), 254\u2013265.","journal-title":"SIGCOMM Comput. Commun. Rev."},{"key":"e_1_2_1_14_1","unstructured":"Yao Fu and Mirella Lapata. 2022. Latent Topology Induction for Understanding Contextualized Representations. arXiv:2206.01512 [cs.CL]"},{"key":"e_1_2_1_15_1","volume-title":"Web-Age Information Management","author":"Gao Yunpeng","unstructured":"Yunpeng Gao, Yao Wang, and Shengwei Yi. 2016. Preference-Aware Top-k Spatio-Textual Queries. In Web-Age Information Management. Springer International Publishing, Cham, 186\u2013197."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (Shinjuku","author":"Goodwin Bob","year":"2017","unstructured":"Bob Goodwin, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, and Yuxiong He. 2017. BitFunnel: Revisiting Signatures for Search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (Shinjuku, Tokyo, Japan) (SIGIR '17). Association for Computing Machinery, New York, NY, USA, 605\u2013614."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","author":"Guo Jiafeng","unstructured":"Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (Indianapolis, Indiana, USA) (CIKM '16). Association for Computing Machinery, New York, NY, USA, 55\u201364."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems -","volume":"2","author":"Hu Baotian","year":"2014","unstructured":"Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (Montreal, Canada) (NIPS'14). MIT Press, Cambridge, MA, USA, 2042\u20132050."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Washington DC, USA) (KDD '22)","author":"Huang Jizhou","year":"2022","unstructured":"Jizhou Huang, Haifeng Wang, Yibo Sun, Yunsheng Shi, Zhengjie Huang, An Zhuo, and Shikun Feng. 2022. ERNIE-GeoL: A Geography-and-Language Pretrained Model and its Applications in Baidu Maps. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Washington DC, USA) (KDD '22). Association for Computing Machinery, New York, NY, USA, 3029\u20133039."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505665"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2012"},{"key":"e_1_2_1_24_1","volume-title":"GeoGLUE: A Geo-Graphic Language Understanding Evaluation Benchmark. CoRR abs\/2305.06545","author":"Li Dongyang","year":"2023","unstructured":"Dongyang Li, Ruixue Ding, Qiang Zhang, Zheng Li, Boli Chen, Pengjun Xie, Yao Xu, Xin Li, Ning Guo, Fei Huang, and Xiaofeng He. 2023. GeoGLUE: A Geo-Graphic Language Understanding Evaluation Benchmark. CoRR abs\/2305.06545 (2023). arXiv:2305.06545"},{"key":"e_1_2_1_25_1","first-page":"4","article-title":"IR-Tree: An Efficient Index for Geographic Document Search","volume":"23","author":"Li Zhisheng","year":"2011","unstructured":"Zhisheng Li, Ken C. K. Lee, Baihua Zheng, Wang-Chien Lee, Dik Lee, and Xufa Wang. 2011. IR-Tree: An Efficient Index for Geographic Document Search. IEEE Trans. on Knowl. and Data Eng. 23, 4 (apr 2011), 585\u2013599.","journal-title":"IEEE Trans. on Knowl. and Data Eng."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588691"},{"key":"e_1_2_1_27_1","volume-title":"Evaluating Systems for Multilingual and Multimodal Information Access: 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008","author":"Mandl Thomas","year":"2009","unstructured":"Thomas Mandl, Paula Carvalho, Giorgio Maria Di Nunzio, Fredric Gey, Ray R Larson, Diana Santos, and Christa Womser-Hacker. 2009. GeoCLEF 2008: The CLEF 2008 cross-language geographic information retrieval track overview. In Evaluating Systems for Multilingual and Multimodal Information Access: 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17\u201319, 2008, Revised Selected Papers 9. Springer, 808\u2013821."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1096985.1096993"},{"key":"e_1_2_1_29_1","volume-title":"Advances in Neural Information Processing Systems","volume":"26","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems, Vol. 26. Curran Associates, Inc."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In Neural Information Processing Systems.","DOI":"10.1007\/978-1-4614-8265-9_751"},{"key":"e_1_2_1_31_1","volume-title":"Efficiently updatable neural-network-based evaluation functions for computer shogi. The 28th World Computer Shogi Championship Appeal Document 185","author":"Nasu Yu","year":"2018","unstructured":"Yu Nasu. 2018. Efficiently updatable neural-network-based evaluation functions for computer shogi. The 28th World Computer Shogi Championship Appeal Document 185 (2018)."},{"key":"e_1_2_1_32_1","volume-title":"Information Security and Cryptology","author":"Palmieri Paolo","unstructured":"Paolo Palmieri, Luca Calderoni, and Dario Maio. 2015. Spatial Bloom Filters: Enabling Privacy in Location-Aware Applications. In Information Security and Cryptology. Springer International Publishing, Cham, 16\u201336."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_2_1_34_1","volume-title":"Computer Science - Theory and Applications","author":"Porat Ely","unstructured":"Ely Porat. 2009. An Optimal Bloom Filter Replacement Based on Matrix Solving. In Computer Science - Theory and Applications. Springer Berlin Heidelberg, Berlin, Heidelberg, 263\u2013273."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Ross S Purves Paul Clough Christopher B Jones Mark H Hall Vanessa Murdock et al. 2018. Geographic information retrieval: Progress and challenges in spatial search of text. Foundations and Trends\u00ae in Information Retrieval 12 2\u20133 (2018) 164\u2013318.","DOI":"10.1561\/1500000034"},{"key":"e_1_2_1_36_1","volume-title":"SIGIR '94","author":"Robertson S. E.","unstructured":"S. E. Robertson and S. Walker. 1994. Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval. In SIGIR '94. Springer London, London, 232\u2013241."},{"key":"e_1_2_1_37_1","volume-title":"Advances in Spatial and Temporal Databases","author":"Rocha-Junior Jo\u00e3o B.","unstructured":"Jo\u00e3o B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil N\u00f8rv\u00e5g. 2011. Efficient Processing of Top-k Spatial Keyword Queries. In Advances in Spatial and Temporal Databases. Springer Berlin Heidelberg, Berlin, Heidelberg, 205\u2013222."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9671.2008.01084.x"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661935"},{"key":"e_1_2_1_40_1","volume-title":"A statistical interpretation of term specificity and its application in retrieval","author":"Jones Karen Sparck","unstructured":"Karen Sparck Jones. 1988. A statistical interpretation of term specificity and its application in retrieval. Taylor Graham Publishing, GBR, 132\u2013142."},{"key":"e_1_2_1_41_1","volume-title":"Query the trajectory based on the precise track: a Bloom filter-based approach. GeoInformatica 25, 2 (01","author":"Wang Zengjie","year":"2021","unstructured":"Zengjie Wang, Wen Luo, Linwang Yuan, Hong Gao, Fan Wu, Xu Hu, and Zhaoyuan Yu. 2021. Query the trajectory based on the precise track: a Bloom filter-based approach. GeoInformatica 25, 2 (01 Apr 2021), 397\u2013416."},{"key":"e_1_2_1_42_1","volume-title":"2015 1st International Conference on Industrial Networks and Intelligent Systems (INISCom). IEEE, 12\u201317","author":"Yang Mingyang","year":"2015","unstructured":"Mingyang Yang, Long Zheng, Yanchao Lu, Minyi Guo, and Jie Li. 2015. Cloud-assisted spatio-textual k nearest neighbor joins in sensor networks. In 2015 1st International Conference on Industrial Networks and Intelligent Systems (INISCom). IEEE, 12\u201317."},{"key":"e_1_2_1_43_1","volume-title":"Yew Soon Ong, and Bin Cui","author":"Yin Ziqi","year":"2024","unstructured":"Ziqi Yin, Shanshan Feng, Shang Liu, Gao Cong, Yew Soon Ong, and Bin Cui. 2024. LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries. arXiv:2403.07331 [cs.IR]"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401159"},{"key":"e_1_2_1_45_1","volume-title":"International Journal of Intelligent Systems 2023 (29","author":"Zeng Meng","year":"2023","unstructured":"Meng Zeng, Beiji Zou, Xiaoyan Kui, Chengzhang Zhu, Ling Xiao, Zhi Chen, and Jingyu Du. 2023. PA-LBF: Prefix-Based and Adaptive Learned Bloom Filter for Spatial Data. International Journal of Intelligent Systems 2023 (29 Mar 2023), 4970776."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1139"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence 33","author":"Zhao Ji","year":"2019","unstructured":"Ji Zhao, Dan Peng, Chuhan Wu, Huan Chen, Meiyu Yu, Wanji Zheng, Li Ma, Hua Chai, Jieping Ye, and Xiaohu Qie. 2019. Incorporating Semantic Similarity with Geographic Correlation for Query-POI Relevance Learning. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (Jul. 2019), 1270\u20131277."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3718057.3718064","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T18:12:19Z","timestamp":1756318339000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3718057.3718064"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":47,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.14778\/3718057.3718064"],"URL":"https:\/\/doi.org\/10.14778\/3718057.3718064","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value":"2025-08-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}