{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T05:18:47Z","timestamp":1781587127114,"version":"3.54.5"},"reference-count":28,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,5,1]],"date-time":"2020-05-01T00:00:00Z","timestamp":1588291200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005632","name":"Polish National Center for Research and Development","doi-asserted-by":"publisher","award":["POIR.01.02.00-00-0154\/16"],"award-info":[{"award-number":["POIR.01.02.00-00-0154\/16"]}],"id":[{"id":"10.13039\/501100005632","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007751","name":"AGH University of Science and Technology","doi-asserted-by":"publisher","award":["5.72.230.442"],"award-info":[{"award-number":["5.72.230.442"]}],"id":[{"id":"10.13039\/501100007751","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Complementing information about particular points, places, or institutions, i.e., so-called Points of Interest (POIs) can be achieved by matching data from the growing number of geospatial databases; these include Foursquare, OpenStreetMap, Yelp, and Facebook Places. Doing this potentially allows for the acquisition of more accurate and more complete information about POIs than would be possible by merely extracting the information from each of the systems alone. 
Problem: The task of Points of Interest matching, and the development of an algorithm to perform this automatically, are quite challenging problems due to the prevalence of different data structures, data incompleteness, conflicting information, naming differences, data inaccuracy, and cultural and language differences; in short, the difficulties experienced in the process of obtaining (complementary) information about the POI from different sources are due, in part, to the lack of standardization among Points of Interest descriptions; a further difficulty stems from the vast and rapidly growing amount of data to be assessed on each occasion. Research design and contributions: To propose an efficient algorithm for automatic Points of Interest matching, we: (1) analyzed available data sources\u2014their structures, models, attributes, number of objects, the quality of data (number of missing attributes), etc.\u2014and defined a unified POI model; (2) prepared a fairly large experimental dataset consisting of 50,000 matching and 50,000 non-matching points, taken from different geographical, cultural, and language areas; (3) comprehensively reviewed metrics that can be used for assessing the similarity between Points of Interest; (4) proposed and verified different strategies for dealing with missing or incomplete attributes; (5) reviewed and analyzed six different classifiers for Points of Interest matching, conducting experiments and follow-up comparisons to determine the most effective combination of similarity metric, strategy for dealing with missing data, and POIs matching classifier; and (6) presented an algorithm for automatic Points of Interest matching, detailing its accuracy and carrying out a complexity analysis. 
Results and conclusions: The main results of the research are: (1) comprehensive experimental verification and numerical comparisons of the crucial Points of Interest matching components (similarity metrics, approaches for dealing with missing data, and classifiers), indicating that the best Points of Interest matching classifier is a combination of random forest algorithm coupled with marking of missing data and mixing different similarity metrics for different POI attributes; and (2) an efficient greedy algorithm for automatic POI matching. At a cost of just 3.5% in terms of accuracy, it allows for reducing POI matching time complexity by two orders of magnitude in comparison to the exact algorithm.<\/jats:p>","DOI":"10.3390\/ijgi9050291","type":"journal-article","created":{"date-parts":[[2020,5,4]],"date-time":"2020-05-04T03:29:39Z","timestamp":1588562979000},"page":"291","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Towards Automatic Points of Interest Matching"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0146-5921","authenticated-orcid":false,"given":"Mateusz","family":"Piech","sequence":"first","affiliation":[{"name":"Department of Computer Science, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6684-0748","authenticated-orcid":false,"given":"Aleksander","family":"Smywinski-Pohl","sequence":"additional","affiliation":[{"name":"Department of Computer Science, AGH University of Science and Technology, al. 
Mickiewicza 30, 30-059 Krakow, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8494-628X","authenticated-orcid":false,"given":"Robert","family":"Marcjan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0535-7220","authenticated-orcid":false,"given":"Leszek","family":"Siwik","sequence":"additional","affiliation":[{"name":"Department of Computer Science, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,5,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Scheffler, T., Schirru, R., and Lehmann, P. (2012). Matching Points of Interest from Different Social Networking Sites. Lecture Notes in Computer Science, Springer.","DOI":"10.1007\/978-3-642-33347-7_24"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1109\/TPAMI.2007.1078","article-title":"A Normalized Levenshtein Distance Metric","volume":"29","author":"Yujian","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1080\/15230406.2014.880327","article-title":"A weighted multi-attribute method for matching user-generated Points of Interest","volume":"41","author":"McKenzie","year":"2014","journal-title":"Cartogr. Geogr. Inf. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Novack, T., Peters, R., and Zipf, A. (2018). Graph-Based Matching of Points-of-Interest from Collaborative Geo-Datasets. ISPRS Int. J. Geo-Inf., 7.","DOI":"10.3390\/ijgi7030117"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Almeida, A., Alves, A., and Gomes, R. 
(2018). Automatic POI Matching Using an Outlier Detection Based Approach. Advances in Intelligent Data Analysis XVII, Springer International Publishing.","DOI":"10.1007\/978-3-030-01768-2_4"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, F.T., Ting, K.M., and Zhou, Z.H. (2008, January 15\u201319). Isolation Forest. Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy.","DOI":"10.1109\/ICDM.2008.17"},{"key":"ref_7","unstructured":"(2019, September 01). Factual Crosswalk API. Available online: https:\/\/www.factual.com\/blog\/crosswalk-api\/."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1002\/wics.108","article-title":"Record linkage","volume":"2","author":"Herzog","year":"2010","journal-title":"WIREs Comput. Stat."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Li, L., Xing, X., Xia, H., and Huang, X. (2016). Entropy-weighted instance matching between different sourcing points of interest. Entropy, 18.","DOI":"10.3390\/e18020045"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1080\/13658816.2016.1188930","article-title":"Similarity matching for integrating spatial information extracted from place descriptions","volume":"31","author":"Kim","year":"2017","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Deng, Y., Luo, A., Liu, J., and Wang, Y. (2019). Point of Interest Matching between Different Geospatial Datasets. ISPRS Int. J. Geo-Inf., 8.","DOI":"10.3390\/ijgi8100435"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1068\/b35097","article-title":"How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets","volume":"37","author":"Haklay","year":"2010","journal-title":"Environ. Plan. B Plan. Des."},{"key":"ref_13","unstructured":"Hochmair, H.H., Juh\u00e1sz, L., and Cvetojevic, S. 
(2018, January 15\u201317). Data quality of points of interest in selected mapping and social media platforms. Proceedings of the LBS 2018: 14th International Conference on Location Based Services, Zurich, Switzerland."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., and Ives, Z. (2007). Dbpedia: A nucleus for a web of open data. The Semantic Web, Springer.","DOI":"10.1007\/978-3-540-76298-0_52"},{"key":"ref_15","unstructured":"(2019, September 01). OpenStreetMap TagInfo. Available online: https:\/\/taginfo.openstreetmap.org\/."},{"key":"ref_16","unstructured":"(2019, September 01). OpenStreetMap Wiki. Available online: https:\/\/wiki.openstreetmap.org\/."},{"key":"ref_17","unstructured":"Cohen, W.W., Ravikumar, P., and Fienberg, S.E. (2003, January 24\u201327). A Comparison of String Metrics for Matching Names and Records. Proceedings of the KDD Workshop On Data Cleaning and Object Consolidation, Washington, DC, USA."},{"key":"ref_18","unstructured":"(2019, September 01). FuzzyWuzzy: Fuzzy String Matching in Python. Available online: https:\/\/chairnerd.seatgeek.com\/fuzzywuzzy-fuzzy-string-matching-in-python\/."},{"key":"ref_19","unstructured":"Blanchard, E., Harzallah, M., Briand, H., and Kuntz, P. (2005, January 13\u201314). A Typology of Ontology-Based Semantic Measures. Proceedings of the EMOI-INTEROP 2005, Porto, Portugal."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fix, E., and Hodges, J.L. (1951). Discriminatory Analysis-Nonparametric Discrimination: Consistency Properties, University of California. Technical Report.","DOI":"10.1037\/e471672008-001"},{"key":"ref_21","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/BF00116251","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Mach. Learn."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1109\/5326.897072","article-title":"Neural networks for classification: A survey","volume":"30","author":"Zhang","year":"2000","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_25","unstructured":"Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., and Isard, M. (2016, January 2\u20134). TensorFlow: A system for large-scale machine learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA."},{"key":"ref_26","unstructured":"(2019, September 01). Keras: The Python Deep Learning library. Available online: https:\/\/keras.io\/."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_28","first-page":"41","article-title":"GPS for environmental applications: Accuracy and precision of locational data","volume":"60","author":"August","year":"1994","journal-title":"Photogramm. Eng. 
Remote Sens."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/9\/5\/291\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:09:47Z","timestamp":1760364587000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/9\/5\/291"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,1]]},"references-count":28,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["ijgi9050291"],"URL":"https:\/\/doi.org\/10.3390\/ijgi9050291","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,1]]}}}