{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T05:02:54Z","timestamp":1776920574775,"version":"3.51.2"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T00:00:00Z","timestamp":1740528000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T00:00:00Z","timestamp":1740528000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006245","name":"Ministry of Science and Technology, Israel","doi-asserted-by":"publisher","award":["3-17937"],"award-info":[{"award-number":["3-17937"]}],"id":[{"id":"10.13039\/501100006245","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002808","name":"Carlsbergfondet","doi-asserted-by":"publisher","award":["Semper Ardens: Accelerate programme (project nr. CF21-0454)"],"award-info":[{"award-number":["Semper Ardens: Accelerate programme (project nr. CF21-0454)"]}],"id":[{"id":"10.13039\/501100002808","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002702","name":"Aalborg University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002702","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources &amp; Evaluation"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The writings of one ancient civilization often overlap in time and space with others. Many of these sources comprise unstructured text in ancient languages, causing scholars studying these civilizations to be siloed, often relying on sources in specific languages. 
Most recent efforts to extract structured information from historical scripts into place (toponym) and people databases (prosopographies) have followed this pattern, focusing on one civilization and selected sources. The path to creating a common database runs through aligning names or toponyms between sources from disparate languages utilizing different scripts. Existing multi-lingual orthographic (string-based) comparison often relies on transliteration to a common script (Latin\/English). Transliteration often creates multiple options and even more confusion. However, when integrating sources that overlap in space and time, the languages often share a common phonetic background. This commonality may prove beneficial. In this work, we present a benchmark for comparing toponyms from two linguistically and culturally related languages, namely Hebrew and Arabic. We provide a benchmark comprised of a set of dataset pairs created from historical sources written in medieval variants of these languages, later historical gazetteers, and a modern dataset curated from Wikidata. We empirically evaluate several toponym comparison approaches over the benchmark: transliteration to a common script, direct transliteration, and phonetic comparison using a common phonetic representation. 
We discuss the results and the limitations of the various methods and outline future work.<\/jats:p>","DOI":"10.1007\/s10579-025-09812-9","type":"journal-article","created":{"date-parts":[[2025,2,26]],"date-time":"2025-02-26T04:02:14Z","timestamp":1740542534000},"page":"2427-2451","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype"],"prefix":"10.1007","volume":"59","author":[{"given":"Tomer","family":"Sagi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Moran","family":"Zaga","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sinai","family":"Rusinek","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marcell R.","family":"Fekete","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Johannes","family":"Bjerva","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Katja","family":"Hose","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,2,26]]},"reference":[{"key":"9812_CR1","doi-asserted-by":"publisher","unstructured":"Ardanuy, M. C., & Sporleder, C. (2017). Toponym disambiguation in historical documents using semantic and geographic features. In: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage. DATeCH2017, (pp. 175\u2013180). Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3078081.3078099","DOI":"10.1145\/3078081.3078099"},{"key":"9812_CR2","doi-asserted-by":"publisher","unstructured":"Bast, H., & Buchhold, B. (2017). 
QLever: A query engine for efficient SPARQL+Text search. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. CIKM \u201917, (pp. 647\u2013656). Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3132847.3132921","DOI":"10.1145\/3132847.3132921"},{"key":"9812_CR3","unstructured":"Benjamin, Asher, A., Zunz, L., & Lebrecht, F. The Itinerary of Rabbi Benjamin of Tudela. A. Asher & co."},{"key":"9812_CR4","doi-asserted-by":"publisher","unstructured":"Bharadwaj, A., Mortensen, D., Dyer, C., & Carbonell, J. (2016). Phonologically aware neural model for named entity recognition in low resource transfer settings. In: Su, J., Duh, K., Carreras, X. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, (pp. 1462\u20131472). Association for Computational Linguistics, Austin, Texas. https:\/\/doi.org\/10.18653\/v1\/D16-1153 https:\/\/aclanthology.org\/D16-1153 Accessed 2024-12-19.","DOI":"10.18653\/v1\/D16-1153"},{"key":"9812_CR5","unstructured":"Carlson, Thomas A. Historical index of the medieval middle east (HIMME). https:\/\/medievalmideast.org\/index.html Accessed: 01\/01\/2024."},{"key":"9812_CR6","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1016\/B0-08-044854-2\/00002-X","volume":"9","author":"JC Catford","year":"2006","unstructured":"Catford, J. C., & Esling, J. H. (2006). Articulatory phonetics. Encyclopedia of Language and Linguistics, 9, 425\u2013442.","journal-title":"Encyclopedia of Language and Linguistics"},{"key":"9812_CR7","doi-asserted-by":"crossref","unstructured":"Christen, P. (2008). Febrl: An open source data cleaning, deduplication and record linkage system with a graphical user interface. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (pp. 
1065\u20131068).","DOI":"10.1145\/1401890.1402020"},{"key":"9812_CR8","volume-title":"Atlas Du Monde Arabo-Islamique \u00e0 l\u2019\u00c9poque Classique: IXe-Xe Si\u00e8cles (The Arab-Islamic World and Classic Europe Atlas: 9th-10th Centuries)","author":"G Cornu","year":"1983","unstructured":"Cornu, G. (1983). Atlas Du Monde Arabo-Islamique \u00e0 l\u2019\u00c9poque Classique: IXe-Xe Si\u00e8cles (The Arab-Islamic World and Classic Europe Atlas: 9th-10th Centuries). Brill."},{"key":"9812_CR9","unstructured":"Dolgopolsky, A. B. (1986). A probabilistic hypothesis concerning the oldest relationships among the language families of northern Eurasia. Typology, relationship and time: A collection of papers on language change and relationship by Soviet linguists, 27\u201350."},{"key":"9812_CR10","volume-title":"The Encyclopaedia of Islam","author":"HAR Gibb","year":"1998","unstructured":"Gibb, H. A. R. (1998). The Encyclopaedia of Islam. Brill Archive."},{"issue":"1","key":"9812_CR11","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1093\/ietisy\/e89-d.1.332","volume":"89","author":"R Gong","year":"2006","unstructured":"Gong, R., & Chan, T. K. (2006). Syllable alignment: A novel model for phonetic string search. IEICE Transactions on Information and Systems, 89(1), 332\u2013339.","journal-title":"IEICE Transactions on Information and Systems"},{"key":"9812_CR12","doi-asserted-by":"publisher","unstructured":"Grossner, K., & Mostern, R. (2021). Linked Places in World Historical Gazetteer. In: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities. GeoHumanities \u201921, (pp. 40\u201343). Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3486187.3490203","DOI":"10.1145\/3486187.3490203"},{"issue":"10","key":"9812_CR13","doi-asserted-by":"publisher","first-page":"1109","DOI":"10.1080\/13658810701851453","volume":"22","author":"J Hastings","year":"2008","unstructured":"Hastings, J. (2008). 
Automated conflation of digital gazetteer data. International Journal of Geographical Information Science, 22(10), 1109\u20131127.","journal-title":"International Journal of Geographical Information Science"},{"key":"9812_CR14","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1080\/01621459.1989.10478785","volume":"84","author":"MA Jaro","year":"1989","unstructured":"Jaro, M. A. (1989). Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical association, 84, 414\u2013420.","journal-title":"Journal of the American Statistical association"},{"key":"9812_CR15","doi-asserted-by":"publisher","unstructured":"Joshi, T., Joy, J., Kellner, T., Khurana, U., Kumaran, A., & Sengar, V. (2008). Crosslingual location search. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR \u201908, (pp. 211\u2013218). ACM, New York, NY, USA. https:\/\/doi.org\/10.1145\/1390334.1390372","DOI":"10.1145\/1390334.1390372"},{"key":"9812_CR16","doi-asserted-by":"crossref","unstructured":"Keskustalo, H., Pirkola, A., Visala, K., Lepp\u00e4nen, E., & J\u00e4rvelin, K. (2003). Non-adjacent digrams improve matching of cross-lingual spelling variants. In: String Processing and Information Retrieval: 10th International Symposium, SPIRE 2003, Manaus, Brazil, October 8-10, 2003. Proceedings 10, (pp. 252\u2013265). Springer.","DOI":"10.1007\/978-3-540-39984-1_19"},{"key":"9812_CR17","doi-asserted-by":"publisher","unstructured":"Klementiev, A., & Roth, D. (2006). Weakly supervised named entity transliteration and discovery from multilingual comparable corpora. In: Calzolari, N., Cardie, C., Isabelle, P. (eds.) Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, (pp. 817\u2013824). Association for Computational Linguistics, Sydney, Australia. 
https:\/\/doi.org\/10.3115\/1220175.1220278 https:\/\/aclanthology.org\/P06-1103 Accessed 2024-12-19.","DOI":"10.3115\/1220175.1220278"},{"key":"9812_CR18","unstructured":"Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, (vol. 10, pp. 707\u2013710). Soviet Union."},{"key":"9812_CR19","unstructured":"MEHDIE: The MEHDIE toponym matching tool. https:\/\/tool.mehdie.org\/ Accessed: 01\/01\/2024."},{"key":"9812_CR21","unstructured":"MEHDIE Project: The MEHDIE Transliteration Service and Python Package. https:\/\/pypi.org\/project\/translit-me\/ Accessed: 01\/01\/2024."},{"key":"9812_CR20","unstructured":"MEHDIE Project: The translit-me source code repository. https:\/\/gitlab.com\/m8417\/hebrew-transliteration-service Accessed: 01\/01\/2024."},{"key":"9812_CR22","doi-asserted-by":"crossref","unstructured":"Martins, B. (2011). A supervised machine learning approach for duplicate detection over gazetteer records. In: International Conference on GeoSpatial Semantics, (pp. 34\u201351). Springer.","DOI":"10.1007\/978-3-642-20630-6_3"},{"key":"9812_CR23","unstructured":"Maxim Romanov: al-\u1e6eurayy\u0101 Project. https:\/\/althurayya.github.io\/ Accessed: 01\/01\/2024."},{"key":"9812_CR24","unstructured":"Mortensen, D. R., Littell, P., Bharadwaj, A., Goyal, K., Dyer, C., & Levin, L. (2016). Panphon: A resource for mapping IPA segments to articulatory feature vectors. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, (pp. 3475\u20133484)."},{"key":"9812_CR25","unstructured":"National Geospatial-Intelligence Agency: Geographic Names Server. https:\/\/geonames.nga.mil\/geonames\/GNSHome\/index.html Accessed: 01\/01\/2024."},{"issue":"6","key":"9812_CR26","doi-asserted-by":"publisher","first-page":"907","DOI":"10.1017\/S1351324915000315","volume":"22","author":"JR Novak","year":"2016","unstructured":"Novak, J. 
R., Minematsu, N., & Hirose, K. (2016). Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework. Natural Language Engineering, 22(6), 907\u2013938.","journal-title":"Natural Language Engineering"},{"issue":"2","key":"9812_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3377455","volume":"53","author":"G Papadakis","year":"2020","unstructured":"Papadakis, G., Skoutas, D., Thanos, E., & Palpanas, T. (2020). Blocking and filtering techniques for entity resolution: A survey. ACM Computing Surveys (CSUR), 53(2), 1\u201342.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"9812_CR28","unstructured":"Pleiades Team: Pleiades. https:\/\/pleiades.stoa.org\/ Accessed: 01\/01\/2024."},{"key":"9812_CR29","doi-asserted-by":"publisher","unstructured":"Recchia, G., & Louwerse, M. (2013). A comparison of string similarity measures for toponym matching. In: Proceedings of The First ACM SIGSPATIAL International Workshop on Computational Models of Place. COMP \u201913, (pp. 54\u201361). Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/2534848.2534850","DOI":"10.1145\/2534848.2534850"},{"issue":"2","key":"9812_CR30","doi-asserted-by":"publisher","first-page":"324","DOI":"10.1080\/13658816.2017.1390119","volume":"32","author":"R Santos","year":"2018","unstructured":"Santos, R., Murrieta-Flores, P., Calado, P., & Martins, B. (2018). Toponym matching through deep neural networks. International Journal of Geographical Information Science, 32(2), 324\u2013348.","journal-title":"International Journal of Geographical Information Science"},{"issue":"9","key":"9812_CR31","doi-asserted-by":"publisher","first-page":"913","DOI":"10.1080\/17538947.2017.1371253","volume":"11","author":"R Santos","year":"2018","unstructured":"Santos, R., Murrieta-Flores, P., & Martins, B. (2018). Learning to combine multiple string similarity metrics for effective toponym matching. 
International Journal of Digital Earth, 11(9), 913\u2013938.","journal-title":"International Journal of Digital Earth"},{"issue":"1","key":"9812_CR32","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"TF Smith","year":"1981","unstructured":"Smith, T. F., Waterman, M. S., et al. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195\u2013197.","journal-title":"Journal of Molecular Biology"},{"key":"9812_CR34","doi-asserted-by":"publisher","DOI":"10.3390\/ijgi8020077","author":"K Sun","year":"2019","unstructured":"Sun, K., Zhu, Y., & Song, J. (2019). Progress and challenges on entity alignment of geographic knowledge bases. ISPRS International Journal of Geo-Information. https:\/\/doi.org\/10.3390\/ijgi8020077","journal-title":"ISPRS International Journal of Geo-Information"},{"key":"9812_CR33","doi-asserted-by":"publisher","first-page":"628","DOI":"10.1007\/978-3-319-68288-4_37","volume-title":"The Semantic Web - ISWC 2017","author":"Z Sun","year":"2017","unstructured":"Sun, Z., Hu, W., & Li, C. (2017). Cross-lingual entity alignment via joint attribute-preserving embedding. In C. d\u2019Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudr\u00e9-Mauroux, J. Sequeda, C. Lange, & J. Heflin (Eds.), The Semantic Web - ISWC 2017 (pp. 628\u2013644). Springer."},{"key":"9812_CR35","unstructured":"TravelLab: Benjamin of Tudela. (2024). Accessed: March 1st, (n.d.). https:\/\/teipublisher.info\/exist\/apps\/TraveLab\/Benjamin%20of%20Tudela.xml"},{"issue":"6","key":"9812_CR36","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1016\/j.compenvurbsys.2011.05.004","volume":"35","author":"M Tucci","year":"2011","unstructured":"Tucci, M., & Giordano, A. (2011). Positional accuracy, positional uncertainty, and feature change detection in historical maps: Results of an experiment. Computers, Environment and Urban Systems, 35(6), 452\u2013463. 
https:\/\/doi.org\/10.1016\/j.compenvurbsys.2011.05.004","journal-title":"Computers, Environment and Urban Systems"},{"key":"9812_CR37","volume-title":"A dictionary of modern written Arabic","author":"H Wehr","year":"1979","unstructured":"Wehr, H. (1979). A dictionary of modern written Arabic. Otto Harrassowitz Verlag."},{"key":"9812_CR38","unstructured":"Y\u0101q\u016bt, al-R\u016bm\u012b al-Hamaw\u012b. (1977). Kit\u0101b Mu\u2019jam al-Buld\u0101n (The Countries Dictionary Book). D\u0101r \u1e62\u0101dir, Beirut. Original work published in the 13th Century. 5 vols."},{"issue":"5","key":"9812_CR39","doi-asserted-by":"publisher","first-page":"1143","DOI":"10.1007\/s00778-022-00747-z","volume":"31","author":"R Zhang","year":"2022","unstructured":"Zhang, R., Trisedya, B. D., Li, M., Jiang, Y., & Qi, J. (2022). A benchmark and comprehensive survey on knowledge graph entity alignment via representation learning. The VLDB Journal, 31(5), 1143\u20131168.","journal-title":"The VLDB Journal"},{"key":"9812_CR40","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Liu, H., Chen, J., Chen, X., Liu, B., Xiang, Y., & Zheng, Y. (2020). An industry evaluation of embedding-based entity alignment. In: Proceedings of the 28th International Conference on Computational Linguistics: Industry Track, (pp. 
179\u2013189).","DOI":"10.18653\/v1\/2020.coling-industry.17"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-025-09812-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-025-09812-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-025-09812-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T06:41:06Z","timestamp":1757140866000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-025-09812-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,26]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["9812"],"URL":"https:\/\/doi.org\/10.1007\/s10579-025-09812-9","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4136375\/v1","asserted-by":"object"}]},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,26]]},"assertion":[{"value":"21 January 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}